Structure of Arrays Class Usage

Complete: 4

Contacts

Reason to use 'structure of arrays'

Computing memory architecture

A main limiting factor on computing is the speed at which CPUs can access memory. Several strategies are employed on the hardware to overcome this limitation.
  • The various memory caching layers decrease the time it takes to access memory that has already been placed into the cache.
  • Aggressive prefetching of memory to be placed in the cache based on previous or present memory accesses.
  • Out-of-order instruction execution so that memory can be fetched for some instructions while other, non-dependent instructions can run.

In order to make best use of these memory limitation mitigation strategies data structures should be designed so that data that is used nearby in time (e.g. by the same function) should be close to one another in memory. This would allow that data to reside simultaneously in the CPU memory cache and be fetched to the processing unit together.

Data layout

Array of Structures

The C++ standard library containers are all of the array of structures design. That is, they hold collections of objects. In such a design, the memory layout is such that data associated with one object are placed closest in memory. E.g. given the class
class Foo {
...
private:
   float x_;
   float y_;
   unsigned int id_;
};

Then given an std::array<Foo,2>, the memory layout would be

[0]x_ [0]y_ [0]id_ [1]x_ [1]y_ [1]id_

If one were to calculate the average x_ value for all items in the container, the code would read the first memory address of the container, then skip the following y_ and id_ values and finally get the next x_ value. Such skipping in inefficient and if the container were larger, some of the x_ values would not be in the memory cache.

NOTE: a std::vector would also be consider an 'array of structures' type object.

Structure of Arrays

A structure of arrays is also a container, but instead of groups of objects, it instead stores groups of 'data members' for objects. E.g.
template<unsigned int N>
class Foos {
...
private:
   std::array<float,N> x_;
   std::array<float,N> y_;
   std::array<unsigned int,N> id_;
};

This can store the exact same information as the std::array<Foo,2> but with the memory layout

x_[0] x_[1] y_[0] y_[1] id_[0] id_[1]

If this structure were used to calculate the average x_ value, the hardware could fetch x_[0] and x_[1] together and might be able to use a special vectorized instruction to handle the summation.

The structure of arrays are more performant than arrays of structures when

  • the structure has a 'large' number of data members
  • many instances of the structure are in a container
  • only a few of the structures data members are needed at any one time to do a computation

edm::soa::Table<>

The edm::soa::Table<> class is a structure of arrays class that uses a spreadsheet like metaphor. The table is defined by columns and rows. The columns are defined by a C++ type for the data in the column and by a string label. Rows are defined by an integer index. So the previous Foos container would conceptually be
Row index Columns
X Y ID
0      
1      

defining a Table

The template arguments of a Table should be class types that inherit from edm::soa::Column<>. These new class types declare the type of the column and a label.
namespace edm {
namespace soa {

//Can declare explicitly
struct X : public edm::soa::Column<float, X > {
   //kLabel is needed by Column<>
   static constexpr const char* const kLabel = "x";
};

//Columns can be made implicitly using a macro
SOA_DECLARE_COLUMN(Y, float, "y");
/* which is identical to
struct Y : public edm::soa::Column<float, Y > {
   static constexpr const char* const kLabel = "y";
};
*/
SOA_DECLARE_COLUMN(ID, unsigned int, "id");

//Any type can be used for a column
SOA_DECLARE_COLUMN(Label, std::string, "label");

}
}
The column declarations do not have to be done in the edm::soa namespace, but given their generic class names it is best to place them in a namespace. You are highly encouraged to re-use the same column definitions across all tables to which they would be appropriate. This will greatly aid writing utility functions (to be described later in edm::soa::TableView<>).

The Foos container then could be defined as

using es = edm::soa;
using Foos = es::Table<es::X, es::Y, es::ID>;

accessing data from a Table

Data in a table can be accessed directly to the cell or via a whole column or whole row.

direct cell

An individual cell can be accessed using the get<> member function. The template argument to the function is the column type and the argument passed to the function is the row index
using edm::soa;
Foos foos{...};

auto x = foos.get<X>(0);

whole column

An entire column can be accessed using the column<> member function. The template argument to the function is the column type. The edm::soa::ColumnValues returned from column<> can be used as a standard container.
using edm::soa;
Foos foos{...};

auto x_s = foos.column<X>();
float sum = 0;
for( auto const& x: x_s) {
   sum +=x;
}

whole row

An entire row can be accessed using the row member function. The function argument is the row index. The returned edm::soa::RowView<> object has get<> functions analogous to edm::soa::Table<>::get<> but without the function argument for the row index.

using edm::soa;
Foos foos{...};

auto foo = foos.row(1);
auto x = foo.get<X>();

Table iterator

edm::soa::Table<> provides standard iterator access to its elements. When dereferenced, the iterator returns a edm::soa::RowView<>

using edm::soa;
Foos foos{...};

float sum = 0;
for( auto foo: foos) {
   sum += foo.get<X>();
}
Performance measurements using a loop using Table iterators and accessing data from only one column and iterating directly over a edm::soa::ColumnValues<> have the same efficiency when the full compiler optimization are used.

partial Table access

It is possible to restrict access to only certain columns of a Table via a edm::soa::TableView<>. Just like the case of the edm::soa::Table<>, the template arguments to a TableView are the columns you wish to access.

using es = edm::soa;
Foos foos{...};

es::TableView<es::X, es::Y> xys{foos};
A TableView has all the same member functions as the equivalent Table.

The advantage of a TableView is one can write a function that takes a TableView and then all Tables that use the same Columns can use that function. E.g.

using es = edm::soa;

std::vector<float> distances(es::TableView<es::X,es::Y> iValues);
The distances will work with Foos or any other Table that uses the es::X and es::Y columns.

filling a Table

Constructors

A Table<> can be filled by passing one container per column where the order determines which column is filled. All containers must be of the same size.
using es = edm::soa;
using Foos = es::Table<es::X, es::Y, es::ID>;

std::array<es::X::type,4> x_s{...};
std::array<es::Y::type,4> y_s{...};
std::array<es::ID::type,4> id_s{...};

Foos foos{x_s,y_s,id_s};

Alternatively, you can pass a single container of objects where the objects hold the value of interests and where appropriate 'value_for_column' functions are defined.

namespace bar {
class Foo {
...
   float x() const;
   float y() const;
   unsigned int id() const;
...
};

using es = edm::soa;
float value_for_column(Foo const& iF, es::X*) { return iF.x();}
float value_for_column(Foo const& iF, es::Y*) { return iF.y();}
float value_for_column(Foo const& iF, es::ID*) { return iF.id();}
}
...

std::vector<bar::Foo> vFoos{...};
using es = edm::soa;
using Foos = es::Table<es::X, es::Y, es::ID>;

Foos tFoos{vFoos};

If the object in question does not have a default value for a column or you do not want the default, you can use a lambda expression to fill the column via the edm::soa::Column<>::filler static member function. E.g.,

//using bar::Foo defined in earlier example
std::vector<bar::Foo> vFoos{...};
using es = edm::soa;
using Foos = es::Table<es::X, es::Y, es::ID>;

//Use a sequential index for id rather than result of Foo::id()
unsigned int index=0;
Foos tFoos{vFoos, column_fillers(
                   es::ID::filler([&index](bar::Foo const& ){  return index++;} ) )
          };

modifying elements

It is possible to change the size of an existing Table via the resize method. If resize is called with a value less than the current size, all rows beyond the new size will be removed. If resize is called with a value larger than the current size, the new rows will contain cells which were defaultly constructed, or were set to 0 if the type is a builtin arithmetic type. In addition you can modify individual cells or whole rows.

using es = edm::soa;
using Foos = es::Table<es::X, es::Y, es::ID>;

Foos foos;
foo.resize(2)

foo.row(0).get<es::X>()= 0.2;
foo.row(0).set<es::Y>(1.3);
//foo.row(0).get<es::ID>() would remain 0

foo.row(1).copyValuesFrom( bar::Foo(...) );

edm::SOATuple<>

For now, see https://github.com/cms-sw/cmssw/blob/master/FWCore/Utilities/interface/SoATuple.h

Review status

Reviewer/Editor and Date Comments
Last reviewed by: ChrisDJones - 29-Aug-2017 Created the page

Responsible: ChrisDJones
Last reviewed by:

-- ChrisDJones - 2017-08-29

Edit | Attach | Watch | Print version | History: r6 < r5 < r4 < r3 < r2 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r6 - 2017-09-07 - ChrisDJones
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    CMSPublic All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2019 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback