UltimateSoA: a trivial SOA binding to your beloved OO Data Hierarchy

Goal

The main goal of UltimateSoA is to provide a Structure of Array storage for OO Data Hierarchies (inheritance and containment) with minimal requirements (in principle no changes at all) on the OO design and implementation.

code and examples in https://github.com/VinInn/MultiBodyExcercise

Prior and related works

Reconciling OOD with DOD

Reconciling OOD with DOD is a proof of concept originally developed in 2011 by VI in view of a possible use at the ESC school in Bertinoro and for some use cases in CMS. It was not pursued further as the Author considered at that time that the model would not scale to a large application (For CMS a simple "hardwired" SOA was used).
It supported a hierarchical SOA structure. It required an implementation of the SOA by the user for each class in the OO hierarchy.

The main concepts (in particular the templated reference class and the hierarchical SOA structure) are part of the present design.

arrow-street

arrow-street is a library developed at INTEL. It targets mostly hierarchies of C-struct (POD).
arrow-street requires the user to write a "reference shadow struct" following not trivial "binding" rules. Reading the examples it seems also quite intrusive into the final user code.
At first sight it seems to require that the user write even more additional code that the "proof of concept" described above.

The implementation of the SOA storage is similar to the one used in UltimateSoA, details differ.

Vc, Cilk++

Vc vectors can be used to build SOA. In principle one can declare a
Vector3D<Vc::Vector<T>>
and continue from there. Not sure how long.
As for arrow-street, it is surely intrusive in the user code. It also requires a change in the coding paradigm (direct management of arrays, no explicit loops).

Same applies to Cilk++ and other vector-notation approaches.

Opportunistic bindings

A masochist can simply use boost::zip_iterator to build an AoS view to any set of independent collections

Root

Paw and Root provide column-wise storage with automatic binding to a raw-wise data model since ages. The design fully matches the Data Oriented paradigm in particular for what concern being memory (and disk) friendly. With the advent of the new llvm backend in Root6 one can imagine that vectorization, parallelization and even offload to accelerators can be supported in TFormula and alike. I do not see how Root can provide efficient data oriented support in standard compiled code for not trivial TTree.

Basic Concepts

The basic concept, already present in the previous proof of concepts, is the use of a data class templated with a reference type. Given a typical geometrical vector class (as present in many toolkits CLHEP included) Vector3D one will use Vector3D<T&> to bind to the SOA storage as in

//
  Vector3D<float> va{-3, 0, 2.};
  Vector3D<float> vb{1, 2, 3.14};

  float x[3]={1.f}, y[3]={1.f}, z[3]={1.f}; // sort of SOA
  Vector3D<float const&>  v1(x[0],y[0],z[0]);
  Vector3D<float&>  v3(x[2],y[2],z[2]);
  
  v3 = v1+vb-va;
//

the rest is syntactic sugar (and a good C++ compiler)

For what concern the SOA storage we use a hierarchy of

std::tuple<std::vector<T>...>
that replaces the user-provided hierarchy in the previous prototype.

SOA binding

Given a OO model, as the one in here, the user has to provide type-traits with proper aliases to the corresponding reference- and value-types plus a tuple of the types to be stored in the SOA in the same order they appear in the constructor of the OO class itself. So for the Vect and Line in the example the declaration will look like

//
template<typename T>
struct UltimateSoaTraits<Vect<T>> {
  using CREF = Vect<T const &>;
  using REF = Vect<T&>;
  using SOATUPLE = std::tuple<UltimateSoa<T,false>,UltimateSoa<T,false>,UltimateSoa<T,false>>;
};

template<typename T>
struct UltimateSoaTraits<Line<T>> {
  using CREF = Line<T const&>;
  using REF = Line<T&>;
  using SOATUPLE = std::tuple<UltimateSoa<Vect<T>,true>, UltimateSoa<Vect<T>,true>>;
};
These aliases can also be included in the class itself if desired (and even deduced from the template argument pack itself)

At this point the SOA containers will be declared as

//
  UltimateSoa<float,false>  f(10);  // just a vector...

  UltimateSoa<Vect<float>,true> vv(10);

  UltimateSoa<Line<float>,true> vl(10);

that can be used as standard vectors (at present just operator[] is supported, not difficult to provide the whole interface) Using these declarations in place of the user-provided SOA-classes all the examples in the previous prototype works with no change and identical performance.

A more "complex" DoD examples

Vectorization is not the only issue here, cache friendliness is often more at stack in actual applications.
Take the following data structure and functions

template<typename Float=float, typename Int=int, typename String=std::string, typename Bool=unsigned char, typename VecInt = std::vector<int>> 
struct DataT {

  DataT(){}
  DataT(Float ix, Float iy, Float iz, Int is, Int it, String inam, VecInt iw, Bool io) :
    x(ix),y(iy), z(iz), status(is), type(it), name(inam), what(iw), ok(io){}

  template <typename D>
  DataT(D const & d) :
    x(d.x),y(d.y), z(d.z), status(d.status), type(d.type), name(d.name), what(d.what), ok(d.ok){}
  template <typename D>
  DataT & operator =(D const & d) {
    x=d.x;y=d.y; z=d.z; status=d.status; type=d.type; name=d.name; what=d.what; ok=d.ok; 
    return *this;
  }
  DataT(DataT const & d) :
    x(d.x),y(d.y), z(d.z), status(d.status), type(d.type), name(d.name), what(d.what), ok(d.ok){}
  DataT & operator =(DataT const & d) {
    x=d.x;y=d.y; z=d.z; status=d.status; type=d.type; name=d.name; what=d.what; ok=d.ok; 
    return *this;
  }

  void reset() { ok= false;}

  Float x,y,z;
  Int status;
  Int type;
  String name;
  VecInt what;
  Bool ok;

};

using Data = DataT<>;

template<typename Cont>
void comp(Cont & cont) {
#pragma GCC ivdep
  for (auto i=0U; i<cont.size(); ++i)
    cont[i].x = cont[i].y*cont[i].z;
}

template<typename Cont>
void reset(Cont & cont) {
#pragma GCC ivdep
  for (auto i=0U; i<cont.size(); ++i)
    cont[i].reset();
}

The SOA binding can be declared as

template<>
struct UltimateSoaTraits<Data> {

  template<typename ...Args>
  static auto makeRefT(DataT<Args...>) ->  DataT<typename std::add_lvalue_reference<Args>::type...> {
      return  DataT<typename std::add_lvalue_reference<Args>::type...>();
    }

  template<typename ...Args>
  static auto makeCRefT(DataT<Args...>) ->  DataT<typename std::add_lvalue_reference<typename std::add_const<Args>>::type...> {
    return  DataT<typename std::add_lvalue_reference<typename std::add_const<Args>>::type...>();
    }


  using REF = decltype(makeRefT(Data()));
  using CREF = decltype(makeCRefT(Data()));

  using SOATUPLE = std::tuple<UltimateSoa<float,false>,UltimateSoa<float,false>,UltimateSoa<float,false>,
			      UltimateSoa<int,false>,UltimateSoa<int,false>,UltimateSoa<std::string,false>,
			      UltimateSoa<std::vector<int>,false>, UltimateSoa<unsigned char,false>
			      >; 

};

A trivial test will show how, even w/o vectorization (-O2) the SOA is much faster than the AOS (factor 8 for reset, factor 4 for comp (w/o vectorization))

int main() {
  std::cout << typeid(Data).name() << std::endl;
  std::cout << typeid(UltimateSoaTraits<Data>::REF).name() << std::endl;
  std::cout << typeid(UltimateSoaTraits<Data>::CREF).name() << std::endl;


  using DoD = UltimateSoa<Data,true>;

  DoD dod(20000);
  std::vector<Data> aos(20000);

 
  reset(dod);
  reset(aos);
  
  comp(dod);
  comp(aos);

  long long td=0, ta=0;
  for (int k=0; k<1000; ++k) {
    td -=rdtscp();
    reset(dod);
    td +=rdtscp();
    ta -=rdtscp();
    reset(aos);
    ta +=rdtscp();
   }

  std::cout << dod.size() << ' ' << double(td)/dod.size() << ' ' << double(ta)/dod.size() << std::endl;

  td=0, ta=0;
  for (int k=0; k<1000; ++k) {
    td -=rdtscp();
    comp(dod);
    td +=rdtscp();
    ta -=rdtscp();
    comp(aos);
    ta +=rdtscp();
  }

  std::cout << dod.size() << ' ' << double(td)/dod.size() << ' ' << double(ta)/dod.size() << std::endl;

  return 0;
}

Implementation

For reference with provide here the current implementation of UltimateSOA (gcc 4.9 or llvm 3.4 are required to compile it)
#ifndef UltimateSOA_H
#define UltimateSOA_H

#include<vector>
#include <tuple>
#include <utility>
#include <type_traits>
#include "align_allocator.h"
#include "SoaIterator.h"

namespace details_UltimateSoa {
  template<typename T> using AVector = std::vector<T,align_allocator<T,32>>;
}

template<typename T>
struct UltimateSoaTraits {
  using VAL =  typename T::VAL;
  using CREF = typename T::CREF;
  using REF =  typename T::REF;
  using SOATUPLE = typename T::SOATUPLE;
};

template<typename T, bool=!std::is_arithmetic<T>::value>
class UltimateSoa {};


template<typename T>
class UltimateSoa<T, true> {
public:
  using VAL =  typename UltimateSoaTraits<T>::VAL;
  using CREF = typename UltimateSoaTraits<T>::CREF;
  using REF =  typename UltimateSoaTraits<T>::REF;
  using Data = typename UltimateSoaTraits<T>::SOATUPLE;

  using value_type = VAL;
  using iterator = SoaIterator<REF, UltimateSoa<T, true>>;
  using const_iterator = SoaIterator<CREF, UltimateSoa<T, true>>;

   template<std::size_t... I>
   void resize_impl(unsigned int n, std::index_sequence<I...>) { 
     using swallow = int[];
     (void)swallow{0, ((void)(std::get<I>(m_data).resize(n)),0)...};
   }
   
  void resize(unsigned int n) {
    m_n = n;
    resize_impl(n,std::make_integer_sequence<std::size_t,std::tuple_size<Data>::value>{});
  }
  
  UltimateSoa(){}
   explicit UltimateSoa(unsigned int n) {
     resize(n);
   }

  auto size() const { return m_n;}

  iterator begin() { return iterator(*this);}  
  iterator   end() { return iterator(*this,size());}  

  const_iterator begin() const { return const_iterator(*this);}  
  const_iterator   end() const { return const_iterator(*this,size());}  


  template<typename V, std::size_t... I>
  V t2r_impl(unsigned int j, std::index_sequence<I...>) {
    return V(std::get<I>(m_data)[j] ...); 
  }

  REF operator[](unsigned int j) {
    return t2r_impl<REF>(j,std::make_integer_sequence<std::size_t,std::tuple_size<Data>::value>{});
  }

  CREF operator[](unsigned int j) const {
   return t2r_impl<CREF>(j,std::make_integer_sequence<std::size_t,std::tuple_size<Data>::value>{});
  }


  Data m_data;
  unsigned int m_n=0;
};


template<typename T>
class UltimateSoa<T, false> : public details_UltimateSoa::AVector<T> {
public:
  using Storage = details_UltimateSoa::AVector<T>;
  
  using value = typename std::remove_reference<T>::type;
  using ref = typename std::add_lvalue_reference<T>::type;
  using cref = typename std::add_lvalue_reference<typename std::add_const< T>::type>::type;
  using VAL = value;
  using CREF = cref;
  using REF = ref;
  using value_type = VAL;

  template<typename ... Args>
  UltimateSoa(Args... args) : Storage(args...){}


};

template<typename T, template<typename> class V>
void swap(V<T&> a, V<T&> b) {
  V<T> tmp = a;
  a = b; b=tmp;
}

#endif

-- VincenzoInnocente - 18 May 2014

Edit | Attach | Watch | Print version | History: r13 < r12 < r11 < r10 < r9 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r13 - 2014-06-24 - VincenzoInnocente
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    Main All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright & 2008-2020 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback