and embed the included files directly in the stringified files.
We need this, because we start sharing struct definitions and code between C/C++ and OpenCL (and potentially other languages)
preprocessor is from http://github.com/willsteel/lcpp
Use tetrahedra instead of barrel for convex demo (until performance for edge-edge is improved)
Increased #overlapping pair capacity from 12 to 16 / objec
improved parallel batching, don't try to write for static objects,
this fixed a bug, when the hash of a static object was identical with hash of dynamic objects, causing it to be assigned a bogus 100+i batching number
The parallel batching is still not enabled, because we need to measure the batching size (todo)
Will move joint setup to GPU, and then some benefit should be visible.
Don't use 64 alignment, it causes data structures size mismatch between cpu and gpu
implement CPU version of narrowphase convex collision, for comparison/debug purposes
start towards cpu/gpu sync, for adding/removing bodies (work in progress)