disable 'simdwidth' optimization for determinism (need to double-check)
make the spatial batching 3D
implement CPU version of narrowphase convex collision, for comparison/debug purposes
start work on CPU/GPU sync for adding/removing bodies (work in progress)