add tetrahedral mesh test scene
expose b3Config as member variable for demos
move a 'glFlush' out of the inner loop (render performance)
SSE -> SSE2 in premake
fix crash in broadphase (when no AABBs exist)
implement CPU version of narrowphase convex collision, for comparison/debug purposes
start work on CPU/GPU sync for adding/removing bodies (work in progress)