Break up clipHullHullConcaveConvexKernel into multiple stages, so it might 'fit' in Apple's OpenCL implementation
Implemented bvhTraversalKernel and findConcaveSeparatingAxis on CPU (debugging, possible future CPU version)
Remove constructor from b3Vector3, to make it a POD type, so it can go into a union (and more compatible with OpenCL float4)
Use b3MakeVector3 instead of constructor
Share some code between C++ and GPU in a shared file: see b3TransformAabb2 in src/Bullet3Collision/BroadPhaseCollision/shared/b3Aabb.h
Improve PairBench a bit, show timings and #overlapping pairs.
Increase shadowmap default size to 8192x8192 (hope the GPU supports it)