BulletMultiThreaded/NarrowPhaseCollision makes use of this boxBoxDistance.
Cache some values in src/BulletMultiThreaded/SpuContactManifoldCollisionAlgorithm.cpp to avoid DMA transfers.
2) Added btConvexSeparatingDistanceUtil: it caches the separating distance/vector between two convex objects and uses it as an early-out, so that convex-convex collision detection can be skipped while the objects are known to remain separated.
btConvexSeparatingDistanceUtil is used in src/BulletCollision/CollisionDispatch/btConvexConvexAlgorithm.cpp and can be controlled through btDispatcherInfo.m_useConvexConservativeDistanceUtil and btDispatcherInfo.m_convexConservativeDistanceThreshold.
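The caching idea can be illustrated with a small, self-contained sketch. This is not the code of btConvexSeparatingDistanceUtil: the class SeparatingDistanceCache, its methods, and the bounding-radius motion bound are assumptions made for illustration only. The sketch stores a separating distance, subtracts an upper bound on how far the two bodies may have moved toward each other, and reports whether the narrow phase can be skipped for a given threshold.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { double x, y, z; };

// Euclidean distance between two points.
static double distanceBetween(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Hypothetical helper (not Bullet's class): caches a separating distance and
// conservatively shrinks it as the two bodies move.
class SeparatingDistanceCache {
public:
    // Store a freshly computed separating distance and the positions it was measured at.
    void reset(double separatingDistance, const Vec3& posA, const Vec3& posB) {
        assert(separatingDistance >= 0.0);
        m_distance = separatingDistance;
        m_posA = posA;
        m_posB = posB;
    }

    // Shrink the cached distance by an upper bound on the motion since the last call.
    // angleA/angleB: rotation in radians since the last call.
    // radiusA/radiusB: bounding-sphere radii; a point at radius r moves at most r * |angle|.
    void update(const Vec3& posA, const Vec3& posB,
                double angleA, double angleB, double radiusA, double radiusB) {
        const double linear = distanceBetween(posA, m_posA) + distanceBetween(posB, m_posB);
        const double angular = std::fabs(angleA) * radiusA + std::fabs(angleB) * radiusB;
        m_distance = std::max(0.0, m_distance - linear - angular);
        m_posA = posA;
        m_posB = posB;
    }

    // True while the conservative distance still exceeds the threshold,
    // meaning the expensive convex-convex query can be skipped this step.
    bool canSkipNarrowPhase(double threshold) const { return m_distance > threshold; }

    double distance() const { return m_distance; }

private:
    double m_distance = 0.0;
    Vec3 m_posA{0.0, 0.0, 0.0};
    Vec3 m_posB{0.0, 0.0, 0.0};
};
```

When canSkipNarrowPhase returns false, the caller would run the full convex-convex query and call reset with the new result. The bound used here is deliberately pessimistic, so the sketch may run the full query more often than necessary but never skips a real contact.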
3) Use BulletMultiThreaded/vectormath/scalar/cpp/vectormath/scalar/cpp/vectormath_aos.h as fallback for non-PlayStation 3 Cell SPU/PPU platforms (used by boxBoxDistance).
Note that there are other implementations in the Extras/vectormath folder that are potentially faster for IBM Cell SDK 3.0 SPU (libspe2).
Add rayTest to btBroadphaseInterface, and implement an efficient version for btDbvtBroadphase to accelerate raycasting.
btAxisSweep3, btSimpleBroadphase and btMultiSapBroadphase still implement the brute-force method (as before). For now, btDbvtBroadphase is recommended for the fastest world raycasts.
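The difference between a brute-force broadphase ray test and a tree-accelerated one can be sketched as follows. This is a simplified, self-contained illustration and not Bullet's btDbvt code: the types Aabb, Ray and Node and the functions below are hypothetical, and the tree is a static balanced tree built over boxes assumed to be pre-sorted along one axis, whereas btDbvtBroadphase uses a dynamic AABB tree. The brute-force version tests the ray against every box; the tree version skips a whole subtree as soon as the ray misses its bounds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Aabb { double min[3]; double max[3]; };
struct Ray  { double origin[3]; double dir[3]; double maxT; };

// Slab test: true if the ray segment t in [0, maxT] overlaps the box.
bool rayHitsAabb(const Ray& r, const Aabb& b) {
    double tMin = 0.0, tMax = r.maxT;
    for (int i = 0; i < 3; ++i) {
        if (r.dir[i] == 0.0) {  // ray parallel to this slab
            if (r.origin[i] < b.min[i] || r.origin[i] > b.max[i]) return false;
            continue;
        }
        double t0 = (b.min[i] - r.origin[i]) / r.dir[i];
        double t1 = (b.max[i] - r.origin[i]) / r.dir[i];
        if (t0 > t1) std::swap(t0, t1);
        tMin = std::max(tMin, t0);
        tMax = std::min(tMax, t1);
        if (tMin > tMax) return false;
    }
    return true;
}

// Brute force: every box is tested against the ray.
std::vector<std::size_t> rayTestBruteForce(const Ray& r, const std::vector<Aabb>& boxes,
                                           std::size_t& tests) {
    std::vector<std::size_t> hits;
    for (std::size_t i = 0; i < boxes.size(); ++i) {
        ++tests;
        if (rayHitsAabb(r, boxes[i])) hits.push_back(i);
    }
    return hits;
}

struct Node {
    Aabb bounds;
    int left = -1, right = -1;  // child node indices; -1 marks a leaf
    std::size_t box = 0;        // index into the box array (leaves only)
};

Aabb mergeAabb(const Aabb& a, const Aabb& b) {
    Aabb m;
    for (int i = 0; i < 3; ++i) {
        m.min[i] = std::min(a.min[i], b.min[i]);
        m.max[i] = std::max(a.max[i], b.max[i]);
    }
    return m;
}

// Build a balanced tree over boxes[first, last); returns the index of the subtree root.
int buildTree(std::vector<Node>& nodes, const std::vector<Aabb>& boxes,
              std::size_t first, std::size_t last) {
    assert(last > first);
    Node n;
    if (last - first == 1) {
        n.bounds = boxes[first];
        n.box = first;
    } else {
        const std::size_t mid = first + (last - first) / 2;
        n.left = buildTree(nodes, boxes, first, mid);
        n.right = buildTree(nodes, boxes, mid, last);
        n.bounds = mergeAabb(nodes[n.left].bounds, nodes[n.right].bounds);
    }
    nodes.push_back(n);
    return static_cast<int>(nodes.size()) - 1;
}

// Tree traversal: a subtree is skipped entirely when the ray misses its bounds.
void rayTestTree(const std::vector<Node>& nodes, int index, const Ray& r,
                 std::vector<std::size_t>& hits, std::size_t& tests) {
    ++tests;
    const Node& n = nodes[index];
    if (!rayHitsAabb(r, n.bounds)) return;
    if (n.left < 0) { hits.push_back(n.box); return; }
    rayTestTree(nodes, n.left, r, hits, tests);
    rayTestTree(nodes, n.right, r, hits, tests);
}
```

For a row of well-separated boxes and a ray that crosses only one of them, the brute-force version performs one test per box, while the tree version performs a number of tests proportional to the tree depth.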