We present an improved parallel Sweep and Prune algorithm that solves the dynamic box intersection problem in three dimensions. It scales to very large datasets, which makes it suitable for broad-phase collision detection in complex moving-body simulations. Our algorithm gracefully handles high-density scenarios, including challenging clustering behavior, by using a double-axis sweeping approach and a cache-friendly succinct data structure. The algorithm consists of three parallel stages: sorting, candidate generation, and object pairing. By exploiting temporal coherence, our sorting stage runs with close to optimal load balancing. Furthermore, our approach is characterized by a work-division strategy that relies on adaptive partitioning, which leads to almost ideal scalability. In addition, for scenarios that involve intense clustering along several axes simultaneously, we propose an enhancement that increases the context-awareness of the algorithm. By exploiting information gathered along three orthogonal axes, the algorithm can choose efficiently, per object and at run time, which range query to perform. Experimental results show high performance for up to millions of objects on modern multi-core CPUs.
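For readers unfamiliar with the baseline technique, the following is a minimal sequential sketch of classic single-axis Sweep and Prune on axis-aligned boxes. It is not the parallel, double-axis algorithm presented here. The names `Box`, `overlaps`, and `sweep_and_prune` are hypothetical and chosen only for illustration, and boxes that merely touch are assumed to count as intersecting.

```python
from typing import List, Tuple

# Assumed representation: a box is (min corner, max corner), each an (x, y, z) tuple.
Box = Tuple[Tuple[float, float, float], Tuple[float, float, float]]


def overlaps(a: Box, b: Box, axes=(0, 1, 2)) -> bool:
    """Return True if the intervals of a and b overlap on every listed axis."""
    return all(a[0][k] <= b[1][k] and b[0][k] <= a[1][k] for k in axes)


def sweep_and_prune(boxes: List[Box], axis: int = 0) -> List[Tuple[int, int]]:
    """Classic sequential sweep along one axis (illustrative sketch only)."""
    # Sorting stage: order box indices by their lower bound on the sweep axis.
    order = sorted(range(len(boxes)), key=lambda i: boxes[i][0][axis])
    remaining_axes = [k for k in range(3) if k != axis]
    pairs = []
    for pos, i in enumerate(order):
        upper = boxes[i][1][axis]
        # Candidate generation: scan forward while intervals overlap on the sweep axis.
        for q in range(pos + 1, len(order)):
            j = order[q]
            if boxes[j][0][axis] > upper:
                break  # every later box starts even further along the axis
            # Pairing: confirm overlap on the two remaining axes.
            if overlaps(boxes[i], boxes[j], remaining_axes):
                pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)


if __name__ == "__main__":
    demo = [
        ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0)),
        ((0.5, 0.5, 0.5), (1.5, 1.5, 1.5)),
        ((3.0, 3.0, 3.0), (4.0, 4.0, 4.0)),
        ((0.9, 5.0, 0.0), (2.0, 6.0, 1.0)),
    ]
    print(sweep_and_prune(demo))
```

When many boxes cluster along the chosen sweep axis, the inner scan in this sketch visits many candidates that fail the test on the other axes. That is the kind of high-density case the double-axis sweeping and per-object query selection described above are meant to address.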