Recursive ray tracing is a powerful rendering technique used to compute realistic images by simulating the global light transport in a scene. Algorithmic improvements and FPGA-based hardware implementations of ray tracing have demonstrated realtime performance but hardware that achieves performance levels comparable to commodity rasterization graphics chips is still not available.
This paper describes the architecture and ASIC implementations of the DRPU design (dynamic ray processing unit) that closes this performance gap. The DRPU supports fully programmable shading and most kinds of dynamic scenes and thus provides similar capabilities as current GPUs. It achieves high efficiency due to SIMD processing of floating point vectors, massive multithreading, synchronous execution of packets of threads, and careful management of caches for scene data. To support dynamic scenes B-KD trees are used as spatial index structures that are processed by a custom traversal and intersection unit and modified by an update processor on scene changes.
The DRPU architecture is specified as a high-level structural description in a functional language and mapped to both FPGA and ASIC implementations. Our FPGA prototype clocked at 66 MHz achieves higher ray tracing performance than CPU-based ray tracers even on a modern multi-GHz CPU. We provide performance results for two 130 nm ASIC versions and estimate what performance would be using a 90 nm CMOS process. For a 90nm version with a 196 mm2 die we conservatively estimate clock rates of 400 MHz and ray tracing performance of 80 to 290 fps at 1024times768 resolution in our test scenes. This estimated performance is 70 times faster than what is achievable with standard multi-GHz desktop CPUs.