To speed up rendering, we implemented a new ParallelRenderer. This renderer launches a number of threads, each computing a different subset of pixels. This seemed like an ideal scenario for parallelization, as the threads never block each other: each one only writes to its own set of pixels. The remaining question was how to split the pixels between the threads.
A naive implementation would have each thread compute a different line, or more generally a different area of the image. However, this balances the load poorly: some threads would shoot rays that intersect almost nothing and terminate very quickly, while other threads are busy computing very complex rays, as in the area with the mask. Therefore, we numbered the pixels from top left to bottom right and assigned each pixel to a thread according to its index modulo the number of threads. This way, with 8 threads, every group of 8 consecutive pixels is spread across the 8 threads, so the pixels being computed at a time are all very close together and will likely intersect similar geometry.
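The modulo-based split can be illustrated with a minimal Python sketch. It is not the ParallelRenderer code itself: the names `render_interleaved` and `shade` are hypothetical, and `shade(x, y)` stands in for whatever traces the ray through a pixel. It assumes pixels are numbered in row-major order, top left to bottom right.

```python
import threading

def render_interleaved(width, height, num_threads, shade):
    """Render an image where pixel i is computed by thread i % num_threads.

    `shade(x, y)` is a hypothetical placeholder for the per-pixel ray computation.
    """
    total = width * height
    image = [None] * total  # one slot per pixel, row-major order

    def worker(thread_id):
        # This thread owns pixels thread_id, thread_id + num_threads, ...
        for index in range(thread_id, total, num_threads):
            x, y = index % width, index // width
            image[index] = shade(x, y)  # no other thread writes this slot

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return image
```

Because each thread writes only to its own indices, no locking is needed. Note that in standard CPython the global interpreter lock would prevent a real speedup for pure-Python shading, so the sketch only illustrates how pixels are assigned, not the performance gain.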
In our observations, with this method the threads generally finish at roughly the same time. Using this new renderer, we achieved a speedup of up to 6x on our processor (Core i7-6700K).