The availability of OptiX in the latest WIP made me do a few tests. I just wanted to check whether the performance gains of OptiX over CUDA in RhinoCycles are in line with the gains its big brother BlenderCycles is able to achieve, but that’s not the point of this post. Apart from comparing CUDA and OptiX on an RTX 2070 Super, I also tried a Ryzen 7 2700X, without expecting anything abnormal. For a relatively simple scene at 1800x1200px and 1000 samples I got:
- RTX 2070 Super / CUDA: ~2m 44s
- Ryzen 7 2700X / CPU: ~32m 05s
The 2070 Super was over 11 times faster than the 2700X. We know GPU rendering can be much faster than CPU rendering, so I wasn’t surprised that the CPU, despite its 8C/16T, was much slower than the 2070 Super. To be sure, I ran more tests at a lower resolution with fewer samples, because I didn’t want to wait half an hour for the CPU renders to finish. The difference was then even bigger: the 2700X was almost 14 times slower.
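For anyone who wants to check the speedup figure, it comes straight from the two render times above. A quick Python sanity check (the times are the ones from this post, hardcoded):

```python
# Compute the GPU-over-CPU speedup from the render times in this post.
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a 'Xm Ys' render time to total seconds."""
    return minutes * 60 + seconds

gpu = to_seconds(2, 44)   # RTX 2070 Super / CUDA: 2m 44s
cpu = to_seconds(32, 5)   # Ryzen 7 2700X / CPU: 32m 05s

speedup = cpu / gpu
print(f"GPU is {speedup:.1f}x faster")  # roughly 11.7x
```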
Then I checked the results against the ones from Blender Open Data:
Those results show the 2070 Super only a little over 5 times faster than the 2700X. To my surprise, these figures are hugely different from what I see in Rhino. Either RhinoCycles is magically speeding up CUDA, or RhinoCycles is putting a huge penalty on this AMD CPU.
By default, only 14 threads are enabled in the Cycles settings; of course, I set that to 16 before rendering. For CPU rendering I set the tile size to 16, and for GPU to 512.
I hope this is just a simple bug in the current RH7 WIP; I haven’t yet tested with RH6. If it turns out this massive performance penalty is also present in RH6, then it would be interesting to check whether the same is true for Intel CPUs.