Rendering [a lot of animation] on a remote render farm

Currently working on a project for a customer who needs me [among other things] to render short 7-8 sec animations, but at a fairly large size and good quality… and no fewer than 6000 variations! [same animation but with different parts and materials]
The parts I will build in Rhino, but the animation and rendering are done with Cycles in Blender + an optimization add-on. It's 240 frames per animation [at the moment; we may have to cut it down].
My [top-of-the-line, 10 years ago] MacBook Pro takes 17 min per frame… [1200 samples, 2800 x 1800 px]
One farm machine we already tested, https://irendering.net, is a 2x RTX 3090 with a Threadripper; it took just under 1 min per frame [including the few seconds of saving and loading the next frame].
They do have machines with up to 8x RTX 3090, which I haven't tested yet. But even that would still be too slow to render the 240 x 6000 frames my customer wants…
Another place I haven't tested, but perhaps someone here has, is Paperspace.
They have up to 8x A100. [surprisingly, for less money than iRender]
Can I expect a significant speed boost from this semi-supercomputer setup?
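To put the job size in numbers, here is a back-of-envelope sketch in Python using the per-frame times quoted above (it assumes one machine working through everything sequentially, with no overlap between variations):

```python
# Back-of-envelope: total render time for 240 frames x 6000 variations,
# at the two per-frame times measured so far.
frames = 240 * 6000            # 1,440,000 frames in total

for label, sec_per_frame in [("MacBook Pro (17 min/frame)", 17 * 60),
                             ("2x RTX 3090 (~60 s/frame)", 60)]:
    days = frames * sec_per_frame / 86400
    print(f"{label}: {days:,.0f} machine-days")
# MacBook Pro (17 min/frame): 17,000 machine-days
# 2x RTX 3090 (~60 s/frame): 1,000 machine-days
```

Even at the farm's 1 min/frame, that is roughly 1000 machine-days, so the job only becomes feasible by splitting the 6000 variations across many machines in parallel (and/or cutting frames, samples, or resolution).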

I did tell my customer that by the time all the parts are ready, in a few months' time, perhaps the 4090s will be available in the farms.

Any suggestions, ideas, and other places to check are very welcome. [Obviously I haven't done this kind of thing before]

With thanks
Akash

FWIW the OctaneBench database shows a single A100 GPU scoring below a single 3080. But more relevantly for you, an 8-GPU A100 config scores about 3x the score of a config with 2x RTX 3090s.

(Octane is a GPU renderer like Cycles but I don’t know how directly performance of the one will translate to the other.)

Thank you.
Yes, I came to understand that the A100 is more about GPU RAM than speed. The same place also has a rig with several A6000s…
I'll be testing an 8x 3090 machine today or tomorrow; let's see how it does. [I'd imagine it will be somewhat longer than 15 sec… which would be 1/4 of the 2x 3090 time.]

The A100 is the data-center Ampere GPU; it lacks RT cores and it has fewer CUDA cores that are clocked higher.

I'm currently running an A40 GPU, similar to the RTX A6000 but with slightly lower memory bandwidth. It runs stable but slower than a 3090 Ti.

Currently, for rendering, the fastest GPU is the RTX 3090 Ti, which has the same CUDA and RT cores as the A6000 but clocked even higher.

[Screenshot: Screen Shot 2022-08-09 at 23.46.39]
I just ran a test on this rig. It actually gave a slower frame time compared to yesterday's 2x 3090 rig. [the 8x was about 70 sec/frame]

Does anyone have an idea how this can be…? It should have been close to 4 times faster, I would have thought…

If the screenshot below is correct, it is because of SLI.

SLI is not recommended for GPU rendering (although in this case it is NVLink). Generally, NVLink doesn't scale well in CUDA/RTX ray tracing.

Q: Does V-Ray Next GPU / V-Ray RT support multiple GPUs? Do they need to be in SLI?

A: The GPU-accelerated version of V-Ray is able to utilize multiple GPUs and does so very effectively. It will not be a perfect “4 GPUs is 4x faster”, but you can expect significant performance improvements with every card you add. However, since V-Ray is using the cards for compute purposes they do not need to be in SLI mode. In fact, SLI can sometimes cause problems so we recommend leaving it disabled if possible.
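The "not a perfect 4x" point can be put into rough numbers. A minimal sketch, where the per-card efficiency factor and the single-GPU frame time are illustrative assumptions, not measured values:

```python
# Illustrative multi-GPU scaling estimate for a compute renderer.
# 'efficiency' is an assumed per-extra-card scaling factor (made up for
# this sketch); real scaling depends on the engine and the scene.
def estimated_frame_time(t_single_gpu, n_gpus, efficiency=0.85):
    """Expected seconds per frame if each extra card contributes
    'efficiency' of a full card's throughput."""
    effective_cards = 1 + (n_gpus - 1) * efficiency
    return t_single_gpu / effective_cards

# Assuming ~100 s on one card for illustration:
print(round(estimated_frame_time(100, 2)))   # 54
print(round(estimated_frame_time(100, 8)))   # 14
```

Under any plausible efficiency factor, an 8x rig should land well below a 2x rig on the same scene, so the ~70 s/frame measured on the 8x machine points at a configuration problem rather than normal scaling loss.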

Thank you.
I didn't see an option to disable SLI. I can try asking their support.
By the way, what is it, if you don't mind explaining? Edit: I did some reading… it seems to be oldish tech, and the current, better way is NVLink…? Or can all these GPUs work together without either SLI or NVLink…?

I think it depends on the rendering engine. I'm currently running a 3-GPU setup without enabling NVLink, so I end up with 3 separate GPUs for rendering. Renderers like V-Ray, Octane and Redshift can take advantage of multiple separate GPUs.

Some renderers are more like game engines, e.g. Lumion, Twinmotion or Enscape; these engines can only use the primary GPU connected to the monitor, so many people use SLI/NVLink to combine multiple GPUs into one device that handles the rendering process.

In your case, Cycles is a CUDA-based ray-tracing engine that works best when SLI is disabled.

I don't have Blender installed, but as an example, below is my screenshot of Rhino Cycles: I keep the main GPU for display while the two Tesla cards handle the ray tracing.
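On the Blender side, the equivalent device selection can also be done from Blender's own Python console. A minimal sketch, assuming Blender 3.x with the Cycles add-on enabled (`bpy` only exists inside Blender, so this is a config fragment, not a standalone script):

```python
# Sketch: enable every GPU for Cycles and pick the OptiX backend.
# Run inside Blender's Python console; 'bpy' is only available there.
import bpy

prefs = bpy.context.preferences.addons["cycles"].preferences
prefs.compute_device_type = "OPTIX"   # or "CUDA" for cards without RT cores
prefs.get_devices()                   # refresh the device list
for dev in prefs.devices:
    dev.use = dev.type != "CPU"       # enable all GPUs, skip the CPU entry
    print(dev.name, "->", "on" if dev.use else "off")

bpy.context.scene.cycles.device = "GPU"
```

The printout is a quick way to confirm, before launching a long render, that the farm machine's cards are all visible to Cycles and actually ticked on.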

Also, to help you choose the right setup, below are the results for benchmarked GPUs:
Blender Benchmark Results (Updated Scores) (cgdirector.com)


Thank you for the good explanation; things are getting clearer.
Perhaps one more Q: I see you have selected CUDA instead of OptiX…? In most tutorials and posts I saw, people tend to prefer OptiX. [I haven't had the chance yet to study the difference between the two, this being my first dealing with any GPU rendering setup]
thanks a lot
Akash

See this post Cuda vs Optix - #2 by nathanletwory

In Blender, as opposed to Rhino, the top benchmark scores come with OptiX:

CUDA: NVIDIA GeForce RTX 3090, median benchmark 3138.25
OptiX: NVIDIA GeForce RTX 3090, median benchmark 6570.59

But note that these are single GPU scores.

Thanks a lot
I'm now running one more test on that farm, this time using just one 3090… and I'm getting the same score as with 2 GPUs [56 sec including saving and loading], and it's actually faster than their mighty 8x rig. [It does say NVLink is not applicable, but it doesn't say whether they use SLI… or should one infer that?]
[Screenshot: Screen Shot 2022-08-10 at 16.01.19]

AFAIK the 3090 can only do 2-way SLI (the card only has a single NVLink connector).

Are you able to get real-time stats showing which GPUs are under load? I guess something is wrong with the configuration.

In Blender Cycles, can you see if all 8 GPUs are in action?
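On Linux farm machines, `nvidia-smi` gives exactly this readout; for example, `nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits` prints one line per card. A small, hypothetical helper to turn that output into something checkable (the sample readout below is made up):

```python
# Hypothetical helper: parse the CSV output of
#   nvidia-smi --query-gpu=index,utilization.gpu --format=csv,noheader,nounits
# run on the farm machine while a render is going.
def parse_gpu_util(csv_text):
    """Return a list of (gpu_index, utilization_percent) tuples."""
    stats = []
    for line in csv_text.strip().splitlines():
        idx, util = (field.strip() for field in line.split(","))
        stats.append((int(idx), int(util)))
    return stats

# Made-up readout: if it looked like this, only GPUs 0 and 1 are rendering.
sample = "0, 98\n1, 97\n2, 3\n3, 2\n"
print(parse_gpu_util(sample))   # [(0, 98), (1, 97), (2, 3), (3, 2)]
```

If some cards sit near 0% while Cycles reports them as enabled, that would point at the SLI/NVLink or farm-side configuration rather than at Blender.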

I'll have to check that. I only made sure at the beginning that Cycles sees, and is set to use, all the GPUs. [Right now I'm trying to finish that test animation on just a single GPU. They also had an Intel GPU on that single machine, which I didn't select; that seemed a bit strange, since this is a Threadripper CPU and it's not supposed to have a built-in GPU, AFAIK…?]

I can try a single image later on the 8x machine and see if I can get an actual readout of usage… I will need to search for how to do this.

thanks a lot
Akash