Thank you so much. The help I have received from this forum has really helped me in choosing components for my next computer system, which I anticipate will be running some flavor of Windows 11 and Rhino 7, and then [I suspect in the not too far future] Rhino 8.
It is clear that more than one Nvidia GPU can be installed on the same machine and be recognized.
But, for example, if one were to install two Nvidia RTX A6000 cards on the same computer, would both cards contribute to a Raytracing and/or Rendering computation [each card has 48GB of GPU memory, so there is potentially 96GB of total video memory available to be utilized]?
Scenario # 1: The computer has a single Nvidia RTX A6000 GPU and a model / scene that requires close to or more than 48 GB of GPU memory to load / view in Raytrace mode, and thus one might suspect it will not be "practically manageable", e.g. moving the model / scene results in so much sluggishness / lag that one cannot work with it.
However, on a computer with two Nvidia RTX A6000 cards, that same model / scene, or an even larger one, can be worked with because the GPU memory is now pooled; you have a total of 96 GB of video memory to work with. The workload is distributed across the two GPU cards in a way that optimizes rendering performance / speed. Is this true? Or is the model still confined / constrained to a 48 GB GPU memory space?
Scenario # 2: Given that in a computer with two Nvidia RTX A6000 cards the GPU memory is pooled / shared and the workload optimally distributed between the two cards, will Raytrace Mode rendering speed increase for both static images and Raytrace video production? The model / scene might not be particularly large, but rather, across some fairly broad range of model / scene sizes, the render speed increases, perhaps because the workload is managed more efficiently.
Finally, I read somewhere that a good "rule of thumb" is that your computer should have at least twice as much system RAM as it has GPU memory. Is this advice generally on target? Putting aside cost considerations, is it advantageous to go with DDR5 system RAM?
I'm not up-to-date, but some years ago both cards needed to communicate over SLI to achieve any gain.
But it could be that the current graphics interfaces are able to orchestrate that.
Once you run out of VRAM, resources are allocated in RAM. And if you run out of RAM, your drive is used for this. But really, even if you build in a memory leak on purpose, 48 GB of data is a lot.
I mean, there are always exceptions to common rules. But if you really allocate more than 48 GB of VRAM, then you are doing something wrong, in my opinion…
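To make the spill-over idea concrete, here is a rough sketch (my own illustration, not Rhino's or the driver's actual allocator) of how an over-sized scene would tier across VRAM, system RAM, and the pagefile:

```python
GB = 1024**3

def allocation_tiers(scene_bytes, vram_bytes, ram_bytes):
    """Split a scene-sized allocation across VRAM, then RAM, then disk."""
    in_vram = min(scene_bytes, vram_bytes)
    in_ram = min(scene_bytes - in_vram, ram_bytes)
    on_disk = scene_bytes - in_vram - in_ram
    return in_vram, in_ram, on_disk

# A hypothetical 60 GB scene on a 48 GB A6000 with 128 GB of system RAM:
in_vram, in_ram, on_disk = allocation_tiers(60 * GB, 48 * GB, 128 * GB)
print(in_vram / GB, in_ram / GB, on_disk / GB)  # 48.0 12.0 0.0
```

Each tier is roughly an order of magnitude slower than the one above it, which is why the spill-over is felt so strongly.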
Some thoughts on dual graphics cards, SLI and sharing memory.
As far as I understand it, for raytracing you don't want/need SLI. SLI comes from the gaming world and does not offer benefits in raytracing.
Even if using multiple GPUs for raytracing, the scene has to fit into the memory of each card. Therefore you would have 48GB, or slightly less if you are also using the GPUs to drive your monitor(s).
Here is some good input I found from a staff member of Otoy, makers of Octane:
actually multiple cards together will add up the cores resulting in a linear speedup, unfortunately it is not same for the vram - memory is not compounded in the same manner in gpu renderers. All the textures and hdri among other stuff used for an entire scene has to fit into a functional swap space - that is each GPU must have a copy of all scene elements it needs to process - and therefore this swap space will be limited to the size of that card with the least amount of vram present in your machine. So if you use two 4gb 980 for rendering, the rendering is faster but the memory size used for the rendering is effectively only 4gb.
Also, SLI is mostly used for game applications to be able to work with separate video cards simultaneously to process very high frame rates between the cards and the CPU. Not necessarily for detection of gpus. Octane is rendering with the GPU (not the CPU) and Octane does not need SLI to detect the gpus installed in the machine. We don't recommend SLI for rendering, in fact, Octane will run much better without it.
(source: OTOY Forums • View topic - Two GPU's...only one uses Vram)
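The quoted rule can be sketched in a few lines: with the scene duplicated per card, usable VRAM is that of the smallest card, while CUDA cores (and so, roughly, speed) add up. The example uses the 4 GB GTX 980 from the quote (2048 CUDA cores):

```python
def multi_gpu_capacity(cards):
    """cards: list of (vram_gb, cuda_cores) per installed GPU.

    Each GPU must hold a full copy of the scene, so usable VRAM is
    the minimum across cards; core counts are simply summed.
    """
    usable_vram = min(vram for vram, _ in cards)
    total_cores = sum(cores for _, cores in cards)
    return usable_vram, total_cores

# Two 4 GB GTX 980s, as in the Otoy example above:
vram, cores = multi_gpu_capacity([(4, 2048), (4, 2048)])
print(vram, cores)  # 4 4096
```

So doubling up identical cards roughly doubles throughput but leaves the scene-size ceiling exactly where it was.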
Quadros can "pool" their memory together. Whether your renderer supports doing that, and whether that's advisable for CUDA rendering, is another matter.
Well, you need plenty of RAM because programs normally can't access the VRAM directly (direct access is a newer feature that I think technically first appeared on the PS5); it all passes through RAM first.
What I found when I had 4 x 1080 Tis was that you need enough VIRTUAL memory to be able to "swap out" all that VRAM.
Maybe another note: in the end there are a lot of parameters to consider. And probably nobody really knows. Even benchmarks are not always done correctly, or do not represent real-world scenarios.
It's not just the hardware; as others also mentioned, it's the software, the drivers, the model and hundreds of minor settings.
In the end, you can tweak all these things and figure out if it's really beneficial to have 2 high-end cards. The last major setup I had was one system with 2x P6000 and one system with 2x Titan X to do VR visualization. 2 cards were beneficial, because a VR headset is basically a two-monitor system. While I was pushing the limits in one software, for another software the same setup was complete overkill. And this is because VRAM is just a buffer and the GPU is just doing stupid matrix multiplications. What and how much you compute and allocate is really a matter of the software and the data provided. As a user, you have no chance of tweaking the software, but you have a large influence on how to optimize a model/scene.
E.g. you can bake global illumination into textures, and in a static model this can lead to the same (or even better) visual quality as having real-time ray-tracing enabled. You can tweak the render mesh, you can remove invisible geometry, and so on. It makes a difference whether a gently curved surface consists of 30 or 30,000 faces. Hardly any difference in visual quality, but when it comes to memory allocation, this adds up quickly.
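A back-of-envelope sketch of that last point (the bytes-per-vertex figure below is an assumption for illustration; real render meshes vary by vertex layout):

```python
def mesh_bytes(faces, verts_per_face=3, bytes_per_vertex=32):
    """Rough render-mesh footprint: position + normal + UV per vertex
    is commonly around 32 bytes; triangles have 3 vertices."""
    return faces * verts_per_face * bytes_per_vertex

coarse = mesh_bytes(30)      # a lightly tessellated surface
dense = mesh_bytes(30_000)   # the same surface, over-tessellated
print(dense // coarse)  # 1000
```

One surface is negligible either way; a few thousand over-tessellated surfaces is where the gigabytes come from.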
Of course, you can waste money and resources to buy the best system you can get. But in the end it's questionable whether you really push the limits at all, and if you do, then the issue is likely not the hardware. Usually you are way below or above the performance threshold, so a 20% difference in overall performance has no impact at all.
I agree in many ways with the sentiments you state in both of your posts. Thank you.
Indeed you are spot on in:
(1) Modeling for rendering so as to use resources efficiently: I need to improve with this and am trying to do so. Thus, I am taking the "Rendering with Rhino 7" course. But until I learn to do better I need to implement a force-vs-grace workflow, as indicated, out of necessity and to get the model done.
(2) Indeed there are a lot of parameters to consider. And yes, who knows what will be. It is a "Complex System" problem, a problem-solving and predictive approach that I am a fan of.
(3) I am going to try to address some of your other points when I respond to others help on this thread.
I was not as precise as I should have been, and this is where "…exceptions to common rules…" may occur.
(1) Frequently I may need to visualize 3, 4, or 5 models at the same time that often have many similar properties, but where, e.g., 2 out of the 5 models have quite significantly different properties. And then move them and/or view them in, say, the 4 standard views in Raytrace mode.
(2) I want the render to be as fast as possible
(3) I want to be able to make Raytrace mode render videos.
I believe I had seen this post before but apparently it did not "sink in". Now it has. Thank you. Since fast render speed, especially in Raytrace mode, is a high-priority goal, having the two cards seems to contribute significantly to achieving it. I would imagine video renders would speed up too.
for linking two of their GPUs together would not work. So yes, it seems that currently - and as you state - model size is constrained by the GPU memory of the card; you cannot pool the memory from multiple cards and have it all be available to use.
I do see that your reference is from 2015, but in 2022 I had the impression (though I could easily be wrong) that games do use Raytracing, and I imagine a fast frame rate is advantageous.
So it is puzzling to me, because it would seem that being able to pool all of your memory to do Raytrace rendering as fast as possible [both still image and video - a fast frame rate] and work in a practical way with large models would be something to aspire toward.
Thank you. By Quadros you also mean the latest Nvidia RTX A6000 etc. cards, correct? [It seems Nvidia has changed their naming system, so I just wanted to check.] And it certainly does seem like the memory could be pooled from several cards using Nvidia NVLink, and I guess in the case where the memory can be pooled, this allows for a larger model?
Assuming that the renderer does support the pooled memory, what I do not understand is why using CUDA may or may not be advisable. I believe you may be bringing to my attention a keen distinction I had no clue about before.
Thank you for the tip about the need for enough Virtual memory.
I'm just saying that it's probably ultimately fastest for each card to have its own memory rather than sharing.
Of course all this discussion seems a bit trivial: what are you actually currently using or intending to use for rendering? Why do you think 48GB per GPU may not be enough? Is this for ILM? Almost NO ONE has a 48GB GPU, let alone is using it for raytracing instead of machine learning research or splitting it across dozens of virtual machines. I'm still soldiering on with an 11GB 1080 Ti. I used to have 4 of them (sold the other 3 to upgrade the rest of my system while GPU prices were still berserk!) for rendering with iRay, and that was perfectly adequate for anything I cared to throw at it, which did include 4K animations.
I think that, with Cycles or any other realtime rendering, texture memory is something to watch. Unfortunately, I think that by the time you run an A6000 out of memory, Rhino caching shaders/mappings to disk (SSD) will be a nightmare. I had even considered setting up a RAM drive to see if it would help. There is no reason to do stuff like that on disk while system memory is available. I've asked for no-draw shaders that could be applied to unseen surfaces to speed that process.
I've noticed that when doing 4K renders in Cycles, system memory usage tends to go up quite fast. Memory-wise, I am comfortable doing fairly complicated 4K scenes on my 32GB of system RAM; with a 2X render, I would want 64GB.
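That scaling is easy to see with rough arithmetic: a 2X render doubles both dimensions, so it quadruples the pixel count, and the raytracer keeps several float buffers per pixel (the pass count below is a placeholder for illustration, not Cycles' actual number):

```python
def framebuffer_mb(width, height, passes=4, channels=4, bytes_per_channel=4):
    """Rough size of per-pixel float buffers for a render, in MB."""
    return width * height * passes * channels * bytes_per_channel / 1024**2

uhd = framebuffer_mb(3840, 2160)     # a 4K render
double = framebuffer_mb(7680, 4320)  # the "2X render"
print(double / uhd)  # 4.0 -- four times the buffer memory
```

Which is consistent with the 32GB-comfortable / 64GB-for-2X experience above.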
Simple solution: send me the extra A6000. It's okay, I could get by with one : )
For rendering I am currently using Rhino Render and am quite resistant to incorporating other render software into the workflow. I believe that rendering in Rhino 7 has a tremendous amount of excellent capabilities, and I am working to learn how to take advantage of them. I have every confidence that Rhino 8 will continue this path.
If by ILM you mean the company: Industrial Light & Magic then the answer is no. If ILM stands for something else then please kindly tell me.
I am not sure 48GB will be enough on a 3 monitor setup [I currently have 2 Eizo FlexScan S2133 and plan to add a 3rd if I can] whereby it is very important that one can:
With 3-5 models on screen:
Change to Raytrace mode in, e.g., all 4 standard viewports and have them quickly rendered to a "good quality", then perhaps rotate them some and quickly get a good render of the new view. Or perhaps remove some of the models from view and then bring them back into view quickly and rendered.
Do very high quality static Raytrace images and Raytrace video quickly
Raytrace as I model: For example, set the perspective viewport and front viewport to Raytrace so that each obtains enough passes to get a really nice render. I am modeling in shaded or ghosted etc. mode in another viewport and want to quickly see - or get a good idea of - the effect of the change(s) I just made. Or I make a setting change to a material etc. and want to see the effect quickly.
I have been soldiering on with an Nvidia P4000 (8GB) in a laptop with 64GB of system RAM, and I maxed it out a while ago.
I am also, as best I can, taking into consideration what I might need 6 months to a year from now.
And the video walkthrough of the lighthouse - wonderful. I can remember being fascinated by lighthouse lenses as clearly as if I had been at this lighthouse yesterday.
I guess the general concept is what I think of as "data density", although that is not the proper technical term; when each measurement unit of a model has a data density approaching that of osmium, memory use / need goes up quickly.
I do not technically understand your thinking about why, at the point an A6000 runs out of memory, Rhino's caching of shaders / mappings becomes problematic. Could you please explain this a little, or if you have a link to a reference, that would be great.
For maximum raytracing speed, maximizing the CUDA core count for your budget is what matters, not the quantity of RAM. The stupid-priced cards are rarely the best way to get that.
Your monitor setup is NOT demanding at all, but if you want to maximize performance, don't throttle God's Own Compute GPU by making it also handle the drudgery of drawing the OpenGL windows and Windows fluff. Have dedicated "display" and "compute-only" cards.
And I'm not sure why you want a $5000 video card for either the Rhino renderer (which is not as full-featured as add-in products) or 3 tiny monitors. It's like asking if a semi truck is adequate to pick up a few cans of paint at Home Depot.
With multiple GPUs, without NVLink/SLI/CrossFire, memory is not shared between the devices.
I have an RTX A6000 and an RTX A5000; my A6000 has 48GB of RAM, my A5000 has 24GB. I have both selected as my render device, but I can do "only" ~24GB scenes to fit in memory. With multiple-GPU rendering, data is copied to all render devices.
I agree with your comments. I would like to add, though, that the pure CUDA core count does not necessarily relate to final speed. Usually the faster cards also have more cores, though.
I find that a good measure of real-world raytracing performance is Octane's OctaneBench. You can filter there by single-GPU and multi-GPU setups.
I actually made an Excel sheet somewhere comparing the different GPUs and their CUDA core counts to come up with a score of OctaneBench points per CUDA core, along with other factors like price.
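A minimal version of that spreadsheet idea might look like the following; the two entries are placeholder values for illustration only, not real OctaneBench scores or prices:

```python
def rank_gpus(gpus):
    """gpus: dict of name -> (bench_points, cuda_cores, price_usd).
    Returns rows sorted by points per dollar, best value first."""
    rows = [(name, pts / cores, pts / price)
            for name, (pts, cores, price) in gpus.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)

sample = {
    "hypothetical_midrange": (600, 10_000, 1500),
    "hypothetical_highend": (1100, 17_000, 5000),
}
for name, per_core, per_dollar in rank_gpus(sample):
    print(f"{name}: {per_core:.4f} pts/core, {per_dollar:.3f} pts/$")
```

Ranked this way, a cheaper card often wins on points per dollar even when the flagship wins on absolute points.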
I am also curious what exactly OP would need 48GB of GPU memory for.
I have really tried to explain the need for the GPU memory in some of the previous posts. However, maybe, as is sometimes the case - similar to Sherlock Holmes taking note of what did not happen - I have failed to state the central challenge I am attempting to mitigate as much as possible.
As I make changes to a model (be it, e.g., material, view, lighting, hiding / showing a model, etc.), I want to get a good Raytraced visual of the change(s) as quickly as I can, because the time that passes before the new visual emerges is the workflow and creative bottleneck. To put it in very human terms, it is the cumulative wait time that fatigues me.
The amount of VRAM has no, zero, nada impact on speed, as long as the scene fits in it. And frankly, throwing all the GPUs you can at the problem will only do so much until the bottleneck becomes something else in your platform, and it's dubious that even the biggest, baddest server hardware money can buy will actually give you a noticeable benefit. What sort of projects are you even working on?
I absolutely get your need and desire. I have the same desire. I love working with Octane as part of that reason - updates are very quick and results look great even after minimal wait time.
Memory will not be your bottleneck unless you are working on scenes that are gigantic, like realistic city size. Any smaller scene will easily fit into the memory of a high-end gaming card (like the RTX 3080 Ti or RTX 3090). Even if you were to use huge textures like those from V-Ray Scans and the like, I doubt you will hit the memory limit.
Therefore I would rather look into other factors that will increase the speed of your rendering. Those are:
Raytracing software
The settings of said software
The number of CUDA cores available
The speed with which your assets like materials and meshes get uploaded to the GPU memory
Numbers 1 and 2 very much depend on what you like, whether it is available for the platform and host software, etc. We like using V-Ray (not the fastest, but it has some cool options) and Octane (one of the fastest, with a solid feature set). Number 2 can have a huge effect depending on the tradeoffs you are willing to make, etc.
Number 3 depends solely on the type of GPU and the number of cards. I think 2 cards is good; after that, the requirements for the PC go up quite heavily.
Number 4 also largely depends on the kind of system you have, but any good gaming system will be good for that.
So in the end I would save my money and invest in a solid gaming system. It will cost much less than a system with expensive A6000 GPUs and so on, which give minimal, if any, speed advantage. Also, if you always want the fastest speeds possible, I would rather update the PC more often. A good PC in 2 years will be faster than even the highest-end one today.
I hope all that makes sense.
If I were to recommend a ready made system, I would go for something like the HP Envy TE02-0950nz.