Finding mesh-mesh intersections can be parallelized, since it ultimately comes down to finding face-face intersections. However, in a Boolean operation, after finding the intersection points and classifying them, the rest may not be easy to do on the GPU, as it contains lots of branching. So my plan is to integrate mesh-mesh intersection, and that was actually the reason I started these GPU-optimized components.
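To give a sense of why the intersection-finding stage parallelizes well, here is a rough CPU-side sketch (just an outline, not the actual implementation): an RTree broad phase collects candidate face pairs, and then every pair is tested independently. The per-pair test below is only a bounding-box check standing in for a real triangle-triangle routine.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;
using Rhino.Geometry;

// Broad phase with an RTree, then an independent test per candidate face pair.
// The per-pair check here is only a bounding-box overlap, standing in for a
// real triangle-triangle intersection test.
List<(int faceA, int faceB)> CandidateFacePairs(Mesh a, Mesh b)
{
  var candidates = new List<(int, int)>();
  RTree.SearchOverlaps(RTree.CreateMeshFaceTree(a), RTree.CreateMeshFaceTree(b), 0.0,
    (sender, args) => candidates.Add((args.Id, args.IdB)));

  Point3d[] va = a.Vertices.ToPoint3dArray();
  Point3d[] vb = b.Vertices.ToPoint3dArray();
  var hits = new ConcurrentBag<(int, int)>();

  // Each candidate pair is independent: one GPU work item (or CPU task) per pair.
  Parallel.For(0, candidates.Count, i =>
  {
    var (fa, fb) = candidates[i];
    BoundingBox boxA = FaceBox(a.Faces[fa], va);
    BoundingBox boxB = FaceBox(b.Faces[fb], vb);
    if (BoundingBox.Intersection(boxA, boxB).IsValid) // placeholder narrow phase
      hits.Add((fa, fb));
  });
  return new List<(int, int)>(hits);
}

BoundingBox FaceBox(MeshFace f, Point3d[] v)
{
  var box = new BoundingBox(new[] { v[f.A], v[f.B], v[f.C] });
  if (f.IsQuad) box.Union(v[f.D]);
  return box;
}
```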
Wow, that is actually really fast. In the cases I have tried it, it usually takes a few seconds to a minute or so and then fails.
Using GPU-optimized components for machine learning could be another use case. In this example the data is created in Grasshopper, but the line is fitted using a simple gradient descent algorithm, while the function evaluation and gradient computation are all done on the GPU.
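For context, the structure of that fitting loop is roughly the following (a plain CPU sketch, not the actual component; in the GPU version the two gradient sums would be computed by a reduction kernel while the small update step stays on the CPU):

```csharp
using System;

// Fit y = a*x + b by gradient descent on mean squared error.
// On the GPU, the two gradient sums would be a parallel reduction kernel.
public static class LineFitSketch
{
  public static (double a, double b) Fit(double[] xs, double[] ys,
                                         double rate = 0.01, int iterations = 1000)
  {
    double a = 0.0, b = 0.0;
    int n = xs.Length;

    for (int it = 0; it < iterations; it++)
    {
      double gradA = 0.0, gradB = 0.0;
      for (int i = 0; i < n; i++)            // parallel reduction on the GPU
      {
        double residual = a * xs[i] + b - ys[i];
        gradA += 2.0 * residual * xs[i];
        gradB += 2.0 * residual;
      }
      a -= rate * gradA / n;                 // gradient step
      b -= rate * gradB / n;
    }
    return (a, b);
  }
}
```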
My cases are relatively simple. I'm doing some road daylight cuts and fills, so I'm just subtracting a simple shape from a topographic mesh. Mine fails too, especially if it's low resolution. Still, a few hundred milliseconds vs tens of milliseconds is big. I'm just envious of the instant booleans I see in research papers and on YouTube.
Yeah, I would be careful with those. They will usually only work for very specific use-cases or in the case of research papers will be quite a lot of work to set up.
I get that it would be nice if it was faster, but are you doing something where you need to see updates in real-time?
Honestly, if I had to choose I would much rather have more reliable booleans than faster ones.
Quick update on this topic: here is a performance comparison between the GPU-powered sectioning tool and the native Mesh-Plane intersection component in Rhino.
Rhino native sectioning tool (multi-threaded): Took 140 seconds to slice a high-density mesh with 1,000 planes.
My GPU-optimized component: Finished the same task in just 10 seconds, making it 14x faster!
But the performance gains don’t stop there:
For 1 plane: It's 62x faster!
For 10 planes: 24x faster!
For 100 planes: 18x faster!
While performance scaling drops as the plane count increases, this is just the beginning. I’m working to eliminate some post-sectioning CPU processes and push more of the workload onto the GPU—further reducing those times.
Did you write your own mesh-plane intersector for the GPU?
Hi Steve, yes, and I can tell you it is not the best implementation. Still working on it.
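For reference, the per-face work that a mesh-plane slicer parallelizes looks roughly like this on the CPU (triangle faces only, no joining of segments into curves, and not the actual OpenCL kernel):

```csharp
using System.Collections.Generic;
using Rhino.Geometry;

// Per-face mesh/plane section sketch. Each face is handled independently,
// which is why the slicing step maps well onto one GPU work item per face.
public static class MeshPlaneSketch
{
  public static List<Line> Slice(Mesh mesh, Plane plane)
  {
    var segments = new List<Line>();
    Point3d[] v = mesh.Vertices.ToPoint3dArray();

    foreach (MeshFace f in mesh.Faces)
    {
      int[] idx = { f.A, f.B, f.C };                  // assumes a triangulated mesh
      var crossings = new List<Point3d>();

      for (int i = 0; i < 3; i++)
      {
        Point3d p0 = v[idx[i]], p1 = v[idx[(i + 1) % 3]];
        double d0 = plane.DistanceTo(p0), d1 = plane.DistanceTo(p1);
        if (d0 * d1 < 0.0)                            // edge crosses the plane
        {
          double t = d0 / (d0 - d1);                  // linear interpolation along the edge
          crossings.Add(p0 + t * (p1 - p0));
        }
      }
      if (crossings.Count == 2)
        segments.Add(new Line(crossings[0], crossings[1]));
    }
    return segments;
  }
}
```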
I so have a use case for this. Looking forward to seeing a future release.
Cheers
DK
Another benchmark, this time checking whether a point is inside a closed polygon with 1600 points. The grid is 10201 points, and all of that happens in 20 ms on my laptop with an Intel GPU (an old Surface Pro).
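The per-point test itself is tiny, which is what makes one GPU work item per query point practical. A standard crossing-number version of the inner loop looks roughly like this (a 2D CPU sketch, not the actual kernel; it assumes the polygon and grid are already projected to a common plane):

```csharp
// Even-odd (crossing number) point-in-polygon test. Each query point runs
// this loop over the polygon vertices independently.
public static bool PointInPolygon(double px, double py, double[] polyX, double[] polyY)
{
  bool inside = false;
  int n = polyX.Length;
  for (int i = 0, j = n - 1; i < n; j = i++)
  {
    bool crossesRay = (polyY[i] > py) != (polyY[j] > py);
    if (crossesRay)
    {
      // x coordinate where the edge crosses the horizontal ray from the point
      double xCross = polyX[i] + (py - polyY[i]) * (polyX[j] - polyX[i]) / (polyY[j] - polyY[i]);
      if (px < xCross) inside = !inside;
    }
  }
  return inside;
}
```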
It’s worth also comparing to a simpler script component
With the standard containment GH component, processing the list input parameter is what is taking most of the time there.
A simple script with a Parallel.For loop can find and output the points (from a 10201 point grid contained in a 1600 point polyline) in around 64ms, including the creation of the grid, the culling and outputting the points.
containment.gh (9.5 KB)
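The script could look something like this (a sketch of the approach described above, not necessarily the code in the attached file):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;
using Rhino.Geometry;

// Test every grid point against the closed polyline curve in parallel and
// keep the ones that fall inside.
ConcurrentBag<Point3d> KeepInside(Point3d[] grid, Curve boundary, Plane plane, double tol)
{
  var inside = new ConcurrentBag<Point3d>();
  Parallel.For(0, grid.Length, i =>
  {
    if (boundary.Contains(grid[i], plane, tol) == PointContainment.Inside)
      inside.Add(grid[i]);
  });
  return inside;
}
```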
Once you include the time for the conversion to and from the vector type and the culling for a more direct comparison, the GPU version in your video looks like it is taking significantly longer than that.
This is not to say that geometric operations using a GPU library can’t be much faster for some problems and scales, and the potential for more integration of such libraries in GH is exciting, but making like for like comparisons can be tricky.
You forgot that we are not using the same machines. Thanks Daniel for providing the code. Obviously you have a much better machine than me! Your code runs in 388 ms here. If I ignore the data conversion from Rhino Point3d to double4 for OpenCL, the code on “MY” GPU runs roughly 19 times faster. I am sure you would get a better performance boost on a gaming machine. The problem on my side comes down to loading and unloading data, and that is because of a decision I made when I started writing the kernel: I chose double4 for homogeneous coordinates rather than double3, which means I cannot use the same memory address for the points in Rhino and need to convert them back and forth.
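Concretely, that choice means a packing pass like the one below on every call, instead of handing the GPU Rhino's own xyz layout directly (a simplified illustration of the conversion cost, not the actual marshalling code):

```csharp
using Rhino.Geometry;

// Because the kernel expects homogeneous double4 values, every Point3d is
// copied into a stride-4 buffer before upload (and unpacked again after
// download), rather than reusing Rhino's packed xyz memory.
static double[] PackToDouble4(Point3d[] points)
{
  var buffer = new double[points.Length * 4];
  for (int i = 0; i < points.Length; i++)
  {
    buffer[4 * i + 0] = points[i].X;
    buffer[4 * i + 1] = points[i].Y;
    buffer[4 * i + 2] = points[i].Z;
    buffer[4 * i + 3] = 1.0;          // homogeneous w component
  }
  return buffer;
}
```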
Did you recompute the solution with F5?
The very first time any script component runs it takes significantly longer.
Yes, of course. The same applies to the GPU version; the first run is always slower as the kernels are being compiled.
My point is that ignoring those data conversion steps means it is no longer a direct comparison if the conversion always has to happen when using it.
Finding realistic scenarios where a GPU computation offers real advantages over a CPU version is interesting, but we need to be careful about whether that’s what we’re actually measuring (instead of comparing more about the data loading and conversion steps) before jumping to conclusions with the test numbers.
I do think there are likely to be some applications where a GPU version is much faster in a genuinely useful way, I’d guess maybe more for large mesh or pointcloud operations where multiple processing steps happen between the data conversions.
You're absolutely right that we need to consider data conversion steps in these comparisons, as the total performance includes both computation and data movement. The decision to use the GPU over the CPU should depend on two key factors:
Arithmetic (or operational) intensity: The ratio of mathematical operations to memory transfer. For GPU computing to provide a real advantage, the task must involve enough computational work to outweigh the time spent transferring data between the CPU and GPU.
Parallelization: The task also needs to be highly parallelizable for the GPU's many cores to be effective. If parts of the process are inherently serial (non-parallelizable), this reduces the overall performance gain (see the mesh slicing example above).
In some cases, tasks like large mesh or point cloud operations can really benefit from the GPU's parallel processing, as you mentioned. Each use case needs to be evaluated on its own, considering both data transfer overhead and the inherent parallelism of the task.
The reason I am ignoring the data transfer for now is that I see this in a bigger context, where you don't need to unload the data after a method like point containment. This is probably something you will be using in a larger setting where the data is pushed to other components that are also running on the GPU: a computational graph, large enough to overcome the data latency…
Hi, I have read your experiments before, and I am now testing one solution provided by Daniel which is working very nicely. But given that I will probably be using denser meshes, I thought: wouldn't some of the calculations performed in this script benefit from parallel calculation on the GPU?
My goal is to obtain a suitably smooth texture (mask) marking the edges of the mesh, but even with a relatively small mesh (only 14k vertices) some of the calculations take a very long time. Unfortunately, the mesh would need to be even denser to obtain satisfactory results, and already the calculations take 36 seconds. With larger and denser meshes the calculations could take long minutes, if not hours…
As you can see in the screenshots, we have several components here doing highly parallel calculations, for example a tree with 14k branches and 800 indices each. The Distance, Power and Mass Addition components take 95% of the time, and if I understand the benefits of using the GPU correctly, they should be very visible here. Please tell me: is there a chance to replace these several seemingly simple components with GPU-based equivalents?
I also tried with the Hops component, which could divide the work between the CPU cores, but without success. It took 15 seconds before Hops even started solving.
Sorry if I got the whole concept wrong. I don't need you to do anything; I'd just rather ask whether this might be a good use case.
mesh_boundary_shade - triremesh.gh (152.6 KB)
From 30+ seconds to less than 1 second here.
This code is not even multithreaded.
mesh_boundary_shade - triremesh.gh (157.1 KB) (lock solver before opening)
The problem is passing 10^7 different values in and out, and for each of those the Grasshopper components have to "cast" their type, or "check" if the cast is needed.
See the screenshot: the "Power" component handles generic data types, not just numbers.
It's more like the Grasshopper UI struggling here… or something like that… (maybe I'm wrong)…
Anyway, just put all your relevant steps inside a single C# script and you're done.
You can probably color your mesh all inside a single script and let it work in real time.
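As a rough illustration of what "all in one script" can look like for the Distance → Power → Mass Addition chain described above (the exact weighting is whatever the original definition uses; here it is simply a sum of distances raised to an exponent):

```csharp
using System;
using System.Threading.Tasks;
using Rhino.Geometry;

// One pass replacing the Distance -> Power -> Mass Addition chain: for every
// mesh vertex, sum the distances to the boundary sample points raised to an
// exponent. Doing it in one loop avoids the per-item casting between components.
double[] BoundaryWeights(Mesh mesh, Point3d[] boundaryPoints, double exponent)
{
  Point3d[] verts = mesh.Vertices.ToPoint3dArray();
  var weights = new double[verts.Length];

  Parallel.For(0, verts.Length, i =>   // optional; even a single thread is already fast
  {
    double sum = 0.0;
    for (int j = 0; j < boundaryPoints.Length; j++)
      sum += Math.Pow(verts[i].DistanceTo(boundaryPoints[j]), exponent);
    weights[i] = sum;
  });
  return weights;
}
```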
GPU computing might be needed with a much higher calculation amount.
All of these operations are very cheap for the CPU. As Riccardo basically wrote, the amount of data Grasshopper can handle well is limited. If you attach a profiler, you can measure that the problem is inefficient data management. GH is simply not designed with this much data in mind. You would need to rewrite it entirely to solve this issue, and that's not something to desire, since robustness is likely of higher value. As a rule of thumb, any dataset with more than 100k items is problematic and requires workarounds.
That is how I understand it as well. It's not the calculations, but all the casting and checking of data types that is not very efficient.
I think the GPU is great, and we use it a lot for transformation calculations on up to around 10 million objects at once. There is a lot of mapping of data, reading textures and so on, and for 1 million objects it takes around 12 ms for all the GPU calculations. That way we can even render everything and still stay above 60 fps.
In Grasshopper there are so many other roadblocks, that if you just run some optimized code or simply combine a few things in C#, it is already so much faster by just avoiding some of the bottlenecks of Grasshopper.
Also remember that the slowest part of using the GPU is putting data into VRAM and reading it back, so you want to do that as little as possible. Ideally you upload the data once and then use the GPU all the way.
Still super interesting stuff.