GPU-powered Grasshopper components

Finding a mesh-mesh intersection can be parallelized, since it ultimately comes down to finding face-face intersections. However, in a Boolean operation, once the intersection points have been found and classified, the remaining steps may not be easy to do on a GPU because they involve a lot of branching. So my plan is to integrate mesh-mesh intersection; that was actually the reason I started these GPU-optimized components.
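
The face-face stage is the part that parallelizes well because every pair of faces can be tested on its own. Below is a minimal sketch of the broad phase in plain Python, assuming triangles given as three (x, y, z) tuples. The loop over pairs stands in for a GPU launch with one work item per pair, and the function names are hypothetical. The exact triangle-triangle test and the branch-heavy classification and reconstruction steps of a Boolean are deliberately left out.

```python
from itertools import product


def tri_aabb(tri):
    """Axis-aligned bounding box (min corner, max corner) of one triangle."""
    lo = tuple(min(p[k] for p in tri) for k in range(3))
    hi = tuple(max(p[k] for p in tri) for k in range(3))
    return lo, hi


def boxes_overlap(a, b):
    """Two AABBs overlap iff their extents overlap on all three axes."""
    (alo, ahi), (blo, bhi) = a, b
    return all(alo[k] <= bhi[k] and blo[k] <= ahi[k] for k in range(3))


def candidate_face_pairs(tris_a, tris_b):
    """Broad phase: every (i, j) whose triangle bounding boxes overlap.

    Each pair test reads only its own two boxes and produces one flag,
    so on a GPU it would map to one independent work item per pair.
    """
    boxes_a = [tri_aabb(t) for t in tris_a]
    boxes_b = [tri_aabb(t) for t in tris_b]
    return [
        (i, j)
        for i, j in product(range(len(boxes_a)), range(len(boxes_b)))
        if boxes_overlap(boxes_a[i], boxes_b[j])
    ]
```

Testing all pairs costs Fa × Fb checks; a practical implementation would usually prune pairs with a BVH or uniform grid first, but each surviving pair is still independent work.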

Wow, that is actually really fast. In the cases I have tried, it usually takes a few seconds to a minute or so and then fails. :sweat_smile:

Using GPU-optimized components for machine learning could be another use case. In this example the data is created in Grasshopper, but the line is fitted with a simple gradient descent algorithm, and the function evaluation and gradient computation are all done on the GPU.
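
For readers who haven't written one, here is a minimal sketch of that kind of line fit in plain Python, assuming a mean-squared-error loss and a fixed learning rate (both assumptions; the post doesn't state them). The per-sample residuals are the data-parallel part that would run as a GPU kernel; the two sums are reductions. The function name and parameters are hypothetical.

```python
def fit_line(xs, ys, lr=0.01, steps=5000):
    """Fit y = m*x + b by minimizing mean squared error with gradient descent."""
    m, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Per-sample residuals: independent, one work item each on a GPU.
        residuals = [m * x + b - y for x, y in zip(xs, ys)]
        # Gradients of L = (1/n) * sum(r^2): these sums are reductions.
        grad_m = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))
        grad_b = 2.0 / n * sum(residuals)
        m -= lr * grad_m
        b -= lr * grad_b
    return m, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]  # synthetic data on the line y = 2x + 1
m, b = fit_line(xs, ys)
```

On a GPU the residual step is trivially parallel, while the sums need a parallel reduction, which is usually the trickier part to write efficiently.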

My cases are relatively simple. I’m doing some road daylight cut-and-fill work, so I’m just subtracting some simple shapes from a topographic mesh. Mine fails too, especially if it’s low resolution. Still, a few hundred milliseconds vs. tens of milliseconds is a big difference. I’m just envious of the instant Booleans I see in research papers and on YouTube :smiley:

Yeah, I would be careful with those. They usually only work for very specific use cases or, in the case of research papers, take quite a lot of work to set up.

I get that it would be nice if it was faster, but are you doing something where you need to see updates in real-time?

Honestly, if I had to choose, I would much rather have more reliable Booleans than faster ones.

Quick update on this topic: here is a performance comparison between my GPU-powered sectioning tool and the native Mesh-Plane intersection component in Rhino.
Rhino native sectioning tool (multi-threaded): took 140 seconds to slice a high-density mesh with 1,000 planes.
My GPU-optimized component: finished the same task in just 10 seconds, making it 14x faster!
But the performance gains don’t stop there:
For 1 plane: over 60x faster!
For 10 planes: 24x faster!!
For 100 planes: 18x faster!
While the speedup shrinks as the plane count increases, this is just the beginning. I’m working to eliminate some post-sectioning CPU processes and push more of the workload onto the GPU, further reducing those times.
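
Mesh-plane slicing is GPU-friendly because each triangle can be cut by each plane independently. Below is a minimal sketch of that per-triangle step in plain Python, assuming a triangle soup of (x, y, z) tuples; it is one common way to do it, not the kernel from this thread, and the names are hypothetical. Joining the unordered segments into polylines is the kind of serial post-sectioning work that is still on the CPU.

```python
def slice_triangle(tri, origin, normal):
    """Intersect one triangle with the plane through `origin` with `normal`.

    Returns a segment (p, q) or None. Degenerate cases (a vertex or edge
    lying exactly on the plane) are not handled carefully in this sketch.
    """
    # Signed distance of each vertex to the plane (unnormalized is fine).
    d = [sum((p[k] - origin[k]) * normal[k] for k in range(3)) for p in tri]
    pts = []
    for i in range(3):
        a, b = tri[i], tri[(i + 1) % 3]
        da, db = d[i], d[(i + 1) % 3]
        if (da < 0) != (db < 0):  # edge endpoints lie on opposite sides
            t = da / (da - db)
            pts.append(tuple(a[k] + t * (b[k] - a[k]) for k in range(3)))
    return (pts[0], pts[1]) if len(pts) == 2 else None


def slice_mesh(tris, origin, normal):
    """Unordered section segments of all triangles for one plane."""
    segments = [slice_triangle(t, origin, normal) for t in tris]
    return [s for s in segments if s is not None]
```

With many planes, every (triangle, plane) combination is independent too, so the work grows with the plane count while the serial joining step grows with the number of segments.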

Did you write your own mesh-plane intersector for the GPU?

Hi Steve, yes, and I can tell you it is not the best implementation. Still working on it.

I so have a use case for this. Looking forward to seeing a future release.

Cheers

DK

Another benchmark, this time checking whether points are inside a closed polygon with 1,600 points. The grid has 10,201 points, and all of that happens in 20 ms on my laptop with an Intel GPU (an old Surface Pro).
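
A common way to do this containment test is the even-odd (ray casting) rule, sketched below in plain Python; the thread doesn't say which test the GPU kernel uses, so treat this as an illustration with hypothetical names. Each grid point is tested independently, so a GPU version would assign one work item per point (10,201 = 101 × 101, presumably a 101 × 101 grid here).

```python
def point_in_polygon(px, py, poly):
    """Even-odd test: count edge crossings of a ray from (px, py) toward +x.

    `poly` is a closed polygon as a list of (x, y) vertices, without
    repeating the first vertex at the end.
    """
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > py) != (y2 > py):  # edge straddles the line y = py
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:
                inside = not inside
    return inside


def contained_points(grid, poly):
    """One independent test per grid point: the loop a GPU kernel replaces."""
    return [p for p in grid if point_in_polygon(p[0], p[1], poly)]


square = [(0, 0), (10, 0), (10, 10), (0, 10)]
grid = [(i + 0.5, j + 0.5) for i in range(-5, 15) for j in range(-5, 15)]
inside = contained_points(grid, square)
```

Each test loops over all polygon edges, so the work per point scales with the 1,600 polygon vertices, which is what gives this problem enough arithmetic per point to be interesting on a GPU.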

It’s also worth comparing to a simpler script component.

With the standard containment GH component, processing the list input parameter is what is taking most of the time there.
A simple script with a Parallel.For loop can find and output the points (from a 10,201-point grid contained in a 1,600-point polyline) in around 64 ms, including creating the grid, culling, and outputting the points.


containment.gh (9.5 KB)

Once you include the time for the conversion to and from the vector type and the culling for a more direct comparison, the GPU version in your video looks like it is taking significantly longer than that.

This is not to say that geometric operations using a GPU library can’t be much faster for some problems and scales, and the potential for more integration of such libraries in GH is exciting, but making like-for-like comparisons can be tricky.

You forgot that we are not using the same machines :slight_smile: Thanks, Daniel, for providing the code. Obviously you have a much better machine than mine! Your code runs in 388 ms on my machine. If I ignore the data conversion from Rhino Point3d to double4 in OpenCL, the code on “MY” GPU runs roughly 19 times faster. I am sure you would get a bigger performance boost on a gaming machine. The problem on my side comes down to loading and unloading data, and that is because of a decision I made when I started writing the kernel: I chose double4 for homogeneous coordinates rather than double3. That means I cannot use the same memory for the points in Rhino and need to convert them back and forth.
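
To make the conversion cost concrete, here is a minimal sketch in plain Python, assuming Rhino points are laid out as three consecutive doubles (24 bytes each) and the kernel expects (x, y, z, w) quadruples (32 bytes each); the function names are hypothetical. Because the strides differ, the packed buffer cannot simply be reinterpreted, and a new buffer has to be filled on every call. As a side note, in OpenCL C a double3 has the same size and alignment as a double4 (32 bytes), so reading a tightly packed 24-byte-per-point array in place would typically go through vload3 rather than a double3 buffer.

```python
from array import array


def to_homogeneous(xyz):
    """Copy (x, y, z) points into a flat double4-style buffer (x, y, z, 1.0).

    This allocation and copy is the per-call conversion cost: a packed
    3-double buffer cannot be viewed as 4 doubles per point.
    """
    buf = array("d")
    for x, y, z in xyz:
        buf.extend((x, y, z, 1.0))
    return buf


def from_homogeneous(buf):
    """Copy a flat (x, y, z, w) buffer back into (x/w, y/w, z/w) tuples."""
    return [
        (buf[i] / buf[i + 3], buf[i + 1] / buf[i + 3], buf[i + 2] / buf[i + 3])
        for i in range(0, len(buf), 4)
    ]


pts = [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
packed = array("d", [c for p in pts for c in p])  # 24 bytes per point
homog = to_homogeneous(pts)                        # 32 bytes per point
```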

Did you recompute the solution with F5?
The very first time any script component runs it takes significantly longer.

Yes, of course. The same applies to the GPU version: the first run is always slower because the kernels are being compiled.

My point is that ignoring those data conversion steps means it is no longer a direct comparison if the conversion always has to happen when using it.
Finding realistic scenarios where a GPU computation offers real advantages over a CPU version is interesting, but we need to be careful about whether that’s what we’re actually measuring (rather than mostly the data loading and conversion steps) before jumping to conclusions from the test numbers.
I do think there are likely to be some applications where a GPU version is much faster in a genuinely useful way. I’d guess more for large mesh or point cloud operations where multiple processing steps happen between the data conversions.
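
One way to keep such comparisons like for like is to time the whole call, conversions included, and to discard warm-up runs that absorb one-off costs such as JIT or kernel compilation. A minimal sketch of such a harness in plain Python follows; the function name and defaults are hypothetical.

```python
import statistics
import time


def benchmark(fn, *args, warmup=1, runs=5):
    """Median wall-clock time of fn(*args) in milliseconds.

    Warm-up calls are executed but not timed, so one-off compilation costs
    are excluded; everything inside fn, including data conversion, is timed.
    """
    for _ in range(warmup):
        fn(*args)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(*args)
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

For a fair GPU vs. CPU comparison, `fn` would wrap conversion, computation, and culling together, for example `benchmark(lambda: cull(compute(convert(points))))`, where those three names are placeholders for whatever the two versions actually do.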

You’re absolutely right that we need to consider data conversion steps in these comparisons, as the total performance includes both computation and data movement. The decision to use the GPU over the CPU should depend on two key factors:
Arithmetic (or operational) intensity: The ratio of mathematical operations to memory transfer. For GPU computing to provide a real advantage, the task must involve enough computational work to outweigh the time spent transferring data between the CPU and GPU.
Parallelization: The task also needs to be highly parallelizable for the GPU’s many cores to be effective. If parts of the process are inherently serial (non-parallelizable), the overall performance gain shrinks (see the mesh slicing above).
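
Both factors can be put into a crude back-of-the-envelope model, sketched below in plain Python. The offload check ignores launch latency and any overlap of transfer with compute, and the throughput figures in the usage lines are made-up round numbers, not measurements of any real hardware. The second function is Amdahl's law, which bounds the overall speedup by the serial fraction.

```python
def arithmetic_intensity(flops, bytes_moved):
    """Operations performed per byte moved between host and device."""
    return flops / bytes_moved


def offload_worth_it(flops, bytes_moved, cpu_flops_per_s, gpu_flops_per_s, link_bytes_per_s):
    """True if GPU compute plus host-device transfer beats CPU compute alone."""
    t_cpu = flops / cpu_flops_per_s
    t_gpu = flops / gpu_flops_per_s + bytes_moved / link_bytes_per_s
    return t_gpu < t_cpu


def amdahl_speedup(parallel_fraction, speedup_of_parallel_part):
    """Amdahl's law: overall speedup when only part of the work is accelerated."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / speedup_of_parallel_part)


# Hypothetical hardware: 1e10 FLOP/s CPU, 1e12 FLOP/s GPU, 1e10 B/s link.
low = offload_worth_it(1e6, 8e6, 1e10, 1e12, 1e10)   # little work per byte
high = offload_worth_it(1e10, 8e6, 1e10, 1e12, 1e10)  # lots of work per byte
```

With these numbers the low-intensity case loses (transfer dominates) and the high-intensity case wins. Likewise, `amdahl_speedup(0.9, 100)` is only about 9.2: even a 100x faster parallel part cannot overcome a 10% serial remainder.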

In some cases, tasks like large mesh or point cloud operations can really benefit from the GPU’s parallel processing, as you mentioned. Each use case needs to be evaluated on its own, considering both the data transfer overhead and the inherent parallelism of the task.

The reason I am ignoring the data transfer for now is that I see this in a bigger context, where you don’t need to unload the data from a method like point containment. It is probably something you will use within a larger graph, where the data is pushed on to other components that also run on the GPU: a computational graph large enough to overcome the data latency...
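
The amortization argument can be shown with a small cost model, sketched in plain Python. It assumes, hypothetically, that GPU components could pass device-resident buffers to each other instead of Rhino geometry; the step and transfer times in the usage line are illustrative numbers only.

```python
def pipeline_time_ms(step_ms, upload_ms, download_ms, resident):
    """Total time for a chain of GPU steps.

    resident=False: every component uploads its input and downloads its output.
    resident=True:  data is uploaded once, stays on the device between steps,
                    and is downloaded once at the end.
    """
    compute = sum(step_ms)
    if resident:
        return upload_ms + compute + download_ms
    return compute + len(step_ms) * (upload_ms + download_ms)


steps = [2.0, 3.0, 5.0]  # hypothetical per-component kernel times
round_trips = pipeline_time_ms(steps, 4.0, 4.0, resident=False)
kept_on_gpu = pipeline_time_ms(steps, 4.0, 4.0, resident=True)
```

In this toy example the transfers dominate when every component round-trips (34 ms vs. 18 ms), and the gap widens with every component added to the chain.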
