Here’s my first sketch and attempt to make that happen!
I’ve been working on a new plugin for #Grasshopper3d that allows you to run your definitions on a #GPU. The catch? You’ll need to build your Grasshopper graphs using GPU-optimized components instead of the native ones. For this early test, I used #OpenCL to bring some of Grasshopper’s functionality to the GPU.
In this demo, I’m sampling the cosine function with a certain amplitude, domain, and frequency, then averaging over these samples. As I increase the number of samples, the CPU version slows down, but the GPU stays lightning fast—like 140 times faster!
With 10 million samples, the GPU example finishes in just 45 milliseconds, while the CPU version takes about 6 seconds.
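For a rough idea of what runs on the device, here is a simplified OpenCL sketch of the sampling step (illustrative names and structure, not the exact kernel the plugin generates):

// OpenCL C source held in a C# string; the host compiles it at runtime.
const string SampleKernelSource = @"
#pragma OPENCL EXTENSION cl_khr_fp64 : enable
__kernel void sample_cos(const double amplitude,
                         const double frequency,
                         const double x0,   // start of the sampling domain
                         const double dx,   // step between samples
                         __global double* samples)
{
    int i = get_global_id(0);              // one work item per sample
    double x = x0 + i * dx;
    samples[i] = amplitude * cos(frequency * x);
}";
// Averaging is a second pass: a parallel reduction kernel (or a host-side
// sum over the buffer) divides the total by the sample count.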
This is just the beginning, and I’m excited to share this as open source soon. Stay tuned for more updates!
as people on LinkedIn would say: interesting
In all seriousness, really interested in the how!
@torabiarchitect It would be cool if it worked for all the components!
(Also, can the C# and Python script editor components run on the GPU and get the same speedup?)
Can we test it?
It is possible for components with a feed-forward computational graph carrying unmanaged value types. At the moment I have no idea how to make it work for script components.
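To make "unmanaged value types" concrete, this is roughly the constraint involved (a sketch with my own naming, not the plugin's actual API): only blittable structs can be copied into a GPU buffer without marshalling.

// T : unmanaged guarantees the data is plain bytes with no GC references,
// so it can be handed to an OpenCL buffer as-is.
static void UploadToGpu<T>(ReadOnlySpan<T> data) where T : unmanaged
{
    int byteCount = data.Length * System.Runtime.CompilerServices.Unsafe.SizeOf<T>();
    // ... create an OpenCL buffer of byteCount bytes and enqueue the copy ...
}

A class, or a struct holding a string, fails this constraint, which is part of why script components (arbitrary managed code) are the hard case.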
I will share it on Food4Rhino as soon as the math library is completed, so at least you can put it to use.
Very cool! I’m looking forward to seeing your updates on this!
Are you sure this speed increase isn’t related more to the data piping than to CPU vs. GPU?
If I run simple C# code (.NET 8), I can compute a similar equation in 20 ms on my medium-spec PC on the CPU for 1 million samples. Probably most of that time is spent appending to the list. Also, there is no optimisation in this code. In my opinion, most performance is lost in the boilerplate of handling component input and output.
using System;
using System.Collections.Generic;
using System.Diagnostics;

long start = Stopwatch.GetTimestamp();
double val = 0.0;
List<Point3d> points = new();
for (int i = 0; i < 1_000_000; i++)
{
    double x = val * 1.6;
    double y = Math.Sin(x + 32.2) * 6.25;
    points.Add(new Point3d(x, y, 0));
    val += 0.001;
}
double time = Stopwatch.GetElapsedTime(start).TotalMilliseconds; // .NET 7+ API
Console.WriteLine($"It took {time} ms");

// Minimal stand-in for Rhino's Point3d so the snippet runs outside Grasshopper.
public struct Point3d
{
    public double X;
    public double Y;
    public double Z;

    public Point3d(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }
}
You have a point; some performance is gained because I am taking symbolic values from Grasshopper, not actual data. However, your comparison is not fair. Here is why:
- The duration is measured over 10 million samples, not 1 million.
- You have hardcoded the parameters, which is already an optimization!
- You don’t have the overhead of Grasshopper loading and unloading data, which is a big part of the performance loss in my case.
So in reality the GPU takes much less time than what you see in the Grasshopper profiler.
Hope that answers your question.
First of all, it’s always great to extend the capability of Grasshopper, and there are of course use cases for computing something on the GPU. I would think it becomes even more evident for 4x4 matrix operations, etc.
I just saw that you are not piping a list of values, but instead only a single item. That led me to conclude that you optimised the data piping indirectly, which already gives a great performance boost in Grasshopper. I just wanted to show that simple arithmetic operations on the CPU are extremely fast. And by optimisation I mean things like vectorization (SIMD), parallel computing, more appropriate data structures, lowering the memory footprint, etc.
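To show what I mean by vectorization, here is a sketch (my own example, not the benchmark above) that sums squares several doubles per iteration with System.Numerics:

using System;
using System.Numerics;

static double SumSquares(double[] values)
{
    var acc = Vector<double>.Zero;
    int i = 0;
    // Process Vector<double>.Count lanes per iteration (4 doubles on AVX2).
    for (; i <= values.Length - Vector<double>.Count; i += Vector<double>.Count)
    {
        var v = new Vector<double>(values, i);
        acc += v * v;
    }
    double sum = Vector.Sum(acc);      // horizontal add (.NET 6+)
    for (; i < values.Length; i++)     // scalar tail
        sum += values[i] * values[i];
    return sum;
}

Math.Sin has no SIMD counterpart in System.Numerics, so the cosine demo itself would need a vector math library, but the pattern is the point.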
Very interesting!
And really interested in testing how it works
Thanks for your feedback. Generally, GPUs are known to be good at “computationally intensive” work, which means a lot of arithmetic and little memory allocation. So it is obviously better to “create” your data on the device rather than trying to load it, and that’s why I used a “vector” represented by single-value objects, as you noticed.
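In kernel terms, “load onto the device” versus “create on the device” looks something like this (a sketch, not the plugin’s generated code):

// Bandwidth-bound: every x value must cross the PCIe bus before the math starts.
const string LoadedKernel = @"
__kernel void f_loaded(__global const double* xs, __global double* ys)
{
    int i = get_global_id(0);
    ys[i] = cos(xs[i]);
}";

// Compute-bound: x is derived from the work-item id, so nothing is uploaded.
const string GeneratedKernel = @"
__kernel void f_generated(const double x0, const double dx, __global double* ys)
{
    int i = get_global_id(0);
    ys[i] = cos(x0 + i * dx);
}";
// (cl_khr_fp64 pragma omitted for brevity.)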
I saw it randomly on LinkedIn yesterday; I would like to see more.
You might also look at the Grasshopper shader project that McNeel put out (GhGL Discussion - #9 by stevebaer), which uses GLSL. A lot of the examples are about texture shaders, but I’ve had some success running mathematical operations on large sets with its compute shader.
I also have to express that I feel the title here is kind of misleading. You’re not creating a way to run entire existing definitions on the GPU; you’re introducing a select set of GPU-accelerated components that can be incorporated into one’s definition. Still very cool and interesting, potentially quite useful, but you might be inviting some frustration from folks expecting one thing and seeing another.
Super interesting stuff, thank you @torabiarchitect! We have another software project where we create many objects and calculate transformations, colors, etc. on all of them based on textures or 3D Buffers.
We converted everything that can be on the GPU to it, which is a huge chunk. It is insanely faster now.
Take this example from here: Bug: Displacement from Height Map only uses 8-bit precision - #16 by seltzdesign, where we create displacement meshes from animated textures. It runs at around 100 fps on an RTX 4090, while it takes a few seconds to prepare it for rendering in Rhino. The resolution of the mesh in our software is the same as the resolution of the texture, so around 1-2 million vertices in this case. It shows the insane speed of the GPU.
Looking forward to more!
Hi @torabiarchitect, I would be interested in hearing more, as I develop a component library called Horta, which is pretty quick at calculating Space Group derived distributions of geometric objects over lattices. But as each unit cell is discrete, it’s a prime candidate for GPU computing and could start to do some really exciting things.
Thanks for the hard work on this to date, and into the future!
Hi Keyan, Thank you for the feedback! I understand the concern about the project’s scope and the potential for misunderstanding its capabilities. You’re right—the goal here isn’t to run an entire Grasshopper definition on the GPU. Instead, we’re focusing on developing a select set of GPU-accelerated components that can be integrated into your existing definitions.
The primary intention is to leverage GPU computing power for tasks that are computationally intensive, particularly those involving numerical methods, optimizations, and iterative processes. These are areas where the parallel processing capabilities of the GPU can provide significant performance improvements, especially when handling large datasets or complex operations.
Horta looks really cool, great work! The feasibility of leveraging GPU acceleration really depends on the complexity of the geometry involved. For example, working with meshes is generally more straightforward on the GPU because they are more conducive to parallel processing. This makes operations like transforming and manipulating mesh data much faster.
On the other hand, operations involving Breps (Boundary Representations), such as creation and modification, are inherently more complex. These tasks often require more sequential processing and are not as easily translated to the GPU’s parallel architecture. So while there are definitely some areas where GPU acceleration can be incredibly effective, it’s important to consider these limitations based on the specific geometric operations you’re dealing with.
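As a concrete sketch of why meshes parallelize so well (illustrative code, not the plugin’s): one work item per vertex, each applying the same 4x4 transform.

// verts: xyz packed as double4 with w = 1; m: a single row-major 4x4 matrix.
const string TransformKernel = @"
__kernel void transform_vertices(__global const double4* verts,
                                 __constant double16*    m,
                                 __global double4*       outVerts)
{
    int i = get_global_id(0);                     // one work item per vertex
    double4 v = verts[i];
    outVerts[i] = (double4)(dot(m[0].s0123, v),   // row 0 . v
                            dot(m[0].s4567, v),   // row 1 . v
                            dot(m[0].s89ab, v),   // row 2 . v
                            dot(m[0].scdef, v));  // row 3 . v
}";

No vertex depends on any other, so a million vertices become a million independent work items. Brep trimming and intersection logic, by contrast, branches and depends on neighbouring results.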
Here is one particular use case for GPU-optimized components. In most optimization problems, you need to evaluate the gradient of the cost function. If you cannot analytically provide the optimizer with a gradient function, the optimizer will resort to a numeric approach, which is much more computationally expensive.
By taking advantage of symbolic representations of the computation graph in Grasshopper, I was able to derive the partial differential of the function and build a kernel in OpenCL to evaluate the derivative on the GPU. I also included an option called “Explicit” for the df/dx component, which generates the same graph explicitly on the Grasshopper canvas.
If you look closely at the explicit derivative, you will see the reuse of components from the function definition. This means the underlying code is using caching techniques to take advantage of data already available in memory, similar to what you would do when optimizing your code!
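To see why the numeric fallback is so expensive, here is a sketch of a central-difference gradient (my own illustration, not the plugin’s code): every parameter costs two full evaluations of the cost function.

using System;

static double[] NumericGradient(Func<double[], double> f, double[] x, double h = 1e-6)
{
    var g = new double[x.Length];
    for (int i = 0; i < x.Length; i++)   // 2 * n evaluations of f in total
    {
        double saved = x[i];
        x[i] = saved + h; double fPlus  = f(x);
        x[i] = saved - h; double fMinus = f(x);
        x[i] = saved;                    // restore the parameter
        g[i] = (fPlus - fMinus) / (2 * h);
    }
    return g;
}

The symbolic df/dx kernel replaces all of that with a single GPU pass, and the cached subexpressions it shares with the function itself come along for free.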
Could this be applied to mesh operations? They’re really slow in Grasshopper.
It is certainly possible, but the performance you get depends on the type of operation you want to perform. Anything that can be translated to a ray-tracing operation is generally fast. Do you have a particular use case? I can give it a try.
Thanks, Ali.
I’m generally talking about mesh boolean operations. Rhino 8 mesh booleans in Grasshopper are still painfully slow. It usually takes 300 ms to 1.2 s for my booleans (intersections, subtractions, additions, etc.) to finish, depending on mesh resolution. But I would think it should be faster, considering I’m only subtracting a few meshes from the main mesh.