GPU Support in Grasshopper


That’s what I meant.
We can code, indeed.

But the whole purpose of Grasshopper is to not code.

Unity, Houdini and others can pass meshes, vectors, numbers, colors, etc. in and out of highly parallelized methods/pipelines, even GPU-based ones (and not only Nvidia’s).

What about us?

I’m not ranting or anything… just stating.

Also, we are going off-topic big time. :sweat_smile:

I’ll attach one last meaningful solution using all the tips from the others…


I would be more than happy to type something up, I just need to understand what you are asking for. At this point in time I can’t figure out what you are requesting.


This matter of using GPU capabilities is, in my opinion, relevant not only for Grasshopper.

The most “frustrating” case is the clipping plane in Rhino: we can see a real-time “solid operation” on a big mesh, but when we tell Rhino to actually compute a solid difference of that same mesh with a big plane, it takes much longer and/or fails.
The same goes for other mesh solid operations… they should all be doable on the GPU.

Anyway, in Grasshopper there is Kangaroo, which is very nice!
We can program it with simple blocks, no code needed.

Here are some examples of real-time particle simulations in WebGL:

We can more or less replicate all of those with Kangaroo + Grasshopper… but it’s slow.

I have no idea how Kangaroo works as a whole, but if its “engine” were to use the GPU as those web pages do, it would speed up greatly! …no?
The Zombie solver would become much cooler to use!

Working with huge amounts of numbers and iterating seems perfect for GPU architecture (so they say…).
Smoothing a mesh? The simple Laplacian operation on connected vertices… seems like a GPU job!
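As an illustration of why that Laplacian step fits data-parallel hardware, here is a minimal NumPy sketch (my own toy function, not Rhino/Grasshopper code): each new vertex position depends only on its own neighbours, so all of them can be computed at once.

```python
import numpy as np

def laplacian_smooth(verts, edges, alpha=0.5):
    """One Laplacian step: move each vertex toward the average of its
    neighbours. verts: (n,3) floats; edges: (m,2) vertex index pairs."""
    n = len(verts)
    acc = np.zeros_like(verts)
    deg = np.zeros(n)
    # accumulate neighbour positions along both directions of every edge
    np.add.at(acc, edges[:, 0], verts[edges[:, 1]])
    np.add.at(acc, edges[:, 1], verts[edges[:, 0]])
    np.add.at(deg, edges[:, 0], 1)
    np.add.at(deg, edges[:, 1], 1)
    avg = acc / np.maximum(deg, 1)[:, None]
    # blend each vertex toward its neighbour average
    return (1 - alpha) * verts + alpha * avg
```

Every output row is an independent calculation over fixed inputs, which is exactly the pattern a GPU kernel handles with one thread per vertex.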
A remesher? We have “RemeshByColour” and “SimpleRemesh”, but they are prone to fail and explode. Using them properly is really tricky, and they are slow anyway with big meshes.
Doing the:
1 - split edge if longer than “a”;
2 - collapse edge if shorter than “b”;
3 - some steps of Laplacian smoothing;
4 - project points back onto the original mesh;
5 - go to 1
should be possible on the GPU… I suppose.
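As a toy illustration of those steps, here is a hypothetical 1D analogue of my own: the same split/collapse/smooth/project cycle applied to an open polyline instead of a mesh (a real mesh version needs much more topological bookkeeping).

```python
import numpy as np

def closest_point_on_polyline(p, poly):
    """Project point p onto the nearest segment of polyline `poly`."""
    best, best_d = poly[0], float("inf")
    for s, e in zip(poly, poly[1:]):
        d = e - s
        t = np.clip(np.dot(p - s, d) / np.dot(d, d), 0.0, 1.0)
        c = s + t * d
        dist = np.linalg.norm(p - c)
        if dist < best_d:
            best, best_d = c, dist
    return best

def remesh_polyline(pts, a, b, iters=10, alpha=0.5):
    """Toy 1D analogue of the remeshing loop above."""
    orig = np.asarray(pts, dtype=float)
    P = [np.array(p, dtype=float) for p in pts]
    for _ in range(iters):
        # 1 - split edges longer than a (insert midpoints)
        out = [P[0]]
        for p, q in zip(P, P[1:]):
            if np.linalg.norm(q - p) > a:
                out.append((p + q) / 2)
            out.append(q)
        P = out
        # 2 - collapse edges shorter than b (endpoints are kept)
        out = [P[0]]
        for q in P[1:-1]:
            if np.linalg.norm(q - out[-1]) >= b:
                out.append(q)
        out.append(P[-1])
        P = out
        # 3 - one step of Laplacian smoothing on interior points
        for i in range(1, len(P) - 1):
            P[i] = (1 - alpha) * P[i] + alpha * (P[i - 1] + P[i + 1]) / 2
        # 4 - project every point back onto the original polyline
        P = [closest_point_on_polyline(p, orig) for p in P]
    return np.array(P)
```

Steps 3 and 4 are independent per point and would parallelize well; the topological steps 1 and 2 are the hard part to parallelize, as noted further down this thread.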

The case from before the thread split:
The sheer amount of points/values “hurts” Grasshopper; the amount alone, not the actual operation.
In my solution I took a few objects (mesh + droplet points) and output a single object (the whole edited mesh), and it’s fast.
If I skip the mesh editing and only output the distance values, it’s 500% slower.

So… some sort of “list object” could be handy!?
When the usual “DeMesh” > move vertices > “ConMesh” flow happens, the computation time is only a small % of what we actually wait for.
If we could use some sort of “list objects” that don’t make Grasshopper panic and, even better, can be fed to components that run on the GPU… it would be gold!

When I have 10^5 points P and 10^5 vectors V, even doing P+V in a single dedicated component would be great… as would the other small steps before it, like calculating distances, math operators, etc.

… like an “Evaluate” (evaluate an expression from a string) that runs on the GPU and/or with “list objects”!
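The P+V case above is basically what array libraries (and GPU kernels) call a vectorized operation: one call over the whole list instead of 10^5 per-item calls. A small NumPy sketch of the difference:

```python
import numpy as np

n = 10**5
P = np.random.rand(n, 3)   # 1e5 points
V = np.random.rand(n, 3)   # 1e5 vectors

# per-item version: roughly what handling 1e5 separate values amounts to
moved_loop = np.array([p + v for p, v in zip(P, V)])

# "list object" version: a single operation over all items at once
moved = P + V

assert np.allclose(moved, moved_loop)
```

Both give the same result; the vectorized form just hands the whole batch to optimized (and potentially GPU-backed) code in one go.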

Those are raw ideas… I think that once Grasshopper makes a small step in this direction, everybody will start having more ideas and uses.

For now all I know is that we have Kangaroo, which sometimes struggles with 1000-ish particles… while I can go, open my 97th Chrome tab, load a site and execute a real-time 10^6-particle fluid simulation on the GPU, interacting with the cursor… while Kangaroo is still converging.

I love Rhino/Grasshopper and all of you guys developing it.
I don’t want to offend anyone or criticize anyone’s work.

Sorry for the text wall. :sweat_smile:


Maybe I asked this the wrong way.
Are applications like Houdini and Unity actually returning modified meshes from operations on the GPU? I can understand meshes being input for operations that produce visual results, but I don’t see cases where modified meshes are being returned to the application for further downstream work.

The display that you get for clipping planes is not a mathematically correct representation of the solid difference. It is a pixelated result at the resolution of your screen.


It’s been some years since I last looked into OpenCL computing.
I remember finding some examples doing just that: managing meshes and other geometry on the GPU, like physics body collisions, cloth deformation, etc…

Anyway, I strongly believe most of the things I wrote above are totally possible.
And developing in that direction shouldn’t depend on what other software has or hasn’t done.


Hi @maje90
To try and address some of the points specific to Kangaroo-

I have been looking a bit at the AleaGPU library which could potentially allow me to use the GPU for parts of the Kangaroo solver. It probably involves rewriting quite a lot of code though - it’s not just a question of importing the library and flipping a switch.

I want to make sure that any changes I make still keep the modular nature and allow the creation of custom goals using C# and Python. As fun as it is to make flashy particle animations (where the solver only has to do one specific and limited thing fast and draw it to screen), Kangaroo is primarily intended as a useful and flexible tool for designing a wide range of things to be physically made.
Most of the solver’s time is spent in the Calculate method of the goals. Kangaroo is structured so that these goal calculations are all independent of each other, and they are called in parallel (just using the Task parallel library for CPU processing though, which gives only a very modest speedup).
Some of them are essentially doing a few simple pure vector math operations, but some have more general .net code, including calls to various RhinoCommon functions, so couldn’t easily be rewritten in a way suitable for GPU.
I think potentially it might be possible as a start to make a few specific goals that use the GPU - probably one for large collections of edge lengths, and one for sphere collisions.
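The structure described here (independent goal calculations dispatched in parallel) can be sketched in a few lines. Note this is a hypothetical toy of mine, not Kangaroo's actual API: the `SpringGoal` class and `solver_step` function are invented names, and Python threads stand in for the Task Parallel Library.

```python
import math
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-in for a Kangaroo goal: it only READS the shared
# particle positions and returns its own contribution, so every goal's
# calculation is independent of the others.
class SpringGoal:
    def __init__(self, i, j, rest, k=1.0):
        self.i, self.j, self.rest, self.k = i, j, rest, k

    def calculate(self, pos):
        (x0, y0), (x1, y1) = pos[self.i], pos[self.j]
        length = math.hypot(x1 - x0, y1 - y0)
        # signed stretch: positive means the spring wants to shorten
        return (self.i, self.j, self.k * (length - self.rest))

def solver_step(goals, pos):
    # Goals don't touch each other's state, so they can be mapped in
    # parallel (a GPU kernel over goals would exploit the same property).
    with ThreadPoolExecutor() as ex:
        return list(ex.map(lambda g: g.calculate(pos), goals))
```

The solver would then sum the per-particle contributions and move the particles, which is the part that has to wait for all goals to finish.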

One other reason Kangaroo gets slow for collisons involving thousands of particles is the broad-phase collision detection. When you have n particles, each of which might be colliding with any of the others, what you definitely want to avoid is doing the full pairwise collision calculation for all possible pairings.
So typically you divide or sort them in some way that you can quickly rule out a large proportion of these pairings, and then do the full collision check only for these. Kangaroo does use a simple broad phase algorithm in the SphereCollide and Collider goals, but this part definitely could be improved.
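For anyone curious, the simplest form of that "divide or sort" idea is a spatial hash: with the cell size equal to the collision radius, any colliding pair must lie in the same or an adjacent cell, so the exact distance check only ever runs on a small fraction of the n² pairings. A minimal Python sketch (my own toy, not Kangaroo's actual implementation):

```python
import numpy as np
from collections import defaultdict

def broad_phase_pairs(pts, radius):
    """Find all pairs of points closer than `radius` via a spatial hash."""
    # broad phase: bin every point into a cube of side == radius
    grid = defaultdict(list)
    for i, p in enumerate(pts):
        grid[tuple((p // radius).astype(int))].append(i)
    pairs = set()
    for (cx, cy, cz), idx in grid.items():
        # candidates: this cell plus its 26 neighbours; anything
        # farther away cannot possibly be within `radius`
        cand = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dz in (-1, 0, 1):
                    cand.extend(grid.get((cx + dx, cy + dy, cz + dz), []))
        # narrow phase: exact distance check on the few survivors
        for i in idx:
            for j in cand:
                if i < j and np.linalg.norm(pts[i] - pts[j]) < radius:
                    pairs.add((i, j))
    return pairs
```

Real implementations (and GPU versions) use fancier structures like sorted sweeps or BVHs, but the pruning principle is the same.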
Getting it fast enough to run big realtime 3d fluid simulations is unlikely and not a priority, but I would like to at least make it easier to do things like fabric draping at a reasonable interactive speed for big meshes.

Here’s a Kangaroo demo that runs fairly smoothly with 10k particles (it’s a 1-to-many interaction rather than many-to-many, but I think it shows that collision is probably the main issue)

As for the remeshing-
See my other reply here. I actually found and fixed a Plankton bug recently. I need to go through and make sure everything it affects is updated, but I’m fairly positive that the remesher can be made much more stable. It’s code I wrote a long time ago though, and whenever I look at it I get the urge to rewrite it completely from scratch to structure it in a more logical way.
There’s still a lot I want to do with remeshing: making it more modular and customisable, better integration with Kangaroo, different relaxation options and more. This is something that I need to set aside a decent block of time to concentrate on though, and so far other priorities keep getting in the way.

About speed of remeshing - this is a fair bit harder to parallelize than particle physics, because although many topological updates to different parts of the mesh can be applied at each iteration, they are not independent - you don’t want 2 nearby edge operations trying to apply conflicting updates to the same neighbourhood, and with the mesh constantly changing there’s no easy way to split it into independent parts that can be updated in parallel.
Just calculating and caching all the edge lengths first would be easy to put in a parallel method, and I can try at least this for probably a modest speedup. The other heavy bit is probably the projection to the closest point on the target surface/mesh.
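As a concrete sketch of that "calculate and cache all the edge lengths first" step, in array form (NumPy here; the function name and the `verts`/`edges` arrays are my own assumptions, but the same shape of computation maps directly onto a parallel kernel):

```python
import numpy as np

def all_edge_lengths(verts, edges):
    """Every length depends only on its own two endpoints, so this is
    one independent calculation per edge: ideal for parallel hardware.
    verts: (n,3) floats; edges: (m,2) vertex index pairs."""
    d = verts[edges[:, 0]] - verts[edges[:, 1]]
    return np.linalg.norm(d, axis=1)
```

The results can then be cached and consulted by the (inherently serial) split/collapse decisions.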


This is a very important part of Daniel’s reply. Code has to be rewritten to a different syntax and with completely different algorithms. Another unfortunate issue is that any code written is platform specific and would need to be done twice with slightly different syntax if we wanted to support GPU oriented calculations on both Windows and Mac.

Excuse the level of my arguments. I’m an outsider in these matters and probably sound bothersome/naive to you :laughing:

I’m aware of that. I understand that what I’m asking for is not easy or “painless”.

But… (I’ll take a long ride now)
GPUs are developed mainly by the “pull” of the videogame industry.
The videogame industry is more than twice the size of the film and music industries combined.
There are a lot of people already coding very complex and interesting stuff.
Things like DirectX, OpenGL/OpenCL, CUDA, RTX (for the glory of the Nvidia cartel, I must mention those too), etc…
Videogames just build on all that existing, heavily optimized stuff. (They have big budgets, sure…)

And… webpages too! In 3 seconds a 100 kB .js loads, “shoots between the eyes” a task for the GPU and starts executing!
Be it a 2006 laptop, a Raspberry Pi or a smartphone… WebGL (which is OpenGL) just works!

The CAD industry:
everybody creates their own code “from scratch” and packs it layer by layer, year by year… and so on.
… Is that so?

Can’t we take advantage of that huge amount of math/methods/power in the GPU?
GPUs stopped being one-way, PC-to-monitor hardware units a long time ago.
They create a video signal… but they can also process other things and “send back” the result.

I think it’s not a matter of “if”, but more of “when”.

Anyway, I understand you guys.
Thank you for your exhaustive answers and your time.

I’m using your work almost every day. I’m totally thankful for what I already have!

Wanted to weigh in: can you reference the Alea dll from a C# scripting component? I haven’t tried that, but I’ve used Cloo in the past for compiled plugins and it works great.

There is already support for shader coding, so one workaround would be to call glReadPixels and get the results back to the host (CPU). But a proper compute shader approach would be ideal, as you have far fewer restrictions on the flexibility of the code; doing stuff in a fragment shader is complex, and the same goes for the vertex shader, as you are very restricted.