GPU Support in Grasshopper


That’s what I meant.
We can code, indeed.

But the whole purpose of Grasshopper is to not code.

Unity/Houdini and others can feed meshes, vectors, numbers, colors, etc. in and out of highly parallelized methods/pipelines (even using the GPU, and not only from the Nvidia cartel).

What about us?

I’m not ranting or anything… just stating.

Also, we are going off-topic big time. :sweat_smile:

I’ll attach one last meaningful solution using all the tips from the others…

2 Likes

I would be more than happy to type something up; I just need to understand what you are asking for. At this point I can’t figure out what you are requesting.

1 Like

This matter of using GPU capabilities, in my opinion, is not only relevant for Grasshopper.

The most “frustrating” case is the clipping plane in Rhino: we are able to see a real-time “solid operation” on a big mesh, but when we tell Rhino to actually do a solid difference of that same mesh with a big plane, it takes much more time and/or fails.
The same goes for other mesh solid operations… they should all be doable by the GPU.


Anyway, in Grasshopper there is Kangaroo, which is very nice!
We can program it with simple blocks, no code needed.

Here are some examples of real-time particle simulations in WebGL:

We can more or less replicate all of those with Kangaroo + Grasshopper… but it’s slow.

I have no idea how Kangaroo works as a whole, but if its “engine” were to use the GPU as those web pages do, it would speed up greatly! … wouldn’t it?
The zombie solver would become much cooler to use!


Working with large amounts of numbers and iterating seems perfect for the GPU architecture (so they say…).
Smoothing a mesh? The simple Laplacian operation on the connected vertices… seems like a GPU job!
A remesher? We have “RemeshByColour” and “SimpleRemesh”, but they are prone to failing and exploding. Using them properly is really tricky, and they are slow anyway (with big meshes).
Doing the following:
1 - split edges longer than “a”;
2 - collapse edges shorter than “b”;
3 - some steps of Laplacian smoothing;
4 - project points back onto the original mesh;
5 - go to 1.
This should be possible on the GPU… I suppose.
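A rough sketch of that loop in C#, just to make the steps concrete (the helpers here are hypothetical placeholders, not functions from any existing library):

```csharp
// Hypothetical remeshing loop; SplitLongEdges, CollapseShortEdges, LaplacianSmooth
// and ProjectToTarget are placeholder names, not an existing API.
for (int iteration = 0; iteration < maxIterations; iteration++)
{
    SplitLongEdges(mesh, a);         // 1 - split every edge longer than "a"
    CollapseShortEdges(mesh, b);     // 2 - collapse every edge shorter than "b"
    LaplacianSmooth(mesh, 1);        // 3 - move each vertex towards the average of its neighbours
    ProjectToTarget(mesh, original); // 4 - pull the vertices back onto the original mesh
}                                    // 5 - and repeat
```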


The case from before the thread was split:
The sheer number of points/values “hurts” Grasshopper; it’s the amount itself, not the actual operation.
In my solution I took a few objects (mesh + droplet points) and output a single object (the whole edited mesh), and it’s fast.
If I skip the mesh editing and only output the distance values, it’s 500% slower.

So… some sort of “list object” could be handy!?
When the usual “DeMesh” > move vertices > “ConMesh” happens, the computation time is only a small percentage of what we actually wait for.
If we could use some sort of “list object” that doesn’t make Grasshopper panic and, even better, can be fed to components that run on the GPU… it would be gold!

When I have 10^5 points P and 10^5 vectors V, even doing P+V in a single dedicated component would be great… the same goes for the other small steps before that, like calculating distances, math operators, etc.

… like an “Evaluate” (evaluate an expression from a string) that runs on the GPU and/or works with “list objects”!
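For what it’s worth, the core of such a dedicated P+V step is tiny; here is a minimal sketch of what the inner loop could look like on the CPU with plain RhinoCommon (the `Move` name is just illustrative):

```csharp
using System.Threading.Tasks;
using Rhino.Geometry;

// Add a vector to each point, parallelized over the whole list.
Point3d[] Move(Point3d[] pts, Vector3d[] vecs)
{
    var moved = new Point3d[pts.Length];
    Parallel.For(0, pts.Length, i => moved[i] = pts[i] + vecs[i]);
    return moved;
}
```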


These are raw ideas… I think as soon as Grasshopper makes a small step in this direction, everybody will start having more ideas and uses.

For now, all I know is that we have Kangaroo, which sometimes struggles with 1000-ish particles… and I can go open the 97th Chrome tab, load a site and run a real-time, 10^6-particle fluid simulation on the GPU, interacting with the cursor… while Kangaroo is still converging.

I love Rhino/Grasshopper and all of you guys developing it.
I don’t want to offend anyone or criticize anyone’s work.

Sorry for the text wall. :sweat_smile:

5 Likes

Maybe I asked this the wrong way.
Are applications like Houdini and Unity actually returning modified meshes from operations on the GPU? I can understand meshes being input for operations that produce visual results, but I don’t see cases where modified meshes are being returned to the application for further downstream work.

The display that you get for clipping planes is not a mathematically correct representation of the solid difference. It is a pixelated result at the resolution of your screen.

3 Likes

It’s been some years since I last looked into OpenCL computing.
I remember finding some examples doing that, handling meshes and other geometry on the GPU: physical body collisions, cloth deformation, etc…
https://www.sidefx.com/docs/houdini/nodes/sop/filecache.html

Anyway, I strongly believe most of the things I wrote above are totally possible.
And developing in that direction shouldn’t depend on what other software has or hasn’t done.

1 Like

Hi @maje90
To try and address some of the points specific to Kangaroo-

I have been looking a bit at the AleaGPU library which could potentially allow me to use the GPU for parts of the Kangaroo solver. It probably involves rewriting quite a lot of code though - it’s not just a question of importing the library and flipping a switch.

I want to make sure that any changes I make still keep the modular nature and allow the creation of custom goals using C# and Python. As fun as it is to make flashy particle animations (where the solver only has to do one specific and limited thing fast and draw it to screen), Kangaroo is primarily intended as a useful and flexible tool for designing a wide range of things to be physically made.
Most of the solver’s time is spent in the Calculate method of the goals. Kangaroo is structured so that these goal calculations are all independent of each other, and they are called in parallel (just using the Task parallel library for CPU processing though, which gives only a very modest speedup).
Some of them are essentially doing a few simple pure vector math operations, but some have more general .net code, including calls to various RhinoCommon functions, so couldn’t easily be rewritten in a way suitable for GPU.
I think potentially it might be possible as a start to make a few specific goals that use the GPU - probably one for large collections of edge lengths, and one for sphere collisions.
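For readers less familiar with that structure, here is a minimal sketch of the general pattern (not Kangaroo’s actual source; `IGoalSketch` is an illustrative stand-in for its goal interface):

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using Rhino.Geometry;

// Each goal reads the current particle positions and writes its result
// into its own buffers, so the Calculate calls are independent of each other.
public interface IGoalSketch
{
    void Calculate(Point3d[] positions);
}

public static class SolverSketch
{
    public static void Step(IList<IGoalSketch> goals, Point3d[] positions)
    {
        // CPU parallelism via the Task Parallel Library, as described above.
        Parallel.ForEach(goals, g => g.Calculate(positions));

        // Gathering the per-goal moves and updating the positions stays sequential.
    }
}
```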

One other reason Kangaroo gets slow for collisions involving thousands of particles is the broad-phase collision detection. When you have n particles, each of which might be colliding with any of the others, what you definitely want to avoid is doing the full pairwise collision calculation for all possible pairings.
So typically you divide or sort them in some way that you can quickly rule out a large proportion of these pairings, and then do the full collision check only for these. Kangaroo does use a simple broad phase algorithm in the SphereCollide and Collider goals, but this part definitely could be improved.
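One common way to do that broad phase is a spatial hash grid. A minimal sketch (illustrative only, not what Kangaroo actually uses internally; `pts` is assumed to be the list of particle positions, `radius` their collision radius, and `ResolveCollision` a hypothetical response step):

```csharp
using System;
using System.Collections.Generic;
using Rhino.Geometry;

// Bin particles into cells sized to the collision diameter,
// then run the full distance check only against neighbouring cells.
double cellSize = 2.0 * radius;
var grid = new Dictionary<(int, int, int), List<int>>();

(int, int, int) Cell(Point3d p) =>
    ((int)Math.Floor(p.X / cellSize),
     (int)Math.Floor(p.Y / cellSize),
     (int)Math.Floor(p.Z / cellSize));

for (int i = 0; i < pts.Count; i++)
{
    var c = Cell(pts[i]);
    if (!grid.TryGetValue(c, out var bucket)) grid[c] = bucket = new List<int>();
    bucket.Add(i);
}

for (int i = 0; i < pts.Count; i++)
{
    var (cx, cy, cz) = Cell(pts[i]);
    for (int dx = -1; dx <= 1; dx++)
    for (int dy = -1; dy <= 1; dy++)
    for (int dz = -1; dz <= 1; dz++)
        if (grid.TryGetValue((cx + dx, cy + dy, cz + dz), out var nearby))
            foreach (int j in nearby)
                if (j > i && pts[i].DistanceTo(pts[j]) < 2.0 * radius)
                    ResolveCollision(i, j); // narrow-phase check + response (hypothetical)
}
```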
Getting it fast enough to run big realtime 3d fluid simulations is unlikely and not a priority, but I would like to at least make it easier to do things like fabric draping at a reasonable interactive speed for big meshes.

Here’s a Kangaroo demo that runs fairly smoothly with 10k particles (it’s a 1-to-many interaction rather than many-to-many, but I think it shows that collision is probably the main issue).

As for the remeshing-
See my other reply here. I actually found and fixed a Plankton bug recently. I need to go through and make sure everything it affects is updated, but I’m fairly positive that the remesher can be made much more stable. It’s code I wrote a long time ago though, and whenever I look at it I get the urge to rewrite it completely from scratch to structure it in a more logical way. There’s still a lot I want to do with remeshing, making it more modular and customisable, better integration with Kangaroo, different relaxation options and more. This is something that I need to set aside a decent block of time to concentrate on though, and so far other priorities keep getting in the way.

About speed of remeshing - this is a fair bit harder to parallelize than particle physics, because although many topological updates to different parts of the mesh can be applied at each iteration, they are not independent - you don’t want 2 nearby edge operations trying to apply conflicting updates to the same neighbourhood, and with the mesh constantly changing there’s no easy way to split it into independent parts that can be updated in parallel.
Just calculating and caching all the edge lengths first would be easy to put in a parallel method, and I can try at least this for probably a modest speedup. The other heavy bit is probably the projection to the closest point on the target surface/mesh.
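That first part could look something like this (a minimal sketch; `edgeCount` and `GetEndpoints` are hypothetical names for the edge count and a helper returning an edge’s two vertex positions):

```csharp
using System.Threading.Tasks;
using Rhino.Geometry;

// Per-edge lengths are independent of each other, so unlike the
// topological split/collapse operations this parallelizes trivially.
double[] edgeLengths = new double[edgeCount];
Parallel.For(0, edgeCount, i =>
{
    (Point3d a, Point3d b) = GetEndpoints(i);
    edgeLengths[i] = a.DistanceTo(b);
});
```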

9 Likes

This is a very important part of Daniel’s reply. Code has to be rewritten in a different syntax and with completely different algorithms. Another unfortunate issue is that any code written is platform-specific and would need to be written twice, with slightly different syntax, if we wanted to support GPU-oriented calculations on both Windows and Mac.

Excuse my level of argumentation. I’m an outsider in these matters and probably sound annoying/naive to you :laughing:

I’m aware of that. I understand that what I’m asking for is not easy or “painless”.

But… (I’ll take a long detour now)
GPUs are developed mainly by the “pull” of the videogame industry.
The videogame industry is more than double the size of the film and music industries combined.
There are a lot of people already coding very complex and interesting stuff.
Things like DirectX, OpenGL/OpenCL, CUDA, RTX (for the glory of the Nvidia cartel, I must mention those too), etc…
Videogames just make use of a lot of that hardcoded stuff (they have big budgets, sure…).

And… web pages too! In 3 seconds a 100 kB .js file loads, “shoots” a task straight at the GPU and starts executing!
Be it a 2006 laptop, a Raspberry Pi or a smartphone… WebGL (which is OpenGL) just works!


The CAD industry:
everybody writes their own code “from scratch” and stacks it layer by layer, year by year… and so on.
… Is that so?

Can’t we take advantage of that huge amount of math/methods/power in the GPU?
GPUs haven’t been a one-way, PC-to-monitor hardware unit for… a long time now.
They produce a video signal… but they can also process other things and “send back” the result.

I think it’s not a matter of “if”, but more of “when”.

Anyway, I get your points, guys.
Thank you for your exhaustive answers and your time.

I’m using your work almost every day. I’m totally thankful for what I already have!

2 Likes

Wanted to weigh in: can you reference the Alea DLL from a C# scripting component? I haven’t tried that, but I’ve used Cloo in the past for compiled plugins and it works great.

There is support for shader coding already, so one hacky workaround would be to call glReadPixels and get the results back to the host (CPU). But a proper compute shader approach would be ideal, as you have far fewer restrictions on the flexibility of the code; doing stuff in a fragment shader is complex, and the same goes for the vertex shader, as you are very restricted.

Related

I’ve also been experimenting with GPU computation in Grasshopper recently, developing a plugin using the ILGPU library, because it has the easiest syntax and integrates directly into C# without having to write any CUDA code. But I’ve encountered some limitations. (ManagedCUDA probably also works, but I personally haven’t tried it yet. There was, however, a research paper, tOpos: GPGPU Accelerated Structural Optimisation Utility for Architects, that discussed the use of the ManagedCUDA library in a Grasshopper context.)

The process involves copying data from the CPU to the GPU and back for each iteration. This copying adds overhead, which can make GPU processing slower than using the CPU for small datasets.

Another thing is that GPU kernels cannot handle dynamic memory allocation; this means the size of the output has to be fixed, so you cannot have unpredictably sized outputs.

Moreover, GPU kernels are limited to primitive data types such as int, float, double and byte, plus structs for organizing these types. This means types from the RhinoCommon library cannot be used directly, so you have to reimplement all of these data types and methods using only primitives and structs.
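As a rough illustration of what that refactoring looks like in practice, here is a minimal sketch following ILGPU’s documented kernel pattern (`Float3` and `MovePoints` are just illustrative names, and a real component would cache the context/accelerator rather than rebuilding them per call):

```csharp
using ILGPU;
using ILGPU.Runtime;

// A plain struct of primitives: RhinoCommon's Point3d/Vector3d cannot cross into the kernel.
public struct Float3 { public float X, Y, Z; }

public static class MovePoints
{
    // Kernel: one thread per point, writes P + V into a fixed-size output buffer.
    static void Kernel(Index1D i, ArrayView<Float3> p, ArrayView<Float3> v, ArrayView<Float3> result)
    {
        Float3 r;
        r.X = p[i].X + v[i].X;
        r.Y = p[i].Y + v[i].Y;
        r.Z = p[i].Z + v[i].Z;
        result[i] = r;
    }

    public static Float3[] Run(Float3[] pts, Float3[] vecs)
    {
        using var context = Context.CreateDefault();
        using var accelerator = context.GetPreferredDevice(preferCPU: false).CreateAccelerator(context);

        // Host -> device copies: this is the per-iteration overhead mentioned above.
        using var dPts = accelerator.Allocate1D(pts);
        using var dVecs = accelerator.Allocate1D(vecs);
        using var dOut = accelerator.Allocate1D<Float3>(pts.Length); // output size fixed up front

        var kernel = accelerator.LoadAutoGroupedStreamKernel<
            Index1D, ArrayView<Float3>, ArrayView<Float3>, ArrayView<Float3>>(Kernel);
        kernel((int)dOut.Length, dPts.View, dVecs.View, dOut.View);
        accelerator.Synchronize();

        // Device -> host copy of the result.
        return dOut.GetAsArray1D();
    }
}
```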

That being said, it is not impossible to implement; it’s just that it might not be suitable for every situation.

1 Like

You might want to look at vvvv with FUSE. It’s real-time visual coding for C# (vvvv) and FUSE does the same, but for the GPU. You can also just write your own or adapt existing shader code. ChatGPT incidentally is also pretty good at writing/editing shader code, since it is actually super simple code.

I have been using both GH and vvvv for many years, and Grasshopper is great for its access to all the advanced Rhino functions. If you want real-time speed, you will never get it with Grasshopper.

With vvvv and FUSE we are running simulations with well over 1 million particles rendered at 4K at >60fps. We even created our own import/export of .3dm as point clouds or instanced blocks.

I’m not saying it’s easy, but if you want to properly utilize the GPU I am not sure if starting with a very slow system (for good reasons) like Grasshopper is a good approach right now.

That said I wish that performance of GH would be much higher and could better utilize the immense power of modern CPUs and GPUs.

2 Likes

Wow, this is incredible! Can you use Rhino functions with vvvv? Or do you kinda have to import/export between Rhino and vvvv? Either way, this is a great tool for GPU simulation.

No, you cannot use Rhino functions. We are integrating Rhino’s open-source implementation of the .3dm file format, called openNURBS: GitHub - mcneel/opennurbs: OpenNURBS libraries allow anyone to read and write the 3DM file format without the need for Rhino.

You can find an implementation for VL here: GitHub - wolfmoritzcramer/VL.Rhino.3dm: VL Library to access Rhino *.3dm Files
We are also using our own implementation with some more features, but it’s not open source unfortunately.

It does have a few basic functions that you can use, like evaluating a curve or surface, reading and writing basically anything you can put in a .3dm file, but not actual geometric operations from Rhino.

2 Likes

This is extremely helpful! Thank you so much for providing this info.

Never say never.
Over time I’ve learnt to do complex tasks fully inside C# scripts, multithreaded on the CPU.
They are already real-time, or at least often enough, where I need it…
OK, a script is not exactly vanilla Grasshopper…

Grasshopper struggles to handle big lists; it seems the sheer number of items flowing through makes the UI heavier (or something like that, see here)…
I would expect this kind of problem to be handled differently in GH2…
… so, in my scripts, inputs and outputs are always simple “container” objects like a mesh, a point cloud, etc… not a big list.
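In a C# scripting component that idea is as simple as wrapping the heavy list in one container object before it reaches the output (a minimal sketch; `movedPoints` stands for whatever point list the script computed internally):

```csharp
using Rhino.Geometry;

// One PointCloud flows through the wire instead of 10^5 individual list items.
var cloud = new PointCloud(movedPoints); // movedPoints: IEnumerable<Point3d>
A = cloud;                               // single output item for downstream components
```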


Anyway, thanks guys for keeping this thread interesting!

I’ll definitely study those!

2 Likes

Threading on the CPU is still way less powerful than on the GPU when we’re talking about simulating millions of particles. Which is why vvvv is super interesting.

Although if you want to stick with Rhino/Grasshopper, consider using ILGPU. But be prepared to refactor a lot of your code into numeric primitives, and consider the RhinoCommon library unusable inside the kernel.

That is true, and vvvv is indeed a lot of fun and has lots of power, although it does have a bit of a learning curve at the beginning (there are tons of learning resources though). In the end vvvv itself is just .NET C#, but visual. So if you are at all familiar with C#, vvvv will be pretty straightforward. The bonus is that you can basically use any .NET NuGet package out there to extend it.

All the GPU power is in FUSE or custom shaders.

But there are of course a lot of constraints that come with GPU processing, mostly down to the fact that you can have a lot of particles, for example, but they are all pretty much “on their own”. A lot of interesting things happen when looking at a group of particles or all of them together, which is either non-trivial or often not possible at all on the GPU.

The combination of CPU and GPU is really powerful though.

Absolutely, it really depends and I have also done some things that approach real-time in GH. I meant more in the way they are designed, where vvvv is basically always running and uses a Game Engine to be able to render things in real-time, while Grasshopper has more of a run-once approach and rendering pipeline.

2 Likes

Yeah, this is true. But developing a custom node is straightforward and extremely easy if you know any C# (or just use ChatGPT); there’s not much boilerplate to write. You get to use Visual Studio and NuGet, and you can update nodes in real time.

Ah, the typical parallelization issue. This issue pretty much exists for the GPU regardless of what framework you’re using. It is very situation-dependent. The GPU cannot handle sequential tasks that depend on a previous iteration’s values, or that access values calculated on other threads.
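To make the distinction concrete, here is a minimal sketch of how iterative grid simulations usually sidestep this (plain C# for readability; on the GPU the inner loops become the kernel body while the outer loop stays on the CPU; all names here are illustrative):

```csharp
// Double buffering: within one iteration every cell reads only the *previous* grid
// and writes only the *next* one, so all cells are independent and can run in parallel.
void DiffuseStep(float[] prev, float[] next, int w, int h, float rate)
{
    for (int y = 1; y < h - 1; y++)
        for (int x = 1; x < w - 1; x++)
        {
            int i = y * w + x;
            float laplacian = prev[i - 1] + prev[i + 1] + prev[i - w] + prev[i + w] - 4f * prev[i];
            next[i] = prev[i] + rate * laplacian;
        }
}

// The iterations themselves remain sequential; the buffers are swapped each step.
for (int t = 0; t < steps; t++)
{
    DiffuseStep(gridA, gridB, width, height, 0.2f);
    (gridA, gridB) = (gridB, gridA);
}
```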

Are you sure?
I often see examples of reaction-diffusion algorithms that use exactly this kind of multithreading on the GPU to solve every cell simultaneously, and… it IS an iterative program…


… Like this one…

:upside_down_face:?