Hi everyone,
I’m returning to this discussion on GPU acceleration in Grasshopper after a delay, because I’ve been designing a data structure that mimics Grasshopper’s data trees but is optimized for GPU devices. I call it “Segmentized Vectors”: a flat array of unmanaged data accompanied by a set of integer arrays that mark the start of each sub-array (segment). This allows operations on vectors that behave like Grasshopper’s data-tree operations, including Grafting, Flattening, and Cross Referencing, while keeping the data in a GPU-friendly layout.
The plugin is nearly complete and, although still in its early stages, it demonstrates that high performance can be achieved without writing any code. Despite the overhead of deploying a computation graph on the GPU, it often outperforms C# code, especially for highly parallelizable tasks.
This specific case involves performing a reduction sum (Mass Addition) on a segmented vector. The task is more involved than it sounds, because it requires block-wise scan operations that need multiple kernel launches. My laptop GPU is limited to 256 work items per work group, so summing 10 million numbers takes at least three kernel launches: each launch shrinks the number of partial sums by a factor of 256, which reflects the hierarchical nature of the computation.
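To make the layout and the launch count concrete, here is a minimal CPU-side sketch in Python. It is not the plugin's code: the class name `SegmentedVector`, the single `starts` array (the real structure uses a set of integer arrays), and the helpers `reduction_passes` and `blockwise_sum` are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class SegmentedVector:
    """Hypothetical layout: one flat value buffer plus the start index of each segment."""
    values: list  # all items, stored contiguously
    starts: list  # starts[i] is the index in `values` where segment i begins

    def bounds(self):
        # Pair each start with the next start (or the buffer end) to get segment ranges.
        ends = self.starts[1:] + [len(self.values)]
        return list(zip(self.starts, ends))

    def flatten(self):
        # Flatten: the data stays where it is, only the segment table collapses to one entry.
        return SegmentedVector(self.values, [0])

    def graft(self):
        # Graft: the data stays where it is, every item becomes its own segment.
        return SegmentedVector(self.values, list(range(len(self.values))))

    def segment_sums(self):
        # Per-segment sum (sequential stand-in for the GPU reduction).
        return [sum(self.values[a:b]) for a, b in self.bounds()]


def reduction_passes(n, group_size=256):
    """Count launches when each launch leaves one partial sum per group of `group_size`."""
    passes = 0
    while n > 1:
        n = (n + group_size - 1) // group_size  # integer ceiling division
        passes += 1
    return passes


def blockwise_sum(data, group_size=256):
    """Emulate the multi-pass reduction: each pass sums blocks of `group_size` items."""
    if not data:
        return 0
    while len(data) > 1:
        data = [sum(data[i:i + group_size]) for i in range(0, len(data), group_size)]
    return data[0]


sv = SegmentedVector([1, 2, 3, 4, 5, 6], [0, 2, 5])
print(sv.segment_sums())             # [3, 12, 6]
print(sv.flatten().segment_sums())   # [21]
print(reduction_passes(10_000_000))  # 3  (10,000,000 -> 39,063 -> 153 -> 1)
```

Note that grafting and flattening only rewrite the small index array, never the value buffer, which is the property that makes this kind of layout attractive on a GPU.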
The Problem
@Czaja mentioned calculations taking around 36 seconds for a relatively small mesh of 14k vertices. On denser meshes, the calculation time could extend to minutes or even hours. The key bottleneck, as highlighted by Riccardo and others, is not the computational cost but the overhead of Grasshopper’s type casting and inefficient data handling. Encapsulating all operations within a single C# component significantly reduces that overhead, including the data conversion and graph propagation between components. However, this approach gives up the fundamental advantage of a node-based, zero-code programming interface, which caters to users who either prefer not to code or lack the knowledge to do so.
The GPU Solution
This is where GPU computing shines. Over the past few months, I’ve developed a set of GPU-powered components specifically tailored for Grasshopper workflows. By leveraging the computational power of GPUs, we effectively offset the inherent overhead of using a node-based interface, ensuring both performance gains and user accessibility without sacrificing the intuitive, visual programming experience. Below is a brief summary of my findings:
- Graft, Match, Flatten:
  - This approach mimics the original solution proposed by Jakub but is not optimized for GPU usage, because it performs unnecessary data manipulations across separate components. While it is 30x faster than the original Grasshopper solution, it still falls short of the performance achieved by a C# script.
  - The main bottleneck lies in the Match component, which takes approximately 800 ms, nearly half of the computation time. This component essentially replicates the data dispatching process that Grasshopper performs behind the scenes.
- Using Cross-reference in Binary Map:
  - This method eliminates the need for the Match component by performing the cross-reference operation directly within the Distance component.
  - As a result, it runs twice as fast as the C# code and approximately 60 times faster than the original Grasshopper implementation.
- GPU-Specialized Component:
  - Whenever possible, it is advantageous to use GPU-optimized functions, such as inverse distance, which combine multiple operations into one.
  - Replacing two components with a single optimized one significantly reduces memory latency, making the script 3 times faster than the C# code and 90 times faster than Grasshopper.
- Using Floats When Possible:
  - Although most modern GPUs support double-precision (64-bit) operations, 32-bit floats are generally faster and more efficient.
  - On my machine, switching to 32-bit floats resulted in a 2x speedup, making the script run 6 times faster than the C# code and 180 times faster than Grasshopper.
Takeaways
- Grasshopper’s Overhead: As @maje90 and @TomTom mentioned, Grasshopper’s data management limits its ability to handle large datasets efficiently. Consolidating operations into scripts or plugins is a must for performance-critical tasks.
- GPU Benefits: While GPUs involve additional complexity, the performance gains can be immense for highly parallelizable tasks. The key is to minimize data transfer and fully utilize the GPU’s parallel processing capabilities.
- Grasshopper as a UI: In workflows requiring dense datasets, Grasshopper can serve as a high-level interface, with heavy lifting done in scripts or external components (CPU or GPU).
Closing Thoughts
I hope this adds to the already rich discussion here. GPU acceleration might not always be the answer, but when combined with thoughtful data management, it opens up new possibilities for handling dense meshes and large datasets within Grasshopper. If anyone is interested, I’d be happy to share more about the implementation details or collaborate on further optimizations.
Looking forward to hearing your thoughts!
Best regards, Ali Torabi