GHPython Parallel Component Slower than Native Grasshopper Module

Hi Everyone!

I’m trying to write a parallel GHPython component to solve a computationally heavy MeshRay operation, but I keep getting slower results with it than with the native Grasshopper MeshRay component.

57 ms for the Grasshopper module, 2.4 seconds for the Python component.

My computer has 24 cores! It must be something in the way I’ve written the function. I’ve attached the file that I’m testing with. Can someone please run it and advise on why this might be happening? I’m at a loss…

ghpython_MT_JS.gh (20.9 KB)

Hi,
I’m experiencing something really similar. When I run MeshRay directly in Grasshopper, it takes approximately 50 ms for 2,500 repetitions. But when I run it via ghpythonlib.components (MeshXRay) in the Rhino Python Editor, it takes 120 ms for a single repetition!

Does anybody know where the problem could be?

Tomas
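Tomas’s 120 ms per call is what you would expect if the wrapper pays a fixed setup cost on every call: ghpythonlib reportedly rebuilds component machinery each time it is invoked. A deterministic, pure-Python sketch of that effect — the `Component` class below is a hypothetical stand-in for that machinery, not a real Grasshopper type:

```python
# Pure-Python sketch (nothing Rhino-specific) of per-call setup overhead.
# "Component" is a hypothetical stand-in for the machinery the ghpythonlib
# wrapper reportedly rebuilds on every call.

class Component(object):
    instances = 0  # counts how many times the "component" is constructed

    def __init__(self):
        Component.instances += 1

    def solve(self, item):
        return item * 2  # stand-in for the actual geometric work


def call_per_item(items):
    """One wrapper call per item: setup cost paid len(items) times."""
    return [Component().solve(x) for x in items]


def call_batched(items):
    """One wrapper call for the whole list: setup cost paid once."""
    comp = Component()
    return [comp.solve(x) for x in items]


items = list(range(2500))

Component.instances = 0
per_item_result = call_per_item(items)
per_item_setups = Component.instances   # one construction per item

Component.instances = 0
batched_result = call_batched(items)
batched_setups = Component.instances    # a single construction
```

If the real wrapper behaves like this, batching inputs into one call (where the component accepts lists) amortizes the setup cost across the whole workload.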

While you’re waiting for an answer here, have a read of what was said on the GH forum about parallel Python components being slower.

Thanks Danny,
but I’m experiencing this even without using the parallel component. The problem must be somewhere in MeshXRay, I guess.

Tomas

Tomas,

I’ve been over it many times since I initially posted, and can’t seem to find any reason why it performs so poorly compared to the native Grasshopper component. I’ve also tried with the Occlusion function with similar results.

It seemed to me as though MeshXRay would be a perfect candidate for parallelization…

Does anyone have any insight? It would be very much appreciated!

James

Hello everyone,

occasionally somebody posts this kind of result, and I’ve noticed that very often it has to do with the acceleration (speed-up) structures used to make point lookups faster.
For example, Gwyll, in this discussion, was experiencing exactly this.

I will have a look at this case now - and see if I discover something.
Thanks,

Giulio

Giulio Piacentino
for Robert McNeel & Associates
giulio@mcneel.com

Here is what I think I found:

  • the larger part of this performance degradation is due to the cost of mimicking Grasshopper functionality, presumably from creating lots and lots of component instances.
  • if we use RhinoCommon for these operations, we start saving a lot of time. We still do not get down to the same running time as the “native” version, at least not unless we have really a lot of meshes and really a lot of cores to use.
  • the extra time is spent making the data structures GH-specific (creating a GH_DataTree or GH_Structure) after parallel execution.
  • I think that, by fiddling with this some more, we could reach timings similar to the “native” version. We would probably have to use GH_Structure to keep Grasshopper from making copies for us, or use other tricks along those lines.
  • so far, the cost of parallelizing this operation seems to outweigh the benefits.
  • we might want to try constructing the DataTree from several threads, but I do not think that structure is thread-safe.
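The last two bullets point at the usual workaround when the output container is not thread-safe: have each worker write only into its own preallocated slot of a plain list, and build the GH-specific structure once, single-threaded, at the end. A runnable pure-Python sketch of that pattern — `fake_mesh_ray` stands in for `Rhino.Geometry.Intersect.Intersection.MeshRay`, and the dict at the end stands in for a `DataTree`:

```python
# Each thread writes only its own disjoint slice of a preallocated list,
# so no shared (non-thread-safe) structure is touched during the
# parallel phase. The tree-like output is assembled once at the end.
import threading

def fake_mesh_ray(ray):
    return ray * 0.5  # stand-in: real code would return the ray parameter t

rays = list(range(8))
results = [None] * len(rays)       # preallocated: disjoint writes are safe

def worker(start, stop):
    for i in range(start, stop):
        results[i] = fake_mesh_ray(rays[i])

threads = []
chunk = len(rays) // 4
for t in range(4):
    start = t * chunk
    stop = len(rays) if t == 3 else start + chunk
    th = threading.Thread(target=worker, args=(start, stop))
    th.start()
    threads.append(th)
for th in threads:
    th.join()

# Only now build the (non-thread-safe) output structure, single-threaded:
tree_like = {(0, i): [v] for i, v in enumerate(results)}
```

The single-threaded assembly at the end is exactly the “making data structures look GH-specific” cost described above; the sketch only shows how to keep the unsafe part out of the parallel section.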

So, all in all, this has been fun so far. I hope you can make something of this and, with some more tweaks, make it even faster 🙂 Given how things are set up in Grasshopper, though, the best environment to make this really, really fast would in my opinion be a compiled C# component.

It would be best to test for bottlenecks on a system with truly many cores and with really many meshes (meshes > 2 × the number of cores, of course).

Giulio

Giulio Piacentino
for Robert McNeel & Associates
giulio@mcneel.com

ghpython_2.gh (17.8 KB)

It is true that data trees are not thread-safe, and they still aren’t in GH2 (the DataTree class is actually the first one I redesigned).
I’m not sure what it would take to make them fully thread-safe, or how those changes would impact regular performance.

However, in the case of threaded Python component calls, I can think of three other things that probably cause much more delay than making shallow copies of some data structures. A lesson I learned the hard way, though, is that performance bottlenecks are rarely where you expect them to be, and this ought to be properly profiled before we start typing up optimizations.


David Rutten
david@mcneel.com

Ah… Rutten’s rule #2: profile first, optimize second.
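A minimal, self-contained illustration of that rule with Python’s built-in cProfile. The two functions are hypothetical stand-ins for stages of a script like the one discussed here, not the actual GH code:

```python
# "Profile first, optimize second": run the workload under cProfile and
# read per-function call counts and cumulative times from the report.
import cProfile
import io
import pstats


def build_results(n):
    """Stand-in for the computation stage (e.g. the ray intersections)."""
    return [i * 0.5 for i in range(n)]


def copy_results(data):
    """Stand-in for the output-structuring stage under suspicion."""
    return list(data)


def run():
    data = build_results(10000)
    for _ in range(50):
        copy_results(data)


buf = io.StringIO()
profiler = cProfile.Profile()
profiler.enable()
run()
profiler.disable()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats()
report = buf.getvalue()  # shows which stage actually dominates
```

Sorting by cumulative time makes it immediately visible whether the computation or the copying stage is the real bottleneck, which is exactly the question in this thread.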

Yeah, I’ve profiled, as always. The findings above come from that.

Rebuilding the DataTree takes about half the runtime of the whole script on my machine. Is there something like a SetBranch() on DataTree that would not copy all the data? AddRange is the only real method called in that part, and removing one of the two AddRange calls gives a quarter of the performance back. So I’m pretty sure that’s where our bad guy is… in this case.
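In pure-Python terms, the difference being asked about looks like this: an AddRange-style call copies every element into the tree’s internal branch list, while the wished-for SetBranch-style call would just adopt a reference to the caller’s list. Neither class below is a real Grasshopper API; both are illustrative stand-ins (SetBranch in particular is hypothetical, per the question above):

```python
# Two toy "trees": one copies elements branch by branch (AddRange-like),
# one adopts the caller's list by reference (hypothetical SetBranch-like).

class CopyingTree(object):
    """Mimics DataTree.AddRange: elements are copied into a new list."""
    def __init__(self):
        self.branches = {}

    def add_range(self, path, items):
        self.branches.setdefault(path, []).extend(items)  # per-element copy


class ReferencingTree(object):
    """Mimics a hypothetical no-copy SetBranch: the list is adopted as-is."""
    def __init__(self):
        self.branches = {}

    def set_branch(self, path, items):
        self.branches[path] = items  # no per-element work at all


data = [float(i) for i in range(2500)]

copying = CopyingTree()
copying.add_range((0,), data)

referencing = ReferencingTree()
referencing.set_branch((0,), data)

copied_is_new_list = copying.branches[(0,)] is not data      # True
adopted_is_same_list = referencing.branches[(0,)] is data    # True
```

With 2,500 results per branch, the per-element copy is plausible as the “half the runtime” cost measured above; adopting by reference would make it O(1) per branch.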

Thanks,


Giulio Piacentino
for Robert McNeel & Associates
giulio@mcneel.com

There are probably about five things that currently happen when using a GH component from Python that don’t actually need to happen. Continually constructing components anew, and continually creating new documents for them to reside in only to destroy those instances a moment later, would be one thing that can be tuned up. Copying data trees is also unnecessary (especially if a deep copy is involved anywhere; I don’t know whether it is).

Since we’re planning to move the ability to access GH components as method calls into the core for GH2, so that they can be used from C# and VB as well, I assume this can be properly designed the second time around. Also, if GH2 components support multi-threading by themselves, then there is no longer a reason to have a multi-threading option on the caller side.


David Rutten
david@mcneel.com

[quote=“DavidRutten, post:11, topic:6360”]
if GH2 components support multi-threading by themselves then there is no longer a reason to have a multi-threading option
[/quote]

Yeah, definitely. I was just trying to answer this contingent question. The sample I submitted does not involve ghpythonlib (so no component creation) and does not deep-copy. However, Point3d is a struct, so it will be copied anyway, unless we find a way to set the whole list/array at once, or to create it from the multiple threads.
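One way to “set the whole list at once”, sketched in pure Python: preallocate a flat buffer of doubles and let each thread fill its own disjoint range in place, so nothing is copied per point at the join step. The coordinate values below are invented for the demo; real code would store each intersection’s hit point (the Point3d) there instead:

```python
# Preallocate one flat array of doubles (x, y, z triples) and have two
# threads fill disjoint index ranges in place: no per-point hand-off copy.
import threading
from array import array

n_points = 6
coords = array("d", [0.0] * (3 * n_points))  # one allocation up front

def worker(indices):
    for i in indices:
        # Stand-in values; real code would write the hit point's
        # coordinates returned by the intersection here.
        coords[3 * i] = float(i)          # x
        coords[3 * i + 1] = float(i) * 2  # y
        coords[3 * i + 2] = 0.0           # z

threads = [
    threading.Thread(target=worker, args=(range(0, 3),)),
    threading.Thread(target=worker, args=(range(3, 6),)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Re-read the buffer as point tuples, single-threaded, after the join.
points = [(coords[3 * i], coords[3 * i + 1], coords[3 * i + 2])
          for i in range(n_points)]
```

Because the buffer is flat and each thread owns a disjoint range, there is no contention and no intermediate per-thread list that later has to be copied into the final structure.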

I’ve got some prototype code working, with a current focus on calling MeshXRay as a function from Python without all of the overhead that occurred when we first put this together, David. I’m getting performance back down to numbers similar to direct usage of the Mesh|Ray component.

Even though I’m focusing on the MeshRay “function”, the code can be expanded to cover most components out there.

I’ll try to remember to show this to you when we get together in London.

Hi guys,
I’m just curious: has there been any progress on the MeshXRay function and its performance?

Thanks,
Tomas