Ghpython - Parallel Component Slower than Native Ghopper Module

jamesrsherman · February 27, 2014, 8:57am

Hi Everyone!

I’m trying to write a parallel Ghpython component that could solve a computationally heavy MeshRay function, but I keep getting slower results with it than with the native Grasshopper MeshRay module.

57ms for the Grasshopper module, 2.4 seconds for the Python component.

My computer has 24 cores! It must be something in the way I’ve written the function. I’ve attached the file that I’m testing with. Can someone please run it and advise on why this might be happening? I’m at a loss…

ghpython_MT_JS.gh (20.9 KB)

frenezulo · March 24, 2014, 10:52am

Hi,
I’m experiencing something really similar. When I run MeshRay directly in Grasshopper, it takes approximately 50 ms for 2500 repetitions. But when I run it using ghpythonlib components (MeshXRay) in the Rhino Python Editor, it takes 120 ms for 1 repetition!

Does anybody know where the problem could be?

Tomas

dannyboyesiom · March 24, 2014, 11:05am

While you’re waiting for an answer here, have a read of what was said on the GH Forum about Parallel Python Components being slower.

frenezulo · March 24, 2014, 11:18am

Thanks Danny,
but I’m experiencing this even without using the parallel component. The problem should be somewhere in MeshXRay, I guess.

Tomas

jamesrsherman · March 24, 2014, 12:17pm

Tomas,

I’ve been over it many times since I initially posted, and can’t seem to find any reason why it performs so poorly compared to the native Grasshopper component. I’ve also tried with the Occlusion function with similar results.

It seemed to me as though MeshXRay would be a perfect candidate for parallelization…

Does anyone have any insight? It would be very much appreciated!

James

piac · March 25, 2014, 8:49am

Hello everyone,

Occasionally somebody posts this kind of result, and I’ve noticed that very often it has to do with the speed-up structures that are used to make finding points faster.
For example, Gwyll, in this discussion, was experiencing exactly this.

I will have a look at this case now - and see if I discover something.
Thanks,

Giulio

Giulio Piacentino
for Robert McNeel & Associates
giulio@mcneel.com

piac · March 25, 2014, 2:00pm

Here is what I think I found:

the larger part of this performance degradation is due to the cost of mimicking Grasshopper functionality, I presume due to creating lots and lots of components.
if we use RhinoCommon for these operations, we start saving lots of time. We still do not get down to the same running time as the “native” version, at least not unless we have really a lot of meshes and really a lot of cores to use.
the extra time is due to making data structures look GH-specific (creating GH_DataTree, or GH_Structure) after parallel execution.
I think that, by fiddling more with this, we could get to timings similar to the “native” version. Probably, we would have to use GH_Structure to avoid having Grasshopper make copies for us, or use some other tricks similar to this.
the cost of parallelizing this operation seems to outweigh the benefits so far.
we might want to try to construct the DataTree from several threads, but I do not think this structure is thread-safe.
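
As a rough illustration of the last few points, here is a minimal sketch in plain Python 3 (not GhPython or RhinoCommon). It hands each worker one large chunk instead of one tiny task per ray, lets each worker build its own private list, and merges the results once on the calling thread, so no shared output structure is touched from several threads. The names `cast_rays`, `chunked` and `parallel_map_chunked` are hypothetical, and the squaring is only a stand-in for the real intersection work.

```python
from concurrent.futures import ThreadPoolExecutor


def cast_rays(chunk):
    # Hypothetical stand-in for the per-ray intersection work.
    # Each worker builds its own private result list.
    return [x * x for x in chunk]


def chunked(items, n_chunks):
    # Split items into at most n_chunks contiguous slices.
    size = max(1, -(-len(items) // n_chunks))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def parallel_map_chunked(items, workers=4):
    chunks = chunked(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as the input chunks.
        partial = list(pool.map(cast_rays, chunks))
    # Merge once, on the calling thread, after all workers have finished.
    merged = []
    for part in partial:
        merged.extend(part)
    return merged


rays = list(range(10))
result = parallel_map_chunked(rays, workers=3)
print(result)
```

This only shows the structure. In standard CPython the global interpreter lock prevents threads from speeding up CPU-bound pure-Python work, so any real gain depends on the runtime and on how heavy each chunk is compared with the cost of dispatching and merging.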

So, all in all, this has been fun so far. I hope you guys can make something out of this, and with some more tweaks, make it even faster. The best environment to make this really, really fast would, in my opinion, be a compiled C# component, though, given how things are set up in Grasshopper.

It would be best to test for bottlenecks on a system with truly many cores, and with really many meshes (and meshes > 2*number of cores, of course).

Giulio

Giulio Piacentino
for Robert McNeel & Associates
giulio@mcneel.com

ghpython_2.gh (17.8 KB)

DavidRutten · March 27, 2014, 10:11pm

It is true that datatrees are not threadsafe, and they still aren’t threadsafe in GH2 (it’s actually the first class I’ve redesigned).
I’m not sure what it would take to make them fully threadsafe, and how those changes would impact regular performance.

However, in the case of threaded Python component calls, I can think of 3 other things that probably cause much more delay than making shallow copies of some data structures. That said, a lesson I learned the hard way is that performance bottlenecks are rarely where you expect them to be, so this ought to be properly profiled before we start typing up optimizations.

–
David Rutten
david@mcneel.com

dannyboyesiom · March 27, 2014, 10:23pm

Ah… Rutten’s rule #2 - profile first, optimize second
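
In that spirit, here is a minimal profiling sketch in plain Python 3 using the standard-library `cProfile` and `pstats` modules. The functions `intersect`, `rebuild_output` and `solve` are hypothetical stand-ins for the computation and the output-building stages of a script. Whether `cProfile` is available inside the GhPython editor is not established in this thread, so treat this as a pattern for standalone Python.

```python
import cProfile
import io
import pstats


def intersect(n):
    # Hypothetical stand-in for the actual ray/mesh computation.
    return sum(i * i for i in range(n))


def rebuild_output(n):
    # Hypothetical stand-in for repackaging results into an output structure.
    out = []
    for i in range(n):
        out.append([i])
    return out


def solve():
    intersect(20000)
    return rebuild_output(20000)


profiler = cProfile.Profile()
profiler.enable()
solve()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Reading the cumulative column shows which stage dominates before any of them is rewritten.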

piac · March 28, 2014, 9:10am

Yeah, I’ve profiled, like always. The findings above come from that.

Rebuilding the DataTree class takes about half the time of the whole script on my machine. Is there something like SetBranch() on DataTree that would not copy the whole data? AddRange is the only real method that is called in that part. Also, removing one of the two AddRange calls gives 1/4 of the performance back. So, I’m pretty sure that there we have the bad guy… in this case.

Thanks,

–
Giulio Piacentino
for Robert McNeel & Associates
giulio@mcneel.com

DavidRutten · March 28, 2014, 2:37pm

There are probably about 5 things that currently happen when using a GH component from Python that don’t actually need to happen. Continually constructing components anew, and continually creating new documents for them to reside in only to destroy those instances a moment later, would be one thing that can be tuned up. Copying data trees is also unnecessary (especially if a deep copy is involved anywhere; I don’t know if it is).
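
The first of those points, constructing an object anew on every call, can be sketched in plain Python 3 as follows. `FakeComponent`, `call_naive` and `call_cached` are hypothetical names, and nothing here reflects how ghpythonlib or Grasshopper is actually implemented. The sketch only counts how many times the stand-in object gets constructed with and without reuse.

```python
class FakeComponent(object):
    """Hypothetical stand-in for an object that is costly to construct."""

    constructed = 0  # class-level counter of constructions

    def __init__(self, name):
        FakeComponent.constructed += 1
        self.name = name

    def solve(self, value):
        return value * 2


def call_naive(name, value):
    # Builds a fresh instance for every single call.
    return FakeComponent(name).solve(value)


_cache = {}


def call_cached(name, value):
    # Reuses one instance per name across calls.
    component = _cache.get(name)
    if component is None:
        component = _cache[name] = FakeComponent(name)
    return component.solve(value)


naive = [call_naive("MeshRay", v) for v in range(100)]
naive_constructions = FakeComponent.constructed

FakeComponent.constructed = 0
cached = [call_cached("MeshRay", v) for v in range(100)]
cached_constructions = FakeComponent.constructed

print(naive_constructions, cached_constructions)
```

The plain dictionary cache is not protected by a lock, so a version called from several threads would need one, or one cached instance per thread.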

Since we’re planning to move the ability to access GH components as method calls into the core for GH2 so that they can be used from C# and VB as well, I assume this can be properly designed the second time around. Also, if GH2 components support multi-threading by themselves then there is no longer a reason to have a multi-threading option on the caller side.

–
David Rutten
david@mcneel.com

piac · March 28, 2014, 5:41pm

[quote=“DavidRutten, post:11, topic:6360”]
if GH2 components support multi-threading by themselves then there is no longer a reason to have a multi-threading option
[/quote]

Yeah, definitely. I was just trying to give an answer to this contingent question. The sample I submitted does not involve ghpythonlib (so no component creation) and does not deep copy. However, Point3d is a struct, so it will be copied anyway, unless we find a way to set the whole list/array at once, or to create it from multiple threads.

stevebaer · March 31, 2014, 11:41pm

I’ve got some prototype code working, currently focused specifically on calling MeshXRay as a function from Python without all of the overhead that occurred when we first put this together, David. I’m getting performance back down to numbers similar to direct usage of the Mesh|Ray component.

Even though I’m focusing on the MeshRay “function”, the code can be expanded to encompass most components out there.

I’ll try to remember to show this to you when we get together in London.

frenezulo · June 10, 2014, 8:39am

Hi guys,
I’m just curious if there is any progress with the MeshXRay function and its performance?

Thanks,
Tomas
