ghpythonlib.parallel weird behavior (first thread is not run in parallel)

Hey everyone,

When working with the ghpythonlib.parallel component, I have found some weird behaviour. For me, the component does not run any threads in parallel with the first thread. From the second thread on, it parallelizes perfectly across my CPU cores.

In the example below, you can see that no thread is started in parallel with Thread 1, but after Thread 1 has finished, there are always 5 threads running in parallel.

Any help would be appreciated! Thank you!

I get the same printout (with minor changes each time after line 3). When I monitor the CPU usage on a more computationally intensive task, I get the impression that my 4 cores reach full capacity at the same time, but there is a lag between starting the script and all 4 going full speed, so that could well be the time it takes for the first one to finish.

“knock knock”
“race condition”

                           "who's there?"

Hi Graham,

Thank you for your answer! It is nice that you can confirm these results. I guess the minor changes after line 3 are exactly what we should expect; they even appear on my side if I recompute the script a few times. To be honest, I am not sure I correctly understand your example. It would be awesome if you could explain it a bit!

The component I am trying to parallelize right now takes 5.2 s and computes results for 9 branches. The result looks something like this:

Started Calculation on Branch 0
Finished Calculation on Branch 0
Started Calculation on Branch 1
Started Calculation on Branch 3
.......

Basically, these are in line with the results from the first example. Here is further evidence that the first branch isn’t being computed in parallel. Computing one branch in my example takes ~700 ms. Calculating all nine branches currently takes 5.2 seconds; since nine branches at ~700 ms each would take about 6.3 seconds serially, I got some improvement through parallelizing. However, if I add these lines of code to my calculation function:

if branchNumber == 0:
    print "Cancelled Thread 0!"
    return

My calculation becomes exactly 700 ms faster (exactly the time it takes to calculate the first branch)! To my understanding, this means that nothing is computed in parallel with the first branch.

However, I am unsure whether this is due to my incorrect usage of the component, a bug in the integration of the package, behavior that is simply to be expected (for whatever reason), or whether I am misinterpreting the results.

Thank you for any further advice on this!

That’s a nice workaround but not very satisfying in terms of understanding what’s going on.
The script below is a little old, but if it still works, you could check whether it shows the same behaviour.

https://www.google.fr/amp/s/stevebaer.wordpress.com/2013/12/11/ghpython-node-in-code/amp/

Hmm, maybe that’s not helpful, as it seems to run a very large number of threads, so the time for the first one is insignificant.

Hey,

I have already read Steve Baer’s blog post as well. I will try out the example file, though, to spot potential differences. Thank you!

Just an update from my side in case I get lucky and somebody finds this. I tested this in RH6 and the behavior is exactly the same.

Also, in practice you can sort of “fix” the impact of this bug(?) by including a dummy entry in your list and then skipping over it in your calculations. Say you have 10 entries that need to be calculated in your function calculateSolution(entry), and each one takes a minute to calculate. Then add a dummy to the front of your list:
entries.insert(0, "dummy")
Then, in your calculateSolution(entry), just return immediately so you waste no time on the first entry:
if entry == 'dummy': return
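A minimal sketch of this dummy-entry workaround, written in plain Python 3 for illustration. Because ghpythonlib is only available inside Rhino/Grasshopper, the hypothetical `run_serial_first` below stands in for it and simply assumes the behaviour described in this thread (item 0 runs alone, the rest in parallel). The squaring inside `calculateSolution` is a placeholder for the real, expensive calculation.

```python
from concurrent.futures import ThreadPoolExecutor

DUMMY = "dummy"  # sentinel that occupies the serially executed first slot


def run_serial_first(func, items):
    """Hypothetical stand-in for ghpythonlib.parallel: item 0 alone, the rest in a thread pool."""
    first = func(items[0])
    with ThreadPoolExecutor() as pool:
        rest = list(pool.map(func, items[1:]))  # map() keeps input order
    return [first] + rest


def calculateSolution(entry):
    if entry == DUMMY:
        return None  # waste no time on the dummy
    return entry ** 2  # placeholder for the real, slow calculation


entries = [1, 2, 3, 4]
entries.insert(0, DUMMY)
results = run_serial_first(calculateSolution, entries)[1:]  # drop the dummy's result
print(results)  # [1, 4, 9, 16]
```

One design consideration: if your function relies on one-time set-up that the serial first call is meant to perform, you could do that set-up in the dummy branch instead of returning straight away.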

So that can potentially save you a minute. Anyway, this does not seem like a proper solution, and I am still looking for help on this!

David

I too see this behavior. I don’t care because I have a very large number of branches, but I do see the first one start and end before any other is started.

Thank you for confirming this behavior. Let’s see if anyone knows a solution or if it is going to be fixed.

For everyone who might come across the same issue some time:

I just looked at how ghpythonlib.parallel is implemented and found out that this component was actually built on purpose to work exactly as I described, i.e. the first element of the list is computed serially (the first element runs exclusively; all other elements run truly in parallel).

In the code this is justified with the following comment:

# Run first piece serial in case there is "set up" code in the function
# that needs to be done once. All other iterations are done parallel
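To illustrate why running the first piece serially can help, here is a sketch in plain Python 3 (not the actual ghpythonlib code) of a worker that lazily builds a shared lookup table on its first call. The names `expensive_setup`, `work` and `run_serial_first` are hypothetical. Because item 0 runs alone, the unguarded `"table" not in _cache` check cannot be raced by other threads, so the set-up runs exactly once.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_cache = {}        # shared state filled by the first call
setup_calls = []   # records every time the set-up actually runs


def expensive_setup():
    setup_calls.append(threading.get_ident())
    return {"factor": 10}


def work(x):
    # Lazy one-time set-up; this check is NOT thread-safe on its own.
    if "table" not in _cache:
        _cache["table"] = expensive_setup()
    return x * _cache["table"]["factor"]


def run_serial_first(func, items, max_workers=4):
    if not items:
        return []
    first = func(items[0])  # runs alone, so the set-up happens exactly once
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        rest = list(pool.map(func, items[1:]))  # later calls reuse the cached table
    return [first] + rest


results = run_serial_first(work, list(range(9)))
print(results)           # [0, 10, 20, 30, 40, 50, 60, 70, 80]
print(len(setup_calls))  # 1
```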

I think for my use cases this should not be an issue, and I will probably just take the code of the ghpythonlib.parallel component as a model to build my own component.

Potentially, the GH team could add a fourth boolean parameter so the user can decide whether the component should behave like this or not. On the other hand, this would of course add extra complexity to the component, and surely the intention here was to reduce the complexity as much as possible.
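A minimal sketch of such a home-made runner, assuming the serial-first behaviour quoted above; the `serial_first` flag plays the role of the suggested extra boolean parameter, and `parallel_run`/`slow_square` are hypothetical names. It uses Python 3’s `concurrent.futures`, which is not available in GhPython’s IronPython 2.7, so an actual Grasshopper component would need a different threading mechanism (for example .NET’s `System.Threading.Tasks`).

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def parallel_run(func, items, serial_first=True, max_workers=None):
    """Hypothetical runner; serial_first=True runs item 0 alone before the rest."""
    items = list(items)
    if not items:
        return []
    results = [None] * len(items)
    start = 0
    if serial_first:
        results[0] = func(items[0])
        start = 1
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so indices line up
        for i, value in zip(range(start, len(items)), pool.map(func, items[start:])):
            results[i] = value
    return results


log = []
lock = threading.Lock()


def slow_square(x):
    with lock:
        log.append(("start", x))
    time.sleep(0.05)  # simulate a long calculation
    with lock:
        log.append(("end", x))
    return x * x


squares = parallel_run(slow_square, range(5), serial_first=True)
first_events = log[:2]
print(squares)       # [0, 1, 4, 9, 16]
print(first_events)  # [('start', 0), ('end', 0)]

log.clear()
unordered = parallel_run(slow_square, range(5), serial_first=False)
print(unordered)     # [0, 1, 4, 9, 16]
```

With `serial_first=True` the log always begins with item 0 starting and ending before anything else starts; with `serial_first=False` item 0 overlaps the others, and the order of log entries varies from run to run.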

Also, if the list is really long and filled with elements that can be solved quickly, this really doesn’t matter. However, in a case like mine, with a short list of long boolean difference operations, this can sometimes double the time needed for the calculation.
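A rough, idealized estimate of this cost, assuming every item takes the same time `t`, there is no threading overhead, and the pool keeps `c` cores busy: running the first item alone takes about `t + ceil((n - 1) / c) * t`, versus `ceil(n / c) * t` when everything runs in parallel. Under this model the penalty is at most a factor of two, reached whenever `2 <= n <= c`. The hypothetical `ideal_wall_time` helper just evaluates both formulas.

```python
import math


def ideal_wall_time(n_items, n_cores, t_item, serial_first):
    """Idealized wall-clock time: equal-length items, no overhead, perfect scheduling."""
    if n_items == 0:
        return 0.0
    if serial_first:
        # first item alone, then the remaining items in rounds of n_cores
        return t_item + math.ceil((n_items - 1) / n_cores) * t_item
    return math.ceil(n_items / n_cores) * t_item


print(ideal_wall_time(4, 4, 1.0, serial_first=True))      # 2.0
print(ideal_wall_time(4, 4, 1.0, serial_first=False))     # 1.0
print(ideal_wall_time(1000, 4, 1.0, serial_first=True))   # 251.0
print(ideal_wall_time(1000, 4, 1.0, serial_first=False))  # 250.0
```

A short list of slow items doubles the wall time, while a long list of quick items only pays for one extra item, which matches the observations in this thread.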

David
