Grasshopper Python Parallel Implementation

Could someone give an overview of how the parallel module in pythongh is implemented? I have many questions, which I think anyone trying to speed up their code would want answered before investing time in it.

For example, is the code written in the component copied, with Python multiprocessing used in the background (I’m assuming it’s not multithreading)? Is only the function being called copied? If that function calls other functions, how would it know about them?

Is anything in Rhino or rhinoscriptsyntax able to be used in parallel, or are there only certain functions? If so, is there a list of them?

If it uses something like the multiprocessing module, what is the overhead for spawning each child? Is it just the startup of a Python instance, or do more items related to Grasshopper need to be created?

The parallel module uses .NET’s Parallel.ForEach routine to perform computations on multiple threads.

No, only the Rhino.Geometry namespace of RhinoCommon is intended for thread-safe usage.
https://developer.rhino3d.com/api/RhinoCommon/html/N_Rhino_Geometry.htm
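
To illustrate what "multiple threads" implies for the copying questions above, here is a minimal sketch. It is plain CPython using `concurrent.futures` as a stand-in for .NET’s Parallel.ForEach, not the actual source of the parallel module, and `helper` and `work` are hypothetical names. Because threads live in the same process, nothing is copied or re-imported: the worker function can call any other function that is already defined.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

def helper(x):
    # An ordinary module-level function; worker threads can call it directly
    return x * x

def work(x):
    # Calls another function and records which thread handled this item
    return helper(x) + 1, threading.current_thread().name

# Four worker threads in the same process (assumed count, for illustration)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(work, range(8)))

values = [v for v, _ in results]
thread_names = {name for _, name in results}
```

Note that CPython’s global interpreter lock keeps these threads from running Python bytecode simultaneously, so this shows the structure rather than a speedup. Inside Grasshopper you would hand the same kind of function to `ghpythonlib.parallel.run` instead.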

Although the original question was about Python, I am also curious about C# or the C++ API, if it is different…

Does Rhino.Geometry being thread-safe mean that it can run in parallel, or that it is literally just thread-safe, so it will handle memory and such?

Is there a list of methods that can actually be called in parallel?

I am wondering how it deals with passing geometry to each thread in parallel. Say I do a line-line intersection of one line (lineA) against 100 lines (lineList), and I want to parallelize it so that 20 lines of lineList are checked against lineA on each core. Something like a ConcurrentBag in .NET would allow lineA to be accessed by each core, but not at the same time. Is there a way to make a copy to pass to each core without the overhead of copying it within the Rhino document?

The entire Rhino.Geometry namespace is meant to be thread-safe in the sense that it can be run in parallel.

So Rhino handles the copying of the geometry so that the methods can be run in parallel? I will make a test example of the issue I am running into once I get a better grasp of how it is supposed to work.

Thanks

This is not the case. You shouldn’t have to copy geometry to perform operations on geometry in parallel.

Maybe I am not understanding what the API does, but if the function is to calculate the intersection between two lines, and the function gets the two lines, how can two different cores calculate the intersection at the same time without reading from the same memory address that contains the geometry?

Only one core would compute per function call in a parallel setup. This means that for a single line/line intersection call only one core is used. If you have a loop with 100,000 line/line intersections, then those calls would be spread over the available number of cores.
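
As a rough picture of how a loop gets spread over workers, here is a hand-rolled sketch in plain Python. Parallel.ForEach does its own partitioning internally, so the fixed chunk size of 20 and the names `chunks` and `process_chunk` are assumptions for illustration only. Each call still runs on a single worker; only the list of calls is divided up.

```python
from concurrent.futures import ThreadPoolExecutor

def chunks(items, size):
    # Split a list into consecutive slices of at most `size` items
    return [items[i:i + size] for i in range(0, len(items), size)]

def process_chunk(chunk):
    # Stand-in for "one intersection call per item"; each call runs on one worker
    return [x * 2 for x in chunk]

items = list(range(100))
parts = chunks(items, 20)  # 5 chunks of 20 items each

with ThreadPoolExecutor(max_workers=len(parts)) as pool:
    # pool.map preserves chunk order, so the flattened result keeps item order
    results = [y for part in pool.map(process_chunk, parts) for y in part]
```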

This article may help
https://docs.microsoft.com/en-us/dotnet/standard/parallel-programming/data-parallelism-task-parallel-library

Sorry it was not clear. I am talking about how those 100,000 line/line intersections can happen when it is ONE line to 100,000 lines. Meaning:

A = a single line A
B = [Array of 100,000 lines]

Now, I need to check the intersection between line A against every element of B. This would be 100,000 calls to the line-line intersection method of Rhino. However, for each call, it will also try to access line A.

If there are 4 cores, the following would be done in parallel:

line-line-Intersection(A,B[0])
line-line-Intersection(A,B[1])
line-line-Intersection(A,B[2])
line-line-Intersection(A,B[3])

Now there are 4 calls accessing the same object (line A), presumably from the same memory address. Since you mention that geometry does not need to be copied for parallel operations on it, I am wondering how Rhino is storing the geometry that makes multiple calls to it happen in parallel?

A Line is a struct, so it is passed by value, meaning it is copied when passed without a ref keyword. Nevertheless, you can break almost any thread-safe system if you don’t know what you are doing, or vice versa. I have almost never experienced a real need for parallel computation in RhinoCommon; true bottlenecks have almost always been a weak algorithm. The cost of async coding is high if you compare the extra development time against the longer calculation time. But it always depends on the problem.


Okay, Line is a struct. What about a curve, brep, or mesh? Are those also passed by value to a Rhino geometry call?

You’ll probably get a crash (sometimes). That’s where my second sentence comes in :slightly_smiling_face: :wink:

What do you mean crash? CurveIntersection takes in curves. Just change all my past mentions of “line” to curve.

If you refer to the same instance from different threads at the exact same time, you kill the Rhino instance.

Even worse, you don’t always get the crash, only on truly simultaneous occurrences. That’s the problem with parallel computing. You can code a thread lock in (maybe they already have), but then half of the performance gain is gone. In my opinion, you are in charge of coding thread-safely, so don’t create such situations in the first place. And again, true performance improvements are rare. It’s almost always another bottleneck slowing down your system.

This should not be true. If this happens, then it is a bug that we need to fix in Rhino.

Back to the original intent of this post…
If functions in the Geometry and Geometry.Intersect namespaces don’t modify the data in a class, they should be thread-safe for use in Tasks and Parallel.For loops. Many different threads can reference a single curve for intersection calculations because those functions are only looking at values on the curve and not actually modifying the curve.
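
A small sketch of that read-only case, using plain Python tuples as stand-ins for Rhino geometry (this is not RhinoCommon code; `intersect`, `hit`, `line_a` and `lines_b` are hypothetical names). Every worker reads the same `line_a` object. Nothing is copied or locked, and the parallel result matches the serial one because no call modifies its inputs.

```python
from concurrent.futures import ThreadPoolExecutor

def intersect(a, b):
    """Intersection point of two infinite 2D lines, or None if parallel.
    Each line is ((x1, y1), (x2, y2)). The function only reads its inputs."""
    (x1, y1), (x2, y2) = a
    (x3, y3), (x4, y4) = b
    d = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if d == 0:
        return None
    t = ((x1 - x3) * (y3 - y4) - (y1 - y3) * (x3 - x4)) / d
    return (x1 + t * (x2 - x1), y1 + t * (y2 - y1))

# One shared line A (the x-axis) and many vertical lines B[i] at x = i
line_a = ((0.0, 0.0), (1.0, 0.0))
lines_b = [((float(i), -1.0), (float(i), 1.0)) for i in range(100)]

def hit(b):
    # Every worker reads the single shared line_a; no copy, no lock
    return intersect(line_a, b)

with ThreadPoolExecutor(max_workers=4) as pool:
    points = list(pool.map(hit, lines_b))

# The same computation done serially, for comparison
serial = [intersect(line_a, b) for b in lines_b]
```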


Hi Steve,
To your point that the calls “don’t modify the data”:

They are still reading the data from the curve, correct? So when does the data get copied so that each core can access the “values on the curve”? Or is it not copied, and each core needs to read the same memory address that contains the “values on the curve”?

Multiple cores can access the same locations in memory for reading and writing. There is no copying.

Well, they can’t access them in parallel, which is what I need. So my question is: what method is there for copying Rhino objects without them being drawn or ‘created’ in the scene?

This is not true. Memory is not restricted in any way that forces only one thread to see it at any time. There is no copying.

From the documentation you linked before:

In sequential code, it is not uncommon to read from or write to static variables or class fields. However, whenever multiple threads are accessing such variables concurrently, there is a big potential for race conditions. Even though you can use locks to synchronize access to the variable, the cost of synchronization can hurt performance.
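
The situation that passage describes is concurrent writing to shared state, which is a different case from the read-only intersection calls discussed earlier. A minimal sketch in plain Python, with hypothetical names, showing the two usual ways to handle it: guard the shared variable with a lock, or avoid the shared variable by having each call return its result and combining afterwards.

```python
from concurrent.futures import ThreadPoolExecutor
import threading

numbers = list(range(1000))

# Option 1: a shared variable, with every write guarded by a lock
total_locked = 0
lock = threading.Lock()

def add_locked(n):
    global total_locked
    with lock:  # only one thread at a time may update the shared total
        total_locked += n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    list(pool.map(add_locked, numbers))

# Option 2: no shared writes; each call returns a value, combined afterwards
def square(n):
    return n * n

with ThreadPoolExecutor(max_workers=4) as pool:
    total_returned = sum(pool.map(square, numbers))
```

Option 2 needs no synchronization at all, which is why read-only calls over shared geometry that return their own results do not run into the cost the quote mentions.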