Could someone give an overview of how the parallel module in GhPython is implemented? I have many questions that I think anyone trying to speed up their code would want answered before investing time.
For example, is the code written in the component copied, with Python multiprocessing used in the background (I’m assuming it’s not multithreading)? Is only the function being called copied? If that function calls other functions, how would it know about them?
Is anything in Rhino or rhinoscriptsyntax able to be used in parallel, or are there only certain functions? If so, is there a list of them?
If it uses something like the multiprocessing module, what is the overhead for spawning each child? Is it just the starting of a Python instance, or are there more items related to Grasshopper that need to be created?
Although the original question was about Python, I am also curious about C# or the C++ API, if it is different…
Does Rhino.Geometry being thread-safe mean that it can run in parallel, or that it is literally just thread-safe, so it will handle memory and such?
Is there a list of methods that can actually be called in parallel?
I am wondering how it deals with passing geometry to each thread in parallel. Say I do a line-line intersection of one line (lineA) against 100 lines (lineList), and I want to parallelize it so that 20 lines of lineList are checked against lineA on each core. Something like a ConcurrentBag in .NET would allow lineA to be accessed by each core, but not at the same time. Is there a way to make a copy to pass to each core without the overhead of copying it within the Rhino document?
So Rhino handles the copying of the geometry so that the methods can be run in parallel? I will make a test example of the issue I am running into once I get a better grasp of how it is supposed to work.
Maybe I am not understanding what the API does, but if the function is to calculate the intersection between two lines, and the function gets the two lines, how can two different cores calculate the intersection at the same time without reading from the same memory address that contains the geometry?
Only one core would compute per function call in a parallel setup. This means that for a single line/line intersection call only one core is used. If you have a loop with 100,000 line/line intersections, then those calls would be spread over the available number of cores.
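To make the "spread the loop over the cores" idea concrete, here is a minimal sketch in plain Python. Everything in it is an assumption for illustration: intersect_2d is a hypothetical stand-in for a Rhino line/line intersection call, lines are plain tuples rather than Rhino geometry, and a standard-library thread pool stands in for whatever Rhino or Grasshopper uses internally. Each call still runs on a single worker; only the loop is distributed.

```python
from concurrent.futures import ThreadPoolExecutor


def intersect_2d(line_a, line_b):
    """Intersection point of two infinite 2D lines, or None if they are parallel.

    Hypothetical stand-in for a Rhino line/line intersection call.
    Each line is ((x1, y1), (x2, y2)).
    """
    (x1, y1), (x2, y2) = line_a
    (x3, y3), (x4, y4) = line_b
    denom = (x1 - x2) * (y3 - y4) - (y1 - y2) * (x3 - x4)
    if denom == 0:
        return None  # parallel or coincident lines
    det_a = x1 * y2 - y1 * x2
    det_b = x3 * y4 - y3 * x4
    px = (det_a * (x3 - x4) - (x1 - x2) * det_b) / denom
    py = (det_a * (y3 - y4) - (y1 - y2) * det_b) / denom
    return (px, py)


def intersect_all(line_a, lines, workers=4):
    """Run one intersection call per element of lines, spread over a worker pool.

    Results come back in the same order as the input list.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda b: intersect_2d(line_a, b), lines))


if __name__ == "__main__":
    x_axis = ((0, 0), (10, 0))
    verticals = [((i, -1), (i, 1)) for i in range(100)]
    hits = intersect_all(x_axis, verticals)
    print(len(hits), hits[7])
```

Note that in standard CPython the global interpreter lock keeps pure-Python work like this from actually running faster on threads, so treat this as a sketch of the structure rather than a benchmark.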
Sorry it was not clear. I am talking about how those 100,000 line/line intersections can happen when it is ONE line to 100,000 lines. Meaning:
A = a single line A
B = [Array of 100,000 lines]
Now, I need to check the intersection between line A against every element of B. This would be 100,000 calls to the line-line intersection method of Rhino. However, for each call, it will also try to access line A.
If there are 4 cores, the following would be done in parallel:
Now there are 4 calls accessing the same object (line A), presumably from the same memory address. Since you mention that geometry does not need to be copied for parallel operations on it, I am wondering how Rhino is storing the geometry that makes multiple calls to it happen in parallel?
A line is a struct, so it is passed by value, meaning it is copied when passed without the ref keyword. Nevertheless, you can break almost any thread-safe system if you don’t know what you are doing, or vice versa. I have almost never experienced a real need for parallel computation in RC; true bottlenecks have almost always been a weak algorithm. The cost of async coding is high if you compare the extra development time against the longer calculation time. But it always depends on the problem.
If you refer to the same instance from different threads at the exact same time, you kill the Rhino instance.
Even worse, you do not always get the crash, only on truly simultaneous accesses. That’s the problem with parallel computing. You can code a thread lock in, and maybe they already have, but then half of the performance gain is gone. In my opinion, you are in charge of writing thread-safe code, so don’t create such situations in the first place. And again, true performance improvements are rare. It’s almost always another bottleneck slowing down your system.
This should not be true. If this happens, then it is a bug that we need to fix in Rhino.
Back to the original intent of this post…
If functions in the Geometry and Geometry.Intersect namespace don’t modify the data in a class they should be thread safe for use in Tasks and Parallel.For loops. Many different threads can reference a single curve for intersection calculations because those functions are only looking at values on the curve and not actually modifying the curve.
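As a rough illustration of the read-only argument, here is a small sketch in plain Python. It does not use Rhino at all: Segment is a hypothetical immutable stand-in for a curve, and point_at is a hypothetical query that only reads its fields. Every worker receives a reference to the very same object (the returned id values show that no copy is made), and because nothing is ever written, the results match what a sequential loop would produce.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical immutable stand-in for a curve; fields cannot be reassigned."""
    x0: float
    y0: float
    x1: float
    y1: float

    def point_at(self, t):
        """Read-only query: the point at parameter t in [0, 1]."""
        return (self.x0 + t * (self.x1 - self.x0),
                self.y0 + t * (self.y1 - self.y0))


def evaluate_shared(segment, params, workers=4):
    """Evaluate one shared segment at many parameters from several threads."""
    def work(t):
        # Only reads from the shared object; id() shows which object was used.
        return id(segment), segment.point_at(t)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(work, params))


if __name__ == "__main__":
    seg = Segment(0, 0, 10, 0)
    for obj_id, point in evaluate_shared(seg, [0, 0.5, 1]):
        print(obj_id == id(seg), point)
```

The point of the sketch is only that concurrent reads of unchanging data need neither copies nor locks; whether a given Rhino method really is read-only is something only the API documentation or its developers can confirm.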
Hi Steve,
To your point that the calls “don’t modify the data”:
They are still reading the data from the curve, correct? So when does the data get copied so that each core can access the “values on the curve”? Or is it not copied, and each core reads the same memory address that contains the “values on the curve”?
Well, they can’t access them in parallel, which is what I need. So my question is: what method is there for copying Rhino objects without them being drawn or ‘created’ in the scene?
In sequential code, it is not uncommon to read from or write to static variables or class fields. However, whenever multiple threads are accessing such variables concurrently, there is a big potential for race conditions. Even though you can use locks to synchronize access to the variable, the cost of synchronization can hurt performance.
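A minimal sketch of the two usual ways around that, in plain Python with hypothetical function names: sum_with_lock guards a shared total with a lock on every update, which is correct but pays the synchronization cost each time, while sum_with_partials lets each worker compute its own private subtotal and combines them once at the end, so no shared variable is written concurrently at all.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def sum_with_lock(chunks, workers=4):
    """Add every value into one shared total, taking a lock for each update."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        for value in chunk:
            with lock:  # synchronization cost is paid on every single update
                total += value

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(work, chunks))  # consume results so worker errors surface
    return total


def sum_with_partials(chunks, workers=4):
    """Each worker sums its own chunk privately; subtotals are combined once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, chunks))


if __name__ == "__main__":
    data = [list(range(i * 25, (i + 1) * 25)) for i in range(4)]
    print(sum_with_lock(data), sum_with_partials(data))
```

Both functions return the same answer; the second simply avoids shared mutable state, which is usually the cheaper design when the work can be split into independent chunks.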