But it is important to note that the booleans within a branch can only be performed sequentially, and that is what this component does: it processes one branch per thread, meaning all the spheres in the corresponding branch are booleaned sequentially against one flat Brep (= the regular foreach loop inside the outer parallel loop).
Separate branches, on the other hand, can be processed in parallel, one branch per thread.
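To make the loop structure concrete, here is a minimal Python sketch of "parallel over branches, sequential inside a branch". It is not the component's actual code: `boolean_difference`, `process_branch` and `process_tree` are hypothetical names, and the "Breps" are just frozensets, so that the example runs without Rhino.

```python
from concurrent.futures import ThreadPoolExecutor

def boolean_difference(base, cutter):
    """Hypothetical stand-in for a real Brep boolean: set difference."""
    return base - cutter

def process_branch(base, cutters):
    # Inner loop: each result feeds the next operation, so it has to run in order.
    result = base
    for cutter in cutters:
        result = boolean_difference(result, cutter)
    return result

def process_tree(bases, branches, max_workers=None):
    # Outer loop: branches do not depend on each other, so each one goes to a worker.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(process_branch, bases, branches))

# Two branches, each with one base "Brep" and two "spheres" (assumed toy data).
bases = [frozenset(range(10)), frozenset(range(10, 20))]
branches = [
    [frozenset({1, 2}), frozenset({2, 3})],
    [frozenset({10}), frozenset({19})],
]
results = process_tree(bases, branches)
```

The sketch only shows the structure. With pure-Python work like this stand-in, threads in standard CPython will not give a real speedup; the point is just which loop is parallel and which one is not.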
However, it is interesting that appending all Breps into one Brep (D input) before performing the boolean operation speeds things up quite significantly, but the result also fails quite significantly:
If you run both components at the same time, utilizing all cores, how can you then know which component is faster? They would compete for the available processing power, and there’s probably no guarantee that the resources are distributed evenly. I’d check them in independent runs. If the difference remains, then that is that. Differences could depend on how the wrappers are implemented under the hood in C# and Python respectively, or on whether the C# component is a wrapper at all(?).
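As a rough illustration of checking them in independent runs, the sketch below times two functions one after the other instead of side by side. `time_alone`, `component_a` and `component_b` are hypothetical placeholders, not the actual Grasshopper components, and the workload is an arbitrary assumption.

```python
import time

def time_alone(func, *args, repeats=3):
    """Run func on its own several times; return the best wall-clock time in seconds."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical stand-ins for the two components being compared.
def component_a(n):
    return sum(i * i for i in range(n))

def component_b(n):
    return sum([i * i for i in range(n)])

# Measure one after the other, never at the same time,
# so the two do not compete for the same cores.
timings = {
    "component_a": time_alone(component_a, 200_000),
    "component_b": time_alone(component_b, 200_000),
}
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1000:.1f} ms")
```

Taking the best of a few repeats is one common way to reduce noise from other processes; averaging would be an equally reasonable choice.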
And this is after appending all the spheres to one Brep. The parallel version doesn't change, and the non-parallel one is a bit better. But it is not a good way of doing booleans, because you do not know when the failure occurs: