Multiprocessing in Python

I’m trying to use the multiprocessing library in a Rhino script I’m working on, but I can’t get it to work.

I got this piece of code from an AI just to test the library, and Rhino gives me an error on line 14 saying: “PicklingError: Can’t pickle <function square_number at 0x000002F0427BF940>: attribute lookup square_number on __main__ failed”

import multiprocessing

# Function to be executed in parallel (must be defined at the module level)
def square_number(n):
    return n * n

# Data to process
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

if __name__ == "__main__":
    # Create a Pool of workers (one per core)
    with multiprocessing.Pool() as pool:
        # Apply the function 'square_number' to each item in the list 'numbers'
        result = pool.map(square_number, numbers) # this line gives me the error

    print(result)

This code works fine outside Rhino.

I believe you’re running into what this note in the Python 3.9.21 documentation for multiprocessing (Process-based parallelism) mentions:

Functionality within this package requires that the __main__ module be importable by the children.

I don’t know if this can be worked around, maybe @eirannejad can give you a solution.
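A minimal sketch of why that note matters: pickle stores functions by reference (module name plus function name), not by value, so the receiving process must be able to import the module and look the name up. The helper `can_pickle` is a hypothetical name used only for this illustration, and a lambda stands in for "a function that cannot be found by name", which is effectively what happens to script-level functions when `__main__` is not importable.

```python
import math
import pickle

def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError):
        return False

# A function that lives in an importable module is pickled as a reference:
# the payload holds the module name and the function name, not the code.
payload = pickle.dumps(math.sqrt)
print(b"math" in payload, b"sqrt" in payload)

# A lambda has no name that can be looked up on its module, so the
# attribute lookup fails in the same way as the error in the question.
anonymous_square = lambda n: n * n
print(can_pickle(math.sqrt))
print(can_pickle(anonymous_square))
```

The worker processes in a Pool receive the function through exactly this mechanism, which is why the lookup on `__main__` has to succeed in the child.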

When trying to use multiprocessing.Pool, I also see a couple of crashes and errors.

The crash doesn’t take down the Rhino instance I run this in, presumably because the pool tries to launch another Rhino process instead.


Hi @robneto.eng.
It’s nice to see you learning new stuff, hope everything is good.
Getting to your problem: instead of creating separate processes, we can use threads. Threads are like lightweight sub-processes that run within the same main process. This avoids the need for pickling altogether, as threads can directly access data and functions within the main script.

import time
import threading

# Function to be executed in parallel (can be defined at the module level)
def square_number(n):
    return n * n

# Data to process
numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
threading_results = [None] * len(numbers) # Pre-allocate results

def process_number(index, number):
    threading_results[index] = square_number(number)

start_time = time.time()
threads = []
for i, num in enumerate(numbers):
    thread = threading.Thread(target=process_number, args=(i, num))
    threads.append(thread)
    thread.start()

for thread in threads:
    thread.join()

end_time = time.time()
threading_time = end_time - start_time
print("Threading Execution Time: {:.6f} seconds".format(threading_time))
print("Results:", threading_results)

# For comparison, the original sequential approach:
start_time = time.time()
sequential_results = [square_number(n) for n in numbers]
end_time = time.time()
sequential_time = end_time - start_time
print("Sequential Execution Time: {:.6f} seconds".format(sequential_time))
print("Results (Sequential):", sequential_results)
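As a possible variation on the thread-based approach above, the standard library's concurrent.futures.ThreadPoolExecutor manages the threads and collects the results for you. This is only a sketch: the worker count of 4 and the name `pooled_results` are arbitrary choices for illustration, and it has not been checked inside Rhino.

```python
from concurrent.futures import ThreadPoolExecutor

def square_number(n):
    return n * n

numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# max_workers=4 is an arbitrary value chosen for this sketch.
with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map yields results in the same order as the inputs,
    # so no pre-allocated results list or index bookkeeping is needed.
    pooled_results = list(executor.map(square_number, numbers))

print("Results (ThreadPoolExecutor):", pooled_results)
```

Note that on standard CPython builds the GIL lets only one thread execute Python bytecode at a time, so threads mostly help with I/O-bound or waiting-heavy work rather than pure-Python number crunching like this.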

Keep in mind that multiprocessing or threading will not always improve performance, especially in Rhino.
If I were you, I would first try to reduce the computational complexity, if that is the issue.
Hope this sparks some ideas,
Farouk


The multiprocessing module determines the parent process to be Rhino, since Rhino is the main process and has Python 3 embedded inside it. You’d need to set the executable in the multiprocessing module. We have a helper library named rhinocode that can provide the path to the Python executable shipped with Rhino.

import rhinocode
import multiprocessing

multiprocessing.set_executable(rhinocode.get_python_executable())

On another note, the multiprocessing module is not a very efficient way of running a compute function in parallel (spawning a new process takes considerable time in the operating system). There are other options, e.g. Numba, that are worth looking at. Numba compiles your compute function to machine code and runs it multi-threaded in parallel inside the same process without being affected by the Python 3 GIL.


BTW, if you are seeing pickle errors when running functions defined in the script’s global scope, you can avoid this problem by putting the functions in a separate Python file (module) and importing them in your script.

This error usually happens in embedded Python environments, e.g. Jupyter or Rhino, where the __main__ scope is not the traditional main module that a standalone Python 3 instance has.
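The separate-module workaround can be sketched roughly as follows. To keep the example self-contained it writes the module to a temporary folder; in a real script you would simply save the file next to your script. The module name `rhino_workers` is hypothetical, and the sketch only checks that the imported function pickles cleanly, which is the step that fails in the original error. It does not start a Pool and has not been verified inside Rhino.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module name and contents, used only for this illustration.
MODULE_NAME = "rhino_workers"
MODULE_SOURCE = "def square_number(n):\n    return n * n\n"

# Write the module to a temporary folder and make that folder importable.
module_dir = tempfile.mkdtemp()
Path(module_dir, MODULE_NAME + ".py").write_text(MODULE_SOURCE)
sys.path.insert(0, module_dir)
importlib.invalidate_caches()

workers = importlib.import_module(MODULE_NAME)

# The function now belongs to an importable module, so pickle can store
# it as "rhino_workers.square_number" and find it again when loading.
payload = pickle.dumps(workers.square_number)
restored = pickle.loads(payload)
print(restored(7))
```

With the function living in its own module, the worker processes look it up by importing that module rather than the embedded `__main__`.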