Rhino 8 and GH's Python: parallel computation is slower than sequential

Hi All,
I need to perform a lot of ray-mesh intersections and do some calculations after that.
I tried to do it with ghpythonlib.parallel, but in my case the parallel version is much slower than the sequential one.
The parallelized function takes a tuple as its argument and operates on a mesh, which is a global variable inside the Python script component.
Could you please have a look at the attached gh file and help me figure out why the parallel version is so much slower? The example just illustrates something that is a bottleneck in my actual case.

Best regards,

The code:

```python
import Rhino.Geometry as rg
from Rhino.Geometry.Intersect import Intersection as isct
from ghpythonlib.parallel import run

## component inputs: P (ray origins), V (ray directions), mesh, is_parallel

## number of rays per side
n = len(P)
print("number of rays: ", n)

## output variables
P_out = []
Nrms = []

## list storing results of parallel or sequential computation
res = []

## function that intersects a ray with the mesh and retrieves the normal
## input argument is a tuple so it can be passed to ghpythonlib.parallel.run
def do_stuff(ray_o_ray_v):
    ## output variables
    p_out = None
    nrm = None

    ## unpack input data: ray origin and ray direction
    pp, vv = ray_o_ray_v

    ## create the ray and intersect it with the mesh
    ray = rg.Ray3d(pp, vv)
    lngth = isct.MeshRay(mesh, ray)

    ## get the intersection point and the mesh normal at this point
    if lngth >= 0.0:
        p = pp + vv * lngth
        mcp = mesh.ClosestMeshPoint(p, 0.0)
        p_out = mcp.Point
        nrm = mesh.NormalAt(mcp)
        nrm = rg.Line(mcp.Point, mcp.Point + nrm * 3.0)

    return (p_out, nrm)

if is_parallel:
    data = [(P[ii], V[ii]) for ii in range(n)]
    ## False: keep one (point, normal) tuple per ray instead of flattening
    res = list(run(do_stuff, data, False))
else:
    ## sequential branch (without else, it would also run after the parallel one)
    for ii in range(n):
        res.append(do_stuff((P[ii], V[ii])))

## put the results into the component's output lists
for ii in range(len(res)):
    P_out.append(res[ii][0])
    Nrms.append(res[ii][1])
```

slow_parallel_computation.gh (26.3 KB)
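Before digging into why the parallel branch is slower, it helps to time both branches on the same data. Below is a minimal, plain-Python timing sketch; `best_time` and `compare` are hypothetical helper names, not part of ghpythonlib. It uses `timeit.default_timer`, which exists in both IronPython 2.7 and CPython 3, so it should work in either kind of Rhino 8 script component (an assumption worth checking in your setup).

```python
import timeit


def best_time(fn, repeats=5):
    """Return the fastest wall-clock time, in seconds, over `repeats` calls to fn()."""
    best = float("inf")
    for _ in range(repeats):
        start = timeit.default_timer()
        fn()
        elapsed = timeit.default_timer() - start
        best = min(best, elapsed)
    return best


def compare(sequential_fn, parallel_fn, repeats=5):
    """Time both callables and return (t_seq, t_par, t_seq / t_par).

    A ratio above 1 means the parallel version is faster; below 1, slower.
    """
    t_seq = best_time(sequential_fn, repeats)
    t_par = best_time(parallel_fn, repeats)
    ratio = t_seq / t_par if t_par > 0 else float("inf")
    return t_seq, t_par, ratio


# Hypothetical usage inside the GH component (assumes do_stuff, run, P, V, n
# from the script above):
# data = [(P[ii], V[ii]) for ii in range(n)]
# t_seq, t_par, ratio = compare(
#     lambda: [do_stuff(d) for d in data],
#     lambda: run(do_stuff, data, False))
# print("sequential: %.4f s, parallel: %.4f s, ratio: %.2f" % (t_seq, t_par, ratio))
```

Taking the best of several runs reduces noise from one-off start-up costs, so the ratio reflects the steady-state cost of each branch.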

I found a workaround that improves the calculation speed for the posted problem; maybe it will be useful for other newbies like me. Rewriting the code in C# reduced the calculation time by a factor of five. I also tried parallelizing the C# version and batching the data, but sometimes that helps and sometimes it doesn't.
It seems that for heavy calculations C# is a must.
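The batching idea mentioned above can also be sketched in Python on top of any map-style runner. In this sketch, `make_batches` and `run_batched` are hypothetical helpers, and `run_fn` is assumed to have the call shape `run_fn(func, data_list)` and to return results in input order, as a sequential map does (inside Grasshopper you could try passing `ghpythonlib.parallel.run`). Each task then processes a whole chunk of rays, so the per-task scheduling overhead is paid once per batch instead of once per ray.

```python
def make_batches(items, batch_size):
    """Split a list into consecutive chunks of at most batch_size elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


def run_batched(run_fn, worker, items, batch_size):
    """Apply `worker` to every item, giving each parallel task a whole batch.

    run_fn(func, data_list) is any map-like runner (assumption: it returns
    one result per element of data_list, in order). Output keeps input order.
    """
    batches = make_batches(list(items), batch_size)

    def process_batch(batch):
        # one task handles many items, amortizing the scheduling cost
        return [worker(item) for item in batch]

    batched_results = run_fn(process_batch, batches)

    # stitch the per-batch result lists back into one flat list, in order
    out = []
    for chunk in batched_results:
        out.extend(chunk)
    return out


# Sequential stand-in runner, useful for testing outside Grasshopper.
def sequential_run(func, data_list):
    return [func(d) for d in data_list]
```

In the component above this could be called as `res = run_batched(run, do_stuff, data, 256)`; the batch size of 256 is an arbitrary starting point to tune, not a recommended value. Whether batching pays off depends on how expensive each ray is, which matches the mixed results described above.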

Hi there, I'm in a similar situation: I have rays passing through a reference point and grid points, and those rays then intersect a mesh. Could you please share your C# code for the parallel version with batching?
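For the setup described here, the (origin, direction) pairs that `do_stuff` expects could be built as follows. This is a sketch under assumptions: every ray starts at the reference point and passes through one grid point, points are plain `(x, y, z)` tuples so the example runs outside Rhino, and `rays_through_grid` is a hypothetical name. In a Grasshopper component you would more likely subtract `Rhino.Geometry.Point3d` values and call `Unitize()` on the resulting `Vector3d`.

```python
import math


def rays_through_grid(ref_point, grid_points):
    """Build (origin, unit_direction) pairs for rays that start at ref_point
    and pass through each grid point. Points are (x, y, z) tuples."""
    pairs = []
    for g in grid_points:
        d = (g[0] - ref_point[0], g[1] - ref_point[1], g[2] - ref_point[2])
        length = math.sqrt(d[0] ** 2 + d[1] ** 2 + d[2] ** 2)
        if length == 0.0:
            # grid point coincides with the reference point: direction undefined
            continue
        pairs.append((ref_point, (d[0] / length, d[1] / length, d[2] / length)))
    return pairs
```

Grid points that coincide with the reference point are skipped, so the output can be shorter than the input; keep the indices alongside the pairs if results need to be mapped back to the grid.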