Strange parallel implementation behaviour

I have two questions. The first:
I am seeing very strange behaviour with an inner System.Threading.Tasks.Parallel.For loop. I have a nested loop and I am parallelizing the inner one.
Single-threaded approach:

for( int i = 0; i < numberOfFeatures; ++i )
{
  path = new GH_Path(i);
  feature = layer.GetFeature(i);
  bool pattern = false;

  if( feature != null )
  {
    geo = feature.GetGeometryRef();
    if( geo != null )
    {
      ring = geo.GetGeometryRef( 0 );
      int pointCount = ring.GetPointCount();
      Point3d[] tempPointArray = new Point3d[pointCount];
      for (int j = 0; j < pointCount; ++j)
      {
        ring.GetPoint( j, pointList );
        tempPointArray[j] = new Point3d( pointList[0], pointList[1], pointList[2] );
      }

      if( tempPointArray.Length > 1 )
      {
        pattern = true;
        Polyline polyOut = new Polyline( tempPointArray );
        IGH_GeometricGoo geoGoo = GH_Convert.ToGeometricGoo( polyOut );
        geoOutput.Append( geoGoo, path );
      }
    } 
  }

  IGH_Goo gooPattern = GH_Convert.ToGoo( pattern );
  cullPattern.Append( gooPattern, path );
}

Multithreaded:

for( int i = 0; i < numberOfFeatures; ++i )
{
  path = new GH_Path(i);
  feature = layer.GetFeature(i);
  bool pattern = false;

  if( feature != null )
  {   
    geo = feature.GetGeometryRef();
    if( geo != null )
    {
      ring = geo.GetGeometryRef( 0 );
      int pointCount = ring.GetPointCount();
      double[][] points = new double[pointCount][];
      Point3d[] tempPointArray = new Point3d[pointCount];

      System.Threading.Tasks.Parallel.For( 0, pointCount, j => 
      {
        points[j] = new double[3];                      
        ring.GetPoint( j, points[j]);
        tempPointArray[j] = new Point3d( points[j][0], points[j][1], points[j][2] );
      });

      if( tempPointArray.Length > 1 )
      {
        pattern = true;
        Polyline polyOut = new Polyline( tempPointArray );
        IGH_GeometricGoo geoGoo = GH_Convert.ToGeometricGoo( polyOut );
        geoOutput.Append( geoGoo, path );
      }
    }             
  }

  IGH_Goo gooPattern = GH_Convert.ToGoo( pattern );
  cullPattern.Append( gooPattern, path );
}

*cullPattern is a GH_Structure<IGH_Goo> and geoOutput a GH_Structure<IGH_GeometricGoo>

The purpose of the code is to extract GIS vector data through GDAL’s C# bindings. The question is: why does the multithreaded approach take longer to run than the single-threaded one (even for large enough files) when running as a GH plugin?

The second question: for debugging purposes I have a Console App that does basically the same thing. I have profiled both approaches in the Console App and the multithreaded one is much faster there, but I can’t really profile the same code because I can’t get RhinoCommon or the GH kernel to run in the Console App; the app can’t find the references. Is that intended behaviour, for, I don’t know, intellectual property reasons? To profile it I think I would need to reference RhinoCommon.dll, Grasshopper.dll and GH_IO.dll.

@dale @DavidRutten

Thanks Felipe

Your main problem is here:

If you write two versions of a serial loop, one that creates a new object on every iteration ( … new double[3]) and one that has no such code, the second loop will be much faster (easy to test).
So your problem has nothing to do with Parallel.For, but with your code.
There are probably other things that can speed up your Parallel.For execution as well, but first you have to write your solution without that new double[3] …
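The "easy to test" comparison above could be sketched roughly as follows. This is only an illustration, not the contents of any attached file: `FillPoint` is a hypothetical stand-in for a library call that fills a three-element buffer, and the iteration count is arbitrary. Timings will vary by machine and runtime, so none are quoted here.

```csharp
using System;
using System.Diagnostics;

public static class AllocationTest
{
    // Hypothetical stand-in for a library call that writes x, y, z into a buffer.
    static void FillPoint(int j, double[] buffer)
    {
        buffer[0] = j;
        buffer[1] = j * 2.0;
        buffer[2] = j * 3.0;
    }

    // Version 1: allocates a new double[3] on every iteration.
    public static double SumWithAllocation(int count)
    {
        double sum = 0;
        for (int j = 0; j < count; j++)
        {
            double[] buffer = new double[3];
            FillPoint(j, buffer);
            sum += buffer[0] + buffer[1] + buffer[2];
        }
        return sum;
    }

    // Version 2: allocates the buffer once and reuses it.
    public static double SumWithReuse(int count)
    {
        double sum = 0;
        double[] buffer = new double[3];
        for (int j = 0; j < count; j++)
        {
            FillPoint(j, buffer);
            sum += buffer[0] + buffer[1] + buffer[2];
        }
        return sum;
    }

    public static void Main()
    {
        const int count = 10000000;

        Stopwatch sw = Stopwatch.StartNew();
        double a = SumWithAllocation(count);
        sw.Stop();
        Console.WriteLine("allocate per iteration: " + sw.ElapsedMilliseconds + " ms");

        sw.Restart();
        double b = SumWithReuse(count);
        sw.Stop();
        Console.WriteLine("reuse one buffer:       " + sw.ElapsedMilliseconds + " ms");

        // Both versions do the same arithmetic, so the results should match.
        Console.WriteLine("same result: " + (a == b));
    }
}
```

Running each version a few times and discarding the first run helps keep JIT warm-up out of the comparison.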

Here is a simple code test:
Loop_Performance_Issue_01.gh (8.1 KB)

What’s up with the increment operator in front of the loop variable?

Yes, but if multiple threads overwrote the same array at the same time I would have a bad time. At first I thought the compiler would create a copy for each thread, as I assumed it would for any variable, but it doesn’t. Maybe I could optimize it a bit by simply creating the pointList array as in the serial example (one dimension), but inside the Parallel.For. I think it would work, and maybe it is in fact less expensive, as memory would stay a little more contiguous?
I’ll give it a try. Your example obviously behaves as you’d expect, but with real data I am getting strangely lower performance. Thanks.
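One way to get a separate scratch buffer per worker without allocating on every iteration is the thread-local overload of Parallel.For, which takes a `localInit` delegate. The sketch below is a minimal illustration under assumptions: `GetPoint` and `Vec3` are hypothetical stand-ins for `ring.GetPoint` and `Point3d`, and whether the real GDAL call can safely be invoked from several threads at once is not established anywhere in this thread.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for Rhino's Point3d.
public struct Vec3
{
    public double X, Y, Z;
    public Vec3(double x, double y, double z) { X = x; Y = y; Z = z; }
}

public static class LocalBufferSketch
{
    // Hypothetical stand-in for ring.GetPoint(j, buffer).
    static void GetPoint(int j, double[] buffer)
    {
        buffer[0] = j;
        buffer[1] = j + 0.5;
        buffer[2] = 0.0;
    }

    public static Vec3[] ReadPoints(int pointCount)
    {
        Vec3[] points = new Vec3[pointCount];

        Parallel.For(
            0, pointCount,
            // localInit: runs once per worker task, so each task gets its own buffer.
            () => new double[3],
            // body: reuses the task-local buffer; each iteration writes a distinct index.
            (j, loopState, buffer) =>
            {
                GetPoint(j, buffer);
                points[j] = new Vec3(buffer[0], buffer[1], buffer[2]);
                return buffer;
            },
            // localFinally: nothing to merge or release here.
            buffer => { });

        return points;
    }

    public static void Main()
    {
        Vec3[] pts = ReadPoints(1000);
        Console.WriteLine(pts.Length);
    }
}
```

The buffers are created per task rather than strictly per thread, so there may be a few more allocations than threads, but far fewer than one per point. For small point counts the scheduling overhead of Parallel.For can still outweigh the work done per iteration.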

In C# it shouldn’t make a difference whether you use post- or pre-increment; it’s just a vestige from my C++ and especially GLSL habits.
Do RhinoCommon or the Grasshopper DLLs have any protection against usage in a standalone app, or is it just me doing something wrong?

It doesn’t matter, I was just wondering why the non-standard approach.

There are no ‘protections’ in GH, but RhinoCommon functions cannot be invoked unless they are running inside the Rhino.exe process. And without the Rhino core functions GH won’t run at all.

Thanks! That clears it up. How do you recommend debugging GH components? I used to use the ReloadAssemblies command before but I think that’s deprecated.

If you have control over the ring.GetPoint() method, you could write an (optimized) overload which returns a Point3d for a given index (int i). That could improve the performance of the Parallel.For loop significantly…

No, I don’t have control over that; it’s a library function.

I didn’t remember but yeah the jagged array was to stay away from this:

I came to a conclusion: trying to beat single-threaded, non-CLI C++ (the library I am using) with multi-threaded C# is like bringing a knife to a gun fight. I desist.