C# Multithreads Task Capable component - Nested Loop

Dear Community :man_technologist:t2:

I have been struggling lately with implementing a multi-threaded capable GH component written in C# on Visual Studio. I’m not sure that this could improve my component’s performance (in terms of speed). But besides learning on the subject, my goal is to create a case study and compare it with/without parallelism implementations.

I have been collecting links on the subject but still cannot wrap my head around what could be the best option for me.

My goal is to compute distances from a 3D sparse grid to a specific point. I start with a bounding box, numbers of pts in XYZ directions (nX, nY, nZ), from which I extract gaps (dimX, dimY, dimZ).
Initially, For my Non-Task capable version, I used a nested for loop (for i++ * dimX, for j++ * dimY, for k++ * dimZ) to create a Point3d, compute the distance, and add both in separate lists.

I tried to translate this into tasks managed by a Parallel.For loop like this:


                Parallel.For(0, nX * nY * nZ, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, i =>
                {
                   {
                        SolveResults r = ComputeDistances(ptDistanceObject, 
                                                        dimX * i / (nY * nZ) + dimX,
                                                        dimY * ((i % (nY * nZ)) / nZ) + dimY,
                                                        dimZ * ((i % (nY * nZ)) % nZ) + dimZ);
                        values[i] = r.Value;
                        points[i] = r.Point;
                    }
                });
                return;

I am creating two arrays (values, points) to make sure that the elements will be in the right place (+ thread-safe) and follow the SolveResults structure with the { get; set; } variables.

Somehow, values and points are never updated, but if I remove the return; I get first the empty result in the first branch of a tree and the right values/points in separate branches.

Moreover, creating a thread for each iteration might not be the best idea ever. Still, I want to understand how iterating through a collection of Tasks works, then challenge it with another solution that could allocate a thread per Z values only, for example.

Looking for help and guidance, please.

All the best :yin_yang:

Remy

Sounds like a job for the KdTree-Algorithm…

Don’t worry too much. The Task’s library takes care of many things for you, especially the Parallel.For loop. All you need is to make sure to provide and take out your data over an array or another thread-safe collection. Without a Parallel.For loop, obviously you should consider creating as many chunks as the system effectively can run in parallel. Otherwise, you gain nothing. And of course the amount of data has to be large enough to justify the extra overheat.

If performance is really the issue, you’d rewrite the problem in C++ with something like OpenMP and the algorithm may be several times faster.

Parallel.For and TPL will be slower if the single task is too easy as the overhead of switching between tasks are huge.

And what’s the complete code? What’s ComputeDistances? What’s the input/output code? Your description is too vague and difficult to figure out what goes wrong.

Could you maybe explain how can I write code in C++ and use it in GH?
Just a few hints to search the web.
Thanks a lot!

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

using Grasshopper.Kernel;
using Rhino.Geometry;

namespace Workshop
{
    public class GhcDenseGridParrallel : GH_TaskCapableComponent<GhcDenseGridParrallel.SolveResults>
    {

        public GhcDenseGridParrallel()
        : base("DenseGridParrallel", 
                "DenseGridOriginalP",
                "Description",
                "Workshop",
                "Tests")
        {
        }
        protected override void RegisterInputParams(GH_Component.GH_InputParamManager pManager)
        {
            double defaultDim = 5;
            Box box = new Box(Plane.WorldXY, new Interval(-defaultDim, defaultDim), new Interval(-defaultDim, defaultDim), new Interval(-defaultDim, defaultDim));

            pManager.AddPointParameter("Distance Object", "D", "Distance Object", GH_ParamAccess.item);
            pManager.AddBoxParameter("Box", "B", "Bounding Box", GH_ParamAccess.item, box);
            pManager.AddIntegerParameter("Number X", "NX", "Number of points in X", GH_ParamAccess.item, 50);
            pManager.AddIntegerParameter("Number Y", "NY", "Number of points in Y", GH_ParamAccess.item, 50);
            pManager.AddIntegerParameter("Number Z", "NZ", "Number of points in Z", GH_ParamAccess.item, 50);
        }

        protected override void RegisterOutputParams(GH_Component.GH_OutputParamManager pManager)
        {
            pManager.AddNumberParameter("Values", "V", "Distance values", GH_ParamAccess.list);
            pManager.AddPointParameter("Points", "P", "Query points for which the distance was evaluated", GH_ParamAccess.list);
        }

        public class SolveResults
        {
            public double Value { get; set; }
            public Point3d Point { get; set; }
        }

        private static SolveResults ComputeDistances(Point3d distanceObject, double cx, double cy, double cz)
        {
            SolveResults result = new SolveResults();
            Point3d newPt = new Point3d(cx, cy, cz);
            result.Value = distanceObject.DistanceTo(newPt);
            result.Point = newPt;
            return result;
        }

        protected override async void SolveInstance(IGH_DataAccess DA)
        {
            Point3d go = new Point3d(0, 0, 0);
            int nX = 10;
            int nY = 10;
            int nZ = 10;
            Box bx = new Box();

            DA.GetData(0, ref go);
            DA.GetData(1, ref bx);
            if (!DA.GetData(2, ref nX)) return;
            if (!DA.GetData(3, ref nY)) return;
            if (!DA.GetData(4, ref nZ)) return;

            BoundingBox box = bx.BoundingBox;

            double x1 = box.Min.X;
            double y1 = box.Min.Y;
            double z1 = box.Min.Z;
            double x2 = box.Max.X;
            double y2 = box.Max.Y;
            double z2 = box.Max.Z;

            double[]values = new double[nX * nY * nZ];
            Point3d[] points = new Point3d[nX * nY * nZ];

            ParallelLoopResult loopResult = new ParallelLoopResult();

            if (InPreSolve)
            {
                double dimX = (x2 - x1) / (nX - 1);
                double dimY = (y2 - y1) / (nY - 1);
                double dimZ = (z2 - z1) / (nZ - 1);

                if (nX <= 0)
                {
                    AddRuntimeMessage(GH_RuntimeMessageLevel.Error, "nX must be bigger than zero");
                    return;
                }
                if (nY <= 0)
                {
                    AddRuntimeMessage(GH_RuntimeMessageLevel.Error, "nY must be bigger than zero");
                    return;
                }
                if (nZ <= 0)
                {
                    AddRuntimeMessage(GH_RuntimeMessageLevel.Error, "nZ must be bigger than zero");
                    return;
                }
      
                loopResult = Parallel.For(0, nX * nY * nZ, new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount }, i =>
                {
                   {
                        SolveResults r = ComputeDistances(go, 
                                                        dimX * i / (nY * nZ) + x1,
                                                        dimY * ((i % (nY * nZ)) / nZ) + y1,
                                                        dimZ * ((i % (nY * nZ)) % nZ) + z1);
                        values[i] = r.Value;
                        points[i] = r.Point;
                    }
                });
                return;
            }

            
            /*
            if (!loopResult.IsCompleted)
            {
                double dimX = (x2 - x1) / (nX - 1);
                double dimY = (y2 - y1) / (nY - 1);
                double dimZ = (z2 - z1) / (nZ - 1);

                if (nX <= 0)
                {
                    AddRuntimeMessage(GH_RuntimeMessageLevel.Error, "nX must be bigger than zero");
                    return;
                }
                if (nY <= 0)
                {
                    AddRuntimeMessage(GH_RuntimeMessageLevel.Error, "nY must be bigger than zero");
                    return;
                }
                if (nZ <= 0)
                {
                    AddRuntimeMessage(GH_RuntimeMessageLevel.Error, "nZ must be bigger than zero");
                    return;
                }


                List<double> _values = new List<double>();
                List<Point3d> _points = new List<Point3d>();

                for (int x = 0; x < nX; x++)
                {
                    for (int y = 0; y < nY; y++)
                    {
                        for (int z = 0; z < nZ; z++)
                        {
                            double cx = x * dimX + x1;
                            double cy = y * dimY + y1;
                            double cz = z * dimZ + z1;
                            Point3d newPt = new Point3d(cx, cy, cz);
                            _values.Add(go.DistanceTo(newPt));
                            _points.Add(newPt);
                        }
                    }
                }
                //values = _values.ToArray();
                //points = _points.ToArray();
            }
            */
            DA.SetDataList(0, values);
            DA.SetDataList(1, points);
        }
    }
}

@gankeyu Sorry for the lack of information. Here is my code. I comment out the non-tasks oriented part to test. I get weird results or empty arrays if if remove/add return; after my Parallel.loop.for.

Any clue?

Tahnk you

@TomTom Thank you for your answer. I did share my code in the thread. It seems that I have a problem collecting the data back from the threads. You are right regarding the algorithm to use (KdTree etc…). Let’s say that I choose this example to understand the logic behind multithreading in GH, even if it doesn’t make a lot of sense at the moment.

Any advice?

Thank you

Curious as well @Baris :slightly_smiling_face:

I’m not sure why you are using a parallel.for loop in a task capable component. The task capable component was designed to work with tasks where every iteration is computed with a separate task.

If you want to use a parallel.for loop, just create a regular GH component.

Thank you @stevebaer !
Would it make sense to work with tasks here? Is it possible to loop through different Tasks with a common goal (like the Parallel loop)?

You’ll have to experiment with different approaches and measure your results to know what works best for the algorithm you are trying to solve along with the data you are working with. You would need to profile

  • no parallelization
  • with Parallel.For
  • with tasks
  • with combination of Parallel.For and tasks

I will and post the results here! thank you @stevebaer

  1. You should ‘return’ in PreSolve because otherwise the output is set even if not all inputs are calculated.

  2. A single task is too easy so the overhead of parallelism may increase execution time. Try to distribute more tasks (say Parallel.For on X only) in one execution.

  3. In your case, it’s better not to use Task Capable component as it seems to create a lot of heap objects and put pressure to Gen0 GC.

image
image

The following code might be further optimized by vectorization or intercepting GH’s SetDataList procedure.

using System;
using System.Linq;
using System.Collections.Generic;
using System.Threading.Tasks;
using Grasshopper.Kernel;
using Grasshopper.Kernel.Types;
using Rhino.Geometry;
using System.Reflection;
using System.Reflection.Emit;

namespace PancakeAlgo.Misc
{
    public class DenseGridOptimized : GH_Component
    {
        /// <summary>
        /// Initializes a new instance of the DenseGridBaseline class.
        /// </summary>
        public DenseGridOptimized()
          : base("DenseGridOptimized", "DenseGridOptimized",
              "Description",
              "Pancake", "Algo")
        {
        }

        /// <summary>
        /// Registers all the input parameters for this component.
        /// </summary>
        protected override void RegisterInputParams(GH_Component.GH_InputParamManager pManager)
        {
            double defaultDim = 5;
            Box box = new Box(Plane.WorldXY, new Interval(-defaultDim, defaultDim), new Interval(-defaultDim, defaultDim), new Interval(-defaultDim, defaultDim));

            pManager.AddPointParameter("Distance Object", "D", "Distance Object", GH_ParamAccess.item);
            pManager.AddBoxParameter("Box", "B", "Bounding Box", GH_ParamAccess.item, box);
            pManager.AddIntegerParameter("Number X", "NX", "Number of points in X", GH_ParamAccess.item, 50);
            pManager.AddIntegerParameter("Number Y", "NY", "Number of points in Y", GH_ParamAccess.item, 50);
            pManager.AddIntegerParameter("Number Z", "NZ", "Number of points in Z", GH_ParamAccess.item, 50);
        }

        /// <summary>
        /// Registers all the output parameters for this component.
        /// </summary>
        protected override void RegisterOutputParams(GH_Component.GH_OutputParamManager pManager)
        {
            pManager.AddNumberParameter("Values", "V", "Distance values", GH_ParamAccess.list);
            pManager.AddPointParameter("Points", "P", "Query points for which the distance was evaluated", GH_ParamAccess.list);
        }

        /// <summary>
        /// This is the method that actually does the work.
        /// </summary>
        /// <param name="DA">The DA object is used to retrieve from inputs and store in outputs.</param>
        protected override void SolveInstance(IGH_DataAccess DA)
        {
            Point3d go = new Point3d(0, 0, 0);
            int nX = 10;
            int nY = 10;
            int nZ = 10;
            Box bx = new Box();

            DA.GetData(0, ref go);
            DA.GetData(1, ref bx);
            if (!DA.GetData(2, ref nX)) return;
            if (!DA.GetData(3, ref nY)) return;
            if (!DA.GetData(4, ref nZ)) return;

            BoundingBox box = bx.BoundingBox;

            double x1 = box.Min.X;
            double y1 = box.Min.Y;
            double z1 = box.Min.Z;
            double x2 = box.Max.X;
            double y2 = box.Max.Y;
            double z2 = box.Max.Z;

            double dimX = (x2 - x1) / (nX - 1);
            double dimY = (y2 - y1) / (nY - 1);
            double dimZ = (z2 - z1) / (nZ - 1);

            var _values = new List<GH_Number>(nX * nY * nZ);
            var _points = new List<GH_Point>(nX * nY * nZ);

            _values.SetListLength(nX * nY * nZ);
            _points.SetListLength(nX * nY * nZ);

            var values = _values.GetInternalArray();
            var points = _points.GetInternalArray();

            Parallel.For(0, nX, x =>
            {
                var position = x * nY * nZ;
                double cx = x * dimX + x1;
                double cy = y1;

                var sqX = (cx - go.X) * (cx - go.X);

                for (int y = 0; y < nY; y++)
                {
                    double cz = z1;

                    var sqY = (cy - go.Y) * (cy - go.Y) + sqX;

                    for (int z = 0; z < nZ; z++)
                    {
                        var sqZ = (cz - go.Z) * (cz - go.Z) + sqY;
                        var dist = Math.Sqrt(sqZ);

                        Point3d newPt = new Point3d(cx, cy, cz);

                        values[position] = new GH_Number(dist);
                        points[position] = new GH_Point(newPt);

                        ++position;
                        cz += dimZ;
                    }

                    cy += dimY;
                }
            });

            DA.SetDataList(0, _values);
            DA.SetDataList(1, _points);
        }

        /// <summary>
        /// Provides an Icon for the component.
        /// </summary>
        protected override System.Drawing.Bitmap Icon => null;

        /// <summary>
        /// Gets the unique ID for this component. Do not change this ID after release.
        /// </summary>
        public override Guid ComponentGuid => new Guid("6317B759-68A1-4395-955D-931D8C7A95E6");
    }

    /// <summary>
    /// some code is from https://stackoverflow.com/questions/4972951/listt-to-t-without-copying
    /// CC-BY-SA 3.0
    /// </summary>
    public static class ListExtensions
    {
        internal static class ArrayAccessor<T>
        {
            internal delegate void SetListLength(List<T> list, int size);

            internal static Func<List<T>, T[]> Getter;
            internal static SetListLength CountSetter;
            private static void Method1()
            {
                var dm = new DynamicMethod("get", 
                    MethodAttributes.Static | MethodAttributes.Public,
                    CallingConventions.Standard,
                    typeof(T[]),
                    new Type[] { typeof(List<T>) },
                    typeof(ArrayAccessor<T>), 
                    true);

                var il = dm.GetILGenerator();
                il.Emit(OpCodes.Ldarg_0); // Load List<T> argument
                il.Emit(OpCodes.Ldfld, typeof(List<T>).GetField("_items", BindingFlags.NonPublic | BindingFlags.Instance)); // Replace argument by field
                il.Emit(OpCodes.Ret); // Return field
                Getter = (Func<List<T>, T[]>)dm.CreateDelegate(typeof(Func<List<T>, T[]>));
            }
            private static void Method2()
            {
                var dm = new DynamicMethod("set",
                    MethodAttributes.Static | MethodAttributes.Public,
                    CallingConventions.Standard,
                    null,
                    new Type[] { typeof(List<T>), typeof(int) },
                    typeof(ArrayAccessor<T>),
                    true);

                var il = dm.GetILGenerator();
                il.Emit(OpCodes.Ldarg_0); 
                il.Emit(OpCodes.Ldarg_1); 
                il.Emit(OpCodes.Stfld, typeof(List<T>).GetField("_size", BindingFlags.NonPublic | BindingFlags.Instance)); // Replace argument by field
                il.Emit(OpCodes.Ret); // Return field
                CountSetter = (SetListLength)dm.CreateDelegate(typeof(SetListLength));
            }
            static ArrayAccessor()
            {
                Method1();
                Method2();
            }
        }

        public static T[] GetInternalArray<T>(this List<T> list)
        {
            return ArrayAccessor<T>.Getter(list);
        }

        public static void SetListLength<T>(this List<T> list, int newLength)
        {
            ArrayAccessor<T>.CountSetter(list, newLength);
        }
    }
}

1 Like

My advice: Measure your performance! Open a new console application in Visual Studio. Reference BenchmarkDotNet and simply test different implementations! Use the profiler to spot the bottlenecks

Performance is really a complicated topic. You are gonna need to measure it properly and not waste your time on optimizing the wrong parts of your code!

Multithreading is not the holy grail of performance optimization, because it requires an operation being able to parallelize. In most cases, this is not doable. Furthermore, there are a lot of things only specialized developers truly understand. Such as making use of SIMD instructions, branchless programming and IL or ASM optimized code. Apart from that, the C# compiler is quite smart in doing this for you, this is why in some case you get equal or even better performance as writing C/C++ code. Especially if you are less skilled in those languages. Many factors play a role, your operating system, your hardware, your compiler, down to things like memory allocation (and garbage collecting it), types used, collections, library code etc…

Using the Tasks library is really simple, and you find a lot of tutorials and great explanations.
But really, lower your expectations. In the end, you also want to write reliable and easy-maintainable code.

1 Like

:thinking: I never run into a situation where C# is faster than C++. Less prone to error, maybe.

Recently I refactored a daylight calculation algorithm from C# to cpp and it’s ~5x faster with the exactly same algorithm.

If you compare it to the lowest level, sure. But most bottlenecks I deal with, are not related to number crunching, but inefficient coding. And a good compiler helps you with that. In reality, the performance gaps in average applications, are not worth to investigate at all. That’s btw the point of my last post.

One potential improvement in C# I come up with, is, CLR’s GC vs C++ default allocator (new/delete). IIRC CLR is better designed at a mixture of many small objects and large objects (LOH), which may plays a role in data structure using a lot of references/pointers, e.g. red-black trees.

2 Likes

Thank you for the great comments @TomTom

I was wondering, is there a proper way to use the BenchmarkDotNet package with GH C# script (while running a debug run test)? Or is it better to simplify the program, remove Rhinocommon and create a Benchmark only on the structure of the code that I want to test (Parallel nested loop)?

Yes but RhinoCommon won’t work standalone. So you probably need to implement some basic structs like Point3d.

Now, this is really up to your use-case. Simple answer, no there is no simple or default way of doing this.

First, you don’t need BenchmarkDotNet. You can write your own little benchmarking code. Usually you repeat measuring calling a function, and you force a ‘Garbage Collector’ to collect before that. The most simple way of implementing this, is using the StopWatch class.

There are plugins which work independent of Rhinocommon. In that case, it makes sense to design your system to work without Rhino. This also allows you to use any Nuget package with ease and simplifies Debugging, since you can run it standalone.

For learning, there is not need to depend yourself to Rhino. Just create a new Console App and tinker around it, until you understand it better.

And there are plugins relying on Rhinocommon. In this case as @gankeyu said, you cannot run Rhinocommon outside of Rhino, but you might be able to run BenchmarkDotNet in Rhino. Its uncharted terrain, but I would claim this is doable.

In the end, the point is to profile your code in order to figure out the true bottlenecks. Very often the performance issue are at totally unexpected places. Like unexpected copying of a large data collection…