Super-super fast Point List to Array

OK, so we did the “Super-super fast point inclusion” thing, but now I need to find the fastest code for converting a List(Of Point3d) to a “blittable” Array (being compatible with Gpu processing).

The follwowing code is way too slow in copying the a List(Of Point3d) to blittable arrays. So how is the array handled in the fastest possible way with .NET? (remake from first row until gpu.For(...) )

   [GpuManaged]
    public Point3d GPU_SolveInstance()
    {
        // Prepare Point list as X, Y and Z arrays instead of Point List.
        int Cnt = InPts.Count;
        double[] x_arr = new double[Cnt];
        double[] y_arr = new double[Cnt];
        double[] z_arr = new double[Cnt];

        // Copy to (blittable) Array
        int i = 0;
        foreach (Point3d pt in m_Pts)
        {
            x_arr[i] = pt.X;
            y_arr[i] = pt.Y;
            z_arr[i] = pt.Z;
            i += 1;
        }

        double x_tmp = 0;
        double y_tmp = 0;
        double z_tmp = 0;

        //  ######################################
        //  Below is where blittable types matters, and this 
        //  code compiles and runs fine, but slow. 
        //  So, better/faster overall array handling is wanted : 

        // Add all axis values
        Action<int> op = (j =>
        {
            x_tmp += x_arr[j];
            y_tmp += y_arr[j];
            z_tmp += z_arr[j];
        });
        gpu.For(0, Cnt, op);

        //  ##############################


        // Average axis values
        Point3d cpt = new Point3d();
        cpt.X = x_tmp / Cnt;
        cpt.Y = y_tmp / Cnt;
        cpt.Z = z_tmp / Cnt;

        return cpt;
    }

This is BTW one of the nightmares with using a new language - learning the benefits and performance of different data types and processing.

// Rolf

1 Like

OK, to bump this thread up I can summarize what I know so far, and my question is about the fastest/cheapest way to convert a list of point3d (and vector3d) to arrays. More specifically:

Q1: Starting from the struct GH_Point3D, what is the absolut fastest way to convert a list of these structs into arrays instead?

###Nested Array Structure Preferred (not 2d)

Some array structures are preferred over others, and in this case nested array structures matches the Alea Gpu library API. Also, the array structure flipped “double[3][points.Count]” is preferred over the more intuitive “double[points.Count][3]” for xyz points.

Two dimensional arrays like “double[,]” are not useful. Gpu functions generally takes one dimensional arrays, for example the following two functions: Gpu.Sum(double[] arg) and Gpu.Average(double[] arg).

I have learned today that I can create comaptible structures by gradually building nested arrays, like so:

double[][] arrpoints_2d = new double[3][];
arrpoints_2d[0] = new double[m_Pts.Count];
arrpoints_2d[1] = new double[m_Pts.Count];
arrpoints_2d[3] = new double[m_Pts.Count];

which makes the 2nd dimension accessible as a “separate array”, and after copying the pointlist to the array (with a for-loop) I can then finally call gpu.Average like so:

double x = gpu.Average(arrpoints_2d[0]);	 // 2nd dim accepted
double y = gpu.Average(arrpoints_2d[1]);		
double z = gpu.Average(arrpoints_2d[2]);	    

But all this allocation and shuffling takes time. Disregard for now that this specific example is not meaningful for gpu.processing, but it demonstrates the need for conversion of lists to arrays which are very costly.

So how can this conversion be optimized from point lists to the above array format? Is it possible to access GH’s underlaying data structures more efficient than only the list format. Perhaps @DavidRutten would have some suggestions.

// Rolf

Eh, why are you using types from GH_IO.dll? That’s just a bunch of bridge types to help store RhinoCommon data without the need to reference RhinoCommon.dll

Well, I’m using List(Of Point3d)'s, whatever underlaying types that is based on.

I drilled down from what I thought came tumbling in through the inports in GrassHopper components, and I assumed that 3d points and 3d vectors are defined as structures of x, y and z doubles. But I’m not picky, if they are defined as something else, just let me know.

But most interesting is of course how to get them point lists into nested double[3][list.Count] with the absolute least possible processing effort. :slight_smile:

// Rolf

You’d almost always use Rhino.Geometry.Point3d or Rhino.Geometry.Vector3d. These are structs defined in RhinoCommon. In rare occasions you’ll need to deal with GH_Point or GH_Vector, which are classes defined by Grasshopper what provide an IGH_Goo implementation of the aforementioned RhinoCommon structs. They exist almost exclusively to make it possible for points and vectors to be stored in data trees, although GH_Point does also provide the functionality to do with referencing point objects from the Rhino document.

If you have a scripting component and you set the input to have a type hint of Point3d and an access value of list, then your RunScript method will be provided with a List<Point3d> argument:

The fastest way to go from a List<Point3d> to a Point3d[], is to call the .ToArray() method on said list.

Yes, I’m aware of that List<Point3d> (C#) and List(Of Point3d) (Vb:NET) is the same thing. But ToArray() returns

double[count,3]
and not nested arrays ([,] is a 2 dimensional array),
and not
double[3][count]
as stated here :

So my question remains.

// Rolf

double[][] array = new double[list.Count][];
for (int i = 0; i < list.Count; i++)
{
  Point3d p = list[i];
  array[i] = new double[] { p.X, p.Y, p.Z }; 
}

There’s no need to heavily optimise this unless you’re dealing with thousands upon thousands upon thousands of points, so one possible speedup might be to multi-thread the loop. Figure out how many processors are at your disposal, divide your point list into that many chunks and run them all at the same time.

I suspect that the most significant time hog in this process is the construction of all the secondary arrays. Type instantiation is very fast, but it still takes times and memory that you wouldn’t need to spend if you were using a two-dimensional array. Don’t get me wrong, I’m all for nested arrays, but if performance is the main guiding principle, then it’s something to keep in mind.

Thanks David! Your advice is well taken.

I’m processing from around 50.000 points to half a million. Or more. I realize I will have to look into .Net threading model for this.

Yes it is. Problem is, the Alea Gpu model relies heavily on simple arrays, it doesn’t take multi dimensional arrays.

// Rolf

Please do profile your code before you start adding optimisations. You’ll have to write proper profiling measurements anyway otherwise you will not be able to compare different versions of code. I recommend using System.Diagnostics.Stopwatch, it’s a very low overhead, very accurate way to measure the duration of some code.

1 Like

You’re absolutely right. I’m the last guy to dive into technical depth if I can get away with less. Been there, but not done that. :slight_smile:

But right now I’m evaluating strategies for when I actually need to speed things up. I’ve learned a lot in the last days, which means that I’m getting more and more capable when I hit the performance wall. I want to be able to go right through performance walls, when I encounter them. Otherwise the application I aim to build is at risk of not being practically useful.

So with this I’m learning how to shoot the moose before going out hunting the moose.

// Rolf

This array is flipped in the wrong direction (to be useful for Alea args, that is) and allocating also the size of the nested dimension, the last part in [3][list.Count], won’t compile.

This works though:

        double[][] points = new double[3][];
        points[0] = new double[m_Pts.Count];
        points[1] = new double[m_Pts.Count];
        points[2] = new double[m_Pts.Count];

But it’s not very fast.

// Rolf

That should actually be faster since you’re only constructing 4 arrays, not tens of thousands. You didn’t post the code that fills the values in these arrays. Is the performance measured for the entire process, instantiation only, or population only?

True. Allocating in this direction takes only 1ms. I had tested only in the other direction before, so yes, this is much faster. The copying code is similar to the code far above, but here’s the complete code:

    [GpuManaged]
    public static Point3d GPU_SolveInstance_R2(double[][] points)
    {
        var centroid = new Point3d();
        centroid.X = gpu.Average(points[0]);
        centroid.Y = gpu.Average(points[1]);
        centroid.Z = gpu.Average(points[2]);
        return centroid;
    }

    protected override void SolveInstance(IGH_DataAccess DA)
    {
        m_DA = DA;
        if ((InPts.Count == 0))
        {
            AddRuntimeMessage(0, "The input point list contains no data.");
            return;
        }

        long ticks = 0;
        runtimer = System.Diagnostics.Stopwatch.StartNew();

        // Create a nested array for the Gpu method
        double[][] points = new double[3][];
        points[0] = new double[m_Pts.Count];
        points[1] = new double[m_Pts.Count];
        points[2] = new double[m_Pts.Count];

        // Copy point list
        for (var i = 0; i < m_Pts.Count; ++i)
        {
            Point3d pt = m_Pts[i];
            points[0][i] = pt.X;
            points[1][i] = pt.Y;
            points[2][i] = pt.Z;
        }

        runtimer.Stop();
        ticks = runtimer.ElapsedMilliseconds;
        ticksList.Add("Time for malloc: " + ticks.ToString());

        // Restart Stopwatch for the Gpu processing.
        ticks = 0;
        runtimer = System.Diagnostics.Stopwatch.StartNew();

        DA.SetData(PortNo(OutPrt.C), GPU_SolveInstance_R2(points));

        runtimer.Stop();
        ticks = runtimer.ElapsedMilliseconds;
        ticksList.Add("Time for Gpu exec: " + ticks.ToString());
        DA.SetDataList(PortNo(OutPrt.Info), ticksList);
    }

I get 1ms resp ~7 ms, so I’m content with the array allocation speed of only 1ms.

Again, this is not a suitable case for using Gpu, only for testing of the data preparation.

// Rolf

Just a side note: The reason for separating the Gpu code into its own method is that Alea.Gpu injects some instrumentation code in order to automagically manage the memory and data transport back and fourth to the Gpu. This is restricted to only methods with the attribute [GpuManaged] and therefore the instrumentation code does not infest the rest of the code in the project.

// Rolf

Hi,

I intend to add new optimization algorithm to Grasshopper. Guide me, please.

best regard

I think the following thread contains some interesting info needed for very fast point lists/arrays.

// Rolf

Thank you.