C# fastest way to add a list of vertices to a mesh

I am developing a plugin, which creates a mesh based on depth scans coming in from a Kinect v2 sensor. Here is a basic proof of concept in action.

Since the Kinect spits out 512x424 depth pixels at 30 FPS, I need the meshing part to be as fast as possible. I therefore define all the faces only once and replace the list of vertices at every tick of the sensor:

    mesh.Vertices.AddVertices(vertices);

where `vertices` is an array of `Point3d`:

    Point3d[] vertices;

This still takes a relatively long time (around 17 ms on my quad-core machine).

Any ideas on how to speed this up? While researching the topic I found this post by @stevebaer.

This got me thinking - essentially what I need to do is change the Z value of each Point3d, the X and Y coordinates remain unchanged. Rather than replacing the entire list of vertices 30 times a second, can I just swap their respective Z coordinates in an unsafe way?

I was curious about this so did some benchmarks.

  System.Diagnostics.Stopwatch StopWatch;
  Mesh QuadMesh;
  Random Rnd;

Method 1: computes in 10-25 ms on my average ultrabook.

    int w = 512;
    int h = 424;
    if (QuadMesh == null || Reset)
    {
      StopWatch = new System.Diagnostics.Stopwatch();
      Rnd = new Random();
      // Config
      QuadMesh = new Mesh();
      QuadMesh.Vertices.Capacity = w * h;
      QuadMesh.Faces.Capacity = (w - 1) * (h - 1);
      QuadMesh.Vertices.UseDoublePrecisionVertices = false;

      // Vertices
      for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
          QuadMesh.Vertices.Add(x, y, 0);

      // Faces
      for (int y = 1; y < h; y++)
        for (int x = 1; x < w; x++)
          QuadMesh.Faces.AddFace(y * w + x, y * w + x - 1, (y - 1) * w + x - 1, (y - 1) * w + x);
    }


    StopWatch.Restart();
    for (int x = 0; x < w; x++)
    {
      for (int y = 0; y < h; y++)
      {
        int v = y * w + x;
        QuadMesh.Vertices.SetVertex(v, x, y, Rnd.NextDouble() * 10, false);
      }
    }
    Print(StopWatch.ElapsedMilliseconds.ToString());

    MeshOut = QuadMesh;

Method 2: updates the mesh in 8-15 ms. This method is a great deal more consistent, whereas the previous one fluctuates quite a lot.

  Point3f[] CurrentVertexList;

    int w = 512;
    int h = 424;
    if (QuadMesh == null || Reset)
    {
      StopWatch = new System.Diagnostics.Stopwatch();
      Rnd = new Random();
      // Config
      QuadMesh = new Mesh();
      QuadMesh.Vertices.Capacity = w * h;
      QuadMesh.Faces.Capacity = (w - 1) * (h - 1);
      QuadMesh.Vertices.UseDoublePrecisionVertices = false;

      CurrentVertexList = new Point3f[w * h];
      // Vertices
      for (int y = 0; y < h; y++)
        for (int x = 0; x < w; x++)
          CurrentVertexList[y * w + x] = new Point3f(x, y, 0);

      // Faces
      for (int y = 1; y < h; y++)
        for (int x = 1; x < w; x++)
          QuadMesh.Faces.AddFace(y * w + x, y * w + x - 1, (y - 1) * w + x - 1, (y - 1) * w + x);
    }


    for (int x = 0; x < w; x++)
    {
      for (int y = 0; y < h; y++)
      {
        int v = y * w + x;
        CurrentVertexList[v] = new Point3f(x, y, (float)Rnd.NextDouble() * 10);
      }
    }
    
    StopWatch.Restart();
    
    QuadMesh.Vertices.Clear();
    QuadMesh.Vertices.AddVertices(CurrentVertexList);
    Print(StopWatch.ElapsedMilliseconds.ToString());

    MeshOut = QuadMesh;

Method 3: replaces vertices one by one via the indexer. This was the slowest method at 25-40 ms.

    for (int x = 0; x < w; x++)
    {
      for (int y = 0; y < h; y++)
      {
        int v = y * w + x;
        QuadMesh.Vertices[v] = new Point3f(x, y, (float)Rnd.NextDouble() * 10);
      }
    }

So in this case Method 2 provided the most consistent fast results.

One very important thing to keep in mind is that the most taxing part of your pipeline is likely to be the display. We’ve found that it’s rare to get Grasshopper running at more than 20-30 FPS while Rhino’s viewports are drawing (regardless of computation complexity). When redraw is disabled, the framerate dramatically increases. This effect is massively amplified when working with NURBS too, though that should not be relevant here.
Of course running on better hardware helps, and so does avoiding 4K screens (in Rhino 6), as there’s currently no way to run a lower-resolution viewport (unfortunately).

An easy way to check your scenario is just to put a stopwatch in your update loop. For example, with Method 3 above I get the following:

[screenshot: timing readout]

As you can see the computation time is 33ms, but the time between updates is significantly larger at 214ms (total). This is because of both rendering the viewports (with the large mesh) and rendering the Grasshopper canvas and Rhino window.

If I disable rendering of the Rhino viewport I get:

[screenshot: timing readout with viewport rendering disabled]

This clearly suggests that about 75% of the total frame time goes to rendering the viewport, with, say, 15% going toward rendering the Grasshopper canvas and running the solver, and only about 10% being the actual mesh computation. This makes sense to me, because I’m running on an iGPU and a 4K display, and drawing a (new) mesh of 217,088 vertices every frame. (New in the sense that the GPU is updating the vertex list each frame.)

So, TL;DR: It’s likely that a great deal of your overhead isn’t in the computation but rather in the display.
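To make the “stopwatch in your update loop” idea concrete, here is a minimal sketch (plain .NET, no RhinoCommon): one stopwatch times just the mesh update, another measures the wall time between ticks, which includes viewport redraw. The class and names are my own, not from the posts above.

```csharp
using System;
using System.Diagnostics;

class FrameTimer
{
    readonly Stopwatch _compute = new Stopwatch();
    readonly Stopwatch _frame = Stopwatch.StartNew();

    public void Tick(Action updateMesh)
    {
        // Time since the previous tick: includes redraw, canvas, everything.
        long frameMs = _frame.ElapsedMilliseconds;
        _frame.Restart();

        // Time spent in just our own computation.
        _compute.Restart();
        updateMesh();
        long computeMs = _compute.ElapsedMilliseconds;

        Console.WriteLine($"compute: {computeMs} ms, frame: {frameMs} ms");
    }
}
```

In Grasshopper you would call `Tick` once per solution; the gap between `frameMs` and `computeMs` is the display overhead described above.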

Thanks @camnewnham!

Your tests confirm our experience as well. The fastest and most reliable way to replace vertices in a mesh is to do it all at once via Mesh.Vertices.AddVertices().
Measured with System.Diagnostics.Stopwatch, it updates in around 15-17 ms on my machine. Please refer to the original post for a screen grab showing the time spent by the logic on individual steps in the computation.

Our function looks like this:

    public static Mesh CreateQuadMesh(Mesh mesh, Point3d[] vertices, Color[] colors, int xStride, int yStride)
    {
        int xd = xStride;       // The x-dimension of the data
        int yd = yStride;       // The y-dimension of the data

        if (mesh.Faces.Count != (xStride - 2) * (yStride - 2)) // reset if the user trims the mesh
        {
            mesh = new Mesh();
            mesh.Vertices.Capacity = vertices.Length;      // Don't resize array
            mesh.Vertices.UseDoublePrecisionVertices = true;
            mesh.Vertices.AddVertices(vertices);

            for (int y = 1; y < yd - 1; y++)       // Iterate over y dimension
            {
                for (int x = 1; x < xd - 1; x++)   // Iterate over x dimension
                {
                    int i = y * xd + x;
                    int j = (y - 1) * xd + x;

                    mesh.Faces.AddFace(j - 1, j, i, i - 1);
                }
            }
        }
        else
        {
            mesh.Vertices.Clear();
            mesh.Vertices.UseDoublePrecisionVertices = true;
            mesh.Vertices.AddVertices(vertices);
        }

        if (colors.Length > 0) // Colors only provided if the mesh style permits
        {
            mesh.VertexColors.SetColors(colors);
        }
        return mesh;
    }

All faces are defined only once, or again when the user changes the extent of the scanned mesh. On each tick, we pass a new array of vertices and replace them all at once in the mesh definition. For whatever reason, this only works with double-precision vertices; using Point3f results in an invalid mesh.

The above logic is OK: it allows us to store a pre-calculated list of faces and replace just the vertices. What I would like to do, though, is optimize it even further. The only difference between ticks is essentially the Z coordinate of each vertex, which we get from the Kinect sensor. So rather than replacing the entire list of vertices, I would like to access the existing ones residing somewhere in memory and replace just the Z coordinate of each.

@stevebaer posted the following logic in another thread, which reads from the VertexPoint3fArray in an unsafe way.

My question is how to write to it.

    static unsafe double[][] FastVerts(Mesh mesh)
    {
      using (var meshAccess = mesh.GetUnsafeLock(false))
      {
        int arrayLength;
        Point3f* points = meshAccess.VertexPoint3fArray(out arrayLength);
        var double2DArray = new double[3][];
        double2DArray[0] = new double[arrayLength];
        double2DArray[1] = new double[arrayLength];
        double2DArray[2] = new double[arrayLength];
        for (int i = 0; i < arrayLength; i++)
        {
          double2DArray[0][i] = points->X;
          double2DArray[1][i] = points->Y;
          double2DArray[2][i] = points->Z;
          points++;
        }
        return double2DArray;
      }
    }
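For completeness, the write-side counterpart would look roughly like the following. This is an untested sketch: it assumes the mesh uses the single-precision array, and that `depths` holds one float per vertex in the same row-major order as the mesh (both are my assumptions, not from the original post).

```csharp
using System;
using Rhino.Geometry;

static class FastMeshWrite
{
    // Overwrites only the Z coordinate of each single-precision vertex.
    static unsafe void FastSetZ(Mesh mesh, float[] depths)
    {
        using (var meshAccess = mesh.GetUnsafeLock(true)) // true = writable
        {
            int arrayLength;
            Point3f* points = meshAccess.VertexPoint3fArray(out arrayLength);
            if (points == null) return; // mesh may lack a single-precision array

            int count = Math.Min(arrayLength, depths.Length);
            for (int i = 0; i < count; i++)
            {
                points->Z = depths[i]; // X and Y stay untouched
                points++;
            }
            mesh.ReleaseUnsafeLock(meshAccess);
        }
    }
}
```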

If you do this with the Rhino C++ API, it may be 4x to 10x faster than going through RhinoCommon, as you can replace all the vertices with a single memcpy operation that runs at around 3 GB/sec. To access the Rhino C++ API, I use Python to call a DLL created using a modified version of the Visual Studio plug-in creation flow. If you are interested in pursuing this approach, I can send you the details that I got from Dale at McNeel.

I used this method to improve my pointcloud import speed 250x compared to Rhino’s Import tool. What was a 1-hour import time for a 333M-point cloud with 14 GB of data is now 14 sec. The data is read from the text file, parsed into values (X, Y, Z, R, G, B for each line) and then stored in an xyz array and a colors array of 32-bit integers (0xaabbggrr). The xyz array stores the points (vertices in your case) as x1,y1,z1,x2,y2,z2… so they look like a sequence of Point3d values. This allows memcpy to copy them into the point array of the pointcloud without a separate step converting x,y,z to Point3d. The memcpy line looks like:

    memcpy(dest, src, bytes);

where dest points to the start of the pointcloud’s point array and src points to the start of the xyz array.
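RhinoCommon does not expose memcpy directly, but a rough C# analogue is possible by combining GetUnsafeLock() with Buffer.MemoryCopy. This is an untested sketch; it assumes the incoming `xyz` array is interleaved x1,y1,z1,x2,y2,z2,… floats and that Point3f is laid out as three consecutive 32-bit floats (verify that assumption before relying on it).

```csharp
using System;
using Rhino.Geometry;

static class FastMeshBlit
{
    // Bulk-copies an interleaved float array over the mesh's Point3f array.
    static unsafe void BlitVertices(Mesh mesh, float[] xyz)
    {
        using (var meshAccess = mesh.GetUnsafeLock(true))
        {
            int arrayLength;
            Point3f* points = meshAccess.VertexPoint3fArray(out arrayLength);
            if (points == null) return;

            long bytes = Math.Min(arrayLength * 3, xyz.Length) * sizeof(float);
            fixed (float* src = xyz)
            {
                // One bulk copy instead of a per-vertex loop.
                Buffer.MemoryCopy(src, points, bytes, bytes);
            }
            mesh.ReleaseUnsafeLock(meshAccess);
        }
    }
}
```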

I started with Python + rs, then went to Python + RhinoCommon + a DLL without access to Rhino functions, and now am using Python + a DLL with the C++ API. I get to use Python for very easy definition of the overall flow and then tap into the C++ API for fantastic speed. I started doing this just this week and am already making giant improvements in the performance of all my scripts.

Regards,
Terry.


Sounds very interesting, @Terry_Chappell. I would appreciate it if you could share the description from Dale.

My current workflow is fairly standard, I guess: C# + RhinoCommon + Grasshopper. Only recently did I discover the unsafe keyword and the more general concept of direct memory access. But if performance is key, maybe the C++ API is the way to go.

One downer is that even though the vertices can be updated in the geometry of the mesh very quickly, the mesh has to be replaced in the document to see the results, and this can be much slower. So you may not see much improvement over what you have now. But I think it is worth trying once.

My work has been on a Windows machine using Microsoft Visual Studio 2017. Do you have a Windows machine? If so, I could put together a DLL that includes all the C++ API calls, and you could call this from a Python script. This way you would not have to learn how to get Visual Studio running to make the DLL and test it out the first time. Then, if your testing pans out (or if you just want to anyway), you could go through the whole nine yards of getting Visual Studio working with the C++ API. It took me less than 30 minutes to go from a vanilla DLL to one with Rhino C++ API calls, but I already knew how to make DLLs with Visual Studio.

For a Mac, a dylib needs to be generated which should be doable but I have no experience in doing this.

Regards,
Terry.

I’d be interested in the details of how this is done as well actually… If possible? Sounds pretty useful!

Just some remarks. If you are using VS2017 to write C++ code, chances are high you are using the managed .NET framework just as with C#. That makes the speed quite equal. As others pointed out, you can access memory directly by using the unsafe compiler flag in C#. You can also invoke C++ DLLs with P/Invoke, just as RhinoCommon does. If you do it from IronPython, you get the same overhead as doing it in C#. I don't know why C++ should reach a 10x speed advantage under these conditions, but I would also like to find out, because I think speed comparisons are quite often not objective: it really is a matter of how you measure, and of which language someone is better at. I still don't deny that writing optimally performing code is simpler in C++ and that the language itself provides better performance per se; I just believe it's rather in the range of 1.5x to 2.0x at best.
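For reference, a minimal P/Invoke declaration of the kind being discussed might look like this. `FastMesh.dll` and `CopyVertices` are hypothetical names standing in for a native helper that does the bulk copy; they are not real libraries.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeMethods
{
    // Hypothetical native export: extern "C" void CopyVertices(float*, float*, long)
    [DllImport("FastMesh.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern unsafe void CopyVertices(float* dest, float* src, long bytes);
}
```

Each P/Invoke call carries a fixed marshalling cost, so this pays off when you make one bulk call per frame rather than one call per vertex.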

You make good points about C#. I know nothing about possible C# performance when interfacing with Rhino procedures. What I do know is that with the Rhino C++ API I can access Rhino data structures at 2 to 4 GB/sec, which allows me to import and export .OBJ meshes and .TXT pointclouds 20 to 250 times faster than Rhino’s Import/Export tool. So far no one has demonstrated that a C# script or plugin can import a 333M-point colored pointcloud file with 14 GB of data in 14 sec, or an 18M-face mesh file with 2 GB of data in 10 sec. These tasks take 1 hour and 200 sec respectively with Rhino Import/Export. Soon I will be posting the Python code and C++ API DLL for these, and you can have a go with C# to see what is possible. It would be very beneficial if you could show how C# can achieve a similar level of performance, as it could benefit the many Rhino users who employ C# in their custom GH components.

Regards,
Terry.

OK, I’ve played a bit with the unsafe access to the VertexPoint3fArray, and it does speed things up significantly. In my use case of 512x424 pixels, processing time goes from 17 ms to <1 ms. That’s great, but it also breaks the mesh in the process 🙂

As mentioned earlier, even though I’m inputting an array of Point3f as the vertex list, I still need to use double-precision vertices for the component to generate a valid mesh. It seems that operating directly on the VertexPoint3fArray overrides the double-precision flag.

Any ideas on how to fix this?

    mesh.Vertices.UseDoublePrecisionVertices = true;

    unsafe
    {
        using (var meshAccess = mesh.GetUnsafeLock(true))
        {
            int arrayLength;
            Point3f* points = meshAccess.VertexPoint3fArray(out arrayLength);
            for (int i = 0; i < arrayLength; i++)
            {
                points->X = vertices[i].X;
                points->Y = vertices[i].Y;
                points->Z = vertices[i].Z;
                points++;
            }
            mesh.ReleaseUnsafeLock(meshAccess);
        }
    }

Hi @mrhe,

So, setting UseDoublePrecisionVertices = false; before running your code will not work?

Some background: I added the GetUnsafeLock() method in Rhino 5/WIP times, when the Mesh class did not yet have complete double-precision vertex support. Right now, you can only get or set the single-precision vertex array through this method, and therefore UseDoublePrecisionVertices has to be set to false before usage. If you are so hard-pressed for speed (real-time visualization, etc.), you will likely be better off using the single-precision array anyway.

I added a request for VertexPoint3dArray() to our tracking system.

Is this explanation helpful?


EDIT: Also, very important! When using this method in a plug-in, you should always check that the return is not null.

    Point3f* points = meshAccess.VertexPoint3fArray(out arrayLength);
    if (points == null) DoSomethingElse();

because in the future there might be meshes without the single precision array.

Hi @piac. Thanks a lot for the thorough explanation. Unfortunately, setting UseDoublePrecisionVertices = false; results in an invalid mesh. It is not necessarily related to the unsafe access, since it also happens with my previous meshing logic. The only reason I switched to Point3d and UseDoublePrecisionVertices was that this magically resulted in valid meshes.

I don’t remember the exact error message I was getting with Point3f, but it was something about topology vertices not being aligned. The same code works after switching to Point3d, though.

@mrhe this needs some code to be tested. Can you also tell the exact error message, please?
My suspicion is that some vertices are extremely near, and make some faces invalid when converted to Point3f. But I cannot be sure without the message.

Does this method of unsafe access to the mesh in C# allow the geometry of a mesh in the doc to be changed without replacing the entire mesh? I thought this was never allowed in Rhino.

No, this method provides pointer access to the float array of a mesh that is not part of the document. Once the mesh is in the document you should not be directly modifying it as this breaks undo.
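For anyone landing here, the supported pattern is to duplicate the geometry, modify the copy, and swap it back in with ObjectTable.Replace(), which preserves undo. A rough sketch (untested; the RhinoCommon methods are real, but the helper itself and `meshId` are mine):

```csharp
using System;
using Rhino;
using Rhino.Geometry;

static class DocMeshUpdate
{
    // Updates a mesh already in the document without breaking undo.
    static void UpdateDocMesh(RhinoDoc doc, Guid meshId, Point3f[] newVertices)
    {
        var obj = doc.Objects.FindId(meshId);
        var mesh = (obj?.Geometry as Mesh)?.DuplicateShallow() as Mesh;
        if (mesh == null) return;

        // Mutate the copy, not the document object itself.
        mesh.Vertices.Clear();
        mesh.Vertices.AddVertices(newVertices);

        doc.Objects.Replace(meshId, mesh); // records an undo step
        doc.Views.Redraw();
    }
}
```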

So this is similar to the Rhino C++ API code I have been using. Is it possible in C# to use what you led me to use in C++,

    memcpy(dest, src, bytes);

where dest = mesh.m_V.Array() and src = the xyz array, with xyz = x1,y1,z1,x2,y2,z2…, so that the extra step of dealing with Point3f can be eliminated?

You said:

> should not be directly modifying it

I thought it was not possible to modify it. My attempts to directly modify the vertices of a mesh in the doc have all failed in C++, but the same code works on the geometry of a new mesh not yet added to the doc. Perhaps there is some way to modify the doc’s mesh geometry using void pointers or some other trick? Probably better for me not to know, but I must ask in case someone is doing this and I need to keep pace.

Yes; should not and can not are different things. It is not advisable, but you can const_cast a mesh in the doc and modify it. This is getting off topic of accessing vertices quickly from C#

I was just thinking that the fastest way to modify existing vertices in the mesh is to modify the existing vertices in the mesh. This seems on topic, no? Or are you saying this is not possible in C#, only in C++?

I’m sorry, I thought we were heading off into C++ techniques for const casting meshes in the document. I would recommend reading some articles online about the purpose of const in C++ which may help give a better understanding of the intent of some of our C++ SDK functions.

All of this is possible in both C++ and C#, and the speeds are relatively close to each other. Our import and export plug-ins haven’t traditionally taken performance as the top priority when they were developed; instead they try to examine and deal with all of the potential junk that may be coming in or going out. The point cloud import plug-in is written in C++.