RTree Point3dKNeighbors help

Hi guys,

I am trying to implement Point3dKNeighbors() in the RTree class, but I dont seem to be understanding how it works, to bad is not such as straight forword to implement as Search() . There are 2 main problems that I am having happening.

  1. When I change num to be larger than 1, I get the following error:

Requested more items than the quantity present in tree. (line: 0)
which I dont understand why its happening, I assume that this method internally will create an RTree with all the points to search.

  1. Even if I assign num to 1, the return value, which are the indexes of the needle points is 0 but that index correspods to a point that is far away from the hay point. And here I ask another question why does this function ask for a collection of hay points? why not a single hay point to search from?

// hay = a series of points to search from
// needle = points to search for

int num = 1;

IEnumerable<int[]> t = RTree.Point3dKNeighbors(hay, needle, num);

foreach(var i in t)
{
  int [] index = i;

  for (int j = 0; j < index.Length; j++)
  {
    Print(index [j].ToString());
  }
}

Any clues on how to use this method properly would be great!

Hi, I hope the demonstration below is useful. The hay is the overall collection of points and the needle(s) are the points to search. For each needle, num nearest neighbors are found.

protected override RunCommand(RhinoDoc doc, RunMode mode)
{
  Point3dList pts = new Point3dList();
  var r = new Random();
    // create 100 random points in the cube [0,0,0-1,1,1]
  for (int i = 0; i < 100; ++i)
  {
    pts.Add(r.NextDouble(), r.NextDouble(), r.NextDouble());
  }

  // define a number of needles to search nearest K neighbors for
  Point3d[] needles = new Point3d[]
  {
    new Point3d(0, 0, 0),
    new Point3d(1, 0, 0),
    new Point3d(0, 1, 0),
    new Point3d(0, 0, 1),
    new Point3d(1, 1, 0),
    new Point3d(1, 0, 1),
    new Point3d(0, 1, 1),
    new Point3d(1, 1, 1),
    new Point3d(0.5, 0.5, 0.5)

  };
  // perform the search. Each integer array consists of the 5 nearest neighbors of the
  // needle.
  int[][] found = RTree.Point3dKNeighbors(pts, needles, 5).ToArray();
  doc.Objects.AddPoints(needles);
  doc.Objects.AddPointCloud(pts);
  for (var i = 0; i < found.Length; i++)
  {
    int[] set = found[i];
    for (int j = 0; j < set.Length; ++j)
    {
      doc.Objects.AddLine(needles[i], pts[set[j]]);
    }
  }

  return Result.Success;
}
2 Likes

Ok Great, I see what was the problem, here is my version just for the sake of it:

 private void RunScript(List<Point3d> hay, List<Point3d> needles, ref object A)
  {
    IEnumerable<int[]> found = RTree.Point3dKNeighbors(hay, needles, 1);

    List<Point3d> result = new List<Point3d>();
    foreach (var item in found)
    {
      int[] data = item;
      for (int j = 0; j < data.Length; ++j)
      {
        result.Add( hay[data[j]]);
      }
    }
    
    A = result;
  }

But I do have a question, I also tested RTree.Search() but both are slower than the GH component Closest Points… I wonder why

There is overhead because your code must be compiled at runtime. Also, if I’m not mistaken, the GH code uses its own spatial searching algorithm and datastructures.

1 Like

Ehm, not really. The overhead is entirely in the creation of the spacial structure, and this creates a larger advantage later. The creation itself is “slow”. While for small needles count, the GH algorithm choice is faster (in terms of milliseconds), as more points are requested, things change dramatically. GH simply performs a linear search over each needle point.

As needles and hay grow, the RTree.Point3dKNeighbors method quickly becomes faster than the Grasshopper components – and O(N^2) vs O( N*log N) so at that. For tens of thousands of points, this is so overwhelmingly faster that it’s hard to measure the improvement, because the CPs component simply takes too long*.

However, the RTree creation is costly, so for small numbers you see the linear search performing radically faster in terms of few milliseconds. I guess one could switch between the two depending on the amount of points searched.

Here, 4 minutes+ vs 16 seconds.

* If I change the above amount to 100’000 points, the CPs component above is expected to take about 20 minutes, while your C# (I tested) takes 45 seconds – on my system. As you can see, RTree.Point3dKNeighbors is written to crunch bigger amounts of points, while not consuming too much memory, such as when targeting point clouds and similar. For small amounts, a linear search will be faster.

Here the definition for your own testing.
actual-test.gh (10.3 KB)

3 Likes

What is a common fast method to search closest points by a given distance?

Search sphere maybe?

https://developer.rhino3d.com/api/RhinoCommon/html/M_Rhino_Geometry_RTree_Search_3.htm

What is “fast” depends from many additional conditions. A method that uses the RTree but is pre-built is RTree.Point3dClosestPoints.

Hi Giulio,

Yep, I forgot about that very important detail and your example clearly shows it. The Closest Points component in Grasshopper will, in every case outperform RTree Closest Points if its a linear search O(n) but the magic happens when the search becomes O(n2) and this is what my initial example was proving.

@Petras_Vestartas if your search is linear, your fastest option will always be a traditional CP’s search, without any special data strcutures behind

1 Like

For the record, in terms of speed / performance I tested a search of 2 million points by comparing Rhinocommon’s RTree.Point3dClosestPoints to Accord.NET’s KDTree.Nearest and the comparison is just absurd. RTree took nearly 2 hours to complete the calculation while KDTree just 47.6 seconds!. I would say that probably even a few seconds less, because I had to internally do some data structure conversions to implement it.

1 Like

It would be great if you attached a repeatable test when mentioning something performance-related.

When you mention 2 million, do you mean 2 million (proverbial) haystack points, 2 million needles or 2 million divided among the two?

I general, as mentioned above, RTree construction often is the worst part of RTree usage, so you get the most from RTree searches where, ideally, you’d have “not so many” points in the haystack to be searched, and “a lot” of needles. There are other data structures that perform this with even slightly different patterns, especially for degenerate cases (where many points are at the same location, for example). See this:

https://www.bigocheatsheet.com/

Can you show your inputs?

If I recall correctly, the closest point algorithm uses octrees to compute the closest point.

There’s different inputs for both. There’s no way of telling what your screenshot is doing, but in my previous experiments there was not such an enormous difference between the two.

I’m pretty convinced you’re running into performance issues other than testing RTrees vs KDtrees. (For example testing the C# input/output casting performance, which can be horrible for large numbers)

Yeah, I did not attach the file because the KdTree stuff is inside a library that I am developing within VS.

Sorry for not being clearer, what I mean was that there where 1 million hay points and 1 million needles in my test.

But the code for the KDTree is as follows:

  protected override void SolveInstance(IGH_DataAccess DA)
        {
            GH_Structure<GH_Number> _PointCloud = new GH_Structure<GH_Number>();
            GH_Structure<GH_Number> _testPoints = new GH_Structure<GH_Number>();
            int _num = 0;

            DA.GetDataTree(0, out _PointCloud);
            DA.GetDataTree(1, out _testPoints);
            DA.GetData(2, ref _num);

            DataTree<Point3d> result = SharpKDTree.Knearest(_PointCloud, _testPoints, _num);

            DA.SetDataTree(0, result);
        }

@arendvw here I take the opportunity to answer your question regarding the inputs. KDTree obviously will not take a Point3d because that type only lives within the Rhino world. It will ask for a more “generic” .NET type as input, so in this case a double [][] So I have to convert from a DataTree<Point3d> to DataTree<double> and then to a double [][] and to display the result in Rhino I convert the output in to Point3d again. But of course, I could only output the indexes if I wanted to, to save a couple of seconds.

Continuing with the rest of the code:

 /// <summary>
        /// Find K-nearest points from a collection of points to a cloud of points to search
        /// </summary>
        /// <param name="PointCloud"></param>
        /// <param name="testPoints"></param>
        /// <param name="num"></param>
        /// <returns>For every testPoint it will return K-neighbours </returns>
        ///     
        public static DataTree<Point3d> Knearest(GH_Structure<GH_Number> PointCloud, GH_Structure<GH_Number> testPoints, int num)
        {

            DataTree<Point3d> output = new DataTree<Point3d>();

            // Conversions between data structures 
            GH_Number[][] pointCloudTemp = PointCloud.GH_StructureToJaggedArray();

            double[][] pCloud = Utilities.ConvertGH_NumberToDouble(pointCloudTemp);

            GH_Number[][] testPointsTemp = testPoints.GH_StructureToJaggedArray();

            double[][] tPoints = Utilities.ConvertGH_NumberToDouble(testPointsTemp);

            KDTree<int> tree = KDTree.FromData<int>(pCloud);



            for (int i = 0; i < tPoints.Length; i++)
            {


                    GH_Path path = new GH_Path(i);
           
                    // Actually use KDTree's Nearest Neighbour Search
                    KDTreeNodeCollection<KDTreeNode<int>> neighbours = tree.Nearest(tPoints[i], num);

                    List<double[]> neighbourNodes = new List<double[]>();
                    for (int k = 0; k < neighbours.Count; k++)
                    {


                        KDTreeNode<int> node = neighbours[k].Node;
                        neighbourNodes.Add(node.Position);


                    }

    
                    foreach (var item in neighbourNodes)
                    {
                        double[] d = item;


                        output.Add(new Point3d(d[0], d[1], d[2]), path);

                    }
            }
            return output;
        }

So yeah, its pretty straight forward I am just implementing the Acord.NET KDTree.

http://accord-framework.net/docs/html/T_Accord_Collections_KDTree_1.htm#!

I am not sure if the GH ClosestPoints uses an Octree or not , but I highly doubt it from what I understood from Giulio’s comment above.

Thanks; I think there would definitely be some space for optimizations in the current RTree implementation.

OK, I’ll take your word for this right now. I would have liked to experiment a little, but it would take too long to reproduce this just for the sake of it right now.

Today it does not AFAICT, but there’s no fixed limitation regarding its implementation. @DavidRutten knows more.

I can package it up for you in a C# scripting component if you are really interested in taking a look.

True, building the tree takes longer than a single closest point search, so it only really makes sense to do this if you can re-use the tree lots of times.

Don’t worry; unfortunately I am too busy on another project right now – it will be cool to see this when you publish these results with everyone else.

Ok, cool, but what do you mean by publishing the results with everyone else?

I’ve created a quick implementation of both rtree (rhino’s) and the kdtrees (accord’s)

Code is here
GHA/Grasshopper files are here

Two different sets of points (100k points each).

@piac @rawitscher-torres There’s something freakish with the rtree methods when searching for a low amount of neighbours. (I’ve not experienced this before with the Rtree.Search methods).

It somehow speeds up dramatically when searching for more neighbours.

Speeds up 10x when searching for 5 neighbours each

… things seem to even out after 10 neighbours

Searching for a minimum of 5 points (and then looking for the n closest) seems speed things up quite a bit.

1 Like

Yeah its a good point that you are making with the RTree and I think its because of what @DavidRutten said

I am pretty sure that internally the method constructs a new RTree class per point its searching from. And I also think yout KDTree implementation has different performance due to your usage of LINQ + Lambda expressions, they tend to slow things down the road. But of course your version is sweet and short.

http://www.davejsaunders.com/2017/05/06/memory-leak-
lambdas.html#targetText=Lambda%20expressions%20are%20a%20great,end%20up%20on%20the%20heap.

I have still not done some comparisons myself, but its an ongoing topic with developers.
Try testing your code for 1,000,000 and see how it goes, would be interesting to see.