RTree Point3dKNeighbors help

rawitscher-torres · August 15, 2019, 9:16am

Hi guys,

I am trying to implement Point3dKNeighbors() in the RTree class, but I dont seem to be understanding how it works, to bad is not such as straight forword to implement as Search() . There are 2 main problems that I am having happening.

When I change num to be larger than 1, I get the following error:

Requested more items than the quantity present in tree. (line: 0)
which I dont understand why its happening, I assume that this method internally will create an RTree with all the points to search.

Even if I assign num to 1, the return value, which are the indexes of the needle points is 0 but that index correspods to a point that is far away from the hay point. And here I ask another question why does this function ask for a collection of hay points? why not a single hay point to search from?

// hay = a series of points to search from
// needle = points to search for

int num = 1;

IEnumerable<int[]> t = RTree.Point3dKNeighbors(hay, needle, num);

foreach(var i in t)
{
  int [] index = i;

  for (int j = 0; j < index.Length; j++)
  {
    Print(index [j].ToString());
  }
}

Any clues on how to use this method properly would be great!

menno · August 15, 2019, 10:47am

Hi, I hope the demonstration below is useful. The hay is the overall collection of points and the needle(s) are the points to search. For each needle, num nearest neighbors are found.

protected override RunCommand(RhinoDoc doc, RunMode mode)
{
  Point3dList pts = new Point3dList();
  var r = new Random();
    // create 100 random points in the cube [0,0,0-1,1,1]
  for (int i = 0; i < 100; ++i)
  {
    pts.Add(r.NextDouble(), r.NextDouble(), r.NextDouble());
  }

  // define a number of needles to search nearest K neighbors for
  Point3d[] needles = new Point3d[]
  {
    new Point3d(0, 0, 0),
    new Point3d(1, 0, 0),
    new Point3d(0, 1, 0),
    new Point3d(0, 0, 1),
    new Point3d(1, 1, 0),
    new Point3d(1, 0, 1),
    new Point3d(0, 1, 1),
    new Point3d(1, 1, 1),
    new Point3d(0.5, 0.5, 0.5)

  };
  // perform the search. Each integer array consists of the 5 nearest neighbors of the
  // needle.
  int[][] found = RTree.Point3dKNeighbors(pts, needles, 5).ToArray();
  doc.Objects.AddPoints(needles);
  doc.Objects.AddPointCloud(pts);
  for (var i = 0; i < found.Length; i++)
  {
    int[] set = found[i];
    for (int j = 0; j < set.Length; ++j)
    {
      doc.Objects.AddLine(needles[i], pts[set[j]]);
    }
  }

  return Result.Success;
}

rawitscher-torres · August 15, 2019, 12:37pm

Ok Great, I see what was the problem, here is my version just for the sake of it:

 private void RunScript(List<Point3d> hay, List<Point3d> needles, ref object A)
  {
    IEnumerable<int[]> found = RTree.Point3dKNeighbors(hay, needles, 1);

    List<Point3d> result = new List<Point3d>();
    foreach (var item in found)
    {
      int[] data = item;
      for (int j = 0; j < data.Length; ++j)
      {
        result.Add( hay[data[j]]);
      }
    }
    
    A = result;
  }

But I do have a question, I also tested RTree.Search() but both are slower than the GH component Closest Points… I wonder why

menno · August 15, 2019, 1:05pm

~~There is overhead because your code must be compiled at runtime. Also, if I’m not mistaken, the GH code uses its own spatial searching algorithm and datastructures.~~

piac · August 15, 2019, 9:40pm

Ehm, not really. The overhead is entirely in the creation of the spacial structure, and this creates a larger advantage later. The creation itself is “slow”. While for small needles count, the GH algorithm choice is faster (in terms of milliseconds), as more points are requested, things change dramatically. GH simply performs a linear search over each needle point.

As needles and hay grow, the RTree.Point3dKNeighbors method quickly becomes faster than the Grasshopper components – and O(N^2) vs O( N*log N) so at that. For tens of thousands of points, this is so overwhelmingly faster that it’s hard to measure the improvement, because the CPs component simply takes too long*.

However, the RTree creation is costly, so for small numbers you see the linear search performing radically faster in terms of few milliseconds. I guess one could switch between the two depending on the amount of points searched.

Here, 4 minutes+ vs 16 seconds.

* If I change the above amount to 100’000 points, the CPs component above is expected to take about 20 minutes, while your C# (I tested) takes 45 seconds – on my system. As you can see, RTree.Point3dKNeighbors is written to crunch bigger amounts of points, while not consuming too much memory, such as when targeting point clouds and similar. For small amounts, a linear search will be faster.

Here the definition for your own testing.
actual-test.gh (10.3 KB)

Petras_Vestartas · August 18, 2019, 1:56am

What is a common fast method to search closest points by a given distance?

Dancergraham · August 18, 2019, 10:27am

Search sphere maybe?

https://developer.rhino3d.com/api/RhinoCommon/html/M_Rhino_Geometry_RTree_Search_3.htm

piac · August 19, 2019, 9:49am

What is “fast” depends from many additional conditions. A method that uses the RTree but is pre-built is RTree.Point3dClosestPoints.

rawitscher-torres · August 19, 2019, 11:33am

Hi Giulio,

Yep, I forgot about that very important detail and your example clearly shows it. The Closest Points component in Grasshopper will, in every case outperform RTree Closest Points if its a linear search O(n) but the magic happens when the search becomes O(n2) and this is what my initial example was proving.

@Petras_Vestartas if your search is linear, your fastest option will always be a traditional CP’s search, without any special data strcutures behind

rawitscher-torres · August 20, 2019, 7:20am

For the record, in terms of speed / performance I tested a search of 2 million points by comparing Rhinocommon’s RTree.Point3dClosestPoints to Accord.NET’s KDTree.Nearest and the comparison is just absurd. RTree took nearly 2 hours to complete the calculation while KDTree just 47.6 seconds!. I would say that probably even a few seconds less, because I had to internally do some data structure conversions to implement it.

piac · August 20, 2019, 9:06am

It would be great if you attached a repeatable test when mentioning something performance-related.

When you mention 2 million, do you mean 2 million (proverbial) haystack points, 2 million needles or 2 million divided among the two?

I general, as mentioned above, RTree construction often is the worst part of RTree usage, so you get the most from RTree searches where, ideally, you’d have “not so many” points in the haystack to be searched, and “a lot” of needles. There are other data structures that perform this with even slightly different patterns, especially for degenerate cases (where many points are at the same location, for example). See this:

https://www.bigocheatsheet.com/

arendvw · August 20, 2019, 9:37am

Can you show your inputs?

If I recall correctly, the closest point algorithm uses octrees to compute the closest point.

There’s different inputs for both. There’s no way of telling what your screenshot is doing, but in my previous experiments there was not such an enormous difference between the two.

I’m pretty convinced you’re running into performance issues other than testing RTrees vs KDtrees. (For example testing the C# input/output casting performance, which can be horrible for large numbers)

rawitscher-torres · August 20, 2019, 7:06pm

Yeah, I did not attach the file because the KdTree stuff is inside a library that I am developing within VS.

Sorry for not being clearer, what I mean was that there where 1 million hay points and 1 million needles in my test.

But the code for the KDTree is as follows:

  protected override void SolveInstance(IGH_DataAccess DA)
        {
            GH_Structure<GH_Number> _PointCloud = new GH_Structure<GH_Number>();
            GH_Structure<GH_Number> _testPoints = new GH_Structure<GH_Number>();
            int _num = 0;

            DA.GetDataTree(0, out _PointCloud);
            DA.GetDataTree(1, out _testPoints);
            DA.GetData(2, ref _num);

            DataTree<Point3d> result = SharpKDTree.Knearest(_PointCloud, _testPoints, _num);

            DA.SetDataTree(0, result);
        }

@arendvw here I take the opportunity to answer your question regarding the inputs. KDTree obviously will not take a Point3d because that type only lives within the Rhino world. It will ask for a more “generic” .NET type as input, so in this case a double [][] So I have to convert from a DataTree<Point3d> to DataTree<double> and then to a double [][] and to display the result in Rhino I convert the output in to Point3d again. But of course, I could only output the indexes if I wanted to, to save a couple of seconds.

Continuing with the rest of the code:

 /// <summary>
        /// Find K-nearest points from a collection of points to a cloud of points to search
        /// </summary>
        /// <param name="PointCloud"></param>
        /// <param name="testPoints"></param>
        /// <param name="num"></param>
        /// <returns>For every testPoint it will return K-neighbours </returns>
        ///     
        public static DataTree<Point3d> Knearest(GH_Structure<GH_Number> PointCloud, GH_Structure<GH_Number> testPoints, int num)
        {

            DataTree<Point3d> output = new DataTree<Point3d>();

            // Conversions between data structures 
            GH_Number[][] pointCloudTemp = PointCloud.GH_StructureToJaggedArray();

            double[][] pCloud = Utilities.ConvertGH_NumberToDouble(pointCloudTemp);

            GH_Number[][] testPointsTemp = testPoints.GH_StructureToJaggedArray();

            double[][] tPoints = Utilities.ConvertGH_NumberToDouble(testPointsTemp);

            KDTree<int> tree = KDTree.FromData<int>(pCloud);



            for (int i = 0; i < tPoints.Length; i++)
            {


                    GH_Path path = new GH_Path(i);
           
                    // Actually use KDTree's Nearest Neighbour Search
                    KDTreeNodeCollection<KDTreeNode<int>> neighbours = tree.Nearest(tPoints[i], num);

                    List<double[]> neighbourNodes = new List<double[]>();
                    for (int k = 0; k < neighbours.Count; k++)
                    {


                        KDTreeNode<int> node = neighbours[k].Node;
                        neighbourNodes.Add(node.Position);


                    }

    
                    foreach (var item in neighbourNodes)
                    {
                        double[] d = item;


                        output.Add(new Point3d(d[0], d[1], d[2]), path);

                    }
            }
            return output;
        }

So yeah, its pretty straight forward I am just implementing the Acord.NET KDTree.

http://accord-framework.net/docs/html/T_Accord_Collections_KDTree_1.htm#!

I am not sure if the GH ClosestPoints uses an Octree or not , but I highly doubt it from what I understood from Giulio’s comment above.

piac · August 20, 2019, 8:05pm

Thanks; I think there would definitely be some space for optimizations in the current RTree implementation.

OK, I’ll take your word for this right now. I would have liked to experiment a little, but it would take too long to reproduce this just for the sake of it right now.

Today it does not AFAICT, but there’s no fixed limitation regarding its implementation. @DavidRutten knows more.

rawitscher-torres · August 20, 2019, 8:18pm

I can package it up for you in a C# scripting component if you are really interested in taking a look.

DavidRutten · August 20, 2019, 8:19pm

True, building the tree takes longer than a single closest point search, so it only really makes sense to do this if you can re-use the tree lots of times.

piac · August 20, 2019, 8:33pm

Don’t worry; unfortunately I am too busy on another project right now – it will be cool to see this when you publish these results with everyone else.

rawitscher-torres · August 20, 2019, 8:48pm

Ok, cool, but what do you mean by publishing the results with everyone else?

arendvw · August 20, 2019, 10:11pm

I’ve created a quick implementation of both rtree (rhino’s) and the kdtrees (accord’s)

Code is here
GHA/Grasshopper files are here

Two different sets of points (100k points each).

@piac @rawitscher-torres There’s something freakish with the rtree methods when searching for a low amount of neighbours. (I’ve not experienced this before with the Rtree.Search methods).

It somehow speeds up dramatically when searching for more neighbours.

Speeds up 10x when searching for 5 neighbours each

… things seem to even out after 10 neighbours

Searching for a minimum of 5 points (and then looking for the n closest) seems speed things up quite a bit.

rawitscher-torres · August 21, 2019, 7:43am

Yeah its a good point that you are making with the RTree and I think its because of what @DavidRutten said

I am pretty sure that internally the method constructs a new RTree class per point its searching from. And I also think yout KDTree implementation has different performance due to your usage of LINQ + Lambda expressions, they tend to slow things down the road. But of course your version is sweet and short.

http://www.davejsaunders.com/2017/05/06/memory-leak-
lambdas.html#targetText=Lambda%20expressions%20are%20a%20great,end%20up%20on%20the%20heap.

I have still not done some comparisons myself, but its an ongoing topic with developers.
Try testing your code for 1,000,000 and see how it goes, would be interesting to see.

Topic		Replies	Views
Using rtree Scripting	5	4736	September 20, 2013
Python - RhinoList.PointCloudKNeighbors Grasshopper	6	468	September 11, 2020
Rtree and nearest neighbour searches in general (c#) Rhino Developer	3	1148	September 21, 2016
Field of data points Grasshopper Developer	17	1073	December 18, 2019
Matching equal point sets Rhino Developer	6	367	October 25, 2020

RTree Point3dKNeighbors help

Related topics