Group points by distance in c#

Baris · May 6, 2019, 3:38am

Hi everyone,

I tried to cluster points by distance as the Point groups component does.
It is just for learning.

I am able to assign the points that are close enough to eachother to a branch, I got stuck assign the far away ones.

I think I maybe need to get first all distances and then compare against the treshhold- but I am not completly sure how to do this and if this would be the right way?

My attempt so far is this:

  private void RunScript(List<Point3d> pts, double tresh, ref object Indices, ref object Distances, ref object Groups)
  {
    //set up the trees to store the data
    var distances = new DataTree<double>();
    var groups = new DataTree<Point3d>();
    var indices = new DataTree<int>();

    //loop through every possible combination to get all the possible distances
    for (int i = 0; i < pts.Count; i++)
      for (int j = i + 1;j < pts.Count; j++)
      {
        double d = pts[i].DistanceTo(pts[j]);
        dists.Add(d);
        if (d < tresh) //add points and indices to tree if they are close enough together
        {
          groups.Add(pts[i], new GH_Path(i));
          groups.Add(pts[j], new GH_Path(i));
          indices.Add(i, new GH_Path(i));
          indices.Add(j, new GH_Path(i));
        }
      }

    Indices = indices;
    Distances = dists;
    Groups = groups;

  }

I am quite sure there is a better way, maybe using LINQ or a Hashset, but at the moment I would be ok with any/ a slow solution.

I hope someone could help and I described the problem so anybody can understand.

Regards!

File:20190505_GroupByDistance.gh (5.4 KB)

DavidRutten · May 6, 2019, 9:14am

LINQ is mostly syntactic sugar, it will allow you to write down your logic in a different way, but the cases where it outperforms just straight up loop code are few and far between.

HashSets will help you quickly find identical points, it will not help with finding nearby points.

Ignoring the possible optimisations for the time being, you first have to come up with a working approach. I suggest something along these lines:

Given a set of points P.
Pick and remove one point from P, put in into a group G and mark it as active p_x*.
For every remaining point p_x in P, measure the distance to all active points in G, and if one of those distances is less than your threshold, remove that point from P and add it to some temporary list T.
Mark all points in G inactive.
Move all points from T into G and mark them active.
Repeat until no points in P are found close enough to any points in G.

Example:

P=\{p_0, p_1, p_2, p_3, p_4,…, p_n\}, T=\{\}, G=\{\}
P=\{p_1, p_2, p_3, p_4,…, p_n\}, T=\{\}, G=\{p_0*\}
P=\{p_1, p_3,…, p_n\}, T=\{p_2, p_4,…\}, G=\{p_0*\}
P=\{p_1, p_3,…, p_n\}, T=\{p_2, p_4,…\}, G=\{p_0\}
P=\{p_1, p_3,…, p_n\}, T=\{\}, G=\{p_0, p_2*, p_4*,…\}
P=\{p_1, p_3,…, p_n\}, T=\{\}, G=\{p_0, p_2, p_4,…, p_k\}

Once you have distributed all points in P into one or more G collections and the results are correct, you can then start worrying about speeding it up.

Baris · May 7, 2019, 2:24am

Hi David,

thanks a lot for the detailed explanation and the manual!

I mentioned LINQ cause I remembered there was a distinct() method.
But as with the Hashset, just aplicable to group same values as I understand now.

I"ll come back in a few days after studying it in detail.Hopefully with some working code.

Just wanted to say a big thank you in the meantime.

Baris · May 8, 2019, 1:55am

Whilst I was searching another thing I found by coincindence a solution in this Thread: