A better approach replicating 'Curves in Curves'

Indeed that Method is the best way to do it IF your regions are closed (or closable within tolerance) AND coplanar (within tolerance). See the R5 SDK reference:

Other than that, this is a classic Clustering task using Region Containment as the criterion (kind of like ccx events are used for Clustering intersecting curves, or Pt-to-Pt Distance for Clustering points, etc.).

That said, Clustering can combine more than one criterion, meaning endless combos (for instance you can Cluster crvs that intersect only 2 times AND whose length is greater than X AND are open AND …).
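To make the idea concrete, here's a toy sketch of hard Clustering by a containment criterion in plain Python. Axis-aligned rectangles stand in for closed planar regions (this is illustrative only, not the RhinoCommon implementation):

```python
# Toy sketch: hard clustering by a containment criterion.
# Rectangles (xmin, ymin, xmax, ymax) stand in for closed planar regions.

def contains(a, b):
    """True if rectangle a strictly contains rectangle b."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return ax0 < bx0 and ay0 < by0 and ax1 > bx1 and ay1 > by1

def cluster_by_containment(rects):
    """Hard clustering: each rect lands in exactly one cluster."""
    sampled = [False] * len(rects)
    clusters = []
    for i, outer in enumerate(rects):
        if sampled[i]:
            continue
        cluster = [i]
        sampled[i] = True
        for j, inner in enumerate(rects):
            if not sampled[j] and contains(outer, inner):
                cluster.append(j)
                sampled[j] = True
        clusters.append(cluster)
    return clusters

rects = [(0, 0, 10, 10), (1, 1, 2, 2), (3, 3, 4, 4),
         (20, 20, 30, 30), (21, 21, 22, 22)]
print(cluster_by_containment(rects))  # -> [[0, 1, 2], [3, 4]]
```

Swapping `contains` for a different predicate (intersection count, length threshold, etc.), or a combination of several, gives you the "endless combos" mentioned above.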

Let me know if you want a C# that does that (it's rather fast as well).

…I see what you mean.

I don’t think I’ve explained what I’m trying to achieve particularly well. I’m basically ordering a very dense urban figure-ground. Because I know the majority of the mass is built space, I’m also confident that the inner ‘un-built’ spaces will have a smaller curve length.

So… when I said it’s not too critical, I probably should have said it’s not too critical in this particular scenario. However, I agree that a comparison between outside and inside curve length may throw up issues in other applications.

Thanks Qythium,

This sounds like a very promising solution, but I have to fess up: I’m only familiar with GH components and have no experience with RhinoCommon. Is the code complex to write / edit?

The principle is hopefully straightforward… let’s say thousands of polygons are randomly assigned different attributes: layer name, colour, etc. My goal is to make sure that all the inner polygons are assigned the same attributes as their parent ‘outer’ polygon. That’s it. They don’t need to be grouped in any other way apart from having the same attributes. So essentially it’s a (re)ordering exercise.
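For what it's worth, once the parent-of-each-inner-polygon mapping exists, the attribute part is trivial. A plain-Python sketch (all names hypothetical, not GH/Rhino API):

```python
# Hypothetical sketch: given an inner->outer parent mapping, copy the
# parent's attributes onto each child polygon.
attrs = {0: {"layer": "block_A", "colour": "red"},
         3: {"layer": "block_B", "colour": "blue"}}
parent = {1: 0, 2: 0, 4: 3}  # inner polygon index -> outer parent index

for child, p in parent.items():
    attrs[child] = dict(attrs[p])  # copy, don't alias the parent's dict

print(attrs[1]["colour"], attrs[4]["layer"])  # -> red block_B
```

The hard part of the exercise is building the `parent` mapping, i.e. the region-containment clustering discussed above.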

@qythium and @PeterFotiadis I would be really interested to know if a script component could help here, as the GH component route is proving extremely slow for the number of curves I’m currently testing.

Just so you know, all curves are closed and coplanar, and the inner polygons do not intersect the external polygon.

Curve In Curve Test 2.gh (108.2 KB)

DO NOT attempt to do Clustering without knowledge of coding.

Other than that … the only thing worth considering here is to implement some proper // (parallel) approach (big N of closed curves etc.). It doesn’t matter if there’s some improper data around (it will be rejected).

I’ll prepare a small demo on that matter soon. And given the opportunity maybe I’ll provide some other cases (ccx, distance etc.) that outline the whole hard Clustering concept (very common in engineering, mind).

PS: Hard means that any item belongs to exactly one Cluster.

Here we are. A classic hard Clustering based on one condition (Region Containment). No proper Curve checks are included, mind (meaning that the thing is 100% academic):

Clusters_CurveContainment_V1.gh (225.8 KB)

The def is WIP since this is another typical case that begs for some // stuff (try the 857-curves demo: Elapsed time goes to Mars). But what could be the best // approach? Answers: the Lord, District 9, North Pole.

More soon with 3 ways to skin the cat: // for no reason, // with reason, // because hope dies last.


Assuming the curve/polyline data isn’t too messy, a simpler approach might be to make boundary surfaces (which essentially handles the grouping for you):

180419_BoundarySurfaces.gh (14.4 KB)

This probably won’t scale too well (i.e. will be slow), but should be fairly predictable/stable.


WHAT AN IDIOT: maybe some espresso (and/or cigar) related thingy, maybe just stupidity

Get the fixed V1 (the // V1A soon).

Clusters_CurveContainment_V1_Fixed.gh (236.6 KB)

Hmm, the boundary surface method does actually seem to scale surprisingly well/fast, at least with this circles data set (as compared to the region containment method I mean, and with this case):


180419_BoundarySurfaces_PerformanceTest_00.gh (242.1 KB)


Indeed. But some // methods (WIP) give some signs of hope (time slashed by a factor of 10 [min] - 20 [max] for the moment).

My approach will be to create an early-exit condition before comparing curves for containment - see image:

If we consider special case C, then the algorithm will be valid for any curve, not only polylines.

The question is whether the curves are polylines or can be any arbitrary closed curves.

WIP - to be continued…


Indeed, that could broaden the usage of the thing (but this check is also expensive).

In the meantime, V2 works 100% faster (prior to implementing the 3 // Methods mentioned - no time found for that).

BTW cList is of type Curve (not Polyline), but dealing with curvy stuff is obviously slower. For instance, for roughly the same N of curves, the non-curvy collection yields ~3 times faster results (1600 vs 5600 ms).

Clusters_CurveContainment_V2.gh (231.8 KB)


The main bottleneck here is the NxN checking (the number of checks is O(n²)).
Using an RTree will speed things up dramatically, especially with a large number of curves.
Here is the updated method:
public void MakeClusters(List<Curve> cList, ref DataTree<Curve> curveClusters, ref DataTree<int> connClusters, int singleItemCluster)
{
  Plane plane = Plane.WorldXY;

  bool[] sampled = new bool[cList.Count];
  int branch = 0;
  double tol = RhinoDoc.ActiveDoc.ModelAbsoluteTolerance;

  // RTree improvement - start
  var boundingBoxes = cList.Select((objR, index) => new { index, objR }).ToDictionary(x => x.index, y => y.objR.GetBoundingBox(true));
  RTree tree = new RTree();
  for (int i = 0; i < boundingBoxes.Count; i++)
  {
    tree.Insert(boundingBoxes[i], i);
  }
  DataTree<Curve> curveClustersLocal = new DataTree<Curve>();
  DataTree<int> connClustersLocal = new DataTree<int>();

  for (int i = 0; i < cList.Count; i++)
  {
    if (sampled[i]) continue;
    Curve crvA = cList[i];
    var bboxA = boundingBoxes[i];
    tree.Search(bboxA, (sender, args) =>
    {
      int j = args.Id;
      if (i != j && !sampled[j])
      {
        Curve crvB = cList[j];
        RegionContainment rg = Curve.PlanarClosedCurveRelationship(crvA, crvB, plane, tol);
        if (rg == RegionContainment.BInsideA)
        {
          if (!sampled[i]) UpdateTrees(crvA, ref curveClustersLocal, ref connClustersLocal, ref sampled, i, branch, true);
          if (!sampled[j]) UpdateTrees(crvB, ref curveClustersLocal, ref connClustersLocal, ref sampled, j, branch, true);
        }
      }
    });
    branch++;
  }
  curveClusters = curveClustersLocal;
  connClusters = connClustersLocal;
  // RTree improvement - end

  RebuildTree<Curve>(ref curveClusters);
  RebuildTree<int>(ref connClusters);

  if (singleItemCluster == 2)
  {
    int bCount = curveClusters.BranchCount;
    for (int i = 0; i < sampled.Length; i++)
    {
      if (!sampled[i])
      {
        curveClusters.Add(cList[i], new GH_Path(bCount));
        connClusters.Add(i, new GH_Path(bCount));
        bCount++;
      }
    }
  }
}

If the shapes are polylines only, the code can be rewritten and would perform significantly faster.
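For polylines, the expensive general curve test could in principle be replaced by a plain even-odd (ray casting) point-in-polygon check on one vertex of the candidate inner polyline. A self-contained sketch of that classic test (not the RhinoCommon code, and assuming non-intersecting closed polylines):

```python
# Even-odd (ray casting) point-in-polygon test for closed polylines.

def point_in_polygon(pt, poly):
    """poly is a list of (x, y) vertices of a closed polygon."""
    x, y = pt
    inside = False
    n = len(poly)
    for k in range(n):
        x0, y0 = poly[k]
        x1, y1 = poly[(k + 1) % n]
        # does edge k cross the horizontal line through pt?
        if (y0 > y) != (y1 > y):
            # x-coordinate where the edge crosses that line
            cross = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            if x < cross:
                inside = not inside
    return inside

square = [(0, 0), (10, 0), (10, 10), (0, 10)]
print(point_in_polygon((5, 5), square))   # -> True
print(point_in_polygon((15, 5), square))  # -> False
```

One such test per candidate pair is far cheaper than a full planar-curve relationship computation, which is where the speed-up would come from.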

Good stuff I confess. Bravo.

Clusters_CurveContainment_V2A.gh (342.4 KB)

Yeah, I think that’s the main insight here - I’m sure @PeterFotiadis is aware of this, but optimizing an O(n²) algorithm even by a factor of 1000% becomes totally insignificant when you start scaling up the inputs, because the quadratic term very quickly swallows up any improvements made in linear time.

Using the examples above, even a fast check that works in 500 ms for 857 curves will take 7+ hours for a 200K curve dataset :scream:

Using boundary surfaces also appears to suffer from the same O(n²) factor.

Essentially any method using an RTree, however inefficient, will eventually outperform a method involving n² checks, because it scales approximately linearly (more precisely, about O(n log n)) with the size of the data.

I had a go at making the most basic / concise implementation I could (using Python of course :slight_smile: )
curve_clusters.gh (113.2 KB)

from Rhino.Geometry import *
from Rhino.RhinoDoc import ActiveDoc as rhdoc
tol = rhdoc.ModelAbsoluteTolerance

ids = range(len(crvs))
# initialize with dictionary of sets containing its own key
# i.e all curves in separate clusters
clusters = {i: set([i]) for i in ids}

rtree = RTree()
for i, crv in enumerate(crvs):
    # populate the RTree with bounding boxes
    rtree.Insert(crv.GetBoundingBox(True), i)


def check_containment(crvA, crvB):
    # checks if either A contains B or B contains A
    # relatively expensive, so further optimizations can be made here eg. for polylines
    rg = Curve.PlanarClosedCurveRelationship(crvA, crvB, Plane.WorldXY, tol)
    return (rg in (RegionContainment.AInsideB, RegionContainment.BInsideA))


def callback(sender, e):
    a, b = e.Id, e.IdB
    if a == b:
        return
    if b in clusters[a] or a in clusters[b]:
        return
    if check_containment(crvs[a], crvs[b]):
        clusters[a].add(b)
        clusters[b].add(a)


RTree.SearchOverlaps(rtree, rtree, 0.01, callback) # O(n)


# iterate through the clusters from the highest to lowest index
for i in reversed(ids):
    # still O(n) despite nested for loop, because inner loop is ~O(1)
    for j in list(clusters[i]):  # create new list to avoid changing the iterable
        # merge with sets of a greater index
        if j > i:
            try:
                clusters[i].update(clusters.pop(j))
            except KeyError:  # j has already been popped
                pass

mapping = [0] * len(crvs)
for k, vals in clusters.items():
    for v in vals:
        mapping[v] = k

num_clusters = len(clusters)



Okay it looks like your C# is ~2x faster :sweat_smile: ( as expected )
I wonder if there’s a performance difference between using RTree.Search versus RTree.SearchOverlaps due to overhead from all the callbacks…

Math.Pow(hmm,hmm) … that’s for the P thingy (what about the Truth Out There ?, he he)

This thread started as any other thread and is heading towards other things, I confess.

Must clear my mind from doing Panigale custom maps (a useless thing considering the ponies available as standard) and focus on the Truth Out There (a WIP thread exploring the truth out there).



Great example there, bookmarked :clap:

Another improvement, in the special case where we know the curves do not intersect, is to check whether curveB.PointAtStart is inside curveA.
The code inside tree.Search would look like:

tree.Search(bboxA, (sender, args) =>
{
  int j = args.Id;
  if (i != j && !sampled[j])
  {
    Curve crvB = cList[j];
    Point3d ptStart = crvB.PointAtStart;
    var pointContainment = crvA.Contains(ptStart);
    if (pointContainment == PointContainment.Inside)
    {
      if (!sampled[i]) UpdateTrees(crvA, ref curveClustersLocal, ref connClustersLocal, ref sampled, i, branch, true);
      if (!sampled[j]) UpdateTrees(crvB, ref curveClustersLocal, ref connClustersLocal, ref sampled, j, branch, true);
    }
  }
});

This shortens the execution time almost by half.

It’s a question of trusting the data (as always). But other than that, I see that you are a man who walks the walk (as is Qythium … if we forget that little detail, he he) > bravo again.

Clusters_CurveContainment_V2B.gh (341.3 KB)

BTW: It’s funny that even a primitive data-validation check increases the Elapsed time by about 10-15%.

You know what? In fact we don’t need an RTree at all. Just BBoxes and some candidate-BBox questions (but that is what an RTree does anyway [plus the fact that it does the N*N comparisons for you]). Rounding the BBox domains (user-controlled option: say to 2 decimals etc.) helps as well.
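The "just BBoxes" idea amounts to a cheap necessary-condition filter: an axis-aligned bounding-box containment test rejects most pairs before any expensive curve test runs. A hypothetical plain-Python sketch:

```python
# Cheap candidate filter: bbox containment is a necessary condition for
# region containment, so it can prune pairs before the expensive check.

def bbox_contains(a, b, tol=0.0):
    """True if bbox a could contain bbox b (necessary, not sufficient)."""
    ax0, ay0, ax1, ay1 = a
    bx0, by0, bx1, by1 = b
    return (ax0 - tol <= bx0 and ay0 - tol <= by0
            and ax1 + tol >= bx1 and ay1 + tol >= by1)

boxes = [(0, 0, 10, 10), (1, 1, 2, 2), (8, 8, 12, 12)]
candidates = [(i, j) for i in range(len(boxes)) for j in range(len(boxes))
              if i != j and bbox_contains(boxes[i], boxes[j])]
print(candidates)  # -> [(0, 1)]
```

Only the surviving `candidates` pairs would go on to the real containment test; the rounding trick mentioned above would be applied to the bbox coordinates before comparison.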

See: Mesh Inclusion Error and the useBox magic option. Not 100% the same thing, but close. In fact I was ready to use a trick like this … but then you did the RTree thingy and the rest is history.

More in V3 (and some // stuff) … provided that Valentino can deliver some decent results in the Texas MotoGP (for inspiration). V4 in the meantime would answer the following critical question: given 1M plates full of Swiss cheese and 1Z rats circling around, how can we cluster the clouds in the sky on a per-rat-colour basis?
