Yeah, I think that’s the main insight here - I’m sure @PeterFotiadis is aware of this, but optimizing an O(n²) algorithm even by a constant factor of ten becomes totally insignificant once you start scaling up the inputs, because the quadratic term very quickly swallows any improvements made in linear time.

Using the examples above, even a fast check that runs in 500 ms for 857 curves would take 7+ hours on a 200K-curve dataset.
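The extrapolation is just the quadratic scaling applied to the measured timing (assuming the runtime is dominated by the n² pairwise checks):

```
# Back-of-the-envelope extrapolation: 0.5 s for 857 curves,
# assuming total time grows with n^2.
base_n, base_t = 857, 0.5  # curves, seconds
n = 200000
t = base_t * (float(n) / base_n) ** 2
print("%.1f hours" % (t / 3600))  # roughly 7.6 hours
```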

Using boundary surfaces also appears to suffer from the same O(n²) factor.

Essentially *any* method using an RTree, however inefficient its implementation, will eventually outperform a method involving n² pairwise checks, because it scales approximately linearly with the size of the data.
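The principle doesn’t depend on RhinoCommon’s RTree specifically - any spatial index that only pairs up geometries whose bounding boxes are near each other avoids the all-pairs blowup. Here’s a toy illustration using a uniform grid (not an RTree, just the same pruning idea) in plain Python:

```
from collections import defaultdict
from itertools import combinations

def candidate_pairs(boxes, cell=1.0):
    # boxes: list of (xmin, ymin, xmax, ymax)
    # Bin each box into the grid cells it touches; only boxes sharing
    # a cell become candidate pairs for the expensive geometric check.
    grid = defaultdict(list)
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        for gx in range(int(x0 // cell), int(x1 // cell) + 1):
            for gy in range(int(y0 // cell), int(y1 // cell) + 1):
                grid[(gx, gy)].append(i)
    pairs = set()
    for members in grid.values():
        pairs.update(combinations(sorted(members), 2))
    return pairs

# 1000 small, well-separated boxes: brute force would test
# 1000 * 999 / 2 = 499500 pairs; the grid produces zero candidates.
boxes = [(i * 10.0, 0.0, i * 10.0 + 1.0, 1.0) for i in range(1000)]
print(len(candidate_pairs(boxes)))  # 0
```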

I had a go at making the most basic / concise implementation I could (using Python, of course).

curve_clusters.gh (113.2 KB)

```
from Rhino.Geometry import *
from Rhino.RhinoDoc import ActiveDoc as rhdoc

tol = rhdoc.ModelAbsoluteTolerance
ids = range(len(crvs))

# initialize with a dictionary of sets, each containing its own key
# i.e. all curves start in separate clusters
clusters = {i: set([i]) for i in ids}

rtree = RTree()
for i, crv in enumerate(crvs):
    # populate the RTree with bounding boxes
    rtree.Insert(crv.GetBoundingBox(True), i)

def check_containment(crvA, crvB):
    # checks if either A contains B or B contains A
    # relatively expensive, so further optimizations can be made here e.g. for polylines
    rg = Curve.PlanarClosedCurveRelationship(crvA, crvB, Plane.WorldXY, tol)
    return rg in (RegionContainment.AInsideB, RegionContainment.BInsideA)

def callback(sender, e):
    a, b = e.Id, e.IdB
    if a == b:
        return
    if b in clusters[a] or a in clusters[b]:
        return
    if check_containment(crvs[a], crvs[b]):
        clusters[a].add(b)
        clusters[b].add(a)

RTree.SearchOverlaps(rtree, rtree, 0.01, callback)  # O(n)

# iterate through the clusters from the highest to lowest index
for i in reversed(ids):
    # still O(n) despite the nested for loop, because the inner loop is ~O(1)
    for j in list(clusters[i]):  # copy to a list to avoid mutating the iterable
        # merge with sets of a greater index
        if j > i:
            try:
                clusters[i].update(clusters.pop(j))
            except KeyError:  # j has already been popped
                pass

mapping = [0] * len(crvs)
for k, vals in clusters.items():
    for v in vals:
        mapping[v] = k

num_clusters = len(clusters)
```
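The merge pass at the end can be tested in isolation with plain sets (no Rhino needed) - given the symmetric links the callback builds up, iterating from the highest to the lowest index collapses them into one set per cluster. A minimal standalone run with hand-made links (0-1 and 1-2 linked, 3 on its own):

```
# Same merge logic as above, fed with fake containment links.
clusters = {0: {0, 1}, 1: {1, 0, 2}, 2: {2, 1}, 3: {3}}
ids = range(len(clusters))

for i in reversed(ids):
    for j in list(clusters[i]):  # copy to avoid mutating while iterating
        if j > i:
            try:
                clusters[i].update(clusters.pop(j))
            except KeyError:  # j was already merged away
                pass

print(clusters)  # {0: {0, 1, 2}, 3: {3}}
```

Going from high to low indices is what makes the `try/except` safe: any index popped at step `i` is greater than `i`, so it has already had its own turn in the outer loop.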