Can a "cull duplicate objects" component be available in GH2?

In the mockup example below, a “cull duplicate objects” component is needed: the beam lattice contains a lot of duplicate beams, so the actual number of distinct beams is not 2000.


cube array.gh (10.1 KB)

This component should be fast (n·log(n)) and robust to the numerical approximations of floating-point computation.

“Cull duplicate object” components already exist (TT Toolbox or Milkbox), but they are either not available for Mac (TT Toolbox) or do not run in n·log(n) (Milkbox).

For certain types of data (numbers, points, vectors, colours) the comparison will not be difficult and you can get away with a single tolerance value. For more complicated types it becomes a far more specific comparison with potentially loads of options.
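For the simple case, a single-tolerance cull can be sketched as follows. This is only an illustration in plain Python, not the way any existing component works: the function name `cull_duplicate_points` and the grid-hashing approach are assumptions made for this example. Each point is placed in a grid cell of size `tol`, so only the neighbouring cells need to be checked.

```python
import math
from itertools import product


def cull_duplicate_points(points, tol=1e-6):
    """Keep a point only if no previously kept point lies within tol of it.

    Hypothetical helper for illustration. Points are tuples of floats.
    """
    grid = {}   # cell index -> kept points that fall in that cell
    kept = []
    for p in points:
        cell = tuple(math.floor(c / tol) for c in p)
        duplicate = False
        # Two points closer than tol are at most one cell apart per axis,
        # so checking the 3^d surrounding cells is sufficient.
        for offset in product((-1, 0, 1), repeat=len(cell)):
            neighbour = tuple(c + o for c, o in zip(cell, offset))
            if any(math.dist(p, q) <= tol for q in grid.get(neighbour, ())):
                duplicate = True
                break
        if not duplicate:
            grid.setdefault(cell, []).append(p)
            kept.append(p)
    return kept


pts = [(0, 0, 0), (0, 0, 0.001), (1, 0, 0), (1, 0, 0), (0, 0, 0.5)]
print(cull_duplicate_points(pts, tol=0.01))
```

Note that the result depends on input order: the first point of each cluster is the one that survives.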

For example line segments. Are lines that go in opposite directions still duplicates? What if you compare a line to a linear, two-segment polyline? They look the same, same end-points, but the polyline has an additional point in the middle. Are these two objects duplicates?
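The direction question alone already needs an option. A minimal sketch, assuming lines are given as pairs of point tuples; the names `line_key`, `cull_duplicate_lines` and `ignore_direction` are made up for this example. Endpoints are rounded onto a tolerance grid (a simplification: two endpoints that straddle a rounding boundary can still end up with different keys), and the pair is optionally sorted so that reversed lines share a key.

```python
def line_key(line, tol=1e-6, ignore_direction=True):
    """Hashable key for a line segment given as (start, end) point tuples."""
    a, b = (tuple(round(c / tol) for c in pt) for pt in line)
    if ignore_direction and b < a:
        a, b = b, a          # canonical order: direction no longer matters
    return (a, b)


def cull_duplicate_lines(lines, tol=1e-6, ignore_direction=True):
    """Keep the first line seen for every distinct key."""
    seen = set()
    kept = []
    for line in lines:
        key = line_key(line, tol, ignore_direction)
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


lines = [
    ((0, 0, 0), (1, 0, 0)),
    ((1, 0, 0), (0, 0, 0)),   # same segment, opposite direction
    ((0, 0, 0), (0, 1, 0)),
]
print(len(cull_duplicate_lines(lines, tol=0.001)))                          # reversed line culled
print(len(cull_duplicate_lines(lines, tol=0.001, ignore_direction=False)))  # reversed line kept
```

The line-versus-polyline case is not handled here at all; it would need yet another option.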

What about an arc and a nurbs-curve that has the same shape? The derivatives of those curves are definitely different, even though the shapes are duplicates. And this just gets more and more complicated as the types become more complex. Do mesh quads vs. two-triangle n-gons count as duplicates? Is the order of faces in a brep relevant? Should texture mapping be taken into account? How about validity? Two breps can be identical within tolerance while one is considered valid and the other invalid. What about UserStrings and other meta-data?

I don’t see a single component doing all that. And the runtime will be whatever it’s going to be. It might be as quick as O(n) for when no tolerances are involved, probably O(n·log(n)) for stuff that can be compared using a kD-tree, maybe O(n^2) for worst-case comparisons of very complex data types.
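To make the three regimes concrete, here is a rough sketch in plain Python. The function names are invented for this example, and a sorted sweep over numbers stands in for the kD-tree case, since it has the same n·log(n) flavour in one dimension.

```python
def cull_exact(values):
    """No tolerance: hashing gives O(n) expected time. Keeps first occurrences."""
    return list(dict.fromkeys(values))


def cull_sorted(values, tol):
    """Numbers with a tolerance: sort, then sweep once. O(n log n)."""
    kept = []
    for v in sorted(values):
        if not kept or v - kept[-1] > tol:
            kept.append(v)
    return kept


def cull_pairwise(items, is_same):
    """Arbitrary comparison function: every item against every kept item. O(n^2)."""
    kept = []
    for item in items:
        if not any(is_same(item, k) for k in kept):
            kept.append(item)
    return kept


print(cull_exact([3, 1, 3, 2, 1]))
print(cull_sorted([1.0, 1.0005, 2.0, 0.9998], tol=0.001))
print(cull_pairwise([1, -1, 2, -2, 3], lambda a, b: abs(a) == abs(b)))
```

The pairwise version is the only one that works with an arbitrary "are these the same?" predicate, which is why complex types tend to end up there.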

The idea of a single generic component doing all culling operations is probably not applicable. Having one “cullDuplicate” component per type would probably cover most cases.

For instance, cullDuplicatePoints, cullDuplicateLines, cullDuplicateVector, cullDuplicateColors, …

In most cases, I suspect this would be enough. I suppose that for an average user (1) the type of object to process is known before the component is applied and (2) that type is simple. Also, the parameters of each component would be easy to understand if it is specific.

There might be use cases where the type of object is complex and this solution would not apply. In those cases, a generic component would be nice to have. But from my user’s point of view, I would accept having to solve such a case with a scripting solution (one reason, as you mentioned, is the complexity of the parameters).

Postscript: if I am right, a GH project is a set of directed acyclic graphs (DAGs). In the example above, there is one DAG and all 2000 lines in the BoxArray originate from the same single “Line” component in the DAG. That points to a solution for this kind of setting. In this situation, we could solve the generic problem by stating that two objects A and B are the “same” if

  1. When you backtrack the DAG from A and B, A and B have a common ancestor component X
  2. the set of geometric transformations that goes from X to A is the “same” as the set of geometric transformations that goes from X to B

In such a “cullDuplicateTransformation” component, you do not compare “objects”; you compare the transformations that go from X to A and from X to B. However, I have no idea if that would simplify the component design.
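A rough sketch of what comparing transformations could look like, in plain Python with 4x4 matrices as nested lists. Everything here (`translation`, `compose`, `cull_duplicate_transformations`) is hypothetical and only illustrates the idea; it assumes every object comes from the same source object X and that the chain of transformations from X to each object is available.

```python
IDENTITY = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]


def translation(dx, dy, dz):
    """4x4 translation matrix."""
    m = [row[:] for row in IDENTITY]
    m[0][3], m[1][3], m[2][3] = dx, dy, dz
    return m


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def compose(transforms):
    """Collapse a chain of transforms (listed in order of application) into one matrix."""
    result = IDENTITY
    for t in transforms:
        result = matmul(t, result)
    return result


def cull_duplicate_transformations(chains, tol=1e-6):
    """Return the indices of chains whose composite matrix has not been seen before."""
    seen = set()
    kept = []
    for index, chain in enumerate(chains):
        m = compose(chain)
        key = tuple(round(v / tol) for row in m for v in row)
        if key not in seen:
            seen.add(key)
            kept.append(index)
    return kept


chains = [
    [translation(1, 0, 0), translation(0, 2, 0)],
    [translation(0, 2, 0), translation(1, 0, 0)],  # same result, different path
    [translation(1, 0, 0)],
]
print(cull_duplicate_transformations(chains, tol=0.001))
```

One limitation of this sketch: it cannot detect duplicates that come from two different source objects, or from a symmetry of X itself, because only the matrices are compared.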

As David Rutten explained, you will need different methods, implemented as C# components, for different geometry types, each handling the specific features of that type.
Here is an example of a C# component for curves:
RemoveDuplicateCurves.gh (246.3 KB)

It uses code from a previous post, which includes some explanation of the algorithm used:

It removes all duplicate curves, but it can easily be modified to keep one copy and remove the rest of the duplicates.