Grasshopper deleting duplicate lines efficiently

Hi everyone, I attach a screenshot of a script which currently takes almost 5 hours to remove duplicates. I know that with such a large amount of data it will take a while, but I wondered if anyone had an idea of how to make it faster. Thanks, Bran

Did you try to avoid creating the duplicate lines from the beginning?

How many lines?

If you really expect someone to help you, please post a Rhino file with the lines or a Grasshopper file with the relevant inputs internalized…

128K points from the image. And 4.8 hours :exclamation: No thanks. When I have to do this (very rarely), I use Cull Duplicates (points) on the MidPt (Curve Middle) of each line.
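If you ever want the same idea in a script instead of components, here is a minimal GhPython-style sketch (assuming a flat list of Rhino.Geometry.Line called lines; the rounded-midpoint key is only a rough stand-in for what Cull Duplicates does and can miss near-duplicates whose midpoints straddle a grid boundary):

def cull_duplicate_lines(lines, tol=0.001):
    # keep the first line seen for each (rounded) midpoint, drop the rest
    seen = set()
    kept = []
    for ln in lines:
        mid = ln.PointAt(0.5)  # midpoint of a Rhino.Geometry.Line
        key = (round(mid.X / tol), round(mid.Y / tol), round(mid.Z / tol))
        if key not in seen:
            seen.add(key)
            kept.append(ln)
    return kept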


You haven’t given us a definition of what you mean by equality or any examples.

Is a line (segment?) a duplicate if it’s close but not quite, or are the duplicates exact down to the last decimal place? If the second (fuzzy duplicates), how do you decide which one to delete?

In any event, Grasshopper probably isn’t the best tool for this unless you just write a one-stage script component.

Suggestion:
Come up with a definition of equality. Phrase it as a >, <, equals comparison. If it were a test for point equality, pseudocode:

if pt1.x > pt2.x then return greater
if pt1.x < pt2.x then return lesser
if pt1.y > pt2.y then return greater
if pt1.y < pt2.y then return lesser
if pt1.z > pt2.z then return greater
if pt1.z < pt2.z then return lesser
return equal (because they’re exactly the same)

Sort the lines based on that definition, using an existing library (Python, C#, etc.).

Duplicates will now be adjacent in the list, and there may be runs of more than two in a row. Walk through the sorted list and delete the items you don’t want, making any related changes along the way (for example, merging two fuzzy duplicates when one has the correct connection to another element at one end and the other has the correct connection at the other end).
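A rough Python sketch of that sort-then-sweep step (illustrative only: it assumes each line is given as a pair of (x, y, z) endpoint tuples, and it relies on duplicates being exact or nearly exact, since a lexicographic sort is not guaranteed to keep loosely fuzzy duplicates adjacent):

# Each line is ((x1, y1, z1), (x2, y2, z2)); all names are illustrative.
def canonical(line):
    a, b = line
    return (a, b) if a <= b else (b, a)  # order endpoints so A->B equals B->A

def nearly_equal(l1, l2, tol):
    # compare all six coordinates of two canonical lines within a tolerance
    return all(abs(c1 - c2) <= tol
               for p1, p2 in zip(l1, l2)
               for c1, c2 in zip(p1, p2))

def dedupe(lines, tol=1e-9):
    ordered = sorted(lines, key=canonical)  # duplicates become adjacent
    kept = []
    for ln in ordered:
        if kept and nearly_equal(canonical(kept[-1]), canonical(ln), tol):
            continue  # adjacent duplicate, skip it
        kept.append(ln)
    return kept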

If it’s just lines or segments, I’ll bet you can get the duplicate processing down to seconds on reasonable hardware. I don’t know what you need to do in Rhino after you identify the duplicates, but this problem should be a cup of coffee tops, not a workday.

Difficult to see anything without a definition, but a flattened list at that scale is always going to be slow.

I would probably try to maintain local Voronoi clusters within their own branches and check against duplicates locally.
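In script terms that just means running whatever duplicate check you use once per branch instead of once over the whole flattened list (a hypothetical sketch, where branches stands in for a DataTree as a dict mapping a branch path to its list of lines):

def dedupe_per_branch(branches, dedupe_branch):
    # branches: dict of branch path -> list of lines in that branch
    # dedupe_branch: any function that removes duplicates from one flat list
    return {path: dedupe_branch(lines) for path, lines in branches.items()}

With only a few hundred lines per branch, each local check stays tiny, which is the point of keeping the clusters in their own branches.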

But as another user pointed out, the best use of time would be studying if you can avoid creating duplicate lines in the first place. The answer is almost always yes.

Alternatively, rather than “removing duplicate lines” you could do some work with indices and list/cull per Voronoi cluster as needed. So you just choose not to list all possible lines (if a repeatable pattern is observable, that is…)

An example would be if you had 10,000 cubes returned as a brep wireframe: you could list edges 1, 3, and 5 and just ignore the others. It is easier to “populate” lists of data going forward than to subtract/compare looking backward (usually).
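As a rough RhinoCommon sketch of that idea (a hypothetical helper, assuming a list of Rhino.Geometry.Brep; which edge indices you keep depends entirely on how your breps are built, so the (1, 3, 5) default is only the example indices mentioned above):

def unique_edges(breps, keep_indices=(1, 3, 5)):
    # emit only a fixed subset of edge curves per brep so edges shared with
    # neighbouring breps are never generated twice in the first place
    edges = []
    for brep in breps:
        for i in keep_indices:
            if i < brep.Edges.Count:
                edges.append(brep.Edges[i].DuplicateCurve())
    return edges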

I would reduce your Voronoi count to a manageable number like 200 for testing, focus on an algorithm or workflow that does not create duplicate lines and then scale up from there…

P.S. With the midpoints hidden, one short red line (mistakenly deleted) shows up when you bump the ‘Tolerance’ slider from 0.004 to 0.005.

P.P.S. This cone is only one unit in height (the default), so it makes sense that the tolerance is small?

At this point it seems a bit strange that you create Voronoi cells and continue using proximity links…

Once you have the lines inside and outside your brep, what happens next?