Hi everyone, I've attached a screenshot of a script which currently takes almost 5 hours to remove duplicates. I know that with such a large amount of data it will take a while, but I wondered if anyone had an idea of how to make it faster. Thanks, Bran
Did you try to avoid the duplicate lines from the beginning?
How many lines?
If you really expect someone to help you, please post a Rhino file with the lines or a Grasshopper file with the relevant inputs internalized…
128K points from the image, and 4.8 hours? No thanks. When I have to do this (very rarely), I use Cull Duplicates (points) using the MidPt (Curve Middle) of each line.
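If you'd rather do it in a script component, here's a minimal GHPython sketch of that midpoint idea (the names `lines` and `tol` are just placeholders, and I'm assuming a flat list of Rhino.Geometry.Line inputs). Keep in mind that rounding midpoints into cells can miss near-duplicates that straddle a cell boundary; the Cull Duplicates component is more forgiving there.

```python
# Sketch only: keep the first line seen for each rounded midpoint "cell".
def cull_by_midpoint(lines, tol=0.001):
    seen, out = set(), []
    for ln in lines:
        m = ln.PointAt(0.5)                                # midpoint of the line
        key = (round(m.X / tol), round(m.Y / tol), round(m.Z / tol))
        if key not in seen:                                # first occurrence wins
            seen.add(key)
            out.append(ln)
    return out

a = cull_by_midpoint(lines, 0.004)   # set lookups keep this roughly linear
```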
You haven't given us a definition of what you mean by equality or any examples.
Is a line (segment?) a duplicate if it's close but not quite, or are the duplicates exact down to the last decimal place? If the second (fuzzy duplicates), how do you decide which one to delete?
In any event, Grasshopper probably isn't the best tool for this unless you just write a one-stage script component.
Suggestion:
Come up with a definition of equality. Phrase it as a >, <, equals comparison. If it were a test for point equality, pseudocode:
If pt1.x > pt2.x then return greater
if pt1.x < pt2.x then return lesser
if pt1.y > pt2.y then return greater
if pt1.y < pt2.y then return lesser
if pt1.z > pt2.z then return greater
if pt1.z < pt2.z then return lesser
return equal (because they're exactly the same)
Sort the lines based on that definition, using an existing library (Python, C#, etc.).
Duplicates will now be adjacent in the list, and there may be a series of more than two in a row. Walk through the sorted list and delete the items you don't want, making any related changes along the way (such as merging fuzzy duplicates when one of the pair connects properly to another element at one end and the other connects properly at the opposite end).
If it's just lines or segments, I'll bet you can get the duplicate processing down to seconds on reasonable hardware. I don't know what you need to do in Rhino after you identify the duplicates, but this problem should be a cup of coffee tops, not a workday.
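For what it's worth, a minimal Python sketch of that sort-then-sweep idea might look like this (assuming a list of Rhino.Geometry.Line and that duplicates share both endpoints to within some tolerance; `lines` and `tol` are placeholders):

```python
# Sketch only: build a sort key from the rounded, order-normalized endpoints,
# sort, then walk the list once and keep each key's first occurrence.
def dedupe_lines(lines, tol=1e-6):
    def key(ln):
        a, b = ln.From, ln.To
        pts = sorted([(round(a.X / tol), round(a.Y / tol), round(a.Z / tol)),
                      (round(b.X / tol), round(b.Y / tol), round(b.Z / tol))])
        return pts[0] + pts[1]          # 6-tuple compares x, then y, then z, per endpoint

    out, prev = [], None
    for ln in sorted(lines, key=key):   # O(n log n); duplicates end up adjacent
        k = key(ln)
        if k != prev:
            out.append(ln)
        prev = k
    return out
```

A hash set keyed on the same 6-tuple would skip the sort entirely, but the sorted version makes the "walk through and merge" step above easy to add.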
Difficult to see anything without a definition, but a flattened list at that scale is always going to be slow.
I would probably try to maintain local Voronoi clusters within their own branches and check for duplicates locally.
But as another user pointed out, the best use of time would be to study whether you can avoid creating duplicate lines in the first place. The answer is almost always yes.
Alternatively, rather than "removing duplicate lines" you could do some work with indices and list/cull per Voronoi cluster as needed. So you just choose not to list all possible lines (if a repeatable pattern is observable, that is…).
An example would be if you had 10,000 cubes returned as a brep wireframe: you could list edges 1, 3, and 5 and just ignore the others. It's easier to "populate" lists of data going forward than to subtract/compare looking backward (usually).
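Something along these lines, as a hypothetical GHPython sketch (`boxes` and the edge indices are placeholders for whatever pattern your geometry actually repeats):

```python
# Sketch only: pull a chosen subset of edge indices from each brep instead of
# generating every edge and culling duplicates afterwards.
wanted = (1, 3, 5)                                    # whichever indices your pattern needs
edges = []
for brep in boxes:
    for i in wanted:
        edges.append(brep.Edges[i].DuplicateCurve())  # standalone copy of that edge
a = edges
```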
I would reduce your Voronoi count to a manageable number like 200 for testing, focus on an algorithm or workflow that does not create duplicate lines, and then scale up from there…
P.S. With the midpoints hidden, one short red line (mistakenly deleted) shows up when you bump the "Tolerance" slider from 0.004 to 0.005.
P.P.S. This cone is only one unit in height (the default), so it makes sense that the tolerance is small?
At this point it seems a bit strange that you create Voronoi cells and then continue using proximity links…
Once you have the lines inside and outside your brep, what happens next?