Inefficient selection script

hey, I want to select duplicate lines within a certain tolerance in Rhino and wrote the following script for that:

import rhinoscriptsyntax as rs

toleranz = rs.GetReal("tolerance in mm", 0.1, 0.001)

object_ids = rs.GetObjects("pick curves", rs.filter.curve)


def find_lines_with_matching_endpoints(tolerance, object_ids):
    linien_ids = object_ids

    # ids of curves that were already matched to a duplicate
    already_found_list = []

    for i, linie_1_id in enumerate(linien_ids):
        linie_1_start, linie_1_end = rs.CurveStartPoint(linie_1_id), rs.CurveEndPoint(linie_1_id)

        for j in range(i + 1, len(linien_ids)):
            if linien_ids[j] not in already_found_list:
                linie_2_start, linie_2_end = rs.CurveStartPoint(linien_ids[j]), rs.CurveEndPoint(linien_ids[j])

                match_count = 0
                counter = 0

                for p1, p2 in [
                        (linie_1_start, linie_2_start),
                        (linie_1_start, linie_2_end),
                        (linie_1_end, linie_2_start),
                        (linie_1_end, linie_2_end)]:
                    if rs.Distance(p1, p2) <= tolerance:
                        match_count += 1
                    counter += 1
                    # the first two pairs both test linie_1_start; if neither
                    # matched, these two lines cannot be duplicates
                    if counter == 2 and match_count == 0:
                        break

                # exactly two pair matches: start/start + end/end, or crossed
                if match_count == 2:
                    rs.SelectObject(linie_1_id)
                    rs.SelectObject(linien_ids[j])
                    already_found_list.append(linie_1_id)
                    already_found_list.append(linien_ids[j])


if object_ids:  # GetObjects returns None if the user cancels
    find_lines_with_matching_endpoints(toleranz, object_ids)

now I want to feed some 4000 lines into this abomination, and to nobody's surprise it crashes 9 out of 10 times. could anyone help me? I'm not really sure where to even start optimizing python code like this


The function doesn’t return anything.
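You could collect the matches in a list and return it; rs.SelectObjects then takes the whole list in one call instead of one call per object. A rough sketch (same names as your script, loop body elided):

def find_lines_with_matching_endpoints(tolerance, object_ids):
    found = []
    # ... same pairwise loop as above, but instead of the two
    # rs.SelectObject(...) calls:
    #     found.append(linie_1_id)
    #     found.append(linien_ids[j])
    return found

duplicates = find_lines_with_matching_endpoints(toleranz, object_ids)
if duplicates:
    rs.SelectObjects(duplicates)  # one selection call for everything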

Surely there’s a native de-duping tool in Grasshopper?

this is in the Python script editor, not in Grasshopper; my company prefers pure Python scripts. did I post this in the wrong category?

I guess to get more concrete: how could I select all of the curves at once after I have them in my list? I already flag lines that I have encountered, and I skip the tolerance check if the first endpoint is already a dud.

I deleted my first post, and unfortunately I cannot undelete it anymore. That must have changed here.

In my first post I was pointing to two things: pre-filtering by line direction, and avoiding the square root in DistanceTo. But these are likely minor optimisations for 4000 curves. I see two bigger problems.

First, rs.CurveStartPoint takes a string ID and extracts the point from it on every call. This is likely one of the culprits here, as you said. You probably want to use the rs.coerce functionality to extract each curve's underlying RhinoCommon geometry once.

Second, if linien_ids[j] not in already_found_list is a hidden linear scan, and the more duplicates there are, the worse it gets. Instead, create an array of n booleans where each index corresponds to one curve, and you get rid of this hidden loop. Think of it as a filter mask.
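A rough sketch of both ideas combined, i.e. coercing the geometry once up front and using a boolean mask instead of the list lookup (untested, so treat it as a sketch, not a drop-in replacement):

import rhinoscriptsyntax as rs

def find_duplicates(tolerance, object_ids):
    # coerce every id to its underlying RhinoCommon curve once, up front
    curves = [rs.coercecurve(oid) for oid in object_ids]
    ends = [(c.PointAtStart, c.PointAtEnd) for c in curves]

    n = len(object_ids)
    found = [False] * n  # filter mask: found[k] flips once curve k is matched

    result = []
    for i in range(n):
        s1, e1 = ends[i]
        for j in range(i + 1, n):
            if found[j]:
                continue  # O(1) mask lookup instead of scanning a list
            s2, e2 = ends[j]
            same = s1.DistanceTo(s2) <= tolerance and e1.DistanceTo(e2) <= tolerance
            flipped = s1.DistanceTo(e2) <= tolerance and e1.DistanceTo(s2) <= tolerance
            if same or flipped:
                result.append(object_ids[i])
                result.append(object_ids[j])
                found[i] = found[j] = True
    return result

ids = rs.GetObjects("pick curves", rs.filter.curve)
if ids:
    dups = find_duplicates(0.1, ids)
    if dups:
        rs.SelectObjects(dups)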

I think your biggest problem is using RhinoScript. That is going to add a huge amount of overhead compared to using RhinoCommon directly.

To get the list of lines you would use GetObjectList, with ObjectEnumeratorSettings set to return only the lines that are selected.

The length of each line is a property of the line, so subtracting line1.Length from line2.Length leaves only the pairs of lines that are (nearly) the same length. This should narrow things down to a much smaller list of line pairs, provided your line lengths are not all the same. You can then proceed to comparing the endpoint distances within that list of line pairs.

In RhinoCommon, line1.From and line1.To are the start and end points of line1.
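Something along these lines (a sketch only; it assumes the curves are true line curves and are already selected in the document):

import Rhino
import scriptcontext as sc

settings = Rhino.DocObjects.ObjectEnumeratorSettings()
settings.ObjectTypeFilter = Rhino.DocObjects.ObjectType.Curve
settings.SelectedObjectsFilter = True  # only enumerate the selected objects

tol = 0.1
lines = []
for obj in sc.doc.Objects.GetObjectList(settings):
    geo = obj.Geometry
    if isinstance(geo, Rhino.Geometry.LineCurve):
        lines.append((obj.Id, geo.Line))  # Line struct with From/To/Length

for i in range(len(lines)):
    id1, ln1 = lines[i]
    for j in range(i + 1, len(lines)):
        id2, ln2 = lines[j]
        # cheap pre-filter: two lines whose endpoints each match within tol
        # can differ in length by at most 2 * tol, so skip everything beyond
        if abs(ln1.Length - ln2.Length) > 2 * tol:
            continue
        same = ln1.From.DistanceTo(ln2.From) <= tol and ln1.To.DistanceTo(ln2.To) <= tol
        flipped = ln1.From.DistanceTo(ln2.To) <= tol and ln1.To.DistanceTo(ln2.From) <= tol
        if same or flipped:
            sc.doc.Objects.Select(id1)
            sc.doc.Objects.Select(id2)

sc.doc.Views.Redraw()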

thanks for the reply tom, your tips really helped to speed things up, now it solves in seconds instead of minutes. it didn't crash btw, apparently it just needed up to 15 minutes to calculate... found that out while going on break and letting it do its thing. your 1st comment was and is deleted btw, I just read it quickly :stuck_out_tongue:

oh, is rs really that bad for performance? I mean I got it down to a workable pace now, but SelDup is instant even for thousands of lines, so there's still room for improvement. I'll try converting it to RhinoCommon and see how that changes the performance :slight_smile:

Another optimization is to presort the lines list by the sum of the xyz coordinates of the endpoints. Then, as you're going through the iteration, once the coordinate sum of line2 exceeds that of line1 by more than the matching margin, you can immediately break out of the inner for loop, since every later line2 in the sorted list will exceed it too.

def coord_sum(line_id):
    # sum of all six endpoint coordinates; flipping a line does not change it
    return sum(p.X + p.Y + p.Z for p in (rs.CurveStartPoint(line_id), rs.CurveEndPoint(line_id)))

linien_ids = sorted(linien_ids, key=coord_sum)

# inside the inner for loop (two lines that still match within tolerance can
# differ in their coordinate sums by up to 2 * sqrt(3) * tolerance, so use
# that as the break margin):
        if coord_sum(linien_ids[j]) - coord_sum(linie_1_id) > 2 * 3**0.5 * tolerance:
            break
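Note that coord_sum calls rs.CurveStartPoint and rs.CurveEndPoint on every invocation, so it is worth computing it once per curve up front (a small sketch, reusing the names above):

# precompute the sums once; both the sort and the break test stay cheap
sums = {line_id: coord_sum(line_id) for line_id in linien_ids}
linien_ids = sorted(linien_ids, key=lambda line_id: sums[line_id])

# the break test then becomes two dict lookups:
#     if sums[linien_ids[j]] - sums[linie_1_id] > 2 * 3**0.5 * tolerance:
#         break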

You can try this one. Finds around 4000 near-dup lines (within a given tolerance) in about 5 seconds. Works on curves other than lines as well. Progress bar included.

SelNearDupCrvs.py (5.9 KB)
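For anyone curious how that kind of speed is possible at all with thousands of curves: one standard approach is spatial bucketing, i.e. snap each endpoint to a grid with cell size equal to the tolerance and only compare curves that share a bucket. A stripped-down sketch of that general idea (not necessarily what the attached file does):

import rhinoscriptsyntax as rs
from collections import defaultdict

def snap(pt, tol):
    # grid cell key for a point; note that points close to a cell border can
    # land in neighbouring cells, so a full version would probe those as well
    return (round(pt.X / tol), round(pt.Y / tol), round(pt.Z / tol))

tol = 0.1
ids = rs.GetObjects("pick curves", rs.filter.curve)
if ids:
    buckets = defaultdict(list)
    dups = set()
    for oid in ids:
        s, e = rs.CurveStartPoint(oid), rs.CurveEndPoint(oid)
        key = tuple(sorted([snap(s, tol), snap(e, tol)]))  # orientation-independent
        for other_id, os, oe in buckets[key]:
            # bucket members are only candidates; confirm with real distances
            if (rs.Distance(s, os) <= tol and rs.Distance(e, oe) <= tol) or \
               (rs.Distance(s, oe) <= tol and rs.Distance(e, os) <= tol):
                dups.add(oid)
                dups.add(other_id)
        buckets[key].append((oid, s, e))
    if dups:
        rs.SelectObjects(list(dups))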
