Python exercise

not really rhino related:

i tried this week’s challenge at:
programmingpraxis.com/2015/02/03/closest-pair-part-1/

[quote]
Today’s exercise comes from the field of computational geometry; given a set of points in two-dimensional space, find the pair with the smallest distance between them. The simple solution uses brute force to calculate the distance between all pairs, taking time O(n2).

Your task is to write a program that calculates the closest pair of points.[/quote]

there doesn’t seem to be much discussion at that site and everyone is using different languages…

i sort of think i didn’t do it the best way… what’s your solution?

here’s how i did it:
(with the points being visually shown in rhino… otherwise, no rhinoscriptsyntax (though in this forum, show those solutions too)

import rhinoscriptsyntax as rs
import random
import math


def closest(amt, pts):

    dlist = []
    for i in range(amt - 1):
        pt = pts.pop(0)
        x = pt[0]
        y = pt[1]

        for q in range(len(pts)):
            xlen = x - pts[q][0]
            ylen = y - pts[q][1]
            distance = math.sqrt((xlen ** 2) + (ylen ** 2))
            dlist.append([distance, i + 1, q + 2 + i])

    for i, d in enumerate(dlist):
        print "Point {} to {} = {:.10f}".format(d[1], d[2], d[0])

    dlist.sort()

    msg = "\nClosest points are {} and {} \ndistance = {}"
    print msg.format(dlist[0][1], dlist[0][2], dlist[0][0])


def pointmaker(amt):

    pts = []

    for i in range(amt):
        x = round(random.uniform(0, 100), 2)
        y = round(random.uniform(0, 100), 2)

        point = (x, y)
        pts.append(point)

        pt = rs.coerce3dpoint([x, y, 0])
        rs.AddPoint(pt)

    for i, pt in enumerate(pts):
        print "Point {} - {}".format(i + 1, pt)
    print""
    closest(amt, pts)

pointmaker(5)
>>> 
Point 1 - (77.24, 4.74)
Point 2 - (24.78, 36.76)
Point 3 - (3.03, 97.83)
Point 4 - (70.01, 93.6)
Point 5 - (96.81, 55.99)

Point 1 to 2 = 61.4600032541
Point 1 to 3 = 119.049872742
Point 1 to 4 = 89.1536454667
Point 1 to 5 = 54.8593419574
Point 2 to 3 = 64.8275203906
Point 2 to 4 = 72.6397859303
Point 2 to 5 = 74.5527585003
Point 3 to 4 = 67.1134360616
Point 3 to 5 = 102.690184536
Point 4 to 5 = 46.1817290711

Closest points are 4 and 5 
Distance = 46.1817290711
>>>

Here’s my entry - I’m sure this could be done far more efficiently though… and of course, I cheated, I used some Rhino-specific functions…

Try it out on 2000 points, file attached…

Edit - replaced the original file, had some layer junk in it from something else I didn’t notice…
2000Pts.3dm (684.1 KB)

import Rhino, copy, time

def FindClosestPair():
    ptIDs=rs.GetObjects("Collection of points to process",1,preselect=True,minimum_count=3)
    if not ptIDs: return
    count=len(ptIDs)
    pt_list=rs.coerce3dpointlist(ptIDs)
    
    pts=copy.copy(pt_list)
    bb=rs.BoundingBox(ptIDs)
    curr_dist=bb[0].DistanceTo(bb[6])
    
    st=time.time()
    for i in range(count-1):
        index1=len(pts)-1
        test_pt=pts.pop()
        index2=rs.PointArrayClosestPoint(pts,test_pt)
        check_dist=test_pt.DistanceTo(pt_list[index2])
        if check_dist<curr_dist:
            curr_dist=check_dist
            index_pair=[index1,index2]
        #print "Current index pair: {} - {}".format(index_pair[0],index_pair[1])
        
    rs.SelectObject(ptIDs[index_pair[0]])
    rs.SelectObject(ptIDs[index_pair[1]])
    msg="For {} points: elapsed time = {:.4} seconds.".format(count,time.time()-st)
    msg+=" / Closest points are {} and {}".format(index_pair[0],index_pair[1])
    msg+=" / Distance = {:.3}".format(curr_dist)
    print msg
FindClosestPair()

whoa… that’s way faster than the thing i did :hushed:

i put the timer on mine and it’s -->

For 2000 points: elapsed time = 192.9 seconds. 

if i take out the print parts, (or the one that lists all the distances), it speeds up considerably but still nowhere close to the speed of your script… without print :

For 2000 points: elapsed time = 37.63 seconds.

anyway, i’m going to have a more thorough look at your script later… i was mostly interested in seeing other methods for going through the list of points and doing only the necessary comparisons but now this speed thing has me interested to

Wait… @jeff_hammond 37.63 seconds? Something is really wrong here… My computer does the 2000 points in… 0.05 seconds.

I am trying it on a serious file with 500K points… hmm… been going for about 10 minutes now, hasn’t finished yet… :open_mouth:

Finished! 1639 seconds - 27.5 minutes… :smile:

–Mitch

Hi Jeff, speed is what counts in the end :wink:

however, I think the exercise is how to compare each pair of points without doing redundant checks. This can be done like this:

import rhinoscriptsyntax as rs
import time

def ClosestPoints(ptIDs):
    
    pts=rs.coerce3dpointlist(ptIDs)
    
    bb=rs.BoundingBox(ptIDs)
    minDist=bb[0].DistanceTo(bb[6])
    
    for i in range(len(pts)-1):
        for j in range(i+1, len(pts)):
            distance = pts[i].DistanceTo(pts[j])
            if distance < minDist:
                minDist = distance
                p1 = i
                p2 = j
    
    #rs.AddLine(pts[p1],pts[p2])
    msg = "Closest points are {} and {} \ndistance = {}"
    print msg.format(p1, p2, minDist)


if __name__=="__main__":
    ptIDs=rs.GetObjects("Select Points",1,preselect=True,minimum_count=3)
    if ptIDs:
        st=time.time()
        ClosestPoints(ptIDs)
        print time.time()-st

And when you’ve proudly figured that out they tell you that there are these itertools combinations :wink:

Hi Jess,

Mine’s faster… :smile:

(at least I think so in testing here…)

Sure! But Jeff is also interested in the exercise of comparing all elements of a list. So I’ve posted the standard nested loop solution. I was impressed by the speed of your solution. pop is really fast!

This list.pop looks very interesting. Doing it manually normally. Does it also exist in .NET?

list.pop([i])
#Remove the item at the given position in the list, and return it. If no index is specified, a.pop() removes and returns the last item in the list. (The square brackets around the i in the method signature denote that the parameter is optional, not that you should type square brackets at that position. You will see this notation frequently in the Python Library Reference.) 

[quote=“Helvetosaur, post:5, topic:16458”]
37.63 seconds? Something is really wrong here…
[/quote]’

it’s the list i was using - dlist

first appending each distance to the list, then enumerating the list, then printing the list, then sorting it to find the smallest distance…

with a set of 2000 points, there are nearly 2 million individual measurements which take place so i guess all the list appending/printing/sorting was taking its toll with that many numbers.

anyway, i took out the list and just tracked the smallest distance as they measurement is taken (the same thing you did with if check_dis<curr_dist)

so it sped up a lot… now at ~1.7 secs for 2000 pts…

so i guess what i learned so far is:
while it may be necessary to print out each step of what the script is doing just to make sure it’s making the proper calculations, be sure to take those parts out once it’s confirmed to be working :wink:

I think the speed is more likely due to my use of rs.PointArrayClosestPoint() which probably runs much faster than a loop with a distance check from the test point to each point in a list… There’s probably some cool secret code in the 3dPointList Class that runs much faster than brute force…

–Mitch

Yeah, that’s about on par with Jess’s script.

Yep, printing to the command line slows things down a lot more than you might imagine. Even updating the rs.StatusBarProgressMeter() (Windows only for now) slows things down somewhat. Wonder if the screen redrawing necessary is responsible. You have to balance the utility of informing the user as to progress and calculation time.

Yep,

hey jess… i think yours is the clean/streamlined version of what i was trying :wink:

a couple of questions… where is DistanceTo coming from? is that rhino? (not in my rhinoscript reference plus there’s no rs. in front… but i can’t find it in my python reference either)

i saw the itertools module but i couldn’t get anything to work with it.

if i try:

import itertools
print itertools.count(10)

it prints count(10)

if i try

print itertools.chain("ABC","DEF")

it gives:

<chain object at 0x0000000000000031>

??? am i using them wrong or should they be working differently?

A point3d is an object in RhinoCommon (which python has access to). That means it has methods and properties. One of the methods that point3d has is point3d.DistanceTo(some other point). Even though this particular method is not rhinoscriptsyntax native, you can use it without importing RhinoCommon, because it is part of the point3d structure. However none of this is covered in the rhinoscriptsyntax help.

http://4.rhino3d.com/5/rhinocommon/

Have a look at this:

import rhinoscriptsyntax as rs
import itertools
import time

def ClosestPoints(ptIDs):
    
    pts=rs.coerce3dpointlist(ptIDs)
    
    bb=rs.BoundingBox(ptIDs)
    minDist=bb[0].DistanceTo(bb[6])
    
    for a, b in itertools.combinations(pts, 2):
        distance = a.DistanceTo(b)
        if distance < minDist:
            minDist = distance
            p1 = a
            p2 = b
    
    rs.AddLine(p1,p2)
    msg = "Closest distance = {}"
    print msg.format(minDist)


if __name__=="__main__":
    ptIDs=rs.GetObjects("Select Points",1,preselect=True,minimum_count=3)
    if ptIDs:
        st=time.time()
        ClosestPoints(ptIDs)
        print time.time()-st

on my laptop now (2.6GHz no turbo -vs- 3.5GHz w/ turbo up to 4ghz on my desktop)

jess’s script is taking about 2 seconds and my ‘new’ version is around 4.5 secs… but apparently, my newer iMac is 2-3x faster than my older mbp at this stuff.

i’ll look into it some more throughout the day… it’s probably due to the way i’m getting the distances… first finding the x,y difference for each pair of points then quadratic eq for each distance… i think the rhinoscript commands are a lot faster at some of this stuff

Maybe. But even if it is more or less the same iterative distance check, then it is still C++ and it is just one external function call per point object. While the nested loop combinations are n*(n+1)/2 function calls.

ah… ok. thanks.
so i was just using itertools wrong… your script works fine for me.

nice! thanks.
(yep, not covered at all in the help… no biggie though, i’ll just learn from you guys :smiley: )




hey @Jess

what is this doing?

 if __name__=="__main__":

__name__ is a special variable which represents the top level script environment. If you run that script directly from your editor then __name__ is set to __main__ by the interpreter and this conditional statement is True, thus the following code block will be executed.

If you save this script in your scripts folder then you can import it from other scripts and use the ClosestPoint(ptIDs) function.like using any other module. Then the top level environment will not be __main__ and therefore the following code block will not be executed.

In other words, with this thing you can directly develop and run modules which you can also use from other modules without saving a special version of it.

neat. thanks.
while i don’t completely understand the ins&outs of it, i do see the purpose and benefits.