Multi-Threading Point Cloud Processing


#1

Greetings All,
I have been attempting to import point clouds into Rhino.
The point cloud is a scanned building facade, the file is over 5 GB as a .pts file.
When I attempt to open the file in Rhino it becomes unresponsive and eventually crashes.

It seems like it should be possible to write a script which reads the .pts file line by line and adds a point to the Rhino document at the appropriate location. I have devised the following script to read the large pts file one line at a time and add a point to the document.

import rhinoscriptsyntax as rs

# path = full path to the .pts file
rs.EnableRedraw(False)
with open(path) as infile:
    for line in infile:
        if line.count(" ")==2:  # plain "x y z" lines (skips the header count line)
            Splits=line.split(" ")
            x=float(Splits[0])
            y=float(Splits[1])
            z=float(Splits[2])
            rs.AddPoint(x,y,z)
rs.EnableRedraw(True)
print "Finished"

This script more or less works to transfer all of the points into the Rhino document but is very time consuming. When the script is running it uses a very small percentage of the available CPU, so it occurred to me that this process could be sped up by taking advantage of System.Threading.Tasks. I looked through Steve’s blog post on multi-threading but I am not sure how to apply it. In his example the intersection calculation is multi-threaded while adding the curves to the document takes place on a single thread; is there a reason the points are added to the document in a single thread? I understand how multi-threading is very useful for performing multiple independent calculations, but could it be useful for something like adding 70 million points to a single document?
Thanks in advance,
5chmidt


#2

The problem is that if you import 70 million points and add them to the document as individual point objects, you are probably also going to run out of memory, not to mention the time it will take. That’s why point clouds were created, so Rhino does not have to store them as individual objects.

I tried creating a file with 1 million individual points (using just Rhino’s Array command) - after about 5 minutes I gave up and killed it… Trying again with 10K individual points, it took only a couple of seconds, and saving the file gave me a 12.5 MB file size. (As soon as I increased the number to 100K, it slowed down drastically.) Extrapolating the file size for 70 megapoints, your file will be 7000 times 12.5 MB, or 87.5 GB… I don’t know how you can load that file size into memory…
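That extrapolation can be sanity-checked with a few lines of plain Python (the 10K-point and 12.5 MB figures come from the test above; the rest is just scaling):

```python
points_tested = 10000        # sample size from the Array test above
size_tested_mb = 12.5        # resulting file size in MB for that sample
target_points = 70000000     # the facade scan

scale = float(target_points) / points_tested  # 7000x more points
estimated_mb = scale * size_tested_mb         # 7000 * 12.5 = 87500 MB
estimated_gb = estimated_mb / 1000.0          # -> 87.5 GB
```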

On the other hand, a 70 million point cloud should be possible with a very good machine, I guess - the biggest I’ve handled is about 20 million, and that’s quite OK here, but I have a recent machine with 32 GB of RAM. Loading that 550 MB file into Rhino uses only about 1.5 GB of RAM. Again extrapolating, a 70 million point cloud would give you a file size of about 2 GB, and perhaps a RAM usage of around 5 GB…

It sounds like you may need a specialized application for handling copious data like this - there used to be Pointools for Rhino, but it has been taken over by Bentley and, as far as I know, is no longer available as a Rhino plug-in - though it may be as a stand-alone app.

–Mitch


#3

One other thing to try - Steve recently posted a trick for adding all the points at once to the document:

import rhinoscriptsyntax as rs
from System.Collections.Generic import IEnumerable
import Rhino
from Rhino.Geometry import Point3d
import scriptcontext as sc

# path = full path to the .pts file
pts=[]
with open(path) as infile:
    for line in infile:
        if line.count(" ")==2:  # plain "x y z" lines only
            Splits=line.split(" ")
            pts.append(Point3d(float(Splits[0]),float(Splits[1]),float(Splits[2])))
sc.doc.Objects.AddPoints.Overloads[IEnumerable[Point3d]](pts)
sc.doc.Views.Redraw()

However, this will still create the same file size in the end…

–Mitch


#4

Hello Peter,

I highly suggest not attempting to create individual points but multiple point clouds in Rhino from such a large file. Anything else will make Rhino run out of memory, or make it so unresponsive that it takes too much time to work with. Also, RhinoScript runs faster than Python in this regard, sometimes even faster than multiprocessed Python.

What is your purpose once you have the points (as clouds) in Rhino? Since you’ve mentioned a building, I guess you want to extract walls as faces etc. This can be quite tedious if not automated. The RANSAC algorithm implemented in CloudCompare (x64 version) does a great job once you are familiar with the settings. But be aware that with such file sizes it’s no joy; if you can, try to work “out of core” and read only fractions of the file.
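The "out of core" suggestion can be sketched as a small generator that yields only every Nth point line, so just a decimated subset of a huge .pts file ever gets parsed. This is a plain-Python illustration (the `decimate_lines` helper and the step size are assumptions for the sketch, not CloudCompare's actual algorithm):

```python
import io

def decimate_lines(infile, step=10):
    """Yield every `step`-th non-empty line from an open file,
    so only a fraction of a huge .pts file is ever parsed."""
    for i, line in enumerate(infile):
        line = line.strip()
        if line and i % step == 0:
            yield line

# usage with an in-memory stand-in for a .pts file of 100 points:
sample = io.StringIO("\n".join("%d 0 0" % n for n in range(100)))
kept = list(decimate_lines(sample, step=10))
# keeps lines 0, 10, 20, ... 90 -> 10 of 100 lines
```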

c.


#5

I have switched from adding individual points to adding a series of point clouds to the document. The AddPointCloud method handles large numbers of points in Rhino much more efficiently. Here is the script I am using to do this:

import rhinoscriptsyntax as rs
import os
import time
import System.Drawing.Color as color

file=rs.OpenFileName("Find .pts file")
if file:
    start=time.time()
    Pt_Clouds=[]
    Pts=[]
    Colors=[]
    i=0
    j=0
    CloudSize=500000
    rs.EnableRedraw(False)
    with open(file) as infile:
        for line in infile:
            if i>CloudSize:
                """
                Once there are CloudSize points in the list,
                add the list as a point cloud to the document.
                """
                if Colors!=[] and len(Colors)==len(Pts):
                    Pt_Clouds.append(rs.AddPointCloud(Pts,Colors))
                else:
                    Pt_Clouds.append(rs.AddPointCloud(Pts))
                Pts=[]      #Clear Pts list
                Colors=[]
                j+=1        #Count number of clouds added
                i=0         #Count number of points in the list
            
            if line.count(" ")>=6:
                #"x y z intensity r g b" line
                Split=line.split(" ")
                
                x=float(Split[0])
                y=float(Split[1])
                z=float(Split[2])
                
                a=float(Split[3])
                r=float(Split[4])
                g=float(Split[5])
                b=float(Split[6])
                
                #Color.FromArgb expects integer channel values
                Colors.append(color.FromArgb(int(a),int(r),int(g),int(b)))
                
            elif line.count(" ")>=2:
                #Plain "x y z" line
                Cords=line.split(" ")
                x=float(Cords[0])
                y=float(Cords[1])
                z=float(Cords[2])
            else:
                continue    #Skip header/malformed lines
                
            Pts.append([x,y,z]) #Add point to list
            i+=1
    
    #Flush the points left over after the last full chunk
    if Pts:
        if Colors!=[] and len(Colors)==len(Pts):
            Pt_Clouds.append(rs.AddPointCloud(Pts,Colors))
        else:
            Pt_Clouds.append(rs.AddPointCloud(Pts))
        j+=1
    
    rs.EnableRedraw(True)
    end=time.time()
    TimePassed=end-start
    Minutes=round((TimePassed/60)-.5)
    Seconds=TimePassed-(Minutes*60)
    msg="Added ("+str(j)+") point clouds of up to "+str(CloudSize)+" points to the document \n"
    msg+="Adding the point clouds took "+str(Minutes)+" minutes and "+str(Seconds)+" seconds"
    
    rs.MessageBox(msg)
    print msg

I am adding the point clouds to the document in 500,000 point chunks and it seems to be functioning pretty well.
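The chunking logic can be factored into a small generator and tested in plain Python with a toy chunk size (no Rhino calls here; `chunked` is an illustrative helper, not part of rhinoscriptsyntax):

```python
def chunked(iterable, size):
    """Collect items from `iterable` into lists of at most `size` items,
    mirroring the CloudSize batching in the script above."""
    batch = []
    for item in iterable:
        batch.append(item)
        if len(batch) >= size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled chunk
        yield batch

# each chunk would become one rs.AddPointCloud(...) call:
chunks = list(chunked(range(12), 5))
# -> [[0, 1, 2, 3, 4], [5, 6, 7, 8, 9], [10, 11]]
```

The final `if batch:` flush matters: without it, the points left over after the last full chunk would never reach the document.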

The only issue at present is the amount of time it takes to load all of these points into Rhino. I have run a few tests with a 7-million-point .pts file; at 1.81 GB it is a pretty reasonable size, yet it takes somewhere around 27 minutes 30 seconds to add the whole thing to the Rhino document. I am not sure if this sort of task could be accelerated utilizing multiple threads or if I will just have to suffer through some long import times.

Additionally, as the script is constructed, the point clouds are divided up based on the order in which the points appear in the .pts file, which is not always the most organized system, so I will do some testing with a few sorting techniques to determine how they affect the time required to bring the point cloud into Rhino.


(Steve Baer) #6

If I wrote this correctly (I didn’t have a .pts file on hand to test), the following modified script may import faster.

import rhinoscriptsyntax as rs
import os, time
from System.Drawing import Color
import scriptcontext, Rhino


file=rs.OpenFileName("Find .pts file")
if file:
    start=time.time()
    i=0
    clouds_added=0
    CLOUDSIZE=500000
    rs.EnableRedraw(False)
    cloud = Rhino.Geometry.PointCloud()
    with open(file) as infile:
        for line in infile:
            if i>CLOUDSIZE:
                scriptcontext.doc.Objects.AddPointCloud(cloud)
                # create a new cloud for adding points
                cloud = Rhino.Geometry.PointCloud()
                clouds_added+=1
                i=0

            pieces = line.split(" ")
            if len(pieces)>=3:
                x = float(pieces[0])
                y = float(pieces[1])
                z = float(pieces[2])
                point = Rhino.Geometry.Point3d(x,y,z)
                if len(pieces)>=7:
                    a=float(pieces[3])
                    r=float(pieces[4])
                    g=float(pieces[5])
                    b=float(pieces[6])
                    # Color.FromArgb expects integer channel values
                    color = Color.FromArgb(int(a),int(r),int(g),int(b))
                    cloud.Add(point, color)
                else:
                    cloud.Add(point)
                i+=1

    # add the points left over after the last full chunk
    if cloud.Count>0:
        scriptcontext.doc.Objects.AddPointCloud(cloud)
        clouds_added+=1

    rs.EnableRedraw(True)
    end=time.time()
    TimePassed=end-start
    Minutes=round((TimePassed/60)-.5)
    Seconds=TimePassed-(Minutes*60)
    msg="Added ("+str(clouds_added)+") point clouds of up to "+str(CLOUDSIZE)+" points to the document \n"
    msg+="Adding the point clouds took "+str(Minutes)+" minutes and "+str(Seconds)+" seconds"

    rs.MessageBox(msg)
    print msg

#7

Steve,
I did some testing with your revised script; changing from rhinoscriptsyntax to RhinoCommon cut the processing time from 27.5 minutes to 19.5 minutes for the same .pts file containing 7-million-something points. I am going to add some timers to the code in an attempt to find the current bottleneck.

When I execute the code I am only using 4 GB of RAM at peak and a single core operating around 75% capacity.

I have done some testing with CloudCompare, recommended by Clement. It has a very efficient method for importing point cloud data (around 5 mins to open the same file). However, if it is possible I would prefer to keep everything in Rhino, so the point cloud information could be viewed simultaneously with remodeled NURBS surfaces.

Thanks,
5chmidt


#8

I added some timers to the RhinoCommon script Steve posted.

import rhinoscriptsyntax as rs
import os, time
from System.Drawing import Color
import scriptcontext, Rhino


ReadTimers=[]
AddCloudTimers=[]

file=rs.OpenFileName("Find .pts file")
if file:
    start=time.time()
    i=0
    clouds_added=0
    CLOUDSIZE=500000
    rs.EnableRedraw(False)
    cloud = Rhino.Geometry.PointCloud()
    
    #Start Read Timer#
    ReadStart=time.time()
    
    with open(file) as infile:
        for line in infile:
            if i>CLOUDSIZE:
                #Stop Read Timer#
                ReadStop=time.time()
                ReadTimers.append(ReadStop-ReadStart)
                
                #Start Add Cloud Timer#
                AddStart=time.time()
                
                scriptcontext.doc.Objects.AddPointCloud(cloud)
                # create a new cloud for adding points
                cloud = Rhino.Geometry.PointCloud()
                clouds_added+=1
                i=0
                
                #Stop Add Cloud Timer#
                AddStop=time.time()
                AddCloudTimers.append(AddStop-AddStart)
                
                #Start New Read Timer#
                ReadStart=time.time()

            pieces = line.split(" ")
            if len(pieces)>=3:
                x = float(pieces[0])
                y = float(pieces[1])
                z = float(pieces[2])
                point = Rhino.Geometry.Point3d(x,y,z)
                if len(pieces)>=7:
                    a=float(pieces[3])
                    r=float(pieces[4])
                    g=float(pieces[5])
                    b=float(pieces[6])
                    # Color.FromArgb expects integer channel values
                    color = Color.FromArgb(int(a),int(r),int(g),int(b))
                    cloud.Add(point, color)
                else:
                    cloud.Add(point)
                i+=1

    # add the points left over after the last full chunk
    if cloud.Count>0:
        AddStart=time.time()
        scriptcontext.doc.Objects.AddPointCloud(cloud)
        clouds_added+=1
        AddCloudTimers.append(time.time()-AddStart)

    rs.EnableRedraw(True)
    end=time.time()
    TimePassed=end-start
    Minutes=round((TimePassed/60)-.5)
    Seconds=TimePassed-(Minutes*60)
    msg="Added ("+str(clouds_added)+"), "+str(CLOUDSIZE)+" Point Clouds to Document \n"
    msg+="Adding Point Cloud took "+str(Minutes)+" minutes and "+str(Seconds)+" seconds."
    print msg


    # ReadTimers may hold one more entry than AddCloudTimers,
    # so iterate over the shorter of the two lists
    for i in range(min(len(ReadTimers),len(AddCloudTimers))):
        print "Read/Split File Time: "+str(ReadTimers[i])+" seconds"
        print "Add Point Cloud Time: "+str(AddCloudTimers[i])+" seconds"

The results I got back were surprising.

Overall Time:
Added (139), 500000 Point Clouds to Document
Adding Point Cloud took 20.0 minutes and 5.91965484619 seconds.

Average Time Per CLOUDSIZE loop:
Read/Split File Time: 8.64199829102 seconds
Add Point Cloud Time: 0.0139999389648 seconds

This means it is taking considerably longer to read the file and build the points and colors than it is to add the point cloud geometry. It seems like the reading, splitting, and point and color construction could be split up and multi-threaded, because scriptcontext.doc.Objects.AddPointCloud does not care what order the points arrive in, as long as they are all present when the PointCloud object is created.
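One way to prototype that idea outside Rhino is to dispatch the per-line parsing to a worker pool. The sketch below uses `multiprocessing.dummy` (a thread-backed pool with the same API as `multiprocessing`) purely to show the shape of the code; under CPython's GIL, pure-Python parsing may not actually get faster this way, so treat it as a starting point rather than a measured win (`parse_pts_line` and the sample lines are illustrative):

```python
from multiprocessing.dummy import Pool  # thread-backed Pool, same API as multiprocessing

def parse_pts_line(line):
    """Parse one .pts line into (x, y, z), ignoring intensity/color fields;
    return None for lines that are too short (e.g. the header count line)."""
    pieces = line.split(" ")
    if len(pieces) >= 3:
        return (float(pieces[0]), float(pieces[1]), float(pieces[2]))
    return None

lines = ["1.0 2.0 3.0", "4 5 6 100 255 0 0", "7"]
pool = Pool(4)
# map preserves input order, so the cloud is assembled deterministically
points = [p for p in pool.map(parse_pts_line, lines) if p is not None]
pool.close()
pool.join()
# points -> [(1.0, 2.0, 3.0), (4.0, 5.0, 6.0)]
```

For a real speedup on CPU-bound parsing you would swap in `multiprocessing.Pool` (separate processes) and feed it chunks of lines rather than single lines, to amortize the inter-process overhead.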

If anyone has any suggestions for how this could be more efficiently accomplished please let me know. I will start putting together a more efficient test script.

Thanks,
5chmidt


#9

Hi Peter,

If you’re reading this from an HDD or SSD and have enough RAM available, try putting your file on a dedicated RAM disk for faster reading. Apart from this, you could try removing the timers from the loop and just measuring by hand to see if it makes a difference. If not, add them back.

Python’s open function optionally accepts a buffering argument. Not sure if it will make a difference if you set the buffer to a larger size, e.g.:

open(file, "r", 33554432) # 32 MB buffer

Another optimisation might be to use map and convert all the split pieces to float in one go instead of doing it one by one, e.g.:

pieces = map(float, line.split(" "))

You might also move on to multiprocessing if that’s still not fast enough :wink:
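As a quick check that the map-based conversion gives the same values as the per-field version (plain Python, with a made-up sample line; timing is left to the reader):

```python
line = "1.5 -2.25 3.0 87 255 128 0"  # illustrative "x y z intensity r g b" line

# one-by-one conversion, as in the original script:
pieces = line.split(" ")
xyz_loop = (float(pieces[0]), float(pieces[1]), float(pieces[2]))

# converting every field in one go with map:
fields = list(map(float, line.split(" ")))
xyz_map = tuple(fields[:3])

# both approaches yield the same coordinates
```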

c.


#10

Edit: After testing with a smaller file without colors, I’ve found that using a different buffer size did not change much. But putting everything inside a def and using map as described above, together with this minor change:

pieces = map(float, line.split(" "))
if len(pieces)>=3:
    point = Rhino.Geometry.Point3d(pieces[0],pieces[1],pieces[2])

decreased the import time in my case from 21 s to 13 s. I have no idea why :wink:

ImportPts.py (2.2 KB)

c.