PointCloud AddRange

Mrothart · July 17, 2019, 10:31pm

Terry,

This works. Thanks again!

Terry_Chappell · July 17, 2019, 11:08pm

How long did it take for your 8 GB file? Same time as before except this time you got your point cloud? 1200 sec or so?

Here is a simpler, shorter, faster version that incorporates some of nathancoatney’s methods. It runs 15% faster mostly due to directly loading the .NET Point3dList() and List[Color]() structures just after splitting a line.

from Rhino.Geometry import PointCloud
from Rhino.Collections import Point3dList
from System.Collections.Generic import List
from System.Drawing import Color
from scriptcontext import doc
import rhinoscriptsyntax as rs
from time import time
from itertools import islice

def ImportXYZRGB():
	#File open
	filtr = 'Text Files (*.txt)|*.txt| XYZ Color files (*.xyz)|*.xyz||'
	strPath = rs.OpenFileName("XYZRGB file to import", filtr)
	if not strPath: return
	timea = time()
	file = open(strPath)
	if not file: return
	# Create point cloud object.
	cloud = PointCloud()
	# Keep count of lines read.
	total = 0
	# Format of 3D point and RGB color in each line is XYZRGB: 1.2345 2.5682 3.9832 155 200 225
	while True:
		# Get group of lines to process.
		n_lines = list(islice(file, 1000))  # neat trick from stack overflow brought by nathancoatney of Rhino Forum.
		if not n_lines: break
		total += len(n_lines)
		# Define .NET structures for holding points and colors.
		points = Point3dList()
		colors = List[Color]()
		# Load points and colors for point cloud.
		for line in n_lines:
			d = line.strip().split(' ')
			points.Add(float(d[0]),float(d[1]),float(d[2]))
			colors.Add(Color.FromArgb(int(d[3]),int(d[4]),int(d[5])))
		# Add points & colors to point cloud.
		cloud.AddRange(points, colors)
	file.close()
	# Add visible point cloud to document.
	obj = doc.Objects.AddPointCloud(cloud)
	doc.Views.Redraw()
	timeb = time()
	print 'Read {0} lines from file and added point cloud in {1:.4f} sec.'.format(total, timeb - timea)

if __name__ == "__main__":
	ImportXYZRGB()
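The chunked-read loop in the script above can be tried outside Rhino. The following is a minimal sketch in plain CPython 3; the helper name `read_in_groups` and the sample data are made up for illustration. It shows that `islice` pulls the next group of lines from the same open iterator on every pass, so only one group is held in memory at a time.

```python
from itertools import islice
import io

def read_in_groups(lines, group_size):
    """Yield successive lists of at most group_size lines from an iterator."""
    while True:
        # islice consumes the next group_size items and leaves the rest unread.
        group = list(islice(lines, group_size))
        if not group:
            break
        yield group

# Stand-in for an open file: seven XYZRGB lines of made-up data.
sample = io.StringIO(u"".join(u"%d.0 0.0 0.0 255 0 0\n" % i for i in range(7)))
group_sizes = [len(g) for g in read_in_groups(sample, 3)]
print(group_sizes)  # [3, 3, 1]
```

In the import script each group is converted to one Point3dList and one color list, which are then handed to cloud.AddRange in a single call.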

I also tried nathancoatney’s parallel version but it ran 10% slower using 2 threads on my machine so I cannot recommend it.

Please try this new version and let me know how much faster it is. Your 1200 sec time could drop close to 1000 sec or 17 min. A downloadable copy is provided below:

ImportBigXYZRGB.py (2.2 KB)

This simple Python script is 2.7X faster than using Rhino’s Import function which does work for my 4 GB test case.

Regards,
Terry.

nathancoatney · July 18, 2019, 2:52am

I’m not sure what is going on with the parallel foreach. I can’t even get decent speedups processing the lines to lists of floats and ints, with no Rhino objects involved.

I’ve tried several different containers and methods, so it is beyond me I guess.

Terry_Chappell · July 18, 2019, 4:16pm

My experiences with parallel processing are similar. The only cases where I get a 2X to 3X speedup are when there is a call to a RhinoCommon function that takes considerable time to execute. My best example is generating contours for a large map. While that function is executing, the Python code can use the interpreter to get more done, launching more calls for generating contours. If it is just pure Python, then the parallel code runs slower. I believe this is because there are not multiple threads of the interpreter running. The cloud import code is mostly Python with no long calls to RhinoCommon functions, so it sees little or negative speedup.

Regards,
Terry.

Terry_Chappell · July 18, 2019, 4:24pm

@Mrothart,

Here is a newer version that is another 15% faster. This makes it 3.6X faster than Rhino’s File -> Import. The 15% speedup in the Python script comes from using map() in two places:
(1) It is used to convert the split string into 6 floats in one go. The float results are used in both Point3dList and Color.FromArgb, which accepts either integers or floats.
(2) The RGB color is converted to a 32-bit color and collected in a temporary list for the group of n_lines. Then map is used to convert this list to .NET Colors. This speeds up Color generation 20%.

from Rhino.Geometry import PointCloud
from Rhino.Collections import Point3dList
from System.Collections.Generic import List
from System.Drawing import Color
from scriptcontext import doc
import rhinoscriptsyntax as rs
from time import time
from itertools import islice

def ImportXYZRGB():
	# Format of 3D point and RGB color in each line is XYZRGB: 1.2345 2.5682 3.9832 155 200 225
	filtr = 'Text Files (*.txt)|*.txt| XYZ Color files (*.xyz)|*.xyz||'
	strPath = rs.OpenFileName('XYZRGB file to import', filtr)
	if not strPath: return
	timea = time()
	file = open(strPath)
	if not file: return
	total = 0
	cloud = PointCloud()
	while True:
		# Get group of 1000 lines to process.
		n_lines = list(islice(file, 1000))  # neat trick from stack overflow brought by nathancoatney of Rhino Forum.
		if not n_lines: break
		total += len(n_lines)
		# Define Rhino structure for holding points.
		points = Point3dList()
		# Create list for holding 32-bit representation of RGB color.
		tcolors = []
		# Load points and colors for point cloud.
		for line in n_lines:
			d1,d2,d3,d4,d5,d6 = map(float,line.strip().split(' '))
			points.Add(d1,d2,d3)
			# Convert R,G,B color to 32-bit color.
			tcolors.append(65536.*d4 + 256.*d5 + d6)
		# Convert 32-bit color to .NET Color.
		colors = List[Color](map(Color.FromArgb,tcolors))
		# Add points & colors to point cloud.
		cloud.AddRange(points, colors)
	file.close()
	# Add visible point cloud to document.
	obj = doc.Objects.AddPointCloud(cloud)
	doc.Views.Redraw()
	timeb = time()
	print 'Read {0} lines from file and added point cloud in {1:.4f} sec.'.format(total, timeb - timea)

if __name__ == "__main__":
	ImportXYZRGB()
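The color packing used in the script above can be checked in isolation. This is a small sketch in plain CPython 3; the function names `pack_rgb` and `unpack_rgb` are hypothetical helpers, not part of the posted script. It confirms that 65536*R + 256*G + B is the same value as shifting R, G and B into the three low bytes of a 32-bit integer.

```python
def pack_rgb(r, g, b):
    """Pack three 0-255 channel values into one integer laid out as 0x00RRGGBB."""
    return (int(r) << 16) | (int(g) << 8) | int(b)

def unpack_rgb(packed):
    """Recover the (r, g, b) channels from a packed 0x00RRGGBB integer."""
    return (packed >> 16) & 0xFF, (packed >> 8) & 0xFF, packed & 0xFF

# Sample color taken from the format comment in the script: 155 200 225.
packed = pack_rgb(155, 200, 225)
arithmetic = int(65536. * 155 + 256. * 200 + 225)  # form used in the script
print(packed == arithmetic, hex(packed), unpack_rgb(packed))
```

Note that the alpha byte is zero in this packing, since only the red, green and blue bytes are filled in.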

Give it a try. This should import your 8 GB cloud in less than 900 sec.

XYZRGB_Import.py (2.3 KB)

Regards,
Terry.

Terry_Chappell · July 30, 2019, 6:50pm

I created a version that calls a DLL to speed up file reading and conversion to XYZ coordinates and colors. This hybrid Python/DLL version is 5X faster than an all-Python version and 15X faster than using Rhino’s Import tool. It imports my 4GB test case with 73.7M colored points in 51 sec! So I think it could import your 8GB colored point cloud in around 3 min.

To use this version you need to download the Python code and the DLL. A file with the .dll extension cannot be attached here, so a link to it has been provided. If you do not want to use the link, you can regenerate the DLL in Microsoft Visual Studio 2017 using the C++ source code read_cloud.cpp attached below.

XYZRGB_with_DLL.py (7.5 KB)

read_cloud.cpp (5.6 KB)

If you want to use the DLL, you need to change the second line in the Python script to point to where you put the DLL after downloading (or creating with Visual Studio). For me this line is:
dll_name = r'C:\Users\Terry\source\repos\ReadCloud\x64\Release\ReadCloud.dll'
The only important part of this line is ReadCloud.dll which you can put anywhere on your computer and then fill in the path to it. The r at the start of the string indicates to Python to treat this as a raw string so you do not have to worry about using double backslashes before certain characters.
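The effect of the r prefix can be seen with a short check in plain CPython 3. The second path below is hypothetical and only chosen because it contains \t and \n, which Python would otherwise read as escape sequences.

```python
# The raw string from the post and its double-backslash equivalent are the same text.
raw_path = r'C:\Users\Terry\source\repos\ReadCloud\x64\Release\ReadCloud.dll'
escaped_path = 'C:\\Users\\Terry\\source\\repos\\ReadCloud\\x64\\Release\\ReadCloud.dll'
same = (raw_path == escaped_path)

# Hypothetical path, not from the thread: without the r prefix, \t and \n
# become a tab and a newline, so the path is silently corrupted.
intact = r'C:\temp\new.dll'
broken = 'C:\temp\new.dll'
print(same, len(intact), len(broken))
```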

Below are readable versions of these files.

XYZRGB_with_DLL.py

# If you want to use DLL for 5X faster cloud import, set its path in the next line.
dll_name = r'C:\Users\Terry\source\repos\ReadCloud\x64\Release\ReadCloud.dll'
"""
This Python script reads a colored point-cloud file with XYZRGB format and creates a colored point-cloud in Rhino 6 or 7 WIP.
It uses cloud.AddRange to provide an approximately 3X speedup over cloud.Add and it overcomes the 70M point limit of the
Rhino Point3dList structure used in AddRange by processing small groups of 2000 points at a time. For 5X faster execution time,
it calls a DLL which reads a 100 KB block at a time and then parses the block into lines and the lines into the X,Y,Z coordinates
and R,G,B color of a point. The 100KB block fits entirely within the L1-cache of the processor resulting in higher performance.
The coordinates and R,G,B colors, after conversion to 32-bit format, are passed back to Python where they are converted to
Point3d and .NET Color and then added to the point cloud using AddRange.
This script is 5X faster than an all Python version and 15X faster than Rhino's Import command. For example, it reads
a 4GB file and creates a 73.7M point cloud in 51 sec vs 256 sec (Python only) vs 790 sec or 13:10 min (Rhino Import).
Thus the script imports a colored point cloud at about 1.4M points/sec.
By Terry Chappell 7/30/2019.
"""
from Rhino.Geometry import PointCloud
from Rhino.Collections import Point3dList
from System.Collections.Generic import List
from System.Drawing import Color
from scriptcontext import doc
import rhinoscriptsyntax as rs
from time import time
import os
from ctypes import c_longlong, cdll as c_cdll, c_char, c_long, c_int, c_double, byref as c_byref, c_wchar_p

# This is used when DLL is available.
def ImportXYZRGB_DLL():
	# Format of 3D point and RGB color in each line is XYZRGB: 1.2345 2.5682 3.9832 155 200 225
	filtr = 'Text Files (*.txt)|*.txt| XYZ Color files (*.xyz)|*.xyz||'
	strPath = rs.OpenFileName('XYZRGB file to import', filtr)
	if not strPath: return
	timea = time()
	# Zero number of points read.
	total = 0
	# Get size of file in bytes.
	file_size = os.path.getsize(strPath)
	# Read 100K bytes at a time.  This nicely fits in the L1 cache and results in few misses for best performance overall.
	block_size = 100000
	# Estimate the number of points that are contained in the 100K bytes block.
	est_points = block_size // 30 # 3 FP numbers + 3 Int numbers.
	# Set dimension of points and colors arrays based upon estimated number of points.
	nxyz = int(est_points)
	ncolors = int(est_points)
	print 'Reading file {0} with {1:,} bytes in {2:,}KB blocks and converting to point cloud . . .'.format(strPath, file_size, block_size // 1000)
	# Define c-types variables for interfacing to DLL.
	cx = (c_double * (nxyz))() # X-coordinate of point in cloud.
	cy = (c_double * (nxyz))() # Y-coordinate
	cz = (c_double * (nxyz))() # Z-coordinate
	ccolors = (c_long * (ncolors))() # Color for each point
	cnum_points = (c_int * (4))() # Number of points read + room for 3 other values for debug.
	cmemblock = (c_char * (block_size))() # Working area for reading file.
	coffset = (c_longlong * (1))(0) # Offset into file as each block is read.
	#coffset = c_longlong(0) # Offset into file as each block is read.
	cfile_length = (c_longlong * (1))(file_size) # File length for detecting when last block read occurs.
	# Initialize cloud for holding results of read.
	cloud = PointCloud()
	offset = 0
	while offset < file_size:
		# Call DLL to open file, read block, parse block, return lists of points and colors.
		soReadCloud.read_cloud(strPath, c_byref(cx),c_byref(cy),c_byref(cz), c_byref(ccolors), c_byref(cnum_points), c_byref(cmemblock), block_size, c_byref(coffset), c_byref(cfile_length))
		# Get number of points read from block.
		num_points = cnum_points[0]
		offset = coffset[0]
		# Sum total number of points read.
		total += num_points
		# Create list for holding point3d.
		points = Point3dList()
		# Load points using coordinates returned by DLL.
		for i in xrange(num_points): points.Add(cx[i],cy[i],cz[i])
		# Get active colors from ctypes colors list.
		colors = ccolors[:num_points]
		# Convert 32-bit colors to .NET Colors.
		colors = List[Color](map(Color.FromArgb,colors))
		# Add points & colors to point cloud.
		cloud.AddRange(points, colors)
	# Add visible point cloud to document.
	obj = doc.Objects.AddPointCloud(cloud)
	doc.Views.Redraw()
	timeb = time()
	print 'Read {0:,} lines and added point cloud in {1:.4f} sec.'.format(total, timeb - timea)

# This is used when DLL was not found.
def ImportXYZRGB_Python():
	# Select point cloud to import.
	filtr = 'Text Files (*.txt)|*.txt| XYZ Color files (*.xyz)|*.xyz||'
	# Format of 3D point and RGB color in each line must be XYZRGB like this: 1.2345 2.5682 3.9832 155 200 225
	strPath = rs.OpenFileName('XYZRGB file to import', filtr)
	if not strPath: return
	timea = time()
	# Zero counter for number of points read.
	total = 0
	# Use 100K block size which easily fits in L1-cache for better performance.
	block_size = 100000
	# Zero counter for number of bytes read, for use in detecting end of file.
	bytes_read = 0
	# Get file size in bytes so end of file can be detected.
	file_size = os.path.getsize(strPath)
	# Zero offset for aligning to line boundary.
	offset = 0
	# Reset done flag.
	done = False
	print 'Reading file {0} with {1:,} bytes in {2:,}KB blocks and converting to point cloud . . .'.format(strPath, file_size, block_size // 1000)
	# Initialize point cloud.
	cloud = PointCloud()
	# Read file, process lines into points and load points into cloud.
	# Binary read is 3X faster than reading lines.
	with open(strPath, 'rb',0) as file:
		while not done:
			file.seek(offset, 1) # Offset is relative to current position.
			block = file.read(block_size)
			if not block: break
			# Split block into lines.
			lines = block.splitlines()
			# Remove last line as it is likely incomplete.
			lline = lines.pop()
			# Back up file pointer by length of last line.
			offset = -len(lline)
			# Sum number of bytes read so end can be detected.
			bytes_read += block_size + offset
			# If this is the last group, add back last line.
			if bytes_read >= file_size: lines.append(lline); done = True
			# Count number of points read.
			total += len(lines)
			# Create lists for holding point3d and 32-bit colors.
			points = Point3dList()
			tcolors = []
			# Parse each line to extract points and colors.
			for line in lines:
				# Parse line into 6 float values.
				d1,d2,d3,d4,d5,d6 = map(float,line.strip().split(' '))
				# Add X,Y,Z coordinates to Point3dList.
				points.Add(d1,d2,d3)
				# Convert R,G,B color to 32-bit color for faster Color.FromArgb.
				tcolors.append(65536.*d4 + 256.*d5 + d6)
			# Convert 32-bit colors to .NET Colors.
			colors = List[Color](map(Color.FromArgb,tcolors))
			# Add points & colors to point cloud.
			cloud.AddRange(points, colors)
	# Add visible point cloud to document.
	obj = doc.Objects.AddPointCloud(cloud)
	doc.Views.Redraw()
	timeb = time()
	print 'Read {0:,} lines and added point cloud in {1:.4f} sec.'.format(total, timeb - timea)

if __name__ == "__main__":
	# Try to get DLL.
	try:
		soReadCloud = c_cdll.LoadLibrary(dll_name)
		use_DLL = True
	# Use Python code when DLL not found.
	except:
		print 'WARNING: Did not find DLL at {}. A 5X slower all Python version will be used.'.format(dll_name)
		use_DLL = False
	if use_DLL: ImportXYZRGB_DLL() 
	else: ImportXYZRGB_Python()

read_cloud.cpp

// ReadCloud.cpp : Defines the exported functions for the DLL application.
//

#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>     // for _countof

using namespace std;

#define DLLEXPORT extern "C" __declspec(dllexport)
#define UCLASS()
#define GENERATED_BODY()
#define UFUNCTION()


// This function parses a line into 3 double and 3 int. It is used in place of strtod: strtod(line, &pEnd); d2 = strtod(pEnd, &pEnd); d3 = strtod(pEnd, &pEnd);
void block2xyzc(const char *p, int &i, int &loc, long long offset, long long file_length, double *x, double *y, double *z, long *colors) {
	static double pow10[17] = { 1., 10., 100., 1000., 10000., 100000., 1000000., 10000000., 100000000., 1000000000.,
		10000000000., 100000000000., 1000000000000., 10000000000000., 100000000000000., 1000000000000000., 10000000000000000. };
	// Define variables for 3 doubles.
	double r1, r2, r3;
	// Define variables for 3 integers and number of bytes read from block.
	int r4, r5, r6, bytes_read = 0;
	// Define variable for saving pointer location at start of parsing block.
	const char *p_start;
	// Initialize flags for controlling continuing parsing, storing data and returning after last line read.
	bool next = true, second = false, last = false;
	// Parse each line of block.
	while (next) {
		// If storage is enabled, save values from prior line.
		if (second) {
			// Store XYZ of prior line.
			x[i] = r1; y[i] = r2; z[i] = r3;
			// Store 32-bit color of prior line.
			colors[i++] = (((r4 << 8) + r5) << 8) + r6;
		}
		// Toggle second to true to start storing results above.
		second = true;
		// If last line read, return.
		if (last) { return; }
		// Get pointer location at start.
		p_start = p;
		// Zero count of line length.
		loc = 0;
		// Zero 3 double and 3 integer values at start of parsing.
		r1 = 0.0, r2 = 0.0, r3 = 0.0; r4 = 0, r5 = 0, r6 = 0;
		// Zero variables used for adding digits after decimal point.
		double f = 0.0;	int n = 0;
		// Reset negative flag at start.
		bool neg = false;
		// Skip any white space and convert first double at start of line.
		while (*p == ' ') { ++p; }
		// Get possible minus sign at start.
		if (*p == '-') { neg = true; ++p;  }
		// Convert digits before decimal points.
		while (*p >= '0' && *p <= '9') { r1 = (r1*10.0) + (*p - '0'); ++p; }
		// Get digits after decimal point.
		if (*p == '.') {
			// Zero digits after decimal point and power of ten and advance pointer to first digit.
			f = 0.0; n = 0; ++p;
			// Convert digits after decimal point.
			while (*p >= '0' && *p <= '9') { f = (f*10.0) + (*p - '0');	++p; ++n; }
			// Scale digits by power of 10 to make fraction.
			r1 += f / pow10[n];
		}
		// Create negative result if minus sign was present and reset minus flag.
		if (neg) { r1 = -r1; neg = false; }
		// Skip white space and convert second double.
		while (*p == ' ') { ++p; }
		if (*p == '-') { neg = true; ++p; }
		while (*p >= '0' && *p <= '9') { r2 = (r2*10.0) + (*p - '0'); ++p; }
		if (*p == '.') {
			f = 0.0; n = 0; ++p;
			while (*p >= '0' && *p <= '9') { f = (f*10.0) + (*p - '0');	++p; ++n; }
			r2 += f / pow10[n];
		}
		if (neg) { r2 = -r2; neg = false; }
		// Skip white space and convert third double.
		while (*p == ' ') { ++p; }
		if (*p == '-') { neg = true; ++p; }
		while (*p >= '0' && *p <= '9') { r3 = (r3*10.0) + (*p - '0'); ++p; }
		if (*p == '.') {
			f = 0.0; n = 0; ++p;
			while (*p >= '0' && *p <= '9') { f = (f*10.0) + (*p - '0');	++p; ++n; }
			r3 += f / pow10[n];
		}
		if (neg) { r3 = -r3; }
		// Skip white space and convert 3 integers towards end of line.
		while (*p == ' ') { ++p; }
		while (*p >= '0' && *p <= '9') { r4 = (r4 * 10) + (*p - '0'); ++p;}
		while (*p == ' ') { ++p; }
		while (*p >= '0' && *p <= '9') { r5 = (r5 * 10) + (*p - '0'); ++p;}
		while (*p == ' ') { ++p; }
		while (*p >= '0' && *p <= '9') { r6 = (r6 * 10) + (*p - '0'); ++p;}
		// Find length of line for partial-line case with no \r or \n.
		loc = p - p_start;
		// Skip \r (carriage return) at end of line; CRLF line endings are assumed.
		++p;
		// If new-line character found here, continue parsing lines.
		if (*p == '\n') { next = true; ++p; } else { next = false; }
		// Sum bytes read including \r\n at end.
		bytes_read += loc + 2;
		// When end-of-file reached, set last = true in order to return after storing last line of information.
		if ((offset + bytes_read + 2) > file_length) { last = true; }
	}
}

DLLEXPORT void read_cloud(wchar_t *file_name, double *x, double *y, double *z, long *colors,
	int *num_points, char *memblock, int block_size, long long *offset, long long *file_length)
{
	// Set file_size as streampos type.
	streampos size_of_block = block_size;
	// Set file offset to start of next block.
	streampos file_offset = offset[0];
	// Create index for number of points.
	int i = 0;
	// Define integer variables for backing up offset to next memblock to align with line boundary,
	int loc = 0;
	//
	// Open file for binary read.
	//
	ifstream in_file(file_name, ios::binary | ios::in);
	// Import point cloud.
	if (in_file.is_open()) {
		// Move file pointer to start of block.
		in_file.seekg(file_offset);
		// Read memblock of data into memory.
		in_file.read(memblock, size_of_block);
		// Parse memblock into X,Y,Z coordinates and 32-bit colors.
		block2xyzc(memblock, i, loc, offset[0], file_length[0], x, y, z, colors);
		// Offset next read by block_size minus length of partial line so read will be aligned to start of line.
		offset[0] += block_size - loc;
		// Return number of points read.
		num_points[0] = i;
	}
}
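For readers who do not use C++, the number parsing inside block2xyzc can be sketched in Python. This is an illustrative port written for plain CPython 3, not the code the DLL runs; the names `parse_number` and `parse_line` are hypothetical. It follows the same steps: skip spaces, note a minus sign, accumulate the digits before the decimal point, accumulate the digits after it, and scale the fraction by a power of ten from a lookup table.

```python
# Lookup table of powers of ten, mirroring the 17-entry pow10 array in the C++ code.
POW10 = [10.0 ** k for k in range(17)]

def parse_number(text, pos=0):
    """Parse one optionally signed decimal number starting at text[pos].
    Returns (value, next_pos). Like the C++ code, at most 16 fraction digits."""
    n = len(text)
    while pos < n and text[pos] == ' ':           # skip white space
        pos += 1
    neg = False
    if pos < n and text[pos] == '-':              # possible minus sign
        neg = True
        pos += 1
    value = 0.0
    while pos < n and '0' <= text[pos] <= '9':    # digits before decimal point
        value = value * 10.0 + (ord(text[pos]) - ord('0'))
        pos += 1
    if pos < n and text[pos] == '.':
        pos += 1
        frac, digits = 0.0, 0
        while pos < n and '0' <= text[pos] <= '9':  # digits after decimal point
            frac = frac * 10.0 + (ord(text[pos]) - ord('0'))
            pos += 1
            digits += 1
        value += frac / POW10[digits]             # scale digits into a fraction
    return (-value if neg else value), pos

def parse_line(line):
    """Parse 'X Y Z R G B' into (x, y, z, packed_color)."""
    pos, fields = 0, []
    for _ in range(6):
        value, pos = parse_number(line, pos)
        fields.append(value)
    x, y, z = fields[:3]
    r, g, b = (int(c) for c in fields[3:])
    return x, y, z, (((r << 8) + g) << 8) + b     # same packing as the C++ code

x, y, z, color = parse_line("1.2345 -2.5682 3.9832 155 200 225")
print(x, y, z, color)
```

Because the fraction is built by floating-point arithmetic, the result can differ from Python's float() in the last bit, which is normally harmless for point coordinates.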

Using the DLL version, about half of the time is spent loading the Point3dList in these few lines:

# Create list for holding point3d.
points = Point3dList()
# Load points using coordinates returned by DLL.
for i in xrange(num_points): points.Add(cx[i],cy[i],cz[i])

This is very slow compared to all other operations. If the .Add method could be enhanced to take a list of (x,y,z) tuples and quickly convert these to Point3d for populating the Point3dList, then another 30% speedup would be possible with the cloud reading speed rising to 1.8 M points per sec.
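On the Python side, the coordinates can at least be gathered into (x, y, z) tuples without indexing the ctypes arrays one element at a time. The sketch below runs in plain CPython 3 with made-up values standing in for what the DLL would return. Whether this reduces the time in Rhino's IronPython is untested and is only an assumption; each Point3d still has to be created.

```python
from ctypes import c_double

# Buffers like the ones passed to the DLL, filled here with made-up values.
capacity = 5
cx = (c_double * capacity)(*[float(i) for i in range(capacity)])
cy = (c_double * capacity)(*[10.0 + i for i in range(capacity)])
cz = (c_double * capacity)(*[20.0 + i for i in range(capacity)])

# Pretend the DLL reported that only the first 3 slots hold valid points.
num_points = 3

# Slicing a ctypes array copies the active part into a plain Python list in
# one step; zip then pairs the coordinates into (x, y, z) tuples.
triples = list(zip(cx[:num_points], cy[:num_points], cz[:num_points]))
print(triples)
```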

Regards,
Terry.

ivelin.peychev · July 31, 2019, 5:06pm

# Get group of 1000 lines to process.
n_lines = list(islice(file, 1000)) # neat trick from stack overflow brought by nathancoatney of Rhino Forum.

Hi @Terry_Chappell, @nathancoatney,

Could you please explain what benefit this gives?
Does it store each 1000 lines as separate objects in the memory or something?

Here are my two attempts (one using a .NET library, the other Python’s csv module), but they are still two times slower than your script and I do not quite get why. I believe it is because of the reading in chunks and the conversion of the RGB to a float, which I don’t see how to use in my case.

from Microsoft.VisualBasic.FileIO import TextFieldParser


##################################
### Read TXT create pointcloud ###
##################################
import System
import rhinoscriptsyntax as rs
import scriptcontext as sc
import Rhino
import time
from System.Drawing import Color

#import csv
import clr

clr.AddReference("Microsoft.VisualBasic")

from Microsoft.VisualBasic.FileIO import TextFieldParser


tol = sc.doc.ModelAbsoluteTolerance

def TST():
    filtr = 'Text Files (*.txt)|*.txt| XYZ Color files (*.xyz)|*.xyz||'
    strPath = rs.OpenFileName('XYZRGB file to import', filtr)
    if not strPath: return
    
    ts = time.time()
    cloud = Rhino.Geometry.PointCloud()
    reader = TextFieldParser(strPath)
    reader.SetDelimiters(",")
    #fields = reader.ReadFields()
    point_list = []
    
    color_list = []
    while reader.EndOfData == False:
        try:
            fline = reader.ReadLine()
            row = fline.split(",")
            #point_list.append(Rhino.Geometry.Point3d(float(row[0]),float(row[1]),float(row[2])))
            # Convert R,G,B color to 32-bit color.
            #color_list.append(65536.*int(row[3]) + 256.*int(row[4]) + int(row[5]))
            
            #cloud_item = Rhino.Geometry.PointCloudItem()
            cloud_item = Rhino.Geometry.PointCloud.AppendNew(cloud)
            cloud_item.Location = Rhino.Geometry.Point3d(float(row[0]),float(row[1]),float(row[2]))
            cloud_item.Color = Color.FromArgb(int(row[3]),int(row[4]),int(row[5]))
            
            
            
            #cloud.Add(Rhino.Geometry.Point3d(float(row[0]),float(row[1]),float(row[2])),Color.FromArgb(int(row[3]),int(row[4]),int(row[5])))
            #print fline
        except:
            break
    # Convert 32-bit color to .NET Color.
    #colors = System.Collections.Generic.List[Color](map(Color.FromArgb,color_list))#map(Color.FromArgb,color_list)
    
    #cloud.AddRange(point_list,colors)
    
    rs.EnableRedraw(False)
    
    sc.doc.Objects.AddPointCloud(cloud)
    
    print "Elapsed time is {:.2f}".format(time.time()-ts)
    print "Boo!"

if __name__ == "__main__":
    
    rs.EnableRedraw(False)
    
    TST()

import csv

##################################
### Read TXT create pointcloud ###
##################################
import System
import rhinoscriptsyntax as rs
import scriptcontext as sc
import Rhino
import time
from System.Drawing import Color

import csv

tol = sc.doc.ModelAbsoluteTolerance




def TST():
    filtr = 'Text Files (*.txt)|*.txt| XYZ Color files (*.xyz)|*.xyz||'
    strPath = rs.OpenFileName('XYZRGB file to import', filtr)
    if not strPath: return
    
    cloud = Rhino.Geometry.PointCloud()
    ts = time.time()
    with open(strPath,'r') as csvfile:
        
        dict_reader = csv.DictReader(csvfile, fieldnames=['x','y','z','r','g','b'], restkey=None, restval=None, dialect='excel',delimiter=',', quotechar="'",quoting=csv.QUOTE_NONNUMERIC)
        # csv.QUOTE_NONNUMERIC -> tells the reader to convert all non quoted data to floats
        
        for row in dict_reader:
            cloud.Add(Rhino.Geometry.Point3d(row['x'],row['y'],row['z']),Color.FromArgb(row['r'],row['g'],row['b']))#
            
    
    sc.doc.Objects.AddPointCloud(cloud)
    print "Elapsed time is {:.2f}".format(time.time()-ts)
    print "Boo!"

if __name__ == "__main__":
    
    #rs.EnableRedraw(False)
    
    TST()
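The csv.QUOTE_NONNUMERIC behaviour mentioned in the comment above can be checked without Rhino. This is a minimal sketch for plain CPython 3 using one made-up, comma-separated line; it uses csv.reader rather than DictReader to keep the output short.

```python
import csv
import io

# One made-up XYZRGB line, comma separated as in the script above.
sample = io.StringIO(u"1.2345,2.5682,3.9832,155,200,225\n")

# QUOTE_NONNUMERIC makes the reader convert every unquoted field to float.
reader = csv.reader(sample, delimiter=',', quoting=csv.QUOTE_NONNUMERIC)
row = next(reader)
print(row)  # [1.2345, 2.5682, 3.9832, 155.0, 200.0, 225.0]
```

The color fields come back as floats too, so they may need int() before being used where an integer is required.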

I guess having fewer lines doesn’t mean better.

Terry_Chappell · July 31, 2019, 10:44pm

Ivelin,

Thanks for looking at this simple but challenging problem.

I did not know about the details of using the csv parser or the Microsoft TextFieldParser, so it was really interesting to look at your code. The challenge with these approaches is that they add only 1 point at a time to the cloud. From experiments I did, this is about 3X slower than collecting a group of points & colors and then using cloud.AddRange to add the group. Here are the timings I got for reading a point cloud with 9M points from a 0.5GB file:

Using the files you provided with a ’ ’ separator:
csv: 172 sec
TextFieldParser: 169 sec
Using the file I posted above
XYZRGB_with_DLL.py: 31 sec (without DLL enabled. This automatically uses a Python-only version)
XYZRGB_with_DLL.py: 6.2 sec with DLL enabled.

Both of the XYZRGB_with_DLL.py runs use cloud.AddRange. They also use 32-bit colors, which run through the step:
colors = List[Color](map(Color.FromArgb,tcolors))
about 50% faster. In addition, they access the file with a binary read, which is up to 7X faster than reading by lines. On top of this, when the DLL is enabled, there is no step where the data is broken into lines and then parsed into 6 values (XYZRGB). Instead, a whole block of data is parsed continuously, with the newline characters used to trigger the storage of each line’s worth of data. This makes the parsing much faster, as the data is not read twice: once to find the newline characters and again to find the field separators (a space for the XYZRGB files I am using). Using a binary read to get a block of data has the challenge of cutting the last line in the block in the middle, but this is easy to deal with by offsetting the read of the next memory block by the length of this last line and not storing the incomplete last line of a block. These details are documented in the code listing.
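The partial-line problem can also be handled without seeking backwards. The sketch below, for plain CPython 3, is a variation and not the method used in the posted scripts: it carries the incomplete last line forward and prepends it to the next block. The function name `iter_block_lines` and the temporary test file are made up for illustration.

```python
import os
import tempfile

def iter_block_lines(path, block_size=100000):
    """Yield lists of complete lines (as bytes), reading the file in binary blocks."""
    leftover = b''
    with open(path, 'rb') as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            lines = (leftover + block).split(b'\n')
            # The last piece is an incomplete line, or b'' if the block
            # happened to end exactly on a newline; keep it for the next pass.
            leftover = lines.pop()
            if lines:
                yield [ln.rstrip(b'\r') for ln in lines]
    # A final line without a trailing newline is still a valid line.
    if leftover.strip():
        yield [leftover.rstrip(b'\r')]

# Write ten made-up XYZRGB lines with CRLF endings to a temporary file.
fd, tmp_path = tempfile.mkstemp(suffix='.xyz')
with os.fdopen(fd, 'wb') as out:
    for i in range(10):
        out.write(('%d.5 2.0 3.0 10 20 30\r\n' % i).encode('ascii'))

# A deliberately tiny block size forces lines to be cut in the middle.
recovered = [ln for group in iter_block_lines(tmp_path, block_size=16) for ln in group]
os.remove(tmp_path)
print(len(recovered), recovered[0], recovered[-1])
```

This form does not depend on the length of the line ending, so it behaves the same for files with LF or CRLF line endings.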

It took me some time to work all this out and get it working, as my knowledge of C++ is quite weak. But by searching the web, spending a few hours playing with the code each day while on vacation at the beach for a week, and bouncing ideas off my family, we were able to gradually reduce the time to read a 4GB file with 73.7M points from 256 sec using only Python, to 100 sec with the first version using the DLL, then to 75 sec by using a string-to-double or integer parser I found on the web, then to 67 sec by parsing all 6 values in a line together, and finally down to 51 sec by parsing all the values in a 100KB memory block together.

Now the code spends only 10 sec in the DLL doing the binary read of all 4GB and parsing it into 73.7M points and colors, while it spends 21 sec in Python doing the Rhino step of loading the XYZ into a Point3dList and 10 sec loading the colors into a .NET Color list. So there seems to be little opportunity left to improve the step from binary read to 73.7M points, while there appears to be some opportunity on the Rhino side to improve the time it takes to load the Point3dList. I say this because the DLL takes about 4 sec to do a binary read of the 4GB file and then 6 sec to do all the parsing and data storage. So of the 10 sec inside the DLL I can only influence the 6 sec parsing time, and here I have run out of ideas for improvement. I am amazed that the C++ code is able to parse 0.67 GB of data per sec. But this does make sense, as the code (from visual inspection) uses about 6 operations per byte, or 1.5 ns/byte on a 4 GHz processor.
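The throughput figures quoted above can be reproduced with a few lines of arithmetic. The sketch assumes that 4GB means 4e9 bytes and that the processor completes one operation per clock cycle; both are simplifications made here, not statements from the post.

```python
file_bytes = 4e9        # assumed: "4GB" taken as 4e9 bytes
points = 73.7e6         # points in the test cloud
parse_seconds = 6.0     # time spent parsing inside the DLL
total_seconds = 51.0    # total import time with the DLL
clock_hz = 4e9          # 4 GHz processor

parse_rate_gb_per_s = file_bytes / parse_seconds / 1e9   # about 0.67 GB/sec
ns_per_byte = parse_seconds / file_bytes * 1e9           # 1.5 ns per byte
cycles_per_byte = ns_per_byte * 1e-9 * clock_hz          # 6 cycles per byte
points_per_second = points / total_seconds               # about 1.4M points/sec

print(round(parse_rate_gb_per_s, 2), ns_per_byte, cycles_per_byte,
      round(points_per_second / 1e6, 2))
```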

I hope my explanation helps to illuminate the steps we took to improve the performance of the code. I am really surprised at the outcome; now I can load a 4GB, 73.7M point cloud in 51 sec vs waiting over 13 minutes using the Rhino Import tool. Or the little 0.5GB, 9M point cloud in 6.2 sec. Next I am going to apply this approach to reading my 3D mesh models from the .obj file generated by Metashape. These were taking many minutes but now I believe these can be reduced to seconds.

Regards,
Terry.

ivelin.peychev · August 1, 2019, 12:51am

Hi Terry,

Thanks for the exhaustive explanation. I’m relatively new to programming and I don’t know how to work with binary data. Apparently something I have to work on.

Your vacation seems like it was a lot of fun.

I have an idea.
If Rhino is taking too much time, how about eliminating Rhino from the equation?
Instead of using Rhino’s IronPython engine, use CPython and the rhino3dm module.

Here’s my simple attempt.
Unfortunately I can’t figure out how to work with pointcloud in rhino3dm context. Maybe @stevebaer, @nathanletwory or @Alain could help here.

Prerequisites:

Tkinter
Rhino3dm
the other modules should be included in the CPython3 distro.

# First attempt to import point cloud from txt to 3dm
# Currently creating 3dm file to the desktop containing the points

import rhino3dm
import os
import time
import csv

from tkinter.filedialog import askopenfilename

def TST():
    #filtr = 'Text Files (*.txt)|*.txt| XYZ Color files (*.xyz)|*.xyz||'
    #strPath = rs.OpenFileName('XYZRGB file to import', filtr)
    strPath = askopenfilename(initialdir = "/",title = "Select file",filetypes = (("TXT files","*.txt"),("all files","*.*")))
    
    
    if not strPath: return
    # Create a File3dm object
    model = rhino3dm.File3dm()
    
    # create point cloud
    cloud = rhino3dm.PointCloud()
    #cloud = Rhino.Geometry.PointCloud()
    ts = time.time()
    
    # I don't get the underscore ?
    point_list = rhino3dm._rhino3dm.Point3dList
    with open(strPath,'r') as csvfile:
        
        dict_reader = csv.DictReader(csvfile, fieldnames=['x','y','z','r','g','b'], restkey=None, restval=None, dialect='excel',delimiter=',', quotechar="'",quoting=csv.QUOTE_NONNUMERIC)
        # csv.QUOTE_NONNUMERIC -> tells the reader to convert all non quoted data to floats
        
        for row in dict_reader:
            model.Objects.AddPoint(float(row['x']),float(row['y']),float(row['z']))
            #cloud.Add(Rhino.Geometry.Point3d(row['x'],row['y'],row['z']),Color.FromArgb(row['r'],row['g'],row['b']))#
            
    
    #model.Objects.AddPoint3dList(point_list)
    #model.Objects.AddPointCloud(cloud)
    #sc.doc.Objects.AddPointCloud(cloud)
    
    # Full path to 3dm file to save
    desktop = os.path.join(os.path.join(os.environ['USERPROFILE']), 'Desktop') 
    filename = 'point_cloud.3dm'
    path = os.path.join(desktop, filename)
    
    # Write model to disk
    model.Write(path, 6)
    
    
    print ("Elapsed time is {:.2f}".format(time.time()-ts))
    

if __name__ == "__main__":
    
    TST()

ivelin.peychev · August 1, 2019, 10:56am

Hi Terry (@Terry_Chappell)

I assume PointCloud is not completely implemented in CPython’s rhino3dm module.

So… I’ve decided to go back to IronPython but this time I used the standalone installation (not the Rhino embedded engine)

This works using my method (csv); perhaps you can apply your method with the DLL and make it even faster. Now it doesn’t rely on the Rhino application, which does tons of unnecessary things in the background.

"""
A script to import points and colors from txt file, then create a PointCloud
and save the 3dm file to the desktop

by Ivelin Peychev

Prerequisites:
I. DotNet libraries:
    1. IronPython2.7.x (standalone installation)
    2. Rhino3dmIO.dll
    3. librhino3dmio_native.dll (this is included in the Rhino3dmIO nuget package)
    4. System namespace included in the "mscorlib.dll" should be in the system.
    5. System.Drawing for the Color
    6. System.Windows.Forms for the open file dialog
II. Python modules:
    1. csv - for reading the txt file in a friendlier manner.
    2. time - for tracking the time needed for the task to complete
    3. os - for saving the file
"""

import clr

"""
using just this one leads to error that librhino3dmio_native.dll cannot be found
librhino3dmio_native.dll has to be in the same folder as Rhino3dmIO.dll
"""

clr.AddReferenceToFileAndPath(r"Z:\ipy2\nuget_dlls\McNeel\Rhino3dmIO.Desktop\lib\net45\Rhino3dmIO.dll")
clr.AddReference("System.Windows.Forms")
clr.AddReference("System.Drawing")
clr.AddReference("mscorlib")

import os
import time
import csv

import System
import Rhino
import Rhino.FileIO

from System.Windows.Forms import OpenFileDialog, DialogResult
from System.Drawing import Color

def TST():
    openFileDialog = OpenFileDialog()
    openFileDialog.InitialDirectory = "c:\\"
    openFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*"
    openFileDialog.FilterIndex = 2
    openFileDialog.RestoreDirectory = True
    # Default to None so a cancelled dialog does not leave strPath undefined.
    strPath = None
    if (openFileDialog.ShowDialog() == DialogResult.OK):
        #Get the path of specified file
        strPath = openFileDialog.FileName
        print strPath
        
    
    if not strPath: return
    
    """ So far so good I can get the path to the txt containing points"""
    
    # Create a File3dm object
    model = Rhino.FileIO.File3dm()
    
    # create point cloud
    cloud = Rhino.Geometry.PointCloud()
    ts = time.time()
    
    # I don't get the underscore ?
    point_list = Rhino.Collections.Point3dList()
    tcolors = System.Collections.Generic.List[Color]()
    with open(strPath,'r') as csvfile:
        
        dict_reader = csv.DictReader(csvfile, fieldnames=['x','y','z','r','g','b'], restkey=None, restval=None, dialect='excel',delimiter=' ', quotechar="'",quoting=csv.QUOTE_NONNUMERIC)
        # csv.QUOTE_NONNUMERIC -> tells the reader to convert all non quoted data to floats
        
        for row in dict_reader:
            #model.Objects.AddPoint(float(row['x']),float(row['y']),float(row['z']))
            point_list.Add(row['x'],row['y'],row['z'])
            tcolors.Add(Color.FromArgb(row['r'],row['g'],row['b']))
            #cloud.Add(Rhino.Geometry.Point3d(row['x'],row['y'],row['z']),Color.FromArgb(row['r'],row['g'],row['b']))#
            
    """rhino3dm.PointCloud does not have attribute AddRange"""
    cloud.AddRange(point_list,tcolors)
    
    
    model.Objects.AddPointCloud(cloud)
    #sc.doc.Objects.AddPointCloud(cloud)
    
    # Full path to 3dm file to save
    desktop = os.path.join(os.path.join(os.environ['USERPROFILE']), 'Desktop') 
    filename = 'point_cloud.3dm'
    path = os.path.join(desktop, filename)
    
    # Write model to disk
    model.Write(path, 6)
    
    
    print ("Elapsed time is {:.2f}".format(time.time()-ts))
    print ("Done!")

if __name__ == "__main__":
    
    TST()

https://www.nuget.org/packages?q=rhino3dmio
https://ironpython.net/download/

ivelin.peychev · August 1, 2019, 1:55pm

Here’s an adaptation of @Terry_Chappell’s method to standalone IronPython.

1.7M points in 6.44 s.

"""
A script to import points and colors from txt file, then create a PointCloud
and save the 3dm file to the desktop

by Ivelin Peychev & Terry Chappell

Prerequisites:
I. DotNet libraries:
    1. IronPython2.7.x (standalone installation)
    2. Rhino3dmIO.dll
    3. librhino3dmio_native.dll (this is included in the Rhino3dmIO nuget package)
    4. System namespace included in the "mscorlib.dll" should be in the system.
    5. System.Drawing for the Color
    6. System.Windows.Forms for the open file dialog
II. Python modules:
    1. csv - for reading the txt file in a friendlier manner.
    2. time - for tracking the time needed for the task to complete
    3. os - for saving the file
    4. itertools - for islice
"""

import clr

"""
using just this one leads to error that librhino3dmio_native.dll cannot be found
librhino3dmio_native.dll has to be in the same folder as Rhino3dmIO.dll
"""

clr.AddReferenceToFileAndPath(r"Z:\ipy2\nuget_dlls\McNeel\Rhino3dmIO.Desktop\lib\net45\Rhino3dmIO.dll")
clr.AddReference("System.Windows.Forms")
clr.AddReference("System.Drawing")
clr.AddReference("mscorlib")

import os
import time
import csv

import System
import Rhino
import Rhino.FileIO

from System.Windows.Forms import OpenFileDialog, DialogResult
from System.Drawing import Color

from itertools import islice


def TST():
    openFileDialog = OpenFileDialog()
    openFileDialog.InitialDirectory = "c:\\"
    openFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*"
    openFileDialog.FilterIndex = 2
    openFileDialog.RestoreDirectory = True
    # Default to None so a cancelled dialog does not leave strPath undefined.
    strPath = None
    if (openFileDialog.ShowDialog() == DialogResult.OK):
        #Get the path of specified file
        strPath = openFileDialog.FileName
        print strPath
        
    
    if not strPath: return
    
    """ So far so good I can get the path to the txt containing points"""
    
    # Create a File3dm object
    model = Rhino.FileIO.File3dm()
    ts = time.time()
    
    """ """
    
    file = open(strPath)
    if not file: return
    total = 0
    cloud = Rhino.Geometry.PointCloud()
    while True:
        # Get group of lines to process.
        n_lines = list(islice(file, 1000))  # neat trick from stack overflow brought by nathancoatney of Rhino Forum.
        if not n_lines: break
        total += len(n_lines)
        # Define .NET structures for holding points and colors.
        points = Rhino.Collections.Point3dList()
        # Create list for holding 32-bit representation of RGB color.
        tcolors = []
        # Load points and colors for point cloud.
        for line in n_lines:
            d1,d2,d3,d4,d5,d6 = map(float,line.strip().split(','))
            points.Add(d1,d2,d3)
            # Convert R,G,B color to 32-bit color.
            tcolors.append(65536.*d4 + 256.*d5 + d6)
        # Convert 32-bit color to .NET Color.
        colors = System.Collections.Generic.List[Color](map(Color.FromArgb,tcolors))
        # Add points & colors to point cloud.
        cloud.AddRange(points, colors)
    # Close the input file now that all lines have been read.
    file.close()
    """..."""
    model.Objects.AddPointCloud(cloud)
    #sc.doc.Objects.AddPointCloud(cloud)
    
    # Full path to 3dm file to save
    desktop = os.path.join(os.path.join(os.environ['USERPROFILE']), 'Desktop') 
    filename = 'point_cloud.3dm'
    path = os.path.join(desktop, filename)
    
    # Write model to disk
    model.Write(path, 6)
    
    
    print ("Read {0} points in {1:.2f} seconds.".format(total,time.time()-ts))
    print ("Done!")

if __name__ == "__main__":
    
    TST()

Update: Cleaned up some Python 2 modules.
This now allows the script to be compiled into a single executable.

"""
A script to import points and colors from txt file, then create a PointCloud
and save the 3dm file to the desktop

Cleaned up some Python 2 modules.
This now allows the script to be compiled into a single executable.

by Ivelin Peychev & Terry Chappell

Prerequisites:
I. DotNet libraries:
    1. IronPython2.7.x (standalone installation)
    2. Rhino3dmIO.dll
    3. librhino3dmio_native.dll (this is included in the Rhino3dmIO nuget package)
    4. System namespace included in the "mscorlib.dll" should be in the system.
    5. System.Drawing for the Color
    6. System.Windows.Forms for the open file dialog
    7. System.IO included in "mscorlib.dll"
"""

import clr

"""
using just this one leads to error that librhino3dmio_native.dll cannot be found
librhino3dmio_native.dll has to be in the same folder as Rhino3dmIO.dll
"""

clr.AddReferenceToFileAndPath(r"Z:\ipy2\nuget_dlls\McNeel\Rhino3dmIO.Desktop\lib\net45\Rhino3dmIO.dll")
clr.AddReference("System.Windows.Forms")
clr.AddReference("System.Drawing")
clr.AddReference("mscorlib")

#import os
#import time
#import csv

#import System
#import Rhino
#import Rhino.FileIO

from time import time
from System.IO import Path
from Rhino.Geometry import PointCloud
from Rhino.Collections import Point3dList
from Rhino.FileIO import File3dm
from System.Collections.Generic import List
from System.Environment import ExpandEnvironmentVariables
from System.Windows.Forms import OpenFileDialog, DialogResult
from System.Drawing import Color
from System.Windows.Forms import MessageBox, MessageBoxButtons, MessageBoxIcon


from itertools import islice


def TST():
    openFileDialog = OpenFileDialog()
    openFileDialog.InitialDirectory = "c:\\"
    openFileDialog.Filter = "txt files (*.txt)|*.txt|All files (*.*)|*.*"
    openFileDialog.FilterIndex = 2
    openFileDialog.RestoreDirectory = True
    # Default to None so a cancelled dialog does not leave strPath undefined.
    strPath = None
    if (openFileDialog.ShowDialog() == DialogResult.OK):
        #Get the path of specified file
        strPath = openFileDialog.FileName
        #print strPath
        
    
    if not strPath: return
    
    """ So far so good I can get the path to the txt containing points"""
    
    # Create a File3dm object
    model = File3dm()
    ts = time()
    
    """ """
    
    file = open(strPath)
    if not file: return
    total = 0
    cloud = PointCloud()
    while True:
        # Get group of lines to process.
        n_lines = list(islice(file, 1000))  # neat trick from stack overflow brought by nathancoatney of Rhino Forum.
        if not n_lines: break
        total += len(n_lines)
        # Define .NET structures for holding points and colors.
        points = Point3dList()
        # Create list for holding 32-bit representation of RGB color.
        tcolors = []
        # Load points and colors for point cloud.
        for line in n_lines:
            d1,d2,d3,d4,d5,d6 = map(float,line.strip().split(','))
            points.Add(d1,d2,d3)
            # Convert R,G,B color to 32-bit color.
            tcolors.append(65536.*d4 + 256.*d5 + d6)
        # Convert 32-bit color to .NET Color.
        colors = List[Color](map(Color.FromArgb,tcolors))
        # Add points & colors to point cloud.
        cloud.AddRange(points, colors)
    # Close the input file now that all lines have been read.
    file.close()
    """..."""
    model.Objects.AddPointCloud(cloud)
    #sc.doc.Objects.AddPointCloud(cloud)
    
    # Full path to 3dm file to save
    desktop = Path.Combine(ExpandEnvironmentVariables(r"%USERPROFILE%"), 'Desktop')
    filename = 'point_cloud.3dm'
    path = Path.Combine(desktop,filename)
    
    
    # Write model to disk
    model.Write(path, 6)
    
    
    #print ("Read {0} points in {1:.2f} seconds.".format(total,time.time()-ts))
    MessageBox.Show("Read {0} points in {1:.2f} seconds.".format(total,time()-ts), "Result", MessageBoxButtons.OK, MessageBoxIcon.Exclamation)
    #print ("Done!")

if __name__ == "__main__":
    
    TST()

nathancoatney · August 1, 2019, 4:39pm

About the itertools islice: if I remember correctly, it is like a ‘chunked’ generator, so it provides n lines of the file for each next(). I think it is also supposed to be faster than file.read or readline. In CPython, maybe it is talking in C to the file object; I’m not sure how it works under the hood. In IronPython, it seemed to be faster but I didn’t test extensively. I mainly wanted the ‘chunks’ for the //foreach and it seemed the best way to get them.
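
As a Rhino-free illustration of the batching described here, the following minimal sketch wraps islice in a generator. The helper name iter_batches and the in-memory sample are assumptions made for the example; the scripts in this thread call islice directly inside a while loop.

```python
from io import StringIO
from itertools import islice

def iter_batches(lines, batch_size):
    """Yield lists of up to batch_size items from any iterable of lines."""
    lines = iter(lines)
    while True:
        # islice consumes at most batch_size items from the shared iterator.
        batch = list(islice(lines, batch_size))
        if not batch:
            break
        yield batch

# Stand-in for an open text file holding five XYZRGB lines.
sample = StringIO(u"0 0 0 255 0 0\n1 0 0 0 255 0\n2 0 0 0 0 255\n3 0 0 9 9 9\n4 0 0 1 2 3\n")
batch_sizes = [len(batch) for batch in iter_batches(sample, 2)]
print(batch_sizes)  # [2, 2, 1]
```

Each call pulls the next group from the same underlying iterator, so the file is never held in memory as a whole.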

I also started looking at the io module to read the file as bytes, hoping this would change the bytes->string->parse->float/int cycle to bytes->parse->float/int, but ran out of time. I may try that again out of curiosity.
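
For what it is worth, here is a small sketch of the bytes-only idea in plain Python. It assumes CPython 3, where float() and int() accept bytes directly; in IronPython 2.7 str is already a byte string, so the distinction is smaller there. The function name parse_xyzrgb_bytes is hypothetical, and nothing here says whether this is actually faster.

```python
import io

def parse_xyzrgb_bytes(line):
    """Parse one whitespace-separated XYZRGB record given as bytes."""
    # bytes.split() with no argument splits on ASCII whitespace and drops the newline.
    x, y, z, r, g, b = line.split()
    return (float(x), float(y), float(z)), (int(r), int(g), int(b))

# Stand-in for a file opened with mode 'rb'.
raw = io.BytesIO(b"1.2345 2.5682 3.9832 155 200 225\n0.5 0.25 0.125 1 2 3\n")
records = [parse_xyzrgb_bytes(line) for line in raw]
print(records[0])
```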

I’m also still interested in the //foreach. This seems like a good case for it; I just haven’t figured out what I am doing wrong. Or it may be some gotcha from using a hosted implementation of IronPython. Maybe I’ll try it outside of Rhino.

Terry_Chappell · August 1, 2019, 7:50pm

This is interesting work you are doing using methods I am not familiar with. But I am not seeing any improvement vs the ImportXYZRGB_Python() procedure inside the XYZRGB_with_DLL.py I posted above. This code would take 5.7 s to read 1.7M points on my computer. You need to use binary read for best performance. I saw a 3X improvement in file read time over reading lines: for a 4GB file, the read time dropped from 45 sec to 15 sec in Python and 4 sec in C++. Could you try putting the ImportXYZRGB_Python() procedure inside your version above and see how long that takes? This should provide a significant improvement.

My ImportXYZRGB_DLL() procedure inside XYZRGB_with_DLL.py would read the 1.7M points in 1.1 sec on my computer. This is the number we would like to improve upon. The DLL called by this procedure does most of the heavy lifting: binary file read and parsing each block of data into XYZ and 32-bit color. The only ways I know to improve upon the C++ code for the DLL are either to improve the algorithm for converting a string to a value (but it only takes 6 ops/value) or to hand-code this algorithm in assembly, which I did on the original IBM PC with the 8088 processor but do not want to get down in the dirt to do again now, 38 years later, as the ROI is too low. 3X more time is spent loading the Point3dList and .NET Colors list. I thought this is what your Rhino inside approach would improve.
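
The 32-bit color mentioned here is the same packing the Python scripts compute as 65536*R + 256*G + B. A minimal sketch with integer shifts follows; pack_rgb and unpack_rgb are hypothetical helper names. In .NET, Color.FromArgb(Int32) reads the value as AARRGGBB, so a value packed this way carries an alpha byte of 0; whether that matters for point cloud display is not something this thread establishes.

```python
def pack_rgb(r, g, b):
    """Pack 8-bit R, G, B channels into one integer laid out as 0x00RRGGBB."""
    return (r << 16) | (g << 8) | b

def unpack_rgb(value):
    """Recover the (R, G, B) channels from a packed 0x00RRGGBB integer."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF

packed = pack_rgb(155, 200, 225)
# Same value as the arithmetic form used in the scripts above.
assert packed == 65536 * 155 + 256 * 200 + 225
print(hex(packed), unpack_rgb(packed))
```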

Another approach is to write it like a plugin using Microsoft Visual Studio following the guidelines given here: https://developer.rhino3d.com/guides/cpp/your-first-plugin-windows/
and using C++ calls to RhinoCommon. But I am concerned that all this extra effort could be wasted if the steps to generate the point cloud do not run any faster when called from C++. This would be the thing to test first or ask about.

Currently I am focusing on adapting the DLL to read an .OBJ file for a 3D mesh model. Some of my models take minutes to load and I am hoping to bring these down to seconds.

Regards,
Terry.

ivelin.peychev · August 2, 2019, 12:36am

I’m getting there. I do not understand why the scripts execute faster when run inside Rhino; perhaps it is because all the assemblies are already in memory.

Here’s the direct adaptation of the “with_DLL” version running “Rhino-free”. Instead of the 0.1 s that I get for the while loop from inside Rhino, here I get 0.23 s.

Tested with this file (64 K points).
64K_points_spaces.txt (2.5 MB)

Save these files in the same folder, as I reference the Python (no C++) method from the second file.

points_to_3dm_dll_v2.py (6.4 KB)
points_to_3dm_v1.py (4.0 KB)

There are a couple of things I wish to try further:

multi threading
clean all CPython modules used currently.

Perhaps the true benefit of running it all outside Rhino would be the possibility of running this script on multiple TXT files and creating separate 3dm files. Inside Rhino you would have to open multiple Rhino instances, or repeat an open, run script, save, new cycle for each file.
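
A rough sketch of such a batch driver, with the conversion step left as a caller-supplied function. The names plan_outputs and convert_all are hypothetical; in practice convert would be something like the TST() logic above, reworked to take a source and destination path instead of opening a dialog.

```python
import os

def plan_outputs(txt_paths, out_dir):
    """Map each input .txt path to a .3dm path in out_dir with the same base name."""
    plan = []
    for txt_path in txt_paths:
        base = os.path.splitext(os.path.basename(txt_path))[0]
        plan.append((txt_path, os.path.join(out_dir, base + ".3dm")))
    return plan

def convert_all(txt_paths, out_dir, convert):
    """Call convert(src, dst) for each planned pair and return the output paths."""
    written = []
    for src, dst in plan_outputs(txt_paths, out_dir):
        convert(src, dst)  # supplied by the caller; not defined in this sketch
        written.append(dst)
    return written

# Dry run with a stand-in converter that only records what it was asked to do.
calls = []
outputs = convert_all(["scan1.txt", "scan2.txt"], "out", lambda s, d: calls.append((s, d)))
print(outputs)
```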

Terry_Chappell · August 2, 2019, 12:54am

I did not quite follow what you said about the timing. How long does points_to_3dm_dll_v2.py take to import the points? What other version are you comparing to (while loop from inside Rhino is not something I recognize)?

Regards,
Terry.

Terry_Chappell · August 2, 2019, 1:00am

I agree, the best route for further improvement is multithreading, since the point-cloud import can be done in parallel with each thread importing a well-defined part of the cloud. The challenge is finding a way to accomplish the RhinoCommon operation of loading the Point3dList for each part in parallel. Without this, not much more speedup will be possible.
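
To make the shape of that idea concrete, here is a Rhino-free sketch that parses batches of lines on worker threads and merges the results in order. It assumes CPython 3 (concurrent.futures) and uses plain tuples in place of Point3dList and Color; parse_batch and parse_parallel are hypothetical names. In CPython the global interpreter lock means threads are unlikely to speed up pure-Python parsing, so this shows only the structure, not a measured gain.

```python
from concurrent.futures import ThreadPoolExecutor

def parse_batch(lines):
    """Parse a list of XYZRGB text lines into point and color tuples."""
    points, colors = [], []
    for line in lines:
        x, y, z, r, g, b = line.split()
        points.append((float(x), float(y), float(z)))
        colors.append((int(r), int(g), int(b)))
    return points, colors

def parse_parallel(lines, batch_size=1000, workers=4):
    """Split lines into batches, parse them on a thread pool, and merge in order."""
    batches = [lines[i:i + batch_size] for i in range(0, len(lines), batch_size)]
    points, colors = [], []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in submission order, so point order is preserved.
        for batch_points, batch_colors in pool.map(parse_batch, batches):
            points.extend(batch_points)
            colors.extend(batch_colors)
    return points, colors

demo_lines = ["%d 0 0 1 2 3" % i for i in range(10)]
demo_points, demo_colors = parse_parallel(demo_lines, batch_size=3, workers=2)
```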

Regards,
Terry.

nathancoatney · August 2, 2019, 2:22am

After failing at a python version using bytes streams instead of open(‘r’) (I’m not sure it would be any faster anyway), I decided to refactor down to the ‘fastest’ python version using everything in this thread and a recently remembered hint of making local references to globals/built-ins.

I used @Terry_Chappell 's 9M point point cloud. Here are the timings on my machine, vs. the previous python-only solutions:

78.18 s  Terry_Chappell’s 1st script
66.67 s  nathancoatney’s non-//foreach version
75.50 s  Terry_Chappell’s 2nd script
54.81 s  Terry_Chappell’s 3rd script
41.16 s  This script:

import rhinoscriptsyntax as rs
import scriptcontext as sc
import Rhino as R
from itertools import islice
import System.Drawing.Color as Color
import System.Collections.Generic.List as List


def parse(path, batch_size=1000, separator=' '):
    with open(path, 'r') as f:
        lf = float  # local references to global functions might be faster
        li = int
        pc = R.Geometry.PointCloud()
        while True:
            n_lines = list(islice(f, batch_size))  # this gives us a batched iterator
            if not n_lines:
                break
            points = R.Collections.Point3dList()  # this is faster than a python or .net list
            # for some reason, PointCloud.AddRange() doesn't like colors in a python list,
            # so using .net list here
            colors = List[Color]()
            for line in n_lines:
                split = line.strip().split(separator)
                points.Add(lf(split[0]), lf(split[1]), lf(split[2]))
                colors.Add(Color.FromArgb(li(split[3]), li(split[4]), li(split[5])))
            pc.AddRange(points, colors)
        return pc


if __name__ == '__main__':
    import time
    path = rs.OpenFileName('Select Point Cloud File', 'Text Files (*.txt)|*.txt| XYZ Color files (*.xyz)|*.xyz|any|*.*')
    start = time.time()
    pc = parse(path)
    guid = sc.doc.Objects.AddPointCloud(pc)
    print('parse time: {}'.format(time.time()-start))
    print('point cloud with {} points added to doc at {}'.format(pc.Count, guid))

Most of the speedup is from making local references to the global float and int functions. I remembered this ‘trick’ from CPython and it seems to do some good in IronPython as well. Also, the Rhino.Collections.Point3dList is faster than a .NET List.
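
For anyone who wants to measure the local-reference effect on their own interpreter, here is a small Rhino-free sketch. The function names and the sample line are made up for the example, and no particular speedup is implied; the numbers depend on the interpreter and machine.

```python
import timeit

def parse_with_globals(lines):
    """Look up float and int as built-ins on every use."""
    out = []
    for line in lines:
        d = line.split(' ')
        out.append((float(d[0]), float(d[1]), float(d[2]), int(d[3]), int(d[4]), int(d[5])))
    return out

def parse_with_locals(lines):
    """Bind float and int to local names once, before the loop."""
    lf = float
    li = int
    out = []
    for line in lines:
        d = line.split(' ')
        out.append((lf(d[0]), lf(d[1]), lf(d[2]), li(d[3]), li(d[4]), li(d[5])))
    return out

def compare(lines, repeat=5):
    """Return (globals_seconds, locals_seconds) for `repeat` runs of each parser."""
    t_globals = timeit.timeit(lambda: parse_with_globals(lines), number=repeat)
    t_locals = timeit.timeit(lambda: parse_with_locals(lines), number=repeat)
    return t_globals, t_locals

sample_lines = ['1.2345 2.5682 3.9832 155 200 225'] * 1000
```

Calling compare(sample_lines) returns the two timings side by side; both functions return identical data, so only the name lookup differs.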

I tried using Terry_Chappell’s map floating point color trick but didn’t see any gains…

If the improvement holds up on others’ machines then maybe it will be useful to someone who needs to stay in Python.

Terry_Chappell · August 2, 2019, 5:54am

Nathan and Ivelin,

I updated my script to improve the Python-only performance by using Nathan’s trick of making the float and int procedures local. I also found this benefits .NET Color so I made it local also. Then I played with the block size and found a smaller block of 25K is 5% faster. With these changes, the time on my computer dropped from 31 sec to 18.6 sec. But if you install the DLL, the results are much faster, only 6.2 sec for the 9M point cloud or 3X faster. The code will still run without the DLL installed as it automatically uses the Python-only procedure ImportXYZRGB_Python that is included; it should run in about 35 sec on Nathan’s computer for the 9M test case. I think it runs faster on my computer because of the Intel PCIe 3.0 M.2 SSD with 3 GB/sec read bandwidth.

I tried your other suggestions and carefully timed them over 6 runs. The results:

(1) My binary read approach is 2.7% faster than using itertools’s islice iterator. It takes more lines of code but was done this way to help develop the code used inside the DLL where the binary read offers a 3X advantage over reading lines. The binary read inside the DLL is lightning fast, spending only 4 sec to read a 4GB file.
(2) R.Geometry.PointCloud() provided no speedup over PointCloud().
If you look at the imports list in my script, you will see that PointCloud is actually Rhino.Geometry.PointCloud, the same as what you are using.
(3) R.Collections.Point3dList() provided no speedup over Point3dList().
Again, this is because Point3dList is actually Rhino.Collections.Point3dList in my imports list.
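
The block-read idea in item (1) can be sketched without Rhino as follows. Note that this is a swapped-in variant, not the method used in the script: it carries the trailing partial line forward in memory instead of seeking backward in the file. The name iter_line_blocks is hypothetical.

```python
import io

def iter_line_blocks(stream, block_size):
    """Yield lists of complete lines read from a binary stream in fixed-size blocks."""
    carry = b""
    while True:
        block = stream.read(block_size)
        if not block:
            break
        parts = (carry + block).split(b"\n")
        # The last item is a partial line, or empty if the block ended on a newline.
        carry = parts.pop()
        lines = [p.rstrip(b"\r") for p in parts if p.strip()]
        if lines:
            yield lines
    # Whatever remains is the final line of a file with no trailing newline.
    if carry.strip():
        yield [carry.rstrip(b"\r")]

data = b"1 2 3 4 5 6\n7 8 9 10 11 12\r\n13 14 15 16 17 18"
all_lines = [line for group in iter_line_blocks(io.BytesIO(data), 16) for line in group]
print(all_lines)
```

Because the partial line is kept in carry, a block boundary that lands exactly on a newline, or between a carriage return and a line feed, needs no special case.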

Here is the updated script (it uses the same DLL as posted before).

# If you want to use the DLL for 3X faster cloud import, set its path here:
dll_name = r'C:\Users\Terry\source\repos\ReadCloud\x64\Release\ReadCloud.dll'
"""
Python script to import XYZRGB format colored point cloud into Rhino 6 or 7 WIP.
It uses cloud.AddRange to provide an approximately 3X speedup over cloud.Add and
avoids the 70M point limit of the Rhino Point3dList structure used in AddRange
by processing groups of 2000 points at a time. For 3X faster execution, it calls
a DLL which reads a 100 KB block at a time and then parses the block into
the X,Y,Z coordinates and R,G,B color of a point. The 100KB block fits entirely
within the L1-cache of the processor resulting in higher performance.
The coordinates and R,G,B colors are passed back to Python where they are loaded
into Point3d and .NET Color lists and then added to the cloud using AddRange.
The script with DLL is 3X faster than the included all-Python version and 15X
faster than Rhino's Import command. For example, it reads a 4GB file and creates
a 73.7M point cloud in 51 sec vs 179 sec (Python only) vs 790 sec or 13:10 min
(Rhino Import). The script imports a colored point cloud at about 1.4M pts/sec.
If the DLL is not installed, the script still runs using the included all-Python
procedure ImportXYZRGB_Python.
By Terry Chappell 8/2/2019.
"""
from Rhino.Geometry import PointCloud
from Rhino.Collections import Point3dList
from System.Collections.Generic import List
from System.Drawing import Color
from scriptcontext import doc
import rhinoscriptsyntax as rs
import Rhino as R
from time import time
from itertools import islice
import os
from ctypes import c_longlong, cdll as c_cdll, c_char, c_long, c_int, c_double,\
	byref as c_byref, c_wchar_p

# This is used when DLL is available.
def ImportXYZRGB_DLL():
	# Make Color local for faster performance.
	lColor = Color
	# Format of file should be XYZRGB: 1.2345 2.5682 3.9832 155 200 225
	filtr = 'Text Files (*.txt)|*.txt| XYZ Color files (*.xyz)|*.xyz||'
	strPath = rs.OpenFileName('XYZRGB file to import', filtr)
	if not strPath: return
	timea = time()
	# Zero number of points read.
	total = 0
	# Get size of file in bytes.
	file_size = os.path.getsize(strPath)
	# Read 100 KB blocks. This fits in the L1 cache for better performance.
	block_size = 100000
	# Estimate the number of points that are contained in the block.
	est_points = block_size // 30 # 3 FP numbers + 3 Int numbers.
	# Set size of points & colors arrays based upon estimated number of points.
	nxyz = int(est_points)
	ncolors = int(est_points)
	print 'Reading file {0} with {1:,} bytes in {2:,}KB blocks and converting to point cloud . . .'\
		.format(strPath, file_size, block_size // 1000)
	# Define c-types variables for interfacing to DLL.
	cx = (c_double * (nxyz))() # X-coordinate of point in cloud.
	cy = (c_double * (nxyz))() # Y-coordinate
	cz = (c_double * (nxyz))() # Z-coordinate
	ccolors = (c_long * (ncolors))() # Color for each point
	cnum_points = (c_int * (4))() # Number of points read + room for 3 debug values.
	cmemblock = (c_char * (block_size))() # Working area for reading file.
	coffset = (c_longlong * (1))(0) # Offset into file as each block is read.
	cfile_length = (c_longlong * (1))(file_size) # File size for detecting EOF.
	# Initialize cloud for holding results of read.
	cloud = PointCloud()
	# Zero offset for aligning to line boundary.
	offset = 0
	# Read file in blocks and process strings into points and colors for cloud.
	while offset < file_size:
		# Call DLL to open file, read & parse block and return points & colors.
		soReadCloud.read_cloud(strPath, c_byref(cx),c_byref(cy),c_byref(cz),
			c_byref(ccolors), c_byref(cnum_points), c_byref(cmemblock),
			block_size, c_byref(coffset), c_byref(cfile_length))
		# Get number of points read from block.
		num_points = cnum_points[0]
		# Get offset for next block read.
		offset = coffset[0]
		# Sum total number of points read.
		total += num_points
		# Define Rhino.Collections Point3dList for holding XYZ coordinates. 
		points = Point3dList()
		# Load points using coordinates returned by DLL.
		for i in xrange(num_points): points.Add(cx[i],cy[i],cz[i])
		# Get active colors from ctypes-colors list.
		colors = ccolors[:num_points]
		# Convert 32-bit colors to .NET Colors.
		colors = List[lColor](map(lColor.FromArgb,colors))
		# Add points & colors to point cloud.
		cloud.AddRange(points, colors)
	# Add visible point cloud to document.
	obj = doc.Objects.AddPointCloud(cloud)
	doc.Views.Redraw()
	timeb = time()
	print 'Read {0:,} lines and added point cloud in {1:.4f} sec.'\
		.format(total, timeb - timea)

# This is used when DLL was not found.
def ImportXYZRGB_Python():
	# Local references for float, int, Color make them over 20% faster.
	lf = float # This saves 7 sec for 9M cloud import or 25%
	li = int # This saves 6.7 sec or 23%
	lColor = Color # This saves 1.7 sec off 20.2 sec or 8%
	pc = PointCloud() # No savings.
	# Select point cloud to import.
	filtr = 'Text Files (*.txt)|*.txt| XYZ Color files (*.xyz)|*.xyz||'
	# Format of file should be XYZRGB: 1.2345 2.5682 3.9832 155 200 225
	strPath = rs.OpenFileName('XYZRGB file to import', filtr)
	if not strPath: return
	timea = time()
	# Zero counter for number of points read.
	total = 0
	# Use 25K block size which easily fits in L1-cache for better performance.
	block_size = 25000
	# Zero counter for number of bytes read to use in detecting end of file.
	bytes_read = 0
	# Get file size in bytes so end of file can be detected.
	file_size = os.path.getsize(strPath)
	# Zero offset for aligning to line boundary.
	offset = 0
	# Reset done flag.
	done = False
	print 'Reading file {0} with {1:,} bytes in {2:,}KB blocks and converting to point cloud . . .'\
		.format(strPath, file_size, block_size // 1000)
	# Read file, process lines into points and load points into cloud.
	# Binary read is 2.7% faster than using itertools islice iterator.
	# Use next line with 'r' read option.
	#batch_size = 1000
	with open(strPath, 'rb') as file:
		while not done:
			#Use the lines from here through the done flag with 'rb' read option.
			file.seek(offset, 1) # Offset is relative to current position.
			# Read block of data.
			block = file.read(block_size)
			if not block: break
			# Split on newlines only, so a block that ends exactly on a newline
			# yields an empty last item and no back-up is needed.
			lines = block.split('\n')
			# Back up file pointer by length of last (possibly partial) line.
			offset = -len(lines[-1])
			# Sum number of bytes read so end can be detected.
			bytes_read += block_size + offset
			# If not last group, remove last line as it is likely incomplete.
			if bytes_read < file_size: del lines[-1]
			# If this is the last group, set done flag and drop an empty last item.
			else:
				done = True
				if not lines[-1].strip(): del lines[-1]
			# Use next 2 lines with 'r' read option.
			#lines = list(islice(file, batch_size))  # 2.7% slower (6 run ave.).
			#if not lines: break
			# Count number of points read.
			total += len(lines)
			# Create lists for holding point3d and 32-bit colors.
			points = Point3dList()
			colors = List[lColor]()
			# Parse each line to extract points and colors.
			# Use lf, li and lColor for better performance.
			for line in lines:
				d1,d2,d3,d4,d5,d6 = line.split(' ')
				points.Add(lf(d1), lf(d2), lf(d3))
				colors.Add(lColor.FromArgb(li(d4), li(d5), li(d6)))
			# Add points & colors to point cloud.
			pc.AddRange(points, colors)
	# Add visible point cloud to document.
	obj = doc.Objects.AddPointCloud(pc)
	doc.Views.Redraw()
	timeb = time()
	print 'Read {0:,} lines and added point cloud in {1:.4f} sec.'\
		.format(total, timeb - timea)

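if __name__ == "__main__":
	# Try to get DLL.
	try:
		soReadCloud = c_cdll.LoadLibrary(dll_name)
	# Use Python code when DLL cannot be loaded.
	except Exception:
		soReadCloud = None
		print 'WARNING: Did not find DLL at {}. A 3X slower all Python version will be used.'\
			.format(dll_name)
	# Call the importer outside the try so its own errors are not mistaken for a missing DLL.
	if soReadCloud is not None: ImportXYZRGB_DLL()
	else: ImportXYZRGB_Python()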

XYZRGB_DLL_Python.py (7.8 KB)
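
For readers unfamiliar with the ctypes buffers allocated in ImportXYZRGB_DLL, here is a small standalone sketch of how such arrays are created and sliced, with no DLL involved. The values are made up, and c_int32 is used here instead of the script's c_long as an assumption for portability, since c_long is 32 bits on Windows but may be 64 bits elsewhere.

```python
from ctypes import c_double, c_int32

capacity = 4
cx = (c_double * capacity)()      # zero-initialised array of 4 doubles
ccolors = (c_int32 * capacity)()  # zero-initialised array of 4 32-bit ints

# Pretend a native routine filled the first two slots and reported the count.
cx[0], cx[1] = 1.5, 2.5
ccolors[0], ccolors[1] = 0x9BC8E1, 0x010203
num_points = 2

# Slicing a ctypes array copies the active part into an ordinary Python list.
xs = cx[:num_points]
packed = ccolors[:num_points]
print(xs, packed)
```

The fixed-capacity buffers are reused for every block, and only the first num_points entries are meaningful after each call, which is why the script slices before converting to .NET Colors.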

Regards,
Terry.

ivelin.peychev · August 2, 2019, 8:14am

Regarding this:

The latest scripts that I posted work without launching the Rhino application. The timer inside the scripts measures how long the while loop takes to complete (reading all lines and creating the point cloud). I ran your script with the DLL first from inside Rhino, and afterwards ran my script outside of Rhino. I assume RhinoPythonEditor does some pre-loading of the assemblies prior to launching the script.

The same algorithm (with the DLL) takes 0.1 seconds on a 64K-point file when launched from RhinoPythonEditor, and 0.2 seconds when run without Rhino (standalone IronPython). I did not test with larger files.
I will try to do that today or during the weekend. I will also try to add the new improvements and see whether cleaning all CPython modules out of the scripts and compiling to .exe does any good.
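
One way to check the pre-loading guess would be to time each phase separately rather than only the while loop. A minimal sketch follows; the PhaseTimer class is hypothetical and the phase names are placeholders for wherever the marks are placed in a real script.

```python
import time

# Prefer a monotonic clock where available; fall back to time.time on Python 2.
_clock = getattr(time, "perf_counter", time.time)

class PhaseTimer(object):
    """Record the elapsed time between successive mark() calls."""
    def __init__(self):
        self.phases = []
        self._last = _clock()

    def mark(self, name):
        now = _clock()
        self.phases.append((name, now - self._last))
        self._last = now

timer = PhaseTimer()
total = sum(range(100000))       # stand-in for loading assemblies / imports
timer.mark("startup")
squares = [i * i for i in range(100000)]  # stand-in for the read-and-parse loop
timer.mark("loop")
for name, seconds in timer.phases:
    print("{0}: {1:.4f} s".format(name, seconds))
```

Comparing the "startup" figure inside and outside Rhino would show whether the difference really comes from assembly loading.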

Terry_Chappell · August 2, 2019, 6:17pm

Ivelin,

I like the work you are doing on this. It helps me to learn about this different approach which may benefit some cases.

Regards,
Terry.