Join many lines - O(n2)?


#1

I want to join 100k+ straight line segments. The native join function does not scale well. My first approach was to segment the lines in octrees and join them in batches. Which is faster, but i think it can be much faster.
Commands like select duplicates are magnitudes faster - how do they work? Is it a simple comparator, or can i use Point3d.GetHashCode code for that? Tolerances need epsilon compare - are they then slower?
I can build the resultin polyline myself and keep track of ‘used’ points, i just wonder what is the best way to go about that…
Also related - do Point3dLists use any kind of acceleration structure behind the scenes? I guess they are not, because they dont seem to slow down on list changes - what makes them so fast?


(David Rutten) #2

Join is pretty optimised, but is hampered somewhat by the fact that it needs to tale tolerances into account. If your lines are guaranteed to have the exact same end-point coordinates, then you can build a Dictionary using the end-points as keys and the line indices as values. This allows you to look up all lines that start/end at a specific coordinate in O(log N).

Another approach would be to create two arrays containing all your lines and sort them by ascending x-coordinates of the start and end point respectively. Binary search in arrays like this is also pretty fast and this approach allows you to add some tolerances to the search.

Finally, as always, you need to first figure out what part of your algorithm is taking the most time. Is it the searching for nearby lines? Is it the removal of joined lines from existing collections? You can only start looking for sensible optimisations once you know what needs to be optimised.


#3

Thanks for the infos, the binary search sounds promising (already had a look at that)
Regarding the native Rhino join command: It is one of this unwritten rules that one should never join more than 8-10k curves at once, but rectangle select them in batches, join, and finally join all together. At least in architectural work this happens often, and everybody learns this the hard way…
I was surpised that this process automated in a script/gh-node actually could speed up the join by several magnitudes. Of couse finding the right batch size/geometry is tricky, but maybe little optimizations like this could be in Rhino 6? Or is it too hacky/flimsy?


#4

@atair I think you are onto something with GH, I followed what David said and benchmarked the various parts and found that the join could be done in ~60 seconds while just adding 100,000 line polylines to the document was taking a 140 seconds. @DavidRutten any other ideas to speed this up?

Below is the code I used:

protected override Result RunCommand(RhinoDoc doc, RunMode mode)
{
	const int divisions = 10;

	// Select two curves to intersect
	var go = new Rhino.Input.Custom.GetObject();
	go.SetCommandPrompt("Select curves");

	go.GeometryFilter = Rhino.DocObjects.ObjectType.Curve;
	go.GetMultiple(2, -1);

	if (go.CommandResult() != Rhino.Commands.Result.Success)
		return go.CommandResult();

	// Calculate Bounding Box;

	var start = DateTime.Now;

	List<Curve> curves = new List<Curve>(go.ObjectCount);
	List<Guid> objectsToDelete = new List<Guid>();

	BoundingBox box = new BoundingBox();

	foreach (var o in go.Objects())
	{
		Curve c = o.Curve();

		if (c != null && c.IsValid)
		{
			curves.Add(c);
			objectsToDelete.Add(o.ObjectId);
			box.Union(c.GetBoundingBox(false));
		}
	}

	List<List<Curve>> dividedCurves = new List<List<Curve>>();

	double xdivisions = (int)Math.Min(box.Diagonal.X, divisions);
	double ydivisions = (int)Math.Min(box.Diagonal.Y, divisions);
	double zdivisions = (int)Math.Min(box.Diagonal.Z, divisions);

	xdivisions = Math.Max(xdivisions, 1);
	ydivisions = Math.Max(ydivisions, 1);
	zdivisions = Math.Max(zdivisions, 1);

	for (int i = 0; i < xdivisions; i++)
	{
		double minX = box.Min.X + (box.Diagonal.X / (xdivisions)) * i;
		double maxX = box.Min.X + (box.Diagonal.X / (xdivisions)) * (i + 1);

		for (int j = 0; j < ydivisions; j++)
		{
			double minY = box.Min.Y + (box.Diagonal.Y / (ydivisions)) * j;
			double maxY = box.Min.Y + (box.Diagonal.Y / (ydivisions)) * (j + 1);

			for (int k = 0; k < zdivisions; k++)
			{
				double minZ = box.Min.Z + (box.Diagonal.Z / (zdivisions)) * k;
				double maxZ = box.Min.Z + (box.Diagonal.Z / (zdivisions)) * (k + 1);

				BoundingBox testBox = new BoundingBox(minX, minY, minZ, maxX, maxY, maxZ);

				var newList = new List<Curve>();


				foreach (var c in curves)
					if (testBox.Contains(c.GetBoundingBox(false).Center))
						newList.Add(c);


				dividedCurves.Add(newList);
			}
		}
	}

	var divideCurvesTime = DateTime.Now - start;

	start = DateTime.Now;

	List<Curve> joinedCurves = new List<Curve>();

	foreach (var cl in dividedCurves)
	{
		if (cl.Count < 1)
			continue;

		if (cl.Count == 1)
		{
			joinedCurves.Add(cl[0]);
			continue;
		}

		RhinoApp.Wait();

		var dividedResult = Rhino.Geometry.Curve.JoinCurves(cl);

		if (dividedResult == null)
			continue;

		foreach (var c in dividedResult)
			c.RemoveShortSegments(doc.ModelAbsoluteTolerance);

		joinedCurves.AddRange(dividedResult);
	}

	var subJoinCurvesTime = DateTime.Now - start;
	start = DateTime.Now;

	if (joinedCurves.Count < 1)
		return Result.Failure;

	var result = Rhino.Geometry.Curve.JoinCurves(joinedCurves);

	var mainJoinTime = DateTime.Now - start;
	start = DateTime.Now;

	if (result != null && result.Length > 0)
	{
		foreach (var r in result)
			doc.Objects.AddCurve(r);


		doc.Views.Redraw();

		foreach (var g in objectsToDelete)
			doc.Objects.Delete(g, true);

	}

	var addCurvesTimes = DateTime.Now - start;
	start = DateTime.Now;

	RhinoApp.Write("Time divideCurves:" + divideCurvesTime.TotalSeconds + " subJoinCurves:" + subJoinCurvesTime.TotalSeconds);
	RhinoApp.WriteLine(" mainJoin:" + mainJoinTime.TotalSeconds + " addCurves:" + addCurvesTimes.TotalSeconds);

	return Result.Success;
}

I also posted my test file and the rhp to my blog.


(Radovan Grmusa) #6

Hi

If your curves to join are lines then solution is simple. We can do it without using joinCurves method because it slows things down. Instead we construct polyLines based on our Lines and codnition that end point of conecting-lines are in distance<tolerance. For searching such points we can use RTree - see code below and coments in code for expalnation. Code can be written in shorter form, but becasue of readibilty I keep it like this.
I tested it with 100.000 lines which ends up joined in 10 polylines, execution time less then 1 sec.
Here is the code - you call static method JoinLines with parameters lines and tolerance JoinLines(List< Line > lines, double tolerance ), and method returns List< Polyline > as result of joined lines:

    public static List<Polyline> JoinLines(List<Line> lines, double tolerance )
    {
        //our lines will be joined in polylines (list of polylines)
        List<Polyline> joinedLines = new List<Polyline>();

        //We will keep track of available lines for joining in HashSet
        HashSet<int> availableLineIds = new HashSet<int>();
        //Ones one particular line[i] is joined it will be removed (its inex) from  HashSet<int> availableLineIds

        //We will do search for lines to be joined by using RTree, 
        //  START/END points of one line must be in close distance (<tolerance) to START/END points of another line.
        //  If we find such points we add line to newly constructed PolyLine

        //So we will populate RTree with start points and end points of every line
        //=> we have list of total 2*N points for N lines, 
        //so if [i] is index of a Line, 
        //  then [2*i] will be index of coresponding START point
        //  and [2*i + 1] will be index of coresponding END point of i-th Line
        RTree rTreeLinePoints = new RTree();
        
        for (int i = 0; i < lines.Count; i++)
        {
            //insert START point of i-th line, index of start point in RTree is 2*i
            rTreeLinePoints.Insert(lines[i].From, i * 2);
            //insert END point of i-th line, index of end point in RTree is 2*i+ 1
            rTreeLinePoints.Insert(lines[i].To, i * 2 +1);
            //and we track available lines (its indexes) in availableLineIds
            availableLineIds.Add(i);
        }

        //availableLineIds has to have at least one line to start joining
        while (availableLineIds.Count > 0)
        {
            Polyline nextPolyline = new Polyline();
            // We take first available line ...
            int lineIndex = availableLineIds.First();
            // ... and with that initial line we (start to ) create PolyLine
            nextPolyline.Add(lines[lineIndex].From);
            nextPolyline.Add(lines[lineIndex].To);

            //remove initial line from list of available lines
            availableLineIds.Remove(lineIndex);

            //first we try to join lines to START point of initial line = add to the beginig of Polyline list ("left side")            
            while (true)
            {
                Point3d basePoint = nextPolyline[0]; //start of our polyline will be point to look for next line
                //we call our method to find next line wihich is closest to our basaePoint and distance<tolerance
                var nextLine = addNextLine(  lines,   rTreeLinePoints,  availableLineIds, basePoint, tolerance);                    
                if (nextLine==Line.Unset)
                {
                    //no more lines can be joined to the "left side"
                    break;
                }
                //This is how Rhino is doing joining.
                // => It computes midpoint between lines-ends and replace line start/end point with new midpoint
                // => It actulay alters/modifies original line...
                Point3d midPoint = (basePoint + nextLine.From) / 2;
                nextPolyline[0] = midPoint;
                nextPolyline.Insert(0, nextLine.To);
            }
            //
            //and then we try to join lines to END point of initial line = add to the end of Polyline list ("right side")  
            while (true)
            {
                Point3d basePoint = nextPolyline.Last;
                var nextLine = addNextLine(lines, rTreeLinePoints, availableLineIds, basePoint, tolerance);
                if (nextLine == Line.Unset)
                {
                    break;
                }
                Point3d midPoint = (basePoint + nextLine.From) / 2;
                nextPolyline[nextPolyline.Count - 1] = midPoint;
                nextPolyline.Add(nextLine.To);
            }
            // So we have polyline taht we add to list of our polylines that are result of joining lines                
            joinedLines.Add(nextPolyline);
        }
        return joinedLines;
    }

    public static Line addNextLine(List<Line> lines, RTree rTreeLinePoints, HashSet<int> availableLineIds, Point3d basepoint, double tolerance)
    {
        Line nextLine = Line.Unset;
        double distance = tolerance;
        Sphere sphere = new Sphere(basepoint, distance);
        int idPointFound = -2;
        var searchedAll = rTreeLinePoints.Search(sphere, (o, e) => 
                            {
                                int lineIndex = (int)(e.Id / 2);
                                if (availableLineIds.Contains(lineIndex))
                                {
                                    Point3d  pointOnLine = (e.Id % 2)==0 ? lines[lineIndex].From : lines[lineIndex].To;
                                    double curDistance = basepoint.DistanceTo(pointOnLine);
                                    if (idPointFound<0 || curDistance < distance)
                                    {
                                        //we try to find the closest point (closest start/end of line) to testing point                                            
                                        distance = curDistance;
                                        idPointFound = e.Id;
                                        e.SearchSphere = new Sphere(basepoint, distance);
                                    }                                        
                                }
                            } );
        if (!searchedAll)
        {
            Exception myE = new Exception("Tree not completely searched.");
            throw(myE);
        }
        //
        if (idPointFound>=0)
        {
            int lineIndex = (int)(idPointFound / 2);
            //we direction of line
            if (idPointFound % 2 == 0)
            {
                // closest point is start point of line 
                nextLine = new Line(lines[lineIndex].From, lines[lineIndex].To);
            }
            else
            {
                // closest point is end point of line 
                nextLine = new Line(lines[lineIndex].To, lines[lineIndex].From);
            }
            availableLineIds.Remove(lineIndex);
        }
        //
        return nextLine;
    }

Radovan


(Radovan Grmusa) #7

And one additional thing about joining curves - see attached images.
Every curve that has to be joined will be alttered and this is probably reason of slow execution of join command. Join = create new curves that match the other one at end points, delete previous one, and 100.000 times like that…


and after joining


#8

Thanks - this is awesome!