Rhinocommon function to get duplicate curves

Hi @Xavier_Ayme,

Your only choice in V5 is to write your own comparison function.

– Dale

Hi @dale,

Sorry to bother, but comparing curves is exactly what I did,my question is what kind of property should I look for in order for the comparison to be super fast. Should this information be confidential I’d understand.

Thanks!

–Xavier

just do it nested, ask for obvious similarities first, such as cp-count and degree, is it closed, rational etc. Break if not equal. if all these properties are equal you can move on by doing a controlpoint comparison, again build in a break in order to quit as soon as it is clear that there is difference. Also include weights, just in case a Curve shares the same point but is weighted different. Exclude all features you won’t have in your file. Shouldn’t be that costly. Be aware that points are structs, no classes.

Hello @TomTom and @dale,

Thanks for your help. Yep that’s what I was doing but I think I’ve found the bottleneck : it was not the comparison itself, it was that we were running a nested loop checking each curve against each other, which ended up being really slow for a large number of curves. Discovering today how RTree works, it just saved the whole process :slight_smile: Computing time went down from almost a minute to a second, RTrees are absolutely awesome!

Cheers

Hi Xavier

Just curoius about your solution.
In posted file is 10000++ curves, no duplicates. How does your alogorithm work with such collection of curves? How fast and how correct with detecting duplicates it is?
CurveDuplicates_1.3dm (3.9 MB)

Radovan

Hello @RadovanG,

Ok, you got me, it basically doesn’t work with your file, I didn’t even manage to finish the computation :frowning:

Basically, my algorithm does an R Tree check on the curve bounding box for narrowing down the possible duplicates. In your case, it just returns everyone since all bounding boxes are pretty much the same. So it ends up being a brute force approach, 12000 curves square. Any curve takes around 100ms to check, so you make the math, 1200 seconds at least whereas seldup completed in a few seconds. So I still don’t know how SelDup can be that fast…

Cheers

Hi Xavier

You should write your own hashing function (including tolerance as well) and by doing it that way you can narrow down to coolection of potetntial maches (duplicates) and then on this collection perform something like NurbsCurve.EpsilonEquals() comparison. I have done something like that before but will be able to post you sample code today late afternoon.

Radovan

Hi Radovan,

Ok great! Waiting for it :slight_smile:

– Xavier

Downthere is the sample code for testing curve duplicity. I was bussy last few weeks so you got it now.
As mentioned earlier:

  1. you have to calculate your own hash-code for each curve and put curves with same hash-code in separate groups (use ILookUp for it)

  2. if a group has only one object(curve) it means that curve in such group has no duplicate(s), so we skip such groups and items from such groups will not be marked as duplicte ones

  3. It is important that hash-code algorithm returns the same value for the same object

  4. Potentialy different objects can have the same hash-code value, this will happen rearly but it will happen

  5. Becasue of 4) you may use different hash-salt-value in hash-code algorithm to try to shrink groups more; this can be done recursively until a group can not be broken into subgroups

  6. To be able to implement calculation with tolerance in our hash-code there is method for point3d ‘rounding’,
    Point3d PointRounded(Point3d point, int precision).

  7. At the end comparison can be done by EpsilonEquals, but all curves has to be converted to nurbscurves
    Remark: hash-code algorithm was implemented in two steps, but can be merged into one hash-code mathod.
    Usage: call FindDuplicteCurves( curves, tolerance) method

     //
    
     public delegate uint CreateCurveHashCodeDelegateX(NurbsCurve crv, Double tolerance, int salt);
    
     public struct HashIndexPair
     {
         public uint Hash { get; set; }
         public int Id { get; set; }
         public HashIndexPair(uint hashPrm, int idPrm)
         {
             Hash = hashPrm;
             Id = idPrm;
         }
     }
    
     public static List<bool> FindDuplicteCurves(List<Curve> curves, double tolerance)
     {
         //Reverse curve direction where necesary
         var curvesToCheck = curves.Where(x => x != null)
                                     .Select(x => {
                                         if (x.PointAtStart > x.PointAtEnd) x.Reverse();
                                         return x;
                                     }).ToList();
         var nurbsCurves = curvesToCheck.Select(x => x.ToNurbsCurve()).Where(x => x != null).ToList();
         if (curves.Count != nurbsCurves.Count)
         {
             RhinoApp.WriteLine("FAILED TO CONVERT TO NURBS CURVES.  CurvesCount={0},  NurbesCurveCount={1}", curves.Count, nurbsCurves);
             return new List<bool>();
         }
         if (nurbsCurves.Count < 2)
         {
             return new List<bool>();
         }
         //Now we group curves in groups based on the same hash-code retrun by our hasing method; for grouping we use ILookup 
         //First we use CreateCurveHashCodeX1 method for calculating our hash-value...
         CreateCurveHashCodeDelegateX createMyCurveHashCode1 = CreateCurveHashCodeX1;
         var curvesLookupStart = nurbsCurves.Select((x, i) => new HashIndexPair(createMyCurveHashCode1(x, tolerance, 101), i)).ToLookup(y => y.Hash, y => y.Id);
         var curvesLookUpShrinked = ShrinkLookup(curvesLookupStart, nurbsCurves, tolerance, 313, createMyCurveHashCode1);
         //... after that we use CreateCurveHashCodeX2 method for calculating our hash-value.
         CreateCurveHashCodeDelegateX createMyCurveHashCode2 = CreateCurveHashCodeX2;
         var curvesLookUpShrinked2 = ShrinkLookup(curvesLookUpShrinked, nurbsCurves, tolerance, 313, createMyCurveHashCode2);
         //Now we have to do final check on remaining items in each group if they are EpsilonEquals()
         //If curve1 is EpsilonEquals to curve2 we mark both curves as duplicates by seting DuplicateStatus[] to true for each curve-index
         bool[] DuplicateStatus = new bool[curves.Count];
    
         foreach (var group in curvesLookUpShrinked2)
         {
             var ids = group.Select(x => x).ToList();                          
             for (int i = 0; i < ids.Count - 1; i++)
             {
                 for (int j = i+1; j < ids.Count ; j++)
                 {
                     if (!DuplicateStatus[ids[i]] || !DuplicateStatus[ids[j]])
                     {
                         if (nurbsCurves[ids[i]].EpsilonEquals(nurbsCurves[ids[j]], tolerance))
                         {
                             DuplicateStatus[ids[i]] = true;
                             DuplicateStatus[ids[j]] = true;
                         }
                     }
                 }
             }
         }
         return DuplicateStatus.ToList();
     }
    
     public static ILookup<uint, int> ShrinkLookup(ILookup<uint, int> lookUpToShrink, List<NurbsCurve> curves, double tolerance, int salt,
                                             CreateCurveHashCodeDelegateX createMyCurveHashCodeDelegate)
     {
         List<IGrouping<uint, int>> result = new List<IGrouping<uint, int>>();
         foreach (var group in lookUpToShrink)
         {
             int grpCount = group.Count();
             if (grpCount == 1)
             {
                 // skip single-item group => group has one item that has no duplicates
                 continue;
             }
             var groupLookUp = group.Select(x => new HashIndexPair(createMyCurveHashCodeDelegate(curves[x], tolerance, salt), x))
                                     .ToLookup(y => y.Hash, y => y.Id);
             if (groupLookUp.Count > 1)
             {
                 //group has been divided into more subgroups by using actual hashing-method => try to shrink subgroups more
                 var shrinkMore = ShrinkLookup(groupLookUp, curves, tolerance, salt * 2 + 1, createMyCurveHashCodeDelegate);
                 result.AddRange(shrinkMore);
             }
             else
             {
                 result.AddRange(groupLookUp);                    
             }
         }
         var resultLookup = result.SelectMany(x => x.Select(y => new HashIndexPair(x.Key, y))).ToLookup(x => x.Hash, x => x.Id);
         return resultLookup;
     }        
    
     public static uint CreateCurveHashCodeX1(NurbsCurve crv, Double tolerance, int salt)
     {
         int myHash = 0;
         unchecked
         {
             for (int i = 0; i < crv.Knots.Count; i++)
             {
                 myHash ^= crv.Knots[i].GetHashCode() * (2 * i + salt);
             }
             myHash ^= crv.Degree * 1559;
             myHash ^= crv.Dimension * 2029;
             myHash ^= saltHash(crv.Domain.GetHashCode(), salt) * 1699;
             myHash ^= saltHash(crv.IsClosed.GetHashCode(), salt) * 4099;
             myHash ^= saltHash(crv.IsPeriodic.GetHashCode(), salt) * 4447;
             myHash ^= saltHash(crv.IsValid.GetHashCode(), salt) * 5557;
             // point at start/end w. tolerance
             int roundPrecision = (int)Math.Log10(1d / tolerance);
             Point3d pAtEnd = PointRounded(crv.PointAtEnd, roundPrecision);
             myHash ^= saltHash(pAtEnd.GetHashCode(), salt) * 6007;
             Point3d pAtStart = PointRounded(crv.PointAtStart, roundPrecision);
             myHash ^= saltHash(pAtStart.GetHashCode(), salt) * 6977;
             myHash ^= crv.SpanCount * 9199;
         }
         return (uint)myHash;
     }
    
     public static uint CreateCurveHashCodeX2(NurbsCurve crv, Double tolerance, int salt)
     {
         int myHash = 0;
         int roundPrecision = (int)Math.Log10(1d / tolerance);
         unchecked
         {
             for (int i = 0; i < crv.Points.Count; i++)
             {
                 var controlPointLocation = PointRounded(crv.Points[i].Location, roundPrecision);
                 myHash ^= saltHash(controlPointLocation.GetHashCode(), salt) * (31 * i + 1);
                 myHash ^= saltHash(crv.Points[i].Weight.GetHashCode(), salt) * (97 * i + 1);
             }
         }
         return (uint)myHash;
     }
    
     public static Point3d PointRounded(Point3d point, int precision)
     {
         double p_X = Math.Round(point.X, precision);
         double p_Y = Math.Round(point.Y, precision);
         double p_Z = Math.Round(point.Z, precision);
         return new Point3d(p_X, p_Y, p_Z);
     }
    
     public static int saltHash(int hash, int divisor)
     {
         int remainder;
         Math.DivRem(hash, divisor, out remainder);
         return hash + remainder;
     }
    
     //
1 Like

Wow, this is really impressive @RadovanG! I will have a look to the code and come back to you. Yiu deserve a big thanks in advance though :slight_smile:

Wow, very good example and a very interesting piece of c#. Its also a good reference for rare language features such as the xor assign operator and the unchecked keyword to prevent overflow checking.

I think you could speed up “CreateCurveHashCodeX1” by hashing only knots and controlpoints, because the knotvector determines the degree, if its periodic and the domain, whereas the points do the rest. You can also assume that you always check for valid curves. It might also good to think about splitting the hashing furthermore, so that you only precise the hashing if there are more than 1 element in a hash/lookup table entry.

On a side-note, the unchecked keyword is necessary only when using constants. Since myHash isn’t a constant all calculations are by default marked as unchecked by the compiler, thus the unchecked is not necessary here (see unchecked documentation).

1 Like

You are right about hashing - hash function(s) can be more optimised and redundant code removed. Anyway this was just example how to use it in conection with curve-duplicity check.
Regarding unchecked and checked keywords, Nathat is right that by default arithmetic operations are unchecked if they are non-constant expresions. But I am not sure how code is compiled when overflow checking is enabled by compiler options or environment configuration, so I prefer to expicitely use keyword.

R

The code above should not call EpsilonEquals(), or alternatively only use tolerance == 0. That is because two items that are equal within epsilon tolerance, will likely have different hashes and therefore EspilonEquals will not be called anyways.

Code has method PointRounded(…) which is used to “round” point3d location based on supplied tolerance so two points inside tolerance will be “rounded” to same x,y,z values => hash value will be the same for both “rounded” points. This rounding is applied to end and start points of curve before getting hash value as well as to control points of nurbscurve before getting hash value for each ctrl point.
For example inside CreateCurveHashCodeX1() method there are lines of code for rounding PoitAtEnd:

int roundPrecision = (int)Math.Log10(1d / tolerance);
Point3d pAtEnd = PointRounded(crv.PointAtEnd, roundPrecision);
myHash ^= saltHash(pAtEnd.GetHashCode(), salt) * 6007;

This means that two points (PointAtEnd) which “round” to the same point3D will return same hash code.
The same logic is valid for other Point3d properties of curve and implies that hashes will be the same so code that calls EpsilonEquals will be called for these curves.
Probably there should be some fine-tuning of tolerance implementation in hash-code calculation, for example roundPrecision should be increased by 1 to be sure to catch all points inside tolerance, etc.

@RadovanG sorry this does not work, and will give false negatives that are, nonetheless, within tolerance.

The reason is that rounding breaks space into subunits, in 3D it breaks it into small boxes, where each box groups all points that derive from the same tolerance increment (if you round by 0.1, the box will be about 0.1 x 0.1 x 0.1 units). The problem is the gap between boxes. It’s perfectly fine for two point to be within tolerance, and yet to round to two distinct boxes. Think about a X coordinate pair of 3.050000001 and 3.0499999999: one will round to the 3.1 box, the other to the 3.0 box. Yet they are within tolerance of 0.000000002. So they will give a false negative.

Other than by reductio ad absurdum with 3D boxes and a practical example, you can also decompose this problem into a single coordinate problem. It’s not possible to have a single hashcode of a float with a tolerance.

You could get away by having two different hashcodes that round to out-of-tolerance gaps, then compare both hashcodes. However, these different hashcodes would not be easy to compose into a single logic when multiple points form a curve that you want to compare.

My suggestion would be to use some other method. If having false negatives is OK in your case, you can still use this logic, but you need to call it heuristic and make sure that it is also known for its users that it is such.

Yes, you are absolutely correct. Thanks for pointing it out.
It is not possible to use tolerance the way I did, some points will be “rounded” to different values even if they are very close to each other (=> distance less than tolerance).
So I updated code to do the job but WITHOUT TOLERANCE. I tested it on few millions different curves and it seems to give the same result as SelDupAll command in Rhino. And it is very fast at the same time.
Here is the code:
//

    public delegate uint CreateCurveHashCodeDelegate(NurbsCurve crv, int salt);

    public struct HashIndexPair
    {
        public uint Hash { get; set; }
        public int Id { get; set; }
        public HashIndexPair(uint hashPrm, int idPrm)
        {
            Hash = hashPrm;
            Id = idPrm;
        }
    }

    public static List<bool> FindDuplicteCurvesA(List<Curve> curves)
    {
        //Reverse curve direction where necesary
        var curvesToCheck = curves.Where(x => x != null)
                                    .Select(x => {
                                        if (x.PointAtStart > x.PointAtEnd) x.Reverse();
                                        return x;
                                    }).ToList();
        var nurbsCurves = curvesToCheck.Select(x => x.ToNurbsCurve()).Where(x => x != null).ToList();
        if (curves.Count != nurbsCurves.Count)
        {
            RhinoApp.WriteLine("*** FAILED TO CONVERT TO NURBS CURVES.  CurvesCount={0},  NurbesCurveCount={1}", curves.Count, nurbsCurves);
            return new List<bool>();
        }
        if (nurbsCurves.Count < 2)
        {
            return new List<bool>();
        }
        //Now we group curves in groups based on the same hash-code retrun by our hasing method; for grouping we use ILookup 
        CreateCurveHashCodeDelegate createMyCurveHashCode1 = CreateCurveHashCodeX3;
        var curvesLookupStart = nurbsCurves.Select((x, i) => new HashIndexPair(createMyCurveHashCode1(x, 101), i)).ToLookup(y => y.Hash, y => y.Id);
        var curvesLookUpShrinked = ShrinkLookup(curvesLookupStart, nurbsCurves, 313, createMyCurveHashCode1);
        //
        bool[] DuplicateStatus = new bool[curves.Count];
        foreach (var group in curvesLookUpShrinked)
        {
            var ids = group.Select(x => x).ToList();
            for (int j = 0; j < ids.Count; j++)
            {
                DuplicateStatus[ids[j]] = true;
            }
        }
        return DuplicateStatus.ToList();
    }

    public static ILookup<uint, int> ShrinkLookup(ILookup<uint, int> lookUpToShrink, List<NurbsCurve> curves, int salt,
                                    CreateCurveHashCodeDelegate createMyCurveHashCodeDelegate)
    {
        List<IGrouping<uint, int>> result = new List<IGrouping<uint, int>>();
        foreach (var group in lookUpToShrink)
        {
            int grpCount = group.Count();
            if (grpCount == 1)
            {
                // skip single-item group => group has one item that has no duplicates
                continue;
            }
            var groupLookUp = group.Select(x => new HashIndexPair(createMyCurveHashCodeDelegate(curves[x], salt), x))
                                    .ToLookup(y => y.Hash, y => y.Id);
            if (groupLookUp.Count > 1)
            {
                //group has been divided into more subgroups by using actual hashing-method => try to shrink subgroups more
                var shrinkMore = ShrinkLookup(groupLookUp, curves, salt * 2 + 1, createMyCurveHashCodeDelegate);
                result.AddRange(shrinkMore);
            }
            else
            {
                result.AddRange(groupLookUp);
            }
        }
        var resultLookup = result.SelectMany(x => x.Select(y => new HashIndexPair(x.Key, y))).ToLookup(x => x.Hash, x => x.Id);
        return resultLookup;
    }

    public static uint CreateCurveHashCodeX3(NurbsCurve crv, int salt)
    {
        int myHash = 0;
        unchecked
        {
            for (int i = 0; i < crv.Knots.Count; i++)
            {
                myHash ^= crv.Knots[i].GetHashCode() * (2 * i + salt);
            }
            for (int i = 0; i < crv.Points.Count; i++)
            {
                myHash ^= saltHash(crv.Points[i].Location.GetHashCode(), salt) * (31 * i + 1);
                myHash ^= saltHash(crv.Points[i].Weight.GetHashCode(), salt) * (97 * i + 1);
            }
            myHash ^= crv.Degree * 1559;
            myHash ^= crv.Dimension * 2029;
            myHash ^= saltHash(crv.Domain.GetHashCode(), salt) * 1699;
            myHash ^= saltHash(crv.IsClosed.GetHashCode(), salt) * 4099;
            myHash ^= saltHash(crv.IsPeriodic.GetHashCode(), salt) * 4447;
            myHash ^= saltHash(crv.IsValid.GetHashCode(), salt) * 5557;
            myHash ^= crv.SpanCount * 9199;
        }
        return (uint)myHash;
    }

    public static int saltHash(int hash, int divisor)
    {
        int remainder;
        Math.DivRem(hash, divisor, out remainder);
        return hash + remainder;
    }

Addendum: to be 100% sure that curves are equal probably EsilonEqual check on resulting list should be processed because still two different curves can return the same hash-code value.

Hi @RadovanG,

Really impressive piece of code, it works like a charm.

Quick question : why do you use a salt into your hashing function? I thought that salt hash had to do with security concerns, I guess your are not really worried about your sel dup function being hacked :smiley:

Cheers

–Xavier

Hi Xavier

Here is updated code for FindDuplicteCurvesA() method which at the end implements EsilonEqual comparison for ceases when two different curves returns the same hash-code value.

    public static List<bool> FindDuplicteCurvesA(List<Curve> curves)
    {
        //Reverse curve direction where necesary
        var curvesToCheck = curves.Where(x => x != null)
                                    .Select(x => {
                                        if (x.PointAtStart > x.PointAtEnd) x.Reverse();
                                        return x;
                                    }).ToList();
        var nurbsCurves = curvesToCheck.Select(x => x.ToNurbsCurve()).Where(x => x != null).ToList();
        if (curves.Count != nurbsCurves.Count)
        {
            RhinoApp.WriteLine("*** FAILED TO CONVERT TO NURBS CURVES.  CurvesCount={0},  NurbesCurveCount={1}", curves.Count, nurbsCurves);
            return new List<bool>();
        }
        if (nurbsCurves.Count < 2)
        {
            return new List<bool>();
        }
        //Now we group curves in groups based on the same hash-code retrun by our hasing method; for grouping we use ILookup 
        CreateCurveHashCodeDelegate createMyCurveHashCode1 = CreateCurveHashCodeX3;
        var curvesLookupStart = nurbsCurves.Select((x, i) => new HashIndexPair(createMyCurveHashCode1(x, 101), i)).ToLookup(y => y.Hash, y => y.Id);
        var curvesLookUpShrinked = ShrinkLookup(curvesLookupStart, nurbsCurves, 313, createMyCurveHashCode1);
        //
        bool[] DuplicateStatus = new bool[curves.Count];
        foreach (var group in curvesLookUpShrinked)
        {
            var ids = group.Select(x => x).ToList();
            for (int i = 0; i < ids.Count-1; i++)
            {
                for (int j = i + 1; j < ids.Count; j++)
                {
                    if (!DuplicateStatus[ids[i]] || !DuplicateStatus[ids[j]])
                    {
                        if (nurbsCurves[ids[i]].EpsilonEquals(nurbsCurves[ids[j]], RhinoMath.ZeroTolerance))
                        {
                            DuplicateStatus[ids[i]] = true;
                            DuplicateStatus[ids[j]] = true;
                        }
                    }
                }
            }
        }
        return DuplicateStatus.ToList();
    }

Regarding saltHash function it is there to try to minimize appearance of same hash values for series of different input values. It does not have to be there, you can use hash-values without salting it.
But what you have to do is to multiply hash with some constant int before you XOR it with next hash value for next property. For example, if you have two sets of three numbers {a1, b, b} and {a1, d, d} and b != d, then (a1^b)^b will give you a1 and (a1^d)^d will give you also a1 !!

Radovan

1 Like