Why "Set Union" is so much faster than "Create Set"?

I am trying to get all identical points out of 230k points, and the times each point is duplicated.
I wrote a simple c# script just to get all identical points, which is taking 1.3m to compute.
Then I tried using “Crest Set”; which is taking 2m.
And finally, I tried " Set Union", which is only taking 132ms.

I want to ask what is the logic behind “Set Union” and why it is so efficient?
Is there any place I can read the source code of it?

Thank you!


my c# script:

private void RunScript(List iPoint, ref object oPoint, ref object oIndex)
{

 List<Point3d> uniquePoint = new List<Point3d>();
 List<int> uniqueIndex = new List<int>();
 uniquePoint.Add(iPoint[0]);
 uniqueIndex.Add(0);

 for(int i = 0;i < iPoint.Count;i++)
 {
   bool Add = true;
   for(int j = 0;j < uniquePoint.Count;j++)
   {
     if(iPoint[i] == uniquePoint[j])
     {
       Add = false;
       break;
     }
   }
   if(Add)
   {
     uniquePoint.Add(iPoint[i]);
     uniqueIndex.Add(i);
   }
 }
 oPoint = uniquePoint;
 oIndex = uniqueIndex;

}

Looking at the code, the reason why Create Set is so much slower than Set Union is because of the M output. Creating a set is very fast, I.e. O(log(n)) but finding the index within a set for a colliding value is O(n).

This is a fairly common problem with the component design of grasshopper. Some components would work a lot faster if they could omit certain outputs, but at present there’s no mechanism in place to decide that during the computation phase.

This is something I should like to address for gh2, but I haven’t done so yet.

1 Like

Made also a test, thought Linq would be still slower.

1 Like

Thank you! Now I realize whyCreate Set is slower than Set Union.
But the script I wrote is just doing “Set Union”, in a much slower way. Do you know what is the logic Set Union is using to get all identical data?

“Distinct()” is new to me.
Why am I getting error when trying distinct?
Thank you!

private void RunScript(List<Point3d> x, ref object A)
  {
    A = x.Distinct();
  }

Error (CS1061): ‘System.Collections.Generic.List<Rhino.Geometry.Point3d>’ does not contain a definition for ‘Distinct’ and no extension method ‘Distinct’ accepting a first argument of type ‘System.Collections.Generic.List<Rhino.Geometry.Point3d>’ could be found (are you missing a using directive or an assembly reference?) (line 57)

Does your script have a using System.Linq; statement at the top?

Yes, a bunch of data types in grasshopper implement the IGH_QuickCast interface which provides hashing and comparison. The data is then stored in a dictionary.

If I were to write this today I’d base it on hashsets instead of dictionaries as they are O(1) instead of O(log(n)). There are still issues with sets made up of mixed data types, but those are rare and can probably be special cased.

That’s why…

I somehow understand the difference now. I don’t think I am able to right this script, but I have a direction now. Thank you.