Optimize large data manipulation

bbalbastre · April 2, 2024, 7:07am

Hi everyone,

I need help with working on a large amount of data in Grasshopper (GH). The computational cost is too high!

Here’s the problem I’m facing:

I have a data tree of surfaces. I am getting the points that defines each surface and culling the duplicate points (by leaving one) in order to define a grid of non duplicated points well numbered. Then, I extract the initial points that define each surface and use the “FIND SIMILAR” and “ITEM List” components to assign new vertex numbering (this is necessary for exporting the data to another FEM program, including point coordinates, IDs, and quads constructed from point IDs).

I have identified two main bottlenecks in my process: the “CULL DUPLICATES” component and the “FIND SIMILAR” component.

I came across a post by @AndersDeleuran on the forum (link provided) that suggested a solution, but it didn’t work as the computational time was even longer than using the standard GH components.

Delete duplicated points in ghpython - Grasshopper - McNeel Forum

Is there a way to optimize this process, potentially using Python or C# components? I would appreciate any feedback or ideas.

Looking forward to your response!

NOTE: I have isolated the initial problem but the real one takes 1 hour to compute the FIND SIMILAR component!

@Quan_Li @DanielPiker

Sample 1.gh (3.5 MB)

Erik_Beeren · April 2, 2024, 9:55am

hi @bbalbastre,

Heteroptera has smoe nice network analyse tools.

Sample 1.gh (3.5 MB)

AndersDeleuran · April 2, 2024, 12:04pm

I’m unclear what you tried, but it might be as simple as wrapping the output in the Grasshopper point type:

inno · April 2, 2024, 12:20pm

Just a note, the way the definition is written now -despite the calculation time- is “suppressing” points

even if you get the very same amount of final quads as the amount of starting quads, because the CullDuplicatePoints Tolerance is set to 0.1 → leave one, all the points closer to 0.1 are merged together into one single point, and that means very weird results

you can verify that like this:

Sample 1_inno.gh (6.1 MB)

you end up with:
40x quads identified by one single point-ID repeated 4x times
2078x quads where you have just 2 different point-IDs in the 4 values
69x quads where you have just 3 different point-IDs in the 4 values
9261x quads where the 4 point-IDs are all different (these are proper quads)

the most important thing I’m not fully understanding is if you want to change the grid to somehow make it uniform along the different branches of squared surfaces, or if you just want to use it “as it is”, and retrieve “for each quad-point which grid-point it is” and assign it a unique ID

bbalbastre · April 2, 2024, 12:47pm

Hi @AndersDeleuran

Maybe I am using the component you did in a wrong way but connecting all the points to the python component using the same tolerance, the computational time is 7.4 s against 1s with the Native Gh component

No hints are used in the inputs by the way

bbalbastre · April 2, 2024, 1:19pm

Hi @inno , I quickly prepared the sample and I did not change the tolerance but as you said it has to be lower than 0.1 if not strange results can be obtained. Anyway the problem is the same.

The idea is that I will have a data tree, which hundreds of QUADS in each branch. As the QUADs share point coordinates, I am flattening all the QUAD vertexs in order to keep only one point per coordinate, and once I have it, I number these unique points.

Then, the goal is to match the quad vertex with these unique vertex in order to know the vertex number I need to use to create the QUADs

inno · April 2, 2024, 2:46pm

I think I would do something like this:

Sample 1_inno.gh (6.2 MB)

a big bottleneck is the Replace Paths, that can be replaced by Elefront’s Create Tree or by some custom script to do the very same in hopefully less time

Replace Path is used just to maintain the initial data structure of the Surfaces, and probably some smarter mind might solve the situation even without using it

if you don’t need that initial data-structure, then you could just mesh and deconstruct, which looks pretty fast:

bbalbastre · April 2, 2024, 3:18pm

Hi @inno,

Thank you very much for your reply! Your approach is very interesting, and the reduction in computational time is amazing! Is it possible with this approach to set a logical vertex numbering? I’m concerned that this approach might duplicate points on the edges between two branches. The real case involves plenty of non-manifold edges, so I’m not sure if this approach can work.

The idea is that I have 3D surfaces (horizontal, vertical, and diagonal surfaces are always planar), with plenty of non-manifold edges. Each surface is made up of hundreds of quads. What I proposed is to gather all the quad vertices, remove any duplicate points, sort all these 3D points along the x, y, and z axes, assign a logical numbering to these points, match the original quad vertices with the new culled and sorted points, and then reconstruct the quads with the new numbering. Perhaps now I’ve explained it better.

The approach I proposed solved the case I mentioned but not within a reasonable computational time, that is why I am looking for a more efficient components to replace this approach.

Is it possible to set an order for the unique vertex like this for example?

Then one quad would be made by the numbers (0,1,6,7) and so on

Erik_Beeren · April 3, 2024, 7:20am

Hi @bbalbastre

The workflow proposed by @inno is basically the same as the heteroptera exaple. I agree with @inno that meshes and mesh topology are the most commonly used way to deal with this problem. In both cases each vertice is unique. So you can use the indices as ID’s. When inputting more surfaces you will have to make a small work around. Because indices and data trees get mixed up, you can trim the tree of quad indices. Now you get a list of ordered vertices defining the quads. Because you know this list is built up out of four indices (or points) you can split the list. The same workflow can be used with meshes.

Sample 1.gh (3.5 MB)

bbalbastre · April 3, 2024, 9:23am

Thanks for your reply, @Erik_Beeren

I believe that with that approach, you are unable to assign a unique coordinate considering the entire structure and associate that value with the quads.

Let’s use this surface as an example. It’s a data tree structure composed of surfaces in its branches with non-manifold edges:

With the script I developed, I only retain unique point coordinates in the entire structure. These points are numbered, sorted, and then the quads have to be adjusted to this new numbering:

As you can observe, surface {20;7} is constructed using 0, 1, 4, 3 numbers (unique numbers).

When utilizing your code, or @inno’s code, you are obtaining the vertex index associated with each quad within a branch, but you are not generating a unique ID point for each coordinate in the entire structure, which is what I want.

The issue is that the code I developed works, but it’s not efficient.

Sample 2.gh (3.6 MB)

Thanks for your help!

inno · April 3, 2024, 9:55am

index of each point = unique ID

BUT the numbering of the unique ID is randomish (actually I call it random because I have no idea how it works under the hood… it might easily be not random at all )

briefly, the -eventually- branched tree of meshes in A is passed through B (which is flattened) to get a unique mesh of the whole thing, so overlapping vertexes are merged together
at the same time, the non-flattened meshes are passed also to C where they maintain their data-structure, and the rest of the definition up to Create Tree actually is there just to reference to the original data tree based on Face Center proximity

there might for sure be better ways for handling this last part, but my brain isn’t big enough

random data-structure checks:

Sample 1_inno2.gh (6.1 MB)

Erik_Beeren · April 3, 2024, 10:26am

Or simple like this. Meshes are definitely the way to go.

Sample 2.gh (3.6 MB)

TomTom · April 3, 2024, 11:14am

Hi, there is a very simple algorithm you can apply. See this console application example
written in C#. Takes on my medium PC less than 100ms for 10000 points.
It can further be optimized, but the logic is quite obvious when you follow the comments.

internal class Program
{
    private static void Main(string[] args)
    {
        // Given 8 points, 2 duplicates; expect 6 unique points

        // X             [5,6,7,3,3,7,8,2]
        // Y             [3,4,7,5,5,7,4,1]
        // Z             [2,3,7,3,3,7,3,3]

        // VertexIndices [0,1,2,3,4,5,6,7]
        // FaceIndices   [0,0,0,0,1,1,1,1]
        // Duplicates    [    A,B,B,A    ]   
        // Mask(0=unset) [1,2,3,4,4,3,5,6]

        // because:

        // 1.Run         [1 0 0 0 0 0 0 0]  (1 * '1')
        // 2.Run         [1 2 0 0 0 0 0 0]  (1 * '2')
        // 3.Run         [1 2 3 0 0 3 0 0]  (2 * '3')
        // 4.Run         [1 2 3 4 4 3 0 0]  (2 * '4')
        // 5.Run         [1 2 3 4 4 3 5 0]  (1 * '5')
        // 6.Run         [1 2 3 4 4 3 5 6]  (1 * '6')
               
        // Create points...
        var points = new Point3d[8];
        points[0] = new Point3d(5, 3, 2);
        points[1] = new Point3d(6, 4, 3);
        points[2] = new Point3d(7, 7, 7);
        points[3] = new Point3d(3, 5, 3);

        points[4] = new Point3d(3, 5, 3);
        points[5] = new Point3d(7, 7, 7);
        points[6] = new Point3d(8, 4, 3);
        points[7] = new Point3d(2, 1, 3);
                    
        var facemask = new int[8];
        facemask[0] = 0;
        facemask[1] = 0;
        facemask[2] = 0;
        facemask[3] = 0;
        facemask[4] = 1;
        facemask[5] = 1;
        facemask[6] = 1;
        facemask[7] = 1;

        // This is where the magic happens 'Cull and reindex'
        
        var uniquePoints = new List<Point3d>();
        int currentIndex = 1;
        var vertexmask = new int[points.Length];
        
        for (int i = 0;i < points.Length; i++)
        {           
            if (vertexmask[i] != 0) continue;  // Skip when mask is set

            vertexmask[i] = currentIndex;
            uniquePoints.Add(points[i]);
            for (int j = i+1;j < points.Length; j++) 
            {               
                if (vertexmask[j] != 0) continue; 

                if (Point3d.EpsilonEquals(points[j], points[i], epsilon: 0.0001))
                {
                    vertexmask[j] = currentIndex;
                }
            }
            currentIndex++;
        }
        
        // Add indices to a data tree. This will be slightly different in Rhino/GH
        var indexTree = new DataTree<int>();
        int lastIndex = -1;
        for (int i = 0; i < facemask.Length; i++)
        {
            currentIndex = facemask[i];
            if (lastIndex != currentIndex)
            {
                indexTree.AddBranch(currentIndex);
                lastIndex = currentIndex;
            }
            var current = indexTree.Branches.ElementAt(currentIndex);
            current.AddItem(vertexmask[i]-1); //-1 because we change back to 0 based indexing
        }

        // Print the result
        indexTree.Print();

        Console.WriteLine("\nUnique points:");
        uniquePoints.ForEach(p => Console.WriteLine($"({p.X} {p.Y} {p.Z})"));
    }
}

public struct Point3d
{
    public double X;
    public double Y;
    public double Z;

    public Point3d(double x, double y, double z)
    {
        X = x; 
        Y = y; 
        Z = z;
    }
    public static bool EpsilonEquals(Point3d p1, Point3d p2, double epsilon = 0.00001)
    {
        return Math.Abs(p1.X - p2.X) <= epsilon &&
                Math.Abs(p1.Y - p2.Y) <= epsilon &&
                Math.Abs(p1.Z - p2.Z) <= epsilon;
    }
}

public class DataTree<T>
{
    public List<Branch<T>> Branches { get; } = new List<Branch<T>>();

    public void AddBranch(int path)
    {
        Branches.Add(new Branch<T>(path));
    }
    public void Print()
    {
        Console.WriteLine("Tree:");
        Branches.ForEach(b=>b.Print());
    }
}


public class Branch<T>
{    
    public int Path { get; }
    public List<T> Data { get; } = new List<T>();

    public Branch(int path)
    {
        Path = path;
    }

    public void AddItem(T item)
    {
        Data.Add(item);
    }

    public void Print()
    {
        Console.WriteLine($"  Branch {Path}");
        Data.ForEach(item =>
          Console.WriteLine($"  └─ {item}"));
    }
}

This simple console app will yield:

Tree
  Branch 0
  └─ 0
  └─ 1
  └─ 2
  └─ 3
  Branch 1
  └─ 3
  └─ 2
  └─ 4
  └─ 5

Unique points
(5 3 2)
(6 4 3)
(7 7 7)
(3 5 3)
(8 4 3)
(2 1 3)

bbalbastre · April 3, 2024, 12:51pm

Hi @inno

You are right! That script perfectly works for that sample, maintaining the data structure, creating unique points and providing the right indexes.

However I tried to apply the script to the original input and it does not work. I suspect when converting the surfaces to meshes, new faces are added in some places, making the number of data and paths different

Thanks again for your help!

Sample 3.gh (19.0 MB)

bbalbastre · April 3, 2024, 12:51pm

Thanks for the answer @Erik_Beeren . That is a good solution although I am very interested in preserving the data structure at the end

bbalbastre · April 3, 2024, 12:53pm

Thank you for your answer @TomTom

Unfortunately I am a very beginner of c# and I do not know how to apply that script to this case to be honest!

TomTom · April 3, 2024, 2:17pm

reindexing.gh (16.3 KB)

The more duplicates you have the faster it will perform. Btw you can improve performance by e.g. parallelising this. But the harder you optimize, the less readable it gets…

inno · April 3, 2024, 4:44pm

I honestly think Tom’s solution has no rivalry in terms of speed, it will always be the best choice

but before reading his post I was trying something… so I’m posting this just as “interesting stuff”

looks like Combine & Clean retains the order of the original meshes faces
there was actually no reason to think it wouldn’t, but hey you never know

this means a lot because it allows to rebuild the initial branching (assuming you have just one layer of branches) with a List Length + Partition List (which means SPEED, ok… not Tom’s speed but still )

on a side note, the last file you have posted:

based on the number of vertexes/edges, the surfaces in Original Input are not all quads:

[data is not internalized: too heavy]
Sample 3_data_check_not_internalized.gh (13.1 KB)

bbalbastre · April 4, 2024, 6:13am

Wow @TomTom works incredibly fast! You did magic! Thanks a lot!

Thank you @inno for your effort! What a discovery what you showed me with the Combine and clean command, I am learning a lot with your scripts, thank you for taking the time

TomTom · April 4, 2024, 8:13am

Btw, if you deep dive into coding (you definitely should), try to carefully understand this nested loop. I did not include the comments of the initial code, but this sort of looping and index masking is what turned out to be quite useful in various situations for me while using GH.

Topic		Replies	Views
Huge amount of data. Processing strategies Grasshopper macgrasshopper , grasshopper2	10	1415	July 25, 2020
rg.Point3d.CullDuplicates Ghpython REALLY SLOW Grasshopper windows , ghpython	2	703	April 11, 2021
Advice for Speeding Program Execution when dealing with Point Clouds in Grasshopper RhinoLands (Lands Design) windows	2	381	March 17, 2021
Computational time problem Grasshopper windows	7	670	March 18, 2022
Ghpython - Parallel Component Slower than Native Ghopper Module Grasshopper	13	4805	June 10, 2014

Optimize large data manipulation

Related topics