Looking for low latency reading and writing of large data sets from Excel

I tried to install Eyas from Steve Lewis, but it doesn’t work.
Moreover, the development of this tool seems to have been abandoned.

Is there any other way to process large numerical data sets in Rhino efficiently?

Well, LunchBox can read Excel, CSV, XML, and JSON… You can also write a text file and read it with the ReadFile component (comes with Grasshopper). Another plugin called TT Toolbox has components for reading and writing Excel, or you can do it with GHPython or C#.

Hi Antonio,

I know there are many tools to read and write Excel, but we need a tool with a focus on speed, hence the mention of “low latency” in my title.

Can you explain a bit what you call low latency, and for what use? I can’t imagine any good use apart from data mining.

The fastest way I know is in C#, using OleDbConnection and DataTable. At the end you can iterate over the DataTable and read what you need. Probably LunchBox uses this method.

Just in addition: generally speaking, there are two different ways of reading and writing data. One is loading the file into memory, the other is streaming parts of it. Depending on file size, both methods have advantages.
However, I doubt that Excel is the right tool to store a lot of data. Instead you should use a database. Databases are optimised so that they can read and write data very efficiently, besides offering other useful features such as parallel access etc.
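For instance, the database idea can be sketched with Python’s built-in sqlite3 module (the table and column names here are made up for illustration): rows are written once in a single transaction, and single records are then retrieved by key without loading the whole file.

```python
import sqlite3

# In-memory database for the sketch; use a file path for persistence.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE results (key TEXT PRIMARY KEY, a REAL, b REAL)")

# Bulk-insert many rows in one transaction (fast compared to row-by-row commits).
rows = [("row%05d" % i, i * 0.5, i * 2.0) for i in range(32000)]
with con:
    con.executemany("INSERT INTO results VALUES (?, ?, ?)", rows)

# Indexed lookup: only the matching row is read, not the whole table.
a, b = con.execute("SELECT a, b FROM results WHERE key = ?", ("row00042",)).fetchone()
```

Because `key` is the primary key, the SELECT uses an index instead of scanning all 32000 rows.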

If you need to stick to Excel, you might divide and conquer: split your file into multiple files, using a hash function to decide which subfile to read from and write to.
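A minimal sketch of that divide-and-conquer idea in Python (the bucket count and key format are arbitrary choices): each record’s key is hashed to pick one of N subfiles, so a later lookup only has to scan one small file instead of the whole data set.

```python
import csv
import io

NUM_BUCKETS = 4  # arbitrary; more buckets = smaller files to scan

def bucket_of(key):
    # Stable hash so the same key always maps to the same subfile.
    return sum(key.encode()) % NUM_BUCKETS

# In-memory stand-ins for the subfiles; in practice these would be file handles.
buckets = [io.StringIO() for _ in range(NUM_BUCKETS)]
writers = [csv.writer(b) for b in buckets]

records = [("part-%d" % i, i * 1.5) for i in range(100)]
for key, value in records:
    writers[bucket_of(key)].writerow([key, value])

def lookup(key):
    # Only scan the one bucket the key hashes to.
    buf = buckets[bucket_of(key)]
    buf.seek(0)
    for row in csv.reader(buf):
        if row and row[0] == key:
            return float(row[1])
```

Note the hash is computed from the key bytes rather than Python’s `hash()`, which is randomised per process for strings and would not be stable across runs.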

One easy way I can imagine is using GHPython with the pandas (Cython) module, or scikit-learn and numpy. In any case it will depend on the language that you use; that’s why most “low latency” solutions tend to be in C/C++. One possible solution would be to write a plugin for Rhino in C++ (which is why Eyas took the easier route of targeting Excel). You could measure the solution time in some of those cases to really test the performance, and to really test whether Eyas is “Zero Canvas Latency” — I can only imagine how they solved that bottleneck.
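As a rough illustration of that GHPython route, here is a standard-library-only sketch of bulk-loading a numeric CSV into one flat array of doubles (with pandas this would essentially be a single `read_csv` call; the file layout is assumed to be purely numeric):

```python
import csv
from array import array
from io import StringIO

def load_numeric_csv(f):
    """Read all numeric cells of a CSV file object into one flat array of doubles."""
    data = array("d")  # compact C-style double storage, cheaper than a list of floats
    for row in csv.reader(f):
        data.extend(float(cell) for cell in row)
    return data

# Small stand-in for a 32000-line export.
sample = StringIO("1.0,2.0,3.0\n4.0,5.0,6.0\n")
values = load_numeric_csv(sample)
```

Storing the numbers in a flat `array("d")` rather than nested Python lists keeps memory overhead low, which matters at the 660000-value scale discussed below.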

Using the C language is kind of a micro-optimisation. As long as you abuse Excel as a database, it doesn’t really matter whether you use C or Python, since the bottleneck is Excel itself.



For us (I am a colleague of Oliver -osuire), the bottleneck seems to be how GH manages big data trees.

We have a GH definition doing the following:

  1. read the Excel file (N = 32000 lines)

  2. sort the Excel data and build a data tree whose structure is designed for visualization of the results in Rhino. The sorting is done once; viewing the results then means selecting the corresponding branches.
    The {A;B;C;D;E} data tree has 39000 branches and 660000 values.
    The B, C, D, E indices represent different ways for the user to browse the whole data set.

  3. the user selects results (that is to say, in GH: selects some tree branches) and those are shown in Rhino

In terms of execution time, steps 1 and 2 are OK.
But step 3 is slow for the user. Browsing the definition is also slow!

So we wonder how to change this definition design.

Is our data tree too big ?


So where is the result visualised? In a panel? If so, don’t do that. Panels are slow.
You want to select a result? Again, it may be beneficial to split up. That is what databases are actually doing. They don’t load the whole database; they search for a key and open up the part where it is stored. This is also called hashing. Grasshopper and Excel don’t offer this feature.

Not sure if this is relevant to your situation, but: Sending large amounts of data through wires is inherently expensive, both on the output and input side. This is especially true for scripting components. One workaround is to wrap the data in an item (such as a Python dictionary for instance) or send the data through another mechanism (such as the RhinoPython sticky for instance).
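The wrapping trick can be sketched like this in plain Python (outside Grasshopper, so no Rhino imports; the dict key is arbitrary): instead of pushing 660000 floats through a wire as a list, the whole list travels as one wrapped item and is unwrapped downstream by reference.

```python
# "Upstream component": wrap the big list in a single-entry dict,
# so the wire carries one item instead of hundreds of thousands.
big_data = [float(i) for i in range(660000)]
wrapped = {"payload": big_data}

# "Downstream component": unwrap. This retrieves a reference to the
# original list; no copy of the 660000 values is made.
payload = wrapped["payload"]
```

The point is that Grasshopper treats the dict as a single opaque item, so it never iterates over (or duplicates) the contents while routing it.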

Edit: Also, providing an explicit example that demonstrates your pipeline will help with tracking down bottlenecks and suggest potential solutions :wink:


Results are shown in Rhino with dots (previewed with the TextDot component).
Indeed, panels are slow — I already removed all debugging panels from our definition.



Yep, true. Script components pass data by value, not by (memory) reference. This was implemented as a safety feature for inexperienced coders. You can also get the reference directly by simply going upstream on a connected parameter and retrieving it from the connected component. But Anders’ version also works…

@JLH @osuire
Well, anyway, it’s hard to tell and nothing is easy to solve; even if the problem is found, fixing it involves a lot of work. Nobody would do this just for fun, I guess.
However, it might help you to know that improving reading and writing rates does not solve the problem. It’s more about how you access and save your data.
So it may solve the problem if you save sorted combinations in multiple trees or files, and when a user makes a selection, he opens up the right file/tree and retrieves the item from there. However, without scripting I hardly see any chance of doing so.

As mentioned above, you will need code.

You can make a class and read your data into this class as an array. Or even a dictionary with <No, object>…

To export your data I would use a StringBuilder. You can append every line you want. At the end you can write the text to a .csv file. When you open the *.csv with Excel, you can save it in xls format.
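That StringBuilder idea translates to Python roughly as follows (accumulating the whole text in a StringIO buffer via csv.writer, then writing the file once; the output file name is made up):

```python
import csv
from io import StringIO

rows = [("item-%d" % i, i * 0.25) for i in range(1000)]

# Accumulate all lines in memory first (the StringBuilder role),
# then write the file in one go instead of thousands of small writes.
buf = StringIO()
writer = csv.writer(buf)
writer.writerow(["key", "value"])  # header
for key, value in rows:
    writer.writerow([key, value])

text = buf.getvalue()
# with open("export.csv", "w", newline="") as f:  # hypothetical output path
#     f.write(text)
```

Buffering first avoids the per-call overhead of many tiny file writes, which is exactly what StringBuilder buys you in C#.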

You may want to keep in mind that TextDot can be expensive for many objects.


Thank you very much for your answers. It’s clearer to me now, since I am now aware that all data in GH is passed by value. Our big results data tree is copied so many times that I get a memory overflow on my computer…

To my understanding, we should

  1. code a C#/VB/Python component in which the sorting is done
  2. a) design a data structure (dictionary) optimized for accessing the results.
    b) access it directly (by reference) in the GH definition

Is there a way to make global variables in VB, like “sticky” variables in Python?

Here is a screenshot of our zoomed out definition canvas, with comments written in red.

I think that the “Sort datas” group is executed even when it doesn’t have to be (I noticed that by internalizing the resulting data and then deactivating this group).

Basically, yes.

Global variables are bad and should be avoided at all costs.
To be clear: native and compiled custom components pass data by reference unless it is explicitly passed by value. However, script components always load data in by value. This was introduced as a safety feature for inexperienced coders, to avoid hard-to-debug issues. You just need to pass data by reference, which you can do in script components as follows (it’s C#, but can be done in other languages as well; it’s commented).
You should rerun the solution twice to get useful profiler time values.

HackToPassByRef_commented.gh (10.2 KB)
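The same pass-by-reference idea can be sketched language-agnostically in Python (mimicking what the RhinoPython sticky provides; the cache dict and key name here are invented for illustration): the heavy object is stored once in a shared dictionary, and only a short key string travels between components.

```python
# Shared cache, analogous to Grasshopper's sticky dictionary.
CACHE = {}

def producer():
    """Upstream component: build the heavy data once and publish it under a key."""
    heavy = list(range(1000000))
    CACHE["results_v1"] = heavy   # key name is arbitrary
    return "results_v1"           # only this short string goes down the wire

def consumer(key):
    """Downstream component: fetch by key; no copy of the data is made."""
    return CACHE[key]

key = producer()
data = consumer(key)
```

Because the wire only carries the key, Grasshopper never duplicates the million-item list when the solution recomputes downstream components.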

I have no idea whether a lookup access is faster than retrieving an item from a list by index, but in case you don’t know the index and are forced to search for a value, it definitely speeds things up, since it doesn’t search the whole data set, only the partition where the result is expected.
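A quick sanity check of that point in plain Python (the key format is invented): a dict lookup goes straight to the matching entry via its hash, while a list has to be scanned entry by entry when the index isn’t known.

```python
# 660000 values indexed two ways: a flat list of pairs and a dict keyed by path.
values = [i * 0.1 for i in range(660000)]
pairs = [("{%d}" % i, v) for i, v in enumerate(values)]
by_key = dict(pairs)

def linear_find(key):
    # Linear search: inspects entries one by one until the key is found.
    for k, v in pairs:
        if k == key:
            return v

wanted = "{650000}"
# Same result, very different cost: the dict jumps straight to the entry,
# the list scan walks ~650000 pairs first.
assert linear_find(wanted) == by_key[wanted]
```

This is the same trade-off as the partitioned-files idea above, just in memory: hashing narrows the search to one spot instead of everything.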


Hi TomTom

Thank you for your answer.

For information, I tried yesterday to translate your definition from C# to VB, and I got a very strange and severe crash of GH (memory overflow… even after restarting GH with no definition open; I had to reset the GH installation after that). Anyway, I can move to C# in the future.