Data Structures within the Grasshopper Canvas (Lookup Tables? etc)

Hi all,
I was hoping to get some ideas on how other users handle and pass large data structures.

What I wanted to ask is: is anyone aware of any GET/SET-type methodologies where we can push data into a titled data table, then fish it back out by title, to clean up messy scripts and create a nice clean workflow?

I saw some data tables and datasets as part of the LunchBox plugin, but couldn’t figure out how to actually use them in the GH canvas.

I’m working on a fairly complex (for me at least) geotechnical modelling script to present underground data parametrically. Like all good projects, mine has been going for about 10 years and has had substantial scope creep (you can see Rev 1, 2, 3, 4 in sequence below):

To tidy things up I wrote a basic data-table-type function (which I called GET/SET, in line with programming nomenclature).

Getting me here:


Each package in the linear chain effectively looks up the info it needs, calculates the extra parameter, then pushes it back into the dataset.


I have attached a basic script which demonstrates where I have been going with this. I presume this has probably been done before (or that it hasn’t, for good reason!) - so any thoughts would be appreciated.

Demonstration of Get-SET (GH Lookup).gh (21.6 KB)

Note: the attached script is OBJECT-BASED (i.e. each branch of the tree holds pieces of different metadata associated with one object, or data segment). I also have a parameter-based lookup which goes the other way, with each list containing a single data type but applicable to multiple objects. This deals better with lists which are not of uniform length, but I am more concerned about data corruption if I accidentally delete a list item and the data becomes misaligned.
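
In code terms, the two layouts are roughly the following (a sketch with invented titles, not the actual cluster internals):

```csharp
using System.Collections.Generic;

// Object-based: one record per object, each carrying its own titled metadata.
var objectBased = new Dictionary<string, object>
{
    { "RockType", "Granite" },
    { "From", 0.0 },
    { "To", 1.5 },
};

// Parameter-based: one list per title, index-aligned across all objects.
// This copes better with ragged data, but deleting one item misaligns the rest.
var parameterBased = new Dictionary<string, List<object>>
{
    { "RockType", new List<object> { "Granite", "Dolerite" } },
    { "From",     new List<object> { 0.0, 1.5 } },
};
```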

Thanks in advance!
LJ

My plugin, Pancake, supports a structure called Association. It allows for named parameters and dynamic manipulation, data extraction, and data exchange with DataTable/CSV/JSON/XML, which may help you.

example file: exampleassoc.gh (12.6 KB)

More example files are available with the plugin.

Actually, in C# there is a lookup-table data structure available. You could use script components to pass this collection around - one for setting, one for getting.

The problem with lookup tables and dictionaries is that performance gets really bad if you have a lot of string keys: for each traversal, lots of string comparisons are performed, which slows things down in this case. But it’s hard to judge - it depends on the use case.

Edit: values of lookup tables are immutable, whereas in a dictionary values can be changed. But I personally would continue doing things like you do and just be careful not to corrupt data. Keep it simple…
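
As a rough illustration, the body of a C# script component could build and query such a table like this (a sketch only - titles and values are invented, and the table would be passed between components through a generic parameter):

```csharp
using System.Collections.Generic;

// SET: build a table keyed by column title (illustrative data).
var table = new Dictionary<string, List<object>>
{
    { "RockType", new List<object> { "Granite", "Dolerite" } },
    { "From",     new List<object> { 0.0, 1.5 } },
};

// GET: fish a column back out by title.
List<object> column;
if (table.TryGetValue("RockType", out column))
{
    // column now holds every object's "RockType" entry.
}
```

A `Dictionary` keeps the values mutable, which suits pushing results back into the dataset; the immutable structure mentioned above is the `System.Linq` `Lookup`, created via `ToLookup`.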

Hi Keyu,
Thanks for the input.

Would I be correct in suggesting that the way your Association works is that you have a “master object” (which in your example you named ‘Geometry’), and your New Association component then allows you to assign data to that master object and enter a title for the data against the component (which in your example you named ‘whatever’)?

Is there any way to set the titles associated with data using another data stream?

In my case, I am reading a large series of geotechnical data entries from Excel in the following form (titles highlighted):

This has to be dynamic, as it’s set up in a way that users can customise what they input; this feeds a selection that then modifies what is viewed - so the titles have to be read from the sheet, and the data needs to be retrievable based on those titles.

It looks like there are some excellent data handling components, so I’ll keep playing, but I did run into one issue with run time. Running a dataset of 90,000 pieces of data (associated with 2835 objects), I got a run time of about 550 ms with your read data component, compared with 65 ms with my cluster.

Is there something I might be doing wrong here? Or are the Pancake components generally suited to smaller datasets?

Cheers
LJ

No. Objects inside an Association are, almost, equal.

The principal object is only used if the downstream component doesn’t support Association. If you don’t use anything like that, all members are equal.

:joy: I didn’t realize you had so much data. Generally it’s not very efficient for very large datasets. Hopefully I can improve it a little bit in the next release.

I’m afraid it’s better to write script components to achieve the best performance.

@TomTom - Thanks also. The reason I am querying this is that my GET/SET components are taking a few seconds to process across the whole script on a large dataset. I can definitely improve this (at the moment basically everything is processed and only select data visualised, so I can adjust my gates to only process data which will be displayed). But if there were a way for me to improve run times, it would make this a more robust solution.

Unfortunately I’m an engineer, not a coder. I dabble in VB and can muster IF statements and loops at a pinch in C# and Python, but that’s about the grand sum of my programming skills.

The way my component operates, it takes the data tree, splits it based on the first tree branch (which carries the titles), performs the lookup against the titles to get an index, then retrieves that index from the remainder of the dataset. So in each case it’s searching 32 strings for a match - would you expect this to slow things down?

Image of the GET cluster here:
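
In pseudo-code, the cluster is doing something like this (a sketch - titles and data are invented):

```csharp
using System.Collections.Generic;

// First branch carries the titles (~32 in practice).
var titles = new List<string> { "RockType", "From", "To" };

// Remaining branches: one per object (~2835 in practice).
var rows = new List<List<object>>
{
    new List<object> { "Granite",  0.0, 1.5 },
    new List<object> { "Dolerite", 1.5, 3.2 },
};

// Linear scan over the titles gives the column index...
int index = titles.IndexOf("RockType");

// ...then that index is pulled from every remaining branch.
var column = new List<object>();
if (index >= 0)
    foreach (var row in rows)
        column.Add(row[index]);
```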

Any other tips you might have to achieve the same outcome with a bit better performance would be great.

:smiley: Normally I have less data… but I let the geologists go wild on a particular review of a large dam - and they got a wee bit carried away (I would normally work with about 5% of that).

Also worth noting that 90% of the datapoints in my trees are blank… so proper association with metadata might let me get rid of the blanks and greatly reduce the tree size (rather than relying on maintaining list and tree structures, many full of empties)!

I’ll keep playing and see where i end up!

So you have 90000x2835 entries in total, right?

Pancake’s Assoc is, sort of, describing an object with different metadata. With this practice, it’s probably better to convert parameters, such as From & To, into Assoc’s metadata, rather than storing lines inside Assoc.

Not quite - sorry, I didn’t explain that well. I have 2835 “objects” (segments of boreholes where geological conditions are assessed as different). Each of those has 30-ish pieces of data; in many cases the data is simply “empty” where no data was recorded, but my methods to date required my lists to be uniform (hence the empties).

So all up, 90,000 cells of data: 30-ish cells for each of 2835 objects.

If I go back to the spreadsheet in my second post, that should make it clearer. You see that if I am logging “ROCK” data (far left column) then the “SOIL” data is blank, and vice versa - so every branch has a fair few empties.

I should probably also provide some context: the reason for this work is 3D visualisation of this geotech data, to get a good feel for foundation conditions, fracturing, failure wedges, weak zones, etc.

Any of the data in the sheet can be visualised based on numerical value, auto-assignment by unique entry, or a shader lookup (for common geological data).

The output of the script looks something like:

Where in this case data is shaded by classification:

We build on this for a 3D geological model of the dam foundations:

Worth noting this is work from about 5 years ago, and the script has become considerably more capable since.

I will work through some of your example files for datasets, and see if your tools can give me a more efficient solution than where I am currently at, then report back :slight_smile:

Thanks again for the input!


This is actually not much data, even for dictionaries and lookup structures. I would estimate that in the range of 1 million data points, the choice of data structure makes a noticeable difference. So if you need seconds, then the bottleneck is somewhere else. Of course, using native Grasshopper components is often the first performance tradeoff. You are using the correct terminology, and as you said, you already know how to loop. Essentially this is almost everything you need to know to create a tailor-made solution for your use case.

The trick is to do as much as possible in one loop; if you pass data from one component to another, you are looping multiple times just because the data is now in a different component. Scripting in general is not necessarily harder - it can even make things easier. You can easily skip empty data and automatically tighten the dataset just by branching directly (i.e. using the if-statement). And often making it easy is the first performance boost.
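
For instance, a single pass can do the lookup, the calculation, and the tightening all at once (a sketch - the input column and the calculation are invented):

```csharp
using System.Collections.Generic;

// Invented input: a raw column with many blank entries.
var rawColumn = new List<string> { "1.5", "", "3.2", "", "0.8" };

var tight = new List<double>();
foreach (string cell in rawColumn)
{
    // Branching directly skips the empties, so the output is already tight.
    if (string.IsNullOrWhiteSpace(cell))
        continue;

    double value = double.Parse(cell);
    tight.Add(value * 2.0);   // stand-in for the real per-object calculation
}
```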

As a consequence, for the next revision, just try to invest a little bit in learning how to write scripts. There are plenty of tutorials out there for Grasshopper. The language doesn’t matter, although you’ll find better explanations for C# and IronPython. But all three are based on the same framework, meaning they all offer the same functionality. Just give it a try and you will be rewarded with fewer workarounds and better readability.


Not sure if you’ve looked at Elefront, which will let you get/set key/value pairs (“attributes”). This data can be embedded in the geometry if you bake into Rhino (as “user text”), or you can use a parallel flow of data with the same tree/branch structure as the other geometry or data in your script… You could also export to a CSV file, or link to Excel, Google, a database, etc. from there.
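
For reference, “user text” is just key/value strings stored on the object itself; in RhinoCommon it looks roughly like this (a sketch of the underlying mechanism, not Elefront’s own components - keys and values are invented):

```csharp
using Rhino.Geometry;

// A borehole segment as a simple curve (illustrative geometry).
var segment = new LineCurve(new Point3d(0, 0, 0), new Point3d(0, 0, -1.5));

// SET: attach key/value pairs directly to the geometry...
segment.SetUserString("RockType", "Granite");
segment.SetUserString("From", "0.0");

// ...GET: and fish them back out by key later.
string rock = segment.GetUserString("RockType");
```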

Hi LJ,

First of all, looks like some really interesting work in your images, well done.
Looking at your GDA94 reference in the image above, I take it you’re based in Australia? (as I am)

I’m not sure if you’re aware of the openBIM data model Industry Foundation Classes (IFC, published as an ISO standard), developed and maintained by buildingSMART. There is an active extension project to better support concepts used on infrastructure projects, and this includes exchange of concepts such as boreholes, geotechnics, etc.

At Geometry Gym, we’ve been enabling the use of this data model within Rhino/Grasshopper. This permits interaction with a sophisticated data model (relationships including classification and attributes), rather than just a table concept. Because IFC covers many disciplines, it’s not necessarily as streamlined as a discipline-specific data model, but it is capable of a lot.

With our tools, you can generate these concepts, as well as query them etc in Grasshopper. I’ve attached an image where I was testing/demonstrating some of the aspects of borehole digitization.

If you’re interested in learning more or discussing, send me a PM and we can arrange a time to do so.

Cheers,

Jon

Thanks again all for the input,
@TomTom - I’ll invest a bit more time into scripting. I’m invested in using the GH components where we may want to change things, simply as it’s more accessible to many in my profession (engineers tend to be logical, but many have no programming history and get scared easily).

I’m sure there would be a bunch of functions which I can’t see changing, in which case a Python loop and a description would likely work great.

One thing I have noticed is that VB blocks within GH seem to be faster than Python or C# blocks for simple things like if statements - not sure if that’s just my PC or if that’s normal?

RhinoUser - I’ll have a good look at Elefront and see what it can do for me, thanks.

@jonm, I thought someone might pick up on the location reference :slight_smile: I’m based in Tassie. I have looked at Geometry Gym previously and it looks really interesting. I’ll do some reading first - is there some stuff on your website I can trawl?


Thanks again all for the input - I’ll review everything people have provided.

I do have a bunch of other functions on top of the GET/SET I added (like tree rebuilds based on a master parameter, etc.). If there is any value in the components I’ve got to date, I’ll make the work available in case others want to have a play :slight_smile:


That’s good to hear…

A script component gets compiled whenever you change something. This compilation step causes the first execution to be a bit slower. VB might compile a bit faster. Furthermore, the computation time varies depending on what’s running in the background on your PC. Script components are also slower than plugin components because they carry extra overhead just to make them accessible in Grasshopper. Under the hood, all three languages operate on the same framework, which means what gets compiled - intermediate-language bytecode, then machine instructions - should almost always be the same. There could be some implementation variation and overhead for IronPython. But in general, unless you are performing hundreds of thousands or millions of operations, this shouldn’t be something to worry about.