I realize the Grasshopper2 definition file format is a Zip archive with a binary document inside. For training ML models and using them to create definitions, it would be nice to have text-based definitions. After all, Grasshopper is a functional programming language.
You want to train a computer to create valid Grasshopper algorithms at the file level?
Wouldn’t it be much easier to come up with a minimal notation (objects and connections) and then convert such a notation into a real file using a bit of code? Something along the lines of:
Add "Point" = A
Add "Point" = B
Add "Line 2Pt" = L
Connect A -> L1
Connect B -> L2
...
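A minimal sketch of how such a notation could be parsed into objects and wires. The `Add`/`Connect` command syntax and the trailing-digit input index (e.g. `L1` meaning input 1 of object `L`) are assumptions for illustration, not an actual Grasshopper format:

```python
import re

def parse_notation(text):
    """Parse a minimal 'Add'/'Connect' notation into objects and wires.

    Assumed grammar:  Add "Component Name" = Alias
                      Connect Alias -> TargetAliasN   (N = input index)
    """
    objects = {}   # alias -> component name
    wires = []     # (source alias, target alias, input index)
    for line in text.strip().splitlines():
        line = line.strip()
        if m := re.match(r'Add\s+"([^"]+)"\s*=\s*(\w+)', line):
            objects[m.group(2)] = m.group(1)
        elif m := re.match(r'Connect\s+(\w+)\s*->\s*([A-Za-z]+)(\d+)', line):
            wires.append((m.group(1), m.group(2), int(m.group(3))))
    return objects, wires

objs, wires = parse_notation('''
Add "Point" = A
Add "Point" = B
Add "Line 2Pt" = L
Connect A -> L1
Connect B -> L2
''')
# objs  -> {'A': 'Point', 'B': 'Point', 'L': 'Line 2Pt'}
# wires -> [('A', 'L', 1), ('B', 'L', 2)]
```

A second pass would then map the parsed objects and wires onto real component instances in a GH file.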
I’d assume the point is just being able to parse existing GH definitions in an automated way, without having to devise a way to express them manually. That way you could gather all GH definitions ever made, assume they are perfect, and push them through the learning system.
I don’t understand what that means.
But is the point to have a machine create GH files or not? Because it’s going to be quite difficult to create valid GH files which have all the values in all the right places. If the goal is to have a computer create Grasshopper algorithms, then using a much simpler, more flexible notation would surely be a major simplifying step?
But maybe I’m confused about the goal here.
It is inefficient and unnecessary to train with the whole file, agreed, but it is still convenient to have a text representation, because there are several use cases that could use it: assistants, definition generators, LLM-based file/snippet/component search engines, etc. All this can be done by third parties, yes, but a more semantic GH is a better GH. I can understand that it is not your responsibility to define training data for ML, but it is in your interest to allow definitions to be represented semantically, i.e. connected components sharing metadata like name, description, category, parameter signature, etc., to be shared between users and plugins. Does GH2 bring any changes in this respect?
Yes I can see that, and do agree. Settings in GH2 are saved using the same mechanism as files, and at present they cannot therefore be read by humans or edited by hand. This will be quite important eventually so I either need to make a stand-alone app which can open and edit those files, or just switch over to a text-based format.
Only for the worse at present. In GH1 you had the option to use XML files instead of binary files; GH2 only supports binary files now. I’m probably not going to use XML again, I really dislike how it combines the worst of machine-readable data and human-readable data into a single format. Maybe JSON will be a better choice, I haven’t decided yet.
Yeah, JSON makes more sense. I would do a minimal implementation of GH snippets to JSON and vice versa, to deliver this with minimal time spent, and I would focus on making sure that every definition-editing action in GH2 is reproducible and convertible to text, such as add object, connect object A to B, change slider value… so that the instructions to build a definition can be reproduced from text. This is something that third parties cannot do in GH1 because there are missing events and no easy interface to hardware inputs. This would also open the door to easier version control, but that is another topic.
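To make the idea concrete, a reproducible edit log could be as simple as a JSON array of actions. This is a hedged sketch: the field names (`action`, `type`, `id`, `input`, …) are invented for illustration and are not a GH2 schema:

```python
import json

# Hypothetical edit log: every action that builds the definition,
# recorded as a JSON-serializable list. Field names are assumptions.
edit_log = [
    {"action": "add",     "type": "Point",         "id": "A"},
    {"action": "add",     "type": "Point",         "id": "B"},
    {"action": "add",     "type": "Line 2Pt",      "id": "L"},
    {"action": "add",     "type": "Number Slider", "id": "S"},
    {"action": "connect", "from": "A", "to": "L", "input": 0},
    {"action": "connect", "from": "B", "to": "L", "input": 1},
    {"action": "set",     "id": "S", "param": "value", "value": 0.5},
]

# Round-trip through text: shareable, diffable, version-controllable.
text = json.dumps(edit_log, indent=2)
replayed = json.loads(text)
```

Replaying such a log against the canvas would rebuild the definition step by step, which is exactly what makes text-based version control and machine-generated edits possible.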
It is also necessary to be able to attach metadata to the snippets, such as description, alternative names, category… When machines have to infer the usefulness of a snippet without access to that data, the inferences are forced to be generic; I guess this underfitting problem could be solved by enriching the snippets with additional metadata. The training data must contain enough information to know what kind of object is being built, whether it is a ring or a building or whatever, because that context is necessary for the models to really be able to help us.
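A sketch of what such an enriched snippet record could look like. The field names and the example values are hypothetical, chosen only to show the kind of metadata (name, description, category, alternative names, domain tags) being argued for:

```python
from dataclasses import dataclass, field, asdict
import json

@dataclass
class SnippetMetadata:
    """Hypothetical metadata record for a shared GH snippet.

    Field names are assumptions for illustration, not a GH2 schema.
    """
    name: str
    description: str = ""
    category: str = ""
    alternative_names: list = field(default_factory=list)
    tags: list = field(default_factory=list)  # domain context, e.g. "ring"

meta = SnippetMetadata(
    name="Loft between circles",
    description="Lofts a surface through a series of scaled circles.",
    category="Surface",
    alternative_names=["circle loft"],
    tags=["jewelry", "ring"],
)
payload = json.dumps(asdict(meta))  # ready to travel alongside the snippet
```

With records like this attached, a model (or a search engine) no longer has to guess what a snippet is for from its wiring alone.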
With these three things I think I could start recording my GH activity and train my personal assistant with it. In my opinion, the challenge here is not to train an AI model, but to create the right training data. I don’t think we need smarter AIs, we need GH to be more semantic. That’s why I think native support is so important.
Regarding this, I remember a guy who made some interesting YouTube videos about those formats and alternatives. Interesting and fun to watch, imo:
Had a look at SML and WSV; pretty solid reasoning in my opinion.
It didn’t take a lot of work to export GH2 documents to SML, but I haven’t tried to write an importer yet. It’s also clear that some of the choices I made about how to store data are not the best of friends with the SML way of doing things. For example, lists of key/value pairs currently result in data like this:
Key[0] "Attr.Bounds"
Typ[0] 106
Val[0] 114.5 56.5 60.0 28.0
Key[1] "Attr.Pivot"
Typ[1] 102
Val[1] 114.5 69.5
Key[2] "CreatedAt"
Typ[2] 12
Val[2] 2024-03-07-12-41-05(993)
Key[3] "CreatedBy"
Typ[3] 10
Val[3] ""
Key[4] "ModifiedAt"
Typ[4] 12
Val[4] 2024-03-07-12-41-11(286)
Key[5] "ModifiedBy"
Typ[5] 10
Val[5] ""
which could have been:
Attr.Bounds 114.5 56.5 60.0 28.0
Attr.Pivot 114.5 69.5
CreatedAt 2024-03-07-12-41-05(993)
CreatedBy ""
ModifiedAt 2024-03-07-12-41-11(286)
ModifiedBy ""
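The flattening above is mechanical, so a converter for existing exports is straightforward. A rough Python sketch, assuming the indexed `Key[n]`/`Typ[n]`/`Val[n]` layout shown and simply dropping the `Typ` entries, since the flat form leaves the type implicit:

```python
import re

def flatten(indexed_text):
    """Collapse Key[n]/Typ[n]/Val[n] triples into 'key value' lines.

    Assumes the indexed layout shown above; Typ entries are dropped
    because the flat form omits explicit type codes.
    """
    keys, vals = {}, {}
    for line in indexed_text.strip().splitlines():
        m = re.match(r'(Key|Typ|Val)\[(\d+)\]\s+(.*)', line.strip())
        if not m:
            continue
        kind, idx, rest = m.group(1), int(m.group(2)), m.group(3)
        if kind == "Key":
            keys[idx] = rest.strip('"')
        elif kind == "Val":
            vals[idx] = rest
    return "\n".join(f"{keys[i]} {vals[i]}" for i in sorted(keys))

sample = '''
Key[0] "Attr.Bounds"
Typ[0] 106
Val[0] 114.5 56.5 60.0 28.0
Key[1] "Attr.Pivot"
Typ[1] 102
Val[1] 114.5 69.5
'''
flat = flatten(sample)
# flat -> 'Attr.Bounds 114.5 56.5 60.0 28.0\nAttr.Pivot 114.5 69.5'
```

Going the other way (reading the flat form back) would of course need the type information restored from somewhere, which is presumably why the export keeps the `Typ` entries in the first place.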
I’d like to second this suggestion: There’s a lot of back and forth with Copilot in my workflow, but it is hard to “explain” an existing GH definition. For scripting it’s already working surprisingly well though, as code can just be copy/pasted back and forth.
It would be interesting to see how AI will work with a complete GH definition represented in text, where every component, value and their relation to each other is formulated.
Just as useful: GH could also receive text input and update/populate the canvas based on the incoming description of the definition. That way AI driven changes could be implemented with a simple copy/paste into the text input.
Having a way to input text would be just as valuable as having an output, if not more so (the output could always be done using image recognition of the canvas, but that would still be a hassle). And it would be user-friendly: if the file gets amended/populated via the text input, everything is still visible and accessible on the canvas, not hidden in complex notation.
For the interface, GH would need a text input/output field added where the text describing the definition is stored.
This would also allow for rapid workflows: Imagine providing the AI with a GH text file and having it amend or expand the file with functionality that would otherwise be very tedious and labour intensive. Or finding the needle in a haystack when troubleshooting.
I wonder how current AI would adapt to this new language though. Dump a massive library of working text based GH files online and wait until the major AI models get trained on it? Or could they already figure out how to work with this based on their current capabilities?
In any case these models will only get better. And whilst they’re not yet ready to control one’s PC outright, the natural language stuff is very impressive. Having GH built to accommodate this AI back-and-forth could greatly enhance workflows.