Can GH definitions (.gh) be made searchable?

Question first: Can GH definitions (.gh) be made searchable? (Some of my files are saved as .ghx, which are no problem to search, but not all of them, because .ghx files are slow to work with.)

I often find myself searching old .gh files for C# scripts in which I have already solved a specific problem, or used a RhinoCommon feature whose name I know but whose usage in code I have forgotten (and so on). So every now and then I spend time searching old GH definitions.

Then I recalled having heard that GH definitions are just zipped .ghx files (I could have gotten this part entirely wrong), so I tried searching .gh files with PowerGrep, which handles many different archive formats. It also has an option for assigning custom extensions to known archive formats, as illustrated below:

However, when I search with .gh associated with any of these formats, I get no matches, which indicates that none of the listed formats is used for .gh files.

So, what specific compressed binary format is being used for (non-encrypted) .gh files? Is it some proprietary format? (I hope not…)

// Rolf

Hi Rolf, see this solution: “Get Grasshopper Document Object Count without opening grasshopper”.
GH_IO.dll lets you read the .gh and .ghx file formats. Starting with the example @DavidRutten made for me recently, I imagine it wouldn’t be much work to make a small program that extracts the code associated with each C# or Python script component across your entire oeuvre of definitions.
You could then save all this as plaintext and search however you please. Nowhere near a computer so can’t help specifically right now.
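For the "search however you please" step, here is a minimal Python sketch. It assumes the definitions are already available as .ghx or other plain text; `search_ghx` is a hypothetical helper name, not part of GH_IO.dll, and it treats each file as plain text rather than parsing the XML.

```python
import re
from pathlib import Path


def search_ghx(root, pattern):
    """Yield (path, line_number, line) for every matching line in .ghx files under root."""
    regex = re.compile(pattern)
    for path in sorted(Path(root).rglob("*.ghx")):
        if not path.is_file():
            continue
        # .ghx is the XML (text) flavour, so it can simply be read as text.
        text = path.read_text(encoding="utf-8", errors="replace")
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                yield path, number, line.strip()
```

Calling `list(search_ghx("C:/defs", r"CreateFromBrep"))` (folder and pattern are placeholders) would list every line mentioning that call, together with its file and line number.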

That’s not entirely correct, also not entirely wrong. The file format I wrote is basically a strongly typed, hierarchical dictionary where values are stored under both a string name and integer index. Only a small number of data types is supported, but since byte-array and string are amongst them, you can store all possible data.
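To make the "hierarchical dictionary keyed by name and index" idea concrete, here is a toy Python sketch. It is only an illustration of the concept: the class, method names and the list of allowed types are assumptions, not the GH_IO API or its real type list.

```python
from dataclasses import dataclass, field

# Value types this toy archive accepts (an assumption, not the real GH_IO list).
ALLOWED_TYPES = (bool, int, float, str, bytes)


@dataclass
class Chunk:
    """One node of a typed, hierarchical dictionary (hypothetical sketch)."""
    name: str
    items: dict = field(default_factory=dict)     # (name, index) -> value
    children: dict = field(default_factory=dict)  # (name, index) -> Chunk

    def set_item(self, name, value, index=0):
        # "Strongly typed": anything outside the small type list is rejected.
        if not isinstance(value, ALLOWED_TYPES):
            raise TypeError(f"unsupported type: {type(value).__name__}")
        self.items[(name, index)] = value

    def get_item(self, name, index=0):
        return self.items[(name, index)]

    def create_chunk(self, name, index=0):
        child = Chunk(name)
        self.children[(name, index)] = child
        return child


root = Chunk("Definition")
script = root.create_chunk("Object", index=0)
script.set_item("Name", "C# Script")
script.set_item("Code", "A = x + y;")
script.set_item("Blob", b"\x00\x01\x02")  # arbitrary data fits in a byte array
```

Because strings and byte arrays are allowed, anything that can be serialised can be stored, which is the point made above.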

This runtime dictionary can be written to or read from two types of files: an XML-style text format and a binary format. XML is not strongly typed, so it has loads of overhead. The binary flavour is very efficient, and the resulting byte stream is run through a deflate compressor to boot.
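A small Python sketch of what "run through a deflate compressor" means, and why a zip-aware search tool finds nothing in such a stream. It assumes a raw DEFLATE stream; whether GH_IO wraps the stream in a header of its own is not stated in this thread, and the payload here is made up.

```python
import io
import zipfile
import zlib


def deflate(data: bytes) -> bytes:
    """Compress with raw DEFLATE (negative wbits: no zlib or gzip header)."""
    compressor = zlib.compressobj(level=9, wbits=-15)
    return compressor.compress(data) + compressor.flush()


def inflate(data: bytes) -> bytes:
    """Reverse of deflate() above."""
    return zlib.decompress(data, wbits=-15)


# Made-up payload standing in for the serialised binary dictionary.
payload = b"<chunk name='Definition'>" + b"A = x + y;" * 200 + b"</chunk>"
packed = deflate(payload)

print(len(payload), "->", len(packed), "bytes")
print("round trip ok:", inflate(packed) == payload)
print("looks like a zip archive:", zipfile.is_zipfile(io.BytesIO(packed)))
```

A bare deflate stream uses the same compression algorithm as zip but has no zip directory structure, so archive-aware tools have nothing to open.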

Major downside of the gh format is that you need GH_IO.dll to read it.

Hi @dharman & @DavidRutten;
Yes, I was able to use the examples as a good start. Thank you very much!

I have now made a command-line tool that scans a folder structure, starting from a path given as a command-line option (a filename or a folder), optionally followed by a command (as illustrated below).

So far I have three (3) commands for different listings of files and temp folders, one command for conversion (gh -> copy -> ghx), and yet another command for deleting these .ghx copies.

The .ghx copies are placed in subfolders below any existing .gh file (so I can simply delete the subfolder if the .ghx copy isn’t needed anymore).

Fig 1. An example of the Convert (“C”) command, which converts .gh files into .ghx versions placed in subfolders named …\_ghx_tmp

In this first version the tool grabs all .gh files in the folder that don’t have a .ghx equivalent in the same folder, and makes a .ghx copy of each in the subfolder …\_ghx_tmp.
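The selection rule described above can be sketched in Python as follows. This is not the tool's source: `plan_conversions` is a hypothetical name, the `_ghx_tmp` folder name is taken from the post, and the actual .gh to .ghx conversion (which needs GH_IO.dll) is left out.

```python
from pathlib import Path

TMP_DIR_NAME = "_ghx_tmp"  # subfolder name as mentioned in the post


def plan_conversions(root):
    """Return (source .gh, target .ghx) pairs for .gh files lacking a .ghx sibling."""
    plan = []
    for gh in sorted(Path(root).rglob("*.gh")):
        if not gh.is_file():
            continue
        if gh.parent.name == TMP_DIR_NAME:
            continue  # never descend into our own temp folders
        if gh.with_suffix(".ghx").exists():
            continue  # a searchable .ghx already sits next to it
        target = gh.parent / TMP_DIR_NAME / (gh.stem + ".ghx")
        plan.append((gh, target))
    return plan
```

Each returned pair says which file to read and where the .ghx copy would go, so removing the copies later amounts to deleting the `_ghx_tmp` folders.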

The next version will handle individual files as well (it already picks them up, but uses only the path part to iterate over all files in the same folder, disregarding the single filename).

I commented out the example code, but that code can be uncommented in an upcoming version. I was about to publish the sources on BitBucket, but that silly thing refuses to let me push the project. Will fix that asap.

Fig 2. Here pressing D for “Dir” lists all the temp folders created with the previous Convert command (5 temp folders had been created, containing .ghx versions of .gh files):

Fig 3. … and pressing X gives a listing of the .ghx files. Some of them already existed (and so were not copied or converted), but a few .gh files (7 of them) were converted and placed in their respective \_ghx_tmp subfolders so they can easily be deleted later:

Fig 4. Here pressing the command “G” lists all gh files under current path, in this case the 7 gh-files which were converted above:

Fig 5. And finally, pressing R for “Remove” removes the temp folders and files altogether (Oops, I just now noticed that the reporting of the number of deleted files is missing.):

Grasshopper now searchable
So, all in all, with these few commands I can make my entire history with Grasshopper searchable in a few seconds (I often use PowerGrep, RegexBuddy and EditPad Pro for regex searching and text manipulation).

I will uncomment some of David’s example code next.

Anyway, if anyone is interested, here’s the exe-file (remove the “.gh” extension, which is there only to mask the file so it could be uploaded). Place your own copy of GH_IO.dll in the same folder as this exe-file (wherever you put it) and off you go.

HOW TO

  • To start the program, Win10 wants you to type a period and a slash before the app name, like so:

    .\ghtoghx.exe […your filepath…] :

    ghtoghx.exe.gh (17.5 KB)

  • It is advisable to terminate folder paths with a backslash, to prevent the path from being truncated by one level.

  • If you add a second option after the filepath, like G, H, D, C or R (R = Remove), then the app will execute and terminate directly after.
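Regarding the trailing-backslash advice above: one plausible explanation, assuming the tool keeps only the directory part of whatever path it is given (an assumption; the sources are not published here), is ordinary Windows path splitting. A Python sketch with a hypothetical helper and made-up paths:

```python
import ntpath  # Windows path rules, usable on any OS


def folder_to_scan(argument: str) -> str:
    """Hypothetical: keep only the folder part of 'folder + optional filename'."""
    return ntpath.dirname(argument)


print(folder_to_scan("C:\\defs\\projects\\"))          # C:\defs\projects
print(folder_to_scan("C:\\defs\\projects"))            # C:\defs
print(folder_to_scan("C:\\defs\\projects\\tower.gh"))  # C:\defs\projects
```

Without the trailing backslash, the last folder name is indistinguishable from a filename and gets dropped, which would match the "truncated by one level" behaviour.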

Sources will be available asap. But now it’s time to hit the sack.

// Rolf

Question, if the script code was available as loose text files inside a zip file, would that have allowed you to search it directly?

Grasshopper 2 files use the zip archive as a base (*.ghz extension) and each ghz file will contain a couple of subfiles, such as for example the thumbnail as a jpg or png. I could possibly store user-typed strings as separate textfiles in the archive as well, but I wasn’t planning on that. But if it makes the file contents searchable directly from windows, that would be a good reason.

Yes. PowerGrep and many other search tools can search zip files directly. Plain text in a zip file would be a huuuuge advantage.

Edit: A very smart strategy would be to place all content that is not affected by timestamps or frequently changing layout info (positions on canvas, etc.) inside one separate tag. That tag could then easily be extracted by any regex tool and examined in more detail.

Edit: That would also be a big win for diff tools, which could then focus on just one separate (“stable”) part of the file, preferably the first part (more convenient for manual diffs). And don’t forget to “sort” all content based on “stable” attributes, like GUIDs, so that data stays in a consistent place inside its storage area after any changes or additions.

// Rolf

Even if the extension isn’t *.zip but *.ghz?

No problem. The extension can be mapped to the underlying archive technology. Exactly that is illustrated here (posted earlier), where I tried to associate *.gh with zip in PowerGrep:

.ghz will be just fine.
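To illustrate why the extension doesn’t matter, here is a Python sketch that searches text members inside any zip-based archive. The member names and text suffixes are assumptions, since the internal layout of a .ghz file isn’t specified in this thread; `search_archive` is a hypothetical helper.

```python
import re
import zipfile


def search_archive(archive, pattern, text_suffixes=(".txt", ".cs", ".py")):
    """Return (member, line_number, line) for matching lines in text members of a zip."""
    regex = re.compile(pattern)
    hits = []
    # ZipFile looks at the file's content, not its extension, so *.ghz works like *.zip.
    with zipfile.ZipFile(archive) as zf:
        for info in zf.infolist():
            if not info.filename.lower().endswith(text_suffixes):
                continue  # skip thumbnails and other binary members
            text = zf.read(info).decode("utf-8", errors="replace")
            for number, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    hits.append((info.filename, number, line.strip()))
    return hits
```

`search_archive("tower.ghz", r"CreateFromBrep")` (file name is a placeholder) would then report hits per member file, which is roughly what PowerGrep does once the extension is associated with zip.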

// Rolf

Hi David, whilst you are considering improvements to the .gh (or .ghz?) file format, is there an opportunity to make it more git-friendly?

There have been several discussions about keeping Grasshopper definitions in source control.
It would be great to be able to sensibly perform diffs of Grasshopper files. Ghx has its limitations.

Is this something that’s important to many users?
@andheum had a pretty neat prototype of a visual gh diff tool

Did that tool sort the content?

The big problem with the .ghx format is that the order of items can change even if the logic in the definition doesn’t change (and the logic is often the most important thing to diff on).

Separation
Timestamps and positions also make definitions “dirty” in diffs, although that wouldn’t be so bad if thumbnails and layout data were placed at the bottom of the .ghx file, enclosed or separated by some tag, so one could easily extract the content before that tag and use it for diffs.

Sorting
So sorting, and separating layout info from non-layout (or “non-dynamic”) info, would make diffs and version control more meaningful.
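A rough Python sketch of the two ideas (drop volatile data, sort by a stable key) applied before diffing. The tag names and the `guid` attribute are made up for illustration and are not the real .ghx schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical names for parts that change without the logic changing.
VOLATILE_TAGS = {"layout", "timestamp", "thumbnail"}


def canonical_form(xml_text: str) -> str:
    """Return XML text with volatile parts removed and children in a stable order."""
    root = ET.fromstring(xml_text)
    _normalise(root)
    return ET.tostring(root, encoding="unicode")


def _normalise(element):
    for child in list(element):
        if child.tag in VOLATILE_TAGS:
            element.remove(child)  # separation: layout data never reaches the diff
        else:
            child.tail = None      # ignore whitespace between elements
            _normalise(child)
    if element.text and not element.text.strip():
        element.text = None
    # Sorting: order children by a stable attribute so reordering is not a change.
    element[:] = sorted(element, key=lambda c: (c.get("guid", ""), c.tag))
```

Two saves of the same definition that differ only in item order or canvas positions would then produce identical canonical text, so a diff shows only real changes.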

Size/Compression
Another problem is the size of the .ghx files (they can become huuuge…). I don’t know if any VCS can handle archived versions and still perform diffs on them(?). For now I use PowerGrep for this, although it requires pre-processing the .gh files to convert and compress them to *.ghx.gz format.
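The compress-then-search step can be sketched in Python like this, assuming the .ghx file already exists (the .gh to .ghx conversion itself still needs GH_IO.dll). Both helper names are hypothetical.

```python
import gzip
import shutil
from pathlib import Path


def compress_ghx(path):
    """Write path + '.gz' next to the original .ghx and return the new path."""
    src = Path(path)
    dst = src.with_name(src.name + ".gz")
    with src.open("rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
    return dst


def grep_gz(path, needle):
    """Return (line_number, line) pairs containing needle, reading the .gz as text."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as fh:
        return [(n, line.rstrip("\n")) for n, line in enumerate(fh, start=1)
                if needle in line]
```

Since .ghx is repetitive XML it tends to compress well, and the text can still be searched without unpacking it to disk first.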

But all of this, VCS and diff friendliness, really should be supported out of the box. Making Grasshopper definitions is like any other software development, and we need to be able to easily keep track of changes and versions.

// Rolf
