Unexpected changes to a cluster document's serialised binary data

Hello

This is getting into the nitty-gritty details of comparing Grasshopper definitions, through my current implementation involving (de)serialising GH documents inside GH_Clusters to byte[] and back again (therefore I humbly suggest @DavidRutten as a potential replier).

I’m trying to find a robust way of detecting whether clusters across documents are functionally identical, using a SHA1 hash computed from their serialised byte arrays. A script demonstrating this is attached further below.

Imagine two GH_Documents:

  • HostA containing one cluster, and
  • HostB

When I:

  1. serialise the document inside the cluster object of HostA to a byte[] using GH_Archive.Serialize_Binary,
  2. deserialise that byte[] using GH_Archive.Deserialize_Binary, then call GH_Archive.ExtractObject to obtain a new GH_Document, and
  3. create a new cluster based on that document and add it as a brand new cluster to HostB,

then I’m hoping the SHA1 value of the document inside the new cluster stays the same as that of the old one.
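To make that expectation concrete, here is a minimal Python sketch of the hash comparison, using only hashlib. The byte strings are placeholders standing in for the output of GH_Archive.Serialize_Binary (they are not real Grasshopper data), and sha1_hex is a hypothetical helper name. It only illustrates that an exact round trip keeps the digest, while flipping a single bit changes it.

```python
import hashlib


def sha1_hex(data: bytes) -> str:
    """Return the SHA1 digest of a byte array as a hex string."""
    return hashlib.sha1(data).hexdigest()


# Placeholder bytes standing in for a serialised cluster document (assumption).
original = b"serialised cluster document bytes (placeholder)"

# An exact round trip yields identical bytes, hence an identical digest.
round_tripped = bytes(original)
same_digest = sha1_hex(original) == sha1_hex(round_tripped)

# Flipping one bit anywhere in the array produces a different digest.
mutated = bytearray(original)
mutated[5] ^= 0x01
changed_digest = sha1_hex(original) != sha1_hex(bytes(mutated))
```

So the hash itself cannot tell a meaningful change from a cosmetic one: any differing byte in the serialised data is enough.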

It mostly works, but I’ve come across one (attached) case where, after adding the new cluster to HostB and pressing F5 a few times, the SHA1 value of the document inside the new cluster changes, yet I can’t see that anything has changed.

The cluster document’s objects, properties, DocumentId, number of bytes, etc. are all unchanged, but there must be a few bytes that have changed, because the SHA1 value changes.
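Since the byte count is unchanged, one way to narrow this down is to compare the two byte arrays position by position and report where they differ. The Python sketch below assumes both arrays have the same length. diff_offsets and diff_runs are hypothetical helper names, and the sample bytes are made up for illustration.

```python
from typing import List, Tuple


def diff_offsets(a: bytes, b: bytes) -> List[int]:
    """Return every index at which two equal-length byte arrays differ."""
    if len(a) != len(b):
        raise ValueError("byte arrays must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def diff_runs(a: bytes, b: bytes) -> List[Tuple[int, int]]:
    """Group differing indices into half-open (start, end) ranges."""
    runs: List[Tuple[int, int]] = []
    for i in diff_offsets(a, b):
        if runs and runs[-1][1] == i:
            # Extend the current run when the index is contiguous with it.
            runs[-1] = (runs[-1][0], i + 1)
        else:
            runs.append((i, i + 1))
    return runs


# Made-up sample data: only the byte at index 7 differs.
before = b"Name:S(0);Name:S(0)"
after = b"Name:S(1);Name:S(0)"
changed = diff_runs(before, after)
```

Printing a few bytes either side of each range usually gives enough context to recognise which field the change belongs to.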

This attached file demonstrates this process all in one definition, through internalised data and C# scripts:
Cluster_SHA1_Test.gh (15.9 KB)

If you enable the script component at the top, then press F5 a few times, a new cluster will appear and its displayed calculated SHA1 value eventually changes. Can you tell me why? I know that Grasshopper can change DocumentIds of clusters at will, but that doesn’t happen here, so what else could there be?

And are there any broader/general approaches to being able to detect whether one GH_Document is functionally identical to another?

Any thoughts here will be very helpful for a new plug-in I’m developing. Thanks in advance

Nic

I think I can partly answer this, after seeing and comparing the GHX versions of these documents. It seems like Grasshopper can sometimes alter the nicknames of output parameters of Stream Filter components:

All 3 differences in this case are outputs of Stream Filter components.

Would anyone know why this happens?

And further to my last reply, here is what is probably the answer:

Stream Filters are one of the few kinds of components whose outputs change name according to the data being fed into them. If the G input is true, then the output nickname will change to “S(1)”. This changes a few bytes in the binary serialised version of the document, and therefore the SHA1 value.
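If that is the cause, one possible workaround is to hash a normalised description of the document rather than the raw serialised bytes, leaving out fields known to change with the data. The Python sketch below only illustrates that idea. The dictionary layout, the field names, the VOLATILE_NICKNAME_TYPES set and the fingerprint function are all hypothetical, not part of the Grasshopper API, and in practice the description would have to be built by walking the objects in the GH_Document.

```python
import hashlib
import json

# Assumption: component types whose output nicknames depend on input data.
VOLATILE_NICKNAME_TYPES = {"Stream Filter"}


def fingerprint(objects):
    """SHA1 of a canonical description, ignoring volatile output nicknames."""
    canonical = []
    for obj in objects:
        outputs = obj.get("outputs", [])
        if obj["type"] in VOLATILE_NICKNAME_TYPES:
            # Drop the nickname so a data-driven rename does not affect the hash.
            outputs = [{k: v for k, v in o.items() if k != "nickname"}
                       for o in outputs]
        canonical.append({**obj, "outputs": outputs})
    # sort_keys gives a stable key order so equal content hashes equally.
    payload = json.dumps(canonical, sort_keys=True, separators=(",", ":"))
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()


# Hypothetical descriptions; the nickname strings are illustrative only.
before = [{"type": "Stream Filter", "outputs": [{"name": "Stream", "nickname": "S(0)"}]}]
after = [{"type": "Stream Filter", "outputs": [{"name": "Stream", "nickname": "S(1)"}]}]
other = [{"type": "Addition", "outputs": [{"name": "Result", "nickname": "R"}]}]

same = fingerprint(before) == fingerprint(after)
different = fingerprint(before) != fingerprint(other)
```

The trade-off is that every ignored field has to be chosen deliberately: ignore too little and the hash changes for cosmetic reasons, ignore too much and two functionally different clusters could end up with the same fingerprint.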