GH: Slow serializing to disk - due to compression?

Two questions regarding (my own) Data Output and Data Input components:

Q1: Every time I connect an output wire, the Write method is fired. Is it possible to know which output caused the event?

```csharp
public override bool Write(GH_IO.Serialization.GH_IWriter writer)
{
   // Multiple values are stored, so all of them are written again every time an
   // output is connected, which happens every time the GH definition is opened
   // (the outputs are created on open).
   // ... store the values on the writer here ...
   return base.Write(writer);
}
```


Q2: Is it possible to tweak the serialization so that it does not compress the data? If I understood it correctly, the “Serialize_Binary()” method spends quite some time compressing the data:

```csharp
var bytes = datachunk.Serialize_Binary(); // byte[]
```

Or is it the WriteAllBytes(...) method that compresses the data?

```csharp
System.IO.File.WriteAllBytes(fpath, bytes);
```

Anyway, I would prefer writing at high speed rather than spending CPU cycles compressing data on “super-fast” NVMe disks…

Any trick I could use to speed up the disk-writes?

// Rolf

Did someone tell you this, or did you profile it?

It is not possible to disable the compression, but then it should not represent the bulk of the work anyway.
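One way to answer "did you profile that?" is to time the compression step and the disk write separately. A minimal sketch in Python, assuming `zlib` as a stand-in for whatever compression Serialize_Binary() applies internally (the actual GH_IO algorithm is not shown here) and a synthetic payload instead of real mesh data; `profile_dump` is a hypothetical helper:

```python
import os
import tempfile
import time
import zlib


def profile_dump(payload: bytes, path: str) -> dict:
    """Time compression and the disk write separately (a profiling sketch)."""
    t0 = time.perf_counter()
    compressed = zlib.compress(payload, level=6)  # stand-in for the archive's compression
    t1 = time.perf_counter()
    with open(path, "wb") as f:
        f.write(compressed)
        f.flush()
        os.fsync(f.fileno())  # force the bytes to the device so the write time is real
    t2 = time.perf_counter()
    return {
        "compress_s": t1 - t0,
        "write_s": t2 - t1,
        "raw_bytes": len(payload),
        "compressed_bytes": len(compressed),
    }


if __name__ == "__main__":
    # Hypothetical payload: 8 MB of random bytes standing in for mesh data.
    data = os.urandom(8 * 1024 * 1024)
    target = os.path.join(tempfile.gettempdir(), "profile_dump.bin")
    print(profile_dump(data, target))
    os.remove(target)
```

If `compress_s` turns out to be small next to `write_s` (or next to the time spent producing the bytes in the first place), compression is not the bottleneck.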

You could try and execute the file serialisation (or at least the write) on a different thread.
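Moving the write off the calling thread can be sketched as a single background worker fed through a queue. A minimal Python sketch, assuming the byte arrays already exist (e.g. as returned by Serialize_Binary()); the class name `BackgroundWriter` is hypothetical, and in a C# component the same shape would be built on .NET threads instead:

```python
import os
import queue
import tempfile
import threading


class BackgroundWriter:
    """Runs file writes on one worker thread so the caller can carry on."""

    def __init__(self) -> None:
        self._jobs = queue.Queue()
        self.errors = []  # (path, exception) pairs collected by the worker
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def _run(self) -> None:
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel queued by close()
                return
            path, data = job
            try:
                with open(path, "wb") as f:
                    f.write(data)
            except OSError as exc:
                self.errors.append((path, exc))  # keep the worker alive, report later

    def submit(self, path: str, data: bytes) -> None:
        """Queue one write and return immediately."""
        self._jobs.put((path, data))

    def close(self) -> None:
        """Wait for all queued writes (FIFO order) to finish, then stop the worker."""
        self._jobs.put(None)
        self._worker.join()


if __name__ == "__main__":
    out_dir = tempfile.mkdtemp()
    writer = BackgroundWriter()
    for i in range(3):
        writer.submit(os.path.join(out_dir, f"part_{i}.bin"), bytes(1024))
    # ... the calling code is free to continue here while the writes run ...
    writer.close()
    print(sorted(os.listdir(out_dir)), writer.errors)
```

Because `close()` enqueues its sentinel behind all pending jobs, it only returns once every earlier write has been attempted.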

What sort of write times are we talking about? And is it because the file contains a lot of internalised data?

5-10 seconds (meshes).

No. I avoid that from experience… The heavy lifting is parts of meshes, which are “captured” (cut out); the separated parts are then dumped to disk and picked up by other definitions for analysis. The “capturing” of the part meshes is a speed-up, so that a specific analysis which only concerns a part of the mesh does not have to traverse the whole big mesh.

That kind of stuff. Some lines, points and breps go with the dumps as well, but the meshes are the slow ones to read from and write to disk.
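The “capturing” step, cutting a part out of a big mesh so later analyses only traverse that part, boils down to keeping a subset of faces and re-indexing the vertices they use. A minimal, library-free Python sketch, assuming a plain indexed triangle mesh (a vertex list plus index triples); `extract_submesh` is a hypothetical helper, not a RhinoCommon call:

```python
from typing import Dict, List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]


def extract_submesh(
    vertices: Sequence[Vertex], faces: Sequence[Face], keep: Sequence[int]
) -> Tuple[List[Vertex], List[Face]]:
    """Return a compact mesh containing only the faces whose indices are in `keep`."""
    remap: Dict[int, int] = {}  # old vertex index -> new vertex index
    new_vertices: List[Vertex] = []
    new_faces: List[Face] = []
    for fi in keep:
        new_face = []
        for vi in faces[fi]:
            if vi not in remap:  # first time this vertex is used by the part
                remap[vi] = len(new_vertices)
                new_vertices.append(vertices[vi])
            new_face.append(remap[vi])
        new_faces.append((new_face[0], new_face[1], new_face[2]))
    return new_vertices, new_faces


if __name__ == "__main__":
    # Two triangles sharing an edge (a unit square); keep only the second one.
    verts = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)]
    tris = [(0, 1, 2), (0, 2, 3)]
    print(extract_submesh(verts, tris, [1]))
    # → ([(0, 0, 0), (1, 1, 0), (0, 1, 0)], [(0, 1, 2)])
```

Only the vertices referenced by the kept faces are copied, so the dumped part is no larger than it needs to be.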

// Rolf

The WriteAllBytes method does not compress anything; I believe it writes the bytes it is given to the file as is. WriteAllBytes is a .NET method (System.IO.File.WriteAllBytes) unrelated to Grasshopper: it simply takes a byte[] and writes it to the given path.

The Serialize_Binary() method from the Grasshopper API is a helper method for serializing Grasshopper data (such as data trees) into a byte[] in a format Grasshopper understands. I have not experienced speed issues due to the Serialize_Binary() method itself, but I will investigate. The bottleneck may be the data size itself, the Grasshopper graph, or a combination of both.

I don’t know what you are doing in the graph or which other nodes may be causing slowness, but here are some ideas:

  • If you are doing batch processing inside the graph, your network should be as simple as possible. Every node can add compute time.
  • Use multi-threaded nodes where possible. If the work is independent enough to multi-thread, it should help. You may need to rework your graph design.

  • Work with FileStreams throughout the graph life cycle and write to disk only when needed. Then queue up each file to be written to disk by a background process. You can create a server-style console app, with Grasshopper being the client. It would receive the meshes as a FileStream or just a byte[], manage writing the files to disk, listen for changes from Grasshopper, etc. Once Grasshopper data has been converted into a byte[], we can work with those bytes however we want. If you need speed, streaming the meshes would be helpful. I’m not sure whether the analysis is done on the same computer or on a different one on the same network, but either way this can work.

  • Writing 3D data to disk is generally slower than accessing it from memory, so I always try to use an API or protocol to transport data until I need to write to a file.

I have noticed that, when speed is the goal, it’s not a great idea to do all processing within host applications (the Grasshopper/Rhino app). I do processing from external tools that automate, manage, open and close gh and 3dm files. This is much more robust for batch processing work, and it is the approach I take.

Hope this helps. Provide more insight if possible.

How does the GH_Chunk performance compare to a straight-up ISerializable approach, which converts the meshes directly to byte arrays and writes those arrays to disk without compression?

My gut feeling is that the bottleneck here is the conversion from ON_Mesh to byte[] via OpenNurbs, but I may be wrong.
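For a rough comparison against GH_Chunk, a “straight up” uncompressed format can be as simple as a small header followed by packed vertex coordinates and face indices. A minimal Python sketch, assuming triangles only and single-precision coordinates; the layout (two uint32 counts, then float32 xyz triples, then int32 index triples) is hypothetical and is not what OpenNURBS or GH_IO write:

```python
import struct
from typing import List, Sequence, Tuple

Vertex = Tuple[float, float, float]
Face = Tuple[int, int, int]

_COUNTS = struct.Struct("<II")  # vertex count, face count (little-endian uint32)


def mesh_to_bytes(vertices: Sequence[Vertex], faces: Sequence[Face]) -> bytes:
    """Pack a triangle mesh into an uncompressed little-endian byte string."""
    flat_v = [c for v in vertices for c in v]
    flat_f = [i for f in faces for i in f]
    return (
        _COUNTS.pack(len(vertices), len(faces))
        + struct.pack(f"<{len(flat_v)}f", *flat_v)
        + struct.pack(f"<{len(flat_f)}i", *flat_f)
    )


def mesh_from_bytes(data: bytes) -> Tuple[List[Vertex], List[Face]]:
    """Inverse of mesh_to_bytes."""
    nv, nf = _COUNTS.unpack_from(data, 0)
    offset = _COUNTS.size
    flat_v = struct.unpack_from(f"<{3 * nv}f", data, offset)
    offset += 12 * nv  # 3 floats x 4 bytes per vertex
    flat_f = struct.unpack_from(f"<{3 * nf}i", data, offset)
    vertices = [tuple(flat_v[i:i + 3]) for i in range(0, len(flat_v), 3)]
    faces = [tuple(flat_f[i:i + 3]) for i in range(0, len(flat_f), 3)]
    return vertices, faces
```

The resulting bytes can be written with a plain file write (File.WriteAllBytes in .NET), which adds no compression; timing that against Serialize_Binary() on the same mesh would isolate the cost of the GH_IO path. Note that float32 loses precision compared to Rhino's double-precision vertices, so a real format might need doubles.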

I use both approaches and have not noticed a big difference, but this all depends on implementation details and graph complexity. When I have time I will benchmark and post a solution.

Well yes, that’s part of why I’m breaking up big graphs/networks/gh-definitions into smaller “batch processors” (if your terms “graph” and “network” allude to the internal component network inside gh-definitions?).

It’s not only about speed, though. It’s also about the complexity and extensibility of the overall workflow. The basic requirements are:

  1. Part results can later be extended in different directions for different analyses (a bit like superclass -> subclasses). I start out with a workflow of a few “basic” analyses, which will later branch into a “tree” of variant analyses.

  2. Part results must be persisted (dumped to disk). Despite the cost, this also has some benefits:
    2.1 A failed workflow can be restarted from the latest successfully dumped data.
    2.2 Variant analysis processes can start from such dumped part results at any later point in time without having to rerun the entire workflow.
    2.3 If the dumped data is saved on a common file system, slave instances of Rhino/GH can pick it up to process the following steps (or branch into different sub-workflows from there).
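Restarting from the latest successful dump (requirement 2.1) is easier to trust if a dump file only appears under its final name once it has been written completely. A common pattern is to write to a temporary file in the same directory and then rename it into place. A minimal Python sketch; `save_dump`, `latest_completed_step`, and the `step_NN.bin` naming scheme are hypothetical, and the atomic-rename guarantee of `os.replace` is a POSIX one, so it should be verified on Windows and on network shares like the common file system in 2.3:

```python
import os
import re
import tempfile
from typing import Optional


def save_dump(directory: str, step: int, data: bytes) -> str:
    """Write data to step_NN.bin so the final name only exists once the write completed."""
    final_path = os.path.join(directory, f"step_{step:02d}.bin")
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach the disk before renaming
        os.replace(tmp_path, final_path)  # atomic on POSIX; overwrites an older dump
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)  # never leave a half-written temp file behind
        raise
    return final_path


def latest_completed_step(directory: str) -> Optional[int]:
    """Return the highest step number with a completed dump, or None if there is none."""
    steps = [
        int(m.group(1))
        for name in os.listdir(directory)
        if (m := re.fullmatch(r"step_(\d+)\.bin", name))
    ]
    return max(steps) if steps else None


if __name__ == "__main__":
    work_dir = tempfile.mkdtemp()
    save_dump(work_dir, 1, b"part results of step 1")
    save_dump(work_dir, 2, b"part results of step 2")
    print(latest_completed_step(work_dir))  # → 2
```

A crash during a write then leaves at most a stray `.tmp` file, and a restart resumes after the highest step that has a complete dump.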

Now, given the basic requirement to dump part solutions, I of course also want each workflow step (= each gh-definition) to do its job as fast as possible, including the save operation itself. But since saving the resulting data is the last operation in each workflow step, my understanding is that threading the save operation wouldn’t save any overall time per workflow step. This is why I started paying attention to the save operation itself.

My current gh-definitions are too big, too difficult to modify or to customize into processing variants (different analyses), and as monoliths they are not very scalable. Taken together, this motivates splitting the workflow into workflow steps (separate gh-definitions) “connected” via dumped data.

If the bottleneck is CPU cycles, then not even an in-memory disk would provide better speed. I’m also not certain that an in-memory disk would survive a workflow crash, but perhaps they could run in “isolation” (I have to do some homework on that subject).

Inside the definitions I have tried to make very efficient solutions (packing component networks into C# code, threading where possible, and so on). So splitting up the gh-definitions into workflow steps seems unavoidable.

One last thing: if Elefront had been open source, I would have tested going down that route long ago. I need to be in charge of all critical functionality so as not to corner myself and the project, so using 3rd-party components comes last on my list of alternatives (although I’d love to use many of the fantastic plugins that already exist). If I had a budget, I’d offer to buy a source code licence (critical in case support and maintenance stopped; I can’t take that risk).

Hope this explains my goal, where I’m coming from, and why.

// Rolf

Reading your post again, I should perhaps mention explicitly that I save only “final results” from each gh-definition ( = workflow-step).

Some of these workflow steps save several part meshes (sub-meshes cut out of the big main meshes), and saving multiple such part meshes might benefit from threading rather than saving them in one go or in sequence.
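Saving several part meshes concurrently can be sketched with a thread pool so the individual file writes overlap. Whether this helps depends on the storage: NVMe drives often benefit from several outstanding writes, while a single spinning disk may not, so it is worth measuring. A minimal Python sketch, assuming the parts are already serialized to bytes; `write_parts_parallel` is a hypothetical helper:

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def write_parts_parallel(parts: Dict[str, bytes], max_workers: int = 4) -> List[str]:
    """Write several (path -> bytes) dumps concurrently; return paths in input order."""

    def write_one(item):
        path, data = item
        with open(path, "wb") as f:
            f.write(data)
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() re-raises the exception of a failed write when its result is read
        return list(pool.map(write_one, parts.items()))


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    parts = {os.path.join(out, f"part_{i}.bin"): bytes(1_000_000) for i in range(4)}
    print(write_parts_parallel(parts))
```

If most of the time goes into producing the bytes rather than writing them, parallelizing the serialization step would matter more than parallelizing the writes.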

// Rolf

I’m currently trying to “rescue” my computer, which is struggling with hardware issues, so I can’t try a “straight up ISerializable” approach right now. I’ll dig into this next week, if I still have a working dev machine by then. :face_with_head_bandage: :face_with_thermometer:

// Rolf


Good luck.