Async, UI & Computation

Just asking! I know this is a redacted kind of request/wish/etc., but: would it not be wonderful if SolveInstance were along the lines of an async function (really don’t care whether the multithreaded way or a reactor pattern à la Node) that essentially communicates back in an evented way, to the UI and to the rest of the graph?

There’s a zillion ways of doing this in .NET nowadays, and I know this implies a rather big architectural change - but if not here, where else could I ask?

PS: I know there’s been work towards multithreaded components, and I haven’t looked into how that is handled and whether it would invalidate this question, so feel free to educate me!

:beers:

Yes, implementing multi-threading on the level just above SolveInstance/RunScript is the best place. I can’t do it for GH1 without breaking the SDK though, so this must wait until GH2.


Yay! This sounds good. I know this is by no means a GH1 topic. I’m excited, as off the top of my head this would mean components can:

  • report the progress of a task
  • feed back into themselves (or the rest of the graph) without hacking
  • potentially handle catastrophic failures safely (to a certain extent…)
  • be user-cancellable if frozen (via cancellation tokens or some such).

Though going fully event-based also raises big questions: how do you handle one-to-many and many-to-many (all the cross-reference parts, etc.) when something depends on multiple actors? So it’s a long walk.


That would already be possible, but yes, since the UI is no longer locked during a solution, at least progress reporting is easier to implement.

No, there will still be distinct solutions that run in series. There will only ever be (at most) a single active solution. If a new solution is started while another one is still busy, the first solution will be cancelled (although it may keep computing for a potentially long time) before the next solution is allowed to start.

More options to deal with the fallout, yes. If a component decides to go into an infinite loop it will not lock the UI, so you will at least be able to save the file and close Rhino properly. However, it is never a good idea to terminate a threaded process from the outside, as that can leave the application as a whole in a corrupt state.

Yup, along with UI remaining responsive this is the second major benefit. Solutions can be automatically cancelled by starting new solutions, or indeed on user request.


There are also downsides though, mostly to do with code complexity, and thus stability/bug density. Multiple solutions may be running at the same time (all but one of them will have been cancelled, but may still be running), there are race conditions, object expiration will be tied to solution identifiers, … It’s taking me a long time to make it (a) work, (b) fast, and (c) debuggable.

See, this is where my incomplete knowledge of how things run behind the scenes shows up. I’ll mull on this. What I had in mind is essentially removing the DA in DAG, but even writing this down sounds like a rather amateurish idea: potentially detecting cycles and limiting them to a certain x ms of execution. Anyway, talk is cheap, so I’ll stop.

I perfectly agree with the issue of stability & bug density. The component pattern as it is now is rather stable and easy to grasp. I do nevertheless notice the present tense in

which does raise hopes, even for a limited implementation. If time allows I’ll hack some stuff around.

So yeah, still in the Friday evening armchair philosophising category, I am churning on whether GH could be switched to essentially Node’s reactor pattern or some similar invention.


Essentially the interesting part is the event loop, which would run at an “as fast as you can” rate, or bounded by a certain min and max “FPS” that you want to achieve (i.e. not go into overdrive if it’s a simple loop adding to infinity).
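A minimal sketch of what such a bounded loop might look like in .NET - all of the names here (BoundedEventLoop, Post, Run) are made up for illustration, nothing of this exists in the GH SDK:

// Hypothetical bounded event loop: drain queued work each tick, but spend at
// most 1/minFps per frame and never tick more often than maxFps allows.
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;

class BoundedEventLoop
{
    readonly ConcurrentQueue<Action> _queue = new ConcurrentQueue<Action>();

    public void Post(Action work) => _queue.Enqueue(work);

    public void Run(double minFps, double maxFps, CancellationToken cancel)
    {
        var frameBudget = TimeSpan.FromSeconds(1.0 / minFps);  // longest allowed frame
        var frameFloor  = TimeSpan.FromSeconds(1.0 / maxFps);  // shortest allowed frame
        var sw = new Stopwatch();

        while (!cancel.IsCancellationRequested)
        {
            sw.Restart();
            // Drain pending work, but yield once the frame budget is spent.
            while (sw.Elapsed < frameBudget && _queue.TryDequeue(out var work))
                work();
            // Don't go into overdrive on trivial loops: pad short frames.
            if (sw.Elapsed < frameFloor)
                Thread.Sleep(frameFloor - sw.Elapsed);
        }
    }
}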

I know .NET has its own two interpretations of this, for I/O-intensive and compute-intensive tasks. Again, talk is cheap, so I’ll just go hack on some Speckle bugs :slight_smile:


I have long wanted all components to act as tasks too since the whole task architecture in .NET lends itself to defining graphs with dependencies.

protected override async Task<SolveOutput> SolveInstanceAsync (SolveInput da,
    GH_CancellationSupport cancel, GH_ProgressReporter reporter)
{
   ...
}

would be awesome :slight_smile:

(edit)… I know this is hard and there are corner cases; it’s just something that would be really cool. Greater use of Tasks in general Rhino would be awesome too.
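(Just to illustrate: something along those lines can already be sketched with the standard .NET primitives. This is only a guess at what a body for such a hypothetical signature could look like, using CancellationToken and IProgress<double> in place of the imagined GH_CancellationSupport/GH_ProgressReporter types.)

using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative only; "SolveAsync" is not part of any GH SDK.
static async Task<double> SolveAsync(double input, CancellationToken cancel,
    IProgress<double> progress)
{
    double sum = 0;
    for (int i = 0; i < 100; i++)
    {
        cancel.ThrowIfCancellationRequested();  // user-cancellable if frozen
        sum += input;                           // stand-in for the real work
        progress.Report((i + 1) / 100.0);       // report progress back to the UI
        await Task.Yield();                     // keep the scheduler responsive
    }
    return sum;
}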


That was rather disappointing news. Really really bad news. :frowning:

I know you won’t trust users to handle concurrency, but please, make it at least “tweakable” so that someone who knows how can switch it on (and handle “eventual sync” in separate sync components).

Notice that you then let the user/developer take the consequences of not understanding the need for synchronization of these parallel-computed components (but this is standard FBP and not very special at all).

Why would you care if Grasshopper is being (explicitly, deliberately) abused by someone without a clue? Be cool! :sunglasses:

// Rolf

Why? What benefits do you expect to get from running N solutions simultaneously, and how do you expect to collate the outcomes of all those solutions?

Well, why do we do it in code? :sunglasses:

Collate and dispatch can be controlled with the data being sent & received. Data gets packed into InfoPackages (IPs), where control data starts and stops the data stream (IIPs). Think of an “IP_Goo” kind of thing.

The following data stream contains two IIPs and five IPs (the data is self-contained and self-aware):

“(” - data is coming next
“a”
“d”
“a”
“t”
“a”
“)” - end of data stream

Four (4) sequences of data sent

"(adata)(another data sequence)(nothing follows)()"

Dispatch:

[out  ] --> "(adata)"
[out+1] --> "(another data sequence)"
[out+2] --> "(nothing follows)"
[out+3] --> "()"

Sending this will block the receiving component:

"(adata"

… until it receives

")"

Or whatever control signal is being used. Classic FBP.

Collate components receive, and verify, their data streams per in-port.

This concept plays well with any complex component which does concurrent processing internally (except perhaps for checking for available threads, etc.).
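A rough sketch of that bracketed framing in code - the PacketCollator class and its behaviour are purely illustrative, not an existing GH type:

using System.Text;

// Buffers incoming tokens until the ")" control signal arrives, then emits
// one complete packet; an incomplete packet keeps the receiver blocked.
class PacketCollator
{
    readonly StringBuilder _buffer = new StringBuilder();
    bool _open;

    // Returns the completed packet when ")" closes the stream, otherwise null.
    public string Receive(string token)
    {
        if (token == "(") { _open = true; _buffer.Clear(); return null; }
        if (token == ")") { _open = false; return _buffer.ToString(); }
        if (_open) _buffer.Append(token);
        return null;
    }
}

// Feeding "(", "a", "d", "a", "t", "a", ")" yields the packet "adata";
// feeding the same tokens without the closing ")" yields nothing yet.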

// Rolf

I’m not talking about implementation details, I’m wondering what the benefit is of running 6 solutions simultaneously but slowly, instead of running a single solution six times faster.

Imagine a slider being dragged from 0.0 to 1.0 in 0.1 increments. We start out at 0.0 and the assumption is that the document is currently in a solved, completed state. Then the slider moves to 0.1 and a bunch of objects are expired. A new solution starts but while that solution is computing the slider moves to 0.2. Anything that is currently being computed is no longer relevant because it is associated with a state the document is no longer in. So we want to start a solution for 0.2 instead, and as soon as we do there’s really no point in any further investments in the 0.1 solution. Even if we let it finish, we cannot store its outcome because that outcome is wrong for the current state.
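In .NET terms that “newest state wins” behaviour maps naturally onto cancellation tokens. A loose sketch - the SolutionScheduler/StartSolution/ComputeAsync names are invented, and this is not how GH actually schedules solutions:

using System;
using System.Threading;
using System.Threading.Tasks;

class SolutionScheduler
{
    CancellationTokenSource _current;

    public Task StartSolution(double sliderValue)
    {
        // Cancel whatever is still computing for the previous document state;
        // it may keep running for a while, but its outcome is discarded.
        _current?.Cancel();
        _current = new CancellationTokenSource();
        var token = _current.Token;

        // Start a fresh solution for the new state.
        return Task.Run(() => ComputeAsync(sliderValue, token), token);
    }

    static async Task ComputeAsync(double value, CancellationToken token)
    {
        for (int i = 0; i < 1000; i++)
        {
            token.ThrowIfCancellationRequested();  // abandon obsolete work early
            await Task.Delay(1, token);            // stand-in for real component work
        }
    }
}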

How is a component’s “internal” concurrency any different from multiple components in parallel? Processes as processes.

A slider, for example, would start sending “(value1, value…” (no stop signal yet)…

and when the slider is released, and after 2 ms, a “…)” signal is sent. Only then would the receiving component’s in-port be able to verify the incoming data and start processing it internally.

Dragging slowly would allow the slider to send complete packets (with stop signal included), etc.

Escape signals can be used to abandon long-running processes that should be expired. The idea is that the network (wires) would run in a “definition-global” thread, while components would pick their own threads from a thread pool. So components would respond only when port buffers contain valid data, etc.

// Rolf

Well that’s a different matter, we were talking about different solutions, i.e. the entirety of all expired components in a single document counts as a single solution.

Different components could be solved simultaneously within a single solution; however, I suspect that it’s not all that useful. Mostly, bottlenecks in a GH file occur at singular locations that are serial to each other, not parallel.

Take a file containing 4 objects: A, B, C, and D. B and C depend on A, and D depends on both B and C. In this scheme, once A has been solved B and C could be solved simultaneously, and whenever the slowest one finishes D can be solved. This is exactly the sort of concurrency that the Task Parallel Library is really good at; you queue up a bunch of tasks and you get to specify for each one on which other ones it depends. However it is not a particularly useful approach within GH, because it is exceedingly unlikely that B and C are both slowing down the solution as a whole in equal terms, while needing fewer than half the available threads.
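For reference, the same four-object graph expressed with the TPL, just to show the shape of it - SolveA through SolveD are placeholders for real component work:

using System.Threading.Tasks;

static class GraphExample
{
    static int SolveA()             => 1;        // placeholder work
    static int SolveB(int a)        => a + 1;
    static int SolveC(int a)        => a + 2;
    static int SolveD(int b, int c) => b + c;

    static async Task<int> SolveGraphAsync()
    {
        var a = Task.Run(() => SolveA());

        // B and C both depend on A, so they start once A completes and may
        // then run simultaneously.
        var b = a.ContinueWith(t => SolveB(t.Result));
        var c = a.ContinueWith(t => SolveC(t.Result));

        // D depends on both B and C, i.e. it waits for the slower of the two.
        await Task.WhenAll(b, c);
        return SolveD(b.Result, c.Result);
    }
}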

Instead of awarding threads to different components, it makes more sense to solve components one at a time, but give each component internal access to all available threads. This way a component which has to iterate a lot (offsetting 10k curves for example, or 100k ray/mesh intersections) can use all available threads to run through those loops. You don’t want to use Tasks for potentially small operations like this because of the overhead, so knowing in advance how many threads will be available is important knowledge for optimisation.
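A sketch of that per-component approach: a single expensive loop handed a known number of threads. The OffsetAll method and its arithmetic are stand-ins, not real geometry code:

using System.Threading.Tasks;

static class ComponentLoop
{
    // Give one component's inner loop all available threads at once.
    static double[] OffsetAll(double[] curves, int availableThreads)
    {
        var results = new double[curves.Length];
        var options = new ParallelOptions { MaxDegreeOfParallelism = availableThreads };

        // Knowing the thread count in advance lets the loop be chunked
        // sensibly instead of paying Task overhead per small item.
        Parallel.For(0, curves.Length, options, i =>
        {
            results[i] = curves[i] + 1.0;  // placeholder for the real offset
        });
        return results;
    }
}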

If B and C are both slow components, then the reason they are slow is most likely because they are both looping over a lot of data. This in turn means it doesn’t really matter whether you solve them in parallel or serially. Either you speed up B by a factor of 6 because you can throw all your threads at it and afterwards you speed up C by a factor of 6, or you run them both at the same time while sped up by a factor of 3 because they have to share threads now. Six of one, half a dozen of the other.

OK, I thought that meant one component (“SolveInstance”).

Yup, that’s very typical for any complex network. But it’s up to the designer of the solution to avoid them, with different approaches. Some of them are simply going to be The bottleneck, while others can be avoided.

But especially with user-interactive processes, not only total processing speed is relevant but also responsiveness, sometimes even at the cost of total processing time.

I’m also thinking about GH_FBP as a “general processing platform”, and there are endless kinds of cases which require different approaches to perform as desired.

That depends entirely on the problem. In many cases, yes, and I know that not everyone understands the point of “traffic lanes”, which are not so much parallel as dynamically serial (faster cars can pass each other, which avoids slowing down the whole, etc.).

But when thinking in terms of a manufacturing plant, it is inevitable that in many cases you really do want to process material in multiple production lines, even using complex machines (= components).

Very tight loops with very many items and small operations are of course best processed internally in components, but that’s not always the case.

And the main point really is to let the end user design his own production lines (component sequences) in his own manufacturing plant (solution) using the various techniques for queuing, load balancing, etc., which - as you already hinted at (“Task-Parallel-Library is really good at”) - requires planning ahead depending on the data you are processing. (This is why the Task Parallel Library consists of more than one scheme…)

Which is exactly why not all cases will be handled very efficiently by “fixed-scheme components”, but would benefit from the user’s own design of the overall process flow.

Anyway, from the start I have dreamed of seeing GH make this possible on this very platform. That would make GH a killer app for “any kind” of massive data processing. Unix pipes come to mind (but even more flexible than that). :slight_smile:

// Rolf

This thread did explode a bit, but I think Steve did a wonderful job with three lines of exemplary code:

Simultaneously running solutions would help if you’re doing design exploration, for instance. But I see this as a later step, as for sure one solution = one state.

Let’s see where this goes :slight_smile: I’m happy for the conversation :tumbler_glass:

One word: Load balancing. It takes study. Or as you put it, “design exploration”. :wink:

There’s no “one size fits all” in concurrency. Ring buffers prove part of my point. :slight_smile:

// Rolf

Well, isn’t this exactly the same as saying “the component definition in a main thread (keeping the world sequential) with all the components threaded (from a thread pool)”?

Which is essentially what I have been talking about for some time now. It’s a manufacturing plant (= “GH definition”) with machines (components).

Most important: the general data flow in the network/definition runs in one single thread (gotta keep track of when the “World” changes), while individual machines (components) run independently (utilizing a thread pool).

That will not solve all concurrency problems; it’s only a generic platform for solving concurrency problems, and it would open up “design exploration” by the end users so that the best-suited concurrency pattern can be applied to specific problems.

Because “design exploration” is a user Task. :wink:

But please do not abandon the project of optimizing the components! (But if designing future-proof solutions, make sure that the components, or the thread pool, are aware of how the cores are used, thus enabling components to take precautions, prioritizing themselves up/down, and the main thread to ensure optimal “general flow” through the network, avoiding single components grabbing all the resources and causing bottlenecks.)

On such a platform it would also make sense to consider (optionally) emitting list-item results, item by item, as they finish processing. That would enable JIT, Just-In-Time delivery. Like in a manufacturing plant, where you have it “both ways” - concurrency through the configuration of the overall manufacturing plant, and complex processing in complex machines. These are essentially the same concepts, only scaled to different levels of application.
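A trivial sketch of such item-by-item emission - purely illustrative, current GH components expose nothing like this:

using System.Collections.Generic;

static class StreamingExample
{
    // Results stream out as each item finishes, so a downstream "machine"
    // could start consuming before the whole list is done.
    static IEnumerable<double> ProcessStreaming(IEnumerable<double> items)
    {
        foreach (var item in items)
        {
            var result = item * 2.0;  // placeholder for the real per-item work
            yield return result;      // deliver this item "just in time"
        }
    }
}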

Basic generic concepts always scale.

// Rolf

I agree. For the few cases where this is really needed, just run six copies of Rhino+GH for now.

I don’t really. Any GH file which takes an appreciable amount of time to complete will do so because it is looping over lots of data. These loops can be multi-threaded so that all available processors are working together to speed up that one solution. Running more than one GH solution at the same time will just slow them all down. You will not be done faster.

That said, having a general mechanism for running solutions on GH instances remotely might be quite useful, as you can have a whole bunch of computers all working on different solutions simultaneously. But that sort of computational farming is a completely different project from multi-threading a single instance of GH.


The following works for most cases. It could be implemented overnight (I used this technique for slave servers in a huge business system, no problem whatsoever). It should be standard components in GH. File names = guid.dat, guid.lock, guid.done, etc.

Checks can be done periodically at both ends (whether the files have been picked up and moved to a “processed” folder within the timeout period; if not, resend) and, on the receiving end, whether all files have been picked up and whether a resulting file was produced (same GUID filename for this test). Whether the result file contains data or not is irrelevant; it’s only a sanity check.
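A rough sketch of that file-drop handoff. The .dat/.lock/.done naming follows the description above, but the folder, payload handling and method names are assumptions, not an existing GH feature:

using System;
using System.IO;

static class FileHandoff
{
    const string DropFolder = @"C:\gh_jobs";   // assumed shared drop folder

    // Sender: write the payload under a .lock name so the receiver never sees
    // a half-written job, then rename it to .dat to publish it.
    public static Guid Send(byte[] payload)
    {
        var id = Guid.NewGuid();
        var lockPath = Path.Combine(DropFolder, id.ToString() + ".lock");
        File.WriteAllBytes(lockPath, payload);
        File.Move(lockPath, Path.Combine(DropFolder, id.ToString() + ".dat"));
        return id;
    }

    // Receiver: poll for *.dat files, process them, and write a .done file
    // with the same GUID so the sender can verify pickup and sanity-check.
    public static void Poll(Func<byte[], byte[]> process)
    {
        foreach (var file in Directory.GetFiles(DropFolder, "*.dat"))
        {
            var result = process(File.ReadAllBytes(file));
            File.WriteAllBytes(Path.ChangeExtension(file, ".done"), result);
            File.Delete(file);
        }
    }
}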

It just works. I’d love to see this as standard components. Data format is already there, just dump it and pick it up. Selling slave licenses will never become easier than this. :wink:

Edit: BTW, the above would also be a quick and dirty - although robust - solution for running multiple solutions on the same machine (due to the design of reusable “solution modules” which deal with different stages in the overall process, which we discussed earlier). This can be done now.

// Rolf