Rhino.compute error when launching more child processes

I’m trying to run the rhino.compute service on an EC2 machine, and I’m having some issues when the service tries to launch more than one child process.
If I submit requests to the server shortly after starting it, I have no issues. I only run into problems when I submit a request after the service has been idle long enough to shut down its child processes. When that happens, it spins up the first child with no problem. It then starts the other processes, but once they finish initializing, the request the service sends to them fails with an error saying that the target machine actively refused the connection.
I’ve tried setting the number of children spawned on startup to 1 to minimize this issue, and that mostly works. But if/when I get enough requests for the service to spin up another child process, it creates that process and throws the error (see picture). I left the port number visible so you can see that it’s the second child, on port 6002, that throws the error.
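A hypothesis worth checking (not confirmed anywhere in this thread): the service may be forwarding to the relaunched child before that child has started listening on its port, which the OS reports as an actively refused connection. The Python sketch below polls a port until it accepts TCP connections; `wait_for_port` and its parameters are hypothetical and not part of rhino.compute.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll until host:port accepts a TCP connection or the timeout expires.

    Hypothetical helper (not a rhino.compute API): returns True once a
    connection succeeds, False if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            # The short per-attempt timeout assumes a local (fast) target.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # ConnectionRefusedError and socket timeouts are both OSError
            # subclasses; a child that is not listening yet lands here.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("localhost", 6002, timeout=5.0)` would block until the second child listens on port 6002, returning `False` if it never does within five seconds. Polling the port keeps the check independent of how (or whether) the child reports its own readiness.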

Another oddity: after this error happens, the max concurrent requests value starts going negative.
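If the counter is decremented on more than one code path for the same failed request (for instance in both an error handler and a cleanup block), a single failure would subtract twice. That is only an assumption about the cause; the thread does not show the rhino.compute counter code. This hypothetical Python `ConcurrencyCounter` tracks each request by token, so a duplicate release is a no-op and the count cannot drop below zero:

```python
import threading


class ConcurrencyCounter:
    """Tracks in-flight requests; each request can be released at most once.

    Hypothetical sketch: the real rhino.compute counter is not shown in the thread.
    """

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._active = set()  # tokens of requests currently in flight
        self._next_id = 0

    def begin(self) -> int:
        # Register a new in-flight request and hand back its token.
        with self._lock:
            token = self._next_id
            self._next_id += 1
            self._active.add(token)
            return token

    def end(self, token: int) -> None:
        # discard() ignores tokens that were already released, so calling
        # end() twice for the same request cannot push the count negative.
        with self._lock:
            self._active.discard(token)

    @property
    def count(self) -> int:
        with self._lock:
            return len(self._active)
```

A bare integer decremented in both a `catch` and a `finally` would go to -1 after one failed request; keying releases by token avoids that class of bug.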

@stevebaer do you have any idea what’s going on here?

Hi @will, @stevebaer, any news on this issue?

Thanks!

I have been experimenting locally using Hops and am seeing cases where a negative number gets reported. I haven’t figured out the cause yet, but at least I can see this bug too.

Hi @stevebaer,

I’ve been able to reproduce it locally with a very small client application and a server with 2 child “nodes”. I’m testing against the latest commit on the current “master” branch (https://github.com/mcneel/compute.rhino3d/76b83d8816fea1c9f6d208acf346780b8d7538ad.zip) with these launch settings:

{
  "profiles": {
    "rhino.compute": {
      "commandName": "Project",
      "commandLineArgs": "--port 6500 --childcount 2 --idlespan 10"
    }
  }
}

The crash happens in rhino.compute.ReverseProxy.cs, line 125; you can wrap this call in a try/catch and debug around it:
return await _client.SendAsync(req);
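In .NET, a refused connection from `HttpClient.SendAsync` typically surfaces as an `HttpRequestException` wrapping a `SocketException`, whose Windows message is the “actively refused” text described in the original report. As a language-neutral illustration of catching, logging, and retrying that failure, here is a hypothetical Python `send_with_retry` helper; Python's equivalent exception is `ConnectionRefusedError`, and the attempt count and delay are assumptions, not rhino.compute settings:

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
log = logging.getLogger("proxy")


def send_with_retry(send: Callable[[], T], attempts: int = 5, delay: float = 0.5) -> T:
    """Call send(), retrying only when the target refuses the connection.

    Hypothetical stand-in for wrapping `_client.SendAsync(req)` in a try/catch.
    `attempts` is the total number of calls; each refused attempt except the
    last is logged so the underlying exception can be inspected.
    """
    for attempt in range(1, attempts):
        try:
            return send()
        except ConnectionRefusedError as exc:
            log.warning("attempt %d/%d refused: %s", attempt, attempts, exc)
            time.sleep(delay)  # give a freshly started child time to bind its port
    # Final attempt: any error propagates to the caller unchanged.
    return send()
```

Retrying only on a refused connection keeps other failures (bad requests, server errors) visible instead of masking them.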

And here is the client application I’ve been using for requesting jobs to the server:
RhinoComputeCrash.zip (76.1 KB)

You need to wait about 10 seconds (because I set idlespan to 10) before connecting to the server, so that it stops both children; the crash happens when it tries to relaunch the second child.
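The repro steps can be scripted. This Python sketch waits past the idle span and then fires several concurrent requests, which should make the service relaunch more than one child; `reproduce`, its parameters, and the `send` callable are hypothetical, and the actual request the attached client sends is not shown in the thread:

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Optional


def reproduce(send: Callable[[int], None], idlespan: float = 10.0,
              margin: float = 2.0, requests: int = 4) -> List[Optional[Exception]]:
    """Wait past the idle span, then fire `requests` concurrent requests.

    Hypothetical sketch: `send(i)` performs one request against the server.
    Returns one entry per request: None on success, the raised exception on failure.
    """
    # Give rhino.compute time to shut its idle children down.
    time.sleep(idlespan + margin)

    def one(i: int) -> Optional[Exception]:
        try:
            send(i)
            return None
        except Exception as exc:  # collect the error, don't abort the other requests
            return exc

    # Concurrent load is what should push the service past the first child.
    with ThreadPoolExecutor(max_workers=requests) as pool:
        return list(pool.map(one, range(requests)))
```

Pass a `send` that issues one request to the server on port 6500; any error the client sees after the children are relaunched then shows up in the returned list instead of stopping the run.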

Let me know if you have any problems reproducing it.

Regards!

Thanks, I’ll try to reproduce this with your sample.