Rhino.compute error when launching more child processes

jed_segura · May 4, 2021, 2:59pm

I’m trying to run the rhino.compute service on an EC2 machine, and I’m having issues when the service tries to launch more than one child process.
If I submit requests to the server shortly after starting it, I have no issues. I only run into problems when I submit a request after the service has been idle long enough to shut down its child processes. When this happens, it spins up the first child with no problem. After that, it tries to start the other processes, but when one finishes initializing, the service throws an error saying that the machine actively refused the connection when it tried to send a request to it.
I’ve tried setting the number of children generated on startup to 1 to minimize this issue, and that works for the most part. But when I get enough requests for the service to spin up another child process, it creates the process and throws the error (see picture). I left the port number visible so you can see that it’s the second child, on port 6002, that throws the error.

Another oddity is that after this error happens, the reported max concurrent requests value goes negative.

will · May 5, 2021, 3:28pm

@stevebaer do you have any idea what’s going on here?

mpcarlos87 · May 11, 2021, 2:24pm

Hi @will @stevebaer, any news on this issue?

Thanks!

stevebaer · May 22, 2021, 12:29am

I have been experimenting locally using Hops and seeing cases where a negative number is getting reported. So far I haven’t figured out the cause, but at least I’m seeing this bug.

mpcarlos87 · May 26, 2021, 1:11pm

Hi @stevebaer,

I’ve been able to reproduce it locally with a very small client application and a server with 2 child “nodes”. The commit I’ve been testing with is the latest on the current “master” branch (https://github.com/mcneel/compute.rhino3d/76b83d8816fea1c9f6d208acf346780b8d7538ad.zip), with these launch parameters:

{
  "profiles": {
    "rhino.compute": {
      "commandName": "Project",
      "commandLineArgs": "--port 6500 --childcount 2 --idlespan 10"
    }
  }
}

The crash happens in rhino.compute.ReverseProxy.cs, line 125. You can add a “try catch” there and debug around it:
return await _client.SendAsync(req);
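
For illustration, here is a minimal sketch of what such a try/catch could look like. It is not the actual rhino.compute code: `ProxyDebug`, `FindSocketException` and `SendWithLoggingAsync` are hypothetical names, and the only call taken from the source is `SendAsync`. The idea is to log the target address and the underlying socket error (if there is one) before rethrowing, so you can see which child port refused the connection.

```csharp
using System;
using System.Net.Http;
using System.Net.Sockets;
using System.Threading.Tasks;

// Hypothetical helper for debugging; not part of rhino.compute.
public static class ProxyDebug
{
    // Walks the InnerException chain and returns the first SocketException, or null.
    public static SocketException FindSocketException(Exception ex)
    {
        for (var e = ex; e != null; e = e.InnerException)
        {
            if (e is SocketException se) return se;
        }
        return null;
    }

    // Same call as in the proxy, wrapped so that a failed forward is logged
    // with its target URI before the exception is rethrown unchanged.
    public static async Task<HttpResponseMessage> SendWithLoggingAsync(
        HttpClient client, HttpRequestMessage req)
    {
        try
        {
            return await client.SendAsync(req);
        }
        catch (HttpRequestException ex)
        {
            var socket = FindSocketException(ex);
            Console.Error.WriteLine(
                $"Forwarding to {req.RequestUri} failed: {ex.Message} " +
                $"(socket error: {socket?.SocketErrorCode})");
            throw; // keep the original stack trace for the debugger
        }
    }
}
```

Setting a breakpoint on the `Console.Error.WriteLine` line lets you inspect the request and the state of the child process at the moment the connection fails.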

And here is the client application I’ve been using for requesting jobs to the server:
RhinoComputeCrash.zip (76.1 KB)

You need to wait about 10 seconds before connecting to the server (because I set the idlespan parameter to 10) so that it stops the 2 child processes. The crash happens when it tries to relaunch the second child.
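
The steps above can be sketched roughly as follows. This is not the attached client, only an assumed outline of the same idea: wait past the idle span, then send several requests at once so the server needs more than one child. The class and method names, the default URL, the use of GET, and the request count of 4 are all placeholders; point it at whatever endpoint and HTTP method your own jobs use.

```csharp
using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

// Hypothetical repro client; names and defaults are assumptions.
public static class IdleRelaunchRepro
{
    // Sends `count` requests concurrently and reports one result line per request.
    public static async Task<string[]> FireAsync(HttpClient client, string url, int count)
    {
        var tasks = Enumerable.Range(0, count).Select(async i =>
        {
            try
            {
                using (var resp = await client.GetAsync(url))
                {
                    return $"request {i}: HTTP {(int)resp.StatusCode}";
                }
            }
            catch (Exception ex) when (ex is HttpRequestException || ex is TaskCanceledException)
            {
                return $"request {i}: failed ({ex.Message})";
            }
        });
        return await Task.WhenAll(tasks);
    }

    public static async Task Main(string[] args)
    {
        // Placeholder URL on the port from "--port 6500"; pass a real endpoint as args[0].
        var url = args.Length > 0 ? args[0] : "http://localhost:6500/";
        const int idleSeconds = 10; // matches "--idlespan 10"

        // Wait past the idle span so the server has stopped its children.
        await Task.Delay(TimeSpan.FromSeconds(idleSeconds + 5));

        using (var client = new HttpClient())
        {
            foreach (var line in await FireAsync(client, url, 4))
                Console.WriteLine(line);
        }
    }
}
```

The extra 5 seconds of delay is only a safety margin so that the children are really gone before the first request arrives.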

Let me know if you have any problems reproducing it.

Regards!

stevebaer · May 28, 2021, 2:57pm

Thanks, I’ll try to reproduce this with your sample.
