Auto-scaling on AWS

Hi everyone,

We currently have an application that uses rhino-compute as its backend, and we plan to bring it to production next week. For this to succeed, we need a good auto-scaling solution. At my company we use AWS for our server setup. An important detail is that we will need to add new Grasshopper plugins to the rhino-compute server once in a while.

As far as I can see there are two workable solutions for auto-scaling on AWS:

  • Using Amazon ECS (Elastic Container Service) and ECR (Elastic Container Registry). This setup uses Docker images to deploy servers. This is the first solution I tried, because we use ECS/ECR for all our web apps at my company. However, I ran into a major issue with the Docker image. I can build the Dockerfile provided in the rhino-compute repo and push the image to ECR, but the container instance fails to pull the image from ECR. I’m still in contact with AWS support about this issue, but it seems some of the layers of the Docker image cannot be downloaded. It looks like these are the Windows layers (mcr.microsoft.com/windows:1809).
  • Using an AMI (Amazon Machine Image) and EC2 auto-scaling. I guess this should be a workable alternative, although AMIs are less convenient to set up and maintain than Docker images.

Does anyone here have any experience with setting up rhino-compute with auto-scaling on AWS? Is there an alternative that I’m missing? Any tips or recommendations?

I actually had the deploy through ECS/ECR go through a few times, but even then it took over an hour. About 99% of the time it just fails after trying for over an hour. So I guess I’m either doing something wrong or the Docker image is not suited for rapid deployment. It’s not exactly a lightweight image, given its reliance on a full Windows base.

Another question I have is about Rhino licensing. Right now we have a license with core-hour billing set up. This works well for one rhino-compute EC2 instance. If we launch multiple instances, can we keep using the same license? Or do we need multiple licenses, one for each instance? In case of the latter, is there a way to distribute these keys automatically?

Kind regards,
Geert

You should use your existing core-hour billing setup on all of your instances. I can think of a few scenarios where you might want separate billing accounts (for example, billing different apps separately), but if you have one application, use the same account.

As for auto-scaling on AWS and Docker, I’d have to defer to @will @aj1 and @brian.

Scaling is pretty outside the scope of what we can help with, because scaling is very problem-dependent. Do the calls to your compute instance generally execute quickly, or in about the same amount of time? Or is the compute load highly variable?

For very consistent workloads, it’s possible to scale out based on current load - if the system is at 60% capacity, scale out. If the system is at 20% capacity, scale in.
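To make the threshold idea concrete, here is a minimal sketch in Python. The function name and the default thresholds are hypothetical and simply mirror the 60% / 20% figures above; nothing here is part of rhino-compute or an AWS API, and in practice you would express the same rule as a scaling policy on your auto-scaling group.

```python
def scaling_decision(utilization: float,
                     scale_out_at: float = 0.60,
                     scale_in_at: float = 0.20) -> int:
    """Return +1 to add an instance, -1 to remove one, 0 to hold.

    `utilization` is the current load as a fraction of capacity (0.0 to 1.0).
    The thresholds are illustrative assumptions, not recommended values.
    """
    if not 0.0 <= utilization <= 1.0:
        raise ValueError("utilization must be between 0 and 1")
    if utilization >= scale_out_at:
        return 1   # at or above the upper threshold: scale out
    if utilization <= scale_in_at:
        return -1  # at or below the lower threshold: scale in
    return 0       # in between: leave the fleet as it is
```

Keeping a gap between the two thresholds avoids adding and removing instances in quick succession when the load hovers around a single value.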

But we’ve not seen Rhino-based compute workloads that are predictable. So, it might be that scaling based on a worker queue makes more sense. In this case, all requests would be placed in queue, and your system would scale based on the length of the queue. This involves increased complexity in the interaction between client and server, however.
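As a rough illustration of queue-based scaling, the sketch below derives a target instance count from the current queue length. All names and numbers are hypothetical: `requests_per_instance` is an assumed backlog that one compute instance can reasonably work through, and the minimum and maximum are placeholder fleet limits. Reading the queue length and applying the result to a real auto-scaling group are left out.

```python
import math


def desired_instances(queue_length: int,
                      requests_per_instance: int,
                      min_instances: int = 1,
                      max_instances: int = 10) -> int:
    """Return how many instances to run for the given backlog.

    The result is the backlog divided by the per-instance allowance,
    rounded up and clamped to the configured fleet limits.
    """
    if queue_length < 0:
        raise ValueError("queue_length must not be negative")
    if requests_per_instance <= 0:
        raise ValueError("requests_per_instance must be positive")
    needed = math.ceil(queue_length / requests_per_instance)
    return max(min_instances, min(max_instances, needed))
```

For example, with an allowance of 5 queued requests per instance, a backlog of 23 requests asks for 5 instances, while an empty queue falls back to the minimum of 1.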

AWS supports both of these styles of scaling and provides queue mechanisms that you can build on.

Our current goal is to provide the computational back end for each instance, and not build the whole system - at least not in the short term.

I don’t have any experience with ECS, but EKS works (pulling images from ECR), so I’d hope ECS would too.

Do you have a rough average on how long a deploy takes using EKS? And what compute option are you using for EKS? EC2 or Fargate?

Windows containers aren’t supported on Fargate. Deployment times depend on the number and size of new layers that need to be downloaded from ECR (plus the time to build and push the image beforehand), assuming that the node was already running a previous version of the container. Pulling a new version of the Windows image is pretty slow, but in my experience everything else takes just a few minutes.

Thanks for all the valuable feedback!

I continued looking into the ECS issue with AWS support, and it seems to be a bug in the ECS service running on the Windows host. The issue has been reported, but I have no ETA on when their internal team will be able to pick it up.
