Choosing a consumer service type for message queue processing: deployment or job or serverless?

I architect and partly manage a modular monolith application (~25 services) that runs on Kubernetes on cloud (Azure). I recently got a requirement for a workload with the following characteristics:

The workload has to run asynchronously (the user will be notified as soon as it finishes).
Each workload requires a decent amount of CPU, say 4 cores.
The concurrency for these workloads is not that high: at most 10-20 instances need to run in parallel.
Each workload should finish within 3 minutes. Faster is always better.
The workload is going to run sporadically through the day. Most runs would be concentrated at the start of the working day, but that start can span multiple time zones.

As you can see, that's a highly specific requirement, which warrants a highly specific solution.

What are some of the possible approaches we can take to solve this?

Option 1: Use kubernetes deployments.

The simplest option is to just use Kubernetes deployments. It is simple and native, and most of the other services would already be deployments. The downside, for this very specific requirement, is the high cost of keeping the deployments running, as each pod would need a CPU request of at least 4 cores. We would have to run at least 10 replicas to support the required concurrency and performance.
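To make the cost argument concrete, here is a rough back-of-the-envelope sketch. The 10 replicas, 4 cores and 3 minutes come from the requirements above; the figure of 200 runs per day is purely hypothetical (the actual number is not stated), so treat the resulting percentage as an illustration only.

```python
def reserved_core_hours(replicas: int, cores_per_pod: int, hours: float = 24.0) -> float:
    """Core-hours requested by an always-on deployment over the given period."""
    return replicas * cores_per_pod * hours


def used_core_hours(runs: int, cores_per_run: int, minutes_per_run: float) -> float:
    """Core-hours actually consumed by the workload runs."""
    return runs * cores_per_run * minutes_per_run / 60.0


# From the text: 10 replicas, 4 cores each, up to 3 minutes per run.
# Hypothetical assumption: 200 runs per day.
reserved = reserved_core_hours(replicas=10, cores_per_pod=4)
used = used_core_hours(runs=200, cores_per_run=4, minutes_per_run=3)

print(f"reserved: {reserved:.0f} core-hours/day")
print(f"used:     {used:.0f} core-hours/day")
print(f"utilization of the reservation: {used / reserved:.1%}")
```

Under these assumptions, only a small fraction of the requested CPU would do useful work, which is the cost problem described above.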

If we knew the approximate time window of these workloads, we could schedule a scale-out of the VMs and get them ready up front. As we don't have a narrow window, we need to rely on the autoscalers. The Horizontal Pod Autoscaler does not work very well here, because the average CPU utilization of the pods takes a while to reach the scale-out threshold. As the pods are idle most of the time, multiple workloads that come in concurrently might get assigned to the same pod, degrading their performance. Even if that can be resolved somehow, when there aren't enough resources on the existing VMs, the node autoscaler has to spin up a new VM, a new replica has to be created on that VM, and only then can the workload run there. That adds a delay of ~2-3 minutes, which is small in absolute terms but significant in our context.
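The averaging problem can be illustrated with a simplified version of the scaling rule from the Kubernetes HPA documentation, desired = ceil(current replicas x current metric / target metric). This is only a sketch: it ignores metric scrape intervals, stabilization windows and readiness handling, and the 70% target is an assumed value, not one from my setup.

```python
import math


def desired_replicas(current_replicas: int, avg_utilization: float,
                     target_utilization: float, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale by the ratio of observed to target utilization."""
    ratio = avg_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance: the autoscaler leaves the replica count alone.
        return current_replicas
    return math.ceil(current_replicas * ratio)


def average(values: list[float]) -> float:
    return sum(values) / len(values)


# Two pods, target 70% (assumed). One pod is saturated, the other is idle:
# the average is only 50%, so no scale-out is requested.
print(desired_replicas(2, average([100.0, 0.0]), 70.0))

# Only when both pods are saturated does the rule ask for a third replica.
print(desired_replicas(2, average([100.0, 100.0]), 70.0))
```

A single saturated pod is hidden by the idle ones in the average, so the scale-out only kicks in after performance has already degraded.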

Another point to note, if we choose this route, is CPU throttling. Because these are CPU-intensive workloads, we need to be aware of this issue. If you don't remove CPU limits completely, set the limit at least 2 cores higher than what the workload requires.
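Why the headroom matters can be seen in a simplified model of how CPU limits are enforced on Linux (CFS bandwidth control): a limit of N cores becomes a quota of N x 100 ms of CPU time per 100 ms period, and once the container's threads have used it up they are paused until the next period. The sketch below assumes all busy threads can run in parallel on free physical cores and use the CPU continuously, which is an idealization.

```python
def throttled_ms_per_period(limit_cores: float, busy_threads: int,
                            period_ms: float = 100.0) -> float:
    """Wall-clock time per period during which the container is paused."""
    if busy_threads <= 0:
        return 0.0
    quota_ms = limit_cores * period_ms
    # With N threads running in parallel, the quota is consumed N times faster.
    runs_for_ms = quota_ms / busy_threads
    return max(0.0, period_ms - runs_for_ms)


# Limit of 4 cores, but the process spins up 6 busy threads:
print(throttled_ms_per_period(limit_cores=4, busy_threads=6))

# Same 6 threads with the limit raised by 2 cores: no throttling.
print(throttled_ms_per_period(limit_cores=6, busy_threads=6))
```

In this model, a workload whose thread count briefly exceeds its nominal 4 cores is paused for about a third of every period unless the limit leaves some room.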

So, it would be great to "reserve" all 4 cores for just one request. And that brings us to...

Option 2: Use kubernetes jobs.

Use one job for processing one request. Jobs start fast (in the context of a 2-3 minute workload), do one thing, and free up the resources once done.

But how do you dynamically create a job? The only job controller that Kubernetes provides out of the box is the CronJob, which is not what we want here. It would be great to trigger a job based on a request or a message. How do you do this? You can use KEDA to trigger a Kubernetes job! Have the requests to run the workload come in to a queue, and use KEDA to monitor the queue and create a job for each message.
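As a minimal sketch of what this could look like, the snippet below builds a KEDA ScaledJob manifest as a Python dictionary and prints it as JSON, which kubectl accepts in place of YAML. Several things here are assumptions for illustration, not details of my setup: the queue is taken to be an Azure Service Bus queue, and the names `report-worker`, `report-requests`, the image and the `report-worker-auth` TriggerAuthentication are all hypothetical. The worker container is expected to consume one message and exit.

```python
import json


def scaled_job_manifest(name: str, image: str, queue_name: str,
                        max_parallel_jobs: int, cpu_request: str,
                        cpu_limit: str) -> dict:
    """Build a KEDA ScaledJob that starts one Kubernetes job per queued message."""
    return {
        "apiVersion": "keda.sh/v1alpha1",
        "kind": "ScaledJob",
        "metadata": {"name": name},
        "spec": {
            "jobTargetRef": {
                "backoffLimit": 1,
                "template": {
                    "spec": {
                        "restartPolicy": "Never",
                        "containers": [{
                            "name": name,
                            "image": image,
                            "resources": {
                                "requests": {"cpu": cpu_request},
                                "limits": {"cpu": cpu_limit},
                            },
                        }],
                    }
                },
            },
            "pollingInterval": 10,                # seconds between queue checks
            "maxReplicaCount": max_parallel_jobs, # cap on concurrently running jobs
            "triggers": [{
                "type": "azure-servicebus",       # assumed queue technology
                "metadata": {"queueName": queue_name, "messageCount": "1"},
                # Hypothetical TriggerAuthentication holding the connection details.
                "authenticationRef": {"name": f"{name}-auth"},
            }],
        },
    }


manifest = scaled_job_manifest(
    name="report-worker",                          # hypothetical
    image="myregistry.azurecr.io/report-worker:1", # hypothetical
    queue_name="report-requests",                  # hypothetical
    max_parallel_jobs=20,
    cpu_request="4",
    cpu_limit="6",
)
print(json.dumps(manifest, indent=2))
```

The output can be piped to `kubectl apply -f -`. The request of 4 cores and limit of 6 follow the sizing discussed under Option 1.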

While running as a job solves the problem of using up resources when not in use, it still doesn't handle the case of scaling out when the VM runs out of resources. For example, if a 16 core VM already has 3 jobs running (each requesting, say, 5 cores), what happens when the 4th job is triggered? The node autoscaler has to start a new VM, and that causes a delay. More than the delay, it causes unpredictability in the performance of the workload. One solution is to reduce the CPU requests so that some extra pods fit into the existing VMs, assuming your nodes aren't peaking already. (It would have been great to have a node autoscaler in Azure that works based on CPU / memory usage %, rather than pending pods.)
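The packing arithmetic behind this example can be sketched as follows. The scheduler places pods based on CPU requests, not actual usage, so the number of jobs per node is simply the free allocatable CPU divided by the request. The 15800 millicore figure below is a hypothetical allocatable value for a 16 core VM after system reservations; the real number depends on the node configuration.

```python
def jobs_that_fit(allocatable_millicores: int, request_millicores: int,
                  already_requested_millicores: int = 0) -> int:
    """How many more pods with the given CPU request the scheduler can place."""
    free = allocatable_millicores - already_requested_millicores
    return max(0, free // request_millicores)


# 16 core VM, jobs requesting 5 cores each: only 3 fit.
print(jobs_that_fit(16000, 5000))

# With 3 such jobs running, the 4th stays pending until a new node arrives.
print(jobs_that_fit(16000, 5000, already_requested_millicores=15000))

# Hypothetical allocatable of 15800m: a 4 core request still fits only 3,
# while trimming the request to 3.9 cores lets a 4th job in.
print(jobs_that_fit(15800, 4000))
print(jobs_that_fit(15800, 3900))
```

Small changes to the request can decide whether a burst is absorbed by the existing VM or has to wait for the node autoscaler.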

Let's also try a non k8s approach to see if there are alternatives...

Option 3: Use Container as a Service - Azure Container Instance

Use Virtual Kubelet to schedule the pod on a "virtual" node type; a pod scheduled on this node is created in ACI.

Ideally, this should have worked. However, ACI also faces the problem of having to provision the underlying infrastructure if there isn't any. I think of ACI as being placed somewhere between AKS and Azure Functions. (Aren't there enough Leaky Abstractions already?)

So, ACI also has the cold start problem, and its startup times are not guaranteed (it took anywhere between 0.5 and 5 minutes for me). It can give you up to 4 CPU cores, so if the workload could have tolerated a startup delay of up to 5 minutes, or the unpredictability, this would have been fine.

Option 4: Serverless

Would Azure Functions work? The consumption plan has a 1 CPU limit, so that's not useful. Wait, AWS Lambda now has a 6 core limit in its pay-as-you-go plan! And it does work. But connecting Azure and AWS (considering everything else I had was on Azure), setting up VPCs, opening up database access only to the Lambda, making sure the Lambda is covered in the Terraform scripts... it's a nightmare, and not worth the benefits.

But what about Azure Functions in the Premium plan? That works, allowing up to 4 cores. You can use an Always Ready instance plus some burstable instances (which need to be paid for on a consumption basis). I believe you will still get some cold start issues even if you use an Always Ready instance, but I haven't actually implemented this one. When I tried out the math in the Azure pricing calculator, it turned out to be way more expensive than simply running the VMs (with 3 year reservations).

Final words:

So what did I choose?

KEDA based jobs for running workloads.
Tweak the CPU requests and limits appropriately.
Go for 3 year VM reservations in Azure: you get massive discounts. FaaS offerings have no reservation discounts.
For very rarely used services, and services that need a lot of resources, try to avoid deployments. Prefer Kubernetes jobs or Azure Functions (you can even run the Functions runtime in AKS without incurring any extra cost).

I'm pretty sure there could be a few more possible solutions for this scenario. Even a slight change in requirement would tilt the suitability of one solution over the other.

Photo by Pichara Bann on Unsplash [and modified by me ;) ]

Maybe, it depends.