Why we migrated from EC2 to EKS to Lambda
A journey from Virtual Machines (EC2) to Containers (EKS) and finally to Function as a Service (Lambda). Why did we decide to go for "Serverless" FaaS? Which steps did we go through? You'll learn everything in this post!
Concepts
First of all, we need to define the concepts we will talk about. If you are not familiar with virtualization, containers, or serverless, let's explore them quickly:
- Virtualization: lets you create many virtual machines on top of one or more physical machines. A hypervisor then dispatches the physical resources (CPU, memory, storage) across the VMs. A VM alone is not useful: you still have to install the operating system and all the software and patches required to make it usable.
- Containers: a container is a lightweight, standalone executable package of software embedding everything needed to run an application, including code, runtime, libraries, settings... Containers share the host OS kernel, which makes them faster to build, deploy, start and stop than VMs. With AWS EKS, you can easily deploy Kubernetes clusters and run containers inside pods. It is a perfect fit for platform development or distributed software composed of multiple components.
- Serverless: kind of a buzzword. I often hear "hey! how can my software run if there is no server?". Indeed, there is no such thing as "no server" 😅. Serverless means that the consumer of the service just brings their code, libraries and configuration and doesn't give a sh*t about the runtime and hardware. Of course there are servers behind the service, but you don't have to manage them.
Virtual Machine vs Containers vs Serverless architecture
We also need to define what IaaS, PaaS and FaaS are, because they are closely related:
- Infrastructure as a Service (IaaS): offering VMs that can be provisioned as a self-service.
- Platform as a Service (PaaS): offering application environments where developers just have to deploy their software deliverables.
- Function as a Service (FaaS): often misunderstood ☝. With the rise of microservices architectures, software engineers try to decouple their software as much as possible. One way to do this is to give each function its own runtime instead of sharing one among many functions. We will see later in this post the strengths and weaknesses of this approach through real-world examples.
Now that those concepts are clear, let's talk about our journey from VMs to containers and then to Function as a Service.
Initial context: VMs with IaaS and a monolithic Java application
So, here is our initial situation: we moved our software from on-premise to AWS with a lift-and-shift strategy (migration as-is). The application is mainly composed of a monolithic Java (Spring) backend (there are also an Angular front end and a PostgreSQL database, but they are out of scope for this article).
We wanted to extract our business logic from the monolith into separate bounded contexts, i.e. microservices, and make it cloud ready. So we decided to rewrite some code to make that possible. Here is what we were targeting:
Our software after code rewrite
With a new design come new challenges and potential benefits:
- Tracking distributed traces
- Collecting and centralizing microservice logs
- Using event streaming to synchronize data
- High availability, fault tolerance, resilience...
Using a single VM for all the microservices would have been a huge mistake: if the VM crashed, we would have lost every service.
Using multiple VMs would have been possible, but from a cost perspective it was not the best option. Moreover, auto-scaling VMs can take several minutes, so elasticity would never be real-time.
This is why we decided to replatform our solution and go for containers. As we were pretty experienced with Kubernetes, we chose EKS over ECS.
EKS: Kubernetes to rule them all
Using Terraform, we created our Kubernetes cluster with EKS so that the whole platform could be rebuilt from the ground up.
We deployed our microservices with Helm charts. Here are some of those microservices:
- Billing
- Order
- Admin
- Cart
- Invoice
- ...
Overview of some of our microservices inside our cluster
Here is what this new platform brought us:
- Real-time monitoring: logs, traces and metrics centralized with Prometheus
- Microservices autonomy
- High availability, fault tolerance and resilience
- Decoupling: painless code merges, with a pizza team behind each service
- Infrastructure as code: the ability to redeploy the whole platform in minutes into a new Kubernetes cluster or namespace
- ...and many more
So, you might be wondering: what more could we expect?
Well, as the platform and its environments grew, we had to add more and more underlying EC2 instances to our EKS clusters (Kubernetes needs compute for its worker nodes), and the cost of the platform became huge.
So why didn't we go for Fargate?
EKS + Fargate is nice, but we had big challenges around scaling quickly, since this software sells tickets for concerts. Fargate wasn't well suited to those huge, near real-time scale-out/scale-in scenarios. Moreover, thanks to trace monitoring, we observed that some of our containers were scaling because of only 3% to 5% of their code.
What if we isolated each function of the code in Lambdas with FaaS and delegated scaling and high availability to AWS?
We did a proof of concept by splitting only our Order microservice into Lambda functions and tested it in production, with good results: less infrastructure management, less "Yamlnetes", high availability, good performance and cost savings.
The PoC was successful, so we started to transform the whole application into Lambdas.
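To make this concrete, here is a minimal sketch of what one of those functions can look like in Java, using the official aws-lambda-java-core and aws-lambda-java-events libraries. The names (CreateOrderHandler, the "order" wording) are illustrative, not our actual code:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;

// One function = one deliverable: this handler only knows how to create an order.
// There is no Spring controller, no servlet container and no pod to manage anymore.
public class CreateOrderHandler
        implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent> {

    @Override
    public APIGatewayProxyResponseEvent handleRequest(APIGatewayProxyRequestEvent request,
                                                      Context context) {
        // The HTTP body forwarded by API Gateway contains the order payload.
        String orderJson = request.getBody();
        context.getLogger().log("Creating order: " + orderJson);

        // Persistence (DynamoDB, RDS...) is deliberately left out of this sketch.
        return new APIGatewayProxyResponseEvent()
                .withStatusCode(201)
                .withBody(orderJson);
    }
}
```

Each handler like this is packaged and deployed on its own, so AWS scales it independently from the rest of the Order domain.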
How Lambda functions made us save a lot of money
Why we thought we needed FaaS from a Solution Architect's point of view
Let's take a short real-life example to illustrate what made us move to Lambda. Imagine that you have a microservice handling end-user operations like authentication, sign-up, profile update, password recovery...
The "login" function might be called 1 million times per day, since it is a very popular application, while the password recovery function is only called 10,000 times per day. Is it worth packaging them in the same microservice?
- From a business/developer perspective, it could be, because both functions are related to user management.
- But from an ops/architect perspective, if those two functions run on separate runtimes, they can be operated and scaled differently.
- From a product perspective, if one of the functions fails, as long as they are decoupled and stateless it won't make the whole system fail, leading to a better service level.
Example of simple CRUD operations with Lambda functions
So, it can make sense to separate those functions into individual deliverables.
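As a rough sketch (illustrative names, not our production code), "password recovery" then becomes a tiny deliverable of its own, while "login" lives in a separate artifact that looks just like the order handler above:

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.Map;

// Packaged and deployed separately from the "login" function: it gets its own
// memory size, timeout and concurrency settings, and a failure or a scaling
// event here never impacts the login traffic.
public class PasswordRecoveryHandler implements RequestHandler<Map<String, String>, String> {

    @Override
    public String handleRequest(Map<String, String> request, Context context) {
        // Stateless: everything the function needs comes from the event itself.
        String email = request.get("email");
        context.getLogger().log("Password recovery requested for " + email);

        // Sending the actual recovery email (e.g. through Amazon SES) is out of
        // scope for this sketch.
        return "recovery email queued for " + email;
    }
}
```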
Why we thought Lambda would save us a lot of money
It's all about Lambda's service level and cost model:
- Service level: Lambda was designed so that developers just bring their code, libraries and configuration without taking care of any underlying infrastructure. AWS provides an appropriate runtime for each invocation, and you benefit from native high availability and resiliency.
- Cost model: Lambda's cost model is a bit too complicated to explain fully here, but to take a shortcut: you only pay when your function is invoked (based on the number of requests and the execution duration), with a generous monthly free tier.
By contrast, when you run a container with, say, 100 functions inside, you pay for the whole container no matter how many calls each function receives. Moreover, when one of your functions needs to scale, the whole underlying pod scales, whereas with Lambda only the function concerned scales.
So we can say that FaaS + serverless brings fine-grained cost and scalability management, going deeper and further than containers.
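To make that tangible, here is a back-of-the-envelope estimate reusing the login / password recovery volumes from the example above. The prices and the execution profile (memory, duration) are assumptions, roughly in line with Lambda's published x86 pricing at the time of writing; the free tier is ignored, so check the current AWS pricing page before reusing these numbers:

```java
// Back-of-the-envelope Lambda cost estimate. All constants are ASSUMPTIONS for
// illustration only: check the current AWS pricing page for your region.
public class LambdaCostSketch {

    static final double PRICE_PER_MILLION_REQUESTS = 0.20;   // USD (assumed)
    static final double PRICE_PER_GB_SECOND = 0.0000166667;  // USD (assumed)

    static double monthlyCost(long invocationsPerDay, double memoryGb, double avgDurationSeconds) {
        long monthlyInvocations = invocationsPerDay * 30;
        double requestCost = (monthlyInvocations / 1_000_000.0) * PRICE_PER_MILLION_REQUESTS;
        double computeCost = monthlyInvocations * memoryGb * avgDurationSeconds * PRICE_PER_GB_SECOND;
        return requestCost + computeCost;
    }

    public static void main(String[] args) {
        // "login": 1 million calls/day, assumed 512 MB and ~200 ms per call.
        double login = monthlyCost(1_000_000, 0.5, 0.2);
        // "password recovery": 10,000 calls/day with the same assumed profile.
        double recovery = monthlyCost(10_000, 0.5, 0.2);

        System.out.printf("login:    ~%.2f USD/month%n", login);    // ~56 USD
        System.out.printf("recovery: ~%.2f USD/month%n", recovery); // ~0.56 USD
        // The rarely-called function costs almost nothing, whereas a container
        // hosting both would be billed 24/7 regardless of traffic.
    }
}
```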
Common misconceptions and Lambda challenges
There are some common beliefs about Lambda functions that aren't true, but also some legitimate ones. Let's explore them:
- Lambda functions mean vendor lock-in - FALSE - Technologies like OpenFaaS or Knative let you run FaaS on-premise. You will have to handle the drawbacks of not being in a public cloud (capacity planning, etc.), but you will perfectly well be able to run your application.
- Big software/platforms are hard to maintain when there are hundreds of functions - TRUE - We often need to group functions into a logical bounded context for maintenance and evolution. There is no such tool or view... yet...
- Lambda can lead to poor performance because of cold starts - TRUE/FALSE - It depends on how you build the function. If your function pulls in large libraries with slow context loading, it can lead to performance issues. So beware: a function must be stupid simple (see the sketch just below).
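One pattern that helped us, shown here as a simplified sketch with an assumed DynamoDB-backed function (the "orders" table and GetOrderHandler name are made up): do the heavy lifting, such as creating SDK clients, once per execution environment, outside the handler method, and keep the handler itself tiny.

```java
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import software.amazon.awssdk.services.dynamodb.DynamoDbClient;
import software.amazon.awssdk.services.dynamodb.model.AttributeValue;
import software.amazon.awssdk.services.dynamodb.model.GetItemRequest;

import java.util.Map;

public class GetOrderHandler implements RequestHandler<Map<String, String>, String> {

    // Created during the cold start only, then reused by every warm invocation
    // of the same execution environment.
    private static final DynamoDbClient DYNAMO = DynamoDbClient.create();
    private static final String TABLE = "orders"; // assumed table name

    @Override
    public String handleRequest(Map<String, String> request, Context context) {
        // The handler itself stays stupid simple: one lookup, no framework bootstrap.
        GetItemRequest query = GetItemRequest.builder()
                .tableName(TABLE)
                .key(Map.of("orderId", AttributeValue.builder().s(request.get("orderId")).build()))
                .build();

        return DYNAMO.getItem(query).item().toString();
    }
}
```

If cold starts still hurt, AWS also offers options like provisioned concurrency (and SnapStart for Java), but the leaner the function, the less you need them.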





