This story is about how to make your service reliable and cheap. It won't be a simple thing, but trust me, it's worth your effort.
What does profiling mean?
Well, it depends on the context, but for me, it's the process that lets you deeply understand how the service works under the hood, how it behaves when put under pressure, and what happens when the pressure is relaxed.
So I hope that the next question just popped up…
Why do we profile?
To make our service cost-optimal, resilient and to gain knowledge about our solution.
- What resources do we need to run our code?
- How long does our Lambda run, and how much memory does it need?
- Are we making unnecessary or expensive requests?
- How much does our infrastructure cost per month?
- How much does one request cost?
- How does our service behave under load?
- Are there any memory leaks?
- Is the service scaling up and down?
- How quickly does it scale?
- Does the container work under load?
- Do we protect our legacy systems?
- How heavy a load do we put on the database?
Design your service like you're on call: you don't want your phone ringing at 2:30 am. You just don't want that.
When should I profile my service?
It's a good question, and the answer is quite easy: ASAP! Yes, As Soon As Possible. In my opinion, you should start thinking about profiling the moment you start designing your system. This is the time when you will (or at least should) get or ask for the non-functional requirements and the business problem you're going to solve. At this point, some concepts will start to grow in your head, and you should start asking yourself:
- Will it cope with the load?
- What will be the cost of running your system?
- How will it work and communicate?
Here we come to the three pillars: the fundamentals for the code you'll write very shortly.
The sequence diagram has a lot of benefits. It can surface the first issues with your solution (more about this later), and it will also make communication within the team much easier later on.
The sequence diagram should also expose expensive requests, for example to external APIs such as Google Maps.
Not everything is visible on the sequence diagram, so it also makes sense to draw the architecture of your infrastructure. This should surface additional components that will have an impact on the final cost of your solution.
How much will your service cost when it's not under load? What cost should you expect when the load increases? It's worth knowing upfront. Maybe the solution you have in mind doesn't make sense from a financial perspective, or maybe you can look at cost optimization earlier.
The question is how small, not how big, your infrastructure can be.
Let's try to work out something for a simple business case. We have a legacy system which stores information from water meters. Currently, the system is used only by the internal team, but they have a great idea: expose it and allow customers to post their data on their own. Such a great idea! It's 2020; people can do that themselves, and the team can focus on more important things. Let's get to work!
Looks simple, but let’s think about consequences:
- A customer has to wait for the request to be processed by the current system. This can result in a bad experience and long response times.
- There's no way to control the pressure on the current system. The more customers try to store their readings, the more requests hit the API, and the more requests hit the current system and its database.
- Points of failure with an impact on the client: DB, Current System, API, Frontend applications.
Let’s fix these issues and make our solution asynchronous.
Looks better. What improvements can we see here?
- Response time: the service doesn't depend on the current "legacy" system. The customer gets a response just after the message is published to the queue.
- If the Current System is down, we have a retry mechanism out of the box at the infrastructure level. The Lambda fails to store the message, so it exits with a non-zero status, the message goes back to the queue, and the Lambda retries it. One problem fewer to solve in the code! Even better, we can add a Dead Letter Queue and nice notifications.
- We have a funnel which gives us a way to control the pressure on our legacy system. By setting, for example, Lambda concurrency, we can allow more or fewer requests per second to our system.
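As a minimal sketch of that Lambda consumer (the function and field names are assumptions, and store_reading stands in for the real call to the legacy system):

```python
import json

def store_reading(reading: dict) -> None:
    """Placeholder for the call to the legacy system (hypothetical).

    A real implementation would POST the reading to the legacy API.
    """
    if reading.get("value") is None:
        raise ValueError("missing reading value")

def handler(event, context=None):
    # SQS delivers a batch of records; processing them one by one keeps
    # the retry semantics simple for this sketch.
    for record in event["Records"]:
        reading = json.loads(record["body"])
        # Any unhandled exception makes the invocation fail, so the
        # message returns to the queue and is retried; after the queue's
        # maxReceiveCount is exceeded, SQS moves it to the Dead Letter
        # Queue (if a redrive policy is configured).
        store_reading(reading)
    return {"processed": len(event["Records"])}
```

The key design point is doing nothing clever on failure: letting the invocation fail is exactly what hands retries and dead-lettering over to the infrastructure.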
Can we improve something here? Of course! In this solution our clients are blind: the solution is eventually consistent, and they have no idea what's going on with their request. Is it stored? Was there any issue? Let's try to solve this problem.
Here we go 😄
We've added additional persistence for the API (ReadingsDB), which stores information about the meter reading request. So when the user makes request 002, such a document can be stored in ReadingsDB:
This allows the Frontend application to present the current status of the request to the client; that status can be changed by the Lambda using request 015. So after a while, the document can look like this:
"reason": "Reading lower than the previous."
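As a rough sketch of what those two ReadingsDB documents might contain (every field name except reason is an assumption, shown here as Python dicts):

```python
# Hypothetical document stored by the API when the reading request
# arrives. All field names except "reason" are illustrative guesses.
pending_reading = {
    "requestId": "req-123",   # made-up identifier
    "meterId": "meter-42",
    "value": 1234,
    "status": "PENDING",
}

# The same document after the Lambda has updated it,
# in this case rejecting the reading:
rejected_reading = {
    **pending_reading,
    "status": "REJECTED",
    "reason": "Reading lower than the previous.",
}
```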
And of course, the result can be presented to the Customer. Looks awesome, doesn't it?
Benefits of the final approach
- Full control of the pressure on the Current System.
- A retry mechanism out of the box.
- Information about the status of a request available to the Customer at any time.
And finally, take a look at the sequence diagram and the number next to each request. It makes communication inside the team so much easier: in Jira tickets, bugs, and issues, team members can refer directly to the request numbers.
So we have a flow in our application we are more or less happy with. The first pillar is ready. Let's take a look at the next one.
The price of reliability is the pursuit of the utmost simplicity.
C.A.R. Hoare, Turing Award lecture
Architecture Diagram / Costs
Based on the sequence diagram we can quickly prepare our architecture diagram.
Based on this diagram, we can see that there is an additional component, and finally, for pricing, we need to take into consideration:
- ALB pricing
- ECS / Fargate task pricing
- Database (ReadingsDB) — MongoDB.Atlas pricing
- API Gateway pricing
- SQS pricing
- Lambda pricing
Having these components, we also need to make some assumptions: the number of Fargate tasks that will be constantly running (minimum 2), vCPU/memory for the tasks, the number of LCUs for the ALB, memory and duration for the Lambda, and so on. There will also be costs related to CloudWatch, ECR, etc. However, once you have prepared the spreadsheet, you can see nicely what the monthly cost of your infrastructure is and how it changes with the number of requests per month. It's really worth doing!
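A spreadsheet works well, but the same model can be sketched in a few lines of code. All rates below are illustrative placeholders, not real AWS or MongoDB Atlas prices; always check the current pricing pages:

```python
def monthly_cost(requests_per_month: int,
                 fargate_tasks: int = 2,
                 lambda_gb_seconds_per_req: float = 0.25) -> float:
    """Rough monthly cost model for the example architecture.

    Every rate here is a made-up placeholder for illustration only.
    """
    FARGATE_TASK_MONTH = 30.0       # per always-on Fargate task
    ALB_MONTH = 20.0                # ALB base charge plus an LCU estimate
    MONGO_ATLAS_MONTH = 60.0        # ReadingsDB cluster
    APIGW_PER_MILLION = 3.5         # API Gateway requests
    SQS_PER_MILLION = 0.4           # SQS requests
    LAMBDA_PER_GB_SECOND = 0.0000167

    millions = requests_per_month / 1_000_000
    return (fargate_tasks * FARGATE_TASK_MONTH
            + ALB_MONTH
            + MONGO_ATLAS_MONTH
            + millions * (APIGW_PER_MILLION + SQS_PER_MILLION)
            + requests_per_month * lambda_gb_seconds_per_req
              * LAMBDA_PER_GB_SECOND)
```

Plotting this function against requests per month shows the part of the bill that is fixed (tasks, ALB, database) versus the part that grows with traffic.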
Be aware of the pricing of each component you are using and each request you are making. Check everything.
We have our pillars, so we can start writing the code! But remember: profiling is a continuous process. For each ticket, you should consider whether it has an impact on the workflow and may require profiling.
Wait, what? Do I need to profile tickets?
Yes! When you want to put your shiny container into the production environment, you need to be sure how it behaves. In the staging environment, you need to perform some tests to verify that the service scales up and down. Start with 1 running task and begin adding pressure (increasing the number of requests). You should identify the points at which new tasks are introduced and how long it takes to scale up. After releasing the pressure, you should see scaling down take place.
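A staged load test like that can be sketched with the standard library alone (the stage values and the fire callable are up to you; nothing here is tied to a specific tool):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def ramp(fire, stages, workers=32):
    """Drive a staged load test.

    `fire` is any zero-argument callable that performs one request
    against your staging endpoint; `stages` is a list of
    (requests_per_second, duration_seconds) tuples. Ramp up, hold,
    then ramp down to watch the service scale in both directions.
    """
    futures = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for rps, duration in stages:
            deadline = time.monotonic() + duration
            while time.monotonic() < deadline:
                futures.append(pool.submit(fire))
                # Crude pacing; good enough for a sketch.
                time.sleep(1.0 / rps)
    return [f.result() for f in futures]
```

For example, `ramp(lambda: urllib.request.urlopen(staging_url).status, [(1, 60), (20, 300), (1, 60)])` (with a `staging_url` of your own) ramps traffic up and back down while you watch the task count and scale-up latency.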
During this test, you should also observe the logs to identify any exceptions, suspicious log messages, lost requests or strange behaviours. The better you test it now, the fewer problems you will have in the future.
You can also take a look at the memory metrics. Are there any suspicious patterns? Is your container leaking?
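On the Lambda side, one way to watch memory over time is a CloudWatch Logs Insights query over the function's log group; the @type and @maxMemoryUsed fields come from the REPORT line Lambda writes after each invocation (the 5-minute bin is an arbitrary choice):

```
filter @type = "REPORT"
| stats max(@maxMemoryUsed / 1000 / 1000) as maxMemoryUsedMB,
        avg(@maxMemoryUsed / 1000 / 1000) as avgMemoryUsedMB
  by bin(5m)
```

A steadily climbing maximum across bins under constant load is the pattern to worry about.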
And most importantly, during these tests, monitor the legacy systems. You have ways to control the pressure on them.
We have to protect our legacy systems.
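One concrete knob, mentioned earlier, is Lambda reserved concurrency. Shown here with the AWS CLI (the function name is hypothetical), it caps how many messages are processed in parallel and therefore how hard the legacy system is hit:

```shell
# Cap the consumer Lambda at 5 concurrent executions, so at most
# 5 requests reach the legacy system in parallel.
aws lambda put-function-concurrency \
  --function-name store-reading \
  --reserved-concurrent-executions 5
```

Raising or lowering this one number during the load test is a quick way to find the pressure the legacy system can actually sustain.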
What can you use?
What to read?
- AWS Well-Architected
- Serverless Lens
- Understanding AWS Lambda behaviour using Amazon CloudWatch Logs Insights
Teams sometimes find this process difficult, expensive, and effort-heavy. But once the team gains some experience, it should go smoothly and have very little or even no impact on velocity. In return, we have a big chance of detecting issues at a very early stage of the development cycle, when they are much cheaper to fix.