Kubernetes APIs Graceful Shutdown

Patrick Almeida
6 min read · Feb 26, 2021

Why not talk a little bit more about DevOps?

In this article we are going to bring together a bit of knowledge about modern infrastructure and APIs: how to start and stop your applications gracefully.

Many companies nowadays run their applications as a big and complex distributed system within Kubernetes, and for sure, Kubernetes is so impressive when we see how beautifully it handles the orchestration of several services simultaneously.

It’s very easy to deploy your applications as a Kubernetes service or deployment and it’s suddenly working and responding faster than a Ferrari. But what about firing up a new version of your application? With a minimum and simple configuration, Kubernetes will first bring up a new version pod, test its liveness and readiness, and then start routing traffic to that new pod so the old one can be killed.

But what about the old pod? Notice that we explicitly said it would be killed, and that is exactly what happens if you don’t handle the graceful shutdown of that pod when it receives Kubernetes’ termination signal. When the pod is killed, a request that is being processed at that moment can be lost.

As we know, there are many ways to mitigate that: your application is surely stateless, you may have an event-based route or application, or some other workaround. But if your system is already in production, requests will be lost at deployment time if you don’t handle your application’s graceful shutdown, especially if your application does slow, heavy work.

Let’s stop winding up and start explaining technically how to deal with it.

When Kubernetes needs to terminate a pod, it sends a SIGTERM signal to the pod’s containers. There are several ways to handle it; in this article I’m going to explain how I handled it in a lab.
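
To make the idea concrete, here is a minimal sketch of what handling SIGTERM can look like in plain Python. It is not what Gunicorn does internally and it is not part of the lab; the names `state`, `handle_sigterm` and `process_jobs` are made up for illustration. The handler only records that a shutdown was requested, so the loop can finish the job in hand and stop before picking up the next one.

```python
import signal

# Hypothetical shared flag; flipped by the signal handler.
state = {"shutdown_requested": False}


def handle_sigterm(signum, frame):
    """Remember the request instead of dying in the middle of a job."""
    state["shutdown_requested"] = True


def process_jobs(jobs):
    """Run jobs one by one; stop before the next job once shutdown is requested."""
    finished = []
    for job in jobs:
        if state["shutdown_requested"]:
            break
        finished.append(job.upper())  # stand-in for real work
    return finished


if __name__ == "__main__":
    # Register the handler; from now on SIGTERM no longer kills the process outright.
    signal.signal(signal.SIGTERM, handle_sigterm)
    print(process_jobs(["first", "second"]))
```

A production web server does far more than this (it stops accepting connections, drains workers, enforces a timeout), which is why the rest of the article leans on one instead of hand-rolling the logic.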

Now let’s create our test environment. To reproduce it, create all the files in the same directory, whichever you choose. For the initial test I used a slightly more complex API that you can find on my GitHub, but for this example we are going to keep it simple. This is the API we are using, with a health check and a single POST route, just enough to put our lab into practice:

## filename
api.py

from flask import Flask
from flask import jsonify
from flask import request

app_name = 'comments-api'
app = Flask(app_name)
app.debug = True
comments = {}


@app.route("/healthcheck")
def healthcheck():
    return jsonify({"pong": True}), 200


@app.route('/api/comment/new', methods=['POST'])
def api_comment_new():
    request_data = request.get_json()
    email = request_data['email']
    comment = request_data['comment']
    content_id = '{}'.format(request_data['content_id'])
    new_comment = {
        'email': email,
        'comment': comment,
    }
    if content_id in comments:
        comments[content_id].append(new_comment)
    else:
        comments[content_id] = [new_comment]
    message = 'comment created and associated with content_id {}'.format(content_id)
    response = {
        'status': 'SUCCESS',
        'message': message,
    }
    return jsonify(response), 201

And as we are talking about Kubernetes, this is the Dockerfile to build the image:

## filename
Dockerfile

FROM python:3.8-slim-buster
ADD api.py requirements.txt /app/
WORKDIR /app
RUN pip install -r requirements.txt
EXPOSE 8000
# STOPSIGNAL is optional: SIGTERM is already Docker's default stop signal
STOPSIGNAL SIGTERM
ENTRYPOINT [ "gunicorn", "--log-level", "debug", "--graceful-timeout", "30", "--bind", "0.0.0.0:8000", "api:app" ]

These are our API’s requirements:

## filename
requirements.txt

click==7.1.2
Flask==1.1.2
gunicorn==20.0.4
itsdangerous==1.1.0
Jinja2==2.11.2
MarkupSafe==1.1.1
Werkzeug==1.0.1

And to make our lives easier, let’s create a docker-compose file too:

## filename
docker-compose.yml

version: '3'
services:
  sample-app:
    container_name: sample-app
    build:
      context: .
      dockerfile: Dockerfile
    image: sample-app:latest
    ports:
      - 8000:8000
    restart: always

Now, with every file created, let’s deploy it locally so we can finally test it. Open up a terminal and run these commands:

docker-compose build
docker-compose up

OK, with everything up and running, we can send some requests in the simplest way possible. Create a new Python script (it needs the requests library installed on your machine):

## filename
post-test.py

import requests
import time

for i in range(100000):
    resp = requests.post(
        'http://localhost:8000/api/comment/new',
        json={"email": "email@email.com", "comment": f"Post number {i}", "content_id": 1}
    )
    time.sleep(0.5)

Open up another terminal window and run the script:

python post-test.py

We are simulating a Kubernetes environment with docker-compose, and soon we are going to simulate its SIGTERM signal, but for now we are just going to use: docker kill sample-app.
A plain docker kill does not send SIGTERM. It sends SIGKILL by default, which terminates the container abruptly without letting Gunicorn’s workers finish first. Open another terminal and run the command below while the container is running and the requests are being made:

docker kill sample-app

And this is the output log:

sample-app | [2021-02-16 00:16:40 +0000] [1] [INFO] Starting gunicorn 20.0.4
sample-app | [2021-02-16 00:16:40 +0000] [1] [DEBUG] Arbiter booted
sample-app | [2021-02-16 00:16:40 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
sample-app | [2021-02-16 00:16:40 +0000] [1] [INFO] Using worker: sync
sample-app | [2021-02-16 00:16:40 +0000] [8] [INFO] Booting worker with pid: 8
sample-app | [2021-02-16 00:16:40 +0000] [1] [DEBUG] 1 workers
sample-app | [2021-02-16 00:17:17 +0000] [8] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:17:18 +0000] [8] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:17:18 +0000] [8] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:17:19 +0000] [8] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:17:19 +0000] [8] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:17:20 +0000] [8] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:17:20 +0000] [8] [DEBUG] POST /api/comment/new
sample-app exited with code 137

In this log we can see that the container exited with code 137, which is 128 + 9 and means the process was terminated by SIGKILL (it is the same code you see when the OOM killer ends a container, because it also uses SIGKILL). There is no message showing that Gunicorn’s workers exited gracefully by finishing their job before dying. This is the real problem: in a production environment, with a more complex application and longer requests, some requests would surely be dropped in the middle of their work. By losing these requests you could lose a registration if the route’s job is to register a user, you could lose a comment on your social network if that is its purpose, and you could even lose a money transaction if we are talking about a banking application.
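
As a quick illustration of how to read such exit codes, the sketch below applies the usual 128 + signal-number convention. The helper name `describe_exit_code` is hypothetical, and the signal names come from the platform's own table, so this assumes a Linux or other POSIX machine.

```python
import signal


def describe_exit_code(code):
    """Translate a container exit code using the 128 + signal-number convention."""
    if code == 0:
        return "exited normally"
    if code > 128:
        try:
            return f"killed by {signal.Signals(code - 128).name}"
        except ValueError:
            # The number does not correspond to a known signal on this platform.
            return f"killed by unknown signal {code - 128}"
    return f"exited with error {code}"


print(describe_exit_code(137))
print(describe_exit_code(0))
```

On Linux, signal number 9 is SIGKILL, so 137 is reported as a kill by SIGKILL rather than as an application error.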

Now, if we work with the SIGTERM signal, which is the default signal Kubernetes uses to terminate pods, you will see that Gunicorn handles it properly and stops its workers before letting the container die. By properly I mean that every worker gets to finish its job within the 30-second timeout defined by --graceful-timeout in the Dockerfile entrypoint.
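
The idea behind a graceful timeout can be sketched in a few lines: wait for in-flight work up to a deadline, and only then give up. This is an illustration of the concept, not Gunicorn's implementation; `start_worker` and `stop_gracefully` are hypothetical names, and a sleeping thread stands in for a request being handled.

```python
import threading
import time


def start_worker(job_seconds):
    """Start a daemon thread that pretends to handle one request."""
    worker = threading.Thread(target=time.sleep, args=(job_seconds,), daemon=True)
    worker.start()
    return worker


def stop_gracefully(worker, graceful_timeout):
    """Wait up to graceful_timeout seconds; return True if the worker finished in time."""
    worker.join(graceful_timeout)
    return not worker.is_alive()


if __name__ == "__main__":
    # A short job fits inside the timeout, a long one does not.
    print(stop_gracefully(start_worker(0.2), graceful_timeout=5))
    print(stop_gracefully(start_worker(5), graceful_timeout=0.2))
```

The second case is the one to keep in mind: a request that takes longer than the timeout is still cut off, so the timeout should be chosen with your slowest routes in mind.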

To simulate the SIGTERM signal sent by Kubernetes, we use the same docker kill command with an extra parameter, --signal="SIGTERM". Open another terminal window and run the command below while the container is running and the requests are being made:

docker kill --signal="SIGTERM" sample-app

Here is the output:

sample-app | [2021-02-16 00:39:02 +0000] [1] [INFO] Starting gunicorn 20.0.4
sample-app | [2021-02-16 00:39:02 +0000] [1] [DEBUG] Arbiter booted
sample-app | [2021-02-16 00:39:02 +0000] [1] [INFO] Listening at: http://0.0.0.0:8000 (1)
sample-app | [2021-02-16 00:39:02 +0000] [1] [INFO] Using worker: sync
sample-app | [2021-02-16 00:39:02 +0000] [9] [INFO] Booting worker with pid: 9
sample-app | [2021-02-16 00:39:02 +0000] [1] [DEBUG] 1 workers
sample-app | [2021-02-16 00:39:08 +0000] [9] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:39:09 +0000] [9] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:39:09 +0000] [9] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:39:10 +0000] [9] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:39:10 +0000] [9] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:39:11 +0000] [9] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:39:11 +0000] [9] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:39:12 +0000] [9] [DEBUG] POST /api/comment/new
sample-app | [2021-02-16 00:39:12 +0000] [1] [INFO] Handling signal: term
sample-app | [2021-02-16 00:39:12 +0000] [9] [INFO] Worker exiting (pid: 9)
sample-app | [2021-02-16 00:39:12 +0000] [1] [INFO] Shutting down: Master
sample-app exited with code 0

Now you can see that Gunicorn’s worker finished its work and exited before the container shut down with exit code 0. There it is: the graceful shutdown of our application, with the assurance that no requests were lost, at least within the configured graceful timeout.
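
The contrast between the two experiments can also be reproduced without Docker. The sketch below is a rough stand-in under a few assumptions: a POSIX system, a tiny child process (the `CHILD` script, invented for this example) that installs a SIGTERM handler, and a hypothetical helper `run_and_signal` that sends one signal and returns the child's exit status.

```python
import signal
import subprocess
import sys

# Hypothetical child program: handles SIGTERM by exiting cleanly.
CHILD = r"""
import signal, sys, time

def on_term(signum, frame):
    sys.exit(0)  # clean exit, like a server that shut down gracefully

signal.signal(signal.SIGTERM, on_term)
print("ready", flush=True)
while True:
    time.sleep(0.1)
"""


def run_and_signal(sig):
    """Start the child, wait until its handler is installed, send sig, return its status."""
    with subprocess.Popen(
        [sys.executable, "-c", CHILD], stdout=subprocess.PIPE, text=True
    ) as proc:
        proc.stdout.readline()  # blocks until the child prints "ready"
        proc.send_signal(sig)
        return proc.wait(timeout=10)


if __name__ == "__main__":
    print("SIGTERM ->", run_and_signal(signal.SIGTERM))
    print("SIGKILL ->", run_and_signal(signal.SIGKILL))
```

With SIGTERM the child gets to run its handler and exits with status 0. With SIGKILL the handler never runs, and Python's subprocess module reports the status as the negative signal number; a container runtime reports that same situation as 128 plus the signal number.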

For this article I worked with Gunicorn, a production-ready WSGI HTTP server for Python. You can take this example as a much simpler solution than working around all of this inside the application or somewhere else.
I’m sure that most production-ready web servers can handle SIGTERM properly with little or no configuration.

This article is surely much more superficial than real life, but its purpose is just to introduce an approach that brings Dev and Ops together in one solution.

I hope this article helped you in some way.

Contact me through my GitHub if you have any questions about this article or if you just want to chat. =)
