Docker + Kubernetes + Helm: A comprehensive step-by-step using Java

Ignacio Cicero
12 min read · Mar 16, 2021

A broad topic this time. In this article I will present a guide about how you can run your application inside Docker and, from there, take it to Kubernetes with Helm. I will try to explain the basic concepts to ease the understanding of what is going on, but I will not go in too deep to avoid information overload.

This guide is driven by a sample project that I have built, so I will refer to it in many places. You will find it on my GitHub repo. I will avoid listing every command and focus on the process itself, but everything you need is in the repo.

In this article I will talk you through the application, Docker, Kubernetes and Helm.

The application

I wrote a sample application with two REST endpoints:

  • One will increase and return the value of a counter.
  • One will just read the counter.

The counter is stored in a MongoDB database. I reckon it is not the right tool for this use case (Redis would have been more appropriate), but I wanted a database that stores its data in the filesystem to showcase Kubernetes Persistent Volume Claims (PVC).

If you are not using Java, skip to the next section.

I wrote the application in Java as it is the language I use on a daily basis. For the sake of variety, I avoided Spring + Maven and went for Gradle + Javalin. To create the Docker image in the build, I used the Palantir Docker plugin for Gradle.

If you are looking for Spring + Maven, head to this repo where I have used them. All you care about is the Dockerfile and the Maven plugin for Docker. There are a few Maven plugins for Docker; in that case I used the one from Spotify, but you could also use the one from fabric8.

You can find the complete source code for this app together with some low level details and instructions about running it locally here.
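
For orientation, here is a minimal sketch of the two endpoints described above. It is not the code from the repo: it assumes the JDK's built-in HTTP server instead of Javalin, an in-memory counter instead of MongoDB, and the class name and the /increment and /read paths are made up for illustration.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicLong;

public class CounterSketch {

    // In-memory stand-in for the MongoDB-backed counter (assumption for this sketch).
    private final AtomicLong counter = new AtomicLong();

    // First endpoint: increase the counter and return the new value.
    public long increment() {
        return counter.incrementAndGet();
    }

    // Second endpoint: read the counter without changing it.
    public long read() {
        return counter.get();
    }

    // Wires both operations to HTTP paths; the paths are hypothetical.
    public HttpServer start(int port) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        server.createContext("/increment", exchange -> respond(exchange, increment()));
        server.createContext("/read", exchange -> respond(exchange, read()));
        server.start();
        return server;
    }

    // Writes the value as a plain-text 200 response.
    private static void respond(HttpExchange exchange, long value) throws IOException {
        byte[] body = Long.toString(value).getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(200, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    public static void main(String[] args) {
        // Exercises the logic directly so the example exits immediately;
        // call start(8080) instead to actually serve HTTP requests.
        CounterSketch app = new CounterSketch();
        System.out.println(app.increment()); // 1
        System.out.println(app.increment()); // 2
        System.out.println(app.read());      // 2
    }
}
```

Because the counter lives in memory here, it resets on every restart, which is exactly the problem the database and the PVC solve in the real application.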

Docker

Docker is a tool for running applications in containers. But what is a container? The first thing I want to clarify: a container is not a virtual machine. A virtual machine is a fully-fledged operating system that you can run in VirtualBox or VMware. Inside it you can install a UI and many programs as if it were your real PC, obviously consuming tons of resources.

A container, instead, is a virtualized environment with the bare minimum needed to run one and only one application; it serves a single purpose. Containers are small compared to a virtual machine, can be created and deleted quickly, and you can start many of them at the same time with different configurations.

Take MongoDB for example: you can quickly pull a MongoDB image from Docker Hub, run it and that is it. You have Mongo running on your machine without directly installing or configuring anything. The image was built specifically for Mongo, with all the basic OS modules and the proper configuration to make sure it runs out of the box.

The Dockerfile

This file is the recipe of your image. It can define many things, but the basics are:

  • The base image.
  • Arguments.
  • The exposed ports.
  • Files to be added.
  • The command to run.

Without going into too much detail, I will just say that images are built in layers and that you can create your image starting from an existing one. Take my application for example: in its Dockerfile you will see the line FROM openjdk:11. That line means: “Begin with an image that can run OpenJDK Java version 11”. If you are not using Java, you will need to use something else. For example, FROM node:14 for Node.js or FROM python:3.8-slim-buster for Python.

In the case of the sample app:

FROM openjdk:11

EXPOSE 8080

ARG JAR_FILE

COPY ${JAR_FILE} docker-k8s-helm-demo.jar

ENTRYPOINT ["java","-jar","docker-k8s-helm-demo.jar"]

Line by line:

  • Start with a base image of OpenJDK 11.
  • Expose port 8080 (this documents the port the application listens on).
  • Define a build argument for the path of the Java executable (the jar file). During the build, the executable is created with the version attached to its name; if you hardcoded the version in the Dockerfile, it would stop working on the next release. The argument keeps the Dockerfile agnostic of your build process.
  • Copy the file passed as the argument into the image with the name docker-k8s-helm-demo.jar.
  • Define an entrypoint for the application: what to run when the container starts.

Once you have that, you need to build the image. In my case the image is built by the Docker Gradle plugin, but you can build it yourself with a docker command. First make sure you have created the executable for your app, then run:

docker build --build-arg JAR_FILE=build/libs/docker-k8s-helm-demo-1.0-SNAPSHOT.jar -t my_tag .

from the same directory where you have your Dockerfile. Notice the . at the end: it sets the build context to the current directory, which is also where Docker looks for the Dockerfile by default. The -t flag tags the image with a name.

If all goes OK, you should be able to run docker images and see something like this (the tag is latest because the -t value did not specify one):

REPOSITORY   TAG      IMAGE ID
my_tag       latest   8a6b3cd45513

Then you can run it like this:

docker run --name my_app -p 8080:8080 my_tag

The -p 8080:8080 flag tells Docker to forward traffic from port 8080 of your machine to port 8080 of the container. Note that EXPOSE in the Dockerfile only documents the port; the -p flag is what actually publishes it.

Docker-compose

The application from the previous section also needs MongoDB to function properly. And if it should ever need another piece of infrastructure, like a queue broker or ZooKeeper, that is something else you will need to run.

As you can see, the more moving parts you add to your ecosystem, the more complicated it gets to spin up an environment. For that you can use docker-compose, which lets you define in one file which containers you want to run, along with their configuration, networking and the dependencies between them. You simply create a file called docker-compose.yaml (see an example in my repo) and run docker-compose up.

Docker-compose is not really used in production, for a few reasons I will cover in the next section. However, it is extremely useful for integration testing on your local machine. Say you need to start a web UI, two instances of a backend application, a load balancer, a database, a queue broker and an extra application that acts as a “simulator” for an external system (a piece of software that feeds external messages to the queue). You could start everything with docker-compose and run your tests locally with any tool of your choice, like Selenium or JMeter, or even with a testing application executed outside Docker (manually or from the IDE) that runs a set of BDD tests with Cucumber or some other tool.

Kubernetes

Now that we have docker-compose, we can run our entire environment with one command. How cool is that? But what if you need to scale up or down? What if you want failed instances to restart automatically? What if you need more advanced request routing? What if you want load balancing? What if your entire environment needs to run more than one application? Kubernetes comes to answer all of these questions.

You could think of Kubernetes as an “orchestrator of containers”. It allows you to define the different pieces of your infrastructure with a few descriptor files. The Kubernetes cluster then handles the networking, the scaling and the automatic restart of apps by itself.

Kubernetes basics

Kubernetes is divided into two parts: a client and a server. The server is the one in charge of performing the operations on the cluster. For this there are many implementations, such as minikube (for local environments), EKS (Amazon Web Services) or GKE (Google Cloud). The client, called kubectl, is the application that sends commands for the server to run. For example, from your machine you could run kubectl apply -f namespace.yaml to instruct the server to apply whatever object definition the file contains.

Unlike Docker, Kubernetes relies on many files (you could define everything in one, but it would be harder to maintain). Ideally, you use one file per object. There are lots of objects you can define in Kubernetes, the pod being the smallest unit of all. A pod can contain several containers, but bare pods are rarely used in production as you cannot define a desired state for them. For this example I will focus on the following objects:

  • Deployment: As stated before, you cannot define a desired state for a pod. The deployment allows you to specify how many replicas of a pod you want, makes sure all of them are configured the same and recreates them automatically in case of failure.
  • ClusterIP: A type of Service that makes the pods reachable inside the Kubernetes cluster in case they need to communicate with other pods.
  • Ingress: It represents an entry point to the cluster from the outside and routes requests, much like NGINX does. For example: if the request path starts with “/api”, send it to one ClusterIP; if it starts with “/static”, send it to another. We will get back to this later.
  • Persistent Volume Claims (PVC): As its name states, a PVC is used to persist data. But what does that mean? It requests disk space outside the pods, where data can be stored without being lost when a pod restarts. If you do not use it, the data in your database will be lost if the pod dies for some reason.
  • Secrets: They are used to store sensitive data such as passwords inside the cluster. The idea is that some administrator stores the passwords in the cluster as secrets and the pods then read from them. This means your passwords will not be present in your configuration files. Kubernetes takes care of injecting the secrets into the pods that declare they need them. Keep in mind that, by default, secrets are only base64-encoded; encryption at rest has to be enabled in the cluster.
  • Namespace: This is just a formality for the most part. In a large cluster the number of objects can be really big, and chances are you only want to focus on those belonging to your app. With a namespace, you can limit the results of the Kubernetes commands you run to just the objects belonging to it.
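
The “desired state” idea behind the Deployment can be pictured as a small control loop: compare what should be running with what is running, then act on the difference. The sketch below is only a toy model of that idea, not how Kubernetes is implemented; the class, the method and the pod names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class ReconcileSketch {

    // Used only to give every simulated pod a unique name.
    private static int sequence = 0;

    // Brings the list of running pods in line with the desired replica count
    // and returns a log of the actions taken (for illustration only).
    public static List<String> reconcile(int desiredReplicas, List<String> runningPods) {
        List<String> actions = new ArrayList<>();
        // Too few pods: start new ones until the desired count is reached.
        while (runningPods.size() < desiredReplicas) {
            String name = "pod-" + sequence++;
            runningPods.add(name);
            actions.add("start " + name);
        }
        // Too many pods: stop the surplus, newest first.
        while (runningPods.size() > desiredReplicas) {
            String name = runningPods.remove(runningPods.size() - 1);
            actions.add("stop " + name);
        }
        return actions;
    }

    public static void main(String[] args) {
        List<String> pods = new ArrayList<>();
        System.out.println(reconcile(3, pods)); // first rollout: three pods are started
        pods.remove(1);                         // simulate one pod crashing
        System.out.println(reconcile(3, pods)); // a single replacement is started
        System.out.println(reconcile(3, pods)); // nothing to do: actual matches desired
        System.out.println(reconcile(1, pods)); // scaling down stops two pods
    }
}
```

The point of the model is that you never say “start a pod” yourself. You only change the desired number and the loop works out the actions.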

Another important concept in Kubernetes is labels. Labels do not have any direct influence on the definition of objects. On one hand, they are used to highlight some relevant information about the object to the user. However, they have another powerful use: they make some objects reusable. For example, in the app you define a ClusterIP with the selector component: app. This means that the ClusterIP applies to any pod that is tagged with a label called “component” with the value “app”.
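
The matching rule behind a selector like component: app is simple enough to sketch: a selector matches an object when every key/value pair it asks for is present in the object's labels. The code below is an illustration of that rule only; the class name and the “version” and “mongo” labels are made up.

```java
import java.util.Map;

public class LabelSelectorSketch {

    // A selector matches when every key/value pair it asks for is present in the labels.
    // Extra labels on the object do not matter.
    public static boolean matches(Map<String, String> selector, Map<String, String> labels) {
        return selector.entrySet().stream()
                .allMatch(entry -> entry.getValue().equals(labels.get(entry.getKey())));
    }

    public static void main(String[] args) {
        Map<String, String> selector = Map.of("component", "app");

        // Hypothetical pods: one carries the label the selector asks for, one does not.
        Map<String, String> appPod = Map.of("component", "app", "version", "1.0");
        Map<String, String> mongoPod = Map.of("component", "mongo");

        System.out.println(matches(selector, appPod));   // true
        System.out.println(matches(selector, mongoPod)); // false
    }
}
```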

Networking

This is a very complicated topic that requires a thorough read through the documentation. But to make it simpler, see the following diagram of the sample application:

The application has its port 8080 exposed, but the only way to talk to it is through the ClusterIP, which defines a port and forwards it to the container port. In my case I only had one application with one port, so it made sense to use the same number, but nothing prevents you from using a completely different port in the ClusterIP.

As stated, the ClusterIP makes a port visible to other pods in the cluster. That is good enough for the app to connect to Mongo, but we also want the application endpoints to be reachable from outside the cluster. For that we define the Ingress, which acts as a router and gateway to the outside world.
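
To make the “router” role of the Ingress more tangible, here is a rough sketch of path-prefix routing, using the “/api” and “/static” example from earlier. It is a simplified model, not the behaviour of any particular ingress controller; the class and the service names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

public class IngressRoutingSketch {

    // Path prefix -> name of the service that should receive the request.
    private final Map<String, String> rules = new LinkedHashMap<>();

    public IngressRoutingSketch addRule(String pathPrefix, String serviceName) {
        rules.put(pathPrefix, serviceName);
        return this;
    }

    // Returns the service of the longest matching prefix, if any rule matches.
    // A prefix matches whole path segments, so "/api" matches "/api/counter" but not "/apiary".
    public Optional<String> route(String requestPath) {
        String best = null;
        for (String prefix : rules.keySet()) {
            String withSlash = prefix.endsWith("/") ? prefix : prefix + "/";
            boolean matches = requestPath.equals(prefix) || requestPath.startsWith(withSlash);
            if (matches && (best == null || prefix.length() > best.length())) {
                best = prefix;
            }
        }
        return Optional.ofNullable(best).map(rules::get);
    }

    public static void main(String[] args) {
        IngressRoutingSketch ingress = new IngressRoutingSketch()
                .addRule("/api", "app-cluster-ip")        // hypothetical service names
                .addRule("/static", "static-cluster-ip");

        System.out.println(ingress.route("/api/counter").orElse("no match"));     // app-cluster-ip
        System.out.println(ingress.route("/static/logo.png").orElse("no match")); // static-cluster-ip
        System.out.println(ingress.route("/apiary").orElse("no match"));          // no match
    }
}
```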

A useful tool that you can download to visualize what is happening in the cluster is Lens. Alternatively, you can simply run minikube dashboard, which will spin up the built-in dashboard.

In the repository you will find all the necessary Kubernetes commands for all of these topics… but there is one more thing.

Helm

This one is a rather optional part of this tool set but it is really useful. To make it short, it serves two purposes:

  • To automate the deployment of every Kubernetes object. Without it, you have to apply the Kubernetes files you create yourself, one by one. With Helm you create a chart and Helm deploys everything for you.
  • To configure the Kubernetes objects for different environments. Helm treats the Kubernetes files as Go templates (see the Helm docs). You can define values that change per environment (say dev, test, prod), vary the definition of objects per environment (say you want to store a password hardcoded for dev and in a secret for prod) or deploy a service only in dev. There are tons of things you can do with it.

To use it, you just have to define:

  • A Chart.yaml file stating the name of the chart
  • A values.yaml file with values for the variables you use inside your templates
  • A templates folder where you will store your kubernetes files.

An example: the Docker image you created earlier is usually tagged with a version. Every time you release a new version the number changes, and then you have to fish out all the places in the yaml files where that version needs updating. With Helm you can do the following:

In the deployment file, instead of hardcoding the name/tag of your docker image, you do:

containers:
  - name: app
    image: {{ .Values.image.name }}:{{ .Values.image.tag }}

And in your values.yaml:

image:
  name: nacho270/docker-k8s-helm-demo
  tag: 1.0-SNAPSHOT

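Then you simply run: helm install docker-k8s-helm-demo app. Here, docker-k8s-helm-demo is the name given to the release and app is the chart to install (the local chart folder, named like the chart in its Chart.yaml file).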
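
If the placeholder syntax feels like magic, the substitution itself can be approximated in a few lines. The sketch below is a deliberately tiny stand-in, assuming only simple {{ .Values.a.b }} lookups; Helm really uses the Go template engine, which does far more, and the class and method names here are hypothetical.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TemplateSketch {

    // Matches placeholders such as {{ .Values.image.name }} and captures "image.name".
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\{\\{\\s*\\.Values\\.([A-Za-z0-9_.]+)\\s*\\}\\}");

    // Replaces every placeholder with the value found in the nested map.
    public static String render(String template, Map<String, Object> values) {
        Matcher matcher = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (matcher.find()) {
            String value = String.valueOf(lookup(values, matcher.group(1)));
            matcher.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    // Walks the nested maps following a dotted path such as "image.tag".
    private static Object lookup(Map<String, Object> values, String dottedPath) {
        Object current = values;
        for (String key : dottedPath.split("\\.")) {
            current = (current instanceof Map) ? ((Map<?, ?>) current).get(key) : null;
            if (current == null) {
                throw new IllegalArgumentException("No value defined for " + dottedPath);
            }
        }
        return current;
    }

    public static void main(String[] args) {
        // Same structure as the values.yaml shown above.
        Map<String, Object> values = Map.of(
                "image", Map.of("name", "nacho270/docker-k8s-helm-demo", "tag", "1.0-SNAPSHOT"));

        String template = "image: {{ .Values.image.name }}:{{ .Values.image.tag }}";
        System.out.println(render(template, values));
        // prints: image: nacho270/docker-k8s-helm-demo:1.0-SNAPSHOT
    }
}
```

Releasing a new version then means changing one value in one place instead of editing every file that mentions the image.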

Another thing to note is that charts are versioned and Helm keeps a history of the revisions of each release, so it is easy to roll back to a previous version of the Kubernetes objects.

Afterthoughts

This article is by no means a statement of “you must use this, this is the best there is”. Everyone has their own recipe for how to do things: what to do with Docker, how to define Kubernetes objects, whether to use Helm, kustomize or nothing.

These are the tools that I use on a daily basis and I think they make your life easier and less painful. It is true that, as a software engineer, you should not need to touch all of these infrastructure/devops tools, but I think it is super useful to you and to the company that you know how everything is deployed and design your apps accordingly.

The current state of the art in the industry is microservices + cloud. Larger companies can have literally hundreds, or even thousands, of these services. If on top of that you add CI/CD services, databases, queue brokers, monitoring dashboards, logging systems, authentication services… and multiple environments (dev, test, prod) of everything, the size of the infrastructure becomes quite unmanageable. Then comes the issue of scaling the infrastructure up to cope with traffic peaks (and back down afterwards, of course).

Docker with its repeatable containers for the app, Kubernetes with its desired state for deployment and routing and, lastly, Helm with its versioned charts become a real aid when it comes to maintaining the monstrous infrastructures companies have today. Moreover, they speed up the creation of environments, both in production and locally.

Another positive point is that, with some reading obviously, you can take advantage of cloud services like AWS/GCP and easily deploy your app in the cloud without having to buy and configure servers or deal with harsh configurations, connectivity, security and all the issues caused by hardware failure. All of those concerns are taken care of for you by the cloud/Kubernetes interaction. Of course, it is not something you will learn over a weekend but, in my opinion, the learning curve is less steep than learning how to configure every single setting in an OS.

Of course, the usage of these tools is still error prone (i.e., they are not a silver bullet), but I think they are helping software become the end result of a real engineered process. Yes, there will always be the creative process of designing the software and its features; but if you think about it, all software runs on hardware, and there are different environments, configurations for each, security and so forth, regardless of the chosen programming language, how you designed it, or which database or queue broker you use. I mean, apart from the small things you need to tweak in your Dockerfile depending on the technology you want, we haven’t really discussed anything related to a specific programming language.

The end result of all this is that these tools remove a large crafted/bespoke part of the software development process, making it more measurable, predictable and repeatable… a more engineered process.

Ignacio Cicero

I’m a back-end software engineer working in finance. I write about Java and tech that I decide to research.