What is Docker?

Today is your first day at your new job at the hottest startup in your town, DumpsterRental.ninja. Your boss, Scott, has given you your first assignment, which is to get the DumpsterRental.ninja app set up on your laptop.

The rocky road to getting onboarded at DumpsterRental.ninja

Scott told you there's a README file you can use to help you get your development environment set up, and if you have any questions, Justin, the lead developer, will be happy to help you. According to the README, you need to install Ruby 3.3.5, PostgreSQL, Node, and Redis. Once those are installed, you'll need to start the Redis, PostgreSQL, and Ruby on Rails servers. You're familiar with Rails, so you know that your setup will be complete when you're able to pull up the DumpsterRental.ninja home page at localhost:3000.

Four hours later, you're unfortunately still not in business. First you got an error saying that one of the gems failed to install because it needed OpenSSL 1.1.1, but your computer has OpenSSL 3.0.2.

Once you got past the OpenSSL issue, you discovered that DumpsterRental.ninja requires PostgreSQL 14, but you have a personal project which uses PostgreSQL 16. You can't have both versions running at the same time without complex configuration. You spent a while trying to figure out how to get DumpsterRental.ninja running without breaking your personal project, but finally you decided to just uninstall PostgreSQL 16 and deal with your personal project being broken for now.

Now Rails starts successfully, but you get a raw error when you try to load any page because the ImageMagick library isn't installed. DumpsterRental.ninja? More like DumpsterFIRE.ninja, right???

You ask Justin for help, but he's been pulled into an incident. Over the last couple of weeks there have been alerts warning that both production servers are occasionally bumping against their resource limits, but there hasn't been time to do anything about it because a security-related upgrade project took even higher priority. Now the resource limit issue has become so frequent that it can't be put off any longer, and a third (and preferably fourth) production server instance is needed, urgently.

Unfortunately, setting up new production server instances is not so easy. The production servers were provisioned three years ago, and the guy who did it is no longer with the company. Justin has been able to get a new production server almost working, but each time he thinks he has solved the final issue, a new problem comes up. What a nightmare.

Sadly, DumpsterRental.ninja is a bit behind the times when it comes to environment setup. Setting up a new environment, whether it be a development or production environment, doesn't have to be a slow and painful experience. Your development environment could have been set up in about five minutes with just one command. Provisioning two new production servers could have been as simple as changing a 2 to a 4 in a configuration file.

Shortly we'll look at the modern way of setting up development and production environments. But first let's be sure to understand the specific weaknesses of DumpsterRental.ninja's manual setup approach.

The downsides of manual setup

When you set up an environment manually, there's no way to be absolutely sure what the right setup configuration is and what all the environment's dependencies are. A README file, as in the DumpsterRental.ninja example, can help of course, but there's no guarantee that such a file will be in sync with reality.

Manual setup is subject to human error. Perhaps you've experienced instances where an installation process doesn't work as advertised for you and then you discover you've missed some crucial step.

When a machine is set up manually, the steps that led to the machine's configuration state are unknowable. This is true even if the setup steps are carefully documented, since it's always possible that the steps that were carried out on the machine didn't exactly match what was documented.

And of course, manual setup is toil. It's a waste of engineer time that could be better spent creating business value. It's better if this setup work is automated.

A step toward reliable automation: setup scripts

Setup work, which after all is usually just a series of shell commands, can be automated instead of performed manually. This is a great step in the right direction. A good setup script can reduce the entire setup process to a single command, saving a huge amount of toil. This approach is not perfect, however, and in reality it's rarely one flawless command. There are two problems.

The first problem is that setup scripts are imperative. In other words, setup scripts describe a series of executable steps rather than a desired end state. (A specification which describes a desired end state is declarative.) Because a setup script is imperative and its commands are executed serially, it can fail partway through, leaving your environment in a partially-set-up state and leaving you with the task of figuring out where and how to pick back up.
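
To make this concrete, here's a hypothetical setup script. The specific package names are just illustrative, not DumpsterRental.ninja's real dependencies. Notice that if step 2 fails, steps 3 and 4 never run, and you're left with a half-configured machine.

#!/bin/bash
# setup.sh (a hypothetical example)

# Halt the script as soon as any command fails.
set -e

sudo apt-get update                    # step 1
sudo apt-get install -y postgresql     # step 2
sudo apt-get install -y redis-server   # step 3
gem install rails                      # step 4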

The second problem with a setup script (or manual setup for that matter) is that the machine where the environment is being set up is always subject to incompatibilities with the script. We saw this in the DumpsterRental.ninja example, where two different environments needed two different versions of PostgreSQL.

Even better than setup scripts

The execution of a setup script is kind of like a live performance given by a band of musicians reading sheet music. Each musical note is an instruction. The performance can go wrong in two ways. The sheet music could contain a mistake, a flaw in the instructions. Or, even if the instructions are flawless, an external phenomenon, like an ambulance driving by, could disturb the listening experience.

There is a way to guarantee a flawless listening experience, though, which is to record the performance and then play back the recording. If there's a mistake in the sheet music, it can be corrected and the band can try again. If an ambulance drives by the recording studio, the band can record another take. In this way, the listening experience is insulated from any problems that may have occurred during the recording process.

Just as a recorded piece of music insulates the listener from any possible performance problems, Docker's design insulates the environment from any snags that may have arisen during the setup process. How does Docker accomplish this?

How Docker works

Analogous to a band's studio recording session is Docker's build process. During a build process, Docker will read and execute lines of a setup script, just as musicians will read and play notes on sheet music.

The "sheet music" Docker reads (the setup script) comes from something called a Dockerfile. (To be honest, I don't think Dockerfile is a very good name. "What should we call the setup script?" "Um...well this is Docker...and it's a file...so how about the Docker...file?" "Perfect!" I wish they would have given it a name that reflects what it is, like setup for example, but Dockerfile is what we get.)

Just as a piece of sheet music can contain a mistake, such as an off-key note that sounds bad, the setup instructions in a Dockerfile can contain code that won't execute. In this case the build process will halt and the result won't be kept. The same thing happens if an external factor, such as a network failure, causes the build process to fail. The end result will only be kept if the build process completes successfully, in which case the artifact produced is called an image. An image is like a master recording, the source from which all the listening experiences originate.

Here is where we must part from our musical analogy. When a recording gets played, all that's produced are very ephemeral sound waves in the air. When a Docker image gets "played", what happens is not so comparable to a musical recording being played. A Docker image is basically a specification of an environment. More precisely, a Docker image is a specification of an environment and the operating system the environment is running on. When a Docker image is "played", it brings into existence an entire (virtual) computer, pre-loaded with the software and filesystem that resulted from the build process. This virtual computer is called a container. Just as the entire reason a band hits the studio is to offer a listening experience for its fans, the entire point of a Dockerfile and a build process is to produce a container.

In fact, why don't we create a container of our own right now?

A concrete container example

Below is some "sheet music", a real Dockerfile. As we saw, a Docker image is a specification not only of what software a container runs but also what operating system it runs. In this instance the operating system we're running is Ubuntu Linux, version 22.04. The FROM command is what specifies the base image that our image will be based on.

The software we want in our environment includes the Ruby language. We're using the APT package management system to install Ruby on our container. Docker's RUN command can run arbitrary shell commands.

The instructions in our Dockerfile presuppose the existence of a Ruby file called app.rb, which you can see below, after the Dockerfile. The COPY command says "copy from the source app.rb on the host machine (your computer) to the destination /app.rb on the container".

The final command, CMD, specifies the command that should run when the container starts. CMD takes a list of arguments, which in this case are ruby, the Ruby interpreter, and /app.rb, the file that the Ruby interpreter is to interpret. Below is our complete Dockerfile.

# Dockerfile

# Use Ubuntu Linux 22.04 as the container's operating system.
FROM ubuntu:22.04

# Update the package list and install Ruby.
# The -y flag automatically answers "yes" to prompts.
# The && chains commands together so they run as one step.
RUN apt-get update && apt-get install -y ruby

# Copy the app.rb file from the host machine (your computer)
# onto the container.
COPY app.rb /app.rb

# When the container starts, run the Ruby interpreter
# on our app.rb file.
CMD ["ruby", "/app.rb"]

In the spirit of providing the simplest possible example, our Ruby script does nothing more than output the classic expression "Hello, world!".

# app.rb

puts "Hello, world!"

Now let's build our image and run our container. Running the docker build . command gives the following output.

$ docker build .
[+] Building 151.2s (9/9) FINISHED                                                                                    
 => [internal] load build definition from Dockerfile                                                                    0.0s
 => => transferring dockerfile: 533B                                                                                    0.0s
 => [internal] load .dockerignore                                                                                       0.0s
 => => transferring context: 2B                                                                                         0.0s
 => [internal] load metadata for docker.io/library/ubuntu:22.04                                                         4.0s
 => [auth] sharing credentials for registrycache.saturnci.com:5000                                                      0.0s
 => [1/3] FROM docker.io/library/ubuntu:22.04@sha256:09506232a8004baa32c47d68f1e5c307d648fdd59f5e7eaa42aaf87914100db3  58.9s
 => => resolve docker.io/library/ubuntu:22.04@sha256:09506232a8004baa32c47d68f1e5c307d648fdd59f5e7eaa42aaf87914100db3   0.0s
 => => sha256:f85691aa4b9092cbb48212c835b78068e3321656ba2c306dae491e1a02d1b4d3 27.38MB / 27.38MB                       57.6s
 => => sha256:09506232a8004baa32c47d68f1e5c307d648fdd59f5e7eaa42aaf87914100db3 6.69kB / 6.69kB                          0.0s
 => => sha256:40c5d6cde65809ff9d47ce06d1dd01d428a9af388a8918fc7a3310a55a4c39cb 424B / 424B                              0.0s
 => => sha256:37711cf832d3f462205c1ca68ff571e3947cbe3fdeb380a6e51fdd75421eddb9 2.31kB / 2.31kB                          0.0s
 => => extracting sha256:f85691aa4b9092cbb48212c835b78068e3321656ba2c306dae491e1a02d1b4d3                               1.0s
 => [internal] load build context                                                                                       0.2s
 => => transferring context: 54B                                                                                        0.2s
 => [2/3] RUN apt-get update && apt-get install -y ruby                                                                87.7s
 => [3/3] COPY app.rb /app.rb                                                                                           0.0s
 => exporting to image                                                                                                  0.4s
 => => exporting layers                                                                                                 0.4s
 => => writing image sha256:7b2f82a21fd8ec2c1dbef89249245e1572863bd391e431cc4ea8c5473ca1a834

The final line shows that the id of our image is 7b2f82a21fd8ec2c1dbef89249245e1572863bd391e431cc4ea8c5473ca1a834. We can start a container based on this image using the docker run command.

$ docker run 7b2f82a21fd8ec2c1dbef89249245e1572863bd391e431cc4ea8c5473ca1a834
Hello, world!

Actually, we don't have to use the full id every time. Docker allows an abbreviated version for convenience.

$ docker run 7b2f82a21fd8
Hello, world!

To show that this Docker container is running its own operating system, we can run the uname -a command, which will show Linux even though my computer, the host machine, is a Mac.

$ docker run 7b2f82a21fd8 uname -a
Linux bce33c9caf30 5.10.76-linuxkit #1 SMP PREEMPT Mon Nov 8 11:22:26 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

See? It's a little computer on your computer. To me, there's something magical about that. Now let's tie all this back to your troubles at DumpsterRental.ninja.

Running additional services

What if your development environment needs an additional service in order to run, like PostgreSQL for example? You might think that you would put PostgreSQL in your Dockerfile but this is not how it works. In Docker there's a principle of "one container, one process". Why does this principle exist?

My interpretation of the "one container, one process" principle is that the small benefit (if any) that would be gained by stuffing multiple processes (PostgreSQL, Redis, Elasticsearch, etc.) into one container would be far outweighed by the downsides of those processes being so tightly coupled together. From a configuration perspective, it wouldn't be simpler to stick two processes in the same container; in fact, it would be more complicated, because the container would be serving two masters, and its configuration would be a confusing mix of the needs of two different processes. And for what benefit? Containers are cheap. Virtually nothing is saved by putting multiple processes into one container. When each container is dedicated to just one process, the design of the system as a whole can be more modular and easier to understand.

When you create a Dockerfile, you're creating one setup configuration for one container serving one process. If, in addition to your main (let's say) Ruby application, your development environment requires a PostgreSQL server, then that will be a separate container running a separate process.

Does this mean you create two Dockerfiles, one for your Ruby application and one for PostgreSQL? No. Since PostgreSQL is a very common dependency, you can find ready-made images for it. All you have to do is download a PostgreSQL image of your choice and then run a container based on that image. Then your Ruby application can connect to your PostgreSQL container just as if it were a PostgreSQL server running natively on your host machine.
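
For instance, getting a ready-made PostgreSQL container running could look something like this (the container name and password here are placeholders I chose for illustration):

$ docker run --name dev-postgres \
    -e POSTGRES_PASSWORD=password \
    -p 5432:5432 \
    -d postgres:14

The -p 5432:5432 part publishes the container's PostgreSQL port on the host machine, which is what allows your Ruby application to connect at localhost:5432 as if PostgreSQL were installed natively.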

To make it easier to manage a multi-process development environment, Docker offers a tool called Docker Compose, which allows you to compose an environment that runs multiple services. Compose is an extremely useful and feature-rich tool, so much so that I'll be covering it in detail in a separate post. For our present purposes, all you need to remember is that a Dockerfile specifies a single container and service, whereas Docker Compose specifies an entire multi-service environment.
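
Just to give you a taste, here's a minimal sketch of what a Docker Compose file for a two-service environment might look like. The service names and settings are illustrative, not taken from a real DumpsterRental.ninja config.

# docker-compose.yml

services:
  web:
    build: .            # build this service from our Dockerfile
    ports:
      - "3000:3000"     # make the app reachable at localhost:3000
    depends_on:
      - db              # start the database before the web service
  db:
    image: postgres:14  # a ready-made PostgreSQL image
    environment:
      POSTGRES_PASSWORD: password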

How does this all help you at DumpsterRental.ninja?

Let's fast-forward to a hypothetical future where DumpsterRental.ninja has fully embraced Docker for both its development and production environments. What would this change?

Development environment setup

When you tried to set up DumpsterRental.ninja's development environment manually, you ran into a conflict between the OpenSSL version that DumpsterRental.ninja needs (1.1.1) and the OpenSSL version you already have on your computer (3.0.2). With Docker this is a non-issue. Since the DumpsterRental.ninja application runs in a Docker container, there's no need to install OpenSSL 1.1.1 on your computer, only in the container. The OpenSSL 1.1.1 dependency can be specified in DumpsterRental.ninja's Dockerfile.

That troublesome ImageMagick dependency—the one you didn't know you needed because it was missing from the README—can be included in the Dockerfile as well. Below is an illustration of what that part of the Dockerfile could look like.

# Ubuntu 20.04 is used here because OpenSSL 1.1.1 is available in its
# package repositories (Ubuntu 22.04 and later ship OpenSSL 3).
FROM ubuntu:20.04

RUN apt-get update && apt-get install -y \
  ruby \
  imagemagick \
  openssl=1.1.1f-1ubuntu2.20

The Dockerfile can handle the Ruby dependency as well. Instead of using the ubuntu base image, we can use the ruby base image, which is itself based on a Linux base image of its own. (You can poke around in the Ruby base image source code here.)

FROM ruby:3.3.5

# The ruby image is Debian-based rather than Ubuntu-based, so the
# exact openssl package version string would differ from the
# Ubuntu example above.
RUN apt-get update && apt-get install -y \
  imagemagick \
  openssl

There's no conflict between DumpsterRental.ninja's PostgreSQL 14 and your local PostgreSQL 16, because you don't have to install PostgreSQL 14 on your computer. That can be run from a Docker container.

Production infrastructure

Before DumpsterRental.ninja started using Docker, provisioning a new production server that had the same configuration as the original two was difficult if not impossible. Now that there's a Dockerfile that specifies exactly what the production server should look like (which may or may not match the development Dockerfile), creating a new instance can be trivial.

DumpsterRental.ninja happens to be using Kubernetes for its production infrastructure, although that's only one of many options. As it so happens, the number of production instances can be scaled by changing the replicas setting in a Kubernetes config file. Here's what the relevant part of the file looked like before:

kind: Deployment
spec:
  replicas: 2
  template:
    spec:
      containers:
        - name: web
          image: dumpsterrental:latest

And here's what it looks like after:

kind: Deployment
spec:
  replicas: 4
  template:
    spec:
      containers:
        - name: web
          image: dumpsterrental:latest
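
Assuming that config lives in a file called deployment.yaml (a filename I'm choosing for illustration), rolling out the change could be as simple as:

$ kubectl apply -f deployment.yaml

Kubernetes would then see that the desired replica count is 4 and start two more instances to match.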

Because the production server specifications are already captured in a static Docker image, there's no need to run a setup script each time a new server instance is needed. The image gets plopped onto the server, a container gets initialized based on the image, and that's all there is to it. That means you and Scott and Justin can spend a lot less time fiddling with environment setup and a bit more time helping people rent dumpsters.


