Docker Basics#
What is docker?#
It is an open platform to build, ship and run distributed applications. It was introduced in 2013 by Solomon Hykes. However, Linux containers have been around since 2008.
How does Docker help:
- Packaging software in a way that leverages the skills developers already have
- A way to bundle dependencies
- Using a single artifact for testing and delivery - accelerates development cycles
- Less resource intensive than VMs - reduces infrastructure costs
- Helps onboard new developers more quickly
- Lowers the wall between development and operations teams
Understanding Docker is essential to building and operating cloud-native applications - applications that are scalable, highly available, and run on managed cloud infrastructure.
Eventually, container orchestration technologies like Kubernetes will be required.
Ultimately it reduces the complexity of communication between development and infrastructure teams, letting companies ship better software faster. As teams get bigger, the burden of communication grows. There is no need for developers to request a package version be installed on a production server. Docker provides a layer of isolation, and the throwaway nature of containers prevents teams from relying on stale artifacts. It also allows for better reliability and scalability.
Many Parts:
- Docker engine: powers docker locally - a portable, lightweight runtime and packaging tool
- Docker hub: cloud service to share images
- docker compose: mechanism to define and run multi-container applications
- docker swarm: container orchestration for production
- docker registry: server side application that stores and lets you distribute Docker images
- docker machine: tool that lets you install Docker Engine on virtual hosts and manage hosts
However, when people talk about Docker they are usually talking about Docker Engine.
What Docker is Not#
- Enterprise virtualization (VMware, KVM): a complete OS on a hypervisor - Docker is process-based and lightweight
- Cloud platform (OpenStack, CloudStack): Docker does not allow for creation of new host systems, object stores, or block storage
- Configuration management (Chef, Ansible and Puppet) - Docker can lessen the load but does not manage state
- Deployment framework (Capistrano, Fabric)
- Workload management (Mesos, Kubernetes, Swarm)
- VM manager / development environment (Vagrant) - a virtual machine manager
Atomic host - a small, finely tuned OS image, like CoreOS
Serverless vs Docker vs VM’s#
They are not competing - they should be used together for maximum benefit.
The Dockerfile and docker-compose.yml can take the role of an operating manual.
In years past, getting a development environment running was often a multiday task.
Nowadays it is:
- Install docker
- Clone the repo
- Run docker-compose up
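As a rough illustration, a minimal docker-compose.yml for a two-service setup might look like this (the service names and port numbers are assumptions for the sketch, not from the original notes):

version: "3"
services:
  web:
    build: .              # build the image from the Dockerfile in this repo
    ports:
      - "8080:8080"       # host port : container port
  redis:
    image: redis:5.0      # pinned tag pulled from the registry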
Always learn the fundamentals first - mastering the basics will give you greater success later on.
The most important piece of your project is the Dockerfile.
You’ll learn that any application code can be packaged in a container image, regardless of age, framework, language, or architectural pattern.
The Docker Landscape#
Process Simplification#
Our experience has shown that when you are following traditional processes, deploying a brand new application into production can take the better part of a week for a complex new system.
DevOps can help but also requires arguably more time, effort and communication.
This time wasted can limit the kinds of innovation that development teams will undertake in the future.
If deploying software is hard, time-consuming, and dependent on resources from another team, then developers may just build everything into the existing application in order to avoid suffering the new deployment penalty.
An updated deployment process might be:
- The development team builds the image and ships it to a registry
- Operations engineers provide configuration details and provision resources
- Developers trigger the deployment
This gives a clearer division of responsibilities.
Broad Support#
Docker is well supported by the large public cloud providers:
- AWS: Elastic Container Service (ECS), EKS (ECS for Kubernetes), Fargate and Elastic Beanstalk
- Google App Engine and Google Kubernetes Engine
- Red Hat OpenShift
- IBM Cloud
- Microsoft Azure Container Service
- Rackspace Cloud
- Docker Cloud
At DockerCon 2014, Google’s Eric Brewer announced that Google would be supporting Docker as its primary internal container format
A lot of money is backing the success and stability of the platform.
Docker has traditionally been developed on the Ubuntu Linux distribution. Red Hat has also gone all-in on containers, and all of their platforms have first-class support for Docker.
Red Hat's CoreOS is built entirely around containers, which it runs either on Docker itself or under its own rkt runtime.
In the first years after Docker’s release, a set of competitors and service providers voiced concerns about Docker’s proprietary image format. Containers on Linux did not have a standard image format, so Docker, Inc., created their own according to the needs of their business.
The OCI (Open Container Initiative) defines a specification for container images and runtimes. Runtimes that claim to implement it include:
- runc - part of Docker
- railcar - Oracle
- Kata Containers - Intel and OpenStack
- gVisor - Google - implemented in userspace
Architecture#
Docker's fundamental user-facing structure is a simple client/server model driven by an API. The docker command-line tool and the dockerd daemon talk to each other over network sockets.
Ports:
- TCP 2375 for unencrypted traffic
- TCP 2376 for encrypted (TLS) traffic
docker swarm was an older, now-deprecated standalone application. There is now a newer built-in version called swarm mode.
Docker also has its own toolset: Compose, Machine and Swarm. However, Docker's offerings in the production orchestration space have been overshadowed by Google's Kubernetes and Apache Mesos.
Container Networking#
- Most people run their containers in the default configuration, called bridge mode.
- A bridge is just a network device that repeats traffic from one side to another
- The Docker server acts as a virtual bridge and the containers are clients behind it
- Docker lets you bind and expose individual or groups of ports on the host to the container so that the outside world can reach your container on those ports
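For instance, to publish a container port to the host (the nginx image and port numbers here are purely illustrative):

docker run -d -p 8080:80 nginx:latest    # host port 8080 -> container port 80
docker run -d -P nginx:latest            # publish all EXPOSEd ports to random high host ports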
Getting the Most out of Docker#
Docker’s architecture aims it squarely at applications that are either stateless or where the state is externalized into data stores like databases or caches. Those are the easiest to containerize.
Putting a database engine inside Docker is a bit like swimming against the current - this is not the most obvious use case for Docker.
Some good applications for beginning with Docker include web frontends, backend APIs, and short-running tasks like maintenance scripts that might normally be handled by cron.
Containers are not Virtual Machines#
- Think of docker containers not as virtual machines but as very lightweight wrappers around a single Unix process (they are process based)
- Containers are also ephemeral: they may come and go much more readily than a traditional virtual machine
- If you run Docker on a Mac or Windows system, you are leveraging a Linux virtual machine to run dockerd, the Docker server. On Linux, dockerd can run natively, so there is no need for a virtual machine anywhere on the system
- There is limited isolation. The default container configuration has all containers sharing CPU and memory on the host system, which means that unless you constrain them, containers can compete for resources on your production machines
- Containers are lightweight - just a reference to a layered filesystem
You probably wouldn't, for instance, spin up an entire virtual machine to run a curl command to a website from a remote location, but you might spin up a new container for this purpose.
Warning: Many types of security vulnerabilities or simple misconfiguration can give the container’s root user unauthorized access to the host’s system resources, files, and processes.
Immutable Infrastructure#
In an immutable infrastructure, components are replaced entirely rather than changed in place. The difficulty of creating an idempotent configuration management codebase has given rise to the popularity of immutable infrastructure.
CoreOS can entirely update itself and switch to the updated OS - without requiring decommissioning.
Many container-based production systems are now using tools such as HashiCorp’s Packer to build cloud virtual server images and then leveraging Docker to nearly or entirely avoid configuration management systems
Stateless Applications#
A good example of the kind of application that containerizes well is a web application that keeps its state in a database
- Stateless applications are normally designed to immediately answer a single self-contained request, and have no need to track information between requests from one or more clients
Local state that relies on configuration files limits reusability and makes deployment harder. In many cases, you can move configuration state into environment variables that are passed to your application by the container.
Rather than baking the configuration into the container, you apply the configuration to the container at deployment time
This allows you to easily do things like use the same container to run in either production or staging environments
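As a sketch, the same image could be pointed at different backends purely through environment variables (the variable name DATABASE_URL and the image name example/webapp:1.0 are hypothetical):

docker run -d -e DATABASE_URL="postgres://staging-db/app" example/webapp:1.0   # staging
docker run -d -e DATABASE_URL="postgres://prod-db/app" example/webapp:1.0      # production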
Externalising State#
- Configuration is best passed by environment variables
- Applications that need to store files, however, face some challenges. Storing things to the container’s filesystem will not perform well, will be extremely limited by space, and will not preserve state across a container lifecycle
- It's best to design a solution where the state can be stored in a centralized location that can be accessed regardless of which host a container runs on - for example Amazon S3, HDFS, OpenStack Swift, a local block store, or mounted EBS volumes.
In short avoid containerising applications that make extensive use of the filesystem.
The Docker Workflow#
Revision Control#
- Filesystem layers - Docker will use as many base layers as it can so that only the layers affected by the code change are rebuilt
- Image tags - image tagging at deployment time - which can be standardised across applications
Also makes communication between teams and tooling simpler.
Note on latest: it is a bad idea to use the latest tag in production workflows, as dependencies can be updated without your knowledge. Rolling back is also a problem - which tag would you roll back to?
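A safer workflow is to tag and deploy explicit versions; for example, using the image built later in these notes (the version number 1.4.2 is purely illustrative):

docker tag example/docker-node-hello:latest example/docker-node-hello:1.4.2
docker push example/docker-node-hello:1.4.2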
Building#
Building an application is a dark art in most organizations - only a few people know how to do it for the different applications. Docker doesn't solve all of the problems, but it does provide a standardized tool configuration and toolset for builds.
To consume a Dockerfile and produce a Docker image, run:
docker build
Each command in a Dockerfile generates a new layer in the image.
Modern multistage Docker builds also allow you to define the build environment separately from the final artifact image - which is also why configuration like environment variables should be applied at deployment time rather than baked into the Dockerfile.
Testing#
- By design, containers include all of their dependencies, so tests run in containers are very reliable
- Docker containers can be a real lifeline to developers or QA engineers who need to wade into the swamp of inter-microservice API calls.
- Integration tests are easier
Packaging and Deployment#
- Docker tooling revolves around a single artifact: the Docker image
- Docker needs only one command line to build, and one to run on a host
- The standard Docker client handles deploying to a single host at a time
Orchestration#
- Mass deployment tools: New Relic's Centurion, Spotify's Helios, or the Ansible Docker tooling
- Fully automated schedulers:
- Apache Mesos
- Google's Kubernetes
- Others: HashiCorp's Nomad, CoreOS's Tectonic, Mesosphere's DC/OS and Rancher
Atomic Hosts#
- Traditionally, servers and virtual machines are systems that an organization will carefully assemble, configure, and maintain to provide a wide variety of functionality that supports a broad range of usage patterns
- Updates must often be applied via nonatomic operations
Additional Tools#
- Rancher’s Convoy plug-in for managing persistent volumes on Amazon EBS volumes or over NFS mounts
- Weaveworks’ Weave Net network overlay
- Microsoft’s Azure File Service plug-in
Installing Docker#
There is a lot of seemingly complex setup info in the book; I just installed Docker Desktop for macOS.
Testing Docker#
Open a shell session on Alpine Linux:
docker run --rm -ti alpine:latest /bin/sh
Here alpine is the image name and latest is the image tag.
Docker Server#
Running the docker daemon manually is as simple as:
sudo dockerd -H unix:///var/run/docker.sock -H tcp://0.0.0.0:2375
This won't work on Windows or Mac.
If you ever have a need to access the underlying VM:
docker run -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i sh
/ # cat /etc/os-release
PRETTY_NAME="Docker Desktop"
/ # ps | grep dockerd
1382 root 8:14 /usr/local/bin/dockerd -H unix:///var/run/docker.sock --config-file /run/config/docker/daemon.json --swarm-default-advertise-addr=eth0 --userland-proxy-path /usr/bin/vpnkit-expose-port
15267 root 0:00 grep dockerd
Working with docker images#
- Every Docker container is based on an image
- Images are the underlying definition of what gets reconstituted into a running container, much like a virtual disk becomes a virtual machine when you start it up
- Every Docker image consists of one or more filesystem layers that generally have a direct one-to-one mapping to each individual build step used to create that image
Images make up everything you do with docker:
- building images
- uploading (pushing) images to an image registry
- Downloading (pulling) images from an image registry
- Creating and running containers from an image
Anatomy of a Dockerfile#
- This file describes all the steps that are required to create an image.
- It would usually be contained within the root directory of the source code repository for your application
Example for a Node.js application:
# The Base Image locked to specific release
FROM node:0.10
# Key-value pairs to search and identify images/containers
LABEL "maintainer"="anna@example.com"
LABEL "rating"="Five Stars" "class"="First Class"
# Set the user to run processes on the container
USER root
# Set Shell environment variables
ENV AP /data/app
ENV SCPATH /etc/supervisor/conf.d
RUN apt-get -y update
# Run CLI instructions
# The daemons
RUN apt-get -y install supervisor
RUN mkdir -p /var/log/supervisor
# Copy files from either the local filesystem or a remote URL into your image
# Supervisor Configuration
ADD ./supervisord/conf.d/* $SCPATH/
# Application Code
ADD *.js* $AP/
# Change the working directory for the remaining build instructions
WORKDIR $AP
RUN npm install
# Defines the process you want to run within the container
CMD ["supervisord", "-n"]
- Each line in a Dockerfile creates a new image layer that is stored by Docker
- This means that when you build new images, Docker will only need to build layers that deviate from previous builds - you can reuse all the layers that haven’t changed
- You could build a Node instance from a plain, base Linux image
- The Node.js community maintains a series of Docker images
- You could lock the image to a specific point release of Node: node:0.10.33
- You can check image metadata with: docker inspect <image-tag>
- MAINTAINER is a deprecated field in the Dockerfile specification
- By default, Docker runs all processes as root within the container
- It is not recommended that you run commands like apt-get -y update or yum -y update in your application's Dockerfile - they crawl the repository index on every build, which makes the build non-repeatable
- ADD actually copies the files into the image, so once the image is built you don't need access to the originals
- Remember that every instruction creates a new Docker image layer, so it often makes sense to combine a few logically grouped commands onto a single line
- The order of commands in a Dockerfile can have a very significant impact on ongoing build times. You should try to order commands so that things that change between every single build are closer to the bottom
- When you rebuild an image, every single layer after the first introduced change will need to be rebuilt.
- It is generally considered a best practice to try to run only a single process within a container
- A container should provide a single function so that it remains easy to horizontally scale individual functions
Due to potential security risks, production containers should almost always be run under the context of a nonprivileged user
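A minimal sketch of what that could look like in a Debian-based Dockerfile (the username is illustrative):

# Create an unprivileged user and drop root before the process starts
RUN useradd --create-home appuser
USER appuser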
Building an Image#
When cloning a repo you will get something like this:
$ tree -a -I .git
.
├── .dockerignore
├── .gitignore
├── Dockerfile
├── Makefile
├── README.md
├── Vagrantfile
├── index.js
├── package.json
└── supervisord
└── conf.d
├── node.conf
└── supervisord.conf
2 directories, 10 files
- The .dockerignore file defines what you don't want to upload to the Docker host when building the image
- package.json defines the Node.js app and lists its dependencies
- index.js is the main source code of the application
- The supervisord directory contains the configuration files for supervisord to start and monitor the application
To build the image:
docker build -t example/docker-node-hello:latest .
This builds the image:
Successfully built d9dff15d5b87
Successfully tagged example/docker-node-hello:latest
To improve the speed of builds, Docker will use a local cache when it thinks it is safe. This can sometimes lead to unexpected issues because it doesn't always notice that something changed in a lower layer. In the preceding output you will notice lines like ---> Running in 0c2dc15cab8d. If instead you see ---> Using cache, you know that Docker decided to use the cache.
You can disable the cache for a build by passing the --no-cache argument to docker build.
Troubleshooting Broken Builds#
We hope that things just work, but in reality that is rarely the case.
If we change apt-get -y update to apt-get -y update-all, we get:
Step 7/14 : RUN apt-get -y update-all
---> Running in 06cd2cf607a0
E: Invalid operation update-all
The command '/bin/sh -c apt-get -y update-all' returned a non-zero code: 100
Running in 06cd2cf607a0 gives us the container id. Sometimes you will instead get an image id, like ---> 8a773166616c.
In that case, you can run an interactive container to determine why the build is not working:
docker run --rm -ti 8a773166616c /bin/bash
Fix the error.
Running your Image#
Once you have built your image, you can run it on your docker host with:
docker run -d -p 8080:8080 example/docker-node-hello:latest
This creates a running container in the background from the image tagged example/docker-node-hello:latest, with port 8080 in the container mapped to port 8080 on the host.
You can verify it is running with:
docker ps
docker container list
If you are running Docker locally you can go to http://127.0.0.1:8080. Otherwise, check the Docker host IP with echo $DOCKER_HOST or docker-machine ip.
To send environment variables to the container:
Stop it first
docker ps
docker stop 85fc21edaad9
You can then set the $WHO environment variable with:
docker run -d -p 8080:8080 -e WHO="Stephen" \
example/docker-node-hello:latest
This will update the application.
Custom Base Images#
These are the lowest level images that other docker images are based on.
Most are minimal installs of Linux distributions: ubuntu, fedora, centos and alpine.
However, there are times when it is preferable to build your own base image rather than using one created by someone else.
Sometimes you want to get the image size down. Alpine Linux is designed to be small and is popular for Docker images. Alpine is based on musl libc rather than GNU libc.
This difference has the largest impact on Java-based applications and DNS resolution.
It is widely used in production, but note that it ships with /bin/sh, not /bin/bash.
Storing Images#
- You don’t normally build the images on a production server and then run them
- Ordinarily, deployment is the process of pulling an image from a repository and running it on one or more Docker servers.
Public Registries#
This is probably the right first step if you’re getting serious about Docker but are not yet shipping enough code to need an internally hosted solution. One of the downsides is that every layer of every deployment might need to be dragged across the internet in order to deploy an application.
Private Registries#
You can authenticate with docker login - by default this logs into Docker Hub.
It caches the credentials in a dotfile: ~/.docker/config.json.
You can logout to remove cached credentials:
docker logout
You can log into a different registry by naming it:
docker login someregistry.example.com
You can change the repo of your image with:
docker tag example/docker-node-hello:latest \
${<myuser>}/docker-node-hello:latest
Check if the image exists on the server:
docker image ls ${<myuser>}/docker-node-hello
Upload to the repo (push):
docker push ${<myuser>}/docker-node-hello:latest
If this image was uploaded to a public repository, anyone in the world can now easily download it by running the docker pull command.
Pull the image:
docker pull ${<myuser>}/docker-node-hello:latest
Running a private registry#
More info in the book about creating and running a private registry
Advanced Building Techniques#
Keeping your image sizes small and your build times fast can be very beneficial in decreasing the time required to build and deploy new versions of your software into production
A 1GB image that needs to be delivered to 100 nodes becomes a scaling problem - and when you deploy new releases multiple times a day, it becomes a bigger one.
Often a minimal Linux distribution is not even required. Go, for example, is a compiled language that can produce statically linked binaries.
docker run -d -p 8080:8080 adejonge/helloworld
Then get the info of the last container you created:
docker container ls -l
You can export the files in the returned container to a tarball:
docker export a6f328e6eda1 -o web-app.tar
You can examine the contents of the tarball with:
$ tar -tvf web-app.tar
-rwxr-xr-x 0 root 0 0 Sep 25 15:16 .dockerenv
drwxr-xr-x 0 root 0 0 Sep 25 15:16 dev/
-rwxr-xr-x 0 root 0 0 Sep 25 15:16 dev/console
drwxr-xr-x 0 root 0 0 Sep 25 15:16 dev/pts/
drwxr-xr-x 0 root 0 0 Sep 25 15:16 dev/shm/
drwxr-xr-x 0 root 0 0 Sep 25 15:16 etc/
-rwxr-xr-x 0 root 0 0 Sep 25 15:16 etc/hostname
-rwxr-xr-x 0 root 0 0 Sep 25 15:16 etc/hosts
lrwxrwxrwx 0 root 0 0 Sep 25 15:16 etc/mtab -> /proc/mounts
-rwxr-xr-x 0 root 0 0 Sep 25 15:16 etc/resolv.conf
-rwxr-xr-x 0 root 0 3604416 Jul 2 2014 helloworld
drwxr-xr-x 0 root 0 0 Sep 25 15:16 proc/
drwxr-xr-x 0 root 0 0 Sep 25 15:16 sys/
Almost all of the files are 0 bytes in length. These files are required to exist in every Linux container and are automatically bind-mounted from the host into the container when it is first created. The only file of any size is the helloworld binary.
Containers are only required to contain exactly what they need to run on the underlying kernel
Sometimes a shell is good for troubleshooting so people compromise with a lightweight linux distribution.
We can poke around in alpine linux with:
docker run -ti alpine:latest /bin/sh
We cannot do this with the Go image above, because it does not contain a shell or an SSH daemon. That means we can't use ssh, nsenter or docker exec to examine it.
How do we examine a container's filesystem, then? We need to inspect the image to find where the files are stored:
docker image inspect alpine:latest
"Id": "sha256:055936d3920576da37aa9bc460d70c5f212028bda1c08c0879aedf03d7a66ea1",
"RepoTags": [
"alpine:latest"
],
"RepoDigests": [
"alpine@sha256:769fddc7cc2f0a1c35abb2f91432e8beecf83916c421420e6a6da9f8975464b6"
],
"GraphDriver": {
"Data": {
"MergedDir": "/var/lib/docker/overlay2/33cd56da7058828f2bbc3f162beaf8e611d6a44b96ff15dd76d0b9f6927e11be/merged",
"UpperDir": "/var/lib/docker/overlay2/33cd56da7058828f2bbc3f162beaf8e611d6a44b96ff15dd76d0b9f6927e11be/diff",
"WorkDir": "/var/lib/docker/overlay2/33cd56da7058828f2bbc3f162beaf8e611d6a44b96ff15dd76d0b9f6927e11be/work"
},
"Name": "overlay2"
},
and on the go application container:
docker image inspect adejonge/helloworld:latest
"Id": "sha256:4fa84a96f0d641a79ad7574fd75eabee71e93095fb35af9c30e9b59e3269206d",
"RepoTags": [
"adejonge/helloworld:latest"
],
"RepoDigests": [
"adejonge/helloworld@sha256:46d95092b73dccbcc12cdec910a3673e41cd7c4fabc7bf7413dfb12f709e2a1d"
],
"GraphDriver": {
"Data": {
"LowerDir": "/var/lib/docker/overlay2/86f86888c7eb97264dfab3308d46be90c05b2cc39f8290491aabf080579d9ef1/diff:/var/lib/docker/overlay2/157089801bf8d5c0ba8944a9b50b52d7239ec68d3ac231b2ef5e8f8785925401/diff",
"MergedDir": "/var/lib/docker/overlay2/26103927669ab87c80f1d03e5de4823ec239b4428b3e3c08afafc9180aaa4cef/merged",
"UpperDir": "/var/lib/docker/overlay2/26103927669ab87c80f1d03e5de4823ec239b4428b3e3c08afafc9180aaa4cef/diff",
"WorkDir": "/var/lib/docker/overlay2/26103927669ab87c80f1d03e5de4823ec239b4428b3e3c08afafc9180aaa4cef/work"
},
"Name": "overlay2"
Since Docker Desktop runs the server in a VM, we need this command to explore the Docker host machine:
docker run -it --privileged --pid=host debian nsenter -t 1 -m -u -n -i sh
If you go to the diff folder referenced alongside the MergedDir, you can see the files.
The size of alpine is:
du -sh /var/lib/docker/overlay2/33cd56da7058828f2bbc3f162beaf8e611d6a44b96ff15dd76d0b9f6927e11be/diff/
5.8M
In the Go case, check the LowerDir:
ls -lh /var/lib/docker/overlay2/86f86888c7eb97264dfab3308d46be90c05b2cc39f8290491aabf080579d9ef1/diff
total 3520
-rwxr-xr-x 1 root root 3.4M Jul 2 2014 helloworld
You can even run this file directly from the Docker server - driving home how useful statically compiled applications can be.
Multistage Builds#
# Build container
FROM golang:alpine as builder
RUN apk update && \
apk add git && \
CGO_ENABLED=0 go get -a -ldflags '-s' github.com/adriaandejonge/helloworld
# Production container
FROM scratch
COPY --from=builder /go/bin/helloworld /helloworld
EXPOSE 8080
CMD ["/helloworld"]
The first stage is based on the golang image; the as builder alias lets the final stage refer back to it via --from=builder.
More relevant stuff in the book
Layers are Additive#
The filesystem layers that make up images are additive in nature. In other words you cannot make your image smaller by deleting files that were generated in earlier steps.
Example:
FROM fedora
RUN dnf install -y httpd
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]
Just docker build:
docker build .
...
Successfully built 2f96aac9e094
Let's tag that build so it's easier to refer to:
docker tag 2f96aac9e094 size1
Now let's look at the image layers with:
docker history size1
IMAGE CREATED CREATED BY SIZE COMMENT
2f96aac9e094 About a minute ago /bin/sh -c #(nop) CMD ["/usr/sbin/httpd" "-… 0B
2c0ecaeaa083 About a minute ago /bin/sh -c dnf install -y httpd 223MB
e9ed59d2baf7 4 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 4 weeks ago /bin/sh -c #(nop) ADD file:22abad47ea7363005… 246MB
<missing> 8 months ago /bin/sh -c #(nop) ENV DISTTAG=f30container … 0B
<missing> 8 months ago /bin/sh -c #(nop) LABEL maintainer=Clement … 0B
Notice that four layers added no size to our final image, but two of them added a great deal. The first is the Fedora base image, a minimal Linux distribution at 246MB. The 223MB from the httpd install, however, is surprising - Apache shouldn't be that big.
Package managers like apk, apt, dnf and yum maintain a cache that takes a large amount of space and is useless once the packages have been installed. The obvious next step is to delete the cache.
FROM fedora
RUN dnf install -y httpd
RUN dnf clean all
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]
Then build, tag and view the history again:
IMAGE CREATED CREATED BY SIZE COMMENT
3203a6f67417 14 seconds ago /bin/sh -c #(nop) CMD ["/usr/sbin/httpd" "-… 0B
a71a22f17812 14 seconds ago /bin/sh -c dnf clean all 1.73MB
2c0ecaeaa083 9 minutes ago /bin/sh -c dnf install -y httpd 223MB
e9ed59d2baf7 4 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 4 weeks ago /bin/sh -c #(nop) ADD file:22abad47ea7363005… 246MB
<missing> 8 months ago /bin/sh -c #(nop) ENV DISTTAG=f30container … 0B
<missing> 8 months ago /bin/sh -c #(nop) LABEL maintainer=Clement … 0B
Now the image is 1.73MB larger than it was - the dnf clean all step added a layer instead of reclaiming space, showing the additive nature of layers. The only way to make a layer smaller is to remove files before the layer is saved.
The most common way to deal with this is by stringing commands together on a single Dockerfile line. You can do this very easily by taking advantage of the && operator
So rewrite the Dockerfile like this:
FROM fedora
RUN dnf install -y httpd && \
dnf clean all
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]
That drops the file size significantly:
IMAGE CREATED CREATED BY SIZE COMMENT
21eff7fc3868 3 minutes ago /bin/sh -c #(nop) CMD ["/usr/sbin/httpd" "-… 0B
8eaa03cfcd9d 3 minutes ago /bin/sh -c dnf install -y httpd && dnf clean… 19.8MB
e9ed59d2baf7 4 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0B
<missing> 4 weeks ago /bin/sh -c #(nop) ADD file:22abad47ea7363005… 246MB
<missing> 8 months ago /bin/sh -c #(nop) ENV DISTTAG=f30container … 0B
<missing> 8 months ago /bin/sh -c #(nop) LABEL maintainer=Clement … 0B
Boom - massive saving in space.
Optimising for Cache#
We want to keep builds as fast as possible, keeping feedback loops tight.
Docker uses a layer cache to try to avoid rebuilding any image layers that it has already built
Therefore, the order in which you do things inside your Dockerfile can have a dramatic impact on how long your builds take on average.
Let's customise the previous Dockerfile:
FROM fedora
RUN dnf install -y httpd && \
dnf clean all
RUN mkdir /var/www && \
mkdir /var/www/html
ADD index.html /var/www/html
CMD ["/usr/sbin/httpd", "-DFOREGROUND"]
Now we add the index.html:
<html>
<head>
<title>My custom Web Site</title>
</head>
<body>
<p>Welcome to my custom Web Site</p>
</body>
</html>
Time the build process:
time docker build --no-cache .
...
real 2m3.571s
user 0m0.085s
sys 0m0.185s
Building again straight after:
time docker build .
real 0m0.635s
user 0m0.075s
sys 0m0.171s
Update the index.html file and rebuild:
# Most items will use cache
real 0m1.255s
user 0m0.075s
sys 0m0.172s
Long story short: order matters. Steps that change frequently should be lower down in the Dockerfile, so that fewer layers have their cache invalidated. Stable, time-consuming steps should be higher up.
Working with Containers#
What are Containers?#
Virtualization platforms like KVM or VMware let you run a complete Linux kernel on top of a virtualised layer known as a hypervisor. This provides strong isolation between workloads, because each virtual machine has its own kernel in a separate memory space.
Containers, by contrast, share a single kernel; isolation between workloads happens within that one kernel. This is operating system virtualization. It provides resource efficiency: there is one less layer between the task and the underlying hardware, whereas a system call in a VM has to go through the VM's kernel and then the hypervisor.
The downside is you can only run containers that are compatible with the underlying kernel; containers are an OS-specific technology. You can't run a Windows or BSD container on a Linux kernel - you need KVM or VMware in that case.
History of Containers#
- Containers are not a new idea.
- Seeds for today's containers were planted in 1979 with the addition of the chroot system call to Version 7 Unix
- chroot restricts a process's view of the underlying filesystem to a single subtree
- It was used to protect the OS from untrusted processes like FTP, BIND and Sendmail
- The Sidewinder firewall, built on top of BSDI Unix, created tightly controlled domains on a single kernel - but it was not compatible with many Unix installations
- In 2000, FreeBSD released the jail command, which allowed hosting providers to share environments between customers and internal users; jail added restrictions and expanded on chroot
- In 2004, Sun released Solaris 10, which included Solaris containers (later evolving into Solaris zones) - the first major commercial container implementation
- 2005 - OpenVZ for linux
- 2007 - HP secure resource partitions - HP-UX containers
- 2008 - Linux containers (LXC)
Only in 2013 did LXC adoption properly take off, with the inclusion of user namespaces in Linux 3.8.
Creating a Container#
So far we've used docker run. This does two things:
1. docker create - creates a container from the underlying image
2. docker start - executes the container
You can use -p to map network ports and -e to pass environment variables into the container.
Basic Configuration#
Settings in the Dockerfile are always used as defaults, but can be overridden at container creation time.
Container name: by default Docker randomly names your container; if you want to choose the name yourself, use:
docker create --name="awesome-service" ubuntu:latest sleep 120
You can start the container with:
docker start awesome-service
It will automatically exit after 120 seconds, but you could stop it manually with:
docker stop awesome-service
Labels: Key-value pairs that can be applied to images and containers. New containers automatically inherit the labels from the parent image.
You can also add labels to a container with -l:
docker run -d --name has-some-labels -l deployer=Ahmed -l tester=Asako ubuntu:latest sleep 1000
You can then search for containers on this metadata:
docker ps -a -f label=deployer=Ahmed
You can use docker inspect <container-id> to see all the labels a container has.
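If you only want the labels, docker inspect's --format option can pull them out directly (using the container name from the example above):

docker inspect --format '{{json .Config.Labels}}' has-some-labels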
hostname: Docker copies certain files on the host, such as /etc/hostname, into the container's configuration directory on the host. It then uses a bind mount to link each file into the container.
We can launch a default container like this:
docker run --rm -ti ubuntu:latest /bin/bash
The --rm argument tells Docker to delete the container when it exits. The -t argument tells Docker to allocate a pseudo-TTY, and -i makes it an interactive session, keeping STDIN open.
If you run mount inside the container, you will see:
/dev/sda1 on /etc/resolv.conf type ext4 (rw,relatime,data=ordered)
/dev/sda1 on /etc/hostname type ext4 (rw,relatime,data=ordered)
/dev/sda1 on /etc/hosts type ext4 (rw,relatime,data=ordered)
You can check the hostname with hostname -f; by default it will not be fully qualified.
To set the hostname specifically, use the --hostname argument to pass in a more specific value:
docker run --rm -ti --hostname="mycontainer.example.com" ubuntu:latest /bin/bash
domain name service (dns):
resolv.conf is also managed via a bind mount between the host and container. By default it is an exact copy of the Docker host's resolv.conf. You can override this with the --dns and --dns-search arguments:
docker run --rm -ti --dns=8.8.8.8 --dns=8.8.4.4 --dns-search=example1.com \
--dns-search=example2.com ubuntu:latest /bin/bash
root@117e9a8ed06a:/# more /etc/resolv.conf
search example1.com example2.com
nameserver 8.8.8.8
nameserver 8.8.4.4
mac address: more in the book on this
storage volumes: Sometimes you need storage that will persist between container deployments.
Mounting storage from the Docker host is not generally advisable because it ties your container to a particular Docker host for its persistent state. But for cases like temporary cache files or other semi-ephemeral states, it can make sense.
You can use -v to mount directories and individual files from the host server into the container.
This mounts /mnt/session_data to /data within the container:
docker run --rm -ti -v /mnt/session_data:/data ubuntu:latest /bin/bash
By default the mount is read/write; you can make it read-only with:
docker run --rm -ti -v /mnt/session_data:/data:ro ubuntu:latest /bin/bash
None of the mount points need to pre-exist.
If the container application is designed to write into /data, then this data will be visible on the host filesystem in /mnt/session_data and will remain available when this container stops and a new container starts with the same volume mounted.
It is possible to tell Docker that the root volume of your container should be mounted read-only so that processes within the container cannot write anything to the root filesystem. This prevents things like logfiles, which a developer may be unaware of, from filling up the container’s allocated disk in production.
We can accomplish this by using --read-only=true
docker run --rm -ti --read-only=true -v /mnt/session_data:/data ubuntu:latest /bin/bash
In this case the root filesystem is read-only, while the /data directory remains read/write.
To make /tmp writable, use the --tmpfs argument with docker run.
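Combining the two, a read-only root with a writable /tmp might look like this (the tmpfs size is illustrative):

docker run --rm -ti --read-only=true --tmpfs /tmp:rw,size=64m ubuntu:latest /bin/bash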
Containers should be designed to be stateless whenever possible. Managing storage creates undesirable dependencies and can easily make deployment scenarios much more complicated.
SELinux and volume mounts#
- You might get a Permission Denied error when trying to mount a volume into your container
- You can use the z option to the docker command - more in the book
Resource Quotas#
Other applications on the same physical system can have a noticeable impact on performance and resource availability.
Virtual machines have the advantage that you can easily and tightly control the CPU and memory allocated to them. With Docker you must leverage cgroups to control resources.
docker create and docker run allow you to configure CPU, memory, swap and storage I/O restrictions.
You can also use docker container update to adjust limits at runtime.
If your kernel does not support these capabilities, docker info will show warnings.
CPU Shares#
The computing power of all CPU cores in a system is considered the full pool of shares.
Docker assigns 1024 to represent the full pool. If you want the container to use at most half of the computing power, you would allocate 512. This value is used when scheduling CPU time; it is not a hard limit.
Use the stress program container:
docker run --rm -ti progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s
then you can see how the system is being affected:
top -bn1 | head -n 15
Then halve the number of shares assigned with:
docker run --rm -ti --cpu-shares 512 progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s
Docker's cgroup CPU-share limits are not hard limits; they are relative, similar to the nice command.
CPU Pinning#
You can assign a container to a specific core or cores.
You pin it with --cpuset=0 (--cpuset-cpus in newer Docker releases) - remember that the core numbering is 0-indexed.
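For example, to pin the stress container from earlier to the first core (a sketch; adjust the core list for your machine):

docker run --rm -ti --cpuset-cpus=0 progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s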
Using the CPU CFS (Completely Fair Scheduler), you can alter the CPU quota for a given container by setting the --cpu-quota flag.
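The CFS period defaults to 100000 microseconds, so a quota of 50000 gives the container roughly half a CPU; for example:

docker run --rm -ti --cpu-quota=50000 progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s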
Simplifying CPU quotas#
You can now simply tell Docker the amount of compute you want available to your container.
Use the --cpus option with a number between 0.01 and the number of cores on your machine.
docker run -d --cpus="40.25" progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 60s
If it is too high you get an error:
docker: Error response from daemon: Range of CPUs is from 0.01 to 2.00, as there are only 2 CPUs available.
You could update the resource limits of more than one container with:
docker update --cpus="1.5" <container_id> <container_id>
Memory#
Important: The memory limit is a hard limit. You can also allocate more memory than the host has - in that case the container will use swap.
You use the --memory option:
docker run --rm -ti --memory 512m progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s
This sets 512MB as the RAM constraint and 512MB as the additional swap space.
Docker supports the suffixes b, k, m and g, representing bytes, kilobytes, megabytes and gigabytes.
To set or disable swap you would use: --memory-swap
This command sets the amount of memory to 512MB and the amount of swap to 256MB (--memory-swap is the total of memory plus swap):
docker run --rm -ti --memory 512m --memory-swap=768m progrium/stress --cpu 2 --io 1 --vm 2 --vm-bytes 128M --timeout 120s
Setting --memory-swap=-1 allows the container unlimited swap.
Again, unlike CPU shares, memory is a hard limit! This is good, because the constraint doesn’t suddenly have a noticeable effect on the container when another container is deployed to the system. But it does mean that you need to be careful that the limit closely matches your container’s needs because there is no wiggle room. An out-of-memory container causes the kernel to behave just like it would if the system were out of memory. It will try to find a process to kill in order to free up space. This is a common failure case where containers have their memory limits set too low. The telltale sign of this issue is a container exit code of 137 and kernel out-of-memory (OOM) messages in the dmesg output.
More stuff in the book about running out of memory
Most on IO and ulimits in the book.
Starting a Container#
docker create creates the container configuration, but does not start the running process.
Let's say we needed to run a copy of Redis, a common key/value store. It is a lightweight, long-lived process and serves as an example of something we might run in a real environment.
Create the container:
docker create -p 6379:6379 redis:5.0
This creates the container and gives you the hash.
You can list all containers on your system (including stopped ones) with:
docker ps -a
or docker container list -a
Start the redis container with:
docker start b8990dd4b1b0
There it is:
docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b8990dd4b1b0 redis:5.0 "docker-entrypoint.s…" About a minute ago Up 48 seconds 0.0.0.0:6379->6379/tcp inspiring_ritchie
Autorestarting a Container#
For production applications you want your containers up at all times. Often a scheduler will handle this for you.
You can tell Docker to handle restarts for you with the --restart argument. It takes four values:
- no - never restart when it exits
- always - always restart when it exits
- on-failure - restart on a nonzero exit code; on-failure:3 retries 3 times, then gives up
- unless-stopped - restart unless intentionally stopped with something like docker stop
docker run -ti --restart=on-failure:3 redis:5.0
Stopping a container#
To stop and start:
docker stop b8990dd4b1b0
docker start b8990dd4b1b0
As long as a container has not been deleted, you can restart it without needing to recreate it (build it).
Containers are just a tree of processes that interact with the system in essentially the same way as any other process on the server
This is important because it means we can send Unix signals to those processes. docker stop sends a SIGTERM to the container.
If you want to ensure that a container is killed if it hasn't stopped within a certain time:
docker stop -t 25 b8990dd4b1b
This sends a SIGKILL if the container hasn't stopped after 25 seconds.
Killing a Container#
docker kill b8990dd4b1b
Use man signal for more info on signals.
More on signals in the book, like HUP and USR1.
Pausing a Container#
If you want to pause and unpause a container:
docker pause b8990dd4b1b0
docker unpause b8990dd4b1b0
Reasons for pausing:
- leave resources allocated
- leave entries in the process table
- Taking a snapshot
- Freeing CPU time
Pausing uses the cgroups freezer; it does not send the container any information about the state change.
Cleaning up containers and images#
We have accumulated a lot of image layers and container folders on our system.
We must stop all containers that are using an image before removing the image itself
There are times, especially during development cycles, when it makes sense to completely purge all the images or containers from your system
This can be done with:
docker system prune
To also remove all unused images (not just dangling ones), use:
docker system prune -a
To remove all containers on your docker host:
docker rm $(docker ps -a -q)
Delete all images on your docker host:
docker rmi $(docker images -q)
Remove all untagged images:
docker rmi $(docker images -q -f "dangling=true")
Windows Containers#
Check the book
6. Exploring Docker#
Docker version#
$ docker version
Client: Docker Engine - Community
Version: 19.03.2
API version: 1.40
Go version: go1.12.8
Git commit: 6a30dfc
Built: Thu Aug 29 05:26:49 2019
OS/Arch: darwin/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.2
API version: 1.40 (minimum version 1.12)
Go version: go1.12.8
Git commit: 6a30dfc
Built: Thu Aug 29 05:32:21 2019
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: v1.2.6
GitCommit: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc:
Version: 1.0.0-rc8
GitCommit: 425e105d5a03fabd737a126ad93d62a9eeede87f
docker-init:
Version: 0.18.0
GitCommit: fec3683
Server Info#
Use docker info:
$ docker info
Client:
Debug Mode: false
Server:
Containers: 6
Running: 3
Paused: 0
Stopped: 3
Images: 149
Server Version: 19.03.2
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 4.9.184-linuxkit
Operating System: Docker Desktop
OSType: linux
Architecture: x86_64
CPUs: 2
Total Memory: 1.952GiB
Name: docker-desktop
ID: WU2A:7GJU:3UUY:HP3R:P4WD:DSCE:YBRN:TZMM:UAY2:CMA5:3HJH:FTEM
Docker Root Dir: /var/lib/docker
Debug Mode: true
File Descriptors: 54
Goroutines: 67
System Time: 2019-09-30T10:26:31.5423907Z
EventsListeners: 2
HTTP Proxy: gateway.docker.internal:3128
HTTPS Proxy: gateway.docker.internal:3129
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Product License: Community Engine
In most installations, /var/lib/docker will be the default Docker root directory.
Downloading Image Updates#
docker pull ubuntu:latest
This pulls the latest version of an image
Docker won’t automatically keep the local image up to date for you. You’ll be responsible for doing that yourself
It’s always safest to deploy production code using a fixed tag rather than the latest tag. This helps guarantee that you got the version you expected.
Inspecting a Container#
Example: start a container
docker run -d -t ubuntu /bin/bash
Inspect the container with:
docker inspect 422ed51e84fc
Exploring the Shell#
Run an ubuntu 16.04 container with a bash shell as the top level process
docker run -i -t ubuntu:16.04 /bin/bash
Check the processes we are running:
ps -ef
root 1 0 0 10:53 pts/0 00:00:00 /bin/bash
root 11 1 0 10:55 pts/0 00:00:00 ps -ef
This shows the power of Docker: we told it to run bash, and that is the only thing running.
Docker containers don’t, by default, start anything in the background like a full virtual machine would.
Anything you change inside a running container will be lost when the container goes away; add it to the Dockerfile if it needs to persist.
Returning a Result#
Would you spin up a whole virtual machine in order to run a single process and get the result? You usually wouldn’t do that because it would be very time-consuming and require booting a whole operating system to simply execute one command
containers are very lightweight and don’t have to boot up like an operating system
Running something like a quick background job and waiting for the exit code is a normal use case for a Docker container
Kind of like: Function as a Service
If you run the remote command in foreground mode, docker will redirect its stdin to the remote process, and the remote process's stdout and stderr to your terminal.
Docker starts up the container, executes the command that we requested inside the container’s namespaces and cgroups, and then exits, so that no process is left running between invocations
As an example, using the ubuntu image's id: /bin/false always exits with a code of 1, while /bin/true always exits with a code of 0.
docker run --rm 657d80a6401d /bin/false
echo $?
1
and:
docker run --rm 657d80a6401d /bin/true
echo $?
0
You can also get the output:
docker run --rm 657d80a6401d /bin/cat /etc/passwd
Getting inside a running container#
There is the Docker way, with docker exec, and the Linux way, with nsenter.
Docker Exec#
The easiest and best way.
Docker exec with an interactive and pseudo-TTY:
docker exec -it 657d80a6401d /bin/bash
You can check what else is running in the container with:
ps -ef
root@422ed51e84fc:/# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 10:49 pts/0 00:00:00 /bin/bash
root 22 0 0 12:02 pts/1 00:00:00 /bin/bash
root 32 22 0 12:02 pts/1 00:00:00 ps -ef
Whereas if we do the same thing in an LXD container:
lxc exec third -- /bin/bash
there will be a lot more processes:
root@third:~# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 11:59 ? 00:00:00 /sbin/init
root 53 1 0 11:59 ? 00:00:00 /lib/systemd/systemd-journald
root 65 1 0 11:59 ? 00:00:00 /lib/systemd/systemd-udevd
systemd+ 155 1 0 11:59 ? 00:00:00 /lib/systemd/systemd-networkd
systemd+ 157 1 0 11:59 ? 00:00:00 /lib/systemd/systemd-resolved
root 191 1 0 11:59 ? 00:00:00 /usr/lib/accountsservice/accounts-daemon
syslog 192 1 0 11:59 ? 00:00:00 /usr/sbin/rsyslogd -n
daemon 194 1 0 11:59 ? 00:00:00 /usr/sbin/atd -f
root 195 1 0 11:59 ? 00:00:00 /lib/systemd/systemd-logind
root 196 1 0 11:59 ? 00:00:00 /usr/sbin/cron -f
message+ 199 1 0 11:59 ? 00:00:00 /usr/bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation --syslog-
root 204 1 0 11:59 ? 00:00:00 /usr/bin/python3 /usr/bin/networkd-dispatcher --run-startup-triggers
root 213 1 0 11:59 console 00:00:00 /sbin/agetty -o -p -- \u --noclear --keep-baud console 115200,38400,9600 vt220
root 216 1 0 11:59 ? 00:00:00 /usr/sbin/sshd -D
root 217 1 0 11:59 ? 00:00:00 /usr/lib/policykit-1/polkitd --no-debug
root 221 1 0 11:59 ? 00:00:00 /usr/bin/python3 /usr/share/unattended-upgrades/unattended-upgrade-shutdown --wait-for-signal
root 304 0 0 12:01 ? 00:00:00 /bin/bash
root 314 304 0 12:01 ? 00:00:00 ps -ef
systemd is installed and running in the lxc container; in the Docker container it is not present. This shows that LXD containers are meant to act like full VMs, whereas Docker containers are process-based.
You can’t just start an empty container - well you can but it will exit successfully every time.
You can also run additional processes in the background via docker exec -d - but think long and hard before doing so. You will lose repeatability, and it is useful only for debugging.
If you’re tempted to do this, you would probably reap bigger gains from rebuilding your container image to launch both processes in a repeatable way
Nsenter#
Part of util-linux, nsenter allows you to enter any Linux namespace. Namespaces are the core of what makes a container a container.
You can use Docker to install nsenter on the Docker server:
docker run --rm -v /usr/local/bin:/target jpetazzo/nsenter
- this will only work on a Linux Docker host
You should be very careful about doing this! It’s always a good idea to check out what you are running, and particularly what you are exposing part of your filesystem to, before you run a third-party container on your system
With -v, we're telling Docker to expose the host's /usr/local/bin directory into the running container as /target. The container then copies an executable into that directory on our host's filesystem.
More on nsenter in the book.
Docker Volume#
List the volumes stored under Docker's root directory:
docker volume ls
These volumes are not bind-mounted volumes, but special data containers that are useful for persisting data
Create a volume with:
docker volume create my-data
You can start a container with this volume attached to it:
docker run --rm --mount source=my-data,target=/app ubuntu:latest touch /app/my-persistent-data
Delete a volume with:
docker volume rm my-data
Logging#
Logging is a critical part of any production application
You might expect logs to be written to a local logfile, to the kernel buffer (readable via dmesg), or to the systemd journal (readable via journalctl). However, none of these normally work inside a container because of its restrictions.
Docker makes logging easier because it captures all of the normal text output from applications in the containers it manages.
Anything sent to stdout or stderr is captured by the Docker daemon and sent to a configurable logging backend.
docker logs#
The default logging mechanism.
Get logs with: docker logs <container_id>
Nice because you get logs remotely and on demand.
Options for logging:
docker logs 422ed51e84fc --tail 50
docker logs 422ed51e84fc -f
docker logs 422ed51e84fc --since 2002-10-02
The actual files with the logs in are at: /var/lib/docker/containers/<container_id>/
Downsides to this form of logging:
- log rotation
- access to logs after rotation
- disk space usage for high-volume logging
You'll want to make sure you specify the --log-opt max-size and --log-opt max-file settings if running in production.
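For example (the sizes and counts are illustrative, and the image is the one built earlier in these notes), this caps the JSON logfiles at five files of 10MB each:

docker run -d --log-opt max-size=10m --log-opt max-file=5 example/docker-node-hello:latest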
Configurable Logging Backends#
Available drivers include json-file, syslog, fluentd, journald, gelf, awslogs, splunk, etwlogs, gcplogs and logentries.
The simplest for running Docker at scale is syslog.
You can specify this with --log-driver=syslog or by setting it as the default in daemon.json.
The daemon.json file is the configuration for the dockerd server, found in /etc/docker/.
Unless you run json-file or journald, you will lose the ability to use the docker logs command.
Many companies already have a syslog architecture in place, so it is an easy migration path. Newer Linux distributions use systemd and journald.
You should be cautious about streaming logs from Docker to a remote log server over TCP or TLS; both run on top of connection-oriented TCP sessions.
If it fails to make the connection, it will block trying to start the container. If you are running this as your default logging mechanism, this can strike at any time on any deployment.
We encourage you to use the UDP option for syslog logging if you intend to use the syslog driver. That means logs are not encrypted and delivery is not guaranteed, but we err on the side of reliability.
You can log directly to a remote syslog-compatible server from a single container by setting the log option syslog-address similar to this:
--log-opt syslog-address=udp://192.168.42.42:123
Most logging plugins are blocking by default. You can change this setting with --log-opt mode=non-blocking, and then set the maximum in-memory buffer with --log-opt max-buffer-size=4m.
The application will no longer block when that buffer fills up. Instead, the oldest loglines in memory will be dropped.
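Putting the two options together (values illustrative, reusing the earlier example image):

docker run -d --log-opt mode=non-blocking --log-opt max-buffer-size=4m example/docker-node-hello:latest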
Non-plugin Community Options#
- Log directly from your application (bypassing Docker)
- Have a process manager in the container relay the logs (systemd, upstart, supervisord or runit)
- Run a logging relay in the container that wraps stdout/stderr
- Relay the Docker JSON logs to a remote logging framework
Some third-party libraries, like supervisor, write logs to the filesystem.
Writing logs inside the container is not recommended. It makes them hard to get to, prevents them from being preserved beyond the container lifespan, and can wreak havoc with the Docker filesystem backend.
The main drawback of these approaches is that they hide logs from docker logs.
The svlogd daemon can collect logs from your process's stdout and ship them to remote hosts over UDP. It is simple to set up and ubiquitously available.
Another good option is logspout, which runs in a separate container, talks to the Docker daemon and relays logs to syslog.
You can turn off logging with --log-driver=none
Monitoring Docker#
Production systems must be observable and measurable
Docker supports health checks and basic reporting via docker stats and docker events.
docker stats works similarly to top:
CONTAINER ID NAME CPU % MEM USAGE / LIMIT MEM % NET I/O BLOCK I/O PIDS
422ed51e84fc competent_einstein 0.00% 788KiB / 1.952GiB 0.04% 2.26kB / 0B 1.68MB / 0B 1
b8990dd4b1b0 inspiring_ritchie 0.32% 1.539MiB / 1.952GiB 0.08% 2.72kB / 0B 0B / 0B 4
One common problem with running production containers is that overly aggressive memory limits can cause the kernel OOM (out of memory) killer to stop the container over and over again. The stats command can really help with tracking that down
With regard to I/O statistics, if you run all of your applications in containers, then this summary can make it very clear where your I/O is going from the system. Before containers, this was much harder to figure out!
The number of active processes inside the container is helpful for debugging as well. If you have an application that is spawning children without reaping them, this can expose it pretty quickly.
Get stats over http or unix socket:
curl --unix-socket /var/run/docker.sock http://v1/containers/91c86ec7b33f/stats
Container Health Checks#
- A container might never actually enter a healthy state where it could receive traffic
- Production systems also fail and may become unhealthy
Many production environments have standardized ways to health-check applications. Unfortunately, there’s no clear standard for how to do that across organizations and so it’s unlikely that many companies do it in the same way
Following the shipping container metaphor, Docker containers should really look the same to the outside world no matter what is inside the container
The health-check mechanism not only standardizes health checking for containers, but also maintains the isolation between what is inside the container and what it looks like on the outside.
Example Dockerfile:
FROM mongo:3.2
COPY docker-healthcheck /usr/local/bin/
HEALTHCHECK CMD ["docker-healthcheck"]
Contents of docker-healthcheck:
#!/bin/bash
set -eo pipefail
host="$(hostname --ip-address || echo '127.0.0.1')"
if mongo --quiet "$host/test" --eval 'quit(db.runCommand({ ping: 1 }).ok ? 0 : 2)'; then
exit 0
fi
exit 1
The image inherits the entrypoint, default command and port to expose
Build the image:
docker build -t mongo-with-check:3.2 .
Run the container:
docker run -d --name mongo-hc mongo-with-check:3.2
The status will now have the health tag:
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
ff029b988a53 mongo-with-check:3.2 "docker-entrypoint.s…" 35 seconds ago Up 34 seconds (healthy) 27017/tcp mongo-hc
You can configure details about the health checks (example below):
- --health-interval - how often to run the check
- --health-retries - how many failures are required to mark it unhealthy
- --no-healthcheck - disable the health check completely
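A hedged sketch of tuning these at run time (the values are arbitrary; the image is the one built above):

# Check every 10 seconds and tolerate 5 consecutive failures
# before the container is marked unhealthy.
docker run -d --name mongo-hc-tuned \
  --health-interval=10s \
  --health-retries=5 \
  mongo-with-check:3.2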
Docker Events#
The dockerd daemon internally generates an events stream around the container lifecycle.
This events stream is useful in monitoring scenarios or for triggering additional actions, like being alerted when a job completes.
docker events
Can also hit the api with:
curl --unix-socket /var/run/docker.sock http://v1/events
Can also use the --since and --until flags.
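For example (the time window and filters here are arbitrary choices):

# Replay the last hour of daemon events, then keep streaming new ones.
docker events --since 1h
# Show only container die events - handy for spotting crash loops.
docker events --filter type=container --filter event=die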
Graphs and Visualisations#
- DataDog
- GroundWork
- New Relic
- Prometheus
- Nagios
Install cAdvisor:
docker run \
--volume=/:/rootfs:ro \
--volume=/var/run:/var/run:rw \
--volume=/sys:/sys:ro \
--volume=/var/lib/docker/:/var/lib/docker:ro \
--publish=8080:8080 \
--detach=true \
--name=cadvisor \
google/cadvisor:latest
You can access it on http://127.0.0.1:8080, and it also has a REST API.
Prometheus Monitoring#
A popular solution for monitoring distributed applications: Prometheus.
It works on a pull model: it reaches out and scrapes the metrics it needs.
It can also monitor the dockerd service itself.
Provide the --experimental and --metrics-addr= options in daemon.json:
{
"experimental": true,
"metrics-addr": "0.0.0.0:9323"
}
Any time you make a service available on the network, you need to consider what security risks you might introduce
Restart docker:
systemctl restart docker
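Assuming the daemon.json above, you can sanity-check the endpoint before pointing Prometheus at it:

# The daemon's metrics should now be scrapable on port 9323.
curl -s http://localhost:9323/metrics | head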
More info on setting up prometheus in the docs
Exploration#
You can explore some more commands (examples below):
- docker cp - copy files in and out of a container
- docker export - save a container's filesystem to a tarball
- docker save - save an image to a tarball
- docker import - load an image from a tarball
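A few hedged one-liners to go with the list (the container, file, and image names are made up):

# Copy a file out of a running container.
docker cp my-container:/etc/hostname ./hostname
# Snapshot a container's filesystem to a tarball.
docker export -o container-fs.tar my-container
# Save an image, including its layers and metadata.
docker save -o redis.tar redis:5.0
# Import a filesystem tarball as a new image.
docker import container-fs.tar my-imported-image:latest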
Debugging Containers#
It’s also important to have a good understanding of debugging containers before moving on to more complex deployments. Without debugging skills, it will be difficult to see where orchestration systems have gone wrong
It is also critical to understand that your application is not running in a separate system from the other Docker processes. They share a kernel, likely a filesystem, and depending on your container configuration, they may share network interfaces
Despite feeling in many ways like a virtualization layer, processes in containers are just processes on the Docker host itself
Process Output#
docker top <container_id>
We get a list of processes running in the container, ordered by pid.
Some stripped-down versions of Linux run the Busybox shell, which does not have full ps support.
The examples in the book are intense - a bit over my head.
Process Inspection#
Again way over my head - using things like strace, lsof and gdb.
Controlling Processes#
Note that unless you kill the top-level process in the container (PID 1 inside the container), killing a process will not terminate the container itself
Killing processes in a container could confuse developers or, worse, the scheduler - Mesos, Kubernetes, or any other system that is health-checking your application.
Some more signal stuff - over my head.
Unless you are using an orchestrator that can handle multiple containers in an abstraction called a pod, it is best to use some process control in production containers: systemd, upstart, runit or s6.
Try very hard not to run more than one thing inside your container
What the hell does this mean:
There are some special needs in a container for processes that spawn background children—that is, anything that forks and daemonizes so the parent no longer manages the child process lifecycle
- Jenkins build containers are one common example where people see this go wrong.
- When daemons fork into the background, they become children of PID 1 on Unix systems. Process 1 is special and is usually an init process of some kind.
- PID 1 is responsible for making sure that children are reaped. In your container, by default your main process will be PID 1
One solution is installing an init system. Even easier is providing the --init flag to docker run, which will run a very small init system similar to tini.
Tini will run as PID 1.
Whatever you specify in your Dockerfile as the CMD is passed to tini and otherwise works in the same way you would expect. It does, however, replace anything you might have in the ENTRYPOINT section of your Dockerfile.
Example:
docker run -i -t alpine:3.6 sh
/ # ps -ef
PID USER TIME COMMAND
1 root 0:00 sh
6 root 0:00 ps -ef
In this case the CMD we launched is PID 1 - meaning it is responsible for child reaping.
docker run -i -t --init alpine:3.6 sh
/ # ps -ef
PID USER TIME COMMAND
1 root 0:00 /sbin/docker-init -- sh
6 root 0:00 sh
7 root 0:00 ps -ef
In this case PID 1 is /sbin/docker-init.
It’s small enough that you should consider having at least tini inside your containers in production
Network Inspection#
More complicated than process inspection
Docker containers can be connected to the network in a number of ways
By default it uses the default bridge network that docker creates. This is a virtual network where the host is the gateway to the rest of the world.
View networks:
docker network ls
NETWORK ID NAME DRIVER SCOPE
a4ea6aeb7503 bridge bridge local
b5b5fa6889b1 host host local
08b8b30a20da none null local
- Host network: for containers running in host network mode - containers share the same network space as the host
- None network: disables network access entirely
View containers connected to networks with:
docker network inspect <network_id or name>
You can also see each container's IPv4Address and the host network address it is bound to, host_binding_ipv4.
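A quick way to see just the attached containers (a hedged one-liner; the format string is standard Go templating supported by inspect):

# Dump the Containers section of the default bridge network.
docker network inspect bridge --format '{{json .Containers}}'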
Note that if you have containers on different networks, they may not have connectivity to each other, depending on how the networks were configured
In general we recommend leaving your containers on the default bridge network unless you have a good reason not to
More stuff in the book
Image History#
When you’re building and deploying a single container, it’s easy to keep track of where it came from and what images it’s sitting on top of
docker history redis:5.0
An excellent utility when trying to figure out why the size of the final image is much larger than expected
Inspecting a Container#
If you are on a Linux docker host, containers are found in /var/lib/docker/containers.
- These folders contain bind-mounted files and folders - resolv.conf and hostname.
- They also contain the docker logs by default.
- config.v2.json - the json config
- hostconfig.json - networking config
Filesystem Inspection#
A common problem with Dockerized applications is that they continue to write things into the filesystem
Use docker diff:
sudo docker diff ff029b988a53
C /tmp
A /tmp/mongodb-27017.sock
- A - added
- C - changed
Usually you can see things like writes to a log (/var/log/redis/redis.log) or crontab changes.
Exploring Docker Compose#
Docker Compose is for sharing your projects and working with complex projects that require more than one container.
If you’re running a whole stack of containers, however, every container needs to be run with the proper setup to ensure that the underlying application is configured correctly and will run as expected
Docker compose can streamline development tasks.
Configuring Docker Compose#
Shell scripts are OK, but they get big and error-prone.
Docker Compose is typically configured with a single, declarative YAML file for each project
Here is an example docker-compose.yml:
version: '2'
services:
  mongo:
    build:
      context: ../../mongodb/docker
    image: spkane/mongo:3.2
    restart: unless-stopped
    command: mongod --smallfiles --oplogSize 128 --replSet rs0
    volumes:
      - "../../mongodb/data/db:/data/db"
    networks:
      - botnet
  mongo-init-replica:
    image: spkane/mongo:3.2
    command: 'mongo mongo/rocketchat --eval "rs.initiate({ ..."'
    depends_on:
      - mongo
    networks:
      - botnet
  rocketchat:
    image: rocketchat/rocket.chat:0.61.0
    restart: unless-stopped
    volumes:
      - "../../rocketchat/data/uploads:/app/uploads"
    environment:
      PORT: 3000
      ROOT_URL: "http://127.0.0.1:3000"
      MONGO_URL: "mongodb://mongo:27017/rocketchat"
      MONGO_OPLOG_URL: "mongodb://mongo:27017/local"
      MAIL_URL: "smtp://smtp.email"
    depends_on:
      - mongo
    ports:
      - 3000:3000
    networks:
      - botnet
  zmachine:
    image: spkane/zmachine-api:latest
    restart: unless-stopped
    volumes:
      - "../../zmachine/saves:/root/saves"
      - "../../zmachine/zcode:/root/zcode"
    depends_on:
      - rocketchat
    expose:
      - "80"
    networks:
      - botnet
  hubot:
    image: rocketchat/hubot-rocketchat:latest
    restart: unless-stopped
    volumes:
      - "../../hubot/scripts:/home/hubot/scripts"
    environment:
      RESPOND_TO_DM: "true"
      HUBOT_ALIAS: ". "
      LISTEN_ON_ALL_PUBLIC: "true"
      ROCKETCHAT_AUTH: "password"
      ROCKETCHAT_URL: "rocketchat:3000"
      ROCKETCHAT_ROOM: ""
      ROCKETCHAT_USER: "hubot"
      ROCKETCHAT_PASSWORD: "HughTheBot"
      BOT_NAME: "bot"
      EXTERNAL_SCRIPTS: "hubot-help,hubot-diagnostics,hubot-zmachine"
      HUBOT_ZMACHINE_SERVER: "http://zmachine:80"
      HUBOT_ZMACHINE_ROOMS: "zmachine"
      HUBOT_ZMACHINE_OT_PREFIX: "ot"
    depends_on:
      - zmachine
    ports:
      - 3001:8080
    networks:
      - botnet
networks:
  botnet:
    driver: bridge
Use version to tell docker compose which version of the configuration language the file uses:
version: '2'
The rest of the file is divided into services and networks.
networks:
  botnet:
    driver: bridge
Create a single network, named botnet, using the (default) bridge driver, which will bridge the Docker network with the host’s networking stack
The services section tells docker compose what applications you want to launch: mongo, mongo-init-replica, rocketchat, zmachine and hubot.
Each named service tells docker how to build, configure and launch the service.
build:
  context: ../../mongodb/docker
build informs docker compose that it should build this image and that the files needed for the build are in that directory. You can also use the short form: build: ../../mongo/docker
image: spkane/mongo:3.2
image specifies either the tag to apply to the image you build, or an existing image to download and run.
restart: unless-stopped
restart specifies when you want to restart containers.
command specifies the command your container should run at startup. In this case it is mongod with a few command-line arguments:
mongod --smallfiles --oplogSize 128 --replSet rs0
A lot of services have at least some data that should be persisted during development, despite the ephemeral nature of containers. To accomplish this, it is easiest to mount a local directory into the containers.
Use volumes to achieve this:
volumes:
  - "../../mongodb/data/db:/data/db"
The final subsection specifies which network this container should attach to:
networks:
  - botnet
No build with an image tells docker compose to pull and launch an existing docker image.
In environment you can specify environment variables:
environment:
  PORT: 3000
  ROOT_URL: "http://127.0.0.1:3000"
  MONGO_URL: "mongodb://mongo:27017/rocketchat"
  MONGO_OPLOG_URL: "mongodb://mongo:27017/local"
  MAIL_URL: "smtp://smtp.email"
Notice that MONGO_URL does not use an IP address or FQDN. This is because all services are on the same docker network, which is configured so containers can be found by their service name.
This makes rearranging easier and makes the dependencies explicit.
Making use of a .env file is useful for secrets and env variables that change between developers.
depends_on:
  - mongo
depends_on defines a container that must be running before this one is started.
ports defines the ports mapped from the container to the host:
ports:
  - 3000:3000
expose tells docker that we want to expose this port to other containers on the docker network, but not to the underlying host.
It is important to check the official docker-compose documentation.
Launching Services#
Be in the directory containing the docker-compose.yml and check the config:
docker-compose config
If all is good it will print out the config.
You can build all containers with the build command. Services that use a prebuilt image will be skipped.
Then you can start all services in the background with:
docker-compose up -d
Docker Compose prefixes the network and container names with the name of the directory that contains your docker-compose.yml.
Once everything is up, we can look at the logs for all the services with:
docker-compose logs
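You can also scope the logs to a single service and follow them; -f works like tail (the service name comes from the compose file above):

# Follow logs for just the rocketchat service.
docker-compose logs -f rocketchat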
Docker Compose Commands#
You can run docker-compose top to see an overview of the containers and the processes running in them.
You can also do the equivalent of docker exec -it with docker compose:
docker-compose exec mongo bash
docker-compose exec <service_name> bash
You can also use docker-compose start and docker-compose stop.
When you want to tear everything down, run docker-compose down.
The Path to Production Containers#
Getting to Production#
In that old shipping model, randomly-sized boxes, crates, barrels, and all manner of other packaging were all loaded by hand onto ships. They then had to be manually unloaded by someone who could tell which pieces needed to be unloaded first so that the whole pile wouldn’t collapse like a Jenga puzzle.
Shipping containers changed all that: we have a standardized box with well-known dimensions. These containers can be packed and unloaded in a logical order and whole groups of items arrive together when expected
To ship to a single server, the docker cli tooling is enough. To ship to many servers you need more advanced tooling.
- Locally build and test a docker image on your dev box
- Build the official image for testing and deployment from CI
- Push the image to a registry
- Deploy your docker image to your server
As your workflow evolves, you will eventually collapse those steps into a single fluid workflow. Orchestrate the deployment of images and the creation of containers on production servers
A production story must be:
- Repeatable
- It must handle config for you
- It must deliver an executable artifact
Docker's Role in Production Environments#
You can pick and choose the pieces to delegate to docker, to the deployment tool or to larger platforms like Mesos or Kubernetes
Let's look at old components vs. new:

| Production container component | Function | Traditional component |
| --- | --- | --- |
| Application | | |
| Orchestrator (K8s, Swarm) | Service discovery, scheduling, monitoring | Static load balancers, deployment scripts, nagios/centOS, on-call staff |
| Docker engine | Logging | syslog, rsyslog, logfiles |
| Docker engine | Delivery, packaging | Deployment scripts, zip files, tarballs, git clone, scp, rsync |
| Docker engine | Configuration, networking, resource limits, job control | Virtual machines, puppet, chef, init systems |
Your application is on top! Everything else is there to deliver functionality to your application. After all, it’s the application that delivers business value and everything else is there to make that possible, to facilitate doing it at scale and reliably, and to standardize how it works across applications
Job control#
We tell the operating system to run a program and manage its lifecycle with upstart, systemd, System V init or runit.
We also rely on it to keep the application running. Some jobs also require cron.
We want to be able to treat all jobs the same way from the outside.
Docker provides a strong set of primitives around job control - docker start/stop/run/kill - the simplest elements.
Resource Limits#
To set limits we can use cgroups, ulimit, and runtime limits for programming languages.
Virtual machines could set hard limits for a single business application.
It is important to set the limits of CPU, memory and I/O.
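Docker exposes these as run-time flags on docker run; a hedged sketch with arbitrary values:

# Cap the container at 512MB of RAM and 1.5 CPUs, and raise the
# open-file ulimit (soft:hard).
docker run -d \
  --memory=512m \
  --cpus=1.5 \
  --ulimit nofile=4096:8192 \
  nginx:latest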
Networking#
Docker provides lots of configuration options. You should decide on one mechanism to use and standardise across containers - trying to mix them is not an easy path to success.
Configuration#
2 levels:
- Linux environment - we do this with a Dockerfile for a repeatable environment. More traditionally, ansible, puppet or chef was used to supply the dependencies.
- Application configuration - the native mechanism is using environment variables (see the sketch below).
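A minimal sketch of the runtime-injection approach (the variable and image names are made up):

# Inject application configuration at runtime instead of baking it in.
docker run -d -e DATABASE_URL="postgres://db:5432/app" my-app:1.0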
Packaging and Delivery#
We have a standardised and consistent way to package our applications and get them where they need to go (docker push and pull). Application components are created and configured by the developers and handed over to operations to manage.
In more traditional systems we would have built handcrafted deployment tooling, some of which we hopefully standardized across our applications - in a multilanguage environment, this would have been trouble
In your containerized environment, you’ll need to consider how you handle packaging your applications into images and how you store those images
Logging#
This sits on the boundary of docker and platform (k8s, swarm, mesos). Docker can collect logs from your containers and ship them somewhere - usually to the local system. In bigger environments you probably want the platform to handle this.
Monitoring#
Health checking applications in a standardised way - still not fully standardised. In many systems the platform handles monitoring and the scheduler automatically shuts down unhealthy containers.
In older systems, it is generally engineers who are paged, respond to issues, and make decisions about how to handle failed applications. In dynamic systems, this work generally moves into more automated processes that belong in the platform
Scheduling#
How to decide which services run on which servers.
Containers are easy to move around because Docker provides such good mechanisms for doing so
- better resource usage
- better reliability
- self-healing services
- dynamic scaling
In a traditional sense: You often configured a list of servers into the deployment scripts and the same set of servers would receive the new application on each deployment.
One-service-per-server models drove early virtualization in private data centers
Distributed Schedulers#
Let you use it as if it were a single computer. You define some policies about how you want the application to run and let the system figure out how to run it and how many instances to run. If something goes wrong, you let the scheduler start it up again on resources that are healthy. This fits in more with the original vision of docker - running the application without worrying how it gets there.
Zero downtime deployment is done in the blue-green style - ie. 2 production environments managed by a router.
the scheduler is the system that plays Tetris for you, placing services on servers for best fit, on the fly - Kelsey Hightower
Apache Mesos, which was originally written at the University of California, Berkeley, and most publicly adopted by Twitter and Airbnb, is the most mature option
It predates docker; actual scheduling on Mesos is handled by HubSpot's Singularity, Mesosphere's Marathon, Mesos's Chronos or Apache Aurora.
Kubernetes came out in 2014, inheriting a lot of what Google learned from its internal Borg system. It was built on docker.
There are now at least two dozen different commercial distributions of Kubernetes and at least a dozen cloud environments
The Kubernetes ecosystem has fragmented at a really high rate, with lots of vendors trying to stake out their own territory early on
Docker Swarm came out in 2015 and is built as docker native from the ground up.
Orchestration#
- Spotify's Helios
- Ansible's Docker tooling
- New Relic's Centurion
Often the best path to production containers lies in containerizing your applications while running inside the traditional system, and then moving on to a more dynamic, scheduled system
Service Discovery#
Mechanism by which the application finds all the other services and resources it needs on the network
- Rare is the application that has no dependency on anything else.
- Stateless, static websites are perhaps one of the only systems that may not need any service discovery.
In traditional systems, load balancers were one of the primary means for service discovery.
Other means for service discovery in older systems are static database configurations or application configuration files
For the vast majority of systems, service discovery is left to the platform
Existing service discovery methods:
- Load-balancers with well-known addresses
- DNS SRV records
- Round robin DNS
- Multicast DNS
- Overlay networks with well-known addresses
- Gossip protocols
- Apple’s bonjour protocol
- Apache zookeeper
- Hashicorp’s Consul
- CoreOS’s etcd
Often the best initial system (though not necessarily longer-term) is a system where you present dynamically configured load balancers that are easily reachable by systems in your older environment
Examples of the above (oh god what the hell is all this):
- Mesos backend for the Traefik load balancer
- Nixy for nginx and Mesos
- Kubernetes’s ingress controllers
- Standalone Sidecar service discovery with Lyft’s Envoy proxy
- Istio and Lyft’s Envoy
We suggest using the lightest-weight tool for the job
Docker and the DevOps Pipeline#
One of the key promises of Docker is the ability to test your application and all of its dependencies in exactly the operating environment it would have in production
What it can’t do:
- It can’t guarantee that you have properly tested external dependencies like databases
- it does not provide any magical test framework
It can make sure that your libraries and other code dependencies are all tested together
With Docker, you can build your image, run it on your development box, and then test the exact same image with the same application version and dependencies before shipping it to production servers
Typical workflow:
- build is triggered
- build server kicks off a docker build
- image is created on the local docker
- the image is tagged with a build number or commit hash
- a container is configured to run the test suite against the newly built image
- Test suite is run against the container
- The build is marked as passing or failing
- Passed builds are shipped to the image store (registry)
Eg.
- Push to git repo
- Post commit hook triggers build on commit
- Test server runs docker build on remote
- Generating a new image on the remote docker server
- Run a new container based on the image
- Test the container
We run this command:
docker run -e ENVIRONMENT=testing -e API_KEY=12345 -i -t awesome_app:version1 /opt/awesome_app/test.sh
- We set a few environment variables for the container: ENVIRONMENT and API_KEY
- We ask for a particular tag for the image
- We override the CMD to call our test script /opt/awesome_app/test.sh
In some cases you have to override --entrypoint instead.
Always use the precise tag and not latest - to ensure you are testing the right version of the application.
A critical point to make here is that docker run will exit with the exit status of the command that was invoked in the container
If the exit code is 0, it was successful.
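Since docker run hands back that exit status, a build script can gate on it directly. A minimal sketch reusing the fictional image from above:

# Run the test suite; fail the CI job if it fails.
docker run -e ENVIRONMENT=testing -e API_KEY=12345 \
  awesome_app:version1 /opt/awesome_app/test.sh
if [ $? -ne 0 ]; then
  echo "Tests failed" >&2
  exit 1
fi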
There are some great tips on building images from OpenShift.
The final step is taking our passed build and pushing that to the registry.
The registry is the interchange point between builds and deployments.
Critically, our fictional company’s system makes sure they only ship applications whose test suite has passed on the same Linux distribution, with the same libraries and the same exact build settings. That container might then also be tested against any outside dependencies like databases or caches without having to mock them
Outside Dependencies#
Examples: a database, Memcache or Redis - you can solve this with docker compose.
You’ll still need to rely on your own language’s testing framework for the tests
Docker at Scale#
One of Docker’s major strengths is its ability to abstract away the underlying hardware and operating system so that your application is not constrained to any particular host or environment
You can scale not only horizontally but also across cloud providers. A container on one cloud looks like a container on another.
The biggest efforts to implement docker in the public cloud are:
- Amazon Elastic Container Service (ECS)
- Google Kubernetes Engine
- Azure Container Service
- Red Hat OpenShift
Much of the tooling will work equally well in either a public cloud or your own data center.
A first step using built-in tooling might be to leverage Docker Engine’s Swarm mode to deploy containers easily across a large pool of Docker hosts
Something even simpler would be Ansible's docker tooling, New Relic's Centurion, or Spotify's Helios, without the complexity of a full-blown scheduler.
- Docker Swarm mode - used by Portainer
- Kubernetes
- Openshift
- Apache Mesos
Centurion#
Repeatable deployment of applications to a group of hosts.
- Most scheduler platforms treat the cluster as a single machine - with Centurion you instead tell it about each individual host
- Its focus is guaranteeing repeatability and zero-downtime deployments
- A good first step in moving on from a traditional deployment
- It assumes a load balancer sits in front of your applications
Helios is similar, but Centurion is deemed the easiest of the tools.
It uses ruby - more in the book.
Docker Swarm Mode#
After building the docker engine - the runtime for containers - docker engineers turned to orchestrating a fleet of docker hosts.
Originally docker swarm was standalone, but there is another swarm called swarm mode - built into the docker engine.
Swarm mode is meant to replace standalone docker swarm entirely.
Swarm mode has the major advantage of not requiring you to install anything separately.
Docker swarm presents a single interface to the docker client tool, backed by a cluster. In some situations it is possible to start with a simple solution like Swarm and graduate to Kubernetes without needing to change many of the tools that users rely on.
Setting up swarm#
You need 3 ubuntu servers running docker-ce that can talk to each other.
Setup info is in the book.
To see the nodes:
docker node ls
Wow, it looked complicated…more in the book
Amazon ECS and Fargate#
Support for running containers natively has existed since 2014 in Elastic Beanstalk - but that service runs only a single container per instance, which is not ideal for short-lived and lightweight containers.
ECS (Elastic Container Service), EKS (Elastic Kubernetes Service) and Fargate treat containers as first-class citizens.
Fargate is just a name for the features that let AWS automatically manage all the nodes in your cluster.
Setup is in the book
Kubernetes#
- Released to the public at DockerCon 2014
- Grown rapidly and is the most widely adopted platform
- Mesos is the most mature product - launching before docker in 2009
- Great mix of functionality and community
Like Linux - Kubernetes is available in a number of distributions, both free and commercial.
It has some nice tooling like minikube.
Minikube#
A whole distribution of kubernetes for a single instance.
It runs on a virtual machine and has the same tooling as a production system.
It's a bit like docker compose - but it has all the production APIs.
You need 2 tools:
- minikube
- kubectl
Install on MacOS:
brew cask install minikube
Check it is on your path:
which minikube
/usr/local/bin/minikube
You then need kubectl installed with:
brew install kubernetes-cli
Running kubernetes:
minikube start
We installed a virtual machine that has a properly configured version of Docker on it. It then runs all of the necessary components of Kubernetes inside Docker containers on the host.
To see what is happening you can ssh into minikube and check the running containers:
minikube ssh
$ docker container list
Check status of minikube:
$ minikube status
host: Running
kubelet: Running
apiserver: Running
kubectl: Correctly Configured: pointing to minikube-vm at 192.168.99.100
Check the minikube ip:
$ minikube ip
192.168.99.100
Check version:
$ minikube update-check
CurrentVersion: v1.4.0
LatestVersion: v1.4.0
Starting and stopping the cluster:
minikube start
minikube stop
Kubernetes Dashboard#
You access it with:
minikube dashboard
Under the nodes tab, you will see a single node named minikube.
Kubernetes exposes everything with the kubectl command.
Kubernetes Containers and Pods#
There is an abstraction that sits one layer above containers, and that is pods.
A pod is one or more containers sharing the same cgroups and namespaces.
The idea is that you often have applications that need to be deployed together, so the scheduler works with a group of applications, not a single container.
All the containers in a pod can talk to each other on localhost, eliminating the need to discover each other.
The advantage of a pod over a supercontainer is that you can set resources for each application separately and leverage public docker containers to construct your application.
A pod can also share mounted volumes.
Pods are also ephemeral - they can be stopped and rescheduled at any time.
Containers in a pod share the same ip when facing the outside world - they look like a single entity from the network level.
You generally run one instance of a container per pod.
A critical difference is that pods are not created in a build step. They are runtime abstractions existing only in kubernetes.
So you build your Docker containers and send them to a registry, then define and deploy your pods using Kubernetes
You don’t directly describe a pod either - the tools generate it for you.
the pod is the unit of execution and scheduling in a Kubernetes cluster
Let’s Deploy#
A deployment is a pod definition with some health checking and replication
To get things done on kubernetes we use the kubectl command.
kubectl run hello-minikube --image=k8s.gcr.io/echoserver:1.4 --port=8080
But that form is DEPRECATED, so it is better to run one of:
kubectl run --generator=run-pod/v1
kubectl create
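If you just want a deployment, kubectl create deployment is the supported route (the name and image are copied from the example above):

# Create the same deployment explicitly.
kubectl create deployment hello-minikube --image=k8s.gcr.io/echoserver:1.4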
Now we can view what is in our cluster with:
kubectl get all
NAME READY STATUS RESTARTS AGE
pod/hello-minikube-75cb6dd856-rgwhs 1/1 Running 0 2m59s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 14h
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/hello-minikube 1/1 1 1 2m59s
NAME DESIRED CURRENT READY AGE
replicaset.apps/hello-minikube-75cb6dd856 1 1 1 2m59s
Kubernetes created a pod, service, deployment and replicaset.
We now need to tell kubernetes to expose a port:
kubectl expose deployment hello-minikube --type=NodePort
We get a NodePort, which works a lot like EXPOSE would for a container on Docker itself, but this time for the whole deployment
We then ask kubernetes how to get to it:
kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-minikube NodePort 10.109.80.87 <none> 8080:31288/TCP 3m14s
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 14h
but we can't actually get to 10.109.80.87:8080, as it is not accessible from your host system because it is inside the vm running minikube.
So we need minikube to tell us how to access it:
minikube service hello-minikube --url
http://192.168.99.100:31288
The nice thing is this command is command-line friendly:
curl $(minikube service hello-minikube --url)
Delete the service and deployment on minikube:
kubectl delete service hello-minikube
kubectl delete deployment hello-minikube
then all that is left is:
$ kubectl get all
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 16h
Deploying a Realistic Stack#
Kubernetes, much like Docker Compose, lets us define our stack in one or more YAML files that contain all of the definitions we care about in one place
- lazyraster is the service
- cache-data will be the persistent volume
Kubernetes has a huge vocabulary. Some terms:
- PersistentVolume - a physical resource that we provision inside the cluster; supports many kinds of volumes, from local storage on a node to EBS volumes on AWS. Its lifecycle is independent of our application.
- PersistentVolumeClaim - the link between the physical resource of a PersistentVolume and the application that needs to consume it.
We set a policy for a single read/write claim or many.
Kubernetes has a glossary.
Service Definition
apiVersion: v1
kind: Service
metadata:
  name: lazyraster
  labels:
    app: lazyraster
spec:
  type: NodePort
  ports:
    - port: 8000
      targetPort: 8000
      protocol: TCP
  selector:
    app: lazyraster
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cache-data-claim
  labels:
    app: lazyraster
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lazyraster
  labels:
    app: lazyraster
spec:
  selector:
    matchLabels:
      app: lazyraster
  strategy:
    type: RollingUpdate
  template:
    metadata:
      labels:
        app: lazyraster
    spec:
      containers:
        - image: relistan/lazyraster:demo
          name: lazyraster
          env:
            - name: RASTER_RING_TYPE
              value: memberlist
            - name: RASTER_BASE_DIR
              value: /data
          ports:
            - containerPort: 8000
              name: lazyraster
          volumeMounts:
            - name: cache-data
              mountPath: /data
      volumes:
        - name: cache-data
          persistentVolumeClaim:
            claimName: cache-data-claim
There is a lot of info in the book about this. Create the service with:
$ kubectl create -f lazyraster-service.yaml
service/lazyraster created
persistentvolumeclaim/cache-data-claim created
deployment.apps/lazyraster created
we got a service, a persistent volume claim, and a deployment.
To view our volumes we have to ask separately:
kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
cache-data-claim Bound pvc-9dd3c7ca-0c97-43b7-971d-83bc358e096e 100Mi RWO standard 4m24s
There is a kubernetes cheat sheet for easier commands
A ReplicaSet is a piece of Kubernetes that is responsible for making sure that our application is running the right number of instances all the time and that they are healthy
View the url:
minikube service --url lazyraster
To scale a service up we do:
kubectl scale --replicas=2 deploy/lazyraster
deployment.apps/lazyraster scaled
Check that it worked - now we have 2 instances:
$ kubectl get deploy/lazyraster
NAME READY UP-TO-DATE AVAILABLE AGE
lazyraster 2/2 2 2 24m
You can check the logs with:
kubectl logs deploy/lazyraster
To get logs for a specific container:
kubectl get po
NAME READY STATUS RESTARTS AGE
lazyraster-8769c6b7b-gmcww 1/1 Running 0 2m42s
lazyraster-8769c6b7b-qg9zq 1/1 Running 0 26m
then you can get the logs for a specific pod:
kubectl logs po/lazyraster-6b8dd8cb66-rqmfx
Check it out on the dashboard
minikube dashboard
Kubectl API#
There is also a kubectl API
Conclusion Kubernetes at Scale#
There are many options, that overlap and have their own ideas of what production systems should look like.
Underlying all these tools is the docker bedrock.
Advanced Topics#
Docker has good defaults
You should stick to the defaults on your operating system unless you have a good reason to change them
Containers in Details#
Control groups (cgroups), namespaces and SELinux all contain the process.
When running Docker, your computer is the hotel. Without Docker, it's more like a hostel with open bunk rooms. In our modern hotel, each container that you launch is an individual room with preferably only 1 guest in it.
Control groups are like the floor, walls and ceiling of the room - preventing guests from using other rooms' resources. SELinux and AppArmor are like hotel security.
A lot on cgroups, namespaces, security, privileged modes, SELinux, AppArmor, the docker daemon, host networking, configuring networks, storage and the structure of docker in the book.
Container Platform Design#
If, instead of simply deploying Docker into your existing environment, you take the time to build a well-designed container platform on top of Docker, you can enjoy the many benefits of a Docker-based workflow while simultaneously protecting yourself from some of the sharper edges that can exist in such high-velocity projects
The 12 Factor App#
12 practices for designing applications that will thrive and grow in a modern container-based Software-as-a-Service (SaaS) environment
1. One codebase tracked in revision control
A good test might be to give a new developer in your company a clean laptop and a paragraph of directions and then see if they can successfully build your application in under an hour
2. Explicitly declare and isolate dependencies
Dependencies should be defined in the repo and pulled in by the build process.
3. Store configuration in environment variables, not in files checked into the codebase
This makes it simple to deploy the exact same codebase to different environments, like staging and production, without maintaining complicated configuration in code or rebuilding your container for each environment
These configuration items are now an external dependency that we can inject at runtime
For secrets, use Docker secrets or HashiCorp Vault.
4. Treat backing services as attached resources
Applications should handle the loss of an attached resource gracefully
When using containers, you achieve high availability most often through horizontal scaling and rolling deployments, instead of relying on the live migration of long-running process, like on traditional virtual machines
5. Strictly separate build and run stages
6. Execute the app as one or more stateless processes
All shared data must be accessed via a stateful backing store so that application instances can easily be redeployed without losing any important session data
You don’t want to keep critical state on disk in your ephemeral container, nor in the memory of one of its processes
When you must maintain state, the best approach is to use a remote datastore like Redis, PostgreSQL, Memcache, or even Amazon S3, depending on your resiliency needs.
7. Export services via port binding
8. Scale out via the process model
Adding and removing instances is easier than increasing resources.
This is where tools like Docker swarm mode, kubernetes, Mesos and OpenShift begin to shine.
9. Maximise Robustness with fast startup and graceful shutdown
Services should be designed to be ephemeral.
10. Keep development, staging, and production as similar as possible
The same processes, artifacts and people should be used. Repeatability is very important.
11. Treat logs as event streams
Services should not concern themselves with routing or storing logs. Events should be streamed, unbuffered, to STDOUT or STDERR.
12. Run admin/management tasks as one-off processes
One-off administration tasks should be run via the exact same codebase and configuration that the application uses. You should never rely on random cron-like scripts to perform administrative and maintenance functions.
The Reactive Manifesto#
- Responsive - The system responds in a timely manner if at all possible
- Resilient - The system stays responsive in the face of failure.
- Elastic - The system stays responsive under varying workload
- Message-Driven - Reactive systems rely on asynchronous message passing to establish a boundary between components that ensures loose coupling, isolation, and location transparency.
Conclusion#
In traditional software development and deployment: there are many steps. Every step introduces risk.
Problems docker helps solve:
- Large divergence between deployment environments
- Requiring developers to manage configuration and logging
- Outdated release processes requiring multiple hand-offs
- Complex and fragile build processes
- Divergent dependency versions
- Managing multiple Linux distributions
- Building a one-off deployment process for each application
Using the registry as a handoff point eases the communication between operations and development. Docker isolates operations teams from the build process and puts developers in charge of their dependencies.
Docker defines a single artifact as the result of a build.
Docker allows software developers to create Docker images that, starting with the very first proof of concept, can be run locally, tested with automated tools, and deployed into integration or production environments without ever having to be rebuilt
The application that is launched in production is exactly the same as what was tested
A single build step replaces a typically error-prone process that involves compiling and packaging multiple complex components for distribution
Organizations can begin to move away from having to create custom physical servers or virtual machines for most applications, and instead deploy fleets of identical Docker hosts that can be used as a large pool of resources to dynamically deploy their applications to
Developers gain more ownership of their complete application stack
Operations teams are freed from spending most of their time supporting the application and can focus on creating a robust and stable platform for the application to run on
Important Points#
You should build your container image exactly as you’ll ship it to production. If you need to make concessions for testing, they should be externally provided switches, either via environment variables or through command-line arguments
The whole idea is to test the exact build that you’ll ship, so this is a critical point.
Containers and Images#
In Dockerland, there are images and there are containers.
Images:
- Inert, immutable file that's essentially a snapshot of a container
- Created with the build command
- Produce a container when started with run
- Stored in a docker registry
View images with docker image list:
$ docker image list
REPOSITORY TAG IMAGE ID CREATED SIZE
vm-api_web latest 6073bacca5c4 4 days ago 1.08GB
<none> <none> 94fccc848ad4 4 days ago 919MB
Container:
- An instance of an image
- A lightweight and portable encapsulation of an environment in which to run applications
$ docker container list
CONTAINER ID   IMAGE            COMMAND                  CREATED      STATUS      PORTS                    NAMES
22b895f2b5d6   postgres:10.10   "docker-entrypoint.s…"   4 days ago   Up 4 days   5432/tcp                 vm-api_db_1
9832a0d8363e   vm-api_web       "python manage.py ru…"   4 days ago   Up 4 days   0.0.0.0:8000->8000/tcp   vm-api_web_1
Cleaning up#
There is a seemingly constant buildup of untagged images and stopped containers. What do you do?
Remove stopped containers first:
docker rm `docker ps --no-trunc -aq`
Remove untagged images:
docker images -q --filter "dangling=true" | xargs docker rmi
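Newer Docker releases bundle the same cleanup into a single command; exactly what it removes varies by version and flags:

# Remove stopped containers, dangling images, and unused networks.
docker system prune
# Also remove unused volumes (be careful with this one).
docker system prune --volumes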
- Manual config: SSH in and manually install on each server - you have to repeat the effort. Not portable, but minimal overhead.
- Configuration management tools: still need to run on each VM.
- Docker (in between): offers most of the isolation benefits of traditional virtual machines without the overhead of a guest OS.
- Traditional VM: very portable, lots of overhead.
Docker has no guest OS; the docker engine leverages the host OS to provide a virtual environment. Binaries and libraries can be shared across applications.
Running Containers#
Rule of thumb: 1 docker container for each process in your stack
- Single image for load balancer
- Single image for applications
- Single image for database
Benefits of Docker#
There are a lot of things going on in your development environment, but when sharing between developers and deploying, the environment is hard to reproduce.
Emulating the environment is easier, especially in corporates.
Someone has to create the initial environment, and there is a learning curve.
Running docker#
You need the docker toolbox
It uses virtualbox
Getting into a docker container#
Getting in and running commands on the container means getting into its shell with:
docker exec -it <container_id> /bin/bash
or
docker exec -it <container_id> sh