A comprehensive introduction to Docker, virtual machines, and containers

Docker has been a buzzword among tech folks for years now, and the more time passes, the more often you hear about it. It shows up in more and more job requirements, and more and more companies are adopting it. By now it feels like such a fundamental, everyday part of the development world that if you don't know about it, you're behind everyone else.

But seriously, what is this "Docker" thing? Why is everybody so excited about it? What even is it? Can you define it? Is it a desktop app? A CLI tool? A website? A service? Is this thing for production, or is it a dev tool? Both? I've heard it has these things called "images" and "containers", and it's like a virtual machine, but not really a virtual machine. Why do I need it, and what does any of this have to do with that blue whale?

In this article I will try to explain:

  • what exactly "docker" is
  • why you might need it
  • which problems it tries to solve
  • how it differs from a virtual machine
  • when to use it over a virtual machine and vice versa
  • what images and containers are in general
  • and how they are implemented in docker.

I will go through all the concepts in a particular order, so that every topic I explain builds on an understanding of the previous ones. If something doesn't click while you're reading, or something feels vague, just keep reading; it will all make sense in the end. My advice for this article would be to read it twice, to have your "aha moments".

Okay, enough of that, let's get started!

What is Docker?

There are a lot of "docker" names you might hear all over the internet, and for a beginner this can be overwhelming. Let's take a moment and define some of these names, so we at least know which one is which.

  • Docker, Inc.
  • docker engine (community / enterprise)
  • docker for Mac
  • docker for Windows
  • docker client
  • docker host
  • docker server
  • docker hub
  • docker registry
  • docker compose
  • docker swarm
  • docker machine
  • docker daemon

Quite a few dockers here, right? I'll give you a short definition of each of these terms, so you know what they are.

Docker (the company)

Docker, Inc. was co-founded in 2010 by Solomon Hykes (CTO) in San Francisco, and at the time it was called dotCloud, Inc. They ran an enterprise-style PaaS (platform as a service), similar to Heroku. To implement it, they used Linux containers.

In March 2013, at PyCon, Solomon unveiled a new product from dotCloud, Inc. called "docker". The motivation, as he describes it in his talk (the first talk in which docker was ever mentioned), was that people were very interested in Linux containers and in how they could build something with them, but the problem was that Linux containers were very complicated. At dotCloud, Inc. they decided to simplify the use of Linux containers and make them available to everyone, and so the software "docker" was born.

Later in 2013, dotCloud, Inc. announced that it was changing its name to Docker, Inc. and that its main product from then on would be "docker" (the software). They spun off their PaaS business into another company, and the rest is history.

For our purposes, we are mainly interested in the docker software, not in the company itself, but I think it's good to know a little bit of the history behind it.

Docker (the software)

Docker comes in 2 editions: Docker Community Edition (CE) and Docker Enterprise Edition (EE). For development environments and small teams, CE is the way to go, so in this article we won't cover EE. CE is free, and EE is how Docker, Inc. actually makes money.

The docker software consists of 2 separate programs: the docker engine, also known as the docker daemon (because it really is a daemon running in the background), and the docker client.

Engine / daemon

The docker engine is what actually makes Linux containers work; it is, so to speak, the "brain" of docker.

The docker engine is responsible for running processes in isolated environments. For each process it generates a new Linux container, allocates a new filesystem for it, attaches a network interface, assigns it an IP, sets up NAT for it, and then runs the process inside it.

It also handles things like creating and removing images, pulling images from the registry of your choice, and creating, restarting and removing containers, among many other things. The docker engine exposes a REST API, which can be used to control the daemon.
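Because the engine is controlled through a REST API, you can talk to it without the docker CLI at all. A minimal sketch, assuming the daemon is listening on its default Unix socket at /var/run/docker.sock:

    # list running containers by calling the engine's REST API directly
    curl --unix-socket /var/run/docker.sock http://localhost/containers/json

    # list locally stored images
    curl --unix-socket /var/run/docker.sock http://localhost/images/json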

Client

The docker client provides the CLI for controlling the docker daemon. It is just a wrapper around the HTTP API. Basically, the docker client sends API requests to the docker engine, which is the part that actually does all the magic. The docker client and daemon do not have to be on the same machine. You access the CLI with the docker command from your terminal.
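For example, you can point your local client at a daemon running on another machine. A minimal sketch, assuming a daemon is reachable over TCP on a hypothetical host called remote-host on the conventional port 2375:

    # tell the docker CLI to talk to a remote daemon instead of the local one
    export DOCKER_HOST=tcp://remote-host:2375

    docker ps        # now lists containers running on remote-host, not on your laptop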

Host

A docker host is a computer on which the docker daemon is running. It is sometimes also called a docker server.

Hub

Docker hub is a docker image registry provided by Docker, Inc. itself. It lets users push images to their repositories, make them public or private, and pull various images, all via the docker client CLI.

There are images for pretty much everything, made by other people or companies: every language, every database, every version of them. It's like GitHub for docker images. There are docker image registries offered by other companies too, such as Quay, Google Container Registry and Amazon Elastic Container Registry. Alternatively, you can host your own docker registry.
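In day-to-day use this boils down to a handful of client commands. A sketch of the typical flow (the image name my-app and the <username> placeholder are assumptions, not from the original article):

    docker pull mongo                        # pull a public image from docker hub
    docker login                             # authenticate with your docker hub account
    docker tag my-app <username>/my-app      # tag a local image under your repository
    docker push <username>/my-app            # push it to docker hub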

Registry

The docker registry is a server-side application that lets you host your own docker repository. It is distributed as an image hosted on docker hub. To get it working, you pull an image called "registry" from docker hub and spin up a container from it. A docker host running a "registry" container is now a registry server.
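A minimal sketch of that setup (port 5000 is the registry's default; the image name my-app is a placeholder):

    # pull the "registry" image and spin up a container from it
    docker run -d -p 5000:5000 --name registry registry

    # tag and push an image to your own registry instead of docker hub
    docker tag my-app localhost:5000/my-app
    docker push localhost:5000/my-app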

For Mac

Docker for Mac is a separate piece of software from docker, provided by Docker, Inc., that simplifies developing with docker on macOS. The package includes the docker client, a full virtual machine running on macOS's native HyperKit hypervisor, the docker daemon installed inside that machine, and the docker-compose and docker-machine orchestration tools. Exposed container ports are automatically forwarded from the VM to localhost.

For Windows

Docker for Windows is the same thing, configured specifically for Windows. It uses Hyper-V (Windows 10's native virtualization solution) as its virtualization software, and it also lets you run Windows containers alongside Linux containers.

Machine

Docker machine is an orchestration tool that lets you manage multiple docker hosts. It allows you to provision several virtual docker hosts, locally or in the cloud, and manage them with docker-machine commands. You can start, restart and inspect managed hosts. You can point the docker client at one of the hosts and then manage the daemon on that host directly. There are many ways to manage docker hosts with this tool; just take a look at the CLI reference.
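A typical local workflow looks roughly like this (the host name my-host is a placeholder):

    # provision a new docker host as a local VirtualBox VM
    docker-machine create --driver virtualbox my-host

    docker-machine ls                  # list all managed hosts

    # point the docker client at the daemon running on that host
    eval $(docker-machine env my-host)
    docker ps                          # now talks to the daemon on my-host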

Compose

Docker compose is also an orchestration tool for docker. It lets you easily manage multiple containers that depend on each other within a single docker host, via the docker-compose CLI. You use a YAML file to configure all the containers. With one command you can start all the containers in the right order and set up networking between them. Here is the reference.
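As a sketch, a docker-compose.yml for an app with a database dependency might look something like this (the service names and ports are assumptions, not from the original article):

    version: "3"
    services:
      web:
        build: .            # build the app image from the Dockerfile in this directory
        ports:
          - "3000:3000"
        depends_on:
          - db              # make sure db is started before web
      db:
        image: mongo        # pull the official mongo image from docker hub

Running docker-compose up then starts both containers in the right order and puts them on a shared network.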

Swarm

Docker swarm is yet another orchestration tool, aimed at managing a cluster of docker hosts. While docker-compose manages multiple docker containers within one docker host, docker swarm manages multiple docker hosts, each of which manages multiple docker containers.

Unlike docker-compose and docker-machine, docker swarm is not a standalone piece of orchestration software. Swarm mode is built into the docker engine and is controlled via the docker client.

To create a swarm, you ssh into a machine you intend to make the swarm manager and run docker swarm init --advertise-addr <manager-ip>. This command makes the machine reachable on the advertised IP. Other docker hosts can now join the swarm on this IP.
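A minimal sketch of the commands involved (the IP is a placeholder; the join token is printed by the init command):

    # on the machine that will become the swarm manager
    docker swarm init --advertise-addr <manager-ip>

    # the command above prints a "docker swarm join" command with a token;
    # run it on every other docker host that should join the swarm:
    docker swarm join --token <token> <manager-ip>:2377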

Summary

Okay, so what did we learn so far?

Docker is not a single piece of standalone software; it's a platform for managing Linux containers. Whenever someone mentions docker in the context of software, they are talking about docker CE or docker EE.

Docker is developed by Docker, Inc to simplify the usage of Linux containers. The platform consists of multiple tools for running and managing Linux containers, which include:

  • Docker daemon/engine is responsible for creating and running Linux containers.
  • Docker client is a separate application which controls the docker daemon through its REST API.
  • Docker-compose, docker-machine, and docker swarm are orchestration tools; they are not necessary for running processes inside Linux containers, but they make container management very simple. To be frank, in real-life scenarios they are pretty much a necessity, because managing all those containers, hosts and clusters of hosts manually is…well, let's say it's a bad business strategy.
  • Docker hub is a service that provides a registry of docker images. We can store our images on the docker hub and pull images made by others for us to use.
  • Docker registry allows us to host our own private registry in case we don’t want to use an existing one.
  • Docker for Mac and Docker for Windows are separate tools that simplify developing with docker on Mac or Windows.

If you are a beginner, it’s ok if you don’t understand everything mentioned above 100%. Some things might be vague, you might have some questions, and that’s normal. I did mention images and containers multiple times but did not explain what they are.

This section is intended to help you navigate between all those names, remove uncertainty, and understand what is what, so you don't get overwhelmed when hearing all those different "docker" names.

With this said, I think that based on what we've learned so far, you should be able to more or less understand the following picture:

As you can see, docker client and docker daemon are on different machines here, so this might answer some of your questions like…

Why did they split docker into a client and an engine? Why didn't they make the CLI control the engine directly, instead of going through the REST API?

Well, because it allows the client and the engine to be on different machines, so multiple different hosts can be managed from one computer.

With all those things clarified, we can dive deeper.

Virtual Machines

Hey, hey, wait a minute — are we talking about docker here or what?

Yes, we are, however at some point in learning docker a natural question will emerge:

What is the difference between VMs and Containers, and why would I use one over another?

Everybody who learns docker goes through this, and I think we might as well go through it now and get it out of the way.

There is a lot out there about how virtual machines work under the hood. We can’t go over all the details in this article, but I will explain just enough so that you understand the difference between VM’s and Containers.

Every computer, ever, be it the gigantic web server running Linux or your overpriced iPhone X, has 4 essential physical components:

  • Processor (CPU),
  • Memory (RAM),
  • Storage (HDD / SSD),
  • The network card (NIC).

The main task of any operating system is to basically manage those 4 resources. The part of the operating system that does this is called the Kernel, also referred to as the Core.

The kernel, simply put, is the part of the OS that controls the hardware. The kernel controls drivers for different IO devices such as a mouse, keyboard, headphones, microphone, etc. The kernel is the first program loaded when the computer is turned on, right after the bootloader, and it then handles the rest of the startup process. Most of the time it takes to turn on the computer is because of the Kernel.

Each operating system has its own implementation of the kernel, but in the end they all do the same thing: they control the hardware.

So how is it possible to run one OS inside another? Essentially what we need is a program that enables the Guest OS (the operating system that is running inside another operating system) to control the hardware of the Host OS (an operating system that has a guest OS running inside of it).

Hypervisor

The hypervisor, also referred to as Virtual Machine Manager (VMM), is what enables virtualization (running several operating systems on one physical computer). It allows the host computer to share its resources between VMs.

There are 2 types of Hypervisors:

Type 1, also called “Bare Metal Hypervisor”

This software is installed right on top of the underlying machine's hardware (so, in this case, there is no Host OS, there are only Guest OSs). You would do this on a machine whose whole purpose is to run many virtual machines.

Type 1 hypervisors have their own device drivers and interact with the hardware directly, unlike type 2 hypervisors. That's what makes them faster, simpler and hence more stable.

Type 2, also called “Hosted Hypervisor”

This is a program that is installed on top of the operating system. You are probably more familiar with it, like VirtualBox or VMware Workstation. This type of hypervisor is something like a “translator” that translates the guest operating system’s system calls into the host operating system’s system calls.

The system calls (syscalls) are a way in which a program requests a service from a Kernel, and the Kernel does — remember what? It manages underlying hardware.

For example, in your program, say you want to copy the content of one file into another. Pretty straightforward right? For this, you need to take some bytes from one part of your Hard Disk and put them into another part. So basically, you are doing stuff with a physical resource, the Hard Disk in this example, and you would need to initiate a system call to do this. Of course in all programming languages, this is abstracted away from you, but you get the point.
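If you want to see this in action on Linux, here is a hedged sketch using strace (the file names are placeholders; strace must be installed):

    # trace the system calls issued while copying a file:
    # every open, read and write of the file's bytes is a request to the kernel
    strace -e trace=openat,read,write,close cp notes.txt backup.txt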

Since all OS Kernels, despite being implemented in different ways, do the same job (control hardware), we just need a program that will “translate” a guest OS’s system calls to control the hardware.

An upside of a Type 2 hypervisor is that in this case we don't have to worry about the underlying hardware and its drivers. We really just need to delegate the job to the host OS, which will manage this stuff for us. The downside is that it creates a resource overhead, and the multiple layers sitting on top of each other make things complicated and lower the performance.

Containers

Virtual machines are not the only virtualization technique. In the case of a virtual machine, we have a full-blown virtual computer, in its entirety, with its own dedicated Kernel. We allocate RAM for it, we allocate disk space for it, and we interact with it as if it were a standalone computer.

There are several problems with this. First and most obvious is inefficient resource management. Once you allocate some resources for a VM, it’s going to hold onto them as long as it’s running.

For example: if you allocate 4 GB of RAM and 40 GB of disk space for a VM, once you run it, those resources will be unavailable for as long as the VM is running. It might only need 1 GB of RAM at some moment, and you might be lacking RAM for some other process in another VM or on the host machine. But since it has this amount of RAM allocated, it's just going to sit there unused.

Another problem is boot up time. Since the VM has its own Kernel, in case you need to restart your machine, it will need to boot up an entire Kernel. While the machine is rebooting, your service that was running in VM will be unavailable.

Containers to the rescue

To put it simply, a container is a virtual machine without a Kernel. Instead, it is using the Kernel of a host operating system. To make this possible, we need a set of software and libraries that will allow containers to use the underlying OS Kernel, and sort of “link” them if you wish. Such libraries are, for example, “liblxc” and “libcontainer” (this last one is developed by Docker, Inc and is used inside docker engine).

Containers have their own allocated filesystem and IP. Libraries, binaries, services are installed inside a container, however, all the system calls and Kernel functionality comes from the underlying host OS.

Containers are very lightweight. Boot up and restart happens very fast because they don’t need to start up the Kernel every time. They don’t waste physical resources since they don’t need them to be allocated for their Kernel, as they don’t have a separate Kernel.

One drawback is that it's only possible to run containers of the same type as the underlying OS. You can't run Linux containers on Windows or Mac, because they need the Linux Kernel to operate. The solution for Mac and Windows users is to install a type 2 hypervisor such as VirtualBox or VMware Workstation, boot up a Linux machine, and then run Linux containers inside of it (in fact, that's what Docker for Mac and Docker for Windows do, but they use the native hypervisors that come with the respective OS).

Setting up and running Linux containers is not that straightforward. It’s troublesome and requires a decent knowledge of Linux. Managing them is even more tedious.

As I’ve mentioned above, what Docker, Inc does is it makes Linux containers easy to use and available to everybody, and you do not have to be a Linux geek to use Linux containers nowadays thanks to docker.

Containers VS Virtual Machines

From the previous section about containers, you might think that containers are just better virtualization solutions than VM’s, but that’s not how it is.

A container's purpose is to run processes in an isolated environment; in docker's case, one container per process. VMs are for emulating an entire machine. Nowadays only Linux and Windows containers exist, but there are all kinds of hypervisors that can emulate any kind of operating system. You could run Windows 10 inside an iPad if you wished. These are two different technologies, and they don't compete with each other.

VMs are more secure, since containers make system calls directly to the host Kernel. This opens up a whole variety of vulnerabilities.

Some low-level software that messes with a Kernel directly should be sandboxed inside a virtual machine.

Often you can see docker containers running inside virtual machines in the production environment, so VM’s and containers actually stick together very well.

Docker images and containers

Docker introduces several concepts that simplify…or, I would rather say, revolutionize the usage of Linux containers.

Linux containers in docker are made from templates called "images". An image is basically a binary file that holds the state of a Linux machine (without the Kernel, of course). You can draw a parallel to a VM's disk images, such as .vdi, .vmdk or .vhd files.

Docker's approach to images is different from a VM's. With a VM you would just mount a disk image, run the VM, and you would have a running instance of a machine. Whenever you modify the filesystem in the VM, install or remove anything, all of this is reflected in the image you've mounted. The image is basically the Hard Disk of the machine.

In docker, images are read-only. You don't run images directly; instead, you make a copy of an image and run that. This running instance of an image is called a container. This way you can have several instances of the same Linux container running at the same time, made from the same template, that is, the image. Whatever happens inside a container does not affect the image it was made from. You can make as many containers from an image as your hardware allows you to run.
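A small sketch that makes this concrete, assuming the stock ubuntu image from docker hub (the container ids in the prompts are illustrative):

    docker run -it ubuntu /bin/bash        # start a container from the ubuntu image
    root@1a2b3c:/# touch /i-was-here       # change the container's filesystem
    root@1a2b3c:/# exit

    docker run -it ubuntu /bin/bash        # a brand new container from the same image
    root@4d5e6f:/# ls /i-was-here          # the file is gone: the image was never modified
    ls: cannot access '/i-was-here': No such file or directory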

Merge images via Union Mount

For creating and storing images, docker uses a union filesystem. It's a service available in Linux, FreeBSD, and NetBSD. Union filesystems allow us to create one filesystem out of multiple different ones by merging them all together. The contents of directories that have the same path are seen together in a single merged directory. The process of merging is called "union mounting".

This is roughly how it works:

There are 3 layers that come into play: base layer, overlay, and diff layer.

When merging 2 filesystems, the process looks something like this (keep in mind I’m oversimplifying here):

So we have a base filesystem, and we want to introduce some changes: add files/folders, remove files/folders.

First, we will create an overlay filesystem (empty at this point) and a diff filesystem (also empty at this point). Then we will union mount those filesystems using the union filesystem service built into Linux. Looking into the overlay filesystem gives us a view of the base filesystem. We can add stuff to it and remove stuff from it, while the actual base filesystem remains unaffected. Instead, all changes made to the overlay filesystem are stored in the diff filesystem. The diff filesystem shows the difference between the base and overlay filesystems.

After we’re done editing the overlay filesystem, we will unmount it. In the end, we have the merged filesystem of overlay and base layers, and the actual base filesystem is unaffected.

This is exactly how docker images are “stacked” on top of each other. Docker uses this exact technology to merge image filesystems.
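On Linux you can try a union mount yourself with overlayfs, which is also what docker's overlay2 storage driver builds on. A rough sketch (the directory names are placeholders, and this requires root):

    mkdir -p /base /diff /work /overlay

    # union-mount the read-only base layer and the (initially empty) diff layer;
    # the merged view appears under /overlay
    sudo mount -t overlay overlay -o lowerdir=/base,upperdir=/diff,workdir=/work /overlay

    # anything created or changed under /overlay is written to /diff,
    # while /base is never touched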

In order to create your own image on top of an already existing image, you need to write a Dockerfile. This is a text file with a set of instructions on how to build an image. Take a look at this simple example.
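The example was originally shown as an image; a minimal sketch consistent with the lines discussed below might look like this (the RUN and EXPOSE values are my assumptions, the rest follows the walkthrough):

    FROM nodesource/trusty5.1
    WORKDIR /app
    ADD . /app
    RUN npm install      # assumed build step, executed inside the image at build time
    EXPOSE 3000          # assumed port, purely documentational
    CMD npm start        # command executed in the container on startup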

Inside the terminal run:

docker build <path to the Dockerfile directory>

This command will build an image based on the instructions given in Dockerfile.

First line: FROM nodesource/trusty5.1

This line indicates that the base layer of this image is another image called nodesource/trusty5.1. By default, docker will first look for this image locally. If it's not there, it will pull the image from docker hub, or from any other docker image registry for that matter; you just need to configure the docker client to look for images in that registry.

Second line: WORKDIR /app

This line tells docker that all the subsequent commands executed via RUN in Dockerfile will be executed from /app.

Third line: ADD . /app

This line tells docker which filesystems to merge on build. In this example, we see that the overlay layer is the current directory, relative to Dockerfile, and the base layer is /app inside nodesource/trusty5.1 (an image).

The base filesystem’s sub filesystem /app will be merged with an overlay filesystem. If /app filesystem does not exist in the base layer, it will be created as an empty folder.

The RUN command executes a command inside the image while building it, via the default shell /bin/sh:

RUN <command>  ===  /bin/sh -c <command>

The EXPOSE command serves as documentation, so a user can see which port the application is using. It's not strictly necessary.

CMD specifies the command that will run on startup in any container created from this image.

In this example, nodesource/trusty5.1 is an Ubuntu image with nodeJs 5.1 installed inside of it. Inside the ./app directory relative to the Dockerfile we have a nodeJs application. When merging them, we'll get an image of Ubuntu with nodeJs 5.1 installed and our application inside of it, in the /app directory.

We can then spin up as many containers as we want from this template. Every container will execute npm start inside /app directory of a container on startup.

Docker containers

Docker containers, as you already know, are running copies of an image. One additional thing that docker does when creating a container from an image is that it adds a read-write filesystem over the image’s filesystem because the image’s filesystem is read-only.

Docker containers are a bit different from usual Linux containers. Docker containers are made specifically to run a single process in the isolated environment of a Linux container. That's why we have CMD in the Dockerfile, which indicates which process this is going to be. The docker container will be automatically terminated once there is no process running inside of it.

Docker containers are not supposed to maintain any state, so you can't ssh into your docker container (well, technically you can, but don't). You should not have it running several processes at once, like, for example, a database and the app that uses it. In that case, you would use 2 separate containers and make them communicate with each other. Docker containers are a specific use case of Linux containers for building loosely coupled, stateless applications as services.

Inter-Container communication

As I’ve mentioned above, every container should only be running one process. So perhaps now a natural question will emerge: if for example, my app is running in one container and the database is running in another, how do I connect from my app to a database that is running in another container? You can’t connect to localhost in this case.

Docker introduced networking for standalone containers. A very high-level overview of network usage looks like this: you create a new network, which creates a subnet for this network alone. You start a container and attach it to this network, and all containers attached to the same network will be able to ping each other, as if they were on a LAN. Then you can connect from one service running in one container to a service running in another one, as long as they are on the same network.

Okay, so what does it look like?

Run docker network create myTestNetwork:

You can list all available networks by running docker network list:

Run docker network inspect myTestNetwork to see the network subnet and which containers are currently attached to it:

As you can see, it shows the network's subnet and default gateway, and we also see there are no containers attached to it.

Now I'm going to create 2 containers from 2 different images, nodejsapi and mongo, and run them. The --net option indicates which network to use:
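The original commands were shown as a screenshot; a sketch of what they would look like (nodejsapi is the author's own application image):

    # create and start both containers, attached to the network created above
    docker run -d --net=myTestNetwork nodejsapi
    docker run -d --net=myTestNetwork mongo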

docker run <image> creates a container from an image and starts it. Now I'll inspect the network again:

As you can now see, 2 containers are running attached to this network. We can also see the IPs they are using and that they are running on the same subnet. I should be able to ping one container from another now.

Let’s get an IP of one of the running containers:
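The output was originally a screenshot; the command itself, using the container id from the author's example, is:

    # run ifconfig inside the running container and print the output in our terminal
    docker exec 8d3aaca5750f ifconfig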

Here I’ve executed the ifconfig command inside a container with id 8d3aaca5750f and redirected output to my terminal.

The IP happens to be 172.19.0.2.

So from this container, I should be able to ping the other one with an IP of 172.19.0.3:
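Again, the output was a screenshot; the command is a sketch along these lines:

    # ping the second container from inside the first one (send 3 packets and stop)
    docker exec 8d3aaca5750f ping -c 3 172.19.0.3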

This was just a simple example of docker networks. There is much more to it, so check out the official documentation.

Volumes

As I've said before, docker containers are not supposed to maintain any state. But what if we need state? In fact, some processes are inherently stateful, like a database. For example, a database needs to maintain all the files with its data, as that's the purpose of a database. If we store this data inside a container, then when the container is gone, so is the data. Additionally, we can't share this data between multiple instances of the container.

To solve this problem, docker introduced volumes. Volumes allow us to store data on the host machine, or on any other machine for that matter, even in the cloud, and to link the container (or several containers) to this storage.

For example, previously you could see how I created a container from a MongoDB image and ran it using this command:

docker run -d --net=myTestNetwork mongo

When running a container like this, Mongo DB will run inside this Linux container, and save database files under the /data/db directory inside the container.

Now consider this:

docker run -d -v /folder-on-host-machine/data/db:/data/db --net=myTestNetwork mongo

The -v flag mounts a volume into a container, so now the data in the host folder /folder-on-host-machine/data/db and in the container's /data/db will be synchronized. Now we can potentially run several instances of a MongoDB container and link them all to this volume on the host machine. If one of the instances shuts down, another one is still available, and the data is not lost, because it is stored on the host machine, not inside a container. The container itself is stateless, as it should be.

There is much more to learn about volumes, like details and use cases, but we won't cover them in this article. Here I just explained what they are and why we need them.

Final words

So this is Docker, in a nutshell! It's an amazing technology that revolutionizes how we develop, deploy and scale our applications. Here we have just scratched the surface; the rest is yours to discover.

Any constructive feedback is appreciated.

If you made it this far, please give me some “claps” :)