Containers: an option to ease an effective way of developing software

Around 2003 or 2004 I first learned about virtualization technologies, thanks to a colleague who told me about them and about the VMware products, which at that time seemed very good to me and let me practice and learn several topics. Some time later I discovered VirtualBox, which has been very useful in my workflow, especially as a software developer.

Then, between 2013 and 2014, I came across Vagrant, a very good tool that eases the creation, configuration and disposal of development environments. Such environments are declared in a text file that can be managed with version control software, so it's easy to share them with other developers or deploy them on other computers, saving time and effort by building automated, repeatable environments, especially for testing software. All of this helps to raise the quality of the software being developed (or at least that's the intent).

A key concept in the Vagrant way of working is the provider, the workhorse that actually makes those environments available. VirtualBox has been my preferred provider, or at least it was until I started to experiment with container technologies.

Before going further into containers, I'd like to highlight the main thing I don't like about virtual machines as Vagrant environment providers: a lot of precious time is wasted, along with concentration, while waiting for Vagrant to deploy one or more of those environments. When you are developing software it's very important to keep a mental track of what is being created, the problem being solved or the error being debugged, so a wait of three or more minutes can break that fragile mental thread and prolong the total time needed for a task, especially if those waits are repeated several times during a workday. What's the reason for that waste? The answer is simple: a very high price is paid to keep the virtualization running, because several layers have to be recreated: one for hardware abstraction, another for a full operating system and, above both of them, the layer truly needed for the software being developed (libraries, data, language runtimes, etc.). That price is paid mainly in development time and performance: the more virtual machines running at the same time, the greater the performance degradation of the host system, sometimes to impractical levels, even with relatively modern and powerful hardware. And what about removing virtualization entirely as the way to provide development environments? This is where containers come in...

For about ten years I've had a superficial knowledge of the concept of containers, which I intuitively summarize as "an operating system bounded within another operating system, limited to using certain resources under certain privileges defined by the host operating system containing it". In 2014, however, I stumbled upon Docker, a container-based technology surrounded by an ecosystem and a platform that have become popular, capitalizing in a way that other similar and earlier technologies have not. But 2014 ended and my only progress was with the concepts behind Docker. Then, in February 2015, I learned about a requirement to be solved with Docker containers, so I said to myself, "this is a nice opportunity to get down to work". And it has been!

These are my main impressions of working for the first time with version 1.5 of Docker:

  • Docker requires an Intel or AMD 64-bit processor and a 64-bit operating system. I had a 64-bit computer running the 32-bit version of Ubuntu 14.10, so I had to reinstall it with the 64-bit version.
  • Container technologies in general, and Docker in particular, only work with operating systems of the same family: for example, an operating system with a Linux kernel (whichever the distribution) containing other operating systems with the same Linux kernel (again, whichever the distribution). This doesn't affect me, because I use GNU/Linux distributions (mainly Ubuntu or Debian) on my own computers to develop software that runs on platforms of the same type.
  • Docker requires a recent version of the Linux kernel (for my tests I'm using version 3.16).
  • The Docker version present in the official repositories of both Ubuntu and Debian is not the newest one, and considering that Docker is evolving at a fast pace, it's better to install it from the alternative repository detailed in the official documentation.
  • The three main components of Docker are: the docker daemon, the docker client (a command line interface) and the registry used to publish the images that will later be used to create containers on the local computer. The server and the client live in the same executable file (on Ubuntu it's /usr/bin/docker) and it behaves one way or another according to the arguments it receives; the server can listen on a local UNIX socket or on a TCP port, offering two interfaces: one for command line execution and another as a RESTful API (HTTP or HTTPS). It's possible to create a Docker Hub account to upload public images that will be available on the Internet; it's also possible to deploy a private registry (to be used inside a company, for example), but the web interface of the official public registry will not be available. (A short sketch of this client/daemon/registry workflow appears after this list.)
  • It's practically mandatory to be constantly connected to the Internet (the faster the connection the better), because operating system images frequently need to be downloaded from the public registry. Those images can be official (reviewed by the Docker team or by trusted partners) or provided by the community (no formal review, no warranty). Once the images are downloaded it's possible to work offline, because containers created from those images only require the image layers already stored locally (although if a container needs additional external software installed, being online is again required).
  • Using containers is really fast! Compared with virtual machines, we are talking about minutes of waiting reduced to a few seconds (less than five) or even milliseconds. Note: this comparison is only valid after the operating system image has been downloaded from the Docker registry on the Internet.
  • The overall performance of the host computer is much better, because it's relieved from assigning resources (especially RAM and CPU) to keep one or more virtual machines running.
  • The storage capacity of the computer is not wasted. If two containers are created from two different images then both images will be fully stored, but if the two containers are based on the same image, only the storage needed for a single image is used. Note that images are built from read-only file system layers, logically stacked, with each layer uniquely identified to avoid duplication; then, when a container is created, Docker adds another new layer on top, this time a read-write one that is used as a kind of workspace (this can be inspected with the commands sketched after this list).
  • The philosophy of application deployment when using Docker is very different from what I usually apply. For example, instead of assigning a server (bare metal or virtualized) with a full operating system installed plus the database server, the web server and the application code, the recommended approach with Docker is to assign a computer (bare metal or virtualized) as the docker host, which will provide several containers: one container running the database server as its only process, another container running the web server as its only process, another container providing a data volume with the application code and files, and finally the links between those containers as required (a sketch of this layout also follows the list).
  • With Docker, or with any other container technology, it's easier for me to keep the installed packages and running services on my machine to a minimum. If I need to develop a new application or try a new programming language, I can create a new container, install every required piece of software in it and then link it to a data volume holding the versioned source code and, if needed, to another volume with some database. Once the goal is achieved, the versioned source code can be kept and all of the containers (and images) used can be deleted!
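
For reference, this is roughly the client/daemon/registry workflow mentioned above, as I run it from the command line; the image and account names are placeholders, and the exact output varies between Docker versions:

    # Pull an official image from the public registry (Docker Hub)
    docker pull ubuntu:14.04

    # List the images stored locally
    docker images

    # Ask the daemon, through the client, to create and start a container
    docker run -it --name sandbox ubuntu:14.04 /bin/bash

    # List containers (-a includes the stopped ones)
    docker ps -a

    # Publish a custom image under a Docker Hub account (placeholder name)
    docker login
    docker push myaccount/myimage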
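
The layered storage can also be observed directly; this is a small sketch, assuming the ubuntu:14.04 image I used in my tests:

    # Show the stacked read-only layers that make up the image
    docker history ubuntu:14.04

    # Two containers created from the same image share its layers;
    # each one only adds its own read-write layer on top
    docker run -d --name c1 ubuntu:14.04 sleep 600
    docker run -d --name c2 ubuntu:14.04 sleep 600

    # List what has been written in a container's read-write layer
    docker diff c1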
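
And this is a rough sketch of the deployment layout described in the last two points: one container per service plus a data volume container, wired together with links. The image names, port and path are placeholders, not a recipe:

    # Data volume container holding the application code (placeholder path)
    docker create -v /srv/app --name app-data ubuntu:14.04 /bin/true

    # Database server running as the only process of its container
    docker run -d --name db postgres

    # Web server container: mounts the code volume and is linked to the database
    docker run -d --name web \
      --volumes-from app-data \
      --link db:db \
      -p 8080:80 \
      nginx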

But with Docker not everything is rosy; there are some weak points:

  • On the operating system acting as the docker host, root privileges are required, directly or indirectly (with sudo or through group membership), to run every docker command (when I say every I mean every). In my local environment, inside an intranet and behind a firewall, the risk of suffering an attack may be low, but when deploying an application to a production environment, perhaps exposed to the Internet, things change a lot and more study, evaluation and testing will be needed. Also, the root user is used inside the containers (at least in the few images I've worked with).
  • The network addressing between the host system and the containers is completely dynamic and changes every time a container is started. Assigning a fixed or static IPv4 address is not possible (I read some unofficial instructions to achieve it, but they seemed too complicated, so I didn't apply them). The sketch after this list shows how I look up the address a container actually received.
  • I'm not clear about where Docker Inc, the main force behind the Docker platform, is heading. Docker Inc built its fame and success on LXC, an earlier, lesser-known, pioneering project for working with containers that aims to standardize this kind of technology, but Docker Inc replaced LXC as the default execution format used when a container is created in favor of its own libcontainer. This is not necessarily a bad thing, but I would prefer the adoption and promotion of standards instead of being tied to Docker's own implementations.
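
For the record, these are the two small workarounds I apply for the first two points: joining the docker group to avoid typing sudo all the time (which only hides the root-level access, it doesn't remove it), and asking the daemon which dynamic address a running container received. The user and container names are placeholders:

    # Let the current user talk to the daemon without sudo
    # (still indirect root access, not a real privilege reduction)
    sudo usermod -aG docker "$USER"

    # Find out the dynamic IPv4 address assigned to a running container
    docker inspect --format '{{ .NetworkSettings.IPAddress }}' web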

Well, after playing a little with Docker, I came back to the matter that interests me from the point of view of a software developer: how to provide development and testing environments taking advantage of both Vagrant and Docker? It's really simple, and the Vagrant documentation on using Docker as a provider shows the basic steps to get started. I ran my tests with both technologies working together and I conclude that this is the approach that offers me the greatest benefits and productivity, so from now on I will follow it and will try to get the people I work with on some of my projects to take advantage of it too.
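
As a minimal sketch of what that combination looks like on my side (the image name and the shared folder are placeholders, and the available provider options depend on the Vagrant version):

    # Write a minimal Vagrantfile that uses Docker as the provider
    cat > Vagrantfile <<'EOF'
    Vagrant.configure("2") do |config|
      config.vm.provider "docker" do |d|
        d.image   = "ubuntu:14.04"
        d.cmd     = ["tail", "-f", "/dev/null"]   # keep the container alive
        d.volumes = ["/home/user/project:/srv/app"]
      end
    end
    EOF

    # Bring the environment up with the Docker provider instead of VirtualBox
    vagrant up --provider=docker

    # Dispose of the environment when done
    vagrant destroy -f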

Finally, this is the landscape (very likely incomplete) that I glimpse for container technologies, which have existed for some time now, with Docker acting as a catalyst for the masses to learn about and use them. No matter which specific technologies end up in the best positions, it's very likely that the adoption of containerization in data and computing centers will become a reality:

  • There's an interesting race to lead the field. I don't see native Docker tools either for orchestrating containers (in my tests I only managed to communicate containers living on the same host, and I haven't played with Docker Swarm yet) or for managing the processes running inside them, so CoreOS, a new Linux distribution based on Google technologies that offers a very interesting proposition for a distributed, secure and always up-to-date operating system, looks like a clear alternative to Docker, because it supports both the Docker container format and its own newly launched format called Rocket (released in December 2014). We can also add other interesting projects promoted by Canonical (the company behind the Ubuntu distribution), such as LXD (a container hypervisor) and Snappy Ubuntu Core (container deployment on private or public clouds), in addition to other proposals that may come from other parties.
  • The DevOps approach is favored by the use of containers, increasingly blurring the line that divides programmers from operators. I like this, because projects will flow better and take less time.
  • Even in the Microsoft world there are containerization options, with Drawbridge and Spoon being two proposals I've read about.
