Out of [name]space issue

Introduction

I’m running Debian sid on my main laptop, and while it works well most of the time, there are some issues from time to time. Most of them fix themselves after a few days, so I usually don’t try to fix them manually if they have no impact on my activity. But for a few weeks the postinst script of the avahi daemon had been failing, and as it was not fixing itself during upgrades, I decided to have a look at it.

The usual ranting

As Debian sid is using systemd, it is super easy to find a decent troll subject. Here it was the usual thing: systemctl was failing to start the daemon correctly and was pointing me at some commands if I wanted to know more:



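The commands in question are the usual suspects, something along these lines (I did not keep the exact hint, so take the unit name and options as a reconstruction):

$ systemctl status avahi-daemon.service
$ journalctl -xeu avahi-daemon.service
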
So after a little prayer to the Linux copy-paste god, a call to journalctl gave me the message:

-- Unit avahi-daemon.service has begun starting up.
Dec 26 07:35:25 ice-age2 avahi-daemon[3466]: Found user 'avahi' (UID 105) and group 'avahi' (GID 108).
Dec 26 07:35:25 ice-age2 avahi-daemon[3466]: Successfully dropped root privileges.
Dec 26 07:35:25 ice-age2 avahi-daemon[3466]: chroot.c: fork() failed: Resource temporarily unavailable
Dec 26 07:35:25 ice-age2 avahi-daemon[3466]: failed to start chroot() helper daemon.
Dec 26 07:35:25 ice-age2 systemd[1]: avahi-daemon.service: Main process exited, code=exited, status=255/n/a
Dec 26 07:35:25 ice-age2 systemd[1]: Failed to start Avahi mDNS/DNS-SD Stack.
-- Subject: Unit avahi-daemon.service has failed
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit avahi-daemon.service has failed.
-- 
-- The result is failed.

So a daemon was not able to fork on a rather quiet system.

Understanding the issue

A little googling led me to this not-a-bug report explaining that the avahi configuration includes ulimit settings. So I checked my configuration and found out that the Debian default configuration file has a hardcoded value of 5.
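
For reference, the setting lives in /etc/avahi/avahi-daemon.conf; the relevant part of my file looks roughly like this (only the line that matters here, the other rlimit-* entries are omitted):

[rlimits]
rlimit-nproc=5

So the avahi user is only allowed a handful of processes, which is normally plenty for the daemon and its chroot helper.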

My next command was to check the processes running as avahi:

ps auxw|grep avahi
avahi    19159  1.0  3.0 6939804 504648 ?      Ssl  11:31   3:39 /usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-2.1.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch start -Des.network.bind_host=0.0.0.0

So an Elasticsearch daemon was running as the avahi user. This could seem strange if you did not know that I’m running some Docker containers (see https://github.com/StamusNetworks/Amsterdam).

In fact, inside the container I have:

$ docker exec exebox_elasticsearch_1 ps auxw
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
elastic+     1  1.0  3.0 6940024 506648 ?      Ssl  10:31   3:46 /usr/bin/java -Xms256m -Xmx1g -Djava.awt.headless=true -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:+DisableExplicitGC -Dfile.encoding=UTF-8 -Djna.nosys=true -Des.path.home=/usr/share/elasticsearch -cp /usr/share/elasticsearch/lib/elasticsearch-2.1.1.jar:/usr/share/elasticsearch/lib/* org.elasticsearch.bootstrap.Elasticsearch start -Des.network.bind_host=0.0.0.0

So the issue comes from the fact that we have:

$ docker exec exebox_elasticsearch_1 id elasticsearch
uid=105(elasticsearch) gid=108(elasticsearch) groups=108(elasticsearch)
$ id avahi
uid=105(avahi) gid=108(avahi) groups=108(avahi)

That’s a real problem: the user ID space in the container and on the host is the same, so UID 105 is elasticsearch inside the container but avahi on the host, and this can result in some really weird side effects.
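
Since RLIMIT_NPROC is enforced by the kernel per UID, every process and thread of the containerised JVM is charged to UID 105, that is to the host’s avahi user, which blows well past the limit of 5 and explains why fork() returned “Resource temporarily unavailable”. A quick way to see the two views of UID 105 side by side (a sketch, reusing the container name from above and assuming getent is available in the image):

$ getent passwd 105                                        # host view: avahi
$ docker exec exebox_elasticsearch_1 getent passwd 105     # container view: elasticsearch
$ ps -L -u avahi | wc -l                                   # host-side thread count billed to UID 105, way above 5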

Fixing it

At the time of writing I have not found a way to set up the user ID mapping that is causing this conflict. An experimental Docker feature using the recent Linux user namespaces will make it possible to avoid this conflict in the near future, but it is not mainstream yet.
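
From what I have read, the experimental feature relies on a --userns-remap option of the Docker daemon plus subordinate ID ranges declared on the host; the setup should look roughly like this (a sketch based on the experimental documentation, the ranges are the documented defaults, not something I have running here):

# /etc/subuid and /etc/subgid provide the host-side range to map container IDs into,
# typically something like: dockremap:100000:65536
$ docker daemon --userns-remap=default

With such a mapping in place, UID 105 inside the container would show up as something like UID 100105 on the host and would no longer collide with avahi.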

A really bad workaround was to stop the Docker container before doing the upgrade. It did the job, but I’m not sure everything will come back up correctly at reboot.
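
Concretely, the workaround was something like this (container name as above; any command that re-runs the failing avahi postinst will do):

$ docker stop exebox_elasticsearch_1
$ dpkg --configure -a        # re-run the pending postinst scripts
$ docker start exebox_elasticsearch_1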

I really hope this new Docker feature will soon reach mainstream so that other people can avoid similar issues.