project: d.r.e.a.m

Setting up UMLs in a reliable and professional way

This document is meant for sysadmins who are already familiar with Linux, but not with UML (as in User Mode Linux). I used Debian GNU/Linux for everything below, but any distribution will do.

First steps

Get your UML up and running

Making the whole thing reliable and easy to maintain

I am using runit to maintain my services, so I decided to use runit for managing my UMLs too. There were a few difficulties with this setup, but I have found a way that works reasonably well.

You need something like a console for your UML. I decided to use screen for this purpose.

Example: I have a UML called undef, which I want to start using runit:
/etc/service/uml/undef/run

#!/bin/sh
# uml runfile

exec 2>&1

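# The service directory is named after the UML, which is also its user name.
HOSTNAME=$(basename "$(pwd)")
export HOME="/home/uml/$HOSTNAME"

# $HOME/running is the lock file; the UML user must be able to lock it.
touch "$HOME/running"
chown "$HOSTNAME:$HOSTNAME" "$HOME/running"
# chpst -L fails immediately if another process already holds the lock.
chpst -L "$HOME/running" /bin/true || {
        echo "WARNING: UML $HOSTNAME is already running! Sleeping 5s."
        sleep 5
        exit 100
}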

cd $HOME/config

export TERM=vt100
exec screen -ln -D -m -T vt100 -S $HOSTNAME \
        chpst -u $HOSTNAME -L $HOME/running $HOME/config/run

This starts screen so that it does not detach from its parent. You can attach to this screen session as usual with screen -d -R (thanks to Joker for the hint about the '-D -m' switches). If the UML exits, screen exits. If the UML crashes, runit restarts it.
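
Attaching to the console can be wrapped in a small helper. The following is only a sketch: it assumes the session was started by the run file above, that you call it as the user who owns the screen session (root in this setup), and that the session is named after the UML. The function name uml_console is made up for this example.

```shell
#!/bin/sh
# uml_console NAME: attach to the screen session of the UML called NAME.
# (Hypothetical helper, not part of the original scripts.)
uml_console() {
        if [ -z "$1" ] ; then
                echo "usage: uml_console NAME" >&2
                return 1
        fi
        # -d -R: detach the session elsewhere if necessary, then reattach here.
        screen -d -R "$1"
}
```

Calling uml_console undef attaches to the console of the UML from the example; the usual screen detach key sequence leaves it running.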

You should not restart a UML using 'runsvctrl -t /service/uml-undef'. This kills the UML without a clean shutdown, which is essentially the same as pulling the plug on a real machine.

The following script can be used to shut down a UML:
/etc/scripts/uml/shutdown

#!/bin/sh
while test ! -z "$2" ; do
        if [ "x`echo $1|cut -c1`" = "x-" ] ; then
                case "$1" in
                        -p)
                                SHUTDOWN_PERMANENT=1 ;;
                        -q)
                                QUIET=1 ;;
                        *)
                                echo "$0: $1 is not a valid option"
                                exit 99
                        ;;
                esac
        fi
        shift
done

if [ `id -u` != 0 ] ; then
        echo "Need root privileges to shut down a UML"
        exit 99
fi

if [ ! -L /service/uml-$1 ] ; then
        echo "UML \`$1' does not exist or is not active"
        exit 99
fi

test -z "$QUIET" && echo "Shutting down UML \`$1'"

/command/runsvctrl o /service/uml-$1/
su - manager -c "ssh -q $1"

test ! -z "$SHUTDOWN_PERMANENT" && touch /service/uml-$1/down

You will need to have an account called manager on the UML, which will shut down the UML upon receiving a valid SSH connection. The following worked for me:

command="sudo shutdown -h now",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty ssh-dss [ KEY ] manager@yuri.suug.ch

This has to be in ~manager/.ssh/authorized_keys.
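
If you do not want to assemble that line by hand, it can be generated from the public key file. This is a minimal sketch, assuming the key pair of the manager account on the host already exists; the function name restricted_key_line and the path of the public key file are placeholders of mine, not part of the original setup.

```shell
#!/bin/sh
# restricted_key_line PUBKEY_FILE
# Print an authorized_keys entry that only allows the shutdown command.
# (Hypothetical helper, not part of the original scripts.)
restricted_key_line() {
        opts='command="sudo shutdown -h now",no-port-forwarding,no-X11-forwarding,no-agent-forwarding,no-pty'
        # The public key file holds a single line: key type, key data, comment.
        printf '%s %s\n' "$opts" "$(cat "$1")"
}
```

The output of restricted_key_line with the manager's public key file as argument is the line to append to ~manager/.ssh/authorized_keys on the UML.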

Proper startup of the UML also requires the tun devices to be configured correctly. I wrote two small scripts to manage the network interfaces:
/etc/scripts/uml/netif-up

#!/bin/sh
. /home/uml/$1/config/ifconf

tunctl -u $uid -t $tunif >/dev/null
ip link set dev $tunif up
ip addr add 127.0.0.1 dev $tunif
for I in $ip ; do
        ip ro add $I dev $tunif
done

/etc/scripts/uml/netif-down

#!/bin/sh
. /home/uml/$1/config/ifconf

ip link set $tunif down
tunctl -d $tunif >/dev/null

The contents of the ifconf file should be self-explanatory.
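
For reference, an ifconf file might look like the following. This is a sketch inferred from the variables the scripts read ($uid, $tunif and $ip); all values are made-up placeholders. Note that /etc/init.d/uml tests the file with -x, so it has to be executable.

```shell
# /home/uml/undef/config/ifconf (hypothetical values)
# Sourced by netif-up, netif-down and the init script.

# User that owns the tun device (passed to tunctl -u).
uid=undef

# Name of the tun/tap interface on the host.
tunif=tap-undef

# Space-separated list of addresses that are routed to the UML.
ip="192.0.2.10 192.0.2.11"
```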

Starting the UMLs at system boot is a trivial task, but shutting down reliably isn't as easy. I wrote the following script:
/etc/init.d/uml

#!/bin/sh
case "$1" in
        start)
                $0 start-net
        ;;

        start-net)
                echo -n "enabling uml network interfaces:"
                for I in /home/uml/* ; do
                        if [ -x $I/config/ifconf ] ; then       
                                . $I/config/ifconf
                                ip addr sh dev $tunif >/dev/null 2>&1 || {
                                        echo -n " `basename $I`"
                                        /etc/scripts/uml/netif-up `basename $I`
                                }
                        fi
                done
                echo "."
        ;;
        
        stop)
                echo -n "sending shutdown signal to UMLs:"
                for I in /service/uml-* ; do
                        if [ "`cat $I/supervise/stat`" = "run" ] ; then
                                echo -n " `basename $I |cut -d- -f2`"
                                /etc/scripts/uml/shutdown -q `basename $I |cut -d- -f2`
                        fi
                done
                echo "."
                
                echo -n "waiting ~30s for UMLs to be down: "
                ticks=0

                while [ $ticks != 31 ] ; do
                        umls=0
                        umls_down=0
                        for I in /service/uml-* ; do
                                # POSIX arithmetic instead of the bash-only `let'
                                umls=$(($umls+1))
                                if [ "`cat $I/supervise/stat`" = "down" ] ; then
                                        umls_down=$(($umls_down+1))
                                elif [ $ticks = 30 ] ; then
                                        # last tick: record every UML that is still up
                                        echo "UML `basename $I | cut -d- -f2` didn't shut down" \
                                                | tee -a /var/log/uml_fatal_errors
                                fi
                        done
                        if [ $umls_down = $umls ] ; then
                                echo " done."
                                break
                        else
                                echo -n "."
                        fi
                        sleep 1
                        ticks=$(($ticks+1))
                done
        ;;
        
        stop-net)
                echo -n "shutting down uml network interfaces:"
                for I in /home/uml/* ; do
                        if [ -x $I/config/ifconf ] ; then       
                                echo -n " `basename $I`"
                                /etc/scripts/uml/netif-down `basename $I`
                        fi
                done
                echo "."
        ;;

        *)
                echo "$0: [start|stop|start-net|stop-net]"
        ;;
esac
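
To check by hand which UMLs runit considers up, the state files read by the init script can be listed directly. This is a minimal sketch, assuming the /service/uml-NAME layout used above; the function name uml_status and its optional directory argument are my own additions.

```shell
#!/bin/sh
# uml_status [SERVICE_DIR]
# Print one "name: state" line per UML service, using the state that
# runsv records in supervise/stat. SERVICE_DIR defaults to /service.
# (Hypothetical helper, not part of the original scripts.)
uml_status() {
        svdir=${1:-/service}
        for dir in "$svdir"/uml-* ; do
                # Skip an unmatched glob or a service that was never started.
                [ -r "$dir/supervise/stat" ] || continue
                name=${dir##*/uml-}
                printf '%s: %s\n' "$name" "$(cat "$dir/supervise/stat")"
        done
}
```

Run as uml_status without arguments on the host, it prints the same run/down states that the stop branch waits on.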

Conclusions

It works. Although it is a combination of System V init scripts and runit service management, it does exactly what I want. It does not require any maintenance or special precautions. If you want updated or bugfixed versions of these scripts, have a look at SUUG SVN and my runfiles repo.