Cross-Platform Compatibility Testing On One Machine Without Rebooting

By Netocrat

Web Site:
http://members.dodo.com.au/~netocrat/
Copyright info:
http://members.dodo.com.au/~netocrat/copying.html

Summary

This article discusses how multiple operating systems (OSes) hosted under an emulator or virtual machine can be set up on a single host and used to automate cross-platform testing of code compilation or shell scripts.

It provides details of how to do this on a Unix-like OS, using Gentoo Linux on an Intel Pentium as an example host and using QEMU as the emulator.  The example includes running the following Unix-like operating systems as guests under the host: NetBSD, OpenBSD, FreeBSD, Solaris and Gentoo Linux (configured as a guest in case the host OS is changed).

Fully-functional scripts and reasonably detailed configuration steps are included.  Problems and alternative approaches are documented.

Limitations And Caveats

This document assumes a single-user system; however, it does describe the issues (largely security-related) involved in a multi-user implementation and how to deal with them.  Two user types are defined so that it will be easier for a later revision of this document to default to multi-user instructions.

Whilst this example does not use guest OSes other than free-as-in-beer Unix-like OSes running on x86, the approach does not preclude their use.  Some modifications to the approach may be required for OSes significantly different from Unix, in particular when a shell-scripting environment is not present.

Most documentation occurred long after installation and configuration so I may have missed some steps.  Some scripts were tidied up or fully rewritten for public release, so they are recent and have not been long tested in production.

I would have liked to have used Apple's OSX as a guest OS - as well as being a popular Unix-family OS it would have been the only non-x86 (PowerPC) test of QEMU's emulation - but I don't have a copy of OSX and I am not willing to pay for one.

Formatting For Quick Reading

A complete and ordered instruction summary can be followed independently of the full article by reading only the text formatted like this.  At times the instructions assume Gentoo Linux as the host OS.  This instruction summary is not, however, standalone, as it refers to scripts/commands/code that must be copied from the full article.
Any outstanding problems or issues to be aware of are formatted like this.
Variables that should be replaced with appropriate values are formatted like this.
Reminders to myself of incompleteness or things to be improved (todos) are formatted like this.

Introduction

The Problem

How often have you wondered whether the code you have just written will compile on NetBSD or whether a particular shell script will work with the Solaris version of sed?

What if you need to know for many different operating systems and want to test on all of those OSes locally, so that you have control over their configuration, rather than relying on remote test hosts accessed over the internet?

You could partition your PC and install all of those OSes on separate partitions, and then boot into each OS and test separately.  This could even be automated so that as one OS shuts down it modifies the boot loader to boot the next OS in sequence.  The problem is that time is wasted during booting up and shutting down, and this means that each test run has a long duration during which the machine is unavailable for other uses.

The availability problem can be solved by setting up one or more separate networked machines that handle one or more OSes and invoking test processes over the network.  This is fine if you have more than one machine, but unless you have a dedicated machine for each OS you will still have to waste time booting between the OSes.  If you don't want to waste power by leaving the test machine/s turned on all the time then you will also still have to mess around switching it or them on and off manually, or buy or build some control hardware to do the job.

The Solution

Use an open-source emulator or virtualisation technique to run all the different operating systems as processes on a host OS.  Now there is no need for more than a single physical machine, no need to reboot the host OS and no period of unavailability - you can occupy yourself with your usual tasks while the tests run.

The End Result

Let me describe the final setup before giving instructions on how to achieve it.

An arbitrary script can be run on all of the following guest OSes: NetBSD, OpenBSD, FreeBSD, Solaris and Gentoo Linux.

This can be done using a single command:

runall-os-test -s my_script

The runall-os-test script connects to each OS in turn (first booting it under QEMU if necessary) through ssh and runs my_script, piping all the steps it takes as well as any error message from my_script to stderr with a time stamp for logging purposes.
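
As a rough illustration of that flow, the loop below is a minimal sketch only and is not the runall-os-test script itself.  The names log, run_on_guests and REMOTE_CMD are assumptions made for the example; REMOTE_CMD defaults to ssh but can be set to echo for a dry run, and booting a guest that is not yet running is left out.

```shell
#!/bin/sh
# Hypothetical sketch, not the article's runall-os-test script.

# Write a time-stamped message to stderr for logging.
log() {
    printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$*" >&2
}

# run_on_guests SCRIPT GUEST...
# Runs SCRIPT on each named guest in turn via $REMOTE_CMD (ssh by
# default), logging each step.  Returns non-zero if any guest failed.
run_on_guests() {
    script=$1
    shift
    failures=0
    for guest in "$@"
    do
        log "running $script on $guest"
        if ! ${REMOTE_CMD:-ssh} "$guest" "$script"
        then
            log "$script failed on $guest"
            failures=$((failures + 1))
        fi
    done
    [ "$failures" -eq 0 ]
}

# Dry run (prints the arguments that would be handed to ssh):
#   REMOTE_CMD=echo
#   run_on_guests my_script netbsd openbsd
```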

The my_script script can assume that three specific local directories will exist on the guest OS under which it is running.  These directories are network-mapped to directories on the host OS to make it possible to review script output when the guest OSes are offline:

  1. a read-only directory shared by all guest OSes.  my_script must be located under this directory and will be run from this directory as mounted locally by the guest; any read-only files that my_script needs may be located here.
  2. a writable directory accessed exclusively by that guest OS.  The security goal of preventing other guest OSes from connecting and writing to this directory has not been achieved due to problems described later.
  3. a writable directory shared by all guest OSes.

In addition the script can assume that it runs on each guest OS under an identical username with an identical user id (UID).

A dynamic IP address on a unique subnet is assigned to each guest OS.  Using some scripting the /etc/hosts files are modified so that a guest OS can always be referred to by its hostname from the host OS and the host OS can always be referred to by its hostname from the guest OS.

A Quick Note On Non-UNIX-like OSes

The main problem with non-Unices is that a shell scripting environment can't be guaranteed.  This would require modifications to the approach perhaps so that depending on what was being tested, a different launch or helper program was run on the guest OS - e.g. a version of make to test a compilation.

At the moment this approach maps directories over the network using NFS.  I have, however, tested running Windows 98 and XP as guest OSes under QEMU and they can successfully connect to a SAMBA server on the host OS; dhcp assignment of an IP address to the virtual network card succeeds under these OSes too.  So realistically NFS is not a limitation.  I haven't investigated the feasibility of automatically scripting changes to the hosts file on those OSes.  I suspect that it would be a little harder than for all the UNIX-like OSes that I currently use as guests, under which it was fairly easily achieved.  Possibly it would require a custom executable; alternatively the problem could be avoided by using a name server approach as discussed below.

Rationalising Decisions

QEMU

Given that a virtualisation technique would offer better performance, why use an emulator?  Firstly, because I wanted to use only open source or freely available software, and the only virtualisation technique that I know of that fits that bill is Xen.  However not all of the OSes that I want to run have had the required modifications made to their kernels so that they can run as guests under Xen, so Xen is not yet adequate for the task.  It may soon be ready, especially since hardware support has been promised by at least Intel and AMD.  This will allow any unmodified OS kernel to run under Xen, although a modified kernel will run even faster.

Also under Linux on x86 processors the kqemu module speeds up QEMU when emulating an x86 machine so that performance is reasonable and the additional performance offered by virtualisation is not quite so compelling - although it's still a good reason to switch to Xen when it is capable of running any x86 OS.

The other advantage of QEMU over virtualisors like Xen or the commercial VMWare is that it can emulate CPUs and hardware other than that of the host.  In particular this allows operating systems designed for PowerPCs (i.e. OSX) or Sparcs to be run.  I haven't yet attempted to run a guest OS other than for x86 though.

An alternative to QEMU is Bochs but my reading suggests that Bochs is very much slower than QEMU even when running QEMU without the kqemu module.  Bochs also does not emulate hardware other than x86.

QEMU may not be such an appropriate choice when the host machine is not x86-compatible and the guest OSes are to be native to the hardware of the host.  In that case other options should at least be considered, and another option will be necessary if the host's hardware is not emulated by QEMU.

Dynamic IPs As Opposed To Static IPs

Why not use static IPs?  Because QEMU doesn't provide a means to set the IP address that it assigns to each tun interface as it boots the virtual machine.  This means that to maintain a static IP address, the guest OSes would have to be booted in a specific sequence, and for example the last OS in the sequence could not be booted without booting the rest of them.  It is far preferable to have the flexibility to boot them in any order.

Maintaining Hostname To IP Address Mappings By Scripting /etc/hosts Changes

Some sort of name service like bind could have been used to maintain IP address to hostname mappings rather than relying on scripted modifications to /etc/hosts files.  However I have no other need for bind and I've chosen to avoid it - partly because of the extra security risk of running unnecessary network software and partly because I prefer the minimalist approach.

The name service and /etc/hosts approaches each require scripting on the host; the drawback to the chosen /etc/hosts approach is that scripting is also required on each guest OS.

The NFS-mapped Directories

The choice of the three directories is somewhat arbitrary.  A simpler approach might use a single writable directory.  The reasoning behind the read-only directory is to prevent buggy scripts from deleting themselves or other scripts so that other guest OSes can no longer access them.  The shared writable directory is included for situations where it is desirable to make it easier to look at output from separate guest OSes on the host.

It would be useful to map one or more of the specific directories chosen to the home directory/ies of the test-script user(s) on the guest OS, however I have not done that for this version of the document.

Detailed Description Of Setting Up The Cross-Platform Testing Environment

Assumed Knowledge

A reasonable level of proficiency in UNIX would be helpful, but the instructions are hopefully detailed enough for anyone to follow - provided they have access to documentation.

Requirements

  • A computer running a host OS that can run QEMU and that supports tun network interfaces.  Linux on an Intel/AMD CPU is preferable as it allows you to use the kernel module kqemu to dramatically speed up the emulation of x86 processors.  If using Linux, compile NFS server, tun networking and shm support (CONFIG_SHMEM=y) into the kernel.
  • Root access on the host computer.
  • A dhcp server for the host OS.
  • An NFS server for the host OS.
  • A reasonable amount of free disk space, depending on how many OSes you want to install.  For the OSes I installed I used about 6 Gb of space but Solaris is the majority consumer of that (3.3 Gb), and only because I am not experienced enough with it to do anything other than a full install.
  • A reasonable amount of RAM and swap space - everything was running fine on 256 Mb before I upgraded to 768 Mb RAM.
  • An internet connection or other source for obtaining QEMU, as well as the installation files/media for the various guest operating systems, all of which are accessible through my x86 OS page.

Specifications Of The Reference Host

The reference host on which this approach was developed currently has this configuration:

  • Operating System: Gentoo Linux with kernel version 2.6.11-gentoo-r6
  • CPU: Pentium4 1600 MHz
  • Memory: 768 Mb RAM
  • Storage: two hard disks with tens of Gbs capacity

Some modification to these instructions may be required for different hosts.

A Description Of The Initial Goal: A Virtual Network Of Virtual Machines

The initial goal is to set up a virtual network of virtual machines, each running a different operating system.  Each virtual machine may be booted or shut down independently of any other machine (although rebooting the host on which QEMU is running will have some unfortunate consequences for the rest of the virtual network).  There must only be one running instance of each virtual machine.

Each virtual machine will be dynamically assigned an IP address as it boots up.  This IP address will be on a separate subnet to all other virtual machines.  A separate dhcp server will be invoked for each virtual machine.  The process that configures and runs the dhcp server is triggered by QEMU.  The (re)mapping of hostnames to IP addresses as an address is dynamically assigned will occur automatically on both guest OS and host OS through scripted changes to /etc/hosts.

The QEMU-Invoking User and The Test-Script Users

There are two user types that I will refer to.  One is the single qemu-invoking user.  This is the user under which the QEMU processes will run.  This user needs to be able to do things that require root privileges, such as configure network interfaces, start dhcp servers for those interfaces and modify entries in /etc/hosts.  This is achieved through the use of the sudo package.

The other user type is the test-script user.  This is a user with an account on the host OS and matching accounts on each of the guest OSes (same username and UID; the GID is not required to match).  This is the user under whose account the runall-os-test script will be run.

This document assumes a single-user system where the test-script user is the same as the qemu-invoking user.  The provided scripts and setup instructions make the same assumption, however the changes that would need to be made for a multi-user system are discussed.  For this reason, and because in a future revision of this document I may remove the assumption of a single-user system and provide specific multi-user instructions, the single user has been separated into two.  The main issue is of course security.

I highly discourage using the root account for either of these user types for the same reasons as always. Anyone who doesn't understand what this means would be well advised to find out.

Setting Up QEMU

These steps are all specific to Gentoo Linux.  For all other hosts, do whatever is necessary to install QEMU, preferably with kqemu support if using Linux on x86.

As the root user for this and subsequent commands unless told otherwise, add kqemu to the USE flags in /etc/make.conf.

Currently (July 2005) the earliest version of QEMU with kqemu support - 0.7.0 - is masked under portage.  To gain access to it, add this line to /etc/portage/package.keywords:

app-emulation/qemu ~x86

Then add this line to /etc/portage/package.mask:

>app-emulation/qemu-0.7.0

The above step prevents versions later than 0.7.0 from being installed.  Since version 0.7.0 is currently masked, running it at all is a risk; since I have tested qemu-0.7.0 on my machine and found no obvious problems so far, there is no need to risk a later version.  Those willing to take the risk - or if this article is old enough that versions have significantly changed - may choose to omit this step or change the version number.
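
Both files can simply be edited by hand.  If the setup is being scripted, a small helper such as the following sketch appends a line only when it is not already present, so that the steps can be repeated safely.  The function name add_line_once is an assumption made for the example.

```shell
#!/bin/sh
# add_line_once FILE LINE  (hypothetical helper)
# Appends LINE to FILE unless an identical line is already there.
add_line_once() {
    file=$1
    line=$2
    # -x whole-line match, -F fixed string, -q quiet
    if ! grep -qxF -e "$line" "$file" 2>/dev/null
    then
        printf '%s\n' "$line" >> "$file"
    fi
}

# As root on the Gentoo host:
#   add_line_once /etc/portage/package.keywords 'app-emulation/qemu ~x86'
#   add_line_once /etc/portage/package.mask '>app-emulation/qemu-0.7.0'
```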

Now, emerge QEMU:

emerge qemu

This will download, compile and install QEMU version 0.7.0 as well as the kernel module kqemu.

Be aware that kqemu is not open source and has specific licence requirements that you should read before using it.

Add the qemu-invoking user to the kqemu group.  This will give the user read and write permissions to /dev/kqemu which at some point will be automatically created (if not, the k/qemu documentation describes how to create it manually). 

Configuring The Virtual Network From The Host

Tun Permissions

First ensure that the qemu-invoking user has read/write permission on the /dev/net/tun device:

chmod ugo+rw /dev/net/tun

will be sufficient unless restrictions are desired in a multi-user system.

Installing The DHCP Server

The scripts provided assume the use of the udhcpd server from the udhcp package.  For Gentoo Linux, emerge udhcp:

emerge udhcp

It may be necessary to obtain the udhcp package from its website for hosts without a package manager or whose package manager does not include it.

For other dhcp servers the script may need modification - particularly the location and format of the option file created by the /sbin/start-tun-interface script as provided below.

The Invocation Of A Custom Script

QEMU by default invokes /etc/qemu-ifup at startup, passing as a single parameter the name of the tun interface that it is assigning to this guest OS (e.g. tun0, tun1, etc).  From the html documentation and observation, the interface tunX corresponds to host IP 172.<20+X>.0.1 and guest IP 172.<20+X>.0.2. 
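
The mapping can be written down as a small shell function.  This is a sketch only: the name tun_ips is an assumption made for illustration, and the 172.<20+X> rule is the one observed above.

```shell
#!/bin/sh
# tun_ips tunX  (hypothetical helper)
# Prints the host and guest IP addresses for interface tunX, using
# host 172.<20+X>.0.1 and guest 172.<20+X>.0.2.
tun_ips() {
    n=${1#tun}
    case $n in
        ''|*[!0-9]*|0?*) return 1 ;;   # not a number, or leading zeros
    esac
    echo "172.$((n + 20)).0.1 172.$((n + 20)).0.2"
}

# tun_ips tun0   prints: 172.20.0.1 172.20.0.2
# tun_ips tun3   prints: 172.23.0.1 172.23.0.2
```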

This script must achieve several things:

  1. run /sbin/ifconfig to configure the tun interface for the host OS
  2. configure and run a dhcp server for the guest OS, first killing any existing dhcp server on that interface.  The dhcp server must assign the address 172.<20+X>.0.2 to the guest, which the script ensures by limiting the allowable range to this single address
  3. ensure that the mapping between the hostname of the guest OS that QEMU is starting to run and its IP address is correct - the IP address may have changed since the last time the guest OS ran.  This is a little tricky because the guest OS's hostname is not directly available - the only parameter received by the script is the tun interface.  The work-around is for the process invoking QEMU to store the guest OS's hostname in a file named for the invoking process's process id (PID) under a specific directory.  The script can then search this directory for a file named after one of its ancestors, starting with itself and then moving to its parent and continuing upward until it reaches init (PID 1) or finds a file that isn't too old.  A limitation to this approach is that it will fail if the QEMU process is run in the background with & from within a script because in that case it will be detached and lose its parent.  Also, on a multi-user system a precaution needs to be added to make it safe from misuse: the directory under which the PID files storing the guest OS's hostname are stored must be writable only by the qemu-invoking user.
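
The invoking side of that work-around might look like the following sketch.  It is an illustration rather than one of the article's scripts: the function name store_guest_hostname is an assumption, and the directory is passed as a parameter instead of being fixed.

```shell
#!/bin/sh
# store_guest_hostname DIR HOSTNAME  (hypothetical helper)
# Records HOSTNAME in DIR/<PID of this shell> so that a descendant
# process (the interface script run by QEMU) can find it by walking
# up the process tree.
store_guest_hostname() {
    dir=$1
    mkdir -p "$dir" || return 1
    # on a multi-user system only the qemu-invoking user may write here
    chmod 755 "$dir" || return 1
    printf '%s\n' "$2" > "$dir/$$"
}

# Then start QEMU in the foreground of the same shell, so that this
# shell remains one of its ancestors:
#   store_guest_hostname /tmp/qemuhostnames netbsd
#   qemu ...          # options for this guest
```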

The Need For Root Privileges

The /etc/qemu-ifup script will be run with the privileges of the qemu-invoking user.  As I've already said, root should not be used to run QEMU.  The script must perform privileged operations so it is invoked through sudo, which must now be installed if it is not already.  Under Gentoo Linux, emerge app-admin/sudo:

emerge app-admin/sudo

The script /etc/qemu-ifup will be a stub that invokes the real script - /sbin/start-tun-interface - with root privileges granted by sudo.  So give the qemu-invoking user permission to run /sbin/start-tun-interface as root.  Run visudo and add this line to the sudoers file:

username_of_qemu-invoking_user ALL = (root) NOPASSWD: /sbin/start-tun-interface

Limiting Potential Misuse

Since the script will modify the contents of /etc/hosts it must be careful not to allow a user to corrupt this file.  There are two issues here.  One is deliberate vandalism and the other is accidental error.  The first issue only applies to a multi-user system, which this document isn't specifically tailored to.  The way to mitigate it is by not allowing a QEMU process to be invoked directly by a test-script user.  Instead the runall-os-test script (described later) would be broken into two parts.  The first part would run under the test-script user account, and the second part would be solely concerned with invoking QEMU processes.  The second part would make sure that options not specified in the guest OS config file could not be passed to qemu.  It would be non-writable by test-script users and it would be executed from the first script by using sudo to run it as the qemu-invoking user.  So a test-script user would be defined as a user given permission in the sudoers file to run the second script as the qemu-invoking user.  Unless there were an error in the script allowing the test-script user to break to a shell and assume the identity of the qemu-invoking user, this would prevent malicious access to the /etc/qemu-ifup script.  Even if that were to occur, though, there is a second layer of safety:

A third script - /sbin/update-dynamic-hosts - is used to limit any changes to a specific section of the hosts file and prevent the addition of hostnames or IP addresses that exist outside of this section.  This could further be extended to limit changes to those specific hostnames listed in the guest OS config file and to the possible range of IP addresses assigned by qemu.  Also it would be appropriate (if possible) for /sbin/start-tun-interface to check that the tun interface specified is not already in use before continuing.
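
The rule is easier to see on a scratch file.  The following is a simplified sketch of the idea and is not the update-dynamic-hosts script itself; the function name update_section is an assumption, and it writes the result to standard output instead of replacing the file.

```shell
#!/bin/sh
# update_section FILE HOSTNAME IP  (hypothetical, simplified sketch)
# Prints FILE with the line "IP HOSTNAME" placed in the section
# delimited by ##STARTDYNAMIC and ##ENDDYNAMIC.  Fails (status 1) if
# HOSTNAME or IP is already defined outside that section.
update_section() {
    awk -v hn="$2" -v ip="$3" '
        BEGIN { dyn = 0 }
        /^##STARTDYNAMIC/ { if (dyn == 0) dyn = 1; print; next }
        /^##ENDDYNAMIC/ {
            if (dyn == 1) { print ip, hn; dyn = 2 }
            print; next
        }
        NF == 0 || $1 ~ /^#/ { print; next }   # blank line or comment
        {
            clash = ($1 == ip)
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break           # trailing comment
                if ($i == hn) clash = 1
            }
            if (clash && dyn != 1) { failed = 1; exit 1 }
            if (!clash) print                  # stale entries are dropped
        }
        END {
            if (failed) exit 1
            if (dyn == 0) print "##STARTDYNAMIC"
            if (dyn != 2) { print ip, hn; print "##ENDDYNAMIC" }
        }
    ' "$1"
}
```

Run against a file containing only a localhost entry, this appends a new dynamic section holding the guest's line; asking it to add the hostname localhost fails, because that name is defined outside the section.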

Creating The Three Scripts

Script 1: /etc/qemu-ifup

Create the /etc/qemu-ifup script by copying-and-pasting the following block of commands into a terminal window (this should still be done as root).  Don't do this as the qemu-invoking user.  It's safer that the qemu-invoking user not have ownership or write permission on the scripts.  A warning: for convenience I've included the cat command to automatically create the scripts provided in this document, however pasting into some terminal programs, especially the larger scripts, is problematic - I've had some terminal programs skip lines and corrupt scripts.  So be wary of this approach and if you suspect corruption, then create each script in an editor and copy and paste from the code provided (i.e. the lines between the <<'ENDOFSCRIPT' and the ENDOFSCRIPT).

cat >/etc/qemu-ifup <<'ENDOFSCRIPT'
#!/bin/sh

# /etc/qemu-ifup
# Takes one parameter which should be tun<num> eg tun0, tun9, tun12, ...
# This will run under the qemu-invoking user account
# The qemu-invoking user must be given sudo permission to run
# /sbin/start-tun-interface
# which must be owned by root and should have permissions 544 (r-xr--r--)

# pass the tun interface name received from qemu on to the real script
sudo /sbin/start-tun-interface "$1"

ENDOFSCRIPT

chmod 755 /etc/qemu-ifup

Script 2: /sbin/start-tun-interface

Create the /sbin/start-tun-interface script by copying-and-pasting the following block of commands into a terminal window.  Edit it so that the variables set in the top block have appropriate values (refer to later instructions if the meaning of any variables is unclear).

cat >/sbin/start-tun-interface <<'ENDOFSCRIPT'
#!/bin/sh

# /sbin/start-tun-interface
# Takes one parameter which should be tun<num> eg tun0,tun9,tun12,...
# Configures the address of the interface and starts a new dhcp server
# for that interface; killing any existing dhcp server.
# Searches for a file giving the hostname of the guest OS as left by a
# parent process; if a file is found, /etc/hosts is updated so the
# guest OS's hostname aliases the newly assigned IP address.
# Note that running qemu as a background process after storing the
# OS's hostname will dissociate the qemu process so that it will
# not be able to find the hostname file - it will have lost its parent.
# This needs to run as root using sudo from the qemu-invoking user
# account.
# It should be owned by root and have permissions 544 (r-xr--r--)

# Configurable variables
TMPHOSTNAMEDIR=/tmp/qemuhostnames
CONFTMPDIR=/tmp
LEASEFILEDIR=/var/lib/misc
# paths to commands
UDHCPD=/sbin/udhcpd # path to udhcpd
IFCONFIG=/sbin/ifconfig # path to ifconfig
GREP=grep # path to grep
PS=ps # path to ps
SED=sed # path to sed
KILL=kill # path to kill
STAT=stat # path to stat
DATE=date # path to date
# misc config
BASE=172 # First part of the interface's IP address
DELAGE=3600 # hostname files older than one hour may be deleted
TOOOLD=600 # a hostname file ten minutes or older is out of date
# (should average around 10 seconds)

# strip the leading "tun" (if any) from $1
TUN=${1#tun}

# convert $TUN to a definite number (0 if $TUN is not numeric) stored
# in $TUNNUM
let TUNNUM=$TUN+0 2>/dev/null || let TUNNUM=0

# if $1 was not in the form tun<num> where <num> has no superfluous
# preceding zero digits then exit with error 1
if [ "$TUN" == "" ] || [ $TUN != $TUNNUM ] ||
( [ $TUNNUM -eq 0 ] && [ "$TUN" != "0" ] )
then exit 1; fi

# TUN is validated as a number; now add 20 to it since
# interface tun<n> gets assigned address 172.<n+20>.0.1
let TUN=$TUN+20;

# Set up variables
IF=$1 # interface eg tun0
CFG_FILE=$CONFTMPDIR/udhcpd.$TUN.conf # dhcp server config file
LEASE_FILE=$LEASEFILEDIR/udhcpd.leases.$IF # dhcp server lease file
DHCPCMD="$UDHCPD $CFG_FILE" # cmd to start dhcp server on interface

# configure the interface
$IFCONFIG $IF $BASE.$TUN.0.1;

# check for an existing dhcp server for the interface and stop it if
# one exists
# first store into $TMP a line(s) in the format "<PID><space(s)><DHCPCMD>"
TMP=`$PS ax o pid,cmd | $GREP "$DHCPCMD" | $GREP -v grep`
if [ -n "$TMP" ]
then
    # separate the TMP variable into the positional parameters
    set $TMP
    # store the PID (if any) of the running dhcp server (ignoring
    # the remote possibility of multiple running dhcp servers on
    # the interface or a dhcp server invoked with a different
    # command and from other than this script)
    DHCPD_PID="$1"
    # kill any existing dhcp server
    if [ -n "$DHCPD_PID" ]; then $KILL $DHCPD_PID; fi
fi

# remove any existing config file or lease file
rm -f $CFG_FILE 2>/dev/null
rm -f $LEASE_FILE 2>/dev/null
touch $LEASE_FILE

# get a list of name servers from /etc/resolv.conf
DIG3="[[:digit:]]\{1,3\}"
IP="\($DIG3\.\)\{1,3\}$DIG3"
NS="^[[:space:]]*nameserver[[:space:]]*"
NAME_SERVERS=`$SED -n "s/$NS\($IP\).*/\1/p" /etc/resolv.conf`
# translate newlines into spaces
NAME_SERVERS=`echo $NAME_SERVERS`

# write a specific config file for the dhcp server for this interface
cat >$CFG_FILE <<EOF
start $BASE.$TUN.0.2
end $BASE.$TUN.0.2
interface $IF
opt dns $NAME_SERVERS
option subnet 255.255.0.0
opt router $BASE.$TUN.0.1
option domain local
# short lease expiry time as Solaris 10 doesn't seem to renew its
# address on reboot if its lease hasn't expired
option lease 120
# since the MAC address is the same for all the different virtual
# machines, each ip address must be associated with the MAC address in
# a different file for each OS
lease_file $LEASE_FILE
EOF

# start the dhcp server for this interface
$DHCPCMD

# search for any file in TMPHOSTNAMEDIR containing the hostname of the
# OS being invoked; the file must have been created by an ancestor
# process and be named for that process's PID; closer ancestors are
# favoured over those more distant.
# If a hostname is found, the /etc/hosts file is updated
unset HN

# if we can't get the current time then skip the search
# the %s specifier is GNU-specific and returns the date
# in unix seconds-since-the-Epoch format
# this script must be modified if the date command does
# not support %s
if NOW=$($DATE +%s)
then
    COUNT=1
    # start at the current process
    PID=$$;
    # iterate up the process tree looking for a file named for its
    # process ID until we reach init (PID of 1)
    MAXCOUNT=100 # to prevent any bugs causing an infinite loop
    while [ $COUNT -lt $MAXCOUNT ] && ! [ $PID -eq 1 ]
    do
        let COUNT=$COUNT+1
        FILE=$TMPHOSTNAMEDIR/$PID
        # file must exist and we must be able to stat it
        if [ -f $FILE ] && TSTAMP=$($STAT -c %Y $FILE)
        then
            let AGE=$NOW-$TSTAMP
            if [ $AGE -gt $DELAGE ]
            then
                # file too old - delete
                rm -f $FILE 2>/dev/null
            elif [ $AGE -lt $TOOOLD ] &&
                read HN < $FILE && [ -n "$HN" ]
            then
                rm -f $FILE 2>/dev/null
                # update /etc/hosts to reflect the
                # guest OS hostname's newly assigned
                # IP address
                /sbin/update-dynamic-hosts $HN \
                    $BASE.$TUN.0.2
                break;
            fi
        fi
        # obtain from ps the parent process id preceded by a
        # header line
        PID=$($PS -p $PID -o ppid) || break # avoid inf loop
        # ${PID#*PPID} removes the header line returned by ps
        # `echo ...` removes the newline left by the header
        PID=`echo ${PID#*PPID}`
    done
fi

if [ -z "$HN" ]
then echo "Guest OS's hostname not found; /etc/hosts not updated" 1>&2
fi
ENDOFSCRIPT

chmod 544 /sbin/start-tun-interface

Different versions of sed may not support the [[:digit:]] or [[:space:]] constructs, in which case they must be replaced with an appropriate substitute: [[:space:]] represents any whitespace and includes tab as well as space; [[:digit:]] is simply a digit from 0 to 9.
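
For example, with such a sed the name-server extraction could be rewritten along the following lines.  This is a sketch under that assumption; the function name name_servers and its file argument are illustrative, and the pattern otherwise follows the one used in the script above.

```shell
#!/bin/sh
# name_servers FILE  (hypothetical helper)
# Prints the name server addresses listed in a resolv.conf-style FILE
# on one line, without using the [[:digit:]] or [[:space:]] classes.
name_servers() {
    tab=$(printf '\t')
    dig3='[0-9]\{1,3\}'                 # replaces [[:digit:]]\{1,3\}
    ip="\($dig3\.\)\{1,3\}$dig3"
    ns="^[ $tab]*nameserver[ $tab]*"    # replaces [[:space:]]
    # the unquoted substitution turns newlines into spaces
    echo $(sed -n "s/$ns\($ip\).*/\1/p" "$1")
}

# name_servers /etc/resolv.conf
```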

Script 3: /sbin/update-dynamic-hosts

Create the /sbin/update-dynamic-hosts script by copying-and-pasting the following block of commands into a terminal window:

cat >/sbin/update-dynamic-hosts <<'ENDOFSCRIPT'
#!/bin/sh

# /sbin/update-dynamic-hosts hostname ip-address
#
# Updates an IP address - hostname mapping in the /etc/hosts file
# Must be run as root
# Changes are limited to the section of the file enclosed by
# the delimiters (at line beginning) ##STARTDYNAMIC and ##ENDDYNAMIC
# The rules are: a hostname/IP address may be added/modified if both
# the hostname and the ip address do not exist outside the dynamic
# section.

HN="$1" # hostname
IP="$2" # IP address
HOSTS=/etc/hosts # hosts file to use
TMPFILE=/tmp/hosts.$$ # temporary file

# must be 2 arguments and $IP must be a valid-seeming IP address
DIG3="[[:digit:]]\{1,3\}"
IPREGEXP="\($DIG3\.\)\{1,3\}$DIG3"
if [ -z "$2" ] || ! echo "$IP" |
grep "$IPREGEXP[[:space:]]*$" >/dev/null 2>&1
then exit 1; fi

# delete a matching hostname or IP within the dynamic section (by not
# printing it)
# return error on matching hostname or IP outside dynamic section
# if no matching ##ENDDYNAMIC for a ##STARTDYNAMIC, then add it at end
# of file
# when end of dynamic section reached, insert the new $IP, $HN line
# note: awk's $0 is written \$0 so that the shell does not expand it
if ! awk "
BEGIN { dyn = 0 }
/^##STARTDYNAMIC/ { if (dyn == 0) dyn = 1;
                    print \$0; next; }
/^##ENDDYNAMIC/ { if (dyn == 1) {
                      dyn = 2;
                      print \"$IP\", \"$HN\";}
                  print \$0; next; }
/^[[:space:]]*$IP/ { if (dyn != 1) exit 1;
                     else next; }
/^[^#]*[[:space:]]$HN([[:space:]]|#|$)/ {
                     if (dyn != 1) exit 1;
                     else next; }
{ print \$0; }
END { if (dyn == 0) {
          dyn = 1;
          print \"##STARTDYNAMIC\"; }
      if (dyn == 1) {
          print \"$IP\", \"$HN\";
          print \"##ENDDYNAMIC\"; }}
" $HOSTS > $TMPFILE
then
rm $TMPFILE
exit 1
fi

# save a backup copy of hosts file
mv -f $HOSTS $HOSTS.old
# copy the new hosts file over the original
# clean up on error
if ! mv -f $TMPFILE $HOSTS; then rm $TMPFILE; exit 1; fi

ENDOFSCRIPT

chmod 544 /sbin/update-dynamic-hosts

Installing The Guest OSes

That's the host network setup done.  Now create the qemu-invoking user account on the host OS if it does not yet exist.

QEMU prefers to use /dev/shm, so if shm is supported by the host kernel then add an entry for /dev/shm in /etc/fstab such as this:

none /dev/shm tmpfs size=400m,defaults 0 0

That size is just enough to handle running all the operating systems simultaneously using the memory sizes shown above after they have been installed.  If there is not enough space on the shm mount, QEMU will not run and will print an error message explaining how to add space.  No memory is actually used by the shm device until requested by a process and it is pageable like other process memory in Linux.
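
Before starting several guests it may be worth checking how much room the mount has left.  A minimal sketch, assuming a df that supports the POSIX -P and -k options; the function name has_free_kb and the 128 Mb figure in the usage comment are assumptions made for the example.

```shell
#!/bin/sh
# has_free_kb DIR KB  (hypothetical helper)
# Succeeds if the file system holding DIR has at least KB kilobytes free.
has_free_kb() {
    # -P gives one line per file system; column 4 is the available space
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail" ] && [ "$avail" -ge "$2" ]
}

# e.g. before starting a guest that is given 128 Mb of memory:
#   has_free_kb /dev/shm 131072 || echo "not enough room on /dev/shm" >&2
```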

Grant the qemu-invoking user access to /dev/shm.  Unless the host is multi-user and it is desired to limit access to the shm device, this command - as root - will suffice:

chmod 777 /dev/shm

Change over to the qemu-invoking user account and install the guest operating systems.  On each guest OS:

  • install the default ssh client and server and ensure that the sshd server starts at each boot
  • set up the default dhcp client and ensure that it binds to the ethernet card at boot
  • install nfs client support
  • create the qemu-invoking user account as well as the test-script user(s) account(s) ensuring that each has the same username and UID as on the host OS.  The command to use is typically something like useradd or adduser.  Make sure that a home directory is created for each user (typically the -m option).

I won't detail the installation process because each OS has its own sufficient installation documentation, although these issues deserve mention:

  • A particular kernel diagnostic displayed repeatedly after booting a fresh OpenBSD installation.  This made the screen unviewable.  From memory it was something to do with acpi.  I don't recall exactly how I removed it but I know that it was through a standard command that permanently changed the settings on the kernel so that it didn't look for that particular subsystem.  The command is documented in the standard manpages.
  • Sun's freely downloadable x86 Solaris 10 ISO images don't have network drivers for the NE2000 in the default install; they must be obtained separately on the Community Boot Driver ITU Diskette for Solaris x86.  Create a floppy disk image and pass that image as a floppy disk parameter to qemu; when the message "Press ESCape to interrupt autoboot" appears at the start of the Solaris installation boot process, press escape, then F4 to add drivers from the diskette; press F2 to continue; select Solaris 10 and press F2 - the drivers will be loaded; press F4 then keep pressing F2 and selecting from any options given until normal boot resumes.
  • Solaris must be installed in text-mode rather than graphical mode.  Graphical mode prevents some confirmation prompts from being displayed and the installer hangs.  An easy way to ensure this is to only grant it limited memory - passing qemu the option -m 96 will do the job.

Some general quick-start tips:

Use qemu-img to create disk images (virtual hard disks in a single file) to install the operating systems onto.  Create them all in the same directory, as runall-os-test assumes this.  Ensure that the qemu-invoking user owns the disk images and that they are writable by that user only:

chmod 644 diskimage

Use qemu to invoke an instance of the operating system installer using the disk image to install onto and booting from a floppy disk or cdrom image using the -boot, -cdrom and -fda options.  The amount of memory seen by the guest OS can be specified with -m.  Often it is necessary to pass -localtime as an option (see the qemu manpage).
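
The two steps above might look roughly like the following.  This is a dry-run sketch that only prints the commands.  The image name, ISO name and size are made-up examples, and -hda (QEMU's option for the first hard disk image) is not named in the text above, so check all of the options against the manpage of the QEMU version in use:

```shell
#!/bin/sh
# Dry run: the commands are printed, not executed.
# IMG, ISO and SIZE are hypothetical example values.
IMG=netbsd.img
ISO=netbsd-install.iso
SIZE=400M
MEM=64

# create an empty raw disk image of the given size
CREATE_CMD="qemu-img create -f raw $IMG $SIZE"
# boot the installer from the cdrom image (-boot d) with the disk
# image attached as the first hard disk
INSTALL_CMD="qemu -hda $IMG -cdrom $ISO -boot d -m $MEM -localtime"

echo "$CREATE_CMD"
echo "$INSTALL_CMD"
```

Once the printed commands look right for the OS being installed, they can be run as the qemu-invoking user.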

For reference, these are the versions of the guest OSes I have installed as well as the approximate used space of their disk images, the size of the -m option used during install and the size of the -m option used post-install.  All versions are for x86 and I recommend considering these disk sizes as minimums (although if you are familiar with Solaris you may not have to do a full install as I did).

Operating System   Version                      -m for install (MB)   -m post-install (MB)   Disk Image Size
FreeBSD            5.4                          64                    64                     1 GB
Gentoo Linux       May 2005; kernel 2.6.11.10   64                    64                     600 MB
NetBSD             2.0.2                        64                    64                     400 MB
Solaris            10                           96                    128                    3.4 GB
OpenBSD            3.7                          64                    64                     400 MB


If IP masquerading - also known as NAT - is enabled on the host and the host is internet-connected then the internet will be accessible from those guest OS installers that recognise the emulated network card (NE2000) and use dhcp to configure it (since in the previous section the host's dhcp server was set up) - which is most of them.  So installations over the internet using ftp are possible for those OS installers that support it.

Automating /etc/hosts Changes On The Guest OSes

Since the IP subnet of a guest OS may change between boots, it cannot have a constant mapping for the host OS's hostname in its /etc/hosts file.  This is handled by some scripting triggered by the dhcp client's acceptance of an IP address.  The details vary slightly for each guest OS.  Reminder to self: the grep -v should check that the host_os_hostname occurs at the beginning of the line with an optional arbitrary amount of whitespace preceding it.

The BSD Guests

In the FreeBSD, NetBSD and OpenBSD guest OSes, as root, find the first line in /sbin/dhclient-script that contains route add default $router and add this code immediately after that line:

      grep -v host_os_hostname /etc/hosts > /etc/hosts.new
      echo "$router host_os_hostname" >> /etc/hosts.new
      cp -f /etc/hosts /etc/hosts.old
      mv -f /etc/hosts.new /etc/hosts

This is not an ideal approach because /sbin/dhclient-script is a system file liable to be replaced on upgrade.  A more maintainable approach would be preferable.
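
The grep -v filter used in these hooks drops every line that merely contains the hostname string, including comments and longer hostnames.  A minimal sketch of a stricter filter follows; the function name strip_host_entry is made up for this example and it is not part of any of the scripts in this document:

```shell
#!/bin/sh
# strip_host_entry NAME FILE
# Print FILE, omitting lines on which NAME appears as a whole
# whitespace-delimited field before any comment.
strip_host_entry()
{
    awk -v name="$1" '
    {
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^#/) break    # the rest of the line is a comment
            if ($i == name) next    # drop this line
        }
        print
    }' "$2"
}

# demonstration on a scratch file with made-up entries
TMP=${TMPDIR:-/tmp}/hosts.demo.$$
printf '%s\n' '127.0.0.1 localhost' \
    '172.20.0.1 host_os_hostname' \
    '10.0.0.9 not_host_os_hostname_at_all' \
    '# host_os_hostname is set by the dhcp hook' > "$TMP"
strip_host_entry host_os_hostname "$TMP"
rm -f "$TMP"
```

Only the second line of the scratch file is removed, whereas grep -v host_os_hostname would remove all but the first.  In a hook, strip_host_entry host_os_hostname /etc/hosts > /etc/hosts.new would stand in for the grep -v line.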

The Gentoo Guest

In the Gentoo guest OS as root, create the /var/lib/dhcpc/dhcpcd.exe file as follows:

cat >/var/lib/dhcpc/dhcpcd.exe << 'ENDOFSCRIPT'
#!/bin/sh

# dhcpcd passes the path of its host info file as the first argument
GW=`grep GATEWAY "$1"`
grep -v host_os_hostname /etc/hosts > /etc/hosts.new
echo "${GW#*=} host_os_hostname" >> /etc/hosts.new
cp -f /etc/hosts /etc/hosts.old
mv -f /etc/hosts.new /etc/hosts

ENDOFSCRIPT

chmod 755 /var/lib/dhcpc/dhcpcd.exe

and to /etc/conf.d/net add:

iface_eth0="dhcp"
gateway="eth0"
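
The ${GW#*=} expansion in the Gentoo script strips the shortest leading match of "anything up to an equals sign", leaving only the address.  A small illustration, assuming the matched line has the form GATEWAY=address (the address below is a made-up example):

```shell
#!/bin/sh
GW="GATEWAY=172.20.0.1"    # example of a line matched by grep GATEWAY
ADDR=${GW#*=}              # remove everything up to and including the first "="
echo "$ADDR host_os_hostname"
```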

The Solaris Guest

In the Solaris guest OS as root, create the /etc/dhcp/eventhook file as follows:

cat >/etc/dhcp/eventhook << 'ENDOFSCRIPT'
#!/bin/sh

if [ "$2" = "BOUND" ]
then
read HOSTNAME < /etc/nodename
SERVERIP=`/sbin/dhcpinfo -i ServerID`
grep -v host_os_hostname /etc/hosts > /etc/hosts.new
echo "$SERVERIP host_os_hostname" >> /etc/hosts.new
cp -f /etc/hosts /etc/hosts.old
mv -f /etc/hosts.new /etc/hosts
fi;

ENDOFSCRIPT

chmod 755 /etc/dhcp/eventhook

Setting Up NFS Services

Creating Directories

Create the following directories on the host OS and on each guest OS; the directories should all be owned by root or the qemu-invoking user; for a single-user system I suggest the qemu-invoking user own them and that they have permission mode 744 - although the guest directory permissions are irrelevant as they will be overridden by the NFS server.  Note that if creating directories as per the scheme suggested below, the "common" subdirectory must be created in each of the RW_HOST_BASE/guest_os_name directories on the host so that it can be used as an NFS mount-point by the guest OS. The three directories to be used as NFS client mount-points on each guest OS are:

Name (replace this in scripts)   Purpose                                  Location in the reference guest OS
COMMON_RO_GUEST                  Read-only; shared by all guest OSes      /netshare-ro
COMMON_RW_GUEST                  Read-write; shared by all guest OSes     /netshare-rw/common
RW_GUEST                         Read-write; exclusive to each guest OS   /netshare-rw

On the host OS the corresponding directories that are mapped are as below.  The host directory corresponding to RW_GUEST is determined by RW_HOST_BASE/guest_os_name.

Name (replace this in scripts)   Purpose                                       Location in the reference host OS
COMMON_RO_HOST                   Read-only; shared by all guest OSes           /data/qemu/netshare/common-ro
COMMON_RW_HOST                   Read-write; shared by all guest OSes          /data/qemu/netshare/common-rw
RW_HOST_BASE                     Read-write; base for guest OS-specific dirs   /data/qemu/netshare
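
The host-side directory layout described above can be created along these lines.  This is a sketch only: the guest OS names are the ones used in the export list in the next section, and RW_HOST_BASE defaults to a scratch directory so that the commands can be tried without touching the real location:

```shell
#!/bin/sh
# RW_HOST_BASE defaults to a scratch directory here; the reference
# host uses /data/qemu/netshare.
RW_HOST_BASE=${RW_HOST_BASE:-${TMPDIR:-/tmp}/netshare.$$}
COMMON_RO_HOST=$RW_HOST_BASE/common-ro
COMMON_RW_HOST=$RW_HOST_BASE/common-rw
GUESTS="openbsd netbsd freebsd solaris gentoo"

mkdir -p "$COMMON_RO_HOST" "$COMMON_RW_HOST" || exit 1
for os in $GUESTS
do
    # per-guest exclusive directory plus the "common" mount-point in it
    mkdir -p "$RW_HOST_BASE/$os/common" || exit 1
done
# permission mode suggested in the text for a single-user system
chmod -R 744 "$RW_HOST_BASE"
echo "created directory tree under $RW_HOST_BASE"
```

Run it as the user who is to own the directories, with RW_HOST_BASE set in the environment to the real base directory.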

Setting Permissions On The Host OS's NFS Server

Set up the host OS's NFS server permissions.  Under Linux this can be done by adding the lines below to the /etc/exports file.  Some of the mount options may be redundant, but don't remove any without checking.

COMMON_RO_HOST       172.0.0.0/255.0.0.0(ro,no_root_squash,nohide,sync,insecure)
COMMON_RW_HOST       172.0.0.0/255.0.0.0(rw,no_root_squash,nohide,sync,insecure)
RW_HOST_BASE/openbsd 172.0.0.0/255.0.0.0(rw,no_root_squash,nohide,sync,insecure)
RW_HOST_BASE/netbsd  172.0.0.0/255.0.0.0(rw,no_root_squash,nohide,sync,insecure)
RW_HOST_BASE/freebsd 172.0.0.0/255.0.0.0(rw,no_root_squash,nohide,sync,insecure)
RW_HOST_BASE/solaris 172.0.0.0/255.0.0.0(rw,no_root_squash,nohide,sync,insecure)
RW_HOST_BASE/gentoo  172.0.0.0/255.0.0.0(rw,no_root_squash,nohide,sync,insecure)

Start or restart the NFS server on the host OS (or otherwise get it to reread its export permissions).

As noted previously, these host NFS server export permissions are not specific enough to prevent guest OSes other than the intended owner from connecting and writing to each supposedly OS-specific directory.  This is because the IP address associated with each guest OS is subject to change, and this change is reflected only in the /etc/hosts file of the host rather than in a name service like bind.  There doesn't appear to be a way to get the NFS server to recognise the changed permissions without restarting it, which is not possible as it would destroy existing connections.  So instead of specific hostname-based permissions, generic permissions on the IP block 172.X.X.X are used.  I don't believe that using bind would solve the problem, but I haven't checked this out.

This is only a security problem on multi-user machines where a non-root user (user1) has root access on "their own" OS run under QEMU.  This OS would not be one of the guest OSes set up as part of the configuration described in this document - it would be user1's "personal" OS.  If permissions were set up such that user1's personal QEMU OS is properly networked through the tun interface to the host OS then user1 could access the "exclusive" NFS directories of the guest OSes through NFS connections from their personal OS; indeed they could override permissions by creating specific users with the same username/UID as those on the host OS.  Clearly this is a situation to avoid.

The other reason that it is a problem, albeit a minor one, is that some random bug or mistake could lead one of the guest OSes to accidentally connect to another guest OS's exclusive directory and remove, modify or create files.  This is pretty unlikely and on a single-user machine it's not a significant concern, but it would still be nice to fix it.


Configure The NFS Mounts On The Guest OSes

If a reboot of the guest OS has occurred since /etc/hosts changes were automated, the host OS's hostname should already be mapped in /etc/hosts.  If the guest OS has not yet been rebooted, add the host OS's hostname to the /etc/hosts file.

Tailor the lines below to each OS's required file format and add them to each guest OS's /etc/fstab (/etc/vfstab for Solaris).  Include an option specifying to mount the directories automatically at boot (the auto below accomplishes this).  Specifying read-only (ro) or read-write (rw) may be unnecessary for some OSes but it's useful for the mount options to match the permissions granted by the NFS server.  Specifying rw when the NFS server only grants ro will obviously not allow the directory to be written to.  Order is important if you name directories as suggested since the common writable directory is mounted off a sub-directory of the exclusive writable directory.

host_os_hostname:COMMON_RO_HOST             COMMON_RO_GUEST nfs auto,ro 0 0
host_os_hostname:RW_HOST_BASE/guest_os_name RW_GUEST        nfs auto,rw 0 0
host_os_hostname:COMMON_RW_HOST             COMMON_RW_GUEST nfs auto,rw 0 0
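
With the placeholders replaced by the reference locations from the directory tables, the lines come out as printed by the sketch below.  The function name fstab_lines and the hostname qemuhost are made up for this example, and the output is in Linux /etc/fstab format, so it needs adapting for the other guest OSes:

```shell
#!/bin/sh
# fstab_lines HOSTNAME GUEST_OS_NAME
# Print the three mount lines in Linux /etc/fstab format, using the
# reference directory locations.  The exclusive directory is listed
# before the common writable one because the latter is mounted on a
# sub-directory of the former.
fstab_lines()
{
    base=/data/qemu/netshare
    printf '%s:%s %s nfs auto,ro 0 0\n' "$1" "$base/common-ro" /netshare-ro
    printf '%s:%s %s nfs auto,rw 0 0\n' "$1" "$base/$2" /netshare-rw
    printf '%s:%s %s nfs auto,rw 0 0\n' "$1" "$base/common-rw" /netshare-rw/common
}

fstab_lines qemuhost gentoo    # "qemuhost" is a made-up host OS hostname
```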

Then try mounting each directory and test the permissions.  Note that Solaris 10 uses NFS version 4 by default and won't connect to Gentoo Linux's NFS server unless told to downgrade to version 3.  This can be achieved by:

  1. editing /etc/default/nfs and setting NFS_CLIENT_VERSMAX=3 
  2. restarting the nfs client: svcadm restart nfs/client

Setting Up The SSH Services

SSH is the means by which the host OS connects to the guest OS and runs whichever test script is specified. 

Generate a ssh key for the qemu-invoking user and each test-script user on the host OS using a command like (this must be run under the account of the user in question):

ssh-keygen -t rsa

Copy the public key for each user on the host OS (~/.ssh/id_rsa.pub) to the authorized keys file for the same user on each guest OS (~/.ssh/authorized_keys).  This will allow ssh connections without asking for a password.
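
One way to do the copy without ending up with duplicate entries is sketched below.  The function name install_pubkey is made up for this example, and it assumes that the public key file holds a single line, as generated by ssh-keygen:

```shell
#!/bin/sh
# install_pubkey PUBKEY_FILE SSH_DIR
# Append the single-line public key in PUBKEY_FILE to
# SSH_DIR/authorized_keys unless an identical line is already there.
install_pubkey()
{
    mkdir -p "$2" && chmod 700 "$2" || return 1
    touch "$2/authorized_keys" && chmod 600 "$2/authorized_keys" || return 1
    # -x: whole-line match, -F: fixed string, -f: read the pattern from file
    if ! grep -qxF -f "$1" "$2/authorized_keys"
    then cat "$1" >> "$2/authorized_keys"
    fi
}

# Usage on a guest OS, assuming the host's public key was first copied
# into the read-only share on the host:
#   install_pubkey /netshare-ro/id_rsa.pub "$HOME/.ssh"
```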

Remember to ensure that the sshd service runs at startup on each guest OS.

Putting It All Together With The runall-os-test Script

The virtual network of virtual machines is now set up.  What is missing is a convenient (and due to permissions, in a multi-user setup, necessary) way to bring up any not-yet-running guest OSes and run a single script on each OS.  This is achieved through runall-os-test.  As was explained above, the script is not appropriate for multi-user hosts and would require modification for such usage.  By the way, if you independently perform such modifications before I get around to it, I would appreciate you forwarding them to me.

Usage Of The runall-os-test Script

The usage of the script is explained in the initial block of comments.  Basically it boots up the guest OSes specified as parameters and optionally runs a script on each.  If no OSes are specified, all OSes in the config file are booted.  If no script is specified, the OSes are simply booted.  If a script is specified it must be the second option and the first option must be -s, -sf or -sn.

If the script is not given as a full path that starts with $COMMON_RO_HOST (set in the top block with the script's global variables) then it is assumed not to exist under the $COMMON_RO_HOST directory already and is copied there.  A multi-user system would need to copy it to the user's specific directory within this directory, but on a single-user system it is sufficient to copy it to the root of the directory.  The copy is interactive by default (cp -i); specify -sn for a non-interactive copy or -sf for a forced copy (cp -f).

The script logs all steps it takes to $LOGFILE, or to stderr if this variable is not set.  The stdout and stderr of the scripts run by ssh on the guest OSes go to the same terminal as runall-os-test.
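
The option rules above can be summarised in a few lines of shell.  This is an illustrative sketch only and not the argument handling of runall-os-test itself; the function and variable names are made up for the example:

```shell
#!/bin/sh
# parse_args "$@": sets COPYCMD, SCRIPT and OSES.
# Returns 1 if a script option is given without a script name.
parse_args()
{
    COPYCMD= SCRIPT= OSES=
    case "$1" in
    -s)  COPYCMD="cp -i" ;;    # interactive copy (the default)
    -sn) COPYCMD="cp" ;;       # non-interactive copy
    -sf) COPYCMD="cp -f" ;;    # forced copy
    esac
    if [ -n "$COPYCMD" ]
    then
        # the script name must follow immediately
        [ $# -ge 2 ] || return 1
        SCRIPT=$2
        shift 2
    fi
    OSES=${*:-all}             # no OS named means all of them
}

# mytest.sh is a made-up script name
parse_args -sn mytest.sh freebsd solaris
echo "$COPYCMD | $SCRIPT | $OSES"
```

The example call prints "cp | mytest.sh | freebsd solaris"; called with no arguments at all, the function leaves SCRIPT empty and sets OSES to "all".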

The first time a ssh connection is made to a guest OS, ssh will ask for confirmation of the host key.  By default it adds IP addresses as well as hostnames to the ~/.ssh/known_hosts file.  This causes repeated confirmation requests when the guest OS's IP address changes.  Thus the script includes a function to intelligently strip IP addresses from the known_hosts file.  Be aware if running as a single-user that runall-os-test will remove any IP addresses matching the $KNOWN_HOSTS_IP_REGEXP from the ~/.ssh/known_hosts file.
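
To see what this stripping does, the same two substitutions used by f_strip_known_hosts_ips can be tried on a few made-up known_hosts entries.  The hostnames and truncated keys below are invented for the example, and the sketch reads standard input rather than touching ~/.ssh/known_hosts:

```shell
#!/bin/sh
KNOWN_HOSTS_IP_REGEXP='172\.[[:digit:]]\{1,3\}\.0\.2'

# Delete lines that begin with a matching IP address; otherwise strip
# a matching ",IP" from the host list and print the line.
strip_ips()
{
    sed -n -e "s/^$KNOWN_HOSTS_IP_REGEXP[[:space:]]//" -e t \
        -e "s/,[[:space:]]*$KNOWN_HOSTS_IP_REGEXP//" -e p
}

printf '%s\n' \
    '172.20.0.2 ssh-rsa AAAAkey1' \
    'guestbsd,172.20.0.2 ssh-rsa AAAAkey2' \
    'elsewhere,10.0.0.5 ssh-rsa AAAAkey3' | strip_ips
```

The first entry is dropped entirely, the second loses only its IP address and the third is left untouched because its address does not match the regular expression.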

Brief Description Of The runall-os-test Script

runall-os-test reads configuration data for each guest OS from a config file and relies on a couple of helper scripts - runqemuosbgnoint and runqemuos - so that interrupts can be handled properly and the PID of the qemu process can be known.  The helper script execs qemu so it knows that qemu's PID is the same as its own.  Interrupts are a problem because without job control enabled, they are passed through to child processes.  Unfortunately, the qemu child process happens to terminate on this signal, which is not appropriate (it should allow the guest OS to shut down properly).  This could be handled by using set -m in the main script to turn on job control, but unfortunately for some reason this causes keyboard interrupt to be ignored during the builtin sleep command which occurs in polling loops.  So it is deferred to the runqemuosbgnoint script, which then calls the runqemuos script in the background.  For more details see the scripts themselves.
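
The reason the helper script can treat its own PID as qemu's is that exec replaces the running process instead of forking a new one.  A small demonstration of that property, using two throwaway shells in place of the helper script and qemu:

```shell
#!/bin/sh
# The outer shell prints its PID and then execs a second shell, which
# prints its own PID.  exec replaces the process image without
# forking, so both lines show the same number.
PIDS=$(sh -c 'echo $$; exec sh -c "echo \$\$"')
BEFORE=$(echo "$PIDS" | sed -n 1p)
AFTER=$(echo "$PIDS" | sed -n 2p)
if [ "$BEFORE" = "$AFTER" ]
then echo "PID unchanged across exec: $BEFORE"
fi
```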

Installing Required Commands

The script relies on lsof so this command needs to be installed if it is not already.  Under Gentoo Linux, as root emerge lsof:

emerge lsof

How The Connection To The Guest OS Is Verified

runall-os-test relies on the command ~/hostname to return the hostname of the guest OS as specified in the config file on the host OS.  So this command must be present in the qemu-invoking user's home directory on each guest OS.  An appropriate way to achieve this is:

  1. permanently set the hostname by whichever means is most appropriate for each guest OS (this will require root access).  This is typically a matter of specifying it in a file under /etc as documented by each OS.  I could investigate how I've set this up and document the specific filenames here.
  2. as the qemu-invoking user on each guest OS create ~/hostname as an executable script that runs the appropriate command to determine the OS's hostname.  For Gentoo Linux this is /bin/hostname -a, for Solaris this is /usr/bin/hostname and on the BSDs it is /bin/hostname -s.
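
Step 2 can be done by hand with an editor, or with a small helper such as the one sketched here.  The function name make_hostname_cmd is made up for this example:

```shell
#!/bin/sh
# make_hostname_cmd DIR COMMAND...
# Create DIR/hostname as an executable script that runs COMMAND.
make_hostname_cmd()
{
    dir=$1
    shift
    printf '#!/bin/sh\nexec %s\n' "$*" > "$dir/hostname" &&
    chmod 755 "$dir/hostname"
}

# Usage on a BSD guest, as the qemu-invoking user:
#   make_hostname_cmd "$HOME" /bin/hostname -s
```

Afterwards, running ~/hostname on the guest should print the same hostname as the one recorded for it in the config file on the host OS.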

How The Host Is Notified That A Guest OS Has Finished Booting

runall-os-test depends on the guest OS to provide notification that it has completed booting.  This could alternatively have been achieved by polling for a ssh connection, but the chosen approach ensures that the OS is in a stable, fully booted state.  The guest OS signals that it has completed booting by writing to a file in one of the writable directories NFS-mapped to the host.  I chose this file as COMMON_RW_HOST/guest_os_hostname but arguably RW_HOST_BASE/guest_os_name/bootnotify would be more appropriate.  I should point out here that I've given my guest OSes different hostnames than their OS names, so the two variables guest_os_hostname and guest_os_name are distinct, but this needn't be the case.

The script provided polls for this file.  On my system I avoid polling by using a simple Linux-specific utility that I wrote; it is named waitfile and uses the inotify kernel interface.  If anyone is interested in the code I will provide it, but it would bloat this document too much.

The means by which this is achieved differs for each guest OS.  Here are the details:

The BSDs

Append a line to the end of /etc/rc.  As root, run this command on each guest BSD OS:

echo 'date >> COMMON_RW_GUEST/`hostname -s`' >> /etc/rc

Again, this is not ideal as /etc/rc is a system file liable to be replaced on upgrade.  There does not seem to be a maintainable way to configure a script to run at the end of the boot process in BSD, but such an approach would be preferable.

Gentoo Linux

Append a line to the end of /etc/conf.d/local.start.  In contrast to the BSD approach above, this is maintainable because the local.start file is a configuration file intended to be user-modifiable.  There is no specification anywhere that it will run last in the boot process, but it does, so this suits our purposes.  As root, run this command on the guest Gentoo Linux OS:

echo 'date >> COMMON_RW_GUEST/`hostname -a`' >> /etc/conf.d/local.start

Solaris

Create a new file in the /etc/rc3.d directory and give it a high number so that it runs last in the boot process.  This, as with the Gentoo approach, is maintainable, although the rc boot-up sequence seems to be unofficially deprecated in favour of the services approach.  As root, run this command on the guest Solaris OS:

cat <<'ENDOFSCRIPT' >/etc/rc3.d/S999999signal_host
#!/sbin/sh

if [ "$1" = start ]
then
        date >> /netshare-rw/common/solocrat
fi
exit 0

ENDOFSCRIPT

chmod 744 /etc/rc3.d/S999999signal_host
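
On the host side, waiting for these notification files amounts to polling until the file is newer than the time of launch.  The sketch below shows the idea in a standalone form.  It is not the code used in runall-os-test, the function name is made up, and it relies on the -nt file test, which is widely supported by shells but is not required by POSIX:

```shell
#!/bin/sh
# wait_for_newer UPFILE LAUNCHFILE TIMEOUT INTERVAL
# Poll until UPFILE exists and is newer than LAUNCHFILE.
# Returns 0 on success, 1 if about TIMEOUT seconds pass first.
wait_for_newer()
{
    waited=0
    while [ "$waited" -lt "$3" ]
    do
        if [ "$1" -nt "$2" ]; then return 0; fi
        sleep "$4"
        waited=$((waited + $4))
    done
    return 1
}

# Usage with the reference locations and a made-up guest hostname:
#   wait_for_newer /data/qemu/netshare/common-rw/guestbsd \
#       /data/qemu/launch_times/guestbsd 600 10
```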

Potential Problems With The runall-os-test Script

The runall-os-test script performs quite a lot of checking to ensure that two instances of a guest OS are not booted simultaneously.  This is important because QEMU does not lock the disk image and does nothing to prevent multiple sessions from opening the disk image for writing.  It is possible that under some very unusual conditions (for example, where /usr/sbin/lsof is replaced with a dummy) the script's checks fail and a second OS instance is booted, but this is very unlikely.  The slender possibility of accidentally booting an OS that is already running could be avoided by not having runall-os-test boot OSes, and instead having it return an error if an OS is not reachable.  Guest OSes would then be booted manually prior to running the tests and a manual check would be performed to ensure that only one instance of each OS is running.  On a multi-user system this would require an administrator to perform the task and could be a bit of a support problem.

Another potential problem is that on hosts where memory is limited or many OSes need to be run, it may be necessary to shut down each OS (or a select group of them) as the test script completes.  runall-os-test does not currently do this, so at the end of a test run all OSes will still be running, but it could be extended to do so.

Missing Functionality?

runall-os-test has no functionality to run the specified test script on guest OSes that have already booted while it waits for another guest OS to boot.  This is because batch processing is more efficient than multitasking where processes are CPU-bound and don't do a lot of waiting - e.g. on IO.  This seems to be the best model for this situation, where booting the guest OS under QEMU is fairly CPU-intensive.  Not jumping around between OSes also keeps the script simple (KISS).

Creating The runall-os-test Script

As root, cut and paste the following set of commands to create the runall-os-test script.  Edit the script and set the variables in the top block of the script to appropriate values.

A final reminder that this script is not suitable for use in a multi-user environment without modifications.  To use it, a user needs sudo permission to run /sbin/start-tun-interface as root, which gives them permission not only to introduce undesired entries into the /etc/hosts file, but more importantly to start their own QEMU OSes with full - i.e. root - access to those parts of the host OS filesystem exported by NFS.  The script assumes a single-user system where the qemu-invoking user is identical to the test-script user.

In fact there is no real need for the script to be owned by root; it might as well be owned by the qemu-invoking user and stored in their personal bin directory.  I prefer to locate it in /bin and have it owned by root but executable by all, given that my system is single-user.

cat << 'ENDOFSCRIPT' >/bin/runall-os-test
#!/bin/bash

# runall-os-test
#
# Launches the qemu-hosted OSes specified on the command line (or all
# if none specified). If a script is also specified, that script is
# run on each guest OS through ssh. A config file stores
# information on which OSes are available and how to invoke them. The
# location of this file is set below in the CONFIGFILE variable.
#
# $1 => -s[n|f] indicates that a test script is to be run
# [non-interactive copy | force copy]
# (default is an interactive copy)
# $2 => scriptname if $1 is -s, -sn or -sf
# $1|3..$n => hostnames of OSes to run; "all" for all (starts from
# $3 if $1 is -s, -sn or -sf)
#
# The script to run (if any) must be located either
# (a) in a dir based in $COMMON_RO_HOST or
# (b) elsewhere.
# If the beginning of the specified script ($2) compares equal with
# $COMMON_RO_HOST then (a) is assumed; else (b).
# If (b), and the file specified by $2 exists, it is copied to
# $COMMON_RO_HOST with a prompt on overwrite if a file with its name
# already exists in $COMMON_RO_HOST. The prompt can be avoided by
# adding an n or an f after the s. A chmod u+x will then be run on
# the destination file.
# It will run on the guest OS through a ssh connection. The script
# can assume that:
# 1) it will run with the same username/userid as on the host
# 2) it will have read-only access to the nfs-mounted dir
# $COMMON_RO_GUEST which corresponds to the host's directory
# $COMMON_RO_HOST and is shared by all qemu guests.
# 3) it will have write access to the nfs-mounted directory
# $RW_GUEST to which the cwd will be changed prior to the
# script's invocation; but it will actually be invoked from its
# location under $COMMON_RO_GUEST
# 4) the write access to $RW_GUEST is intended to be exclusive to
# the guest OS under which it runs, but currently this is not the
# case and it is actually writable by other guest OSes.
# 5) it will have write access to $COMMON_RW_GUEST; this is
# intentionally shared-writable by all guest OSes.
# 6) `~/hostname` == the guest OS's hostname as known to the host
# OS (mapped in the host OS's /etc/hosts file).
# NOTE: (6) only applies on a single-user system where the
# qemu-invoking user is the same as the test-script user

# Configurable variables
CONFIGFILE=/data/qemu/config # where to read config from
QEMUIMGDIR=/data/qemu # where the disk images are located
LAUNCHDIR=/data/qemu/launch_times # where the files containing the
# pids of launched qemu processes
# are created
COMMON_RW_HOST=/data/qemu/netshare/common-rw
COMMON_RW_GUEST=/netshare-rw/common # NFS-mapped to the above
UPDIR=$COMMON_RW_HOST # dir the guest OSes write to upon
# completing boot
COMMON_RO_HOST=/data/qemu/netshare/common-ro
COMMON_RO_GUEST=/netshare-ro # NFS-mapped to the above
RW_GUEST=/netshare-rw
TMPHOSTNAMEDIR=/tmp/qemuhostnames # path to dir where the guest OS's
# hostname is stored in a file with the name
# identical to the invoked qemu process's PID
LSOF=/usr/sbin/lsof # path to lsof
PING=ping # path to ping
SSH=ssh # path to ssh
SED=sed # path to sed
DATE=date # path to date command (assumes gnu options)
STAT=stat # path to stat command
LOGFILE=~/log # if set, logging msgs go here; else stderr. Does not
# catch error messages generated by the shell or
# the user's script as run through ssh. Recommended to
# leave unset initially whilst testing
DEFAULT_START_TIMEOUT=600 # used if not specified by a guest OS ...
DEFAULT_STOP_TIMEOUT=600 # ... entry in the config file
POLLINT=10 # how many seconds to wait between polls for events
# whenever polling is used
POLLCONNECTFACTOR=6 # how many polls to skip between conn tests in
# f_poll_for_qemu_process_shutdown_or_connect
QEMU_INIT_PERIOD=20 # how many seconds to wait for a qemu process to
# open the disk image for writing
KNOWN_HOSTS_IP_REGEXP="172\.[[:digit:]]\{1,3\}\.0\.2" # IP addresses to
# remove from ~/.ssh/known_hosts
#command to print timestamp for logging
LOGTIMECMD="$DATE +%Y/%m/%d-%H:%M.%S"
#command to print timestamp followed by current hostname
#LOGTH="eval echo \$(\$LOGTIMECMD) \$HOST"
#not used anymore but left as a reminder of how to do this and an
#example of when eval is required.
#e.g. usage is: echo "$($LOGTH) :: some message"
#removing the "eval" will cause this command to print the wrong
#message due to the need for parameter substitution

# log the message specified by the passed in parameters
f_log_msg()
{
if [ -n "$LOGFILE" ]
then echo $($LOGTIMECMD) ${HOST:+${HOST}:: }$@ >> $LOGFILE
else echo $($LOGTIMECMD) ${HOST:+${HOST}:: }$@ 1>&2
fi
}

# log the message specified by the passed in parameters; treating it
# as an error message
f_log_err()
{
f_log_msg ERROR:: $@
# always return 1 so that this return value can be passed
# through as the calling function's return
return 1
}

# log the command as passed in $@ and then return the result of
# evaluating it
f_log_and_run()
{
f_log_msg CMD:: $@
eval $@
}

# remove numerical IP addresses matching the given regexp from the
# known_hosts file otherwise they provoke prompts requiring user
# response when IP address for the host changes. Remove entire
# lines if the IP address is at the beginning of the line with a
# space following it; otherwise strip it if it is preceded by a comma
# and an arbitrary amount (including none) of whitespace.
function f_strip_known_hosts_ips()
{
$SED -n "s/^$KNOWN_HOSTS_IP_REGEXP[[:space:]]//; t; \
s/,[[:space:]]*$KNOWN_HOSTS_IP_REGEXP//; p" \
~/.ssh/known_hosts >/tmp/known_hosts.$$
mv /tmp/known_hosts.$$ ~/.ssh/known_hosts
}

# attempt to run the script in $SCRIPT (if one was specified) on the
# guest OS using ssh. Returns 0 if the script runs successfully.
function f_run_script()
{
# only run script if one was specified
if [ -z "$SCRIPT" ]; then return 1; fi
f_log_msg "Running script..."
f_strip_known_hosts_ips
f_log_and_run $SSH -q $HOST $COMMON_RO_GUEST/$SCRIPT
RES=$?
if [ $RES -eq 0 ]
then
f_log_msg "Script ran OK"
return 0
fi
f_log_err "$SSH returned $RES"
return 1
}

# polls for the upfile for the current OS. Uses the time in
# $LAUNCHTIME as the boot start time. Returns 0 if a file with later
# modification time than $LAUNCHTIME appears
f_wait_for_upfile()
{
if [ -z "$LAUNCHTIME" ]
then
f_log_err "In f_wait_for_upfile, \$LAUNCHTIME not \
defined (probably a freak case of $LAUNCHDIR/$HOST being deleted \
after being determined to exist, or a date command error)"
return 1
fi
let WAITENDTIME=$LAUNCHTIME+$START_TIMEOUT
while NOW=$($DATE +%s) && [ $NOW -lt $WAITENDTIME ]
do
if UPTIME=$($STAT -c %Y $UPDIR/$HOST) &&
[ $UPTIME -gt $LAUNCHTIME ]
then return 0; fi
sleep $POLLINT
done
# timed out without seeing a newer upfile
return 1
}

function f_launch_os_run_script()
{
f_log_msg "Launching OS and waiting for it to come up"
let LAUNCHTIME=$($DATE +%s)
if ! [ -d $TMPHOSTNAMEDIR ]
then f_log_and_run "mkdir $TMPHOSTNAMEDIR"
fi
f_log_and_run "/bin/runqemuosbgnoint $HOST $TMPHOSTNAMEDIR \
$LAUNCHDIR $QEMUOPTS"

if f_wait_for_upfile # Uses $LAUNCHTIME
then
# os has apparently come up
f_log_msg "Finished waiting: OS is up"
f_run_script
return 0
fi
# timed out waiting for upfile
f_log_err "Timed out waiting on upfile"
return 1
}

# returns 0 if the guest OS's qemu disk image is open by any
# process in a mode other than read-only; if /usr/sbin/lsof returns
# an error then this function returns 2 and the caller must not
# proceed, since we cannot be sure no other process is writing to the
# disk image; otherwise returns 1 (disk image not open for writing)
f_is_disk_image_open()
{
# if lsof not exe, return lsof error; caller must abort
if ! [ -x $LSOF ]; then return 2; fi
# \$( defers the command substitution to the eval in f_log_and_run
f_log_and_run "LSOF_OP=\$($LSOF -Fa0 $IMGFILE)"
# can't perform test below as lsof returns 1 if file is not
# open by any process as well as on real error. Not correct.
# Unsafe.
# if ! [ $? -eq 0 ]
# then return 2; fi # return lsof error; caller must abort
set -- $LSOF_OP
for OPT in $@
do
if [ ${OPT:0:1} == a ] && [ ${OPT:1:1} != r ]
then return 0; fi
done
return 1
}

# returns 0 if the PID in $1 (or $QEMUPID if $1 is not given)
# specifies a qemu process
f_pid_is_qemu_process()
{
if COMM=$(ps -o comm -p ${1:-$QEMUPID}) &&
[ ${COMM#*COMMAND} == qemu ]
then return 0; fi
return 1
}

# returns 0 if the launch file exists and its first line
# is the PID of a Qemu process
f_launch_file_specifies_qemu_process()
{
if [ -f $LAUNCHDIR/$HOST ] &&
read QEMUPID < $LAUNCHDIR/$HOST &&
f_pid_is_qemu_process $QEMUPID
then return 0; fi
return 1
}

# returns 0 if a launch file exists and a later upfile also exists
f_later_upfile_exists()
{
if [ -f "$LAUNCHDIR/$HOST" ] &&
[ "$UPDIR/$HOST" -nt "$LAUNCHDIR/$HOST" ]
then return 0; fi
return 1
}

# returns 0 if launch file exists and at least start_timeout
# seconds have expired since it was created
f_startup_timeout_has_expired()
{
if LAUNCHTIME=$($STAT -c %Y $LAUNCHDIR/$HOST) &&
NOW=$($DATE +%s) &&
[ $(($NOW-$LAUNCHTIME)) -gt $START_TIMEOUT ]
then return 0; fi
return 1
}

# polls for $STOP_TIMEOUT seconds for the PID in $QEMUPID to terminate
# or no longer be a qemu process or for a connection to be possible
# to $HOST. If a connection is possible, 0 is returned; if the
# process in $QEMUPID exits or stops being a qemu process, 1 is
# returned; otherwise (or on error) 2 is returned.
f_poll_for_qemu_process_shutdown_or_connect()
{
if ! POLLSTART=$($DATE +%s); then return 2; fi
NOW=$POLLSTART
NUMPOLLS=1
while [ $(($NOW-$POLLSTART)) -lt $STOP_TIMEOUT ]
do
if ! f_pid_is_qemu_process $QEMUPID
then return 1; fi
sleep $POLLINT
let NUMPOLLS=$NUMPOLLS+1
if [ $NUMPOLLS -gt $POLLCONNECTFACTOR ]
then
if f_can_connect ; then return 0; fi
NUMPOLLS=1
fi
if ! NOW=$($DATE +%s); then return 2; fi
done
return 2
}

# returns 0 if a ssh connection is possible and ~/hostname on
# the guest OS matches $HOST
f_can_connect()
{
if ! f_log_and_run "$PING -c 1 $HOST &>/dev/null"
then return 1; fi
f_strip_known_hosts_ips
if f_log_and_run "HN=\$($SSH -q $HOST ./hostname)" &&
[ "$HN" == "$HOST" ]
then return 0; fi
return 1
}

# main procedure to bring up a guest OS and run any specified script
# performs all checks necessary to ensure that only one instance of
# an OS runs
main_os_startup_function()
{
f_log_msg "Testing connection"
if f_can_connect
then f_log_msg "Test succeeded"
f_run_script
return
fi

# connect failed
f_log_msg "Connection test failed"
f_is_disk_image_open
DIOW=$?
if [ $DIOW -eq 2 ]
then f_log_err "Error whilst checking whether disk image is \
open; aborting."
return 1
elif ! [ $DIOW -eq 0 ]
then
# disk image not open
f_log_msg "Disk image is not open for writing."
if ! f_launch_file_specifies_qemu_process
then f_launch_os_run_script; return
# connect failed, NOT DIOW but existing Qemu process;
# the possibilities are that it is
# (a) initialising && hasn't yet opened the disk image
# (b) bugging and has closed the disk image fd
# (we assume not to later open it again)
elif ! f_later_upfile_exists
then
# assume (a)
f_log_msg "Launch file specifies a qemu \
process; later up file does not exist; waiting for \
${QEMU_INIT_PERIOD}s to give it time to open \
the disk image"
sleep $QEMU_INIT_PERIOD
f_log_msg "Finished waiting."
if ! f_is_disk_image_open
then f_log_err "Disk image still not \
open; aborting."; return 1
else # drop through without returning
f_log_msg "Disk image now open; \
assuming boot process has started"
fi
else
# assume (b) and don't launch new OS in case
# the "buggy" one decides to re-open the file
f_log_msg "Launch file specifies a qemu \
process and a later upfile exists. Time on upfile is \
$($STAT -c %y $UPDIR/$HOST)"
f_log_err "Not continuing in case active \
qemu process running apparently booted OS re-opens disk image"
return 1
fi
fi
# connect failed but DIOW
f_log_msg "Disk image is open for writing."
if ! f_launch_file_specifies_qemu_process;
then
# Could not connect but DIOW and no qemu process found
f_log_msg "PID in launch file is not a qemu process \
or file does not exist"
f_log_err "Can't launch OS as disk image is already \
open for writing; aborting"
return 1
fi
f_log_msg "PID in launch file specifies a qemu process"
# connect failed, but DIOW and a qemu process exists
if ! f_later_upfile_exists
then
f_log_msg "A later upfile does not exist; assuming \
OS is booting. Will wait until launchfile is ${START_TIMEOUT}s old"
# connect failed, DIOW, qemu process, no later upfile
# assume booting.
if f_startup_timeout_has_expired
then f_log_msg "Not waiting - launchfile is \
already older than ${START_TIMEOUT}s"
f_log_err "Can't launch OS; existing OS \
failed to advise of boot completion"
return 1
# $NOW was set by f_startup_timeout_has_expired
elif f_wait_for_upfile
then f_log_msg "Finished waiting - OS is up"
f_run_script
return
else f_log_err "Timed out waiting on upfile for \
previously active guest OS."
return
fi
# connect failed but DIOW, qemu process, later upfile;
# assume shutting down
else
f_log_msg "A later upfile exists; assuming OS is \
shutting down. Waiting ${STOP_TIMEOUT}s for qemu process to exit \
and testing for connection while waiting"
f_poll_for_qemu_process_shutdown_or_connect
RET=$?
f_log_msg "Finished waiting"
if [ $RET -eq 0 ]
then f_log_msg "OS is up"
f_run_script
return
elif [ $RET -eq 1 ]
then
f_log_msg "Process has exited"
if ! f_is_disk_image_open
then f_log_msg "Disk image not open"
f_launch_os_run_script
return
else f_log_err "Disk image still open for \
writing; aborting"
return
fi
fi
f_log_msg "Process still alive"
f_log_err "Cannot launch OS; aborting. Suggest ssh \
or guest os boot problem"
return
fi
f_log_err "Reached end of main_os_startup_function"
}

unset HOST # so f_log_msg doesn't try to print it

# check for script in cmd-line parameters
# copy to $COMMON_RO_HOST if required
if [ "${1:0:2}" == "-s" ]
then
if [ "${1:2:1}" == "f" ]; then CPOPTS="-f"
elif [ "${1:2:1}" != "n" ]; then CPOPTS="-i"; fi
shift
if [ "${1:0:${#COMMON_RO_HOST}}" == "$COMMON_RO_HOST" ]
then
# found under $COMMON_RO_HOST
SCRIPT=${1#${COMMON_RO_HOST}/}
elif [ -f "$1" ]
then
# copy to $COMMON_RO_HOST
f_log_and_run "cp $CPOPTS $1 $COMMON_RO_HOST"
SCRIPT=${1##*/}
f_log_and_run "chmod u+x $COMMON_RO_HOST/$SCRIPT"
else
echo -n "Script not specified or " 1>&2
echo "not a regular file: $1" 1>&2
echo -n "Usage: $0 [-s[f|n] scriptname] " 1>&2
echo "[OS1 OS2 OS3 ...]" 1>&2
echo " If no OSes specified; all are assumed" 1>&2
echo -n " If f is specified and the script is " 1>&2
echo "not located under $COMMON_RO_HOST " 1>&2
echo -n " then when it is copied there cp -f " 1>&2
echo "will be used;" 1>&2
echo -n " else if n is not specified then " 1>&2
echo "cp -i will be used;" 1>&2
exit 1
fi
shift
else
unset SCRIPT
fi

# check for OSes specified as cmd-line parameters
if [ -z "$1" ]
then
f_log_msg "No operating systems specified; assuming all"
ALL=0
else
ALL=1
OSES=$@
fi

# read OS data from config file and launch OSes, running script if
# specified. OSes specified on the command line will be run in config
# file order, NOT command line order. If no OSes were specified, all
# in the config file are launched
STATUS=start
while read -u 4 LINE
do
DONEOSREAD=1
set -- $LINE
# HOST IMGFILE START_TIMEOUT STOP_TIMEOUT MEM LOCALTIME
if [ -z "$LINE" ]
then
if ! [ $STATUS == start ]
then
DONEOSREAD=0
STATUS=start
fi
elif [ -z "${LINE%%\#*}" ]; then continue # skip comments
elif [ $STATUS == start ]
then
unset QEMUOPTS;
HOST=$1
shift
if [ $ALL -eq 1 ]
then
if [ -z "$OSES" ]
then
unset HOST
f_log_msg "All specified OSes done"
break
fi
OSES2=$OSES
unset OSES
MATCH=1
for OS in $OSES2
do
if [ $OS == "$HOST" ]
then MATCH=0
else OSES="$OSES $OS"; fi
done
if [ $MATCH -eq 1 ]; then continue; fi
fi
# imgfile (must be the last option)
IMGFILE=$QEMUIMGDIR/$1
QEMUOPTS=$IMGFILE
shift
START_TIMEOUT=${1:-$DEFAULT_START_TIMEOUT}
shift
STOP_TIMEOUT=${1:-$DEFAULT_STOP_TIMEOUT}
shift
# memory size option (imgfile must be last option)
QEMUOPTS="${1:+-m $1 }$QEMUOPTS"
shift
# localtime option (imgfile must be last option)
if [ "$1" == "L" ]
then QEMUOPTS="-localtime $QEMUOPTS"; fi
STATUS=opts
elif [ $STATUS == opts ]
then
QEMUOPTS="$LINE $QEMUOPTS"
fi
if [ $DONEOSREAD -eq 0 ]
then
f_log_msg "START"
# When logging to a file, indicate on stdout which
# host we're up to
if [ -n "$LOGFILE" ]; then echo "-----$HOST-----"; fi
main_os_startup_function
f_log_msg "END"
fi
done 4< $CONFIGFILE
# opened on fd 4 so that changes to stdin made by other commands
# don't break this fd. The -u 4 option to "read" at top complements
# this.

if [ -n "$OSES" ]; then f_log_err "Not found: $OSES"; fi

ENDOFSCRIPT

chmod 555 /bin/runall-os-test
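
The checks made by main_os_startup_function are easier to follow as a decision table.  The following is only a sketch of that logic, not part of runall-os-test: the function name decide_action and the action labels are hypothetical, and nothing here touches a guest OS.

```shell
# Sketch of the decision logic only (hypothetical names throughout).
# Arguments are "yes" or "no":
#   $1 connection test succeeded
#   $2 disk image is open for writing
#   $3 launch file names a live qemu process
#   $4 an upfile newer than the launch file exists
decide_action()
{
	if [ "$1" = yes ]; then echo run_script; return; fi
	case "$2:$3:$4" in
	no:no:*)     echo launch_os ;;           # nothing running: safe to boot
	no:yes:no)   echo wait_for_disk_open ;;  # qemu still initialising
	no:yes:yes)  echo abort ;;               # qemu alive but closed the image
	yes:no:*)    echo abort ;;               # image held open by something else
	yes:yes:no)  echo wait_for_boot ;;       # guest is booting
	yes:yes:yes) echo wait_for_shutdown ;;   # guest is shutting down
	esac
}

decide_action no no no no     # prints launch_os
decide_action no yes yes no   # prints wait_for_boot
```

In the real function the waiting states are not final: after wait_for_disk_open the booting case is tried next, and after wait_for_shutdown the OS is either found to be up, relaunched, or the run is aborted.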

Configuring The Script

Create the directory specified by $TMPHOSTNAMEDIR with permissions allowing only the qemu-invoking user to write to it.
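
A minimal sketch of this step, assuming it is run by the qemu-invoking user.  The path below is a hypothetical placeholder; substitute the value of $TMPHOSTNAMEDIR set in your copy of runall-os-test.

```shell
# Hypothetical location; use the value of $TMPHOSTNAMEDIR from runall-os-test
TMPHOSTNAMEDIR=${TMPHOSTNAMEDIR:-/tmp/qemu-hostnames-example}

# Create the directory and make it writable by its owner only
# (mode 755: owner rwx, everyone else read and search)
mkdir -p "$TMPHOSTNAMEDIR"
chmod 755 "$TMPHOSTNAMEDIR"

# Show the resulting permissions for a quick visual check
ls -ld "$TMPHOSTNAMEDIR"
```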

Create the configuration file.  It should be writable only by the qemu-invoking user (or better yet, only by root) but readable by all.  It should also reside in a directory whose permissions prevent anyone other than root from deleting it.  On the reference system it is located at /data/qemu/config.  An example follows:

## GUEST OS CONFIGURATION FILE
##
## Commented lines are ignored; blank lines delimit the OSes
## For safety, timeouts must be calculated based on running without
## kqemu. Timeouts are for boot and shutdown respectively.
##
## The first line of each OS is delimited by single spaces with fields
## as below. The next set of consecutive lines are additional options
## to pass to qemu (one specified per line)

# hostname imgfile starttimeout endtimeout memsize -localtime?(L=yes)

netqemu netbsd-2.0.2.img 240 60 64 L

solqemu solaris-10.img 2400 2000 128 L
-cdrom /data/os/solaris-10/sol-10-ccd-GA-x86-iso.iso

gentqemu gentoo.img 1200 1200 64 L

freeqemu freebsd-5.4.img 720 60 64 L
-cdrom /data/os/freeBSD-5.4/5.4-RELEASE-i386-disc1.iso

# openqemu should be last as it suffers least under kqemu-less
# operation
openqemu openbsd-3.7.img 240 60 64 L

#the last OS must be terminated by a blank line
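
To make the field layout concrete, here is a small sketch of how the first line of an OS entry splits into fields, using the same word-splitting approach as runall-os-test.  The function name parse_os_line, the variables MEMSIZE and USE_LOCALTIME, and the default timeout values are assumptions made for illustration.

```shell
# Sketch: split the first line of an OS entry into its fields.
# The default timeouts are illustrative assumptions, not the script's values.
DEFAULT_START_TIMEOUT=600
DEFAULT_STOP_TIMEOUT=120

parse_os_line()
{
	# rely on word splitting, as the fields are delimited by single spaces
	set -- $1
	HOST=$1
	IMGFILE=$2
	START_TIMEOUT=${3:-$DEFAULT_START_TIMEOUT}
	STOP_TIMEOUT=${4:-$DEFAULT_STOP_TIMEOUT}
	MEMSIZE=$5
	if [ "$6" = "L" ]; then USE_LOCALTIME=yes; else USE_LOCALTIME=no; fi
}

parse_os_line "netqemu netbsd-2.0.2.img 240 60 64 L"
echo "$HOST: image=$IMGFILE boot=${START_TIMEOUT}s stop=${STOP_TIMEOUT}s mem=$MEMSIZE localtime=$USE_LOCALTIME"
```

Fields left off the end of the line fall back to the defaults, so an entry with only a hostname and an image file is still usable in this sketch.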

Creating The Helper Scripts

As root, cut and paste the following set of commands to create the helper scripts for runall-os-test.  Set the QEMU variable in runqemuos if the qemu executable is not on your PATH (the default should be fine for a standard PATH).

cat << 'ENDOFSCRIPT1' >/bin/runqemuosbgnoint
#!/bin/sh

# turn on job control so child processes don't get passed keyboard intr
set -m

/bin/runqemuos "$@" &

ENDOFSCRIPT1

chmod 555 /bin/runqemuosbgnoint

cat << 'ENDOFSCRIPT2' >/bin/runqemuos
#!/bin/sh
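# $1 => hostname
# $2 => temporary hostname directory
# $3 => launch_times directory
# $4.. => qemu options

# Configurable variables
QEMU=qemu # path to qemu executable

# record the hostname under this PID, and this PID under the hostname;
# exec below keeps the same PID for the qemu process
echo $1 > $2/$$
echo $$ > $3/$1
shift 3
exec $QEMU "$@"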

ENDOFSCRIPT2

chmod 555 /bin/runqemuos
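
The bookkeeping that runqemuos does before it hands over to qemu can be tried out without qemu.  The following is a rough sketch under stated assumptions: sleep stands in for qemu, throwaway directories stand in for the real ones, and the names GUEST and LAUNCHED_PID are hypothetical.

```shell
# Sketch only: "sleep" stands in for qemu, temporary directories for the
# real hostname and launch_times directories.
TMPHOSTNAMEDIR=$(mktemp -d)
LAUNCHDIR=$(mktemp -d)
GUEST=netqemu

# Run the launcher steps in a child shell.  Because of exec, the stand-in
# process keeps the PID that was written to the launch file.
sh -c 'echo "$1" > "$2/$$"; echo $$ > "$3/$1"; shift 3; exec "$@"' \
	launcher "$GUEST" "$TMPHOSTNAMEDIR" "$LAUNCHDIR" sleep 30 &

sleep 1   # give the child time to write its files
read LAUNCHED_PID < "$LAUNCHDIR/$GUEST"

# Check the launch file in the same spirit as the main script: does the
# recorded PID belong to the expected command?
if COMM=$(ps -o comm= -p "$LAUNCHED_PID") && [ "$COMM" = sleep ]
then echo "launch file for $GUEST names a live process"; fi

# Clean up the stand-in process and the throwaway directories
kill "$LAUNCHED_PID"
rm -r "$TMPHOSTNAMEDIR" "$LAUNCHDIR"
```

Writing the PID before exec is what lets the main script later decide whether the process named in the launch file is still a qemu process.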

Conclusion

Right, that's it then.  You should be able to run any scripts on any/all OSes using runall-os-test.  If it wasn't all plain sailing, send me an email and let me know what could be improved.

Last updated Tue 9 Aug 2005


4 comments



Mon Aug 8 22:57:49 2005: 946   TonyLawrence

Great job - I hadn't heard of QEMU. Thanks for this great write up.





Sun Dec 20 08:38:13 2009: 7770   Gopalakrishna

Very good article - useful for all those who are doing open-source development and want to test their code on multiple operating systems.

Gopalakrishna



Tue Feb 8 15:09:07 2011: 9286   Gopalakrishna

Is there a ready-to-use live CD/DVD distribution available with the techniques suggested above?

Any equivalent live distribution that can be downloaded and used for building platform-independent code would also be great.

- Gopalakrishna Palem
Creator of CFugue



Tue Feb 8 15:16:27 2011: 9287   TonyLawrence

I have no idea, sorry.
