EPP Grid - Dynamic OS Cluster Environment with Xen



Dynamic OS Cluster Environment with Xen

Overview

A Dynamic OS Cluster Environment gives users greater flexibility in the types of jobs they can execute on a cluster. Using Xen, virtual machines can be created with an operating system different from that of the host machine. This allows virtual machines running different operating systems to be booted on computational nodes to meet the OS requirements of the jobs submitted to each node.

For example, the software package Athena, which runs only on Scientific Linux 3, can be run on any cluster regardless of the cluster's operating system. When a job requiring Athena is detected by a computational node (the host machine), the node can start a virtual machine running Scientific Linux 3 to execute the job. After the job has finished, its output is sent back to the host machine and from there to the user via the job manager, and the virtual machine is shut down, restoring the node to its original state.


Prototype

How It Works

A rough guide to how the Dynamic OS Cluster Environment prototype works.

Domain U File system

Domain U's file system is built from two block devices: a read-only static file system image and a writable ram disk image. The ram disk is mounted at / and contains /linuxrc, the binaries (in /bin) and libraries (in /FS/lib) for mount (and possibly ln), a symlink /lib pointing to /FS/lib, and the writable parts of the file system such as /var and /tmp. The static file system contains everything except /boot, /dev, /etc, /proc, /root, /var and /tmp. When Domain U starts, the ram disk is loaded into memory and the script /linuxrc is executed.
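
The ram disk layout described above can be staged in a directory before it is packed into an initrd image. The following is a minimal sketch only: the helper name make_ramdisk_skeleton is hypothetical, it is not taken from install.sh, and copying mount and the libraries it needs into bin and FS/lib is left out.

```sh
#!/bin/sh
# Hypothetical helper: stage the writable ram disk tree under $1.
# Not taken from install.sh; binaries and libraries are not copied here.
make_ramdisk_skeleton() {
    rd=$1
    # FS is the mount point for the read-only static image; FS/lib holds
    # the few libraries needed before that image is mounted.
    mkdir -p "$rd/FS/lib" "$rd/bin" || return 1
    # Directories that are not in the static image stay on the ram disk.
    for d in boot dev etc proc root var tmp; do
        mkdir -p "$rd/$d" || return 1
    done
    chmod 1777 "$rd/tmp"
    # /lib -> /FS/lib, absolute as seen from inside Domain U.
    ln -s /FS/lib "$rd/lib"
}

# Example: make_ramdisk_skeleton /tmp/newvm-rd
```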

/linuxrc mounts the static file system at /FS, removes /bin using /FS/bin/rm, and symlinks /bin to /FS/bin using /FS/bin/ln. Because the static image is now mounted over /FS, its own /lib appears at /FS/lib, so the /lib symlink resolves to the full set of libraries. The result is a complete working file system in which the read-only static file system image can be shared between many virtual machines, while all writes go to the ram disk in memory.
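
A minimal sketch of the two /linuxrc steps, written as shell functions so that the link step can be tried on a scratch directory. The device name /dev/sda1 for the static image is an assumption (it depends on the disk line of the Xen configuration file), as are the function names.

```sh
#!/bin/sh
# Sketch of /linuxrc (not the original). STATIC_DEV is an assumption:
# use whichever device the Xen configuration exports the static image as.
STATIC_DEV=${STATIC_DEV:-/dev/sda1}

# Step 1: mount the read-only static image over /FS. At this point the
# ram disk provides /bin/mount and its libraries in /FS/lib.
mount_static_fs() {
    /bin/mount -o ro "$STATIC_DEV" /FS
}

# Step 2: replace the ram disk's /bin with a symlink to /FS/bin, using
# rm and ln from the static image. $1 is the root ("/" on Domain U).
link_bin() {
    r=${1%/}
    "$r/FS/bin/rm" -rf "$r/bin" || return 1
    "$r/FS/bin/ln" -s /FS/bin "$r/bin"
}

# On Domain U, /linuxrc would run:  mount_static_fs && link_bin /
```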

Execution of Commands on Domain U

To submit a job to the virtual machine, the user passes the name of the script to execute as the first parameter to domurun.sh. domurun.sh balloons down the memory of Domain 0 (the host machine) and starts Domain U (the virtual machine), passing it four kernel parameters: the user to execute the command as ($CUSTOMUSER), the command to execute ($CUSTOMCMD), the job id ($CUSTOMJOBID), and the directory to execute the command in ($CUSTOMWORKDIR). It then waits until Domain U has halted before continuing. $CUSTOMWORKDIR is set to the current working directory ($PWD), so this directory must be accessible by BOTH Domain 0 and Domain U at the same path. An NFS file system mounted at the same path on both Domain 0 and Domain U works well for this.
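
The Domain 0 side of domurun.sh might look roughly like the following sketch. It assumes the Xen 3.x xm tools (xm mem-set, xm list, and xm create with an extra= override for the kernel command line), a configuration file ./newvm-xen that sets name = "newvm" and has no extra line of its own, a 256 MB Domain 0 target, that domurun.sh is run through sudo so that $SUDO_USER names the submitting user, and that the command is a single word. All of these, and the function names, are illustrative assumptions rather than the original script.

```sh
#!/bin/sh
# Sketch of the Domain 0 side of domurun.sh (not the original script).
# Assumptions: Xen 3.x "xm" tools, config ./newvm-xen defining
# name = "newvm" and no "extra" line, and a 256 MB Domain 0 target.
XEN_CFG=${XEN_CFG:-./newvm-xen}
DOMU_NAME=${DOMU_NAME:-newvm}
DOM0_MEM_MB=${DOM0_MEM_MB:-256}

# Build the kernel parameters that cmdexec reads on Domain U. The kernel
# splits its command line on spaces, so the command must be one word.
# With no command, print nothing so Domain U boots to a login prompt.
build_kernel_args() {
    [ -n "$2" ] || return 0
    printf 'CUSTOMUSER=%s CUSTOMCMD=%s CUSTOMJOBID=%s CUSTOMWORKDIR=%s\n' \
        "$1" "$2" "$3" "$4"
}

# Return once the named domain is no longer listed by xm.
wait_for_halt() {
    while xm list "$1" >/dev/null 2>&1; do
        sleep 5
    done
}

# $1: script to run, e.g. relative to $PWD. The job id here is just the
# shell PID (an assumption; a batch system job id would also work).
run_job() {
    args=$(build_kernel_args "${SUDO_USER:-$(id -un)}" "$1" "$$" "$PWD")
    xm mem-set Domain-0 "$DOM0_MEM_MB" || return 1   # balloon Domain 0 down
    xm create "$XEN_CFG" extra="$args" || return 1   # boot Domain U
    wait_for_halt "$DOMU_NAME"
}

# Example: run_job testscript.sh
```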

The variables passed on the kernel command line are read by the init script cmdexec, which is placed in init.d and symlinked so that it runs last in the default run level on Domain U. If no command is given, cmdexec has no effect and Domain U boots normally to a login prompt. If cmdexec finds a command, it executes it in the directory $CUSTOMWORKDIR, writing stdout to $CUSTOMWORKDIR/stdout.$CUSTOMJOBID and stderr to $CUSTOMWORKDIR/stderr.$CUSTOMJOBID. Once the command has finished, cmdexec halts the virtual machine.
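
A minimal sketch of cmdexec, assuming the parameters arrive as NAME=value words in /proc/cmdline and that CUSTOMCMD names an executable script, given either as an absolute path or relative to CUSTOMWORKDIR. The function names are hypothetical. The redirections run inside su, so the output files are created by the submitting user rather than by root, which matters on NFS exports that squash root.

```sh
#!/bin/sh
# Sketch of the cmdexec init script on Domain U (not the original).
set -f   # split kernel command line words without glob expansion

# Print the value of parameter $1 from kernel command line $2.
get_param() {
    for word in $2; do
        case $word in
            "$1"=*) printf '%s\n' "${word#*=}"; return 0 ;;
        esac
    done
    return 1
}

cmdexec_start() {
    cmdline=$(cat /proc/cmdline)
    # No command on the kernel line: do nothing, boot to a login prompt.
    CUSTOMCMD=$(get_param CUSTOMCMD "$cmdline") || return 0
    CUSTOMUSER=$(get_param CUSTOMUSER "$cmdline")
    CUSTOMJOBID=$(get_param CUSTOMJOBID "$cmdline")
    CUSTOMWORKDIR=$(get_param CUSTOMWORKDIR "$cmdline")
    case $CUSTOMCMD in
        /*) run=$CUSTOMCMD ;;
        *) run=./$CUSTOMCMD ;;
    esac
    # Run as the submitting user in the shared directory; the user's
    # shell creates stdout.<jobid> and stderr.<jobid> there.
    su "$CUSTOMUSER" -c "cd '$CUSTOMWORKDIR' && '$run' \
        > 'stdout.$CUSTOMJOBID' 2> 'stderr.$CUSTOMJOBID'"
    halt
}

case $1 in
    start) cmdexec_start ;;
esac
```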

When domurun.sh detects that Domain U has halted, it prints $CUSTOMWORKDIR/stdout.$CUSTOMJOBID to stdout and $CUSTOMWORKDIR/stderr.$CUSTOMJOBID to stderr on Domain 0, and cleans up all temporary files. domurun.sh is also responsible for updating a heartbeat file, created at $CUSTOMWORKDIR/vm.hb.$CUSTOMJOBID. Before cmdexec begins executing the given command, it starts hbmon.sh as a background process. hbmon.sh periodically checks whether the size in bytes of the heartbeat file has changed; if no change is detected, hbmon.sh halts the virtual machine. The heartbeat ensures that if domurun.sh is terminated by the job manager or the user before it has finished, the virtual machine is halted as well.
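
The heartbeat and the final output step can be sketched with three small shell functions. hb_write (for domurun.sh) appends one byte to the heartbeat file at a fixed interval for as long as a given process is alive; hb_monitor (for hbmon.sh) runs an action, halt by default, once the file size stops changing between two checks; replay_output prints and removes the per-job files, assuming those are the temporary files being cleaned up. The function names and intervals are assumptions. The monitor interval should be comfortably longer than the writer interval, so that a delayed NFS update is not mistaken for a dead domurun.sh.

```sh
#!/bin/sh
# Heartbeat sketch (hypothetical helpers, not the original scripts).

# domurun.sh side: while process $3 (domurun.sh itself) is alive,
# append one byte to heartbeat file $1 every $2 seconds.
hb_write() {
    while kill -0 "$3" 2>/dev/null; do
        printf . >> "$1" || return 1
        sleep "$2"
    done
}

# hbmon.sh side: check the size of $1 every $2 seconds and run the
# command $3 (default: halt) as soon as it has not changed.
hb_monitor() {
    last=$(wc -c < "$1")
    while :; do
        sleep "$2"
        now=$(wc -c < "$1")
        if [ "$now" = "$last" ]; then
            ${3:-halt}   # unquoted on purpose: "echo done" runs echo
            return
        fi
        last=$now
    done
}

# domurun.sh side, after Domain U has halted: replay the job output on
# Domain 0 and remove the per-job files. $1: work dir, $2: job id.
replay_output() {
    cat "$1/stdout.$2"
    cat "$1/stderr.$2" >&2
    rm -f "$1/stdout.$2" "$1/stderr.$2" "$1/vm.hb.$2"
}
```

In domurun.sh the writer would run in the background for the lifetime of Domain U, for example hb_write "$PWD/vm.hb.$JOBID" 10 $$ & (where JOBID is a hypothetical variable), so it stops on its own if domurun.sh is killed. On Domain U, cmdexec would start hb_monitor "$CUSTOMWORKDIR/vm.hb.$CUSTOMJOBID" 60 & before running the command.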


Installing and Using the Prototype

Quick Install

  1. Download and untar something.tar.gz, then change to the untarred directory.
  2. Add any extra fstab entries, e.g. NFS mounts, to fstab.append (the / and /proc entries are added by install.sh).
  3. Run install.sh and follow the prompts.
  4. Add any extra Xen VM settings to the $PREFIX-xen configuration file.
  5. Copy domurun.sh to a directory in root's executable path.

Example Install

  • Download tarball: wget something.tar.gz

  • Untar tarball: tar -zxvf something.tar.gz

  • Change to installer directory: cd something

  • Add NFS mounted home to fstab.append so that fstab.append looks like:
    nfsserver:/export/home            /home           nfs rw,defaults 1 1

  • Run install.sh with root permissions (./install.sh) and answer its prompts:
    • Please enter prefix for static fs and initrd: newvm
    • Full path to SLC3 mount: /mnt/slc3
    • Location of domain U kernel: /boot/vmlinuz-2.6-xenU
    • Full path to directory to create Xen VM initrd: /storage/
    • Full path to directory to create Xen VM static fs (SIZE_REQUIRED MB): /storage/
      (NOTE: Must have at least SIZE_REQUIRED MB of free space)

  • install.sh will now generate:
    • The writable ram disk at /storage/newvm.initrd.gz
    • The read only static fs at /storage/newvm.staticfs
    • The Xen VM configuration file at ./newvm-xen

  • Add any extra Xen VM configuration to ./newvm-xen; refer to the Xen manual.

  • Copy domurun.sh to a directory where users with root permission can execute it: cp domurun.sh /usr/sbin/
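
Two checks that an installer like install.sh could make before building the static file system are sketched below: whether the target directory has at least SIZE_REQUIRED MB free, and which top-level entries of the SLC3 mount belong in the static image (everything except the directories kept on the ram disk). Both function names are hypothetical, and the real install.sh may work differently.

```sh
#!/bin/sh
# Hypothetical install-time helpers (not taken from install.sh).

# Succeed if directory $1 has at least $2 MB free (POSIX df -P output).
has_free_mb() {
    avail_kb=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ -n "$avail_kb" ] && [ "$avail_kb" -ge $(( $2 * 1024 )) ]
}

# List the top-level entries of the SLC3 tree $1 that go into the static
# image; the directories kept on the ram disk are skipped.
static_fs_entries() {
    for path in "$1"/*; do
        [ -e "$path" ] || [ -L "$path" ] || continue
        name=${path##*/}
        case $name in
            boot|dev|etc|proc|root|var|tmp) ;;
            *) printf '%s\n' "$name" ;;
        esac
    done
}

# Example: has_free_mb /storage 2048 && static_fs_entries /mnt/slc3
```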

Running scripts/commands on Domain U

To run a script/command on Domain U, simply run domurun.sh with the script/command as the first parameter. For example, domurun.sh testscript.sh will start Domain U and execute testscript.sh. testscript.sh must be in a directory accessible from both Domain 0 and Domain U at the same path, e.g. an NFS file system mounted at the same location on both. domurun.sh uses the current working directory ($PWD) as its working directory, so it should only be called from a directory accessible by Domain 0 and Domain U at the same path.

If domurun.sh is called with no parameters, it starts the virtual machine without a command, so cmdexec has no effect and the virtual machine boots normally to a login prompt.

Troubleshooting

  • Problem: No output is returned on Domain 0 after running a script/command.
    Possible Solution: Make sure that the directory from which you call domurun.sh is accessible on both Domain 0 and Domain U at the same path.
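
One quick check for this problem is whether the directory lies under one of the mount points added to Domain U through fstab.append, since those are the extra file systems Domain U sees. The helper below is a hypothetical sketch: it reads the second field of each non-comment fstab.append line, and it does not verify that the same export is mounted at that path on Domain 0.

```sh
#!/bin/sh
# Hypothetical pre-flight check (not part of the prototype).
# Succeed if directory $1 is at or below a mount point listed in the
# fstab.append file $2.
covered_by_fstab() {
    dir=${1%/}
    while read -r dev mnt rest; do
        case $dev in ''|'#'*) continue ;; esac
        mnt=${mnt%/}
        [ -n "$mnt" ] || continue        # ignore "/" (the ram disk)
        case $dir/ in "$mnt"/*) return 0 ;; esac
    done < "$2"
    return 1
}

# Example, run before domurun.sh:
#   covered_by_fstab "$PWD" fstab.append || echo "warning: $PWD not shared" >&2
```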

Revision: r10 - 14 Feb 2006 - MarkZaloumis
Authorised by:  Geoff Taylor (G.Taylor @ physics.unimelb.edu.au)
Copyright © 2000-2009 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.