LCG Grid Middleware Deployment
Overview
This document details the steps required to install and configure an LCG node to act as a front end to an existing computing cluster. This configuration assumes that the compute cluster to which the front end is attached uses PBS (job queueing system) / MAUI (job scheduling) and has the user filesystems (/home) NFS mounted from the server/management node (PBS server node) to all of the compute nodes. LCG middleware is certified to run on Scientific Linux 3 (sl3). For this installation we have chosen Scientific Linux 3.0.4 which is based on RedHat Enterprise Linux 3.

Prior to detailing the installation procedure, it is worthwhile outlining briefly the architecture of the LCG middleware. The LCG middleware is defined in terms of components. These components include Worker Nodes (WN - in traditional HPC parlance these equate to compute nodes), a Compute Element (CE - which in traditional HPC parlance equates to a management / login node; i.e. cluster front-end), a Storage Element (SE), a Resource Broker (RB), a Monitoring Node (MON), an Information Index (BDII) and a User Interface (UI).

In the context of the LHC Computing Grid, each of these components would be installed on a different physical computer (example). In this cartoon, each component of the middleware (blue boxes) is installed on a different physical computer - a computer per 'service'. However, this is not essential. In the configuration discussed here, an LCG front-end containing CE, SE and UI components is configured and attached to an existing cluster (example). In this example, the user interface, compute element and storage element components are combined on an 'LCG front-end' node which is then attached to / directed at existing resources.

Note: It is known that CE and RB components can NOT coexist on the same physical machine.

The LCG installation guide can be found here.

Operating system installation and configuration
Install Scientific Linux 3.0.5
A base server install only.
Package management is achieved via apt-get. Accordingly, any LCG middleware dependencies will be taken care of during installation.
Notes:
(1) Anyone wishing to install SL should not install from the 3.0.3 ISOs downloaded from SL.org; they do not work. Use either 3.0.2 or 3.0.4. Upgrading to 3.0.5 simply involves 'apt-getting' the system up to the latest version:
apt-get update
apt-get dist-upgrade
(2) SL servers are *hugely* overloaded. I have found that using CERN's repositories gives a much better response. That is, add the following lines to your sources list:
rpm http://linuxsoft.cern.ch cern/slc305/i386/apt os updates extras
rpm-src http://linuxsoft.cern.ch cern/slc305/i386/apt os updates extras
Further, discussion on the LCG-rollout mailing list (http://www.listserv.rl.ac.uk/cgi-bin/wa.exe?SUBED1=lcg-rollout&A=1) has revealed that there are differences between standard and CERN Scientific Linux. Something to be aware of.

Configuring SSH to allow HostbasedAuthentication between nodes
Given that our intention is to attach the LCG node to an existing cluster (and assuming that inter-node communication is configured via SSH HostbasedAuthentication), only the configuration of SSH on the LCG node will be detailed here.
In the system-wide SSH client configuration (ssh_config):
EnableSSHKeysign yes
(In most implementations of OpenSSH, enabling host-based authentication automatically enables SSH keysign. However, on sl3 it was found necessary to set the EnableSSHKeysign option explicitly.)
In the SSH daemon configuration (sshd_config):
HostKey /etc/ssh/ssh_host_rsa_key
HostKey /etc/ssh/ssh_host_dsa_key
StrictModes yes
RSAAuthentication yes
RhostsAuthentication no
IgnoreRhosts yes
RhostsRSAAuthentication no
HostbasedAuthentication yes
IgnoreUserKnownHosts yes
PasswordAuthentication yes
PermitEmptyPasswords no
X11Forwarding no
KeepAlive yes
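A small check can confirm that the daemon settings listed above actually took effect in the file being edited. The following is a minimal sketch, not part of the original procedure: `check_option` is a hypothetical helper name, and it relies on the documented OpenSSH rule that the first uncommented occurrence of a keyword wins (HostKey, which may legitimately appear more than once, is not a good candidate for this check).

```shell
#!/bin/sh
# Hypothetical helper: report whether a keyword in an sshd_config-style
# file carries the expected value.
# Usage: check_option FILE KEY EXPECTED
check_option() {
    file=$1 key=$2 expected=$3
    # Keywords are case-insensitive; commented lines never match because
    # their first field starts with '#'. Only the first match is used.
    actual=$(awk -v k="$key" 'tolower($1) == tolower(k) { print $2; exit }' "$file")
    if [ "$actual" = "$expected" ]; then
        echo "ok: $key $actual"
    else
        echo "MISMATCH: $key is '${actual:-unset}', expected '$expected'"
        return 1
    fi
}
```

For example, `check_option /etc/ssh/sshd_config HostbasedAuthentication yes` (the path is an assumption about where the daemon configuration lives on your node) prints `ok: HostbasedAuthentication yes` when the option is set as above and returns non-zero otherwise.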
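List each trusted node by its short and fully qualified name, one name per line (the format used by shosts.equiv):
machine1
machine1.fully.qualified.name
machine2
machine2.fully.qualified.name
etc...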
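For a cluster with many nodes, the list of trusted host names can be generated rather than typed. This is only a sketch under two assumptions: that the list takes one name per line, and that every node shares a single DNS domain. `equiv_entries` is a hypothetical helper name, and `example.org` below is a placeholder.

```shell
#!/bin/sh
# Hypothetical helper: print each node's short name followed by its fully
# qualified name, one per line.
# Usage: equiv_entries DOMAIN NODE...
equiv_entries() {
    domain=$1
    shift
    for node in "$@"; do
        printf '%s\n%s.%s\n' "$node" "$node" "$domain"
    done
}
```

Calling `equiv_entries example.org machine1 machine2` prints four lines: `machine1`, `machine1.example.org`, `machine2` and `machine2.example.org`. Review the output before redirecting it into the real trust file.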
Record each node's public host key (the format is that of ssh_known_hosts):
machine1,machine1.fully.qualified.name,IP address ssh-rsa rsa-key
machine2,machine2.fully.qualified.name,IP address ssh-rsa rsa-key

Configure tcp-wrappers
In the hosts.allow file add the following line:
ALL: *.ph.unimelb.edu.au
In the hosts.deny file add the following line:
ALL: ALL

Configure the firewall - IPTables
THIS IS ONLY A GUIDE
The LCG middleware (and all other components from which it is derived) provides services which require access both internally (e.g. to the computing cluster) and externally (e.g. to the global LHC grid). Clearly, the LCG front-end will act as a gateway between the outside world and the resources to which it is attached. Here is an example iptables setup script. Modify it as needed.
NB: Network security policy will be site-dependent. These notes refer to an isolated test deployment of the LCG middleware.

Configure network time synchronisation
In this case we use the University network time server. In the ntp.conf file in /etc, add the following lines:
restrict 128.250.5.101 mask 255.255.255.255 nomodify notrap noquery
server ntp.unimelb.edu.au
In the step-tickers file in /etc/ntp, add the following line:
128.250.5.101
Restart the service:
service ntpd restart

LCG middleware installation and configuration
Download and install the LCG installer package - YAIM (Yet Another Installation Method)
Go to http://www.cern.ch/grid-deployment/gis/yaim/ and download the latest version of the installer (at the time of this writing it is lcg-yaim-2.6.0-8.noarch.rpm).
wget http://www.cern.ch/grid-deployment/gis/yaim/lcg-yaim-2.6.0-8.noarch.rpm
rpm -ivh lcg-yaim-2.6.0-8.noarch.rpm
or
add the following line to /etc/apt/sources.list.d/cern.list:
rpm http://grid-deployment.web.cern.ch/grid-deployment/gis apt/LCG-2_6_0/sl3/en/i386 lcg_sl3 lcg_sl3.updates
and then
apt-get install lcg-yaim
YAIM is a script based installation method. It's found in /opt/lcg/yaim. Site information is defined in the file site-info.def, the generic grid user accounts (that will be created) are defined in file users.conf whilst the list of worker nodes (compute nodes in traditional parlance) of the cluster is defined in the file wn-list.conf.
Note: Field explanations are provided in the configuration files
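Before running the installer it can help to confirm that the site definition file sets every value you intend to rely on. The sketch below assumes that site-info.def consists of plain shell variable assignments that can be sourced; `missing_vars` is a hypothetical helper, and the variable names in the usage example are illustrative only, so check them against the field explanations in your own copy of the file.

```shell
#!/bin/sh
# Hypothetical helper: print the names of variables that a shell-style
# definitions file leaves unset or empty.
# Usage: missing_vars FILE NAME...
missing_vars() {
    def_file=$1
    shift
    (
        # Source in a subshell so the definitions do not leak into the caller.
        . "$def_file" || exit 1
        for name in "$@"; do
            eval "value=\${$name:-}"
            [ -n "$value" ] || echo "$name"
        done
    )
}
```

For example, `missing_vars /opt/lcg/yaim/examples/site-info.def CE_HOST SE_HOST WN_LIST` prints one line per name that still needs a value and nothing if all are set. Pass the file with a path containing a slash, since the `.` builtin otherwise searches PATH.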
Before commencing the installation
Download and install j2sdk1.4.2_08 (get it from http://java.sun.com).
At a shell (bash/sh):
export JAVA_HOME=/usr/java/j2sdk1.4.2_08
export PATH=${PATH}:/opt/condor/bin
A brief explanation: Java is used extensively by the middleware. However, due to its licensing, it cannot be distributed through the repositories and must be installed by hand.

Installation of LCG components
The LCG middleware is subdivided into components. For this installation method, all of the components will be installed on one physical machine. Change to the yaim directory (/opt/lcg/yaim/scripts):
cd /opt/lcg/yaim/scripts
./install_node ../examples/site-info.def lcg-CE-torque lcg-SECLASSIC lcg-UI
This script:
- updates the packages already installed
- downloads and installs any packages required by the middleware which are not yet installed
- downloads and installs the packages relating to the component of the middleware being installed
- downloads and installs the Certificate Authority bundles

Install host certificates / Certificate Authority (CA) bundles
Install the hostcert.pem and hostkey.pem files in /etc/grid-security/:
cp hostcert.pem /etc/grid-security/
cp hostkey.pem /etc/grid-security/
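Plain cp keeps whatever permissions the source files had, and Globus-based services typically refuse a host key that other users can read. As a hedged alternative to the two cp commands above, the sketch below copies both files and sets the conventional modes (644 for the certificate, 400 for the key) in one step. `install_host_cert` is a hypothetical helper name; run it as root so the files end up owned by root.

```shell
#!/bin/sh
# Hypothetical helper: copy a host certificate and key into a grid-security
# directory, setting the conventional permissions as they are copied.
# Usage: install_host_cert CERT KEY [DEST_DIR]
install_host_cert() {
    cert=$1 key=$2 dest=${3:-/etc/grid-security}
    # Certificate is public: world-readable. Key is private: owner read-only.
    install -m 644 "$cert" "$dest/hostcert.pem" &&
    install -m 400 "$key" "$dest/hostkey.pem"
}
```

Usage: `install_host_cert hostcert.pem hostkey.pem` writes into /etc/grid-security by default; a third argument selects a different directory, which is convenient for a dry run.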
In this case where we will initially be disconnected from the LCG global grid, we will need to manually install the certificate bundle of the authority which created these certificates (A2G).
see: http://www.vpac.org/a2g
Configure the components
./configure_node ../examples/site-info.def CE_torque classic_SE UI
Completing the configuration of LCG node
$usecp *.ph.unimelb.edu.au:/home/ /home/
$usecp *.ph.unimelb.edu.au:/epp/ /home/
(NB: The setup of PBS is beyond the scope of this document. More information can be found via man pbs)
In our setup, user accounts are NFS mounted from the server (roberts:/export/home/) to the LCG node at (grickle:/epp/home). Hence the lines above, which state that PBS can use cp to move data from /home/ to /home/ and from /epp/ to /home/.
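To make the effect of the two $usecp rules concrete, the following sketch emulates the mapping in plain shell. It is a simplified model of the behaviour described above, not PBS code: `usecp_map` is a hypothetical function, and the fully qualified host name in the usage example is an assumption built from the node name and domain mentioned in this document.

```shell
#!/bin/sh
# Hypothetical model of one "$usecp HOST:PREFIX LOCAL" rule: if the remote
# file spec matches the rule, print the local path that cp would be given.
# Usage: usecp_map RULE_HOST:RULE_PREFIX RULE_LOCAL HOST:PATH
usecp_map() {
    rule_host=${1%%:*} rule_prefix=${1#*:} rule_local=$2
    host=${3%%:*} path=${3#*:}
    case $host in
        $rule_host) ;;   # left unquoted so that * in the rule acts as a wildcard
        *) return 1 ;;
    esac
    case $path in
        "$rule_prefix"*)
            # Swap the matched remote prefix for the local one.
            rest=${path#"$rule_prefix"}
            printf '%s%s\n' "$rule_local" "$rest"
            ;;
        *) return 1 ;;
    esac
}
```

With the second rule, `usecp_map '*.ph.unimelb.edu.au:/epp/' /home/ grickle.ph.unimelb.edu.au:/epp/alice/out.txt` prints `/home/alice/out.txt`. A host outside the domain, or a path that does not start with /epp/, makes the function return non-zero, which corresponds to PBS falling back to a remote copy.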
cp -r /etc/profile.d/ /opt/profile.d/
This dir contains shell initialisation scripts that need to be accessible to all of the grid accounts.
cp -r /etc/grid-security/certificates/ /opt/globus/share/certificates/
This dir needs to be available to all of the worker nodes so that they can check the authenticity of the proxy accompanying the job.
Aside: Most clusters have the worker nodes on a private subnet and do not allow them to access the outside world. Generally, any data that a computation needs is staged in via PBS. In LCG, it is assumed that the grid gateways and the worker nodes do not have a shared filesystem. Accordingly, the lcgpbs job manager uses gsiftp to transfer data to the worker node. If this is not acceptable, then the standard globus pbs jobmanager can be used in its place as it does not do any 'gsiftp magic'. However, consider that if a computation operates on a large amount of data, it makes more sense for each worker node to get its own data directly rather than PBS 'streaming' the data to each active node.
for i in $LCG_INIT_SCRIPTS
do
  if [ -r "$i" ]; then
    . $i
  fi
done

Important LCG config files