Tutorial: Grid for Local Users
Before starting you should know that Grid jobs have turnaround times of 5 minutes or more. If a resource is full or network problems occur, this can be even longer. The Grid is therefore designed for long-running jobs or very large numbers of jobs.

Grid Tools Setup
In order to use the Grid toolkit (called Globus) you will need to modify your environment on the EPP machines. If you use C shell variants (tcsh or csh), add the following lines to the end of your .tcshrc or .cshrc file...
if ( -r /usr/local/grid/globus/general_cshrc ) then
source /usr/local/grid/globus/general_cshrc
endif
If you use Bourne shell variants (sh, bash, ksh, zsh), add the following lines to
your .bashrc, .shrc, .kshrc, or .zshrc files...
if [ -r /usr/local/grid/globus/general_cshrc ]; then
source /usr/local/grid/globus/general_cshrc
fi
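The setup above can be sanity-checked with a small sketch like the following, written for Bourne shell variants. It only reports whether the setup file is readable and whether a command is on your PATH. The helper name check_grid_setup is hypothetical and is not part of Globus.

```shell
#!/bin/sh
# Sketch only: check_grid_setup is a hypothetical helper, not a Globus tool.
# It reports whether a setup file is readable and a command is on the PATH.
check_grid_setup() {
    setup_file=$1   # path of the environment file sourced at login
    tool=$2         # name of the command expected on the PATH
    if [ -r "$setup_file" ]; then
        echo "setup file readable: $setup_file"
    else
        echo "setup file missing or unreadable: $setup_file"
    fi
    if command -v "$tool" >/dev/null 2>&1; then
        echo "$tool found on PATH"
    else
        echo "$tool not found on PATH"
    fi
}

# Usage with the path and command named in this tutorial.
check_grid_setup /usr/local/grid/globus/general_cshrc globusrun
```

If the command is reported as not found, log out and back in so that the new lines in your shell startup file take effect.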
Verify that the tools are working by logging back in and running the following command:
globusrun -version
The command should report a version number such as 3.6.

Getting Accounts
Obtaining access to Grid resources involves 3 steps.
Certificates
You must first obtain a certificate to identify you on the Grid. A certificate can be obtained from the APAC Certificate Authority (which is actually at VPAC). Click here to apply for a certificate. Make sure you have typed the following into the form:
~/.globus/usercert.pem and ~/.globus/userkey.pem respectively.
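A quick way to confirm that the two files named above are in place is sketched below. The helper check_cert_dir is hypothetical. The warning about group or world access on the private key reflects general good practice for key files and is an assumption here, not a rule stated in this tutorial.

```shell
#!/bin/sh
# Sketch only: check_cert_dir is a hypothetical helper, not a Globus tool.
# It checks that usercert.pem and userkey.pem exist in the given directory
# and warns if the private key is accessible to group or others.
check_cert_dir() {
    dir=$1
    status=0
    for f in usercert.pem userkey.pem; do
        if [ -f "$dir/$f" ]; then
            echo "found $f"
        else
            echo "missing $f"
            status=1
        fi
    done
    if [ -f "$dir/userkey.pem" ]; then
        # Characters 5-10 of the ls mode string are the group and other bits.
        perms=$(ls -ld "$dir/userkey.pem" | cut -c5-10)
        if [ "$perms" != "------" ]; then
            echo "warning: userkey.pem is accessible to group or others"
            status=1
        fi
    fi
    return $status
}

check_cert_dir "$HOME/.globus" || echo "certificate setup looks incomplete"
```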
NOTE: If you happen to have a PKCS12 certificate file (e.g. something.p12) you can
convert it using the pkcs2pem command.
mkdir ~/.globus
cd ~/.globus
pkcs2pem pathto/mycert.p12 user

Accounts
Initially your account on the EPP machines will be enough. However, you can also apply for accounts at the Melbourne Uni Advanced Research Computing centre and at VPAC. See Computer Help - Access to Facilities.

Mapping Certs to Accounts
When asking a resource owner to map your certificate to an account you must specify the subject of your certificate. You can obtain this by running the following command:
grid-cert-info -subject
On the Melbourne EPP resources, email your certificate subject and phone number to Lyle Winton (winton@physics.unimelb.edu.au). On VPAC, once you have an account, you can manually map your certificate subject to it. Click here and enter your VPAC username and password.

Data Management
For the moment, you just need to realise that on a Grid your jobs can run anywhere, and your files and data are not everywhere! So most jobs will need to stage in input files (config, data) and then stage out output files (logs, hbooks, root files, data).

Creating a Job
Running a job on the Grid is difficult unless you have the right tools. We've built such a tool called gqsched. This tool allows you to run an ordinary script across the Grid, with only a few modifications for staging files. The tool requires little knowledge to get started, but is full-featured for advanced users. In this section we will construct 3 simple scripts: the first is just a dummy script, the second is a real job with staging, and the third will run multiple jobs from one script.
The following script, which could be written in any of your favorite scripting languages, will print the hostname, print the environment, sleep for 60 seconds, print the date, then exit. Nothing to it!
#!/usr/bin/tcsh
echo Starting...
hostname
env
sleep 60
date
echo Stopped.
The second script demonstrates the need to stage files to and from (in and out of) the remote Grid resource. Staging is specified with directives of the form #:STAGEIN and #:STAGEOUT.
In the following example the local files recon.conf, particle.conf, event.conf and data.mdst
will be staged in to the remote resource, and myoutput.hbook will be staged back.
#!/usr/bin/tcsh
#:STAGEIN recon.conf ; particle.conf ; event.conf ; data.mdst
basf << EOF
path create main
path add_module main my_ana
initialize
histogram define myoutput.hbook
process_event data.mdst
terminate
EOF
#:STAGEOUT myoutput.hbook
The third script demonstrates running the same script over a number of input files. This is done using a "parameter sweep" where the environment variable $MYINPUT
(i.e. the parameter) takes each value in turn.
A separate job is created for each value and these jobs will be run in parallel.
In the case below, one job is created for each file matching mydata*.mdst.
#!/usr/bin/tcsh
#:PARAM MYINPUT FILE mydata*.mdst
#:STAGEIN recon.conf ; particle.conf ; event.conf
#:STAGEIN $MYINPUT
basf << EOF
path create main
path add_module main my_ana
initialize
histogram define myoutput-$JOBID.hbook
process_event $MYINPUT
terminate
EOF
#:STAGEOUT myoutput-$JOBID.hbook
NOTE: $JOBID is used to prevent filename clashes for the output.
$JOBID is set to a different number for every job.
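To get a feel for what a parameter sweep produces, the following Bourne shell sketch lists one line per matching input file, with the output name each job would write. It is an illustration only and not how gqsched is implemented. The helper name expand_sweep and the assumption that job numbers start at 1 are hypothetical.

```shell
#!/bin/sh
# Sketch only: mimics how a file-based parameter sweep might expand into jobs.
# expand_sweep is a hypothetical helper; numbering jobs from 1 is an assumption.
expand_sweep() {
    jobid=0
    for MYINPUT in "$@"; do
        # An unmatched glob is passed through literally, so skip non-files.
        [ -e "$MYINPUT" ] || continue
        jobid=$((jobid + 1))
        echo "job $jobid: MYINPUT=$MYINPUT output=myoutput-$jobid.hbook"
    done
}

# Usage: run in the directory holding the input files.
expand_sweep mydata*.mdst
```

Running this before submission gives a quick count of how many jobs the sweep pattern is likely to create.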

Submitting a Job
Once you've got your script, submission is simple.
grid-proxy-init
gqsched myscript.csh
Running grid-proxy-init signs you on to the Grid with your certificate.
This "sign on" lasts for 12 hours but can be renewed or extended.
If you know your job will run for longer than 12 hours you can set a longer lifetime.
For example, 48 hours...
grid-proxy-init -valid 48:00
The gqsched tool will report the status of your jobs (by job ID) periodically and should not be stopped until all jobs are complete.

Getting Output and Results
Once complete, the standard output and error from each job will be returned to your local directory as
myscript.csh.o1 myscript.csh.e1 myscript.csh.o2 myscript.csh.e2 ... etc.
Any staged output should also be returned to the local directory.
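Once the files are back, a short sketch like the one below can flag which jobs wrote anything to standard error. The helper name report_job_errors is hypothetical. A non-empty error file does not necessarily mean the job failed, so treat the report as a hint about where to look first.

```shell
#!/bin/sh
# Sketch only: report_job_errors is a hypothetical helper, not part of gqsched.
# It scans the returned <script>.eN files and reports which are non-empty.
report_job_errors() {
    script=$1
    for err in "$script".e*; do
        [ -e "$err" ] || continue      # no error files returned yet
        id=${err##*.e}                 # text after the final ".e" is the job number
        if [ -s "$err" ]; then
            echo "job $id: stderr not empty, see $err"
        else
            echo "job $id: no error output"
        fi
    done
}

# Usage: run in the directory the job was submitted from.
report_job_errors myscript.csh
```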
Troubleshooting