HOW-TO Configure an LCG CE Host to Submit to Multiple PBS Clusters

Introduction

NB: This how-to assumes your clusters use PBS as the LRMS.

The standard installation of an LCG compute element implicitly assumes that it will be the front-end to one cluster. This is fine if you wish to build (and can resource) one CE per cluster at your site. However, if you wish to use one CE to submit to all of your clusters, then read on...

The basic idea is to have a jobmanager for each cluster. For this example, we have two clusters: brecca and edda.

Globus / LCG - PBS Job Manager
cd /opt/globus/lib/perl/Globus/GRAM/JobManager
cp pbs.pm pbsbrecca.pm
cp pbs.pm pbsedda.pm
Save the following diff to a file called pbsbrecca.p1.
--- pbs.pm 2005-10-10 16:15:19.000000000 +1000
+++ pbsbrecca.pm 2005-10-10 18:56:18.000000000 +1000
@@ -6,7 +6,7 @@
use Config;
# NOTE: This package name must match the name of the .pm file!!
-package Globus::GRAM::JobManager::pbs;
+package Globus::GRAM::JobManager::pbsbrecca;
@ISA = qw(Globus::GRAM::JobManager);
@@ -19,7 +19,7 @@
$qstat = '/usr/bin/qstat';
$qdel = '/usr/bin/qdel';
$cluster = 1;
- $cpu_per_node = 1;
+ $cpu_per_node = 2;
$remote_shell = '/usr/bin/ssh';
}
@@ -384,6 +384,9 @@
$errfile = "2>>" . $description->logfile();
}
+ # MLR 10/10/05
+ $qsub = "/usr/bin/qsub -q ".$description->queue()."\@brecca-m.vpac.org";
+
$self->nfssync( $pbs_job_script_name );
$self->log("submitting job -- $qsub < $pbs_job_script_name $errfile");
chomp($job_id = `$qsub < $pbs_job_script_name $errfile`);
Note that the patched $qsub command submits directly to the requested queue on the brecca server, i.e. queue@brecca-m.vpac.org.
Apply the patch to the pbsbrecca.pm file.
patch -Np0 pbsbrecca.pm pbsbrecca.p1
Save the following diff to a file called pbsedda.p1.
--- pbs.pm 2005-10-10 16:15:19.000000000 +1000
+++ pbsedda.pm 2005-10-10 16:15:43.000000000 +1000
@@ -6,7 +6,7 @@
use Config;
# NOTE: This package name must match the name of the .pm file!!
-package Globus::GRAM::JobManager::pbs;
+package Globus::GRAM::JobManager::pbsedda;
@ISA = qw(Globus::GRAM::JobManager);
@@ -19,7 +19,7 @@
$qstat = '/usr/bin/qstat';
$qdel = '/usr/bin/qdel';
$cluster = 1;
- $cpu_per_node = 1;
+ $cpu_per_node = 4;
$remote_shell = '/usr/bin/ssh';
}
@@ -384,6 +384,10 @@
$errfile = "2>>" . $description->logfile();
}
+
+ # MLR 10/10/49
+ $qsub = "/usr/bin/qsub -q ".$description->queue()."\@edda-m.vpac.org";
+
$self->nfssync( $pbs_job_script_name );
$self->log("submitting job -- $qsub < $pbs_job_script_name $errfile");
chomp($job_id = `$qsub < $pbs_job_script_name $errfile`);
Apply the patch to the pbsedda.pm file.
patch -Np0 pbsedda.pm pbsedda.p1
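The NOTE inside the patches (the package name must match the name of the .pm file) is easy to check mechanically once both patches are applied. A minimal sketch of such a check, assuming the module declares its package on a line of the form "package Globus::GRAM::JobManager::<name>;" as in the diffs above. The function name check_jobmanager_package is hypothetical and not part of Globus or LCG.

```shell
#!/bin/sh
# Hypothetical helper: verify that a job manager module declares the package
# that its file name implies (pbsbrecca.pm -> ...::JobManager::pbsbrecca).
check_jobmanager_package() {
    module=$1
    name=$(basename "$module" .pm)
    if grep -q "^package Globus::GRAM::JobManager::${name};" "$module"; then
        echo "OK: $module"
    else
        echo "MISMATCH: $module does not declare package ...::${name}" >&2
        return 1
    fi
}

# Example (run from /opt/globus/lib/perl/Globus/GRAM/JobManager):
#   check_jobmanager_package pbsbrecca.pm && check_jobmanager_package pbsedda.pm
```

A mismatch here usually means the first hunk of the patch did not apply to the copied file.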
Globus Gatekeeper

OK, so we now have two new job managers that we need to make the Globus gatekeeper aware of.
cd /etc
Save the following diff output to a file called globus.conf.p1.
--- /etc/globus.conf.orig 2005-10-10 16:18:41.000000000 +1000
+++ /etc/globus.conf 2005-10-10 19:04:24.000000000 +1000
@@ -33,7 +33,7 @@
 globus_gatekeeper=/opt/edg/sbin/edg-gatekeeper
 extra_options=\"-lcas_db_file lcas.db -lcas_etc_dir /opt/edg/etc/lcas/ -lcasmod_dir /opt/edg/lib/lcas/ -lcmaps_db_file lcmaps.db -lcmaps_etc_dir /opt/edg/etc/lcmaps -lcmapsmod_dir /opt/edg/lib/lcmaps\"
 logfile=/var/log/globus-gatekeeper.log
-jobmanagers="fork pbs"
+jobmanagers="fork pbs pbsbrecca pbsedda"
 
 [gatekeeper/fork]
 type=fork
@@ -41,3 +41,13 @@
 
 [gatekeeper/pbs]
 type=pbs
+
+[gatekeeper/pbsbrecca]
+type=pbsbrecca
+job_manager=globus-job-manager
+machine_type=i686
+
+[gatekeeper/pbsedda]
+type=pbsedda
+job_manager=globus-job-manager
+machine_type=power64
And patch the globus.conf file with it.
patch -b -Np0 globus.conf globus.conf.p1
Restart the service.
service globus-gatekeeper restart
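Each extra job manager needs two things in globus.conf: its name in the jobmanagers list and a [gatekeeper/<name>] section. The sections all have the same shape, so as a sketch only they could be generated rather than typed by hand. gatekeeper_section is a hypothetical helper; the keys it prints are the ones that appear in the diff above.

```shell
#!/bin/sh
# Hypothetical helper: print a globus.conf section for one job manager,
# using the same keys as the sections added by the patch.
gatekeeper_section() {
    name=$1
    machine_type=$2
    printf '\n[gatekeeper/%s]\n' "$name"
    printf 'type=%s\n' "$name"
    printf 'job_manager=globus-job-manager\n'
    printf 'machine_type=%s\n' "$machine_type"
}

# Example: print the two sections used in this how-to.
#   gatekeeper_section pbsbrecca i686
#   gatekeeper_section pbsedda power64
```

The output would still need to be appended to globus.conf by hand, and the jobmanagers line updated to match, before restarting the gatekeeper.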
Almost there.
cd /opt/globus/share/globus_gram_job_manager/
cp pbs.rvf pbsbrecca.rvf
cp pbs.rvf pbsedda.rvf
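As far as we can tell, the .rvf files are the RSL validation files that the Globus job manager looks up by job manager type, which is why each new type gets its own copy.
##ToDo: Add fuller notes here about these files: what they are and why they are used.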
So you know...
Under LCG-2, restarting the gatekeeper resulted in the creation of jobmanager-pbsbrecca and jobmanager-pbsedda in /opt/globus/etc/grid-services. These files look like:
> cat jobmanager-pbsbrecca
stderr_log,local_cred - /opt/globus/libexec/globus-job-manager globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type pbsbrecca -rdn jobmanager-pbsbrecca -machine-type i686 -publish-jobs
> cat jobmanager-pbsedda
stderr_log,local_cred - /opt/globus/libexec/globus-job-manager globus-job-manager -conf /opt/globus/etc/globus-job-manager.conf -type pbsedda -rdn jobmanager-pbsedda -machine-type power64 -publish-jobs
Now we also have these two job managers available to us.

Dynamic Information Providers

Now we need to modify the dynamic information providers to query the correct servers.
cd /opt/lcg/libexec/
rm lcg-info-dynamic-ce
Save the following in a file called lcg-info-dynamic-ce.
#!/bin/sh
/opt/lcg/libexec/lcg-info-dynamic-pbs /opt/lcg/var/gip/lcg-info-generic.conf brecca-m.vpac.org
/opt/lcg/libexec/lcg-info-dynamic-pbs /opt/lcg/var/gip/lcg-info-generic.conf edda-m.vpac.org
Make the file executable.
chmod +x lcg-info-dynamic-ce
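With more than two clusters, the wrapper could be generated from a list of PBS server hosts instead of being typed out. A minimal sketch, assuming the provider and configuration paths used in this how-to; write_dynamic_ce is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: write a wrapper script that runs the PBS dynamic
# information provider once per PBS server host, then mark it executable.
#   write_dynamic_ce OUTFILE HOST...
write_dynamic_ce() {
    outfile=$1
    shift
    provider=/opt/lcg/libexec/lcg-info-dynamic-pbs
    conf=/opt/lcg/var/gip/lcg-info-generic.conf
    {
        echo '#!/bin/sh'
        for host in "$@"; do
            echo "$provider $conf $host"
        done
    } > "$outfile"
    chmod +x "$outfile"
}

# Example (run from /opt/lcg/libexec):
#   write_dynamic_ce lcg-info-dynamic-ce brecca-m.vpac.org edda-m.vpac.org
```

The generated file contains one provider call per host, in the order the hosts were given.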
Save the following diff to lcg-info-dynamic-pbs.p1.
--- lcg-info-dynamic-pbs 2005-10-11 07:46:16.000000000 +1000
+++ lcg-info-dynamic-pbs.new 2005-10-11 07:46:53.000000000 +1000
@@ -23,6 +23,7 @@
my $state;
my $num_pro;
my $Status;
+my $whichCluster;
# Reads the configuration file
if ($ARGV[0]) {
@@ -117,6 +118,21 @@
close QSTAT;
for(@dn){
+
+ # we need to match the $pbshost variable to the dn - if they don't match
+ # we don't write it
+
+ if ($pbsHost=~/edda/) {
+ $whichCluster="edda";
+ }
+ else {
+ $whichCluster="brecca";
+ }
+
+ if(not $_ =~ $whichCluster) {
+ next;
+ }
+
push @output, $_;
$queue=$_;
$queue=~s/,.*//;
And apply the patch to lcg-info-dynamic-pbs.
patch -b -Np0 lcg-info-dynamic-pbs lcg-info-dynamic-pbs.p1
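The filter added by the patch amounts to two steps: derive a cluster name from the PBS server host, then keep only the dn entries that mention that cluster. The same logic is restated below as a small shell sketch, purely for illustration (the provider itself is Perl). The function names are hypothetical, and the sample dn strings in the comment are illustrative only.

```shell
#!/bin/sh
# Map a PBS server host to a cluster name, as the patch does:
# any host containing "edda" is edda, everything else is brecca.
which_cluster() {
    case $1 in
        *edda*) echo edda ;;
        *)      echo brecca ;;
    esac
}

# Read dn entries on stdin and print only those that mention the cluster
# belonging to the given PBS host.
filter_dns() {
    cluster=$(which_cluster "$1")
    grep -- "$cluster"
}

# Example (illustrative dn strings):
#   printf '%s\n' \
#     'dn: GlueCEUniqueID=nglcg.vpac.org:2119/jobmanager-pbsedda-dque' \
#     'dn: GlueCEUniqueID=nglcg.vpac.org:2119/jobmanager-pbsbrecca-dque' |
#     filter_dns edda-m.vpac.org
```

Note that the fallback branch treats every non-edda host as brecca, so a third cluster would need its own case.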
Static Information

Almost there... now we need to recreate the static information file. Luckily, we only need to do this once (this makes our job, and the hack, a lot easier).
cd /opt/lcg/var/gip
Make a backup copy of the static ldif file if there isn't one already. If a backup already exists, restore the working file from it instead.
if [ -f lcg-info-static.ldif.orig ]; then
  cp lcg-info-static.ldif.orig lcg-info-static.ldif
else
  cp lcg-info-static.ldif lcg-info-static.ldif.orig
fi
From the original static ldif file create one for brecca and one for edda.
cp lcg-info-static.ldif brecca-tmp.ldif
cp lcg-info-static.ldif edda-tmp.ldif
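The backup-or-restore step is what makes this procedure safe to re-run: every run starts from the pristine file. As a sketch, the idiom can be wrapped in a small function; backup_or_restore is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: keep a pristine "<file>.orig" copy.
# First run: create the backup. Later runs: restore the file from it.
backup_or_restore() {
    file=$1
    if [ -f "$file.orig" ]; then
        cp "$file.orig" "$file"
    else
        cp "$file" "$file.orig"
    fi
}

# Example (run from /opt/lcg/var/gip):
#   backup_or_restore lcg-info-static.ldif
```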
Modify brecca's ldif file.
# CHANGE "dn: GlueSiteUniqueID" TO brecca - this needs to be unique for each cluster
sed 's/dn: GlueSiteUniqueID=vpac,mds-vo-name=local,/dn: GlueSiteUniqueID=brecca,mds-vo-name=local,/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
# CHANGE ALL OCCURRENCES OF "jobmanager-pbs" --> "jobmanager-pbsbrecca"
sed 's/jobmanager-pbs/jobmanager-pbsbrecca/g' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
# CHANGE THE RELEVANT STATIC INFORMATION TO SUIT THE CLUSTER
sed 's/GlueHostBenchmarkSI00: 1500/GlueHostBenchmarkSI00: 1000/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
sed 's/GlueHostMainMemoryRAMSize: 256/GlueHostMainMemoryRAMSize: 1000/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
sed 's/GlueHostMainMemoryVirtualSize: 512/GlueHostMainMemoryVirtualSize: 2000/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
sed 's/GlueHostNetworkAdapterInboundIP: FALSE/GlueHostNetworkAdapterInboundIP: FALSE/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
sed 's/GlueHostNetworkAdapterOutboundIP: TRUE/GlueHostNetworkAdapterOutboundIP: FALSE/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
sed 's/GlueHostOperatingSystemName: ScientificLinux/GlueHostOperatingSystemName: RedHat/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
sed 's/GlueHostOperatingSystemRelease: 3.0.5/GlueHostOperatingSystemRelease: 7.3/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
sed 's/GlueHostOperatingSystemVersion: 3/GlueHostOperatingSystemVersion: 7/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
sed 's/GlueHostProcessorClockSpeed: 3200/GlueHostProcessorClockSpeed: 2800/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
sed 's/GlueHostProcessorModel: PIV/GlueHostProcessorModel: Xeon/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
sed 's/GlueHostProcessorVendor: intel/GlueHostProcessorVendor: intel/' brecca-tmp.ldif > brecca.ldif
mv brecca.ldif brecca-tmp.ldif
Modify edda's ldif file.
# CHANGE "dn: GlueSiteUniqueID" TO edda - this needs to be unique for each cluster
sed 's/dn: GlueSiteUniqueID=vpac,mds-vo-name=local,/dn: GlueSiteUniqueID=edda,mds-vo-name=local,/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
# CHANGE ALL OCCURRENCES OF "jobmanager-pbs" --> "jobmanager-pbsedda"
sed 's/jobmanager-pbs/jobmanager-pbsedda/g' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
# CHANGE THE RELEVANT STATIC INFORMATION TO SUIT THE CLUSTER
sed 's/GlueHostBenchmarkSI00: 1500/GlueHostBenchmarkSI00: 1400/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
sed 's/GlueHostMainMemoryRAMSize: 256/GlueHostMainMemoryRAMSize: 8000/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
sed 's/GlueHostMainMemoryVirtualSize: 512/GlueHostMainMemoryVirtualSize: 16000/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
sed 's/GlueHostNetworkAdapterInboundIP: FALSE/GlueHostNetworkAdapterInboundIP: FALSE/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
sed 's/GlueHostNetworkAdapterOutboundIP: TRUE/GlueHostNetworkAdapterOutboundIP: FALSE/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
sed 's/GlueHostOperatingSystemName: ScientificLinux/GlueHostOperatingSystemName: SLES/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
sed 's/GlueHostOperatingSystemRelease: 3.0.5/GlueHostOperatingSystemRelease: 9/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
sed 's/GlueHostOperatingSystemVersion: 3/GlueHostOperatingSystemVersion: 9/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
sed 's/GlueHostProcessorClockSpeed: 3200/GlueHostProcessorClockSpeed: 1656/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
sed 's/GlueHostProcessorModel: PIV/GlueHostProcessorModel: Power5/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
sed 's/GlueHostProcessorVendor: intel/GlueHostProcessorVendor: ibm/' edda-tmp.ldif > edda.ldif
mv edda.ldif edda-tmp.ldif
Remove the original static ldif file.
rm lcg-info-static.ldif
Create a new static ldif file from the brecca and edda files.
cat brecca-tmp.ldif > lcg-info-static.ldif
cat edda-tmp.ldif >> lcg-info-static.ldif
Clean up
rm brecca-tmp.ldif edda-tmp.ldif
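The repeated sed-then-mv pairs used to build the per-cluster files can be collapsed. As a sketch of an alternative (not the procedure as originally written), one sed invocation can take several -e expressions and write its result once. rewrite_ldif is a hypothetical helper, and the expressions in the example are only a subset of the substitutions used in this how-to.

```shell
#!/bin/sh
# Hypothetical helper: apply one or more sed expressions to a source ldif
# in a single pass and write the result to a destination file.
#   rewrite_ldif SRC DEST EXPR [EXPR...]
rewrite_ldif() {
    src=$1
    dest=$2
    shift 2
    # Rebuild the argument list so each EXPR becomes "-e EXPR".
    for expr in "$@"; do
        set -- "$@" -e "$expr"
        shift
    done
    sed "$@" "$src" > "$dest"
}

# Example (run from /opt/lcg/var/gip, before lcg-info-static.ldif is removed):
#   rewrite_ldif lcg-info-static.ldif brecca-tmp.ldif \
#       's/jobmanager-pbs/jobmanager-pbsbrecca/g' \
#       's/GlueHostProcessorModel: PIV/GlueHostProcessorModel: Xeon/'
```

Because the source file is only read and the destination is written once, no intermediate file or mv is needed.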
Fingers crossed, lcg-infosites should now give you output like:
****************************************************************
These are the related data for belle: (in terms of queues and CPUs)
****************************************************************
#CPU  Free  Total Jobs  Running  Waiting  ComputingElement
----------------------------------------------------------
 144    11         107       36       71  nglcg.vpac.org:2119/jobmanager-pbsedda-dque
 144    11           0        0        0  nglcg.vpac.org:2119/jobmanager-pbsedda-grid
 144    11           0        0        0  nglcg.vpac.org:2119/jobmanager-pbsedda-lque
 144    11           0        0        0  nglcg.vpac.org:2119/jobmanager-pbsedda-sque
 178    38          96       59       37  nglcg.vpac.org:2119/jobmanager-pbsbrecca-dque
 178    38           0        0        0  nglcg.vpac.org:2119/jobmanager-pbsbrecca-grid
 178    38          11       11        0  nglcg.vpac.org:2119/jobmanager-pbsbrecca-lque
 178    38           5        0        5  nglcg.vpac.org:2119/jobmanager-pbsbrecca-sque
 144    11           0        0        0  nglcg.vpac.org:2119/jobmanager-pbsedda-testing
...
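To confirm that both job managers are being published, the ComputingElement column can be reduced to a list of distinct job manager types. A sketch, assuming the output layout shown here, where the CE is the last field and has the form host:port/jobmanager-<type>-<queue>. list_jobmanagers is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read lcg-infosites-style output on stdin and print
# each distinct job manager type found in the last column.
list_jobmanagers() {
    awk '$NF ~ /\/jobmanager-/ {
             ce = $NF
             sub(/^.*\/jobmanager-/, "", ce)   # drop host:port/jobmanager-
             sub(/-.*$/, "", ce)               # drop the -queue suffix
             print ce
         }' | sort -u
}

# Example: pipe the output of your lcg-infosites query into the function.
#   <your lcg-infosites command> | list_jobmanagers
```

For the listing in this how-to, both pbsbrecca and pbsedda should appear; a missing name would point at the dynamic provider wrapper or the static ldif for that cluster.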