EPP Grid - Tracking back from an EDG ID to a PBS JOB ID


Tracking back from an EDG ID to a PBS JOB ID

  • If you know the date the job was submitted, grep for the EDG ID in the gatekeeper log that covers that date.

grep "https://gdrb02.cern.ch:9000/5BY27AhkFUNDguwbFg0grw" /var/log/globus-gatekeeper.log.20060730040206.0

The result is something like:

JMA 2006/07/29 05:10:26 GATEKEEPER_JM_ID 2006-07-29.05:10:26.0000006834.0000002106 has EDG_WL_JOBID 'https://gdrb02.cern.ch:9000/5BY27AhkFUNDguwbFg0grw'
JMA 2006/07/29 05:20:10 GATEKEEPER_JM_ID 2006-07-29.05:20:10.0000006834.0000002109 has EDG_WL_JOBID 'https://gdrb02.cern.ch:9000/5BY27AhkFUNDguwbFg0grw'
JMA 2006/07/29 05:26:04 GATEKEEPER_JM_ID 2006-07-29.05:26:04.0000006834.0000002113 has EDG_WL_JOBID 'https://gdrb02.cern.ch:9000/5BY27AhkFUNDguwbFg0grw'
JMA 2006/07/29 05:32:01 GATEKEEPER_JM_ID 2006-07-29.05:32:01.0000006834.0000002115 has EDG_WL_JOBID 'https://gdrb02.cern.ch:9000/5BY27AhkFUNDguwbFg0grw'

Each of these is an attempted submission for this job.
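
The next step needs only the last two sections of each GATEKEEPER_JM_ID. As a minimal sketch (the function name jm_suffixes is hypothetical and not part of any existing tool), they can be pulled out of the grep output with sed, assuming the log lines keep the format shown above:

```shell
#!/bin/sh
# jm_suffixes (hypothetical helper): read gatekeeper log lines on stdin and
# print the last two dot-separated sections of each GATEKEEPER_JM_ID,
# one per line. Lines that do not match the expected format are skipped.
jm_suffixes() {
    sed -n 's/.*GATEKEEPER_JM_ID [^ ]*\.\([0-9][0-9]*\.[0-9][0-9]*\) has EDG_WL_JOBID.*/\1/p'
}
```

It would be used as a filter on the grep above, e.g. grep -F "https://gdrb02.cern.ch:9000/5BY27AhkFUNDguwbFg0grw" /var/log/globus-gatekeeper.log.20060730040206.0 | jm_suffixes. The -F flag makes grep treat the ID as a fixed string, so the dots in it are not read as regex wildcards.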

  • Using the last two sections of the GATEKEEPER_JM_ID (e.g. 0000006834.0000002109) you can grep through the appropriate system log file (/var/log/messages.1) to get more info about the job.

grep "0000006834.0000002109" /var/log/messages.1

The result is something like:

Jul 29 05:20:23 lcg-compute gridinfo[27346]: JMA 2006/07/29 05:20:23 GATEKEEPER_JM_ID 2006-07-29.05:20:10.0000006834.0000002109 for /C=CH/O=CERN/OU=GRID/CN=Judit Novak 0973 on 137.138.154.178
Jul 29 05:20:23 lcg-compute gridinfo[27346]: JMA 2006/07/29 05:20:23 GATEKEEPER_JM_ID 2006-07-29.05:20:10.0000006834.0000002109 mapped to dteamsgm (18946, 2688)
Jul 29 05:20:23 lcg-compute gridinfo[27346]: JMA 2006/07/29 05:20:23 GATEKEEPER_JM_ID 2006-07-29.05:20:10.0000006834.0000002109 has GRAM_SCRIPT_JOB_ID 1154150423:lcgpbs:internal_4285332799:27346.1154150415 manager type lcgpbs
Jul 29 05:20:28 lcg-compute gridinfo[27346]: JMA 2006/07/29 05:20:28 GATEKEEPER_JM_ID 2006-07-29.05:20:10.0000006834.0000002109 JM exiting

From here you get two very important pieces of information:
1. The local account the real person was mapped to (here, dteamsgm)
2. The GRAM_SCRIPT_JOB_ID
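
Both values can be picked out of the matching syslog lines automatically. The following is only a sketch: the function names mapped_account and gram_job_id are hypothetical, and it assumes the "mapped to" and "has GRAM_SCRIPT_JOB_ID" lines look exactly like the ones shown above.

```shell
#!/bin/sh
# mapped_account (hypothetical helper): print the local account from
# "... mapped to ACCOUNT (uid, gid)" lines read on stdin.
mapped_account() {
    sed -n 's/.* mapped to \([^ ]*\) (.*/\1/p'
}

# gram_job_id (hypothetical helper): print the ID from
# "... has GRAM_SCRIPT_JOB_ID ID manager type ..." lines read on stdin.
gram_job_id() {
    sed -n 's/.* has GRAM_SCRIPT_JOB_ID \([^ ]*\) manager type .*/\1/p'
}
```

For example, grep -F "0000006834.0000002109" /var/log/messages.1 | gram_job_id should print the GRAM_SCRIPT_JOB_ID that the next step searches for.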

  • Grep through the system log file for the GRAM_SCRIPT_JOB_ID (e.g. 1154150423:lcgpbs:internal_4285332799:27346.1154150415)

grep "1154150423:lcgpbs:internal_4285332799:27346.1154150415" /var/log/messages.1

The result is something like:

Jul 29 05:20:23 lcg-compute gridinfo[27346]: JMA 2006/07/29 05:20:23 GATEKEEPER_JM_ID 2006-07-29.05:20:10.0000006834.0000002109 has GRAM_SCRIPT_JOB_ID 1154150423:lcgpbs:internal_4285332799:27346.1154150415 manager type lcgpbs
Jul 29 05:21:37 lcg-compute gridinfo: [24892-27752] Submitted job 1154150423:lcgpbs:internal_4285332799:27346.1154150415 to batch system lcgpbs with ID 3402.charm-mgt.hpc.unimelb.edu.au
Jul 29 05:23:35 lcg-compute gridinfo: [24892-24892] Job 1154150423:lcgpbs:internal_4285332799:27346.1154150415 (ID 3402.charm-mgt.hpc.unimelb.edu.au) has finished

From here you can get the PBS job ID: 3402.charm-mgt.hpc.unimelb.edu.au
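
To avoid reading the ID off by eye, the "Submitted job" line can be filtered with sed. This is a sketch under the assumption that the line format matches the output above; pbs_job_id is a hypothetical name.

```shell
#!/bin/sh
# pbs_job_id (hypothetical helper): read syslog lines on stdin and print the
# batch system ID from "Submitted job X to batch system Y with ID Z" lines.
pbs_job_id() {
    sed -n 's/.*Submitted job [^ ]* to batch system [^ ]* with ID \([^ ]*\).*/\1/p'
}
```

Used as grep -F "1154150423:lcgpbs:internal_4285332799:27346.1154150415" /var/log/messages.1 | pbs_job_id, it should print only the PBS job ID.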

  • The script: lcg-compute.hpc.unimelb.edu.au:/root/bin/traceGridJob.sh

On lcg-compute.hpc.unimelb.edu.au, in /root/bin, you will find a script named "traceGridJob.sh". Give it one argument, an EDG_JOB_ID, and it will return the time the job hit PBS, its PBS ID, the account it ran under, and the EDG_JOB_ID for reference. It also searches the relevant PBS logs and maui stats files for any information on that PBS ID. Note that for the EDG ID used in the example run, the PBS logs do not show which WN rejected the job, but the MAUI output does.

Note that it looks in the logfile of the day of submission, and the logs of the day before / after. This is to catch messages going to different logfiles as the day rolls over.
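
The day-before / day-after idea can be illustrated as follows. This is not the code from traceGridJob.sh, only a sketch of the technique: it assumes GNU date (for the -d option) and assumes that the logs of interest are named by date in YYYYMMDD form. The function name adjacent_days is hypothetical.

```shell
#!/bin/sh
# adjacent_days (hypothetical helper): given a date as YYYY-MM-DD, print the
# previous day, the day itself and the next day as YYYYMMDD, one per line.
# Requires GNU date; -u avoids surprises from local daylight-saving changes.
adjacent_days() {
    day=$1
    date -u -d "$day 1 day ago" +%Y%m%d
    date -u -d "$day" +%Y%m%d
    date -u -d "$day +1 day" +%Y%m%d
}
```

Letting date do the arithmetic handles month and year boundaries, so a job submitted on the last day of a month is still searched for in the first log of the next month.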


[root@lcg-compute bin]# traceGridJob.sh

Usage: /root/bin/traceGridJob.sh [flag] [EDG JOB ID]
ex. /root/bin/traceGridJob.sh [ -s || -p || -m ] https://gdrb02.cern.ch:9000/4hXi8iLvJcid3JvwefogfA

A script to map an EDG_JOB_ID back to a PBS_JOB_ID via the logfiles.
To print out all of the information available - just don't specify any flags.

Flags:
-s, --summary                   just print a summary of the mapping between EDG_JOB_ID and PBS_ID
-p, --pbs                       print only what's found in the pbs logs for PBS_ID
-m, --maui                      print only what's found in the maui logs for PBS_ID


[root@lcg-compute bin]# ./traceGridJob.sh https://gdrb02.cern.ch:9000/4hXi8iLvJcid3JvwefogfA
******************************************************************
Jul 29 02:12:06 3393.charm-mgt.hpc.unimelb.edu.au dteamsgm https://gdrb02.cern.ch:9000/4hXi8iLvJcid3JvwefogfA

PBS info:
07/29/2006 12:12:06;0100;PBS_Server;Job;3393.charm-mgt.hpc.unimelb.edu.au;enqueuing into dteam, state 1 hop 1
07/29/2006 12:12:06;0008;PBS_Server;Job;3393.charm-mgt.hpc.unimelb.edu.au;Job Queued at request of dteamsgm@lcg-compute.hpc.unimelb.edu.au, owner = dteamsgm@lcg-compute.hpc.unimelb.edu.au, job name = STDIN, queue = dteam
07/29/2006 12:12:07;0008;PBS_Server;Job;3393.charm-mgt.hpc.unimelb.edu.au;Job Modified at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:12:07;0008;PBS_Server;Job;3393.charm-mgt.hpc.unimelb.edu.au;Job Run at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:12:07;0008;PBS_Server;Job;3393.charm-mgt.hpc.unimelb.edu.au;Job Modified at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:12:07;0008;PBS_Server;Job;3393.charm-mgt.hpc.unimelb.edu.au;MOM rejected modify request, error: 15001
07/29/2006 12:12:08;0010;PBS_Server;Job;3393.charm-mgt.hpc.unimelb.edu.au;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:01
07/29/2006 12:12:08;0100;PBS_Server;Job;3393.charm-mgt.hpc.unimelb.edu.au;dequeuing from dteam, state 5

MAUI info:
3393                   0   1 dteamsgm     dteam  259200 Completed   [dteam:1] 1154139126 1154139127 1154139127 1154139129    [NONE] [NONE] [NONE] >=    0M >=      0M     [NONE] 1154139126   1    0 [NONE]:DEFAULT [NONE]    [NONE]          [NONE] [NONE]   0    0.00   DEFAULT      1      0M      0M      0M         0 2140000000 pnet16 CHARM [NONE] [NONE] [DEFAULT] [NONE] [NONE]
******************************************************************
Jul 29 02:22:06 3394.charm-mgt.hpc.unimelb.edu.au dteamsgm https://gdrb02.cern.ch:9000/4hXi8iLvJcid3JvwefogfA

PBS info:
07/29/2006 12:22:06;0100;PBS_Server;Job;3394.charm-mgt.hpc.unimelb.edu.au;enqueuing into dteam, state 1 hop 1
07/29/2006 12:22:06;0008;PBS_Server;Job;3394.charm-mgt.hpc.unimelb.edu.au;Job Queued at request of dteamsgm@lcg-compute.hpc.unimelb.edu.au, owner = dteamsgm@lcg-compute.hpc.unimelb.edu.au, job name = STDIN, queue = dteam
07/29/2006 12:22:07;0008;PBS_Server;Job;3394.charm-mgt.hpc.unimelb.edu.au;Job Modified at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:22:07;0008;PBS_Server;Job;3394.charm-mgt.hpc.unimelb.edu.au;Job Run at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:22:07;0008;PBS_Server;Job;3394.charm-mgt.hpc.unimelb.edu.au;Job Modified at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:22:07;0008;PBS_Server;Job;3394.charm-mgt.hpc.unimelb.edu.au;MOM rejected modify request, error: 15001
07/29/2006 12:22:08;0010;PBS_Server;Job;3394.charm-mgt.hpc.unimelb.edu.au;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:01
07/29/2006 12:22:08;0100;PBS_Server;Job;3394.charm-mgt.hpc.unimelb.edu.au;dequeuing from dteam, state 5

MAUI info:
3394                   0   1 dteamsgm     dteam  259200 Completed   [dteam:1] 1154139726 1154139727 1154139727 1154139729    [NONE] [NONE] [NONE] >=    0M >=      0M     [NONE] 1154139726   1    0 [NONE]:DEFAULT [NONE]    [NONE]          [NONE] [NONE]   0    0.00   DEFAULT      1      0M      0M      0M         0 2140000000 pnet16 CHARM [NONE] [NONE] [DEFAULT] [NONE] [NONE]
******************************************************************
Jul 29 02:26:07 3395.charm-mgt.hpc.unimelb.edu.au dteamsgm https://gdrb02.cern.ch:9000/4hXi8iLvJcid3JvwefogfA

PBS info:
07/29/2006 12:26:07;0100;PBS_Server;Job;3395.charm-mgt.hpc.unimelb.edu.au;enqueuing into dteam, state 1 hop 1
07/29/2006 12:26:07;0008;PBS_Server;Job;3395.charm-mgt.hpc.unimelb.edu.au;Job Queued at request of dteamsgm@lcg-compute.hpc.unimelb.edu.au, owner = dteamsgm@lcg-compute.hpc.unimelb.edu.au, job name = STDIN, queue = dteam
07/29/2006 12:26:08;0008;PBS_Server;Job;3395.charm-mgt.hpc.unimelb.edu.au;Job Modified at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:26:08;0008;PBS_Server;Job;3395.charm-mgt.hpc.unimelb.edu.au;Job Run at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:26:08;0008;PBS_Server;Job;3395.charm-mgt.hpc.unimelb.edu.au;Job Modified at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:26:08;0008;PBS_Server;Job;3395.charm-mgt.hpc.unimelb.edu.au;MOM rejected modify request, error: 15001
07/29/2006 12:26:09;0010;PBS_Server;Job;3395.charm-mgt.hpc.unimelb.edu.au;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:01
07/29/2006 12:26:09;0100;PBS_Server;Job;3395.charm-mgt.hpc.unimelb.edu.au;dequeuing from dteam, state 5

MAUI info:
3395                   0   1 dteamsgm     dteam  259200 Completed   [dteam:1] 1154139967 1154139968 1154139968 1154139970    [NONE] [NONE] [NONE] >=    0M >=      0M     [NONE] 1154139967   1    0 [NONE]:DEFAULT [NONE]    [NONE]          [NONE] [NONE]   0    0.00   DEFAULT      1      0M      0M      0M         0 2140000000 pnet16 CHARM [NONE] [NONE] [DEFAULT] [NONE] [NONE]
******************************************************************
Jul 29 02:31:07 3396.charm-mgt.hpc.unimelb.edu.au dteamsgm https://gdrb02.cern.ch:9000/4hXi8iLvJcid3JvwefogfA

PBS info:
07/29/2006 12:31:07;0100;PBS_Server;Job;3396.charm-mgt.hpc.unimelb.edu.au;enqueuing into dteam, state 1 hop 1
07/29/2006 12:31:07;0008;PBS_Server;Job;3396.charm-mgt.hpc.unimelb.edu.au;Job Queued at request of dteamsgm@lcg-compute.hpc.unimelb.edu.au, owner = dteamsgm@lcg-compute.hpc.unimelb.edu.au, job name = STDIN, queue = dteam
07/29/2006 12:31:08;0008;PBS_Server;Job;3396.charm-mgt.hpc.unimelb.edu.au;Job Modified at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:31:08;0008;PBS_Server;Job;3396.charm-mgt.hpc.unimelb.edu.au;Job Run at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:31:08;0008;PBS_Server;Job;3396.charm-mgt.hpc.unimelb.edu.au;Job Modified at request of root@charm-mgt.hpc.unimelb.edu.au
07/29/2006 12:31:08;0008;PBS_Server;Job;3396.charm-mgt.hpc.unimelb.edu.au;MOM rejected modify request, error: 15001
07/29/2006 12:31:09;0010;PBS_Server;Job;3396.charm-mgt.hpc.unimelb.edu.au;Exit_status=0 resources_used.cput=00:00:00 resources_used.mem=0kb resources_used.vmem=0kb resources_used.walltime=00:00:01
07/29/2006 12:31:09;0100;PBS_Server;Job;3396.charm-mgt.hpc.unimelb.edu.au;dequeuing from dteam, state 5

MAUI info:
3396                   0   1 dteamsgm     dteam  259200 Completed   [dteam:1] 1154140267 1154140268 1154140268 1154140270    [NONE] [NONE] [NONE] >=    0M >=      0M     [NONE] 1154140267   1    0 [NONE]:DEFAULT [NONE]    [NONE]          [NONE] [NONE]   0    0.00   DEFAULT      1      0M      0M      0M         0 2140000000 pnet16 CHARM [NONE] [NONE] [DEFAULT] [NONE] [NONE]

Revision:  r1 - 01 Aug 2006 - MarcoLaRosa
Authorised by:  Geoff Taylor (G.Taylor @ physics.unimelb.edu.au)