Submitting, Inspecting and Cancelling PBS Jobs

Brief introduction to PBS[edit]

Job submission and execution on the cluster are managed by the Portable Batch System (PBS) queuing system. All calculations MUST be submitted and executed on the compute nodes; running interactively on the front-end (avogadro) is FORBIDDEN. Foreword: the following instructions are written for Bash; however, adapting them to Csh or Tcsh is straightforward.

Remember that you can always access the online manuals using the command man. Also, we strongly suggest reading the PBS user manual, which you can download from the Altair website.

What is a PBS job and its basic commands[edit]

In this section we are going to submit our first PBS job. A PBS job is simply a shell script, possibly with some PBS directives. PBS directives look like shell comments, so the shell itself ignores them, but PBS picks them up and processes the job accordingly. A Bash script usually begins with #!/bin/bash, or, if you prefer Tcsh, #!/bin/tcsh.
We are going to start by submitting a very simple shell script that executes two Unix commands and then exits; it doesn't have any PBS directives. The script must be executable:

$ pwd
/home/m.martino/PBS

$ ls -l
-rw-r--r--  1 m.martino m.martino   76 Dec  2 22:50 job.sh

$ cat job.sh
#!/bin/bash
hostname
date
exit 0

$ chmod +x job.sh

qsub[edit]

The job is submitted with the command qsub:

$ qsub -q q02pople job.sh
12248.avogadro1

The command output is its job ID. It's the same ID that appears in the first column of the qstat listing. qsub writes the standard output (STDOUT) of a job to a file in the same directory from which the job was submitted. Standard error (STDERR) is also returned in another file in the same directory:

$ ls
job.sh  job.sh.e12248  job.sh.o12248

The output file name has this format:

<job name>.o<job ID number>

The same goes for the error file name, with ".e" instead of ".o". In our example the error file is empty and the standard output file contains:

$ cat job.sh.o12248
pople02
Sun Sep  7 16:27:25 CEST 2015

and this tells us that the job was run on pople02.
Our job executes so fast that we can hardly catch it in action. We are going to slow it down by letting it sleep for a hundred seconds before exiting. Here is our modified version.

$ cat job.sh
#!/bin/bash
hostname
date
sleep 100
date
exit 0

And just to make sure that it's not going to hang forever, we are going to execute it interactively and check that it sleeps for 100 seconds only:

$ time ./job.sh
avogadro1
Sun Sep  7 16:53:41 CEST 2015
Sun Sep  7 16:55:21 CEST 2015
 
real	1m40.029s
user	0m0.000s
sys 	0m0.010s

This worked just fine: the job took 1 minute and 40 seconds (i.e. 100 seconds) to execute. Now we are going to submit it with qsub:

$ qsub job.sh
12259.avogadro1

Now, how can you check its state?

qstat[edit]

You can look at just the job that is of interest to you, by giving its ID as an argument to qstat. So, if you want to check the previous job, all you have to do is:

$ qstat 12259.avogadro1
Job id            Name             User              Time Use S Queue
----------------  ---------------- ----------------  -------- - -----
12259.avogadro1   job.sh           m.martino         00:00:29 R q02curie

The "R" in the 5th column indicates that the job is running. The 5th column is the status column the other values you can see there are:

E
the job is exiting after having run
H
the job is held - this means that it is not going to run until it is released
Q
the job is queued and will run when the resources become available
R
the job is running
T
the job is being transferred to a new location - this may happen, e.g., if the node the job had been running on crashed
W
the job is waiting - you can submit jobs to run, e.g., after 5PM
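
For example, you can get a quick count of your jobs in each state by parsing the status column of the qstat listing. This is only a sketch, not an official tool: it assumes the default qstat output shown above, with two header lines, the user name in the 3rd column and the state in the 5th:

$ qstat | awk -v u=$USER 'NR > 2 && $3 == u { count[$5]++ } END { for (s in count) print s, count[s] }'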

Let's see how the qstat command is typically used to retrieve information:

qstat -q 
list all queues on system
qstat -Q 
list queue limits for all queues
qstat -a 
list all jobs on system
qstat -au userid 
list all jobs owned by user userid
qstat -s jobid 
list jobid with status comments
qstat -r 
list all running jobs
qstat -f jobid 
list full information known about jobid
qstat -Qf queueid  
list all information known about queueid
qstat -B  
list summary information about the PBS server
qstat -iu userid 
get info for jobs of userid
qstat -n -1 jobid 
list nodes on which jobid is running in one line
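
As an example, the full listing from qstat -f is handy when you only need one or two attributes, such as the execution host or the resources used so far. A minimal sketch, using the job ID from the examples above (attribute names may vary slightly between PBS versions):

$ qstat -f 12259.avogadro1 | grep -E "exec_host|resources_used"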

qdel[edit]

A way to get rid of an unwanted job is to run qdel on it. This command deletes a job from PBS. If the job is running, the command sends SIGKILL to it. If the job is merely queued, the command deletes it from the queue. Here's an example:

$ qsub -q q02pople job.sh
12390.avogadro1

$ qdel 12390.avogadro1

$ ls
job.sh  job.sh.e12390  job.sh.o12390

$ cat job.sh.o12390
pople10
Sun Sep  7 17:26:01 CEST 2015

pbsnodes[edit]

To check the availability of a node that you want to request for your submission, you can use:

pbsnodes -a 
list all nodes with their features (but it's quite verbose)
pbsnodes <nodeid> 
check nodeid state (could be free, job-busy, stale, ...)
pbsnodes -l 
list all DOWN and OFFLINE nodes (which you can't request immediately)
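
As a quick way to spot nodes that are currently free, you can filter the verbose pbsnodes -a output. This is only a sketch: it assumes the usual layout in which each node name starts at the beginning of a line and its attributes (including a "state = ..." line) are indented below it:

$ pbsnodes -a | awk '/^[^ ]/ { node = $1 } /state = free/ { print node }'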

Using queues, resources and groups[edit]

When you submit a job, you may want to ask for a certain amount of resources for that job, request that the job be appended to a specific queue, or specify that you are submitting as a member of a specific group of users. There are two types of resources that can be requested for a job: "chunk" resources and "job-wide" resources. "Chunks" are resources allocated as a unit to one job, such as the number of CPUs, the amount of memory, or the number of vnodes. These resources are requested inside the select statement. "Job-wide" resources apply to the entire job, such as the CPU time or the walltime. These resources are requested outside the select statement. You can request resources either in the job script or on the qsub command line with the -l option. For example, a job script could be:

#!/bin/bash
#PBS -l select=1:ncpus=2
#PBS -l walltime=03:00:00
#PBS -q q02curie
#PBS -N sleep
sleep 3h

In this case, the job needs 1 vnode and 2 CPUs to run. The user is also requesting, with the -q option, that the job be appended to the specific queue "q02curie".
The -N option simply gives a name to the job. Please note that on our cluster a queue must always be selected, since there is no default queue.
If a job needs to run, for example, on one of the Cannizzaro nodes, you must submit it as a user belonging to the GPU group, since only users that are part of that group can use them. So, using the -W group_list=<groupname> option, you would write:

#!/bin/bash
#PBS -l select=1:ncpus=2
#PBS -l walltime=03:00:00
#PBS -q q02cannizzaro
#PBS -W group_list=GPU
#PBS -N sleep
sleep 3h

Of course, you can request resources directly with the qsub command:

$ qsub -l select=1:ncpus=2 -q q02curie [...] job.sh

Repetita iuvant: please try to select nodes and queues according to the real necessities of your calculation, and avoid crowding only the newest nodes. Also, remember to specify a queue, since there is no default queue on our cluster.
Finally, if you use a custom script to submit your jobs, please take care of checking for temporary files left on scratch areas, especially if you have killed your job with qdel (see the sketch below).
Further information on PBS can be found in the man pages and in the PBS user manual mentioned above.
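
One way to reduce leftovers is to clean the scratch area from inside the job script itself with a trap, so the cleanup also runs when PBS terminates the job. This is only a sketch: the scratch path below is hypothetical and must be adapted to the areas your application actually uses, and if the job is killed with an untrappable SIGKILL the trap will not run, so a manual check is still good practice:

#!/bin/bash
#PBS -l select=1:ncpus=2
#PBS -l walltime=03:00:00
#PBS -q q02curie
#PBS -N scratch-cleanup

# Hypothetical per-job scratch directory; adapt to your setup.
SCRATCH=/scratch/$USER/$PBS_JOBID
mkdir -p "$SCRATCH"

# Remove the scratch directory on normal exit and on SIGTERM.
trap 'rm -rf "$SCRATCH"' EXIT TERM

cd "$SCRATCH"
# ... run your calculation here ...
sleep 3h
exit 0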

Type of queues[edit]

As you can read on the page Resources available on Avogadro, there are servers with different architectures in the cluster. Different queues have been set up for each homogeneous server group (sub-cluster), to allow the maximum flexibility in choosing where to run your jobs. Since very long jobs are typically executed on the cluster, four different queue types have been associated to each server group, with maximum wall times of 48 h, 168 h (7 days), 336 h (14 days) and 672 h (28 days).
Naturally you can ask for a specific wall time (using -l walltime=hh:mm:ss) and select a queue with a maximum time greater than what you are asking for. To limit the excessive use of one node group or queue, and to be fair to the other users, each queue allows a maximum of 4 running jobs per user. Currently there is no limit to the number of jobs that can be submitted. However, trying to run on every available server or queue without consideration of architecture, queue duration, concatenation of jobs and (most of all) the other users in the group is NOT acceptable; the admins may decide to apply severe limitations in response to such behavior. Please write an email to Avogadro Staff if you have particular needs. Some of the available queues are:

  • Cannizzaro:
  1. q02cannizzaro queue
  2. q07cannizzaro queue
  3. q14cannizzaro queue
  4. q28cannizzaro queue


  • Curie:
  1. q02curie queue
  2. q07curie queue
  3. q14curie queue
  4. q28curie queue


  • Hoffmann:
  1. q02hoffmann queue
  2. q07hoffmann queue
  3. q14hoffmann queue
  4. q28hoffmann queue


...and so on.
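
For example, a job that is expected to run for about five days does not fit in a 48 h queue but fits comfortably in a 168 h (7-day) one; a minimal sketch, where the queue, the number of CPUs and the walltime are purely illustrative:

$ qsub -q q07curie -l select=1:ncpus=8 -l walltime=120:00:00 job.sh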

Submitting scripts for selected applications[edit]

A limited number of applications are frequently used and automatic submission scripts are available for them.

Gaussian[edit]

To submit Gaussian jobs you can use the Python script subgau.py, which has several options for managing Gaussian input and output.

$ subgau.py --help
[...]
EXAMPLES OF USAGE:
  - Direct submission, using default parameters
    subgau.py input.com
  - Development-related tests on a short queue
    subgau.py -q q02curie -w /home/j.bloino/dev/gdv.h21/intel64-nehalem -g gdvh21 input.com
  - Keep memory and processors-related information from input file
    subgau.py -km,p input.com
  - Run FC-related jobs controlling the source files are all available
    subgau.py -fFC -kc input.com
     Note: For typical FC calculations, -fFC == -fFC -kc
  - Explicit path to the Gaussian executable
    subgau.py -g /share/gaussian/g09.b01/g09 input.com

You can also make a copy of the script and customize it for your needs. Note that using subgau.py is your choice, and thus modifying/improving/debugging it is your responsibility, even if help may be provided when possible.

GROMACS[edit]

To submit GROMACS jobs you can use the gmx_sub.sh script. (This script is currently under debugging.) gmx_sub.sh is an interactive script; simply type gmx_sub.sh and follow the instructions.

$ gmx_sub.sh -h
usage: ./gmx_sub.sh -i <input file> -j <job name> [-P] [-D deffnm] -e <ext>
                    [-x dx -y dy -z dz -p npme] [-q queue] -c <ncpus> -n <nodes>
                    -d <rdd> -r <rcon> -s <dds> -C checkpoint -N <nrun> -T <chktime>
-b use dynamic load balancing instead of manual set of dd parameters (default no)
   if set ignore -d and -p values
-c number of CPUs to allocate (use sensible numbers for chosen arch); default 64 
-e extend each run by ext picoseconds
-d rdd value for domain decomposition
-h display help message
-i binary topology input file; should contain number of steps for a single run
-j job name 
-n number of nodes to use (only single node currently)
-p number of nodes reserved for PME (mandatory unless -b is used)
-q job queue (executables are determined by requested arch); default: q02pople
-r rcon value for domain decomposition
-s dds value for domain decomposition
-x -y -z domain decomposition parameters (mandatory unless -b is used)
-C use checkpoint file; automatic for consecutive jobs (after the first)
-D set suffix for mdrun output files (confxx.gro mdxx.log ...)
-N number of jobs
-P double precision (default single)
-T set -cpt option for checkpointing every chktime minutes (mdrun default is 15')

Amber[edit]

To submit Amber jobs you may use the qAmber script. For instructions on qAmber, type:

$ qAmber -h
AMBERHOME set, using /cm/shared/apps/amber/intel/12
usage: ./qAmber -h <help> [-i mdin] [-p prmtop] [-x inpcrd] [-r refc]
                       [-c cpus] [-n nodes] [-m memory] [-q queue]
                       [-j job name] [-o stdout ]
-h display help message
-i input parameters file
-p topology file
-x input coordinates
-r position restraints
-c number of CPUs to allocate (default 8)
-g use GPUs (yes/no)
-m memory (default 5GB)
-n number of nodes to use (only single node currently)
-j job name (default: amberrun-<PID>)
   output files are stored in $jobname_dir
-o standard output redirected here (default /dev/null)
-q job queue; also determines binary architecture (default q02cannizzaro)
-H forces to use a specific host instead of letting PBS choose one (ex: cannizzaro02)
example: qAmber -i mdin -p prm.top -x md.crd -r refc.crd -c 32 -m 2 -q q02cannizzaro

NAMD[edit]

To submit NAMD jobs you can use the qNAMD script. For instructions on qNAMD, type:

$ qNAMD 
usage: qNAMD -h <help> -i inputfile [-c cpus] [-n nodes] [-m memory] [-q queue]
                         [-j job name] [-o stdout ]
-h display help message
-i input parameters file
-c number of CPUs to allocate (default 8)
-n number of nodes to use (only single node currently)
-j job name (default: namdrun-<PID>)
   output files are stored in $jobname_dir
-o standard output redirected here (default /dev/null)
-q job queue; also determines binary architecture (default q02cannizzaro)
example: qNAMD -i test.namd -c 32 -q q02cannizzaro

FAQ[edit]

SGI and GPU nodes[edit]

To be able to run your jobs on the SGI Ultraviolet 2000 and 3000 (Vanthoff and Pauling) or the nodes with a GPU (Cannizzaro01 to 07) you need to be authorized and be added to the group SGI or GPU by one of the admins. The command groups shows a list of the groups you are part of:

$ groups
SGI gaudev gaussian GPU mulliken hpc memos dreams_users smart iitpisa natta

To submit to one of these special nodes you need to use the flag -W group_list=groupname with qsub, like this:

$ qsub -q q02vanthoff -l select=1:ncpus=132 -W group_list=SGI myjob.sh

In this example, we submit a job using the script myjob.sh, requesting 132 cores and using the 2-day queue on Vanthoff (q02vanthoff).

Requesting an interactive session, the correct way to directly access a node[edit]

If you need to directly access a compute node, for example to test your job, you can use the -I flag (that is, a capital i) of qsub to request an interactive session. For example:

$ qsub -I -q q02curie -l select=1:ncpus=4

This will request a 48-hour slot with 4 CPUs on the Curie sub-cluster.

I made a bad evaluation of my job resources[edit]

Job resources (the ones you previously requested with qsub, e.g. -l walltime=xx:xx:xx) can be adjusted using the qalter command. For example:

$ qalter -l walltime=xx:xx:xx jobid

You can use the qalter command only to decrease a job's resource request. If you need to increase a resource, email us at Avogadro Staff and explain your reasons; the staff will then take care of the increase.
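
For instance, to shrink the requested walltime of the job from our earlier examples down to 24 hours (job ID and value are purely illustrative):

$ qalter -l walltime=24:00:00 12259.avogadro1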

Need for node reservation?[edit]

You can reserve computing time using the command pbs_rsub like this:

pbs_rsub -R start_date_and_time -E end_date_and_time 
Both date parameters are in datetime format.

Let's see an example:

$ pbs_rsub -R 16:30 -E 17:30

In this example you are requesting a reservation from 16:30 (4:30 pm) to 17:30. The system will give you a reservation ID, and you can use the command pbs_rstat to inspect the reservation. You can also ask for a specific amount of resources; the format is the same you use with qsub:

$ pbs_rsub -R 16:30 -E 17:30 -l select=1:ncpus=12

As usual, for more information you can check pbs_rsub man page.
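
Once the reservation is confirmed, PBS creates a queue named after the reservation ID, and you can submit jobs into it with qsub -q. A minimal sketch, where R1234 stands for the (hypothetical) ID returned by pbs_rsub and shown by pbs_rstat:

$ qsub -q R1234 job.sh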

In case of emergency or PBS failures, it is possible to ask the staff for a node reservation. Email us at Avogadro Staff.

What if my execution machine has gone off-line?[edit]

Currently, our system is set up this way: when a server goes off-line, or in case of network issues, your job is re-queued and then started afresh. We are planning to modify the system so that it will be possible to restart the job from a certain checkpoint (please note that your application could do it by itself!). In any case, if you want your job NOT to be re-runnable, you simply have to specify it with qsub:

qsub -r {n,y} 
-r stands for "re-runnable": "n" means not re-runnable, "y" means re-runnable
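
For example, to explicitly mark a job as re-runnable when submitting it (queue name as in the earlier examples):

$ qsub -r y -q q02curie job.sh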


Important note: we recently (Dec 2016) changed the default behavior: your jobs are now NOT re-runnable by default (qsub -r n).

Note that your job may still appear as Running, even after a shutdown of the slave node. If you want to check whether your job is actually running, you can do as follows:

$ qstat -u $USER

For each job id you want to check, type:

$ qstat -n1 <job ID> |grep comment

This will print the slave machine on which the job is presumably running. So, log into that server and check whether any of your processes are actually there:

$ top

If your job is not actually running, you should take care of deleting it yourself with the following command:

$ qdel -W force <job ID>
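
If you have several jobs to check, the steps above can be combined into a small helper loop. This is only a sketch, not an official tool: it assumes the default qstat output shown earlier (two header lines, user name in the 3rd column) and uses the exec_host and job_state attributes reported by qstat -f:

for job in $(qstat | awk -v u=$USER 'NR > 2 && $3 == u { print $1 }'); do
    echo "== $job =="
    qstat -f "$job" | grep -E "job_state|exec_host"
done

Then log into the listed execution host and check with top or ps whether your processes are really there.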