Overview
Sun Grid Engine (SGE) is a tool for resource management and load balancing in cluster environments. Running a batch job with SGE begins with creating a wrapper script, which is then submitted to one of the queues. Prepared wrapper-script templates for some of the most popular software packages on the GPC are available in /data/shared/scripts once you are logged in.
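For example, you could list the available templates and copy one into your own directory as a starting point. This is only a sketch; the template filename below is a placeholder, not an actual file name:

# List the available templates, then copy one (name below is a placeholder)
ls /data/shared/scripts
cp /data/shared/scripts/<package>_template.sh ~/MyJob/submit.sh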
Queues
The following non-interactive queues are available on the GPC:
Queue | Usage |
all.q | General jobs. 192GB memory available per node. |
himem.q | High-memory jobs. 256GB memory available per node. |
Interactive jobs are limited to 1 qlogin session at a time per user, and each session can run for no more than 24 hours. Users who need GPC resources for longer than 24 hours should submit a batch job to the scheduler using the instructions on this page.
Creating Job Wrapper Scripts
Job submission scripts can be used to run batch jobs on the GPC. These scripts contain options for the job scheduler (specified using #$), as well as the commands needed to start your job. The following table contains a list of common scheduler options:
Option | Purpose |
-q queue_list | Defines 1 or more queues which may be used to run your job. This can be any of the queues listed in the table above. |
-pe pe_name # | Defines the parallel environment to use with '#' processes. |
-N name | Defines a name for your job. This will be displayed when using qstat. |
-o filename | Defines the file to use for standard logging. |
-e filename | Defines the file to use for error logging. |
-j y | Merges standard and error output into a single log file. The file can be specified with the '-o' option. |
-j n | Writes standard and error output to separate log files. The files can be specified with the '-o' and '-e' options. |
-m b,e | Enables job notifications. Mail will be sent when the job begins (b), ends (e), or aborts (a). One or more of these options can be specified using commas. |
-M address | The email address to use for job notifications. |
-cwd | Execute the job from the current working directory. |
-S /bin/bash | Defines the shell to use for your job. The default is /bin/bash. |
-l h_rt=03:30:00 | Defines a time limit of 3 hours and 30 minutes for the job. |
-t 1-10:1 | Defines an array job, with subtask ids counting from 1 to 10 with an increment of 1. The starting id must be less than the end id, and the increment must be greater than 0. This essentially allows you to create 10 identical job submissions with a single job script. |
-tc 5 | Defines the number of concurrent tasks to run for an array job. This example would allow 5 array job tasks to run on the cluster at a time. |
(Note: A full list of these can be found under the Options section of man qsub)
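Several of these options are typically combined at the top of a wrapper script. The following sketch shows job naming, merged logging, and mail notifications together; the job name and email address are placeholders:

#!/bin/bash
#$ -q all.q
#$ -cwd
#$ -N my_job              # Job name shown by qstat (placeholder)
#$ -j y                   # Merge standard and error output...
#$ -o my_job.log          # ...into this single log file
#$ -m b,e                 # Mail when the job begins and ends
#$ -M user@example.edu    # Notification address (placeholder)
#$ -l h_rt=01:00:00       # 1 hour time limit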
Sample job scripts are provided below:
Example Serial Job
#!/bin/bash
#$ -q all.q
#$ -cwd
#$ -S /bin/bash
#$ -o output.log
#$ -e error.log
#$ -l h_rt=00:10:00    # Run for a max of 10 minutes

# Enable Additional Software
. /etc/profile.d/modules.sh
module load shared

# Run the job commands
./myprogram
This example can be used for a serial job in the all.q queue.
Example Single Node Parallel Job
Shared-memory jobs run within a single node. Each node offers up to 40 slots, but the per-user slot limit is slightly lower than this.
#!/bin/bash
#$ -q all.q
#$ -pe openmp 25
#$ -cwd
#$ -S /bin/bash
#$ -o output.log
#$ -e error.log
#$ -l h_rt=00:10:00

# Enable Additional Software
. /etc/profile.d/modules.sh
module load shared

# Run the job commands
export OMP_NUM_THREADS=$NSLOTS
./myprogram
This example can be used for an OpenMP job that uses 25 processes. The wrapper sets OMP_NUM_THREADS to $NSLOTS, the number of slots assigned to the job by the scheduler. The scheduler will place an OpenMP job on a single node.
Example Multi Node Parallel Job
#!/bin/bash
#$ -pe openmpi 25
#$ -cwd
#$ -S /bin/bash
#$ -o output.log
#$ -e error.log
#$ -l h_rt=00:10:00

# Enable Additional Software
. /etc/profile.d/modules.sh
module load shared openmpi/gcc

# Run the job commands
mpirun -n $NSLOTS -bind-to hwthread ./myprogram
This example can be used for an OpenMPI job that uses 25 processes. The scheduler will first fill all available slots on one node, then span additional nodes for the remaining slots.
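If you are unsure how a parallel environment distributes slots across nodes, its configuration (including the allocation rule) can usually be inspected with qconf; a quick sketch:

qconf -spl           # List the parallel environments defined on the cluster
qconf -sp openmpi    # Show the openmpi PE configuration, including its allocation_rule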
Job Management
Interactive jobs can be started using qlogin. This will place you in a shell on the compute node with the least load.
ks347@gpc:~$ qlogin
Your job 323 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 323 has been successfully scheduled.
Establishing /usr/global/sge/bin/qlogin_wrapper session to host node04 ...
/usr/bin/ssh -Y -p 52039 node04
Last login: Wed Oct 14 17:02:33 2015 from gpc
ks347@node04:~$
From here, you can run programs directly on the compute node. Please note that it is best to avoid running programs on the head node, since it manages all of the compute nodes and provides access to the cluster from remote machines.
Submitting Batch Jobs
Batch jobs can be submitted using a wrapper script with the qsub command.
ks347@gpc:~/dev/MyJob$ qsub submit.sh
Your job 324 ("submit.sh") has been submitted
Deleting Jobs
The qdel command allows you to delete a job by JobID.
ks347@gpc:~/dev/MyJob$ qdel 325
ks347 has registered the job 325 for deletion
Monitoring Jobs
The qstat command will show a list of all jobs that are currently running or waiting in the job queue.
[ks347@gpc Laplace]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch       states
---------------------------------------------------------------------------------
all.q@node01                   BIP   0/0/2          0.01     lx-amd64
---------------------------------------------------------------------------------
all.q@node02                   BIP   0/2/2          0.01     lx-amd64
     26 0.55500 LaplaceMPI ks347        r     11/09/2015 10:46:22     2 1
---------------------------------------------------------------------------------
all.q@node03                   BIP   0/0/2          0.01     lx-amd64
---------------------------------------------------------------------------------
all.q@node04                   BIP   0/0/2          0.01     lx-amd64
     26 0.55500 LaplaceMPI ks347        r     11/09/2015 10:46:22     2 1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     26 0.00000 LaplaceMPI ks347        qw    11/09/2015 10:46:15     4 5-8:1
     27 0.55500 LaplaceMPI ks347        qw    11/09/2015 10:46:16     4 1-8:1
     28 0.55500 LaplaceMPI ks347        qw    11/09/2015 10:46:17     4 1-8:1
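A couple of other commonly useful forms of qstat (the job ID below is just the one from the example output above):

qstat -u $USER    # Show only your own jobs
qstat -j 26       # Show detailed information for a single job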
Advanced Job Wrappers
Array Job Wrapper
An array job can be created to submit a batch of similar tasks. An example is as follows:
#!/bin/bash
#$ -cwd
#$ -q all.q
#$ -t 1-20:1
#$ -tc 2
#$ -N "QUEUE_ARRAY_TEST"

# Enable Additional Software
. /etc/profile.d/modules.sh
module load shared openmpi/gcc

# Set per-task variables. $SGE_TASK_ID can be used to vary input to each task.
# Each task will have a unique value counting from 1 to the max number of tasks.
let i1=$SGE_TASK_ID
let i2=$(($SGE_TASK_ID+1000))

# Run the job commands using the per task variables as input
echo "Task: $i1 $i2"
./myprogram -a $i1 -b $i2
This script can be submitted using a standard qsub. The -t option specifies start_task_number-end_task_number:task_stride. The scheduler will create 20 jobs in the queue and allow at most 2 jobs (specified by -tc) to run on the nodes at the same time.
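A common use of $SGE_TASK_ID is to select a different input file for each task. A minimal sketch, assuming input files named input_1.dat through input_20.dat (hypothetical names):

# Each task reads its own input file and writes its own output file
INPUT=$HOME/data/input_${SGE_TASK_ID}.dat
./myprogram -f $INPUT -o output_${SGE_TASK_ID}.dat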
Compute node scratch space
Each compute node has ~1TB available for use as scratch space. I/O intensive jobs can move data onto nodes to speed up access during a job’s runtime. An example wrapper is as follows:
#!/bin/bash
#$ -q all.q
#$ -pe openmp 25
#$ -V

# Specify a few variables needed for this job
PROGRAM=$HOME/dev/MyProgram/myprogram
DATASET=$HOME/data/datafile
SCRATCHDIR=/scratch/ks347/$JOB_ID

# Check whether the scratch directory exists and create as needed
if [[ ! -d "$SCRATCHDIR" ]]
then
    mkdir -p $SCRATCHDIR
fi

# Check whether our data is in scratch and copy as needed
if [[ ! -e "$SCRATCHDIR/datafile" ]]
then
    cp $DATASET $SCRATCHDIR/datafile
    cp $PROGRAM $SCRATCHDIR/myprogram
fi

# Navigate to the scratch dir
cd $SCRATCHDIR

# Run our job commands from within the scratch dir
export OMP_NUM_THREADS=$NSLOTS
./myprogram -f datafile -o outfile

# Copy the output from the job commands to your homedir
# then delete the scratch dir
cp outfile $HOME/data/outputfile.$JOB_ID
rm -rf $SCRATCHDIR
This job will create a scratch directory on the node that it runs on, copy data and the job into the scratch directory on the node, then copy the job output back to the network home directory. After the job completes, the temporary scratch directory is deleted.
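If the job commands can fail part-way through, the scratch directory above would be left behind on the node. One way to guarantee cleanup regardless of how the script exits is a shell trap, added near the top of the wrapper after SCRATCHDIR is set (a sketch; adapt to your own script):

# Remove the scratch directory whenever the script exits, even after a failure
trap 'rm -rf "$SCRATCHDIR"' EXIT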