SGE Job Scheduler


Overview

Sun Grid Engine (SGE) is a tool for resource management and load balancing in cluster environments. Running a batch job with SGE begins with creating a wrapper script, which is then submitted to one of the queues. Prepared wrapper-script templates for many of the most popular software packages on the GPC are available in /data/shared/scripts once you are logged in.
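
For example, you can list the templates and copy one into your home directory as a starting point (the template name below is illustrative; list the directory to see the actual files):

# template.sh is a placeholder name; check /data/shared/scripts for real templates
ls /data/shared/scripts
cp /data/shared/scripts/template.sh ~/submit.sh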

Queues

The following non-interactive queues are available on the GPC:

Available Queues

Queue      Usage
all.q      General jobs. 192GB memory available per node.
himem.q    High memory jobs. 256GB memory available per node.

 

Interactive jobs can be run using qlogin. Each user may have one qlogin session at a time, and a session may run for no more than 24 hours. Users who need GPC resources for longer than 24 hours should submit a batch job to the scheduler using the instructions on this page.
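
If you only need a short interactive session, you can request a runtime limit up front; qlogin accepts the same resource options as qsub (the 2-hour limit below is only an example):

# Request an interactive session limited to 2 hours
qlogin -l h_rt=02:00:00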

 

Creating Job Wrapper Scripts

Job submission scripts can be used to run batch jobs on the GPC. These scripts contain options for the job scheduler (specified using #$), as well as the commands needed to start your job. The following table contains a list of common scheduler options:

Scheduler Options

Option              Purpose
-q queue_list       Defines one or more queues which may be used to run your job. This can be any of the queues listed in the table above.
-pe pe_name #       Defines the parallel environment to use with '#' processes.
-N name             Defines a name for your job. This will be displayed when using qstat.
-o filename         Defines the file to use for standard output logging.
-e filename         Defines the file to use for error logging.
-j y                Sends standard and error log output to a single log file. The file can be specified with the '-o' option.
-j n                Sends standard and error log output to separate log files. The files can be specified with the '-o' and '-e' options.
-m b,e              Enables job notifications. Mail will be sent when the job begins (b), ends (e), or aborts (a). One or more of these can be specified, separated by commas.
-M email_address    Defines the email address to use for job notifications.
-cwd                Executes the job from the current working directory.
-S /bin/bash        Defines the shell to use for your job. The default is /bin/bash.
-l h_rt=03:30:00    Defines a time limit of 3 hours and 30 minutes for the job.
-t 1-10:1           Defines an array job with subtask ids counting from 1 to 10 in increments of 1. The starting id must be less than the end id, and the increment must be greater than 0. This essentially lets you create 10 identical job submissions from a single job script.
-tc 5               Defines the number of concurrent tasks for an array job. This example would allow 5 array job tasks to run on the cluster at a time.

  (Note: A full list of these options can be found under the OPTIONS section of man qsub.)

 

Sample job scripts are provided below:

Example Serial Job

#!/bin/bash
#$ -q all.q
#$ -cwd
#$ -S /bin/bash
#$ -o output.log
#$ -e error.log
#$ -l h_rt=00:10:00            # Run for a max of 10 minutes

# Enable Additional Software
. /etc/profile.d/modules.sh
module load shared

# Run the job commands
./myprogram

This example can be used for a serial job in the all.q queue.

 

 

Example Single Node Parallel Job

Shared-memory jobs run within a single node. Each node offers up to 40 slots, but the per-user slot limit is slightly lower than this.

#!/bin/bash
#$ -q all.q
#$ -pe openmp 25
#$ -cwd
#$ -S /bin/bash
#$ -o output.log
#$ -e error.log
#$ -l h_rt=00:10:00

# Enable Additional Software
. /etc/profile.d/modules.sh
module load shared

# Run the job commands
export OMP_NUM_THREADS=$NSLOTS
./myprogram

This example can be used for an OpenMP job that uses 25 processes. The scheduler sets $NSLOTS to the number of slots allocated to the job, and the script exports this value as OMP_NUM_THREADS. The scheduler will place an OpenMP job on a single node.

 

 

Example Multi Node Parallel Job

#!/bin/bash
#$ -pe openmpi 25
#$ -cwd
#$ -S /bin/bash
#$ -o output.log
#$ -e error.log
#$ -l h_rt=00:10:00

# Enable Additional Software
. /etc/profile.d/modules.sh
module load shared openmpi/gcc

# Run the job commands
mpirun -n $NSLOTS -bind-to hwthread ./myprogram

This example can be used for an OpenMPI job that uses 25 processes. The scheduler will first attempt to schedule workers using all available slots on a node, then spill over to another node for additional slots.
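
To verify how the slots were distributed, the machine file can be printed from inside the job script. SGE sets $PE_HOSTFILE for parallel environment jobs; these optional lines are a suggestion, not part of the original example:

# Optional: show which nodes (and how many slots on each) were allocated
echo "Allocated nodes:"
cat $PE_HOSTFILE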

 

 

Job Management

Interactive jobs can be started using qlogin. This will place you in a shell on the node with the lowest load.

ks347@gpc:~$ qlogin
Your job 323 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 323 has been successfully scheduled.
Establishing /usr/global/sge/bin/qlogin_wrapper session to host node04 ...
/usr/bin/ssh -Y -p 52039 node04

Last login: Wed Oct 14 17:02:33 2015 from gpc
ks347@node04:~$

From here, you can run programs directly on the compute node. Please note that it is best to avoid running programs on the head node, since it manages all of the compute nodes and provides access to the cluster from remote machines.

 

Submitting Batch Jobs

Batch jobs can be submitted using a wrapper script with the qsub command.

ks347@gpc:~/dev/MyJob$ qsub submit.sh
Your job 324 ("submit.sh") has been submitted

 

Deleting Jobs

The qdel command allows you to delete a job by JobID.

ks347@gpc:~/dev/MyJob$ qdel 325
ks347 has registered the job 325 for deletion
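
qdel also accepts a list of JobIDs, and the -u option deletes all of your jobs at once (the JobIDs below are illustrative):

# Delete two specific jobs, then any remaining jobs owned by ks347
qdel 326 327
qdel -u ks347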

 

 

Monitoring Jobs

The qstat command will show a list of all jobs that are currently running and scheduled to run in the job queue.

[ks347@gpc Laplace]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
all.q@node01                   BIP   0/0/2          0.01     lx-amd64
---------------------------------------------------------------------------------
all.q@node02                   BIP   0/2/2          0.01     lx-amd64
     26 0.55500 LaplaceMPI ks347        r     11/09/2015 10:46:22     2 1
---------------------------------------------------------------------------------
all.q@node03                   BIP   0/0/2          0.01     lx-amd64
---------------------------------------------------------------------------------
all.q@node04                   BIP   0/0/2          0.01     lx-amd64
     26 0.55500 LaplaceMPI ks347        r     11/09/2015 10:46:22     2 1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     26 0.00000 LaplaceMPI ks347        qw    11/09/2015 10:46:15     4 5-8:1
     27 0.55500 LaplaceMPI ks347        qw    11/09/2015 10:46:16     4 1-8:1
     28 0.55500 LaplaceMPI ks347        qw    11/09/2015 10:46:17     4 1-8:1
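
For full details on a single job, including its resource requests and the reason it is still pending, qstat -j can be used with a JobID (job 27 from the listing above, for example):

qstat -j 27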

   

Advanced Job Wrappers

 

Array Job Wrapper

An array job can be created to submit a batch of similar tasks. An example is as follows:

#!/bin/bash
#$ -cwd
#$ -q all.q
#$ -t 1-20:1
#$ -tc 2
#$ -N QUEUE_ARRAY_TEST

# Enable Additional Software
. /etc/profile.d/modules.sh
module load shared openmpi/gcc

# Set per-task variables. $SGE_TASK_ID can be used to vary input to each task.
# Each task will have a unique value counting from 1 to the max number of tasks.
let i1=$SGE_TASK_ID
let i2=$(($SGE_TASK_ID+1000))

# Run the job commands using the per task variables as input
echo "Task: $i1 $i2"
./myprogram -a $i1 -b $i2

This script can be submitted using a standard qsub. The -t option specifies start_task_number-end_task_number:task_stride. The scheduler will create 20 tasks in the queue and allow at most 2 of them (specified by -tc) to run on the nodes at the same time.
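
A common pattern is to use $SGE_TASK_ID to select a different input file for each task, for example by reading the N-th line of a list of file paths. A minimal sketch, assuming a hypothetical filelist.txt with one input path per line:

# filelist.txt is a hypothetical list of input paths, one per line;
# task N processes the path on line N
INPUT=$(sed -n "${SGE_TASK_ID}p" filelist.txt)
./myprogram -f "$INPUT"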

 

 

Compute node scratch space

Each compute node has ~1TB available for use as scratch space. I/O-intensive jobs can stage data onto a node's local disk to speed up access during the job's runtime. An example wrapper is as follows:

#!/bin/bash
#$ -q all.q
#$ -pe openmp 25
#$ -V

# Specify a few variables needed for this job
PROGRAM=$HOME/dev/MyProgram/myprogram
DATASET=$HOME/data/datafile
SCRATCHDIR=/scratch/ks347/$JOB_ID

# Check whether the scratch directory exists and create it as needed
if [[ ! -d "$SCRATCHDIR" ]]
  then
    mkdir -p "$SCRATCHDIR"
fi

# Check whether our data is in scratch and copy it as needed
if [[ ! -e "$SCRATCHDIR/datafile" ]]
 then
   cp $DATASET "$SCRATCHDIR/datafile"
   cp $PROGRAM "$SCRATCHDIR/myprogram"
fi

# Navigate to the scratch dir
cd "$SCRATCHDIR"

# Run our job commands from within the scratch dir
export OMP_NUM_THREADS=$NSLOTS
./myprogram -f datafile -o outfile

# Copy the output from the job commands to your homedir
# then delete the scratch dir
cp outfile $HOME/data/outputfile.$JOB_ID
rm -rf "$SCRATCHDIR"

This job will create a scratch directory on the node it runs on, copy the data and program into that scratch directory, then copy the job output back to the network home directory. After the job completes, the temporary scratch directory is deleted.
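
One caveat: if the job fails or is killed partway through, the final rm -rf never runs and the scratch directory is left behind. A trap placed near the top of the script (after SCRATCHDIR is set) would clean up on any exit; this is a sketch, not part of the original wrapper:

# Remove the scratch dir whenever the script exits, even after an error
trap 'rm -rf "$SCRATCHDIR"' EXIT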