Folio2 user guide


The folio2.sas.upenn.edu cluster is a shared resource used by Profs. Aguirre, Bernstein, and Sako and by groups working on PAPER, DES, and SDSS data.

Please email astro-hpc@physics.upenn.edu with any and all system questions. The mailing list at physics-shredder@groups.sas.upenn.edu is available for user discussion.

 


hardware

  • login node
  • admin node
  • backup node
  • 16x compute nodes
    • Dell PowerEdge 1950
    • dual-socket quad-core Intel Xeon L5420 @ 2.50GHz
    • 32GB memory
  • 7x compute nodes
    • Dell PowerEdge R410
    • dual-socket 6-core Intel Xeon E5649 @ 2.53GHz
    • eSATA port on node16 for local data transfer
    • 32GB memory
  • gigabit ethernet interconnect
  • NFS Storage System
    • 200TB raw storage
    • two 73TB usable filesystems
    • 10-gigabit Ethernet link

software

python

  • /usr/bin/python - 2.7.5
  • /usr/global/anaconda - [Intro to anaconda]
    • 2.7.11 is default
    • versions 2.6.8 - 3.5.1 are also available

There are also other versions of Python maintained by the PAPER and DES groups. Check whether your group already maintains the modules you need; otherwise, the sysadmins can install them in the global Anaconda installation for you.
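As a rough sketch (the bin/ paths below assume a standard Anaconda layout under /usr/global/anaconda, and the environment name is illustrative), you can run the global Anaconda interpreter directly or create a personal conda environment with a different Python version:

$ /usr/global/anaconda/bin/python --version
$ /usr/global/anaconda/bin/conda create -n mypy35 python=3.5
$ source /usr/global/anaconda/bin/activate mypy35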

remote access

ssh

Command-line and X11 access is available via the standard ssh client on macOS or Linux. [PuTTY] is recommended for Windows.

$ ssh folio2.sas.upenn.edu
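To display graphical programs locally, X11 forwarding can be enabled with ssh's standard -X (or trusted -Y) option, for example:

$ ssh -Y folio2.sas.upenn.edu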

vnc

VNC is a method for running a persistent remote desktop. The desktop session runs remotely on Folio2 and appears locally in a window: the vnc client runs on your laptop and connects to a vncserver instance on Folio2. The [TigerVNC] client is recommended. Note that VNC only encrypts your password, not the session itself, so running vnc over an ssh tunnel is required.

Log in to Folio2 via ssh to start a vncserver instance. You must choose a display number that no one else is currently using. In this example, display :10 is used, aka port 5910 (5900+10=5910).

$ vncserver :10 -nohttpd -name FOLIO2 -depth 16 -geometry 1024x768

If this is your first time running vnc on Folio2, you will be prompted for a password to protect the server. Please then kill the server and edit its config file.

$ vncserver -kill :10

Update the ~/.vnc/xstartup config file.

# Uncomment the following two lines for normal desktop:
unset SESSION_MANAGER
exec /etc/X11/xinit/xinitrc

Now, start a vnc server again on Folio2.

$ vncserver :10 -nohttpd -name FOLIO2 -depth 16 -geometry 1024x768

If you are using TigerVNC, the client will set up an ssh tunnel automatically with the -via option. Start a vncviewer client locally on your laptop, matching the Folio2 vncserver display number.

$ vncviewer -PreferredEncoding Tight -LowColourLevel 1 -passwd ~/.vnc/passwd -via folio2.sas.upenn.edu :10

When finished, the client may be closed and the server will continue to run in the background. To end the desktop session, use the -kill option on Folio2.

$ vncserver -kill :10

Note that these last three commands can easily be aliased in your ~/.bashrc files (vncstart and vncstop on Folio2, vncfolio on your laptop):

alias vncstart='vncserver :10 -nohttpd -name FOLIO2 -depth 16 -geometry 1024x768'
alias vncstop='vncserver -kill :10'
alias vncfolio='vncviewer -PreferredEncoding Tight -LowColourLevel 1 -passwd ~/.vnc/passwd -via folio2.sas.upenn.edu :10'

job queues

The job queues and scheduling on the Folio2 cluster are handled by Son of Grid Engine (SGE), an open-source fork of Sun Grid Engine. Grid Engine is commercially supported by Oracle, so the latest Oracle documentation may also be useful.

A job is a shell script that runs your code. Jobs are added to a queue with the qsub command. The CPU cores/slots and available memory on the cluster are finite resources; when resources become free, the scheduler starts the next highest-priority job. Job priority is weighted by multiple factors; Folio2 uses a priority scheme referred to as a user-level fairshare algorithm.

Jobs enter a queue in the queued-waiting state (qw) and then enter the run state (r) when resources allow. Job state and queue information can be obtained with the qstat command; see the examples after the queue list below.

  • all.q
    • default
    • one week limit
    • nodes 1-22
  • blanco.q
    • node23
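For example, the following standard qstat invocations list your own pending and running jobs and show the full details of a single job (replace <job_id> with the numeric ID reported by qsub or qstat):

$ qstat -u $USER
$ qstat -j <job_id>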

submitting jobs

overview

When submitting jobs, do not specify a queue explicitly. Instead, specify the expected job length (and, optionally, low priority) and let the scheduler choose.

all.q:

       $ qsub -l h_rt=01:00:00 job.sh
       $ qsub -l low,h_rt=01:00:00 lowjob.sh

 

simple batch job

 

Please see the example script located at

/usr/global/sge/examples/jobs/simple.sh
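If you just want the general shape of such a script, the following is a minimal sketch of a serial batch job (the job name and run-time limit are illustrative; the maintained example above is authoritative):

#!/bin/sh
## a minimal serial job sketch (illustrative)
## submit with:
## $ qsub ~/simple_test.sh
# Export all environment variables
#$ -V
# Your job name
#$ -N simple_test
# Use current working directory
#$ -cwd
# Join stdout and stderr
#$ -j y
# The max hard walltime for this job is 10 minutes
#$ -l h_rt=00:10:00
# The actual work: report where and when the job ran
echo "Running on $(hostname) at $(date)"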

simple parallel job

 

mpi_hello.c example:


// a simple mpi test
// compile with:
// $ mpicc -o ~/mpi_hello mpi_hello.c
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;
    MPI_Init(&argc, &argv);               /* starts MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* get current process id */
    MPI_Comm_size(MPI_COMM_WORLD, &size); /* get number of processes */
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

mpi_hello.sh example:

#!/bin/sh
## a simple openmpi example
## submit with:
## $ qsub ~/mpi_hello.sh
# Export all environment variables
#$ -V
# Your job name
#$ -N mpi_hello
# Use current working directory
#$ -cwd
# Join stdout and stderr
#$ -j y
# PARALLEL ENVIRONMENT:
#$ -pe ompi 16
# Enable resource reservation
#$ -R y
# The max hard walltime for this job is 16 minutes (after this it will be killed)
#$ -l h_rt=00:16:00
# The max soft walltime for this job is 15 minutes (after this SIGUSR2 will be sent)
#$ -l s_rt=00:15:00
# The following is for reporting only. It is not really needed
# to run the job. It will show up in your output file.
echo "Got $NSLOTS processors."
# The mpirun command.
mpirun -np $NSLOTS ~/mpi_hello

 

array jobs

If you will be submitting many hundreds or thousands of similar jobs, a more manageable way to submit them is as an array job. An array job runs many tasks of a single job script from a single qsub command: the script is passed to qsub as usual, the task range is given with the -t option, and the -tc option limits the number of simultaneously running tasks. For instance, to submit an array job with 1000 tasks and at most 10 running at once:


qsub -t 1-1000 -tc 10 myarrayjob.sh

 

Within the job script, you may reference the environment variable $SGE_TASK_ID to differentiate the individual tasks, which in this example run from 1 to 1000. A sketch of such a script is shown below.
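As a rough sketch (the process_data program and its input file naming are hypothetical), an array job script might look like this:

#!/bin/sh
## an illustrative array job sketch; process_data and its inputs are hypothetical
# Export all environment variables
#$ -V
# Your job name
#$ -N myarrayjob
# Use current working directory
#$ -cwd
# Join stdout and stderr
#$ -j y
# The max hard walltime for each task is 1 hour
#$ -l h_rt=01:00:00
# Each task works on its own input file, selected by its task ID
echo "Running task $SGE_TASK_ID of job $JOB_ID"
./process_data input_${SGE_TASK_ID}.dat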

man pages

Please run these commands for more information:

  • man qsub
  • man qstat
  • man qacct

storage

data directories

  • /data3
  • /data4
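To check how much space is free on these partitions, the standard df command can be used:

$ df -h /data3 /data4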

file search

Please do not use the find command to search for files on the data partitions; it walks the entire filesystem and puts a heavy load on the NFS server. Instead, use the "locate" command to search the weekly-updated database of file names.
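For example, to list all indexed files whose paths contain a given string (the pattern here is purely illustrative):

$ locate ngc1365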