GridEngine FAQ

This is a short introduction to the HPCVL installation of Grid Engine and its basic commands. It shows you how to use Sun Grid Engine (GE) to submit and control jobs on the HPCVL Sun Fire cluster. Note that the use of this software is mandatory. Please familiarize yourself with Grid Engine by reading this FAQ and the documentation listed in it.

1 General Overview of Grid Engine

What is Grid Engine?

Sun Grid Engine is a Load Management System (LMS) that allocates resources such as processors (CPUs), memory, disk space, and computing time. Like other LMSs, Grid Engine enables transparent load sharing, controls the sharing of resources, and implements utilization and site policies.

Its features include batch queuing and load balancing, and it lets users suspend and resume jobs and check the status of their jobs.

Grid Engine can be used through the command line or through a Graphical User Interface (GUI) called qmon. Both offer the same set of commands.

Additional information about Grid Engine features will follow in the next sections and the documents referenced in this FAQ.

Which version of Grid Engine is currently in use on HPCVL machines?

The present version of Grid Engine on HPCVL machines is:

Sun Grid Engine 6.

This version of the software was designed for grid computing, i.e., to allow the distribution of a workload over a network of computers that extends over several sites. It allows the allocation of priorities and the implementation of utilization policies. Please check this FAQ occasionally to stay informed about changes in the usage of Grid Engine.

How do I set up my environment to use Grid Engine?

When you first log in, you will already have the proper setup for using Grid Engine. This is because Grid Engine is included in the default settings for usepackage. If for some reason Grid Engine does not seem to be part of your environment setup, you can add it by issuing the use sge6 command.

Part of the setup that is done automatically by usepackage is to source a setup script that is located in the directory

/opt/n1ge6/default/common/

Depending on your login shell, you can also "source" those scripts manually:

For csh or tcsh:

source /opt/n1ge6/default/common/settings.csh

For ksh, bash, etc:

. /opt/n1ge6/default/common/settings.sh

The setup script modifies your search PATH and sets other environment variables that are required to get Grid Engine running. One of those variables is SGE_ROOT, which contains the directory in which the Grid Engine-related programs are located.
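
A quick way to confirm that the setup took effect is to check that SGE_ROOT is set and that qsub can be found on the search path. The following is a minimal sketch in sh syntax, not an HPCVL-provided tool; the function name check_sge_env is made up for this illustration.

```shell
#!/bin/sh
# Report whether the Grid Engine environment looks usable in this shell.
# check_sge_env is a hypothetical helper, not part of Grid Engine.
# Returns 0 only if SGE_ROOT is set and qsub is found on the PATH.
check_sge_env() {
    if [ -z "${SGE_ROOT:-}" ]; then
        echo "SGE_ROOT is not set; source the settings script first"
        return 1
    fi
    if ! command -v qsub >/dev/null 2>&1; then
        echo "SGE_ROOT=$SGE_ROOT but qsub is not on the PATH"
        return 2
    fi
    echo "Grid Engine environment found under $SGE_ROOT"
    return 0
}

# Print a hint (sh syntax) if the check fails.
check_sge_env || echo "hint: . /opt/n1ge6/default/common/settings.sh"
```

If the check fails, sourcing the settings script for your shell as described above should set both the PATH and SGE_ROOT.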

How do I start using Grid Engine?

Grid Engine provides two ways to run your jobs: directly from the command line, or through the QMON GUI. It is up to you to choose whichever is more convenient.

If the job is simple and consists of only a few commands, submission is more easily done via the command line. If the job requires many options and special requests, the GUI is helpful (at least the first time, when you are writing your script), because it makes it easier to navigate through the available options.

What are the most commonly used Grid Engine Commands?

Sun Grid Engine has a large set of programs that let the user submit and delete jobs, check job status, and obtain information about available queues and environments. For the normal user, knowledge of the following basic commands should be sufficient to get started with Grid Engine and to have full control of their jobs:

qconf:

Shows (-s) configurations and access permissions. Normal users can only view these settings, not change them.

For example qconf -sql will give you a list of all available queues.

qdel:

Gives users the ability to delete their own jobs only.

qhost:

Displays status information about Sun Grid Engine execution hosts.

qmod:

Modifies the status of your jobs (for example, suspend or resume).

qmon:

Provides the X-windows GUI command interface.

qstat:

Provides a status listing of all jobs and queues associated with the cluster.

qsub:

Is the user interface for submitting a job to Grid Engine.

All these commands come with many options and switches and are also available through the QMON GUI. They all have detailed man pages (e.g. "man qsub") and are documented in the Sun Grid Engine 6 User's Guide.

2 Submitting your job with Grid Engine

What are the different kinds of jobs that I can run with Grid Engine?

You can submit all kinds of jobs to Grid Engine, from a simple UNIX command like date to more elaborate batch scripts such as shared-memory parallel jobs or MPI jobs.

You can also open interactive sessions to use e.g. visualization programs.

You can also submit an array of jobs, which is a job consisting of a range of independent identical tasks. This may be helpful in applications that involve repeated execution of the same set of tasks.

What are the Grid Engine queues on the HPCVL system?

Grid Engine uses the notion of a queue to distinguish between the different types of jobs and the different components of the HPCVL cluster. Grid Engine queues can allow many jobs to execute concurrently, and Grid Engine tries to start new jobs in the queue that is most suitable and least loaded.

Note that a job is always associated with its queue and depends on the status of this queue, but users do not need to submit jobs directly to a queue. You only need to specify the requirement profile of the job, which includes memory, available software and type of job (parallel or not, MPI,...).

Although you don't submit jobs directly to a queue, you still need to know which queue is handling your job and what the characteristics of this queue are. On the HPCVL system, we presently have only two different queues, one for test jobs and the other for production jobs. If you type

qconf -sql

you will see a list of all available queues. In particular, you'll find the following:

  • production.q: This is the standard serial and parallel queue. All jobs other than simple short test jobs are sent to this queue automatically. Only hpcvl[0-9].q are used by Grid Engine, since sfnode0 serves as a login and workup node.

  • test.q: This is a testing queue for jobs that are executed outside of the normal production nodes. A job submitted with the "-l test" option will be scheduled here immediately. Note that, although voluntary at present, the use of this queue will be mandatory in the future for all jobs other than simple OS commands, program development and file manipulation. Jobs from this queue are executed in a zone with 40 cores called testjobs.

How do I write and submit batch jobs?

To run a job with Grid Engine you have to submit it from the command line or the GUI. But first, you have to write a batch script file that contains all the commands and environment requests that you want for this job. If, for example, test.csh is the name of the script file, then use the command "qsub" to submit the job:

qsub test.csh

And, if the submission of the job is successful, you will see this message:

Your job 1 ("test.csh") has been submitted

After that, you can monitor the status of your job with the command "qstat" or the GUI qmon.

When the job is finished you will have two output files called "test.csh.o1" and "test.csh.e1".

Now, let's take a look at the structure of a Grid Engine batch job script. We first recall that a batch job is a UNIX shell script consisting of a sequence of UNIX command-line instructions (or interpreted scripts like perl,...) assembled in a file.

A Grid Engine batch script contains, in addition to normal UNIX commands, special comment lines marked by the leading prefix "#$".

The first line of the batch file starts with

#! /usr/bin/csh

which is the default shell interpreter for Grid Engine. You can force Grid Engine to use your preferred shell interpreter (bash, for example) by adding this line to your script file:

#$ -S /bin/bash

To tell GE to run the job from the current working directory, add this script line:

#$ -cwd

If you want to pass an environment variable VAR (or a list of variables separated by commas), use the -v option like this:

#$ -v VAR

(#$ -V passes all variables listed in env.)

Insert the full path name of the files to which you want to redirect the standard output/error respectively (the full pathname is actually not necessary if the #$ -cwd option was used).

#$ -o {file for standard output}
#$ -e {file for standard error}

Lines with the #$ prefix accept the same options as the qsub command line, so check the qsub man page for the full list.

The serial sample script has to be modified to fit your case. All entries enclosed in {} must be replaced; in particular, insert your email address after the "#$ -M".

Note that qsub usually expects shell scripts, not executable files. To submit the job you simply type

qsub serial.csh

Note that from the command line you can issue options and type, for instance:

qsub -cwd -v VAR=value -o /home/tmp -e /home/tmp serial.csh
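
The directives described above can be combined into a job script along the following lines. This is only a sketch in bash syntax under stated assumptions, not the HPCVL sample script: the run_job function and its echo lines are a made-up placeholder workload, and the entries in {} must be replaced as usual.

```shell
#!/bin/bash
# Sketch of a serial job script; entries in {} are placeholders to replace.
# qsub reads the "#$" lines as options; the shell itself treats them as comments.
#$ -S /bin/bash
#$ -cwd
#$ -V
#$ -M {your email address}
#$ -m be
#$ -o {file for standard output}
#$ -e {file for standard error}

# Placeholder workload (hypothetical): replace the body with your own commands.
run_job() {
    echo "job started on $(uname -n)"
    echo "working directory: $(pwd)"
    echo "job finished"
}

run_job
```

Because the "#$" lines are ordinary comments to the shell, a script like this can be tried directly on the login node before it is handed to qsub.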

How do I submit an Array of Jobs?

An array of jobs is a job consisting of a range of independent identical tasks.

You submit an array of jobs by using the qsub command with the -t option like this:

qsub -t 2-10:2 serial.csh

where the -t option defines the task index range (check the qsub man page for more details).
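
In a range of the form n-m:s, n is the first task index, m the last, and s the step, so 2-10:2 produces five tasks. Each task finds its own index in the environment variable SGE_TASK_ID, which is the usual way to give every task different input. The sketch below (bash syntax) illustrates this; the helper functions task_ids and input_for_task and the input.N.dat naming scheme are assumptions made for the example.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd

# Print the task indices that a range first-last:step expands to, one per line.
# (Illustration only; Grid Engine does this expansion itself.)
task_ids() {
    first=$1; last=$2; step=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        echo "$i"
        i=$((i + step))
    done
}

# Hypothetical naming scheme: one input file per task index.
input_for_task() {
    echo "input.$1.dat"
}

# Inside an array job Grid Engine sets SGE_TASK_ID; fall back to 2 for a dry run.
task_id=${SGE_TASK_ID:-2}
echo "task $task_id would process $(input_for_task "$task_id")"
```

Calling task_ids 2 10 2 lists the indices 2, 4, 6, 8 and 10, which are the values SGE_TASK_ID takes across the tasks of the example above.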

How do I submit jobs to the Sun Fire 15K machines?

Our main production environment consists of 7 Sun Fire 25K machines. When you submit jobs, by default this is the set of machines on which your job will run.

The 25K machines contain UltraSPARC IV+ chips, so natively optimized code generated by the Studio compilers is often tuned specifically for those chips to get the best performance.

We also have 3 Sun Fire 15K machines with UltraSPARC III chips (these are a bit slower than the IV+ chips, 1.2 GHz vs. 1.5/1.8 GHz) that have been retained from our previous setup.

It is possible that code optimized for the US IV+ chips will not run properly on the US III chips. A job submitted to Grid Engine can often run anywhere on the compute grid, so one day your US IV+ code will run perfectly on a 25K, but the next day it could end up on a 15K and might crash for no obvious reason.

For this reason, the 15Ks are not included in the default production queues.

Default Production Queues

All jobs start with a default request for

production.q@@us4plus

where @us4plus is a hostgroup that currently contains all the machines with US IV+ chips (right now, that means the 25Ks).

Submitting jobs to these machines

Grid Engine provides a number of ways to select potential target machines for jobs. In particular, we have set up a "hostgroup" and a queue.

The hostgroup @us3 is just a short-hand container name for machines with US III chips (currently the 15Ks). The production.q queue is also available on this hostgroup but is not part of the default request configuration within jobs.

How to add the 15Ks to your job request

Only do this if your code can run on these US III machines!
  1. (simplest) the job can run on any machine that is part of production.q:
      #$ ... other directives ...
      #$ -q production.q
  2. the job can also run somewhere in the us3 hostgroup:
      #$ ... other directives ...
      #$ -q *@@us3
  3. ensure the job must run somewhere in the us3 hostgroup:
      #$ -clear
      #$ ... other directives ...
      #$ -q *@@us3
Notes:

The -clear removes any defaults for subsequent Grid Engine directives in this job (and only in this job), in particular the default production queue setup.

There really are 2 "@" symbols in examples #2 and #3. The "-q" line means:

    *            @             @us3
    any queue    containing    the hostgroup

3 Monitoring and Controlling Jobs

How do I monitor my jobs?

After submitting your job to Grid Engine you may track its status by using either the qstat command, the GUI interface QMON, or by email.

Monitoring with qstat

The qstat command provides the status of all jobs and queues in the cluster. The most useful options are:

  • qstat: Displays list of all jobs with no queue status information.

  • qstat -u hpc1***: Displays list of all jobs belonging to user hpc1***

  • qstat -f: Gives full information about jobs and queues.

  • qstat -j [job_id]: Gives the reason why the pending job (if any) is not being scheduled.

You can refer to the man pages for a complete description of all the options of the qstat command.

Monitoring Jobs by Electronic Mail

Another way to monitor your jobs is to have Grid Engine notify you by email about the status of the job.

In your batch script or on the command line, use the -m option to request that an email be sent, and the -M option to specify the address to which it should be sent. This will look like:

#$ -M myaddress@work
#$ -m beas

The -m option selects the events after which you want to receive email. In particular, you can choose to be notified at the beginning (b) or end (e) of the job, or when the job is aborted (a) or suspended (s), as in the sample script lines above.

From the command line, the same options look like this (for example):

qsub -M myaddress@work -m beas job.sh

How do I control my jobs?

Based on the status of the job displayed, you can control the job by the following actions:

  • Modify a job: As a user, you have certain rights that apply exclusively to your jobs. The Grid Engine command line used is qmod. Check the man pages for the options that you are allowed to use.

  • Suspend (or resume) a job: This uses the UNIX kill command and applies only to running jobs. In practice you type

    qmod -s job_id (or qmod -r job_id to resume; job_id is given by qstat or qsub).

  • Delete a job: You can delete a job that is running or spooled in the queue by using the qdel command like this

    qdel job_id (where job_id is given by qstat or qsub).

    Note that if your job is not on the waiting queue, but is already executing, you need to issue the -f (force) option with the qdel job_id command to terminate the job.

Monitoring and controlling with QMON

You can also use the GUI QMON, which provides a convenient dialog window specifically designed for monitoring and controlling jobs. Its buttons are self-explanatory.

4 Parallel jobs with Grid Engine

What are the Parallel Environments available under HPCVL Grid Engine?

A Parallel Environment is a programming environment designed for parallel computing in a network of computers, which allows execution of shared-memory and distributed-memory parallelized applications. The most commonly used parallel environments are Message Passing Interface (MPI) for distributed-memory machines, and OpenMP for shared-memory machines.

  • For MPI there is a Sun implementation which is part of Sun HPC ClusterTools. It is located under the /opt/SUNWhpc directory (check the HPCVL Parallel Programming FAQ for more details).

  • For OpenMP, no separate runtime environment is required. Details about shared-memory programming and multi-threading with OpenMP may be found in the HPCVL Parallel Programming FAQ.

    Grid Engine provides an interface to handle parallel jobs running on top of these parallel environments. For the user's convenience, HPCVL has predefined parallel environment (PE) interfaces. You can list the available PEs with the command qconf -spl, which gives the environments described below:

    # qconf -spl
    dist.pe
    shm.pe
    
    • dist.pe

      This environment is intended for distributed-memory applications using the Sun HPC ClusterTools libraries, in particular Sun MPI. Grid Engine will assign dist.pe jobs to the production.q queue and try to use the fastest connection available between the slots and nodes. Although the system will try to allocate processes on as few nodes as possible, it is allowed to spread them out over the cluster, since this parallel environment is meant to handle distributed-memory jobs.

    • shm.pe

      This environment is intended for shared-memory applications. Grid Engine will assign the processors in a single node to take advantage of the fastest connection available between the slots.

      shm.pe jobs are submitted to the production.q queue, i.e. to nodes hpcvl[0-9]. It is permissible to use shm.pe for distributed-memory (e.g. MPI) jobs if the intention is to keep them within a single node. Note that this might speed up communication, but may also lead to longer waiting periods.

    How do I submit a multi-threaded job?

    You need to specify the parallel environment to use, which is shm.pe in our case, and how many processors are going to be used. This is done via the script line:

    #$ -pe shm.pe 16

    if you want to use 16 processors. This sets an environment variable NSLOTS and requests the corresponding number of processes.

    There is no need to request parallel queues or special complexes, but, as in an interactive run of a multi-threaded program, you need to set the variable PARALLEL and also OMP_NUM_THREADS (in the case of OpenMP applications) to the number of processors to be used. Add the following lines to your mt_job.csh script file (csh syntax):

    setenv PARALLEL $NSLOTS

    setenv OMP_NUM_THREADS $NSLOTS

    In the multi-threaded sample script, these environment variables are predefined, and all entries enclosed in {} need to be replaced by the appropriate values (for instructions, see the serial job section). To run the job, you simply type

    qsub mt_job.csh
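
For users who prefer bash, the same settings might be sketched as follows. This is an illustration under stated assumptions rather than the HPCVL sample script: the fallback value of 1 for NSLOTS is added here only so that the script can be tried outside Grid Engine, and the program entry in {} is a placeholder.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe shm.pe 16

# Grid Engine sets NSLOTS from the "-pe" request; default to 1 (an assumption
# for dry runs) so that the script can be tried outside Grid Engine as well.
NSLOTS=${NSLOTS:-1}

# bash equivalents of the two csh setenv lines
export PARALLEL=$NSLOTS
export OMP_NUM_THREADS=$NSLOTS

echo "running with $OMP_NUM_THREADS thread(s)"
# {your multi-threaded program and its arguments}
```

Taking the thread count from NSLOTS instead of hard-coding 16 keeps the script consistent if the number in the "-pe" line is changed later.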

    How do I submit a parallel MPI job?

    A specific parallel environment needs to be specified, to let the system know which environment and how many processors are going to be used. This is done via the script line:

    #$ -pe dist.pe 16

    where the number of processors is 16 in this case.

    In the standard "mprun" command, you have to use the option "-x sge" to let the ClusterTools runtime system know that resource allocation will be done by Grid Engine. The option "-np 16" that is normally used by mprun to request a certain number of processors is not required in a Grid Engine script, since the above "#$ -pe" directive is used to determine the number of processes.

    In the MPI sample script, all entries enclosed in {} need to be replaced by the appropriate values (for instructions, see the serial job section).

    To run this job you simply type

    qsub mpi_job.csh
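
Put together, an MPI job script could look roughly like the sketch below (bash syntax). It is not the HPCVL sample script: the program name ./my_mpi_program is hypothetical, and the dry-run branch for systems without mprun is added only so that the sketch can be tried anywhere.

```shell
#!/bin/bash
#$ -S /bin/bash
#$ -cwd
#$ -pe dist.pe 16

# Hypothetical program name; replace it with your own MPI executable.
program=./my_mpi_program

# "-x sge" hands resource allocation to Grid Engine, so no "-np" is given.
mpi_command="mprun -x sge $program"

if command -v mprun >/dev/null 2>&1; then
    $mpi_command
else
    # Dry run outside the cluster: only show what would be executed.
    echo "mprun not found; would run: $mpi_command"
fi
```

The number of processes then follows from the "#$ -pe dist.pe" line alone, so changing the job size means editing a single number.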

5 Where can I get more help and documentation?

    Grid Engine has many more options and possibilities for every kind of job. Here, we gave only the basic steps to get started with GE. Detailed documentation is available. First, there is the Manual (which includes a User's Guide that should answer almost all of your questions).

    For specific commands, the man pages are very comprehensive and should be consulted.

    HPCVL also offers user support; for questions about this FAQ and the usage of Grid Engine on HPCVL machines, contact us.

    © HPCVL 2007