To submit a production job on HPCVL clusters, you need to use
the load-balancing software Grid Engine. For details, read
our Gridengine FAQ.
For a Fluent production job, this means that rather than
issuing the above batch command directly, you wrap it into a
Grid Engine script. For an example of such a batch script,
click here. Adapt this script by replacing every relevant
item enclosed in {} with the appropriate value.
Submit the batch script to Grid Engine by typing
qsub fluent.sh
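As an illustration only (this is not the HPCVL example script mentioned above), a job script of this kind might look like the sketch below. The job name fluent_example, the solver version 3d, the file names example.flin and example.out, the slot count 8, and the use of Fluent's -i option to read the input file are all assumptions to replace with your own values. Lines starting with #$ are read by Grid Engine as qsub options and ignored by the shell as comments.

```shell
#!/bin/bash
# Grid Engine reads the "#$" lines as qsub options; bash treats them as comments.
#$ -S /bin/bash
#$ -cwd
#$ -N fluent_example
#$ -pe fluent.pe 8
#$ -M email@address

# Hypothetical file names: replace them with your own.
INPUT=example.flin
OUTPUT=example.out

# Grid Engine sets NSLOTS when it starts the job. If the script is run
# directly (e.g. by accident on the login node), skip the solver run.
if [ -n "${NSLOTS:-}" ]; then
    . /opt/fluent/Fluent.Inc/setup.sh
    # -t: number of processes, -pvmpi: vendor MPI (default, optional),
    # -g: no GUI, -i: read commands from the input file.
    fluent 3d -t"$NSLOTS" -pvmpi -g -i "$INPUT" > "$OUTPUT" 2>&1
else
    echo "NSLOTS is not set: submit this script with qsub" >&2
fi
```

The guard on NSLOTS is a design choice of this sketch: it keeps the script from starting a production run outside Grid Engine, which the site does not permit.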
The advantage of submitting jobs through load-balancing
software is that it automatically finds the required
resources and places the job on a set of processors with a
low load, which helps the job run faster. Note that using
Grid Engine is mandatory for all production jobs on HPCVL
clusters: production jobs with a running time of more than
3 hours that are started outside the load-balancing software
will be terminated by the system administrator.
The Fluent jobs you will want to run on the HPCVL machines
are likely to be quite large. To exploit the parallel
architecture of the Sun Fire systems, Fluent offers several
options for running the solver in a parallel environment,
i.e. on several CPUs simultaneously. The default option for
such runs is MPI, i.e. Fluent uses Sun's native
implementation of the Message Passing Interface for
inter-process communication.
To take advantage of Fluent's parallel capabilities, you have
to call the program with a series of command-line options that
specify the details of your parallel run. Here is a short
overview:
- -tn, where n is the number of processors requested; e.g. to
run with 8 processors, use the option -t8
- -pvmpi specifies that the (default) vendor MPI
communication system is to be used; it may be omitted.
- -g specifies that the GUI should be suppressed; it is
required for batch jobs.
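The options above can be assembled mechanically from a processor count. The following is a minimal sketch; the helper name fluent_parallel_opts is hypothetical and not part of Fluent, and it only prints the options rather than running Fluent.

```shell
#!/bin/bash
# Hypothetical helper: print the parallel options for n processors.
fluent_parallel_opts() {
    local n="${1:-}"
    # Accept only positive integers without a leading zero.
    case "$n" in
        ''|*[!0-9]*|0*)
            echo "expected a positive processor count, got '$n'" >&2
            return 1 ;;
    esac
    # -t<n>: number of processors; -pvmpi: vendor MPI (the default,
    # may be omitted); -g: suppress the GUI, required for batch jobs.
    printf '%s\n' "-t$n -pvmpi -g"
}

fluent_parallel_opts 8    # prints: -t8 -pvmpi -g
```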
Parallel jobs should be run only in batch mode through Grid
Engine. In our example script, the number of processors
appears only once, after
#$ -pe fluent.pe
which is where you tell Grid Engine how many processors to
allocate for the program. Grid Engine automatically sets the
environment variable $NSLOTS to this value, so it can be used
in the Fluent command line (for example as -t$NSLOTS).
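The sketch below illustrates how the single processor count in the -pe request can flow into the -t option through $NSLOTS. The helper name nslots_option is hypothetical, and the count 8 is only an example value.

```shell
#!/bin/bash
#$ -pe fluent.pe 8
# The 8 above is the only place the processor count is written;
# Grid Engine sets NSLOTS to this value when the job starts.

# Hypothetical helper (not part of Fluent or Grid Engine): print the
# matching -t option, or fail if the script was not started by Grid Engine.
nslots_option() {
    if [ -z "${NSLOTS:-}" ]; then
        echo "NSLOTS is not set; submit this script with qsub" >&2
        return 1
    fi
    printf '%s\n' "-t$NSLOTS"
}
```

In a job script one could then write opt=$(nslots_option) || exit 1 followed by fluent 3d "$opt" -g -i example.flin (solver version and file name again hypothetical), so that Fluent is never launched with a missing processor count.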
For the 32-bit version, the script must also source the setup
file /opt/fluent/Fluent.Inc/setup.sh. This sets various
environment variables and enables Fluent to interact properly
with Grid Engine. The file is readable if you want to take a
look. If you are using the 64-bit version of Fluent, alter the
batch script to source
/opt/fluent/Fluent.Inc/setup_64bit.sh instead.
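A job script could choose between the two setup files with a small helper, as in this sketch. The function name fluent_setup_file and the BITS variable are hypothetical; the two paths are the ones given above.

```shell
#!/bin/bash
# Hypothetical helper: print the Fluent setup file for a 32- or 64-bit run.
fluent_setup_file() {
    case "${1:-}" in
        32) printf '%s\n' /opt/fluent/Fluent.Inc/setup.sh ;;
        64) printf '%s\n' /opt/fluent/Fluent.Inc/setup_64bit.sh ;;
        *)  echo "expected 32 or 64, got '${1:-}'" >&2; return 1 ;;
    esac
}

# In the job script, source the file matching the Fluent build you run:
# BITS=64
# . "$(fluent_setup_file "$BITS")"
```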
In our example script, the parallel environment fluent.pe is
reserved for Fluent jobs and is used to keep track of the
available licenses. You can also check the licensing situation
interactively by typing:
flulic
Grid Engine can interact with Fluent's license manager
(FlexLM) to check whether sufficient licenses are available.
This keeps the scheduler from starting a job merely because
enough processors are free, only for the job to stop again
because there are not enough licenses. Grid Engine keeps an
internal counter of available "license slots", which is
updated frequently. Every time Grid Engine tries to schedule a
Fluent job but cannot because not enough licenses are
available, it "requeues" the job. Because each requeue would
trigger an email if notification were requested, our example
script contains no
#$ -m be
line. Notification nevertheless happens automatically when a
job starts and when it finishes, and is sent to the address
specified in the line
#$ -M email@address
in the script.
All processes of a job are allocated within a single node.
This makes communication more efficient and avoids problems
with process control by Grid Engine. As a result, while still
using MPI, Fluent communicates through a so-called
shared-memory layer. The disadvantage is that it can take
longer until the required resources (dedicated processors on
one node) become available, i.e. your job may spend more time
in the Grid Engine waiting queue.
Once the script has been adapted, submit it to Grid Engine
with
qsub fluent.sh
from the login node (which is the Grid Engine submit host).
The job will appear as a parallel job in the output of qstat
and in the qmon GUI. Note also that running a parallel job
this way pays off only for large simulations that use many
CPU cycles, since the overhead of assigning processes,
preparing nodes, and communicating between them is
considerable.
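The submit-and-check cycle can be wrapped in a small shell function. This is a sketch under two assumptions: that the installed Grid Engine version supports qsub -terse (which prints only the job ID), and that qstat -u lists a given user's jobs. The function name submit_and_check is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: submit a job script from the login node and
# list your jobs. Assumes qsub supports -terse (prints only the job ID).
submit_and_check() {
    local script="${1:-fluent.sh}"
    local jobid
    if ! jobid=$(qsub -terse "$script"); then
        echo "qsub failed for $script" >&2
        return 1
    fi
    echo "submitted $script as job $jobid"
    # List your jobs; a Fluent job should appear here as a parallel job.
    qstat -u "${USER:-$(id -un)}"
}
```

Typical use from the login node would be submit_and_check fluent.sh.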
There is an easier way to do this: we supply a small Perl
script called FluentSubmit that can be run directly and asks
a few basic questions, such as the name of the job to be
submitted and the number of processes to use. Simply type
FluentSubmit
and answer the questions. The script expects a Fluent input
file with the extension .flin to be present and does
everything else automatically. It is meant for simple Fluent
job submissions; more complex job submissions are better done
manually.