HPCVL's Sunfire Cluster

The Sunfire Compute Cluster at HPCVL is the default production cluster. It is based on Symmetric Multiprocessor (SMP) systems using the UltraSPARC line of processors and the Solaris Operating Environment. This page explains essential features of the cluster and is meant as a basic guide for its usage.

1. What is the cluster?

The Sunfire compute cluster is based on Sunfire 25000 servers, each of which has 72 dual-core UltraSPARC-IV+ processors (2 MB on-chip L2 cache and 32 MB L3 cache). There are seven of these servers available, called hpcvl0 to hpcvl6, plus an E2900 login node called sflogin0 with the same chip architecture and OS.

The current configurations are:

  • Six Sun Fire 25000 Nodes (hpcvl0 to hpcvl5) with 72 x dual-core UltraSPARC-IV+ 1.5 GHz processors with 576 GB of RAM.
  • One Sun Fire 25000 Node (hpcvl6) with 72 x dual-core UltraSPARC-IV+ 1.8 GHz processors with 576 GB of RAM.
  • One Sun Fire E2900 (sflogin0) with 24 x 1.8 GHz UltraSPARC-IV+ processors and 192 GB of RAM.
  • Two Sun Fire 6900 Nodes (1 at U of O, and 1 at Carleton) with 24 x UltraSPARC-IV+ processors with 192 GB of RAM. Both are to be mainly used as workup nodes.
  • One Sun Fire 4800 with 12 x UltraSPARC-III processors with 48 GB of RAM at Ryerson University. Currently used as a workup node.

2. Why this cluster?

The main emphasis of the Sunfire cluster is on "standard parallel jobs". Because the servers are SMP machines, they offer a substantial amount of memory. With two floating-point units per compute core, they are able to process floating-point intensive jobs at a theoretical peak of 345.6 GFlops (518.4 GFlops) per server.
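
A theoretical peak of this kind is usually obtained by multiplying the number of cores per server by the floating-point units per core and the clock rate. The following is a minimal sketch of that arithmetic, assuming one floating-point operation per unit per cycle (an assumption, not something stated on this page); the function name is illustrative only.

```python
def peak_gflops(processors, cores_per_processor, fpus_per_core, clock_ghz):
    """Theoretical peak in GFlops, assuming one operation per FPU per cycle."""
    return processors * cores_per_processor * fpus_per_core * clock_ghz

# One Sun Fire 25000: 72 dual-core processors, 2 FPUs per core.
for clock in (1.5, 1.8):
    print(f"{clock} GHz: {peak_gflops(72, 2, 2, clock):.1f} GFlops")
```

Under this assumption the 1.8 GHz server comes out at 518.4 GFlops, which matches the figure in parentheses above. The 345.6 GFlops figure corresponds to a 1.2 GHz clock under the same formula, while a 1.5 GHz clock would give 432.0 GFlops.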

3. Who should use this cluster?

The Sunfire machines are currently the default compute cluster, and are suitable for applications that require a considerable amount of memory and/or scale to a moderate number of processors. They can process both shared-memory based applications (usually programmed using OpenMP directives) and distributed-memory parallel programs, often using MPI.

Applications that are very floating-point intensive, or depend crucially on cache usage, should be run on this cluster or on our M9000 servers.

We suggest you consider using the compute cluster if

  • Your application is explicitly or automatically multi-threaded (for instance, using OpenMP) and shows at least some scaling for moderately large numbers of threads (>20).
  • Your application is based on MPI or PVM, and uses substantial amounts of communication. The SMP nature of the Sunfire 25Ks enables very fast intra-node communication.
  • Your application uses substantial amounts of memory. For extremely large memory usage, the M9000 servers should be preferable.
  • Your application is commercially licensed on a per-process basis.
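
One way to judge whether an application "shows at least some scaling" is to time the same problem at several thread counts and compute the speedup and parallel efficiency relative to the single-thread run. The following is a minimal sketch of that check; the timings are made-up placeholders, not measurements from this cluster, and the function name is illustrative.

```python
def scaling_table(timings):
    """Return (threads, speedup, efficiency) rows from {threads: seconds}.

    Speedup is T(1) / T(n); efficiency is speedup / n.
    """
    base = timings[1]  # single-thread wall-clock time is the reference
    rows = []
    for n in sorted(timings):
        speedup = base / timings[n]
        rows.append((n, speedup, speedup / n))
    return rows

# Hypothetical wall-clock times in seconds, for illustration only.
example = {1: 1200.0, 8: 170.0, 24: 75.0, 48: 60.0}

for n, speedup, eff in scaling_table(example):
    print(f"{n:3d} threads: speedup {speedup:5.1f}, efficiency {eff:4.0%}")
```

In this made-up data the gain from 24 to 48 threads is small, so requesting many more than 24 CPUs would tie up processors for little benefit.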

The cluster might not be suitable if

  • Your application is "trivially parallel", employing distributed-memory systems such as MPI, and uses almost no communication. For this purpose, our Victoria-Falls cluster is preferable.
  • Your application consists of a very large number of independent serial runs. Again, the Victoria-Falls cluster should be used.

4. How do I use this cluster?

a) ... to access

Login access to the headnode of the compute cluster is available via the HPCVL Secure Portal at https://portal.hpcvl.queensu.ca/.
Clicking on the "Secure Desktop" tab in the portal will present you with a list of applications. Choose the one saying "xterm (sfnode0)" or "dtterm (sfnode0)". This will bring up a login terminal on the Sunfire cluster login node sflogin0. Note that the compute nodes of the Sunfire cluster are accessed via Grid Engine by default.

The file systems for all our clusters are shared, so you will be using the same home directory. Everything else will also be very similar on all standard clusters, including OS, shell setup, and Grid Engine usage. The login node can be used for compilation, program development, and testing only, not for production jobs.

b) ... to compile and link

Compiling and linking for the Sunfire Cluster is very simple:

  • Make sure you are using Studio 12 compilers. This is the default, but if you have entries in your shell setup that reset the compiler, you might have to modify these by typing "use studio12".
  • Many optimization options in the Studio compilers, such as -fast, imply settings that involve -native, i.e. they optimize for the architecture and chipset of the machine on which you are doing the compilation. These settings do not have to be changed. The compilation should be done on the login node sflogin0.

For a general introduction, see http://www.hpcvl.org/faqs/programming/parallel-prog-faq.html.

For applications that cannot be re-compiled (for instance, because the source code is not accessible), compilations for any post-USIII UltraSparc chip will work.

c) ... to run jobs

As mentioned earlier, program runs for user and application software on the login node are allowed only for test purposes. Production runs must be submitted to Grid Engine. This is exactly as on the Sunfire cluster. For a description of how to use Grid Engine, see the HPCVL GridEngine FAQ.

Grid Engine will schedule jobs to a default pool of machines unless otherwise stated. This default pool presently contains only the Sunfire 25Ks, i.e. hpcvl0 to hpcvl6. Therefore, no additional changes need to be made to use them.
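
A job for the default pool should request as many slots as the program starts processes or threads. The sketch below builds the text of a Grid Engine job script that keeps the two in step by taking the thread count from $NSLOTS, which Grid Engine sets to the number of granted slots. The parallel environment name shm.pe, the program name, and the helper function itself are assumptions for illustration; the HPCVL GridEngine FAQ should be consulted for the names actually in use.

```python
def make_job_script(program, slots, pe_name="shm.pe"):
    """Return the text of a Grid Engine job script for a multi-threaded run.

    pe_name is a placeholder: parallel environment names are site-specific.
    """
    if slots < 1:
        raise ValueError("slots must be at least 1")
    lines = [
        "#!/bin/bash",
        "#$ -S /bin/bash",              # shell used to run the job
        "#$ -cwd",                      # start in the submission directory
        "#$ -V",                        # export the current environment
        f"#$ -pe {pe_name} {slots}",    # request 'slots' CPUs
        "# Use exactly as many threads as slots were granted.",
        "export OMP_NUM_THREADS=$NSLOTS",
        f"./{program}",
    ]
    return "\n".join(lines) + "\n"

# Hypothetical program name, 8 CPUs requested.
print(make_job_script("my_program", 8))
```

The resulting text would be saved to a file and submitted with qsub.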

Note that the number of processes for these machines must be chosen such that dedicated scheduling is possible. It is therefore important that, if a maximum of 8 processes are running, 8 CPUs are requested through Grid Engine. Which specific number to choose must be determined largely by experimentation, specifically for each application.

d) ... to optimize

While in many cases optimization options such as -fast will result in excellent performance, for larger applications it is often necessary to analyze the timing profile of typical runs to uncover bottlenecks and optimize on a source-code level. The Sun Studio Performance Analyzer is an excellent tool to help with this task.

5. Help?

...to find more information

Our user support (please contact us at help@hpcvl.org) can supply you with specific help, and is glad to answer questions about cluster usage.

 
 
   
© HPCVL 2010