Our default compute cluster consists of eight large shared-memory servers of the Enterprise M9000 type.This page explains essential features of these machines and is meant as a basic guide for their usage.
Our cluster consists of eight shared-memory machines that are highend Sun SPARC Enterprise M9000 Servers which Sun Microsystems built in partnership with Fujitsu. Access is handled exclusively by Grid Engine, including test jobs that are specific to these servers. The server nodes are called m9k0001...m9k0008
Each of these servers consists of 64 quad-core 2.52 Ghz Sparc64 VII processors. Each of these chips has 4 compute cores, and each core is capable of Chip Multi Threading with 2 hardware threads. This means that each of the servers is capable of working simultaneously on up to 512 threads. In total they are able to process more than 4000 threads. As each core carries two Floating-Point Units that can handle Additions and Multiplications in a "Fused" manner (FMA), the cluster has a TPP of up to 20 TFlops.
Chip Multi Threading (CMT) is a technology that allows multiple threads (process) to simultaneously share a single computing resource, such as a core. This increases the efficiency of usage of the core. At the same time, multiple cores share chip resources, thereby improving their utilization.
Our servers have a total of 2TByte of memory (8 GB per core). These machines are suitable for very-high-memory applications.
For more information on the Sparc64 VII Architecture, please check out this website.
The main emphasis in these high-end Shared-Memory servers is to deliver the maximum possible floating-point performance while not compromising on memory requirements. These machines are to some degree complementary to our Victoria Falls Cluster, where the emphasis is on "Throughput", while here the focus is on sheer TFlops.
The large memory of these servers make them ideally suited for large-scale computations. Large L2 caches keep memory latencies low, while chip multithreading technology increases core utilization. These are true high-performance machines.
Large-memory Shared-Memory machines are ideally suited for applications that require fast access to well-organized memory, combined with rapid execution of Floating-Point Operations. While the multi-threaded chip does not explicitly make the distinction between multi-threaded shared-memory applications and multi-processing distributed-memory programs, there is some preference for the former because of the lower degree of communication latencies. Applications that are very floating-point extensive, or depend crucially on cache may be good candidates for running on these servers.
If you think your application could run more efficiently on these machines with some modifications, please contact us (email@example.com) to discuss any concerns and let us assist you in getting started.
Note that on these SMP machines, we have to enforce dedicated cores or CPUs to avoid sharing and context switching overheads. No "overloading" can be allowed.
The servers are accessed via the HPCVL Secure Portal at https://portal.hpcvl.queensu.ca/ from the login node sflogin0 (also called sfnode0).
Clicking on the "Secure Desktop" tab in the portal will present you with a list of applications. Choose the one saying dtterm (sfnode0) or xterm (sfnode0). This will bring up a login terminal on the login node sflogin0.
The file systems for all of our clusters are shared, so you will be using the same home directory. The login node can be used for compilation, program development, and testing only, not for production jobs.
Since the architecture of the Sparc64 VII chips of the M9000 Servers differs in some important details from the one of the login node, it may be a good idea to re-compile your code whenever possible. This is in most cases very simple:
For applications that can not be re-compiled (for instance, because the source code is not accessible), compilations for any post-USIII UltraSparc chip will work, usually pretty well.
For a general introduction to program development, compilation and application building on our systems see the HPCVL Parallel Programming FAQ.
As mentioned earlier, program runs for user and application software on the login node are allowed only for test purposes. Production runs must be submitted to Grid Engine. For a description of how to use Grid Engine, see the HPCVL GridEngine faq
Grid Engine will schedule jobs to a default pool of machines unless otherwise stated. This default pool contains presently only the our M9000 nodes m9k0001-8. Therefore, you need to add no special script lines to be scheduled to these servers exclusively.
Note that your jobs will run on dedicated threads, i.e. up to 512 processes can be scheduled to a single server. The Grid Engine will do the scheduling, i.e. there is no way for the user to determine which processes run on which cores.
For a more thorough review of Multi-core environment, please check out this PDF. You might want to follow some of the links provided in this document. General information about using HPCVL facilities can be found in our FAQ pages.
We also supply user support (please contact us at firstname.lastname@example.org), so if you experience problems, we can assist you.