HPCVL operates a cluster of x86-based multicore machines running Linux. This page explains the essential features of this cluster and is meant as a basic guide to its usage.
Our cluster consists of multiple X86 multicore nodes made by Sun Microsystems (AMD based), Dell (Intel x5670 based), and IBM (also based on Intel x5670). All nodes run CentOS Linux 5.8 and share a file system. Access is handled by Grid Engine. The server nodes are called sw0001...sw0050.

Several of the nodes on the HPCVL "Software Cluster" are Sun Fire X4140 Servers. These are based on the AMD Opteron 2356 chip and have two sockets with quad-core chips for a total of 8 cores per node. The base speed of these chips is 2.3 GHz. Each node has 32 Gbyte of physical memory.
We also operate one Sun Fire X4440 Server with four sockets based on the 6-core AMD Opteron 8431 chip. These chips run at 2.4 GHz. The total physical memory of the node is 80 GB.
Most of the nodes of the SW cluster are Dell PowerEdge R410 Servers that have 2 sockets, each with a 6-core Intel® Xeon® processor (Intel x5670 / x5675) that runs at 2.9 GHz. These nodes offer a total of 12 cores that are 2-fold hyperthreaded, i.e. they support up to 24 threads. These nodes have 32 Gbyte (64 Gbyte for some) of physical memory each.
A few nodes are IBM XServers 3550-M3 that are also based on the Intel® Xeon® processor (Intel x5690). These servers are dual-socket with 6 cores per chip for a total of 12 cores per node and support for up to 24 threads (hyperthreading). The clock speed for these machines is 3.46 GHz.
The main emphasis in these systems is high floating-point performance for a modest number of processes / threads. Since commercial software such as Fluent and Abaqus is increasingly focussed on support for Linux only, this cluster was acquired to continue to offer recent versions of these software packages. In addition, the higher single-core performance of these nodes (compared to the Sparc/Solaris based M9000 cluster, for instance) allows for a more efficient use of license seats, which are usually priced per core.
The software cluster runs on the Linux operating system, and should therefore only be used if the software cannot be compiled or run on the Sparc/Solaris platform. Runs that require more than 64 Gbyte of memory should be performed on the M9000 cluster unless the program is parallelized using MPI with distributed memory and very low communication requirements.
If you think your application could run more efficiently on these machines, please contact us (help@hpcvl.org) to discuss any concerns and let us assist you in getting started.
Note that on this cluster (as on the M9000's), we have to enforce dedicated cores or CPUs to avoid sharing and context-switching overheads. No "overloading" can be allowed.
The HPCVL Secure Portal at https://portal.hpcvl.queensu.ca offers a direct link called xterm [linux login node]. This link connects via a terminal to sw0010, which is designated as a login/workup node for the cluster. If you encounter issues with the portal login, please let us know. Meanwhile, it is possible to "ssh" directly from sflogin0 to sw0010 by typing ssh sw0010 and re-typing your system password.
The file systems for all of our clusters are shared, so you will be using the same home directory as when you are using the M9000 servers or the standard login node sfnode0. The sw0010 node can be used for compilation, program development, and testing only, not for production jobs.
Since the SW cluster has a completely different architecture than the M9000 Servers, code must be re-compiled when migrating to this cluster. The compiler that we are using on this cluster is the Intel Compiler Suite. This includes compilers for Fortran, C, and C++, as well as MPI and OpenMP support, debuggers, and a development suite. This software resides in /opt/ics and is only visible to the Linux cluster. The versions are:
We also supply the GNU C/C++/Fortran compilers (version 4.1.2 20080704 (Red Hat 4.1.2-52)) to enable compilation of some of the more "temperamental" public-domain software, but it is preferable to use the Intel suite if possible.
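The preference for the Intel suite over the GNU compilers can be sketched as a small shell fragment. This is only an illustration under assumptions: it assumes the Intel C compiler is reachable on your PATH under its usual name icc (the page only states that the suite resides in /opt/ics), and hello.c is a hypothetical source file.

```shell
#!/bin/bash
# Prefer the Intel C compiler when it is on the PATH; otherwise fall back to gcc.
# Assumption: the Intel compiler is installed under its usual command name "icc".
if command -v icc >/dev/null 2>&1; then
    CC=icc
else
    CC=gcc
fi
echo "Compiling with: $CC"

# hello.c is a hypothetical source file; -O2 is accepted by both compilers.
# The compile step is skipped when the file does not exist.
if [ -f hello.c ]; then
    "$CC" -O2 -o hello hello.c
fi
```

Run this on the login node sw0010, since the Intel suite is only visible to the Linux cluster.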
For applications that cannot be re-compiled (for instance, because the source code is not accessible), a pre-compiled Linux version needs to be obtained (an x64 version for Redhat will do the trick).
... to run jobs
As mentioned earlier, program runs for user and application software on the login node are allowed only for test purposes or if interactive use is unavoidable. In the latter case, please get in touch to let us know what you need. Production jobs must be submitted through the Grid Engine load scheduler. For a description of how to use Grid Engine, see the HPCVL GridEngine FAQ.
Grid Engine will schedule jobs to a default pool of machines unless otherwise stated. This default pool presently contains only the M9000 nodes m9k0001-8. Therefore, you need to add the following two lines to your script for your job to be scheduled to the Linux SW cluster exclusively:
#$ -q abaqus.q
#$ -l qname=abaqus.q
The name abaqus.q for the queue derives from Abaqus, the initial software that was (and still is) run on this cluster.
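As a minimal sketch, the two queue lines can be placed in a complete serial job script like the one below. Apart from the two abaqus.q lines, everything here is an assumption made for illustration: the job name sw_test, the output file names, and the program ./my_program are hypothetical, and the remaining options (-S, -N, -cwd, -V, -o, -e) are standard Grid Engine directives that may or may not match the settings recommended in the HPCVL GridEngine FAQ.

```shell
#!/bin/bash
# Standard Grid Engine directives (illustrative choices, not site requirements):
#$ -S /bin/bash
#$ -N sw_test
#$ -cwd
#$ -V
#$ -o sw_test.out
#$ -e sw_test.err
# The two lines that direct the job to the Linux SW cluster:
#$ -q abaqus.q
#$ -l qname=abaqus.q

# NSLOTS is normally set by Grid Engine to the number of slots granted;
# default to 1 so the script can also be tried outside the scheduler.
slots="${NSLOTS:-1}"
echo "Job running on $(uname -n) with ${slots} slot(s)"

# ./my_program is a hypothetical executable; it is run only if it exists.
if [ -x ./my_program ]; then
    ./my_program
fi
```

If the script is saved as sw_test.sh (a hypothetical file name), it would be submitted with qsub sw_test.sh. Lines beginning with #$ are read by Grid Engine as options and treated as ordinary comments by the shell.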
Note that your jobs will run on dedicated threads, i.e. typically up to 12 processes can be scheduled to a single node. The Grid Engine will do the scheduling, i.e. there is no way for the user to determine which processes run on which cores.
General information about using HPCVL facilities can be found in our FAQ pages. We also supply user support (please contact us at help@hpcvl.org), so if you experience problems, we can assist you.