Compute Canada

Access Levels

A guide to help choose an access level category for HPCVL.

We are currently charging $0 for Categories 1, 2 and 5.

 

Please read the Access Levels Policies prior to reading this document.

The following discussion assumes that the Central site has three main production pools, as outlined below. HPCVL may offer other resources to groups, but the following is for illustration only. The resources outlined below have to be differentiated when their weight is classified: for example, systems and jobs with large memory versus throughput jobs with a small memory footprint.

  • Victoria Falls cluster (VF) with 74 Sun SPARC T5140 servers. These servers have two Niagara 2+ eight-core chips (1.2 GHz) with 8 threads per core and 32 GB or 64 GB of memory. This means there are 128 compute threads available on each server, for a total of 9472 threads (processes) available in the cluster.
  • Sun Enterprise M9000 cluster (M9K) with eight M9000 systems each containing 64 quad-core SPARC64 VII chips (2.52 GHz) and 2 TB of memory. Each CPU core runs two threads giving 512 compute threads (processes) per system or 4096 compute threads for the cluster.
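
The thread totals quoted above follow directly from the chip, core and thread counts. A minimal sketch of that arithmetic is given here; the function and variable names are illustrative only and are not part of any HPCVL tool.

```python
# Thread counts implied by the hardware descriptions above.
# All names here are hypothetical, chosen for illustration.

def threads_per_server(chips, cores_per_chip, threads_per_core):
    """Number of hardware threads on one server."""
    return chips * cores_per_chip * threads_per_core

# Victoria Falls: 74 servers, 2 eight-core chips each, 8 threads per core.
vf_per_server = threads_per_server(chips=2, cores_per_chip=8, threads_per_core=8)
vf_total = 74 * vf_per_server

# M9000: 8 systems, 64 quad-core chips each, 2 threads per core.
m9k_per_system = threads_per_server(chips=64, cores_per_chip=4, threads_per_core=2)
m9k_total = 8 * m9k_per_system

print(vf_per_server, vf_total)    # 128 9472
print(m9k_per_system, m9k_total)  # 512 4096
```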

Resource Limits

As of December 2008, the default limits per user have been changed to a total of 8 executing jobs through the queues and the following number of threads/processes per cluster:

  • Victoria Falls (vf001-73): 512
  • M9000 (m9k0001-8): 64
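
As a rough illustration of how these defaults combine, the sketch below checks whether one more job would keep a user within both the job limit and the per-cluster thread limit. This is an assumption about how the limits interact, not a description of how the HPCVL scheduler enforces them; the function name and the cluster keys are hypothetical.

```python
# Default per-user limits as of December 2008, taken from the list above.
MAX_RUNNING_JOBS = 8
THREAD_LIMITS = {"vf": 512, "m9k": 64}  # keys are illustrative labels

def within_default_limits(cluster, running_jobs, threads_in_use, new_job_threads):
    """Return True if starting one more job stays within the default limits.

    Assumes both limits apply at once: at most 8 executing jobs in total,
    and at most THREAD_LIMITS[cluster] threads in use on that cluster.
    """
    if running_jobs + 1 > MAX_RUNNING_JOBS:
        return False
    return threads_in_use + new_job_threads <= THREAD_LIMITS[cluster]

# A user with 2 jobs using 48 threads on the M9000 asks for 16 more threads.
print(within_default_limits("m9k", 2, 48, 16))  # True
# The same user asking for 32 more threads would exceed the limit of 64.
print(within_default_limits("m9k", 2, 48, 32))  # False
```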

General Institutional Access (Category 1)

The Category 1 access level should allow a PI (with a small group) to access the resources of HPCVL in a low-key manner: that is, with a relatively small number of threads or processes per job and without burdening the systems all the time. For example, should a Cat. 1 PI have 3 graduate students, each submitting jobs that use 4 to 12 threads or processes, the total usage might be 192 threads or processes on the M9K cluster 50% of the time. Alternatively, a Cat. 1 PI may have only 1 graduate student, but this graduate student requires 280 threads or processes for 50% of the time on a combination of the clusters. This is not yet approaching the point where resources might be denied to other users, so this PI would be okay here, but might be asked to move to Category 2, the Enhanced Institutional Access category, if their average monthly usage starts approaching the monthly user limits for resources and the group is large. Note that "the resources" are loosely defined here, as factors such as memory and type of CPUs (i.e. queues) may vary and be weighted differently.

Another example is a group that starts out with 1 user using 24 to 64 threads. More users in the group sign up, and 3 users end up collectively using 400 threads 60% of the time. This situation would require the PI to move from the Cat. 1 level to the Cat. 2 level, depending on the queue being used. Flexibility should be preserved, as on one day a group might have 3 people needing 400 threads altogether, while 90% of the time they use only 50. Constant monitoring is required, and we understand that minor bumps need not mean that people have entered a more computationally demanding phase of their research programs. We are also keeping in mind that research capability is why HPCVL exists. Again, keep in mind that not all resources are equal and that the Victoria Falls cluster threads are counted with less weight than, say, the M9K cluster threads with the large memory.
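
One way to see the difference between the two situations described above is to compare their time-averaged thread usage. The sketch below is illustrative only: the helper name is hypothetical, the "bursty" case assumes the 400-thread peak fills the remaining 10% of the time, and no per-cluster weighting is applied because the guide does not give numeric weights.

```python
def average_threads(phases):
    """Time-averaged thread usage.

    phases: list of (threads, percent_of_time) pairs.
    Any time not covered by a phase is counted as idle.
    """
    total_percent = sum(percent for _, percent in phases)
    if total_percent > 100:
        raise ValueError("percentages of time must not exceed 100")
    return sum(threads * percent for threads, percent in phases) / 100

# Group that collectively uses 400 threads 60% of the time.
steady = average_threads([(400, 60)])

# Group that uses 50 threads 90% of the time and (assumed) 400 threads
# for the remaining 10%.
bursty = average_threads([(400, 10), (50, 90)])

print(steady)  # 240.0
print(bursty)  # 85.0
```

The second group has the same peak demand but roughly a third of the average usage, which is why an occasional bump need not by itself trigger a change of category.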

Enhanced Institutional Access (Category 2)

The Category 2 access level should allow a PI and their group to access greater resources. Although the restriction on numbers may be implicit for the Cat. 1 level, for the Cat. 2 level there should be no such implicit restriction. These groups could involve 1 or more users whose computing needs exceed 200 threads on the M9000 cluster for, say, 60% of the available time. For example, say a group uses 500 threads on average 70% of the available time on the M9000 cluster. At present, that would constitute 8.5% of the 4096 threads available on that cluster (500 × 0.70 / 4096 ≈ 0.085). Once again, keep in mind that some threads will be "valued" more than others, or rather some will be valued less, depending on the cluster being used.
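
The 8.5% figure can be reproduced by averaging the group's usage over time and dividing by the cluster's thread count. The following is a minimal sketch of that calculation; the function name is hypothetical, and it applies no weighting between clusters since no numeric weights are given in this guide.

```python
M9K_THREADS = 4096  # total compute threads on the M9000 cluster

def cluster_share(threads, percent_of_time, cluster_threads=M9K_THREADS):
    """Time-averaged usage as a percentage of the cluster's threads."""
    average_threads = threads * percent_of_time / 100
    return 100 * average_threads / cluster_threads

# 500 threads for 70% of the available time on the M9000 cluster.
share = cluster_share(500, 70)
print(f"{share:.1f}%")  # 8.5%
```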

With the installation of additional resources, the percentages may become a little more complicated to calculate, and we will update these guidelines as needed. Throughout all of this is the understanding that the Cat. 2 access level has a higher level of priority. Should a user in a group need even higher priority and more resources (say over 900 threads for 6 weeks), a special fee may need to be paid to administer this service. However, provided the Cat. 2 PI whose group needs, say, 900 threads for 6 weeks is making every effort to optimize their code, this type of work should be encouraged. It is an example of the capability computing that we strive for. We may be resource bound at times and unable to provide this service. We may also, at times, have a special pool of threads available to facilitate this type of work.