Please note: The FAQ pages at the HPCVL website are continuously being revised. Some pages might pertain to an older configuration of the system. Please let us know if you encounter problems or inaccuracies, and we will correct the entries.
This document is intended as a quick reference on basic questions about the file systems used on HPCVL clusters. It includes information on home directories, work and scratch space, disk quota, and our tape library backup system.
The /home file system (there are actually several of them, but they are all called /home) is the main area where our users keep their data. Each user's home directory resides there and is called /home/hpcXXXX, where hpcXXXX denotes the user name.
Physically, this file system resides in 4 racks containing approximately 1 PB of raw disk space in the form of 4 Sun "Unified Storage System" 7410 units. The building blocks of the file system consist of many RAID Z2 volumes. This configuration is designed to tolerate the failure of multiple disks without loss of data or disruption of service. This is achieved through a pool of spare disks. If one (or even two) member drives of an array fail, a global spare drive joins the logical drive and automatically starts to rebuild. Our disk arrays are both read and write cached through flash memory to increase speed. Access speed is homogeneous throughout the file system.
The file system is NFS at the front end and based on ZFS at the back. A high degree of redundancy is built into the system, both on the level of the headnodes (which are dual active/active), and on the level of connectivity (10 Gig Ethernet with two switches and dual cables). Part of the older SAM-QFS based storage system serves as a backup management system that connects our disk arrays to our StorEdge L1400 tape library. This allows the continuous backup of files.
Disk quotas are active on the /home, /u1, and /scratch subsystems. At present the quota for /home is fixed at 500 GB per user, which is enough for most of our users. This quota pertains to the /home file system only. The /u1 file system has a quota of 2 TB to avoid uncontrolled fill-ups ("sanity quota"). Data that exceed these limits have to be moved, either off the system (see question 4 below) or to the /scratch subsystem, which is not backed up (see question 5 below). The /scratch system has a disk quota of 5 TB.
Note that disk quotas are enforced automatically. Once a user exceeds a quota, no further data can be written to the file system by that user, making it impossible to log in in some cases. If this happens, you need to contact us and arrange for freeing up disk space.
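To avoid running into a quota unexpectedly, it can help to check your usage from time to time. The following is a minimal sketch, not an official HPCVL tool: the function names usage_percent and quota_status are made up for this illustration, and the check relies on the standard du command, which may count space somewhat differently from the quota accounting on the file server.

```shell
#!/bin/sh
# Hypothetical helpers (not part of any HPCVL tooling).

# usage_percent DIR LIMIT_KB
# Prints the disk usage of DIR as a whole-number percentage of LIMIT_KB.
usage_percent() {
    dir=$1
    limit_kb=$2
    # du -sk reports the total size of DIR in units of 1024 bytes.
    used_kb=$(du -sk "$dir" | awk '{print $1}')
    echo $(( used_kb * 100 / limit_kb ))
}

# quota_status PERCENT [WARN_AT]
# Prints "over" at 100% or more, "warning" at WARN_AT% or more
# (default 90), and "ok" otherwise.
quota_status() {
    pct=$1
    warn_at=${2:-90}
    if [ "$pct" -ge 100 ]; then
        echo over
    elif [ "$pct" -ge "$warn_at" ]; then
        echo warning
    else
        echo ok
    fi
}
```

For example, assuming the 500 GB quota is counted in binary units, the limit in KB is 500 * 1024 * 1024, and quota_status "$(usage_percent /home/hpcXXXX $((500 * 1024 * 1024)))" reports how close the home directory is to it. Note that du can take a long time on large directory trees.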
Some of our users currently exceed our disk quotas on /home and /u1, due to previously negotiated arrangements. For groups that require such larger quotas we can temporarily raise them to allow work to continue. We will contact these users and arrange for moving data to bring them back within the standard quota. We do not provide long-term data storage as a default.
If you have special needs concerning disk usage that exceeds the above quotas, you can contact us and make a temporary arrangement for more. However, this arrangement will be periodically reviewed and has to be time-limited.
Files in /home and /u1 are automatically backed up. Users do not have to do anything for these activities to occur.
Experience shows that some users need disk space exceeding the 500 GB disk quota for their home directory, sometimes over an extended time period. Examples would be the trajectory files of a molecular dynamics run, or the results of large fluid dynamics simulations. Files that are accessed only occasionally may be moved away from the /home file system into an alternative area called /u1. This file system has a considerably larger disk quota (2 TB per user).
Data residing in /u1 are backed up by default. When you receive a user account, a directory /u1/work/user-ID is automatically created, and access is restricted to the owner. The structure of the files and directories below this is left to the user.
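Moving rarely used data from the home directory into the work area can be done with ordinary shell commands. The sketch below wraps mv in a small function that refuses to overwrite anything already present at the destination. The function name move_to_work is hypothetical and only serves this illustration.

```shell
#!/bin/sh
# move_to_work SRC DEST_DIR   (hypothetical helper)
# Moves the file or directory SRC into the existing directory DEST_DIR,
# refusing to overwrite an entry of the same name.
move_to_work() {
    src=$1
    dest_dir=$2
    target=$dest_dir/$(basename "$src")
    if [ ! -d "$dest_dir" ]; then
        echo "destination directory $dest_dir does not exist" >&2
        return 1
    fi
    if [ -e "$target" ]; then
        echo "$target already exists, not overwriting" >&2
        return 1
    fi
    # Across different file systems, mv copies the data and then
    # removes the original.
    mv -- "$src" "$dest_dir/"
}
```

A call could look like move_to_work /home/hpcXXXX/md_run1 /u1/work/hpcXXXX, where md_run1 is a made-up directory name. For large directories the move may take a while, since the data have to be copied if /home and /u1 are separate file systems.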
If you need more disk space than the disk quota on /home and /u1 allows, you should consider the following options, preferably in that order:
Scratch space is supplied in the /scratch area of the file system. This space is intended for transitory data that are generated during a calculation and are usually deleted shortly after the calculation has finished. However, it is worthwhile to consider keeping other intermediate results that are only needed for a short time on scratch space if there is a danger of exceeding disk quota in /home or /u1.
/scratch is subject to a quota of 5 TB per user to avoid sudden overflow on a disk array. If you require more, please contact us.
Note that our scratch space is global, i.e. accessible from all nodes. While this implies somewhat slower access than local scratch, it allows data to be used from different nodes within a program run (e.g. of an MPI program), and it simplifies maintenance.
Scratch space is accessed via the /scratch directory. A directory /scratch/user-ID is automatically created when you receive an HPCVL account. By default, it is only accessible by the owner.
To use the scratch space, you will often have to set an application-specific environment variable. Setting it to /scratch/hpcXXXX makes it work on all nodes. For example, for the quantum-chemistry code Gaussian, one would set (in bash or another sh-compatible shell):
export GAUSS_SCRDIR=/scratch/hpcXXXX
In csh or tcsh, the equivalent is:
setenv GAUSS_SCRDIR /scratch/hpcXXXX
Note that the above setting is automatically applied when you issue a "use g03" command.
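If several calculations run at the same time, it can be convenient to give each one its own subdirectory of the scratch area and to remove it afterwards. The following is only a sketch of that idea: the function name run_with_scratch and the job.XXXXXX naming pattern are made up for this example, and it assumes that the mktemp command is available on the system.

```shell
#!/bin/sh
# run_with_scratch VAR BASE COMMAND [ARGS...]   (hypothetical helper)
# Creates a private subdirectory under BASE, runs COMMAND with the
# environment variable VAR pointing to it, removes the subdirectory
# again, and returns the exit status of COMMAND.
run_with_scratch() {
    var=$1
    base=$2
    shift 2
    # mktemp -d creates a uniquely named directory readable only by the owner.
    scr=$(mktemp -d "$base/job.XXXXXX") || return 1
    # env sets VAR only for this one command.
    env "$var=$scr" "$@"
    status=$?
    rm -rf "$scr"
    return $status
}
```

For instance, run_with_scratch GAUSS_SCRDIR /scratch/hpcXXXX followed by the usual command line of the application would run it with GAUSS_SCRDIR set to a fresh directory below /scratch/hpcXXXX. Since the directory is deleted when the command finishes, any results that need to be kept must be written elsewhere.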
By default, HPCVL maintains backups for the purpose of securing user data (disaster recovery) only, not for permanent storage or external use. This means that it is the responsibility of the individual user to copy data that are to be kept permanently off the cluster and store them on external media, such as disks, tapes, or DVDs.
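One common way to prepare data for removal from the cluster is to pack a directory into a single archive and record a checksum, so that the copy can be verified after the transfer. The sketch below shows this with the standard tar and cksum commands. The function name pack_for_export is hypothetical, and the remote host in the comment is a placeholder, not an actual service.

```shell
#!/bin/sh
# pack_for_export SRC_DIR ARCHIVE   (hypothetical helper)
# Packs SRC_DIR into the tar file ARCHIVE and writes its checksum
# to ARCHIVE.cksum. ARCHIVE should lie outside of SRC_DIR.
pack_for_export() {
    src=$1
    archive=$2
    parent=$(dirname "$src")
    name=$(basename "$src")
    # -C changes into the parent directory first, so that the archive
    # contains paths relative to it (name/...).
    tar -cf "$archive" -C "$parent" "$name" || return 1
    # cksum prints a CRC checksum, the size in bytes, and the file name.
    cksum "$archive" > "$archive.cksum"
}

# The two files can then be copied to a machine of your own, e.g. with
#   scp results.tar results.tar.cksum user@storage.example.org:archive/
# (storage.example.org is a placeholder). Running cksum on the copied
# archive should give the same checksum and size as recorded.
```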
User data that reside in the /home file system are backed up on a short cycle (on the order of days) to our L1400 tape library. This happens automatically as soon as a file appears on the file system. Whenever a file changes, the change will be committed to the backup as well.
Data that reside in /u1 are also backed up. The backup cycle is the same as for /home. No user data residing outside these directories are backed up. This holds specifically for /tmp and /scratch.
The general answer is: contact us. The system administrators may be able to retrieve the lost data from the regular backup on the L1400 tape library. Keep in mind that changes you made to the data shortly before the loss occurred might be lost, since the backup copy of your file may be outdated. Likewise, if you made accidental changes to your files, you might be able to revert to an earlier version by retrieval from a backup copy. However, if the changes were already committed to the backup, the earlier version could be lost. To avoid such problems, consider using a version control system.
If the loss is the consequence of a general disk failure, the part of the file system that was affected will be restored from safety backups, and it is not necessary (nor useful) to contact the administrator for the retrieval of individual files. In that case, you will have to wait until the file system is restored to normal. This may take several days in the case of a severe failure.