Local/Network File Systems
Local and network file systems provide users with temporary disk space and locations for user-developed and supported applications and commonly used binaries, libraries, and documents.
The Network File System (NFS) allows disks (file systems) to appear local to multiple machines, so copying or moving files between machines (for example, with FTP) is unnecessary. LC uses NFS to support the common home (/g) and shared temporary (/nfs/tmpn) directories that appear on all its computers. One drawback of mounting NFS globally across all production machines is poor performance under heavy I/O load. If just one user runs massive parallel I/O to /nfs/tmpn or their common home directory instead of using Lustre, all NFS users on all machines can experience seriously degraded performance. Consult the Directory Properties and Limits page for the quotas and the backup and purge policies of the two NFS-mounted directories (/g and /nfs/tmpn).
The /var/tmp file systems (/tmp and /usr/tmp are links to /var/tmp) are local to each individual node. They are faster than NFS but much smaller. There is no quota, but files are not backed up, and the file systems are purged between batch jobs.
LC provides the /usr/gapps file system for user-developed and supported applications on LC systems. Each network has a single /usr/gapps file system (one on OCF and one on SCF) that is globally available to all LC systems on that network. See the /usr/gapps File System Web page for more information.
Traditionally, /usr/local is the location of commonly used binaries, libraries, and documents that are machine-specific, but it is also used for code development and file storage. /usr/global is common to all machines and holds the /usr/local packages that benefit from a common area. See the /usr/local File System Web page for more information.
Parallel file systems are specially designed to efficiently support large-file parallel I/O from every compute node in a cluster. At LC, the installed parallel file systems are Lustre and Spectrum Scale, commonly known as GPFS. Each LC parallel file system has a name of the form /p/lustreN or /p/gpfsN, where lustre or gpfs denotes which parallel file system technology is used and the integer N designates which file system of that type is being referenced. Each LC center (CZ, RZ, SCF, iSNSI) has at least one Lustre file system (e.g., /p/lustre1) and may have more.
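The naming convention above is regular enough to check mechanically, for example in a job script that should only write large output to a parallel file system. This is a minimal sketch, assuming every LC parallel file system is mounted directly under /p with a name made of the technology (lustre or gpfs) followed by an integer, as in /p/lustre1. `parse_pfs_name` and `ParallelFS` are hypothetical names, not an LC-provided tool.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical pattern: assumes names follow /p/<technology><integer>,
# e.g. /p/lustre1 or /p/gpfs1.
_PFS_NAME = re.compile(r"/p/(lustre|gpfs)(\d+)")


class ParallelFS(NamedTuple):
    technology: str  # "lustre" or "gpfs" (Spectrum Scale)
    index: int       # which file system of that technology


def parse_pfs_name(path: str) -> Optional[ParallelFS]:
    """Return (technology, index) for a parallel file system mount point,
    or None if the path does not match the assumed naming convention."""
    match = _PFS_NAME.fullmatch(path.rstrip("/"))
    if match is None:
        return None
    return ParallelFS(match.group(1), int(match.group(2)))
```

For example, `parse_pfs_name("/p/lustre1")` returns `ParallelFS(technology='lustre', index=1)`, while a hypothetical home directory path such as `/g/g0/jdoe` returns `None`.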
Lustre is an open-source, object-based, parallel shared file system used in high performance computing environments around the world. LC provides multiple Lustre file systems on both the OCF and SCF, which let users store and retrieve files from the LC Linux/TOSS cluster where those file systems reside or from the cluster where a job is scheduled. The Lustre file systems in LC have the following characteristics and rules:
- A common directory naming scheme (/p...).
- Huge capacity and high bandwidth.
- The same general service constraints as other large file systems: files are not backed up, multiple tiers of quotas apply, and there is no purging, so space management is the user's responsibility.
- The same performance and access trade-offs. Lustre was designed to support parallel I/O with automatic load balancing across disks. Extensive parallel I/O on a standard, globally mounted file system (such as /nfs/tmpn or your /g home directory) can seriously degrade performance for all users across all the machines where that file system is mounted.
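The rule running through this section is simple: heavy parallel I/O belongs on Lustre, not on NFS-mounted directories. The following is a minimal sketch of a pre-flight check a job script might run before writing parallel output. It classifies a path by prefix only, using prefixes taken from this page (/g, /nfs/tmpn, /usr/workspace, and /p). Treating /nfs/tmpn as "anything under /nfs" is an assumption, and `io_target_kind` and `check_parallel_output` are hypothetical helpers. A production check could consult the mount table (for example /proc/mounts on Linux) instead of hard-coded prefixes.

```python
from pathlib import PurePosixPath

# Hypothetical prefix lists drawn from this page; not an authoritative
# inventory of LC mount points. "/nfs" stands in for /nfs/tmpn (assumption).
NFS_PREFIXES = ("/g", "/nfs", "/usr/workspace")
PARALLEL_PREFIX = "/p"


def _under(path: PurePosixPath, prefix: str) -> bool:
    """True if path equals prefix or lies beneath it, compared by component."""
    root = PurePosixPath(prefix)
    return path == root or root in path.parents


def io_target_kind(path: str) -> str:
    """Classify a path as 'nfs', 'parallel', or 'other' by prefix alone."""
    p = PurePosixPath(path)
    if any(_under(p, prefix) for prefix in NFS_PREFIXES):
        return "nfs"
    if _under(p, PARALLEL_PREFIX):
        return "parallel"
    return "other"


def check_parallel_output(path: str) -> None:
    """Raise before a large parallel write is aimed at a shared NFS mount."""
    if io_target_kind(path) == "nfs":
        raise ValueError(
            f"{path} is NFS-mounted; write parallel output to Lustre (/p/...) instead"
        )
```

Comparing by path component rather than by string prefix keeps a directory such as /gx from being mistaken for /g.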
Parallel file systems often strongly affect the MPI-IO performance of application codes (MPI-IO is parallel I/O using the Message Passing Interface library). To minimize such problems, Slurm now flushes the Lustre page cache after every job ends.
LC has deployed new, non-purged workspaces for all users and groups on LC systems (Collaboration Zone, CZ; Restricted Zone, RZ; and Secure Computing Facility, SCF), with a quota of 1 TB per user and per group.
The pathnames to these workspaces will be:
- CZ: /usr/workspace/ws[a,b]/<workspaceName>
- RZ: /usr/workspace/wsrz[c,d]/<workspaceName>
- SCF: /usr/workspace/ws[1,2]/<workspaceName>
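The bracketed notation in these pathnames lists alternative file systems: "ws[a,b]" reads as "wsa or wsb". The list does not say which of the two holds a particular workspace, so a script may need to look in both. A minimal sketch, assuming that reading of the notation; `WORKSPACE_ROOTS`, `workspace_candidates`, and `find_workspace` are hypothetical names.

```python
import os
from typing import List

# Candidate workspace roots per zone, transcribed from the list above,
# reading "ws[a,b]" as "wsa or wsb" (an interpretation of the notation).
WORKSPACE_ROOTS = {
    "CZ": ["/usr/workspace/wsa", "/usr/workspace/wsb"],
    "RZ": ["/usr/workspace/wsrzc", "/usr/workspace/wsrzd"],
    "SCF": ["/usr/workspace/ws1", "/usr/workspace/ws2"],
}


def workspace_candidates(zone: str, name: str) -> List[str]:
    """All pathnames where workspace `name` could live in `zone`."""
    return [os.path.join(root, name) for root in WORKSPACE_ROOTS[zone.upper()]]


def find_workspace(zone: str, name: str) -> str:
    """Return the first candidate directory that exists on this machine."""
    for path in workspace_candidates(zone, name):
        if os.path.isdir(path):
            return path
    raise FileNotFoundError(f"no workspace named {name!r} found for zone {zone}")
```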
Ownership and permissions on these workspaces will be:

| Workspace | Ownership | Permissions | Example (owner:group) |
|---|---|---|---|
| User | owner = root, group = name of user | 2570 (r-xrws---) | root:fahey2 |
| Group | owner = root, group = name of group | 2570 (r-xrws---) | root:lchotline |
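The octal mode 2570 in the table packs four facts into one number: the leading 2 is the setgid bit, 5 gives the owner read and execute, 7 gives the group read, write, and execute, and 0 gives everyone else nothing. Python's standard `stat` module can decode it, which is a quick way to confirm what `ls -ld` should show on a workspace directory. `describe_mode` is a hypothetical helper name.

```python
import stat


def describe_mode(octal_mode: int) -> str:
    """Render a directory's permission bits the way `ls -ld` shows them."""
    # OR in S_IFDIR so the file-type character is 'd' (directory).
    return stat.filemode(stat.S_IFDIR | octal_mode)


mode = 0o2570
print(describe_mode(mode))           # dr-xrws---
print(bool(mode & stat.S_ISGID))     # True: setgid, new files inherit the directory's group
print(bool(mode & stat.S_IWUSR))     # False: the owner write bit is clear
print(bool(mode & stat.S_IWGRP))     # True: group members can write
print(bool(mode & stat.S_IRWXO))     # False: others have no access
```

The setgid bit is what makes a group workspace work for sharing: files created inside it belong to the workspace's group rather than to the creator's primary group.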
These workspaces will be cross-mounted on all appropriate LC clusters. All files owned by a given user in each /usr/workspace file system will count toward that user's quota, whether they are in a user workspace or a group workspace. Each subdirectory will have snapshots taken daily at 12:00 P.M. and 7:00 P.M., and each snapshot will be kept for 7 days (14 snapshots in total).
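Because quota usage follows file ownership rather than location, a rough self-check has to scan every workspace a user can write to (their own and their groups') and count only the files they own. This is a minimal sketch, assuming read access to the directories being scanned. `bytes_owned_by` and `QUOTA_BYTES` are hypothetical names, and treating 1 TB as 10^12 bytes is an assumption. The file system enforces the real quota and may count allocated blocks rather than apparent file size, so treat the result as an estimate.

```python
import os
import stat

QUOTA_BYTES = 1 * 1000**4  # 1 TB per user; decimal terabyte is an assumption


def bytes_owned_by(root: str, uid: int) -> int:
    """Sum the apparent sizes of regular files under `root` owned by `uid`.

    Every owned file counts, whether it sits in a user workspace or a
    group workspace. Symbolic links are not followed.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if st.st_uid == uid and stat.S_ISREG(st.st_mode):
                total += st.st_size
    return total
```

Usage would look like summing `bytes_owned_by(path, os.getuid())` over the user workspace and each group workspace (all paths here are hypothetical, e.g. `/usr/workspace/wsa/jdoe` and `/usr/workspace/wsa/mygroup`) and dividing by `QUOTA_BYTES`.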
Workspaces will not be backed up or stored off-site, and they will never be purged. Because these file systems are shared using the Network File System (NFS) protocol, parallel I/O to the workspaces from LC production clusters is strongly discouraged.