Formerly titled "EZFILES," this user guide describes the storage options available to LC users. These options range from temporary compute-node storage, which persists only until the end of a job, to permanent data storage in a state-of-the-art tape archive.
NOTE For a matrix of file system to cluster connections visit the following pages:
LC Storage Quotas
LC Storage Type | Tier 1 | Tier 2 | Tier 3 | Snapshots? | Back up to tape? |
---|---|---|---|---|---|
Parallel File Systems: /p/lustre* (Lustre), /p/gpfs* (OCF - pending deployment), /p/vast*† | 20TB/1M†† | 75TB/25M†† Fill out this form to request Tier 2 increase | Contact LC Hotline to initiate conversation with Livermore Computing and programmatic stakeholders | No | No |
Parallel File Systems: /p/gpfs* (Sierra GPFS) | 50TB/5M†† | 400TB/40M†† Fill out this form to request Tier 2 increase | | | |
NAS/NFS Project: /usr/workspace, /usr/WS* | 2TB/10M†† | 4TB/25M†† Fill out this form to request Tier 2 increase | | Yes | No |
NAS/NFS Project: /usr/gapps† | 10GB† | 30GB† Fill out this form to request Tier 2 increase | | Yes | Yes |
NAS/NFS Project: /usr/gdata† | 10GB† | 30GB† Fill out this form to request Tier 2 increase | | Yes | Yes |
Home Directory: /g/g* | 24GB | Tier 2 form not available. Contact LC Hotline to initiate conversation with Livermore Computing and programmatic stakeholders | N/A | Yes | Yes |
Archive (HPSS): /users/u* | 300TB | N/A | | No | No |
Object Storage: S3-protocol StorageGRID | 4TB | N/A | N/A | No | No |
†These quotas are per directory, not per user as in all other cases.
††For quotas written with a "/", the second number is the inode limit in millions (M), where applicable.
Directory Structure and Properties
This section shows how the important public directories (and their underlying file systems) are organized on LC machines. LC production Linux clusters differ in size, chipset, and available switch (interconnect), which can affect the scale and parallelism of applications for which each Linux cluster is best suited. See LC's Compute Platforms page for configuration details of the LC Linux clusters. Most systems run TOSS, the Tri-Lab Operating System Stack, a modified version of Red Hat Linux built for supercomputing at LLNL, and have the same basic file system hierarchy and public directory properties.
Each Linux cluster has a public directory structure as described below.
/tmp
A link to /var/tmp (for system use).
/usr/tmp
A link to /var/tmp (for general use).
/var/tmp
Temporary storage space for system activity and, optionally, for user activity. On diskless clusters, /var/tmp uses real memory, and TOSS clears out any data immediately after every batch job ends.
/g
A file system of globally available ("common") home directories on highly reliable NFS-mounted RAID (redundant array of independent disks) disks on each OCF and SCF production system. Your child subdirectory here (/g/gnn/yourname) is your default arrival directory and contains your startup and run-control dot files, but it is limited in size.
/opt
Contains third-party tools and is accessed via the module command. Some of the compilers that LLNL uses reside in /opt, as well as the default compilers used by LANL and Sandia. Versions of MVAPICH and Open MPI compiled with the various versions of the compilers also reside in /opt. The compilers and MPIs in /opt are kept updated to match what is available in /usr/local/tools.
/p
A set of parallel file systems designed to accommodate large, intensive parallel I/O, including Lustre, GPFS, and VAST. Each file system name consists of the file system type followed by an integer (e.g., lustre2 or vast1); the integer increases as file systems are added in each LC center. See the Parallel File Systems section for details.
/usr/apps
A link to /usr/gapps/$SYS_TYPE. See /usr/gapps below.
/usr/gapps
A file system of globally available common code-management directories on NFS-mounted RAID disks on each machine similar to the /g common home directories. The directory contains some important non-commercial shared application codes and tools, such as ALE3D, BASIS, HYDRA, and Python. See the /usr/gapps Web page for details about the /usr/gapps file system.
/usr/bin
Contains hundreds of standard UNIX software tools (or their Linux counterparts) along with the C and Fortran compilers.
/usr/local
Contains some important commercial shared tools, such as TotalView, Valgrind, mpiP, and the PathScale compilers.
/usr/workspace/ws[a,b]
A file system containing 2TB directories for every LC user. Directories are also created for all UNIX groups, where user quotas determine write-access. This file system is not backed up, but does have one week of snapshots.
Common Home Directories
All production nodes on all OCF and SCF LC systems share home directories that reside on global file system /g, NFS mounted from several dedicated servers. This "common home" arrangement makes keeping redundant home files (and doing redundant updates) on multiple machines unnecessary, and it allows the same path name and directory structure for home to be shared on every host involved.
The /g directory path name is /g/gnn/uname (e.g., /g/g16/smith).
gnn
On OCF, it is g0 for LC staff and g2 or above for other users; on SCF, it is g5 for LC staff and g10 or above for other users.
uname
This is your LC login account name (not the numeral uid by which your file and block quotas are reported).
Included in a common home directory are master dot files, system-specific dot files, personal subdirectories or files, and online backup subdirectories.
- Master dot files are the basic startup and run-control files. They detect the machine type and operating system you are using by evaluating the HOST_GRP and SYS_TYPE environment variables and then invoking the appropriate machine- or system-specific dot files. Only customizations intended for all machine types and operating systems should go into these master dot files.
- System-specific dot files contain user customizations and apply only to specific systems. "System-group" names as suffixes simplify sharing these customizations among like machines.
- Personal files and subdirectories and any other files you want to have in your home directory, up to the quota, are organized as you wish under /g/gnn/uname. These files will appear to reside on every machine that shares the common home directory.
- Online backup subdirectories consist of four complete but read-only backup copies of every file and subdirectory in your common home directory that are made automatically at noon and 7:00 p.m. every day. The most recent backup resides in .snapshot/hourly.0, with each earlier backup in a correspondingly numbered hourly.n directory. This .snapshot subdirectory is unusual because it is hidden. It is not reported by running ls, as the other dot-named children of your home directory are, but you can change directories (cd) into it to list and copy its files.
Common home directories have 24 GB disk-space quotas. Note also that the output from quota-monitoring tools varies somewhat from one brand of computer to another. Consult the quota utility section below for a comparison of disk-usage reporting formats.
By default, your common home directory allows access only to you (as the owner). You can widen this access by running chmod. Although you can enable world or group write access to your home subdirectories if you wish, consider using give or take instead.
You cannot checkpoint (for restart) any job whose files reside on NFS-mounted disks. Because your common home directory is NFS-mounted, any batch job you run on a common-home machine that spawns a shell will access its dot files on those disks, and hence, that job will not checkpoint.
Your common home directory resides on an NFS file system not designed to handle high-volume parallel I/O. Plan your parallel I/O for only parallel file systems designed to support it.
Directory Properties and Limits
Because of their different roles, the home (/g), work (/var/tmp), and workspace (/usr/workspace) directories on LC systems have different properties, which the table below compares. LC provides its common home directories (/g) and workspace directories (/usr/workspace) by using NFS. Parallel file systems, by contrast, are specially designed to efficiently support large-file parallel I/O from every compute node in a cluster; at LC these include Lustre, GPFS, and VAST.
 | /g/gnn/uname | /var/tmp | /usr/workspace |
---|---|---|---|
Role | Home directory | Working directory | Cross-mounted data storage |
Use | Small, permanent files shared among machines (dot files, source code) | Temporary storage of input or output local to each machine | Large permanent storage for input or output shared among many machines |
Aliases | ~, $HOME | /usr/tmp | /usr/workspace |
Status | NFS mounted (shared) | Local to each machine | NFS mounted (shared) |
Shared across machines? | Yes | No | Yes |
Quotas? | Yes | No | Yes |
...Per user? | 24 GB | No | 2 TB |
...On file count? | No limit | No limit | 9,500,000/user* |
Purge? | No | Yes** | No |
Files vulnerable? | Never | As space is needed | Never |
Automatic backup? | Yes, 4 copies every 12 hours | No, use storage | No, use storage |
* The limit here applies only to inodes (index nodes). UNIX file systems use inodes to keep track of the (often scattered) disk blocks that comprise each file. Because the inode size is fixed, a large file may require more than one inode to list all of its disk blocks; therefore, users with large files may find that slightly fewer than 9,500,000 files are allowed.
** LC Linux clusters use diskless compute nodes, on which /tmp and /var/tmp use real memory rather than disk space, making this storage ephemeral. TOSS reclaims this memory as soon as you delete files from /tmp or /var/tmp and clears out /tmp and /var/tmp completely immediately after each batch job. You must move job files from here to /usr/workspace, Lustre, or archival storage before the job ends.
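Because node-local /tmp and /var/tmp are cleared when a batch job ends, a job script has to copy its results out before it exits. The following is a minimal sketch of one way to do that, not an LC-provided recipe. The names SCRATCH_DIR, DEST_DIR, stage_out, and results.txt are hypothetical, and DEST_DIR falls back to a temporary directory so that the sketch runs anywhere; on an LC cluster it would point at your own /usr/workspace or Lustre directory.

```shell
# Hypothetical locations: set DEST_DIR to your own workspace or Lustre path.
SCRATCH_DIR=$(mktemp -d "${TMPDIR:-/tmp}/job_scratch.XXXXXX")   # node-local, ephemeral
DEST_DIR=${DEST_DIR:-$(mktemp -d "${TMPDIR:-/tmp}/job_results.XXXXXX")}

# Copy everything from the scratch directory to persistent storage.
stage_out() {
    mkdir -p "$DEST_DIR" && cp -Rp "$SCRATCH_DIR"/. "$DEST_DIR"/
}

# Stand-in for the real application writing its output locally.
echo "simulation output" > "$SCRATCH_DIR/results.txt"

# Copy results out as the last step, while the job is still running.
stage_out
echo "results copied to $DEST_DIR"
```

In a real job script, registering the copy with `trap stage_out EXIT` is a common way to make sure it also runs when an earlier command fails, although no handler can run if the job is killed outright.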
File Systems
Local/Network File Systems
Local and network file systems provide users with temporary disk space and locations for user-developed and supported applications and commonly used binaries, libraries, and documents.
/var/tmp
The /var/tmp file systems (/tmp and /usr/tmp are links to /var/tmp) are local to each individual node. They are faster than NFS but much smaller. There is no quota, but there are no backups, and the file systems are cleared between batch jobs.
/usr/gapps
LC provides the /usr/gapps file system for user-developed and supported applications on LC systems. There is a single /usr/gapps file system globally available to all LC systems (one on OCF and one on SCF). See the /usr/gapps File System Web page for more information.
/usr/local, /usr/global
Traditionally, /usr/local is the location of commonly used binaries, libraries, and documents that are machine specific, but it is also used for code development and file storage. /usr/global is common to all machines and is used for the /usr/local packages that can benefit from a common area. See the /usr/local File System Web page for more information.
Parallel File Systems
Parallel file systems are specially designed to efficiently support large-file parallel I/O from every compute node in a cluster. At LC, the installed parallel file systems are Lustre and Spectrum Scale -- commonly known as GPFS. Each LC parallel file system has a name of the form
/p/[lustre[1..N],gpfs[1..N]]
where the lustre or gpfs name denotes which parallel file system technology is used and an integer designates which of that type of file system is being referenced. Each LC center (CZ, RZ, SCF, iSNSI) has at least one Lustre file system (e.g., /p/lustre1) and may have more.
Lustre
Lustre is an open source, object-based, parallel shared file system used in high performance computing environments around the world. LC provides multiple Lustre file systems on both the OCF and SCF that provide users with the ability to store and retrieve files from the LC Linux/TOSS cluster where they reside or on the cluster where the job is scheduled. The Lustre file systems in LC have the following characteristics and rules:
- A common directory naming scheme (/p...).
- Huge capacity and high bandwidth.
- The same general service constraints of other large file systems—no backup of files, multiple tiers of quotas, with no purging; space management is performed by the user.
- The same performance and access trade-offs. Lustre was designed to support parallel I/O with automatic load balancing across disks. Extensive parallel I/O on a standard, globally mounted file system (such as /usr/workspace or your /g home directory) can seriously degrade performance for all users across all the machines where that file system is mounted.
Parallel file systems often interact strongly with the application code performance of MPI-IO (parallel I/O using the message passing interface library). To minimize such problems, Slurm now flushes the Lustre page cache after every job ends.
Workspace
LC has deployed new, non-purged workspaces for all users and groups on LC systems (Collaboration Zone, CZ; Restricted Zone, RZ; and Secure Computing Facility, SCF), with a quota of 2TB per user.
The pathnames to these workspaces will be:
- CZ: /usr/workspace/ws[a,b]/<workspaceName>
- RZ: /usr/workspace/wsrz[c,d]/<workspaceName>
- SCF: /usr/workspace/ws[1,2]/<workspaceName>
Ownership and permissions on these workspaces will be:
Workspace | Ownership | Permission | Example |
---|---|---|---|
User | owner = root, group = name of user | 2570 (r-xrws---) | root:fahey2 |
Group | owner = root, group = name of group | 2570 (r-xrws---) | root:lchotline |
These workspaces will be cross-mounted on all appropriate LC clusters. All files owned by a given user in each /usr/workspace file system will count towards that user’s quota—whether they are in a user workspace or a group workspace. Each subdirectory will have daily snapshots taken at 12:00 P.M. and 7:00 P.M. and will be kept for 7 days (14 snapshots in total).
Workspaces will not be backed up or stored off-site, and they will never be purged. Because these file systems are shared using the Network File System (NFS) protocol, parallel I/O to the workspaces from LC production clusters is highly discouraged.
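The 2570 mode listed in the workspace ownership table can be read digit by digit: 2 sets the setgid bit (so new files inherit the directory's group), 5 gives the owner (root) read and execute, 7 gives the group read, write, and execute, and 0 gives others nothing. The sketch below applies that mode to a throwaway directory and prints it in both notations. It assumes GNU coreutils stat (the -c option); the variable names are illustrative only.

```shell
# Create a throwaway directory and give it the mode used for workspaces.
ws_demo=$(mktemp -d "${TMPDIR:-/tmp}/ws_demo.XXXXXX")
chmod 2570 "$ws_demo"

# GNU stat: %a prints the octal mode, %A the symbolic form.
ws_octal=$(stat -c '%a' "$ws_demo")
ws_symbolic=$(stat -c '%A' "$ws_demo")
echo "$ws_octal $ws_symbolic"

# Clean up the empty demo directory.
rmdir "$ws_demo"
```

The "s" in the group execute position of the symbolic form is how the setgid bit is displayed when group execute is also set.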
File Management Guidelines
This section provides information on how file system use should be guided by and can be affected by policies implemented at LC.
Backup Policy Summary
Because some file systems contain valuable, frequently used information whose loss would be very disruptive, full backups are made once a month, and incremental backups occur every night except Saturday and Sunday. Other heavily used file systems are simply too large to allow practical backup copies to be made, or they are mounted on machines for which no appropriate backup software is commercially available. For example, the Lustre parallel file systems with their multiple terabytes of capacity are not backed up, and there is minimal redundancy in their underlying storage servers. (This means that one hardware failure can make many distributed files unavailable.)
If you have files on these systems, it is your responsibility to make copies in the Archival Storage system, HPSS, of all crucial files in case you need to restore them on your own after a disk failure. To easily store very large files in HPSS, consider using HTAR, LC's highly efficient software tool designed for this specific task. For more details about using storage, consult Using LC Archival Storage. For details about using HTAR, consult the HTAR Reference Manual.
This table summarizes the backup status for each major file system on the LC production machines. (Those not listed are not protected by backup.) Files from the four most recent backups of your common home directory (i.e., /g/gnn) are retrievable from your .snapshot directory. The .snapshot subdirectory is not reported by running ls, as the other dot-named children of your home directory are, but you can change directories (cd) into it to list and copy its files.
File System | Automatic Backup | No Backup |
---|---|---|
/g/gnn | X | |
/usr/local | X | |
/usr/gapps | X | |
/usr/tmp | | X |
/var/tmp | | X |
/p/lustre* (Lustre) | | X |
/usr/workspace/ws* | | X |
File Purge Policy
File systems at or near their capacity often show degraded performance, higher I/O error rates, or sometimes complete service failure. To make service more predictable and reliable, LC has historically destroyed ("purged") files intentionally on at-risk file systems. LC has moved away from this model in favor of a quota system that limits file system usage. Presently LC does not purge any of its file systems.
Search Paths
When you try to execute a program by typing its (simple) file name, the UNIX shell searches through the file structure looking for a file with that name to execute. The order in which it searches directories is specified by your search path, a colon-delimited ordered list of directories stored in the environment variable PATH (all uppercase). If you use a program's absolute path name, the shell ignores your search path. You can reveal your current search path on any LC machine by executing echo $PATH.
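To make the search order concrete, the sketch below prints each directory of a colon-delimited search path on its own line and then asks the shell which file it would run for a simple command name. The function name path_entries is hypothetical; tr and command -v are standard utilities.

```shell
# Print each directory in a search path on its own line, in search order.
path_entries() {
    printf '%s\n' "$1" | tr ':' '\n'
}

path_entries "$PATH"

# Show which file the shell would run for the simple name "ls".
command -v ls
```

Directories listed earlier win: if two directories on the path contain a program with the same name, the shell runs the one from the directory that appears first.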
File Permissions
All files and directories have an owner (usually the person who initially created the file or directory). The owner can assign UNIX permissions to other users, and these permissions control who can manipulate the files and directories.
There are three classes of users who may have different permissions for a file or directory:
u = user (the owner)
g = group (the owner's group)
o = others (everyone else)
There are three kinds of permissions for files and directories (r, w, x) that may be assigned to any or all of the above classes of users:
r = read (read, copy files; list files in directory)
w = write (edit, append files; create or remove files in directory)
x = execute (run, execute; cd into directory)
Run ls with the -l option to see the permissions assigned to your files and directories. Add the -a option to the command (ls -la) to see hidden files that reside in the directory, i.e., files whose names begin with a dot (.).
Only the owner of a file or directory (or the super user) can change its permissions. The changes are made using the chmod utility. There are two forms of chmod syntax: one specifies the desired permissions as an absolute (octal numeric) value; the other is symbolic and changes permissions incrementally.
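The two chmod forms can be compared on a scratch file. This is a small sketch that assumes GNU coreutils stat (the -c option) for displaying the result; the file and variable names are illustrative.

```shell
# Work on a throwaway file so nothing real is modified.
perm_demo=$(mktemp "${TMPDIR:-/tmp}/perm_demo.XXXXXX")

# Absolute (octal) form: set the full mode in one step.
# 6 = rw- for the owner, 4 = r-- for the group, 0 = --- for others.
chmod 640 "$perm_demo"
mode_absolute=$(stat -c '%A' "$perm_demo")

# Symbolic form: adjust individual bits relative to the current mode.
# Here group write is added; owner and other bits are left unchanged.
chmod g+w "$perm_demo"
mode_symbolic=$(stat -c '%A' "$perm_demo")

echo "$mode_absolute -> $mode_symbolic"
rm -f "$perm_demo"
```

The octal form is convenient when you know the exact final mode; the symbolic form is safer when you want to change one class of user without disturbing the others.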
Detailed information about UNIX file and directory permissions and modes is easily found online using your favorite Web search engine.
Citizenship and Permissions
Groups of users exist so that group members can use chgrp and chmod to allow shared access to files among everyone in that group. The groups utility, run without options, reports the names of the groups to which you currently belong.
On SCF machines, user citizenship can affect file sharing and the assignment of file permissions. Every user belongs to an "extra" group that reflects that user's citizenship. For example, every U.S. citizen belongs to the group us_cit. This allows restricted file access based on citizenship group, such as
chgrp us_cit myfile
and the use of chmod to open group permissions but limit (or eliminate) world permissions on that file:
chmod 750 myfile
If you think your file management activities call for more details on the interaction of citizenship with file permissions, contact the LC Hotline at 925-422-4531 with specific questions.
Top-Level World Permissions Disabled
World (or "other") permissions on top-level files and directories invite unauthorized access and other security problems. An automatic monitoring process systematically disables all world permissions (read, write, and execute) on top-level user directories and files in the following file systems on each LC production machine:
- /g/gnn
- /var/tmp (sometimes called /usr/tmp)
- /tmp
- /p/lustre*
- /p/gpfs
- /p/vast1
- /usr/gapps (linked from former /usr/apps)
Permissions on files below the top level remain unchanged. Because disabling top-level world permissions is a security policy, exceptions will require a justification memo. Contact the LC Hotline if you want to apply for a specific exemption to the restrictions on world access.
Several alternatives to sharing files safely are available on LC machines, and each has its own strengths and weaknesses. See the File-Sharing Alternatives section for a comparison of these alternatives.
File-Sharing Alternatives
The standard UNIX technique for sharing access to files among several users is to enable read, write, or execute permissions on those files and their directory trees for "world" or "other" users. On LC production machines, however, all top-level world permissions are automatically disabled (set to 0) by monitoring software as a security policy. This effectively prohibits world permission file sharing at LC. (An exemption requires specific approval; contact the LC Hotline via e-mail or via telephone at 925-422-4531.) Consider using one of several alternative file-sharing techniques.
The give and take utilities are well suited to sharing a few seldom-changed files with another specific user but are not appropriate for sharing large sets of files with many users, especially if the files change often. Both giver and taker must have accounts on the system on which the give occurs. The take can occur on any system on which the taker has an account. (The file system where the give/take files are spooled is global, but the give command needs to chown the given files so that they are owned by the taker. The chown requires a user name-to-uid translation for the taker, and this cannot happen unless the taker has an account on the system on which the give occurs.)
File group membership and permissions are well suited to sharing large sets of files or whole subdirectories with a stable list of other users. It also allows sharing between machines if the files are in a globally mounted file system, such as the common home directories, and if the same user group exists on several machines. A signed approval form is required to create a group, and no LC user can belong to more than 32 groups at once.
One variation on file sharing by group involves enabling group permissions on file(s) in the global folder /usr/gapps. (A special /usr/gapps request form must be submitted.) Consult the /usr/gapps file system Web page for additional information.
A second variation on file sharing by group involves using a group directory in /usr/workspace/ws[a,b]. For example, all members of the UNIX group EOS can read and write to the directory /usr/workspace/wsb/eos.
A third variation involves enabling group permissions on stored files. Archival Storage (HPSS) storage groups and online groups are not necessarily the same, however, and group assignments change when a file is stored. See Using Archival Storage for detailed instructions on sharing files in Archival Storage (HPSS).
Using Groups
A group is a named set of users created to enable easier file sharing among group members. By default, every user at LC belongs to a group of one with the same name as their login name, and your newly created files are assigned by default to that unique group. If you belong to other groups, you can change your default group for the current session or the group to which any of your files is assigned to take advantage of the group permissions. On LC machines, software constraints limit every user to membership in no more than 32 groups.
The table below shows how to perform the most common group-related tasks.
Group-Related Task | Command |
---|---|
Reveal who belongs to a specified group | grep grpname /etc/group |
Reveal all groups to which you belong | groups |
Reveal all groups to which username belongs | groups username |
Change your default group to grpname | newgrp grpname |
Restore your original default group | newgrp |
Change a file's group assignment | chgrp grpname filename |
Change a file's group permissions | chmod |
Create or join a group at LC | Contact LC Hotline |
Your group membership on LC's Archival Storage systems - HPSS (storage.llnl.gov) may differ from your group membership on LC's production machines, which may also often differ among themselves. You can use group assignments to control the sharing of stored files, but only if you discover who belongs to which storage groups, and only if you assign a file to a group after you store it. (Group assignment does not persist during file transfer.) You can change the group to which a stored file is assigned by using chgrp in NFT or quote site chgrp in FTP, but only if you belong to the target group. Note: Despite its name, NFT's group command begins asynchronous file transfers and has nothing to do with managing file permission groups.
To create, delete, or change the membership of a group on any OCF or SCF machines, contact your LC Computer Coordinator, or follow these instructions to be added to a group.
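Before asking to join another group, it can be useful to see how close you are to the 32-group limit mentioned earlier in this section. The following sketch counts your current memberships with the standard id utility; the variable names are illustrative, and the limit is simply the figure stated in this guide.

```shell
# Count the groups the current user belongs to (id -G prints one numeric ID per group).
group_count=$(id -G | wc -w | tr -d ' ')
max_groups=32   # membership limit stated in this guide

if [ "$group_count" -ge "$max_groups" ]; then
    echo "At the ${max_groups}-group limit: leave a group before joining another."
else
    echo "Member of $group_count groups; $((max_groups - group_count)) more allowed."
fi
```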
Snapshots: Retrieving Deleted Files
In your /g home directory there is a special hidden directory called .snapshot. As a "hidden" directory, it is not listed by the ls command but you can cd into it.
In the .snapshot directory you will find four full backup copies of your home directory. (After you cd to the .snapshot directory, type ls -lu to list the directories and the actual date and time of each backup.)
hourly.0 [most recent backup]
hourly.1 [next most recent]
hourly.2 [third most recent]
hourly.3 [fourth most recent]
Backup snapshots are created twice each day, 1200 and 1900. These files and directories are read-only and may be copied as needed to your regular home directory. In addition, system backups are done on a nightly basis.
Two typical file deletion/retrieval scenarios are presented below.
Scenario 1
It is now 4:00 p.m. on 8/19/05.
hourly.0 contains the 1200 backup from 8/19/05
hourly.1 contains the 1900 backup from 8/18/05
hourly.2 contains the 1200 backup from 8/18/05
hourly.3 contains the 1900 backup from 8/17/05
You have accidentally deleted file1 from your home directory, and you want to retrieve it. You simply go to the appropriate .snapshot directory and list the files. You see file1 listed, and you copy file1 back into your home directory.
cd .snapshot/hourly.0
ls -l
cp file1 ~
Scenario 2
It is 9:00 a.m., and you have been working on file2 in your home directory. You want a copy of file2 as it was the morning of the previous day, before you made yesterday's and today's changes. From your home directory, you simply enter:
cp .snapshot/hourly.2/file2 file2.old
There are .snapshot online backup directories for several other NFS-provided file systems, including /usr/gapps, /usr/local, and user group-owned file systems. Files can be recovered on those file systems by following the same steps as shown in the sample scenarios above. Online backups are created in the same manner on SCF as on the OCF.
On most platforms, you will not be allowed to overwrite an existing file with the .snapshot copy. Either remove (rm) or move (mv) the existing file before copying from .snapshot.
NOTE: There are no .snapshot directories (or any other form of backup) for GPFS or Lustre file systems, or for any "tmp" directories, including /usr/tmp.
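The retrieval steps in the scenarios above, including moving an existing file aside before copying, can be wrapped in a small helper. This is a sketch under stated assumptions: restore_from_snapshot is a hypothetical function, not an LC tool, and the script builds a mock home directory with its own .snapshot tree so that it runs on any machine.

```shell
# Build a mock home directory with a .snapshot tree so the sketch runs anywhere.
home_demo=$(mktemp -d "${TMPDIR:-/tmp}/home_demo.XXXXXX")
mkdir -p "$home_demo/.snapshot/hourly.0"
echo "saved version" > "$home_demo/.snapshot/hourly.0/file1"
echo "damaged version" > "$home_demo/file1"

# restore_from_snapshot HOME_DIR SNAPSHOT FILE
# Copies FILE from the named snapshot into HOME_DIR. An existing copy is
# moved aside first, because overwriting in place may be refused.
restore_from_snapshot() {
    local home_dir=$1
    local snapshot=$2
    local file=$3
    local source_copy="$home_dir/.snapshot/$snapshot/$file"
    [ -f "$source_copy" ] || { echo "not in $snapshot: $file" >&2; return 1; }
    if [ -e "$home_dir/$file" ]; then
        mv "$home_dir/$file" "$home_dir/$file.bak"
    fi
    cp -p "$source_copy" "$home_dir/$file"
}

restore_from_snapshot "$home_demo" hourly.0 file1
cat "$home_demo/file1"
```

On a real system the first argument would be your home directory (for example "$HOME") and the second whichever hourly.n directory holds the version you want.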
Quotas
Summary of Default File Quotas
Quota limits are necessary to ensure that disk space remains available to all users and that work is not impeded by full file systems. The /g home directories are available on both the OCF and SCF machines. The default quota is 24 GB.
The /g/g* (global) home file systems are provided on Network Appliance NFS servers. If you are using over 90% of your allotted space, a warning will be displayed upon login. Please don't miss this login announcement.
To see your current usage and quota limits, type
quota -v
(where -v signifies verbose).
Requests for additional disk space for /usr/workspace and the home directories should be directed to the LC Hotline through your Computer Coordinator or PI. The request must include justification for the larger need.
Quota Warnings
PLEASE ATTEND TO WARNING MESSAGES!
You risk losing data if you exceed your quota. Output to a home directory that is over quota will be truncated. Files moved to such a directory will also be truncated. Just opening a file for editing while you are over quota can cause the loss of the entire file's contents. Unfortunately, these are results of the UNIX implementation of quota and can be prevented only by user attention to quota warnings.
Carefully consider before you execute a batch job that attempts to write to your home directory space. If you exceed disk space quota:
- You will get truncated or zero length files and waste the run because the output is lost as a result.
- You will lose further data if your home directory remains filled after the job finishes, if you attempt to copy, write or edit files.
- MULTIPLE ATTEMPTS TO WRITE TO A DIRECTORY THAT IS OVER QUOTA WILL CREATE A SIGNIFICANT BURDEN ON THE NFS SERVER, CAUSING INTERACTIVE DELAY AND PERFORMANCE PROBLEMS. In some cases, jobs create hundreds of thousands of write attempts, quota checks, and refusals to write within a few minutes.
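One way to act on these warnings is to check usage against the quota before a job writes anything. The sketch below is only an approximation: it measures disk usage with du, which can differ from what the quota system counts, and the names usage_percent, QUOTA_GB, and CHECK_DIR are hypothetical. The 90% threshold mirrors the login warning level described above; take the real limit from your own quota report.

```shell
# usage_percent USED_KB LIMIT_KB: print the integer percentage of the limit in use.
usage_percent() {
    echo $(( $1 * 100 / $2 ))
}

# Assumed values: take the real limit from your own quota report.
QUOTA_GB=${QUOTA_GB:-24}
CHECK_DIR=${CHECK_DIR:-$PWD}      # set CHECK_DIR="$HOME" to check your home directory

limit_kb=$(( QUOTA_GB * 1024 * 1024 ))
used_kb=$(du -sk "$CHECK_DIR" 2>/dev/null | cut -f1)
pct=$(usage_percent "${used_kb:-0}" "$limit_kb")

if [ "$pct" -ge 90 ]; then
    echo "$CHECK_DIR is at ${pct}% of quota: write job output elsewhere."
else
    echo "$CHECK_DIR is at ${pct}% of quota."
fi
```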
File Management Tools
This section summarizes how to use several file management tools that are either unique to the local computing environment (e.g., give and take) or that have special significance at LLNL, where so many resources are shared by many users (e.g., quota). You may want to use these tools on several platforms from different vendors or in combination with each other, so note the unit discrepancies in the last column.
Tool | Function |
---|---|
give | Offers files to another user to take, even across machines |
take | Accepts files given by another user, even across machines |
quota | Reports your current local and global disk usage and limits (includes common home directories) |
limit | Summarizes (and sets) machine configuration limits |
du | Reports disk usage (for current or specified directory) |
df | Reports free disk space (for current or specified file system) |
htar | Bundles files into TAR-format archives or extracts archive members, for archives on remote machines |
Give
The give utility transfers files to another specified user (i.e., changes the owner) by copying the files to a holding directory from which only the intended recipient can retrieve the files by running take.
Important: Both giver and taker must have accounts on the system on which the give occurs. The take can occur on any system on which the taker has an account.
The syntax for give is:
give [-f] [-i] [-l] [-n] [-u] takername flist
where
takername is the login name of the user you intend to receive the files. You cannot specify several recipients with one execution of give.
flist is the name, space-delimited list of names, or file filter that specifies the file(s) to give to takername. Path names are accepted for files not in the current directory.
When you give a file, you always automatically retain the original. Because the transferred copy waits in a special temporary subdirectory, it may be removed if the recipient delays retrieving it longer than the local purge interval. Options let you confirm (-l) or cancel (-u) files not yet taken. If you try to give two files with the same name, only the first is copied to the holding directory (you get a warning about the second). Files that are given always arrive with the recipient as owner and group, regardless of how you set the group and permissions on the file before you give it.
Given files are copied to the file system /usr/give, which has a system-wide quota of 200 GB and a per-user quota of 25 GB of disk space. If you plan to give a few large files (or even many small ones), you should confirm that you will not violate the quota by first running
df -h /usr/give
to get a current report on how much free space (in KB, MB, or GB) is still available in /usr/give. (See below for more background on using df.) There is no quota on the number of files that you can give.
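As a complement to the df check, the combined size of the files you plan to give can be compared with the 25 GB per-user quota. This is only a sketch: the function names are hypothetical, the quota is assumed to mean 25 x 1024 x 1024 KB, and space already used by files you gave earlier that have not yet been taken is not counted.

```sh
# Per-user give quota from the text above, expressed in KB.
# Assumption: 1 GB is taken to be 1024 * 1024 KB.
GIVE_USER_QUOTA_KB=$((25 * 1024 * 1024))

# Print the combined disk usage, in KB, of the named files.
total_kb() {
    du -sk "$@" | awk '{ sum += $1 } END { print sum + 0 }'
}

# Succeed if the named files together fit within the per-user quota.
fits_give_quota() {
    [ "$(total_kb "$@")" -le "$GIVE_USER_QUOTA_KB" ]
}

# Possible use:
#   fits_give_quota file1 file2 && give mjones file1 file2
```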
Give is not capable of giving directories. Instead, create and give a tar file.
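A directory can be bundled with the standard tar command before it is given. The sketch below wraps that step in a hypothetical helper, bundle_for_give, which writes the archive to the current directory and prints its name; the directory name results is illustrative only.

```sh
# Bundle DIR into <basename of DIR>.tar in the current directory
# and print the archive name on success.
bundle_for_give() {
    dir=${1%/}                           # drop a trailing slash, if any
    archive=$(basename "$dir").tar
    # -C changes to the parent directory first, so the archive holds
    # relative paths such as results/a.txt instead of absolute ones.
    tar -cf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" &&
        printf '%s\n' "$archive"
}

# Possible use (mjones is the example recipient used in this guide):
#   archive=$(bundle_for_give results) && give mjones "$archive"
```

The recipient can unpack the archive with tar -xf after running take.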
For additional information and details about options, consult the man page for give.
Examples:
give mjones file1 | Give one file to user mjones |
give mjones file* | Give multiple files via wildcard to user mjones |
give -l mjones | List the files given to user mjones |
give -u mjones file1 | Ungive (remove) file1 previously given to (but not yet taken by) user mjones |
Take
The take utility transfers files from another specified user (i.e., changes the owner) by copying the files from a holding directory to which they were copied by a user running give.
Important: Both giver and taker must have accounts on the system on which the give occurs. The take can occur on any system on which the taker has an account.
The syntax for take is:
take [-f] [-i] [-l] [-n] givername flist
where
givername is the login name of the user who previously ran give to transfer files to you. To actually retrieve any files, you must specify the user who gave them, and you cannot take files from several users at once.
flist is the name or space-delimited list of names (but not a file filter) that specifies the file(s) to take from the giver. Taken files arrive in the directory where you run take.
When you take a file (a copy), the giver always automatically retains the original. Because the transferred copy waits in a special temporary subdirectory, it may be removed if you delay retrieving it longer than the local purge interval. Options let you list (-l) given files awaiting retrieval or control how files are retrieved (-i). If you try to take a file with the same name as one already in your current directory, take warns you that "xxx already exists locally" and does not overwrite the existing file unless you have specified the -f option to force overwriting of files with conflicting names. Taken files arrive with you as the owner and group regardless of how the giver set the group and permissions on the original file.
For additional information and details about options, consult the man page for take.
Examples:
take mjones file1 | Take one file given by user mjones. |
take mjones | Take all files given by user mjones. |
take -i mjones | List all files given by user mjones with a query as to whether or not they should be taken. |
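Because taken files arrive in the directory where take runs, running take from a fresh, empty directory is one way to avoid name conflicts with existing files. A minimal sketch follows; take_into is a hypothetical helper and inbox is an illustrative directory name.

```sh
# Run take from inside DEST (created if needed) so that taken files
# land there rather than in the current directory.
# Usage: take_into DEST givername [flist]
take_into() {
    dest=$1
    shift
    # The subshell keeps the caller's working directory unchanged.
    mkdir -p "$dest" && ( cd "$dest" && take "$@" )
}

# Possible use:
#   take_into inbox mjones file1
```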
Quota Utility
The quota utility reports your current disk usage as well as the current limits imposed on each file system. Output is presented in MB and GB units.
The syntax for quota is:
quota [-v]
where
-v displays disk usage and limit information on mounted global file systems (such as common home directories and /usr/workspace) only. Note: Run without arguments, quota only reports quota violations. It is intended to run in a login script to notify users when they need to take action.
For additional information and details about other options, consult the man page for quota.
Some exceptions:
For CORAL-1 systems (e.g., Lassen and Sierra), LC has developed a script called gquota, which you can use to see quota information on login nodes. The gquota script is located in /usr/bin.
For VAST, quotas are per-directory, and can be displayed using the “df” command:
df -h /p/vast1/gpt
Filesystem Size Used Avail Use% Mounted on
vastcz164-nfs.llnl.gov:/vast1 19T 0 19T 0% /p/vast1
This technique also works with /usr/gapps quotas.
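For directories whose quota is visible through df, the Use% column can be read by a script to warn before the directory fills. The following sketch uses hypothetical function names and an arbitrary default threshold of 90 percent; it relies on df -P, the POSIX output format, which keeps each file system on one line.

```sh
# Print the Use% value for the file system or directory DIR as an integer.
use_percent() {
    # With -P, the capacity is always field 5 of the second line.
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Print a warning if DIR is at or above THRESHOLD percent (default 90).
warn_if_nearly_full() {
    dir=$1
    threshold=${2:-90}
    pct=$(use_percent "$dir")
    if [ "$pct" -ge "$threshold" ]; then
        echo "WARNING: $dir is ${pct}% full"
    fi
}

# Possible use, with the directory from the example above:
#   warn_if_nearly_full /p/vast1/gpt
```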
Limit
The limit utility displays a brief summary of the current machine configuration limits. It can also be used to set those limits.
The syntax for limit (running tcsh or csh) is:
limit
In general, any field reported by limit (e.g., coredumpsize) can also be set if you use that field name as a limit option followed by a numerical value. Details vary by operating system (this includes which fields are really configurable, exact field names, relevant units, and allowed numerical ranges).
To display the summary of current machine configuration limits while running the bash or Bourne shell, type
ulimit -a
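In bash, individual limits are set with single-letter ulimit options rather than with field names. As a small illustration, the core dump size limit (the counterpart of limit coredumpsize 0 in csh or tcsh) can be set to zero with ulimit -c. The subshell in this sketch is only there to keep the change from affecting the calling shell.

```sh
# Show every limit for the current shell.
ulimit -a

# Run in a subshell so the change does not outlive the example.
(
    ulimit -c 0    # disable core dumps
    ulimit -c      # prints 0
)
```

A soft limit can always be lowered; raising it again is possible only up to the hard limit.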
Du
The du utility reports the amount of disk usage of each file (recursively for directories and each of the individual subdirectories of those directories). If you have many (sub)directories, this can be a helpful aid in planning and monitoring your use of disk space because du reports information much more fine-grained than the summed reports from quota.
The syntax for du is:
du [options] [directory]
where
options
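control the units that du uses and the level of detail in its disk-space reports. By default, du itemizes all the subdirectories of the target directory. The default block size varies by system: GNU du on Linux/TOSS systems reports in 1024-byte (= 1-kbyte) blocks, while du on some other UNIX systems reports in 512-byte (= 0.5-kbyte) blocks.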
Two especially useful options are -k (causes du to report all disk usage in 1024-byte [= 1-kbyte] blocks) and -s (eliminates the default report of itemized subdirectories and just prints the total disk usage for the one directory you specify on the execute line).
directory
specifies (with a path name) which directory to report on. By default, du reports on your current directory when you run it.
For additional information and details about other options, consult the man page for du.
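The -k and -s options combine well with sort to find where disk space is going. The sketch below uses hypothetical function names: one prints the total for a single directory, the other lists the largest directories at or below a starting point.

```sh
# Total usage of one directory in 1-kbyte blocks
# (-s: summary line only, -k: 1024-byte units).
dir_total_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# The N largest directories at or below DIR, biggest first (default N = 10).
# The starting directory itself is included in the listing.
largest_dirs() {
    du -k "$1" | sort -rn | head -n "${2:-10}"
}

# Possible use:
#   largest_dirs "$HOME" 5
```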
Df and Bdf
The df utility reports the free space on any mounted device (file systems and their associated directories), both in absolute terms and as a percentage of the total space available to users.
The syntax for df is:
df [options] [filesystem]
where
options
control the units that df uses and the level of detail in its disk-space reports. By default, df reports in 512-byte (= 0.5-kbyte) blocks on IBM machines and 1024-byte (= 1-kbyte) blocks on Linux/TOSS systems.
filesystem
specifies (with a path name) which file system to report on. By default, df reports on all currently mounted file systems, but it excludes any automounted file systems, such as the global file system supporting the common home directories. See quota as a supplement.
Details of df's reports vary by vendor (and hence by platform). Because typical df output lists huge, often awkwardly aligned byte counts, LC has deployed on each production machine a Perl script called bdf that reports comparable information to df but in easy-to-read standard units (PB, TB, GB, MB). The difference between output styles is shown in the examples below.
Examples:
df /var/tmp
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sda5 339626397 12627462 309742314 4% /
bdf /var/tmp
Filesystem total used free cap Mounted on
/dev/sda5 324G 13G 296G 4% /
For additional information and details about options, consult the man page for df.
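The two reports above are related by simple arithmetic: one GB here is 1024 x 1024 = 1048576 one-kbyte blocks. The sketch below converts the df figures using a hypothetical helper. Rounding up to the next whole GB happens to reproduce the bdf figures in this example; that bdf always rounds this way is an assumption, not something stated in this guide.

```sh
# Convert a count of 1K-blocks to whole GB (1048576 KB each), rounding up.
kb_to_gb_ceil() {
    echo $(( ($1 + 1048575) / 1048576 ))
}

kb_to_gb_ceil 339626397    # total: prints 324
kb_to_gb_ceil 12627462     # used:  prints 13
kb_to_gb_ceil 309742314    # free:  prints 296
```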
Additional Tools
HTAR
HTAR (HPSS TAR) is a TAR-like utility program developed by LC that makes TAR-compatible archive files with storage support and enhanced archive-management features. HTAR writes files to LC's archival storage (HPSS) or to other specified LC hosts.
HTAR's specially tuned features make it possible to:
- Bundle many small files together in memory (without using more local disk space, as standard TAR requires) for more efficient handling and transfer.
- Send the resulting large archive file directly to OCF or SCF storage (or to another LC machine if you use -F) without your needing to invoke FTP separately.
- Retrieve individual files from a stored archive without moving the whole large archive back to your local machine first (or, optionally, without even staging the whole archive to disk).
- Accelerate transfers to and from storage by deploying multiple threads and by automatically using as many parallel interfaces to storage as are available on the (production) machine where it runs.
- Easily create incremental backup archives to supplement a master archive with (only) files recently changed (with -n).
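The basic create, list, and extract operations follow the familiar tar pattern. The sketch below wraps them in hypothetical helper functions and uses only the -c, -t, -x, -v, and -f options; archive and file names are illustrative, and the HTAR Reference Manual remains the authority on option details.

```sh
# Create ARCHIVE in storage directly from the named local files or directories.
# Usage: htar_store ARCHIVE file...
htar_store() {
    archive=$1
    shift
    htar -cvf "$archive" "$@"
}

# List the members of a stored archive without retrieving it.
htar_list() {
    htar -tvf "$1"
}

# Extract only the named members from a stored archive.
# Usage: htar_fetch ARCHIVE member...
htar_fetch() {
    archive=$1
    shift
    htar -xvf "$archive" "$@"
}

# Possible use:
#   htar_store proj.tar src data
#   htar_fetch proj.tar src/main.c
```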
Consult the HTAR Reference Manual for usage suggestions, annotated examples, technical tips, full option details, and tips for circumventing known problems. The Hopper controller can also run HTAR using a GUI.
HSI
The Hierarchical Storage Interface (HSI) communicates with HPSS via a user-friendly, UNIX shell-style interface that makes it easy to transfer files and manipulate files and directories using familiar UNIX-style commands. HSI supports recursion for most commands, as well as csh-style support for wildcard patterns and interactive command line and history mechanisms. Directories and files can be listed using ls, and traversing directories can be accomplished using cd.
When HSI is launched, it performs the following actions:
- Parses command line options.
- Reads startup files (the user's $HOME/.hsirc, and the system-wide hsirc file that is optionally installed by the system administrator), if they exist. In general, most settings that are defined in the system-wide hsirc file can be overridden by the user's private .hsirc file.
- Authenticates using one of the mechanisms that were enabled when the application was compiled, such as Kerberos or a username/password combination.
For details about how to run HSI and a complete list of its commands, see the HSI documentation.
Hopper
Hopper offers a graphical interface to storage and other LC resources, including support for simple drag-and-drop file-transfer services using FTP, NFT, HSI, HTAR, etc. By invoking Hopper, you can do many file-management tasks graphically, including:
- Transfer files to and from storage.
- Synchronize directories between LC resources or your desktop and storage.
- Search for files in storage by name, size, age, or other criteria.
- Create, view, and extract from HTAR archives.
More general background information on Hopper is available at the Hopper Web site. See "Getting Started" for instructions on how to download Hopper to your local desktop machine.
Tools for Obsolete File Types
To help you properly identify still-needed files that were made on LC's former CRAY and CDC7600 computers, and to (optionally) convert some files that can be rescued, LC provides several tools for managing obsolete file types. These special tools usually perform familiar tasks, such as listing file formats or unpacking libraries, but on file types ignored by standard UNIX tools. See also the "CRAY File Conversions" section of Using LC Print Services to learn how IBM's trans tool can sometimes help with such legacy files. Additional information about trans is also available on the trans Web page. Note: These tools are in /usr/global/tools/translators on the Linux clusters and /usr/local/tools/translators on the BG/P systems.
The available tools for obsolete files on LC production machines are:
lft [-c|-n] filelist
reports for each file in filelist two columns that specify each file's name and its file type.
- lft reports GIF, PDF, and PS files as ASCII.
- With -n, lft reports on CRAY and CDC7600 files without conversion.
- With -c, lft converts each (specified) CRAY or CDC7600 text file to UNIX format (but leaves all other files unchanged).
lib76 libname l|x|b|a [filelist]
processes CDC7600 lib-format and lix-format library files. Specify the library with libname and the files to process within it with filelist.
- Option l lists the files.
- Option x extracts them as text.
- Option b extracts them as binary files.
- Option a extracts them as ASCII and converts them to UNIX format.
If no filelist is specified, lib76 processes all files in the specified library.
Warning: Odd-number length files will gain four extra bits, and lib76 will overwrite without warning any existing local files with the same name as those it extracts.
nlib76 [ library-name ]
displays and extracts files from a CDC7600 library file. library-name is the name of such a file that resides in the current working directory; if omitted, nlib76 will prompt with "File?" for the name of a library file. Text files may be extracted as ASCII text; other files will require conversion by trans.