Formerly titled "EZSTORAGE," this user manual introduces tools for effectively storing and archiving files from Livermore Computing (LC) computers by using the High Performance Storage System (HPSS), also referred to herein as "storage." Individual reference manuals provide detailed technical instructions on the tools and techniques introduced in this user guide: FTP, NFT, HPSS, HTAR, HSI, and Hopper. Additionally, the Using LC File Systems document is a basic guide to using local directories and general file-handling software at LC. For help, contact the LC Hotline via e-mail or via telephone at 925-422-4531.

Reliable, massive, archival data storage is a crucial part of any effective high-performance computing environment. Although the actual disk and tape resources for storing files at LC are large and elaborate, the user interface is limited to two protocols. The first is the FTP daemon, which serves FTP clients and local alternatives to FTP clients such as NFT and parallel FTP (PFTP). The second is the Hierarchical Storage Interface Gateway Daemon (hsigwd), which serves HSI and HTAR. Hopper can use either protocol depending on user settings.

Moving files to and from LC production machines, open or secure, is a mainstream storage mission, easy to perform and very reliable. Using file storage in this context avoids quotas on user home directories, avoids purges of files on temporary work disks, and provides virtually unlimited capacity for managing data or computational output. Transfer rates are fast, and FTP connections are very reliable. Customized FTP-daemon interfaces to handle special storage needs (such as NFT for persistent storage transfers or HTAR for very efficiently making large archives directly in storage) are available, too.

Moving files to and from other LLNL machines is more complex. Features of special FTP clients, together with the need to protect unusual file formats during transfer to or from storage, may call for taking extra steps.

Finally, moving files to and from non-llnl.gov machines, such as computers at other sites or the workstations of distant ASC collaborators, is the most complicated of the three situations. It requires either using a two-stage process or running extra enabling software such as VPN (Virtual Private Network). This may involve running FTP twice, or using non-FTP transfers to an LC production machine before actually storing the files with FTP (run on an LC machine).

Storage Summary

This section briefly summarizes the chief storage system constraints and tells how to perform the most important file-storage tasks at LC.

Storage System Constraints

HPSS has the storage system constraints noted below.

Constraint Type                     Constraint Parameters

Largest allowed file size           100 TB (using FTP, NFT, HSI, or Hopper interface)
                                    68 GB/member; 100 TB archive (using HTAR interface)
Longest file name                   1023 characters (with HTAR, longest entry name or
                                    soft link is 100 characters)
Problem characters in file names
   Treated as file filters          ? * {a,b}
   Forbidden first characters       - ! ~
   Forbidden in any position        * ? [ {
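
The name constraints above can be checked before a transfer is attempted. The following is a minimal sketch, not an LC tool: the function name name_problems and its for_htar flag are hypothetical, the character sets are copied from the table as listed, and the check is assumed to apply to a single file name rather than a full path.

```python
MAX_NAME_LENGTH = 1023          # longest file name, from the constraints table
HTAR_MAX_ENTRY_LENGTH = 100     # longest HTAR entry name or soft link
FORBIDDEN_FIRST = "-!~"         # forbidden as the first character
FORBIDDEN_ANYWHERE = "*?[{"     # forbidden in any position


def name_problems(name, for_htar=False):
    """Return a list of human-readable problems with a proposed file name."""
    problems = []
    limit = HTAR_MAX_ENTRY_LENGTH if for_htar else MAX_NAME_LENGTH
    if len(name) > limit:
        problems.append(f"longer than {limit} characters")
    if name and name[0] in FORBIDDEN_FIRST:
        problems.append(f"starts with forbidden character {name[0]!r}")
    bad = sorted({ch for ch in name if ch in FORBIDDEN_ANYWHERE})
    if bad:
        problems.append("contains forbidden character(s): " + " ".join(bad))
    return problems
```

An empty list means the name passed these checks; it does not guarantee that the storage system will accept the file.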

Common File Storage Commands

The following commands are used for common file storage tasks. These commands are also available graphically by using Hopper. To efficiently transfer a very large number of (related) files as a manageable archive or library, use HTAR.

Task                                       FTP               NFT          HSI

Connect to storage                         ftp storage       nft          hsi
Make storage directory                     mkdir dir         mkdir dir    mkdir dir
Change storage directories                 cd dir            cd dir       cd dir
Store a file                               put file          put file     put file
Retrieve a stored file                     get file          get file     get file
Retrieve from within a stored archive      See HTAR          See HTAR     See HTAR
Delete a stored file                       delete file       delete file  delete file
List stored files                          dir               dir          ls
Change file permissions                    chmod             chmod        chmod
Change "class of service" (COS)            site setcos       setcos       chcos
Start migration of stored file from tape   site stage file                stage file
Control file overwriting
   Prevent overwriting                                       noclobber
   Allow overwriting                       [default]         clobber      [default]

Storage Home Directories

Regardless of their access software (FTP, NFT, etc.), LC users arrive at HPSS in their storage home directory. This always has a path name of the form

/users/u[00-54]/username

where username is your LC login name (for example: /users/u34/jsmith). This basic directory structure supports customized division into subdirectories (e.g., by using the mkdir command) as well as access control of stored files.
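
As an illustration of the path form above, the sketch below checks whether a string looks like a storage home directory. The function name parse_storage_home is hypothetical, and the check assumes that the two digits after "u" must fall in the range 00 through 54 and that the user name contains no slash.

```python
import re

# Matches /users/uNN/username; the NN range (00-54) is checked separately.
_HOME_RE = re.compile(r"/users/u(\d{2})/([^/]+)")


def parse_storage_home(path):
    """Return (directory_number, username) for a storage home path, or None."""
    match = _HOME_RE.fullmatch(path)
    if match is None:
        return None
    number = int(match.group(1))
    if not 0 <= number <= 54:
        return None
    return number, match.group(2)
```

For the example in the text, parse_storage_home("/users/u34/jsmith") yields the pair (34, "jsmith"), while a subdirectory such as /users/u34/jsmith/share is not treated as a home directory.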

Accessing Storage

Accessing storage is most easily done from an LC production machine but can be done from non-LC machines and from offsite in some circumstances.

When onsite, NFT, FTP, HSI, and Hopper can be used to transfer files between an LC production host and storage. If onsite but not on an LC production host, use FTP or Hopper to transfer files to or from storage.

Offsite users may access storage only if connected to the LC network via a VPN client or if their network has a trust relationship with LC's network. See https://access.llnl.gov/ for information regarding VPN. See Access Information for access prerequisites and additional information about accessing LC systems. Offsite users are limited to FTP and Hopper for accessing storage.

Copies in Storage

Some files may be so important to your project that you want to store separate, duplicate copies on independent storage media. LC's Open Computing Facility (OCF) and Secure Computing Facility (SCF) storage systems offer such dual-copy storage using the "class of service" (COS) concept.

The storage server(s) assign a COS to every incoming file based on the file's size and the client that writes it:

  • Files written with FTP, NFT, or HSI that are smaller than 256 MB are automatically assigned a COS that provides two separate copies on separate storage tapes. For these files you never need to request duplicate storage.
  • Files written with FTP, NFT, HSI, HTAR and Hopper that are 256 MB or larger are assigned a COS that stores only a single copy. For mission critical files in this category, you can request dual-copy storage by using the FTP command

    site dualcopy

    or the NFT command

    dualcopy

    or the HSI command

    set cos=dualcopy

before you put the large file(s) into HPSS. NFT's dir command (if used with the -h option) reports the current COS for already stored files.

  • For mission critical files written with Hopper, you can request dual-copy storage for a Hopper session (the setting does not persist beyond a session) by setting HPSS Class of Service to Dual-Copy (under File, Preferences, General, HPSS).
    For mission critical files written with HTAR, you can request dual-copy storage by using the option

    -Y dualcopy

    on the HTAR execute line that creates your stored archive.

For more COS technical details, consult the SETCOS section of LC's NFT User Guide.
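
The COS rules in this section can be summarized as a small lookup. This is a sketch under stated assumptions, not the storage server's logic: the function names are hypothetical, "256 MB" is assumed to mean 256 * 2**20 bytes, and cases the text does not cover (for example, small files written with HTAR or Hopper) return None rather than a guess.

```python
DUAL_COPY_THRESHOLD = 256 * 1024 * 1024  # ASSUMPTION: 256 MB counted in 2**20-byte units

# How each interface requests dual-copy storage, as listed in this section.
DUAL_COPY_REQUEST = {
    "ftp": "site dualcopy",
    "nft": "dualcopy",
    "hsi": "set cos=dualcopy",
    "htar": "-Y dualcopy",  # given on the HTAR execute line
}


def default_copies(client, size_bytes):
    """Return the default number of stored copies, or None if the text does not say."""
    client = client.lower()
    if size_bytes >= DUAL_COPY_THRESHOLD:
        if client in {"ftp", "nft", "hsi", "htar", "hopper"}:
            return 1
        return None
    if client in {"ftp", "nft", "hsi"}:
        return 2
    return None  # small HTAR or Hopper files are not described in this section


def dual_copy_hint(client, size_bytes):
    """Return the request to issue before storing, or None if none is needed or known."""
    if default_copies(client, size_bytes) == 1:
        return DUAL_COPY_REQUEST.get(client.lower())
    return None
```

Hopper has no entry in the request table because its dual-copy choice is a session preference rather than a command.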

Storage Interfaces

FTP

FTP is the standard interface to HPSS. When you run FTP (on an OTP or Kerberos-passworded LC machine) with storage as the target host, access is "preauthenticated" and you are not prompted for your password. Also, on all LC production machines (but not necessarily on other LC machines), a parallel FTP client (equivalent to PFTP) is the default. All files that are 4 MB or larger automatically move to or from storage using parallel FTP.

Note: Because the storage FTP daemon (based on the HPSS version) behaves differently from the other LC FTP daemons (based on the WU FTP daemon), users should be aware that "m" commands (mdelete, mget, mput, etc.) may produce unintended results. These "m" commands process multiple files by using as their argument either an explicit file list or a file filter (an implicit file list specified with one or more UNIX wildcards or metacharacters). The best way to check the behavior is to type ls pattern, where pattern is what will be used with the "m" command. If ls pattern returns something unexpected, reformulate the pattern.
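
The same "list first, then act" check can be scripted. The sketch below is illustrative only: preview_matches and safe_to_proceed are hypothetical helper names, and the ftp argument is assumed to be any object with an nlst(pattern) method, such as a connection from Python's standard ftplib module. Whether ftplib can use the preauthenticated access described above is not covered in this guide.

```python
def preview_matches(ftp, pattern):
    """Return the names the server reports for pattern, before any "m" command runs."""
    return list(ftp.nlst(pattern))


def safe_to_proceed(ftp, pattern, expected):
    """Return True only if the server's listing is exactly the set of names expected."""
    return sorted(preview_matches(ftp, pattern)) == sorted(expected)
```

If safe_to_proceed returns False, the pattern should be reformulated before it is used with mget, mput, or mdelete.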

For more on FTP, please consult our FTP mini-manual.

NFT

NFT is a locally developed file transfer tool. Although NFT uses standard FTP daemons to carry out its file transfers, it offers enhanced features.

  • A special NFT server preauthenticates all NFT transfers, so all NFT executions are passwordless.
  • NFT elaborately tracks and numbers all transfers. It automatically persists if system problems delay storing any file, and it keeps detailed records of your file-storage successes and problems.
  • Input from and output to files is easy, and NFT's command syntax (unlike FTP's) lends itself to practical use in scripts and batch jobs.
  • Some NFT commands especially facilitate transfers to and from storage (so some users regard NFT as primarily a file-storage rather than a general file-transfer tool). Also, NFT automatically "routes" storage-related file transfers to take advantage of fast, jumbo-frame network connections whenever they are available (especially helpful for transfers between the Lustre parallel file system and storage).

For a complete analysis of NFT syntax and special features, along with a thorough alphabetical command dictionary, consult the NFT User Guide.

HSI

HSI provides a UNIX shell-style interface to HPSS and supports several of the commonly used FTP commands with the following differences:

  • The dir command is an alias for ls in HSI. The ls command supports an extensive set of options for displaying files, including wildcard pattern-matching and the ability to recursively list a directory tree.
  • The put and get family of commands support recursion.
  • There are "conditional" put and get commands (cput, cget).
  • The syntax for renaming local files when storing files to HPSS or retrieving files from HPSS is different from FTP. With HSI, the syntax is always local_file : hpss_file, and multiple such pairs may be specified on a single command line. With FTP, the local file name is specified first on a put command and second on a get command. For example, when using HSI to store the local file "file1" as HPSS file "hpss_file1" and then retrieve it back to the local file system as "file1.bak", the following commands could be used:

    put file1 : hpss_file1
    get file1.bak : hpss_file1

    With FTP, the following commands could be used:

    put file1 hpss_file1
    get hpss_file1 file1.bak

  • The "m" prefix is not needed for HSI commands; all commands that work with files accept multiple files on the command line. The "m" series of commands are intended to provide a measure of compatibility for FTP users.
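
The difference in argument order between HSI and FTP can be made explicit with a few string builders. This is only a sketch of the syntax rule described above; the function names are hypothetical, and the helpers produce command text rather than running anything.

```python
def hsi_put(local_name, hpss_name):
    """HSI always names the local file first, then a colon, then the HPSS file."""
    return f"put {local_name} : {hpss_name}"


def hsi_get(local_name, hpss_name):
    """HSI keeps the same local : hpss order when retrieving."""
    return f"get {local_name} : {hpss_name}"


def ftp_put(local_name, hpss_name):
    """FTP names the local file first on a put."""
    return f"put {local_name} {hpss_name}"


def ftp_get(local_name, hpss_name):
    """FTP names the local file second on a get."""
    return f"get {hpss_name} {local_name}"
```

All four helpers take their arguments in the same (local, HPSS) order, so only the generated text differs between the two interfaces.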

For more on HSI, see our HSI mini-manual.

Hopper

Hopper offers a graphical interface to storage and other LC resources, including support for simple drag-and-drop file-transfer services using FTP, NFT, HSI, HTAR, etc. By invoking Hopper, you can do many file-management tasks graphically, including:

  • Transfer files to and from storage.
  • Synchronize directories between LC resources or your desktop and storage.
  • Search for files in storage by name, size, age, or other criteria.
  • Create, view, and extract from HTAR archives.

More general background information on Hopper is available at the Hopper page. See "Getting Started" for instructions on how to download Hopper to your local desktop machine.

HTAR

On LC production machines (but not at other ASC sites), HTAR is a separate, locally developed utility program that combines a flexible file-bundling tool (like TAR) with fast parallel access to storage (it acts as an alternative to the PFTP client). It lets you store and selectively retrieve even very large sets of files very efficiently.

HTAR's enhanced features include the following:

  • Imposes a 100 TB limit on the total size of the archives that it builds and accepts input files (archive members) as large as 68 GB.
  • Uses a TAR-like syntax and supports TAR-compatible archive files.
  • Bundles files in memory using multiple concurrent threads and transfers them into an archive file built directly in storage by default (to avoid needing extra local online disk space).
  • Takes advantage of available parallel interfaces to storage to provide fast file transfers.
  • Uses an external index file to easily accommodate thousands of small files in any archive and to support retrieval of specified files from within a still-stored archive without first retrieving the entire archive from HPSS. (Warning: You can use filters such as * to create an HTAR archive, but you cannot reliably use filters to retrieve files from within an already stored HTAR archive. See the "Retrieving Files" section of the HTAR User Guide for possible workarounds.)
  • Allows easily building and storing incremental archives (consisting of only recently changed files).
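
The HTAR size limits (68 GB per member, 100 TB per archive) and the 100-character entry-name limit from the constraints summary can be checked before an archive is built. The sketch below is not part of HTAR: the function name is hypothetical, GB and TB are assumed to be decimal units because this guide does not define them, and the archive size is approximated as the sum of member sizes, ignoring any archive overhead.

```python
GB = 10**9    # ASSUMPTION: decimal units; the guide does not define GB or TB
TB = 10**12
MAX_MEMBER_BYTES = 68 * GB
MAX_ARCHIVE_BYTES = 100 * TB
MAX_ENTRY_NAME = 100


def htar_limit_problems(members):
    """members maps entry name -> size in bytes; return a list of limit violations."""
    problems = []
    for name, size in members.items():
        if len(name) > MAX_ENTRY_NAME:
            problems.append(f"{name}: entry name exceeds {MAX_ENTRY_NAME} characters")
        if size > MAX_MEMBER_BYTES:
            problems.append(f"{name}: member exceeds 68 GB")
    if sum(members.values()) > MAX_ARCHIVE_BYTES:
        problems.append("archive exceeds 100 TB in total")
    return problems
```

An empty result means only that these three limits were not exceeded under the stated unit assumption.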

Complete details about using HTAR are available in the HTAR User Guide.

Sharing Stored Files

Sharing some stored files with one or several other users is one of the most common storage goals. You may also want to consider using other file-sharing techniques available on LC production machines. Consult the File-Sharing Alternatives section of Using LC File Systems for an explicit analysis of several choices.

All sharing of stored files on LC's HPSS happens by means of storage groups. You and those with whom you want to share stored files must first find or create an LDAP (Lightweight Directory Access Protocol) storage group to which you all belong. You must then assign the files to be shared, and every parent directory of them, to that common storage group, and open the file and directory permissions (of the whole tree) to allow group reads (and, where needed, executes or writes).

NOTE: Users on the Restricted Zone (RZ) may not share stored files in their home directory.

Using Storage Groups

A group is just a named set of users that agree among themselves to optionally allow (some of their) files to be readable, or even writable, by all group members. At LC, groups are obtained from LDAP. However, a file loses its online group status at the time you store it, so you must arrange the sharing of stored files by working exclusively with storage groups. For basic information about using groups, see the Using Groups section of Using LC File Systems.

Setting Stored-File Permissions by Group

Once you have the files you want to share and the name of a group to whom all sharing users belong (see the previous subsection), you can follow these steps, all involving (somewhat unusual) FTP commands, to enable the sharing of stored files.

1. Open an FTP session to storage.

ftp storage

2. Create a storage directory to hold the shared files. In this example, the shared-files directory is called "share" and the shared file is called "share.code." In your FTP session type

mkdir share

3. Assign your storage home directory to the share group. For example, if your default arrival directory in storage is /users/u34/jfk and if the storage group containing all the file-sharing users is "sgroup," then use this FTP command

chgrp sgroup /users/u34/jfk

to associate the two. One side effect is that you cannot share with two different groups at once.

4. Assign your file-sharing directory to the share group. Because you made the share directory as a child of /users/u34/jfk in step 2, you can now associate it, too, with the file-sharing storage group sgroup:

chgrp sgroup share

5. Assign group permissions to the file-sharing directory. To allow other members of storage group sgroup to read, write, and execute (list) the file(s) in the share directory, use this FTP command

chmod 775 share

to expand its default group permissions.

6. Store the files to be shared. If you move (cd) to the file-sharing directory and put the file(s) to be shared, they will lose their online permissions but they will arrive associated with the share group sgroup, which they inherit from the file-sharing directory:

cd share

put share.code

put ... [if there are more files to share]

7. Assign group permissions to the file(s) to be shared. Even if their online permissions allowed sharing by group, storing the file(s) erased those decisions. So as with step 5 above, you need to declare the availability of each file to the members of sgroup:

chmod 775 share.code
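
For repeated use, steps 2 through 7 can be generated as an ordered command list. This is a minimal sketch: the function name sharing_commands and its parameters are hypothetical, and it only builds the text of the FTP commands shown above, to be typed or fed into an FTP session to storage that is already open (step 1).

```python
def sharing_commands(home_dir, group, share_dir, files, mode="775"):
    """Return the FTP commands from steps 2-7, in order, for an open session."""
    commands = [
        f"mkdir {share_dir}",           # step 2: create the shared directory
        f"chgrp {group} {home_dir}",    # step 3: assign home directory to the group
        f"chgrp {group} {share_dir}",   # step 4: assign shared directory to the group
        f"chmod {mode} {share_dir}",    # step 5: open group permissions on the directory
        f"cd {share_dir}",              # step 6: move into the shared directory
    ]
    commands += [f"put {name}" for name in files]            # step 6: store each file
    commands += [f"chmod {mode} {name}" for name in files]   # step 7: open group permissions
    return commands
```

Calling sharing_commands("/users/u34/jfk", "sgroup", "share", ["share.code"]) reproduces the sequence in the example above.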

Reading Shared Stored Files

After you have used the previous two subsections to enable others in storage group sgroup to share the file(s) in the share directory, they can follow these steps to retrieve those file(s):

ftp storage

cd /users/u34/jfk/share

get share.code

Note that attempts to directly get the file /users/u34/jfk/share/share.code (while in another storage directory) may misleadingly fail with the message "no such file or directory."
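
Because a direct get of the full path may fail, a script can split the path into the cd and get steps shown above. The helper below is a sketch with a hypothetical name; it only produces the command text and makes no claim about why the direct form fails.

```python
import posixpath


def retrieval_commands(stored_path):
    """Split a full storage path into a cd command followed by a get command."""
    directory, name = posixpath.split(stored_path)
    if not directory or not name:
        raise ValueError("expected a full path such as /users/u34/jfk/share/share.code")
    return [f"cd {directory}", f"get {name}"]
```

The function raises ValueError for a bare file name, since in that case there is no directory to change to first.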

HPSS: High Performance Storage System

HPSS user documentation is available from the HPSS Collaboration website.