HTAR ("HPSS Tape Archiver") is an LC-designed TAR-like utility program that makes TAR-compatible archive (library) files but with High Performance Storage System (HPSS) support and enhanced archive-management features. HTAR's enhancements include its ability to:
- Bundle many small files together in memory (without using more local disk space, as standard TAR requires) for more efficient handling and transfer.
- Send the resulting large archive file directly to storage without your needing to invoke FTP separately.
- Retrieve individual files from a stored archive without moving the whole large archive back to your local machine first (or, optionally, without even staging the whole archive to disk).
- Accelerate transfers to and from storage by deploying multiple threads and by automatically using as many parallel interfaces to storage as are available on the (production) machine where it runs.
- Easily create incremental backup archives to supplement a master archive with (only) files recently changed (with -n).
The TAR and HTAR Compared section compares traditional UNIX TAR with LC's enhanced HTAR feature by feature to show the value of this added tool. In general, HTAR maintains full output compatibility with the POSIX 1003.1 standard TAR format while archiving hundreds or even thousands of input files and handling files of widely mixed sizes and types. In most cases, creating a stored archive directly with HTAR is much faster than either creating a local TAR file and then copying it to storage with HSI, or piping TAR output into an FTP connection to HPSS or into HSI.
HTAR can store archive-member files as large as 68 GB. There is no maximum size for a whole HTAR archive other than site-imposed restrictions (100 TB) or the amount of space available. By default, HTAR makes two copies only of stored archives up to 256 MB; you can request dual-copy storage (for extra safety) of a mission-critical archive of any size by using HTAR's -Y dualcopy option. (HTAR's -Y option can also be used to pick the class of service (COS) for the archive file, the index file, or both, or to request automatic COS selection.) NFT's DIR -h command reveals the COS value of stored files (in output column 3). FTP and HSI can reveal COS, too.
Because HTAR combines two usually separate features (file bundling and file storage), the How HTAR Works section explains the relationship among the three files that HTAR uses (the archive file, the index file, and the consistency file). See How to Use HTAR for information about running HTAR, common error conditions, known limitations (with workarounds), and HTAR environment variables. The HTAR Options section describes the function of each HTAR option (distinguishing the required action options from the control options). HTAR Examples gives annotated step-by-step examples of how to use HTAR to handle common file-archiving tasks and problems.
HTAR users may also benefit from familiarity with another LC-developed specialty tool that provides nonstandard file-handling and file-transfer features linked to file storage, namely NFT (see the NFT Reference Manual for details). HTAR itself does not, however, use NFT's "persistence" mechanisms to manage file-storage delays. For a general introduction to LC storage tools and techniques, see Using LC Archival Storage. Hopper, LC's graphical file-handling interface, can also serve as a front end for HTAR in situations where Hopper's scalability limits are not too severe. Also of interest to the HTAR user is the HSI utility, which provides a user-friendly UNIX-style interface to storage. HSI can recursively store, retrieve, and list entire trees with a single command. Consult the HSI User Guide for details.
For help, contact the LC Hotline at 925-422-4531 (OCF e-mail: lc-hotline@llnl.gov, SCF e-mail: lc-hotline@pop.llnl.gov).
How HTAR Works
HTAR makes an archive (or library) file in the standard POSIX 1003.1 TAR format, which allows TAR to open any HTAR archive file. Because HTAR offers more services than ordinary TAR, it needs extra internal machinery to support those services, some of which reveals itself in HTAR status messages or command responses. This section briefly explains how HTAR makes an archive file and the role that several support files play in that process.
- Archive File (name.tar): When you run HTAR with the create-archive (-c) option, the program first opens a connection to storage (HPSS). It then deploys multiple threads to transfer in parallel (but not with PFTP) the local disk files that you specify into a TAR-format envelope file created (unless you request otherwise) in your storage home directory. This archive file never exists on local disk (unless you demand it with the -E option), even in temporary directories on the machine where HTAR runs. Instead, HTAR reads the member files piecewise into its internal buffers and moves the data directly to HPSS, where it assembles the archive. HTAR simultaneously builds a separate index file (outside the archive) and a little consistency file (deposited last inside the archive), discussed below. Archives smaller than 256 MB are automatically assigned a COS that provides two separate copies on separate storage tapes. For files of special importance (only), use HTAR's -Y dualcopy option to force creation of a duplicate (invisible) backup copy. Use -K to verify your archived results. Note: The .tar suffix is not required for the archive file name, but it may be useful to the user as an indication of file type.
- Index File (name.tar.idx): To support the direct extraction of any stored archive member(s) without retrieving the whole archive to local disk, HTAR automatically builds an external index file to accompany every archive that you create. While making the archive, HTAR temporarily writes the index file to the local /tmp file system on the machine where it runs; at the end of the process, it transfers the index file (by default) to the same storage directory where the archive itself resides. Each HTAR index file contains one 512-byte record for every member file, directory entry, or symbolic link stored in the corresponding archive file, regardless of the member file's size (so even a 10,000-file archive has an index file of only about 5 MB). HTAR index files are so much smaller than the archives they support that the index file often remains on HPSS disk (to respond rapidly to queries) even when the larger archive file itself migrates to storage tape. If you use HTAR's -E option to force the archive to local disk, the index file is written to the same location as the archive file.
- Consistency File (/usr/tmp/HTAR_CF_CHK_nnnnn_mmmmmmmmm): Because the archive and index files are separate, HTAR maintains a consistency check between them in an additional 1-block (256-byte) file always included (as a last step) at the end of each archive. This consistency file's name has the long numerical format shown above, but it begins with /var/tmp/uname. HTAR never extracts this file (unless you specifically request it), but every listing made with -t, and every use of -v together with -c or -x, reports this consistency file at the end of HTAR's list of archived contents. (Verification option -K neither reports this consistency file nor counts it.)
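The interplay of the archive and index files can be sketched in Python. This is an illustration under stated assumptions, not HTAR's code or file format: the standard-library tarfile module writes a POSIX ustar stream in its non-seeking "w|" mode (so the sink could be a pipe or socket, with no local archive file), an in-memory BytesIO stands in for HPSS, and the name → (data offset, size) dictionary is a hypothetical stand-in for HTAR's 512-byte index records. read_member shows why an external index allows extracting one member by seeking straight to it, and index_size reproduces the 10,000 × 512 bytes ≈ 5 MB arithmetic.

```python
import io
import tarfile

BLOCK = 512              # TAR headers and padded data are multiples of 512 bytes
INDEX_RECORD_SIZE = 512  # per-entry index record size stated for HTAR above


def stream_archive(members, sink):
    """Stream (name, data) pairs into `sink` as a POSIX ustar archive.

    Returns a HYPOTHETICAL external index, name -> (data_offset, size).
    This is not HTAR's index format, only an illustration of the idea.
    """
    index = {}
    offset = 0
    # "w|" is tarfile's stream mode: it never seeks, so `sink` could be a pipe.
    with tarfile.open(fileobj=sink, mode="w|", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
            # A ustar header is one block; the data follows, padded to a block.
            index[name] = (offset + BLOCK, len(data))
            offset += BLOCK + -(-len(data) // BLOCK) * BLOCK
    return index


def read_member(archive, index, name):
    """Fetch one member by seeking straight to it instead of reading the whole archive."""
    data_offset, size = index[name]
    archive.seek(data_offset)
    return archive.read(size)


def index_size(entries):
    """Bytes in an index holding one fixed-size record per archive entry."""
    return entries * INDEX_RECORD_SIZE


if __name__ == "__main__":
    sink = io.BytesIO()  # stands in for HPSS in this sketch
    idx = stream_archive([("a.txt", b"alpha\n"), ("b.txt", b"bravo" * 200)], sink)
    print(read_member(io.BytesIO(sink.getvalue()), idx, "b.txt")[:5])  # b'bravo'
    print(index_size(10_000))  # 5120000, about 5 MB
```

Because the result is ordinary ustar data, tarfile.open() (or any POSIX TAR) can list it, which is the compatibility property described above.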
TAR and HTAR Compared
HTAR is specifically designed to efficiently store a set of files together in HPSS or get them back (not merely to make an archive file and leave it). The table below compares TAR features and effects with those of HTAR.
Feature | TAR | HTAR |
---|---|---|
Can create an archive file without storing it? | Yes (the default) | With -E |
Can create an archive file without using local disk space? | No | Yes (the default) |
Can store an archive file while creating it? | No, needs FTP | Yes (the default) |
Can read any TAR archive file? | Yes (the default) | Yes, if -X first |
Can read any HTAR archive file? | Yes | Yes (the default) |
Can extract just one file from a stored archive? | No | Yes (the default) |
Can add file(s) to an existing archive? | Yes | No |
Default target if no archive specified? | Yes (tape) | No, -f required |
Treats input directories recursively? | Yes | Yes |
Preserves original permissions on files? | No (uses UMASK) | Yes, with -p |
Depends on HPSS availability to work? | No | Yes |
Archive duplicated automatically in storage? | No | Only up to 256 MB, or with -Y dualcopy |
Builds and needs an external index file? | No | Yes |
Builds and needs a consistency check file? | No | Yes |
Overwrites existing files without warning? | Yes | Yes (-w disables) |
Can use standard input or output? | Yes (with -f -) | Yes (with -L, -O) |
Order of options important? | Somewhat | Somewhat |
Table of contents (-t) reveals what? | File names only | File names and properties |
Can create and verify CRC checksums of member files? | No | Yes |
Can verify contents of a newly created archive as part of creation operation? | No | Yes |
How to Use HTAR
HTAR Execute Line
To run HTAR you must log on to an LC production machine where HTAR has been installed at a time when the storage system (HPSS) is up and available to users. The HTAR execute line has the general form
htar action archive [options] [filelist]
and the specific form
htar -c|t|x|D|K|U|X -f archivename [-BdEhHILmMoOpPSTvVwY] [flist]
where exactly one action and the archivename are always required. The control options can be omitted, as can the filelist (or flist) except when you use -c, and the options can share a hyphen flag with the action for convenience. See the HTAR Options section for details.
Syntax Issues
Traditional TAR is such an old utility that syntax differences have evolved under different versions of the UNIX operating system. Linux at LC offers some different TAR options and uses some of the same options (such as -L) for different purposes. (Refer to the comparison of TAR and HTAR features.) Generally, HTAR syntax follows the more restrictive implementations of TAR. Thus, with HTAR:
- One "action" (-c|t|x|D|K|U|X) is always required, but it need not come first on the HTAR execute line. However, if the first option on the execute line starts without a minus sign but is an HTAR action character, it is treated as if it did start with a minus sign. For example, the following two command lines are equivalent:
htar -c -v -f abc.tar *
htar cv -f abc.tar *
- The archive specifier -f is always required and it must immediately precede its argument (-f archivename), regardless of where that pair falls on the HTAR execute line.
- Any HTAR flag character that requires an argument, such as -L pathname, requires that the argument immediately follow the option character, with or without preceding white space.
- All HTAR options, whatever their order, must precede the first member file name (all options must precede flist or any filters that take the place of flist).
- Options may share the flag character (-) as long as the other rules above are also followed. Thus, these three combinations
htar -c -v -f abc.tar *
htar -cvf abc.tar *
htar -v -f abc.tar -c *
are all equivalent, acceptable HTAR execute lines.
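When a script drives HTAR, building the argument list programmatically makes the rules above easy to follow. The Python sketch below is a hypothetical helper (htar_argv and run_htar are illustrative names, not part of HTAR): it places exactly one action first, puts -f directly before the archive name, and keeps every option ahead of the first member name.

```python
import subprocess

# Action letters from the execute-line synopsis (htar -c|t|x|D|K|U|X ...).
ACTIONS = frozenset("ctxDKUX")


def htar_argv(action, archive, options=(), files=()):
    """Return an htar argument list ordered per the syntax rules above.

    Hypothetical helper: exactly one action, -f directly before the archive
    name, and every option ahead of the first member file name. It does not
    validate individual control options.
    """
    if action not in ACTIONS:
        raise ValueError(f"unknown HTAR action: -{action}")
    has_list_file = any(opt.startswith("-L") for opt in options)
    if action == "c" and not files and not has_list_file:
        raise ValueError("-c needs a filelist (or -L)")
    return ["htar", f"-{action}", *options, "-f", archive, *files]


def run_htar(argv):
    """Run htar without a shell, so no wildcard expansion happens."""
    return subprocess.run(argv, check=True)
```

Because run_htar passes the list directly to the program without a shell, a pattern such as * is not expanded; expand it first (for example with sorted(glob.glob("*"))) or use -L as described under Too Many Names.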
Defaults
Directories. By default, HTAR creates an archive by copying files from the online directory where you run it into a file in your storage (HPSS) home directory, and it extracts files by reversing that process. You must always specify the name of the archive file on which HTAR operates (there is never a default archive). In its reports, HTAR appends slash (/) to each directory name listed.
File Names. Once you name the archive, HTAR calls the corresponding external index file archivename.idx by default and stores it (by default) in the same HPSS directory as the archive. HTAR's -I option lets you specify a nondefault name or location for the index file. The HTAR consistency file's name begins with /var/tmp/uname/HTAR, where uname is your login name on the machine where you run HTAR.
Class of Service (COS). By default, HTAR stores two copies in HPSS only of archives up to 256 MB; you can request dual-copy storage of any mission-critical HTAR archive, regardless of its size, by using the -Y dualcopy option on HTAR's execute line. NFT's command DIR -h reports the COS for stored files (in output column 3). FTP and HSI also report COS.
Executing HTAR Using Hopper
Hopper is a graphical front end for several file-transfer tools (FTP, NFT, HTAR, HSI) that is installed on all LC production machines. With Hopper you can create HTAR archives by graphically dragging files and directories to the Hopper storage window, and you can just as easily extract contents of HTAR archives. For more details on Hopper, type "man hopper" on an LC host, use Hopper's built-in help package, or visit the Hopper Web pages.
HTAR Error Conditions
HTAR prefixes all ordinary messages with the string 'HTAR:', but it prefixes nonfatal errors with 'INFO:' and fatal errors with 'ERROR:'. Unexpected situations are usually flagged with a '###WARNING' prefix. The most common error conditions and HTAR's responses to them are summarized here to help you troubleshoot:
Storage (HPSS) is down. When HPSS is unavailable to users, no stored archive can be read or written. HTAR returns a message of this form and ends. (There is no persistence as with NFT.)
hpssex_OpenConnection: unable to obtain remote site info
result = -5000, errno = 0
Unable to setup communication to HPSS. Exiting...
Specified archive directory does not exist. If -f specifies a subdirectory of your storage home directory that you have not previously created (with the mkdir command of FTP or HSI), and you do not use -P, then when you attempt to create an archive in that nonexistent (sub)directory, HTAR responds:
***Error -2 on hpss_Open (create) for archivename
When you attempt to extract files from an archive in a nonexistent (sub)directory, HTAR replaces the first line of this error message with:
***Fatal error opening index file archivename.idx
Specified archive file does not exist. If -f specifies an archive file that does not exist (perhaps because you deleted it or mistyped its name), HTAR responds:
[FATAL] no such HPSS archive file: archivename
Specified index file does not exist. If you try to list (-t) or extract (-x) files from an actual HTAR archive whose corresponding external index file (archivename.idx) has been deleted or moved, HTAR pinpoints the problem only by reporting the missing index name:
No such file: archivename.idx
You can work around the missing index by using HTAR's -X option to rebuild the index while the archive remains stored, or you can retrieve the whole archive from storage with FTP or HSI and then open it with TAR.
HTAR's filelist omitted. If you try to create (-c) an archive without specifying a filelist (or without using a filelist replacement such as -L), HTAR connects to HPSS but quickly ends with the message
Refusing to create empty archive.
If some of the files named in the filelist cannot be archived and are omitted, HTAR still creates the archive and exits with the messages
HTAR: HTAR SUCCESSFUL
HTAR: SOME FILES WERE OMITTED. PLEASE LOOK AT THE WARNING/INFO MESSAGES ABOVE.
If you try to list (-t) or extract (-x) without specifying a filelist, HTAR defaults to processing all files in the archive.
HTAR run with no options. Because HTAR requires one action (-c|t|x|D|K|U|X) and a specified archive file (-f) to run, executing the program with nothing else on the execute line yields a terse syntax summary. There is no prompt for input, and HTAR terminates.
Command line too long for shell. The easy way to build an HTAR archive of very many like-named files is to specify them indirectly by using a UNIX metacharacter (filter, wild card) such as * (to match any string) or ? (to match any single character). But if the selected file set has thousands of members, the list of input names that the UNIX shell generates by expanding such an "ambiguous file reference" may grow too long to handle. See the Limitations and Restrictions section below for several ways to work around such excessively long command lines when building large archives with HTAR.
Wild cards (metacharacters) used for retrieval. HTAR allows * only to create an archive, not to retrieve files from one ("no match" is the usual, but not the only possible, error message). See the Retrieving Files example.
HTAR Limitations and Restrictions
The current version of HTAR has the following known limitations or usage restrictions.
I/O
You can redirect any HTAR output into a file (with >) for separate postprocessing (see the Retrieving Files section for one helpful application of this), but HTAR normally does not read from or write to UNIX pipes (standard input, standard output). Two HTAR control options, however, let you enable the use of pipes if you need them:
- Read From Standard Input. Use HTAR's -L inputfile option with a hyphen as the input file (that is, -L -) to read a list of files from standard input instead of from the usual execute-line filelist. The "Too Many Names" discussion later in this section shows how to apply this technique to solve a practical problem when creating archives with very many input files.
- Write To Standard Output. Use HTAR's -O option to write a file extracted with the -x option to standard output. Thus,
htar -xf abc.tar -O def
extracts file def from archive abc.tar in your storage home directory and displays it at your terminal, while
htar -xf abc.tar -O def | wc
instead reports def's line, word, and character count. Because HTAR does not separate files in the output stream, this is usually useful only when extracting a single file.
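The same -O pipeline can be driven from a Python script. In this sketch, extract_member_bytes is a hypothetical wrapper that runs htar -xf ARCHIVE -O MEMBER and captures its standard output; it assumes htar is on PATH, HPSS is up, and HTAR sends only the member's bytes (not its status messages) to standard output. wc_counts reproduces wc's line, word, and byte counts on the captured data.

```python
import subprocess


def wc_counts(data):
    """Return (lines, words, bytes) for `data`, counted the way wc does."""
    return data.count(b"\n"), len(data.split()), len(data)


def extract_member_bytes(archive, member):
    """Capture one member via `htar -xf ARCHIVE -O MEMBER` (needs htar and HPSS).

    ASSUMPTION: htar writes only the member's contents to standard output.
    """
    result = subprocess.run(["htar", "-xf", archive, "-O", member],
                            stdout=subprocess.PIPE, check=True)
    return result.stdout
```

For example, wc_counts(extract_member_bytes("abc.tar", "def")) mirrors the | wc command above.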
Metacharacters
HTAR leaves all processing of metacharacters (filters or wild cards, such as *) to the shell. This means that when you create an HTAR archive you can use * to select from among your local files to store, but when you retrieve specific files from within an already stored archive you cannot use * to select from among the stored files to retrieve. See the Retrieving Files example for details on this limitation and a few suggested ways to work around it. Another side effect of this approach to metacharacters is that C shell (csh) users must type the three-character string -\? (instead of -?) to display HTAR's help message.
Updates
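No options exist to update (replace) or append to individual files that HTAR has already archived, and the -D option only marks members as deleted in the index file (soft delete) rather than removing them from the archive. You must replace (create again) an entire archive to alter the member files within it.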
Name Length
To comply with POSIX 1003.1 standards regarding TAR-file input names, the longest input file name of the form prefix/name that HTAR can accept has 154 characters in the prefix and 99 characters in the name. Link names likewise cannot exceed 99 characters.
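A quick pre-check for over-long names before running htar -c can be sketched in Python. For reference, the ustar header's prefix, name, and link-name fields are 155, 100, and 100 bytes wide; the sketch simply applies the 154/99 limits stated above, tries each '/' as the split point, and counts characters, so it assumes ASCII names. split_member_name and too_long are illustrative names.

```python
MAX_PREFIX = 154  # longest prefix HTAR accepts, per the text above
MAX_NAME = 99     # longest final name (and link name) HTAR accepts


def split_member_name(path):
    """Split `path` into (prefix, name) within HTAR's stated limits, else None.

    Lengths are counted in characters, which equals bytes for ASCII names
    (an assumption of this sketch).
    """
    if len(path) <= MAX_NAME:
        return "", path
    # Try split points from the right: the first fit has the longest prefix.
    for i in range(len(path) - 1, 0, -1):
        if path[i] == "/":
            prefix, name = path[:i], path[i + 1:]
            if len(prefix) <= MAX_PREFIX and 0 < len(name) <= MAX_NAME:
                return prefix, name
    return None


def too_long(paths):
    """Return the paths that would be rejected under these limits."""
    return [p for p in paths if split_member_name(p) is None]
```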
File Size
The maximum size of a single member file within an HTAR archive is 68 GB. HTAR's maximum size for an archive file is 100 TB, and local disk space (when using -E) or storage space might externally limit an archive's size. Users can specify a maximum number of member files per archive with HTAR's -M option.
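Scanning for members that exceed the per-file limit before a long archive run can save a failed transfer. In this Python sketch, the exact byte value is an assumption: it takes the 68 GB figure to be the largest number a 12-digit octal TAR size field can hold (8^12 - 1 = 68,719,476,735 bytes), which fits the stated limit but is not confirmed by this document. oversized_files is a hypothetical helper and skips symbolic links.

```python
import os

# ASSUMPTION: the "68 GB" limit is read as the largest value of a 12-digit
# octal TAR size field, 8**12 - 1 bytes. Confirm the exact figure locally.
MAX_MEMBER_BYTES = 8**12 - 1


def oversized_files(root, limit=MAX_MEMBER_BYTES):
    """Return regular files under `root` larger than `limit` bytes (links skipped)."""
    big = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            if not os.path.islink(path) and os.path.getsize(path) > limit:
                big.append(path)
    return sorted(big)
```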
Passwordless FTP
Because HTAR (unlike FTP) does not support user dialog with a server and has no password-passing option, you can only manipulate HTAR archives on machines with preauthenticated (passwordless) FTP. HTAR does not run the PFTP client.
Too Many Names
For users who make HTAR archives containing thousands of files, the main limitation is one of the UNIX shell rather than of the HTAR program itself. One would normally select multiple files for archiving by using a UNIX "ambiguous file reference," a partial file name adjacent to one or more shell metacharacters (or "wild card" filters, such as the asterisk). Your current shell automatically expands the metacharacter(s) to generate a (long) alphabetical list of matching file names, which it inserts into the execute line as if you had typed them. Thus,
htar -cf test.tar a*
might become equivalent to a command line with dozens of a-named files on the end. Each shell has a maximum length for execute lines, however, and if your specified metacharacter filter matches thousands of file names, HTAR's execute line may grow too long for the shell to accept, which would prevent building your intended many-file archive.
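How close an expanded file list comes to the limit can be estimated in Python. This sketch assumes a POSIX system, where os.sysconf("SC_ARG_MAX") reports the kernel's limit on the combined size of arguments and environment; individual shells and systems add overheads (such as pointer space) that it ignores, so treat it as a rough guide only. argv_bytes and fits_on_command_line are illustrative names.

```python
import os


def argv_bytes(args):
    """Bytes an argument list occupies when exec'd: each argument plus a NUL."""
    return sum(len(arg.encode()) + 1 for arg in args)


def fits_on_command_line(args, limit=None, reserved=None):
    """Rough check that `args` fits within the system's ARG_MAX.

    The environment shares the same space, so by default it is counted too
    (each KEY=VALUE string plus a NUL). Pointer overhead and shell-specific
    limits are ignored, so results near the limit are unreliable.
    """
    if limit is None:
        limit = os.sysconf("SC_ARG_MAX")
    if reserved is None:
        reserved = sum(len(k) + len(v) + 2 for k, v in os.environ.items())
    return argv_bytes(args) + reserved <= limit
```

For example, fits_on_command_line(["htar", "-cf", "test.tar", *sorted(glob.glob("a*"))]) estimates whether the a* example above is at risk.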
The most effective, least resource-intensive way to work around the problem of having a (virtual) HTAR execute line too long for the shell to handle is to plan ahead and keep (or generate) in a single directory all and only the files that you want to archive. HTAR processes directory names recursively by default, so, if you specify only the relevant directory name on HTAR's execute line, HTAR will (internally) archive every file within the directory without any filter-induced length problems. For example,
htar -cf test.tar projdir
will successfully archive any number of files within the "projdir" directory (and use no shell-mediated file-name generation to do it).
The UNIX find utility is designed to produce lists of files (that meet specified criteria) to feed into other programs for further processing, and so offers a second way for HTAR to archive very large numbers of files without a very long execute line. Indirection is required for success, however: pipe find's output into HTAR's standard input by invoking HTAR's -L option with a hyphen (-) argument instead of a file name. The correct sequence is:
find . -name 'a*' -print | htar -cf test.tar -L -
The find -name option generates the list of matching names internally without expanding find's execute line, and the use of the metacharacter * in the execute line does not pose the same too-long problem as it did originally in HTAR's execute line because the surrounding quotes shelter the filter from shell processing. If you need to keep the list of input names (for verification or audit purposes, for example), you could break this single line into two equivalent steps mediated by a helper file (here called "alist") that you preserve.
find . -name 'a*' -print > alist
htar -cf test.tar -L alist
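The find-and-pipe approach can also be scripted in Python. In this sketch, find_names approximates find ROOT -name PATTERN -print by matching base names with fnmatch (it sorts its output, whereas find uses directory order), and archive_list feeds the names, one per line, to htar -cf ARCHIVE -L - on standard input. Both names are hypothetical helpers; archive_list assumes htar is on PATH and HPSS is available, and names containing newlines would break the one-name-per-line list.

```python
import fnmatch
import os
import subprocess


def find_names(root, pattern):
    """Approximate `find ROOT -name PATTERN -print` (sorted; root itself not tested)."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if fnmatch.fnmatchcase(name, pattern):
                hits.append(os.path.join(dirpath, name))
    return sorted(hits)


def archive_list(archive, paths):
    """Send `paths` to `htar -cf ARCHIVE -L -` on standard input.

    ASSUMPTIONS: htar is on PATH and HPSS is up; one name per line, so
    names containing newlines are not supported.
    """
    listing = "".join(path + "\n" for path in paths)
    subprocess.run(["htar", "-cf", archive, "-L", "-"],
                   input=listing, text=True, check=True)
```

Usage: archive_list("test.tar", find_names(".", "a*")). To keep the list for auditing (the alist variant), write the same newline-separated text to a file and pass -L alist instead.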
HTAR Options
Action Options
One of these action options is required every time that you run HTAR.
-c
(create) opens a connection to storage, creates an archive file at the storage location (not online) and with the name specified by -f, and transfers (copies) into the archive each file specified by filelist (required whenever you use -c). If archivename already exists, HTAR overwrites it without warning. To create a local archive file instead (the way TAR does), also use -E. If filelist specifies any directories, HTAR includes them and all of their children recursively. Use -P with -c to automatically create all needed subdirectories along the archive path name.
-D
(soft delete) opens a connection to storage and reads the existing index file, creating a new temporary index file in the local file system and marking each of the specified member files as deleted in the new index file. It then replaces the existing index file with the new temporary copy.
-K
(verify) opens a connection to storage, verifies the index file for the archive that you specify with -f, then uses the index file to verify every entry in (member of) the archive file itself. The default responses from -K appear very quickly and overwrite one another, so you may be able to read only the last one ("HTAR successful," if it is). If the index file is missing for an archive, -K reports the error message "no such file archivename.idx." If you combine -K with -v, HTAR lists the name of each file that it finds in the specified archive in alphabetical order, one per line, along with the size of each in bytes and in blocks (excluding the consistency file), then gives a total file count.
-t
(table of contents) opens a connection to storage, then lists the files currently within the stored archive file specified by -f, along with their owner, size, permissions, and modification date (the list includes HTAR's own consistency file). Here filelist defaults to * (all files in the archive), but you can specify a more restrictive subset (usually by making filelist a filter).
-U
(undelete) restores the specified member files that were previously soft-deleted with -D, by removing the deleted flag from their index file entries.
-x
(extract) opens a connection to storage, then transfers (extracts, copies) from the stored (remote) archive file specified by -f each internal file specified by filelist (or all files in the archive if you omit filelist). If filelist specifies any directories, HTAR extracts them and all their children recursively. If any file already exists locally, HTAR overwrites it without warning, and it creates all new files with the same owner and group IDs (and if you use -p, with the same UNIX permissions) as they had when stored in the archive. (If you lack needed permissions, extracted files get your own user and group IDs and the local UMASK permissions; if you lack write permission then -x creates no files at all.) Note that -x works directly on the remote archive file; you never retrieve the whole archive from storage just to extract a few specified files from within it.
-X
(index) opens a connection to storage, then creates an (external) index file for the existing archive file specified by -f (a stored TAR format file by default or a local TAR-format file if you also use -E). Using -X rescues an HTAR archive whose (stored) index file was lost, and it enables HTAR to manage an archive originally created by traditional TAR. The resulting external index file is stored if the corresponding archive is stored, but local if the archive is local (with -E). See the How HTAR Works section for an explanation of HTAR index files.
Archive Option
This option is required every time you run HTAR unless you only use the -? option.
-f archivename
(required option) specifies the archive file on which HTAR performs the action options -c|t|x|D|K|U|X. HTAR has no default for -f (whose argument must appear immediately after the option name). Because HTAR operates on stored archive files, archivename also locates the archive file relative to your HPSS home directory: a simple file name here (e.g., abc.tar) resides in your storage home directory, while a relative pathname (e.g., xyz/abc.tar) specifies a subdirectory of your storage home directory (i.e., /users/unn/username/xyz/abc.tar). Never use tilde (~) in archivename. By itself, -f makes no subdirectories; you must have created them in advance or use -P with -c.
Control Options
Control options change how HTAR behaves, but they are not required. Default values are indicated when they exist.
-?
displays a short syntax summary of the HTAR execute line and a one-line description of each option. Users running HTAR under some shells may need to protect the question mark by using the three-character string -\? to display this help message.
-B
adds block numbers to the listing (-t) output.
-d debuglevel
(default is 0) sets the level of debug output from HTAR to an integer from 0 through 5, where 0 disables debug information for normal use and 1 to 5 enable progressively more elaborate debug output.
-E
emulates TAR by forcing the archive file to reside on the local machine (where you run HTAR) rather than in HPSS (storage), where it resides by default (-f always specifies the archive pathname, which -E interprets as local rather than remote). The HTAR index file goes into the same (local) directory as the archive. Option -P works with -E.
--exclude options
Note: The exclude family of options only applies to creation of new archive files. These options are applied during the initial directory scan. The specification (but not the code) for these options was based on the "exclude" feature of the popular GNUTAR program. However, there is no guarantee that the HTAR exclude feature will operate exactly the same as the GNUTAR exclude feature.
Files and directories that are excluded from the archive are not listed by default. To enable listing of excluded files, create an .htarrc file in your home directory with the following contents:
DisplayExcludedObjects = yes
--exclude=pattern
causes htar to recursively avoid including files or directories whose names match the shell wildcard pattern. Multiple --exclude options may be given.
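The effect of --exclude during the initial directory scan can be modeled in Python. This is an approximation, not HTAR's matcher: it tests each base name against the shell patterns with fnmatch and prunes excluded directories so that nothing beneath them is visited, whereas GNU tar (and possibly HTAR) also matches patterns that contain '/'. scan is an illustrative name.

```python
import fnmatch
import os


def _excluded(name, patterns):
    """True if a base name matches any shell wildcard pattern."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in patterns)


def scan(root, exclude=()):
    """Return relative paths under `root` that survive the exclude patterns.

    Approximation: patterns are tested against base names only, and an
    excluded directory is pruned so nothing beneath it is visited.
    """
    kept = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if not _excluded(d, exclude))
        files = [f for f in filenames if not _excluded(f, exclude)]
        for name in dirnames + files:
            kept.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(kept)
```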
--exclude-from=file
causes htar to read a list of shell patterns from file to be recursively excluded.
Note: A frequent error that can be hard to find is whitespace characters after a name read from the file. Empty lines, however, are allowed.
Multiple --exclude-from options may be given.
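Because trailing whitespace silently becomes part of a pattern, it can help to lint an --exclude-from file before use. This Python sketch (read_exclude_file is a hypothetical helper) skips empty lines, keeps each pattern exactly as written, and reports any line ending in whitespace; it opens the file with newline="" so that a stray carriage return from a Windows-edited file is also reported, on the assumption that HTAR would treat it as part of the pattern.

```python
def read_exclude_file(path):
    """Read --exclude-from patterns and flag lines that end in whitespace.

    Returns (patterns, warnings). Empty lines are skipped. newline="" keeps
    a trailing carriage return visible, since HTAR would presumably treat it
    as part of the pattern (an assumption).
    """
    patterns, warnings = [], []
    with open(path, encoding="utf-8", newline="") as fh:
        for lineno, raw in enumerate(fh, start=1):
            line = raw.rstrip("\n")
            if not line.strip():
                continue  # blank lines are harmless
            if line != line.rstrip():
                warnings.append(f"line {lineno}: trailing whitespace in {line!r}")
            patterns.append(line)
    return patterns, warnings
```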
--exclude-vcs-ignores
Before htar archives a new directory found during the prescan, it looks to see if the directory contains any of the following files:
.cvsignore, .gitignore (R), .bzrignore (R), .hgignore (R)
If so, it reads patterns from the file and ignores objects that match any of the patterns.
It treats each file the way the corresponding version control system would: patterns from the files marked (R) apply recursively, starting at the new directory, while the others apply only to the new directory. Patterns in .bzrignore and .hgignore files can be either shell globbing patterns or regular expressions, and .bzrignore and .hgignore files can also contain comments whose first character is ’#’.
--exclude-ignore=file
Before scanning a new directory, htar checks whether it contains file. If so, it reads exclusion patterns from file; these patterns apply only to the new directory.
--exclude-ignore-recursive=file
This is the same as --exclude-ignore, except that the patterns apply recursively to the new directory and to all of its subdirectories.
--exclude-vcs
Excludes files and directories used by the following version control systems:
CVS, RCS, SCCS, Git, Subversion, Arch, Bazaar, Mercurial, and Darcs. This includes all of the following files and directories:
- ‘CVS/’, and everything under it
- ‘RCS/’, and everything under it
- ‘SCCS/’, and everything under it
- ‘.git/’, and everything under it
- ‘.gitignore’
- ‘.cvsignore’
- ‘.svn/’, and everything under it
- ‘.arch-ids/’, and everything under it
- ‘{arch}/’, and everything under it
- ‘=RELEASE-ID’
- ‘=meta-update’
- ‘=update’
- ‘.bzr’
- ‘.bzrignore’
- ‘.bzrtags’
- ‘.hg’
- ‘.hgignore’
- ‘.hgtags’
- ‘_darcs’
--exclude-backups
causes htar to exclude backup and lock files that match the following shell globbing patterns (with the quotes removed): ".#*" "*~" "#*#"
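The fixed lists behind --exclude-vcs and --exclude-backups can be checked against your own tree with a short Python sketch. The name set and patterns are copied from the two lists above; is_vcs and is_backup are illustrative helpers, and they are not guaranteed to match HTAR's own tests exactly (for example, is_vcs assumes '/'-separated relative paths).

```python
import fnmatch

# Names from the --exclude-vcs list above; a match excludes everything under it.
VCS_NAMES = frozenset({
    "CVS", "RCS", "SCCS", ".git", ".gitignore", ".cvsignore", ".svn",
    ".arch-ids", "{arch}", "=RELEASE-ID", "=meta-update", "=update",
    ".bzr", ".bzrignore", ".bzrtags", ".hg", ".hgignore", ".hgtags", "_darcs",
})
# Shell globbing patterns from --exclude-backups above.
BACKUP_PATTERNS = (".#*", "*~", "#*#")


def is_vcs(path):
    """True if any component of a '/'-separated relative path is a VCS name."""
    return any(part in VCS_NAMES for part in path.split("/"))


def is_backup(name):
    """True if a file name matches an --exclude-backups pattern."""
    return any(fnmatch.fnmatchcase(name, pat) for pat in BACKUP_PATTERNS)
```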
--exclude-caches options
causes htar to exclude directories that contain a standard CACHEDIR.TAG file, in the form specified by http://www.brynosaurus.com/cachedir/spec.html
There are 3 variations of the exclude-caches option, each with slightly different semantics:
- --exclude-caches – do not archive the contents of the directory, but archive the directory itself and the CACHEDIR.TAG file
- --exclude-caches-under – do not archive the contents of the directory, nor the CACHEDIR.TAG file, archive just the directory itself
- --exclude-caches-all – entirely omit directories containing the CACHEDIR.TAG file
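The CACHEDIR.TAG convention referenced above requires the tag file to begin with a fixed 43-byte signature line. This Python sketch checks for that signature (is_cache_dir) and creates a tag (mark_cache_dir) so that tag-aware tools skip the directory. Whether HTAR checks the signature or only the file name is an assumption here, based on the phrase "in the form specified by" above; both helper names are illustrative.

```python
import os

# First 43 bytes required by the Cache Directory Tagging specification.
# ASSUMPTION: HTAR's --exclude-caches checks this signature, as GNU tar does.
CACHEDIR_SIGNATURE = b"Signature: 8a477f597d28d172789f06886806bc55"


def is_cache_dir(path, tag="CACHEDIR.TAG"):
    """True if `path` contains a tag file that starts with the signature."""
    try:
        with open(os.path.join(path, tag), "rb") as fh:
            return fh.read(len(CACHEDIR_SIGNATURE)) == CACHEDIR_SIGNATURE
    except OSError:
        return False  # missing or unreadable tag file: not a cache directory


def mark_cache_dir(path):
    """Write a CACHEDIR.TAG into `path` so tag-aware tools can skip it."""
    with open(os.path.join(path, "CACHEDIR.TAG"), "wb") as fh:
        fh.write(CACHEDIR_SIGNATURE + b"\n")
        fh.write(b"# This file is a cache directory tag.\n")
```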
--exclude-tag options
is a generalization of the --exclude-caches option that lets you specify the file name to look for (instead of CACHEDIR.TAG).
- --exclude-tag=file – do not archive the contents of the directory, but archive the directory itself and file
- --exclude-tag-under=file – do not archive the contents of the directory, nor file, archive just the directory itself
- --exclude-tag-all=file – entirely omit directories containing file
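The difference among the three variants can be modeled on a real directory tree in Python. In this sketch (scan_with_tag is an illustrative name, not HTAR code), a subdirectory counts as tagged if it directly contains a file named tag, and the scan never descends into it: mode "tag" keeps the directory and the tag file, "tag-under" keeps only the directory entry, and "tag-all" drops the directory entirely. It checks only that the tag file exists, so for the --exclude-caches family a signature check would also be needed.

```python
import os

MODES = ("tag", "tag-under", "tag-all")  # --exclude-tag, -tag-under, -tag-all


def scan_with_tag(root, tag, mode):
    """Return relative paths kept under `root` for one --exclude-tag variant.

    Only subdirectories of `root` are checked for the tag file, and the scan
    never descends into a tagged directory.
    """
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    kept = []

    def visit(rel):
        for name in sorted(os.listdir(os.path.join(root, rel))):
            child_rel = os.path.join(rel, name)
            child = os.path.join(root, child_rel)
            if not os.path.isdir(child) or os.path.islink(child):
                kept.append(child_rel)
            elif os.path.isfile(os.path.join(child, tag)):
                if mode != "tag-all":
                    kept.append(child_rel)  # keep the directory entry itself
                if mode == "tag":
                    kept.append(os.path.join(child_rel, tag))  # and the tag file
            else:
                kept.append(child_rel)
                visit(child_rel)

    visit("")
    return kept
```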
-h
(used only with -c; has no effect otherwise) for each symbolic link that it encounters, causes HTAR to replace the link with the actual contents of the linked-to file (stored under the link name, not under the file's original name). Later use of -t or -x treats the linked-to file as if it had always been present as an actual file with the link name. Without -h, HTAR records, reports, and restores every symbolic link overtly, but it does not replace the link with the linked-to contents.
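Python's tarfile module exposes the same choice through its dereference flag, which makes a compact way to see what -h changes. In this sketch (archive_path is an illustrative name), follow=False records the symbolic link itself, and follow=True stores the linked-to file's contents under the link's name, analogous to htar -h; it builds an in-memory TAR archive and does not involve HTAR.

```python
import io
import tarfile


def archive_path(path, arcname, follow):
    """Archive one path in memory and return its member header.

    follow=False records a symbolic link as a link; follow=True stores the
    linked-to file's contents under `arcname`, analogous to htar -h.
    """
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT,
                      dereference=follow) as tar:
        tar.add(path, arcname=arcname)
    buf.seek(0)
    with tarfile.open(fileobj=buf) as tar:
        return tar.getmember(arcname)
```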
-H subopt[:subopt...]
specifies a colon-delimited list of HTAR suboptions to control program execution. Possible subopt values include:
acct=id/acctname
specifies the numeric account ID or alphabetic account name to use for the current HTAR run. This option is only meaningful for HPSS-resident archives.
cix
used with the extract (-x) operation with HPSS-resident archives. If specified, precopies the index file to a temporary local file before reading the archive file. This option is normally not needed, but was added to avoid problems that were encountered with multithreaded I/O on some hardware platforms.
crc
enables generation of Cyclic Redundancy Checksums (CRCs) when copying member files into the archive and when verifying the contents of the archive (-K command line option, or -Hverify option for creates). Enabling checksums usually degrades HTAR's I/O performance and increases its CPU utilization.
exfile=path
specifies a path name to an "exceptions" file, which contains a list of failed member files and an explanation of the failure. Note: This option is currently implemented only for the GPFS/HPSS Interface (GHI).
family=id[,index_id]
specifies tape file family ID to use when creating HPSS-resident archive files, and, optionally, the family ID to use when creating the index file. This option is useful at sites which make use of the HPSS "file family" capability. Family ID 0, which is the default, uses the default pool of tapes. Contact your HPSS administrator to determine the file families that are available at your site.
nocfchk
causes HTAR to disable the verification of the index file and the consistency file. Use of this option can avoid extra tape mounts if the consistency file lives on a different tape cartridge than the specified member file(s). Currently, this option is only effective for the -D (soft delete) action.
nocrc
(the default) disables generation of CRCs when creating files and when extracting files from or verifying existing archive files.
nostage
avoids prestaging tape-resident (stored) archive files when HTAR performs -x or -X actions.
port=x
specifies the TCP port number to use when HTAR connects to the remote HPSS server. This parameter is used only in conjunction with the -Hserver parameter.
relpaths
used with the verify (-K) action. When comparing member files in the archive file with local files, forces relative local file paths to be used by removing any leading "/" from the member file path name before attempting to read it in the local file system.
rmlocal
removes local member files after HTAR has successfully written both the archive file and the index file (used with -c).
server=host
specifies the hostname or TCP/IP address of the HPSS server. The HPSS administrator defines the default server host or IP address when HTAR is built. The -Hport parameter (see above) can be used in conjunction with this option to completely specify the connection address to be used.
tss=stack_size
specifies the thread stack size to use when HTAR creates threads to read local files during a create (-c) operation. In most cases the system default is adequate, but HTAR thread creation can fail when the default thread stack size is set very large (for example, on machines tuned for compute-intensive work). stack_size can be specified in bytes, or in kilobytes or megabytes by appending a case-insensitive suffix (k, kb, m, or mb).
umask=octal_mask
used with the -c option. This specifies the HPSS umask value to be set during HTAR startup. This impacts the permissions that are set on the resulting archive and index files that HTAR creates in the same manner as the Unix umask command.
verify=option[,option,..]
specifies one or more verification options to perform after successful creation of the archive (-c), or for the verify (-K) command. Separate multiple options with commas and no whitespace. Options are processed from left to right; when options conflict, the last one encountered is used without comment. Each option can be an individual item, the keyword "all", or a numeric level of 0, 1, or 2. Each numeric level includes all of the checks of the lower levels and adds more. The verification options are:
all
enables all possible verification options except paranoid.
info
reads and verifies the tar-format headers that precede each member file in the archive.
crc|nocrc
enables or disables recalculation of the cyclic redundancy checksum (CRC) and verification that it matches the value that is stored in the index file. Note that this option only applies if the -Hcrc option was specified, which causes a CRC to be generated for each member file as it is copied into the archive file.
compare|nocompare
enables or disables byte-by-byte comparison of the local member files with the corresponding archive files. If -Hrelpaths is not specified, then absolute paths for member files in the archive will also be treated as absolute local paths.
paranoid|noparanoid
enables or disables (the default) extreme efforts to detect problems, such as checking whether local files were modified during archive creation before deleting them when -Hrmlocal authorizes deletion.
0|1|2
0 enables the "info" verification. 1 enables level 0 and "crc" (i.e., info,crc). 2 enables level 1 and "compare" (i.e., info,crc,compare). It is also possible to specify a verification option such as "all" or a numeric level such as 0, 1, or 2, and then selectively disable one or more options.
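The left-to-right rules above are easy to misread, so here is a minimal sketch of how a verify= value resolves to a set of checks. It models only the options listed in this section and assumes two things: that `nocrc` is the CRC-disabling spelling, and that numeric levels only enable checks and never disable them. `htar_verify_set` is a hypothetical helper, not part of HTAR.

```sh
# Hypothetical helper (not part of HTAR): print the checks that a
# -Hverify=... value enables, applying options from left to right so the
# last conflicting option wins.
htar_verify_set() {
    info=0 crc=0 compare=0 paranoid=0
    old_ifs=$IFS
    IFS=,                       # split the argument on commas
    for opt in $1; do
        case $opt in
            0)          info=1 ;;
            1)          info=1 crc=1 ;;
            2|all)      info=1 crc=1 compare=1 ;;   # "all" excludes paranoid
            info)       info=1 ;;
            crc)        crc=1 ;;
            nocrc)      crc=0 ;;
            compare)    compare=1 ;;
            nocompare)  compare=0 ;;
            paranoid)   paranoid=1 ;;
            noparanoid) paranoid=0 ;;
            *) IFS=$old_ifs; echo "unknown verify option: $opt" >&2; return 1 ;;
        esac
    done
    IFS=$old_ifs
    out=
    if [ "$info" = 1 ]; then out="$out,info"; fi
    if [ "$crc" = 1 ]; then out="$out,crc"; fi
    if [ "$compare" = 1 ]; then out="$out,compare"; fi
    if [ "$paranoid" = 1 ]; then out="$out,paranoid"; fi
    printf '%s\n' "${out#,}"
}
```

Under this model, for example, 2,nocompare leaves info and crc enabled. The crc check is meaningful only if the archive was created with -Hcrc.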
-I indexname
specifies a nondefault name for the HTAR external index file that supports the archive specified by -f.
WARNING: if you use -I to give the index a nondefault name (any of the three cases below) when you create (-c) an archive, then you MUST also use -I with the same argument every time you extract (-x) files from that archive. Otherwise HTAR looks for the default index, does not find it, and ends with an error.
There are three cases based on the first character of indexname:
. (dot)
If indexname begins with a period (dot), HTAR treats it as a suffix to append to the current archive name.
Example: -I .xnd yields an index file called archivename.xnd
/
If indexname begins with a / (slash), HTAR treats it as an absolute path name (you must create all the subdirectories ahead of time with the FTP or HSI mkdir command).
Example: -I /users/unn/yourname/projects/text.idx uses that absolute path name in storage (HPSS) or the local file system (-E) or remote file system (-F) for the index file.
other
If indexname begins with any other character, HTAR treats it as a relative path name (relative to the storage directory where the archive file resides, which might be different from your storage home directory).
Example: -I projects/first.index locates first.index at storagehome/projects/first.index if the archive file is in your storage home directory (the default), but looks for it at storagehome/projects/projects/first.index if the archive was specified as -f projects/aname. (All such subdirectories must be created in advance, or the -P command line option must be specified to create any missing intermediate subdirectories.)
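The three naming cases, together with the default name seen in the Rebuilding a Missing Index example (archivename.idx), can be summarized as a small path-resolution function. This is a sketch of the rules as described here, with relative results taken relative to your storage home directory. `htar_index_path` is a hypothetical helper, not part of HTAR.

```sh
# Hypothetical helper (not part of HTAR): print where HTAR would look for the
# index file, given the -f archive path ($1) and the -I argument ($2, empty
# when -I is not used). Relative results are relative to the storage home.
htar_index_path() {
    archive=$1 index=$2
    case $index in
        '') printf '%s.idx\n' "$archive" ;;          # default: archivename.idx
        .*) printf '%s%s\n' "$archive" "$index" ;;   # suffix appended to archive name
        /*) printf '%s\n' "$index" ;;                # absolute path, used as-is
        *)  # relative to the directory that holds the archive file
            case $archive in
                */*) printf '%s/%s\n' "${archive%/*}" "$index" ;;
                *)   printf '%s\n' "$index" ;;
            esac ;;
    esac
}

# Usage: htar_index_path projects/aname projects/first.index
```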
-L inputfile
(used with -c) writes the files and directories specified by their literal names (in the inputfile, which contains file names one per line) into the archive specified by -f. Directories are treated recursively; a directory entry and its subdirectories or subfiles are all written to the archive. Normal metacharacters (tilde, asterisk, question mark) are treated literally, not expanded as filters. Replace inputfile with a hyphen (-L -) for HTAR to read the list of file names from standard input; the HTAR Limitations section shows how to use this technique.
(used with -x) retrieves the files and directories specified by their literal names. See the Retrieving Files example below for how to use -L instead of wild cards to retrieve only specified files from a stored archive.
WARNING: HTAR's -L differs from both AIX TAR's -L (which handles directories nonrecursively) and Linux TAR's -L (which changes tapes).
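Because -L - reads literal names from standard input, the file list can come directly from another command instead of from shell wildcard expansion. This can help, for example, when the list is too long for a single command line. This is a minimal sketch that assumes one name per line, so names must not contain newlines. `archive_by_pattern` and the `HTAR_PROG` override are hypothetical names, introduced only so the wrapper can be tried without HPSS.

```sh
# Hypothetical wrapper (not part of HTAR): archive every regular file under
# directory $2 whose name matches the find pattern $3, passing the literal
# names to HTAR on standard input with -L -.
# HTAR_PROG is a variable of this sketch only; it defaults to "htar".
archive_by_pattern() {
    archive=$1 dir=$2 pattern=$3
    find "$dir" -type f -name "$pattern" |
        "${HTAR_PROG:-htar}" -cvf "$archive" -L -
}

# Usage: archive_by_pattern case3/logs.tar . '*.log'
```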
-m
(used only with -x; applies only to files) makes the time of extraction the last-modified time for each member file (the default preserves each file's original time of last modification). For directories, HTAR itself always preserves the original modification time for top-level directories that it copies from an archive, even if you invoke -m. However, subsequently creating subdirectories or files within a directory may cause the operating system to change the modification time on one or more directories (so that it too appears to be the time of extraction).
-M maxfiles
(default is 10,000,000 at LLNL) specifies the maximum number of member files allowed when you use -c to create an HTAR archive. Internal limits are set when HTAR is compiled at each site; at LLNL, you can increase maxfiles as high as 50,000,000.
-n timeinterval
(used only with -c; has no effect otherwise) includes in a new archive only those files that meet your other naming criteria and that were created or modified within the last timeinterval (that is, between the start of timeinterval and now). Option -n is intended mostly to simplify the creation of incremental backup archives. Here timeinterval can have the form:
d
an integer that specifies days (e.g., 5 for 5 days), or
:h
an integer, preceded by a colon, that specifies hours (e.g., :12 for 12 hours), or
d:h
a pair of integers that specify days and then hours (e.g., 1:6 for 1 day and 6 hours).
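All three timeinterval forms reduce to a number of hours. Below is a minimal sketch of that conversion; `htar_interval_hours` is a hypothetical helper. The find command in the usage comment only approximates HTAR's selection, because find's -mmin test looks at modification times, not creation times.

```sh
# Hypothetical helper (not part of HTAR): convert an -n timeinterval
# ("d", ":h", or "d:h") into hours.
htar_interval_hours() {
    case $1 in
        :*)  d=0 h=${1#:} ;;
        *:*) d=${1%%:*} h=${1#*:} ;;
        *)   d=$1 h=0 ;;
    esac
    # Reject empty or non-numeric parts (e.g. "5:", "x", "1:2:3").
    case $d in ''|*[!0-9]*) echo "bad timeinterval: $1" >&2; return 1 ;; esac
    case $h in ''|*[!0-9]*) echo "bad timeinterval: $1" >&2; return 1 ;; esac
    echo $(( d * 24 + h ))
}

# Usage: preview files modified within "-n 1:6":
# find . -type f -mmin -"$(( $(htar_interval_hours 1:6) * 60 ))"
```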
-o
(lowercase, used only with -x) (default for all nonroot users) causes the extracted files to take on the user and group IDs (UID, GID) of the person running HTAR, not those recorded in the archive. This makes a difference for root users but not for ordinary HTAR users.
-O
(uppercase, used only with -x, mimics the Linux TAR --to-stdout option) writes the file(s) extracted from an archive (with -x) to standard output (and hence to a UNIX pipe for postprocessing, if you wish). The HTAR Limitations section shows how to use this technique. Because HTAR does not separate files in the output stream, -O is usually useful only when you extract a single file.
-p
preserves all UNIX permission fields (on extracted files) in their original modes, ignoring the present UMASK (the default changes the permissions to the local UMASK where HTAR extracts the files). Root users can also preserve the setuid, setgid, and sticky bit permissions with this option.
-P
(used only with -c; has no effect otherwise) automatically creates all intermediate subdirectories specified on the archive file's path name if they do not already exist. HTAR's -P thus works like mkdir -p. You can use -P with archives created in HPSS (storage, the default) or on your local machine (with -E).
-q
(quiet mode) suppresses most HTAR informational messages, such as its usual interactive progress reports as it creates an archive file.
-S bufsize
(default is 16 Mbyte) specifies the buffer size to use when HTAR reads from or writes to an HPSS archive file. Here bufsize can be a plain integer (interpreted as bytes), an integer suffixed by k, K, kb, or KB for kilobytes, or an integer suffixed by m, M, mb, or MB for megabytes (e.g., 16mb). -S is intended mostly for LC staff, not ordinary HTAR users.
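The accepted bufsize spellings map to bytes as in the sketch below. It assumes binary multiples (1 KB = 1,024 bytes, 1 MB = 1,048,576 bytes), which this section does not state. `htar_bufsize_bytes` is a hypothetical helper, not part of HTAR.

```sh
# Hypothetical helper (not part of HTAR): convert a -S bufsize string such
# as 16mb, 512K, or 1048576 into bytes. Assumes binary (1024-based) units.
htar_bufsize_bytes() {
    n=${1%%[!0-9]*}       # leading digits
    suffix=${1#"$n"}      # whatever follows them
    if [ -z "$n" ]; then
        echo "bad bufsize: $1" >&2
        return 1
    fi
    case $suffix in
        '')        mult=1 ;;
        k|K|kb|KB) mult=1024 ;;
        m|M|mb|MB) mult=1048576 ;;
        *) echo "bad bufsize: $1" >&2; return 1 ;;
    esac
    echo $(( n * mult ))
}
```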
-T maxthreads
specifies the maximum number of threads that HTAR will use to copy member files to or from the archive file (default varies from 5 to 20 threads). This value is ignored when extracting member files from an archive (-x). HTAR reports the actual number of threads used on each run if you invoke -v or -V. HTAR creates a maxthreads pool of threads and then uses buffer size (see -S), average member file size, and HPSS network transfer rates to estimate how many threads to actually deploy. Normally, the smaller the member file size, the more threads can be active when creating files. For small files, setting -T to a larger number (up to 100 has been tested) can dramatically improve the transfer rates if the operating system is able to support the load.
-U
undeletes soft-deleted member files (see -D above) by copying the existing index file to a temporary local file, removing the deleted flag in the specified index entries along the way, and then rewriting the temporary index to the same location.
-V
requests "slightly verbose" reporting of file-transfer progress (often very brief, overwritten messages to the terminal). Do not use with -v.
-v
requests "very verbose" reporting of file-transfer progress. For each member file transferred to an archive, HTAR prints A (added) and its name on one line; for each member file extracted from an archive, HTAR prints X, its name, and its size on a line, along with a summary of the whole transfer at the end. For each file added during a build index (-X) operation, HTAR prints i and its name. For each file verified during a verify operation (-K), HTAR prints v (or V if comparing archive and local file contents), its name, and a trailing / if this is a directory. For each file that is soft-deleted during a delete (-D) operation, HTAR prints d; similarly, for an undelete (-U) operation, HTAR prints u. Do not use with -V.
-w
(works only with -x, -D, or -U, not with -c) lists, one by one, each member file to be processed (extracted, soft-deleted, or undeleted) and prompts you for a confirmatory action, where possible responses are:
y[es]
acts on the named file (extracts, deletes, or undeletes it).
n[o]
skips the named file.
a[ll]
acts on the named file and on all remaining (not yet processed) selected files too.
q[uit]
skips the named file and stops prompting. HTAR ends.
-Y auto | [archiveCOS][:indexCOS]
specifies the HPSS class of service (COS) for each stored archive and its corresponding index file. The default is auto: archives smaller than 256 MB are automatically assigned a COS that keeps two separate copies on separate storage tapes. You can specify a nondefault COS for the archive, the index, or both (e.g., -Y 120:110), but this is usually undesirable except when testing new HPSS features or devices (if your archive grows beyond the size allowed by a nondefault COS, HPSS stops the transfer and HTAR ends with an error). Use -Y dualcopy to request dual-copy storage of any mission-critical archive, of any size, for extra safety. Using -Y overrides the HTAR_COS environment variable. NFT's DIR command with the -h option reports the COS for stored files (in output column 3), while NFT's SETCOS command offers another way to specify the storage class of service.
HTAR Examples
Creating an HTAR archive file
Goal
To create an HTAR archive file in a subdirectory of your storage home directory and use a filter to install several files within that stored archive.
Strategy
1. One HTAR execute line can perform all of the desired tasks quickly and in parallel:
- The -cvf options create (c) an archive, verbosely (v) report the incoming files, and (f) name the envelope file.
- The relative pathname case3/myproject.tar locates the archive (myproject.tar) in preexisting subdirectory case3 of your storage home directory (omitting case3/ leaves the archive at the top level of your storage home directory). HTAR will not create case3 by default, however; you must either have previously used FTP's or HSI's mkdir command or else you must invoke -P to explicitly request creation of all needed subdirectories along the archive pathname.
- File filter tim* selects all and only the files whose names begin with tim (in the directory where you run HTAR) to be stored in the archive.
2. HTAR opens a preauthenticated connection to your storage (HPSS) home directory and reports its housekeeping activities (very quickly, in lines that overwrite, so you may not notice all of these status reports on your screen).
3. HTAR creates your requested archive and uses parallel connections (but not the PFTP client) to move your requested files directly into it. Directories are handled recursively and directory names (if any) appear with a slash (/) appended to identify them.
4. The last incoming file that HTAR reports is always the 256-byte consistency file by which HTAR coordinates your archive with its external index file.
5. HTAR summarizes the work done (time, rate, amount, thread count), then copies into storage the index file that it made, destroys the local version, and ends.
htar -cvf case3/myproject.tar tim*                      ---(1)
HTAR: Opening HPSS server connection                    ---(2)
HTAR: Getting HPSS site info
HTAR: Writing temp index file to /usr/tmp/aaamva09A
HTAR: creating HPSS Archive file case3/myproject.tar    ---(3)
HTAR: a tim1.txt
HTAR: a tim2.txt
HTAR: a tim2a.txt
HTAR: a tim3.a
HTAR: a time.txt
HTAR: a time2.gif
HTAR: a /tmp/HTAR_CF_CHK_13805_997722535                ---(4)
HTAR: Create complete for case3/myproject.tar. 399,360 bytes written for 6 member files, max threads: 8 Transfer time: 0.555 seconds (7.257 MB/s)   ---(5)
HTAR: Copying Index File to HPSS...Creating file
Retrieving Files from within an Archive
Goal
To retrieve several files from within an existing stored HTAR archive file (without retrieving the whole archive first).
HTAR does not process metacharacters (file filters such as *) itself, but leaves them for the shell to expand and compare with file names in your local directory. Hence, you CANNOT use * to select a subset of already archived files to retrieve. For example, "natural" execute lines
htar -xvf case3/myproject.tar time*       [WRONG]
htar -xvf case3/myproject.tar 'time*'     [WRONG]
both FAIL to select (and hence to retrieve) any stored files from the myproject.tar stored archive (each yields its own set of error messages). The unquoted line works only by accident, if you happen to have files with the same names in both your local directory and your stored archive (unlikely except when you are just testing HTAR).
Work-Arounds
A. Type the name of each file that you want to retrieve (at the end of the HTAR execute line).
B. If you have a long list of files to retrieve, or if you plan to reuse the same retrieval list often, put the list of sought files into a file and use HTAR's -L option to invoke that list. You can use HTAR's -t (reporting) option to help generate that retrieval list by reporting all the files you have archived and then editing that report to include only the relevant file names to retrieve. For instance,
htar -tf case3/myproject.tar > hout
grep 'time' hout | cut -c 50-80 > tlist
captures the list of all your stored files in the local file hout, then keeps only the lines that contain the string time and cuts out columns 50-80 (where the member names are assumed to appear) for use with HTAR's -L option (here, in local file tlist). Check hout and adjust the column range if the names fall elsewhere. Note that HTAR automatically appends a slash (/) to every directory name that -t reports.
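Because cut -c depends on fixed column positions, a field-based filter can be less fragile. This is a minimal sketch that assumes each line of the saved -t report ends with the member name and that names contain no spaces. `select_members` is a hypothetical helper. Review the resulting list before passing it to -L.

```sh
# Hypothetical helper (not part of HTAR): print the last field of each line
# in a saved "htar -t" report ($2) when that field contains the literal
# string $1. Assumes member names are the last field and contain no spaces.
select_members() {
    awk -v pat="$1" 'NF > 0 && index($NF, pat) > 0 { print $NF }' "$2"
}

# Usage:
# htar -tf case3/myproject.tar > hout
# select_members time hout > tlist
# htar -xvf case3/myproject.tar -L tlist
```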
C. Use Hopper to run HTAR as a controllee, then select visually the files that you want to retrieve.
1. Once you have laid the groundwork above, a single HTAR execute line can retrieve your specified files quickly and in parallel from within your stored archive:
- The -xvf options request retrieval/extraction (x), verbosely (v) report the retrieved files, and (f) name the target archive.
- The relative pathname case3/myproject.tar locates the archive (myproject.tar) in preexisting subdirectory case3 of your storage home directory.
- The explicit file list (A) or name-containing file (B) selects all and only the files that you want (here, those whose names begin with time, a subset of all files stored in this archive in the previous example).
2. HTAR opens a preauthenticated connection to your storage (HPSS) home directory and reports its housekeeping activities (very quickly, in lines that overwrite, so you may not notice all of these status reports on your screen).
3. HTAR uses its external index to locate in the archive the (two) specific files that you requested and then it transfers them by using parallel connections (but not the PFTP client) to your local machine without retrieving the whole archive file.
4. HTAR summarizes the work done (time, rate, amount) and then ends.
htar -xvf case3/myproject.tar time.txt time2.gif      ---(A)
OR
htar -xvf case3/myproject.tar -L tlist                ---(B)
HTAR: Opening HPSS server connection                  ---(2)
HTAR: Reading index file
HTAR: Opening archive file
HTAR: Reading archive file                            ---(3)
HTAR: x time.txt, 1085 bytes, 4 media blocks
HTAR: x time2.gif, 3452 bytes, 8 media blocks
HTAR: Extract complete for case3/myproject.tar, 2 files. total bytes read 116,736 in 0.070 seconds (1.669 MB/s)   ---(4)
HTAR: HTAR SUCCESSFUL
Rebuilding a Missing Index
Goal
To rebuild the missing index file for a stored HTAR archive file and thereby (re)enable blocked access to the files within it (and extract some).
Strategy
1. You try to retrieve all files (-xvf) from the HTAR archive myproject.tar in the case3 subdirectory of your storage home directory.
2. But HTAR cannot find the external index file (here, called myproject.tar.idx) for this archive, and it returns a somewhat cryptic error message, retrieves no requested files, and ends. (File myproject.tar.idx may have been moved, renamed, or accidentally deleted from storage.)
3. So you execute HTAR again with the special action -X (uppercase, not lowercase, eks) to request rebuilding the external index for the (same) disabled archive.
4. HTAR opens a preauthenticated connection to your storage (HPSS) home directory, locates the archive in subdirectory case3, scans (but does not retrieve) its contents, and thereby creates a new myproject.tar.idx file (temporarily on local disk, then moved to the same storage directory as the archive file that it supports). HTAR ends.
5. Now you again try your original (1) file-retrieval request.
6. HTAR opens a preauthenticated connection to your storage (HPSS) home directory and reports its housekeeping activities (very quickly, in lines that overwrite, so you may not notice all of these status reports on your screen).
7. HTAR uses its (newly rebuilt) external index to locate the files within the archive and transfers them by parallel connections to your local machine (it transfers all of them because there is no filelist on the execute line).
8. HTAR summarizes the work done (time, rate, amount) and then ends.
htar -xvf case3/myproject.tar                                  ---(1)
HTAR: Opening HPSS server connection
HTAR: Getting HPSS site info
ERROR: Received unexpected reply from server: 550              ---(2)
ERROR: Error -1 getting Index File attributes...
HTAR: HTAR FAILED
###WARNING htar returned non-zero exit status. 72 = /usr/local/bin/htar.exe...

htar -Xf case3/myproject.tar                                   ---(3)
HTAR: Opening HPSS server connection                           ---(4)
HTAR: Reading archive
HTAR: Copying Index File to HPSS... creating file
HTAR: HTAR SUCCESSFUL

htar -xvf case3/myproject.tar                                  ---(5)
HTAR: Opening HPSS server connection                           ---(6)
HTAR: Reading index file
HTAR: Opening archive file
HTAR: Reading archive file                                     ---(7)
HTAR: x tim1.txt, 3503 bytes, 8 media blocks
HTAR: x tim2.txt, 4310 bytes, 10 media blocks
HTAR: x tim2a.txt, 5221 bytes, 12 media blocks
HTAR: x tim3a., 5851 bytes, 13 media blocks
HTAR: x time.txt, 1085 bytes, 4 media blocks
HTAR: x time2.gif, 3452 bytes, 8 media blocks
HTAR: Extract complete. total bytes read: 28,160 in 0.141 seconds (0.200 MB/s)   ---(8)
HTAR: HTAR SUCCESSFUL
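The recovery sequence in this example (extract, rebuild with -X on failure, extract again) can be scripted. This is a hedged sketch. A nonzero HTAR exit status does not by itself prove that the index is missing, and -X reads the whole archive, so a production script should confirm the cause before rebuilding. `extract_or_rebuild` and `HTAR_PROG` are hypothetical names introduced for illustration.

```sh
# Hypothetical wrapper (not part of HTAR): extract from archive $1 (any
# further arguments name member files); if that fails, rebuild the index
# with -X and retry the extraction once.
# HTAR_PROG is a variable of this sketch only; it defaults to "htar".
extract_or_rebuild() {
    archive=$1
    shift
    htar_prog=${HTAR_PROG:-htar}
    if "$htar_prog" -xvf "$archive" "$@"; then
        return 0
    fi
    echo "extract failed; rebuilding index for $archive" >&2
    "$htar_prog" -Xf "$archive" || return 1
    "$htar_prog" -xvf "$archive" "$@"
}

# Usage: extract_or_rebuild case3/myproject.tar time.txt time2.gif
```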