Format for Tape Aggregations
The formats below have been discussed; some of their pros and cons are listed:
- (GNU) TAR (link):
- Pros: Originally designed for tape archiving. Each file comes with a metadata header; files are aligned within blocks.
- Cons: Limited support for long file names (only via GNU extensions, which require an additional entry that carries the name in its payload). Not all metadata needed by CASTOR (such as checksums) can be accommodated in the headers. Alignment is on 512-byte blocks and, unlike the tar record size, the block size cannot be changed.
- CPIO (link):
- Pros: A relatively recent CPIO format revision comes with a CRC-based checksum (SVR4 CRC format). It also handles file names of up to 1024 characters out of the box.
- Cons: It cannot store payloads larger than 8 GB. It also cannot do any kind of block alignment for files.
- IL (link):
- Pros: Highly redundant and block aligned - given any block, it is clear which file it belongs to and at which location it is within the file. Format can be tailored to specific CASTOR needs.
- Cons: Homegrown format which requires further specification. Slight overhead in media consumption, as each block has a header. Requires special tools for reading and writing data.
- ZIP (link):
- Pros: Widely used open standard.
- Cons: The ZIP format uses a central file metadata section located at the end of the ZIP file. While this is not a problem for disk-based (random) access, it can be problematic for linear tape archiving: reading an individual file out of an aggregation requires reading the metadata at the end first, and therefore an additional tape seek/repositioning. There is no block alignment. The compression offered by ZIP is of no interest, as data compression is already done by the tape drive.
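Two of the layout points above (the fixed 512-byte alignment in TAR and the trailing central directory in ZIP) can be observed with a small sketch using only the Python standard library. This is an illustration, not part of any CASTOR tooling: the function names `tar_member_offsets` and `zip_central_directory` and the sample payloads are made up for this example, and the archives are built in memory rather than on tape.

```python
import io
import struct
import tarfile
import zipfile


def tar_member_offsets(payloads):
    """Build an in-memory GNU tar and return, per member,
    (name, header offset, data offset, size)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name, data in payloads.items():
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        return [(m.name, m.offset, m.offset_data, m.size) for m in tar.getmembers()]


def zip_central_directory(payloads):
    """Build an in-memory uncompressed ZIP and return
    (central directory offset, central directory size, archive size)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, mode="w", compression=zipfile.ZIP_STORED) as zf:
        for name, data in payloads.items():
            zf.writestr(name, data)
    raw = buf.getvalue()
    # With no archive comment, the end-of-central-directory record is the
    # last 22 bytes: signature, four 16-bit fields, the central directory
    # size and offset (32-bit each), and the comment length.
    sig, _, _, _, _, cd_size, cd_offset, _ = struct.unpack("<4s4H2LH", raw[-22:])
    if sig != b"PK\x05\x06":
        raise ValueError("end-of-central-directory record not found")
    return cd_offset, cd_size, len(raw)


if __name__ == "__main__":
    sample = {"a.dat": b"x" * 100, "b.dat": b"y" * 700}  # hypothetical payloads
    for name, header_at, data_at, size in tar_member_offsets(sample):
        print(f"tar {name}: header at {header_at}, data at {data_at}, {size} bytes")
    cd_offset, cd_size, total = zip_central_directory(sample)
    print(f"zip: central directory at {cd_offset} of {total} bytes")
```

In the TAR case every header and payload starts on a multiple of 512 bytes, so a reader can process members strictly front to back. In the ZIP case the per-file index sits behind all payloads, which is what forces the extra positioning step on a linear medium.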
-- GermanCancio - 15 Aug 2008