Buffered tape marks in CASTOR
Introduction
“Buffered” (non-synchronizing) tape marks have been identified as an alternative to introducing a new CASTOR tape format. Writing a buffered tape mark does not force the drive to stop the tape and synchronize its buffer to the medium (the equivalent of fsync on disk), which increases throughput and reduces tape wear (“shoe-shining”). Buffered tape marks have been part of the SCSI standard since SCSI-2 (via the ‘immed’ bit in the 0x10 filemark CDB) and are supported by standard drives, including Sun and IBM models (both enterprise and LTO). This brings a substantial performance benefit: a tape mark can still be written to separate every file, while the files are effectively committed (with a synchronizing tape mark) only after N files and/or N GB of data have been written, for example. A further advantage of buffered filemarks is that the format of the data written to tape does not need to change.
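To make the ‘immed’ bit concrete, here is a minimal sketch in C of how a 6-byte WRITE FILEMARKS command block could be filled in. The function and macro names are illustrative only (they are not taken from CASTOR or from the kernel driver), and the layout shown is the usual 6-byte form: opcode 0x10 in byte 0, the immediate flag in bit 0 of byte 1, and a 24-bit big-endian filemark count in bytes 2 to 4.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WRITE_FILEMARKS_6 0x10 /* SCSI opcode referred to in the text */
#define IMMED_BIT         0x01 /* bit 0 of CDB byte 1 */

/*
 * Fill a 6-byte WRITE FILEMARKS CDB (hypothetical helper).
 * count: number of filemarks to write (24-bit value).
 * immed: if true, the drive may report completion before the
 *        filemarks are on the medium (buffered tape mark);
 *        if false, the command synchronizes the drive buffer.
 */
static void build_write_filemarks_cdb(uint8_t cdb[6], uint32_t count, bool immed)
{
    assert(count <= 0xFFFFFF);       /* only 24 bits are available */
    memset(cdb, 0, 6);
    cdb[0] = WRITE_FILEMARKS_6;
    cdb[1] = immed ? IMMED_BIT : 0;
    cdb[2] = (uint8_t)(count >> 16); /* most significant byte first */
    cdb[3] = (uint8_t)(count >> 8);
    cdb[4] = (uint8_t)count;
    /* cdb[5] is the control byte and stays zero */
}
```

In practice an application does not build this block itself; the tape driver does so on its behalf when asked to write a tape mark.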
- Support for buffered tape marks had to be added to the Linux kernel, as neither the default Linux (“st”) tape driver nor the IBM tape driver exposed them. DSS/TAB established contact with the Linux tape driver maintainer (Kai Makisara); at our request he produced a patched scsi/st driver that supports buffered tape marks via ioctl() by defining a new operation (MTWEOFI), which sets the immediate bit in the 0x10 SCSI CDB. This patch has been tested on the CERN tape drives and is distributed via Scientific Linux CERN 5.
- This new operation has also been integrated into the mainline Linux kernel as of version 2.6.37. See the kernel patch for details.
- We have modified and tested the CASTOR remote copy daemon (RTCPD) (see SVN) to take advantage of buffered tape marks via a non-default configuration option. If this option (TAPE BUFFER_TAPEMARK) is set to YES, RTCPD buffers the two tape marks between the AUL header and the payload and between the payload and the trailer, and writes only one synchronizing tape mark, after the file trailer. The changes required in RTCPD to use the new ioctl operation are very simple, and the CASTOR file format is not changed in any way, so backwards compatibility is fully preserved.
- In January 2012, we extended the buffering to span multiple CASTOR files, so that a synchronizing tape mark is written only after a given threshold (in terms of volume, time, or number of files) has been reached. Files to be migrated are kept on the disk pools until a synchronizing tape mark has been successfully written.
Performance improvements
Drive performance (as measured on T10KB drives)
Horizontal axis: file size (MB); vertical axis: drive speed (MB/s)
- 3 sync/file: all three tape marks are synchronizing (CASTOR up to 2.1.9-8).
- 1 sync/file: only the third tape mark is synchronizing; the other two are buffered (CASTOR 2.1.9-9 and newer).
- 1 sync/4GB: all tape marks are buffered, except for one synchronizing tape mark every N GB of data (e.g. N = 4, 6, ...) (CASTOR 2.1.11 as of January 2012).
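Why the three modes differ most for small files can be illustrated with a simple back-of-the-envelope model in C. This model is an assumption for illustration, not the method used for the measurements: it charges a fixed time penalty for every synchronizing tape mark, treats buffered tape marks as free, and ignores header and trailer data. The native drive rate and the per-sync cost are parameters that the caller must supply; no measured values are implied.

```c
#include <assert.h>

/*
 * Effective drive speed in MB/s for a stream of equally sized files.
 * size_mb:        file size in MB
 * native_mb_s:    streaming rate of the drive in MB/s (caller-supplied)
 * sync_cost_s:    assumed time lost per synchronizing tape mark, in seconds
 * syncs_per_file: 3 or 1 for the per-file modes; a fraction for the
 *                 per-volume mode (see syncs_per_file_for_volume)
 */
static double effective_rate(double size_mb, double native_mb_s,
                             double sync_cost_s, double syncs_per_file)
{
    assert(size_mb > 0.0 && native_mb_s > 0.0);
    double transfer_s = size_mb / native_mb_s;
    return size_mb / (transfer_s + syncs_per_file * sync_cost_s);
}

/* Average number of synchronizing tape marks per file when one is
 * written every sync_every_mb of data, e.g. 4096.0 for 4 GB. */
static double syncs_per_file_for_volume(double size_mb, double sync_every_mb)
{
    assert(sync_every_mb > 0.0);
    return size_mb / sync_every_mb;
}
```

Under this model the per-sync penalty is amortized over more data as the file size grows, so all three modes approach the native rate for large files, while for small files the number of synchronizing tape marks dominates.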
Repack performance
Drive/days to repack the CASTOR file space
- AUL == 3 sync/file (see above)
- ALC == 1 sync/file
- ALB == 1 sync/N GB
-- GermanCancio