ProdAgentLite DBS File block administration

Contact: Alessandra Fanfani, Carlos Kavka

Introduction

This document presents the proposed structure for the file block administration in the DBS Interface component of the Production Agent.

Files belonging to a dataset are organized in file blocks, where each file block is associated with a single storage element. Every time a job finishes and creates output files, the DBS Interface has to include these files in a file block. The concept of event collection is ignored in this document.

File block administration

In the following sections, a file block is defined as open when new files can still be assigned to it. A file block is defined as closed when no more files can be added to it.

To perform this administration, the DBS Interface component maintains a structure openFileBlocks that contains one entry for each open dataset. The information associated with each dataset is its list of open file blocks, implemented as a dictionary keyed by the name of the corresponding storage element. For each open file block, the file block id and the space remaining in it are maintained.

The following example shows the structure openFileBlocks containing information on two datasets, with respectively one and two open file blocks associated to them.

openFileBlocks['/Primary1/Tier1/Processed1'] = { "se003.cern.ch": (45, 500000) }
openFileBlocks['/Primary2/Tier2/Processed2'] = { "se002.cern.ch": (33, 250000),
                                                 "se009.bo.infn.it": (69, 300000) }

In the example, the dataset /Primary1/Tier1/Processed1 has only one open file block stored in the storage element se003.cern.ch. The dataset /Primary2/Tier2/Processed2 has two open file blocks stored respectively in the storage elements se002.cern.ch and se009.bo.infn.it.
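As a minimal sketch, the structure above can be written in Python with type aliases and a lookup helper that returns the open file block for a given dataset and storage element. The names BlockInfo, OpenFileBlocks and findOpenBlock are hypothetical, and the unit of the remaining space is not specified in this document.

```python
from typing import Dict, Optional, Tuple

# Hypothetical aliases: (file block id, remaining space), keyed by SE, keyed by dataset path.
BlockInfo = Tuple[int, int]
OpenFileBlocks = Dict[str, Dict[str, BlockInfo]]

openFileBlocks: OpenFileBlocks = {
    "/Primary1/Tier1/Processed1": {"se003.cern.ch": (45, 500000)},
    "/Primary2/Tier2/Processed2": {
        "se002.cern.ch": (33, 250000),
        "se009.bo.infn.it": (69, 300000),
    },
}


def findOpenBlock(blocks: OpenFileBlocks, datasetPath: str,
                  storageElement: str) -> Optional[BlockInfo]:
    """Return (block id, remaining space) of the open block for this dataset and SE, or None."""
    return blocks.get(datasetPath, {}).get(storageElement)


print(findOpenBlock(openFileBlocks, "/Primary2/Tier2/Processed2", "se009.bo.infn.it"))  # (69, 300000)
print(findOpenBlock(openFileBlocks, "/Primary1/Tier1/Processed1", "se002.cern.ch"))     # None
```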

Every time a job finishes successfully (JobSuccess event), the DBS Interface gets the names of the output files and the storage element where they are stored from the Framework Job Report. Files are added to an open file block if the dataset has one associated with the corresponding storage element and it has enough space. If there is no open file block associated with the required storage element, a new file block is created. If there is not enough space, the file block is closed in the DBS, its entry is removed from the openFileBlocks structure, and a new file block is created to perform the insertion.
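The three cases in the previous paragraph can be sketched as a small decision function. The function name decideAction and the returned labels are hypothetical; as in the algorithm below, the output fits when its size does not exceed the remaining space.

```python
from typing import Optional, Tuple


def decideAction(openBlock: Optional[Tuple[int, int]], fileSize: int) -> str:
    """Classify a JobSuccess outcome for one dataset/SE pair.

    openBlock is (block id, remaining space), or None when the dataset has
    no open file block on the storage element.
    """
    if openBlock is None:
        return "create"            # no open block on this SE: create a new one
    _, remaining = openBlock
    if fileSize > remaining:
        return "close_and_create"  # not enough space: close it in the DBS, then create a new one
    return "add"                   # enough space: add the files to the existing block


print(decideAction((45, 500000), 1000))  # add
```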

As a summary, a new file block is created when:

  • a NewDataset event is received.
  • a JobSuccess event is received and the output files belong to a SE not yet registered for the current dataset.
  • a JobSuccess event is received and the file block associated to the SE is full.

An open file block is closed when:

  • a JobSuccess event is received and the file block associated to the SE is full.

Algorithm

The following algorithms are written in Python-like pseudocode.

When a NewDataset event is received, the DBS file block administrator has to create a new entry associated with the dataset specified in the payload:

openFileBlocks[datasetPath] = {}
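A minimal sketch of a NewDataset handler, assuming a hypothetical function name onNewDataset and a module-level openFileBlocks dictionary. It uses dict.setdefault so that a repeated NewDataset event for the same dataset keeps its open file blocks; this is a choice made for the sketch, whereas the plain assignment above resets the entry.

```python
from typing import Dict, Tuple

# dataset path -> {storage element: (file block id, remaining space)}
openFileBlocks: Dict[str, Dict[str, Tuple[int, int]]] = {}


def onNewDataset(datasetPath: str) -> None:
    """Register a dataset with no open file blocks yet (hypothetical handler name)."""
    # setdefault leaves an existing entry untouched instead of discarding its open blocks
    openFileBlocks.setdefault(datasetPath, {})


onNewDataset("/Primary1/Tier1/Processed1")
print(openFileBlocks)  # {'/Primary1/Tier1/Processed1': {}}
```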

After a JobSuccess event is received, the structure has to be checked to determine whether it contains an open file block associated with the storage element specified in the Framework Job Report. If not, a new file block has to be created. If the file block does not have enough space for the new output, it has to be closed and a new one has to be created.

# get information from the Framework Job Report
dataset = openFileBlocks[datasetPath]
storageElement = FrameworkJobReport['SE']
fileSize = FrameworkJobReport['OutputSize']
fileNames = FrameworkJobReport['ListOfFiles']

# get information on the open file block associated with the SE
try:
    (blockId, size) = dataset[storageElement]

# not present: create a new file block
except KeyError:
    blockId = createNewFileBlockInDBS(datasetPath)
    size = defaultSize
    dataset[storageElement] = (blockId, size)

# check the available space
if fileSize > size:

    # not enough space: close the file block and create a new one
    closeFileBlockInDBS(datasetPath, blockId)
    blockId = createNewFileBlockInDBS(datasetPath)
    size = defaultSize
    dataset[storageElement] = (blockId, size)

# add the new files in the DBS
addNewFilesInDBS(datasetPath, blockId, fileNames)

# update the structure with the remaining space
dataset[storageElement] = (blockId, size - fileSize)

References to the variable FrameworkJobReport are used to get information provided by the Framework Job Report, and defaultSize is the space assigned to a newly created file block. The functions createNewFileBlockInDBS, closeFileBlockInDBS and addNewFilesInDBS are used to create a new file block in the DBS, to close a file block, and to add new files to a file block, respectively.
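To exercise the algorithm end to end, the DBS calls can be replaced by an in-memory stand-in. In this sketch InMemoryDBS, onJobSuccess and the value of defaultSize are hypothetical; the real DBS Interface calls are not described in this document. The report is a plain dictionary using the same keys as FrameworkJobReport above.

```python
from typing import Dict, List, Tuple

defaultSize = 500000  # hypothetical capacity of a new file block (units as in the examples)


class InMemoryDBS:
    """Hypothetical stand-in for the DBS calls named in the text; records everything in memory."""

    def __init__(self) -> None:
        self.nextId = 1
        self.closed: List[int] = []
        self.files: Dict[int, List[str]] = {}

    def createNewFileBlockInDBS(self, datasetPath: str) -> int:
        blockId = self.nextId
        self.nextId += 1
        self.files[blockId] = []
        return blockId

    def closeFileBlockInDBS(self, datasetPath: str, blockId: int) -> None:
        self.closed.append(blockId)

    def addNewFilesInDBS(self, datasetPath: str, blockId: int, fileNames: List[str]) -> None:
        self.files[blockId].extend(fileNames)


def onJobSuccess(openFileBlocks: Dict[str, Dict[str, Tuple[int, int]]],
                 dbs: InMemoryDBS, datasetPath: str, report: dict) -> None:
    dataset = openFileBlocks[datasetPath]  # same dict object, so updates are visible outside
    storageElement = report["SE"]
    fileSize = report["OutputSize"]
    fileNames = report["ListOfFiles"]

    if storageElement not in dataset:
        # no open file block on this SE yet: create one
        dataset[storageElement] = (dbs.createNewFileBlockInDBS(datasetPath), defaultSize)
    blockId, size = dataset[storageElement]

    if fileSize > size:
        # not enough space: close the current block and create a fresh one
        dbs.closeFileBlockInDBS(datasetPath, blockId)
        blockId, size = dbs.createNewFileBlockInDBS(datasetPath), defaultSize

    dbs.addNewFilesInDBS(datasetPath, blockId, fileNames)
    dataset[storageElement] = (blockId, size - fileSize)


# usage: two 300000-unit outputs on the same SE
dbs = InMemoryDBS()
blocks = {"/Primary1/Tier1/Processed1": {}}
for name in ("a.root", "b.root"):
    onJobSuccess(blocks, dbs, "/Primary1/Tier1/Processed1",
                 {"SE": "se003.cern.ch", "OutputSize": 300000, "ListOfFiles": [name]})
print(blocks)      # {'/Primary1/Tier1/Processed1': {'se003.cern.ch': (2, 200000)}}
print(dbs.closed)  # [1]
```

The second output does not fit in the 200000 units left in block 1, so block 1 is closed and block 2 receives the file. Like the pseudocode, this sketch does not handle output larger than defaultSize: the new block would be created anyway and its remaining space would become negative.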

-- CarlosKavka - 28 Mar 2006

Topic revision: r1 - 2006-03-28 - unknown