Introduction

Collection of requirements for the DataflowEvolution project.

Requirements

Lists of Use Cases (UC) and User Requirements (UR).

Comments, corrections, input and suggestions should be posted on the DF evolution SharePoint page.

Generic

This is a collection of generic User Requirements for all the applications.

  • UR_GEN02 Network latencies
    It shall be possible to measure network latencies
    • This functionality should be provided by the message passing library

  • UR_GEN10 The mapping of ROBs to ROSes should remain in the dataflow software and, within the HLT, it should be used only internally by the HLT ROBDataProviderSvc.
    • Algorithms should only work with ROBs and their mapping to geometrical regions. This is already the case now and should be preserved to ensure maximum decoupling of the dataflow (OKS) configuration and the HLT configuration.

Sysmon

There is a dedicated twiki page: SysmonRequirements ( requirements )

HLTSV

Use cases

  • UC_SV10 HLTSV receives an L1-Result from the RoIB
    • Or, the sw RoIB component running inside the HLTSV receives data from n S-links and assembles an L1-Result fragment

  • UC_SV20 HLTSV delivers an L1-Result to a DCM running on one of the available HLT nodes

  • UC_SV30 HLTSV receives a clear message from DCM when the latter has either fully collected an event or has rejected it

  • UC_SV31 HLTSV re-assigns the event to another node if a DCM does not reply before a configurable timeout
    • The reassigned event should be force accepted and fully built by the other DCM

  • UC_SV40 HLTSV groups and broadcasts ROBs-clear messages to ROS PCs.

  • UC_SV50 HLTSV receives a message from DCM when a processing slot (a cpu core) is available
    • N.B.: during HLT processing after full data collection, the fragments can be deleted from ROBs, but a cpu core is still busy in EF processing
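
The assign / clear / timeout cycle of UC_SV20, UC_SV30 and UC_SV31 can be illustrated with a minimal bookkeeping sketch. It is not the HLTSV interface: the class names, the `pick_node` callable and the use of plain floats for time are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Assignment:
    node: str            # DCM node currently holding the event
    deadline: float      # time after which the node counts as unresponsive
    force_accept: bool   # True once the event has been reassigned


@dataclass
class Supervisor:
    """Hypothetical HLTSV bookkeeping, keyed by L1ID."""
    timeout: float
    pending: dict = field(default_factory=dict)

    def assign(self, l1id, node, now):
        # UC_SV20: hand the L1-Result to a DCM and start the timeout clock
        self.pending[l1id] = Assignment(node, now + self.timeout, False)

    def clear(self, l1id):
        # UC_SV30: the DCM rejected or fully collected the event
        return self.pending.pop(l1id, None) is not None

    def reassign_expired(self, now, pick_node):
        # UC_SV31: move timed-out events to another node, flagged so that
        # the new DCM force-accepts and fully builds them
        moved = []
        for l1id, a in self.pending.items():
            if now >= a.deadline:
                a.node = pick_node(exclude=a.node)  # pick_node is an assumed helper
                a.deadline = now + self.timeout
                a.force_accept = True
                moved.append(l1id)
        return moved
```

In this sketch the reassigned event keeps its L1ID and only gains a flag, which is one possible way to carry the "force accepted" property to the second DCM.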

User requirements

  • UR_SV01: HLT input rate
    The HLT system shall be able to sustain an input rate of 100 kHz

  • UR_SV02: Single HLTSV
    A single HLTSV instance should be able to receive L1-Results at 100 kHz and distribute them to the HLT processing nodes
    • If the requirement cannot be satisfied, the HLT farm should be divided in sub-farms, each managed by its own HLTSV

  • UR_SV03: HLT farm balancing
    The HLTSV shall take care of the HLT farm balancing

  • UR_SV04: Software based RoIB
    It should be possible to merge the HLTSV with a software based RoIB.
    • This scenario is possible only with a single HLTSV
    • The combined system shall be able to receive, assemble and distribute L1-Results at 100 kHz

  • UR_SV10 Condition data injection
    The HLTSV should be an injection point for condition data changes.
    • Presently, updates of conditions data at lumi block boundaries are communicated to the L2PUs and EFPTs by additional fields in the CTP ROB fragment. Updates include prescale changes and some conditions data folders for detectors. Changing these fields in the CTP fragment requires a firmware update and is subject to strong space limitations.

  • UR_SV20: Event ownership
    The HLTSV is in charge of an event (L1ID) until it is rejected by the HLT or fully collected by the DCM

  • UR_SV21: Fault tolerance
    The HLTSV shall be able to identify h/w or s/w problems in an HLT node and re-assign the events allocated to that node to another machine in the farm.
    • For example via the implementation of an event time-out mechanism

  • UR_SV22: Event reassignment
    Reassigned events shall be properly labeled to prevent data duplication.
    • The node owning the initial copy of the event could be still alive and it could still accept the event

  • UR_SV40: ROB occupancy load
    The HLTSV shall inform the ROSs when the fragments of a given event can be deleted.
    • I.e. when an event has been rejected or fully collected by an HLT node.

  • UR_SV41: ROS load
    The HLTSV should reduce as much as possible the load on the ROS PC.
    • In order to reduce the ROS PC load, the HLTSV should group the ROB-clear messages. The grouping shall not contribute significantly to the ROB occupancy.

  • UR_SV50: Occupancy and deadtime
    The HLTSV shall provide information about its buffer occupancy and dead-time
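
As an illustration of UR_SV41, grouping ROB-clear messages could look like the sketch below. The class name, the `send` callable and the group size of 100 used in the arithmetic are assumptions, not values taken from the requirements.

```python
class ClearBatcher:
    """Hypothetical grouping of ROB-clear messages before broadcast."""

    def __init__(self, group_size, send):
        self.group_size = group_size
        self.send = send          # assumed callable that broadcasts one grouped message
        self.buffer = []

    def add(self, l1id):
        # Queue the L1ID of an event that was rejected or fully collected
        self.buffer.append(l1id)
        if len(self.buffer) >= self.group_size:
            self.flush()

    def flush(self):
        # Send whatever is queued; a real system would also call this on a
        # timer so that grouping does not add significantly to ROB occupancy
        if self.buffer:
            self.send(list(self.buffer))
            self.buffer.clear()


def clear_message_rate(event_rate_hz, group_size):
    # Grouped-message rate seen by each ROS PC
    return event_rate_hz / group_size
```

With the 100 kHz input rate of UR_SV01 and an assumed group size of 100, `clear_message_rate(100_000, 100)` gives 1 kHz of clear messages per ROS PC instead of 100 kHz. The price is that fragments stay in the ROBs until their group is flushed, which is why the group size and a flush interval would have to be tuned together.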

DCM

Use cases

  • UC_DCM01 The DCM registers itself to the HLTSV, providing the number of available processing slots.
    • "Processing slots" == "Number of HLTPUs on the node"

  • UC_DCM10 The DCM receives an L1-Result (list of ROB fragments) from HLTSV

  • UC_DCM20 The DCM assigns an L1-Result (list of ROB fragments) to a free HLTPU

  • UC_DCM30 On HLTPU request, the DCM collects ROB fragments and makes them available to the caller HLTPU

  • UC_DCM40 When an event is either rejected or fully collected, the DCM sends a clear message to the HLTSV: the fragments of that event can be deleted from the ROBs

  • UC_DCM50 The DCM forwards accepted events to the SFO

  • UC_DCM60 The DCM force-accepts events in case of an HLTPU crash or dead loop.
    • Event reassignment would be less safe
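
The DCM use cases distinguish two notifications to the HLTSV: the clear message (UC_DCM40) and the availability of a processing slot (UC_SV50). A minimal sketch of that separation follows; the class, the method names and the message labels are assumptions made for illustration.

```python
class DcmSlots:
    """Hypothetical slot accounting in a DCM; not the real DCM interface."""

    def __init__(self, n_slots, notify):
        self.free = list(range(n_slots))
        self.busy = {}                  # l1id -> slot index
        self.notify = notify            # assumed callable: notify(kind, value)
        notify("register", n_slots)     # UC_DCM01: announce the available slots

    def assign(self, l1id):
        # UC_DCM20: give the L1-Result to a free HLTPU
        slot = self.free.pop()
        self.busy[l1id] = slot
        return slot

    def event_collected_or_rejected(self, l1id, rejected):
        # UC_DCM40: the ROB fragments can be deleted in both cases
        self.notify("clear", l1id)
        if rejected:
            # Processing has ended, so the slot is free as well
            self._release(l1id)

    def processing_done(self, l1id):
        # A fully collected event is still in EF processing until this point
        self._release(l1id)

    def _release(self, l1id):
        slot = self.busy.pop(l1id)
        self.free.append(slot)
        self.notify("slot_free", slot)
```

For a fully collected event the "clear" message is sent first and "slot_free" only when processing ends, so the HLTSV does not assign new events to a core that is still busy.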

User requirements

  • UR_DCM01: Single DCM instance per node
    There should be a single DCM instance for each HLT node.

  • UR_DCM05: Input format
    Data received by the DCM (from HLTSV or ROSs) shall be formatted as an array of eformat::ROBFragments

  • UR_DCM06: Output format
    Data sent by the DCM to the SFO shall be formatted as eformat::FullEventFragment.

  • UR_DCM07: Data collection
    The DCM shall be able to collect data from the ROS PCs according to a list of ROB identifiers provided by the HLTPU

  • UR_DCM08: Data size
    The DCM shall be able to manage events with sizes ranging from a few kB to tens of MB.
    • The average event size can change run by run
    • In each run physics and calibration events are mixed:
      • Physics events: about 1-2 MB
      • Calibration events: from a few kB to tens of MB

  • UR_DCM10: Data integrity
    The DCM shall guarantee data integrity inside the node

  • UR_DCM11: Fault tolerance for HLTPU problems
    The DCM shall be able to identify processing problems in an HLTPU (application crashes or processing dead-loops) and force-accept the event assigned to the given application.
      • In the current system,
        • a PU crash is identified via the "socket hangup" caught by the unix domain socket server
        • a processing dead-loop via an event processing timeout

  • UR_DCM12: Fault tolerance for DCM crashes
    In case of a DCM crash, it should be possible to recover fully-collected events.
    • For example: at restart the DCM should be able to retrieve data from disk
    • NB: events not yet built are still hosted in the ROS system and they will be assigned to other nodes by the HLTSV(s)

  • UR_DCM30: ROS occupancy
    The DCM shall reduce as much as possible its contribution to the ROBs occupancy.
    • The DCM shall inform the HLTSV as soon as an event has been fully collected or it has been rejected

  • UR_DCM31: processing power exploitation
    The DCM shall exploit the available processing power as much as possible.
    • The DCM should inform the HLTSV when an HLTPU is done with a given event and a processing slot is therefore free.
      • NB: the DCM shall inform the HLTSV as soon as an event has been rejected or fully assembled, but in the latter case the event is still being processed (EF) and a core is busy.
      • A dedicated message should be used to avoid the assignment of events to busy nodes.

  • UR_DCM40: Data compression
    The DCM shall provide online raw event compression.
    • Simplify the offline treatment of the raw data and reduce by a factor ~2 the bandwidth and storage requirements for the SFOs.
    • Strategies have to be defined to accommodate:
      • the SFO processing of the data (e.g. uncompressed event header, event header duplication, ..)
      • the stripping of calibration events (e.g. selective compression, event duplication in the EF, ..).

  • UR_DCM41: Data sampling
    The DCM shall provide event sampling capability. The DCM should provide the possibility to sample also rejected events.

  • UR_DCM42: High rate calibration streams
    For calibration purposes, the DCM or the HLTPU should provide a mechanism to manage a high rate of small events.
    • ToDo: create dedicated page with muon calibration requirements provided by Enrico Pasqualucci
    • In the current system muon tracks are selected in each L2PU, collected via a hierarchy of dedicated servers and written to disk in a specific stream. The data bypasses SFI, EFD and SFO. See Calibration_stream_software.pptx
    • The current muon calibration infrastructure could in principle be re-used with minimal effort in the evolution.
    • On the other hand in the evolution the muon calibration information will be one step away from the storage system. It makes therefore sense to try to route these data through the standard data-flow, with the aim of reducing the number of components to be maintained.
      • Based on the experience with the muon calibration in the current system, in particular in terms of rates and bandwidth, dedicated aggregation and transport strategies have to be developed.
    • In case a very high rate of calibration data is expected at the SFOs, one should consider both the implications on the SFOs performance and the operational impact on monitoring requirements.

  • UR_DCM50: Occupancy and deadtime
    The DCM shall provide information about its buffer occupancy and dead-time

  • UR_DCM70: Full event pre-fetching
    The DCM should have the possibility to trigger full data collection independently from HLT request.
    • E.g.: the DF could automatically initiate full data collection (pre-fetching) once ~X% of all ROBs have already been retrieved.
    • N.B.: This requirement is under discussion

  • UR_DCM80: ROS pre-fetching
    The DCM should have the possibility to collect all the ROBs in a given ROS independently from HLT request.
    • E.g.: if a request to a ROS already asks for most of the ROBs in that ROS, the DCM may retrieve the complete set of ROBs in that ROS.
    • N.B.: This requirement is under discussion
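
The two pre-fetching ideas (UR_DCM70 and UR_DCM80) both reduce to a threshold decision. The sketch below only illustrates that decision. The function names are hypothetical, and the threshold stands in for the "X%" and "most of the ROBs" that the requirements leave open.

```python
def should_prefetch(retrieved, total, threshold):
    """UR_DCM70 sketch: start full collection once enough ROBs are already in."""
    if total <= 0:
        return False
    return retrieved / total >= threshold


def ros_request(requested, ros_robs, threshold):
    """UR_DCM80 sketch: widen a request to the whole ROS when most of it is wanted.

    requested: ROB identifiers asked for by the HLTPU
    ros_robs:  ROB identifiers hosted by one ROS
    """
    wanted = set(requested) & set(ros_robs)
    if ros_robs and len(wanted) / len(ros_robs) >= threshold:
        return set(ros_robs)
    return wanted
```

For example, with an assumed threshold of 0.75, a request for ROBs {1, 2, 3} from a ROS hosting {1, 2, 3, 4} would be widened to all four ROBs, while a request for a single ROB would be left unchanged.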

HLTPU

Use cases

  • UC_PU00 The HLTPU subscribes to the DCM running in its node.

  • UC_PU10 The HLTPU receives an L1-Result (as an array of eformat::RobFragments) from the DCM via a dedicated interface

  • UC_PU11 The HLTPU forwards the L1-Result to the HLT steering layer via a dedicated interface

  • UC_PU20 The HLT steering sends ROB requests to the HLTPU

  • UC_PU21 The HLTPU forwards the requests to the DCM and returns the collected data to the HLT steering

  • UC_PU30 The HLT steering provides a processing decision

  • UC_PU40 For an accepted event, the HLT steering can provide an HLT result fragment to be appended to the event

User requirements

  • UR_PU00: Steering framework
    The HLTPU shall host the HLT steering framework

  • UR_PU02: Data request
    On demand, the HLTPU shall provide the HLT steering framework with ROB fragments (eformat::RobFragments). The data request should be forwarded to the DCM
    • Data source emulator can replace DCM functionality

  • UR_PU06: Data reception
    The HLTPU shall receive data from the DCM as an array of eformat::RobFragments.

  • UR_PU10: Number of HLTPU
    There shall be no limit on the number of HLTPUs running on each node
    • The value depends on the number of available cores and on the amount of memory

  • UR_PU20: Data protection
    The HLTPU shall not have any possibility to corrupt the ROB fragments.
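
One way to picture UR_PU20 is to hand the consumer read-only views of a buffer that only the producer can write. The sketch below shows the idea inside a single Python process, using the standard `memoryview.toreadonly()`. The function name and the `bounds` layout are assumptions, and a real DCM/HLTPU pair would need protection across processes (for instance a read-only memory mapping) rather than this in-process illustration.

```python
def readonly_fragments(buffer, bounds):
    """Return zero-copy, read-only views of the fragments inside `buffer`.

    buffer: writable bytes-like object owned by the producer (the DCM role)
    bounds: list of (start, end) byte offsets, one pair per fragment (assumed layout)
    """
    view = memoryview(buffer).toreadonly()
    # Slices of a read-only memoryview are read-only too and copy no data
    return [view[start:end] for start, end in bounds]
```

A consumer holding these views can read the fragment bytes, but any attempt to assign into them raises `TypeError`, while the producer can still update the underlying buffer.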

SharedHeap

The SharedHeap was renamed HIPC: Host Inter Process Communication (by Christophe).

Work in progress

Use cases

  • UC_SH01 The DCM asks for a free memory block to store L1-Result received from HLTSV
    • The L1-Result is an array of ROB fragments

  • UC_SH02 The DCM asks for a free memory block to store ROB fragments being received from ROSs

  • UC_SH03 The DCM asks for the deletion of a memory block that is no longer needed

  • UC_SH10 The HLTPU reads the data in a memory Block ...

User requirements

  • UR_HIPC01 NEW The DCM shall store buffered data fragments in a persistent storage so that data can be recovered in case of crash or unexpected premature termination.

  • UR_HIPC02 NEW The DCM and HLTPUs on the same host shall use a zero data copy mechanism to exchange data fragments. Only data references are passed between the processes.

  • UR_HIPC11 NEW The DCM shall support appending HLT ROB fragment data to an event storage at any time, updating the integrity check information as required.

  • UR_HIPC13 NEW The DCM shall send the full event to the SFO by sending the new eformat::FullEventHeader followed by the HLT result ROB fragment, followed by the detector fragments hosted in the DCM, so that the SFO receives a valid eformat::FullEventFragment
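
UR_HIPC11 and UR_HIPC13 can be illustrated together: fragments are appended while a running integrity check is kept up to date, and the full event is then sent in the order header, HLT result, detector fragments. The sketch assumes CRC32 as the integrity check (the requirements do not name one) and uses plain byte strings in place of real eformat structures. `EventStore` and `assemble` are hypothetical names.

```python
import zlib


class EventStore:
    """Hypothetical per-event storage with an incrementally updated CRC32."""

    def __init__(self):
        self.fragments = []
        self.crc = 0

    def append(self, fragment):
        # UR_HIPC11: append at any time and update the integrity information;
        # zlib.crc32 accepts the previous value as a running checksum
        self.fragments.append(bytes(fragment))
        self.crc = zlib.crc32(fragment, self.crc)


def assemble(header, hlt_result, store):
    # UR_HIPC13 ordering: full event header, then the HLT result fragment,
    # then the detector fragments hosted in the DCM
    return b"".join([header, hlt_result, *store.fragments])
```

Because the checksum is updated per fragment, appending does not require re-reading the data already stored. A real eformat::FullEventHeader also carries sizes and other fields that would have to be filled in before sending, which this sketch leaves out.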

SFO

At the moment, no changes for this component are foreseen.

Comments


Major updates:
-- AndreaNegri - 19-Mar-2012

%RESPONSIBLE% AndreaNegri
%REVIEW% Never reviewed

Topic revision: r2 - 2013-01-09 - AndreaNegri
 