Introduction

Collection of requirements for DataflowEvolution project.

Requirements

Lists of Use Cases (UC) and User Requirements (UR).

Sysmon

Dedicated twiki page: requirements

HLTSV

Use cases

  • UC_SV10 HLTSV receives L1-Result from RoIB
    • Or, the sw RoIB component running inside the HLTSV receives data from n S-links and assembles a L1-Result fragment

  • UC_SV20 HLTSV delivers a L1-Result to a DCM running on one of the available HLT nodes

  • UC_SV30 HLTSV receives a clear message from DCM when the latter has either fully collected an event or has rejected it

  • UC_SV31 HLTSV re-assigns an event to another DCM in case of s/w or h/w problems on the initial node

  • UC_SV40 HLTSV groups and broadcasts ROBs-clear messages to ROS PCs.

  • UC_SV50 HLTSV receives a message from DCM when a processing slot (a cpu core) is available
    • N.B.: during HLT processing after full data collection, the fragments can be deleted from ROBs, but a cpu core is still busy in EF processing

User requirements

  • UR_SV01: HLT input rate
    The HLT system shall be able to sustain an input rate of 100kHz

  • UR_SV02: Single HLTSV
    A single HLTSV instance should be able to receive L1-Results at 100kHz and distribute them to the HLT processing nodes
    • If the requirement cannot be satisfied, the HLT farm should be divided in sub-farms, each managed by its own HLTSV

  • UR_SV03: HLT farm balancing
    The HLTSV shall take care of the HLT farm balancing

  • UR_SV04: Software based RoIB
    It should be possible to merge the HLTSV with a software based RoIB. Read more More... Close
    • This scenario is possible only with a single HLTSV
    • The combined system shall be able to receive, assemble and distribute L1-Results at 100 kHz

  • UR_SV10 Condition data injection
    The HLTSV should be an injection point for condition data changes. Read more More... Close
    • Presently updates for conditions data at lumi block boundaries are communicated to the L2Pus and EFPTs by additional fields in the CTP ROB fragment. Updates include prescale changes and some conditions data folders for detectors. Changing these fields in the CTP fragment requires a firmware update and has strong space limitations.

  • UR_SV20: Event ownership
    The HLTSV is in charge of an event (L1ID) until it is rejected by the HLT or fully collected by the DCM

  • UR_SV21: Fault tolerance
    The HLTSV shall be able to identify h/w or s/w problems in an HLT node and re-assign the event allocated to the given node to another machine in the farm. Read more More... Close
    • For example via the implementation of an event time-out mechanism

  • UR_SV22: Event reassignment
    Reassigned events shall be properly labeled to prevent data duplication. Read more More... Close
    • The node owning the initial copy of the event could be still alive and it could still accept the event

  • UR_SV40: ROB occupancy load
    The HLTSV shall inform the ROSs when the fragments of a given event can be deleted.
    • I.e. when an event has been rejected or fully collected by an HLT node.

  • UR_SV41: ROS load
    The HLTSV should reduce as much as possible the load on the ROS PC.
    • In order to reduce the ROS PC load, the HLTSV should group the ROB-clear messages. The grouping shall not contribute significantly to the ROB occupancy.

  • UR_SV50: Occupancy and deadtime
    The HLTSV shall provide information about its buffers occupancy and dead-time

DCM

Use cases

  • UC_DCM01 The DCM registers itself to the HLTSV, providing the number of available processing slots. Read more More... Close
    • "Processing slots" == "Number of HLPUs on the node"

  • UC_DCM10 The DCM receives an L1-Result (list of ROB fragments) from HLTSV

  • UC_DCM20 The DCM assigns an L1-Result (list of ROB fragments) to a free HLTPU

  • UC_DCM30 On HLTPU request, the DCM collects ROB fragments and makes them available to the caller HLTPU

  • UC_DCM40 When an event is either rejected or fully collected, the DCM send a clear message to the HLTSV: the fragments of that event can be deleted from ROBs

  • UC_DCM50 The DCM forwards accepted event to the SFO

  • UC_DCM60 The DCM force accept events in case of HLTPU processing problems.

User requirements

  • UR_DCM01: Single DCM instance per node
    There should be a single DCM instance for each HLT node.

  • UR_DCM04: Input
    The DCM shall be able to receive L1 results from the HLTSV. The data shall be formatted as array of eformat::ROBFragment

  • UR_DCM06: Data collection
    The DCM shall be able to collect data from the ROS PCs according to a list of ROB identifiers provided by the HLTPU. The data shall be formatted as array of eformat::ROBFragment

  • UR_DCM08: Event assembly
    The DCM shall assemble accepted events as eformat::FullEventFragment according to the list of ROB identifiers provided by the HLTPU.
    • An empty list could indicate all the collected ROBs

  • UR_DCM10: Output
    The DCM shall send assembled events to the SFO as eformat::FullEventFragment

  • UR_DCM12: Emulation
    For testing purpose, it shall be possible to internally emulate input, output and data collection functionalities

  • UR_DCM14: ROB mapping
    The mapping of ROBs to ROSes should reside in the dataflow software. Read more More... Close
    In the HLT, it should not be used further than internally in the HLT ROBDataProviderSvcROBDataProvider. Algorithms should only work with ROBs and their mapping to geometrical regions. This is already the case now and should be conserved to assure maximum decoupling of dataflow (OKS) configuration and HLT configuration.

  • UR_DCM16: Data size
    The DCM shall be able to manage events with sizes ranging from few kB to tens of MB. Read more More... Close
    • The average event size can change run by run
    • In each run physics and calibration events are mixed:
      • Physics event: about 1-2 MB
      • Calibration events: from few kB to tens of MB

  • UR_DCM18: PUIO
    The DCM shall implement the PUIO to provide, on demand, ROB fragments to the HLTPU

  • UR_DCM20: HLT result
    The DCM shall be able to receive an HLT result fragment (an eformat::ROBFragment) via the PUIO interface and it shall append the fragment to the accepted event

  • UR_DCM24: Data integrity
    The DCM shall guarantee data integrity inside the node

  • UR_DCM26: Fault tolerance for HLTPU problems
    The DCM shall be able to identify processing problems in an HLTPU (application crashes or processing dead-loops) and force accept the involved event. Read more More... Close
      • In the current system (EFD),
        • a PU crash is identified via the "socket hangup" caught by the unix domain socket server
        • a processing dead-loop via an event processing timeout

  • UR_DCM28: Fault tolerance for DCM crashes
    In case of DCM crash, it should be possible to recover fully-collected events Read more More... Close
    • For example: at restart the DCM should be able to retrieve data from disk
    • NB: event not yet built are still hosted in the ROS system and they will be assigned to other nodes by the HLTSV(s)

  • UR_DCM30: ROS occupancy
    The DCM shall reduce as much as possible its contribution to the ROBs occupancy.
    • The DCM shall inform the HLTSV as soon as an event has been fully collected or it has been rejected

  • UR_DCM31: processing power exploitation
    The DCM shall exploit as much as possible the available processing power.
    • The DCM should inform the HLTSV when an HLTPU has done with a given event and therefore a processing slot is free. Read more More... Close
      • NB: the DCM shall inform the HLTSV as soon as an event has been has been rejected or fully assembled, but in the latter case the event is still being processed (EF) and a core is busy.
      • A dedicated message should be used to avoid the assignment of events to busy nodes.

  • UR_DCM40: Data compression
    The DCM shall provide online raw event compression. Read more More... Close
    • Simplify the offline treatment of the raw data and reduce by a factor ~2 the bandwidth and storage requirements for the SFOs.
    • Strategies have to be defined to accommodate:
      • the SFO processing of the data (e.g. uncompressed event header, event header duplication, ..)
      • the stripping of calibration events (e.g. selective compression, event duplication in the EF, ..).

  • UR_DCM41: Data sampling
    The DCM shall provide event sampling capability. The DCM should provide the possibility to sample also events rejected by the HTLPU.

  • UR_DCM42: High rate calibration streams
    For calibration purpose, the DCM or the HLTPU should provide a mechanism to manage high rate of small events. Read more More... Close
    • ALERT! ToDo: create dedicated page with muon calibration requirements provided by Enrico Pasqualucci
    • In the current system muon tracks are selected in each L2PU, collected via a hierarchy of dedicated servers and written to disk in a specific stream. The data bypasses SFI, EFD and SFO. See Calibration_stream_software.pptx
    • The current muon calibration infrastructure could in principle be re-used with minimal effort in the evolution.
    • On the other hand in the evolution the muon calibration information will be one step away from the storage system. It makes therefore sense to try to route these data through the standard data-flow, with the aim of reducing the number of components to be maintained.
      • Based on the experience with the muon calibration in the current system, in particular in terms of rates and bandwidth, dedicated aggregation and transport strategies have to be developed.
    • In case a very high rate of calibration data is expected at the SFOs, one should consider both the implications on the SFOs performance and the operational impact on monitoring requirements.

  • UR_DCM50: Occupancy and deadtime
    The DCM shall provide information about its buffers occupancy and dead-time

  • UR_DCM70: Full event pre-fetching
    The DCM should have the possibility to trigger full data collection independently from HLT request. Read more More... Close
    • E.g.: the DF could automatically initiate full data collection (pre-fetching) if already ~ X% of all ROBs are retrieved.
    • N.B.: ALERT! This requirement is under discussion

  • UR_DCM80: ROS pre-fetching
    The DCM should have the possibility to collect all the ROBs in a give ROS independently from HLT request. Read more More... Close
    • E.g.: If a request to a ROS is issued which wants to retrieve already most of the ROBs in this ROS, DCM may retrieve then already the complete set ROBs in this ROS.
    • N.B.: ALERT! This requirement is under discussion

HLTPU

Use cases

  • UC_PU00 The HLTPU subscribes to the DCM running in its node.

  • UC_PU10 The HLTPU receives a L1-Result (as array of eformat::RobFragment) from DCM via a dedicated interface (PUIO)

  • UC_PU11 The HLTPU forwards the L1-Result to the HLT steering layer via a dedicated interface (hltinterface)

  • UC_PU20 The HLT steering sends ROB requests to the HLTPU

  • UC_PU21 The HLTPU forwards the requests to the DCM and returns the collected data to the HLT steering

  • UC_PU30 The HLT steering provides an event selection decision

  • UC_PU40 In case of accepted event, the HLT steering can provide an HLT result fragment (as eformat::RobFragment) to be appended to the event

User requirements

  • UR_PU00: HLT interface
    The HLTPU shall host the HLT steering framework. The interface between HLTPU and HLT steering shall be defined by a dedicated interface (hltInterface)

  • UR_PU02: PUIO interface
    On demand, the HLTPU shall provide the HLT steering framework with ROB fragments (array of eformat::RobFragments). The data is obtained from the DCM via the PUIO interface.

  • *UR_PU03: PUIO interface: emulation *
    For testing purpose it shall be possible to emulate the DCM functionality Read more More... Close
    • For example using an implementation that reads fragments from a file

  • UR_PU03: PUIO interface: data
    The HLTPU shall receives data from DCM as array of eformat::RobFragments.

  • UR_PU10: Number of HLTPU
    There shall not be limit on the number of HLTPUs running on each node
    • The value depends on the number of available cores and on the amount of memory

  • UR_PU20: Data protection
    The HLTPU shall not have any possibility to corrupt the ROB fragments.

SFO

At the moment, no changes for this component are foreseen.

Message passing

  • UR_MP01 Network latencies
    It shall be possible to measure network latencies

  • UR_MP?? .....

Comments


Major updates:
-- AndreaNegri - 19-Mar-2012

%RESPONSIBLE% AndreaNegri
%REVIEW% Never reviewed

Edit | Attach | Watch | Print version | History: r4 < r3 < r2 < r1 | Backlinks | Raw View | WYSIWYG | More topic actions
Topic revision: r4 - 2013-01-29 - AndreaNegri
 
    • Cern Search Icon Cern Search
    • TWiki Search Icon TWiki Search
    • Google Search Icon Google Search

    Sandbox All webs login

This site is powered by the TWiki collaboration platform Powered by PerlCopyright &© 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
or Ideas, requests, problems regarding TWiki? use Discourse or Send feedback