T0 Ops FAQ

Data Flow in the Tier-0:

Files from the CMS detector are transferred from P5 to the t0streamer by the TransferSystem. The Tier-0 takes all of its input files (.dat files) from there. Depending on the stream a file belongs to, it follows a different processing path through the Tier-0 system. The end products are .root files belonging to several data tiers, Prompt Calibration Loop (PCL) uploads to the dropbox, and uploads to the DQM GUI. The following diagram shows the Tier-0 data flow in detail. Note: The t0streamer is currently on CASTOR, but it will eventually be mounted on EOS for Run2.

Tier0_Data_Flow.png

T0 Critical tickets:

Ticket type definitions:

  • If the issue is critical and CMS cannot wait until the next working day for a solution, open a GGUS ALARM ticket. Prompt action is guaranteed at any time.
  • If the issue is not critical and CMS can wait until the next working day for a site intervention, open a GGUS TEAM ticket. Action is guaranteed only during the next business hours.
  • Detailed and specific documentation: Here

General guidelines

  • If in doubt about how to create a ticket, please follow the instructions Here
  • If there is a danger of data loss, an ALARM ticket is warranted.
  • If the problem is not likely to cause data loss and can wait until the next business hours, a TEAM ticket is appropriate. A TEAM ticket can always be escalated to an ALARM ticket if necessary.
  • A normal ticket should not be used, because it cannot be escalated to an ALARM ticket.
  • When opening a ticket, cms-crc-on-duty should always be cc'ed.
  • If possible, the CRC should be contacted and consulted before opening an ALARM ticket, to keep them in the loop.
  • Disk quota problems (/store/data quota full): in this case we must contact the Virtual Organization Contact Operator (currently (03/2015) Ivan.Glushkov@cern.ch and 0041764877257@mail2sms.cern.ch). Going through IT could take longer.
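
The ticket-type guidelines above can be summarised as a small decision function. This is only a minimal sketch of the rule as written on this page; the names `TicketType` and `choose_ticket_type` are hypothetical and are not part of any GGUS or Tier-0 tool.

```python
from enum import Enum


class TicketType(Enum):
    """GGUS ticket types recommended on this page (normal tickets are excluded)."""
    ALARM = "GGUS ALARM"
    TEAM = "GGUS TEAM"


def choose_ticket_type(data_loss_risk: bool, can_wait_until_next_working_day: bool) -> TicketType:
    """Return the ticket type suggested by the general guidelines.

    Hypothetical helper: ALARM if there is a danger of data loss or the issue
    cannot wait until the next working day, TEAM otherwise.
    """
    if data_loss_risk or not can_wait_until_next_working_day:
        return TicketType.ALARM
    return TicketType.TEAM


if __name__ == "__main__":
    # A problem with no data-loss risk that can wait until the next working day.
    print(choose_ticket_type(False, True).value)
```

Whatever the outcome, cms-crc-on-duty should still be cc'ed, and a TEAM ticket can be escalated to ALARM later if the situation worsens.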

Most common urgent problems

  • Repack/Express stuck -- files inaccessible/stuck in CASTOR/EOS.
    • For Repack, we can tolerate it being stuck until a bit before PromptReco starts (44h) - TEAM ticket
    • For Express, we want to be done before the PCL 12h delay - ALARM ticket during STABLE BEAMS, TEAM ticket if Cosmics data taking is in progress.
  • PCL upload failing
    • The case we can act on is usually Express jobs failing due to inaccessible files - ALARM ticket
  • Frontier problems
    • PromptReco usually fails when fetching conditions, as do Reco jobs everywhere. If little luminosity is being processed, a TEAM ticket is enough, but if there is a backlog to catch up on, it is better to open an ALARM ticket and get the issue fixed ASAP.
    • Express will also fail. If this is preventing STABLE BEAMS Express from running, open an ALARM ticket.
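
As a quick reference, the cases listed above can be written down as a lookup function. This is a rough sketch under stated assumptions: the problem labels (`repack_stuck`, `express_stuck`, `pcl_upload_failing`, `frontier`) and the function `triage` are hypothetical names introduced here for illustration, and the logic only restates the bullets above.

```python
def triage(problem: str, *, stable_beams: bool = False, backlog: bool = False) -> str:
    """Return "ALARM" or "TEAM" for the urgent problems listed on this page.

    Hypothetical helper. `stable_beams` means the affected data was taken
    during STABLE BEAMS (as opposed to Cosmics); `backlog` means there is a
    processing backlog to catch up on.
    """
    if problem == "repack_stuck":
        # Repack can stay stuck until a bit before PromptReco starts (44h).
        return "TEAM"
    if problem == "express_stuck":
        # Express should finish before the PCL 12h delay.
        return "ALARM" if stable_beams else "TEAM"
    if problem == "pcl_upload_failing":
        # Usually caused by Express jobs failing on inaccessible files.
        return "ALARM"
    if problem == "frontier":
        # ALARM if STABLE BEAMS Express is blocked or a backlog must be processed.
        return "ALARM" if (stable_beams or backlog) else "TEAM"
    raise ValueError(f"unknown problem label: {problem!r}")


if __name__ == "__main__":
    print(triage("express_stuck", stable_beams=True))
    print(triage("frontier"))
```

The function raises on unknown labels on purpose: anything not covered by the list above needs a judgment call, ideally together with the CRC.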

Useful monitoring links

T0 Subsystems:

  • Storage Manager (PENDING, get link)

Useful links for knowing when and how collisions happen:

  • Here is the DAQ System, for more real-time monitoring of when CMS is taking data

  • Here is a live Twitter feed about the LHC status; CMS usually takes data when there are STABLE BEAMS

  • Here are several LHC monitors that are not that helpful at first, but once you know how to read them you can start to understand the machine and predict when you have to pay more attention

  • And finally, a tool that helps a lot in finding useful runs for replays, based on peak pileup (heavier events) and how much data was taken (in 1/pb)

Operations tools

How do I give the CSP shifters updated instructions?

If there is an issue we are already aware of and do not need to be notified about, add a note to the 'Computing Plan of the Day' by sending an email to cms-crc-on-duty@cern.ch explaining the new or temporary instructions. The CRC maintains this page and should integrate your updates. The current plan is available here.

Monitoring questions

How do I check the progress of a run's processing in the Tier-0?

Go to the Run tab in the production WMStats. Compare the 'run status' shown there with the statuses in this diagram.

Topic attachments:

  • Tier0_Data_Flow.png (PNG, r9, 293.8 K, uploaded 2015-03-02 17:03 by LuisContreras)
Topic revision: r20 - 2017-12-13 - VytautasJankauskas
 