-- AlbertoGasconBravo - 2015-06-03

Prompt Calibration Loop (OLD - TO BE DELETED - REPLACED BY SECTION 4 ABOVE)

Overview

The prompt calibration loop of the SCT creates the conditions data necessary for the ATLAS bulk reconstruction at Tier0 (and reprocessing at the Tier1s) and for performance monitoring of the SCT. The conditions data used in the reconstruction are the noisy strips. The conditions data used in the monitoring are

  • NoiseOccupancy,
  • RawOccupancy,
  • Efficiency,
  • Lorentz Angle,
  • DeadChip,
  • DeadStrip and
  • ByteStreamErrors.

The last three tasks have been implemented recently and might not work smoothly; details of possible errors and their solutions are given elsewhere. The work of the shifter consists of monitoring the offline jobs (status, possible reasons for failed jobs, reactivating them if necessary) and uploading the conditions data to COOL. Each type of data has requirements on the minimum number of events that must be processed and on when the data should be uploaded. The following table summarizes the properties of each job.

List of types for processing

  • The types in the table below are processed automatically in the prompt calibration loop.
    • In the automatic processing, each input dataset for a run is required to satisfy the following two criteria:
      • The data were taken with the stable-beam flag set.
      • The input dataset contains more events than the minimum statistics.
  • The task type is used in the Task Lister for job monitoring (see shifter work below).
  • The number of jobs of each type is defined for nominal operation.
  • The processing timing and upload timing are described below.
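
The two automatic-processing criteria above can be sketched as a simple eligibility check (a minimal illustration; the function name and the dictionary are invented here, with thresholds taken from the "Minimum statistics" column of the table below; this is not the actual Tier0 code):

```python
# Illustrative sketch only: MIN_STATISTICS mirrors the "Minimum statistics"
# column of the table on this page; the function is hypothetical.
MIN_STATISTICS = {
    "sctns": 5_000,    # noisy strip
    "sctds": 200_000,  # dead strip
    "sctdc": 200_000,  # dead chip
    "sctno": 5_000,    # noise occupancy
}

def eligible_for_processing(task_type, n_events, stable_beam):
    """An input dataset is processed automatically only if the data were
    taken with stable beam and the dataset contains more events than the
    minimum statistics for that task type."""
    return stable_beam and n_events > MIN_STATISTICS[task_type]

print(eligible_for_processing("sctns", 12_000, stable_beam=True))   # True
print(eligible_for_processing("sctds", 12_000, stable_beam=True))   # False
```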

| Type | Task type | Stream | Format | Minimum statistics | Processing start timing | Upload timing | Comments |
| Hitmaps | scthm | calibration_SCTNoise | RAW | - | After end of run | No upload | Generation of hitmaps later used to identify noisy strips |
| Noisy strip | sctns | calibration_SCTNoise | RAW | 5,000 | After end of run | Before bulk reco | Bulk reconstruction and performance monitoring |
| Dead strip | sctds | express_express | RAW | 200,000 | After end of run | When available | Performance monitoring |
| Dead chip | sctdc | express_express | RAW | 200,000 | After end of run | When available | Performance monitoring |
| Noise occupancy | sctno | express_express | HIST | 5,000 | After bulk reco | When available | Performance monitoring |
| Raw occupancy | sctro | express_express | HIST | 5,000 | After bulk reco | When available | Performance monitoring |
| Lorentz angle | sctla | express_express | HIST | 5,000 | After bulk reco | When available | Performance monitoring |
| Efficiency | scteff | express_express | HIST | 5,000 | After bulk reco | When available | Performance monitoring |
| Bytestream error | sctbse | express_express | HIST | 5,000 | After bulk reco | When available | Performance monitoring |

The identification of the noisy strips used to hit the processing time limit easily. Its implementation was therefore changed: the task was split into two steps (generation of hitmaps and identification of noisy strips) and the jobs that generate the hitmaps were parallelized. As a result, the hitmap jobs do not generate any conditions data to be uploaded directly.
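
The two-step scheme can be illustrated with a toy sketch (the data model, occupancy cut and function names are all invented for illustration; the real hitmap jobs run in parallel on the Tier0 farm, not in a single process):

```python
from collections import Counter

def make_hitmap(event_chunk):
    """Step 1 (many such jobs run in parallel in production): count hits
    per strip for one chunk of events. In this toy model an event is a
    list of (module, strip) hits."""
    hitmap = Counter()
    for event in event_chunk:
        hitmap.update(event)
    return hitmap

def find_noisy_strips(hitmaps, n_events, occupancy_cut=0.5):
    """Step 2: merge the partial hitmaps and flag strips whose occupancy
    (hits / events) exceeds the cut. The 50% cut is illustrative only,
    not the actual SCT noisy-strip definition."""
    total = Counter()
    for h in hitmaps:
        total += h
    return sorted(s for s, hits in total.items() if hits / n_events > occupancy_cut)

# Toy data: strip ("m1", 7) fires in every event; ("m2", 3) only once.
chunks = [
    [[("m1", 7)], [("m1", 7), ("m2", 3)]],   # events seen by hitmap job 1
    [[("m1", 7)], [("m1", 7)]],              # events seen by hitmap job 2
]
hitmaps = [make_hitmap(c) for c in chunks]      # parallel jobs in production
print(find_noisy_strips(hitmaps, n_events=4))   # [('m1', 7)]
```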

Timing in prompt calibration loop and ATLAS bulk reconstruction at Tier0

After the end of a run, a period of 48 hours is allotted for the offline calibration before bulk reconstruction starts at Tier0. The processing for noisy strip, dead chip and dead strip:

  • Uses RAW input datasets, which are made available immediately after the run finishes.
  • Offline jobs are automatically launched on the input datasets (if the minimum-statistics criterion is met) and create the conditions data.
  • The typical processing time is a few hours (< 6 hours for noisy strip; dead chip and dead strip can take considerably longer).
  • The outputs (conditions data) are copied to the SCT server every hour by a cron job and are then displayed on a dedicated web page (see below), ready for upload.

Important: before bulk reconstruction starts, the noisy strip data have to be uploaded to COOL. If the noisy strips have not been uploaded 6 hours before bulk reconstruction starts, an email is sent to the current shifter and the calibration loop experts. There is no upload-timing requirement for dead chip and dead strip.
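
The timing logic above (48-hour window, warning email 6 hours before bulk reconstruction) can be expressed as a small helper; this is a hedged sketch, not the actual reminder script:

```python
from datetime import datetime, timedelta

CALIB_WINDOW = timedelta(hours=48)    # calibration loop length after end of run
WARNING_MARGIN = timedelta(hours=6)   # warning email this long before bulk reco

def should_warn(end_of_run, now, noisy_strip_uploaded):
    """True if the noisy-strip upload is still missing less than 6 hours
    before bulk reconstruction is due to start (illustrative helper)."""
    bulk_reco_start = end_of_run + CALIB_WINDOW
    return (not noisy_strip_uploaded) and now >= bulk_reco_start - WARNING_MARGIN

end = datetime(2015, 6, 3, 12, 0)
print(should_warn(end, end + timedelta(hours=43), noisy_strip_uploaded=False))  # True
print(should_warn(end, end + timedelta(hours=20), noisy_strip_uploaded=False))  # False
```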

Processing for noise occupancy, raw occupancy, efficiency, Lorentz angle and bytestream error, using HIST

  • The input dataset is HIST (monitoring histograms) and is made available after the bulk reconstruction finishes.
  • Offline jobs are automatically launched on the input datasets and create the conditions data.
  • The typical processing time is about 10 minutes.
  • The outputs (conditions data) are copied to the SCT server every hour by a cron job and are then displayed on a dedicated web page (see below), ready for upload.

Upload to COOL

The upload of conditions data is scheduled manually from the calibration loop web page (see below). Every hour, a cron job reads the list of runs to be uploaded and performs the actual upload. The upload of noisy strip conditions data has to be completed within the fixed 36-hour time window, well before the deadline. There is no upload deadline for the remaining tasks (dead chip, dead strip, noise occupancy, raw occupancy, efficiency, bytestream error and Lorentz angle); they are uploaded when the conditions data become available.
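
One pass of the hourly upload cron job could be sketched as follows (the queue format, function names and log strings are invented for illustration; the real cron job on the SCT server differs in detail):

```python
def process_upload_queue(queued_runs, upload_to_cool):
    """One pass of the hourly cron job: attempt to upload every scheduled
    run, keep failed runs queued for the next pass, and return a per-run
    status for the Log column on the web page (illustrative sketch)."""
    log, still_queued = {}, []
    for run in queued_runs:
        try:
            upload_to_cool(run)
            log[run] = "OK"                  # shown green on the page
        except Exception as err:
            log[run] = f"FAILED: {err}"      # shown red: contact the experts
            still_queued.append(run)
    return log, still_queued

# Toy upload function: run 300001 fails, the others succeed.
def fake_upload(run):
    if run == 300001:
        raise RuntimeError("COOL connection refused")

log, retry = process_upload_queue([300000, 300001, 300002], fake_upload)
print(log[300000], retry)   # OK [300001]
```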

Shifter work

  • Send a message to the experts in case of any questions or issues.
    • A message can be sent to the experts using the link (phone icon) in the top-left corner of https://pc-sct-www01.cern.ch/24hLoop/index.php.
    • The on-call expert will also receive the same message for information.
    • On this page, it takes about 10 seconds to load the list of e-mail addresses (the on-call expert is read from OTP). Wait until the list of e-mail addresses appears before sending a message.
  • Uploads of conditions data to COOL
    • Go to the dedicated web page: https://pc-sct-www01.cern.ch/24hLoop/index.php
    • Login for the shifter is automatic, based on OTP. Please make sure that you are automatically logged in.
    • Check whether the Last Cron Update box in the top-right corner is updating. Cron jobs are used for automation.
      • If the box is blue and the "Last Cron Update" time is updating, the status is fine.
      • If the box is red, the cron jobs might be frozen/dead. Send a message to the experts.
    • Check the list of runs in the time window of the prompt calibration loop.
      • The list of runs can be found in the ATLAS Run Query plugged into the page; runs are marked as (in calib loop) just below the run number.
      • The conditions data for these runs are created by the automatic processing and are uploaded by the shifter.
      • The automatic processing will not run if the input datasets do not satisfy the stable-beam flag and minimum-statistics requirements (see the table above). In that case there will be no conditions data to upload.
    • Upload the conditions data when available. Do not upload BytestreamErrors.
      • Select one of the tabs (NoisyStrip, NoiseOccupancy, RawOccupancy, DeadChip, Efficiency, ByteStreamErrors or DeadStrip).
      • Repeat the procedure below for each tab.
        • Criteria for uploads
          • Noisy strip: check the Test column on the page. If it is green, go ahead with the upload. If not (yellow or red), send a message to the experts.
          • Other types: no criteria. Go ahead with the uploads.
        • Upload procedure
          • Select runs by clicking the check boxes in the Upload column.
          • Click Send. A cron job performs the uploads every hour.
          • Check the Log column some time later (after the cron job has done the uploads).
            • If the log is green, the upload was successful.
            • If the log is red, check the log:
              • During the upload: the log stays red. This is not a problem.
              • After the upload: if the upload failed (seen in the log), send a message to the experts.
  • Monitoring of offline calibration jobs
    • Go to the Task Lister web page: https://tzcontzole01.cern.ch/prod1/tasklister/ or https://tzcontzole02.cern.ch/prod1/tasklister/
    • Monitor jobs for the list of runs in the time window of the prompt calibration loop.
      • Select sctcalib under Username in the leftmost column.
        • This filters the jobs for the SCT.
        • If there are no jobs to run at all, sctcalib does not appear under Username.
      • The Type field in Task Information specifies the processing type (e.g. sctns for noisy strip), as shown in the table above.
      • Check whether jobs are defined for the list of runs found in the ATLAS Run Query (plugged into the upload page).
      • Monitor the Status in Task Information:
        • RUNNING: the job is defined (not actually running, e.g. waiting for the input dataset).
        • YELLOW: shown as a yellow band; the job is running.
        • FINISHED: the job finished successfully.
        • Other states
          • Other colour states can appear during transitions (e.g. from RUNNING to YELLOW).
          • Send a message to the experts if jobs are found to have failed.

Problem Solving

  1. I cannot upload flags
    • I cannot access http://atlasdqm.web.cern.ch/atlasdqm/DQBrowser/DBQuery.php.
      • When I try, I get a pop-up window saying:
                    "You have chosen to open makeMatrix.php which is a: PHP script from https://atlasdqm.web.cern.ch
                    What should Firefox do with this file?"
      • This is usually a temporary error, seen by many systems. We believe it may be a connection issue.
      • Please wait 5 minutes and try again.
    • Please send an elog message (message type=data quality, DQ_Type=offline) detailing the problem, with as much information as you can provide.
    • Please continue with the DQ reports and make a note of which flags you wish to set for which runs. Email Helen Hayward with the reason why you cannot upload the flags (including any error message) and the flags you wish to set.
  2. How can I confirm that I have updated the flags correctly?
  3. The SCT DQ WebTool is not working
    • Check the status of the server using http://atlasdqm.cern.ch/alive/ or http://atlasdqm.cern.ch:8088/alive/, depending on whether you are looking at the production or the development version (port 8088).
    • If the server is down, please send an elog message (message type=data quality, DQ_Type=offline) detailing the problem.
    • If the server is up, do you observe the same problem on the Pixel and TRT sub-pages?
      • If yes, the server may need to be restarted: please send an elog message (message type=data quality, DQ_Type=offline) detailing the problem, with as much information as you can provide.
      • If no, please email Helen Hayward and Graham Sellers with the details.
      • Note: if the Pixel/TRT pages are slow and the SCT page fails, this can still be a server issue, meaning the server needs to be restarted.
      • Please notify Helen Hayward and Graham Sellers by email.
    • If this is a non-SCT-specific DQ problem and it is not fixed after ~15 minutes, please call the DQ expert (161809).
  1. I do not understand a histogram flag set by the DQMF.
    • Please email Helen Hayward and Gabe Hare.
  2. I cannot use http://atlas-runquery.cern.ch/query.py
    • Please email Andreas Hocker and Jörg Stelzer with the problem (cc. Helen Hayward).
  3. I cannot see the histograms using http://atlasdqm.cern.ch/webdisplay/tier0/
    • Check the status of the server using http://atlasdqm.cern.ch/alive/ or http://atlasdqm.cern.ch:8088/alive/, depending on whether you are looking at the production or the development version (port 8088).
    • If the server is down, please send an elog message (message type=data quality, DQ_Type=offline) detailing the problem.
      • If it is not resolved, or replied to, within 15 minutes, please call the DQ expert (161809).
    • Is it timing out on an SCT-specific page?
      • Please email Gabe Hare and Helen Hayward.
    • You can check the online DQM histograms at http://atlasdqm.cern.ch/webdisplay/online/. Simply find the run number and follow the path SCT-MDA-Histogramming -> Entire Run -> Histogramming-SCT-iss. Note that when the SCT is in STANDBY, tracks are not reconstructed offline, so these histograms will be empty in the offline DQM but can be seen in the online DQM.
    • You can retrieve the original monitoring.root file from castor, to look at the histograms in ROOT:
      • export STAGE_SVCCLASS=atlcal
      • To find the histogram file:
        • rfdir /castor/cern.ch/grid/atlas/tzero/prod1/perm/<project>/<stream>/<run number>
        • where <project> is data09_cos, data09_calophys or data09_90GeV (find out using the runquery tool).
      • Copy it using a command such as:
        • rfcp /castor/cern.ch/grid/atlas/tzero/prod1/perm/data09_cos/physics_IDCosmic/0138182/data09_cos.00138182.physics_IDCosmic.merge.HIST.f165_m235 monitoring.root
        • (You may have to rename the file to something.root in order to be able to open it in ROOT.)
      • Browse this monitoring.root file directly using ROOT.
  4. I would like to see the list of problem modules in a simple text format with module serial numbers.
    • Please log in to lxplus.
    • The list of problem modules which usually appears in the automatic report can be found here:
      • /afs/cern.ch/user/a/atlasdqm/dqmdisk1/cherrypy/static/text/sct/SctDqTxtFiles/
      • If there is no file here (or it is empty), it usually means that it has not been produced yet. Please wait a while.
      • If it has not appeared after a few hours, please contact Helen Hayward.
  5. The SCT DQ WebTool says "no flags"
  6. The DCS flag appears white?
    • Please email Katharine Leney and Tim Andeen (cc. Helen Hayward).
  7. How do I find which luminosity block corresponds to a certain time (i.e. you know the time at which a problem occurred, so you know which lumiblocks to flag)?
    • Bring up the ATLAS Run Query.
    • Enter "f r last < run number > / show all".
    • Scroll over to one of the triggers that has lumiblock-dependent rates and click on it to bring up the pop-up of rates.
    • This contains the timestamp of the start of each lumiblock in the list.
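
Once the per-lumiblock start timestamps are extracted from that pop-up, mapping a time to a lumiblock is a simple bisection. A sketch with made-up numbers (one lumiblock roughly every 60 s; the function name and data layout are invented for illustration):

```python
import bisect

def lumiblock_at(lb_starts, when):
    """lb_starts: (start_time, lumiblock) pairs sorted by time, as read off
    the trigger-rate pop-up; returns the lumiblock that contains `when`."""
    times = [t for t, _ in lb_starts]
    i = bisect.bisect_right(times, when) - 1   # last start <= when
    if i < 0:
        raise ValueError("time is before the first lumiblock")
    return lb_starts[i][1]

# Made-up example: LB 1 starts at t=0 s, LB 2 at 61 s, LB 3 at 122 s.
starts = [(0, 1), (61, 2), (122, 3)]
print(lumiblock_at(starts, 100))   # 2
```
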
  8. The DQM flag for the SCT configuration maps is non-green due to the number of noisy modules.
    • Until further notice, please set the SCT_MOD_NOISE_GT40 flag for the runs in question.
    • This problem is likely due to a combination of a software bug and a real effect.
    • Due to the nature of the bug, the value measured in the pass-2 reconstruction (f* reco tag) is currently more reliable than the pass-1 value.
    • The value measured by the online monitoring should also be reliable; the online shifter has been instructed to include it in their elogs.