--
AlbertoGasconBravo - 2015-06-03
Prompt Calibration Loop (OLD - TO BE DELETED - REPLACED BY SECTION 4 ABOVE)
Overview
The prompt calibration loop of the SCT creates the conditions data necessary for the ATLAS bulk reconstruction at Tier0 (and reprocessing at Tier1s) and for performance monitoring of the SCT. The conditions data used in the reconstruction are the noisy strips. The conditions data used in the monitoring are:
- NoiseOccupancy,
- RawOccupancy,
- Efficiency,
- Lorentz Angle,
- DeadChip,
- DeadStrip and
- ByteStreamErrors.
The last three tasks have recently been implemented and might not work smoothly. Details of possible errors and their solutions are given elsewhere.
The work of the shifter consists of monitoring the offline jobs (status, possible reasons for failed jobs, reactivating them if necessary) and uploading the conditions data to COOL. Each type of data has requirements regarding the minimum number of events to be processed and when it should be uploaded. The following table summarises the properties of each job type.
List of types for processing
- Types in the table below are automatically processed in the prompt calibration loop.
- In the automatic processing, each input dataset for a run is required to satisfy the following two criteria (a minimal sketch of this selection is given after the table below):
- The data were taken with the stable beam flag set.
- The input dataset contains more events than the minimum statistics.
- The task type is used in the Task Lister for job monitoring (see shifter work below).
- The number of jobs of each type is defined in the processing for nominal operation.
- Processing timing and upload timing are described below.
| Type | Task type | Stream | Format | Minimum statistics | Processing start timing | Upload timing | Comments |
| Hitmaps | scthm | calibration_SCTNoise | RAW | - | After end of run | No upload | Generation of hitmaps later used to identify noisy strips |
| Noisy strip | sctns | calibration_SCTNoise | RAW | 10,000 | After end of run | Before bulk reco | Bulk reconstruction and performance monitoring |
| Dead strip | sctds | express_express | RAW | 200,000 | After end of run | When available | Performance monitoring |
| Dead chip | sctdc | express_express | RAW | 200,000 | After end of run | When available | Performance monitoring |
| Noise occupancy | sctno | express_express | HIST | 5,000 | After bulk reco | When available | Performance monitoring |
| Raw occupancy | sctro | express_express | HIST | 5,000 | After bulk reco | When available | Performance monitoring |
| Lorentz Angle | sctla | express_express | HIST | 5,000 | After bulk reco | When available | Performance monitoring |
| Efficiency | scteff | express_express | HIST | 5,000 | After bulk reco | When available | Performance monitoring |
| Bytestream error | sctbse | express_express | HIST | 5,000 | After bulk reco | When available | Performance monitoring |
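As an illustration of the two selection criteria and of the thresholds in the table, a minimal Python sketch is shown below. The function and dictionary names are hypothetical and not part of the actual calibration software.

# Minimal sketch (hypothetical names): decide whether a run's input dataset
# qualifies for automatic processing of a given calibration task type.
# The thresholds mirror the table above; the helper is an assumption,
# not part of the real sctcalib code.

MIN_STATISTICS = {
    "scthm": 0,        # hitmaps: no minimum
    "sctns": 10000,    # noisy strip
    "sctds": 200000,   # dead strip
    "sctdc": 200000,   # dead chip
    "sctno": 5000,     # noise occupancy
    "sctro": 5000,     # raw occupancy
    "sctla": 5000,     # Lorentz angle
    "scteff": 5000,    # efficiency
    "sctbse": 5000,    # bytestream error
}

def qualifies_for_processing(task_type, stable_beam, n_events):
    """Return True if the dataset meets both automatic-processing criteria."""
    if not stable_beam:                                 # criterion 1: stable beam flag
        return False
    return n_events >= MIN_STATISTICS[task_type]        # criterion 2: minimum statistics

# Example: a calibration_SCTNoise dataset with 12,000 events taken with stable beam
print(qualifies_for_processing("sctns", stable_beam=True, n_events=12000))  # True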
The identification of the noisy strips used to hit the processing time limit easily. Its implementation was therefore changed, splitting it into two steps (generation of hitmaps and identification of noisy strips) and parallelising the jobs that generate the hitmaps. As a consequence, the hitmap jobs do not generate any conditions data to be uploaded directly.
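A minimal sketch of this two-step scheme is shown below, assuming hypothetical helper functions and a simple occupancy cut; it only illustrates the split and parallelisation described above and is not the actual sctcalib implementation.

# Minimal sketch (hypothetical helpers): step 1 produces hitmaps in parallel,
# step 2 merges them and identifies the noisy strips (the conditions data).
from multiprocessing import Pool

def make_hitmap(raw_file):
    """Step 1 (parallelised): produce a per-file hitmap; no conditions data yet."""
    hitmap = {}                           # strip id -> number of hits
    # ... fill hitmap from the RAW file (omitted in this sketch) ...
    return hitmap

def find_noisy_strips(hitmaps, n_events, occupancy_cut=1.5e-3):
    """Step 2 (single job): merge the hitmaps and flag strips above the cut.
    The occupancy cut value is an assumption for illustration only."""
    merged = {}
    for hm in hitmaps:
        for strip, hits in hm.items():
            merged[strip] = merged.get(strip, 0) + hits
    return [s for s, hits in merged.items() if hits / n_events > occupancy_cut]

if __name__ == "__main__":
    raw_files = ["raw_chunk_%d" % i for i in range(8)]    # placeholder inputs
    with Pool(4) as pool:
        hitmaps = pool.map(make_hitmap, raw_files)        # parallel hitmap jobs
    noisy = find_noisy_strips(hitmaps, n_events=10000)    # conditions data produced here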
Timing in prompt calibration loop and ATLAS bulk reconstruction at Tier0
After the end of a run, a period of 48 hours is given to run the offline calibration before bulk reconstruction starts at Tier0. The processing for noisy strip, dead chip and dead strip:
- Uses RAW input datasets, which are made available immediately after the run finishes.
- Offline jobs are automatically launched on the input datasets (if the minimum statistics criterion is met) and create conditions data.
- Typical processing time is a few hours (< 6 hours for noisy strip; it may take longer for dead chip and dead strip).
- Outputs (conditions data) are copied to the SCT server every hour by a cron job and are then displayed on a dedicated web page (see below), ready for upload.
Important: Before bulk reconstruction starts, noisy strip has to be uploaded to COOL. If the noisy strips have not been uploaded 6 hours before bulk reconstruction starts, an email is sent to the current shifter and the calibration loop experts. There is no upload timing requirement for dead chip and dead strip.
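A minimal sketch of this reminder logic is shown below; the notification helper, the addresses and the exact timing check are assumptions for illustration only.

# Minimal sketch (hypothetical): if the noisy-strip conditions data have not been
# uploaded 6 hours before bulk reconstruction starts, notify the shifter and the
# calibration loop experts.
from datetime import datetime, timedelta

def check_noisy_strip_deadline(bulk_reco_start, uploaded, notify):
    """Call notify() once the 6-hour warning point is reached without an upload."""
    warning_time = bulk_reco_start - timedelta(hours=6)
    if not uploaded and datetime.utcnow() >= warning_time:
        notify("Noisy strip conditions data not yet uploaded to COOL")

# Example usage with a dummy notifier
check_noisy_strip_deadline(
    bulk_reco_start=datetime.utcnow() + timedelta(hours=5),
    uploaded=False,
    notify=lambda msg: print("e-mail to shifter and experts:", msg),
)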
Processing for noise occupancy, raw occupancy, efficiency and bytestream error, using HIST:
- The input dataset is HIST (monitoring histograms) and is made available after the bulk reconstruction finishes.
- Offline jobs are automatically launched on the input datasets and create conditions data.
- Typical processing time is about 10 minutes.
- Outputs (conditions data) are copied to the SCT server every hour by a cron job and are then displayed on a dedicated web page (see below), ready for upload.
Upload to COOL
The upload of conditions data is scheduled manually from the calibration loop web page (see below). Every hour, a cron job reads the list of runs to be uploaded and performs the actual upload. The noisy strip upload has to be finished within the fixed 36-hour time window, and well before the deadline. However, there is no upload deadline for the remaining tasks (dead chip, dead strip, noise occupancy, raw occupancy, efficiency, bytestream error and Lorentz angle); their upload is done when the conditions data are available.
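A minimal sketch of such an hourly upload pass is shown below, assuming a hypothetical text file holding the runs marked on the web page and a placeholder upload command; the real cron job and its interfaces may differ.

# Minimal sketch (hypothetical file name and upload command): one pass of the
# hourly cron job that uploads the marked runs to COOL.
import subprocess

def read_marked_runs(path="marked_runs.txt"):
    """One run number per line, as written by the calibration loop web page (assumption)."""
    with open(path) as f:
        return [int(line) for line in f if line.strip()]

def upload_to_cool(run):
    """Placeholder for the actual COOL upload command (assumption)."""
    return subprocess.call(["echo", "uploading conditions data for run", str(run)])

def hourly_upload_pass():
    for run in read_marked_runs():
        status = upload_to_cool(run)
        print("run", run, "upload", "OK" if status == 0 else "FAILED")

# Example: run one pass by hand (normally triggered by the hourly cron job)
# hourly_upload_pass()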
Shifter work
The main duties of the shifter will be monitoring the offline jobs and uploading the conditions data generated to COOL. In case of any problems or questions, send an email to the experts.
On the top-left of the 24h Calibration Loop webpage there is a link (phone picture) that leads to the email form. It takes about 10 seconds to load the list of e-mail addresses (reading the on-call expert from OTP). Please wait until the list of e-mail addresses appears before sending a message.
To upload conditions data to COOL, go to the 24h Calibration Loop webpage. The login for the shifter is automatic, reading the OTP, and you can check it at the top of the page. In the top-right corner you will see two boxes:
- The first box will show Last Run Uploaded and the number and date of the last noisy strip run uploaded to COOL
- The second box will show the last time the upload cron was executed (over a blue background) or a message over a red background warning that the upload cron stopped working. In this case, please email the experts.
At the bottom of the page there will be a list of runs (retrieved from the ATLAS Run Query) within the time-window of the prompt calibration loop that can be uploaded. To mark runs for upload:
- Select one of the tabs,
- select runs by clicking the checkboxes in the upload columns,
- click the Send button.
- Repeat the procedure for each of the tabs.
Once every hour, a cron job will check which runs have been marked for upload and try to upload them. Before uploading noisy strips, check the Test column in the page. If it is green, go ahead. If it is yellow or red, please send a message to the experts. Some time after marking the runs for upload, check whether the uploads have been successful. During the upload process, the log field remains red, but it should change to green for successful runs. You can check whether the upload has failed by opening the log file. In that case, please send an email to the experts.
The offline calibration jobs can be monitored in the Task Lister web page. You can filter the SCT jobs by selecting sctcalib in the UserName selection menu on the left. You can select a particular conditions task by checking it in the TaskType selection menu. Only jobs from the last 3 days are shown. If you want to check previous jobs, you can click the Get older data button. One click shows 3 additional days (6 in total). A further click shows another 9 days (15 days in total).
The jobs should be defined for the runs found in the ATLAS Run Query (plugged into the upload page) [currently being fixed]. However, no job will be defined if the input datasets do not satisfy the stable beam criterion or do not have enough statistics.
What should I check in the Task Lister? In the Status column, below Task Information, you can check the status of the jobs:
- If the status is RUNNING, the job is defined but not yet actually running; it is waiting for the input dataset;
- if the status is a yellow band, the job is running;
- if the status is FINISHED, the job has finished successfully.
- Other colors will appear in the transition between states.
- If the jobs fail, please email the experts.
Don't worry that the #Events field always shows 0; it should be that way. You can check the log file of FINISHED jobs by opening the drop-down menu that appears when clicking on the #Done field. In the case of failed jobs, the menu will appear when clicking on the "#Abrt." field.
Useful links
Problem Solving
- I cannot upload flags
- I cannot access http://atlasdqm.web.cern.ch/atlasdqm/DQBrowser/DBQuery.php
- please send elog message (message type=data quality, DQ_Type=offline) detailing the problem, with as much information as you can provide.
- Please continue with the DQ reports, making a note of which flags you wish to set for which runs. Email Helen Hayward with the reason why you cannot upload the flag (including any error message) and the flags you wish to set.
- How can I confirm I have updated the flags correctly?
- The SCT DQ WebTool is not working
- Check the status of the server using http://atlasdqm.cern.ch/alive/ or http://atlasdqm.cern.ch:8088/alive/ depending on whether you are trying to see the production or development version (8088)
- If the server is down, please send elog message (message type=data quality, DQ_Type=offline) detailing the problem.
- If the server is up, do you observe the same problem on the pixel and trt sub-pages?
- if yes, the server may need to be restarted: please send elog message (message type=data quality, DQ_Type=offline) detailing the problem, with as much information as you can provide.
- if no: please email Helen Hayward and Graham Sellers with the details.
- Note: if the pixel/TRT pages are slow and the SCT page fails, this can still be a server issue, which means the server needs to be restarted.
- please notify Helen Hayward and Graham Sellers by email.
- If this is a non-SCT specific DQ problem, and is not fixed after ~15 minutes, please call the DQ expert (161809)
- I do not understand a histogram flag set by the DQMF?
- please email Helen Hayward and Gabe Hare
- I cannot use http://atlas-runquery.cern.ch/query.py
- please email Andreas Hocker and Jörg Stelzer with the problem (cc. Helen Hayward)
- I cannot see the histograms using http://atlasdqm.cern.ch/webdisplay/tier0/
- Check the status of the server using http://atlasdqm.cern.ch/alive/ or http://atlasdqm.cern.ch:8088/alive/ depending on whether you are trying to see the production or development version (8088)
- If the server is down, please send elog message (message type=data quality, DQ_Type=offline) detailing the problem.
- if not resolved, or replied to within 15 minutes, please call the DQ expert (161809)
- Is it timing out on a SCT specific page?
- Please email Gabe Hare and Helen Hayward
- You can check the online DQM histograms at: http://atlasdqm.cern.ch/webdisplay/online/
Simply find the run number and follow the path SCT-MDA-Histogramming -> Entire Run -> Histogramming-SCT-iss. Note that when the SCT is in STANDBY, tracks are not reconstructed offline, so these histograms will be empty in the offline DQM but can be seen in the online DQM.
- You can retrieve the original monitoring.root file, to look at the histogram in ROOT from castor:
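Once the file has been retrieved, a quick way to inspect the histograms is PyROOT. The following minimal sketch assumes the file has been copied locally; the file name and the histogram path are placeholders, not the actual castor location.

# Minimal sketch: browse a locally retrieved monitoring.root file with PyROOT.
# The file name and histogram path are placeholders (assumptions).
import ROOT

f = ROOT.TFile.Open("monitoring.root")
f.ls()                                                   # list the top-level directories
hist = f.Get("run_000000/SCT/GENERAL/hits/SCT_Hits")     # example path, assumption only
if hist:
    hist.Draw()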
- I would like to see the list of problem modules in a simple text format with module serial numbers.
- please log in to lxplus
- The list of problem modules which usually appears in the automatic report can be found here:
- /afs/cern.ch/user/a/atlasdqm/dqmdisk1/cherrypy/static/text/sct/SctDqTxtFiles/
- If there is no file here (or it is empty), it usually means that it has not been produced yet; please wait a while.
- if it has not appeared after a few hours, please contact Helen Hayward
- The Sct DqWebTool says "no flags"
- At least one DQ region is out of configuration. There is currently a bug in the DqWebTool. Please check the status of the flags using :
- The DCS flag appears white?
- Please email Katharine Leney, Tim Andeen (cc. Helen Hayward)
- How to find which luminosity block corresponds to a certain time (i.e. you know the time at which a problem occurred, so you know which lumiblocks to flag); a sketch of the mapping is given after this list.
- Bring up the ATLAS Run Query
- Enter "f r last < run number > / show all"
- Scroll over to one of the triggers that has lumiblock dependent rates and click on it to bring up the pop-up of rates
- This contains the timestamp of the start of each lumiblock in the list
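As an illustration, a minimal sketch of mapping a timestamp to a lumiblock number is shown below. The timestamps are placeholders read off the Run Query pop-up, and the lumiblock numbering convention used here (starting from 1) is an assumption.

# Minimal sketch: given the lumiblock start timestamps from the Run Query pop-up,
# find which lumiblock contains a given time. Values below are placeholders.
import bisect

def lumiblock_at(timestamp, lb_start_times):
    """lb_start_times must be sorted; lumiblocks are assumed to be numbered from 1."""
    return bisect.bisect_right(lb_start_times, timestamp)

lb_start_times = [1275300000, 1275300060, 1275300121, 1275300182]  # placeholder UNIX times
print(lumiblock_at(1275300100, lb_start_times))   # -> 2 (the problem occurred in LB 2)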
- The DQM flag for the SCT configuration maps is non-green due to the number of noisy modules.
- Until further notice, please set the SCT_MOD_NOISE_GT40 flag for the runs in question.
- This problem is likely due to a combination of a software bug and a real effect.
- The value measured in the pass 2 reconstruction (f* reco tag) is currently more reliable than the pass 1 due to the nature of the bug.
- The value measured by the online monitoring should also be reliable - the online shifter has been instructed to include this in their elogs.