
ATLAS MDT Calibration Shift

This page contains all the relevant information for MDT calibration shifts. Instructions, tutorials, useful links and contact persons are listed here.

WARNING: THIS WIKI IS UNDER DEVELOPMENT. SHIFTERS ARE INVITED TO CONTRIBUTE TO IT!

Expert Shifter
Name                 Period         Contact
Domizia Orestano     18/04->25/04   email
Elena Solfaroli      11/04->17/04   email
Fabrizio Petrucci    04/04->10/04   fabrizio.petrucci@cernNOSPAMPLEASE.ch
Elena Solfaroli      21/03->27/03   email
Cesare Bini          14/03->20/03   email

Useful Links during shifts

* Munich LCDS - Michigan LCDS - Rome LCDS

* Calibration E-Log

* Muon Shifter White Board

* ATLAS Run Query

Other useful Links

* Shift booking

* Muon Calibration Wiki

Links to pages to be UPDATED BY THE EXPERTS

* Shifter Check List

* Conditions experts page

* AtlasMdtCalibShiftExperts

Before your Shift

You will have to
  • Obtain a valid GRID certificate.
  • Send a mail to your site Shift Manager including the DN of your certificate; they will take care of inserting it into the Access Control List of the Local Calibration Data Splitters at the 3 calibration sites. The DN of your certificate is a string like "/C=IT/O=INFN/OU=Personal Certificate/L=Roma 3/CN=Domizia Orestano", which can be obtained for example by typing voms-proxy-info (see the sketch after this list). Finally, try to access all 3 LCDS.
  • Subscribe to the e-group atlas-dataQuality-automatic-notifications to receive the daily list of runs to be analyzed. To subscribe to this list:
    • Go to https://e-groups.cern.ch ;
    • Search for e-group name contains: “atlas-dataQuality-automatic-notifications”;
    • Click on subscribe.
  • Spend some time navigating through this documentation and sit in with somebody taking a shift, performing the operations listed below in parallel with them.
  • You are then ready to take your shift. You can book your shifts through the ATLAS OTP Tool following the instructions in AtlasMdtCalibShiftOTP.
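A minimal sketch of how the DN can be extracted on a machine with the Grid/VOMS client tools installed (the commands are standard voms-clients usage; the output format may vary slightly between versions):

  # Create a proxy from your GRID certificate (requires ATLAS VO membership)
  voms-proxy-init -voms atlas
  # Print the certificate DN ("identity") to send to your site Shift Manager
  voms-proxy-info -identity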

Shift structure

There are two types of calibration shift: regular and expert. Unless explicitly stated, "shift" means a regular shift. There are three shifts a day, covering the hours of 0600-1400, 1400-2000 and 2000-0600 (CERN time). The expert shift lasts one week and the expert shifter is expected to coordinate the calibration activities throughout that week. In addition, the expert shifter should be the first port of call in case of problems or questions. You can find the expert shifter by searching for task number 529913 in OTP, and see also AtlasMdtCalibShiftExperts.

Overview

The Calibration Data Flow

  • Calibration stream data are selected by the LVL2 muon algorithm muFast at the pattern recognition level, within a road around the muon track. The maximum rate is ~10 kHz, but 2 kHz is adequate for calibration, so when a higher rate is available we reduce it by selecting the most interesting fraction (e.g. applying a pT cut or requiring hits in >2 muon stations). For more details see ATL-MUON-PUB-2008-004.
  • Datasets of the RAW data files are automatically created and registered in DQ2. They are then automatically subscribed and replicated to the calibration sites, even as a run is in progress.
  • You can get a rough idea of the order of magnitude of the number of events from the file size, assuming ~0.5 kB/event (see the sketch after this list).
  • The Local Calibration Data Splitter running at the calibration sites looks regularly for the arrival of new datasets. Once a dataset is received it is split into fragments (corresponding to the calibration regions, 204 at the moment) - this is the "splitting" from which the LCDS gets its name. Calibration ntuples are then automatically produced from these fragments on the local batch farm.
  • The ntuples thus produced are used to produce a data quality assessment (DQA) and MDT calibration for each run. If the statistics are insufficient (~1M events for DQA, ~30M events for calibration) you can analyse several runs together. These two operations are controlled by the shifter and may be run in parallel.
  • The Data Quality Assessment looks at tube occupancy, drift time spectra and other quantities that give information about the quality of the detector, down to the tube level.
  • Calibration involves computing rt functions, extracting t0 values and resolutions, and validating the results. The granularity is automatic, adjusting between multilayer, mezzanine or tube calibration, depending on the available statistics.
  • The calibration results are stored in a local calibration database. When appropriate, they can be copied to the conditions database used for Tier0 reconstruction.
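As an illustration of the 0.5 kB/event rule of thumb (the dataset size below is hypothetical):

  # ~0.5 kB/event means: number of events ~ 2 x (file size in kB)
  SIZE_KB=$((10 * 1024 * 1024))      # e.g. a 10 GB dataset
  echo "~$((SIZE_KB * 2)) events"    # ~20M events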

The Operations Time Scale

The cycle described above must be completed within 24 h of the end of the run. Splitting and ntuple production start even before the end of the run, as soon as a data file is completed and replicated to the sites.

The typical time scales for the above operations are (in normal operation):

  • latency between the end of the run and distribution to the calibration sites: less than one hour.
  • splitting and ntuple production processing time: ~10 hours.
  • DQA processing time: a few hours.
  • calibration processing time: a few hours.
  • the ORACLE replicas have a negligible latency.

Possible delays are related to the data distribution latency, to the queue latencies and to the time needed for decisions. We should be safely within the 24 h. If the times are significantly larger than those indicated above, contact the expert.

The Local Calibration Data Splitter

Information about the LCDS and the links for the LCDSs of the 3 sites are listed in AtlasMdtCalibLCDS.

The Shift

Shifter Duties

Each single point in this list is described in the sections below.

--> Check the status of the proxy at the beginning of the shift;

--> Submit an elog entry at the beginning and end of the shift and when an operation is done (start FIT and start DQA, end FIT and end DQA);

--> Collect information on the runs to process, e.g. from the ATLAS Run Query tool;

--> Monitor the data transfer: look at the ATLAS Run Query page and compare with the dataset list and with the replication status at the beginning of and during the shift;

--> Monitor the status of fragment and ntuple production;

--> Launch the DQA on each dataset notified in the daily mail, once all its ntuples are ready. If the statistics are insufficient (~1M events for DQA), analyse several runs together.

--> Launch the calibration when a new data sample with ~30M events is available (all ntuples ready). It is possible to calibrate groups of runs, provided they were taken under the same conditions and not too far apart in time.

--> Monitor the DQA and calibration jobs.

--> For the morning/night shifter: prepare the DQA/fit report for the 15:10 meeting;

--> For the afternoon shifter: attend the 15:10 meeting and present the data (the meeting name is "Muon Daily Operations and Data Quality Meeting").

Each step will now be described in more detail. Note that your shift may not begin with monitoring the data transfer. Depending on the runs taken and the actions of the previous shifter, your work may begin at any point in the cycle.

Check the status of the proxy used by LCDS

Look at the home page of the LCDS: if the Proxy Time Left is less than 16 h, please inform the expert shifter. Note: please remember to check all three splitter sites!
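If you also have shell access to an LCDS host, the remaining lifetime can be checked directly with the VOMS tools; a sketch, assuming the proxy file lives under LCDS/secret as described in the expert section below (the exact file name is an assumption):

  # Remaining proxy lifetime in seconds (below 57600 s = 16 h, alert the expert)
  voms-proxy-info -file $HOME/LCDS/secret/proxy -timeleft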

The elog

An electronic logbook for calibration is linked from the menu on the left of the LCDS web pages. Shifters must use the elog to register:
  • the start of their shift;
  • the end of the shift with a short summary of it;
  • the beginning of a calibration (see below) and/or a DQA session;
  • the end of a calibration and/or of a DQA session;
  • all the observed problems.

Collect information on the runs

Information on the runs can be obtained from the ATLAS Run Query tool linked above.

Monitor the arrival of the datasets

If everything works correctly, the calibration stream data transferred from Point 1 to CERN CASTOR are automatically registered as datasets (1 dataset per run, which may contain many files) and replicated to the 3 sites. The typical latency between the start of a run with stable beam and the dataset "arrival" is expected to be ~1 h. When a dataset is received it should appear in the datasets page accessible from the LCDS interface. Please note that the dataset creation time appearing in the datasets page is given in GMT, while all the other times are GMT+2. If the corresponding dataset has not been received 1 h after the start of the run, check its status using the replication status page of the LCDS. If you suspect a problem in data distribution, contact the expert shifter and, if agreed, send a single email to atlas-muoncalib-oper@cernNOSPAMPLEASE.ch AND atlas-adc-expert@cernNOSPAMPLEASE.ch (removing NOSPAMPLEASE from the addresses, with the expert shifter in CC) describing the problem and listing the sites affected. Record this information, the answer you get and the evolution of the problem in the elog.
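If the DQ2 end-user tools are set up in your environment, the replica locations can also be queried from the command line. A sketch (the dataset name is hypothetical, and the exact options may differ between DQ2 client versions):

  # List the sites holding replicas of a calibration stream dataset
  dq2-ls -r data11_calib.00180000.calibration_MuonAll.daq.RAW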

Monitor the splitting

Datasets are split into fragments according to a predefined region map. The map can be found in the LCDS configuration page, where the regions assigned to the calibration site are listed. Looking at the datasets page, given N regions the expected number of fragments when the dataset is SPLIT will be:
  • N+1 times the number of files in the dataset, if the parameter streamGroupSize in the configuration is set to 1 (one fragment contains everything not matching the N regions and will not be processed) - see the sketch after this list;
  • a smaller number when data from different input files are grouped into the same fragments (streamGroupSize larger than 1). By clicking on the number in the fragment column of a given dataset you access the list of its fragments, and you can check how they are grouped by looking at the range of input files (FILE1_FILE2) used to generate each fragment, encoded in the fragment name dataYY_calib.RRRRRRR.calibration_MuonAll.daq.RAW.o4.FILE1_FILE2-REGIONID.data.
However, it is possible that some regions don't get any data (sectors off, trigger timing problems in a sector, ...). In that case the number of fragments will be lower.
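A minimal sketch of the expected fragment count for streamGroupSize=1 (the numbers are illustrative; take the region count from your site's configuration page):

  N_REGIONS=204   # calibration regions assigned to the site
  N_FILES=30      # files in the dataset
  # one extra fragment per file collects data not matching any region
  echo "expect $(( (N_REGIONS + 1) * N_FILES )) fragments"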

As a general rule, the shifter should check that the number of generated fragments is stable from dataset to dataset (taking into account the different number of files), in the absence of hardware changes or changes in the streamGroupSize parameter.

The LCDS also performs a check on the total size of the fragments for a dataset compared to the dataset size, and may display the Tot Frag Size in red or yellow when inconsistencies are found. The criteria adopted in the check depend upon the splitting configuration in use. If the LCDS signals inconsistencies on every file, a misconfiguration of the checks is possible and the Shift Manager should be informed; if the problem occurs only for some datasets, it probably indicates a splitting error. Please log any anomaly you observe in the elog.

Monitor the ntuple production

ATHENA jobs for the production of the ntuples are submitted by the LCDS for every fragment. Therefore, in the presence of many input files, jobs may be submitted while the dataset is still in the SPLITTING status. Check regularly that the ntuples are indeed being produced for a given dataset by looking at the Ntuples field in the datasets page. It shows the number of submitted ntuple jobs together with the number of successfully completed ones in parentheses.

  • If the dataset status is SPLIT and the two numbers coincide, the production has completed successfully.
  • If the dataset status is SPLIT and not all the ntuples are ready, you can access the ntuples page for the dataset by clicking on, or hovering your mouse over, the number in the Ntuples column, to check whether there are ntuple jobs still running (ntuples in the CREATING status) or the processing has finished with some failing jobs. Please record the fraction of failures in the elog.

Launch the DQA

To start the DQA session for a dataset, go to the dqa create page of the LCDS (open the MDT DQA/DQMF submenu if you can't see it). Then select one or more dataset numbers in the Dataset selection menu and click Select. When the list refreshes, select the regions to analyse using the tick-boxes on the right-hand side of the table. Normally you will want to analyse all regions; in this case simply click the tick-box at the top of the column, marked "Selection". Then scroll to the bottom of the page and click Create DQA histograms. From this point the DQA processing is automatic, and after a few hours a new entry should appear on the dqmf page. Once all four flags have been assigned, the DQA session is terminated.

During processing, before the run appears on the dqmf page, its progress may be monitored on the intermediate pages dqa histo, dqa merge and dqa finalize. In general, each sector must finish one stage before beginning the next.

Record DQA results

To record the DQA results, use the PowerPoint template provided. The analysis is split into two parts: DQA macro and DQMF.

The DQA macro is produced on the dqmf page. Click on the run ID you want to analyse, and make sure that all four regions have finished processing. The report is accessed using the Download Report column; if it has not yet been produced, click on Start Macro and wait a couple of minutes before refreshing the page. Then download the report and compile the following results (numbers refer to rows in the PowerPoint template):

  1. Occupancy percentage: Fill these in for I, M, O in each region, so 12 numbers in all. Read these off pages 3 and 4 of the report ("Fraction of ML alive");
  2. Average efficiencies: Find these on pages 10 and 11 of the report. Again, there are 12 numbers. Take special note if any are below 99%;
  3. Overall flags: Take these from the yellow table on the dqmf page.

In addition, make a note if the residuals (pages 8 and 9) are not within the expected band.

The DQMF part of the report requires you to open the detailed histograms for each region. These are accessed by clicking on the globes in the Link column on the dqmf page. For a detailed description of the DQMF histograms and algorithms, see the ATLAS note ATL-MUON-INT-2011-001. Here we are primarily looking for tubes and multilayers recorded as dead. Below we list the main histograms, what you might see, and what to do about it in each case.

  • AllSectors/HitsPerML: If any of these are red, it means that one or more multilayers are absent. This will be correlated with chambers that have an "Undefined" DQ status. To find out whether a multilayer is really dead, check whether the chamber was calibrated (use the calibration database observer, as described below). If it was calibrated, then this is simply a processing error and can be ignored. Otherwise, record the name(s) of the affected chamber(s) in row 4 (New Dead MLs) of the table.
  • Other apparent errors in AllSectors can generally be ignored. For example, AllSectors/TDC_AllChambers_X_Y_Z and AllSectors/t0PerMLXXX are sensitive to t0 shifts and jumps, but these can be investigated in much more detail at the chamber and tube level.

Then open up the sectors with yellow flags and scan for chambers with red or grey flags. There are four histogram groups: Chamber, Dead Status, Efficiency and Occupancy.

  • Chamber: If grey, see the comment about AllSectors/HitsPerML above. If red, check the TDC and ADC spectra, and add information as appropriate to rows 5 and 6 of your table. If other histograms are red, use the Dead Status, Efficiency and Occupancy histograms to understand why.
  • Occupancy: If this is red, check for obviously dead tubes, where the occupancy suddenly drops to zero. If the only "problem" is that the eta distribution has changed slightly, it can be ignored.
  • Dead Status: When you see one or more "dead" tubes, check the equivalent plots under Occupancy. Many chambers have limited acceptance, where the occupancy drops rapidly, but not straight to zero. If this happens and the "dead" tubes are located where the statistics are limited, make a note of the chamber but do not report it as dead. Otherwise, record the dead tubes in row 7 of your table. Ignore red flags caused by tubes coming "back to life".
  • Efficiency: This only seems to be red or grey if there are problems elsewhere; ask the expert if this is not the case.


Record DQA results in the database

Insert the DQA flags in the Detector Status Browsing page (https://atlasdqm.web.cern.ch/atlasdqm/DQBrowser/DBQuery.php). The flags to insert are those shown in the DQMF page, corresponding to the flag of the AllMDT folder.

Launch the calibration

The calibration is launched from the fittables page (open the MDT Fits submenu if you can't see it). The selection of the dataset and regions for analysis proceeds in the same way as for the DQA just described. Select one or more dataset numbers in the Dataset selection menu and click Select. Then, in the right column, click the tick-box marked Selection if you want to calibrate all the regions; otherwise select only the regions you want to calibrate. Then scroll to the bottom of the page and click Fit. Launching all the calibrations takes some time, so don't worry if the page only refreshes after a while.

Monitor the calibration jobs

Go to the fit page (in the MDT Fits submenu) and click on the dataset that is being fitted. A yellow list will appear with the status of all the regions: DONE means that the calibration is completed for that region, FAILED that it failed, and FITTING that it is still in progress. The number of files in each state can also be found on the datasets page by hovering your mouse over the relevant entry in the Ntuples column. The number of failures is indicated in the rightmost column of the dataset entry. A failure rate of <10% is normal; if it is higher, try to find out why.

The Calibration Database Observer

The Calibration Database Observer allows you to follow the calibration procedure while it is running and after it has finished; click on the calibdb observer menu item of the LCDS to access it. Use the top left menu to select the database you want to check, and click the white arrow. The MUNICH_NEW, MICHIGAN_NEW and ROME_NEW entries correspond to the CERN replicas of each local database. When you start a FIT session, a new HEAD_ID entry appears in the list. This is the identification number of that calibration. Clicking on the t0 or rt button, you enter an overview scheme of the detector with a colour code indicating the present status of the calibration for each region. The scheme is updated as new regions are calibrated.

The documentation can be found at https://twiki.cern.ch/twiki/bin/view/Atlas/MuonCalibDbObserver.

If a calibration starts and no new HEAD_ID appears, there is a replication problem and the fit is only written to the local database. Replication problems must be recorded in the elog. The calibration should nevertheless continue: the data are stored in the local calibration DB and will be replicated when the streams restart. In this case, and only then, use the local calibration database (e.g. ROME_LOCAL) to monitor the calibration.

End of calibration

The end of the calibration should be recorded in the elog. The logbook entry should contain the calibration header number (accessible from the Calibration Database Observer) and calibration quality information.

Check Last Calibration

This script allows you to check the last calibration and compare its results to a previous reference one. In the LCDS left menu, click on the check last calibration button. The page will load showing the last comparison made. Assuming you want to check another calibration, enter the appropriate HEAD_ID numbers (and sites) for the reference calibration and the one you want to analyse, then click "Change". Creating the histograms takes a few minutes, after which the page will automatically refresh. There are 29 rows of histograms, each with one or two plots:

Rows 1 to 4: distributions of t0s and tdrifts for last calibration and their dependence on the sector.

Row 5: (left) distribution of the number of events for each fit; (right) distribution of the rising edge slopes from the fits.

Row 6: (left) t0 uncertainty from the fit as a function of the number of events; (right) tdrift uncertainty from the fit as a function of the number of events.

Rows 7 to 10: distributions of the differences of t0s and tdrifts between last calibration and reference calibration.

Rows 11 and 12: t0 and tdrift differences between last and reference calibrations as a function of the station index (0=BIL, 1=BIS, 2=BMS, 3=BML, 4=BOS, 5=BOL).

Rows 13 to 24: t0 differences between last and reference calibrations for each station type, as a function of the chamber index (chamber index = 10 x phi + eta; e.g. phi=3, eta=5 gives index 35).

Rows 25 and 26: RT relations from the last calibration.

Rows 27 to 29: difference between last and reference RT relations, with the mean in row 28 and the width in row 29.

Filling in the calibration report

The final four rows in the DQA/calibration report can be filled out using the "check last calibration" page once the calibration is complete. By default, compare to the last calibrated run, unless instructed otherwise by the expert shifter. The information to record is:

  • delta t0: mean and RMS: These are found in the plots on rows 7 and 9, in nanoseconds. Record each value for each region (8 numbers in total).
  • delta tdrift: mean and RMS: These are found in the plots on rows 8 and 10, in nanoseconds.
  • t0 jumps: Check the plots in rows 13 to 24. These show the t0 variation in each chamber between the two runs. Any variation of more than 10 ns should be regarded as a jump. Count the number of jumps, and record where they occur, in as much detail as possible.
  • RT max variation: Look at row 28 (second to last), showing the mean variation of the RT relation. Record the maximum absolute value in each region. Multiply by 1000 to convert from millimetres to micrometres.

Finally, check the calibration DB observer for uncalibrated chambers (this is best done using the rt button) and note down their names - if there are significant numbers of them, record this in the AOB section of the report.

Once both DQA and calibration sections are complete (or anyway at the end of your shift), write a message in elog, attaching the report.

Template for presentation at the 15:10 muon DQA meeting

  • Template.ppt: Fill in one table like this for each run, present it at the 15:10 muon DQA meeting, and attach it as an elog entry.

Tutorial for new shifter

Slides with the instructions for shifters, presented at the last tutorial session:

Previous edition:

Expert tasks

Setup SSH Keys for muoncal Account Access

We need all experts to provide ssh keys to be used to access the muoncal splitter account at each of our 3 calibration centers. Please follow the instructions documented here.

Constants insertion

The copying of constants from the calibration DB replica to the conditions database will be automated at some point. Meanwhile it is handled by experts (see AtlasMdtCalibShiftExperts#ConditionsExperts). The code for this operation is public and can be found in the ATLAS CVS repository under offline/MuonSpectrometer/MuonConditions/MuonCondCoralCool; however, the authorization.html file containing the write password for the conditions DB is private and will be handed over from the exiting expert to the next expert.
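A sketch of how the code could be checked out (the CVSROOT value is an assumption; use your site's recommended ATLAS CVS settings):

  # Check out the constants-copy package from the ATLAS offline CVS repository
  export CVSROOT=:kserver:atlas-sw.cern.ch:/atlascvs   # assumed CVSROOT
  cvs checkout offline/MuonSpectrometer/MuonConditions/MuonCondCoralCool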

Insertion validation

As soon as the data are inserted in the DB, the constants should be checked by processing a few events from the muoncal account using the AtlasTier0 release, by the same person who took care of the copy. Note: this step is mainly devoted to catching technical problems, not to a detailed DQA of the constants, and will probably be dropped once the system has proven stable over a long period, to be resumed only in the presence of major release or geometry changes. The outcome of the tests should be discussed among the shifters before the signoff meeting. What do we do if there is a problem at this stage? Can data be retagged as bad to keep the previous calibration? Should we overwrite the IOV with the previous good data?

Special operations

describe here how to re-enable the different processing stages in LCDS

Update of the proxy

Under your own account, generate a proxy from your certificate with a reasonably long validity (e.g. 96 h) by typing

  voms-proxy-init -voms atlas --valid 96:00 -out $HOME/proxy

then copy the proxy file via scp to the calibration account and move it under LCDS/secret (see the sketch below).
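A sketch of the copy step (the host and account names are hypothetical, and the final file name under LCDS/secret is an assumption):

  # Copy the fresh proxy to the calibration account and move it into place
  scp $HOME/proxy muoncal@lcds-host.cern.ch:proxy.new
  ssh muoncal@lcds-host.cern.ch 'mv proxy.new LCDS/secret/proxy'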

If all the protections in the storage elements are correctly set, the LCDS should work fine with all the ATLAS certificates. If you observe problems after the proxy update, please inform your system administrators.

Deleting ntuples and fragments - Important note for Munich

During calibration, large ntuple files are created, which have to fit into a finite space on disk. The current disk status in Munich can be monitored here: http://lcg-lrz-site-mon.grid.lrz-muenchen.de/dcache-lrz/?time=4.

The ATLASCALIBSCRATCH space is managed entirely by the splitter. If it gets to more than ~80% full (~4.5 TB), please do the following:

  • Identify older runs which do not need to be further analysed.
  • Go to the "datasets" page of the Munich LCDS, and select the run(s) to be removed (note: selecting just one run at a time seems to give better performance).
  • Click on "Delete Ntuples".
  • Repeat with "Delete Fragments" to be really thorough. Fragments are typically ~10 times smaller than ntuples, so these are not so critical.

Note that all deleted files can be recreated on demand using the "Enable" button on the same page, assuming the splitter is not currently processing another run.

resubmit the whole processing for a dataset

From the datasets page, select the dataset to be reprocessed and click Enable at the bottom of the page. The dataset will be re-split and the ntuples will be reproduced. Warning: if the splitter granularity has been changed, either through a change in the region size or in the streamGroupSize parameter, the new files may not overwrite all the already existing ones, which should then be deleted manually after the end of the processing.

resubmit the calibration

Ntuples already used by a calibration are disabled and not available for subsequent ones. If you want to repeat a calibration, select them in the ntuples page and click on enable. Warning: this is a non-standard operation and can be very tedious...

changes in splitter configuration

Changes to the splitter configuration in the LCDS/etc/splitter.conf file should be performed only by the Shift Manager. The main parameter that may have to be modified during normal operation is the geometryTag, which should be switched between ATLAS-CommNF-09-00-00 (toroid off) and ATLAS-Comm-09-00-00 (toroid on). After a change in the configuration, the LCDS should be restarted.

restarting the LCDS

  • Log on to the machine hosting the server (the one appearing in the http address!) using the calibration account
  • cd LCDS
  • source setup.sh
  • splitter stop
  • splitter start

Accessing DQMF web display in Munich

When DQMF results are produced in Munich, they are not all immediately available for viewing. To make the detailed histograms visible in the web display, the expert needs to log in to the Munich LCDS node and execute the following commands:

cd LCDS/
# copy the DQMF result directory (XXXX/YYYY are the run numbers)
# into the area served by the web display
cp -r www/Run_XXXX_XXXX_Ref_YYYY_YYYY /data/uh351bz/www/

The dummy XXXX and YYYY should of course be replaced by the actual run numbers. The copy can take a seriously long time, of order an hour, due to the large size of these www directories. This should be improved in the future, possibly using a cron job (see the sketch below). Suggestions are welcome!
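One possible cron-based improvement, as suggested above; an untested sketch (paths as in the commands above, and rsync is assumed to be available on the node):

  # Hourly sync of DQMF result directories into the web display area;
  # --ignore-existing avoids re-copying directories already published
  0 * * * * rsync -a --ignore-existing $HOME/LCDS/www/ /data/uh351bz/www/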

How to find out who the shifter is

  • Log onto OTP-TOOL
  • Method 1
    • Click "Browser".
    • Click "Detector Operation" tab.
    • Click on "Shift Tasks"-"Muon" which is says "16 tasks".
    • Click "16 tasks" and get a pops-up menu listing the 16 tasks from which select "529188 - MUON CALIBRATION CENTER SHIFTER"
    • Do mouse-hover over shift calender to get a pop-up showing shifter
  • Method 2
    • Click "Tasks"
    • Enter "529188" in Task ID box and click "Search"
    • A table with 1 line should appear. In that table click ID item "529188"
    • Hover the mouse over the shift calendar to get a pop-up showing the shifter.

Troubleshooting

The web interface of LCDS is inaccessible

Check that you are using the correct link (see AtlasMdtCalibLCDS). If the interface crashed:
  • log on to the machine hosting the server (the one appearing in the http address!) using the calibration account
  • cd LCDS
  • tail -100 var/log/splitter to check whether there is an error which caused the crash; try to understand and solve it, or ask the Shift Manager for help
  • once the error source has been fixed, or in case of no error, restart the LCDS as described above (see also the sketch below)
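The whole check-and-restart sequence condensed into commands; a sketch combining the steps above:

  # On the LCDS host, logged in as the calibration account:
  cd LCDS
  tail -100 var/log/splitter   # look for the error that caused the crash
  # once the error is fixed (or if no error was found), restart:
  source setup.sh
  splitter stop
  splitter start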

FAQs

-- MichaelFlowerdew - 02-May-2011
