SCT Calibration Loop expert's guide

Getting started

In the following we explain how to check out and work with the latest version of the SCT_CalibAlgs package. The calibration loop should be kept up to date with the latest version of this package. As such, when you check it out into your personal directory you should get a copy identical to that running under /afs/cern.ch/user/s/sctcalib/testarea/latest. Modifications of the code should be made on your local copy and committed to svn. Then, the package under /afs/cern.ch/user/s/sctcalib/testarea/latest can be updated. The sctcalib account has no permissions to upload to svn: if you modify this directory you won't be able to commit the modifications without copying them first to your local copy (cumbersome and impractical).

The first step is to load the configuration in your working area. Usually, you will create a testarea for the version you want to work with (the calibration loop is running on version 20.1.4.5 as of 2022-05-16):

mkdir -p sct/testarea/20.1.4.5
cd sct/testarea/20.1.4.5

and create a setup file .asetup there, with the following lines

[defaults]
release  = 20.1.4.5
testarea = /sct/testarea/AtlasProduction-20.1.4.5/
builds   = True

Before checking out any version of the code, this configuration file has to be loaded, using the asetup command

source /afs/cern.ch/atlas/software/dist/AtlasSetup/scripts/asetup.sh --inputfile /sct/testarea/AtlasProduction-20.1.4.5/.asetup

Then, you can check out the SCT calibration algorithms package and build it:

cmt co InnerDetector/InDetCalibAlgs/SCT_CalibAlgs
cd InnerDetector/InDetCalibAlgs/SCT_CalibAlgs
mkdir run
cd cmt
cmt config
source setup.sh 
make

The directory structure of the package is:

:[/sctcalib../SCT_CalibAlgs] ll
total 50
drwxr-xr-x. 3 sctcalib zp  2048 Mar  9 10:26 genConf
drwxr-xr-x. 3 sctcalib zp  2048 Mar 17 14:05 scripts
drwxr-xr-x. 4 sctcalib zp  2048 Mar 17 16:35 python
drwxr-xr-x. 3 sctcalib zp  2048 Mar 19 09:21 SCT_CalibAlgs
-rw-r--r--. 1 sctcalib zp 19458 Mar 19 09:21 ChangeLog
drwxr-xr-x. 3 sctcalib zp  2048 Mar 20 17:12 cmt
drwxr-xr-x. 2 sctcalib zp  6144 Mar 20 17:13 x86_64-slc6-gcc48-opt
drwxr-xr-x. 4 sctcalib zp  4096 Mar 25 10:12 src
drwxr-xr-x. 5 sctcalib zp  8192 Mar 25 10:55 run
drwxr-xr-x. 3 sctcalib zp  2048 Mar 26 20:14 share

where:

  • src contains the C++ implementation of the calibration loop algorithms and the write-out to xml, root and sqlite output files.
  • scripts contains the python transformation (sct_calib_tf.py)
  • share contains configuration files
  • python holds two macros checking properties of the selected run to process
  • SCT_CalibAlgs stores header files
  • run is the directory from which we'll execute the transformation
  • ChangeLog is... well, a log of changes. Whenever a modification of the package is to be committed, the changes should be registered there.
To keep the local copy of the package up to date, it's convenient to update it regularly. To do so, type
svn update

from the directory where the ChangeLog is.

Committing changes

To get modifications into a new tag of the package, the ChangeLog has to be updated first. Just add a few lines at the beginning of the file with the date, your name and email address, and a list summarizing the changes made in the package. The last line should be the new tag (for small revisions just add 1 to the last number of the tag). Then, from the folder where the ChangeLog is:

  • svn ci -m "<short update description>". This will print a series of messages like:
Sending ChangeLog
Sending python/runInfo.py
Sending python/runSelector.py
Sending scripts/trfOnCAF.sh
Sending share/SCTCalibConfig.py
Sending share/SCTCalib_topOptions.py
Sending share/referenceOptions.py
Sending share/skeleton.sct_calib.py
Transmitting file data ........
Committed revision 

  • svn cp . $SVNROOT/<package>/tags/<tagname>. This will only print
    Committed revision <revision number>

Use from the command line

Sometimes it may be necessary to run the calibration loop locally (to track down the source of errors, for example). The most direct way to execute the calibration loop code is to use the transformation in

scripts/sct_calib_tf.py

It replaces the former

scripts/sct_calib_trf.py

transformation. The two major changes are:

  • The latter used the PyJobTransformsCore libraries, which have been rewritten into PyJobTransforms
  • The input dictionaries and the output file were in pickle format, but are now in JSON format

To execute the transformation from the run directory:
:[/sctcalib../SCT_CalibAlgs/run] source /afs/cern.ch/atlas/software/dist/AtlasSetup/scripts/asetup.sh --inputfile
:[/sctcalib../SCT_CalibAlgs/run] python -u ../scripts/sct_calib_tf.py --argJSON= >& logfile.dat &

The logfile, though optional, is highly recommended. Transformations tend to have very lengthy and rapidly varying outputs, making it very hard to track changes during execution. Besides, most of the output is copied to a predefined file called log.sctcalib, but not all of it. JSON (/ˈdʒeɪsən/ JAY-sən), or JavaScript Object Notation, is an open standard format that uses human-readable text to transmit data objects consisting of attribute–value pairs (wikipedia dixit). A JSON file consists of a collection of key–value pairs, much resembling a python dictionary. The following is an example of an input dictionary for the noisy strip task:

{"'SCTCalibConfig": ["/afs/cern.ch/user/s/sctcalib/testarea/latest/InnerDetector/InDetCalibAlgs/SCT_CalibAlgs/share/SCTCalibConfig.py"],
 "doRunInfo": "True",
 "doRunSelector": "True",
 "input": ["data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0#data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0._0001.SCTHitMaps.root.1",
            "data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0#data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0._0001.SCTLB.root.1",
            "data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0#data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0._0002.SCTHitMaps.root.1",
            "data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0#data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0._0002.SCTLB.root.1",
            "data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0#data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0._0003.SCTHitMaps.root.1",
            "data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0#data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0._0003.SCTLB.root.1",
            "data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0#data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0._0004.SCTHitMaps.root.1",
            "data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0#data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0._0004.SCTLB.root.1",
            "data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0#data15_cos.00259237.calibration_SCTNoise.sctcal.HITMAP.c0._0005.SCTHitMaps.root.1"],
 "maxEvents": "1",
 "part": ["doNoisyStrip"],
 "prefix": "data15_cos.00259237.calibration_SCTNoise.sctcal.NOISYSTRIP.c0_c0#data15_cos.00259237.calibration_SCTNoise.sctcal.NOISYSTRIP.c0_c0._0001",
 "splitNoisyStrip": "2"}

The meaning and use of the different options are explained elsewhere. This dictionary in particular corresponds to an example of local file reading, which isn't always the case for processes running inside Tier-0. Files have until now been stored in CASTOR, but the main DAQ/Tier-0 data/work flows have been moved to EOS, and CASTOR will only be used for tape back-up. To read files stored in CASTOR (/castor/cern.ch/.../root) the file names have to be prepended with root://castoratlas/. Files stored in EOS (/eos/atlas/.../root) need the prefix root://eosatlas/.
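
As a minimal illustration of this prefixing rule (the helper function and the example path are hypothetical):

 prefix_path() {
   case "$1" in
     /castor/cern.ch/*) echo "root://castoratlas/$1" ;;   # CASTOR files
     /eos/atlas/*)      echo "root://eosatlas/$1" ;;      # EOS files
     *)                 echo "$1" ;;                      # local files are left untouched
   esac
 }
 prefix_path /eos/atlas/somedir/file.root   # prints root://eosatlas//eos/atlas/somedir/file.root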

File description

/scripts/: transformation steering

In this folder there are many files from previous versions of the SCT_CalibAlgs package, but the one steering the transformation is sct_calib_tf.py. It:

  • Checks the input dictionary, reads the input parameters and modifies the input file names if necessary.
  • Calls runSelector.py to check whether the run can be processed.
  • If the task is doNoisyStrip, merges the SCTHitMaps.root and SCTLB.root files from the previous hitmap generation step.
  • In the postExecute there are a couple of hacks:
    • In case the task is doDeadStrip or doDeadChips it might happen that there are no dead strips or chips. In that case, the mycool.db sqlite file won't be generated and the transformation will fail when it checks for that missing output file. As an exception, when that happens the COOL file is removed from the list of required output files (lines 455-507 of sct_calib_tf.py). The proper way to do this would be to have optional output arguments in the transformation; Graeme Stewart has opened a ticket in JIRA asking for that development.
    • If during the execution of the algorithms there is at least one ERROR Unknown offlineId for OnlineId message in any event, the transformation will not finish successfully. However, a small number of these errors is admissible, and even a larger number in a single event out of the whole sample of events read shouldn't make the transformation fail. To avoid the transformation failing, these errors are written to the log file as WARNING instead of ERROR. In addition, the package InnerDetector/InDetEventCnv/SCT_RawDataByteStreamCnv was modified to fire an incident whenever this error message appears. The module SCT_CalibEventInfo listens and counts the number of incidents per event: if there are more than 10, the event is skipped, and the transformation keeps executing.
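
To get a quick idea of how many of these messages a job produced, one can simply count them in its log file (here the log.sctcalib file mentioned above):

grep -c "Unknown offlineId for OnlineId" log.sctcalib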

/share/: configuration files

The configuration files inside /share/ set the value of variables used later by the code during its execution.

  • SCTCalibConfig.py contains the minimum input necessary to run each algorithm. The calibration loop always uses an input dictionary, which overwrites some of these options. Among others, it sets the tags written to the local DB and XML files. These options are particularly important: the tags written into the files determine the COOL tag to which they will be uploaded. A tag in the file not matching the corresponding tag in the COOL folder will cause the creation of a new tag, which won't always be desirable (especially if done by mistake)
  • skeleton.sct_calib.py is included from the sct_calib_tf.py file.
  • ReadCoolUPD4.py

/python/: runInfo and runSelector

runInfo.py directly queries a RunControlDB, checking for data concerning the processed run. runSelector.py is a wrapper around AtlRunQuery.py. They are called from the transformation whenever doRunInfo and doRunSelector, respectively, are set to True in the job dictionary.

runSelector

For non-cosmic runs runSelector.py checks whether the run has a stable beam. If it doesn't, the transformation will raise an exception ("run selection didn't pass Stable Beam check--- job will be finished") and stop executing. If you find this message in the log file of an unsuccessful job, the run probably didn't have a stable beam. runSelector.py is particularly important when running the noisy strips algorithm. The noisy strips of a given run should not be processed until the last good run has been uploaded: part of the algorithm consists of finding the change in noisy strips with respect to the last run uploaded, and writing out this difference to a sqlite file. In case there are processed noisy strips runs waiting to be uploaded, the runSelector holds the execution of the calibration algorithms, periodically checking whether every previous good run has been uploaded.

AtlRunQuery

The AtlRunQuery application allows one to check which runs fulfill a series of conditions. For example, to check the stable beam runs from the last 3 days, with the calibration_SCTNoise stream and more than 10000 events (the minimum required for noisy strip processing):

AtlRunQuery.py --time "last3d" --lhc \"stablebeams TRUE\" --partition "ATLAS" --detmaskin "240A" --projecttag "data*_*eV" --streams "*calibration_SCTNoise 10000+" --show "streams *calibration_SCTNoise" --show run --show events --show time --show "lhc" --noroot --nohtml --verbose

Options that could be changed:

  • time: to look for runs in a period of time ending now, one can use the "last<number><unit>" string, where <unit> can be m, h or d. To query runs in a fixed period of time, one can use "d1.m1.y1-d2.m2.y2".
  • stablebeams: by default it's set to TRUE for all runs except cosmic runs.
  • projecttag: to select cosmic runs change it to data*_*cos
  • streams: the stream could be changed to *express_express, and the minimum statistics changed to a different value if needed. k is allowed as a shortcut for 1000, and a maximum instead of a minimum statistics can be required as well, by changing + to -.

This command, in addition to printing the query results, creates a folder, data, with the result of the query as a txt file (QueryResult.txt), an xml file (MyLBCollection.xml), and a pickle dictionary (atlrunquery.pickle).
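
Combining the options above, a query for cosmic runs in a fixed date range, using the express stream and requiring at least 5000 events, could look like the following (the dates are only illustrative):

AtlRunQuery.py --time "01.05.2015-16.05.2015" --partition "ATLAS" --detmaskin "240A" --projecttag "data*_*cos" --streams "*express_express 5k+" --show run --show events --show time --noroot --nohtml --verbose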

Data dictionary

There's a minimum set of input parameters needed in the dictionary to execute the transformation. In case some of these are missing, default values are assigned.

  • part determines which algorithm is going to run. It needs to be one of [doNoisyStrip, doNoiseOccupancy, doDeadChip, doDeadStrip, doHV, doBSErrorDB, doRawOccupancy, doEfficiency, doLorentzAngle, doNoisyLB]. In case a different value is given the transformation will throw an exception and fail. If no value is given, the default is doNoisyStrip
  • doRunInfo can be either True or False and determines whether runInfo.py is executed or not. The default value is False
  • doRunSelector can be either True or False and determines whether runSelector.py is executed or not. The default value is False
  • splitNoisyStrip can take values 0, 1 or 2. Its value is only relevant when the value of part is doNoisyStrip. The default value is 0.
    • 0 processes the noisy strips for the list of RAW input files. This option is seldom used: processing enough files to fulfill the minimum necessary statistics takes too long. The task is instead run in two steps: first processing RAW files to generate HITMAP files, in a few different jobs that can run simultaneously, and then merging these hitmaps and analyzing them.
    • 1 corresponds to the first of the steps mentioned in the item immediately above. RAW files are read to generate HITMAP files, but the hitmaps are not analyzed. No sqlite file is generated in this case.
    • 2 corresponds to the second step, when the HITMAP files are merged into two files named SCTHitMaps.root and SCTLB.root, which are then analyzed. This is the proper noisy strip task.
  • maxEvents sets the maximum number of events to be processed. To process all events, regardless of their number, it's set to -1. When part is doNoisyStrip with splitNoisyStrip equal to 2 this argument should be set to 1.
  • prefix is needed for naming conventions inside tier0, and by the copying macros that move files between locations. The default is an empty string, but when the tasks are automatically defined inside tier0 the value of the prefix is assigned.
  • input is the list of input files. There is no default value and without the input list the transformation will be aborted.
  • SCTCalibConfig is the path to the transformation configuration file. Inside tier0 it is automatically defined to always refer to /afs/cern.ch/user/s/sctcalib/testarea/latest/InnerDetector/InDetCalibAlgs/SCT_CalibAlgs/share/SCTCalibConfig.py. For local running it should be changed to the same file, but in the local path.
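
For illustration, a minimal dictionary for the hitmap-generation step (part doNoisyStrip with splitNoisyStrip equal to 1, described below) could look like the following; the angle-bracket placeholders are to be replaced with real values:

{"SCTCalibConfig": ["<local path to SCTCalibConfig.py>"],
 "doRunInfo": "True",
 "doRunSelector": "True",
 "input": ["<list of RAW input files>"],
 "maxEvents": "-1",
 "part": ["doNoisyStrip"],
 "splitNoisyStrip": "1"}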

Task definition and tasklister

Since September 2014, the calibration loop code runs directly inside tier0. As such, it's the Data Preparation group who is in charge of activating/deactivating the tier0 processes and setting their configuration. However, we are still in charge of preparing the configuration that will be passed to them, and of being aware of any changes that should be made, so we can inform them. The TOM configuration as of 10.6.2015 can be found in /afs/cern.ch/user/s/sctcalib/public/sctTOM_10062015.cfg. The following table summarizes the configuration in that file.

Task              | tasktype | file read | # events | project                  | dataset name                           | stream  | other comments
Hitmap            | scthm    | RAW       | >10000   | data15_cos OR data15_*eV | *calibration_SCTNoise.daq.RAW          |         |
Noisy Strips      | sctns    | HITMAP    | >10000   | data15_cos OR data15_*eV | *calibration_SCTNoise.sctcal.HITMAP.c0 |         | run number >266903
Efficiency        | scteff   | HIST      | >5000    | data15_cos OR data15_*eV | *express_express.merge.HIST.f*         |         | run number >266502
Noise Occupancy   | sctno    | HIST      | >5000    | data15_cos OR data15_*eV | *express_express.merge.HIST.f*         |         | run number >266502
ByteStream Errors | sctbse   | HIST      | >5000    | data15_cos OR data15_*eV | *express_express.merge.HIST.f*         |         | run number >266502
Raw Occupancy     | sctro    | HIST      | >5000    | data15_cos OR data15_*eV | *express_express.merge.HIST.f*         |         | run number >266502
Lorentz Angle     | sctla    | HIST      | >5000    | data15_cos OR data15_*eV | *express_express.merge.HIST.f*         |         | run number >266502
Dead Chips        | sctdc    | RAW       | >200000  | data15_cos OR data15_*eV | *.merge.RAW                            | express | run number >999999 [currently under validation]
Dead Strips       | sctds    | RAW       | >200000  | data15_cos OR data15_*eV | *.merge.RAW                            | express | run number >999999 [currently under validation]

The Task Lister webpage shows a list of the tasks defined (running or finished) for each run. On the left there's a checkbox menu which can be used to filter the tasks defined for the SCT calibration loop, by checking sctcalib. Only jobs from the last 3 days are shown. Older jobs can be retrieved via the Get older data button. One click shows 3 additional days (6 in total). A further click shows another 9 days (15 days in total).

TaskLister.jpeg

In the Status column, below Task Information, the status of the jobs can be:

  • RUNNING: the job is defined but not actually running yet; it is waiting for the input dataset;
  • a yellow band: the job is running;
  • FINISHED: the job has finished successfully.
  • Other colors will appear in the transitions between states.
  • If a job fails, it will be necessary to check the log file.

To display the link to the log file one has to click on the link corresponding to the field #Done (for successful runs) or the field #Abrt. (for failed/truncated runs). It is also possible to peek at the log of a still running job. Failed runs can be reactivated (if they failed due to bugs in the code or a wrong dictionary, and should be relaunched) or truncated (if they should not be repeated, for lack of stable beam, for instance):

  • Click on #Abrt., as if to show the link to the job log.
  • Click on Truncate Task or Reactivate failed jobs
  • A username and password will be asked for. In case you don't know them, send an email to Alberto Gascón.
  • The same username/password combination is used to peek at the log of jobs still running.

In the following subsections we describe each task. When we say subcomponent we mean any of the endcaps or the barrel. The prefix preceding each output file name below is a string composed of the project, the run number, the stream, the task, and a job number identifier from the tier0 processing (e.g. data15_13TeV.00267367.calibration_SCTNoise.sctcal.NOISYSTRIP.c0_c0._0001.).

Hitmaps and noisy strips

The execution of the noisy strip task used to take longer than the maximum computing time allowed in tier0. As such, it was divided into two steps.

Generation of hitmaps

The argument part of the input dictionary has to be set to doNoisyStrip, with splitNoisyStrip equal to 1 and maxEvents equal to -1. A list of RAW files is processed and two output files are generated:

  • .SCTHitMaps.root
  • .SCTLB.root

Noisy strip processing

The argument part of the input dictionary has to be set to doNoisyStrip, with splitNoisyStrip equal to 2 and maxEvents equal to 1. After processing the noisy strips, 6 files are generated:

  • .BadStripsSummaryFile.xml shows 3 different values for every module: StripOfflineAll, StripOfflineNew and StripOfflineRef. These correspond to the list of noisy strips, the list of new noisy strips (compared to the last run uploaded), and the list of noisy strips of that module in the last run uploaded, respectively.
  • .BadStripsNewFile.xml contains a list of the new noisy strips of every module, compared to the last uploaded run. As such, modules with no noisy strips, or modules that have the same noisy strips as the last run uploaded, don't appear in this list.
  • .BadStripsAllFile.xml contains a list of the noisy strips of every module (with at least one noisy strip), whether they were present in the last run uploaded or not.
  • .mycool.db
  • .SCTLB.root and .SCTHitMaps.root

Efficiency

To run the efficiency task the part option of the input dictionary should be set to doEfficiency. The task creates 3 output files:

  • .EfficiencySummaryFile.xml shows the Φ-averaged efficiency for every subcomponent, layer and η. There's an example here
  • .EfficiencyModuleSummary.xml shows instead the efficiency for every individual module. Example here
  • .mycool.db sqlite file with the information of the averaged efficiency.

Noise occupancy

To run the noise occupancy task the part option of the input dictionary should be set to doNoiseOccupancy. The noise occupancy task creates 3 output files:

  • .NoiseOccupancySummaryFile.xml shows the Φ-averaged noise occupancy for every subcomponent, layer and η. There's an example here
  • .NoiseOccupancyFile.xml shows instead the noise occupancy value for every individual module. Example here
  • .mycool.db sqlite file with the information of the averaged noise occupancy.

Lorentz Angle

To run the Lorentz angle task the part option of the input dictionary should be set to doLorentzAngle. The task runs only over the barrel modules. The task creates 3 output files:

  • .LorentzAngleSummaryFile.xml shows the values of the Lorentz angle and the minimum cluster width for every barrel layer, module type [100,111] and side. There's an example here
  • .LorentzAngleFile.xml shows additional information about the Lorentz angle fit (parameters and errors). Example here
  • .mycool.db sqlite file with the complete information (Lorentz angle, cluster width and fit parameters) of the Lorentz angle fit.

Raw occupancy

To run the raw occupancy task the part option of the input dictionary should be set to doRawOccupancy. The task creates 2 output files:

  • .RawOccupancySummaryFile.xml shows the Φ-averaged raw occupancy for every subcomponent, layer and η. There's an example here
  • .mycool.db sqlite file with the information of the averaged raw occupancy.

Bytestream errors

To run the bytestream errors task the part option of the input dictionary should be set to doBSErrorDB. The bytestream errors task looks for the different kinds of errors that a module can have (BSParse, TimeOut, BCID, LVL1ID, Preamble, Formatter, ABCD, Raw, MaskedLink, RODClock, TruncROD, ROBFrag). It generates 3 output files:

  • .BSErrorSummaryFile.xml contains the total number of bytestream errors per subcomponent, layer and value of η. Here is an example.
  • .BSErrorModuleSummary.xml details the number of errors of each kind, for every individual module. Here is an example.
  • .mycool.db sqlite file with the information corresponding to .BSErrorSummaryFile.xml.

The Calibration Loop webpage

The interface allows any user to inspect the latest processed runs, and the results of all the different tests. It also allows the current shifter and any other user with the right permissions to request runs to be uploaded, or to mark them as bad. The source code of the webpage, the list of runs to be uploaded and the results of the calibration loop are only accessible by logging in to sctcalib@pc-sct-www01 from the sctcalib account.

Code

The files generating the webpage are stored in /var/www/html/24hLoop/. This folder is on the pc-sct-www01 machine, which can be accessed from the sctcalib account. The basic files needed for the generation of the webpage are (with a daily updated link to the files):

  • variables.php stores the definition of many arrays used throughout the other files. In particular, they contain the names of the files resulting from the transformation, the names of the streams for each task, and the names of the tasks. These names determine, among other things, which runs are shown in the webpage. The current values (2022-05-16) of the stream and file names are:
            $stream_toread = array("NoisyStrip" => "calibration_SCTNoise","NoiseOccupancy" => "express_express","RawOccupancy" => "express_express",
                                                  "DeadChip" => "express_express","Efficiency" => "express_express","ByteStreamErrors" => "express_express",
                                                  "DeadStrip" => "express_express","LorentzAngle" => "express_express","Help" => "None");
            $file_toread = array("NoisyStrip" => "BadStripsSummaryFile","NoiseOccupancy" => "NoiseOccupancySummaryFile","RawOccupancy" => "RawOccupancySummaryFile",
                                                  "DeadChip" => "DeadSummaryFile","Efficiency" => "EfficiencySummaryFile","ByteStreamErrors" => "BSErrorSummaryFile",
                                                  "DeadStrip" => "DeadSummaryFile","LorentzAngle" => "LorentzAngleSummaryFile","Help" => "None");

  • ShowInformation.php reads the xml files in /var/www/html/24hLoop/Results with the names defined in variables.php and displays a user-defined number of them (currently 20). There's a php file with the name of each task (Efficiency.php, NoiseOccupancy.php... etc.), but this is only for organizational purposes, since all of them are links to ShowInformation.php. There's a tab for each task, with some fields common to all tasks (Run Number, Start time, End time, Duration, Events) and other fields particular to each task. For every run in any of the processes the last 3 fields allow one to request the upload of a run, to mark it as bad, and show whether the run was already uploaded, respectively.
  • uploadrequest.php: after a series of runs have been marked for upload (by checking the corresponding checkboxes) one has to click the Send button. Then, uploadrequest.php checks which runs have an upload request, and for each of them a file containing the path to the corresponding sqlite file is created in /var/www/html/24hLoop/toupload. An automated shell command checks this folder every hour, looking for upload requests. If there are any, AtlCoolMerge.py is used to upload them. For more details on how to upload conditions data, see sqlite file upload.
  • index.php displays the general layout of the webpage and calls the other modules. It checks which is the last run, and whether it has been uploaded. If it has been uploaded, it will show:

Uploaded.png

If it has been processed but not uploaded, the message will be:

NotUploaded.png

It also checks whether the uploadCron is working. To do so, it compares the current time with the modification time of the file uploads.log, where the uploadCron writes its output. The uploadCron executes every hour. If the last upload from the cron was less than 1 hour ago (i.e., it is working as expected) it will show the following message:

CronUploadRight.png

If the time difference is bigger than one hour it will show a warning message:

CronUploadWrong.png

  • getRunQuery.php should make a query using the ATLAS run query application. It should retrieve the information given by the webpage in table format and display it at the bottom of the webpage. However, it doesn't work right now.

Crontab and automated shell commands

The software utility Cron is a time-based job scheduler in Unix-like computer operating systems. A crontab specifies shell commands to run periodically on a given schedule, using the following syntax

 # * * * * *  command to execute
 # │ │ │ │ │
 # │ │ │ │ │
 # │ │ │ │ └───── day of week (0 - 6) (0 to 6 are Sunday to Saturday, or use names; 7 is Sunday, the same as 0)
 # │ │ │ └────────── month (1 - 12)
 # │ │ └─────────────── day of month (1 - 31)
 # │ └──────────────────── hour (0 - 23)
 # └───────────────────────── min (0 - 59)

where a * in the slot for minute/hour/day/... means every minute/every hour/every day/... The crontab stored on pc-sct-www01 looks like TaskLister1.jpeg

  • SendMail looks for runs that are neither uploaded nor requested to be. In case it finds any, it sends an email to the current shifters and the appointed administrators of the webpage, indicating which run is not uploaded and how much time is left before it can no longer be uploaded.
  • lastRun checks which was the last run uploaded, and copies it to a plain file in /afs/cern.ch/user/s/sctcalib/scratch0/lastRun
  • restartTOM is no longer used. Those lines are kept there just in case.
  • uploadCron is a wrapper around AtlCoolMerge.py. It looks for files in /var/www/html/24hLoop/toupload that indicate that a run should be uploaded (those files always have the structure $runnumber.$test.$uploadtag.log) and copies them to the corresponding COOL folder. The command used is
    /afs/cern.ch/user/a/atlcond/utils/AtlCoolMerge.py --ignoredesc --nomail --comment='SCT NoisyStrips' $dbfile $db ATLAS_COOLWRITE ATLAS_COOLOFL_SCT_W PWD
    where $dbfile is the path to the sqlite file to be uploaded, $db is the database to which the sqlite files are uploaded (CONDBR2 for the noisy strips, MONP200 for the rest of the tests), and PWD is the password corresponding to the ATLAS_COOLOFL_SCT_W account (not written here for security reasons).
  • mrsync synchronizes /afs/cern.ch/user/s/sctcalib/scratch0/results/ and /var/www/html/24hLoop/Results/
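
As an illustration, a crontab entry executing uploadCron at the start of every hour would look like the following (the schedule in the real crontab may differ):

 0 * * * * /home/sctcalib/bin/uploadCron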

To modify the cron table, one has to use

crontab -e

The text editor will open the cron table file stored in /tmp/, and once the modifications are finished it should be closed with Ctrl-X, saving the changes with the default name. The frequency of execution in the cron table is there for historical reasons ("it was like that when I arrived"), so feel free to improve and modify it.

Certificates

To display the shifter help in the Help tab and to show the information of the ATLAS run query at the bottom of the page, the appropriate cookies are needed. Without them, the authentication fails, and this information is simply not displayed. We need two different certificates, one for each page. To get the certificate for the help page one has to execute:

cern-get-sso-cookie --krb -r -u https://twiki.cern.ch/twiki/bin/viewauth/Atlas/SCTOfflineMonitoringShifts/ -o ssocookie_twiki.txt

and to get the certificate for the run query page:

cern-get-sso-cookie --krb --nocertverify -r -u https://atlas-runquery.cern.ch -o ssocookie_query.txt

where the options mean:

  • --krb Use current user kerberos credentials or user certificate for acquiring SSO cookie
  • -r Reprocesses the cookie file to a format which more cookie-handling libraries can understand. Sets expiry date in future for session cookies (24h)
  • -u url gets a certificate for CERN SSO protected site url
  • --nocertverify Disables peer certificate verification. Useful for debugging/tests when peer host does have a self-signed certificate for example.

For the run query certificate, the option --nocertverify is needed due to problems with the certificates of the query web application. Since the certificates expire after 24 hours, they have to be renewed every day, a task done by a new cronjob, credentialsCron. This cron initiates a session with a keytab (i.e., in a secure way) as agasconb, generates the credentials for the twiki page and the run query application, places them in /var/www/html/24hLoop/txt and grants them reading rights. The credentials generated for the sctcalib account are somehow invalid for accessing these pages, hence the need to use another user. And without reading permissions the credentials could not be read by the php application.

sqlite file upload

The calibration algorithms write a sqlite file (with the structure data%year%_%stream1%.%runnumber%.%stream2%.sctcal.%TASK%.c0_c0._0001.mycool.db) for each of the tasks performed, except for the hitmap generation, which is a substep of the noisy strips task. The sqlite files can be uploaded to the COOL database from the 24h calibration loop webpage, where the last 20 runs processed are shown, with a tab for each different task. For each task and run there are two fields with checkboxes, upload and bad. In addition, the NoisyStrip tab has a Test field, which shows whether the noisy strips analysis fulfills certain quality criteria with green.png, red.png, yellow.png or grey.png.

  • Noisy strips runs flagged as green.png are safe to be uploaded and the upload checkbox should be marked. Runs with any of the other three states should be checked, and might be unsuitable for upload. If they are deemed unrecoverable, the bad checkbox should be marked.
  • The rest of the conditions data can in principle be uploaded as long as they belong to a stable beam run or a good cosmic run. If the calibration loop is working during STANDBY, empty output files will be generated, and runs with values not worth uploading will be shown in the webpage.

Once all runs are marked one way or another, one should click on the Send button. That will trigger /var/www/html/24hLoop/uploadrequest.php to create a file in /var/www/html/24hLoop/toupload for every run marked before.

Once every hour, uploadCron will look for new files in this folder, and try to upload the sqlite files they refer to.

Which noisy strips should be uploaded? In general, only noisy strips data for stable beam runs should be uploaded, provided they pass all the quality checks. As a general rule, one should consider what will be best for the next stable beam run. Cosmic runs should be for the most part ignored, but they might be useful in two cases:

  • If for some reason (stable beam runs too short) it hasn't been possible to upload noisy strips conditions for some runs, uploading a cosmic run could provide a more recent record of conditions data.
  • During a technical stop the noisy strips could change, and it'd be better to upload the cosmic run than not to use it.

In any of those cases, if the next stable-beams runs are uploaded normally, the cosmic run upload will be irrelevant anyway.

uploadCron

uploadCron (/home/sctcalib/bin/uploadCron) is a sh macro that scans the files in /var/www/html/24hLoop/toupload and uploads them using AtlCoolMerge.py. To use AtlCoolMerge.py the environment is set to athena version 17.2.14.4: the pc-sct-www01 machine runs Scientific Linux 5 (slc5), and more recent releases are not built for slc5. First, the cron uploads the files corresponding to noisy strips:

  1. It loops over the files that match *.noisy.*.up in /var/www/html/24hLoop/toupload/
  2. From the file name it extracts the run number (first 6 characters) and the tag (characters 14-17)
  3. The file content is the path to the sqlite file, and variable $dbfile is set to that path
  4. AtlCoolMerge.py is executed with the following command: /afs/cern.ch/user/a/atlcond/utils/AtlCoolMerge.py --ignoredesc --nomail --comment='SCT NoisyStrips ' $dbfile CONDBR2 ATLAS_COOLWRITE $USER $PASSWD > $directory$run.noisy.$tag.log, where
    • --ignoredesc forces AtlCoolMerge to ignore discrepancies between the header of the sqlite file and what the COOL database is expecting. This should be fixed at some point
    • $dbfile is the path to the sqlite file to be uploaded
    • CONDBR2 is the database where noisy strips are stored (it used to be COMP200)
    • $directory$run.noisy.$tag.log is the name of the log file created, placed in the same folder where the upload requests are.
  5. The file containing the path to the sqlite file is deleted. If it weren't, the cron would try to upload it again when it executes the next time.

Then, the cron uploads the files corresponding to the rest of the tasks, following a similar algorithm:

  1. It loops over the files that match *.UP?.up in /var/www/html/24hLoop/toupload/
  2. From the file name it extracts the run number (first 6 characters), the task (characters 8-12) and the tag (characters 14-16)
  3. The file content is the path to the sqlite file, and variable $dbfile is set to that path.
  4. AtlCoolMerge.py is executed with the following command: /afs/cern.ch/user/a/atlcond/utils/AtlCoolMerge.py --nomail --nobackup --ignoredesc --comment='SCT MONP200 Folders' --destdb=MONP200 $dbfile CONDBR2 ATLAS_COOLWRITE $USER $PASSWD > $directory$run.$test.$tag.log, where
    • The database is now MONP200. That's where the conditions data for the rest of the tests is stored.

The shifter does not need to provide the user and password to AtlCoolMerge.py: they are already in place in the macro, and just in case they are not written here.
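
The following is a minimal sketch of the file-name parsing described above for the noisy strips case, with bash substring offsets derived from the character positions quoted in the text (the loop and variable names are illustrative):

 for f in /var/www/html/24hLoop/toupload/*.noisy.*.up; do
     name=$(basename "$f")
     run=${name:0:6}       # first 6 characters: run number
     tag=${name:13:4}      # characters 14-17: upload tag
     dbfile=$(cat "$f")    # the file content is the path to the sqlite file
     echo "run=$run tag=$tag dbfile=$dbfile"
 done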

Troubleshooting

  • The log file of a failed job says:

Run : run selection didn't pass Stable Beam check--- job will be finished

The runSelector.py macro forces non-cosmic runs to have a stable beam. If they don't, the transformation will throw an exception, and the execution will end.

  • Successful runs in the task lister don't appear in the webpage.
The mrsync cronjob runs every four hours, so it could simply be that the run hasn't yet been copied to the /var/www/html/24hLoop/Results folder after the processing finished. If more than four hours have passed since the task finished, check /var/www/html/24hLoop/variables.php. The names of the streams and files that are considered and shown in the webpage are defined there, in the $stream_toread and $file_toread variables, respectively.

  • HIST tasks (efficiency, noise occupancy...) finish successfully, but their contents seem off:
    • Very low or 0 efficiencies.
    • Lorentz Angle fits not converging.
    • Very large noise and raw occupancies.
AND the hitmaps are generated but the noisy strips processing fails.

This is most likely due to problems in the contents of the HIST files, not to the algorithms themselves. For example, when the SCT data taking is in standby, the HIST files will be generated but they will be essentially empty.

  • A noisy strip task is waiting for a previously failed noisy strip task to finish or be uploaded.
This should rarely happen, but in that case it's possible to update the lastRun file manually to release the task. In any case, email the experts before modifying the files.

  • The upload of conditions data fails, and in the log there's the message: kinit(v5): Preauthentication failed while getting initial credentials
uploadCron reads the keytab file /home/sctcalib/.do_not_delete to get the username and password needed to upload. If the password of the sctcalib account has been changed but this file has not been updated, the uploads will fail. The instructions on how to update the keytab file can be found in /home/sctcalib/README

Contact persons and updates

Updates, developments and problems should be reported at the SCT weekly meetings (held on Wednesdays at 10:30, room extension 10351177, pin 4088, as of 2022-05-16). You can check the list of meetings here.

-- AlbertoGasconBravo - 2015-03-27

Topic attachments

  • CronUploadRight.png (2015-06-10): message shown by the calibration loop webpage when the upload cron is working
  • CronUploadWrong.png (2015-06-10): message shown by the calibration loop webpage when the upload cron isn't working
  • NotUploaded.png (2015-06-10): message shown if the last run has not been uploaded
  • TaskLister.jpeg (2015-06-10): tasklister screenshot
  • TaskLister1.jpeg (2015-06-19)
  • Uploaded.png (2015-06-10): message shown if the last run has been uploaded
  • crontab.jpeg (2015-03-27)
  • green.png, grey.png, red.png, yellow.png (2015-06-10): calibration loop webpage signs