Introduction to AlCaReco
For a description of what AlCaReco is, look here (this was a bit technical for me at first), and I have a file I used myself to try to understand it here:
Description.txt
The basic idea of AlCaReco validation is that when a new release of CMSSW comes out, the plots it outputs should show no discrepancy compared to the release directly before it. If they agree, the new code didn't introduce a bug or other regression.
The full process I use to validate datasets can be found in this text file:
Validation_Instructions.txt
I've tried to be as comprehensive as possible there. That is the file where you can find the locations of all scripts and directories, as well as links to the various pages needed in the process. If you run into any issues while running the macros, look in the error fixing section to see if I've already dealt with them before.
To compare the plots that come out of the validation and check that they are 1:1, I use a python script:
Validation_Ratios. It compares each bin of the plots from the most recent release with the corresponding bins of the plots from the previous release, then outputs one ratio plot (most recent divided by previous) for each of the 9 plots it compares. To run it, put the .root files produced by alcarecovalidation_cfg.py together somewhere, edit the file names in the format shown, and change the path to where your files are located. It runs with python and pyROOT.
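For illustration, here is a minimal sketch of the bin-by-bin ratio idea behind Validation_Ratios (not the script itself); the file and histogram names below are placeholders:

  # ratio sketch: divide each plot of the current release by the same plot of the previous one
  import ROOT
  current  = ROOT.TFile.Open("validation_current.root")   # placeholder, e.g. 710pre5 output
  previous = ROOT.TFile.Open("validation_previous.root")  # placeholder, e.g. 710pre4 output
  hist_name = "ClusterCharge"                             # placeholder; repeat for each of the 9 plots
  h_cur = current.Get(hist_name)
  h_pre = previous.Get(hist_name)
  h_cur.Sumw2()                                           # keep bin errors correct before dividing
  h_pre.Sumw2()
  ratio = h_cur.Clone(hist_name + "_ratio")
  ratio.Divide(h_pre)                                     # bin-by-bin ratio: current / previous
  c = ROOT.TCanvas()
  ratio.Draw()
  c.SaveAs(hist_name + "_ratio.png")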
If you are looking to put some plots into a presentation, here is a python script that will remove the stats box and make them look a bit nicer:
Cleanup_Plots. Alternatively, if there are not too many, you can just open them in a TBrowser and do it through the GUI.
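A minimal sketch of what that cleanup amounts to in pyROOT (placeholder file and histogram names):

  # remove the statistics box and redraw a histogram for a presentation
  import ROOT
  ROOT.gStyle.SetOptStat(0)               # turn off the statistics box globally
  f = ROOT.TFile.Open("validation_current.root")   # placeholder file name
  h = f.Get("ClusterCharge")              # placeholder histogram name
  h.SetStats(0)                           # also disable the box on this histogram
  h.SetLineWidth(2)
  c = ROOT.TCanvas()
  h.Draw()
  c.SaveAs("ClusterCharge_clean.png")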
Summary/History of the AlCaReco Task
First, a list of all datasets and any issues with them for a quick history and reference.
Release | Issue | Global Tag | Notes |
7_0_0_pre4 | no issue | PRE_62_V8 | |
7_0_0_pre5 | no issue | PRE_62_V8 | |
7_0_0_pre6 | calibTree error | PRE_62_V8 | error fixed, see error fixing |
7_0_0_pre7 | no issue | PRE_62_V8 | |
7_0_0_pre8 | calibTree error | PRE_62_V8 | error fixed, see error fixing |
7_0_0_pre9 | no issue | PRE_62_V8 | |
7_0_0_pre10 | discrepancy | GR_R_62_V2 | see error fixing |
7_0_0_pre11 | no issue | GR_R_62_V3 | |
7_0_0_pre12 | no issue | GR_R_62_V3 | |
7_0_0_pre13 | no issue | GR_R_70_V1 | |
7_0_0 | no issue | GR_R_70_V1 | |
7_1_0_pre1 | discrepancy | GR_R_62_V3 | see error fixing |
7_1_0_pre2 | no issue | GR_R_70_V1 | |
7_1_0_pre3 | mb2010b missing | GR_R_71_V1 | validated by running on all data for 2011A vs pre2 |
7_1_0_pre4 | no issue | GR_R_71_V1 | |
7_1_0_pre5 | no issue | GR_R_71_V1 | |
7_1_0_pre6 | AccessAlCaReco not compiling | PRE_R_71_V2 | error fixed, validated |
7_1_0_pre7 | AccessAlCaReco not compiling | PRE_R_71_V2 | error fixed, validated |
7_1_0_pre8 | no issue | PRE_R_71_V3 | |
7_1_0_pre9 | no issue | GR_R_71_V4 | |
7_2_0_pre1 | no issue | GR_R_72_V1 | |
7_2_0_pre3 | afs quota, not filled | GR_R_72_V2 | error fixed, validated |
7_2_0_pre4 | no issue | GR_R_72_V2 | validated with ValDB |
7_2_0_pre5 | no issue | GR_R_72_V2 | validated with ValDB |
All of these datasets were validated with mb2010b, with the exception of 710pre3, so that is the set in which the discrepancies were found. The discrepancy for 710pre1 was most likely because there was a lack of data, or because there were two files of data and I chose to compare only one. The discrepancy for 700pre10 was attributed to a global tag change between that release and the previous one. See error fixing for more info.
Suddenly the 2011 and 2012 datasets were back up and running again, and I needed to check whether they could be used for validation; this was also necessary because 710pre3 needed to be validated. To do this, it was recommended that I compare 7_1_0_pre2 with 7_1_0_pre1 to see if the 2011 and 2012 sets showed the same discrepancy that the 2010 set did. So I compared the two releases for 2010B, 2011A, 2012A, 2012C, and 2012D, because those are all of the sets given in the hypernews announcement (2011A, 2012A, C, D are all of the possible sets). After the validation plots were created for pre2 and pre1, I ran my comparison code on them as usual, except it showed higher and higher discrepancy as the datasets approached 2012D, and at the time we did not know why. The same pattern showed up in a comparison of 710pre5 with 710pre4, which was 1:1 for 2010B but not for the later sets.
This is now known to be because the 2011 and 2012 had multiple files of data, so the comparison was not comparing the same events. Look in "Using 2011 and 2012 datasets for validation" to know more.
Since the new 2011 and 2012 datasets had so much more data than the 2010 one, notably more root files to run on, there was a question of whether it was better to run on more data, and whether it changed anything if we ran on data towards the end of a set rather than the beginning. So, using 7_1_0_pre2, I ran on 100k events, and separately skipped 200k events and then ran on 10k. Neither changed the plots much, so running on the first 10k events of the 2010, 2011, and 2012 datasets should be fine. We're looking into using a single later dataset like 2012 for validation in the future.
There was also a concern about the calibration trees produced, and whether the plots were outputting as normal. The few plots I was familiar with in them seemed to be fine. This task is apparently meant to be done by someone else and also a lot more in-depth, so from here on I will not be looking at the calibration trees.
You can find minutes of meetings I've had along the way under indico events with the name listed at this link, just search for it. They are usually on Mondays.
Here is a link to presentations I've done, with slides and plots and some details on problems:
Using 2011 and 2012 datasets for validation
The datasets besides 2010B that are available for AlCaReco are: 2011A, 2012A, 2012C, 2012D
These were originally not working, but as of around December 2013 they have been. The first thing I did to check whether these can be used for validation was to see if the same discrepancy showed up in 710pre1 when compared to 710pre2, but this time using the 2011 and 2012 sets. The plots came out as shown in this zip file:
pre2 vs. pre1
Discrepancy was seen, but it was different for each plot. I ran it again on two releases, 710pre5 and 710pre4, that were previously seen to be 1:1, and got the same kind of result, with different discrepancies than were seen for pre2 vs pre1. Plots:
pre5 vs. pre4
The process I used for comparing these is as follows:
- In the area /afs/cern.ch/cms/tracker/sistrvalidation/AlCaRecoValidation/config/ I edit alcarecovalidation_cfg.py by adding the name of the root file for the release I need the plots from, here for example 710pre2.
- I get this root file name from DAS by inputting the dataset name for the mb201xX I need (here for example 2011A); this name is obtained from the relval hypernews announcement of the standard relval samples.
- I choose a single root file with 10k events or more from DAS.
- With the file edited with only the file name changed, I save and exit.
- I do: cmsRun alcarecovalidation_cfg.py
- it runs and outputs the root file of the 9 plots.
- I download this file to my computer
The same process is then done for the other release to be compared with, here being 710pre1.
- now that I have two .root files with 9 plots each, I put them in the same directory and run this comparison code.
- I change only the two top lines, 'current' and 'previous' to the file names of the root files on my computer
- I run the script, and it outputs the comparison plots seen in the zip files above.
No changing of analyzer code or any C++ code was done. I did eventually change it to auto-name the files outputted by alcarecovalidation_cfg.py, but this discrepancy was seen before any editing of C++ code was done.
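For reference, a rough sketch of the part of alcarecovalidation_cfg.py that gets edited in the steps above; the dataset path below is a placeholder, the real one is copied from DAS:

  # inside alcarecovalidation_cfg.py (process is already defined there); only the file name changes
  process.source = cms.Source("PoolSource",
      fileNames = cms.untracked.vstring(
          '/store/relval/CMSSW_7_1_0_pre2/MinimumBias/ALCARECO/placeholder/file.root'   # placeholder, copied from DAS
      )
  )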
*UPDATE(6/25/2014)
It looks like the problem is that I wasn't actually comparing the same data for the two releases. In DAS I was simply picking the first file in the list, which most likely didn't contain the same events as the file I picked for the other release. I ran on all of the data for each release and then compared them, which resulted in a ratio very close to 1:1. So this almost fixed it; the next step is to find a way to select a fixed range of events out of all of the data and run on that for both releases, so that I know I am comparing exactly the same data.
*UPDATE(7/08/2014)
The tool to do this is to append:
eventsToProcess = cms.untracked.VEventRange('runnum:eventnum-runnum:eventnum','runnum:eventnum-runnum:eventnum','...')
After the cms.Source call.
I'll have to figure out which runs and event numbers to use for this.
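A minimal sketch of how this could look in the cfg (the placement and the numbers are my assumption; the run/event values below are placeholders taken from the mb2010b ranges in the tables further down):

  # sketch: restrict the source to a fixed event range (placeholder run/event numbers)
  process.source.eventsToProcess = cms.untracked.VEventRange(
      '149011:190-149011:50000'   # format: 'runnum:eventnum-runnum:eventnum'
  )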
*UPDATE(7/08/2014)
This fixed it! I ran this on both 710pre5 and pre4 and the comparison plots came out 1:1. The problem all along was that I was not looking at the same data for each release.
*UPDATE(7/22/2014)
For event ranges, I ran on all 2011, 2012 datasets and it came out 1:1 for each comparison. Plots:
Event_selection_pre4_pre5
*UPDATE(8/11/2014)
I compared the number of events, lumis, and run numbers in each dataset to each other, and it came out like this:
710pre4
dataset | number of events | event range (min-max) | lumi range (min-max) | run number |
mb2010b | 35647 | 190-971910881 | 1-706 | 149011 |
mb2011a | 390650 | 200-178977952 | 1-467 | 165121 |
mb2012a | 822499 | 28104613-1963292844 | 77-1688 | 191226 |
mb2012c | 295985 | 25085848-918395522 | 69-804 | 199812 |
mb2012d | 536068 | 8000-1587871744 | 1-1330 | 208307 |
710pre5
dataset | number of events | event range (min-max) | lumi range (min-max) | run number |
mb2010b | 35647 | 190-971910881 | 1-706 | 149011 |
mb2011a | 389105 | 200-178977952 | 1-467 | 165121 |
mb2012a | 822499 | 28104613-1963292844 | 77-1688 | 191226 |
mb2012c | 292508 | 25085848-918395522 | 69-804 | 199812 |
mb2012d | 527293 | 8000-1587871744 | 1-1330 | 208307 |
Looking at the event numbers, it is apparent why the mb2010b dataset was a good one for the validation: it had the same number of events in both releases, and all of it was in one root file. That means even if we chose 10000 events to run on, it was always going to be the same data. In every release the number of events in 2010b has been the same as in the release before it, except for the comparisons in which we had discrepancies; 710pre2 vs. 710pre1, for example, did not have the same number of events. This could explain the discrepancies we saw earlier in 710pre1 and 700pre10. In any case, running on a selection of events that is present in both releases will solve the problem.
I then saw which lumis and events were in one release but not the other, and in both. These pickle files contain lists of events and lumis that hold:
- The events in the first file of pre4 that are in all of the dataset of pre5
- The events in the first file of pre5 that are in all of the dataset of pre4
- same as first but for lumis
- same as second but for lumis
By collecting this data I can now run the alcarecovalidation_cfg.py script on either:
- only the events from the first file of one release that were also in the other, to create a 1:1 ratio out of data that previously did not come out 1:1
- switching it up and doing it for the other release
- recreating the original discrepancy plots by running on the original 10000 discrepancy events of one release and then the original 10000 of the other
To run on this list of events I had to modify the event range with the line: (still looking for this)
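The exact line isn't recorded above. As an assumption on my part (not the missing line), one way to run on an explicit event list would be to build a VEventRange with one single-event range per entry, using the run/event pairs stored in the pickle files; the pickle name and its contents here are hypothetical:

  # sketch, assuming the pickle holds a list of (run, event) pairs; the file name is a placeholder
  import pickle
  with open('events_pre4_in_pre5.pkl', 'rb') as f:
      common_events = pickle.load(f)
  # one 'run:event-run:event' range per event, so only the listed events get processed
  ranges = ['%d:%d-%d:%d' % (run, evt, run, evt) for run, evt in common_events]
  process.source.eventsToProcess = cms.untracked.VEventRange(*ranges)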
A good candidate for the validation now seems to be 2012A, as it historically seems to have more stable numbers of events across releases, as well as more data (sistripcalzerobias sets):
release | mb2010b | mb2011a | mb2012a | mb2012c | mb2012d |
700pre4 | 35647 | 592 | 99475 | 45377 | 78690 |
700pre5 | 35647 | 592 | 99475 | 45377 | 82133 |
700pre6 | 35647 | 592 | 99475 | 45377 | 82133 |
700pre7 | 35647 | 592 | 99475 | 45377 | 82133 |
700pre8 | 35647 | 592 | 99475 | 45377 | 82133 |
700pre9 | 35647 | 592 | 99475 | 45377 | 82133 |
700pre10 | 35647 | 373072 | 797285 | 284276 | 518064 |
700pre11 | 35647 | 390650 | 820464 | 286176 | 535041 |
700pre12 | 35647 | 390650 | 822499 | 295985 | 536068 |
700pre13 | 35647 | 383915 | 822499 | 295985 | 536068 |
700 | 35647 | 390650 | 822499 | 295985 | 536068 |
710pre1 | 22842 | 179134 | 233087 | 93583 | 162020 |
710pre2 | 35647 | 389069 | 803211 | 295985 | 529573 |
710pre3 | missing | 390650 | 822499 | 295985 | 536068 |
710pre4 | 35647 | 390650 | 822499 | 295985 | 536068 |
710pre5 | 35647 | 389105 | 822499 | 292508 | 527293 |
710pre6 | 35647 | 390650 | 822499 | 293733 | 529034 |
710pre7 | 35647 | 383222 | 798306 | 293642 | 526175 |
710pre8 | | | | | |
710pre9 | 35647 | 390650 | 822499 | | 536068 |
720pre1 | 35647 | 0 | 0 | | 0 |
720pre2 | | | | | |
720pre3 | 35647 | 0 | 0 | | 0 |
But in the 720 releases, the 2011 and 2012 sets have stopped being filled entirely. The mb2010b set continues to be stable and contained in one root file for every release. I think the method that should be used from here on (or as long as that dataset lasts) is to use about 10000 events of 2010B for validation; if 2010B doesn't exist for a release, use an event selection of about 10000 events in 2012A, since that set is spread over multiple root files. 2012A is the most stable choice in terms of amount of data and stability of the number of events.
Changes to CMS code
For auto-naming the files outputted by alcarecovalidation_cfg.py:
In every new release directory there is an analyzer in this location (once you copy it from the previous release):
- /CMSSW_7_1_0_pre5/src/TestArea/AccessAlCaReco/src/AccessAlCaReco.cc
This is the analyzer used by alcarecovalidation_cfg.py. In that, I added a new parameter for naming the output file by adding this line directly after the constructor:
- :outfilename(iConfig.getUntrackedParameter<std::string>("outfilename"," "))
And then adding the name parameter in the endJob() function:
- dqmStore_->save(outfilename);
For auto-naming the files outputted by produceCalibrationTree_RelVal_withoutBigNTuple_ForCollisionRuns_cfg.py:
I moved this line:
- process.add_( cms.Service( "TFileService",fileName = cms.string( outfilename ),closeFileFast = cms.untracked.bool(True) ) )
to a place after process.source.fileNames was declared, and added a variable to hold the mb20xx part of the filename:
- outfilename = 'calibTree_'+process.source.fileNames[0].split('/')[-3].split('_')[-1].split('-')[0]+'.root'
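Putting the two pieces together, a rough sketch of how that part of the cfg might look after the change; the input path here is a placeholder with the same layout assumed by the string splitting:

  # sketch: auto-naming the calibTree output, placed after process.source is defined (placeholder path)
  process.source = cms.Source("PoolSource",
      fileNames = cms.untracked.vstring('/store/relval/placeholder/MinimumBias_mb2011a-v1/0000/file.root')
  )
  # derive e.g. 'calibTree_mb2011a.root' from the dataset part of the first input file path
  outfilename = 'calibTree_' + process.source.fileNames[0].split('/')[-3].split('_')[-1].split('-')[0] + '.root'
  process.add_( cms.Service( "TFileService",
      fileName = cms.string( outfilename ),
      closeFileFast = cms.untracked.bool(True) ) )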
To fix the AccessAlCaReco not compiling error:
-added an include in
AccessAlCaReco.cc:
- #include "DataFormats/SiStripCluster/interface/SiStripCluster.h"
Error Fixing
Fixing the calibTree error
The producecalibtree macro wasn't running (I can't find the original error message). The fix was:
-in CalibTracker/Configuration/python/setupCalibrationTree_cff.py, modified
TkCalFullSequence = cms.Sequence(trackFilterRefit + OfflineGainNtuple + hiteff)
into:
TkCalFullSequence = cms.Sequence(MeasurementTrackerEvent + trackFilterRefit + OfflineGainNtuple + hiteff)
-I wasn't responsible for the fix, it was discussed by relevant experts.
The discrepancy seen
So far the discrepancies seen in 700pre10 and 710pre1 have not had a firm conclusion as to what caused them. The main theory is that the global tags, which are clearly different for those releases, are the cause.
Can't find a server to open the .root file
This error occurs sometimes when running alcarecovalidation_cfg.py. I found it is usually a temporary issue; I solved it simply by waiting a few hours and trying again.
SCRAM_ARCH not available issue
When I went to validate 710pre6, it gave me this error upon cmsrel'ing for the new release:
- ERROR: Release "CMSSW" version "CMSSW_7_1_0_pre6" is not available for arch slc6_amd64_gcc472. "CMSSW_7_1_0_pre6" is currently available for following archs. Please set SCRAM_ARCH properly and re-run the command. slc6_amd64_gcc481 slc6_amd64_gcc490
This was solved by simply setting the environment variable SCRAM_ARCH to slc6_amd64_gcc490 with:
- echo $SCRAM_ARCH (to see that it was indeed not one of the ones suggested. It wasn't, it was gcc472)
- export SCRAM_ARCH=slc6_amd64_gcc490
Disk Quota issue
I found that this was mostly out of my hands, but if you are using large files it may be your own usage. So far the workaround I have used is to work in an AFS workspace instead of my home area. Here's how I got the AFS space:
- https://account.cern.ch/account/Management/MyAccounts.aspx
- Under "Account tasks" go to "Services..."
- This should be on the first page, go to "AFS Workspaces"
- Click on the left tab "Settings", and you should see buttons to increase your home folder space and to create your workspace. Click on the button to create your workspace.
- This was done instantaneously for me, and it should give you a link to the directory of your space.
One thing that's convenient to have after this is set up is a symbolic link to your work directory in your home directory, so you don't have to cd to the directory every time. To do this, I did:
- ln -s /afs/cern.ch/work/d/dorzel/ work
AccessAlCaReco not compiling
-it was having a problem with anything to do with SiStripCluster in the code, so the fix was to explicitly include:
#include "DataFormats/SiStripCluster/interface/SiStripCluster.h"
SiStripCluster used to be pulled in through SiStripClusterCollection.h, which AccessAlCaReco includes, but that header no longer includes SiStripCluster.h.
Credentials not found for certificate login
-I solved this by getting a completely new certificate and following the instructions here:
https://ca.cern.ch/ca/Help/?kbid=024010
-Then do chmod 400 on userkey.pem, should do the trick for logging in
"Certificate issuer not trusted"
-This error occurred for me when I was trying to use xrootd to open an online file in root. I found that logging back in to my certificate fixes this.
Segmentation violation when trying a TBrowser
-This is usually solved on my windows machine by closing and restarting xming. Haven't yet had this problem on my linux machine. Must be due to a localhost display thing.
"Warning in <TClass::TClass>: no dictionary for class edm::
-seems to be related to issues with access to files, not sure yet
Discrepancy in 2011 and 2012
With the AlCaReco validation, for some releases I saw a discrepancy when comparing the plots of one release to another. This was because I wasn't actually looking at the same data in each release. Fix this by selecting an event range in the cfg.
3/24/2014 -When going to validate 710pre4, noticed that mb2010b, which was said to be cut from the datasets, was back in the hypernews announcement and existed on DAS.
-Tried to create new area with cmsrel and got: "cannot make directory CMSSW_7_1_0_pre4 Disk quota exceeded"
-When researching what hist1.Divide(hist2) does, it appears it is just the divide operation "h1/h2" in different notation: it divides each bin in hist1 by the corresponding bin in hist2.
-Sumw2 (sum of the weights) keeps the errors correct when scaling histograms (modifies the errors correctly when scaling).
-Looked at possible use of the "stack" function to draw histos on top of one another (see second link under ROOT-useful examples).
-changing filename on producecalibtree may be as simple as moving the "process.add" lines, where the file name "calibtree.root" is stated, to below the process.source lines and doing some string manip on the filename by using:
- newname = process.source.fileNames[0] #strip,split,etc.
to get the first file name to do string manip on.
-Results of comparing 700pre12 vs 700pre11 for 2011, 2012:
- the disk quota error popped up again for this: it runs the cfg, then says it can't open the output file due to disk quota.
4/6/2014 -disk quota problem solved by deleting a large file from a directory in my area. Still not sure if it is based on the machine I connect to or if the problem is solved.
-Noticed that the mb2010b dataset is back in 7_1_0_pre4 and 5, so it was just gone in pre3.
-Validated 710pre4 mb2010b, compared to 710pre2 had 1:1 ratio
-Validated 710pre5 mb2010b, compared to 710pre4 had 1:1 ratio
4/14/2014
-ran comparisons on 2011 and 2012 for 710pre5 vs. 710pre4, saw discrepancy still. Different discrepancy from what was seen in 710pre2 vs. pre1.
-calibTrees now automatically name themselves! Used the method mentioned in 3/24/2014 log.
-disk quota turned out to be an issue out of my control
4/22/2014
-tried to create release directory for 710pre6 but got the following error:
- ERROR: Release "CMSSW" version "CMSSW_7_1_0_pre6" is not available for arch slc6_amd64_gcc472. "CMSSW_7_1_0_pre6" is currently available for following archs. Please set SCRAM_ARCH properly and re-run the command. slc6_amd64_gcc481 slc6_amd64_gcc490
4/27/2014
-looked at code and details of the signal bias scan on Avery's twiki:
https://twiki.cern.ch/twiki/bin/view/Main/AveryArjo?forceShow=1
-tried to: cmsrel CMSSW_5_3_7_pre5, as that was said to be the release the code was tested with and needed to set an environment variable for the script; got error:
ERROR: Unable to find release area for "CMSSW" version "CMSSW_5_3_7_pre5" for arch slc6_amd64_gcc472.
Please make sure you have used the correct name/version.
Found out this isn't in scram list, unavailable?
5/5/2014
-cmsrel'd CMSSW_5_3_7_pre14 instead, and that made the script work.
-commented out a line (the exact line isn't recorded here) because the next line:
- tree = gDirectory.get(outTree)
takes care of it.
-script worked, produced output that was in line with what Avery detailed, pre14 seems fine to use
5/6/2014 -solved SCRAM_ARCH by setting the variable to the ones suggested. More info in "error fixing".
-worked on looking at the events in the root files used for the 710pre5 vs pre 4 comparisons, looked at ways to open the files in root
5/7/2014
-tried to access file in root, there was an error that caused the file object to be at 0x0
-had to log on to my proxy/grid certificate in order to use xrootd it turns out, did so by using the command: voms-proxy-init -voms cms -hours 96 -valid=96:0
-error in my certificate issuer not being trusted, issued a new certificate and tried again
-error when trying to log in again to my certificate after extracting my cert and key from mycert.p12 file and putting them in my .globus:
-
[dorzel@lxplus0059 private]$ ./certlogin.sh
Logging in...
Credentials couldn't be loaded [/afs/cern.ch/user/d/dorzel/.globus/userkey.pem, /afs/cern.ch/user/d/dorzel/.globus/usercert.pem]: PEM data not found in the stream and its end was reached
No credentials found!
Done.
5/16/2014
-disk quota issue solved, attempted to cmsrel a new 710pre6 directory, got an error telling me to switch to arch 481 and not 472, switched.
-cmsrel worked fine, then copied TestArea over from the previous directory. Did scram b and got many errors compiling AccessAlCaReco: numerous undefined variables, unexpected tokens, etc.
5/18/2014
-solved the grid certificate issue from 5/7, just needed to follow steps correctly. Entry created in error solving
-solved the error with xrootd that I got in 5/7, now it's complaining about many cases of:
- "no dictionary for class edm::" however this doesn't stop me from being able to open a TBrowser and look at the file.
5/20/2014
-worked on looking at event data in files, was able to open TBrowser and get the needed plots from 710pre5, but not pre4, as there was a segmentation violation every time I tried to open a TBrowser.
-trying to write a script that compares the three leaves that I need since I can't directly save them
-I think the problem I'm having with opening a TBrowser is related to the numerous
- Warning in <TClass::TClass>: no dictionary for class edm::(stuff) is available
warnings that I'm getting. I get them when trying to run my script and when trying to open a TBrowser. Can't find any solutions so far.
-got plots for pre5 and pre4 2010B because TBrowser suddenly worked (same procedure)
5/21/2014
-to make sure that the AccessAlCaReco compilation error wasn't due to anything edited in the code, I copied the TestArea directory from 700pre13 and ran scram b on it. No difference. Also does not matter what SCRAM_ARCH is set to.
-the problem with the script at least is in the "from ROOT import *" line. "import ROOT" doesn't even import anything. So it must be access to the include files that are gone. This problem is similarly seen in opening the online file in root.
-tried downloading the file with xrdcp; the TBrowser worked the same fraction of the time. Ended up getting all of the pre4 event data needed, moving on to pre5
5/22/2014
-problem with TBrowser appears to be fixed by closing and restarting xming
-got all the data needed for the comparison of pre4 and pre5. When switching from 2010 to 2011 to 2012, the run numbers change, making me think that the root files I chose for 2011 and 2012 were different segments of data, and that maybe running on all of the data will fix this.
-ran on all data for 2011A runs for 710pre5 vs. 710pre4. The data got a lot closer to 1:1, but since there was much more data this could have been due to more statistics.
5/26/2014
-Tried figuring out the AccessAlCaReco error again; it seems that no matter which release I copy TestArea from, it doesn't work. This also only happens for 710pre6 and 710pre7.
-added a .rootrc file to my home directory to see if it would automatically add cms-specific libraries to my root and perhaps solve some issues. It worked correctly, but didn't solve anything with the "no dictionary for class..." error. Tried taking the lines out of the file and inputting them into root in the same directory as the script I was running, no change.
- Rint.Logon: $(HOME)/private/rootlogon.C (in .rootrc)
- {gSystem->Load("libFWCoreFWLite.so");AutoLibraryLoader::enable();} (in rootlogon.C)
-looked up info on what I may not be including in my script or root sessions
6/2/2014
-got script to run correctly for HW3 NoiseBiasScan stuff.
6/6/2014
-fixed the AccessAlCaReco compilation error, see error fixing
-validated 710pre6
-validated 710pre7
6/17/2014
-validated 710pre9
6/24/2014
-validated 710pre8
-validated 710pre3 with all data compared with 710pre2 both using 2011A
6/30/2014
-validated 720pre1
7/8/2014
-found the event selection parameter for alcarecovalidation_cfg.py, look in "using 2011 and 2012 datasets for validation"
-this seemed to fix the discrepancy error
7/16/2014
-so far, in checking whether the events are the same in two root files, I've gotten the objects associated with the event data (Events/EventAuxiliary/id_/event_), but it is a TNonSplitBrowsable, which I'm not sure how to loop through... maybe I'll just grab all of them and add them together.
8/8/2014
-Much progress has been made on the source of the error in the 2011 and 2012 plots. It most certainly seems like we weren't comparing the same data. Doing a few checks to make sure this is the case, and to wrap up AlCaReco.
8/18/2014
-going to figure out how to run the .cfg with a list of events rather than a range.
8/20/2014
-space is freed up now so I'll validate 720pre3 and pre4
Signal Bias Scan
Introduction to Signal Bias Scan
I took over for Avery Arjo in this task, and he documented this as well in his twiki here:
https://twiki.cern.ch/twiki/bin/view/Main/AveryArjo?forceShow=1
The Signal Bias Scan is a way to monitor radiation damage in the SiStrip detector. This is done by looking at leakage currents at certain known voltage points in the detector. Special runs, called Signal Bias Scans, are conducted to collect this information. Several of these have been done so far:
(history of runs here) There is a chip called the DCU that reports the leakage current information during these runs, along with a few other things. This information is combined with a log file from the control room that stores the current voltage point as well as the local time at which that voltage was reached. One complication arises from this: the DCU stores all of its time information as unix time stamps (epoch time), while the log file stores its information in CERN local time. Information from the DCU's .root file and the control room log file have to be synced, which means the times have to be synced. My attempt at doing this is here:
DCUDataAnalyzer_Revised.py, where most of the base code was written by Avery and I have tweaked it to try to fix the issue. So far it fixes the timing for some plots and breaks it for others. I also added ntuple code.
Verifying the Daylight Savings Time correction
-in order to look at this, I first looked at the history of the signal runs here:
https://twiki.cern.ch/twiki/bin/viewauth/CMS/SiStripSignalHVScans
-the run we are looking at is from Sept. 28th, 2012, and daylight saving time for Switzerland in 2012 was:
- March 25: +1 hour
- October 28: -1 hour
-so during the run on Sept. 28th, Switzerland was in +1 hour daylight saving time. Since Switzerland is normally UTC+1, this means that during this run Switzerland was at UTC+2.
-The time data that the DCU stores is in the format of the Unix Time Stamp:
http://www.unixtimestamp.com/index.php
which has no daylight savings time by definition
-The file Avery looked at was from 1348837200 to 1348862400, which translates to 9/28/12 1:00:00pm UTC to
9/28/12 8:00:00pm UTC, a 7 hour run
-These unix times from the DCU need to be synced with the times reported during the ramping in Sep28Log.txt, which are in the format:
- 000nn yyyymmddhhMMss
- where nn = number of voltages taken at that point. (These are taken at the beginning of holding the high voltage steady, and then when it stops holding)
- yyyymmdd = year, month, day
- hhMMss = hours(military),minutes,seconds in CEST (UTC+2) because of daylight savings
-Looking at the times in the log, the first step's timestamp is 20120928172736, and the last step's timestamp is 20120928211848. These fall within the range of the run, which is 15:00-22:00 (UTC+2). This also must mean that the times recorded in the logs are indeed in local time, as they fall within those local-time bounds.
-Since Unix time is exactly on UTC at all times, that means that there should be a difference between the DCU data and the log of +2 hours, or in other words the log times are +2 hours ahead of the DCU.
-This means that any DCU time referenced in the script should have 7200 seconds (2 hours) added to it, or equivalently any log time should have 7200 seconds subtracted. Do one or the other, not both.
-Also, any time a python datetime, mktime, ctime, or any time function is used, make sure that if it converts to local time on the lxplus machine, to take that into account.
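To make the bookkeeping concrete, here is a small sketch of the conversion, assuming the log timestamp format described above; the helper name is mine, and it deliberately avoids local-time functions so the lxplus machine's time zone doesn't matter:

  # convert a log timestamp (local time, UTC+2 during this run) to a unix time stamp
  import calendar, time

  def log_to_unix(log_stamp, utc_offset_hours=2):
      # log_stamp is 'yyyymmddhhMMss' in local time (UTC+2 for the Sept. 28th, 2012 run)
      t = time.strptime(log_stamp, "%Y%m%d%H%M%S")
      # timegm() interprets t as UTC, so subtract the local-time offset afterwards
      return calendar.timegm(t) - utc_offset_hours * 3600

  # first step in the log: 20120928172736 local -> 1348846056, inside the run range 1348837200-1348862400
  print(log_to_unix("20120928172736"))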
Times
 | UTC | UTC(+2) | Unix Time (UTC) |
Range in log | 9/28/2012 15:27:36 - 9/28/2012 19:18:48 | 9/28/2012 17:27:36 - 9/28/2012 21:18:48 | 1348846056 - 1348859928 |
Range in root file name | 9/28/2012 13:00:00 - 9/28/2012 20:00:00 | 9/28/2012 15:00:00 - 9/28/2012 22:00:00 | 1348837200 - 1348862400 |
Python module behavior
Module | Behavior |
ctime | takes a unix time stamp and converts it to local time (UTC+2) in string format |
fromtimestamp | takes a unix time stamp and converts it to local time (UTC+2) in a datetime object |
mktime | takes a timetuple object interpreted as local time (UTC+2) and converts it to unix time |
timetuple | converts a datetime to a timetuple object in local time (UTC+2) |
gmtime | takes unix time and converts it to UTC time |
Fixes
-In the beginning of the script, when it says "covering time1 to time2", time1 and time2 are taken from the filename and converted to a string using ctime. This is UTC+2, so I added "(UTC+2)".
-I noticed that the times on the x-axis of the output root file go from 28-08hr - 28-17hr. Looking into this.
UPDATE(6/25/2014)
-no longer need to do this, needed in some different format now.
5/20/2014
-going to take a look at avery's code and see what can be done about daylight savings
5/23/2014
-for future reference: if you look at a file like TabulatedOutput.txt in windows, even with word wrap off, it will not display correctly, because windows expects '\r\n' line endings and the file written on lxplus (linux) only has '\n'. View it with wordwrap off in vi on lxplus and it's fine.
-trying to fix the apparent offset of the voltage and the leakage current graphs. Looks like they are offset by a few hours.
5/27/2014
-looking at the rest of the entries in the output root file, it looks like the plots vary in terms of how much data for leakage current there is, and how much the time sync differs.
-need to look to see if the x axis is properly in range, double check to see about the time differences, and see if there may be a better way to do it. Make sure that the datetime module in python doesn't automatically use the time of the lxplus machine, where it could mess up any differences
5/28/2014
-datetime and time modules such as mktime,ctime,fromtimestamp use the local time of the machine
-tried taking all time subtractions or additions off, it still had differences
-tried only subtracting times made by ctime or mktime and in the log, still had differences
-the x-axis shows a range from 08:00 - 17:00, but the run is from 13:00 - 20:00 UTC, not sure what the problem is
6/3/2014
-attempting to add an ntuple of the information that would be found in Output.txt to the generated .root file
-successfully added an ntuple to each detID directory, not sure if AvgILeak or TimeStamp is correct
6/9/2014
-tried to sort through all the python modules that converted to local time and fix them, however this has made the problem worse so far.
-seeing if I can target just a few points to change the times that would fix the overall error
-in the ntuples, need to have multiple avgIleak leaves for each time reported in output.txt
6/10/2014
-ntuple filled correct info, just need to make it fill branches for each voltage step. *It now fills for each voltage step, but wrong data
-fixed daylight savings time for some plots, but it breaks others.
6/17/2014
-tried to edit the gROOT.processline to add branches based off of the voltage step. This worked, have to fill correct info now.
-variables are stored in a struct object, not sure how to access in python
6/18/2014
-tried to find a way to loop through the variables in the struct, nothing so far
6/19/2014
-ntuple now fills correct data! The fix was instead of using the non-iterable form: object.variable = value, I used the setattr(object,'varname',value) function. This is equivalent to the other form, but allows me to set the variable based on a string, very iterable.
-there seems to be a lot of data points missing in the ntuple when compared to the root file. Might have a short voltage list.
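As an illustration of the setattr fix above, here is a minimal, self-contained sketch (the struct, branch, and variable names are placeholders, not the ones in DCUDataAnalyzer_Revised.py):

  # fill struct members by name so the fill loop can iterate over voltage steps
  import ROOT
  ROOT.gROOT.ProcessLine("struct LeakData { Float_t AvgILeak_0; Float_t AvgILeak_1; Float_t AvgILeak_2; };")
  data = ROOT.LeakData()
  tree = ROOT.TTree("leak", "leakage current per voltage step")
  for i in range(3):
      tree.Branch("AvgILeak_%d" % i, ROOT.addressof(data, "AvgILeak_%d" % i), "AvgILeak_%d/F" % i)
  measurements = [1.2, 3.4, 5.6]   # placeholder leakage currents for one detID
  for i, ileak in enumerate(measurements):
      # setattr builds the member name from a string, unlike the fixed form data.AvgILeak_0 = ileak
      setattr(data, "AvgILeak_%d" % i, ileak)
  tree.Fill()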
6/20/2014
-ntuple works perfectly for TIBminus_1_4_2_5, but starts missing data points on TIBplus_1_6_2_5, and on all others. Also, the output.txt seems to have the same aliases for detIDs, but different detID's associated with those aliases.
-so the data I'm putting in the leaves isn't the same as the data in output.txt, strange
6/24/2014
-don't need to do daylight savings time fix now, the current format is no longer needed so I'm done I suppose.
Noise Bias Scan
Introduction to Noise Bias Scan
This is now documented in a new page: https://twiki.cern.ch/twiki/bin/view/Sandbox/DylanNoiseBiasScan
Misc. Learning Stuff
Learning ROOT
USEFUL EXAMPLES:
http://root.cern.ch/root/html/tutorials/
http://users.physics.harvard.edu/~kblack/docs/root_tutorial2006.pdf
HISTOGRAM EXAMPLES:http://www.desy.de/~gbrandt/root/mkraemer.pdf
GENERAL ROOT INFO:http://www.pp.rhul.ac.uk/~cowan/RootTutorial/ROOTtutorial.pdf
ftp://root.cern.ch/root/doc/11InputOutput.pdf
TREES AND NTUPLES:http://wlav.web.cern.ch/wlav/pyroot/tpytree.html
http://root.cern.ch/root/html/tutorials/
http://www.linksceem.eu/ls2/images/stories/ROOT_Day1.pdf
(basics)
http://www.linksceem.eu/ls2/images/stories/ROOT_Day2.pdf
(programming)
http://www.linksceem.eu/ls2/images/stories/ROOT_Day3.pdf
(fitting)
http://www.linksceem.eu/ls2/images/stories/ROOT_Day4.pdf
(trees)
http://www.linksceem.eu/ls2/images/stories/ROOT_Day5.pdf
(advanced)
Learning Roofit
-
http://root.cern.ch/root/html532/tutorials/roofit/index.html
-
http://roofit.sourceforge.net/docs/tutorial/
-
http://roofit.sourceforge.net/docs/classref/ClassIndex.html
(list of classes and methods)
Bash Script
-script files need to be saved as .sh
-./ to execute
Tutorial:http://tldp.org/HOWTO/Bash-Prog-Intro-HOWTO.html#toc6
Commands:http://ss64.com/bash/
CMSSW stuff
-using "expand.py" on a script will give information on variables pulled in from other scripts, files, etc. and the locations of such, letting one find the location where a certain variable can be edited, etc.
Config files / EDAnalyzer workings
-The main thing to know about working with AlCaReco is the way the config files work. The purpose of the config file is to set a few parameters for the cmsRun command, which is the general run command for all of CMSSW. Namely, it sets:
- a process object that you can name yourself
- the source files for that object
- the analyzer/producer/filter/output modules
- the order of which the different modules are run
Anything else I've done with these files has been modifying them to change the output file name and sections of events to run on.
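Purely as an illustration of those pieces (this is not the actual AlCaReco cfg; the process, module, and file names are placeholders):

  # minimal illustrative cfg showing the pieces listed above
  import FWCore.ParameterSet.Config as cms

  process = cms.Process("DEMO")                       # the process object, named by you
  process.source = cms.Source("PoolSource",           # the source files for that process
      fileNames = cms.untracked.vstring('file:input_placeholder.root')
  )
  process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(100))
  process.demoAnalyzer = cms.EDAnalyzer("EventContentAnalyzer")   # an analyzer module (stock CMSSW analyzer)
  process.p = cms.Path(process.demoAnalyzer)          # the order in which the modules are run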
Analyzing an online file without downloading it
-pretty much all the info you need for this is here:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookXrootdService -follow the steps and you should be fine, the only thing I did differently was in my proxy I used this command:
- voms-proxy-init -voms cms -hours 24 -valid=24:0
this validates it for 24 hours.
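Once the proxy is set up, a minimal sketch of opening a remote file through xrootd in pyROOT (the LFN below is a placeholder):

  # open a remote file through the global xrootd redirector after voms-proxy-init
  import ROOT
  f = ROOT.TFile.Open("root://cms-xrd-global.cern.ch//store/data/placeholder/file.root")
  if not f or f.IsZombie():
      print("could not open the file; check the proxy and the LFN")
  else:
      f.ls()   # list the contents, e.g. before browsing in a TBrowser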
REALLY helpful grid cert stuff
go here:
https://ca.cern.ch/ca/Help/?kbid=024010
be sure to use FIREFOX. It is much easier in firefox, as a page appears that wouldn't otherwise.
More info here:
https://twiki.cern.ch/twiki/bin/view/CMSPublic/WorkBookStartingGrid#BasicGrid
Useful commands I've come across
which command - replace "command" with any bash command to see if it is an alias
echo $variable - replace "variable" with an environment variable to output what the variable is set to
du -h - "disk usage". use to see what is taking up disk space. Info here.
df -h - "disk free". reports the space used and free on all filesystems
ln -s [target dir] [name of link] - "link, symbolic" this makes a symbolic link to any directory
tar zxvf fileNameHere.tgz - unpack a .tgz file
zip zipfoldername * - puts all the files in the current directory into a zip archive named zipfoldername.zip
export ENVVARIABLE=new_value - sets the chosen environment variable to the new value (no space around the =)
for vi: ":set nowrap" and ":set wrap" - turns word wrap off and on
fs lq - tells you how much your quota is in the current directory and how much you've used
find NoiseBiasScan -type d -exec fs sa -acl gbenelli rl -dir {} \; - sets permissions for one person to read files in a directory and its subdirectories