2 July 2012

Starting this page to keep track of my progress, failures and attempts. I will also try to list the useful commands I run into. Let's see how long I can keep this going..

Anyway:

Analysis status

Data and official MC running. A lot of problems with DoubleElectrons, investigating.. From the status they are "Cleaned"; maybe the proxy expired (darn, did not receive an email). Tried several things:

  • resubmit: Job #1 last action was Cleaned actual status is Created must be killed (-kill) before resubmission

  • killing: crab:  Not possible to kill Job #1 : Last action was: Cleaned Status is Created

  • submitting: crab:  asking to submit 1 jobs, but only 0 left: submitting those

  • forceResubmitting: it resubmitted, but with no effect in the end (jobs go into the submitted state but then fall back to the same status)

Sent a request to HN, but no reply in ~1h. From Google it seems that you have to redo everything AGAIN...

Just got a reply, let's see if/how we can patch it..

Asked Giacinto how to run with CRAB on local T2, waiting.

DQM

Things are stuck, waiting for Lars to complete the twiki and for Guler to try the validation process to find the bugs. Pinged today.

Things left to do for me:

  • Fixing bugs that Guler finds
  • performing some quality checks
  • when we have Ztt, try (w/ Riccardo?) to have an easier implementation for Zee
  • Render Plugins (need Marco's input, therefore the twiki)

3 July 2012

CRAB finally granted me some CPU time and the jobs run really fast. TTbar was completed in less than a day.

CRAB support provided feedback (wow!). Sadly, all the unpublished jobs are lost, but the published ones can be saved. I needed to rerun everything from creation (in a different directory). After creating I needed to check that the job splitting was the same as in the previous production (to avoid overlaps). (trick) I looked for a diff in ui_working_dir/share/arguments.xml. No difference was found. Now the jobs are running well. Let's see if I run into trouble at the publishing step.
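The arguments.xml comparison can be scripted; here is a sketch that creates tiny stand-in files so it runs anywhere (the real files live in `<ui_working_dir>/share/arguments.xml` of the old and new task directories):

```python
import filecmp
import os

# stand-in files so the example is self-contained; in practice point the
# two paths at the old and the re-created CRAB task directories
os.makedirs("demo_old/share", exist_ok=True)
os.makedirs("demo_new/share", exist_ok=True)
splitting = '<Job JobID="1" MaxEvents="1000" SkipEvents="0"/>\n'
for d in ("demo_old", "demo_new"):
    with open(os.path.join(d, "share", "arguments.xml"), "w") as f:
        f.write(splitting)

# byte-for-byte comparison of the job-splitting arguments
same = filecmp.cmp("demo_old/share/arguments.xml",
                   "demo_new/share/arguments.xml", shallow=False)
print("identical splitting" if same else "splitting differs: overlap risk")
```

If the two files differ, the new task would produce events overlapping (or missing) with respect to the already-published ones, so it is worth checking before submitting.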

Not so lucky DoubleE_B, which had only 40 jobs done, none published. I removed the outputs and reran from scratch.

Giacinto replied telling me that everyone is fine with CRAB 2.7.7 on pbs. I'll give it a try.

The merged PATtuplizer for uu and ee is raising an exception:

----- Begin Fatal Exception 03-Jul-2012 15:49:04 CEST-----------------------
An exception of category 'Configuration' occurred while
   [0] Constructing the EventProcessor
   [1] Constructing module: class=PoolOutputModule label='out'
Exception Message:
EventSelector::init, An OutputModule is using SelectEvents
to request a trigger name that does not exist
The unknown trigger name is: muSelPath
----- End Fatal Exception ------------------------------------------------- 

investigating... Nothing special, it turned out that the path I was looking for was not in the schedule (I had put it there, but then I overwrote the schedule)
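The mistake is easy to make because cms.Schedule is assigned like any attribute, so a later assignment silently replaces an earlier one. A minimal sketch of the pitfall (a CMSSW config fragment; module and path names are illustrative, not the actual config):

```python
import FWCore.ParameterSet.Config as cms

process = cms.Process("PAT")
process.dummy = cms.EDAnalyzer("DummyAnalyzer")   # stand-in module
process.muSelPath = cms.Path(process.dummy)       # path the OutputModule selects on
process.eeSelPath = cms.Path(process.dummy)

process.schedule = cms.Schedule(process.muSelPath, process.eeSelPath)
# ...later in the merged cfg a second assignment overwrites the first,
# dropping muSelPath and triggering the EventSelector exception above:
process.schedule = cms.Schedule(process.eeSelPath)
```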

4 July 2012

125 GeV resonance presentation!! I still don't call it the Higgs; first I want to see JPC.

  • the jobs are stuck at the usual 85% done, the rest lost. Some jobs appear as "Submitting" even if they are done and the output is in the SE. -.- I sent an email to crab support for help, let's hope

  • On the side of running crab locally: 2.7 does not cope with optionParser; I contacted Maggi, who is helping me. CRAB 2.8 works fine in creation but crashes in submission; M. Maggi is debugging.

  • Gave Giacinto the list of the 2011 zhmuu/ee productions to be deleted. 2l2tauPatttuples will be moved to castor or ZH.

  • DQMGui: asked Marco where to find stdout to help debugging of new RenderPlugin

5-6 July 2012

Tried sending crab jobs and publishing on local queues in several ways; lots of problems. Finally I seem to have found a way.

General Settings: .bashrc

The user's .bashrc needs to be modified in order to run logincms_slc5.sh when logging in, even under PBS. Add these lines:

if [ -z "$PBS_ENVIRONMENT" ]; then
  source ~/logincms_slc5.sh

  ###############################
  #
  export LHOME=/lustre/home/mverzett
  export STORE=/lustre/cms/store/user/mverzett
fi
###############################################################################
# NOTE: the env. variable $PBS_ENVIRONMENT gets initialized
#            at job startup; the following code makes the job write
#            on a temporary area on the wn (which is good when running 
#            many jobs); but then you have to set copy_data=1 in the
#            crab.cfg: the output file will be copied to the user's dir
#            on the SE (otherwise it would be lost!)
#            [e.g. /lustre/cms/store/user/$USER]
#
# the variable is different if running in batch or interactive
if [ "$PBS_ENVIRONMENT" == "PBS_BATCH" ]
    then
    echo "running in BATCH"
    mkdir -p /home/tmp/$USER/$PBS_JOBID
    export HOME=/home/tmp/$USER/$PBS_JOBID
    cd
    export PBS_O_WORKDIR=$HOME
else
    :   # no-op needed: an else branch containing only comments is a bash syntax error
#    PBS_INTERACTIVE case: if logincms_slc5.sh is sourced, scp from cmssusy becomes IMPOSSIBLE!
#    source ~/logincms_slc5.sh
fi

General Settings: logincms_slc5.sh
Then correctly set logincms_slc5.sh:

export SCRAM_ARCH=slc5_amd64_gcc434
source /opt/exp_soft/cms/cmsset_default.sh
source /afs/cern.ch/cms/LCG/LCG-2/UI/cms_ui_env_3_2.sh 
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/exp_soft/crab/pbs_python-3.5.0/lib
# source /opt/exp_soft/crab/CRAB_2_7_2_p1/crab.sh
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib
### Cernlib 
# export CERN=/afs/in2p3.fr/cernlib/i386_linux24/pro/lib
# export CERN=/opt/exp_soft/CERN/2006/slc4_ia32_gcc4/lib/
export CERN=/opt/exp_soft/CERN/x86_64/2006b/x86_64-slc5-gcc41-opt/lib
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/opt/exp_soft/CERN/x86_64/2006b/x86_64-slc5-gcc41-opt/lib
source /cmshome/mmaggi/CRAB/CRAB_2_8_1/crab.sh 
export CVSROOT=':gserver:cmscvs.cern.ch:/cvs_server/repositories/CMSSW'

Watch out! You really need the custom CRAB version saved in /cmshome/mmaggi/CRAB/; two versions are available, 2.7.7 and 2.8.1. The first one has a bug that prevents you from publishing a dataset produced with a cfg that exploits option parsing.

Crab Settings

Now, the last step is to correctly set your crab cfg. The following settings are tested on CRAB 2.8.1, tuned for publishing a dataset.

Modify your [CRAB] section:

[CRAB]

jobtype = cmssw
scheduler = pbs
use_server=0

Add these two sections to your cfg:

[PBS]
queue = local

[GRID]
#
## RB/WMS management:
rb = CERN

Set the correct values for the [USER] section

[USER]

copy_data =   1
publish_data =  1
check_user_remote_dir =   0
dbs_url_for_publication =   https://cmsdbsprod.cern.ch:8443/cms_dbs_ph_analysis_02_writer/servlet/DBSServlet
storage_element = se-01.ba.infn.it
return_data =   0
storage_port = 8444
email =   mauro.verzetti@cern.ch
storage_path = /lustre/cms/store/
user_remote_dir = user/

Many errors can happen if the settings are not done correctly:

  • If storage_element is set to T2_IT_Bari there will be no job output; the log file will say that it was impossible to create the directory in the desired path due to permission denial.
  • If storage_path and user_remote_dir are not set there will be no job output either; the log file will report the same permission denial, and the path will be wrong (it will start with /cms instead of /lustre, so obviously the directory could not be created).
  • If storage_path is set to /lustre/cms/store/user/USERNAME and user_remote_dir to SOME/DIR, crab will use it as the root for publication, leading to extremely unpleasant and long file paths (e.g. /lustre/cms/store/user/mverzett/DoubleMu/DoubleMu_Run2012A-PromptReco-v1_PAT_v1/mverzett/DoubleMu/DoubleMu_Run2012A-PromptReco-v1_PAT_v1/)
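The doubled path in the last case comes from plain concatenation: at publication crab appends username/dataset again under whatever root you gave it. An illustration only (this mimics the observed behaviour, it is NOT crab code):

```python
# wrong settings: storage_path is already user-specific
storage_path    = "/lustre/cms/store/user/mverzett"
user_remote_dir = "DoubleMu/DoubleMu_Run2012A-PromptReco-v1_PAT_v1"
# at publication crab appends username/dataset once more:
publish_suffix  = "mverzett/DoubleMu/DoubleMu_Run2012A-PromptReco-v1_PAT_v1"

bad_path = "/".join([storage_path, user_remote_dir, publish_suffix])
print(bad_path)
```

With the recommended settings above (storage_path=/lustre/cms/store/ and user_remote_dir=user/) the username and dataset appear only once.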

Another error was found later:

ERROR: Couldn't find valid credentials to generate a proxy.

It turned out that the .pem files did not have the correct permissions (maybe Giacinto's mistake). Corrected:

chmod 644 usercert.pem
chmod 400 userkey.pem 

DQM:

It turns out that drawopt is protected by a regex match of the kind [a-zA-Z ]; therefore no meaningful draw option can be passed to the plugin, and not even E0 is accepted! According to them, the only interaction between layout and plugin happens through the histogram name. I sent an email explaining the need for an additional free string for layout-plugin interaction. No answer in 2 days, so I tried to look into the source code, but all the classes and code are scattered and I could not guess the directory structure. Called Edgar Rosales, he does not know. Marco is still not responding. A possible solution might be to write the exact discriminator names in drawopt separated by spaces. This option is undesirable both for us (boring, long and more error prone) and for them (improper, risky use of the parameter).
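The rejection of E0 follows directly from such a whitelist: the digit falls outside [a-zA-Z ], while space-separated discriminator names happen to pass, which is why the workaround works at all. A small demo (the pattern is inferred from the observed behaviour, not taken from the GUI source):

```python
import re

# inferred validation pattern: letters and spaces only
DRAWOPT_RE = re.compile(r'^[a-zA-Z ]*$')

for opt in ("HIST",
            "E0",
            "byLooseIsolationMVA byVLooseCombinedIsolationDeltaBetaCorr"):
    print(opt, "accepted" if DRAWOPT_RE.match(opt) else "rejected")
```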

9 July 2012

Most of the jobs submitted on Friday went bad: they crashed and/or I got neither the .stdout nor the .xml file. Looking for the reason, I found out that crab decided to create all the job dirs in my lustre home and did not delete them! Of course the disk got filled quite soon, leading to a lot of errors. Even more painful: some task dirs still had the .stdout and .xml files inside. I tried to run a script to recover them, but most of the jobs were gone anyway due to the lack of disk space. Giacinto said that the disk-space issue could be solved with some command in .bash_profile, which I already had in my .bashrc. To avoid any other problem I decided to move to plain old batch queues, adding to my cfg a printout of the important configuration in order to fully track the source and the options used to create the output file.
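The kind of printout I mean can be as simple as dumping a dictionary of the relevant settings as one greppable line in each job's stdout; a sketch of the idea (keys and values are illustrative, not the actual script):

```python
import json

# illustrative values; the real cfg would fill these from its own options
provenance = {
    "dataset": "/DoubleMu/Run2012A-PromptReco-v1/AOD",
    "cfg":     "patTuple_cfg.py",
    "options": {"globalTag": "GR_R_52_V7::All", "maxEvents": -1},
}
# one line per job, easy to grep out of the stdout later
print("PROVENANCE " + json.dumps(provenance, sort_keys=True))
```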

10 July 2012

Sickness. Most of the jobs went fine. At runtime some input files seem corrupted, but when looking at them everything is fine; moreover some output files are corrupted too (random root errors: not closed, no keys, screwed streamer). Most probably the FS/stager had some problem or is a bit overloaded in this period. Made a script to get provenance info; it seems to work. Resubmitted everything except TT.

DQM The talk went smoothly, without problems. Kaori wants to follow up some GUI stuff offline; I'll send her an email asking what she wants to know.

11-12 July 2012

Analysis:

First round of pattuples ready. Unfortunately I found two flaws:

  • All my objects were from cleanPatObj, with default cleaning values: it means that all objects below 0.3 were gone, not exactly what I want. Now all my collections come from the selected step, so no worry about it. I will perform the cleaning later, according to the channel. A good idea may be to put it in LigtSkim to make them lighter.
  • I was keeping all the taus passing VLoose, which is still the loosest WP; the only problem is that not all the taus passing LooseMVA (in principle tighter) pass VLoose too. Or at least no one knows. Now the cut is: tauID("decayModeFinding") > 0.5 && ( tauID("byVLooseCombinedIsolationDeltaBetaCorr") > 0.5 || tauID("byLooseIsolationMVA") > 0.5 ), so it keeps both the DB and MVA streams. Of course the stupid monkey's descendant who is writing should REMEMBER TO CHOOSE one WP somehow downstream
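For reference, the cut above as it would appear in a PAT string-cut selector (a config fragment; the module and input labels are illustrative, only the cut string is taken from the notes):

```python
import FWCore.ParameterSet.Config as cms

# illustrative labels; PATTauSelector applies the quoted string cut
selectedTausForSkim = cms.EDFilter(
    "PATTauSelector",
    src = cms.InputTag("selectedPatTaus"),
    cut = cms.string(
        'tauID("decayModeFinding") > 0.5 && '
        '( tauID("byVLooseCombinedIsolationDeltaBetaCorr") > 0.5 || '
        'tauID("byLooseIsolationMVA") > 0.5 )'
    ),
)
```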

All the jobs were resubmitted.

(command) A way to delete all your jobs in batch: qselect -u $USER | xargs qdel

DQM:

I finally found a way to work around the GUI code and display my histos overlaid. It's a sum of several tricks:

  • All my 1D histos are collapsed into one 2D histo per observable (1 for pt, 1 for eta, etc.). Each Y-bin slice contains one TH1, and the bin label is the discriminator name
  • The histo is passed to the RenderPlugin
  • The layout passes the names of the discriminators to display as "drawOptions" (in the end it is an option, no?)
  • The RP picks up the histo and the drawOpt; it clears the string, to avoid root crashing on a ->Draw(drawOpt) command, and does nothing else in the preDraw
  • In the postDraw the drawOpt is picked from another source that does not interfere with root, and is split on spaces.
  • Each name is looked up among the TH2 y-axis labels and picked with a projection. Each projection name is different, to avoid a stupid root feature.
  • And in case of overlay? That's tricky. The RP is also passed the TCanvas on which everything is already drawn as 2D (it is postDraw, remember?). From the TCanvas you can get the pointers of all the objects drawn there; filter with dynamic_cast and compare with the address of your test plot (which you have "officially") to reject it and avoid double plotting.
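The name lookup in the middle steps can be mimicked without ROOT: split the drawOpt string on spaces and keep only the names that appear among the y-axis bin labels. A pure-python stand-in (the label values here are illustrative):

```python
# stand-ins for the TH2 y-axis bin labels (one discriminator per slice)
bin_labels = ["byVLooseCombinedIsolationDeltaBetaCorr",
              "byLooseIsolationMVA",
              "byMediumIsolationMVA"]
# what the layout would pass as "drawOptions"
draw_opt = "byLooseIsolationMVA byMediumIsolationMVA"

# postDraw logic: split on spaces, look each name up among the labels;
# in the real plugin each match becomes a ProjectionX of that y slice
to_draw = [name for name in draw_opt.split() if name in bin_labels]
print(to_draw)
```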

That's it! Simple drawing is tested to work, the overlay not yet. I need a new set of RAW and castor takes ages to stage out. I've learnt a nice trick: since I only care about one Tau histo for testing, and the DQM sequence is slow because it runs on RAW, I found a way to replace a histo with another one. So I could run on 200 evts and even put a MC plot among the data ones! (just for testing the display technique). So here is the (trick):

from ROOT import TFile, gDirectory

test    = TFile("guineaPig.root","update") #test file --> to be modified
ref     = TFile("Validation/RecoTau/test/TauID/ZMM_recoFiles+PFTau_START52_V9_All/TauVal_CMSSW_5_2_5_ZMM.root") #where you take the other histo from
testDir = test.Get("DQMData/Run 196349/RecoTauV/Run summary") #dir in which the histo to be substituted lies
refH    = ref.Get("DQMData/RecoTauV/hpsPFTauProducerZMM_EffMappt") #histo that you want to put in instead
test.cd("DQMData/Run 196349/RecoTauV/Run summary") #go to that dir
gDirectory.Delete("hpsPFTauProducerRealMuonsData_EffMappt;1") #delete the histo: Warning, the ';1' is important!
testH = refH.Clone("hpsPFTauProducerRealMuonsData_EffMappt") #make a new one (necessary? boh..), with the name of the previous one
testH.Write() #write it!

Watch out! You have to delete first and then write, otherwise the new histo will be saved as NAME;2, and that may be a problem.

13 July 2012

Sickness. Running went fine except that the electron trigger path was screwed up. Resubmitted everything.

16-17 July 2012

Analysis:

Babysitting of jobs.

DQM:

My fancy trick works for displaying the histograms, but fails in overlay: in overlay mode the reference plot is suppressed for TH2, so there is no way to get it from the canvas. In "on side" mode it seems that I have no access to the other pad. I tried asking gROOT/gDirectory->GetListOfKeys() to get all the instantiated vars, but I only get crashes. The only person who seems to know the answer is Marco, who is on holiday. Waiting to see if someone else replies.

GIT Tricks:

install from someone else's repo

git clone http://address/of/the/repo

To be able to commit your changes.

First fork the project on your personal account, then =git remote add ALIAS http://address/of/your/repo=

To stage your changes for commit

git add filename

Commit

git commit (a text editor, vi by default, will open to write the commit message; it MUST NOT be empty). This command commits the changes to the LOCAL OFFLINE repository, not the online one (like github)

To change default text editor

(I prefer emacs) git config --global core.editor "emacs -nw"

To commit changes to the online repo

git push ALIAS

Show available repository aliases

git remote show

Diff with file in repo

git diff filename

Git refuses to push the changes

It may be because the changes were committed on no branch. This can easily be seen with:

[mverzett@login06 plotting]$ git branch     
* (no branch)
  master
  show

In this case you have to act like this:

#creates a new branch containing your changes
git checkout -b my_changes
#check that your changes were committed into this branch
git log
#move to the branch where you want to put your changes
git checkout master
#merge the changes
git merge my_changes

ROOT Tricks:

replace an element in a TFile

from ROOT import TFile, gDirectory

test    = TFile("guineaPig.root","update") #test file --> to be modified
ref     = TFile("Validation/RecoTau/test/TauID/ZMM_recoFiles+PFTau_START52_V9_All/TauVal_CMSSW_5_2_5_ZMM.root") #where you take the other histo from
testDir = test.Get("DQMData/Run 196349/RecoTauV/Run summary") #dir in which the histo to be substituted lies
refH    = ref.Get("DQMData/RecoTauV/hpsPFTauProducerZMM_EffMappt") #histo that you want to put in instead
test.cd("DQMData/Run 196349/RecoTauV/Run summary") #go to that dir
gDirectory.Delete("hpsPFTauProducerRealMuonsData_EffMappt;1") #delete the histo: Warning, the ';1' is important!
testH = refH.Clone("hpsPFTauProducerRealMuonsData_EffMappt") #make a new one (necessary? boh..), with the name of the previous one
testH.Write() #write it!

CONDOR Tricks:

make afs based releases accessible to nodes:

You get an error saying that the node cannot list some afs-based dir. Even before that, you get a warning from farmoutAnalysisJobs saying something similar.

Solution:

fs setacl -dir /afs/hep.wisc.edu/cms/mverzett -acl condor-hosts l
fs setacl -dir /afs -acl condor-hosts l
find /afs/hep.wisc.edu/cms/mverzett/vhanalysis -type d -exec fs setacl -dir '{}' -acl condor-hosts rl \;

resubmit dead jobs:

grep  -lir ERR /scratch/mverzett/2012-10-18-8TeV-v1-Higgs/*/dags/dag.status | sed -e "s|status|rescue001|" | xargs -I{} -n 1 farmoutAnalysisJobs --rescue-dag-file={}
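The sed in the pipeline just turns each dag.status path into the corresponding rescue-dag file name before handing it to farmoutAnalysisJobs. In python terms (the path is illustrative):

```python
# what the sed substitution s|status|rescue001| does to each matched path
status_file = "/scratch/mverzett/2012-10-18-8TeV-v1-Higgs/ZH/dags/dag.status"
rescue_file = status_file.replace("status", "rescue001", 1)  # sed replaces the first match
print(rescue_file)
```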

FSA Tricks:

python cannot access some modules that are there:

This happens because FSA is symlinked into src/ and is not really there (it is a dependency of UWHiggs). Try running $fsa/recipe/symlink_python.sh, then from src/ run scram b python

-- MauroVerzetti - 02-Jul-2012

Topic revision: r12 - 2012-11-22 - MauroVerzetti
 