Caroline's online analysis logbook

To do:

  • DC5 efficiency for many 2015 runs (at least one for each subperiod) from BW
  • Complete gitlab setup (escalade). Wait until coral is integrated.
  • How does actually and phastjob.csh work? (lxplus)
  • Install root on work laptop.

  • Understand why both jobs ran at 0.81 s/event in coral, yet one was so much faster (September 2016):
    • 10h 30min
    • 5h 30min

  • Check for PICK things to be fixed on this page.

Blue Waters logs

10x32 in more detail. Executed 33 commands on 31 minions for each of the following:

Duration [s] Application utime [s] stime [s] Rss inblocks outblocks
27622.453973 48521523 872457 214 403620 4083882 270973
37468.043909 48521520 1184396 213 408992 4002444 315629
37856.662393 48521462 1193732 229 410780 4005015 322550
39320.313733 48521458 1237791 230 407776 4057131 325569
42977.732554 48521526 1366276 314 411276 13579758 1346944
47150.122060 48521450 1495926 1274 412136 76887662 7878304
47649.446918 48521384 1514914 1063 413432 76888090 7942070
57461.700123 48521528 1823220 1530 411532 77186988 8055250
61907.220122 48521453 1963781 1565 413564 76889373 8157261
64911.669264 48521457 2054588 1526 412500 76889043 8040033

  • aprun -n 32 ./pcp ${Period}_${i}_${Nodes}times${Length}.cmdlist & ++ wait ++ sleep 100 with MyVariableDC5Eff_trafdic.2015.opt, ppn=32:xe
  • qsub example: qsub MyVariable_10times32.pbs -N 10times32 -q high -l walltime=20:00:00 -l nodes=10:ppn=32:xe

name nodes walltime Priority JobID Start Stop Duration OK-chunks time/OK-chunk Remark charge factor Resources (1 line for each node)
100x28 100 24h high Sep 15, 00:07 Sep 15, 14:48 14h 41min = 881min 584 1.5 min sleep 50 2,200.458 utime ~1355140s, stime ~372s, Rss ~410200, inblocks ~23125358, outblocks ~2363160
10x32 10 20h high Sep 14, 15:11 Sep 15, 9:19 18h 8min = 1088min 169 6.4 min   203.866 utime ~2054588s, stime ~1526s, Rss ~412500, inblocks ~76889043, outblocks ~8040033
10x28 10 20h high Sep 14, 13:06 Sep 15, 2:10 13h 4min = 784min 140 5.6 min   196.104 utime ~1440516s, stime ~849s, Rss ~412236, inblocks ~67284929, outblocks ~7016528
10x16 10 20h high Sep 14, 10:54 Sep 15, 1:20 14h 26min = 866min 160 5.4 min   216.471 utime ~1403370s, stime ~447s, Rss ~413336, inblocks ~38447619, outblocks ~3976294
1x32 1 20h high Sep 14, 16:47 Sep 15, 06:02 13h 15min = 795min 32 24.8 min   19.877 utime ~1516093s, stime ~1064s, Rss ~413444, inblocks ~76887548, outblocks ~7940338
1x28 1 20h high Sep 14, 17:32 Sep 15, 5:56 12h 24min = 744min 28 26.6 min   18.601 utime ~1419903s, stime ~827s, Rss ~413440, inblocks ~67278250, outblocks ~7090695
1x16 1 20h high Sep 14, 18:47 Sep 15, 07:01 12h 14min = 734min 16 45.9 min   18.351 utime ~1406104s, stime ~467s, Rss ~413312, inblocks ~38447131, outblocks ~3975636
1x1 1 15h high Sep 14, 19:20 Sep 15, 06:09 10h 49min = 649min 1 649.0 min   16.23 utime ~1246126s, stime ~62s, Rss ~408636, inblocks ~2409599, outblocks ~279080
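
The time/OK-chunk column is the job duration in minutes divided by the number of successfully processed chunks. A minimal sketch of that arithmetic, where `min_per_chunk` is a hypothetical helper name and not part of any BW tooling:

```shell
#!/bin/bash
# Hypothetical helper: wall-clock minutes per successfully processed chunk.
# Usage: min_per_chunk HOURS MINUTES OK_CHUNKS
min_per_chunk() {
    LC_ALL=C awk -v h="$1" -v m="$2" -v n="$3" \
        'BEGIN { printf "%.1f\n", (h * 60 + m) / n }'
}

min_per_chunk 14 41 584   # 100x28 job: 881 min, 584 OK chunks
min_per_chunk 18 8 169    # 10x32 job: 1088 min, 169 OK chunks
```

The same helper can be applied to any other row of the table to cross-check the quoted values.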

(some of the jobs below appear also in the new table above)

aprun number of chunks qsub JobID start stop duration total charge criedl Remark
aprun -n 32 ./pcp ${Period}_${i}_${Nodes}times${Length}.cmdlist & ++ wait ++ sleep 100 with MyVariableDC5Eff_trafdic.2015.opt 10x28-15nodes qsub MyVariable_10times28-15nodes.pbs -N 10times28-15nodes -q high -l walltime=20:00:00 -l nodes=15:ppn=32:xe Sep 14 - - - job deleted because mysql overloaded. Using 15 nodes makes no sense anyway.
aprun -n 32 ./pcp ${Period}_${i}_${Nodes}times${Length}.cmdlist & ++ wait ++ sleep 100 with MyVariableDC5Eff_trafdic.2015.opt 10x28 qsub MyVariable_10times28.pbs -N 10times28 -q high -l walltime=20:00:00 -l nodes=10:ppn=32:xe Sep 14, X Sep XXX     1st job killed because more than 14h walltime needed. 2nd job deleted because mysql overloaded. 3rd job: 140 files (1/2) are affected by A FATAL ERROR APPEARED, probably MySQL?
aprun -n 32 ./pcp ${Period}_${i}_${Nodes}times${Length}.cmdlist & ++ wait ++ sleep 100 with MyVariableDC5Eff_trafdic.2015.opt 10x16 qsub MyVariable_10times16.pbs -N 10times16 -q high -l walltime=20:00:00 -l nodes=10:ppn=32:xe Sep 14, XX Sep XXX     1st job finished with 9 chunks: calibration database overloaded without sleep 100 (id not shown). 2nd job killed because more than 14h walltime needed. 3rd job deleted by me because of mysql overload, re-started.
aprun -n 32 ./pcp Test16.cmdlist with MyVarDC5Eff_1times16-32_trafdic.2015.opt 1x16-32 qsub MyVar_Test_1times16-32.pbs -N 1times16 -q high -l walltime=14:00:00 -l nodes=1:ppn=32:xe Sep 13, 14:19 Sep 14, 2:50 12h 31min X  
aprun -n 16 ./pcp Test16.cmdlist with MyVarDC5Eff_1times16_trafdic.2015.opt 1x16 qsub MyVar_Test_1times16.pbs -N 1times16 -q high -l walltime=14:00:00 -l nodes=1:ppn=32:xe Sep 13, 14:43 Sep 13, XX:XX Xh XXmin X walltime more than 14h needed - 1st job deleted
aprun -n 28 ./pcp Test28.cmdlist with MyVarDC5Eff_trafdic.2015.opt 1x28 qsub MyVar_Test28.pbs -q high -l walltime=12:00:00 -l nodes=1:ppn=32:xe Sep 12, 21:45 (Sep 13, 9:52) killed after 12h after 12 files had finished 13.738  
aprun -n 1 ./coral.exe MyDC5Eff_trafdic.2015_260061_11002.opt with nodes=1:ppn=4:xk 1x1 qsub My_xk_packed_260061_11002.pbs -l walltime=24:00:00,mem=4gb     5h 27min 0
aprun -n 1 ./coral.exe MyDC5Eff_trafdic.2015_260061_11003.opt with nodes=1:ppn=4:xk 1x1 qsub My_xk_packed_260061_11003.pbs -l walltime=24:00:00,mem=4gb     4h 5min 0
aprun -n 1 ./coral.exe MyDC5Eff_trafdic.2015_260061_11004.opt with nodes=1:ppn=4:xk 1x1 qsub My_xk_packed_260061_11004.pbs -l walltime=24:00:00,mem=4gb     4h 7min 0
aprun -n 1 ./coral.exe MyDC5Eff_trafdic.2015_260061_11002.opt with nodes=1:ppn=1:xk 1x1 qsub My_xk_packed.pbs -l walltime=24:00:00,mem=4gb     10h 30min 0
aprun -n 1 ./coral.exe MyDC5Eff_trafdic.2015_260061_11002.opt with nodes=1:ppn=1:xk 1x1 qsub My_xk_packed.pbs -l walltime=24:00:00     10h 30min 0

October 2016

  • Run with MySQL database on grid node. First, try to run 2 databases on 2 nodes.
  • Change to "official production setup" for dy15W12t3-BW: Elena's option file and all other settings from her production directory; coral version, etc...

October 17/18, 2016: run MySQL database on grid node

  • See exchange between Marco and Robert Brunner.
  • Set up MySQL database on BW in improved way. There were issues with permissions when running it on the nid node. Set it up from scratch, using more compiler options that specify the location of the mysql.sock file.
  • Tests with "new" MySQL database running on login node: 1times1 OK, both started from login node and from a nid node (interactive session)
  • Run database on interactive session on nid node.

  • Now run database on 1 CPU on nid node and use 1 other CPU of same node to run coral:
    • qsub MyVariable_MySQL-on-nid_1times1.pbs -N 1times1-mysql-nid -l walltime=15:00:00 -l nodes=2:ppn=32:xe ==> The MySQL database on the login node is still used, because the server is hardcoded in trafdic.
    • Create trafdic.2015.BW.opt with variable CDB server: trafdic.2015.BW-CDB-var.opt
//CDB server h2ologin1    // for BW
CDB server $MYSQLHOST // for variable hosts on BW

hostname > mysqld.hostname
export MYSQLHOST=`cat mysqld.hostname`
aprun -n 1 $MYSQL/bin/mysql.server start --defaults-file=$MYSQL/etc/my.cnf --socket $MYSQL/mysql.sock &
aprun -n 31 ./pcp ${Period}_${i}_${Nodes}times${Length}.cmdlist &

October 14-16, 2016: copy further data between CERN and BW

  • For the first time also from BW to CERN with FTS3. See Twiki for details.

October 12, 2016: run coral on BW w/o MySQL database

  • Need to do for all detector views and runs covered (!):
cd $CORAL/src/condb/mysqldb/Utils/ 
./getDBFilePath -r 260693 DC01X1
  • Need: run list; list of detector views
  • Then the idea would be to execute ./getDBFilePath in a loop for all runs and detector views and store the calibration files in a dedicated directory (can be group or user space for the time being)
  • Then at the beginning of coral execution, link all those files to the execution directory and ...
  • Project on hold for the time being.
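
The loop over runs and detector views described above could look like the following dry-run sketch. It only prints the `getDBFilePath` calls. The run list, the view list and the function name `print_calib_commands` are assumptions for illustration, and how the tool's output is turned into copied files is left open because this entry does not specify it.

```shell
#!/bin/bash
# Dry-run sketch: print one getDBFilePath call per (run, detector view) pair.
# RUNS and VIEWS are placeholder lists; replace them with the real run list
# and the real list of detector views.
RUNS="260061 260693"
VIEWS="DC05Y1 DC05Y2 DC05U1 DC05U2 DC05V1 DC05V2"

print_calib_commands() {
    local run view
    for run in $RUNS; do
        for view in $VIEWS; do
            echo "./getDBFilePath -r $run $view"
        done
    done
}

print_calib_commands
```

Printing the commands first makes it easy to inspect the list before anything touches the database.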

September 2016

September 12, 2016: set up parallel command processor on BW ("pcp code")

  • How to avoid ANSI escape codes when creating a chunk list: ls -f /scratch/sciteam/criedl/DATA/dy15W12-raw/ | grep 'cdr' | head -16 > Test_16.chunklist (the escape codes appear for me on BW because my .bashrc contains alias ls="ls --color=always")
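
An alternative sketch that sidesteps the `ls` alias altogether: let the shell expand the glob and strip the directory part with parameter expansion. The function name `make_chunklist` and the demo directory are made up for illustration.

```shell
#!/bin/bash
# Sketch: print the first N files named cdr* in a directory, one per line,
# without calling ls (so an alias such as ls --color=always cannot interfere).
make_chunklist() {
    local dir="$1" n="$2" f
    for f in "$dir"/cdr*; do
        [ -e "$f" ] || continue        # skip the literal pattern if nothing matches
        printf '%s\n' "${f##*/}"       # file name without the directory part
    done | head -n "$n"
}

# Demo on a temporary directory with fake chunk names.
demo_dir=$(mktemp -d)
touch "$demo_dir/cdr11002-260061.raw" "$demo_dir/cdr11003-260061.raw" "$demo_dir/other.txt"
make_chunklist "$demo_dir" 16
rm -r "$demo_dir"
```

On BW the call would be something like make_chunklist /scratch/sciteam/criedl/DATA/dy15W12-raw 16 > Test_16.chunklist.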

September 9/10, 2016: run phast on BW

  • cd /u/sciteam/criedl/COMPASS/coral_svn/trunk/src/condb/mysqldb/Utils/
  • make
  • MyRun/Phast/2015-W08
  • ./phast -h MyOutput-2015-P02_DC05Y.11004-260061_U2117.root -u11 -U2117 -T DC05Y DC5/2016-09-06/phast_mem4gb.root :
    • "** mysqlDBEnv: Host computer name "h2ologin2" unknown, not matching either of: "lxplus", "pcco", "compass", "ccage""
bool mysqlDBEnv(const char *&server, const char *&env, int &port)
  const char *hosts[] =   {"lxplus",    "pcco",    "compass",          "ccage",       "h2ologin1"};
  const char *servers[] = {"wwwcompass","pccodb00","","cccompassdb", "h2ologin1"};
  const char *envs[] =    {0,           "DAQ",     "GRIDKA",           "LYON",        "BLUEWATERS"};
  const int ports[] =     {0,           0,         0,                  23306,         0};
      • Recompile phast libraries and phast&coral with usual procedure (see BW Twiki). OK then.
    • Interactive mode: Real time = 6.14613 [ms] per event, CPU time (utime+stime) = 5.83226 [ms] per event (33517 events)
  • Batch job: qsub MyPhast_xk_packed_260061_11002.pbs  -N Phast_DC5Y_26006_11002 -l walltime=00:00:10
    • PICK Host computer name "nid12351" unknown. I can't add all the nid computers to UserEvent11. Therefore I have two options:
      • run interactively only
      • or copy / link calibration files in directory with proper naming convention. DONE success!
  • Process all three chunks (from Sept. 9) & extract plots:
    ./phast -h 2016-09-10/MyOutput-2015-P02_DC05Y.11002_11004-260061_U2117.root -u11 -U2117 -T DC05Y DC5/2016-09-09/phast_cdr1100*-260061_mem4GB_4cpus.root

September 6-9, 2016: testing coral on BW

  • coral is slower on BW (0.81 sec/event) compared to lxplus (0.57 sec/event)!
  • Try 4GB (same result): (what is the default memory assignment?)
    • qsub My_xk_packed.pbs -l walltime=24:00:00,mem=4gb ==> 10.5h
  • 1 node, 4 CPUs, mem=4GB, xk, walltime=24h. The 4 CPUs seem to make a difference!
    • qsub My_xk_packed_260061_11002.pbs -l walltime=24:00:00,mem=4gb ==> 5h 27min
    • qsub My_xk_packed_260061_11003.pbs -l walltime=24:00:00,mem=4gb ==> 4h 5min
    • qsub My_xk_packed_260061_11004.pbs -l walltime=24:00:00,mem=4gb ==> 4h 7min

September 1/2, 2016: testing coral on BW

  • MySQL data base is now accessible in batch mode, too (Marco fixed it)
  • Process 1 chunk W08/cdr11002-260061.raw for DC05Y efficiency
    • qsub My_xk_packed.pbs -l walltime=10:00:00 : job exceeded wallclock limit.
    • qsub My_xk_packed.pbs -l walltime=24:00:00 ==> 10.5h

August 2016

August 24-31, 2016: DC5 efficiencies on BW

  • criedl@h2ologin1:~/COMPASS/MyRun/mDSTs> ./coral.exe TrafDic/MyDC5Eff_trafdic.2015.opt
  • W08: run 260061, chunks 11002, 11003, 11004
  • Interactively 50 events (chunk 11002): OK, coral output in Output/DC5/current/2016-08-25
  • Simple batch:
    • Study the different batch example scripts. Differences are not yet clear to me. I start with the method:
    • Start job from my user directory MyRun, have it cd into $JOBID directory on scratch, where the $JOBID.out is stored.
             mkdir -p /scratch/sciteam/$USER/$PBS_JOBID
             cd /scratch/sciteam/$USER/$PBS_JOBID
    • In option file, use general paths wherever possible. No local link, otherwise you have to create it in the scratch directory.
    • Have job put output into user directory (for the time being). Will probably have to change to scratch for output at some point.

    • In directory aprun: qsub My_xk_packed.pbs
    •                 testjob          criedl                 0 Q normal

    • ppn=1 for only 1 chunk is enough. If I use the default of ppn=1024, the job is queued for a very long time.
    • What is PE (option -N)?
    • qsub -V -q M script (DESY grammar): -V is deprecated and queue M is not known... default is queue normal
    • qdel
    • Next submission, try to add in the batch script PBS_JOBNAME=DC5Y_26006_11002 (but isn't the jobname defined automatically...?). Or why is my job called "testjob"?
    • For loops over more than 1 chunk and other nice features, see e.g. my RDdaql script...
    • Of course we will need a ManyRDs.csh script adapted to BW at the end of the day.

August 23/24/25, 2016: Continue to copy raw dy15W12 to BW

  • Contacted CERN IT about many failures and they had some tips.

August 16/17, 2016: DC5 efficiency on lxplus

  • phast execution works today, randomly...
  • Run coral over cdr11007-261520.raw - the run I had used all along (261645) was somehow bad and had no beam tracks. Interactive mode is promising (1000 events): the histograms he1DC05 (1D eff), he2DC05 (2D eff) and hRTDC05 (RT relation) are then filled.
    • DONE Interactively: in MyRun/mDSTs ./coral.exe TrafDic/MyDC5Eff_trafdic.2015.opt
    • DONE Send batch with entire chunk: ./ManyRDs.csh -d Output/DC5/ TrafDic/MyDC5Eff_trafdic.2015.opt 1
  • Cross-check with Robert on P02 (W08): run 260061, chunks 11002, 11003, 11004
    • In MyRun/mDST: ./coral.exe TrafDic/MyDC5Eff_trafdic.2015-P02_DC05Y.opt
    • In MyRun/Phast-DC5/2015-P02-12: ./phast -h MyOutput_2117.root -u11 -U2117 -T DC05Y phast.root
      • Compare all four options from the phast option table below: Only difference is pVertex required or not. If yes, number of entries drops by a factor of 3 and efficiency is slightly higher (Y1: 77.5% instead of 75.3%)
    • DC05Y batch: ./ManyRDs.csh -d Output/DC5/P02/DC05Y -s 11002 TrafDic/MyDC5Eff_trafdic.2015-P02_DC05Y.opt 3
    • DC05V batch: ./ManyRDs.csh -d Output/DC5/P02/DC05V -s 11002 TrafDic/MyDC5Eff_trafdic.2015-P02_DC05V.opt 3
    • DC05U batch: ./ManyRDs.csh -d Output/DC5/P02/DC05U -s 11002 TrafDic/MyDC5Eff_trafdic.2015-P02_DC05U.opt 3
    • All coral jobs finished by the next morning. Details of running time see BW written logbook.
    • 1 coral output file (root) for 1 chunk, size ~100MB. Have 3 chunks for each operational detector view. Merge with hadd merge.root ntuple.0.root ntuple.1.root ntuple.2.root
      • Many Error in <TBufferFile::CheckByteCount>: object of class vector [...] read too few bytes: 6 instead of 8. Is this worrisome?
      • Merged file is huge, ~2GB, why? Compression not working?
    • ./phast -h MyOutput-2015-P02_DC05Y.11004-260061_U2117.root -u11 -U2117 -T DC05Y DC05Y/MyDC5Eff_trafdic.2015-P02_DC05Y.11004-260061.phast.root etc.
      • Should I better merge the phast output? file size = 1.6MB
      • hadd MyOutput-2015-P02_DC05Y.11002-11004-260061_U2117.root MyOutput-2015-P02_DC05Y.1100{2..4}-260061_U2117.root gives merged file with size of the sum of the three. But this merged file has some weird features (Y-axis screwed up etc.)
    • Best approach: run over all coral output files at the same time: ./phast -h MyOutput-2015-P02_DC05Y.11002_11004-260061_U2117.root -u11 -U2117 -T DC05Y DC05Y/MyDC5Eff_trafdic.2015-P02_DC05Y.*-260061.phast.root

    • It must be possible (and it will be necessary) to generalize this with options and scripts...:
sdiff -s MyDC5Eff_trafdic.2015-P02_DC05V.opt MyDC5Eff_trafdic.2015-P02_DC05U.opt
// DY-2015 data: DC05-V efficiency | // DY-2015 data: DC05-U efficiency
mDST hits DC05V // Output DC05V1/2 hits | mDST hits DC05U // Output DC05U1/2 hits
//TraF SmoothPos [0] 502.935 // Smooth @ (DC05U1+DC05U2) | TraF SmoothPos [0] 502.935 // Smooth @ (DC05U1+DC05U2)/2
TraF SmoothPos [0] 505.535 // Smooth @ (DC05V1+DC05V2)/2 | //TraF SmoothPos [0] 505.535 // Smooth @ (DC05V1+DC05V2)

bit | hex | meaning | U34881 = 0x8841 | U34885 = 0x8845 | U2113 = 0x0841 | U2117 = 0x0845
1 | 0x0001 | disable all other options | x | x | x | x
2 | 0x0040 | neighboring hits | x | x | x | x
3 | 0x0800 | correct hit time | x | x | x | x
4 | 0x8000 | use detector.dat to correct for misalignment | x | x | - | -
5 | 0x0004 | require pVertex in target | - | x | - | x
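
The -U values in the table above are bit masks, e.g. U2117 = 0x0845 = 0x0800 + 0x0040 + 0x0004 + 0x0001. A small sketch that decodes such a value with shell arithmetic. The function name `decode_u_option` is hypothetical and the bit meanings are copied from the table.

```shell
#!/bin/bash
# Sketch: decode a phast -U option value into the flag bits listed in the table.
decode_u_option() {
    local value=$1
    printf 'U%d = 0x%04X\n' "$value" "$value"
    (( value & 0x0001 )) && echo "  0x0001 disable all other options"
    (( value & 0x0040 )) && echo "  0x0040 neighboring hits"
    (( value & 0x0800 )) && echo "  0x0800 correct hit time"
    (( value & 0x8000 )) && echo "  0x8000 use detector.dat to correct for misalignment"
    (( value & 0x0004 )) && echo "  0x0004 require pVertex in target"
    return 0
}

decode_u_option 2117
decode_u_option 34881
```

This makes it quick to check which options a given -U number switches on before launching phast.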

  • The calibration files do NOT have to be linked in the execution directory if the default ones are used! If none are given, phast will go into the MySQL data base.
  • A detector.dat in the execution directory is only needed if one wants to correct for misalignment.

August 10/11, 2016: continue with coral on BW and lxplus

  • DONE BW: Two improvements with environment: 1. can use again official as it was corrected. 2. Put all COMPASS analysis paths into .bashrc (as it should be) so that I do not need to run a separate script.
  • DONE BW: Check out same coral version as on lxplus (revision 14211). Do exactly the same with the ChipSinica files as on lxplus: activating the 2015 version.
  • DONE BW: Now follow steps as described here: to compile coral.
  • DONE BW in MyRun, create Input directory. Move input for DC5 into Input directory, e.g. Input/DC5-2015-P04-1
  • BW: Copy the Trafdic file I worked on on August 2 to the TrafDic directory: MyDC5Eff_trafdic.2015.opt. This will be the new master file. PICK The special tracking options from trafdic.DC05.DCmode.opt still need to be implemented. (? or continue to have 2 files?)
  • lxplus: pick up thread from June 28/29. Goal is to run coral & phast over one P4-1 (W10) chunk, W10/cdr14112-261645.raw, excluding but writing the hits of DC05-Y.
    • DONE Write improved option file MyDC5Eff_trafdic.2015.opt which calls trafdic.2015.opt . DONE Use correct and latest detectors.dat and TraF SmoothPos.
    • DONE .
    • DONE Interactively: in MyRun/mDSTs ./coral.exe TrafDic/MyDC5Eff_trafdic.2015.opt "50 events" (those were requested in the option file)
    • DONE Send batch: ./ManyRDs.csh -d Output/DC5/ TrafDic/MyDC5Eff_trafdic.2015.opt 1 "number of events: 68575, coraljob: coral successful"
    • DONE Output was requested in MyRun/mDSTs/Output/DC5. mkdir 2016-08-10 and link current to it. In MyRun/Phast-DC5/2015-P04-1, current then points to the latest input for phast.
    • DONE In MyRun/Phast-DC5/2015-P04-1, link DetectorsDat/2015/current/detectors.261513.transv.dat to detectors.dat .
    • In MyRun/Phast-DC5/2015-P04-1, ./phast -h MyOutput.root -u11 -U34881 -T DC05Y current/MyDC5Eff_trafdic.2015.14112-261645.phast.root
      • PICK It seems phast does not find the proper RT calibration files in this directory. I retrieve the proper files again for run 261645 (w/ getDBFilePath) and they are correct. ? ? Marco says this problem is known: it does not always work. Check source code of Chia-Yu says it is the naming of the calibration files. See Twiki or Mathieu's docu. (The name I give is indeed wrong)
    • IDEA! Get calibrations: ./getDBFilePath -r 261645 DC05Y1.
      • To generate getDBFilePath (set up compilation environment): cd coral_svn/trunk/src/condb/mysqldb/Utils ; make

  • IDEA! DC05 in 2015 detectors.dat. For DC05 efficiency, need TraF SmoothPos, which is the average z-position of a given plane (i.e. average of un-primed and primed view). Files listed here were produced on June 24, 2016.
File Z-Y1 Z-Y2 <Z-Y> Z-V1 Z-V2 <Z-V> Z-U1 Z-U2 <Z-U>
detectors.261513.transv.dat 511.1350 510.3350 510.735 505.9350 505.1350 505.535 503.3350 502.5350 502.935
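
The SmoothPos values are plain averages of the two plane positions, e.g. (511.1350 + 510.3350)/2 = 510.735 for DC05Y. A sketch of that arithmetic with awk, where `smooth_pos` is a hypothetical helper name:

```shell
#!/bin/bash
# Sketch: average z-position of the un-primed and primed plane of one view.
smooth_pos() {
    LC_ALL=C awk -v z1="$1" -v z2="$2" 'BEGIN { printf "%.3f\n", (z1 + z2) / 2 }'
}

smooth_pos 511.1350 510.3350   # DC05Y1, DC05Y2
smooth_pos 505.9350 505.1350   # DC05V1, DC05V2
smooth_pos 503.3350 502.5350   # DC05U1, DC05U2
```

The three results can be compared directly with the <Z-Y>, <Z-V> and <Z-U> columns of the table.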

August 8/10, 2016: update coral on lxplus and BW.

  • Follow instructions as described here:
    • Check out new version of coral: revision 14211 (= August 10, was 14208 on August 8)
    • Link 2015 and 2016 DC05 decoding files ( and ChipSinica.h) into my coral_svn/trunk/src/DaqDataDecoding/src . For the time being, have the 2015 version activated:
mv ChipSinica.cc_v14211
mv ChipSinica.h ChipSinica.h_v14211
ChipSinica.cc_2015 -> ../../../../../DC5/Decoding/ChipSinica.cc_2015
ChipSinica.cc_2016 -> ../../../../../DC5/Decoding/ChipSinica.cc_2016
ChipSinica.h_2016 -> ../../../../../DC5/Decoding/ChipSinica.h_2016
ChipSinica.h_2015 -> ../../../../../DC5/Decoding/ChipSinica.h_2015 -> ChipSinica.cc_2015
ChipSinica.h -> ChipSinica.h_2015
    • DONE Compile coral (DC05-2015) - August 10.

August 4-8, 2016: Use FTS3 to copy data from CERN to BW.

  • After meeting with Xavi and Alejandro. See BW Twiki for details.

August 2, 2016: Copy more 2014 mDSTs (dy14T07t4) to BW

  • Simple script that fetches, one volume at a time, data from CASTOR to my home afs, copies them (via GO CLI) to BW, using my personal GO endpoint, and deletes them in my home afs.
  • Run 2 scripts in parallel, each is supposed to copy 162 volumes. Yesterday I copied 45 volumes. The total for dy14T07t4 is 1,182 volumes.

August 1, 2016: copy some 2014 mDSTs (dy14T07t4) to BW

  • Unfortunately still have to use my afs home directory with <10 GB total volume.
  • Activate BW GO endpoint (it is open only for ~11 days) on BW: ssh , endpoint-activate ncsa#BlueWaters . Follow link.
  • on lxplus: cd $HOME/COMPASS/GlobusOnline/globusconnectpersonal-2.3.1/ ; ./globusconnectpersonal -start &
  • 2 at a time: xrdcp from Castor to my afs home directory; then use GO webinterface to copy files to BW. This is incredibly slow and ineffective. (details see table on BW data transfers, entry 3)

August 1/2, 2016: run coral on BW

  • Set up environment (in $HOME): . and . ( loads improper version of root, therefore I have my own setup script)
  • Create directory of today's date in $HOME/MyRun/mDSTs/Output/DC5 and link current to it
  • cd $HOME/MyRun/mDSTs/2015-P04-1
  • ./coral.exe trafdic.DC05.DCmode.opt
  • Reproduce Error 2 of July 5:
Error in <TMacro::ReadFile>: Cannot open file:
Required file is in different path:
Problem is environment variable $COMPASS_FILES needed by trafdic.2015.opt (this default trafdic is also used as input even though I call coral with a specific DC5 option file). On lxplus, export COMPASS_FILES=/afs/ in the of coral_svn/trunk. On BW, it is defined in (daughter of and is wrongly set to export COMPASS_FILES=$BADP/compass. Change it to $BADP/detector there (hoping it does not screw up at other places now).
  • Error 3: Error in MySQLDB::ConnectDB(): can't connect to DB
Solution: edit $CORAL/src/user/trafdic.2015.opt. Comment out one line and add the two following:
//CDB server    wwwcompass      
CDB server h2ologin1
CDB specialplace BLUEWATERS
  • This needs to be done after each fresh checkout of coral (because the default trafdic.2015.opt will be checked out). Best is to write your own dedicated trafdic file.
  • Marco has to stop the MySQL data base, add (allow) user anonymous and re-start.
  • BLUEWATERS is the key under which BW is registered on lxplus. Marco needs to add more lines that point to the directories with the actual calibration files on BW (M. Bodlak added only one line, which makes calibrations for many detectors unavailable)

July 2016

July 5, 2016: set up BW to run code and try to run coral

  • On BW, in MyRun/Phast-DC5/2015-P04-1 directory (as "good example"), create links:
    • detectors.dat : for a P4 run, it is ../../../project/detector/geometry/2015/detectors.261513.transv.dat
    • RT calibration files: for a P4 run, the files are in ../../../project/detector/calibrations/MySQLDB_files13/ with date Sep23.
    • to mDSTs/Output/DC5/current
  • On BW, in mDST directory: link to input coral file and link official trafdic file in subdirectory
  • Set my personal environment directories: create file, analog to on lxplus.
  • I.e. before a new analysis session, set up your environment:
cd $HOME
. defines the paths to the personal directories MYDIR, CORAL, PHAST, mDST

  • cd $HOME/MyRun/mDSTs/2015-P04-1
  • Error 1 when running coral: ./coral.exe trafdic.DC05.DCmode.opt
./coral.exe: error while loading shared libraries:
cannot open shared object file: No such file or directory
Coral expects root 5.34 (as defined in its configure run before compilation), while the active root on BW is 6.06.
which root
DONE Solution: in script, change to root 5.34. Then this error does not occur anymore.

  • Error 2 when running coral:
Error in <TMacro::ReadFile>: Cannot open file:

July 4, 2016: try to copy some data to BW via personal Globus Connect

  • Create my work directory $WORK on CERN afs: see link work-directory (100GB quota. Home directory has only 10GB.)
  • Copy a test raw 2016 run into $WORK/COMPASS/DATA/2016 :
nsls -l /castor/ | grep 271014
stager_get -M /castor/ 
===> "/castor/ SUBREQUEST_READY"
xrdcopy root:// .
  • This is one chunk of run 271014 (born on the 4th of July).
    • 245 = nsls -l /castor/ | grep 271014 | wc -l, with each about 1.1GB. I.e. I cannot even copy an entire run using this space.
    • Copy 90 chunks: nsls /castor/ | grep 271014 | head -90 > 2016_271014_shortlist.txt . Edit with emacs and replace string cdr by root:// (shift 1 is bulk replace).
    • for i in `cat 2016_271014_shortlist.txt`; do xrdcopy $i . ; done (start at 10:55; end at 11:54, i.e. 1 hour for 100GB)
  • Unfortunately, the $WORK directory is not accessible in the Globus Online web interface. Set up Globus Connect in $WORK directory, following the steps of July 1. But this endpoint also sees files in my home directory only. I delete it again.
  • Transfer files via GO CLI: explored. See BW Illinois Twiki
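
The manual emacs edit of the short list (replacing the leading cdr by an xrootd URL) could also be done with sed. A minimal sketch, assuming the list contains one chunk name per line. `XROOT_PREFIX` is a placeholder because the real URL prefix is not given in this log, and `add_prefix` is a hypothetical function name.

```shell
#!/bin/bash
# Sketch: prepend an xrootd URL prefix to every chunk name in a short list,
# instead of editing the list by hand.
# XROOT_PREFIX is a placeholder; the real prefix is not given in this log.
XROOT_PREFIX="root://SERVER//PATH/TO/RUN/"

add_prefix() {
    sed "s|^cdr|${XROOT_PREFIX}cdr|" "$1"
}

# Demo with a two-line list in a temporary file.
demo_list=$(mktemp)
printf 'cdr11001-271014.raw\ncdr11002-271014.raw\n' > "$demo_list"
add_prefix "$demo_list"
rm "$demo_list"
```

The output can be redirected to a new file and fed to the xrdcopy loop unchanged.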

July 1, 2016: install personal GO endpoint in my CERN user account.

  • Details see BW Illinois Twiki.
  • Now I can add the new CERN endpoint to my web interface.
  • DONE Copy of a test file from my CERN home directory to BW successful.
  • /castor/ directory however not accessible (not surprisingly) and also not /eos/compass and /eos/user/c/criedl. Need gateway between GridFTP and EOS?

June 2016

June 29, 2016: Set up MyRun directory structure on BW

Changes on lxplus:
  • MyRun/DC5 ==> MyRun/Phast-DC5 DONE
  • MyRun/Physics ==> MyRun/Example DONE
  • MyRun/mDSTs/MymDSTs ==> MyRun/mDSTs/Output DONE
  • In MyRun, move all trafdic files into new directory TrafDic DONE
  • Corresponding adjustment of environment scripts on lxplus: DONE

Actions on BW:

  • Copy MyRun directory to BW DONE and adjust links (some links copied as directories) DONE
  • Link phast executable in MyRun/Phast-DC5/2015-P04-1 directory DONE

June 28/29, 2016: coral and phast on lxplus

  • Setup env. as described here (same as June 2)
  • Vincent helps me to identify the reason for the phast crash on June 4: in my coral option file, the general coral_svn/trunk/src/user/trafdic.2015.opt is included. And there DC05 was still specified as IS_MWPC! Comment out these lines.
  • Then I have to run coral again. Output will end up in MyRun/mDSTs/MymDSTs/DC5/2016-06-28 .
  • Run coral: cd MyRun/mDSTs ; ./coral.exe trafdic.DC05.DCmode.opt
  • Error:
    • DC05U2__: Not catered for => No propagation time correction.
    • Unlike I recommended to myself in Oct. 2015, "In coral_svn/trunk/src/geom/, the string DC5 has to appear in two places (not counting appearances in comments)", this is NOT the case.
    • In line 832, add: || strncmp(tbn,"DC05",4)==0 (I do not understand why this happened. I should have checked out a coral version where DC05 is taken care of already? Anyway will check out later a fresh version of coral.)
  • Re-compile coral DONE
  • Run coral (same run as on June 2)
    • Interactively: in MyRun/mDSTs ./coral.exe trafdic.DC05.DCmode.opt: results in send2nsd: NS002 - send error : No valid credential found (2x)
    • Send batch: ./ManyRDs.csh -d MymDSTs/DC5/ trafdic.DC05.DCmode.opt 1, finishes without output
    • Use instead of cdr11002-261645.raw: cdr14112-261645.raw, then interactive job produces something
    • In MyRun/DC5/2015-P04-1, ./phast -h MyOutput.root -u11 -U34881 -T DC05Y current/phast.root . There is something in the produced root file.
    • In MyRun/mDSTs, send batch: ./ManyRDs.csh -d MymDSTs/DC5/ trafdic.DC05.DCmode.opt 1 (runs 2-3 hours on Tuesday evening)
    • Wednesday: In MyRun/DC5/2015-P04-1, ./phast -h PhastOutput.DC05.DCmode.14112-261645.root -u11 -U34881 -T DC05Y current/mDST.DC05.DCmode.14112-261645.root
      • Some histograms are filled; but I don't see all the files I expect.
      • Next: run -U34885 (mv PhastOutput.DC05.DCmode.14112-261645.root PhastOutput.DC05.DCmode.14112-261645_U34881.root)

June 17/27, 2016: Set up Globus Online Command Line Interface

June 16, 2016: How to copy from CASTOR to EOS

  • To have a gridftp endpoint at CERN.
  • Note: this might not be needed - Damien's script copies from CASTOR with automatic staging from what I understood
Access to EOS:
ssh -X
source /afs/ 
eos ls -l /eos/user/c/criedl   
source /afs/
eos ls -l /eos/compass

Access to CASTOR:

nsls -l /castor/ |awk '$7==16'

Want to copy from CASTOR to EOS:


  • I have a user directory on EOS, /eos/user/c/criedl. eos quota is 2TB from what I see. To access this dir, need to source the first
  • EOS tutorial
  • EOS documentation

June 14/15, 2016: Setting up BW - part 1

mkdir git
cd git
git clone ssh:// 
git clone ssh://
git config --global user.name "Caroline Kathrin Riedl"
git config --global ""
export PHAST=/u/sciteam/criedl/COMPASS/phast
cd escalade-framework
PICK For the time being: wait until escalade and dy are ready. Then also remember git clone --recursive http://... Then set up gitlab framework on BW and re-compile software.

June 4, 2016: Getting started with phast on lxplus

  • set up env. as on June 2
  1. DONE Compile UserEvent11 for DC efficiency
    • copy /afs/*User*11* to my phast/user/
    • goto git/dy-analysis, ./
    • make phast (i.e. re-compile phast in git framework). Problems with SetTresol: not declared in this scope. Check if I need newer version of phast.
    • Download Phast.tar.gz.7.149.
      • mv Phast.tar.gz.7.149 Phast.7.149.tar (it seems when I download the file, my computer automatically unzips it...?)
      • tar xvf Phast.7.149.tar
      • voila: phast.7.149 directory.
    • SetTresol is also not in phast.7.149. Continue using phast.7.148 for the time being. Instead, use older version (1.27) of Then phast compiles.
  2. Prepare running of phast (following Mathieu's recipe)
    • Create new directory myRun/DC5, and in there, 2015-P01-S through 2015-P09-S, where S=1 or 2 if the period has 2 separate alignment files, or S=12 if it has 1 common alignment file (detectors.dat).
    • Link phast executable in each of these directories.
    • Link proper detectors.dat in each P-directory. Name string of link has to be called exactly detectors.dat. For example, the one chunk I produced with coral belongs to run 261645 = W10 = P4, sub-period 1, i.e. detectors.261513.transv.dat (for the time being, I use Chia-Yu's table). DONE for 2015-P04-1.
    • In coral_svn/trunk/src/condb/mysqldb/Utils, compile with make to create getDBFilePath.
    • Run getDBFilePath -r 261645 DC05Y1 and similarly for Y2, U1, U2, V1, V2 to create proper calibration files.
    • Copy them to a subdirectory in DC5/RT-calibration, where the directory name indicates the creation date of the calibration file in the MySQL data base. In this example for run 261645: 2015-09-23 DONE
  3. Run phast in 2015-P04-1 with ./phast -h MyOutput.root -u11 -U34881 -T DC05Y current/mDST.DC05.DCmode.11002-261645.root
    • error message ** UB: Mode 0x800 (i.e. correct for event time and signal propagation) requested while argument dets are not drift-like. I do not understand the line _ => Examining (option "IS_MWPC") detector plane(s):_ . The mDST was not produced in MWPC mode. Where does this come from?

June 2, 2016: Run coral on lxplus

Set up environment: (as described on main page)
cd coral_svn/trunk
cd ../../MyRun/mDSTs

./coral.exe trafdic.DC05.DCmode.opt Interactively OK - 1 chunk produced as MymDSTs/phast.root (2.6MB) after I killed job by hand.

root -t phast.root 
TBrowser b
  • Submit in batch mode: ./ManyRDs.csh -d MymDSTs/DC5/ trafdic.DC05.DCmode.opt 1
  • Result (after ~ 6 hours for only 1 chunk?!): MymDSTs/DC5/mDST.DC05.DCmode.11002-261645.root (108MB).
  • Move in directory MymDSTs/DC5/2016-06-02

June 1, 2016: set up git on lxplus

mkdir /afs/
cd git
git clone ssh://
export PHAST=/afs/
cd dy-analysis
--> Add /afs/ to LD_LIBRARY_PATH and DYLD_LIBRARY_PATH (specific for MacOSX)
--> Define STAGE_HOST as castorpublic
--> Define STAGE_SVCCLASS as compassuser
--> Set the variable ROOTDIR as /afs/

Now, when typing make, it complains that ROOTSYS is not set. I have to type:
export ROOTSYS="/afs/"
Then I can compile.
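
To catch a missing variable before make does, a small guard can be run first. A minimal sketch in bash; the helper name require_var is made up for this example.

```shell
#!/bin/bash
# Sketch only: require_var is a hypothetical helper, not part of the build.

# require_var NAME
# Succeeds only if the variable called NAME is set and non-empty.
require_var() {
    local name=$1
    if [ -z "${!name:-}" ]; then
        echo "$name is not set" >&2
        return 1
    fi
}

if require_var ROOTSYS; then
    echo "ROOTSYS=$ROOTSYS, make can be started"
else
    echo "export ROOTSYS before running make"
fi
```

The same check works for PHAST, CERN or any other variable the build expects.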

Then : make depends

/afs/ error while loading shared libraries: cannot open shared object file: No such file or directory
--> Configuration selected :
   PHAST : /afs/
   PHAST Infos : 
   PHAST Arguments : -u93
   PHAST Output : /castor/
   PHAST Saved files : DST stored
--> Data configuration:
   Name : dy15W10t2_93
   Directory selected: /castor/
   Pattern of files : ".*DST.([0-9]*)-[0-9]-[0-9].root.*"
   Data output will be here: /afs/
--> Looking for data for dy15W10t2_93 matching with the following pattern ".*DST.([0-9]*)-[0-9]-[0-9].root.*"
--> Cleaning empty files in dy15W10t2_93..DONEcompass/generalprod/testcoral/dy15W10t2/mDST
--> Total 395 files found
--> Compiling library..

Generates runlists of the form src/dy15W10t2_93/261647.txt .
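
How the file pattern from the log picks out a number can be tried in bash. A sketch under assumptions: the file name mDST.261647-1-7.root is invented for the test, and reading the captured group as the run number is a guess based on the runlist name 261647.txt. Note that the unescaped dots in the pattern match any character.

```shell
#!/bin/bash
# Sketch only: the file name used below is a made-up example.

# Pattern copied from the log output above.
pattern='.*DST.([0-9]*)-[0-9]-[0-9].root.*'

# run_from_name FILENAME
# Prints the first captured group (presumably the run number), or fails if no match.
run_from_name() {
    if [[ $1 =~ $pattern ]]; then
        echo "${BASH_REMATCH[1]}"
    else
        return 1
    fi
}

run=$(run_from_name "mDST.261647-1-7.root")
echo "captured: $run"
```

A name like mDST.DC05.DCmode.11002-261645.root does not match this pattern, because it contains only one hyphen after the digits.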

Then: compile PHAST. First need to set up more environment:

. /afs/
export CERN=/afs/
export CERN_LEVEL=x86_64-slc6-gcc47-opt
export ROOTSYS="/afs/"
. /afs/

make phast

It takes all source files from my phast directory, and the local users directory.

Compilation fails with: In static member function ‘static TCanvas* NicePlot::Draw(TString, TH1*, TH1*, Bool_t, Double_t, Double_t)’: error: ‘SetExponentOffset’ is not a member of ‘TGaxis’

The code compiles if the following line in lib/ is commented out:
//      TGaxis::SetExponentOffset(0.01, -0.05, "x");
PICK Investigate. Not super urgent. Might be a ROOT version issue.
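
One way to test the ROOT version hypothesis is to look for the function in the TGaxis header of the ROOT installation in use. A sketch only: the helper header_declares is made up, and the header location $ROOTSYS/include/TGaxis.h is an assumption about the installation layout.

```shell
#!/bin/bash
# Sketch only: header_declares is a hypothetical helper; the header path
# below assumes the usual $ROOTSYS/include layout.

# header_declares HEADER SYMBOL
# Succeeds if SYMBOL appears as a whole word in HEADER.
header_declares() {
    grep -q -w -- "$2" "$1"
}

header="${ROOTSYS:-/nonexistent}/include/TGaxis.h"
if [ ! -r "$header" ]; then
    echo "cannot read $header"
elif header_declares "$header" SetExponentOffset; then
    echo "this ROOT declares TGaxis::SetExponentOffset"
else
    echo "this ROOT does not declare TGaxis::SetExponentOffset"
fi
```

If the header of the ROOT version picked up on lxplus lacks the declaration, the compile error comes from the ROOT version and not from the analysis code.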

Submit a test phast job.

make try

with /castor/ output.root (tree) and hist.root (histograms only) are written. Checked - OK.

October 2015

October 19, 2015

cd coral_svn/trunk . cd ../../MyRun ./

  • Adapt trafdic.DC05.DCmode.opt to run over run 261645 (1 chunk) with detectors.2015.W10.alon.ai3_forAlain.dat
  • Run interactively: ./coral.exe trafdic.DC05.DCmode.opt and choose work station 0. Looks OK - processes events.
  • Run in batch: ./ManyRDs.csh -d MymDSTs/DC5/ trafdic.DC05.DCmode.opt 1
* ManyRDs: Parsing and checking command line arguments and options file...
 * ManyRDs: All checks done. Proceeding to submit jobs..
 * Stageing run 261645, 11001 <= chunk < 11002
 * ManyRDs: Submitting chunk 11001-261645
Job <710260527> is submitted to queue <1nd>.
root -t phast.root 
TBrowser b

September 2015

September 2, 2015

Instead of, introduce 2 separate scripts:

So that:

  • Before compiling CORAL (which does not happen so often)
cd coral_svn/trunk
  • Before a normal analysis session,
cd coral_svn/trunk
cd ../../MyRun

Note: the 'export' lines in *.sh have to be pasted into the command line. Executing the script does not do the job. It's OK to do that for the time being, but why...
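
A likely explanation for the note above: an executed script runs in a child process, and variables exported there never reach the parent shell. Sourcing the script (". script.sh" in bash, "source script.csh" in csh) runs the lines in the current shell instead. A minimal demonstration, with a made-up variable MYDIR_DEMO and a throwaway script in a scratch directory:

```shell
#!/bin/bash
# Sketch only: MYDIR_DEMO and env.sh are placeholders for this demonstration.

tmp=$(mktemp -d)
cat > "$tmp/env.sh" <<'EOF'
export MYDIR_DEMO=/some/placeholder
EOF

unset MYDIR_DEMO

# Executed: runs in a child process, the export is lost when it exits.
bash "$tmp/env.sh"
after_exec="${MYDIR_DEMO:-unset}"

# Sourced: runs in the current shell, the export stays.
. "$tmp/env.sh"
after_source="${MYDIR_DEMO:-unset}"

echo "after executing: $after_exec"
echo "after sourcing:  $after_source"
```

So instead of pasting the export lines by hand, the setup script can be sourced at the start of each session.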

Next try:

  • Use Yann's .tcshrc and try to run in bash. Works interactively after following the above recipe.
  • Runs in batch. But crashes after 8.5 hours (1 chunk only!) because the file size limit was exceeded...

September 1, 2015

Log of jobs: symbol lookup error: /afs/ undefined symbol: gROOT ** coraljob: WARNING: Error in the execution of coral

But it works interactively! Do I have to set env. on castor?

Try with Yann's original ManyRDs.csh script. I had changed the following lines:
#set outDir = ~/w0/csub | set outDir = ~/w0/csub
set outDir = /castor/ <

Same error as before. The job definitely does run interactively - ./coral.exe trafdic.DC05.DCmode.opt and then "0" (choose no work station)

Try everything in csh ...

  • log in
  • csh
  • in coral_svn/trunk: source setup.csh
  • setenv MYDIR /afs/
  • setenv MYOUTPUTDIR ${MYDIR}/MyRun/mDSTs/MymDSTs
  • setenv PHAST ${MYDIR}/phast
  • in MyRun/mDSTs, ./coral.exe trafdic.DC05.DCmode.opt produces many events!
  • Now send batch job! ./ManyRDs.csh -d MymDSTs/DC5/ trafdic.DC05.DCmode.opt 1

Same problem.... OK. Get some help now. Now that I stripped down the environment it is easier to explain.

August 2015

August 31, 2015

Starting a fresh analysis session, code seems to run interactively. (Are my problems arising from losing X credentials?)

It is important to not only run the scripts as described on my Twiki, but also to run the following commands (not sure if all are needed):

. /afs/
export MYDIR=/afs/
export MYOUTPUTDIR=${MYDIR}/MyRun/mDSTs/MymDSTs
export PHAST=${MYDIR}/phast

(why does this not work from script? Investigate.)

Then submit job: ./ManyRDs.csh -d MymDSTs/DC5/ trafdic.DC05.DCmode.opt 10

August 26, 2015

Checked out new version of coral (14063) and compiled everything successfully. It is now ready for DC5. Trying to run over /castor/ : kSigSegmentationViolation, which seems to be a ROOT problem. Try to fix it.

August 24, 2015

Root files couldn't be produced:
/afs/ symbol lookup error: /afs/ undefined symbol: gROOT
Debugging interactively.

ERROR, on Mon, 24/Aug/2015 19:24:11.168530 (GMT) from: 1576 `CsEvent::_decode(): exception: CsDriftChamberDetector::DecodeRawData(): Wrong digit type Decoding error for DC05V1__, detector may be empty during reconstruction.'

Same for V2, Y1, Y2 (but not for U1, U2!)

August 21, 2015

Sent first job: ./ManyRDs.csh -d MymDSTs/DC5/ trafdic.DC05.DCmode.opt 1
But it did not produce good output.

-- CarolineRiedl - 2015-10-19

