cmssrv217

Agent tweaks

UPDATE wmbs_location SET state=(SELECT id from wmbs_location_state where name='Normal') WHERE state!=(SELECT id from wmbs_location_state where name='Normal');
UPDATE wmbs_location SET running_slots=2000, pending_slots=1000;
UPDATE rc_threshold SET max_slots=2000, pending_slots=1000;
  • Set maxRetries to 0 ==> OK
  • Run PhEDExFix ==> NOPE

Jobs in Condor

[cmsdataops@cmssrv217 current]$ condorq
[cmsdataops@cmssrv217 current]$ 
Next, use Ctrl-R to recall the command that sources the agent environment, then open the database prompt with: $manage mysql-prompt wmagent

Jobs ordered by status

SQL> select wmbs_job_state.name, count(*)
from wmbs_job
join wmbs_job_state on (wmbs_job.state = wmbs_job_state.id)
group by wmbs_job.state, wmbs_job_state.name;
+----------+----------+
| name     | count(*) |
+----------+----------+
| cleanout |        5 |
+----------+----------+
1 row in set (0.67 sec)
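
The check on this result can be scripted. A minimal sketch, assuming the rows come back as (state name, count) pairs as in the output above, and assuming cleanout is the only state that counts as done for draining purposes. The helper name and the TERMINAL_STATES set are illustrative and not part of WMAgent.

```python
# Hypothetical helper: decide whether the job table shows a drained agent.
# Assumption: only jobs in 'cleanout' are considered done; adjust
# TERMINAL_STATES if other states should be treated as final.
TERMINAL_STATES = {"cleanout"}


def active_jobs(rows):
    """Return {state: count} for every non-terminal state with jobs left."""
    return {name: count for name, count in rows
            if name not in TERMINAL_STATES and count > 0}


# Rows as returned by the query above.
rows = [("cleanout", 5)]
leftover = active_jobs(rows)
print("drained" if not leftover else f"still active: {leftover}")
```

An empty result means no job is left in a state that still needs the agent.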

Workflows in the System

MariaDB [wmagent]> SELECT DISTINCT name from wmbs_workflow;
+-------------------------------------------------------------------------+
| name                                                                    |
+-------------------------------------------------------------------------+
| jen_a_ACDC_Run2016B-AlCaLumiPixels3-07Jul2016_8014_160722_221559_2110   |
| prozober_ACDC_task_EXO-RunIISpring16DR80-03464__v1_T_160725_174306_2970 |
| prozober_ACDC_task_EXO-RunIISpring16DR80-03473__v1_T_160725_174236_2120 |
| prozober_ACDC_task_EXO-RunIISpring16DR80-03478__v1_T_160725_174150_9280 |
| prozober_ACDC_task_EXO-RunIISpring16DR80-03560__v1_T_160725_174251_3034 |
+-------------------------------------------------------------------------+
5 rows in set (0.01 sec)

Request status of each workflow (not part of the query output): all five are aborted-archived.

What matters is that ALL of them are at least in completed status. Workflows that are not:

NONE

Workflows not fully injected

MariaDB [wmagent]> select distinct name from wmbs_workflow where injected = 0;  
Empty set (0.00 sec)

If any are found, check their status in workqueue as follows (BLAH stands for the workflow name):

cmst1@vocms0310:/data/srv/wmagent/current $ python getGQByWorkflow.py BLAH

Subscriptions not finished

SQL> select distinct wmbs_workflow.name AS wfName
   FROM wmbs_subscription
   INNER JOIN wmbs_fileset ON wmbs_subscription.fileset = wmbs_fileset.id
   INNER JOIN wmbs_workflow ON wmbs_workflow.id = wmbs_subscription.workflow
   where wmbs_subscription.finished = 0 ORDER BY wmbs_workflow.name;
Empty set (0.00 sec)

Files available in WMBS (waiting for job creation)

MariaDB [wmagent]> select subscription,count(*) from wmbs_sub_files_available group by subscription;
Empty set (0.04 sec)

Checking workflows with files still available:

SQL> SELECT wmbs_workflow.name, count(wmbs_sub_files_available.subscription), count(wmbs_sub_files_available.fileid)
  FROM wmbs_sub_files_available
  INNER JOIN wmbs_subscription ON wmbs_sub_files_available.subscription = wmbs_subscription.id
  INNER JOIN wmbs_workflow ON wmbs_subscription.workflow = wmbs_workflow.id
  GROUP BY wmbs_workflow.name;
Empty set (0.03 sec)

Files acquired in WMBS (waiting for jobs to finish)

MariaDB [wmagent]>  select subscription,count(*) from wmbs_sub_files_acquired group by subscription;
Empty set (0.02 sec)

Files and Blocks in PhEDEx and DBS

Blocks open in DBS

MariaDB [wmagent]> SELECT * FROM dbsbuffer_block WHERE status!='Closed';
Empty set (0.07 sec)

Files not uploaded to DBS

MariaDB [wmagent]> SELECT * from dbsbuffer_file where status = 'NOTUPLOADED';
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------+--------------+----------+-------------+-----------+----------+----------------------+
| id     | lfn                                                                                                                                                                                                                     | filesize   | events | dataset_algo | block_id | status      | in_phedex | workflow | LastModificationDate |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------+--------------+----------+-------------+-----------+----------+----------------------+
| 413446 | /store/mc/RunIISpring16MiniAODv2/ChargedHiggs_HplusTB_HplusToTauNu_M-750_13TeV_amcatnlo_pythia8/MINIAODSIM/PUSpring16RAWAODSIM_reHLT_80X_mcRun2_asymptotic_v14-v1/20000/E69C40FB-374E-E611-9A0E-00221982C606.root       |  268136324 |   6248 |         3838 |     NULL | NOTUPLOADED |         0 |   103521 |                 NULL |
| 413457 | /store/mc/RunIISpring16MiniAODv2/ChargedHiggs_HplusTB_HplusToTauNu_M-750_13TeV_amcatnlo_pythia8/MINIAODSIM/PUSpring16RAWAODSIM_reHLT_80X_mcRun2_asymptotic_v14-v1/20000/168D81F2-2F4E-E611-A471-001EC9B22B53.root       |  336488295 |   7904 |         3838 |     NULL | NOTUPLOADED |         0 |   103521 |                 NULL |
...
| 483734 | /store/mc/RunIISpring16DR80/Graviton2PMqqbarToZGTo2LG_width-0p056_M-2000_13TeV-JHUgen/RAWAODSIM/PUSpring16RAWAODSIM_80X_mcRun2_asymptotic_2016_v3-v1/20000/12142850-D652-E611-8CB0-6C3BE5B5C0C0.root                    | 3512527757 |   4770 |         3915 |     NULL | NOTUPLOADED |         0 |   133494 |                 NULL |
| 484470 | /store/mc/RunIISpring16DR80/Graviton2PMqqbarToZGTo2LG_width-0p056_M-2000_13TeV-JHUgen/RAWAODSIM/PUSpring16RAWAODSIM_80X_mcRun2_asymptotic_2016_v3-v1/20000/F448786E-EC52-E611-9D3F-B499BAAB427C.root                    | 4210398553 |   5724 |         3915 |     NULL | NOTUPLOADED |         0 |   133494 |                 NULL |
| 484471 | /store/mc/RunIISpring16DR80/Graviton2PMqqbarToZGTo2LG_width-0p056_M-2000_13TeV-JHUgen/RAWAODSIM/PUSpring16RAWAODSIM_80X_mcRun2_asymptotic_2016_v3-v1/20000/70E3C668-EC52-E611-87E2-6C3BE5B59210.root                    | 2110141483 |   2862 |         3915 |     NULL | NOTUPLOADED |         0 |   133494 |                 NULL |
+--------+-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+--------+--------------+----------+-------------+-----------+----------+----------------------+
362 rows in set (0.37 sec)

Trying to figure out what happened to those NOTUPLOADED files in dbsbuffer_file:

SQL> SELECT dbsbuffer_workflow.name, count(dbsbuffer_file.lfn)
FROM dbsbuffer_file
INNER JOIN dbsbuffer_workflow ON dbsbuffer_workflow.id = dbsbuffer_file.workflow
WHERE status = 'NOTUPLOADED'
GROUP BY dbsbuffer_workflow.name;

This gives us a table with each workflow name and the number of its files pending DBS injection.
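
To try the grouping query away from the agent, it can be run against a throwaway database. A minimal sketch, assuming an in-memory SQLite stand-in that models only the columns the query touches. The workflow names, LFNs and the 'OTHER' status are made up, and the ORDER BY is added here only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite stand-in for the agent database (assumption: only the
# columns used by the query are modelled; all rows are made up).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dbsbuffer_workflow (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dbsbuffer_file (id INTEGER PRIMARY KEY, lfn TEXT,
                                 status TEXT, workflow INTEGER);
    INSERT INTO dbsbuffer_workflow VALUES (1, 'workflow_a'), (2, 'workflow_b');
    INSERT INTO dbsbuffer_file VALUES
        (10, '/store/example/a1.root', 'NOTUPLOADED', 1),
        (11, '/store/example/a2.root', 'NOTUPLOADED', 1),
        (12, '/store/example/b1.root', 'NOTUPLOADED', 2),
        (13, '/store/example/b2.root', 'OTHER', 2);
""")

# Same join and grouping as the query above.
QUERY = """
    SELECT dbsbuffer_workflow.name, count(dbsbuffer_file.lfn)
    FROM dbsbuffer_file
    INNER JOIN dbsbuffer_workflow
        ON dbsbuffer_workflow.id = dbsbuffer_file.workflow
    WHERE status = 'NOTUPLOADED'
    GROUP BY dbsbuffer_workflow.name
    ORDER BY dbsbuffer_workflow.name
"""
pending = dict(conn.execute(QUERY).fetchall())
for name, n_files in pending.items():
    print(f"{name}: {n_files} file(s) pending DBS injection")
```

The file with a status other than NOTUPLOADED is not counted, so workflow_b reports one pending file and workflow_a two.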

Then from there, we get their status from ReqMgr2, e.g.:

amaltaro@vocms049:/data/amaltaro/ReqMgr/Sept2016_Patches $ python /data/amaltaro/WmAgentScripts/RelVal/testbed/getrequeststatus.py alan
pdmvserv_HIG-RunIISummer15wmLHEGS-00342_00124_v0__160715_124648_6535 rejected-archived
pdmvserv_HIG-RunIISummer15wmLHEGS-00345_00124_v0__160715_124807_9406 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03148__v1_T_160714_125341_5358 normal-archived
pdmvserv_task_EXO-RunIISpring16DR80-03433__v1_T_160719_130400_7521 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03435__v1_T_160719_130310_8375 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03441__v1_T_160719_130445_1031 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03456__v1_T_160719_131411_2281 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03469__v1_T_160719_132434_4829 normal-archived
pdmvserv_task_EXO-RunIISpring16DR80-03474__v1_T_160719_132617_1378 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03478__v1_T_160719_132755_1283 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03480__v1_T_160719_132840_1493 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03484__v1_T_160719_133548_2600 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03500__v1_T_160719_134427_3795 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03504__v1_T_160719_134746_5617 normal-archived
pdmvserv_task_EXO-RunIISpring16DR80-03516__v1_T_160719_140341_8564 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03518__v1_T_160719_140418_959 normal-archived
pdmvserv_task_EXO-RunIISpring16DR80-03520__v1_T_160719_140514_6728 normal-archived
pdmvserv_task_EXO-RunIISpring16reHLT80-01920__v1_T_160716_210729_9891 rejected-archived
pdmvserv_task_EXO-RunIISpring16reHLT80-02047__v1_T_160722_125258_8063 rejected-archived
pdmvserv_task_HIG-RunIISpring16DR80-01493__v1_T_160715_162001_8367 rejected-archived
pdmvserv_task_HIG-RunIISpring16DR80-01500__v1_T_160714_174637_1365 normal-archived
prozober_ACDC_task_EXO-RunIISpring16DR80-03474__v1_T_160725_174205_9550 rejected-archived
prozober_task_EXO-RunIISpring16reHLT80-00397__v1_T_160720_160628_5528 normal-archived
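
With this many workflows, a quick tally helps to see whether any of them is still in a non-archived state. A minimal sketch, assuming the script output is a whitespace-separated sequence of workflow name and status pairs as above. Only three of the pairs are copied here, and parse_statuses is an illustrative helper, not part of WmAgentScripts.

```python
from collections import Counter

# A few "workflow status" pairs copied from the output above.
output = """\
pdmvserv_HIG-RunIISummer15wmLHEGS-00342_00124_v0__160715_124648_6535 rejected-archived
pdmvserv_task_EXO-RunIISpring16DR80-03148__v1_T_160714_125341_5358 normal-archived
pdmvserv_task_EXO-RunIISpring16DR80-03433__v1_T_160719_130400_7521 rejected-archived
"""


def parse_statuses(text):
    """Map workflow name -> status from whitespace-separated pairs."""
    tokens = text.split()
    if len(tokens) % 2:
        raise ValueError("expected an even number of tokens (name, status)")
    return dict(zip(tokens[0::2], tokens[1::2]))


statuses = parse_statuses(output)
tally = Counter(statuses.values())
# Workflows whose status does not end in '-archived' deserve a closer look.
not_archived = [wf for wf, st in statuses.items()
                if not st.endswith("-archived")]
print(tally, not_archived)
```

Splitting on any whitespace means the parser works whether the pairs arrive one per line or all on a single line.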

Files not injected in Phedex, with parent block id (can be recovered)

SQL> SELECT * FROM dbsbuffer_file
WHERE in_phedex=0
AND block_id IS NOT NULL
AND lfn NOT LIKE '%unmerged%'
AND lfn NOT LIKE 'MCFakeFile%'
AND lfn NOT LIKE '%BACKFILL%'
AND lfn NOT LIKE '/store/user%';
Empty set (0.88 sec)

Files not in PhEDEx without parent block id (cannot be recovered). Possibly input files.

SQL> SELECT count(*) FROM dbsbuffer_file
WHERE in_phedex=0
AND block_id IS NULL
AND lfn NOT LIKE '%unmerged%' 
AND lfn NOT LIKE 'MCFakeFile%'
AND lfn NOT LIKE '%BACKFILL%'
AND lfn NOT LIKE '/store/backfill/%'
AND lfn NOT LIKE '/store/user%';
+----------+
| count(*) |
+----------+
|   216924 |
+----------+
1 row in set (0.38 sec)
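
The same filter can be applied to rows already dumped from dbsbuffer_file. A minimal sketch, assuming each row provides lfn, in_phedex and block_id. The function name and the sample LFNs are hypothetical. Matching here is case-sensitive, whereas the database may match LIKE patterns case-insensitively depending on the column collation.

```python
def is_unrecoverable_candidate(lfn, in_phedex, block_id):
    """Mirror the WHERE clause: not in PhEDEx, no block, LFN not excluded."""
    if in_phedex != 0 or block_id is not None:
        return False
    excluded = (
        "unmerged" in lfn                        # NOT LIKE '%unmerged%'
        or lfn.startswith("MCFakeFile")          # NOT LIKE 'MCFakeFile%'
        or "BACKFILL" in lfn                     # NOT LIKE '%BACKFILL%'
        or lfn.startswith("/store/backfill/")    # NOT LIKE '/store/backfill/%'
        or lfn.startswith("/store/user")         # NOT LIKE '/store/user%'
    )
    return not excluded


# Made-up rows: (lfn, in_phedex, block_id); None stands for SQL NULL.
rows = [
    ("/store/mc/Example/file.root", 0, None),
    ("/store/unmerged/Example/file.root", 0, None),
    ("/store/mc/Example/other.root", 1, None),
]
selected = [lfn for lfn, in_phedex, block_id in rows
            if is_unrecoverable_candidate(lfn, in_phedex, block_id)]
print(len(selected), selected)
```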

So we run the PhEDEx fix script to flag the files that are already in PhEDEx but not marked as such in the buffer:

cmst1@vocms0303:/data/srv/wmagent/current $ curl https://raw.githubusercontent.com/amaltaro/ProductionTools/master/newFixPhEDEx.py > newFixPhedex.py
cmst1@vocms0303:/data/srv/wmagent/current $ python newFixPhedex.py
Shutting down PhEDExInjector...
Checking 4 dataset in both PhEDEx and DBS ...
100/6024 files processed
...
6000/6024 files processed
Found 17623 out of 17623 files that are already registered in PhEDEx            but buffer doesn't know
Fixing them now, it may take several minutes ...
Rows were successfully updated! Good job!
Starting PhEDExInjector now ...

started with pid 2081529

And we check afterwards

SQL> SELECT lfn FROM dbsbuffer_file
WHERE in_phedex=0
AND block_id IS NULL
AND lfn NOT LIKE '%unmerged%' 
AND lfn NOT LIKE 'MCFakeFile%'
AND lfn NOT LIKE '%BACKFILL%'
AND lfn NOT LIKE '/store/backfill/%'
AND lfn NOT LIKE '/store/user%';
no rows selected

Agent is READY for redeployment.
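
The verdict can be summarised from the individual checks. A minimal sketch, assuming each check is reduced to the number of rows (or jobs) it returned, with zero meaning clean. The check names and the failing_checks helper are illustrative. The NOTUPLOADED files are left out because this page declares the agent ready despite them; whether that is acceptable depends on the status of the workflows involved.

```python
# Hypothetical summary of the checks on this page: each value is the number
# of rows (or jobs) the check returned; zero means that check is clean.
checks = {
    "jobs in condor": 0,
    "workflows not fully injected": 0,
    "subscriptions not finished": 0,
    "files available in WMBS": 0,
    "files acquired in WMBS": 0,
    "blocks open in DBS": 0,
    "files not in PhEDEx": 0,
}


def failing_checks(results):
    """Return the names of checks that still report leftovers."""
    return sorted(name for name, count in results.items() if count != 0)


failing = failing_checks(checks)
print("READY for redeployment" if not failing else f"NOT ready: {failing}")
```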

Topic revision: r31 - 2016-09-20 - AlanMalta