cmsgwms-submit2
Agent tweaks
UPDATE wmbs_location SET state=(SELECT id from wmbs_location_state where name='Normal') WHERE state!=(SELECT id from wmbs_location_state where name='Normal');
UPDATE wmbs_location SET running_slots=2000, pending_slots=1000;
UPDATE rc_threshold SET max_slots=2000, pending_slots=1000;
- Set maxRetries to 0 ==> OK
- Run PhEDExFix ==> NOPE
Jobs in Condor
[cmsdataops@cmsgwms-submit2 current]$ condor_q
[cmsdataops@cmsgwms-submit2 current]$
Jobs grouped by status
SQL> select wmbs_job_state.name, count(*)
from wmbs_job
join wmbs_job_state on (wmbs_job.state = wmbs_job_state.id)
group by wmbs_job.state, wmbs_job_state.name;
+----------+----------+
| name | count(*) |
+----------+----------+
| cleanout | 2542 |
+----------+----------+
1 row in set (0.01 sec)
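The same check can be scripted. The sketch below is only an illustration: it runs the query above against an in-memory SQLite database with invented rows instead of the agent's MariaDB, and the helper name jobs_by_state is hypothetical. It assumes, as this log suggests, that the agent holds no active jobs once nothing is left outside the cleanout state.

```python
import sqlite3

# Toy stand-in for the agent database: only the two tables and columns
# used by the query above; the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wmbs_job_state (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE wmbs_job (id INTEGER PRIMARY KEY, state INTEGER);
    INSERT INTO wmbs_job_state VALUES (1, 'executing'), (2, 'cleanout');
    INSERT INTO wmbs_job (state) VALUES (2), (2), (2);
""")


def jobs_by_state(conn):
    """Return {state name: job count} using the same join and grouping."""
    rows = conn.execute("""
        SELECT wmbs_job_state.name, COUNT(*)
        FROM wmbs_job
        JOIN wmbs_job_state ON wmbs_job.state = wmbs_job_state.id
        GROUP BY wmbs_job.state, wmbs_job_state.name
    """).fetchall()
    return dict(rows)


counts = jobs_by_state(conn)
# Assumption: no state other than 'cleanout' means no job is still active.
only_cleanout = set(counts) <= {"cleanout"}
print(counts, only_cleanout)
```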
Workflows in the System
MariaDB [wmagent]> SELECT DISTINCT name from wmbs_workflow;
+-----------------------------------------------------------------+
| name |
+-----------------------------------------------------------------+
| fabozzi_HIRun2015-HIFlowCorr-25Aug2016_758p5_160826_162420_2962 | completed
+-----------------------------------------------------------------+
1 row in set (0.00 sec)
What matters is that they are ALL at least in completed status. Workflows in an earlier status:
NONE
Workflows not fully injected
MariaDB [wmagent]> select distinct name from wmbs_workflow where injected = 0;
Empty set (0.00 sec)
Their status in workqueue is as follows:
cmst1@vocms0310:/data/srv/wmagent/current $ python getGQByWorkflow.py BLAH
Subscriptions not finished
SQL> select distinct wmbs_workflow.name AS wfName
FROM wmbs_subscription
INNER JOIN wmbs_fileset ON wmbs_subscription.fileset = wmbs_fileset.id
INNER JOIN wmbs_workflow ON wmbs_workflow.id = wmbs_subscription.workflow
where wmbs_subscription.finished = 0 ORDER BY wmbs_workflow.name;
Empty set (0.01 sec)
They are all either aborted or announced (mostly archived), so we mark their subscriptions as finished:
MariaDB [wmagent]> UPDATE wmbs_subscription SET finished=1 WHERE finished=0;
Query OK, 88 rows affected (0.18 sec)
Rows matched: 88 Changed: 88 Warnings: 0
Files available in WMBS (waiting for job creation)
MariaDB [wmagent]> select subscription,count(*) from wmbs_sub_files_available group by subscription;
Empty set (0.00 sec)
Checking workflows with files still available:
SQL> SELECT wmbs_workflow.name, count(wmbs_sub_files_available.subscription), count(wmbs_sub_files_available.fileid)
FROM wmbs_sub_files_available
INNER JOIN wmbs_subscription ON wmbs_sub_files_available.subscription = wmbs_subscription.id
INNER JOIN wmbs_workflow ON wmbs_subscription.workflow = wmbs_workflow.id
GROUP BY wmbs_workflow.name;
Empty set (0.00 sec)
The join returns no workflows, so any rows left in this table are orphaned and can be removed:
MariaDB [wmagent]> DELETE FROM wmbs_sub_files_available;
Query OK, 117 rows affected (0.10 sec)
Files acquired in WMBS (waiting for job to finish)
MariaDB [wmagent]> select subscription,count(*) from wmbs_sub_files_acquired group by subscription;
Empty set (0.00 sec)
Checking workflows with files still acquired:
SQL> SELECT wmbs_workflow.name, count(wmbs_sub_files_acquired.subscription), count(wmbs_sub_files_acquired.fileid)
FROM wmbs_sub_files_acquired
INNER JOIN wmbs_subscription ON wmbs_sub_files_acquired.subscription = wmbs_subscription.id
INNER JOIN wmbs_workflow ON wmbs_subscription.workflow = wmbs_workflow.id
GROUP BY wmbs_workflow.name;
Empty set (0.00 sec)
The join returns no workflows, so any rows left in this table are orphaned and can be removed:
MariaDB [wmagent]> DELETE FROM wmbs_sub_files_acquired;
Query OK, 117 rows affected (0.10 sec)
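The check-then-delete step used for both tables can be sketched as a single guarded helper. This is a hedged illustration, not the procedure actually run here: it uses an in-memory SQLite database with invented rows, and the function name purge_if_orphaned is hypothetical. It deletes only when the join against wmbs_workflow finds nothing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE wmbs_workflow (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE wmbs_subscription (id INTEGER PRIMARY KEY, workflow INTEGER);
    CREATE TABLE wmbs_sub_files_available (subscription INTEGER, fileid INTEGER);
    -- Invented rows: files pointing at subscription 99, which does not exist.
    INSERT INTO wmbs_sub_files_available VALUES (99, 1), (99, 2);
""")

ALLOWED_TABLES = ("wmbs_sub_files_available", "wmbs_sub_files_acquired")


def purge_if_orphaned(conn, table):
    """Empty `table` only if none of its rows maps to a known workflow.

    Returns the number of rows removed (0 if the table was left alone).
    """
    # The table name is interpolated into SQL, so accept known names only.
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unexpected table: {table}")
    (linked,) = conn.execute(f"""
        SELECT COUNT(*) FROM {table}
        INNER JOIN wmbs_subscription
            ON {table}.subscription = wmbs_subscription.id
        INNER JOIN wmbs_workflow
            ON wmbs_subscription.workflow = wmbs_workflow.id
    """).fetchone()
    if linked:
        return 0  # some rows still belong to a workflow: do not touch
    (total,) = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()
    with conn:  # commits on success, rolls back if the DELETE raises
        conn.execute(f"DELETE FROM {table}")
    return total


deleted = purge_if_orphaned(conn, "wmbs_sub_files_available")
remaining = conn.execute(
    "SELECT COUNT(*) FROM wmbs_sub_files_available"
).fetchone()[0]
print(deleted, remaining)
```

Counting the rows before the DELETE avoids relying on the driver's reported row count for an unconditional delete.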
Files and Blocks in PhEDEx and DBS
Blocks open in DBS
MariaDB [wmagent]> SELECT * FROM dbsbuffer_block WHERE status!='Closed';
Empty set (0.07 sec)
Files not uploaded to DBS
MariaDB [wmagent]> SELECT * from dbsbuffer_file where status = 'NOTUPLOADED';
Empty set (0.87 sec)
Files not injected in PhEDEx, with parent block id (can be recovered)
SQL> SELECT * FROM dbsbuffer_file
WHERE in_phedex=0
AND block_id IS NOT NULL
AND lfn NOT LIKE '%unmerged%'
AND lfn NOT LIKE 'MCFakeFile%'
AND lfn NOT LIKE '%BACKFILL%'
AND lfn NOT LIKE '/store/user%';
Empty set (0.88 sec)
Files not in PhEDEx and without parent block id (cannot be recovered); possibly input files
SQL> SELECT count(*) FROM dbsbuffer_file
WHERE in_phedex=0
AND block_id IS NULL
AND lfn NOT LIKE '%unmerged%'
AND lfn NOT LIKE 'MCFakeFile%'
AND lfn NOT LIKE '%BACKFILL%'
AND lfn NOT LIKE '/store/backfill/%'
AND lfn NOT LIKE '/store/user%';
+----------+
| count(*) |
+----------+
| 27798 |
+----------+
1 row in set (0.05 sec)
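The two checks above differ only in whether block_id is NULL, so both counts can be obtained in one pass. The following is a minimal sketch on an in-memory SQLite table with invented LFNs; the labels has_block and no_block are made up for the example, and the filter list is the longer one from the second query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dbsbuffer_file (lfn TEXT, in_phedex INTEGER, block_id INTEGER);
    -- Invented rows covering each case of the filters below.
    INSERT INTO dbsbuffer_file VALUES
        ('/store/mc/EraA/Prim/GEN-SIM/v1/10000/a.root', 0, NULL),
        ('/store/mc/EraA/Prim/GEN-SIM/v1/10000/b.root', 0, 7),
        ('/store/mc/EraA/Prim/GEN-SIM/v1/10000/c.root', 1, 7),
        ('/store/unmerged/EraA/Prim/GEN-SIM/v1/10000/d.root', 0, NULL),
        ('/store/user/someone/e.root', 0, NULL);
""")

rows = conn.execute("""
    SELECT CASE WHEN block_id IS NULL THEN 'no_block' ELSE 'has_block' END
               AS kind,
           COUNT(*)
    FROM dbsbuffer_file
    WHERE in_phedex = 0
      AND lfn NOT LIKE '%unmerged%'
      AND lfn NOT LIKE 'MCFakeFile%'
      AND lfn NOT LIKE '%BACKFILL%'
      AND lfn NOT LIKE '/store/backfill/%'
      AND lfn NOT LIKE '/store/user%'
    GROUP BY kind
""").fetchall()

# has_block: recoverable; no_block: not recoverable, possibly input files.
summary = dict(rows)
print(summary)
```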
So we run the PhEDEx fix script (newFixPhEDEx.py) to update the files that the buffer does not yet know are in PhEDEx:
[cmsdataops@cmsgwms-submit2 current]$ curl https://raw.githubusercontent.com/amaltaro/ProductionTools/master/newFixPhEDEx.py > newFixPhedex.py
[cmsdataops@cmsgwms-submit2 current]$ source /data/srv/wmagent/current/apps/wmagent/etc/profile.d/init.sh
[cmsdataops@cmsgwms-submit2 current]$ python newFixPhedex.py
Shutting down PhEDExInjector...
Checking 326 dataset in both PhEDEx and DBS ...
100/326 files processed
...
6500/6536 files processed
Found 27797 out of 27798 files that are already registered in PhEDEx but buffer doesn't know
Fixing them now, it may take several minutes ...
Rows were successfully updated! Good job!
Starting PhEDExInjector now ...
started with pid 1754861
Done!
And we check again afterwards:
SQL> SELECT lfn FROM dbsbuffer_file
WHERE in_phedex=0
AND block_id IS NULL
AND lfn NOT LIKE '%unmerged%'
AND lfn NOT LIKE 'MCFakeFile%'
AND lfn NOT LIKE '%BACKFILL%'
AND lfn NOT LIKE '/store/backfill/%'
AND lfn NOT LIKE '/store/user%' ORDER BY lfn;
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| lfn |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
| /store/mc/RunIISummer15GS/ST_t-channel_4f_leptonDecays_13TeV-amcatnlo-herwigpp_TuneEE5C/GEN-SIM/MCRUN2_71_V1-v1/10000/DC3086E5-8754-E511-AEC9-002590494C74.root |
+-----------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.02 sec)
This file belongs to the dataset:
/ST_t-channel_4f_leptonDecays_13TeV-amcatnlo-herwigpp_TuneEE5C/RunIISummer15GS-MCRUN2_71_V1-v1/GEN-SIM
and NOT to the workflow still in wmbs_workflow.
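The dataset name was read off the LFN by hand. A small helper can do the same mapping, assuming the LFN layout seen in this example, /store/<type>/<era>/<primary dataset>/<data tier>/<processing version>/<counter>/<file>. The function name lfn_to_dataset is hypothetical, and other LFN layouts are rejected rather than guessed.

```python
def lfn_to_dataset(lfn):
    """Derive /<primary>/<era>-<processing>/<tier> from a /store LFN.

    Assumes exactly the 8-component layout described above.
    """
    parts = lfn.strip("/").split("/")
    if len(parts) != 8 or parts[0] != "store":
        raise ValueError(f"unexpected LFN layout: {lfn}")
    _store, _type, era, primary, tier, processing, _counter, _name = parts
    return f"/{primary}/{era}-{processing}/{tier}"


lfn = (
    "/store/mc/RunIISummer15GS/"
    "ST_t-channel_4f_leptonDecays_13TeV-amcatnlo-herwigpp_TuneEE5C/"
    "GEN-SIM/MCRUN2_71_V1-v1/10000/"
    "DC3086E5-8754-E511-AEC9-002590494C74.root"
)
dataset = lfn_to_dataset(lfn)
print(dataset)
```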
Agent is READY to go.