cmsgwms-submit2

Agent tweaks

UPDATE wmbs_location SET state=(SELECT id from wmbs_location_state where name='Normal') WHERE state!=(SELECT id from wmbs_location_state where name='Normal');
UPDATE wmbs_location SET running_slots=2000, pending_slots=1000;
UPDATE rc_threshold SET max_slots=2000, pending_slots=1000;
  • Set maxRetries to 0 ==> OK
  • Run PhEDExFix ==> NOPE

Jobs in Condor

[cmsdataops@cmsgwms-submit2 current]$ condor_q
[cmsdataops@cmsgwms-submit2 current]$ 

Jobs grouped by status

SQL> select wmbs_job_state.name, count(*)
from wmbs_job
join wmbs_job_state on (wmbs_job.state = wmbs_job_state.id)
group by wmbs_job.state, wmbs_job_state.name;
+----------+----------+
| name     | count(*) |
+----------+----------+
| cleanout |   157897 |
+----------+----------+
1 row in set (0.06 sec)
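
The check behind this query is simply "is any job left in a non-final state?". A minimal sketch of that test, assuming the rows of the GROUP BY query are available as (state name, count) pairs. The set of final states is an assumption of this sketch: only cleanout is confirmed by the output above.

```python
# Hypothetical set of job states considered final for a drained agent;
# only 'cleanout' appears in the query output above.
FINAL_STATES = {"cleanout"}

def pending_job_states(state_counts, final_states=FINAL_STATES):
    """Return {state: count} for every non-final state that still has jobs."""
    return {name: n for name, n in state_counts
            if name not in final_states and n > 0}

rows = [("cleanout", 157897)]  # rows as returned by the GROUP BY query
leftover = pending_job_states(rows)
print("drained" if not leftover else "still active: %s" % leftover)
```

With the single cleanout row shown above, nothing is left over and the sketch reports the agent as drained.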

Workflows in the System

MariaDB [wmagent]> SELECT DISTINCT name from wmbs_workflow;
+---------------------------------------------------------------------------------+
| name                                                                            |
+---------------------------------------------------------------------------------+
areinsvo_SUS-RunIISpring16MiniAODv2-Backfill-00012_00005_v0__160607_211355_6889		completed
fabozzi_HIRun2015-HIMinimumBias5-02May2016_758p4_160502_172625_4322		completed
fabozzi_Run2015B-DoubleEG-08Jun2016_765p1_160608_195715_2425		completed
fabozzi_Run2015D-DoubleEG-08Jun2016_765p1_160608_195144_7236		completed
fabozzi_Run2016B-2-MuonEG-01Jul2016_8013p1_160701_191318_9475		completed
fabozzi_Run2016B-2-SingleElectron-01Jul2016_8013p1_160701_190809_6571		completed
fabozzi_Run2016B-2-SinglePhoton-01Jul2016_8013p1_160701_191447_6065		completed
pdmvserv_task_BTV-RunIISpring16DR80-00038__v1_T_160516_093427_2240		normal-archived
pdmvserv_task_TOP-RunIISpring16DR80-00043__v1_T_160611_010715_7552		rejected-archived
pdmvserv_TOP-RunIISummer15wmLHEGS-00009_00068_v0__160609_042535_4109		aborted-archived
pdmvserv_TSG-RunIISpring16MiniAODv1-00007_00087_v0__160705_090408_8319		normal-archived
prozober_HIG-RunIISummer15wmLHEGS-00230_00059_v0__160615_164923_3825		rejected-archived
+---------------------------------------------------------------------------------+
12 rows in set (0.00 sec)

What matters is that ALL of them are at least in completed status.

NONE

Workflows not fully injected

MariaDB [wmagent]> select distinct name from wmbs_workflow where injected = 0;  
Empty set (0.00 sec)

Their status in workqueue is as follows:

cmst1@vocms0310:/data/srv/wmagent/current $ python getGQByWorkflow.py BLAH

Subscriptions not finished

SQL> select distinct wmbs_workflow.name AS wfName
   FROM wmbs_subscription
   INNER JOIN wmbs_fileset ON wmbs_subscription.fileset = wmbs_fileset.id
   INNER JOIN wmbs_workflow ON wmbs_workflow.id = wmbs_subscription.workflow
   where wmbs_subscription.finished = 0 ORDER BY wmbs_workflow.name;
Empty set (0.01 sec)

All of them are either aborted or announced (mostly archived), so let's switch their subscriptions to finished:

MariaDB [wmagent]> UPDATE wmbs_subscription SET finished=1 WHERE finished=0;
Query OK, 88 rows affected (0.18 sec)
Rows matched: 88  Changed: 88  Warnings: 0
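
The UPDATE above closes every open subscription in one go. A more guarded variant would only close subscriptions of workflows known to be in a terminal status. The sketch below is an illustration, not part of the agent tooling: it uses Python's sqlite3 API as a stand-in for the MariaDB connection, the workflow-to-status mapping is assumed to come from outside WMBS (as in the annotated listings above), and treating exactly these four statuses as terminal is an assumption.

```python
import sqlite3

# Statuses seen in the listings above; treating exactly these as terminal
# is an assumption of this sketch.
TERMINAL = {"completed", "normal-archived", "rejected-archived", "aborted-archived"}

def finish_subscriptions(conn, status_by_workflow):
    """Mark open subscriptions finished, but only for terminal workflows.

    status_by_workflow maps workflow name -> request status.
    Returns the number of rows changed.
    """
    done = [wf for wf, st in status_by_workflow.items() if st in TERMINAL]
    if not done:
        return 0
    marks = ",".join("?" * len(done))  # one placeholder per workflow name
    with conn:  # single transaction: commit on success, roll back on error
        cur = conn.execute(
            "UPDATE wmbs_subscription SET finished = 1 "
            "WHERE finished = 0 AND workflow IN "
            "(SELECT id FROM wmbs_workflow WHERE name IN (%s))" % marks,
            done,
        )
    return cur.rowcount
```

The returned row count can be compared with the number of open subscriptions found beforehand, which gives the same sanity check as the "Rows matched" line above.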

Files available in WMBS (waiting for job creation)

MariaDB [wmagent]> select subscription,count(*) from wmbs_sub_files_available group by subscription;
+--------------+----------+
| subscription | count(*) |
+--------------+----------+
|          431 |        1 |
|          436 |       53 |
|          437 |      267 |
|          438 |      420 |
|          439 |      437 |
|          440 |      446 |
|          442 |       35 |
|          443 |      446 |
|          444 |     1830 |
|        25511 |     4602 |
|        25512 |      912 |
|        33634 |      125 |
|        33639 |       81 |
|        33640 |      115 |
|        33641 |      265 |
|        33642 |      269 |
|        33643 |      114 |
|        33645 |       15 |
|        33646 |      114 |
|        33647 |     4416 |
|        33648 |     4655 |
+--------------+----------+
21 rows in set (0.01 sec)

Checking workflows with files still available:

SQL> SELECT wmbs_workflow.name, count(wmbs_sub_files_available.subscription), count(wmbs_sub_files_available.fileid)
  FROM wmbs_sub_files_available
  INNER JOIN wmbs_subscription ON wmbs_sub_files_available.subscription = wmbs_subscription.id
  INNER JOIN wmbs_workflow ON wmbs_subscription.workflow = wmbs_workflow.id
  GROUP BY wmbs_workflow.name;
+----------------------------------------------------------------------+----------------------------------------------+----------------------------------------+
| name                                                                 | count(wmbs_sub_files_available.subscription) | count(wmbs_sub_files_available.fileid) |
+----------------------------------------------------------------------+----------------------------------------------+----------------------------------------+
pdmvserv_task_BTV-RunIISpring16DR80-00038__v1_T_160516_093427_2240		normal-archived
pdmvserv_task_TOP-RunIISpring16DR80-00043__v1_T_160611_010715_7552		rejected-archived
pdmvserv_TOP-RunIISummer15wmLHEGS-00009_00068_v0__160609_042535_4109		aborted-archived
+----------------------------------------------------------------------+----------------------------------------------+----------------------------------------+
3 rows in set (0.01 sec)

All of these workflows are archived, which means we can remove these files from this table:

MariaDB [wmagent]> DELETE FROM wmbs_sub_files_available;
Query OK, 117 rows affected (0.10 sec)

Files acquired in WMBS (waiting for job to finish)

MariaDB [wmagent]> select subscription,count(*) from wmbs_sub_files_acquired group by subscription;
+--------------+----------+
| subscription | count(*) |
+--------------+----------+
|          432 |        1 |
|        25558 |        1 |
|        25561 |        1 |
|        25889 |        1 |
|        25892 |        1 |
|        26065 |        1 |
|        26270 |        1 |
|        27197 |        1 |
|        27265 |        1 |
|        27424 |        1 |
|        27469 |        1 |
|        27488 |        1 |
|        27563 |        1 |
|        33633 |        7 |
|        34629 |      109 |
+--------------+----------+
15 rows in set (0.00 sec)

Checking workflows with files still acquired:

SQL> SELECT wmbs_workflow.name, count(wmbs_sub_files_acquired.subscription), count(wmbs_sub_files_acquired.fileid)
  FROM wmbs_sub_files_acquired
  INNER JOIN wmbs_subscription ON wmbs_sub_files_acquired.subscription = wmbs_subscription.id
  INNER JOIN wmbs_workflow ON wmbs_subscription.workflow = wmbs_workflow.id
  GROUP BY wmbs_workflow.name;
+----------------------------------------------------------------------+---------------------------------------------+---------------------------------------+
| name                                                                 | count(wmbs_sub_files_acquired.subscription) | count(wmbs_sub_files_acquired.fileid) |
+----------------------------------------------------------------------+---------------------------------------------+---------------------------------------+
pdmvserv_task_BTV-RunIISpring16DR80-00038__v1_T_160516_093427_2240		normal-archived
pdmvserv_task_TOP-RunIISpring16DR80-00043__v1_T_160611_010715_7552		rejected-archived
pdmvserv_TOP-RunIISummer15wmLHEGS-00009_00068_v0__160609_042535_4109		aborted-archived
prozober_HIG-RunIISummer15wmLHEGS-00230_00059_v0__160615_164923_3825		rejected-archived
+----------------------------------------------------------------------+---------------------------------------------+---------------------------------------+
4 rows in set (0.00 sec)

All of these workflows are archived, which means we can remove these files from this table:

MariaDB [wmagent]> DELETE FROM wmbs_sub_files_acquired;
Query OK, 117 rows affected (0.10 sec)
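
Both DELETE statements above empty the whole table. If the agent may still hold live work, the cleanup can be limited to rows whose subscription is already finished. A minimal sketch under that assumption, again using sqlite3 as a stand-in for the MariaDB connection; the function name and the whitelist are hypothetical, the table and column names are the ones used in the queries above.

```python
import sqlite3

# Only these two tables are cleaned in this procedure; the whitelist avoids
# interpolating an arbitrary table name into the SQL string.
FILE_TABLES = {"wmbs_sub_files_available", "wmbs_sub_files_acquired"}

def purge_finished(conn, table):
    """Delete file rows that belong to finished subscriptions only.

    Returns the number of rows deleted.
    """
    if table not in FILE_TABLES:
        raise ValueError("unexpected table: %s" % table)
    with conn:  # single transaction: commit on success, roll back on error
        cur = conn.execute(
            "DELETE FROM %s WHERE subscription IN "
            "(SELECT id FROM wmbs_subscription WHERE finished = 1)" % table
        )
    return cur.rowcount
```

Rows of subscriptions that are still open stay in place, so a later look at the table shows exactly what was not safe to remove.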

Files and Blocks in PhEDEx and DBS

Blocks open in DBS

MariaDB [wmagent]> SELECT * FROM dbsbuffer_block WHERE status!='Closed';
Empty set (0.07 sec)

Files not uploaded to DBS

MariaDB [wmagent]> SELECT * from dbsbuffer_file where status = 'NOTUPLOADED';
Empty set (0.87 sec)

Files not injected in PhEDEx, with parent block id (can be recovered)

SQL> SELECT * FROM dbsbuffer_file
WHERE in_phedex=0
AND block_id IS NOT NULL
AND lfn NOT LIKE '%unmerged%'
AND lfn NOT LIKE 'MCFakeFile%'
AND lfn NOT LIKE '%BACKFILL%'
AND lfn NOT LIKE '/store/user%';
Empty set (0.88 sec)

Files not in PhEDEx, without parent block id (cannot be recovered). Possibly input files.

SQL> SELECT count(*) FROM dbsbuffer_file
WHERE in_phedex=0
AND block_id IS NULL
AND lfn NOT LIKE '%unmerged%' 
AND lfn NOT LIKE 'MCFakeFile%'
AND lfn NOT LIKE '%BACKFILL%'
AND lfn NOT LIKE '/store/backfill/%'
AND lfn NOT LIKE '/store/user%';
+----------+
| count(*) |
+----------+
|   232104 |
+----------+
1 row in set (0.47 sec)

So we run the newFixPhEDEx.py script to update the files that the buffer still marks as not in PhEDEx:

cmst1@vocms0303:/data/srv/wmagent/current $ curl https://raw.githubusercontent.com/amaltaro/ProductionTools/master/newFixPhEDEx.py > newFixPhedex.py
cmst1@vocms0303:/data/srv/wmagent/current $ python newFixPhedex.py
Shutting down PhEDExInjector...
Checking 4 dataset in both PhEDEx and DBS ...
100/6024 files processed
...
6000/6024 files processed
Found 17623 out of 17623 files that are already registered in PhEDEx            but buffer doesn't know
Fixing them now, it may take several minutes ...
Rows were successfully updated! Good job!
Starting PhEDExInjector now ...

started with pid 2081529
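
Judging only from the output above, the fix amounts to comparing the LFNs the buffer marks with in_phedex=0 against the LFNs PhEDEx already knows, and flipping the flag for the overlap. The sketch below illustrates that idea. It is not the code of newFixPhEDEx.py: both function names and the batch size are hypothetical.

```python
def files_to_flag(buffer_missing, phedex_known):
    """LFNs the buffer marks in_phedex=0 although PhEDEx already has them."""
    return sorted(set(buffer_missing) & set(phedex_known))

def update_statements(lfns, chunk=500):
    """Yield (sql, params) pairs that set in_phedex=1 in small batches."""
    for i in range(0, len(lfns), chunk):
        part = lfns[i:i + chunk]
        marks = ",".join("?" * len(part))  # one placeholder per LFN
        yield ("UPDATE dbsbuffer_file SET in_phedex = 1 "
               "WHERE lfn IN (%s)" % marks, part)
```

Batching keeps each statement short, which matters when tens of thousands of files need the flag changed, as in the run above.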

And we check again afterwards:

SQL> SELECT lfn FROM dbsbuffer_file
WHERE in_phedex=0
AND block_id IS NULL
AND lfn NOT LIKE '%unmerged%' 
AND lfn NOT LIKE 'MCFakeFile%'
AND lfn NOT LIKE '%BACKFILL%'
AND lfn NOT LIKE '/store/backfill/%'
AND lfn NOT LIKE '/store/user%';
no rows selected
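
The same filter can be applied outside the database, for example on a dumped list of rows. A minimal sketch mirroring the WHERE clause above; the function names are hypothetical. Note one assumption: the string tests here are case-sensitive, while the behaviour of LIKE depends on the column collation.

```python
def is_candidate(lfn):
    """Mirror the NOT LIKE filters of the query above (case-sensitive)."""
    return not (
        "unmerged" in lfn                      # NOT LIKE '%unmerged%'
        or lfn.startswith("MCFakeFile")        # NOT LIKE 'MCFakeFile%'
        or "BACKFILL" in lfn                   # NOT LIKE '%BACKFILL%'
        or lfn.startswith("/store/backfill/")  # NOT LIKE '/store/backfill/%'
        or lfn.startswith("/store/user")       # NOT LIKE '/store/user%'
    )

def select_candidates(rows):
    """rows: iterable of (lfn, in_phedex, block_id) tuples.

    Returns the LFNs with in_phedex = 0, no block id, and a non-excluded path.
    """
    return [lfn for lfn, in_phedex, block_id in rows
            if in_phedex == 0 and block_id is None and is_candidate(lfn)]
```

An empty result from select_candidates corresponds to the "no rows selected" answer above.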

Topic revision: r15 - 2016-08-22 - AlanMalta