Miscellaneous Tasks
Closing Blocks
To close blocks, we can use the phedex::Web::API::Inject API. We are not injecting blocks or files; we only use the API to set the desired state of the block, which in this case is closed.
Step 1
Generate an XML file listing the blocks to close. The file should contain the XML structure describing the data to be closed.
- See the requirements for the XML file in phedex::Web::API::Inject
- The XML should look like:
- Note: set the attribute is-open to "n" for all the blocks
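A minimal sketch of generating such a file in Python with the standard library follows. The element and attribute names (data, dbs, dataset, block, is-open, dls) and the version value are assumptions based on our reading of the injection XML format, so check them against phedex::Web::API::Inject before use. The dataset name is derived by stripping the #suffix from each block name; dataset_open is a hypothetical parameter that should match the dataset's current state.

```python
import xml.etree.ElementTree as ET


def build_close_xml(block_names, dbs_url, dataset_open="y"):
    """Build injection-style XML that marks every listed block as closed.

    Element/attribute names are assumed from the Inject XML format.
    dataset_open (hypothetical) should match the dataset's current state.
    """
    data = ET.Element("data", version="2.0")
    dbs = ET.SubElement(data, "dbs", name=dbs_url, dls="dbs")
    datasets = {}
    for block in block_names:
        # A block name is "<dataset>#<suffix>", so the dataset is the prefix.
        ds_name = block.split("#", 1)[0]
        if ds_name not in datasets:
            datasets[ds_name] = ET.SubElement(
                dbs, "dataset", {"name": ds_name, "is-open": dataset_open}
            )
        # is-open="n" requests the closed state for this block.
        ET.SubElement(datasets[ds_name], "block", {"name": block, "is-open": "n"})
    return ET.tostring(data, encoding="unicode")
```

For example, build_close_xml(["/Prim/Proc-v1/RAW#0001"], dbs_url) returns a document with one dataset element containing one closed block; write the string to a .xml file for Step 2.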
Step 2
Use the PhEDEx datasvc inject API to close the blocks.
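As a rough sketch, the inject call can be made as an HTTPS POST to the data service. The endpoint URL, the node and data parameter names, and certificate-based authentication are assumptions here; confirm them in the phedex::Web::API::Inject documentation. build_inject_request only prepares the request, so it can be inspected before send is called.

```python
import ssl
import urllib.parse
import urllib.request

# Assumed endpoint (prod instance, XML output format); verify before use.
INJECT_URL = "https://cmsweb.cern.ch/phedex/datasvc/xml/prod/inject"


def build_inject_request(node, xml_text, url=INJECT_URL):
    """Prepare a POST carrying the node name and the XML from Step 1."""
    body = urllib.parse.urlencode({"node": node, "data": xml_text}).encode()
    return urllib.request.Request(url, data=body, method="POST")


def send(request, certfile, keyfile):
    """Send the request using a grid certificate/key pair (assumed auth)."""
    ctx = ssl.create_default_context()
    ctx.load_cert_chain(certfile, keyfile)
    with urllib.request.urlopen(request, context=ctx) as resp:
        return resp.read().decode()
```

Keeping request construction separate from sending makes it easy to dry-run the call and review the payload before anything is changed.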
List of LFNs by Site
Sometimes, for accounting or other purposes, we want a list of all files at a given site. We can get it either via the PhEDEx APIs or by querying TMDB directly. Both methods are described here.
- Via PhEDEx APIs
  - Get all blocks at the site
  - Get all files and their replicas
- Via TMDB
  - We use PHEDEX::Core::SQL::getSiteReplicasByName
  - https://github.com/dmwm/PHEDEX/blob/master/perl_lib/PHEDEX/Core/SQL.pm#L572-L589
  - If you look at the query, you will see that it joins on either t_dps_block_replica or t_xfer_replica. Replicas of files that are active for transfer are recorded in t_xfer_replica, while files in inactive blocks (no transfers) are not; only their blocks are recorded in t_dps_block_replica. You therefore need to query both and combine the results: for blocks not in transfer, PhEDEx does not track the location of each individual file, only the location of the block.
  - Run the getSiteReplicasByName script
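To illustrate why both tables are needed, here is a small sqlite3 sketch over a simplified, hypothetical subset of the TMDB schema (tables and columns trimmed to what the example needs). It is not the query in SQL.pm: it treats every file of an inactive block replica as present at the site, takes the files of active blocks from t_xfer_replica, and unions the two.

```python
import sqlite3

# Simplified, hypothetical subset of the TMDB schema (columns trimmed).
SCHEMA = """
CREATE TABLE t_adm_node (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t_dps_block (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE t_dps_file (id INTEGER PRIMARY KEY, inblock INTEGER, logical_name TEXT);
CREATE TABLE t_dps_block_replica (block INTEGER, node INTEGER, is_active TEXT);
CREATE TABLE t_xfer_replica (fileid INTEGER, node INTEGER);
"""

QUERY = """
SELECT f.logical_name
  FROM t_dps_block_replica br
  JOIN t_dps_file f ON f.inblock = br.block
  JOIN t_adm_node n ON n.id = br.node
 WHERE n.name = ? AND br.is_active = 'n'
UNION
SELECT f.logical_name
  FROM t_xfer_replica xr
  JOIN t_dps_file f ON f.id = xr.fileid
  JOIN t_adm_node n ON n.id = xr.node
 WHERE n.name = ?
"""


def lfns_at_site(conn, site):
    """Return the sorted LFNs at site.

    First branch: inactive blocks, where only the block location is known,
    so every file of the block is listed. Second branch: active blocks,
    where file-level replicas are recorded in t_xfer_replica.
    """
    rows = conn.execute(QUERY, (site, site)).fetchall()
    return sorted(lfn for (lfn,) in rows)
```

With the real schema, the query from SQL.pm should be used instead; the point of the sketch is only the combination of a block-level and a file-level source.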
List Missing Files Without Replica
The script below helps identify files that are missing and have no replicas in any storage element.
Currently, the script reads an input file that must be named 'StuckDatasets.txt', must be located in the same directory as the script, and must follow this format:
Input File Format
# -- 2015-10-08 06:30
#
#- DDM Partition: AnalysisOps -
#
#------------------------------------
# Rank    TrueSize  DiskSize  nsites  DatasetName
# [days]  [GB]      [GB]
x x x x /<DATASET>
x x x x /<DATASET>
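A minimal Python sketch for reading the dataset names out of this file follows. It assumes that header lines start with '#' and that each data row ends with a token beginning with "/" (the dataset name); it does not claim to reproduce how missingFiles.py itself parses the file.

```python
import re

# A data row ends with a token that starts with "/" (the dataset name).
DATASET_RE = re.compile(r"(/\S+)\s*$")


def read_stuck_datasets(text):
    """Return dataset names from StuckDatasets.txt-style content."""
    datasets = []
    for line in text.splitlines():
        if line.lstrip().startswith("#"):
            continue  # header and comment lines
        match = DATASET_RE.search(line)
        if match:
            datasets.append(match.group(1))
    return datasets
```

To read the real file, pass it the contents of StuckDatasets.txt from the script's own directory. Lines without a "/" token are ignored, which keeps the parser tolerant of stray header text.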
Location
/afs/cern.ch/user/j/jpulidom/public/missingFiles.py
Parameters
- -s Site/Node
- -d Data Tier
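The two options could be handled as in the following sketch. It is a hypothetical illustration, not the code of missingFiles.py: the dest names site and tier are our own, and filter_by_tier assumes the data tier is the last component of a /Primary/Processed/TIER dataset name.

```python
import argparse


def parse_args(argv=None):
    """Parse the two documented options (dest names are our own)."""
    parser = argparse.ArgumentParser(
        description="List missing files without replicas (sketch)")
    parser.add_argument("-s", dest="site", help="site/node name")
    parser.add_argument("-d", dest="tier", help="data tier, e.g. AOD")
    return parser.parse_args(argv)


def filter_by_tier(datasets, tier):
    """Keep datasets whose last path component (the tier) equals tier."""
    if not tier:
        return list(datasets)
    return [ds for ds in datasets if ds.rstrip("/").rsplit("/", 1)[-1] == tier]
```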
Example
Manually Deactivate a Block
Block activation in PhEDEx refers to a combination of multiple operations on TMDB. When a block is activated:
- One entry per logical file is made in table t_xfer_file
- One entry per file replica is made in table t_xfer_replica
- Column is_active is set to 'y' for replicas of the block in t_dps_block_replica
These three operations are performed in succession by the BlockActivate central agent. There is also a BlockDeactivate agent that finds blocks that can be deactivated (active for more than 3 days and in neither the deletion queue nor the activation queue) and performs the reverse of the above operations on them.
For some unknown reason, we sometimes find stray entries in t_xfer_file for blocks where is_active is 'n' for all replicas.
The BlockDeactivate agent does not detect such stray entries, but the BlockActivate agent fails its sanity check, which requires the following equation to hold:
(number of entries made in t_xfer_file) * (number of block replicas) = (number of entries made in t_xfer_replica)
When this happens, the block will neither be activated nor deactivated.
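The sanity check above is simple arithmetic; the hypothetical helper below computes the expected number of t_xfer_replica entries and compares it with the observed one. The counts in the usage note are illustrative, not taken from a real log.

```python
def check_activation(n_xfer_file, n_block_replicas, n_xfer_replica):
    """Return (ok, expected) for the BlockActivate sanity check.

    expected = entries made in t_xfer_file * number of block replicas,
    i.e. one t_xfer_replica entry per file per block replica.
    """
    expected = n_xfer_file * n_block_replicas
    return expected == n_xfer_replica, expected
```

For instance, a block with 10 files and 3 block replicas should produce 30 t_xfer_replica entries, so check_activation(10, 3, 30) returns (True, 30), while check_activation(6, 3, 30) returns (False, 18).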
It is easy to spot the blocks with this problem by looking at the warning messages in the BlockActivate agent log (/data/ProdNodes/Prod_Mgmt/logs/mgmt-blockactiv). There are, however, other symptoms that point to a block activation inconsistency, such as:
- blockreplica and filereplica APIs disagree: while blockreplica reports the replica as incomplete, filereplica shows all files as present at the site.
- Files cannot be invalidated: either the block activation step never completes, or FileDeleteTMDB reports that the file does not exist, even though the filereplicas call says it does.
- Block deletions stay pending: blocks remain in the deletions table (https://cmsweb.cern.ch/phedex/prod/Activity::Deletions) forever.
To free the blocks, we need to deactivate them manually. This operation writes directly into TMDB and must therefore be performed with extreme care.
Step 1
Identify the blocks to deactivate.
Log in to the central agent machine and copy the names of the blocks with inconsistencies from the latest cycle of the BlockActivate agent log:
ssh vocms0214.cern.ch
sed -n '/2018-08-07 14:58:40: BlockActivate\[17445\]: Creating/,/debug/p' /data/ProdNodes/Prod_Mgmt/logs/mgmt-blockactiv > blockactiv.log
# Replace the timestamp with the start of the latest BlockActivate cycle in the log
# Then edit blockactiv.log into a plain list of block names (one per line)
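Instead of editing blockactiv.log by hand, the block names can be pulled out with a regular expression. This Python sketch assumes block names have the usual /Primary/Processed/TIER#suffix shape and appear as whitespace- or punctuation-delimited tokens in the log; the exact warning message format is not documented here, so review the output before using it.

```python
import re

# Assumed block-name shape: /Primary/Processed/TIER#suffix
BLOCK_RE = re.compile(r"/[^/\s#]+/[^/\s#]+/[^/\s#]+#[^\s,;()'\"]+")


def block_names(log_text):
    """Return unique block names in order of first appearance."""
    seen = {}
    for match in BLOCK_RE.finditer(log_text):
        seen.setdefault(match.group(0), None)
    return list(seen)
```

Calling block_names on the contents of blockactiv.log returns each block once, ready to be written out one per line.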
Step 2
Stop the blockactiv and blockdeact agents
ssh phedex@vocms0214.cern.ch
cd /data/ProdNodes
PHEDEX/Utilities/Master -config SITECONF/CH_CERN/PhEDEx/Config.Mgmt.Prod stop mgmt-blockactiv mgmt-blockdeact
Step 3
Deactivate the blocks
sqlplus $(OracleConnectId -db ~/TransferTeam/phedex/DBParam:Prod/OPSIIYAMA) @deactivate_block.sql block_name
where deactivate_block.sql is:
set role phedex_ops-your-account_prod identified by -password-written-in-dbparam-;
-- The block name is the first argument given to sqlplus after the script name
delete from t_xfer_replica where fileid in (select id from t_xfer_file where inblock = (select id from t_dps_block where name = '&1'));
delete from t_xfer_file where inblock = (select id from t_dps_block where name = '&1');
update t_dps_block_replica set is_active = 'n', time_update = ((sysdate - to_date('01-JAN-1970','DD-MON-YYYY')) * (86400)) where block = (select id from t_dps_block where name = '&1');
commit;
quit
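The sqlplus call above processes a single block, while Step 1 usually yields several. As a cautious sketch, the Python below only prints one command per block for review rather than running them; deactivation_commands and render are hypothetical helpers, and connect_id stands for the output of OracleConnectId.

```python
import shlex


def deactivation_commands(connect_id, blocks, script="@deactivate_block.sql"):
    """Build one sqlplus argument list per block (nothing is executed)."""
    return [["sqlplus", connect_id, script, block] for block in blocks]


def render(commands):
    """Join each argument list into a shell-quoted command line.

    shlex.join quotes characters such as '#' that appear in block names.
    """
    return "\n".join(shlex.join(cmd) for cmd in commands)
```

Printing the commands first makes it easy to double-check every block name before anything touches TMDB.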
Step 4
Start the blockactiv and blockdeact agents
ssh phedex@vocms0214.cern.ch
cd /data/ProdNodes
PHEDEX/Utilities/Master -config SITECONF/CH_CERN/PhEDEx/Config.Mgmt.Prod start mgmt-blockactiv mgmt-blockdeact
Check the blockactiv logs to make sure everything worked.
--
JuanPulidoMojica - 2016-05-30