DBS

API: https://cmsweb.cern.ch/dbs/prod/global/DBSReader/


check dataset status:
https://cmsweb.cern.ch/dbs/prod/global/DBSReader/datasets?detail=1&dataset_access_type=*&dataset=/Upsilon2S_1S/Summer08_IDEAL_V2_Upsilon2S_1S_v1/RECO
more info about dataset status: TODO: add status twiki

check file status:
https://cmsweb.cern.ch/dbs/prod/global/DBSReader/files?detail=1&logical_file_name=/store/data/Run2012B/DoubleMuParked/AOD/HZZ-22Jan2013-v1/20000/12DA0E7E-2CDA-E211-947D-00259073E374.root
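The two URLs above are just REST queries against the DBSReader API. A small Python sketch for building such query URLs (the helper name and the percent-encoding are mine, not part of DBS):

```python
from urllib.parse import urlencode

DBS_READER = "https://cmsweb.cern.ch/dbs/prod/global/DBSReader"

def dbs_url(api, **params):
    # Build a DBSReader query URL for the given API endpoint and parameters.
    # urlencode() percent-encodes '/' and '*', which is equivalent to the
    # literal forms used in the example URLs above.
    return "%s/%s?%s" % (DBS_READER, api, urlencode(params))

# Dataset status query, as in the first example above
print(dbs_url("datasets", detail=1, dataset_access_type="*",
              dataset="/Upsilon2S_1S/Summer08_IDEAL_V2_Upsilon2S_1S_v1/RECO"))
```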

TMDB

API: https://cmsweb.cern.ch/phedex/datasvc/doc

[[https://cmsweb.cern.ch/phedex/datasvc/doc/data][data]]: show data which is registered (injected) to phedex
filereplicas, blockreplicas, missingfiles

script to parse PhEDEx data service:
https://github.com/CMSCompOps/TransferTeam/blob/master/commons/datasvc.py

* get the name and id of all RAW datasets at T1_IT_CNAF_Disk
datasvc.py --service blockreplicas --options "node=T1_IT_CNAF_Disk&dataset=/*/*/RAW" --path /phedex/block:name,id

* get list of missing files for each block in the block_list.txt file
awk '{system("./datasvc.py --service missingfiles --options \"block="$1"\" --path /phedex/block/replica/node")}' block_list.txt
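The same loop can be written in Python; this sketch only builds the command strings (the block name shown is hypothetical; real names come from block_list.txt):

```python
# Equivalent of the awk loop above: build one datasvc.py command per block name.
def missingfiles_commands(blocks):
    cmd = ('./datasvc.py --service missingfiles '
           '--options "block=%s" --path /phedex/block/replica/node')
    return [cmd % b.strip() for b in blocks if b.strip()]

# Hypothetical block name, just for illustration
for command in missingfiles_commands(["/Foo/Bar-v1/RAW#abc123"]):
    print(command)  # or pass to subprocess for execution
```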

check replica of files, blocks, and datasets:
https://github.com/CMSCompOps/TransferTeam/blob/master/commons/checkReplica.py

* get the custodial location for each dataset in the file
awk '{system("python checkReplica.py --option custodial:y --dataset "$1)}' dataset_list.txt

* get all available replicas for the given LFN
python checkReplica.py --lfn /store/data/Run2012A/LP_MinBias2/RECO/PromptReco-v1/000/193/092/80109BFF-8895-E111-B417-5404A63886D6.root

Q: How does PhEDEx use the TFC (storage.xml)?

The PhEDEx FileExport agent checks every minute whether storage.xml has
been updated; if it has changed, it parses storage.xml and uploads the new
rules into the t_xfer_catalogue table in TMDB.
Neither storage.xml itself nor its path is stored directly in TMDB, just the
rules, converted into DB rows with columns NODE, PROTOCOL, PATH_MATCH,
RESULT, etc.

- Activate block:
insert into t_dps_block_activate (block, time_request, time_until) select id, now, now+86400 from t_dps_block where name=':blockname';
awk -v now=`date +%s` -v q="'" '{print "insert into t_dps_block_activate (block, time_request, time_until) select id, "now", "now+86400" from t_dps_block where name="q $1 q";"}' blocklist > blockactivate.sql
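A Python version of the same SQL generation, assuming one block name per line as in the awk one-liner (the helper name and example block are illustrative):

```python
import time

# Python equivalent of the awk one-liner above: one INSERT per block name,
# activating each block for `lifetime` seconds starting now.
def activation_sql(block_names, lifetime=86400):
    now = int(time.time())
    stmt = ("insert into t_dps_block_activate (block, time_request, time_until) "
            "select id, %d, %d from t_dps_block where name='%s';")
    return [stmt % (now, now + lifetime, name.strip())
            for name in block_names if name.strip()]

# Hypothetical block name; the real input is the blocklist file.
for sql in activation_sql(["/Foo/Bar-v1/RAW#abc123"]):
    print(sql)
```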

- Check that manually activated blocks are activated (result should be 0)
select * from t_dps_block_activate where block not in (select inblock from t_xfer_file);

- Open a block
UPDATE t_dps_block SET is_open='y' WHERE id=:blockid;
awk -v q="'" '{print "UPDATE t_dps_block SET is_open="q"y"q" WHERE id=" $1 ";"}' blockidlist > blockopen.sql

then just set is-open="n" in the injection XML to close the block again after injecting files

Q: In order to transfer files from siteX to siteY, does siteY have to have the DN of siteX's admin in their gridmap file?
A: No, because transfers to siteY run with the DN of siteY's admin;
in fact siteY does not even contact the storage of siteX directly,
since the transfers are executed by FTS3 (or FTS2).
The siteY admin submits the transfer to FTS,
and FTS uses the delegated proxy of the siteY admin to talk to the storage of siteY and siteX for the transfer.
So what is needed is:
1) the DN of the destination site admin must be recognized by FTS, the source storage and the destination storage
2) the DN of the destination site admin must have write permission on the destination storage
3) the host certificates of the source and destination storages must be valid for FTS

Gridmap files at each site contain the DNs of everyone in the CMS VO on the grid,
because users need to read files too!
But this doesn't mean that a DN is always explicitly listed in the gridmap file:
for read access there is usually a simple group mapping to a pool account,
something like
vo:cms --> cmsXXX
The details depend on the storage element.

Q: When I add a dataset that already exists in another request for the same site to my replica/deletion request, what is PhEDEx's behavior?
A: In general, if you approve a new transfer request when the data is already at the destination, PhEDEx will simply update the subscription (except for custodiality).


Q: When the WMAgent injects blocks, the replica subscription attribute is set to 'no'.
A: The "inject" API doesn't subscribe replicas by default;
you need to call the "subscribe" API (either directly in the datasvc, or through a website transfer request) to subscribe replicas.
One minor detail:
you cannot set the "subscribed" state of a replica directly;
you create a subscription, and then the central agents change the state of the replica to "subscribed".
So the full answer is:
when WMAgent injects a block, the block replica is not subscribed; you need to call the "subscribe" API to subscribe the replica.


Q: Can the produced and injected sites be different?
A: No, they should always be the same.
If you produce data at one site and then inject at another, you will have a StorageConsistency problem,
so WMAgent should always inject at the production location.
(But in case of mistakes/bugs it can inject somewhere else, and then we need to fix it in PhEDEx; see for example the problem with Disk sites.)
(Note: injection just updates TMDB, so blocks should be injected where they were produced.)


Q: Can we say that a replica with subscription:n is at the site where the block was produced & injected? (Or can the produced and injected sites be different?)
There are three cases of replica with "subscription:n":
1) The source site where the block was produced&injected
2) A replica which is waiting for deletion (because of a deletion request or a move request)
3) Buffer replicas (you cannot have a subscription to Buffer, but you should have a subscription to MSS)
If the replica with "subscription:n" is not in the previous three cases, it is the effect of a bug :)
You can find out the source site (case 1) using the "data" API:

https://cmsweb.cern.ch/phedex/datasvc/doc/data
file:node attribute in the Data API is the original source node, not the replica node
So for T1 Buffer/MSS the situation is this:
1) If the block replica was produced&injected at T1, it should have an unsubscribed replica on Buffer
2) If the block is subscribed to MSS, it should have an unsubscribed replica on Buffer and a subscribed replica on MSS
3) if the block is in deletion, it should have an unsubscribed replica on Buffer and an unsubscribed replica on MSS
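The three legitimate subscription:n cases can be summed up as a small decision function (a sketch; the boolean flags are illustrative, not actual datasvc fields):

```python
# Sketch of the three legitimate subscription:n cases described above.
# The flags are illustrative; in practice they would be derived from the
# datasvc "data" API, pending deletion requests, and the node kind.
def classify_unsubscribed_replica(is_source_site, awaiting_deletion, is_buffer_node):
    if is_source_site:
        return "case 1: source site where the block was produced & injected"
    if awaiting_deletion:
        return "case 2: replica waiting for deletion (deletion or move request)"
    if is_buffer_node:
        return "case 3: Buffer replica (the subscription belongs on MSS)"
    return "none of the three cases: the effect of a bug :)"

print(classify_unsubscribed_replica(False, False, True))
```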

Q: Can one block be created at different sites?
A: In theory yes, PhEDEx allows it (injection is done at file level);
in practice no, it is never done, and I don't think WMAgent will do it.
(But maybe there are a few blocks which had problems in production, with different production sites for different files.)




Q: Who creates the subscription in PhEDEx after injection completes? (I mean, there is a replica location with subscription:n, and we should also have a replica with subscription:y; who creates this second one, the agent or ...?)
It depends :)
In most cases, WMAgent should create the subscription to the custodial location automatically with the subscribe API
(so to MSS)
but it is configurable, so it is not automatic for all datasets
And additional subscriptions, of course, can be created by people with the PhEDEx webpage
or with the datasvc
so in conclusion, there is no 'generic' answer, but the most common case is:
1) custodial subscription is created automatically by WMAgent
2) additional non-custodial subscriptions are created by people (CompOps, Site Admins, users)



Q: Why don't we remove all replicas with subscription:n if they have another replica with subscription:y?

1) If the subscription to MSS is a Move subscription, the replicas with 'subscription:n' at T2s/T3s will actually be removed automatically by PhEDEx.
2) But if the subscription is not a Move, there is no automatic deletion of unsubscribed replicas,
so CompOps needs to submit a deletion request manually.

So in general, the policy is:
1) For Move subscriptions, remove the source replica automatically
2) otherwise send a manual deletion
(of course, if it is not _Buffer)

currently, PhEDEx allows Move only in this case:
1) the destination is MSS
(T0 MSS or T1 MSS)
and
2) the source is T2 or T3
and
3) the source replica is unsubscribed

A future Move request will probably be:
1) Allowed destinations: T0/T1 Disk/MSS
2) Allowed sources: T1 Disk, T2, T3
One important point still needs to be understood:
what do we do when we request two Moves, one to T1 Disk and one from T1 Disk to MSS?
In this case we need to be very careful to avoid deleting the data before the move to MSS
(probably we simply disallow this case).
Anyway, if your question is
"why do we have so many unsubscribed source replicas?"
the answer is:
"because we forgot to move or delete them from the source"
The solution is:
1) if you want to keep the replica at this site, subscribe it
2) if you don't want to keep it at this site, move it to MSS

* If the prod link is not enabled after commissioning it
- Check whether it reached the minimum rate (T1 ~20 MB/s, T2 ~5 MB/s)
https://cmsweb.cern.ch/phedex/debug/Activity::RatePlots?graph=quantity_rates&entity=dest&src_filter=Brist&dest_filter=T0&no_mss=true&period=l90d&upto=&.submit=Update

- To monitor a past rate in more detail, use the "upto" field; this shows the hourly rate instead of the daily one.
https://cmsweb.cern.ch/phedex/debug/Activity::RatePlots?graph=quantity_rates&entity=dest&src_filter=Brist&dest_filter=T0&no_mss=true&period=l132h&upto=20140706

- Check loadtest file sizes; they can be smaller than 2.7 GB, in which case inject more files
https://cmsweb.cern.ch/phedex/debug/Data::Replicas?filter=Source.*Bristol;rows=all;dbs=1;node=187;node=192;node=4;node=281;node=1461;rcolumn=Name;rcolumn=Files;rcolumn=Bytes;dexp=5881;nvalue=Node%20files#d5881

Q: Is it possible to use the lcg-cp command for PhEDEx transfers instead of FTS servers for some T3 sites?

1) Does it work? Yes
2) Do we want it? No ;)
Let me specify further:
2a) if the T3 has a storage element with SRM or gridftp, they should use FTS3; it should work without any special config
2b) if the T3 has simple local disk storage, they can use lcg-cp
(by "local disk storage" I mean their PhEDEx agents transfer data directly to a filesystem on their PhEDEx machine which is not published to the grid)
In theory a T3 in case 2a could also use plain lcg-cp,
but FTS3 is designed exactly for this, to allow downloads to any storage element without worrying about config,
so it is also convenient for them to use it (or it should be, at least).
(And keep in mind that in the future FTS3 might even cover the 'download to local disk' use case...)

Q: PhEDEx & FTS transfers

Nobody contacts the Download agent;
the Download agent contacts the PhEDEx central DB and FTS3,
not the other way round (there is no incoming connectivity to the Download agent).
So the workflow of the Download agent is:
1) the agent contacts the PhEDEx DB checking for work, and queries the current transfer tasks
2) the agent submits the job to FTS3
3) the agent polls FTS3 to query the job status
4) when the transfer is finished, the agent updates the central PhEDEx DB
If you don't see anything in the FileDownload agent logs, it could be one of three things:
1) the agent doesn't have anything to do
2) the agent is not working
3) the agent log verbosity is turned off
I would begin by checking number 3:
ask the site to put in the PhEDEx config
export PHEDEX_VERBOSE=1
export PHEDEX_DEBUG=1

10.03.2014

T1_US_FNAL_Disk links enabled

Fixing 1682 blocks in 79 datasets associated to the wrong DBS in TMDB (sr #142081)

03.03.2014

CompOps Meeting
  • EOSCMS update to cure crashes (currently restarted regularly)
  • CVMFS switch

T1_FR_CCIN2P3_Disk debug transfers

<lfn-to-pfn protocol="srmv2"
path-match=".*/LoadTest07_LoadTest07_T1CCIN2P3_Disk_.*_.*_(.*)"
result="srm://ccsrm.in2p3.fr:8443/srm/managerv2?SFN=/pnfs/in2p3.fr/data/cms/disk/data/store/PhEDEx_Debug/LoadTest07Source/LoadTest07_T1CCIN2P3_Disk_$1"/>

<lfn-to-pfn protocol="srmv2"
path-match="/+.*/LoadTest07_Debug_(.*)/IN2P3/(.*)"
result="srm://ccsrm.in2p3.fr:8443/srm/managerv2?SFN=/pnfs/in2p3.fr/data/cms/disk/data/store/PhEDEx_LoadTest07/LoadTest07_Debug_$1/IN2P3/$2"/>
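These rules are what the FileExport agent turns into t_xfer_catalogue rows; applying one by hand looks like this (a Python sketch using the second rule above; the example LFN is hypothetical):

```python
import re

# The second lfn-to-pfn rule above, expressed as (path-match, result),
# with the TFC's $1/$2 backreferences written as \1/\2 for Match.expand().
RULES = [
    (r"/+.*/LoadTest07_Debug_(.*)/IN2P3/(.*)",
     "srm://ccsrm.in2p3.fr:8443/srm/managerv2?SFN=/pnfs/in2p3.fr/data/cms"
     "/disk/data/store/PhEDEx_LoadTest07/LoadTest07_Debug_\\1/IN2P3/\\2"),
]

def lfn_to_pfn(lfn):
    # Apply the first matching rule, as PhEDEx does with the rows it
    # uploaded into t_xfer_catalogue (NODE/PROTOCOL columns omitted here).
    for path_match, result in RULES:
        m = re.match(path_match, lfn)
        if m:
            return m.expand(result)
    return None

# Hypothetical LFN, just for illustration
print(lfn_to_pfn("/store/PhEDEx_LoadTest07/LoadTest07_Debug_T2_FR_GRIF/IN2P3/file0001"))
```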

A high priority data transfer to T2_CERN_CH is stuck

  • https://savannah.cern.ch/support/index.php?142306
  • T1_JINR_Disk is the better choice, but FNAL is selected as the source site; FNAL is in downtime and has a high staging load.
  • T2 CERN download-t1 agent's configuration has been changed
    • -mapfile ${PHEDEX_CONFIG}/tempfts.map
    • SRM.Endpoint="srm://cmssrm.fnal.gov:8443/srm/managerv2" FTS.Endpoint="https://fts22-t1-import.cern.ch:8443/glite-data-transfer-fts/services/FileTransfer"

      SRM.Endpoint="DEFAULT" FTS.Endpoint="https://lcgfts3.gridpp.rl.ac.uk:8443"

    • mapfile info: https://twiki.cern.ch/twiki/bin/viewauth/CMS/DDTFTSSetup
  • T2 CERN download-t2 agent's configuration has been changed
    • -service ${RAL_FTS_SERVER}
    • Currently only FNAL has problem with FTS3
Notes for the PhEDEx source site selection algorithm

1) https://cmsweb.cern.ch/phedex/datasvc/doc/links

2) current queue

3) recent throughput

Check queue plots (less queue is better).

Check rate plots (more rate is better).

The general idea: weight ~ total latency to completion ~ time to transfer the current file + time to empty the queue ahead of it, so more or less it is queue size / throughput.

Keep in mind one thing: with the current parameters (if link weights are equal), the FileRouter will nearly always select a link which transferred ANYTHING in the last two days, instead of a link with the best weight which happened to have nothing to transfer recently. See here for the full explanation: https://github.com/dmwm/PHEDEX/issues/689
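The "queue size / throughput" idea can be made concrete with a toy calculation (a sketch; the function and the numbers are illustrative, not the actual FileRouter weight):

```python
# Illustrative weight: estimated latency to completion over a link,
# i.e. time to drain the queue ahead of the file plus time to transfer it.
# A sketch of the idea described above, not the actual FileRouter code.
def link_weight(queue_bytes, file_bytes, bytes_per_sec):
    return (queue_bytes + file_bytes) / bytes_per_sec

# A busy fast link can still lose to an idle slow link:
busy_fast = link_weight(queue_bytes=500e9, file_bytes=2.7e9, bytes_per_sec=100e6)
idle_slow = link_weight(queue_bytes=0, file_bytes=2.7e9, bytes_per_sec=10e6)
print(busy_fast, idle_slow)  # lower weight wins
```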

Download Agent Notes for Stuck Transfers

* Suspend, wait ~30 min, then unsuspend; this re-triggers the PhEDEx source site selection.

  • Even if a new source site is selected by PhEDEx, currently ongoing transfers will not switch to the new source site

* "disconnected from database, (re)connecting to database, Creating new private DBH"

  • This just means the agent has nothing to do, so it disconnects from the DB and reconnects later (the number of DB connections is limited, so agents should disconnect when they don't need one)

Checking deletion requests for T0_CH_CERN_MSS

Changing custodiality in TMDB
Procedure

https://savannah.cern.ch/file/custodiality_change_400.txt?file_id=19559

* stop central BlockAllocator agent
: cd /data/ProdNodes 
: source PHEDEX/etc/profile.d/env.sh 
: PHEDEX/Utilities/Master -config SITECONF/CH_CERN/PhEDEx/Config.Mgmt.Prod stop mgmt-blockalloc
All agents successfully terminated

sqlplus $(./OracleConnectId -db ~/.globus/DBParam:Dev/Meric)
set role phedex_opsmeric_dev identified by XXXXX;

* select datasets in the request
select ds.name from t_dps_dataset ds join t_dps_subs_dataset sd on sd.dataset=ds.id join t_dps_subs_param pm on pm.id=sd.param where pm.request=:request;

* list the datasets with destination node and custodiality
select ds.name,nd.name,pm.is_custodial from t_dps_dataset ds join t_dps_subs_dataset sd on sd.dataset=ds.id join t_adm_node nd on nd.id=sd.destination join t_dps_subs_param pm on pm.id=sd.param where pm.request=:request;

* list the blocks in the datasets (row count = X)
select bk.name,nd.name,pm.is_custodial from t_dps_block bk join t_dps_subs_block sb on sb.block=bk.id join t_adm_node nd on nd.id=sb.destination join t_dps_subs_param pm on pm.id=sb.param where pm.request=:request;

* list the files in the blocks (row count = Y)
select fil.logical_name,nd.name,pm.is_custodial from t_dps_file fil join t_dps_subs_block sb on sb.block=fil.inblock join t_adm_node nd on nd.id=sb.destination join t_dps_subs_param pm on pm.id=sb.param where pm.request=:request;

* change custodiality of this request
update t_dps_subs_param set is_custodial='y' where request=:request;

* get the number of blocks with inconsistent custodiality (different between subscription and block destination); should be X
select count(*) from t_dps_block_dest bd join t_dps_subs_block sb on sb.block=bd.block and sb.destination=bd.destination join t_dps_subs_param sp on sp.id=sb.param where sp.is_custodial!=bd.is_custodial;

* get custodiality of all blocks
select count(*),is_custodial from t_dps_block_dest group by is_custodial;

  COUNT(*) I
---------- -
    A y
    B n

* UPDATE
merge into t_dps_block_dest bd using (select sub.block,sub.destination,sp.is_custodial from t_dps_subs_block sub join t_dps_subs_param sp on sp.id=sub.param) subs on (subs.block=bd.block and subs.destination=bd.destination) when matched then update set bd.is_custodial=subs.is_custodial where bd.is_custodial!=subs.is_custodial;

* get custodiality of all blocks again
select count(*),is_custodial from t_dps_block_dest group by is_custodial;

  COUNT(*) I
---------- -
    A+X y
    B-X n

* get number of blocks with inconsistent custodiality again (should be 0)
select count(*) from t_dps_block_dest bd join t_dps_subs_block sb on sb.block=bd.block and sb.destination=bd.destination join t_dps_subs_param sp on sp.id=sb.param where sp.is_custodial!=bd.is_custodial;

* commit, and quit
commit;
quit;

* verify on webpage

* Restart BlockAllocator
: PHEDEX/Utilities/Master -config SITECONF/CH_CERN/PhEDEx/Config.Mgmt.Prod start mgmt-blockalloc

Notes
   * In general, if you approve a new transfer request when the data is already at the destination, PhEDEx will simply update the subscription (except for custodiality);
it will not transfer the data again.
(But if you use a Move request, PhEDEx will delete the data from T2s.)

   * Why custodiality could not be changed via the PhEDEx webpage
The "custodiality" flag allows specifying different space tokens during transfer:
for example, a T1 could select "space_token=TAPE" for custodial=y and "space_token=DISK" for custodial=n.
(In the end, only FNAL used this functionality, though not with space tokens...)
BUT this works only during transfers:
after you write a file into a space token, you cannot change the space token anymore
(at most, you can replicate into another disk space token, but not change a disk space token to tape or vice versa).
So it was forbidden for this reason:
if people changed custodiality in PhEDEx subscriptions, there was no way to propagate the change to the physical files,
and you could end up with 'custodial=y' files which were still in a disk space token, which is dangerous for data loss.
Better to prevent it and force a manual action to change custodiality,
to discourage people and remind them that sites need to verify tape migration manually before requesting a custodiality change.

   * Namespace and SpaceToken
The space token is independent of the namespace;
the same namespace can have different space tokens.
A space token is used to select the retention policy
(retention policy means 'the file is on tape or on disk')
and to manage space reservations (a space reservation is similar to a quota).

CMS_T0 is the name of the space token assigned to the t0cms service class,
so it is the same thing:
when using an SRM space token, put CMS_T0;
when using rfcp or xrdcp, put t0cms.

Setup Proxy Renewal script on vocms machines
1)
> voms-proxy-init -voms cms
> cp /tmp/x509up_... ./proxy.cert

2) copy hostcert.pem and hostkey.pem (from /etc/grid-security)
> ls -l
-rw-r--r--. 1 mtaze root 3219 Oct  1 10:32 hostcert.pem
-r--------. 1 mtaze root 3272 Oct  1 10:32 hostkey.pem
-rw-------. 1 mtaze zh   8633 Oct  1 10:20 proxy.cert
-rwxr-xr-x. 1 mtaze zh   2193 Oct  1 10:21 VomsProxyRenew

3) register the machine for proxy retrieval from myproxy.cern.ch
e-mail px . support [at] cern . ch with the DN of the host cert:
voms-proxy-info -file /etc/grid-security/hostcert.pem -subject
(or openssl x509 -subject -noout -in /data/certs/hostcert.pem)
Note: if the hostcert doesn't exist or has problems, contact Ivan

-- MericTaze - 03 Mar 2014

Topic revision: r22 - 2015-05-20 - MericTaze
 