Difference: ProductionProcedures (1 vs. 90)

Revision 90 - 2017-08-01 - FedericoStagni

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 147 to 147
 

Getting production to 100%

Changed:
<
<
It often seems very easy for a production to reach 95%, while reaching 100% is difficult. A list of cases can be found in this link. (Mostly for Grid Experts and Production Managers, but Grid shifters can still find useful information.)
>
>
It often seems very easy for a production to reach 95%, while reaching 100% is difficult. A list of cases can be found in this link. (Mostly for Grid Experts and Production Managers, but Grid shifters can still find useful information.)
 

Closing a production

Revision 89 - 2015-02-11 - AndrewMcNab

Line: 34 to 36
  Status = InActive } }
Changed:
<
<
} to deactivate the LFC checking.

>
>
}
 
Changed:
<
<

Building,Deploying & Installing LHCbDirac (and Core Software)

>
>
to deactivate the LFC checking.

Building,Deploying & Installing LHCbDirac (and Core Software)

 

Building the DIRAC binary distributions

Line: 62 to 66
 If you see an error message like "Warning : Cannot add voms attribute /lhcb/Role=user to proxy Accessing data in the grid storage from the user interface will not be possible. The grid jobs will not be affected." then try doing chmod 644 $DIRAC_VOMSES/lhcb-voms.cern.ch. You will also need to set $X509_CERT_DIR and X509_VOMS_DIR. Refer to lxplus for default settings, or take a look at the Dirac tool dirac-admin-get-CAs available in Diracs later than v4r19. However you do it, if you make a local copy of these two directories, you will need to keep that copy up-to-date. -- WillReece - 2009-10-06

"user guide" on how to take advantage of the CMT setup of LHCbDirac

Changed:
<
<
Summary of commands to be used for taking advantage of the LHCbDirac installation using CMT.

>
>
Summary of commands to be used for taking advantage of the LHCbDirac installation using CMT.

 

Deployment of LHCb Software on the Grid

Line: 72 to 77
 

Primary job states in DIRAC

Changed:
<
<
dirac-primary-states
>
>
dirac-primary-states
 

Job Management Operations

Line: 80 to 85
  The procedure for searching for a string in the output of selected jobs (suitable for Grid Shifters and Grid Experts).
Changed:
<
<
A low level investigation on LSF to check why LHCb jobs do not start at CERN.
>
>
A low level investigation on LSF to check why LHCb jobs do not start at CERN.
 

Running Jobs Locally With DIRAC

Added:
>
>
 There are three submission modes associated with DIRAC: default WMS submission, local execution of a workflow and finally execution of a workflow in the full agent machinery. This procedure explains the steps for running jobs locally with DIRAC.

Production management

Line: 148 to 156
 

Pilots monitor

If jobs are not being submitted for a long time, first check whether pilots are submitted, and then whether they are actually matched. First, look at the "Pilot monitor" page in the portal to see if there are pilots running or submitted. Then, with the command

Deleted:
<
<
 
dirac-admin-get-job-pilots jobID
Changed:
<
<
you can check whether pilots are submitted for your job queue. This will print the logs for the pilots in the queue. If you don't see a line with
'Status': 'Submitted'
then it might be that there is a problem.
>
>
you can check whether pilots are submitted for your job queue. This will print the logs for the pilots in the queue. If you don't see a line with

'Status': 'Submitted'

then it might be that there is a problem.

  Also, through the pilot monitor page you can see the pilot output for the "done" pilots, which can contain useful information on why the pilots might not be matched.
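
The check for a submitted pilot can be scripted once the output of dirac-admin-get-job-pilots has been saved as text. The sketch below is only an illustration: it does not call DIRAC, the helper name has_submitted_pilot is hypothetical, and the exact layout of the output is an assumption. It simply looks for the 'Status': 'Submitted' text quoted in the procedure.

```python
import re

# Matches the text quoted in the procedure, tolerating extra spaces and
# either quote style. The real output layout is an assumption.
SUBMITTED = re.compile(r"""['"]Status['"]\s*:\s*['"]Submitted['"]""")


def has_submitted_pilot(pilot_output):
    """Return True if any line of the saved output shows a submitted pilot."""
    return any(SUBMITTED.search(line) for line in pilot_output.splitlines())


if __name__ == "__main__":
    # Made-up sample line, not real command output.
    sample = "{'Status': 'Submitted'}"
    print(has_submitted_pilot(sample))
```

If the function returns False for the saved output, that is the situation in which the procedure suggests there might be a problem.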
Line: 178 to 189
 Query OK, 1 row affected (0.02 sec)

mysql> select * from BkQueries where BkQueryID = 7590 ;

Changed:
<
<
Empty set (0.00 sec)
>
>
Empty set (0.00 sec)
 
      • Delete Additional Parameter from Production
mysql> select * from AdditionalParameters where TransformationID = 16309 and ParameterName = "BkQueryID" ;
Line: 194 to 206
 Query OK, 1 row affected (0.01 sec)

mysql> select * from AdditionalParameters where TransformationID = 16309 and ParameterName = "BkQueryID" ;

Changed:
<
<
Empty set (0.00 sec)
>
>
Empty set (0.00 sec)
 
  • Add the needed files to the production via the script below, provide production ID, run number and list of files (Certificate role needs to be lhcb_prod)
transID = 16310
Line: 214 to 225
   raise Exception, res['Message']
 res = tsClient.addTransformationRunFiles( transID, run, sortList( listOfFiles ) )
 if not res['OK']:
Changed:
<
<
raise Exception, res['Message']
>
>
raise Exception, res['Message']
 

Dealing with gaudi applications commands

Line: 224 to 234
 Regarding install_project.py:
Changed:
<
<
Operations->LHCb-Production->GaudiExecution->installProjectOptions
>
>
Operations->LHCb-Production->GaudiExecution->installProjectOptions
  can be used to set the flags of install_project.py when actually installing the project. Note that this action is triggered only if the project is not (yet) installed. If this option is not set, the default is to run install_project.py with the "-b" flag.

Instead, the option

Changed:
<
<
Operations->LHCb-Production->GaudiExecution->checkProjectOptions
>
>
Operations->LHCb-Production->GaudiExecution->checkProjectOptions
  can be used to set the flags for checking whether the project is already installed. By default this check runs with the flags "-b --check". If you override this behaviour by setting the option in the CS, always include at least "--check".

The option

Changed:
<
<
Operations->LHCb-Production->GaudiExecution->removalProjectOptions
>
>
Operations->LHCb-Production->GaudiExecution->removalProjectOptions
  is instead used for removing an application. Its default is '-r'.

It is also possible to modify the install_project location (the script is downloaded from the web server), setting:

Changed:
<
<
Operations->LHCb-Production->GaudiExecution->install_project_location
>
>
Operations->LHCb-Production->GaudiExecution->install_project_location
  which, by default, points at http://lhcbproject.web.cern.ch/lhcbproject/dist/
Line: 260 to 267
 

Transfer PIT - CASTOR

Changed:
<
<
The data transfer between the PIT and CASTOR for the RAW data is handled on the machine lbdirac.cern.ch by the user lhcbprod. The DIRAC installation is under /sw/dirac/data-taking. The transfer itself is managed by the agent /sw/dirac/data-taking/startup/DataManagement_transferAgent. This Python process should run MaxProcess processes, and each process can start a new process for each transfer (MaxProcess can be found in /sw/dirac/data-taking/etc/DataManagement_TransferAgent.cfg). If you don't see too many processes, you can look at the log /sw/dirac/data-taking/DataManagement_TransferAgent/log/current. A typical behaviour can be seen here.
>
>
The data transfer between the PIT and CASTOR for the RAW data is handled on the machine lbdirac.cern.ch by the user lhcbprod. The DIRAC installation is under /sw/dirac/data-taking. The transfer itself is managed by the agent /sw/dirac/data-taking/startup/DataManagement_transferAgent. This Python process should run MaxProcess processes, and each process can start a new process for each transfer (MaxProcess can be found in /sw/dirac/data-taking/etc/DataManagement_TransferAgent.cfg). If you don't see too many processes, you can look at the log /sw/dirac/data-taking/DataManagement_TransferAgent/log/current. A typical behaviour can be seen here.
  You can also look at this web page to spot a potential problem, which shows up as a decreasing rate. Under normal data-taking conditions, a decreasing rate means that one or several processes are stuck. You can find them with strace -f -p _PID_. As soon as you find a stuck process, you can kill it with kill -9 _PID_. If this has no effect, you can stop the agent cleanly with touch /sw/dirac/data-taking/control/DataManagement/TransferAgent/stop_agent. If that has no effect either, you can finally try runsvctrl t /sw/dirac/data-taking/startup/DataManagement_TransferAgent. As a last resort, you will have to kill the agent by hand with kill -9 _PID_.
Line: 282 to 291
 

How to recover replicas that are lost even if SRM reports they are existing

Added:
>
>
 This can happen. The file is physically lost but SRM (lcg-ls) reports the file is there, see this GGUS. This replica is totally lost from tape and disk:
Changed:
<
<
 > lcg-ls -l srm://gridka-dCache.fzk.de/pnfs/gridka.de/lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw

>
>
 > lcg-ls -l srm://gridka-dCache.fzk.de/pnfs/gridka.de/lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw

 -rw-r--r-- 1 2 2 3145768992 NEARLINE /pnfs/gridka.de/lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw * Checksum: 51e2fc3d (adler32) * Space tokens: 39930230
Line: 289 to 298
  * Checksum: 51e2fc3d (adler32) * Space tokens: 39930230
Added:
>
>
 You then have to remove the lost replicas and copy them over again from another site:
Changed:
<
<
$ dirac-dms-remove-lfn-replica /lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw GRIDKA-RAW

>
>
$ dirac-dms-remove-lfn-replica /lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw GRIDKA-RAW

 $ dirac-dms-replicate-lfn /lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw GRIDKA-RAW
Added:
>
>
If there is only one replica and the corresponding file at the site has been lost completely, then you need to use dirac-dms-remove-files to remove the entry in the replica catalogue. You need to double check this is really the case, as this command will remove all replicas of the given file!
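
The choice between the two recovery paths can be written down as a small planning helper. This is a sketch under stated assumptions, not part of DIRAC: the function name recovery_commands is hypothetical, nothing is executed, and passing the LFN as the only argument to dirac-dms-remove-files is an assumption that should be checked against the command's own help. The other two command lines follow the example in the procedure.

```python
def recovery_commands(lfn, lost_se, replica_count):
    """Return the command lines to review for one lost replica.

    Nothing is executed here; the list is meant to be read and run by hand.
    """
    if replica_count < 1:
        raise ValueError("replica_count must be at least 1")
    if replica_count == 1:
        # Only replica lost: the catalogue entry has to be removed.
        # Argument form is an assumption; this removes ALL replicas.
        return ["dirac-dms-remove-files %s" % lfn]
    # Other replicas exist: drop the lost one, then copy it back to the SE.
    return [
        "dirac-dms-remove-lfn-replica %s %s" % (lfn, lost_se),
        "dirac-dms-replicate-lfn %s %s" % (lfn, lost_se),
    ]


if __name__ == "__main__":
    lfn = "/lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw"
    for command in recovery_commands(lfn, "GRIDKA-RAW", replica_count=2):
        print(command)
```

Keeping the single-replica case as a separate branch makes the destructive command stand out, which matches the warning to double check before using it.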
 

Changing the Default Protocols List for a given Site (Tier-1)

The order of the list of protocols supplied to SRM can be changed e.g. testing root at NIKHEF means root is prepended to the list. This guide explains how to change the protocols list for a given site. This operation is restricted to those with the diracAdmin role. (Grid Experts)

Line: 305 to 316
  Example to ban all the SEs at RAL in writing.
Changed:
<
<
dirac-admin-ban-se -c RAL.uk
>
>
dirac-admin-ban-se -c RAL.uk
  Example to ban one SE at RAL
Changed:
<
<
dirac-admin-ban-se RAL-DST
>
>
dirac-admin-ban-se RAL-DST
  Example to ban one SE in writing at CNAF
Changed:
<
<
dirac-admin-ban-se -w CNAF-USER
>
>
dirac-admin-ban-se -w CNAF-USER
  Also keep in mind that NIKHEF and SARA have different SEs. The LCG.NIKHEF.nl SEs are:
Line: 350 to 358
 The following is a simple description of the replacement for genCatalog in DIRAC3. This uses the standard LHCb input data resolution policy to obtain access URLs.

Determining Versions of LCG Utils, GFAL and the Default SRM2 Protocols List in DIRAC

Changed:
<
<
There is a simple non-intrusive script to obtain the external package versions in DIRAC (Grid Shifters and Grid Experts)
>
>
There is a simple non-intrusive script to obtain the external package versions in DIRAC (Grid Shifters and Grid Experts)
 

Checking the throughput from the pit to Castor (during data taking)

Added:
>
>
 The link bandwidth is 10 Gbit/s. The expected rate (beginning of 2012) is about 280 MB/s; some more details are here.
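
To put the expected rate in context, it can be compared with the raw link capacity. The sketch below assumes the quoted link figure means 10 Gbit/s and uses decimal units (1 Gbit = 1000 Mbit, 8 bits per byte); protocol overhead is ignored, and the function names are hypothetical. Under these assumptions the link carries about 1250 MB/s, so 280 MB/s is a bit over a fifth of it.

```python
LINK_GBIT_PER_S = 10.0     # link bandwidth quoted above, read as Gbit/s
EXPECTED_MB_PER_S = 280.0  # expected rate at the beginning of 2012


def link_capacity_mb_per_s(gbit_per_s):
    """Convert Gbit/s to MB/s with decimal units and 8 bits per byte."""
    return gbit_per_s * 1000.0 / 8.0


def utilisation(rate_mb_per_s, gbit_per_s=LINK_GBIT_PER_S):
    """Fraction of the raw link capacity used by the given rate."""
    return rate_mb_per_s / link_capacity_mb_per_s(gbit_per_s)


if __name__ == "__main__":
    print("capacity: %.0f MB/s" % link_capacity_mb_per_s(LINK_GBIT_PER_S))
    print("expected rate uses %.1f%% of the link"
          % (100 * utilisation(EXPECTED_MB_PER_S)))
```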

DIRAC Configuration Service

Line: 367 to 376
 

Getting information from BDII

Changed:
<
<

DIRAC (and not) Services monitoring

>
>

DIRAC (and not) Services monitoring

 

Getting the List of Ports For DIRAC Central Services (and how to ping them)

The ports for DIRAC central services for a given setup can easily be checked (useful for hardware requests). The procedure for port discovery as well as how to ping a service is available (Grid Shifters and Grid Experts).

SLS sensors

GridMap Overview

Changed:
<
<

CERN Core Services monitoring


>
>

CERN Core Services monitoring

 
Changed:
<
<

Sites Management

>
>

Sites Management

 

Banning and allowing sites

  • When to ban a site.
Changed:
<
<

Sites rank: Unspecified Grid Resources Error....

>
>

Sites rank: Unspecified Grid Resources Error....

 
  • This Rank procedure gives some guidelines to debug and understand why a site is not running payloads.

Site Availability Monitoring (SAM) tests

Line: 400 to 409
  If many jobs start to fail at a site they should be immediately investigated.

SQLite hint

Changed:
<
<
The following SQLite hint is meant to provide a template description for a GGUS investigation request about one of the most recurrent problems.

>
>
The following SQLite hint is meant to provide a template description for a GGUS investigation request about one of the most recurrent problems.

 

Site Problems Follow up

Deleted:
<
<
Grid jobs (= pilot jobs in DIRAC terminology) often fail. This link gives an encyclopedic summary of all known issues concerning (mainly) grid jobs.
 
Changed:
<
<

Getting in touch with sites: Tickets and Mails

How to deal with GGUS follows

GGUS

>
>
Grid jobs (= pilot jobs in DIRAC terminology) often fail. This link gives an encyclopedic summary of all known issues concerning (mainly) grid jobs.
 
Changed:
<
<

This practice guide aims to give operators, GEOCs and experts a few clear rules for submitting GGUS tickets, together with a quick introduction to the ticketing system. On the Grid, a problem is not a problem until a GGUS ticket has been opened. With that in mind, we want to present and analyze the best way to submit a GGUS ticket to a site (for a site-specific problem). In the early days, mail was the only tool for contacting sites. It was quick and efficient. The GGUS ticketing system then brought much more functionality, but also a slower way to get in touch with sites. While it was indeed a good tool for tracking problems and accumulating know-how about them, the path of a ticket did not always lead straight to the experts who had to fix the problem at the remote site. Years of experience showed that a GGUS ticket plus a direct mail to the site's support mailing list was the quickest and most efficient way to get in touch with the site. Responsiveness increased further if a local LHCb T1 contact person was also put in the loop. (Please note that contact mailing addresses for T1s are available at https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionProcedures#Mailing_lists )
The recent releases of GGUS introduced new features that meet both the need for quick, mail-like contact with sites and the robustness typical of a ticketing system. In more detail, together with the usual USER ticket, GGUS offers the possibility to submit both TEAM tickets and ALARM tickets.
>
>

Getting in touch with sites: Tickets and Mails

How to deal with GGUS follows

GGUS


This practice guide aims to give operators, GEOCs and experts a few clear rules for submitting GGUS tickets, together with a quick introduction to the ticketing system. On the Grid, a problem is not a problem until a GGUS ticket has been opened. With that in mind, we want to present and analyze the best way to submit a GGUS ticket to a site (for a site-specific problem). In the early days, mail was the only tool for contacting sites. It was quick and efficient. The GGUS ticketing system then brought much more functionality, but also a slower way to get in touch with sites. While it was indeed a good tool for tracking problems and accumulating know-how about them, the path of a ticket did not always lead straight to the experts who had to fix the problem at the remote site. Years of experience showed that a GGUS ticket plus a direct mail to the site's support mailing list was the quickest and most efficient way to get in touch with the site. Responsiveness increased further if a local LHCb T1 contact person was also put in the loop. (Please note that contact mailing addresses for T1s are available at https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionProcedures#Mailing_lists )
The recent releases of GGUS introduced new features that meet both the need for quick, mail-like contact with sites and the robustness typical of a ticketing system. In more detail, together with the usual USER ticket, GGUS offers the possibility to submit both TEAM tickets and ALARM tickets.

 
  • A TEAM ticket is a special ticket that concerns production activities and is targeted at problems at sites that have to be followed by a crew as a whole rather than by a single person (e.g. the production team). Any problem at any site that everyone in the operations team may be asked to follow and act on must be submitted as a TEAM ticket. A TEAM ticket can either be routed directly to the site (in which case the submitter must select the GOC name of the affected site from a drop-down menu) or go through the usual TPM/ROC/SITE path, with the unavoidable loss of time. A TEAM ticket is not automatically a top-priority ticket: the submitter selects the severity in the web form. The only difference from a USER ticket is that everybody in the same TEAM can modify and interact with the ticket, which is owned by the TEAM and not by the user. GGUS knows the members of the TEAM via VOMS. Everyone entitled to use Role=production (now) or Role=team (coming soon) is part of the TEAM and is authorized to act on the ticket.
Changed:
<
<
  • An ALARM ticket is another special ticket, meant to raise real alarms at the sites concerned. The implementation of the alarm differs from site to site, as does the support each site has decided to put in place. Mails to a special site mailing list (the alarm mailing list) may in turn trigger internal procedures to react quickly to a problem, even outside working hours: SMS, phone calls, operators, control rooms, Remedy tickets... All of this happens behind the scenes. T1s, as per the MoU, are required to react to ALARM tickets in less than 30 minutes, 24x7. What matters here is that VOs are guaranteed an answer within 30 minutes, but a solution is not necessarily guaranteed in such a short time! Only a very restricted number of people inside the VO (read: alarmers) are entitled to submit ALARM tickets. This limitation is needed to prevent non-experts from waking someone up for a fake problem. Soon GGUS will retrieve the authorized alarmers from VOMS (Role=alarm). A list of authorized alarmers for LHCb is currently available here.
>
>
  • An ALARM ticket is another special ticket, meant to raise real alarms at the sites concerned. The implementation of the alarm differs from site to site, as does the support each site has decided to put in place. Mails to a special site mailing list (the alarm mailing list) may in turn trigger internal procedures to react quickly to a problem, even outside working hours: SMS, phone calls, operators, control rooms, Remedy tickets... All of this happens behind the scenes. T1s, as per the MoU, are required to react to ALARM tickets in less than 30 minutes, 24x7. What matters here is that VOs are guaranteed an answer within 30 minutes, but a solution is not necessarily guaranteed in such a short time! Only a very restricted number of people inside the VO (read: alarmers) are entitled to submit ALARM tickets. This limitation is needed to prevent non-experts from waking someone up for a fake problem. Soon GGUS will retrieve the authorized alarmers from VOMS (Role=alarm). A list of authorized alarmers for LHCb is currently available here.
  Please bear in mind that, although these new tools are extremely useful and important, we strongly recommend not abusing them. The net effect of abuse is a loss of credibility, which would relax the ALARM threshold. Below are some suggestions about typical problems and the actions to be taken.
Changed:
<
<
  1. If the problem is a show stopper, the shifter has to call the GEOC. The expert then has to investigate whether the problem is really a show stopper and, if so, submit the ALARM. A show stopper here is mainly a problem that prevents the activity on the site from continuing. The GGUS portal for ALARM tickets provides a list of identified MoU activities that may give rise to an alarm. Note, however, that at CERN a show stopper only concerns data. When submitting, please also put the lhcb-grid@cern.ch mailing list in cc and open an entry in the e-logbook.
>
>
  1. If the problem is a show stopper, the shifter has to call the GEOC. The expert then has to investigate whether the problem is really a show stopper and, if so, submit the ALARM. A show stopper here is mainly a problem that prevents the activity on the site from continuing. The GGUS portal for ALARM tickets provides a list of identified MoU activities that may give rise to an alarm. Note, however, that at CERN a show stopper only concerns data. When submitting, please also put the lhcb-grid@cern.ch mailing list in cc and open an entry in the e-logbook.
 
  1. If the problem severely affects one of the services at a T1 and compromises one of the activities on the site, a TEAM ticket with "Top Priority" or "Very Urgent" is recommended. We leave the decision to the GEOC, but an entry in the e-logbook must also be filed.
Changed:
<
<
  1. If the problem affects the production activity at other sites, the GEOC or the shifter must open a TEAM ticket with a severity ranging from "Less Urgent" to "Top Priority", depending on how the problem impacts the (T2) site. If just a few jobs have problems (say, less than 10%) and the rest are running happily, it may be a problem with just one or a few WNs (Less Urgent). If the site is acting as a black hole and also compromises activities elsewhere by attracting and failing jobs that could otherwise have reached other sites, the site must be banned and the TEAM ticket deserves Top Priority.
  2. Normal users can also get in touch with sites via a standard ticket. Severity is again a matter of personal judgment. However, we discourage the attitude "my problem is always more important than anyone else's". Soon 5K users will be active in WLCG.
>
>
  1. If the problem affects the production activity at other sites, the GEOC or the shifter must open a TEAM ticket with a severity ranging from "Less Urgent" to "Top Priority", depending on how the problem impacts the (T2) site. If just a few jobs have problems (say, less than 10%) and the rest are running happily, it may be a problem with just one or a few WNs (Less Urgent). If the site is acting as a black hole and also compromises activities elsewhere by attracting and failing jobs that could otherwise have reached other sites, the site must be banned and the TEAM ticket deserves Top Priority.
  2. Normal users can also get in touch with sites via a standard ticket. Severity is again a matter of personal judgment. However, we discourage the attitude "my problem is always more important than anyone else's". Soon 5K users will be active in WLCG.
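
The suggestions in the list above can be condensed into a small decision helper. This is a rough sketch for orientation only: the function suggest_ticket and its parameters are hypothetical, it does not talk to GGUS, and the final decision remains with the GEOC. The show_stopper argument is meant to be set only after the expert has confirmed the problem.

```python
def suggest_ticket(show_stopper=False, t1_service_severe=False,
                   failed_fraction=0.0, black_hole=False, production=True):
    """Return (ticket_type, severity, extra_actions) following the list above."""
    if not production:
        # Normal users go through a standard (USER) ticket.
        return ("USER", "user's choice", [])
    if show_stopper:
        # Confirmed show stopper: ALARM, plus the bookkeeping steps.
        return ("ALARM", None,
                ["call the GEOC", "cc lhcb-grid", "open an e-logbook entry"])
    if t1_service_severe:
        return ("TEAM", "Top Priority or Very Urgent",
                ["open an e-logbook entry"])
    if black_hole:
        return ("TEAM", "Top Priority", ["ban the site"])
    if failed_fraction < 0.10:
        # Fewer than about 10% of jobs failing: probably a few WNs.
        return ("TEAM", "Less Urgent", [])
    return ("TEAM", "between Less Urgent and Top Priority", [])


if __name__ == "__main__":
    print(suggest_ticket(failed_fraction=0.05))
    print(suggest_ticket(black_hole=True))
```

The order of the checks mirrors the order of the list, so the most severe matching case wins.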
  Ticket escalation:
Deleted:
<
<
 
Changed:
<
<
Since Jan 21st, GGUS allows the escalation of tickets that are picked up slowly by the support units or that are addressed to unresponsive sites. Please read this escalation procedure document.
>
>
Since Jan 21st, GGUS allows the escalation of tickets that are picked up slowly by the support units or that are addressed to unresponsive sites. Please read this escalation procedure document.
 
Changed:
<
<

Site Mail contacts

WLCG "conventional" site-support mailing lists are available here.

>
>

Site Mail contacts

WLCG "conventional" site-support mailing lists are available here.

 

Daily Shifter Checklist

Line: 488 to 497
 OK BAD MAYBE
Changed:
<
<
Usage: dirac-bookkeeping-setdataquality-run.py
>
>
Usage: dirac-bookkeeping-setdataquality-run.py
 
  • The following commands you can use to set the data quality flag:
Line: 503 to 511
 OK BAD MAYBE
Changed:
<
<
Usage: dirac-bookkeeping-setdataquality-run.py The data quality flag is case sensitive. Set the data quality for a given run:
>
>
Usage: dirac-bookkeeping-setdataquality-run.py

The data quality flag is case sensitive. Set the data quality for a given run:

 (DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[408]>dirac-bookkeeping-setdataquality-run 20716 'BAD' Quality flag has been updated!
Changed:
<
<
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[409]>
>
>
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[409]>
  dirac-bookkeeping-setdataquality-files
Line: 517 to 527
 (DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[413]> dirac-bookkeeping-setdataquality-files /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw 'BAD' ['/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw'] Quality flag updated!
Changed:
<
<
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[414]>
>
>
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[414]>
  Set the quality flag for a list of files:
Line: 526 to 535
 (DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[416]> dirac-bookkeeping-setdataquality-files lfns.txt 'BAD' ['/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43998/043998_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43995/043995_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43994/043994_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43993/043993_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43992/043992_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43989/043989_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43987/043987_0000000002.raw'] Quality flag updated!
Changed:
<
<
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[417]> The lfns.txt contains the following:
>
>
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[417]>

The lfns.txt contains the following:

 /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43998/043998_0000000002.raw /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43995/043995_0000000002.raw
Line: 536 to 548
 /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43992/043992_0000000002.raw /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43989/043989_0000000002.raw /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43987/043987_0000000002.raw
Changed:
<
<
lfns.txt (END)
>
>
lfns.txt (END)
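
Because the data quality flag is case sensitive, it can help to validate the flag and the LFN list before calling the bookkeeping commands. The sketch below is a local pre-check only and does not touch the bookkeeping: the helper names check_flag and read_lfns are hypothetical, and the accepted flags are taken from the usage text (OK, BAD, MAYBE).

```python
VALID_FLAGS = ("OK", "BAD", "MAYBE")  # flags listed in the usage text


def check_flag(flag):
    """Return the flag if it is an exact, case-sensitive match."""
    if flag not in VALID_FLAGS:
        raise ValueError("unknown data quality flag %r; use one of %s"
                         % (flag, ", ".join(VALID_FLAGS)))
    return flag


def read_lfns(path):
    """Read whitespace-separated LFNs from a file such as lfns.txt."""
    with open(path) as handle:
        return handle.read().split()


if __name__ == "__main__":
    import sys
    lfns = read_lfns(sys.argv[1])
    flag = check_flag(sys.argv[2])
    print("%d LFNs would be flagged %s" % (len(lfns), flag))
```

Run as a script with the file name and the flag as arguments, it rejects a flag such as 'bad' before any command is issued.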
 

Web portal

Web portal stuck: how to restart it

First try to restart the Paster with runit:

Changed:
<
<
runsvctrl t runit/Web/Paster

>
>
runsvctrl t runit/Web/Paster

 
Deleted:
<
<
If this is not enough, it will be necessary to find all Paster processes ('ps faux | grep -i web_paster') and 'kill -9' them.
 
Added:
>
>
If this is not enough, it will be necessary to find all Paster processes ('ps faux | grep -i web_paster') and 'kill -9' them.
 

Documents

Revision 88 - 2012-05-28 - unknown

Line: 539 to 539
 lfns.txt (END)
Added:
>
>

Web portal

Web portal stuck: how to restart it

First try to restart the Paster with runit:

runsvctrl t runit/Web/Paster
If this is not enough, it will be necessary to find all Paster processes ('ps faux | grep -i web_paster') and 'kill -9' them.
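
Collecting the Paster PIDs from the ps listing can be sketched as follows. This is an illustration that works on saved 'ps faux' text and kills nothing; the helper name paster_pids is hypothetical, and it assumes the usual 'ps faux' layout in which the PID is the second column.

```python
def paster_pids(ps_output):
    """Return PIDs of lines mentioning web_paster, ignoring the grep itself."""
    pids = []
    for line in ps_output.splitlines():
        lowered = line.lower()
        if "web_paster" not in lowered or "grep" in lowered:
            continue
        fields = line.split()
        # In 'ps faux' output the PID is the second column (USER PID ...).
        if len(fields) > 1 and fields[1].isdigit():
            pids.append(int(fields[1]))
    return pids


if __name__ == "__main__":
    import sys
    # Example: ps faux | python paster_pids.py
    for pid in paster_pids(sys.stdin.read()):
        print(pid)
```

Printing the PIDs first, rather than killing directly, leaves a chance to check the list before running 'kill -9'.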
 

Documents

Revision 87 - 2012-05-09 - unknown

Line: 91 to 91
  An example of how to launch a simulation production is available. Note that the prerequisite for launching productions is having the lhcb_prmgr role.
Added:
>
>

How to derive a production

Typically, when a new application version is released and should replace the current one, it is very useful to derive a production.
 

How to rerun a production job starting from a log SE link

Frequently individual jobs can fail with an error that should be investigated by applications experts. The following guide on how to rerun a job can be circulated in case of questions by the applications expert.

Revision 86 - 2012-04-24 - unknown

Line: 252 to 252
 

Data Management

Added:
>
>

How to fix screwed up replication transformations

Use dirac-transformation-debug; instructions are here.
 

Transfer PIT - CASTOR

The data transfer between the PIT and CASTOR for the RAW data is handled on the machine lbdirac.cern.ch by the user lhcbprod. The DIRAC installation is under /sw/dirac/data-taking. The transfer itself is managed by the agent /sw/dirac/data-taking/startup/DataManagement_transferAgent. This Python process should run MaxProcess processes, and each process can start a new process for each transfer (MaxProcess can be found in /sw/dirac/data-taking/etc/DataManagement_TransferAgent.cfg). If you don't see too many processes, you can look at the log /sw/dirac/data-taking/DataManagement_TransferAgent/log/current. A typical behaviour can be seen here.

Revision 85 - 2012-04-17 - JoelClosier

Line: 252 to 252
 

Data Management

Added:
>
>

Transfer PIT - CASTOR

The data transfer between the PIT and CASTOR for the RAW data is handled on the machine lbdirac.cern.ch by the user lhcbprod. The DIRAC installation is under /sw/dirac/data-taking. The transfer itself is managed by the agent /sw/dirac/data-taking/startup/DataManagement_transferAgent. This Python process should run MaxProcess processes, and each process can start a new process for each transfer (MaxProcess can be found in /sw/dirac/data-taking/etc/DataManagement_TransferAgent.cfg). If you don't see too many processes, you can look at the log /sw/dirac/data-taking/DataManagement_TransferAgent/log/current. A typical behaviour can be seen here.

You can also look at this web page to spot a potential problem, which shows up as a decreasing rate. Under normal data-taking conditions, a decreasing rate means that one or several processes are stuck. You can find them with strace -f -p _PID_. As soon as you find a stuck process, you can kill it with kill -9 _PID_. If this has no effect, you can stop the agent cleanly with touch /sw/dirac/data-taking/control/DataManagement/TransferAgent/stop_agent. If that has no effect either, you can finally try runsvctrl t /sw/dirac/data-taking/startup/DataManagement_TransferAgent. As a last resort, you will have to kill the agent by hand with kill -9 _PID_.

You can apply the same recipe to the RemovalAgent.

 

Job Data Access Issues

The following document is meant to give Grid shifters a few hints on things they can check when a job has problems accessing data at a site (Grid Shifters and Grid Experts).

Line: 544 to 552
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.png" attr="" comment="DIRAC primary job states" date="1221475538" name="dirac-primary-states.png" path="dirac-primary-states.png" size="111571" stream="dirac-primary-states.png" user="Main.GreigCowan" version="1"
Added:
>
>
META FILEATTACHMENT attachment="Transfer_online_ps.tiff" attr="" comment="Transfer_online_nbprocesses" date="1334675224" name="Transfer_online_ps.tiff" path="Transfer_online_ps.tiff" size="522818" stream="Transfer_online_ps.tiff" tmpFilename="/usr/tmp/CGItemp6619" user="joel" version="1"

Revision 84 - 2012-04-16 - FedericoStagni

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 214 to 214
  raise Exception, res['Message']
Added:
>
>

Dealing with gaudi applications commands

Gaudi applications are run with a set of options. Some of the flags can be set in the CS, and all of these options can be made dependent on the setup. The examples that follow are for the LHCb_Production setup.

For install_project.py, the option

Operations->LHCb-Production->GaudiExecution->installProjectOptions

sets the flags used when actually installing the project. Note that the installation is triggered only if the project is not (yet) installed. If this option is not set, the default is to run install_project.py with the "-b" flag.

The option

Operations->LHCb-Production->GaudiExecution->checkProjectOptions

sets the flags used to check whether the project is already installed. By default this check runs with the flags "-b --check". If you override this behaviour by setting the option in the CS, do not forget to always include at least "--check".

The option

Operations->LHCb-Production->GaudiExecution->removalProjectOptions

sets the flags used for removing an application. Its default is '-r'.

It is also possible to change the location of install_project (the script is downloaded from the web server) by setting:

Operations->LHCb-Production->GaudiExecution->install_project_location

which, by default, points at http://lhcbproject.web.cern.ch/lhcbproject/dist/
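
The fallback behaviour described above can be sketched in a few lines of Python. This is only an illustration, not the LHCbDirac implementation: the function name resolve_flags and the plain dictionary standing in for the CS section are assumptions, while the option names and default flags are the ones quoted in the text.

```python
# Defaults quoted in the text above; the dictionary layout is illustrative only.
DEFAULT_FLAGS = {
    "installProjectOptions": "-b",
    "checkProjectOptions": "-b --check",
    "removalProjectOptions": "-r",
}


def resolve_flags(option, cs_values):
    """Return the flags for `option`, preferring a CS value over the default.

    `cs_values` is a hypothetical dict standing in for the
    Operations->LHCb-Production->GaudiExecution section of the CS.
    """
    flags = cs_values.get(option) or DEFAULT_FLAGS[option]
    # The text asks that an overridden check option always keeps "--check".
    if option == "checkProjectOptions" and "--check" not in flags.split():
        raise ValueError("checkProjectOptions must always include --check")
    return flags


if __name__ == "__main__":
    # Nothing set in the CS: the documented default is used.
    print(resolve_flags("installProjectOptions", {}))
```

Rejecting a check option without "--check" early makes a misconfigured CS entry visible before any installation is attempted.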

 

Data Management

Revision 832012-04-15 - unknown

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 302 to 302
 

Determining Versions of LCG Utils, GFAL and the Default SRM2 Protocols List in DIRAC

There is a simple non-intrusive script to obtain the external package versions in DIRAC (Grid Shifters and Grid Experts)

Added:
>
>

Checking the throughput from the pit to Castor (during data taking)

The link bandwidth is 10 Gbit/s. The expected rate (beginning of 2012) is about 280 MB/s; some more details here.
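
To put these two numbers in relation, the short sketch below converts the link speed to MB/s and computes the fraction used by the expected rate. It assumes decimal units (1 Gbit = 1000 Mbit, 8 bits per byte) and ignores protocol overhead; the function names are illustrative.

```python
def link_capacity_mb_per_s(link_gbit_per_s):
    """Convert a link speed in Gbit/s to MB/s (decimal units, 8 bits per byte)."""
    return link_gbit_per_s * 1000.0 / 8.0


def link_utilisation(rate_mb_per_s, link_gbit_per_s):
    """Fraction of the nominal link capacity used by a given transfer rate."""
    return rate_mb_per_s / link_capacity_mb_per_s(link_gbit_per_s)


if __name__ == "__main__":
    capacity = link_capacity_mb_per_s(10.0)
    fraction = link_utilisation(280.0, 10.0)
    print("capacity: %.0f MB/s, expected use: %.1f%%" % (capacity, 100.0 * fraction))
```

Under these assumptions the nominal capacity is 1250 MB/s, so the expected rate uses a bit less than a quarter of the link.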
 

DIRAC Configuration Service

Adding new users

Revision 822012-02-08 - StefanRoiser

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 152 to 152
  Also, through the pilot monitor page you can see the pilot output for the "done" pilots, which can contain useful information on why the pilots might not be matched.
Added:
>
>

Creating "mini productions"

So-called "mini productions" are sometimes necessary to process a small set of files with a new production (e.g. with an improved application).

  • Create a new production; if necessary, modify the steps and create a new request from those steps.
  • Launch the production as usual. (I propose to set the run range to one that is not used anywhere else, see later, e.g. the run or runs concerned by this production.)
  • After the production has been launched, two steps need to be done quickly:
    • Stop the production in the Dirac production monitor
    • Delete the BkQuery and Params for this production in the database (ProductionDB). The BkQueryID can be found in the "Additional Params" of the production on the Dirac Production Monitor page
mysql> select * from BkQueries where BkQueryID = 7590 ;
+-----------+----------------------+------------------------------+----------------+----------+-----------+------------+---------------+--------------+-----------------+----------+--------+---------+------------+------+
| BkQueryID | SimulationConditions | DataTakingConditions         | ProcessingPass | FileType | EventType | ConfigName | ConfigVersion | ProductionID | DataQualityFlag | StartRun | EndRun | Visible | RunNumbers | TCK  |
+-----------+----------------------+------------------------------+----------------+----------+-----------+------------+---------------+--------------+-----------------+----------+--------+---------+------------+------+
|      7590 | All                  | Beam3500GeV-VeloClosed-MagUp | Real Data      | RAW      | 90000000  | LHCb       | Collision11   | 0            | OK              |   102896 | 102897 | All     | All        | All  |
+-----------+----------------------+------------------------------+----------------+----------+-----------+------------+---------------+--------------+-----------------+----------+--------+---------+------------+------+
1 row in set (0.00 sec)

mysql> delete from BkQueries where BkQueryID = 7590 ;
Query OK, 1 row affected (0.02 sec)

mysql> select * from BkQueries where BkQueryID = 7590 ;
Empty set (0.00 sec)
      • Delete Additional Parameter from Production
mysql> select * from AdditionalParameters where TransformationID = 16309 and ParameterName = "BkQueryID" ;
+------------------+---------------+----------------+---------------+
| TransformationID | ParameterName | ParameterValue | ParameterType |
+------------------+---------------+----------------+---------------+
|            16309 | BkQueryID     | 7590           | StringType    |
+------------------+---------------+----------------+---------------+
1 row in set (0.00 sec)

mysql> delete from AdditionalParameters where TransformationID = 16309 and ParameterName = "BkQueryID" ;
Query OK, 1 row affected (0.01 sec)

mysql> select * from AdditionalParameters where TransformationID = 16309 and ParameterName = "BkQueryID" ;
Empty set (0.00 sec)
  • Add the needed files to the production via the script below, providing the production ID, the run number and the list of files (the certificate role needs to be lhcb_prod)
# Production (transformation) ID, run number and LFNs to attach
transID = 16310
run = 102140
listOfFiles = ['/lhcb/LHCb/Collision11/SDST/00012938/0000/00012938_00006376_1.sdst']

# Initialise the DIRAC script environment before using the clients
from DIRAC.Core.Base.Script import parseCommandLine
parseCommandLine()
from DIRAC.Core.Utilities.List import sortList
from DIRAC.Core.DISET.RPCClient import RPCClient

tsClient = RPCClient( 'Transformation/TransformationManager' )

# Add the files to the transformation, then associate them with the run
res = tsClient.addFilesToTransformation( transID, sortList( listOfFiles ) )
if not res['OK']:
 raise Exception( res['Message'] )
res = tsClient.addTransformationRunFiles( transID, run, sortList( listOfFiles ) )
if not res['OK']:
 raise Exception( res['Message'] )
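
The script above checks res['OK'] after every call and raises with res['Message'] on failure. A small helper can factor out that pattern. The sketch below is self-contained and does not import DIRAC: the helper name check_result and the exception class are hypothetical, and the 'Value' key is an assumption about the structure of successful results.

```python
class DiracCallError(Exception):
    """Raised when a result dictionary reports a failure."""


def check_result(res, what):
    """Return the payload of a result dict, or raise with its error message.

    `res` is expected to look like {'OK': True, 'Value': ...} on success
    and {'OK': False, 'Message': '...'} on failure.
    """
    if not res['OK']:
        raise DiracCallError("%s failed: %s" % (what, res['Message']))
    return res.get('Value')


if __name__ == "__main__":
    # Stand-in result dictionaries; no DIRAC service is contacted here.
    print(check_result({'OK': True, 'Value': 1}, "addFilesToTransformation"))
```

With such a helper, each call in the script reduces to one line, and the error message says which call failed.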
 

Data Management

Job Data Access Issues

Revision 812011-11-04 - unknown

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 169 to 169
 
Added:
>
>

How to recover replicas that are lost even if SRM reports they are existing

It can happen that a file is physically lost while SRM (lcg-ls) still reports it as present; see this GGUS. In the example below, the replica is completely lost from both tape and disk:
 > lcg-ls -l srm://gridka-dCache.fzk.de/pnfs/gridka.de/lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw
-rw-r--r--   1     2     2 3145768992             NEARLINE /pnfs/gridka.de/lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw
        * Checksum: 51e2fc3d (adler32)
        * Space tokens: 39930230
You then have to remove the lost replicas and copy them over again from another site:
$ dirac-dms-remove-lfn-replica /lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw GRIDKA-RAW
$ dirac-dms-replicate-lfn  /lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw GRIDKA-RAW
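
When several files are affected, the two commands above have to be repeated for each LFN. The sketch below only builds the command lines, in the same order as shown above, and does not execute anything; the function name recovery_commands is hypothetical, while the two command names and their arguments are the ones used in the example.

```python
def recovery_commands(lfns, storage_element):
    """Build remove-then-replicate command lines for each lost replica."""
    commands = []
    for lfn in lfns:
        # First drop the lost replica from the catalogue and storage element...
        commands.append(["dirac-dms-remove-lfn-replica", lfn, storage_element])
        # ...then copy the file back to the same storage element.
        commands.append(["dirac-dms-replicate-lfn", lfn, storage_element])
    return commands


if __name__ == "__main__":
    lost = ["/lhcb/data/2011/RAW/FULL/LHCb/COLLISION11/98298/098298_0000000077.raw"]
    for argv in recovery_commands(lost, "GRIDKA-RAW"):
        print(" ".join(argv))
```

Printing the commands first lets you review them before running them by hand with the appropriate role.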
 

Changing the Default Protocols List for a given Site (Tier-1)

The order of the list of protocols supplied to SRM can be changed e.g. testing root at NIKHEF means root is prepended to the list. This guide explains how to change the protocols list for a given site. This operation is restricted to those with the diracAdmin role. (Grid Experts)

Revision 802011-06-22 - AndresAeschlimann

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 272 to 272
 

SQLlite hint

The following SQLlite hint is meant to provide a template description for GGUS investigation requests about one of the most recurrent problems.

Site Problems Follow up

Changed:
<
<
Grid jobs (=pilot jobs in DIRAC terminology) often fail. This link provides an encyclopedic summary of all known issues concerning (mainly) grid jobs.
>
>
Grid jobs (=pilot jobs in DIRAC terminology) often fail. This link provides an encyclopedic summary of all known issues concerning (mainly) grid jobs.
 

Getting in touch with sites: Tickets and Mails

How to deal with GGUS follows

Revision 792011-06-07 - unknown

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 139 to 139
  It often seems very easy for a production to reach 95%, while reaching 100% is the difficult part. A list of cases can be found in this link. (Mostly for Grid Experts and Production Managers, but Grid shifters can still pick up useful information)
Added:
>
>

Closing a production

Keeping old productions in the production system is cumbersome: they may still be active and generate undue load on various components of the Production System. For example, the BookkeepingWatchAgent also loops over these no longer useful productions, which stretches the time needed to create tasks for the productions that are actually active. This link provides a procedure to pick out and close productions that are no longer alive.
 

Pilots monitor

If jobs are not being submitted for a long time, first check whether pilots are being submitted, and then whether they are actually matched. First, look in the portal at the "Pilot monitor" page to see if there are pilots running or submitted. Then, with the command

Revision 782011-03-30 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 156 to 156
 The following document gives Grid shifters a few hints on things they can check when a job has problems accessing data at a site. (Grid Shifters and Grid Experts). Checklists to debug dCache and CASTOR issues are also available.
Added:
>
>

Staging request blocked

If some STAGEIN requests are blocked, you can follow the recipe (http://lblogbook.cern.ch/Operations/4647) to recover the situation.
 

Changing the Data manager

The identity of the LHCb Data Manager needs to be swapped more often than one would expect. This procedure describes the steps to accomplish this operation smoothly.

Revision 772011-02-09 - FedericoStagni

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 139 to 139
  It often seems very easy for a production to reach 95%, while reaching 100% is the difficult part. A list of cases can be found in this link. (Mostly for Grid Experts and Production Managers, but Grid shifters can still pick up useful information)
Added:
>
>

Pilots monitor

If jobs are not being submitted for a long time, first check whether pilots are being submitted, and then whether they are actually matched. First, look in the portal at the "Pilot monitor" page to see if there are pilots running or submitted. Then, with the command

dirac-admin-get-job-pilots jobID

you can check whether pilots are submitted for your job queue. This will print the logs for the pilots in the queue. If you don't see a line with

'Status': 'Submitted'
then there might be a problem.

Also, through the pilot monitor page you can see the pilot output for the "done" pilots, which can contain useful information on why the pilots might not be matched.
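
The check for a 'Status': 'Submitted' entry can also be expressed programmatically. The sketch below is a minimal illustration that assumes the pilot information has already been loaded into a dictionary mapping a pilot reference to its attributes; that layout, the example references and the function name has_submitted_pilot are assumptions and are not taken from the command's actual output format.

```python
def has_submitted_pilot(pilots):
    """Return True if at least one pilot entry has Status 'Submitted'."""
    return any(info.get("Status") == "Submitted" for info in pilots.values())


if __name__ == "__main__":
    # Hypothetical pilot records, for illustration only.
    example = {
        "pilot-ref-1": {"Status": "Done"},
        "pilot-ref-2": {"Status": "Submitted"},
    }
    if not has_submitted_pilot(example):
        print("No submitted pilots found: there might be a problem")
    else:
        print("At least one pilot is submitted")
```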

 

Data Management

Job Data Access Issues

Revision 762010-12-03 - unknown

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 147 to 147
 Checklists to debug dCache and CASTOR issues are also available.

Changing the Data manager

Changed:
<
<
The identity of the LHCb Data Manager needs to be swapped more often than one would expect. This procedure describes the steps to accomplish this smoothly.
>
>
The identity of the LHCb Data Manager needs to be swapped more often than one would expect. This procedure describes the steps to accomplish this operation smoothly.
 

File recovery

Revision 752010-12-03 - unknown

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 80 to 80
  The procedure for searching for a string in the output of selected jobs (suitable for Grid Shifters and Grid Experts).
Added:
>
>
A low level investigation on LSF to check why LHCb jobs do not start at CERN.
 

Running Jobs Locally With DIRAC

There are three submission modes associated with DIRAC: default WMS submission, local execution of a workflow and finally execution of a workflow in the full agent machinery. This procedure explains the steps for running jobs locally with DIRAC.
Line: 144 to 146
 The following document gives Grid shifters a few hints on things they can check when a job has problems accessing data at a site. (Grid Shifters and Grid Experts). Checklists to debug dCache and CASTOR issues are also available.
Added:
>
>

Changing the Data manager

The identity of the LHCb Data Manager needs to be swapped more often than one would expect. This procedure describes the steps to accomplish this smoothly.
 

File recovery

Revision 742010-11-16 - FedericoStagni

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 152 to 152
  The order of the list of protocols supplied to SRM can be changed e.g. testing root at NIKHEF means root is prepended to the list. This guide explains how to change the protocols list for a given site. This operation is restricted to those with the diracAdmin role. (Grid Experts)
Changed:
<
<

Banning a SE if a Site is in downtime

>
>

Banning a SE if a Site is in downtime or full

 
Changed:
<
<
You (as Grid Expert) can use the command dirac-admin-ban-se / dirac-admin-allow-se to disable or enable a SE in case of problem or downtime affecting a SE.
>
>
You (as Grid Expert or Data Manager) can use the command dirac-admin-ban-se / dirac-admin-allow-se to disable or enable a SE in case of problem or downtime affecting a SE.
  Example to ban all the SEs at RAL in writing.
Line: 166 to 166
 dirac-admin-ban-se RAL-DST
Added:
>
>
Example to ban one SE in writing at CNAF
dirac-admin-ban-se -w CNAF-USER

Also keep in mind that NIKHEF and SARA have different SEs. The LCG.NIKHEF.nl SEs are: NIKHEF-RAW, NIKHEF-RDST

The others are in fact based at SARA: NIKHEF-DST, NIKHEF-FAILOVER, NIKHEF-USER, NIKHEF_M-DST, NIKHEF_MC-DST, NIKHEF_MC_M-DST
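
Because the SE names do not reveal where the storage actually is, a small lookup can help when deciding which SEs to ban for a downtime at one of the two sites. The sketch below simply encodes the two lists given above; the function name hosting_site is hypothetical.

```python
# SE lists as given in the text above.
NIKHEF_HOSTED = {"NIKHEF-RAW", "NIKHEF-RDST"}
SARA_HOSTED = {
    "NIKHEF-DST", "NIKHEF-FAILOVER", "NIKHEF-USER",
    "NIKHEF_M-DST", "NIKHEF_MC-DST", "NIKHEF_MC_M-DST",
}


def hosting_site(se_name):
    """Return 'NIKHEF' or 'SARA' for a NIKHEF-* storage element name."""
    if se_name in NIKHEF_HOSTED:
        return "NIKHEF"
    if se_name in SARA_HOSTED:
        return "SARA"
    raise KeyError("unknown storage element: %s" % se_name)


if __name__ == "__main__":
    for se in sorted(NIKHEF_HOSTED | SARA_HOSTED):
        print("%-16s %s" % (se, hosting_site(se)))
```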

Banning, unbanning, and re-directing the ConditionDB

The current implementation uses the Configuration Service to store the connection strings of the Oracle DB at the site. There are plans for switching to another technology, but for the moment this is how it is implemented.

In the CS, in section Resources/CondDB, there is a subsection for each of the T1 sites, with connection strings and status. If the status is "Active", that database is used. If the status is "InActive" (or anything but "Active"), a second connection string is used instead, chosen among the available ones with a random shuffle. Sometimes, however, it is better to specify a fixed redirection.

A "trick" is used for such a specific redirection: save the original, real connection in a separate section (call it, for example, "LCG.CNAF.it.REAL") and set its status to "InActive". Then if, for example, you want to redirect to CERN, just copy the CERN section and call it LCG.CNAF.it.

Remember that SARA and NIKHEF share the same DB.
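
The selection rule described above can be sketched as follows. This is not the actual LHCbDirac code: the dictionary layout, the key names Status and Connection, the site name LCG.CERN.ch, the placeholder connection strings and the function name pick_connection are all assumptions, and the sketch further assumes that only sections marked "Active" are candidates for the random fallback.

```python
import random


def pick_connection(site, cond_db, rng=random):
    """Pick the connection string for `site` from a CondDB-like mapping."""
    entry = cond_db[site]
    if entry["Status"] == "Active":
        return entry["Connection"]
    # Site is not Active: fall back to another Active database, chosen at random.
    candidates = [
        other["Connection"]
        for name, other in sorted(cond_db.items())
        if name != site and other["Status"] == "Active"
    ]
    if not candidates:
        raise RuntimeError("no active CondDB available")
    return rng.choice(candidates)


if __name__ == "__main__":
    # Placeholder sections and connection strings, for illustration only.
    cond_db = {
        "LCG.CNAF.it": {"Status": "InActive", "Connection": "cnaf-db"},
        "LCG.CERN.ch": {"Status": "Active", "Connection": "cern-db"},
    }
    print(pick_connection("LCG.CNAF.it", cond_db))
```

A fixed redirection, as in the "trick" above, removes the randomness: the section named after the site then simply holds the connection string of the chosen target.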

 

Checking if a file is cached

Four simple steps to check whether a file is cached or not (Grid Shifters and Grid Experts)

Revision 732010-11-15 - FedericoStagni

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 133 to 133
  The following link gives an introduction and examples using the Production API. (Grid Shifters and Grid Experts)
Added:
>
>

Getting production to 100%

It often seems very easy for a production to reach 95%, while reaching 100% is the difficult part. A list of cases can be found in this link. (Mostly for Grid Experts and Production Managers, but Grid shifters can still pick up useful information)

 

Data Management

Job Data Access Issues

Revision 722010-06-03 - unknown

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 85 to 85
 

Production management

Added:
>
>

Launching productions from the production request page

An example of how to launch a simulation production is available. Note that the prerequisite for launching productions is having the lhcb_prmgr role.

How to rerun a production job starting from a log SE link

Frequently individual jobs can fail with an error that should be investigated by applications experts. The following guide on how to rerun a job can be circulated in case of questions by the applications expert.

 

Dealing with Production IDs, Production Job IDs and WMS Job IDs

This simple guide shows how to obtain production IDs from WMS job IDs and vice versa (Grid Shifters and Grid Experts)

Revision 712010-05-05 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 14 to 14
 

non CERN machines

Changed:
<
<
to install outside CERN, you should download and install DIRAC (for example v4r2)
>
>
to install outside CERN, you should download and install LHCbDirac (for example v5r3)
 
  • setenv MYSITEROOT /my/location/to/install/dirac
Changed:
<
<
  • setenv CMTCONFIG slc4_ia32_gcc34
>
>
  • setenv CMTCONFIG <LCG_TAG>
 
Changed:
<
<
  • python install_project.py -p Dirac -v v4r2 -b
>
>
  • python install_project.py -p LHCbDirac -v v5r3 -b
 
  • source $MYSITEROOT/LbLogin.csh or . $MYSITEROOT/LbLogin.sh
Changed:
<
<
>
>
 
Changed:
<
<
To use Dirac on a machine without lcg stuff, you need to add in the $DIRACROOT/etc/dirac.cfg the following lines
>
>
To use LHCbDirac on a machine without lcg stuff, you need to add in the $DIRACROOT/etc/dirac.cfg the following lines
 
Resources
{
Line: 37 to 37
 } to deactivate the LFC checking.

Changed:
<
<

Building,Deploying & Installing DIRAC (and Core Software)

>
>

Building,Deploying & Installing LHCbDirac (and Core Software)

 

Building the DIRAC binary distributions

Line: 45 to 45
 

Installing DIRAC on lxplus

Changed:
<
<
  • If you are developing tools for DIRAC, it probably makes sense for you to install your own version of DIRAC from CVS.
>
>
  • If you are developing tools for DIRAC, it probably makes sense for you to install your own version of LHCbDirac.
 
Changed:
<
<

Installing DIRAC on non CERN machines

For installing DIRAC on your local machine, you should download and install DIRAC, specifying the version <version>
>
>

Installing LHCbDirac on non CERN machines

For installing LHCbDirac on your local machine, you should download and install LHCbDirac, specifying the version <version>
 
  • setenv MYSITEROOT /my/location/to/install/dirac
Changed:
<
<
  • setenv CMTCONFIG slc4_ia32_gcc34
>
>
  • setenv CMTCONFIG <LCG_TAG>
 
Changed:
<
<
  • python install_project.py -p Dirac -v <version>
>
>
  • python install_project.py -p LHCbDirac -v <version>
 Then in your login script you should include:
  • source $MYSITEROOT/LbLogin.csh or . $MYSITEROOT/LbLogin.sh
Changed:
<
<
and to set up the DIRAC environment (beware if you use ganga this is not needed as done internally by ganga)
  • SetupProject Dirac
>
>
and to set up the LHCbDirac environment (beware if you use ganga this is not needed as done internally by ganga)
  If you see an error message like "Warning : Cannot add voms attribute /lhcb/Role=user to proxy Accessing data in the grid storage from the user interface will not be possible. The grid jobs will not be affected." then try doing chmod 644 $DIRAC_VOMSES/lhcb-voms.cern.ch. You will also need to set $X509_CERT_DIR and $X509_VOMS_DIR. Refer to lxplus for the default settings, or take a look at the Dirac tool dirac-admin-get-CAs, available in DIRAC versions later than v4r19. However you do it, if you make a local copy of these two directories, you will need to keep that copy up to date. -- WillReece - 2009-10-06
Changed:
<
<

"user guide" on how to take advantage of the CMT setup of DIRAC

Summary of commands to be used for taking advantage of the DIRAC installation using CMT.

>
>

"user guide" on how to take advantage of the CMT setup of LHCbDirac

Summary of commands to be used for taking advantage of the LHCbDirac installation using CMT.

 

Deployment of LHCb Software on the Grid

Revision 702010-04-28 - unknown

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 107 to 107
 

Production workflow

Added:
>
>
 

User workflow

Revision 692010-04-14 - unknown

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 128 to 128
 

Job Data Access Issues

Changed:
<
<
The following document gives Grid shifters a few hints on things they can check when a job has problems accessing data at a site. (Grid Shifters and Grid Experts)
>
>
The following document gives Grid shifters a few hints on things they can check when a job has problems accessing data at a site. (Grid Shifters and Grid Experts). Checklists to debug dCache and CASTOR issues are also available.
 

File recovery

Revision 682010-03-09 - NickBrook

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 11 to 11
 

Setting Production environment

CERN - lxplus

After you log in to lxplus, you have to run the following two commands to get the production environment
Changed:
<
<
>
>
 
  • lhcb-proxy-init -g lhcb_prod

non CERN machines

to install outside CERN, you should download and install DIRAC (for example v4r2)

Revision 672009-10-06 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 142 to 142
  You (as Grid Expert) can use the command dirac-admin-ban-se / dirac-admin-allow-se to disable or enable a SE in case of problem or downtime affecting a SE.
Changed:
<
<
Example to ban all the SE at RAL in writing.
>
>
Example to ban all the SEs at RAL in writing.
 
Changed:
<
<
dirac-admin-ban-se -c RAL.uk -w
>
>
dirac-admin-ban-se -c RAL.uk

Example to ban one SE at RAL

dirac-admin-ban-se RAL-DST
 

Checking if a file is cached

Revision 662009-10-06 - unknown

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 59 to 59
 and to set up the DIRAC environment (beware if you use ganga this is not needed as done internally by ganga)
  • SetupProject Dirac
Added:
>
>
If you see an error message like "Warning : Cannot add voms attribute /lhcb/Role=user to proxy Accessing data in the grid storage from the user interface will not be possible. The grid jobs will not be affected." then try doing chmod 644 $DIRAC_VOMSES/lhcb-voms.cern.ch. You will also need to set $X509_CERT_DIR and $X509_VOMS_DIR. Refer to lxplus for the default settings, or take a look at the Dirac tool dirac-admin-get-CAs, available in DIRAC versions later than v4r19. However you do it, if you make a local copy of these two directories, you will need to keep that copy up to date. -- WillReece - 2009-10-06
 

"user guide" on how to take advantage of the CMT setup of DIRAC

Summary of commands to be used for taking advantage of the DIRAC installation using CMT.

Deployment of LHCb Software on the Grid

Revision 652009-10-06 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 136 to 136
  The order of the list of protocols supplied to SRM can be changed e.g. testing root at NIKHEF means root is prepended to the list. This guide explains how to change the protocols list for a given site. This operation is restricted to those with the diracAdmin role. (Grid Experts)
Added:
>
>

Banning a SE if a Site is in downtime

You (as Grid Expert) can use the command dirac-admin-ban-se / dirac-admin-allow-se to disable or enable a SE in case of problem or downtime affecting a SE.

Example to ban all the SE at RAL in writing.

dirac-admin-ban-se -c LCG.RAL.uk -w 
 

Checking if a file is cached

Four simple steps to check whether a file is cached or not (Grid Shifters and Grid Experts)

Revision 642009-07-08 - NickBrook

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 19 to 19
 
Changed:
<
<
  • python install_project.py -p Dirac -v v4r2
>
>
  • python install_project.py -p Dirac -v v4r2 -b
 
  • source $MYSITEROOT/LbLogin.csh or . $MYSITEROOT/LbLogin.sh
  • SetupProject Dirac

Revision 632009-06-12 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 239 to 239
 

Mailing lists

Changed:
<
<
>
>
 

Questions and comments to the experts

Revision 622009-06-12 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 255 to 255
 
Changed:
<
<
>
>
 
  • lhcb-grid : lhb-grid-alarms + lhcb-gridshifters
  • lhcb-paste : lhcb-grid + lhcb-ganga
  • lhcb-gridteam : lhcb-lcgcontact + lhcb-datamanagement + lhcb-bookkeeping + lhcb-sam + lhcb-production-managers + lhcb-gridresources + lhcb-cern-contact + lhcb-cnaf-contact + lhcb-gridka-contact + lhcb-in2p3-contact + lhcb-nikhef-contact + lhcb-pic-contact + lhcb-ral-contact

Revision 612009-06-11 - PhilippeCharpentier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 240 to 240
 

Mailing lists

  • lhcb-dirac : all people involved in development of DIRAC
Changed:
<
<
  • lhcb-gridshifters : all Computing shifters : Used for announcing events concerning the shifts
>
>
 
Changed:
<
<
>
>
 
Line: 255 to 255
 
Changed:
<
<
>
>
 
  • lhcb-grid : lhb-grid-alarms + lhcb-gridshifters
  • lhcb-paste : lhcb-grid + lhcb-ganga
  • lhcb-gridteam : lhcb-lcgcontact + lhcb-datamanagement + lhcb-bookkeeping + lhcb-sam + lhcb-production-managers + lhcb-gridresources + lhcb-cern-contact + lhcb-cnaf-contact + lhcb-gridka-contact + lhcb-in2p3-contact + lhcb-nikhef-contact + lhcb-pic-contact + lhcb-ral-contact

Revision 602009-05-07 - RobertoSantinel

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 196 to 196
 Grid jobs (=pilot jobs in DIRAC terminology) often fail. This link provides an encyclopedic summary of all known issues concerning (mainly) grid jobs.

Getting in touch with sites: Tickets and Mails

Changed:
<
<
Ho to deal with GGUS follows
>
>
How to deal with GGUS follows
 

GGUS

Changed:
<
<
Rules of thumb:
  • It is always good practice to open a GGUS ticket for reporting a problem to a site.
  • A problem is not a problem if a GGUS has not been open.
  • We noticed (years of experience) that GGUS + mail to support mailing list at the site is the most efficient and quick way to get in touch with the site.
  • The responsivness increases if T1 local contact person are in the loop. Contact mailing address for T1s are available at https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionProcedures#Mailing_lists
  • In the last release of GGUS portal new flavors of tickets have been introduced that merge the traceability feature of a GGUS ticket and the effectiveness of the direct mail to the site.
  • Some of these 'new' tickets trigger e-mails to special site mailing list (alarming mailing list) that, internally on the site, trigger procedures to react quickly to a problem even outside working hour.
  • When submitting GGUS tickets, make sure that you CC lhcb-grid@cernNOSPAMPLEASE.ch. This will enable a fast response if the original submitter is not available.
  • If the issue breaks or represents a show stopper for some of the MoU activities at T0 or T1, open a GGUS ALARM ticket. Only few people in the VO are entitled to open such kind of ticket and usually these people are grid experts. A list of authorized ALARMERS for LHCb is available here.
  • If the problem concerns production activity open a TEAM ticket that allows all member of a team (all LHCb collaborators with Role=production) to interact and modify the ticket and always select the affected site: this will allow a direct routing of the ticket to the site speeding up notification and avoid to have dependencies on TPM and ROC as intermediate steps to the site.
  • Since Jan 21st GGUS allows for an escalation of ticket slowly taken by the support unit. Please read this escalation procedure document
>
>

This practice guide aims to give operators, GEOCs and experts a few clear rules for submitting GGUS tickets, together with a quick introduction to the ticketing system. On the Grid, a problem is not a problem until a GGUS ticket has been opened. With that in mind, we present and analyze the best way to submit a GGUS ticket to a site (for a site-specific problem). In the early days, mail was the only tool for contacting sites; it was quick and also efficient. The GGUS ticketing system then came into play, bringing much more functionality but also a slower way to get in touch with sites. While it was indeed a good tool for tracking problems and accumulating know-how about them, the path of a ticket did not always lead straight to the experts who had to fix the problem at the remote site. We noticed (over years of experience) that a GGUS ticket plus a direct mail to the site's support mailing list was the most efficient and quickest way to get in touch with the site. Responsiveness increased further if an LHCb local T1 contact person was also put in the loop. (Please note that contact mailing addresses for T1s are available at https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionProcedures#Mailing_lists )
The recent releases of GGUS introduced new features that combine the quick, mail-style contact with sites and the robustness typical of a ticketing system. In more detail, alongside the usual USER ticket, GGUS offers the possibility to submit both TEAM tickets and ALARM tickets.
  • A TEAM ticket is a special ticket that concerns production activities and is targeted at problems at sites that have to be followed by a crew as a whole rather than by a single person (e.g. the production team). Any problem at any site that everyone in the operations team may be asked to follow and act upon must be raised via a TEAM ticket. A TEAM ticket can either be routed directly to the site (in which case the submitter must select the GOC name of the affected site from a drop-down menu) or go through the usual TPM/ROC/SITE path, with the unavoidable loss of time. A TEAM ticket is not necessarily a top-priority ticket: the submitter can select the severity from the web form. The only difference is that everybody in the same TEAM can modify and interact with the ticket, which is owned by the TEAM and not by the user. GGUS knows the members of the TEAM via VOMS. All people entitled to hold Role=production (now) or Role=team (coming soon) are part of the TEAM and are recognized as allowed to act on the ticket.
  • An ALARM ticket is another special ticket, meant to really generate ALARMs at the sites concerned. The implementation of the ALARM at site level differs from site to site, as does the support each site has decided to put in place. Mails to a special site mailing list (the alarm mailing list) may in turn trigger internal site procedures to react quickly to a problem, even outside working hours: SMS, phone calls, operators, control rooms, remedy tickets... Everything happens behind the scenes. T1s, as per the MoU, are required to react to ALARM tickets in less than 30 minutes, 24x7. What matters here is that VOs are guaranteed an answer within 30 minutes, but a solution is not necessarily guaranteed in such a short time! Only a very restricted number of people inside the VO (read: alarmers) are entitled to submit ALARM tickets. This limitation is clearly needed to prevent non-experts from waking someone up for a fake problem. Soon GGUS will retrieve the authorized alarmers from VOMS (Role=alarm). A list of authorized ALARMERS for LHCb is currently available here.

Please keep in mind that, although these new tools are extremely useful and important, we strongly recommend not abusing them. The net effect of abuse is a loss of credibility, which would lead sites to relax their response to ALARMs. Below are some suggestions for typical problems and the action to take.

  1. If the problem is a show stopper, the shifter has to call the GEOC. The expert then has to investigate whether the problem really is a show stopper and, if so, submit the ALARM. A show stopper here is a problem that prevents the activity on the site from continuing. The GGUS portal for ALARM tickets provides a list of identified MoU activities that may give rise to an alarm. Note, however, that at CERN a show stopper only concerns data. When submitting, please also put the lhcb-grid@cernNOSPAMPLEASE.ch mailing list in cc and open an entry in the e-logbook.
  2. If the problem severely affects one of the services at a T1 and compromises one of the activities on the site, a TEAM ticket with "Top Priority" or "Very Urgent" severity is recommended. The decision is left to the GEOC, but an entry in the e-logbook must also be filed.
  3. If the problem affects the production activity at other sites, the GEOC or the shifter must open a TEAM ticket with a severity ranging from "Less Urgent" to "Top Priority", depending on how the problem impacts the (T2) site. If just a few jobs have problems (say less than 10%) and the rest are running happily, it may be just a problem with one or a few WNs (Less Urgent). If the site is acting as a black hole and also compromises activities elsewhere, by attracting and failing jobs that could otherwise have run at other sites, the site must be banned and the TEAM ticket deserves Top Priority.
  4. Normal users can also get in touch with sites via a standard ticket. Severity is again a matter of personal judgment. We discourage, however, the attitude "my problem is always more important than anyone else's": soon 5K users will be carrying out their activities in WLCG.
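
The four suggestions above can be read as a small decision procedure. The sketch below is only an illustration of that reading, not an official tool: the class SiteProblem, its fields and the function recommend_ticket are hypothetical names, and the fallback for a site with more than 10% failing jobs that is not a black hole is an assumption, since the text leaves that case to the GEOC or shifter.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class SiteProblem:
    """Hypothetical description of a problem seen by a shifter."""
    from_production: bool = True      # False for a normal user's problem
    show_stopper: bool = False        # the activity on the site cannot continue
    severe_t1_service: bool = False   # a T1 service is severely affected
    black_hole: bool = False          # the site attracts and fails jobs
    failed_fraction: float = 0.0      # fraction of jobs failing at the site


def recommend_ticket(p: SiteProblem) -> Tuple[str, str]:
    """Return (ticket type, severity) following suggestions 1 to 4."""
    if not p.from_production:
        # Suggestion 4: normal users open a standard ticket.
        return ("STANDARD", "user's judgment")
    if p.show_stopper:
        # Suggestion 1: the shifter calls the GEOC; an expert confirms
        # the show stopper before the ALARM is submitted.
        return ("ALARM", "confirm with GEOC first")
    if p.severe_t1_service:
        # Suggestion 2.
        return ("TEAM", "Top Priority or Very Urgent")
    if p.black_hole:
        # Suggestion 3: the site must also be banned.
        return ("TEAM", "Top Priority")
    if p.failed_fraction < 0.10:
        # Suggestion 3: probably just one or a few worker nodes.
        return ("TEAM", "Less Urgent")
    # Not fixed by the text: assumption of this sketch.
    return ("TEAM", "GEOC or shifter decides")


print(recommend_ticket(SiteProblem(failed_fraction=0.05)))
```

Whatever the outcome, the text still asks for lhcb-grid@cernNOSPAMPLEASE.ch in cc and an e-logbook entry where stated; the sketch does not cover those steps.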

Ticket escalation:

Since Jan 21st GGUS allows the escalation of tickets that are taken up slowly by the support units or that concern unresponsive sites. Please read this escalation procedure document.


 

Site Mail contacts

WLCG "conventional" site-support mailing lists are available here.

Revision 592009-04-08 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 118 to 118
 
Added:
>
>

Production API Notes

The following link gives an introduction and examples using the Production API. (Grid Shifters and Grid Experts)

 

Data Management

Job Data Access Issues

Revision 582009-03-16 - RobertoSantinel

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 192 to 192
 Grid jobs (pilot jobs in DIRAC terminology) often fail. This link gives an encyclopedic summary of all known issues, mainly concerning grid jobs.

Getting in touch with sites: Tickets and Mails

Added:
>
>
How to deal with GGUS follows.

GGUS

  Rules of thumb:
  • It is always good practice to open a GGUS ticket for reporting a problem to a site.
Changed:
<
<
  • A problem is not a problem if a GGUS ahs not been open.
>
>
  • A problem is not a problem if a GGUS ticket has not been opened.
 
  • Years of experience show that a GGUS ticket plus a mail to the support mailing list at the site is the quickest and most efficient way to get in touch with the site.
Changed:
<
<
>
>
 
  • In the latest release of the GGUS portal, new flavors of tickets have been introduced that combine the traceability of a GGUS ticket with the effectiveness of a direct mail to the site.
Changed:
<
<
  • Some of these 'new' tickets trigger e-mails to special site mailing list (alarming mailing list) that, internally on the site, trigger procedures to react quickly to a problem even outside working hour.

GGUS

>
>
  • Some of these 'new' tickets trigger e-mails to a special site mailing list (the alarm mailing list) that, internally at the site, trigger procedures to react quickly to a problem, even outside working hours.
 
  • When submitting GGUS tickets, make sure that you CC lhcb-grid@cernNOSPAMPLEASE.ch. This will enable a fast response if the original submitter is not available.
  • If the issue breaks, or is a show stopper for, some of the MoU activities at T0 or T1, open a GGUS ALARM ticket. Only a few people in the VO are entitled to open this kind of ticket, and they are usually grid experts. A list of authorized ALARMERS for LHCb is available here.
Changed:
<
<
  • If the problem concerns production activity open a TEAM ticket that allows all member of a team (all LHCb collaborators with Role=production) to interact and modify the ticket and always select the affected site: this will allow a direct routing of the ticket to the site speeding up notification and avoid to have dependencies on TPM and ROC as intermediate steps to the site.
>
>
  • If the problem concerns production activity, open a TEAM ticket, which allows all members of a team (all LHCb collaborators with Role=production) to interact with and modify the ticket. Always select the affected site: this routes the ticket directly to the site, which speeds up notification and avoids depending on TPM and ROC as intermediate steps.
  • Since Jan 21st GGUS allows the escalation of tickets that are taken up slowly by the support unit. Please read this escalation procedure document.
 

Site Mail contacts

WLCG "conventional" site-support mailing lists are available here.

Line: 272 to 273
 
  • You can use the following commands to set the data quality flag:

Deleted:
<
<
  dirac-bookkeeping-setdataquality-run
Changed:
<
<
The input parameters is the run number and the data quality flag. if you want to know the data quality flag, you have to use this command without input parameters.
for example:
>
>
The input parameters are the run number and the data quality flag. To list the available data quality flags, run this command without input parameters.
For example:
 
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[406]>dirac-bookkeeping-setdataquality-run
Available data quality flags:
Line: 285 to 284
 BAD
 MAYBE
 Usage: dirac-bookkeeping-setdataquality-run.py <RunNumber> <DataQualityFlag>
Changed:
<
<
The data quality flag is case sensitive.
Set data quality a given run:
>
>
The data quality flag is case sensitive. Set the data quality for a given run:
 (DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[408]>dirac-bookkeeping-setdataquality-run 20716 'BAD'
 Quality flag has been updated!
 (DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[409]>
Line: 296 to 292
  dirac-bookkeeping-setdataquality-files
Changed:
<
<
The input is a logical file name or a file. This file contains a list of lfns.
Set the quality flag one file:
>
>
The input is either a logical file name or a file containing a list of LFNs. Set the quality flag for one file:
 
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[413]> dirac-bookkeeping-setdataquality-files /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw 'BAD'
['/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw']
Line: 313 to 307
 ['/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43998/043998_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43995/043995_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43994/043994_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43993/043993_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43992/043992_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43989/043989_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43987/043987_0000000002.raw']
 Quality flag updated!
 (DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[417]>
Changed:
<
<
The lfns.txt contains the following:
>
>
The file lfns.txt contains the following:
 /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw
 /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43998/043998_0000000002.raw
 /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43995/043995_0000000002.raw

Revision 572009-03-15 - AndreiTsaregorodtsev

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 114 to 114
 
Added:
>
>

Procedures in case of site failures

 

Data Management

Job Data Access Issues

Revision 562009-03-12 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 106 to 106
 

Production workflow

* Production Job Finalization

Added:
>
>

User workflow

Section to describe general policies for user jobs in DIRAC.

 

Data Management

Job Data Access Issues

Revision 552009-03-03 - ZoltanMathe

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 244 to 244
 
Added:
>
>

Bookkeeping System

Set the data quality flag

  • To set the data quality flag you can use dirac-bookkeeping-setdataquality-run or dirac-bookkeeping-setdataquality-files. Running either command without input parameters shows the available quality flags and the usage.

(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[406]>dirac-bookkeeping-setdataquality-run
Available data quality flags:
UNCHECKED
OK
BAD
MAYBE
Usage: dirac-bookkeeping-setdataquality-run.py <RunNumber> <DataQualityFlag>

  • You can use the following commands to set the data quality flag:

dirac-bookkeeping-setdataquality-run

The input parameters are the run number and the data quality flag. To list the available data quality flags, run this command without input parameters.
For example:

(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[406]>dirac-bookkeeping-setdataquality-run
Available data quality flags:
UNCHECKED
OK
BAD
MAYBE
Usage: dirac-bookkeeping-setdataquality-run.py <RunNumber> <DataQualityFlag>
The data quality flag is case sensitive.
Set the data quality for a given run:
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[408]>dirac-bookkeeping-setdataquality-run 20716 'BAD'
Quality flag has been updated!
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[409]>
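
Because the flag is case sensitive, a wrapper script may want to check it before calling the command. The following is a minimal sketch under that assumption: the helper name build_run_command is hypothetical, the list of flags is copied from the command output shown above, and the function only builds the argument list without contacting the Bookkeeping.

```python
# Flags as printed by the command when it is run without parameters.
VALID_FLAGS = ("UNCHECKED", "OK", "BAD", "MAYBE")


def build_run_command(run_number, flag):
    """Return the argument list that sets the quality flag of one run."""
    if flag not in VALID_FLAGS:
        # The flag is case sensitive, so 'bad' is rejected rather than fixed.
        raise ValueError("unknown data quality flag: %r" % (flag,))
    run = int(run_number)
    if run <= 0:
        raise ValueError("run number must be positive")
    return ["dirac-bookkeeping-setdataquality-run", str(run), flag]


print(build_run_command(20716, "BAD"))
```

The returned list could be passed to subprocess.run in an environment where the DIRAC commands are available.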

dirac-bookkeeping-setdataquality-files

The input is either a logical file name or a file containing a list of LFNs.
Set the quality flag for one file:

(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[413]> dirac-bookkeeping-setdataquality-files /lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw 'BAD'
['/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw']
Quality flag updated!
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[414]>

Set the quality flag for a list of files:

(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[416]> dirac-bookkeeping-setdataquality-files lfns.txt 'BAD'
['/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43998/043998_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43995/043995_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43994/043994_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43993/043993_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43992/043992_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43989/043989_0000000002.raw', '/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43987/043987_0000000002.raw']
Quality flag updated!
(DIRAC3-user) zmathe@pclhcb43 /scratch/zmathe/dirac[417]>
The file lfns.txt contains the following:
/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw
/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43998/043998_0000000002.raw
/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43995/043995_0000000002.raw
/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43994/043994_0000000002.raw
/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43993/043993_0000000002.raw
/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43992/043992_0000000002.raw
/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43989/043989_0000000002.raw
/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43987/043987_0000000002.raw
lfns.txt (END)
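
A list file such as lfns.txt can be produced by a short script. The sketch below is one possible way to do it, not part of DIRAC: the helper names format_lfn_list, write_lfn_list and build_files_command are hypothetical, and the check that every LFN starts with "/" is an assumption based on the examples above.

```python
def format_lfn_list(lfns):
    """Return the text of an LFN list file: one LFN per line, no duplicates."""
    seen = []
    for lfn in lfns:
        lfn = lfn.strip()
        if not lfn:
            continue  # skip empty entries
        if not lfn.startswith("/"):
            # Assumption: LFNs are absolute, as in the examples above.
            raise ValueError("not an LFN: %r" % (lfn,))
        if lfn not in seen:
            seen.append(lfn)
    return "".join(lfn + "\n" for lfn in seen)


def write_lfn_list(lfns, path):
    """Write the list file and return its path."""
    with open(path, "w") as handle:
        handle.write(format_lfn_list(lfns))
    return path


def build_files_command(lfn_or_list_file, flag):
    """Return the argument list for one LFN or for a list file."""
    return ["dirac-bookkeeping-setdataquality-files", lfn_or_list_file, flag]


lfns = [
    "/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/44026/044026_0000000002.raw",
    "/lhcb/data/2009/RAW/EXPRESS/FEST/FEST/43998/043998_0000000002.raw",
]
print(format_lfn_list(lfns), end="")
print(build_files_command("lfns.txt", "BAD"))
```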
 

Documents

Revision 542009-01-28 - RobertoSantinel

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 59 to 59
 and to set up the DIRAC environment (note: if you use ganga this is not needed, as ganga does it internally)
  • SetupProject Dirac
Added:
>
>

"user guide" on how to take advantage of the CMT setup of DIRAC

Summary of commands to be used for taking advantage of the DIRAC installation using CMT.

 

Deployment of LHCb Software on the Grid

Line: 250 to 252
 
Deleted:
<
<
META FILEATTACHMENT attachment="dirac-primary-states.png" attr="" comment="DIRAC primary job states" date="1221475538" name="dirac-primary-states.png" path="dirac-primary-states.png" size="111571" stream="dirac-primary-states.png" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
 
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"
Added:
>
>
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.png" attr="" comment="DIRAC primary job states" date="1221475538" name="dirac-primary-states.png" path="dirac-primary-states.png" size="111571" stream="dirac-primary-states.png" user="Main.GreigCowan" version="1"

Revision 532009-01-21 - RobertoSantinel

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"

LHCb Production Operations Procedures

Line: 190 to 190
 
  • When submitting GGUS tickets, make sure that you CC lhcb-grid@cernNOSPAMPLEASE.ch. This will enable a fast response if the original submitter is not available.
  • If the issue breaks, or is a show stopper for, some of the MoU activities at T0 or T1, open a GGUS ALARM ticket. Only a few people in the VO are entitled to open this kind of ticket, and they are usually grid experts. A list of authorized ALARMERS for LHCb is available here.
Changed:
<
<
  • If the problem concerns T0 and T1 and is an issue for production activity open a TEAM ticket that allows all memeber of a team (all LHCb collaborators with Role=production) to interact and modify the ticket
>
>
  • If the problem concerns production activity, open a TEAM ticket, which allows all members of a team (all LHCb collaborators with Role=production) to interact with and modify the ticket. Always select the affected site: this routes the ticket directly to the site, which speeds up notification and avoids depending on TPM and ROC as intermediate steps.
 

Site Mail contacts

WLCG "conventional" site-support mailing lists are available here.


Revision 522009-01-20 - RobertoSantinel

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
Deleted:
<
<
-- StuartPaterson - 31 Jul 2008
 

LHCb Production Operations Procedures

This is the LHCb Production Operations Procedures page, which contains procedures for Grid Experts and Grid Shifters to follow. More information about Production Operations and some useful links can be found at the Production Operations page. The draft template for defining production procedures is available here in the following formats: Template (.doc), Template (.rtf), Template (.txt). Each procedure should be posted with a link to the documentation and the intended audience, e.g. Grid Experts, Grid Shifters or both.

Line: 37 to 35
  } } }
Changed:
<
<
to desactivate the LFC checking.
>
>
to deactivate the LFC checking.

Building,Deploying & Installing DIRAC (and Core Software)

Building the DIRAC binary distributions

The instructions for how to build the binaries for DIRAC are here. (Grid Experts)

Installing DIRAC on lxplus

  • If you are developing tools for DIRAC, it probably makes sense for you to install your own version of DIRAC from CVS.

Installing DIRAC on non CERN machines

To install DIRAC on your local machine, you should download and install DIRAC, specifying the version <version>. Then in your login script you should include:
  • source $MYSITEROOT/LbLogin.csh or . $MYSITEROOT/LbLogin.sh
and to set up the DIRAC environment (note: if you use ganga this is not needed, as ganga does it internally)
  • SetupProject Dirac

Deployment of LHCb Software on the Grid

 

Workload management

Line: 76 to 101
  Test jobs should be submitted to help debug problems. This can be done through DIRAC, to test the full chain of Grid submission, or by running an LHCb application directly on the site WN (if you have the relevant access permissions) when you know that the problem is confined to the site.
Changed:
<
<

Production workflow

>
>

Production workflow

  * Production Job Finalization
Deleted:
<
<

SQLlite hint

The following SQLlite is meant to provide a template description to feed a GGUS requests of investigation about one of the most recurrent problems.
 

Data Management

Job Data Access Issues

Line: 103 to 125
 

Generating a POOL XML slice for some LFNs

The following is a simple description of the replacement for genCatalog in DIRAC3. This uses the standard LHCb input data resolution policy to obtain access URLs.

Added:
>
>

Determining Versions of LCG Utils, GFAL and the Default SRM2 Protocols List in DIRAC

 
Changed:
<
<

Troubleshooting

>
>
There is a simple non-intrusive script to obtain the external package versions in DIRAC (Grid Shifters and Grid Experts)

DIRAC Configuration Service

 
Changed:
<
<

Investigating failed jobs

>
>

Adding new users

 
Changed:
<
<
If many jobs start to fail at a site they should be immediately investigated.
>
>

Restarting the configuration service

  • The DIRAC configuration service sometimes goes down and has to be restarted. (Grid Experts)

Getting information from BDII


DIRAC (and not) Services monitoring

Getting the List of Ports For DIRAC Central Services (and how to ping them)

 
Changed:
<
<

Sites

>
>
The ports for DIRAC central services for a given setup can easily be checked (useful for hardware requests). The procedure for port discovery as well as how to ping a service is available (Grid Shifters and Grid Experts).
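
Independently of the linked procedure, a plain TCP connection test can confirm that a port is reachable once its number is known. The sketch below is only that: it is not the DIRAC ping, the function name port_is_open is hypothetical, and the host and port in the usage line are placeholders rather than real service endpoints.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False


# Placeholder host and port, for illustration only.
print(port_is_open("localhost", 9135, timeout=1.0))
```

A successful connection only shows that something is listening on the port; it says nothing about whether the service behind it is healthy.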

SLS sensors

GridMap Overview

CERN Core Services monitoring


Sites Management

 

Banning and allowing sites

  • When to ban a site.
Changed:
<
<

Sites rank

>
>

Sites rank: Unspecified Grid Resources Error....

 
  • This Rank procedure gives some guidelines for debugging and understanding why a site is not running payloads.

Site Availability Monitoring (SAM) tests

Line: 126 to 167
 
Added:
>
>

Sites Troubleshooting

 
Changed:
<
<

Tickets

>
>

Investigating failed jobs

 
Changed:
<
<

GGUS

>
>
If many jobs start to fail at a site they should be immediately investigated.

SQLite hint

The following SQLite file is meant to provide a template description to feed a GGUS request for investigation of one of the most recurrent problems.

Site Problems Follow up

Grid jobs (pilot jobs in DIRAC terminology) often fail. This link gives an encyclopedic summary of all known issues, mainly concerning grid jobs.

Getting in touch with sites: Tickets and Mails

Rules of thumb:

  • It is always good practice to open a GGUS ticket for reporting a problem to a site.
  • A problem is not a problem if a GGUS ticket has not been opened.
  • Years of experience show that a GGUS ticket plus a mail to the support mailing list at the site is the quickest and most efficient way to get in touch with the site.
  • Responsiveness increases if the T1 local contact persons are in the loop. Contact mailing addresses for T1s are available at https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionProcedures#Mailing_lists
  • In the latest release of the GGUS portal, new flavors of tickets have been introduced that combine the traceability of a GGUS ticket with the effectiveness of a direct mail to the site.
  • Some of these 'new' tickets trigger e-mails to a special site mailing list (the alarm mailing list) that, internally at the site, trigger procedures to react quickly to a problem, even outside working hours.

GGUS

 
  • When submitting GGUS tickets, make sure that you CC lhcb-grid@cernNOSPAMPLEASE.ch. This will enable a fast response if the original submitter is not available.
Changed:
<
<

Deployment

Installing DIRAC on lxplus

  • If you are developing tools for DIRAC, it probably makes sense for you to install your own version of DIRAC from CVS.

Installing DIRAC on non CERN machines

For installing DIRAC on your local machine, you should download and install DIRAC, specifying the version <version> Then in your login script you should include:
  • source $MYSITEROOT/LbLogin.csh or . $MYSITEROOT/LbLogin.sh
and to set up the DIRAC environment (beware if you use ganga this is not needed as done internally by ganga)
  • SetupProject Dirac

Deployment of LHCb Software on the Grid

Information System

Getting information from BDII

DIRAC Configuration Service

Adding new users

Restarting the configuration service

  • The DIRAC configuration service sometime goes down and has to be restarted. (Grid Experts)

DIRAC general

Getting the List of Ports For DIRAC Central Services (and how to ping them)

The ports for DIRAC central services for a given setup can easily be checked (useful for hardware requests). The procedure for port discovery as well as how to ping a service is available (Grid Shifters and Grid Experts).

Determining Versions of LCG Utils, GFAL and the Default SRM2 Protocols List in DIRAC

There is a simple non-intrusive script to obtain the external package versions in DIRAC (Grid Shifters and Grid Experts)

Building the DIRAC binary distributions

The instructions for how to build the binaries for DIRAC are here. (Grid Experts)

>
>
  • If the issue breaks, or is a show stopper for, some of the MoU activities at T0 or T1, open a GGUS ALARM ticket. Only a few people in the VO are entitled to open this kind of ticket, and they are usually grid experts. A list of authorized ALARMERS for LHCb is available here.
  • If the problem concerns T0 and T1 and is an issue for production activity, open a TEAM ticket, which allows all members of a team (all LHCb collaborators with Role=production) to interact with and modify the ticket.

Site Mail contacts

WLCG "conventional" site-support mailing lists are available here.

 

Daily Shifter Checklist

Line: 243 to 251
 
Deleted:
<
<
META FILEATTACHMENT attachment="dirac-primary-states.png" attr="" comment="DIRAC primary job states" date="1221475538" name="dirac-primary-states.png" path="dirac-primary-states.png" size="111571" stream="dirac-primary-states.png" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
 
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"
Added:
>
>
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.png" attr="" comment="DIRAC primary job states" date="1221475538" name="dirac-primary-states.png" path="dirac-primary-states.png" size="111571" stream="dirac-primary-states.png" user="Main.GreigCowan" version="1"

Revision 512009-01-13 - PhilippeCharpentier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 140 to 140
 
  • If you are developing tools for DIRAC, it probably makes sense for you to install your own version of DIRAC from CVS.

Installing DIRAC on non CERN machines

Changed:
<
<
For installing DIRAC on your local machine, you should download and install DIRAC, specifying the version
   > setenv MYSITEROOT /my/location/to/install/dirac
   > setenv CMTCONFIG slc4_ia32_gcc34
   > cd $MYSITEROOT
   > wget -p http://cern.ch/lhcbproject/dist/install_project.py
   > python install_project.py -p Dirac -v <version>
>
>
For installing DIRAC on your local machine, you should download and install DIRAC, specifying the version <version>
 Then in your login script you should include:
Changed:
<
<
   > source $MYSITEROOT/LbLogin.csh or . $MYSITEROOT/LbLogin.sh
>
>
  • source $MYSITEROOT/LbLogin.csh or . $MYSITEROOT/LbLogin.sh
 and to set up the DIRAC environment (note that this is not needed if you use ganga, which does it internally)
Changed:
<
<
   > SetupProject Dirac
>
>
  • SetupProject Dirac
 

Deployment of LHCb Software on the Grid

Revision 502009-01-13 - PhilippeCharpentier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 140 to 140
 
  • If you are developing tools for DIRAC, it probably makes sense for you to install your own version of DIRAC from CVS.

Installing DIRAC on non CERN machines

Changed:
<
<
to install outside CERN, you should download and install DIRAC (for example v4r2)
>
>
For installing DIRAC on your local machine, you should download and install DIRAC, specifying the version
   > setenv MYSITEROOT /my/location/to/install/dirac
   > setenv CMTCONFIG slc4_ia32_gcc34
   > cd $MYSITEROOT
   > wget -p http://cern.ch/lhcbproject/dist/install_project.py
   > python install_project.py -p Dirac -v <version>
Then in your login script you should include:
   > source $MYSITEROOT/LbLogin.csh or . $MYSITEROOT/LbLogin.sh
and to set up the DIRAC environment (note that this is not needed if you use ganga, which does it internally)
   > SetupProject Dirac
 

Deployment of LHCb Software on the Grid

Revision 492009-01-12 - RobertoSantinel

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 116 to 116
 
  • When to ban a site.
Changed:
<
<

Site Availability Monitoring (SAM) tests

>
>

Sites rank

 
Changed:
<
<
SAM tests are used by LHCb to check that basic functionality of each Grid site is working. They run a few times each day at each site. The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Experts having the lcgadmin VOMS role.
>
>
  • This Rank procedure gives some guidelines for debugging and understanding why a site is not running payloads.

Site Availability Monitoring (SAM) tests


SAM tests are used by LHCb to check that basic functionality of each Grid site is working. They run a few times each day at each site. The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Experts having the lcgadmin VOMS role.
 
Line: 238 to 240
 
Deleted:
<
<
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
 
META FILEATTACHMENT attachment="dirac-primary-states.png" attr="" comment="DIRAC primary job states" date="1221475538" name="dirac-primary-states.png" path="dirac-primary-states.png" size="111571" stream="dirac-primary-states.png" user="Main.GreigCowan" version="1"
Added:
>
>
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"

Revision 482009-01-06 - PhilippeCharpentier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 22 to 22
 
Changed:
<
<
  • source $MYSITEROOT/LbLogin.csh
or
  • . $MYSITEROOT/LbLogin.sh
>
>
  • source $MYSITEROOT/LbLogin.csh or . $MYSITEROOT/LbLogin.sh
 

To use Dirac on a machine without the LCG software, add the following lines to $DIRACROOT/etc/dirac.cfg

Line: 146 to 144
 
Changed:
<
<
  • source $MYSITEROOT/LbLogin.csh
or
  • . $MYSITEROOT/LbLogin.sh
>
>
  • source $MYSITEROOT/LbLogin.csh or . $MYSITEROOT/LbLogin.sh
 

Deployment of LHCb Software on the Grid

Revision 472008-12-17 - RobertoSantinel

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 39 to 39
  } } }
Changed:
<
<
to deactivate the LFC checking.
>
>
to deactivate the LFC checking.
 

Workload management

Line: 82 to 81
 

Production workflow

* Production Job Finalization

Added:
>
>

SQLite hint

The following SQLite hint is meant to provide a template description for a GGUS investigation request about one of the most recurrent problems.
 

Data Management

Line: 242 to 243
 
Deleted:
<
<
META FILEATTACHMENT attachment="dirac-primary-states.png" attr="" comment="DIRAC primary job states" date="1221475538" name="dirac-primary-states.png" path="dirac-primary-states.png" size="111571" stream="dirac-primary-states.png" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
 
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"
Added:
>
>
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.png" attr="" comment="DIRAC primary job states" date="1221475538" name="dirac-primary-states.png" path="dirac-primary-states.png" size="111571" stream="dirac-primary-states.png" user="Main.GreigCowan" version="1"

Revision 462008-12-16 - GreigCowan

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 105 to 105
  The following is a simple description of the replacement for genCatalog in DIRAC3. This uses the standard LHCb input data resolution policy to obtain access URLs.
Added:
>
>

Troubleshooting

Investigating failed jobs

If many jobs start to fail at a site they should be immediately investigated.
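
One way to spot a site where "many jobs start to fail" is to compute a per-site failure fraction from a list of job records. The sketch below is only an illustration: the `(site, status)` input format, the status string `"Failed"`, the function names, and the threshold values are all assumptions, not part of any DIRAC tool.

```python
from collections import Counter


def failure_rates(jobs):
    """Return {site: fraction of failed jobs} from an iterable of (site, status) pairs."""
    total = Counter()
    failed = Counter()
    for site, status in jobs:
        total[site] += 1
        if status == "Failed":  # assumed status label for a failed job
            failed[site] += 1
    return {site: failed[site] / total[site] for site in total}


def sites_to_investigate(jobs, threshold=0.5, min_jobs=10):
    """Return sorted site names with at least min_jobs jobs and a failure rate >= threshold."""
    jobs = list(jobs)
    counts = Counter(site for site, _ in jobs)
    rates = failure_rates(jobs)
    return sorted(
        site
        for site, rate in rates.items()
        if counts[site] >= min_jobs and rate >= threshold
    )
```

The `min_jobs` cut avoids flagging a site on the basis of a handful of jobs; both cuts are arbitrary starting points and would need tuning against real production behaviour.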

 

Sites

Banning and allowing sites

Revision 452008-12-12 - RajaNandakumar

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 183 to 183
 
  • Each day the shifter should routinely check the following items to ensure the smooth running of distributed computing for LHCb.
Added:
>
>

End of production Checklist

 

Miscellaneous

Feature Requests and Bug Reports

Revision 442008-12-10 - GreigCowan

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 75 to 75
  It is OK for jobs to move into the Stalled state, as they can recover and complete successfully. However, if a job has been stalled for many days, action should be taken.
Added:
>
>

Submitting test jobs

Test jobs should be submitted to help debug problems. This can be done through DIRAC, to test the full chain of Grid submission, or, if you know that the problem is confined to the site and you have the relevant access permissions, by running an LHCb application directly on the site WN.

 

Production workflow

* Production Job Finalization

Revision 432008-12-09 - GreigCowan

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 175 to 175
  The instructions for how to build the binaries for DIRAC are here. (Grid Experts)
Added:
>
>

Daily Shifter Checklist

  • Each day the shifter should routinely check the following items to ensure the smooth running of distributed computing for LHCb.
 

Miscellaneous

Feature Requests and Bug Reports

Revision 422008-12-09 - GreigCowan

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 101 to 101
  The following is a simple description of the replacement for genCatalog in DIRAC3. This uses the standard LHCb input data resolution policy to obtain access URLs.
Changed:
<
<

Site Availability Monitoring (SAM) tests

>
>

Sites

Banning and allowing sites

  • When to ban a site.

Site Availability Monitoring (SAM) tests

  SAM tests are used by LHCb to check that basic functionality of each Grid site is working. They run a few times each day at each site. The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Experts having the lcgadmin VOMS role.

Revision 412008-12-09 - GreigCowan

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 136 to 136
 

Deployment of LHCb Software on the Grid

Added:
>
>
 

Information System

Revision 402008-12-05 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 16 to 16
 

non CERN machines

Changed:
<
<
to install outside CERN, you should download and install DIRAC (for example v4r0p2)
>
>
to install outside CERN, you should download and install DIRAC (for example v4r2)
 or
Changed:
<
<
  • ./ProductionEnv.sh
>
>
  To use Dirac on a machine without the LCG software, add the following lines to $DIRACROOT/etc/dirac.cfg
Line: 120 to 123
 
  • If you are developing tools for DIRAC, it probably makes sense for you to install your own version of DIRAC from CVS.

Installing DIRAC on non CERN machines

Changed:
<
<
to install outside CERN, you should download and install DIRAC (for example v4r0p2)
>
>
to install outside CERN, you should download and install DIRAC (for example v4r2)
 or
Changed:
<
<
>
>
 

Deployment of LHCb Software on the Grid

Revision 392008-12-02 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 94 to 94
  Four simple steps to check whether a file is cached (Grid Shifters and Grid Experts)
Added:
>
>

Generating a POOL XML slice for some LFNs

The following is a simple description of the replacement for genCatalog in DIRAC3. This uses the standard LHCb input data resolution policy to obtain access URLs.

 

Site Availability Monitoring (SAM) tests

SAM tests are used by LHCb to check that basic functionality of each Grid site is working. They run a few times each day at each site. The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Experts having the lcgadmin VOMS role.

Revision 382008-11-13 - SposS

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 22 to 22
  or
Changed:
<
<
>
>
  • ./ProductionEnv.sh

To use Dirac on a machine without the LCG software, add the following lines to $DIRACROOT/etc/dirac.cfg

Resources
{
  FileCatalogs
  {
    LcgFileCatalogCombined
    {
      Status = InActive
    }
  }
}
to deactivate the LFC checking.
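
For those who prefer to generate such a fragment rather than type it, the following Python sketch renders a nested dictionary in the same brace-delimited layout as the lines above. It is plain text formatting only and does not use DIRAC's own configuration classes; the helper name `render_cfg` and the variable `LFC_OFF` are hypothetical.

```python
def render_cfg(section, indent=0):
    """Render a nested dict as a list of lines in brace-delimited cfg layout."""
    lines = []
    pad = "  " * indent
    for key, value in section.items():
        if isinstance(value, dict):
            # A sub-section: its name, then its body wrapped in braces.
            lines.append(f"{pad}{key}")
            lines.append(f"{pad}{{")
            lines.extend(render_cfg(value, indent + 1))
            lines.append(f"{pad}}}")
        else:
            # A plain option.
            lines.append(f"{pad}{key} = {value}")
    return lines


# The section names and value are copied from the fragment shown in the text.
LFC_OFF = {
    "Resources": {
        "FileCatalogs": {
            "LcgFileCatalogCombined": {"Status": "InActive"},
        }
    }
}

text = "\n".join(render_cfg(LFC_OFF))
```

Printing `text` gives a fragment that can be pasted into dirac.cfg by hand. Merging it into an existing file automatically is deliberately left out, since that depends on what the file already contains.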
 

Workload management

Revision 372008-11-07 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 16 to 16
 

non CERN machines

Changed:
<
<
to install outside CERN, you should download and install DIRAC (foor example v3r3)
>
>
to install outside CERN, you should download and install DIRAC (for example v4r0p2)
 
  • cd /my/location/to/install/dirac
Changed:
<
<
  • tar zxf DIRAC-scripts-v3r3.tar.gz
  • scripts/dirac-install -v v3r3
>
>
  or
Line: 101 to 100
 
  • If you are developing tools for DIRAC, it probably makes sense for you to install your own version of DIRAC from CVS.
Added:
>
>

Installing DIRAC on non CERN machines

to install outside CERN, you should download and install DIRAC (for example v4r0p2) or
 

Deployment of LHCb Software on the Grid

Revision 362008-10-16 - AndreiTsaregorodtsev

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 58 to 58
  It is OK for jobs to move into the Stalled state, as they can recover and complete successfully. However, if a job has been stalled for many days, action should be taken.
Added:
>
>

Production workflow

* Production Job Finalization

 

Data Management

Job Data Access Issues

Revision 352008-10-13 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 127 to 127
  There is a simple non-intrusive script to obtain the external package versions in DIRAC (Grid Shifters and Grid Experts)
Added:
>
>

Building the DIRAC binary distributions

The instructions for how to build the binaries for DIRAC are here. (Grid Experts)

 

Miscellaneous

Feature Requests and Bug Reports

Revision 342008-10-09 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 10 to 10
  An attempt to categorize common procedures has been made below, feel free to update this list as appropriate.
Added:
>
>

Setting Production environment

CERN - lxplus

After you log in to lxplus, run the following two commands to get the production environment

non CERN machines

to install outside CERN, you should download and install DIRAC (foor example v3r3) or
 

Workload management

Primary job states in DIRAC

Revision 332008-10-08 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 22 to 22
  The procedure for search string in output for selected jobs (suitable for Grid Shifters and Grid Experts).
Added:
>
>

Running Jobs Locally With DIRAC

There are three submission modes associated with DIRAC: default WMS submission, local execution of a workflow and finally execution of a workflow in the full agent machinery. This procedure explains the steps for running jobs locally with DIRAC.
 

Production management

Dealing with Production IDs, Production Job IDs and WMS Job IDs

Revision 322008-10-06 - RajaNandakumar

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 115 to 115
 
Added:
>
>

DIRAC3 service problems on 4 October 2008

The problem and its resolution

 

Mailing lists

  • lhcb-dirac : all people involved in development of DIRAC

Revision 312008-09-30 - PhilippeCharpentier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 117 to 117
 

Mailing lists

Changed:
<
<
  • lhcb-dirac : all people involved in development of DIRAC
  • lhcb-gridshifters : all Computing shifters : Used for announcing events concerning the shifts
  • lhcb-ganga : LHCb ganga developers
  • lhcb-lcgcontact : LHCb people interacting with LCG
  • lhcb-datamanagement : LHCb support for Datamanagement
  • lhcb-bookkeeping : LHCb support for the bookkeeping
  • lhcb-sam : LHCb support for SAM test suite
  • lhcb-production-managers : LHCb people in charge of the management of the production
  • lhcb-gridresources : LHCb people in charge of the LHCb GRID resources
  • lhcb-cern-contact : LHCb contact at CERN
  • lhcb-cnaf-contact : LHCb contact at CNAF
  • lhcb-gridka-contact : LHCb contact at GRIDKA
  • lhcb-in2p3-contact : LHCb contact at IN2P3
  • lhcb-nikhef-contact : LHCb contact at NIKHEF
  • lhcb-pic-contact : LHCb contact at PIC
  • lhcb-ral-contact : LHCb contact at RAL
  • lhcb-grid-alarms : lhcb-dirac + lhcb-gridteam
  • lhcb-grid : lhcb-grid-alarms + lhcb-gridshifters
  • lhcb-paste : lhcb-grid + lhcb-ganga
  • lhcb-gridteam : lhcb-lcgcontact + lhcb-datamanagement + lhcb-bookkeeping + lhcb-sam + lhcb-production-managers + lhcb-gridresources + lhcb-cern-contact + lhcb-cnaf-contact + lhcb-gridka-contact + lhcb-in2p3-contact + lhcb-nikhef-contact + lhcb-pic-contact + lhcb-ral-contact
  • lhcb-production : lhcb-grid + lhcb-ppg
  • lhcb-distributed-analysis : lhcb-grid + all members of LHCb interested
>
>
 

Questions and comments to the experts

Revision 302008-09-22 - GreigCowan

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 36 to 36
  Each time a new production is created, the Grid Shifter should check that a few jobs are successful prior to submitting the entire batch.
Added:
>
>

How to deal with Stalled jobs

It is OK for jobs to move into the Stalled state, as they can recover and complete successfully. However, if a job has been stalled for many days, action should be taken.
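
To make "stalled for many days" concrete, a shifter could filter job records by state and age. The sketch below is a hedged illustration only: the record layout `{job_id: (status, last_update)}`, the function name `long_stalled`, and the two-day default are assumptions chosen for the example, not values taken from the procedure.

```python
from datetime import datetime, timedelta


def long_stalled(jobs, now, max_age=timedelta(days=2)):
    """Return sorted IDs of jobs in state 'Stalled' whose last update is older than max_age.

    `jobs` maps a job ID to a (status, last_update) tuple, where last_update
    is a datetime; `now` is passed in explicitly so the check is reproducible.
    """
    return sorted(
        job_id
        for job_id, (status, last_update) in jobs.items()
        if status == "Stalled" and now - last_update > max_age
    )
```

Jobs that stalled recently are left alone, in line with the point that stalled jobs can still recover.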

 

Data Management

Job Data Access Issues

Revision 292008-09-21 - GreigCowan

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 54 to 54
  Four simple steps to check whether a file is cached (Grid Shifters and Grid Experts)
Added:
>
>

Site Availability Monitoring (SAM) tests

SAM tests are used by LHCb to check that basic functionality of each Grid site is working. They run a few times each day at each site. The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Experts having the lcgadmin VOMS role.

 

Tickets

GGUS

Line: 66 to 75
 
  • If you are developing tools for DIRAC, it probably makes sense for you to install your own version of DIRAC from CVS.
Deleted:
<
<

Site Functional Tests

The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Experts having the lcgadmin VOMS role.

 

Deployment of LHCb Software on the Grid

Line: 88 to 91
 
Added:
>
>

Restarting the configuration service

  • The DIRAC configuration service sometimes goes down and has to be restarted. (Grid Experts)
 

DIRAC general

Getting the List of Ports For DIRAC Central Services (and how to ping them)

Revision 282008-09-21 - VladimirRomanovskiy

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 14 to 14
 

Primary job states in DIRAC

Changed:
<
<
dirac-primary-states
>
>
dirac-primary-states
 

Job Management Operations

Changed:
<
<
The procedures for investigating jobs (suitable for Grid Shifters and Grid Experts) are coming.
>
>
The procedures for investigating production jobs (suitable for Grid Shifters and Grid Experts) are coming.

The procedure for searching for a string in the output of selected jobs (suitable for Grid Shifters and Grid Experts).

 

Production management

Line: 129 to 125
 
  • lhcb-grid-alarms : lhcb-dirac + lhcb-gridteam
  • lhcb-grid : lhcb-grid-alarms + lhcb-gridshifters
  • lhcb-paste : lhcb-grid + lhcb-ganga
Changed:
<
<
  • lhcb-gridteam : lhcb-lcgcontact + lhcb-datamanagement + lhcb-bookkeeping + lhcb-sam + lhcb-production-managers + lhcb-gridresources + lhcb-cern-contact + lhcb-cnaf-contact + lhcb-gridka-contact + lhcb-in2p3-contact + lhcb-nikhef-contact + lhcb-pic-contact + lhcb-ral-contact
>
>
  • lhcb-gridteam : lhcb-lcgcontact + lhcb-datamanagement + lhcb-bookkeeping + lhcb-sam + lhcb-production-managers + lhcb-gridresources + lhcb-cern-contact + lhcb-cnaf-contact + lhcb-gridka-contact + lhcb-in2p3-contact + lhcb-nikhef-contact + lhcb-pic-contact + lhcb-ral-contact
 
  • lhcb-production : lhcb-grid + lhcb-ppg
  • lhcb-distributed-analysis : lhcb-grid + all members of LHCb interested
Line: 149 to 143
 
Deleted:
<
<
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
 
META FILEATTACHMENT attachment="dirac-primary-states.png" attr="" comment="DIRAC primary job states" date="1221475538" name="dirac-primary-states.png" path="dirac-primary-states.png" size="111571" stream="dirac-primary-states.png" user="Main.GreigCowan" version="1"
Added:
>
>
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"

Revision 27 2008-09-19 - GreigCowan

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 33 to 33
  A brief guide describing how to check the status of given productions in the Production DB is available (Grid Shifters and Grid Experts)
Changed:
<
<

Checking if a file is cached

>
>

How to Validate a new production

 
Changed:
<
<
Four simple steps to check whether a file is cached (Grid Shifters and Grid Experts)
>
>
Each time a new production is created, the Grid Shifter should check that a few jobs are successful prior to submitting the entire batch.
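The check described above can be sketched as a simple gate on the states of the first few test jobs. This is only an illustration: the function name, the threshold of three successes, and the state strings `Done` and `Failed` are assumptions made for the example, and nothing here queries DIRAC itself.

```python
def ready_for_full_submission(test_job_states, min_successes=3):
    """Return True when enough test jobs succeeded and none of them failed.

    test_job_states: list of state strings collected by hand for the test jobs.
    min_successes:   hypothetical threshold; choose what suits the production.
    """
    successes = sum(1 for state in test_job_states if state == "Done")
    failures = sum(1 for state in test_job_states if state == "Failed")
    return failures == 0 and successes >= min_successes
```

For instance, `ready_for_full_submission(["Done", "Done", "Running"])` returns `False`, because only two jobs have finished successfully so far.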
 

Data Management

Line: 51 to 51
  The order of the list of protocols supplied to SRM can be changed; for example, when testing root at NIKHEF, root is prepended to the list. This guide explains how to change the protocols list for a given site. This operation is restricted to those with the diracAdmin role. (Grid Experts)
Added:
>
>

Checking if a file is cached

Four simple steps to check whether a file is cached (Grid Shifters and Grid Experts)

 

Tickets

Revision 26 2008-09-18 - RicardoVazquez

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 33 to 33
  A brief guide describing how to check the status of given productions in the Production DB is available (Grid Shifters and Grid Experts)
Added:
>
>

Checking if a file is cached

 
Added:
>
>
Four simple steps to check whether a file is cached (Grid Shifters and Grid Experts)
 

Data Management

Revision 25 2008-09-16 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 107 to 107
 
  • lhcb-dirac : all people involved in development of DIRAC
  • lhcb-gridshifters : all Computing shifters : Used for announcing events concerning the shifts
  • lhcb-ganga : LHCb ganga developers
Changed:
<
<
  • lhcb-lcgcontact :
  • lhcb-datamanagement :
  • lhcb-bookkeeping :
  • lhcb-sam :
  • lhcb-production-managers :
  • lhcb-gridresources :
>
>
  • lhcb-lcgcontact : LHCb people interacting with LCG
  • lhcb-datamanagement : LHCb support for Datamanagement
  • lhcb-bookkeeping : LHCb support for the bookkeeping
  • lhcb-sam : LHCb support for SAM test suite
  • lhcb-production-managers : LHCb people in charge of the management of the production
  • lhcb-gridresources : LHCb people in charge of the LHCb GRID resources
 
  • lhcb-cern-contact : LHCb contact at CERN
  • lhcb-cnaf-contact : LHCb contact at CNAF
  • lhcb-gridka-contact : LHCb contact at GRIDKA
Line: 125 to 125
 
  • lhcb-paste : lhcb-grid + lhcb-ganga
  • lhcb-gridteam : lhcb-lcgcontact + lhcb-datamanagement + lhcb-bookkeeping + lhcb-sam + lhcb-production-managers + lhcb-gridresources + lhcb-cern-contact + lhcb-cnaf-contact + lhcb-gridka-contact + lhcb-in2p3-contact + lhcb-nikhef-contact + lhcb-pic-contact + lhcb-ral-contact
Changed:
<
<
  • lhcb-production :
>
>
  • lhcb-production : lhcb-grid + lhcb-ppg
 
  • lhcb-distributed-analysis : lhcb-grid + all members of LHCb interested

Questions and comments to the experts

Revision 24 2008-09-15 - GreigCowan

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 58 to 58
 

Deployment

Added:
>
>

Installing DIRAC on lxplus

  • If you are developing tools for DIRAC, it probably makes sense for you to install your own version of DIRAC from CVS.
 

Site Functional Tests

The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Experts having the lcgadmin VOMS role.

Revision 23 2008-09-15 - JoelClosier

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 98 to 98
 
Added:
>
>

Mailing lists

  • lhcb-dirac : all people involved in development of DIRAC
  • lhcb-gridshifters : all Computing shifters : Used for announcing events concerning the shifts
  • lhcb-ganga : LHCb ganga developers
  • lhcb-lcgcontact :
  • lhcb-datamanagement :
  • lhcb-bookkeeping :
  • lhcb-sam :
  • lhcb-production-managers :
  • lhcb-gridresources :
  • lhcb-cern-contact : LHCb contact at CERN
  • lhcb-cnaf-contact : LHCb contact at CNAF
  • lhcb-gridka-contact : LHCb contact at GRIDKA
  • lhcb-in2p3-contact : LHCb contact at IN2P3
  • lhcb-nikhef-contact : LHCb contact at NIKHEF
  • lhcb-pic-contact : LHCb contact at PIC
  • lhcb-ral-contact : LHCb contact at RAL
  • lhcb-grid-alarms : lhcb-dirac + lhcb-gridteam
  • lhcb-grid : lhcb-grid-alarms + lhcb-gridshifters
  • lhcb-paste : lhcb-grid + lhcb-ganga
  • lhcb-gridteam : lhcb-lcgcontact + lhcb-datamanagement + lhcb-bookkeeping + lhcb-sam + lhcb-production-managers + lhcb-gridresources + lhcb-cern-contact + lhcb-cnaf-contact + lhcb-gridka-contact + lhcb-in2p3-contact + lhcb-nikhef-contact + lhcb-pic-contact + lhcb-ral-contact
  • lhcb-production :
  • lhcb-distributed-analysis : lhcb-grid + all members of LHCb interested
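Because the composite lists nest inside one another, it can be useful to expand one of them into the base lists it finally reaches. The following is a minimal sketch, assuming the compositions are transcribed by hand from the bullets above, with names normalized to the `lhcb-gridteam` and `lhcb-grid-alarms` spellings under which those lists are defined. The `COMPOSITE` dictionary and the `expand` helper are hypothetical and do not query any mailing-list service.

```python
# Composite lists, transcribed by hand from the bullets above (an assumption:
# the real membership lives in the mailing-list service, not in this dict).
COMPOSITE = {
    "lhcb-gridteam": [
        "lhcb-lcgcontact", "lhcb-datamanagement", "lhcb-bookkeeping",
        "lhcb-sam", "lhcb-production-managers", "lhcb-gridresources",
        "lhcb-cern-contact", "lhcb-cnaf-contact", "lhcb-gridka-contact",
        "lhcb-in2p3-contact", "lhcb-nikhef-contact", "lhcb-pic-contact",
        "lhcb-ral-contact",
    ],
    "lhcb-grid-alarms": ["lhcb-dirac", "lhcb-gridteam"],
    "lhcb-grid": ["lhcb-grid-alarms", "lhcb-gridshifters"],
    "lhcb-paste": ["lhcb-grid", "lhcb-ganga"],
}


def expand(name):
    """Return the set of base (non-composite) lists reachable from `name`."""
    base, stack, seen = set(), [name], set()
    while stack:
        current = stack.pop()
        if current in seen:       # skip lists already visited (guards cycles)
            continue
        seen.add(current)
        members = COMPOSITE.get(current)
        if members is None:       # not a composite, so it is a base list
            base.add(current)
        else:
            stack.extend(members)
    return base
```

With this table, `expand("lhcb-grid")` covers lhcb-dirac, lhcb-gridshifters and the thirteen lists that make up lhcb-gridteam, which shows why a message to lhcb-grid reaches every site contact.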
 

Questions and comments to the experts

Revision 22 2008-09-15 - RicardoVazquez

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 98 to 98
 
Added:
>
>

Questions and comments to the experts

 

Documents

Revision 21 2008-09-15 - GreigCowan

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 10 to 10
  An attempt to categorize common procedures has been made below, feel free to update this list as appropriate.
Changed:
<
<

Job Management Operations

>
>

Workload management

 
Changed:
<
<
The procedures for investigating jobs (suitable for Grid Shifters and Grid Experts) are coming.
>
>

Primary job states in DIRAC

 
Changed:
<
<

Job Data Access Issues

>
>
dirac-primary-states
 
Deleted:
<
<
The following document is meant to give Grid shifters a few hints on things they can check when a job has a problem accessing data on a site. (Grid Shifters and Grid Experts)
 
Changed:
<
<

Dealing with Production IDs, Production Job IDs and WMS Job IDs

>
>

Job Management Operations

The procedures for investigating jobs (suitable for Grid Shifters and Grid Experts) are coming.

Production management

Dealing with Production IDs, Production Job IDs and WMS Job IDs

  This simple guide shows how to obtain production IDs from WMS job IDs and vice versa (Grid Shifters and Grid Experts)
Changed:
<
<

Checking the Status of Files in the Production Database

>
>

Checking the Status of Files in the Production Database

  A brief guide describing how to check the status of given productions in the Production DB is available (Grid Shifters and Grid Experts)
Changed:
<
<

Data Management Operations

>
>

Data Management

Job Data Access Issues

The following document is meant to give Grid shifters a few hints on things they can check when a job has a problem accessing data on a site. (Grid Shifters and Grid Experts)

File recovery

 
Changed:
<
<

Site Functional Tests

>
>

Changing the Default Protocols List for a given Site (Tier-1)

The order of the list of protocols supplied to SRM can be changed; for example, when testing root at NIKHEF, root is prepended to the list. This guide explains how to change the protocols list for a given site. This operation is restricted to those with the diracAdmin role. (Grid Experts)

Tickets

GGUS

  • When submitting GGUS tickets, make sure that you CC lhcb-grid@cern.ch. This will enable a fast response if the original submitter is not available.

Deployment

Site Functional Tests

  The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Experts having the lcgadmin VOMS role.

Changed:
<
<

Deployment of LHCb Software on the Grid

>
>

Deployment of LHCb Software on the Grid

 
Deleted:
<
<

Getting information from BDII

 
Changed:
<
<
>
>

Information System

 
Changed:
<
<

Determining Versions of LCG Utils, GFAL and the Default SRM2 Protocols List in DIRAC

>
>

Getting information from BDII

 
Changed:
<
<
There is a simple non-intrusive script to obtain the external package versions in DIRAC (Grid Shifters and Grid Experts)
>
>
 
Added:
>
>

DIRAC Configuration Service

 
Changed:
<
<

Changing the Default Protocols List for a given Site (Tier-1)

>
>

Adding new users

 
Changed:
<
<
The order of the list of protocols supplied to SRM can be changed; for example, when testing root at NIKHEF, root is prepended to the list. This guide explains how to change the protocols list for a given site. This operation is restricted to those with the diracAdmin role. (Grid Experts)
>
>

DIRAC general

 
Changed:
<
<

Getting the List of Ports For DIRAC Central Services (and how to ping them)

>
>

Getting the List of Ports For DIRAC Central Services (and how to ping them)

  The ports for DIRAC central services for a given setup can easily be checked (useful for hardware requests). The procedure for port discovery as well as how to ping a service is available (Grid Shifters and Grid Experts).
Changed:
<
<

Feature Requests and Bug Reports

>
>

Determining Versions of LCG Utils, GFAL and the Default SRM2 Protocols List in DIRAC

There is a simple non-intrusive script to obtain the external package versions in DIRAC (Grid Shifters and Grid Experts)

Miscellaneous

Feature Requests and Bug Reports

 
Line: 78 to 115
 
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
Added:
>
>
META FILEATTACHMENT attachment="dirac-primary-states.pdf" attr="" comment="DIRAC primary job states" date="1221475462" name="dirac-primary-states.pdf" path="dirac-primary-states.pdf" size="33073" stream="dirac-primary-states.pdf" user="Main.GreigCowan" version="1"
META FILEATTACHMENT attachment="dirac-primary-states.png" attr="" comment="DIRAC primary job states" date="1221475538" name="dirac-primary-states.png" path="dirac-primary-states.png" size="111571" stream="dirac-primary-states.png" user="Main.GreigCowan" version="1"

Revision 20 2008-09-09 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 48 to 48
  There is a simple non-intrusive script to obtain the external package versions in DIRAC (Grid Shifters and Grid Experts)
Added:
>
>

Changing the Default Protocols List for a given Site (Tier-1)

The order of the list of protocols supplied to SRM can be changed; for example, when testing root at NIKHEF, root is prepended to the list. This guide explains how to change the protocols list for a given site. This operation is restricted to those with the diracAdmin role. (Grid Experts)
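The effect of prepending a protocol can be illustrated with a small sketch. It only shows the list manipulation. The default order used here is hypothetical, and the actual change is made through the linked guide by someone holding the diracAdmin role.

```python
def prepend_protocol(protocols, preferred):
    """Return a new list with `preferred` first and the rest in original order."""
    return [preferred] + [p for p in protocols if p != preferred]


# Hypothetical default order, for illustration only.
default = ["dcap", "gsidcap", "rfio", "root"]
print(prepend_protocol(default, "root"))  # ['root', 'dcap', 'gsidcap', 'rfio']
```

The helper returns a new list instead of modifying its argument, so the default order stays available if the test at the site has to be reverted.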

 

Getting the List of Ports For DIRAC Central Services (and how to ping them)

The ports for DIRAC central services for a given setup can easily be checked (useful for hardware requests). The procedure for port discovery as well as how to ping a service is available (Grid Shifters and Grid Experts).

Revision 19 2008-09-09 - NickBrook

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 66 to 66
 

META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"
Changed:
<
<
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220357652" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="846639" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="3"
>
>
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220972511" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="1600481" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="4"
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"

Revision 18 2008-09-09 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 44 to 44
 
Added:
>
>

Determining Versions of LCG Utils, GFAL and the Default SRM2 Protocols List in DIRAC

There is a simple non-intrusive script to obtain the external package versions in DIRAC (Grid Shifters and Grid Experts)

Getting the List of Ports For DIRAC Central Services (and how to ping them)

The ports for DIRAC central services for a given setup can easily be checked (useful for hardware requests). The procedure for port discovery as well as how to ping a service is available (Grid Shifters and Grid Experts).
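Once a host and port are known, a quick reachability check can be done from any machine with Python. The sketch below only tests whether a TCP connection can be opened. It is not the DIRAC-level ping described in the linked procedure, and the function name is an assumption made for the example.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

A call such as `port_is_open("dirac.example.org", 9135)` uses a placeholder host and port; substitute the values obtained from the port-discovery procedure. A `True` result means only that something is listening on the port, not that the service behind it is healthy.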

 

Feature Requests and Bug Reports

Revision 17 2008-09-08 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 16 to 16
 

Job Data Access Issues

Changed:
<
<
The following document is meant to give Grid shifters a few hints on things they can check when a job has a problem accessing data on a site.
>
>
The following document is meant to give Grid shifters a few hints on things they can check when a job has a problem accessing data on a site. (Grid Shifters and Grid Experts)

Dealing with Production IDs, Production Job IDs and WMS Job IDs

This simple guide shows how to obtain production IDs from WMS job IDs and vice versa (Grid Shifters and Grid Experts)

Checking the Status of Files in the Production Database

A brief guide describing how to check the status of given productions in the Production DB is available (Grid Shifters and Grid Experts)

 

Data Management Operations

Revision 16 2008-09-05 - VladimirRomanovskiy

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 12 to 12
 

Job Management Operations

Changed:
<
<
A procedure for investigating stalled jobs (suitable for Grid Shifters and Grid Experts) is coming soon. Procedures for investigating jobs
>
>
The procedures for investigating jobs (suitable for Grid Shifters and Grid Experts) are coming.
 

Job Data Access Issues

Line: 32 to 32
 
Added:
>
>

Getting information from BDII

 

Feature Requests and Bug Reports

Line: 47 to 51
 
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220357652" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="846639" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="3"
Deleted:
<
<
META FILEATTACHMENT attachment="bdii.pdf" attr="" comment="" date="1219957532" name="bdii.pdf" path="bdii.pdf" size="154746" stream="bdii.pdf" user="Main.VladimirRomanovskiy" version="1"
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"

Revision 15 2008-09-05 - PaulSzczypka

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 32 to 32
 
Added:
>
>

Feature Requests and Bug Reports

 

Documents

Line: 39 to 43
 
Added:
>
>
 
Deleted:
<
<
 
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220357652" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="846639" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="3"
META FILEATTACHMENT attachment="bdii.pdf" attr="" comment="" date="1219957532" name="bdii.pdf" path="bdii.pdf" size="154746" stream="bdii.pdf" user="Main.VladimirRomanovskiy" version="1"

Revision 14 2008-09-05 - VladimirRomanovskiy

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 12 to 12
 

Job Management Operations

Changed:
<
<
A procedure for investigating stalled jobs (suitable for Grid Shifters and Grid Experts) is coming soon. Procedures for investigating jobs
>
>
A procedure for investigating stalled jobs (suitable for Grid Shifters and Grid Experts) is coming soon. Procedures for investigating jobs
 

Job Data Access Issues

Line: 41 to 40
 
Deleted:
<
<
 
Deleted:
<
<
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="bdii.pdf" attr="" comment="" date="1219957532" name="bdii.pdf" path="bdii.pdf" size="154746" stream="bdii.pdf" user="Main.VladimirRomanovskiy" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220357652" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="846639" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="3"
 
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"
Added:
>
>
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220357652" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="846639" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="3"
META FILEATTACHMENT attachment="bdii.pdf" attr="" comment="" date="1219957532" name="bdii.pdf" path="bdii.pdf" size="154746" stream="bdii.pdf" user="Main.VladimirRomanovskiy" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"

Revision 13 2008-09-05 - RobertoSantinel

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 13 to 13
 

Job Management Operations

A procedure for investigating stalled jobs (suitable for Grid Shifters and Grid Experts) is coming soon.

Added:
>
>
Procedures for investigating jobs
 

Job Data Access Issues

Revision 12 2008-09-05 - PaulSzczypka

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 42 to 42
 
Added:
>
>
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
Line: 50 to 52
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="bdii.pdf" attr="" comment="" date="1219957532" name="bdii.pdf" path="bdii.pdf" size="154746" stream="bdii.pdf" user="Main.VladimirRomanovskiy" version="1"
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220357652" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="846639" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="3"
Added:
>
>
META FILEATTACHMENT attachment="feature_request_and_bug_submission.pdf" attr="" comment="Procedure to submit feature requests and report bugs." date="1220613807" name="feature_request_and_bug_submission.pdf" path="feature_request_and_bug_submission.pdf" size="199672" stream="feature_request_and_bug_submission.pdf" user="Main.PaulSzczypka" version="1"

Revision 11 2008-09-02 - NickBrook

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 49 to 49
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="bdii.pdf" attr="" comment="" date="1219957532" name="bdii.pdf" path="bdii.pdf" size="154746" stream="bdii.pdf" user="Main.VladimirRomanovskiy" version="1"
Changed:
<
<
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="Data Access check list" date="1220350198" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="355041" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="2"
>
>
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="data access check list" date="1220357652" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="846639" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="3"

Revision 10 - 2008-09-02 - NickBrook

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 49 to 49
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="bdii.pdf" attr="" comment="" date="1219957532" name="bdii.pdf" path="bdii.pdf" size="154746" stream="bdii.pdf" user="Main.VladimirRomanovskiy" version="1"
Changed:
<
<
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="Data Access check list" date="1220337344" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="53235" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="1"
>
>
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="Data Access check list" date="1220350198" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="355041" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="2"

Revision 9 - 2008-09-02 - NickBrook

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 14 to 14
  A procedure for investigating stalled jobs (suitable for Grid Shifters and Grid Experts) is coming soon.
Added:
>
>

Job Data Access Issues

The following document gives Grid Shifters a few hints on what to check when a job has problems accessing data at a site.

 

Data Management Operations

Line: 45 to 49
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="bdii.pdf" attr="" comment="" date="1219957532" name="bdii.pdf" path="bdii.pdf" size="154746" stream="bdii.pdf" user="Main.VladimirRomanovskiy" version="1"
Added:
>
>
META FILEATTACHMENT attachment="DataAccessProblems.pdf" attr="" comment="Data Access check list" date="1220337344" name="DataAccessProblems.pdf" path="DataAccessProblems.pdf" size="53235" stream="DataAccessProblems.pdf" user="Main.NickBrook" version="1"

Revision 8 - 2008-08-28 - VladimirRomanovskiy

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 44 to 44
 
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
Added:
>
>
META FILEATTACHMENT attachment="bdii.pdf" attr="" comment="" date="1219957532" name="bdii.pdf" path="bdii.pdf" size="154746" stream="bdii.pdf" user="Main.VladimirRomanovskiy" version="1"

Revision 7 - 2008-08-18 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008

LHCb Production Operations Procedures

Changed:
<
<
This is the LHCb Production Operations Procedures page which contains procedures for Grid Experts and Grid Shifters to follow. More information about Production Operations and some useful links can be found at the Production Operations page. The draft template for defining production procedures is available here in the following formats: Template (.doc), Template (.rtf). Each procedure should be posted with a link to the documentation and the desired audience e.g. Grid Experts, Grid Shifters or both.
>
>
This is the LHCb Production Operations Procedures page which contains procedures for Grid Experts and Grid Shifters to follow. More information about Production Operations and some useful links can be found at the Production Operations page. The draft template for defining production procedures is available here in the following formats: Template (.doc), Template (.rtf), Template (.txt). Each procedure should be posted with a link to the documentation and the desired audience e.g. Grid Experts, Grid Shifters or both.
 
Contents:
Line: 31 to 31
 

Documents

Added:
>
>
 
Line: 35 to 36
 
Added:
>
>
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
Deleted:
<
<
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1217523241" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="15319350" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
 
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.txt" attr="" comment="" date="1219011372" name="ProdOpsProcedureTemplate.txt" path="ProdOpsProcedureTemplate.txt" size="240" stream="ProdOpsProcedureTemplate.txt" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1219011424" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="745" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"

Revision 6 - 2008-08-04 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 24 to 24
 
Added:
>
>

Deployment of LHCb Software on the Grid

 

Documents

Added:
>
>
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1217523241" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="15319350" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"
Added:
>
>
META FILEATTACHMENT attachment="LHCbSoftwareDeployment.pdf" attr="" comment="" date="1217853807" name="LHCbSoftwareDeployment.pdf" path="LHCbSoftwareDeployment.pdf" size="203140" stream="LHCbSoftwareDeployment.pdf" user="Main.StuartPaterson" version="1"

Revision 5 - 2008-08-04 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008

LHCb Production Operations Procedures

Changed:
<
<
This is the LHCb Production Operations Procedures page which contains procedures for Grid Operators and Production Shifters to follow. More information about Production Operations and some useful links can be found at the Production Operations page. The draft template for defining production procedures is available here in the following formats: Template (.doc), Template (.rtf). Each procedure should be posted with a link to the documentation and the desired audience e.g. Grid Operators, Production Shifters or both.
>
>
This is the LHCb Production Operations Procedures page which contains procedures for Grid Experts and Grid Shifters to follow. More information about Production Operations and some useful links can be found at the Production Operations page. The draft template for defining production procedures is available here in the following formats: Template (.doc), Template (.rtf). Each procedure should be posted with a link to the documentation and the desired audience e.g. Grid Experts, Grid Shifters or both.
 
Contents:
Line: 12 to 12
 

Job Management Operations

Changed:
<
<
A procedure for investigating stalled jobs (suitable for Production Shifters and Grid Operators) is coming soon.
>
>
A procedure for investigating stalled jobs (suitable for Grid Shifters and Grid Experts) is coming soon.
 

Data Management Operations

Changed:
<
<
>
>
 

Site Functional Tests

Changed:
<
<
The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Operators having the lcgadmin VOMS role.
>
>
The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Experts having the lcgadmin VOMS role.
 
Changed:
<
<
>
>
 

Documents

Revision 4 - 2008-08-01 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008

LHCb Production Operations Procedures

Changed:
<
<
Contents:
>
>
This is the LHCb Production Operations Procedures page which contains procedures for Grid Operators and Production Shifters to follow. More information about Production Operations and some useful links can be found at the Production Operations page. The draft template for defining production procedures is available here in the following formats: Template (.doc), Template (.rtf). Each procedure should be posted with a link to the documentation and the desired audience e.g. Grid Operators, Production Shifters or both.
 
Changed:
<
<
This is the LHCb Production Operations Procedures page which contains procedures for Grid Operators and Production Shifters to follow. More information about Production Operations and some useful links can be found at the Production Operations page. The draft template for defining production procedures is available here in the following formats: Template (.doc), Template (.rtf).
>
>
Contents:
 
Changed:
<
<
An attempt to categorize common procedures has been made below; feel free to complement or update this list as appropriate.
>
>
An attempt to categorize common procedures has been made below; feel free to update this list as appropriate.
 

Job Management Operations

Changed:
<
<
A procedure for investigating stalled jobs is coming soon.
>
>
A procedure for investigating stalled jobs (suitable for Production Shifters and Grid Operators) is coming soon.
 

Data Management Operations

Changed:
<
<
>
>
 

Site Functional Tests

Changed:
<
<
SAM documentation is coming soon.
>
>
The SAM framework has been updated in DIRAC3 and now runs as a specialized workflow with tailored modules for each functional test. Running the SAM tests is restricted to Grid Operators having the lcgadmin VOMS role.

 

Documents

Added:
>
>
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1217523241" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="15319350" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"
Added:
>
>
META FILEATTACHMENT attachment="SAMProcedure010808.pdf" attr="" comment="" date="1217583512" name="SAMProcedure010808.pdf" path="SAMProcedure010808.pdf" size="476639" stream="SAMProcedure010808.pdf" user="Main.StuartPaterson" version="2"

Revision 3 - 2008-07-31 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 16 to 16
 

Data Management Operations

Changed:
<
<
>
>
 

Site Functional Tests

Line: 25 to 25
 

Documents

Added:
>
>
 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1217523241" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="15319350" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"
Added:
>
>
META FILEATTACHMENT attachment="RecoveryOfFilesLostBySE.doc" attr="" comment="" date="1217533516" name="RecoveryOfFilesLostBySE.doc" path="RecoveryOfFilesLostBySE.doc" size="105984" stream="RecoveryOfFilesLostBySE.doc" user="Main.StuartPaterson" version="1"

Revision 2 - 2008-07-31 - StuartPaterson

Line: 1 to 1
 
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008
Line: 6 to 6
 
Contents:
Changed:
<
<
This is the LHCb Production Operations Procedures page which contains procedures for Grid Operators and Production Shifters to follow. More information about Production Operations and some useful links can be found at the Production Operations page. The draft template for defining production procedures is available here in the following formats
>
>
This is the LHCb Production Operations Procedures page which contains procedures for Grid Operators and Production Shifters to follow. More information about Production Operations and some useful links can be found at the Production Operations page. The draft template for defining production procedures is available here in the following formats: Template (.doc), Template (.rtf).
  An attempt to categorize common procedures has been made below, feel free to complement or update this list as appropriate.
Changed:
<
<

Workload Management Operations

>
>

Job Management Operations

A procedure for investigating stalled jobs is coming soon.

 

Data Management Operations

Line: 17 to 21
 

Site Functional Tests

SAM documentation is coming soon.

Changed:
<
<
>
>

Documents

 
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
Added:
>
>
META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.rtf" attr="" comment="" date="1217523241" name="ProdOpsProcedureTemplate.rtf" path="ProdOpsProcedureTemplate.rtf" size="15319350" stream="ProdOpsProcedureTemplate.rtf" user="Main.StuartPaterson" version="1"

Revision 1 - 2008-07-31 - StuartPaterson

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="LHCbComputing"
-- StuartPaterson - 31 Jul 2008

LHCb Production Operations Procedures

Contents:

This is the LHCb Production Operations Procedures page which contains procedures for Grid Operators and Production Shifters to follow. More information about Production Operations and some useful links can be found at the Production Operations page. The draft template for defining production procedures is available here in the following formats

An attempt to categorize common procedures has been made below; feel free to complement or update this list as appropriate.

Workload Management Operations

Data Management Operations

Site Functional Tests

SAM documentation is coming soon.

META FILEATTACHMENT attachment="ProdOpsProcedureTemplate.doc" attr="" comment="Template (.doc)" date="1217511949" name="ProdOpsProcedureTemplate.doc" path="ProdOpsProcedureTemplate.doc" size="98304" stream="ProdOpsProcedureTemplate.doc" user="Main.StuartPaterson" version="1"
 