Difference: DiracForAdmins (1 vs. 43)

Revision 43 2015-08-21 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"
The previous page that was here is now under DiracForShifters
Line: 11 to 11
  Look at https://raw.github.com/DIRACGrid/DIRAC/integration/releases.cfg
Added:
>
>

Updating or patching the server installations

 

Administration Scripts

Changed:
<
<
Scripts that do not really belong in the ilcdirac code repository can be found here: https://git.cern.ch/web/ilcdirac/ops.git. Accessible only to ilcdirac-admin e-group members at the moment.
>
>
Scripts that do not really belong in the ilcdirac code repository can be found here: ILCDiracOps. Accessible only to ilcdirac-admin e-group members at the moment.
 

Accessing the machines

Revision 42 2015-07-24 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"
The previous page that was here is now under DiracForShifters
Line: 22 to 22
Then one has to join the ai-admin e-group and log in to aiadm first.

Granting access to someone

Changed:
<
<
The list of users allowed to log on to these machines is specified in the puppet manifests for voilcdirac.
>
>
The list of users allowed to log on to these machines is specified in the puppet manifests for voilcdirac.
 If you can edit these manifests you can add users to log on to these machines.

Revision 41 2015-05-06 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"
The previous page that was here is now under DiracForShifters
Line: 143 to 143
 

CE Maintenance

Added:
>
>
 

Registering New Users

RegisteringNewUsersToDirac

Revision 40 2015-03-12 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"
The previous page that was here is now under DiracForShifters
Line: 77 to 77
The jobs know about the SE to use: in their JDL there is something like "SB:ProductionSandboxSE|/SandBox/i/ilc_prod/03f/e5e/03fe5e4cb9889b87bb437adcc310337f.tar.bz2", where the sandbox storage element is defined as ProductionSandboxSE. So when the job gets its input sandbox, it will connect to the ProductionSandboxSE, which has as an end point a specific SandboxStore.
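To make the structure of that URL concrete, here is a small shell sketch (the URL is the example from the paragraph above; the variable names are just for illustration):

# Extract the sandbox SE name from an input-sandbox URL as it appears in a JDL
SB_URL="SB:ProductionSandboxSE|/SandBox/i/ilc_prod/03f/e5e/03fe5e4cb9889b87bb437adcc310337f.tar.bz2"
SE_NAME=${SB_URL#SB:}      # strip the leading "SB:" prefix
SE_NAME=${SE_NAME%%|*}     # keep everything before the "|" separator
echo "$SE_NAME"            # prints: ProductionSandboxSE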
Deleted:
<
<

Checking the Status

JIRA

https://its.cern.ch/jira/browse/ILCDIRAC
 

gLite job matching

Revision 39 2015-01-29 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"
The previous page that was here is now under DiracForShifters
Line: 147 to 147
 ldapsearch -x -LLL -h lcg-bdii.cern.ch -p 2170 -b o=grid '(&(objectclass=GlueCE)(GlueCEAccessControlBaseRule=VO:ilc))' GlueCEUniqueID
Changed:
<
<

Adding new CEs to the Configuration Service

In the CE2CS agent e-mail, there is something like:

CE: hepgrid97.ph.liv.ac.uk, GOCDB Name: UKI-NORTHGRID-LIV-HEP
SystemName: ScientificSL, SystemVersion: Carbon, SystemRelease: 6.3
hepgrid97.ph.liv.ac.uk:8443/cream-pbs-long Production

dirac-admin-add-site DIRACSiteName UKI-NORTHGRID-LIV-HEP hepgrid97.ph.liv.ac.uk

The first block gives the new queue properties, and the command below gives a way to register that CE in the CS. If the corresponding site DOES NOT exist in the CS, the command can be used. If it does, the CE must simply be added to the list of CEs in /Resources/Sites/GRID/SITE/CE. The CE2CS will take care of adding and updating the corresponding sections later.

Note

One thing to remember: the info provided comes from the BDII, which has proved inconsistent many times in the past. In particular, some CEs do not in fact allow the ILC VO despite what is advertised in the BDII. Checking the SiteDirector and the TaskQueueDirector can help find those (unauthorized credentials errors). Also, the CE2CS agent does not remove CEs that have been decommissioned, so keep an eye on the EGI DOWNTIMES mails as they usually advertise the final removal of CEs.
>
>

CE Maintenance

 
Added:
>
>
 

Registering New Users

RegisteringNewUsersToDirac

Revision 36 2014-10-31 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 175 to 175
 

Setting up development installation

Changed:
<
<
Do normal dirac installation.
wget -O dirac-install -np  https://raw.github.com/DIRACGrid/DIRAC/master/Core/scripts/dirac-install.py  --no-check-certificate
chmod +x dirac-install
./dirac-install -V ILCDIRAC -r <version>
mv DIRAC DIRAC_BAK
mv ILCDIRAC ILCDIRAC_BAK
mkdir DIRAC
cd DIRAC
git clone https://andresailer@github.com/DIRACgrid/DIRAC.git .
git remote set-url --push origin https://andresailer@github.com/andresailer/DIRAC.git
git checkout rel-<DiracVersion>
cd ..
mkdir ILCDIRAC
cd ILCDIRAC
git clone https://:@git.cern.ch/kerberos/ilcdirac .
git checkout rel-<ilcdiracVersion>
dirac-deploy-scripts
>
>
 

Sites

Revision 35 2014-10-30 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 71 to 71
The jobs know about the SE to use: in their JDL there is something like "SB:ProductionSandboxSE|/SandBox/i/ilc_prod/03f/e5e/03fe5e4cb9889b87bb437adcc310337f.tar.bz2", where the sandbox storage element is defined as ProductionSandboxSE. So when the job gets its input sandbox, it will connect to the ProductionSandboxSE, which has as an end point a specific SandboxStore.
Changed:
<
<

Checking the VOBOX status

Logon to the VOBOX of interest: voilcdirac01, voilcdirac02, voilcdirac03

 ssh voilcdirac01 

Make yourself dirac user: you'll need to be dirac to start/stop services:

 sudo su dirac 
Source the dirac environment if needed
source /opt/dirac/bashrc
Then you should go to
/opt/dirac/startup
to have the services/agents running on the machine.

Check the disk space with

df -h

/opt/dirac should never be at 100%. In that case, the services start to have problems. In the worst case, the web page fails because it cannot put anything in cache. To "fix" the situation, restarting the services is usually enough: the mySQL cache is emptied and some disk space is recovered. This allows the agents to work again (in particular the JobCleaningAgent). Now, how to do that?

You need to know that all services/agents are run with the runit framework (http://smarden.org/runit/). Dirac comes with a set of handy commands to allow proper supervision:

 runsvctrl t path/to/service 
restarts the service at path/to/service (example: DataManagement_FileCatalog). To properly restart an agent, you need to create an empty file called stop_agent under /opt/dirac/control/System/Agent.

 runsvctrl d path/to/service
takes down the service
 runsvctrl u path/to/service
brings the service back up again (after taking it down with the previous command).

One can also use

 runsvstat *
To see what is running and what is down. All on volcd01 should be running.
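As a concrete illustration of the commands above (the service and agent names below are only examples; substitute whatever actually runs on the box, and note that the control path follows the System/Agent pattern mentioned above):

source /opt/dirac/bashrc
cd /opt/dirac/startup
runsvstat *                                      # overview: what is up, what is down
runsvctrl t DataManagement_FileCatalog           # restart one service
# for an agent, ask it to stop cleanly first, then restart it
touch /opt/dirac/control/WorkloadManagement/JobCleaningAgent/stop_agent
runsvctrl t WorkloadManagement_JobCleaningAgent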
>
>

Checking the Status

 

JIRA

https://its.cern.ch/jira/browse/ILCDIRAC

Revision 34 2014-10-29 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 11 to 11
 

Accessing the machines

Changed:
<
<
Resource request (new machine): https://cern.service-now.com/service-portal/report-ticket.do?name=hw-allocation&fe=HW-Resources
>
>
To access a machine, simply ssh from inside CERN to any of the voilcdirac* machines. It might be that this access will be restricted to aiadm machines in the future. Then one has to join the ai-admin e-group and log in to aiadm first.
 
Changed:
<
<
Add the user to the e-group LxVoAdm-LCD so that he gets access to lxvoadm.cern.ch. lxvoadm.cern.ch cannot be reached from lxplus.

cdbop

get profiles/profile_volcd05

get prod/customization/lcd/vobox/config

get prod/customization/lcd/vobox/filesystem_dirac_fileserver

!emacs profiles/profile_volcd05.tpl (mind the !)

update prod/customization/lcd/vobox/config.tpl

(update any other file changed)

commit (and you are done after that)

sms get volcd05

sms set production other "default" volcd05

If you need to reinstall a machine (needed when changing the partitions) run PrepareInstall volcd05

sms set maintenance other volcd05

PrepareInstall volcd05

sms clear maintenance other volcd05

ssh volcd05

sudo spma_ncm_wrapper.sh

  • setup users, updates, etc.

You'll need to change the access rights on /opt/dirac that must belong to the user dirac, and /opt/dirac/etc must belong to dirac and should be readable ONLY by dirac (security issue). /opt/dirac/data (and /opt/dirac/data1) must belong to dirac and writable by dirac.
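A minimal sketch of those ownership changes (this assumes a dirac group exists next to the dirac user and that the paths are exactly the ones named above; adjust as needed):

chown -R dirac:dirac /opt/dirac
chmod -R go-rwx /opt/dirac/etc                     # etc readable ONLY by the dirac user
chown -R dirac:dirac /opt/dirac/data /opt/dirac/data1
chmod -R u+rwX /opt/dirac/data /opt/dirac/data1    # make sure dirac can write there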

Firewall

  • by default the machines are added to the IT CC LHCB DIRAC3 set which opens ports 9130:9200 to the outside
  • Web portal requires explicit request through net service (ports 80 and 443)
>
>

Granting access to someone

The list of users allowed to log on to these machines is specified in the puppet manifests for voilcdirac. If you can edit these manifests you can add users to log on to these machines.
 

Migrating services to new machines

Line: 108 to 73
 

Checking the VOBOX status

Changed:
<
<
Log on to lxvoadm (not possible to access it from lxplus, don't really know why, so use your machine).
 ssh lxvoadm 

Logon to the VOBOX of interest: volcd01 for most services, volcd03 for dev platform and Log file storage, volcd02 for File Catalog DB only (no services running).

 ssh volcd01 
>
>
Logon to the VOBOX of interest: voilcdirac01, voilcdirac02, voilcdirac03
 ssh voilcdirac01 
  Make yourself dirac user: you'll need to be dirac to start/stop services:
 sudo su dirac 
Changed:
<
<
This also sources some environment: you should be under /opt/dirac/pro after sudo. If not,
source /opt/dirac/bashrc
Then you should go to
/opt/dirac/startup
to have the services/agents running on the machine.
>
>
Source the dirac environment if needed
source /opt/dirac/bashrc
Then you should go to
/opt/dirac/startup
to have the services/agents running on the machine.
  Check the disk space with
df -h
Line: 135 to 97
 
 runsvstat *
To see what is running and what is down. All on volcd01 should be running.
Deleted:
<
<

Granting access to someone

To have access to lxvoadm, a new user must be registered in the mailing list LxVoAdm-LCD. Then he can ssh lxvoadm.

Then, in addition, the new user should be granted access to the volcd pool of machines. This is done by changing the machine templates. For this, one should use CDB (for the moment PUPPET is not in production yet). Here is a small usage information: In lxvoadm, type

cdbop
This utility is the thing that is used to manage all the machines. It's got a shell like interface, with tab completion... In cdb, type
get prod/customization/lcd/vobox/config
but it will complain that the file is already there (no merging, don't ask why) if you already have it ([ERROR] 'prod/customization/lcd/vobox/config.tpl': file already exists). To execute a shell command, prepend a !, as in the following example:
!rm prod/customization/lcd/vobox/config.tpl

Once you got the config, you'll have a prod/customization/lcd/vobox/config.tpl in you directory. To edit, the easiest for me is to do

!nano prod/customization/lcd/vobox/config.tpl
In that file, there are lots of things, but for what concerns us, you need to find the lines
  "/software/components/useraccess" = add_root_access(list("sposs","atsareg","rgracian","cgrefe"));
and
   "/software/components/useraccess/users/cgrefe/acls" = list("system-auth");
and
   "/software/components/sudo/privilege_lines" = push(nlist(
    "user", "cgrefe",
    "run_as", "ALL",
    "host", "ALL",
    "cmd", "NOPASSWD:ALL"));
and finally
   "/software/components/interactivelimits/values" = list(
     list('sposs', '-', 'maxlogins', '10'),
     list('cgrefe', '-', 'maxlogins', '10'),
     list('rgracian', '-', 'maxlogins', '10'),
     list('dirac', '-', 'maxlogins', '15'),
     list('dirac', '-', 'nofile', '8192'),
     list('atsareg', '-', 'maxlogins', '10'),
     list('msapunov', '-', 'maxlogins', '10'),
     list('jfstrube', '-', 'maxlogins', '10'),
     list('*', '-', 'maxlogins', '0'),
  );

where you will add the new user.

Once done you need to save and exit, then

 update prod/customization/lcd/vobox/config.tpl 
then
 commit 
where it will ask to confirm and to give a message. In some cases, when you have a typo in the message, you cannot fix it as backspace introduces a new character. I do not fix my typos in there (who reads those things anyway?)

It should tell you that all went fine after a while, then you can quit cdb with

 exit 

Then on all the machines, to which you log on using ssh volcdX you'll need to run

 sudo spma_ncm_wrapper.sh 
that is used to apply the configuration and do the necessary updates. This command should be run in any case once in a while (every month or so) to make sure the machine is up to date. It will also update the user access rights.

Normally the new user is now allowed to log on to the machine.

 

JIRA

https://its.cern.ch/jira/browse/ILCDIRAC
Line: 288 to 198
 

Setting up development installation

Do normal dirac installation.

Changed:
<
<
Replace ILCDIRAC with ILCDIRAC from SVN

run dirac-deploy-scripts

>
>
wget -O dirac-install -np  https://raw.github.com/DIRACGrid/DIRAC/master/Core/scripts/dirac-install.py  --no-check-certificate
chmod +x dirac-install
./dirac-install -V ILCDIRAC -r <version>
mv DIRAC DIRAC_BAK
mv ILCDIRAC ILCDIRAC_BAK
mkdir DIRAC
cd DIRAC
git clone https://andresailer@github.com/DIRACgrid/DIRAC.git .
git remote set-url --push origin https://andresailer@github.com/andresailer/DIRAC.git
git checkout rel-<DiracVersion>
cd ..
mkdir ILCDIRAC
cd ILCDIRAC
git clone https://:@git.cern.ch/kerberos/ilcdirac .
git checkout rel-<ilcdiracVersion>
dirac-deploy-scripts
 

Sites

Revision 32 2014-05-08 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 55 to 55
 
  • Web portal requires explicit request through net service (ports 80 and 443)
Deleted:
<
<

Updating the machines

Don't screw the machines...

Now that this bit of advice is done, for the serious stuff.

Get an admin proxy

 dirac-proxy-init -g diracAdmin 

Start the cli:

 dirac-admin-sysadmin-cli 
then
 set host volcd<XX>.cern.ch 
Then
 show info 
The CLI is a usual DIRAC cli: help is available.

To update it's

 update v12r0p2 
The version can be found in http://svnweb.cern.ch/world/wsvn/dirac/ILCDIRAC/trunk/ILCDIRAC/releases.cfg (for now) and that should do the trick. Then you need to restart the services:
 restart *
Then the connection is lost, so you need to do the set host again.

All the machines should be updated the same way.
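Putting the update steps above together, a sketch of one full pass on one machine (the host name and release tag are only examples):

dirac-proxy-init -g diracAdmin
dirac-admin-sysadmin-cli
  set host volcd01.cern.ch    # inside the CLI
  show info
  update v12r0p2
  restart *
  # the connection drops after "restart *": do "set host" again for the next machine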

Then you have to make sure that all the services have restarted properly by checking the log files of every agent and service on every machine. Logs can be found here: https://ilcdirac.cern.ch/DIRAC/ILC-Production/diracAdmin/systems/SystemAdministration/display (right click on the agent/service -->

Log)

Note: Stop the TransferAgent on volcdlogse.cern.ch before restarting volcd01 (the machine with the RequestManager, machine will be changed soon)

If the TransferAgent does not find any waiting requests, you probably have to fix the SELECT limit in /opt/dirac/pro/DIRAC/RequestManagementSystem/DB/RequestDBMySQL.py by changing it from 100 to 3000.

Things to do for the moment when updating

- There are several hacks that are needed because of the way we run:

You also need to change the extendable production type for the production Monitoring page on the web portal. Simply edit

/opt/dirac/pro/Web/dirac/public/javascripts/jobs/ProductionMonitor.js
line 369 should read
  if(type == 'MCGeneration'){

- Adding the JIRA report issue button: edit /opt/dirac/pro/Web/dirac/templates/diracPage.mako (on volcd04) and add

<script type="text/javascript" src="https://its.cern.ch/jira/s/en_US-nzmpdc-418945332/850/82/1.2.9/_/download/batch/com.atlassian.jira.collector.plugin.jira-issue-collector-plugin:issuecollector/com.atlassian.jira.collector.plugin.jira-issue-collector-plugin:issuecollector.js?collectorId=d37d5599"></script>
in the header of the html block.

(Or what you find in Jira Administration -> Issue Collector)

- Fixing the access rights of the MetadataCatalog browser: there is a bug in the controller. Edit /opt/dirac/pro/Web/dirac/controllers/data/MetaCatalog.py, and replace

if group == "visitor" and credentials.getUserDN == "":
by
if group == "visitor" and not credentials.getUserDN():
Then restart Web_pastor.
 

Migrating services to new machines

For most services, it's very simple: simply install an instance of the service on the new machine, and you're done. For the services below, there are a few precautions.

Revision 31 2013-09-18 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 95 to 95
 

Things to do for the moment when updating

Changed:
<
<
There are several hacks that are needed because of the way we run:

- Running in DESY: the production jobs should not run in DESY-HH to avoid interfering with the ILD DBD prods. For this, a hack must be put in place in DIRAC/TransformationSystem/Client/TaskManager.py In the method

 prepareTransformationTasks 
of the class
 WorkflowTasks 
you need to add
      bannedsites = oJob.workflow.findParameter("BannedSites")
      if bannedsites:
        if not "LCG.DESY-HH.de" in bannedsites.getValue():
          bs = bannedsites.getValue()+";LCG.DESY-HH.de"
          oJob._setParamValue( 'BannedSites', bs )
      else:
        oJob.setBannedSites( 'LCG.DESY-HH.de' )
after the line
 oJob.setOwnerGroup( ownerGroup )
>
>
- There are several hacks that are needed because of the way we run:
  You also need to change the extendable production type for the production Monitoring page on the web portal. Simply edit
Line: 119 to 106
  if(type == 'MCGeneration'){
Added:
>
>
- Adding the JIRA report issue button: edit /opt/dirac/pro/Web/dirac/templates/diracPage.mako (on volcd04) and add
<script type="text/javascript" src="https://its.cern.ch/jira/s/en_US-nzmpdc-418945332/850/82/1.2.9/_/download/batch/com.atlassian.jira.collector.plugin.jira-issue-collector-plugin:issuecollector/com.atlassian.jira.collector.plugin.jira-issue-collector-plugin:issuecollector.js?collectorId=d37d5599"></script>
in the header of the html block.

(Or what you find in Jira Administration -> Issue Collector)

- Fixing the access rights of the MetadataCatalog browser: there is a bug in the controller. Edit /opt/dirac/pro/Web/dirac/controllers/data/MetaCatalog.py, and replace

if group == "visitor" and credentials.getUserDN == "":
by
if group == "visitor" and not credentials.getUserDN():
Then restart Web_pastor.
 

Migrating services to new machines

For most services, it's very simple: simply install an instance of the service on the new machine, and you're done. For the services below, there are a few precautions.

Revision 30 2013-09-09 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 330 to 330
  The first block gives the new queue properties, and the command below gives a way to register that CE in the CS. If the corresponding site DOES NOT exist in the CS, the command can be used. If it does, the CE must simply be added to the list of CEs in /Resources/Sites/GRID/SITE/CE. The CE2CS will take care of adding and updating the corresponding sections later.
Added:
>
>

Note

One thing to remember: the info provided comes from the BDII, which has proved inconsistent many times in the past. In particular, some CEs do not in fact allow the ILC VO despite what is advertised in the BDII. Checking the SiteDirector and the TaskQueueDirector can help find those (unauthorized credentials errors). Also, the CE2CS agent does not remove CEs that have been decommissioned, so keep an eye on the EGI DOWNTIMES mails as they usually advertise the final removal of CEs.
 

Registering New Users

RegisteringNewUsersToDirac

FAQ

Revision 29 2013-08-30 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 328 to 328
 dirac-admin-add-site DIRACSiteName UKI-NORTHGRID-LIV-HEP hepgrid97.ph.liv.ac.uk
Changed:
<
<
The first block gives the new queue properties, and the command below gives a way to register that CE in the CS. If the corresponding site DOES NOT exist in the CS, the command can be used. If it does, the CE must simply be added to the list of CEs in /Resources/Sites/GRID/Site/CEs. The CE2CS will take care of updating the corresponding sections later.
>
>
The first block gives the new queue properties, and the command below gives a way to register that CE in the CS. If the corresponding site DOES NOT exist in the CS, the command can be used. If it does, the CE must simply be added to the list of CEs in /Resources/Sites/GRID/SITE/CE. The CE2CS will take care of adding and updating the corresponding sections later.
 

Registering New Users

RegisteringNewUsersToDirac

Revision 28 2013-07-25 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Tagging and distributing new releases

Changed:
<
<
Now that we are using GIT, the procedure is as follows:

  • Get a github account.
  • Fork the https://github.com/LCDsoft/ILCDIRAC repository
  • Pull from https://github.com/LCDsoft/ILCDIRAC the master branch, and make it a new branch for you: it's better to keep things separated.
  • Make the changes
  • Push to YOUR fork
  • From the github portal, make a pull request from the branch you created on your fork to the master branch of ILCDIRAC
  • Someone (the admin of the LCDgit) will create the tag: get the master branch, add the relevant stuff in the releases.notes and release.cfg files to define the new tag, commit, push to the LCDgit repo, not your fork, then make a tag, and push it too.

Once the tag is created, you can run dirac-distribution -l ILCDIRAC -r TAG (where TAG is the tag name). Follow the instructions at the bottom to copy the tar balls to the right location. You'll need a password that I'll give to the right people.

  • You are done! Or nearly. You need to change the pilot version used by DIRAC: in the CS, under /Operations/ilc/ILC-Production/Pilot/Version, you need to change the version to the one you created: TAG (for example).

  • Commit the CS changes.

  • To make sure you pick up the right version straight away, you need to restart the TaskQueueDirector, the SiteDirector and the Matcher. This can be done using the dirac-admin-sysadmin-cli tool.

Changing the pilot version needs some time to be effective, as all pilots that were submitted with the old pilot version have to die to empty the queues. So make sure your changes are OK. Running on the DEV system helps of course (but not for productions). The release procedure is identical.

Once the release is made, you also need to update the file /afs/cern.ch/eng/clic/data/ILCDIRACTars/defaults/ilc.cfg to specify the new release number.

>
>
NewDiracTagAndRelease
 

Finding what Dirac version are available

Look at https://raw.github.com/DIRACGrid/DIRAC/integration/releases.cfg

Line: 363 to 342
 
  1. ) If LHCb does not see the problem, put a ggus ticket against the culprit.

When the problem is resolved (either when the site replies or when LHCb says it's fixed), you can close the JIRA issue, unban the site (dirac-admin-allow-site), and reallow the IP (/sbin/iptables -D INPUT -s 193.62.143.66  -j DROP). Hopefully you should start to see things running smoothly again.

Added:
>
>

Setting up development installation

Do normal dirac installation.

Replace ILCDIRAC with ILCDIRAC from SVN

run dirac-deploy-scripts

 
META TOPICMOVED by="sailer" date="1374054166" from="CLIC.DiracForGurus" to="CLIC.DiracForExperts"

Revision 27 2013-07-17 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 363 to 363
 
  1. ) If LHCb does not see the problem, put a ggus ticket against the culprit.

When the problem is resolved (either when the site replies or when LHCb says it's fixed), you can close the JIRA issue, unban the site (dirac-admin-allow-site), and reallow the IP (/sbin/iptables -D INPUT -s 193.62.143.66  -j DROP). Hopefully you should start to see things running smoothly again. \ No newline at end of file

Added:
>
>
META TOPICMOVED by="sailer" date="1374054166" from="CLIC.DiracForGurus" to="CLIC.DiracForExperts"

Revision 26 2013-05-28 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 189 to 189
 
    • New jobs will use the New SB for their sandboxes
  1. As soon as your old SB gracefully stops being used, remove it
Added:
>
>
The jobs know about the SE to use: in their JDL there is something like "SB:ProductionSandboxSE|/SandBox/i/ilc_prod/03f/e5e/03fe5e4cb9889b87bb437adcc310337f.tar.bz2", where the sandbox storage element is defined as ProductionSandboxSE. So when the job gets its input sandbox, it will connect to the ProductionSandboxSE, which has as an end point a specific SandboxStore.
 

Checking the VOBOX status

Log on to lxvoadm (not possible to access it from lxplus, don't really know why, so use your machine).

Revision 25 2013-05-27 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 69 to 69
 sudo spma_ncm_wrapper.sh
  • setup users, updates, etc.
Changed:
<
<
You'll need to change the access rights on /opt/dirac that must belong to the user dirac, and /opt/dirac/etc must belong to dirac and should be readable ONLY by dirac (security issue). /opt/dirac/data (and /opt/dirac/data1) must belong to dirac and writable by dirac.
>
>
You'll need to change the access rights on /opt/dirac that must belong to the user dirac, and /opt/dirac/etc must belong to dirac and should be readable ONLY by dirac (security issue). /opt/dirac/data (and /opt/dirac/data1) must belong to dirac and writable by dirac.
  Firewall
  • by default the machines are added to the IT CC LHCB DIRAC3 set which opens port 9130:9200 to the outsite
Line: 108 to 108
 
https://ilcdirac.cern.ch/DIRAC/ILC-Production/diracAdmin/systems/SystemAdministration/display (right click on the agent/service -->
Log)

Note:

Changed:
<
<
Stop the Transferagent on volcdlogse.cern.ch before restarting volcd01 (the machine with the RequestManager, machine will be changed soon)
>
>
Stop the TransferAgent on volcdlogse.cern.ch before restarting volcd01 (the machine with the RequestManager, machine will be changed soon)
If the TransferAgent does not find any waiting requests, you probably have to fix the SELECT limit in
Changed:
<
<
/opt/dirac/pro/DIRAC/RequestManagementSystem/DB/RequestDBMySQL.py
>
>
/opt/dirac/pro/DIRAC/RequestManagementSystem/DB/RequestDBMySQL.py
by changing it from 100 to 3000.

Things to do for the moment when updating

There are several hacks that are needed because of the way we run:

Changed:
<
<
- Running in DESY: the production jobs should not run in DESY-HH to avoid interfering with the ILD DBD prods. For this, a hack must be put in place in DIRAC/TransformationSystem/Client/TaskManager.py
>
>
- Running in DESY: the production jobs should not run in DESY-HH to avoid interfering with the ILD DBD prods. For this, a hack must be put in place in DIRAC/TransformationSystem/Client/TaskManager.py
 In the method
 prepareTransformationTasks 
of the class
 WorkflowTasks 
you need to add
      bannedsites = oJob.workflow.findParameter("BannedSites")
      if bannedsites:
Line: 140 to 140
  if(type == 'MCGeneration'){
Added:
>
>

Migrating services to new machines

For most services, it's very simple: simply install an instance of the service on the new machine, and you're done. For the services below, there are a few precautions.

Proxy Manager

Nothing particular but the fact that the DB must be installed on the localhost, and not on the DBOD service. This is for security as the proxies are stored in there.

Configuration Service

  • Copy the ILC-Prod.cfg that's in /opt/dirac/etc/ to the new host. Also copy the /opt/dirac/etc/csbackup directory.
  • Edit the new host's /opt/dirac/etc/dirac.cfg and change the following sections:
DIRAC
{
  Configuration
  {
    Servers = dips://volcd05.cern.ch:9135/Configuration/Server, dips://volcd01.cern.ch:9135/Configuration/Server
    MasterServer = dips://volcd05.cern.ch:9135/Configuration/Server
    Master = yes
    Name = ILC-Prod
  }
  Setups
  {
    ILC-Production
    {
      Configuration = Production
    }
  }
}
  • Make sure the ILC-Prod.cfg file also contains the same info for the MasterServer.
  • Restart the Configuration_Server and check that in its log it starts with Starting configuration service as master
  • Update all the other machines.

Make sure the /afs/cern.ch/eng/clic/data/ILCDIRACTars/defaults/ilc.cfg contains the link to the new CS instance too. Potentially update the client's dirac.cfg (if the former host(s) is(are) killed)

Request Management

This cannot be moved easily at all: jobs need the service to be available (for failover requests), and it's not possible to move to a new system from scratch: jobs would never be cleaned as their requests would never be marked as Done. The new version of the system being completely different, it will be possible to have both running in parallel.

Sandbox Store and associated DB

This service is a bit tricky. It defines not only a service for the Job WMS, but also a storage element. To migrate, the recommended recipe is the following (A. Casajus):

  1. Install a new sandbox store in your new host
  2. Define a new SE for that sandbox (I'll call it NewSBSE)
    • Here you have two SB services running
  3. Define the SB SE to be your new one (NewSBSE)
    • Here both of your SB services should have configured the new SE
    • Old jobs will still retrieve their SB from the old one since the SE is embedded in the SB URL
    • New jobs will use the New SB for their sandboxes
  4. As soon as your old SB gracefully stops being used, remove it
 

Checking the VOBOX status

Revision 24 2013-05-22 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 288 to 288
 ldapsearch -x -LLL -h lcg-bdii.cern.ch -p 2170 -b o=grid '(&(objectclass=GlueCE)(GlueCEAccessControlBaseRule=VO:ilc))' GlueCEUniqueID
Added:
>
>

Adding new CEs to the Configuration Service

In the CE2CS agent e-mail, there is something like:

CE: hepgrid97.ph.liv.ac.uk, GOCDB Name: UKI-NORTHGRID-LIV-HEP
SystemName: ScientificSL, SystemVersion: Carbon, SystemRelease: 6.3
hepgrid97.ph.liv.ac.uk:8443/cream-pbs-long Production

dirac-admin-add-site DIRACSiteName UKI-NORTHGRID-LIV-HEP hepgrid97.ph.liv.ac.uk

The first block gives the new queue properties, and the command below gives a way to register that CE in the CS. If the corresponding site DOES NOT exist in the CS, the command can be used. If it does, the CE must simply be added to the list of CEs in /Resources/Sites/GRID/Site/CEs. The CE2CS will take care of updating the corresponding sections later.

 

Registering New Users

RegisteringNewUsersToDirac

Revision 23 2013-05-21 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 268 to 268
  It should be made clear that the GlueCEPolicyMaxCPUTime is not the GlueCEPolicyMaxWallTime parameter.
Added:
>
>
The relation between HEP-spec and kSI2K is value_kSI2K = value_HEP-SPEC / 4. This second relation is needed to understand the factor 250 above: value_SI00 = value_HEP-SPEC * 250
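A quick worked example, assuming a made-up benchmark of HEP-SPEC06 = 10 per core:

echo $(( 10 * 250 ))          # value_SI00  = 2500
echo "scale=2; 10 / 4" | bc   # value_kSI2K = 2.50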
 

How to get GlueCEPolicyMaxCPUTime and CPUScalingReferenceSI00 from the BDII

You need to run

Line: 276 to 278
  where *kek*.jp* should be replaced by a proper CE.
Added:
>
>
See http://glueschema.forge.cnaf.infn.it/Spec/V13 for the GLUE specification document. There is also a v2.0 that was designed by OGF. I don't know when it will be used nor how. It will certainly be a big mess.
 

How to get the available CEs?

Revision 22 2013-05-17 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 285 to 285
 
Changed:
<
<

Registering New Users:

When someone wants to use ILCDIRAC, they should have sent a mail to ilcdirac-register@cern.ch with full name, Institution, experiment.

First thing: put a ticket on JIRA about new registration (title should include user's name, and assign it to you.)

With the info in the mail, one needs to make sure they are registered in the ILC VO members.

  1. ) dirac-proxy-init
  2. ) dirac-ilc-list-users -u family_name
If the family name does not return anything, try without option, and it will list ALL the users.

This script will show the user-specific info: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sposs/CN=641989/CN=Stephane Guillaume Poss, /DC=ch/DC=cern/CN=CERN Trusted Certification Authority, stephane.guillaume.poss@cern.ch. The first part is the DN, the second is the CN, and the last is the mail. All three are needed for a proper DIRAC registration.

One last bit is essential: a username. It can be useful to check if the user has an account at CERN (phonebook) and use that as a user name. If not, I usually take the first letter of the first name, and the full last name (a bit like CERN does).

There are 2 possibilities to register a user once this is obtained:

  1. ) using the web portal: Manage Remote Configuration, in the section /DIRAC/Registry/Users, you need to add a section 'username' in which you will add the options DN, CN, Email having the values previously obtained. Then you will add the 'username' to the relevant group in /DIRAC/Registry/Groups. The base groups are ilc_user and private_proxy (although I'm not sure that one is still needed). And then Commit configuration and you are done
  2. ) run (with a diracAdmin proxy) dirac-admin-add-user -N username -D DN -M Email -G ilc_user. This one will not set the CN (not needed for the moment, maybe in the future) and will add the user to the ilc_user group (should be enough)
  3. ) run (with a diracAdmin proxy) dirac-ilc-add-user -N username -D DN -C CN -E Email -G ilc_user,private_pilot as this will do the same as 2), but will also add the user to the FC, and create the directories and register the Owner metadata tag.

If you used 1) or 2), you need to also add the user to the File Catalog. Run dirac-dms-filecatalog-cli

user add username
cd /ilc/user/
Check if the initial already exists, if not create it first
mkdir <initial>
cd <initial>
mkdir <username>
cd <username>
meta set . Owner <username>
cd ..
chgrp ilc_user <username>
chown <username> <username>

Add the new user to the ilc-dirac@cern.ch egroup. Use external email if needed.

Then you can send a mail to the new registered user, and close the issue.
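Putting it together, a sketch of the dirac-ilc-add-user route described in 3) above; the user name, DN, CN and e-mail below are purely illustrative placeholders, not real values:

dirac-proxy-init -g diracAdmin
dirac-ilc-add-user -N jdoe \
  -D "/DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=jdoe/CN=123456/CN=John Doe" \
  -C "/DC=ch/DC=cern/CN=CERN Trusted Certification Authority" \
  -E john.doe@example.org \
  -G ilc_user,private_pilot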

>
>

Registering New Users

RegisteringNewUsersToDirac
 

FAQ

What if the Configuration service starts to be very slow?

Revision 21 2013-05-15 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 291 to 291
  First thing: put a ticket on JIRA about new registration (title should include user's name, and assign it to you.)
Changed:
<
<
With the info in the mail, one needs to make sure they are registered in the ILC VO members. For this, there is the script 'grid_users' in ~sposs/public. To run it: 1) source /afs/cern.ch/eng/clic/software/DIRAC/bashrc-glibc-2.5 2) dirac-proxy-init -g ilc_admin 3) ./grid_users This script will dump the full list of registered members, on which usually a grep on the name is enough to find the desired info. It is presented as follows: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sposs/CN=641989/CN=Stephane Guillaume Poss, /DC=ch/DC=cern/CN=CERN Trusted Certification Authority - stephane.guillaume.poss@cern.ch The first part (before the comma) is the DN, the second the CN (before the dash), and after the dash the mail. All three are needed for a proper DIRAC registration.
>
>
With the info in the mail, one needs to make sure they are registered in the ILC VO members.
  1. ) dirac-proxy-init
  2. ) dirac-ilc-list-users -u family_name
If the family name does not return anything, try without option, and it will list ALL the users.

This script will show the user-specific info: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sposs/CN=641989/CN=Stephane Guillaume Poss, /DC=ch/DC=cern/CN=CERN Trusted Certification Authority, stephane.guillaume.poss@cern.ch. The first part is the DN, the second is the CN, and the last is the mail. All three are needed for a proper DIRAC registration.

 One last bit is essential: a username. It can be useful to check if the user has an account at CERN (phonebook) and use that as a user name. If not, I usually take the first letter of the first name, and the full last name (a bit like CERN does).

There are 2 possibilities to register a user once this is obtained: 1) using the web portal: Manage Remote Configuration, in the section /DIRAC/Registry/Users, you need to add a section 'username' in which you will add the options DN, CN, Email having the values previously obtained. Then you will add the 'username' to the relevant group in /DIRAC/Registry/Groups. The base groups are ilc_user and private_proxy (although I'm not sure that one is still needed). And then Commit configuration and you are done

Changed:
<
<
2) run (with a diracAdmin proxy) dirac-admin-add-user -N username -D DN -M Email -G ilc_user. This one will not set the CN (not needed for the moment, maybe in the future) and will add the user to the ilc_user group (should be enough)

Then run dirac-dms-filecatalog-cli

>
>
  1. ) run (with a diracAdmin proxy) dirac-admin-add-user -N username -D DN -M Email -G ilc_user. This one will not set the CN (not needed for the moment, maybe in the future) and will add the user to the ilc_user group (should be enough)
  2. ) run (with a diracAdmin proxy) dirac-ilc-add-user -N username -D DN -C CN -E Email -G ilc_user,private_pilot as this will do the same as 2), but will also add the user to the FC, and create the directories and register the Owner metadata tag.
 
Added:
>
>
If you used 1) or 2), you need to also add the user to the File Catalog. Run dirac-dms-filecatalog-cli
user add username
 cd /ilc/user/
Changed:
<
<
>
>
 Check if the initial already exists, if not create it first
Changed:
<
<
mkdir initial
>
>
mkdir <initial>
cd <initial>
mkdir <username>
cd <username>
Changed:
<
<
meta set Owner <username>
>
>
meta set . Owner <username>
cd ..
chgrp ilc_user <username>
chown <username> <username>
Line: 323 to 327
Add the new user to the ilc-dirac@cern.ch egroup. Use external email if needed.

Then you can send a mail to the new registered user, and close the issue.

Added:
>
>

FAQ

What if the Configuration service starts to be very slow?

That can be due to many things. The first check is to look at the Monitoring plots of the Configuration service. This will tell you the load of the system. If you see a sudden rise in the number of active queries and/or max file descriptors, this indicates something like a DOS attack. It's most likely due to a site that has a router issue: when packets are transmitted, something is lost and DIRAC tries again and again. To identify whether this is a real issue, contact the LHCb people (Joel and/or Philippe Charpentier) and ask them if they see something similar. As we share a few of our sites, but not all, that's not necessarily a good indication. To see a bit further, you need to log onto the vobox hosting the configuration services and check the netstat output, grepping for 9135 as that's the CS port number. If you see many hosts of the same site, then the hypothesis is validated, and the site can be treated the following way:
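To check the netstat output as described, something like the following is enough (standard tools only; 9135 is the CS port mentioned above):

# count connections to the CS port, grouped by remote host
netstat -tn | grep ':9135' | awk '{print $5}' | cut -d: -f1 | sort | uniq -c | sort -rn | head
# one site's worker nodes dominating this list points to the router issue described above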

  1. ) Ban the site with dirac-admin-ban-site so that no new jobs are attempted there
  2. ) Add the hosts to the iptables. You'll need to log as root and run /sbin/iptables -I INPUT -s 193.62.143.66  -j DROP where the IP can be the host name. Given the host, finding the IP is possible on many sites (GIYF).
  3. ) Add a JIRA issue mentioning the problem.
  4. ) If LHCb does not see the problem, put a ggus ticket against the culprit.

When the problem is resolved (either when the site replies or when LHCb says it's fixed), you can close the JIRA issue, unban the site (dirac-admin-allow-site), and reallow the IP (/sbin/iptables -D INPUT -s 193.62.143.66  -j DROP). Hopefully you should start to see things running smoothly again.

Revision 20 2013-05-15 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 285 to 285
 
Changed:
<
<
-- StephanePoss - 02-Aug-2012
>
>

Registering New Users:

When someone wants to use ILCDIRAC, they should have sent a mail to ilcdirac-register@cern.ch with full name, Institution, experiment.

First thing: put a ticket on JIRA about new registration (title should include user's name, and assign it to you.)

With the info in the mail, one needs to make sure they are registered in the ILC VO members. For this, there is the script 'grid_users' in ~sposs/public. To run it: 1) source /afs/cern.ch/eng/clic/software/DIRAC/bashrc-glibc-2.5 2) dirac-proxy-init -g ilc_admin 3) ./grid_users This script will dump the full list of registered members, on which usually a grep on the name is enough to find the desired info. It is presented as follows: /DC=ch/DC=cern/OU=Organic Units/OU=Users/CN=sposs/CN=641989/CN=Stephane Guillaume Poss, /DC=ch/DC=cern/CN=CERN Trusted Certification Authority - stephane.guillaume.poss@cern.ch The first part (before the comma) is the DN, the second the CN (before the dash), and after the dash the mail. All three are needed for a proper DIRAC registration. One last bit is essential: a username. It can be useful to check if the user has an account at CERN (phonebook) and use that as a user name. If not, I usually take the first letter of the first name, and the full last name (a bit like CERN does).

There are 2 possibilities to register a user once this is obtained: 1) using the web portal: Manage Remote Configuration, in the section /DIRAC/Registry/Users, you need to add a section 'username' in which you will add the options DN, CN, Email having the values previously obtained. Then you will add the 'username' to the relevant group in /DIRAC/Registry/Groups. The base groups are ilc_user and private_proxy (although I'm not sure that one is still needed). And then Commit configuration and you are done 2) run (with a diracAdmin proxy) dirac-admin-add-user -N username -D DN -M Email -G ilc_user. This one will not set the CN (not needed for the moment, maybe in the future) and will add the user to the ilc_user group (should be enough)

Then run dirac-dms-filecatalog-cli

cd /ilc/user/

Check if the initial already exists, if not create it first

mkdir initial
cd <initial>
mkdir <username>
cd <username>
meta set Owner <username>
cd ..
chgrp ilc_user <username>
chown <username> <username>

Add the new user to the ilc-dirac@cern.ch egroup. Use external email if needed.

Then you can send a mail to the new registered user, and close the issue.

Revision 19 2013-05-14 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Changed:
<
<

Tagging and distributing new releases

>
>

Tagging and distributing new releases

  Now that we are using GIT, the procedure is as follows:
Line: 30 to 30
  Look at https://raw.github.com/DIRACGrid/DIRAC/integration/releases.cfg
Changed:
<
<

Accessing the machines

>
>

Accessing the machines

  Resource request (new machine): https://cern.service-now.com/service-portal/report-ticket.do?name=hw-allocation&fe=HW-Resources
Line: 76 to 76
 
  • Web portal requires explicit request through net service (ports 80 and 443)
Changed:
<
<

Updating the machines

>
>

Updating the machines

  Don't screw the machines...
Line: 131 to 131
 
 oJob.setOwnerGroup( ownerGroup )
Added:
>
>
You also need to change the extendable production type for the production Monitoring page on the web portal. Simply edit
/opt/dirac/pro/Web/dirac/public/javascripts/jobs/ProductionMonitor.js
line 369 should read
  if(type == 'MCGeneration'){
 
Changed:
<
<

Checking the VOBOX status

>
>

Checking the VOBOX status

  Log on to lxvoadm (not possible to access it from lxplus, don't really know why, so use your machine).
 ssh lxvoadm 
Line: 214 to 223
  Normally the new user is now allowed to log on to the machine.
Changed:
<
<

JIRA

>
>

JIRA

 https://its.cern.ch/jira/browse/ILCDIRAC
Added:
>
>

gLite job matching

Understanding the way the job matching is done is necessary to understand why jobs get killed sometimes.

When a job is submitted to DIRAC, it is inserted in a certain TaskQueue that has a set of properties, among which is the CPU time needed in seconds. That's normally a "wall clock" time. This CPUtime is not the exact value specified (as then one would have thousands of TaskQueues); instead the values are grouped into "segments". The different segments are in

DIRAC/WorkloadManagementSystem/Private/Queues.py
For example, if the CPU time is set to 300000 (case in the production), the closest segment is 4*86400 = 345600. So all the jobs that require 300000 CPU seconds will in fact require 345600 seconds.

When the TaskQueues are examined to create a grid job, the requirements are built using the following complicated procedure:

Rank = ( other.GlueCEStateWaitingJobs == 0 ? ( other.GlueCEStateFreeCPUs * 10 / other.GlueCEInfoTotalCPUs + other.GlueCEInfoTotalCPUs / 500 ) : -other.GlueCEStateWaitingJobs * 4 / ( other.GlueCEStateRunningJobs + 1 ) - 1 );

Lookup = "CPUScalingReferenceSI00=*";
cap = isList(other.GlueCECapability) ? other.GlueCECapability : { "dummy" };
i0 = regexp(Lookup,cap[0]) ? 0 : undefined;
i1 = isString(cap[1]) && regexp(Lookup,cap[1]) ? 1 : i0;
i2 = isString(cap[2]) && regexp(Lookup,cap[2]) ? 2 : i1;
i3 = isString(cap[3]) && regexp(Lookup,cap[3]) ? 3 : i2;
i4 = isString(cap[4]) && regexp(Lookup,cap[4]) ? 4 : i3;
i5 = isString(cap[5]) && regexp(Lookup,cap[5]) ? 5 : i4;
index = isString(cap[6]) && regexp(Lookup,cap[6]) ? 6 : i5;
i = isUndefined(index) ? 0 : index;

QueuePowerRef = real( !isUndefined(index) ? int(substr(cap[i],size(Lookup) - 1)) : other.GlueHostBenchmarkSI00);
#This is the content of CPUScalingReferenceSI00 (ex. 44162 for kek)

QueueTimeRef = real(other.GlueCEPolicyMaxCPUTime * 60); 
QueueWorkRef = QueuePowerRef * QueueTimeRef;

CPUWorkRef = real(345600 * 250);# 250 SpecInt 2000 or 1 HepSpec 2006

requirements = Rank >  -2 && QueueWorkRef > CPUWorkRef ;

The Rank is used to sort the sites according to the request.

The next block is used to obtain from the BDII the CPUScalingReferenceSI00 parameter of the CE or, if that is not defined, the GlueHostBenchmarkSI00 parameter. This is the normalization factor of the CPU to HepSpec (in principle). The GlueCEPolicyMaxCPUTime (minutes) is also obtained and converted to seconds, then multiplied by the scaling factor to get the maxCPUTime in HepSpec-seconds units. The job's CPUtime (already in seconds) is also converted to HepSpec-seconds units (factor 250) and put into the requirements, which are used by the resource brokers.

It should be made clear that the GlueCEPolicyMaxCPUTime is not the GlueCEPolicyMaxWallTime parameter.
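To see whether a given queue can match a production job, the arithmetic can be redone by hand; a small sketch with made-up numbers (CPUScalingReferenceSI00 = 2500, GlueCEPolicyMaxCPUTime = 2880 minutes):

QueuePowerRef=2500                        # CPUScalingReferenceSI00 published by the CE (assumed)
QueueTimeRef=$(( 2880 * 60 ))             # GlueCEPolicyMaxCPUTime, minutes -> seconds
QueueWorkRef=$(( QueuePowerRef * QueueTimeRef ))
CPUWorkRef=$(( 345600 * 250 ))            # job segment (4*86400 s) in SpecInt2000 units
if [ "$QueueWorkRef" -gt "$CPUWorkRef" ]; then echo "queue long enough"; else echo "queue too short"; fi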

How to get GlueCEPolicyMaxCPUTime and CPUScalingReferenceSI00 from the BDII

You need to run

ldapsearch -x -LLL -h lcg-bdii.cern.ch -p 2170 -b o=grid '(&(objectclass=GlueCE)(GlueCEUniqueID=*kek*.jp*)(GlueCEAccessControlBaseRule=VO:ilc))' GlueCECapability GlueCEPolicyMaxCPUTime
where *kek*.jp* should be replaced by a proper CE.

How to get the available CEs?

This should list the available sites for the ILC VO. The problem is that this depends on the good will of the sites to publish their info in the BDII, so it's not necessarily correct...

ldapsearch -x -LLL -h lcg-bdii.cern.ch -p 2170 -b o=grid '(&(objectclass=GlueCE)(GlueCEAccessControlBaseRule=VO:ilc))' GlueCEUniqueID
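To get a quick count instead of the full list, the same query can be piped through grep (nothing ILC-specific, just standard shell tools):

ldapsearch -x -LLL -h lcg-bdii.cern.ch -p 2170 -b o=grid \
  '(&(objectclass=GlueCE)(GlueCEAccessControlBaseRule=VO:ilc))' GlueCEUniqueID \
  | grep -c '^GlueCEUniqueID:'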
 -- StephanePoss - 02-Aug-2012

Revision 18 2013-05-13 - AndreSailer

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 88 to 88
 Start the cli:
 dirac-admin-sysadmin-cli 
then
Changed:
<
<
set host volcd01
>
>
 set host volcd<XX>.cern.ch 
 Then
 show info 
The CLI is a usual DIRAC cli: help is available.

To update it's

 update v12r0p2 
Added:
>
>
The version can be found in http://svnweb.cern.ch/world/wsvn/dirac/ILCDIRAC/trunk/ILCDIRAC/releases.cfg (for now)
 and that should do the trick. Then you need to restart the services:
 restart *
Line: 102 to 103
  All the machines should be updated the same way.
Added:
>
>
Then you have to make sure that all the services have restarted properly by checking the log files of every agent and service on every machine.
Logs can be found here:
https://ilcdirac.cern.ch/DIRAC/ILC-Production/diracAdmin/systems/SystemAdministration/display (right click on the agent/service -->
Log)

Note: Stop the Transferagent on volcdlogse.cern.ch before restarting volcd01 (the machine with the RequestManager, machine will be changed soon)

If the TransferAgent does not find any waiting requests, you probably have to fix the SELECT limit in /opt/dirac/pro/DIRAC/RequestManagementSystem/DB/RequestDBMySQL.py by changing it from 100 to 3000.

 

Things to do for the moment when updating

There are several hacks that are needed because of the way we run:

Revision 17 2013-04-05 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 7 to 7
 Now that we are using GIT, the procedure is as follows:

  • Get a github account.
Changed:
<
<
>
>
 
  • Make the changes
  • Push to YOUR fork
  • From the github portal, make a pull request from the branch you created on your fork to the master branch of ILCDIRAC

Revision 16 2013-04-05 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Tagging and distributing new releases

Changed:
<
<
First start by having your code committed. To understand better, an example is useful. Say you modified ILCDIRAC/Workflow/Modules/OverlayInput.py. Now you want to create a new release based on the changes.
>
>
Now that we are using GIT, the procedure is as follows:
 
Changed:
<
<
  • First thing: notice the date, it is used as a tag name. Example the June 23rd, 2011 is 20110623.
>
>
  • Get a github account.
  • Fork the https://github.com/LCDgit/ILCDIRAC repository
  • Pull from https://github.com/LCDgit/ILCDIRAC the master branch, and make it a new branch for you: it's better to keep things separated.
  • Make the changes
  • Push to YOUR fork
  • From the github portal, make a pull request from the branch you created on your fork to the master branch of ILCDIRAC
  • Someone (the admin of the LCDgit) will create the tag: get the master branch, add the relevant stuff in the releases.notes and release.cfg files to define the new tag, commit, push to the LCDgit repo, not your fork, then make a tag, and push it too.
 
Changed:
<
<
  • Assuming it's the first time of the day you tag something, the tag version is 01
>
>
Once the tag is created, you can run dirac-distribution -l ILCDIRAC -r TAG (where TAG is the tag name). Follow the instructions at the bottom to copy the tar balls to the right location. You'll need a password that I'll give to the right people.
 
Changed:
<
<
  • So the full tag version is 2011062301 (merging of the 2 above).

  • Now the "package" you were working on is Workflow, so the tag name for that is "wo_" (see below for the other "packages" names conventions)

  • So the full tag name of that package is wo_2011062301

  • You are going to tag in svn the content of the Workflow directory, NOT the Workflow directory itself. In the tag wo_2011062301 there should be Modules/ and init.py only

  • Adding this tag to the next release is done the following way: in ILCDIRAC/versions.cfg, create a new section (based on the previous versions examples) where you make sure you you increment the version number, and set properly the tag names. You also need to modify ILCDIRAC/__init__.py that contains the ILCDIRAC version. Commit versions.cfg and init.py.

  • create a dirac tag: run
 dirac-create-svn-tag -p ILCDIRAC -v v12r0pX
or whatever version you set in the versions.cfg. The -p option is necessary to tell that command to tag ILCDIRAC (and not LHCbDIRAC for e.g.).

  • Once the tag is done, you need to create an ILC release. This is done in svn/reps/dirac/ILCDIRAC/trunk/releases.cfg. You'll look for v12r0p1 for example (last version as of 20120731). Create again a new section where you'll increase the version number and set ILCDIRAC version to the one you created earlier. Commit releases.cfg.

  • Run
dirac-distribution -l ILCDIRAC -r v12r0p2
(I use v12r0p2 as an example). Follow the instructions at the bottom to copy the tar balls to the right location. You'll need a password that I'll give to the right people.

  • You are done! Or nearly. You need to change the pilot version used by DIRAC: in the CS, under /Operations/ilc/ILC-Production/Pilot/Version, you need to change the version to the one you created: v12r0p2 (for example).
>
>
  • You are done! Or nearly. You need to change the pilot version used by DIRAC: in the CS, under /Operations/ilc/ILC-Production/Pilot/Version, you need to change the version to the one you created: TAG (for example).
 
  • Commit the CS changes.

Line: 36 to 24
  Changing the pilot version needs some time to be effective, as all pilots that were submitted with the old pilot version have to die to empty the queues. So make sure your changes are OK. Running on the DEV system helps of course (but not for productions). The release procedure is identical.
Deleted:
<
<
The other packages are "Interfaces", "Core", "OverlaySystem", "ProcessProductionSystem", and they have the corresponding prefix "if_", "co_", "ov_", and "pprs_" respectively.
 Once the release is made, you also need to update the file /afs/cern.ch/eng/clic/data/ILCDIRACTars/defaults/ilc.cfg to specify the new release number.

Finding what Dirac version are available

Revision 15 2013-03-11 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 133 to 133
 
 oJob.setOwnerGroup( ownerGroup )
Deleted:
<
<
Also, there is a LHCb specific thing in the TaskManager.py. It will be remove in a later version, but only in the v6r6 series of DIRAC. Find
 
runNumber
in that code, and replace the code with
  def _handleInputs( self, oJob, paramsDict ):
    """ set job inputs (+ metadata)
    """
    try:
      if 'InputData' in paramsDict:
        if paramsDict['InputData']:
          self.log.verbose( 'Setting input data to %s' % paramsDict['InputData'] )
          oJob.setInputData( paramsDict['InputData'] )
    except KeyError:
      pass
 

Checking the VOBOX status

Revision 14 2013-02-13 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 50 to 50
  Add the user to the e-group LxVoAdm-LCD so that he gets access to lxvoadm.cern.ch. lxvoadm.cern.ch cannot be reached from lxplus.
Changed:
<
<
cdbop
>
>
cdbop
 
Changed:
<
<
get profiles/profile_volcd05
>
>
get profiles/profile_volcd05
 
Changed:
<
<
get prod/customization/lcd/vobox/config
>
>
get prod/customization/lcd/vobox/config
 
Changed:
<
<
get prod/customization/lcd/vobox/filesystem_dirac_fileserver
>
>
get prod/customization/lcd/vobox/filesystem_dirac_fileserver
 
Changed:
<
<
emacs profiles/profile_volcd05.tpl (mind the !)
>
>
!emacs profiles/profile_volcd05.tpl (mind the !)
 
Changed:
<
<
update prod/customization/lcd/vobox/config.tpl
>
>
update prod/customization/lcd/vobox/config.tpl
  (update any other file changed)
Changed:
<
<
commit (and you are done after that)
>
>
commit (and you are done after that)
 
Changed:
<
<
sms get volcd05
>
>
sms get volcd05
 
Changed:
<
<
sms set production other "default" volcd05
>
>
sms set production other "default" volcd05
 
Changed:
<
<
If you need to reinstall a machine (needed when changing the partitions) run PrepareInstall volcd05
>
>
If you need to reinstall a machine (needed when changing the partitions) run PrepareInstall volcd05
 
Changed:
<
<
sms set maintenance other volcd05
>
>
sms set maintenance other volcd05
 
Changed:
<
<
PrepareInstall volcd05
>
>
PrepareInstall volcd05
 
Changed:
<
<
sms clear maintenance other volcd05
>
>
sms clear maintenance other volcd05
 
Changed:
<
<
ssh volcd05
>
>
ssh volcd05
 
Changed:
<
<
sudo spma_ncm_wrapper.sh
>
>
sudo spma_ncm_wrapper.sh
 
  • setup users, updates, etc.

You'll need to change the access rights on /opt/dirac that must belong to the user dirac, and /opt/dirac/etc must belong to dirac and should be readable ONLY by dirac (security issue). /opt/dirac/data (and /opt/dirac/data1) must belong to dirac and writable by dirac.

Revision 13 2013-01-29 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"
Changed:
<
<

Dirac for experts

>
>

Dirac for experts

 

Tagging and distributing new releases

First start by having your code committed. To understand better, an example is useful. Say you modified ILCDIRAC/Workflow/Modules/OverlayInput.py. Now you want to create a new release based on the changes.

Line: 234 to 234
  Normally the new user is now allowed to log on to the machine.
Added:
>
>

JIRA

https://its.cern.ch/jira/browse/ILCDIRAC
 -- StephanePoss - 02-Aug-2012 \ No newline at end of file

Revision 12 2012-11-22 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 170 to 170
You need to know that all services/agents run under the runit framework (http://smarden.org/runit/). Dirac comes with a set of handy commands to allow proper supervision:
 runsvctrl t path/to/service 
Changed:
<
<
restarts the service at path/to/service (example: DataManagement_FileCatalog).
>
>
restarts the service at path/to/service (example: DataManagement_FileCatalog). To restart an agent cleanly, you need to create an empty file called stop_agent under /opt/dirac/control/System/Agent (see the sketch after this list).
 
 runsvctrl d path/to/service
takes down the service
 runsvctrl u path/to/service
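A sketch of the clean agent restart mentioned above, run as the dirac user from /opt/dirac/startup. The agent name is only an illustration; replace the System/Agent part with the agent you actually want to restart:

  touch /opt/dirac/control/WorkloadManagement/SiteDirector/stop_agent   # let the agent finish its current cycle
  runsvctrl t WorkloadManagement_SiteDirector                           # then restart it through runit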

Revision 11 - 2012-11-09 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 67 to 71
 sms set production other "default" volcd05

If you need to reinstall a machine (needed when changing the partitions) run PrepareInstall volcd05

Added:
>
>
 sms set maintenance other volcd05
Added:
>
>
 PrepareInstall volcd05
Deleted:
<
<
sms clear maintenance other volcd05
 
Added:
>
>
sms clear maintenance other volcd05
  ssh volcd05
Added:
>
>
 sudo spma_ncm_wrapper.sh
  • setup users, updates, etc.
Line: 76 to 83
 sudo spma_ncm_wrapper.sh
  • setup users, updates, etc.
Added:
>
>
You'll need to change the access rights on /opt/dirac, which must belong to the user dirac; /opt/dirac/etc must belong to dirac and should be readable ONLY by dirac (security issue). /opt/dirac/data (and /opt/dirac/data1) must belong to dirac and be writable by dirac.
 Firewall
  • by default the machines are added to the IT CC LHCB DIRAC3 set, which opens ports 9130:9200 to the outside (a quick reachability check is sketched after this list)
  • Web portal requires explicit request through net service (ports 80 and 443)
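A quick reachability check from outside, assuming nc is available; the fully qualified host name is an assumption, use the machine you are interested in:

  nc -zv volcd01.cern.ch 9135    # any port in the 9130:9200 range opened for the DIRAC services
  nc -zv volcd01.cern.ch 443     # web portal, only after the explicit net service request was granted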

Revision 10 - 2012-11-09 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 44 to 44
  Look at https://raw.github.com/DIRACGrid/DIRAC/integration/releases.cfg
Added:
>
>

Accessing the machines

Resource request (new machine): https://cern.service-now.com/service-portal/report-ticket.do?name=hw-allocation&fe=HW-Resources

Add the user to the e-group LxVoAdm-LCD so that he gets access to lxvoadm.cern.ch. lxvoadm.cern.ch cannot be reached from lxplus.

cdbop

get profiles/profile_volcd05 get prod/customization/lcd/vobox/config get prod/customization/lcd/vobox/filesystem_dirac_fileserver

emacs profiles/profile_volcd05.tpl (mind the !)

update prod/customization/lcd/vobox/config.tpl (update any other file changed)

commit (and you are done after that)

sms get volcd05 sms set production other "default" volcd05

If you need to reinstall a machine (needed when changing the partitions) run PrepareInstall volcd05 sms set maintenance other volcd05 PrepareInstall volcd05 sms clear maintenance other volcd05

ssh volcd05 sudo spma_ncm_wrapper.sh

  • setup users, updates, etc.

Firewall

  • by default the machines are added to the IT CC LHCB DIRAC3 set, which opens ports 9130:9200 to the outside
  • Web portal requires explicit request through net service (ports 80 and 443)
 

Updating the machines

Don't screw the machines...

Revision 9 - 2012-10-22 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 40 to 40
  Once the release is made, you also need to update the file /afs/cern.ch/eng/clic/data/ILCDIRACTars/defaults/ilc.cfg to specify the new release number.
Added:
>
>

Finding what Dirac versions are available

Look at https://raw.github.com/DIRACGrid/DIRAC/integration/releases.cfg
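A quick way to look at it from the command line (assuming curl is available on the host):

  curl -s https://raw.github.com/DIRACGrid/DIRAC/integration/releases.cfg | less
  # or show only the ILC-related entries:
  curl -s https://raw.github.com/DIRACGrid/DIRAC/integration/releases.cfg | grep -i ilc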

 

Updating the machines

Don't screw the machines...

Revision 8 - 2012-08-23 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 83 to 83
 
 oJob.setOwnerGroup( ownerGroup )
Added:
>
>
Also, there is an LHCb-specific thing in TaskManager.py. It will be removed in a later version, but only in the v6r6 series of DIRAC. Find
 
runNumber
in that code, and replace the code with
  def _handleInputs( self, oJob, paramsDict ):
    """ set job inputs (+ metadata)
    """
    try:
      if 'InputData' in paramsDict:
        if paramsDict['InputData']:
          self.log.verbose( 'Setting input data to %s' % paramsDict['InputData'] )
          oJob.setInputData( paramsDict['InputData'] )
    except KeyError:
      pass
 

Checking the VOBOX status

Log on to lxvoadm (not possible to access it from lxplus, don't really know why, so use your machine).

Revision 7 - 2012-08-21 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 75 to 75
 
      bannedsites = oJob.workflow.findParameter("BannedSites")
      if bannedsites:
        if not "LCG.DESY-HH.de" in bannedsites.getValue():
Changed:
<
<
bs = bannedsites.getValue()";LCG.DESY-HH.de"
>
>
bs = bannedsites.getValue()+";LCG.DESY-HH.de"
  oJob._setParamValue( 'BannedSites', bs ) else: oJob.setBannedSites( 'LCG.DESY-HH.de' )

Revision 6 - 2012-08-08 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 38 to 38
  The other packages are "Interfaces", "Core", "OverlaySystem", "ProcessProductionSystem", and they have the corresponding prefix "if_", "co_", "ov_", and "pprs_" respectively.
Added:
>
>
Once the release is made, you also need to update the file /afs/cern.ch/eng/clic/data/ILCDIRACTars/defaults/ilc.cfg to specify the new release number.
 

Updating the machines

Don't screw the machines...

Line: 64 to 66
  All the machines should be updated the same way.
Added:
>
>

Things to do for the moment when updating

There are several hacks that are needed because of the way we run:

- Running in DESY: the production jobs should not run in DESY-HH to avoid interfering with the ILD DBD prods. For this, a hack must be put in place in DIRAC/TransformationSystem/Client/TaskManager.py. In the method

 prepareTransformationTasks 
of the class
 WorkflowTasks 
you need to add
      bannedsites = oJob.workflow.findParameter("BannedSites")
      if bannedsites:
        if not "LCG.DESY-HH.de" in bannedsites.getValue():
          bs = bannedsites.getValue()";LCG.DESY-HH.de"
          oJob._setParamValue( 'BannedSites', bs )
      else:
        oJob.setBannedSites( 'LCG.DESY-HH.de' )
after the line
 oJob.setOwnerGroup( ownerGroup )
 

Checking the VOBOX status

Log on to lxvoadm (not possible to access it from lxplus, don't really know why, so use your machine).

Revision 5 - 2012-08-02 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 32 to 32
 
  • Commit the CS changes.

Changed:
<
<
  • To make sure you pick up the right version straight away, you need to restart the TaskQueueDirector, the SiteDirector and the Matcher. This can be done using the dirac-sysadmin-cli tool.
>
>
  • To make sure you pick up the right version straight away, you need to restart the TaskQueueDirector, the SiteDirector and the Matcher. This can be done using the dirac-admin-sysadmin-cli tool.
  Changing the pilot version needs some time to be effective, as all pilots that were submitted with the old pilot version have to die to empty the queues. So make sure your changes are OK. Running on the DEV system helps of course (but not for productions). The release procedure is identical.

Revision 4 - 2012-08-02 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 20 to 20
 
  • Adding this tag to the next release is done the following way: in ILCDIRAC/versions.cfg, create a new section (based on the previous versions' examples) where you make sure you increment the version number and set the tag names properly. You also need to modify ILCDIRAC/__init__.py, which contains the ILCDIRAC version. Commit versions.cfg and init.py.
Changed:
<
<
  • create a dirac tag: run dirac-create-svn-tag -p ILCDIRAC -v v1r18pX or whatever version you set in the versions.cfg. The -p option is necessary to tell that command to tag ILCDIRAC (and not LHCbDIRAC for e.g.).
>
>
  • create a dirac tag: run
 dirac-create-svn-tag -p ILCDIRAC -v v12r0pX
or whatever version you set in the versions.cfg. The -p option is necessary to tell that command to tag ILCDIRAC (and not LHCbDIRAC, for example).
 
Changed:
<
<
  • Once the tag is done, you need to create an ILC release. This is done in svn/reps/dirac/trunk/releases.cfg (not svn/reps/dirac/ILCDIRAC/trunk/releases.cfg, that's for the future). You'll look for ILC-v1r20p12 for example (last version as of 20110623). Create again a new section where you'll increase the version number and set ILCDIRAC version to the one you created earlier. Commit releases.cfg.
>
>
  • Once the tag is done, you need to create an ILC release. This is done in svn/reps/dirac/ILCDIRAC/trunk/releases.cfg. You'll look for v12r0p1 for example (last version as of 20120731). Create again a new section where you'll increase the version number and set ILCDIRAC version to the one you created earlier. Commit releases.cfg.
 
Changed:
<
<
  • Run dirac-distribution -r ILC-v1r20p13 -E (I use v1r20p13 as an example). Follow the instructions at the bottom to copy the tar balls to the right location. You'll need a password that I'll give to the right people.
>
>
  • Run
dirac-distribution -l ILCDIRAC -r v12r0p2
(I use v12r0p2 as an example). Follow the instructions at the bottom to copy the tar balls to the right location. You'll need a password that I'll give to the right people.
 
Changed:
<
<
  • You are done! Or nearly. You need to change the pilot version used by DIRAC: in the CS, under /Operations/ilc/ILC-Production/Versions/PilotVersion, you need to change the version to the one you created: ILC-v1r20p13 (for example).
>
>
  • You are done! Or nearly. You need to change the pilot version used by DIRAC: in the CS, under /Operations/ilc/ILC-Production/Pilot/Version, you need to change the version to the one you created: v12r0p2 (for example).
 
  • Commit the CS changes.

Changed:
<
<
  • To make sure you pick up the right version straight away, you need to restart the TaskQueueManager and the Matcher. This can be done using the dirac-sysadmin-cli tool.
>
>
  • To make sure you pick up the right version straight away, you need to restart the TaskQueueDirector, the SiteDirector and the Matcher. This can be done using the dirac-sysadmin-cli tool.
 
Changed:
<
<
Changing the pilot version needs some time to be effective, as all pilots that were submitted with the old pilot version have to die to empty the queues. So make sure your changes are OK. Running on the DEV system helps of course (but not for productions). For that you just need to commit your changes and run dirac-distribution-r ILC-HEAD -E. Then submit a few test jobs. No need to change anything in the versions.cfg or to tag anything.
>
>
Changing the pilot version needs some time to be effective, as all pilots that were submitted with the old pilot version have to die to empty the queues. So make sure your changes are OK. Running on the DEV system helps of course (but not for productions). The release procedure is identical.
 
Changed:
<
<
The other packages are "Interfaces" and "Core", and they have the corresponding prefix "if_" and "co_" respectively.
>
>
The other packages are "Interfaces", "Core", "OverlaySystem", "ProcessProductionSystem", and they have the corresponding prefix "if_", "co_", "ov_", and "pprs_" respectively.

Updating the machines

Don't screw the machines...

Now that this bit of advice is done, for the serious stuff.

Get an admin proxy

 dirac-proxy-init -g diracAdmin 

Start the cli:

 dirac-admin-sysadmin-cli 
then set host volcd01. Then
 show info 
The CLI is a usual DIRAC cli: help is available.

To update, it's

 update v12r0p2 
and that should do the trick. Then you need to restart the services:
 restart *
Then the connection is lost, so you need to do the set host again.

All the machines should be updated the same way.
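Putting the steps above together, a typical update session looks like this; the host name and version are just the examples used on this page, adjust them to the machine and release you actually want:

  dirac-proxy-init -g diracAdmin    # get an admin proxy first
  dirac-admin-sysadmin-cli          # then, inside the interactive cli:
  #   set host volcd01
  #   show info
  #   update v12r0p2
  #   restart *
  #   set host volcd01              # the restart drops the connection, so set the host again
  # repeat set host / update / restart * for each of the other machines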

 

Checking the VOBOX status

Changed:
<
<
Log on to lxvoadm (not possible to access it from lxplus, so use your machine).
>
>
Log on to lxvoadm (not possible to access it from lxplus, don't really know why, so use your machine).
 
 ssh lxvoadm 

Logon to the VOBOX of interest: volcd01 for most services, volcd03 for dev platform and Log file storage, volcd02 for File Catalog DB only (no services running).

Line: 46 to 74
  Make yourself dirac user: you'll need to be dirac to start/stop services:
 sudo su dirac 
Changed:
<
<
This also sources some environement: you should be under /opt/dirac/pro after sudo. Then you should go to
/opt/dirac/startup
to have the services/agents running on the machine.
>
>
This also sources some environment: you should be under /opt/dirac/pro after sudo. If not,
source /opt/dirac/bashrc
Then you should go to
/opt/dirac/startup
to have the services/agents running on the machine.
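Condensed, the path to the services looks like this (a sketch of the steps just described):

  ssh lxvoadm                  # not reachable from lxplus, use your own machine
  ssh volcd01                  # volcd01 for most services
  sudo su dirac                # become the dirac user; this normally sources the environment
  source /opt/dirac/bashrc     # only needed if you do not end up under /opt/dirac/pro
  cd /opt/dirac/startup        # here are the services/agents running on this machine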
  Check the disk space with
df -h
Changed:
<
<
/opt/dirac should never be at a 100%. In that case, the services start to have problems. In the worst case, the web page fails because it cannot put anything in cache. To "fix" the situation, usually restarting the services is enough: the mySQL cache is emptied, and some disk space recovered. It allows agents to work (in particular the JobCleaningAgent). Now, how to do that?
>
>
/opt/dirac should never be at a 100%. In that case, the services start to have problems. In the worst case, the web page fails because it cannot put anything in cache. To "fix" the situation, usually restarting the services is enough: the mySQL cache is emptied, and some disk space recovered. It allows agents to work (in particular the JobCleaningAgent). Now, how to do that?
You need to know that all services/agents run under the runit framework (http://smarden.org/runit/). Dirac comes with a set of handy commands to allow proper supervision:
 runsvctrl t path/to/service 
Changed:
<
<
restarts the service at path/to/service (example: DataManagement_FileCatalog).
>
>
restarts the service at path/to/service (example: DataManagement_FileCatalog).
 
 runsvctrl d path/to/service
takes down the service
 runsvctrl u path/to/service
Line: 64 to 92
 
 runsvstat *
To see what is running and what is down. All on volcd01 should be running.
Added:
>
>

Granting access to someone

To have access to lxvoadm, a new user must be registered in the mailing list LxVoAdm-LCD. Then he can ssh lxvoadm.

Then, in addition, the new user should be granted access to the volcd pool of machines. This is done by changing the machine templates. For this, one should use CDB (for the moment PUPPET is not in production yet). Here is some brief usage information: on lxvoadm, type

cdbop
This utility is used to manage all the machines. It has a shell-like interface with tab completion... In cdb, type
get prod/customization/lcd/vobox/config
but it will complain if you already have the file (no merging, don't ask why): [ERROR] 'prod/customization/lcd/vobox/config.tpl': file already exists. To execute a shell command, prepend a !, like in the following example:
!rm prod/customization/lcd/vobox/config.tpl

Once you have got the config, you'll have a prod/customization/lcd/vobox/config.tpl in your directory. To edit it, the easiest for me is to do

!nano prod/customization/lcd/vobox/config.tpl
In that file, there are lots of things, but for what concerns us, you need to find the lines
  "/software/components/useraccess" = add_root_access(list("sposs","atsareg","rgracian","cgrefe"));
and
   "/software/components/useraccess/users/cgrefe/acls" = list("system-auth");
and
   "/software/components/sudo/privilege_lines" = push(nlist(
    "user", "cgrefe",
    "run_as", "ALL",
    "host", "ALL",
    "cmd", "NOPASSWD:ALL"));
and finally
   "/software/components/interactivelimits/values" = list(
     list('sposs', '-', 'maxlogins', '10'),
     list('cgrefe', '-', 'maxlogins', '10'),
     list('rgracian', '-', 'maxlogins', '10'),
     list('dirac', '-', 'maxlogins', '15'),
     list('dirac', '-', 'nofile', '8192'),
     list('atsareg', '-', 'maxlogins', '10'),
     list('msapunov', '-', 'maxlogins', '10'),
     list('jfstrube', '-', 'maxlogins', '10'),
     list('*', '-', 'maxlogins', '0'),
  );

where you will add the new user.

Once done you need to save and exit, then

 update prod/customization/lcd/vobox/config.tpl 
then
 commit 
where it will ask to confirm and to give a message. In some cases, when you have a typo in the message, you cannot fix it as backspace introduces a new character. I do not fix my typos in there (who reads those things anyway?)

It should tell you that all went fine after a while, then you can quit cdb with

 exit 

Then, on all the machines (to which you log on using ssh volcdX), you'll need to run

 sudo spma_ncm_wrapper.sh 
which applies the configuration and does the necessary updates. This command should be run in any case once in a while (every month or so) to make sure the machine is up to date. It will also update the user access rights.
 
Added:
>
>
Normally the new user is now allowed to log on to the machine.
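Condensed, the whole workflow looks like this (a sketch of the steps described above; the machine name is only an example, run the last step on every volcd machine you manage):

  cdbop                        # on lxvoadm; then, inside the cdb shell:
  #   get prod/customization/lcd/vobox/config
  #   !nano prod/customization/lcd/vobox/config.tpl    # add the new user to the access, sudo and limits lists
  #   update prod/customization/lcd/vobox/config.tpl
  #   commit
  #   exit
  # then apply the configuration on each machine:
  ssh volcd05
  sudo spma_ncm_wrapper.sh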
 
Deleted:
<
<
-- StephanePoss - 23-Jun-2011
 \ No newline at end of file
Added:
>
>
-- StephanePoss - 02-Aug-2012

Revision 3 - 2011-06-29 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"

Dirac for experts

Line: 22 to 22
 
  • create a dirac tag: run dirac-create-svn-tag -p ILCDIRAC -v v1r18pX or whatever version you set in the versions.cfg. The -p option is necessary to tell that command to tag ILCDIRAC (and not LHCbDIRAC for e.g.).
Changed:
<
<
  • Once the tag is done, you need to create an ILC release. This is done in dirac/releases.cfg (not ILCDIRAC/releases.cfg, that's for the future). You'll look for ILC-v1r20p12 for example (last version as of 20110623). Create again a new section where you'll increase the version number and set ILCDIRAC version to the one you created earlier. Commit releases.cfg.
>
>
  • Once the tag is done, you need to create an ILC release. This is done in svn/reps/dirac/trunk/releases.cfg (not svn/reps/dirac/ILCDIRAC/trunk/releases.cfg, that's for the future). You'll look for ILC-v1r20p12 for example (last version as of 20110623). Create again a new section where you'll increase the version number and set ILCDIRAC version to the one you created earlier. Commit releases.cfg.
 
  • Run dirac-distribution -r ILC-v1r20p13 -E (I use v1r20p13 as an example). Follow the instructions at the bottom to copy the tar balls to the right location. You'll need a password that I'll give to the right people.

Revision 2 - 2011-06-27 - StephanePoss

Line: 1 to 1
 
META TOPICPARENT name="DiracUsage"
Deleted:
<
<
 

Dirac for experts

Changed:
<
<
This page discusses the tagging and distribution of a new release.
>
>

Tagging and distributing new releases

First start by having your code committed. To understand better, an example is useful. Say you modified ILCDIRAC/Workflow/Modules/OverlayInput.py. Now you want to create a new release based on the changes.
Line: 37 to 36
  The other packages are "Interfaces" and "Core", and they have the corresponding prefix "if_" and "co_" respectively.
Added:
>
>

Checking the VOBOX status

Log on to lxvoadm (not possible to access it from lxplus, so use your machine).

 ssh lxvoadm 

Logon to the VOBOX of interest: volcd01 for most services, volcd03 for dev platform and Log file storage, volcd02 for File Catalog DB only (no services running).

 ssh volcd01 

Make yourself dirac user: you'll need to be dirac to start/stop services:

 sudo su dirac 
This also sources some environment: you should be under /opt/dirac/pro after sudo. Then you should go to
/opt/dirac/startup
to have the services/agents running on the machine.

Check the disk space with

df -h

/opt/dirac should never be at a 100%. In that case, the services start to have problems. In the worst case, the web page fails because it cannot put anything in cache. To "fix" the situation, usually restarting the services is enough: the mySQL cache is emptied, and some disk space recovered. It allows agents to work (in particular the JobCleaningAgent). Now, how to do that?

You need to know that all services/agents run under the runit framework (http://smarden.org/runit/). Dirac comes with a set of handy commands to allow proper supervision:

 runsvctrl t path/to/service 
restarts the service at path/to/service (example: DataManagement_FileCatalog).
 runsvctrl d path/to/service
takes down the service
 runsvctrl u path/to/service
restarts the service after using the previous.

One can also use

 runsvstat *
To see what is running and what is down. All on volcd01 should be running.
  -- StephanePoss - 23-Jun-2011 \ No newline at end of file

Revision 1 - 2011-06-23 - StephanePoss

Line: 1 to 1
Added:
>
>
META TOPICPARENT name="DiracUsage"

Dirac for experts

This page discusses the tagging and distribution of a new release.

First start by having your code committed. To understand better, an example is useful. Say you modified ILCDIRAC/Workflow/Modules/OverlayInput.py. Now you want to create a new release based on the changes.

  • First thing: notice the date, it is used as a tag name. Example the June 23rd, 2011 is 20110623.

  • Assuming it's the first time of the day you tag something, the tag version is 01

  • So the full tag version is 2011062301 (merging of the 2 above).

  • Now the "package" you were working on is Workflow, so the tag name for that is "wo_" (see below for the other "packages" names conventions)

  • So the full tag name of that package is wo_2011062301

  • You are going to tag in svn the content of the Workflow directory, NOT the Workflow directory itself. In the tag wo_2011062301 there should be Modules/ and init.py only

  • Adding this tag to the next release is done the following way: in ILCDIRAC/versions.cfg, create a new section (based on the previous versions' examples) where you make sure you increment the version number and set the tag names properly. You also need to modify ILCDIRAC/__init__.py, which contains the ILCDIRAC version. Commit versions.cfg and init.py.

  • create a dirac tag: run dirac-create-svn-tag -p ILCDIRAC -v v1r18pX or whatever version you set in the versions.cfg. The -p option is necessary to tell that command to tag ILCDIRAC (and not LHCbDIRAC for e.g.).

  • Once the tag is done, you need to create an ILC release. This is done in dirac/releases.cfg (not ILCDIRAC/releases.cfg, that's for the future). You'll look for ILC-v1r20p12 for example (last version as of 20110623). Create again a new section where you'll increase the version number and set ILCDIRAC version to the one you created earlier. Commit releases.cfg.

  • Run dirac-distribution -r ILC-v1r20p13 -E (I use v1r20p13 as an example). Follow the instructions at the bottom to copy the tar balls to the right location. You'll need a password that I'll give to the right people.

  • You are done! Or nearly. You need to change the pilot version used by DIRAC: in the CS, under /Operations/ilc/ILC-Production/Versions/PilotVersion, you need to change the version to the one you created: ILC-v1r20p13 (for example).

  • Commit the CS changes.

  • To make sure you pick up the right version straight away, you need to restart the TaskQueueManager and the Matcher. This can be done using the dirac-sysadmin-cli tool.

Changing the pilot version needs some time to be effective, as all pilots that were submitted with the old pilot version have to die to empty the queues. So make sure your changes are OK. Running on the DEV system helps of course (but not for productions). For that you just need to commit your changes and run dirac-distribution -r ILC-HEAD -E. Then submit a few test jobs. No need to change anything in the versions.cfg or to tag anything.

The other packages are "Interfaces" and "Core", and they have the corresponding prefix "if_" and "co_" respectively.

-- StephanePoss - 23-Jun-2011

 