Important things to know for the service

  • The pool account/static account mapping doesn't matter on the WMS, for any experiment.
  • It only matters on the CEs. However, Ulrich would like all mappings to be synchronized across the CERN services, which makes sense.

  • CMS only submits jobs through WMSes, not RBs.
  • Which WMSes to use is configured in CMS's own CRAB configuration files; it does not come from the AFS UI config file. CMS has two different config files (one for analysis, and one for ?).
  • For CMS, RBs are only used by the SAM tests. In that case the SAM RBs are used (i.e. rb113, rb115 and rb127), not the CMS RBs.
  • Conclusion: the CMS RBs could be decommissioned at some point (check with cms.support and Enzo Miccio first).

  • Request from CMS: submit the CMS SAM tests to a 3.1 SAM WMS (not a CMS WMS). The lcgadmin and production CMS VOMS roles need to be able to submit jobs (this is not possible at the moment from wms113).

  • From Maarten (email thread called "Atlas WMS"):

> The LB service must be drained for a few days before the index can be created.  

Draining means: taking it out of any /opt/glite/etc/glite_wms.conf,
restarting /opt/glite/etc/init.d/glite-wms-wmproxy on the affected WMS nodes,
and waiting for a few days (better a whole week) for activity to drop.

> > The WMS-LB relationships are bizarre for historical reasons and should be
> > cleaned up.  More on that separately.

There are a number of factors:
- each WMS should point to at least 2 LB servers for load-balancing and
  failover;
- each experiment should have its own LB servers, so that one experiment
  cannot slow down or screw up another (this has been seen);
- SAM should have its own LB servers for the same reasons;
- the small VOs should have their own shared LB servers;
- we should have at least 1 hot spare, e.g. to remedy overloads;
- we only have 5 machines (lb102-lb106).

So, we have had to juggle...

Ideally we would have 2 * (4 + 1 + 1) + 1 = 13 LB servers.
With the new 8-core boxes we could put 4 LB servers on a box,
each in its own VM (e.g. 1 per experiment), but I have no idea of
the combined performance then.  Both disk and network I/O matter.
What are the HW prospects for the WMS and LB services?
Thanks,
   Maarten
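
The draining procedure described in Maarten's mail could look roughly like this. This is a minimal sketch, assuming lb106.cern.ch is the LB server being drained (a purely illustrative choice); the steps are run on each WMS node that references it:

# 1. Check whether this WMS node still points to the LB server being drained:
grep -n 'lb106.cern.ch' /opt/glite/etc/glite_wms.conf

# 2. Edit /opt/glite/etc/glite_wms.conf and remove the lb106.cern.ch entries,
#    then restart the WMProxy so that new jobs no longer register against it:
/opt/glite/etc/init.d/glite-wms-wmproxy restart

# 3. Wait a few days (better a whole week) for activity on the LB server
#    to drop before creating the index.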

Log files

/var/log/glite/httpd-wmproxy-access.log
/var/log/glite/httpd-wmproxy-errors.log
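
These are the WMProxy httpd access and error logs, so the usual text tools apply. A small sketch of how they might be inspected; the DN below is purely illustrative:

# Follow submission errors live while reproducing a problem:
tail -f /var/log/glite/httpd-wmproxy-errors.log

# Find requests from a given user DN in the access log:
grep 'CN=John Doe' /var/log/glite/httpd-wmproxy-access.log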

FAQ

Given a jobid, how to know from which WMS the job was submitted

Directly in the LB database:

mysql> select host from events where jobid='https://lb106.cern.ch:9000/NML-uwEAttywJd66KZlrMw';
mysql> select e.host from events e,states s where e.jobid=s.parent_job and e.event=0 and s.jobid='NML-uwEAttywJd66KZlrMw';    # if it is a subjob (from a job collection)
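
These queries are run on the LB node against the LB server database. A small wrapper sketch, assuming the conventional database name lbserver20 and local MySQL access (both assumptions; adjust to the actual setup):

#!/bin/sh
# find-wms.sh <full jobid> -- hypothetical helper, mirrors the first query above.
# Prints the host(s) the job's events were logged from; the registration event
# should come from the submitting WMS.
JOBID="$1"
mysql -N lbserver20 -e "select host from events where jobid='${JOBID}';"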

-- SophieLemaitre - 02 Jul 2008
