LCG Grid Deployment - LCG Production Services

Definition of GMOD

  • The GMOD is a service manager from the GD group; the role rotates weekly. The GMOD has a backup, who is also a service manager from the GD group.
  • The GMOD is expected to be on duty during working hours.
  • The main function of the GMOD is to ensure that problems reported for GD managed machines are properly followed up and solved.
  • The GMOD receives all tickets that are sent to the REMEDY mail feeds SERVICE.support@cernNOSPAMPLEASE.ch where SERVICE is one of: rb, wms, ce, lfc, fts, bdii, mon, px (myproxy server), sam, voms, vomrs.
  • The GMOD should either solve the problem herself/himself, ask other service managers for help, or make sure the problem is handed over to the relevant expert and followed up by them. In addition, the GMOD should ensure that the problem is acknowledged to whoever reported it within a reasonable time.

People acting as GMOD should refer to the Remedy ROC Structure document for "official" details about the workflow, roles and responsibilities defined for the IT Services within the CERN Remedy PRMS.

This page deals with the specific details and duties of the GMOD "job" as done within the GD group.

How to contact the GMOD

  • PHONE:
    • primary: 164111 (+41764874111)
    • backup: 164222 (+41764874222)

The GMOD carries the primary phone; the GMOD backup carries the backup phone.

GMOD rota

GMOD rota for the GMOD and his/her back up

GMOD meetings

| Day | Time | Meeting | Docs |
| Monday | 09:00 - 09:20 | Daily Morning Meeting | Last Alarms |
| | 16:00 - 17:30 | WLCG/EGEE/OSG operations meeting | Minutes |
| Tuesday | 09:00 - 09:20 | Daily Morning Meeting | Last Alarms |
| Wednesday | 09:00 - 09:20 | Daily Morning Meeting | Last Alarms |
| | 10:00 | WLCG SCM Meeting | Agenda |
| | 15:00* | CCSR | Minutes |
| Thursday | 09:00 - 09:20 | Daily Morning Meeting | Last Alarms |
| Friday | 09:00 - 09:20 | Daily Morning Meeting | Last Alarms |
| | 09:00 - 14:00 | CERN-PROD availability report | weekly CERN-PROD RC report |

A Google calendar of these items is available. However, the list here is the authoritative list of events and actions to take. Please ask one of the other GMODs to add you to the list of people who can view and maintain the calendar.

* Before you go to the CCSR meeting, submit a written report for the minutes.

Responsibilities of the GMOD

The GMOD should proactively handle problems arising during her/his duty time. The main activity is to coordinate problem solutions and to inform people concerned.

  • Review and distribute the Remedy tickets
    • Look for NON FIXED CASES: My Group Assignments
  • Follow up the status of the services we are responsible for (GD services, July '07)
  • Represent all services in the weekly meetings
  • Coordinate information sent outside, to the grid, about all the CERN-PROD services
    • EGEE broadcast (use the "To LCG Service Challenges responsibles" target for WLCG services)
  • Coordinate interventions in the GD services and with FIO (SMOD), e.g.:
    • Mw upgrade
    • Kernel upgrade
  • Announce CERN production service interventions at the 9:00 daily meetings
  • Announce CERN production service interventions to the grid:
    • The GMOD (it-dep-gd-gmod@cernNOSPAMPLEASE.ch) is responsible for deciding whether the announcement should be broadcast to 'the Grid' (via the CIC portal) and whether additional information is required (e.g. to clarify for external users).
    • Use the EGEE broadcast for this, following the standard templates as defined here
  • Renew the host certificates for nodes in production when they expire: https://twiki.cern.ch/twiki/bin/view/LCG/GDReqHostCert
  • Check the weekly CERN-PROD RC report and make sure that every unavailability longer than 2 hours is explained in the following format:
    • Problem
    • Cause
    • Solution
    • Most unavailabilities are related to site services under FIO responsibility, so our task is to check that all of them are explained. If some are not, collect all available information from SAM, correlate it with the week's downtimes and broadcasts (the GMOD knows them from sending the broadcasts and attending the morning meetings), and send everything to grid-cern-prod-admins@cernNOSPAMPLEASE.ch so that they have the full picture and can complete the report. This should be done on Friday morning, before 14:00.
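The weekly check described above can be sketched in a few lines of Python. This is only an illustration: the `Unavailability` record and its field names are hypothetical and do not reflect the actual format of the CERN-PROD RC report. Only the 2-hour threshold and the Problem / Cause / Solution layout come from this page.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Threshold taken from the rule above: explain anything longer than 2 hours.
THRESHOLD = timedelta(hours=2)


@dataclass
class Unavailability:
    """One unavailability entry (hypothetical structure, not the real report format)."""
    service: str
    start: datetime
    end: datetime
    problem: str = ""
    cause: str = ""
    solution: str = ""

    @property
    def duration(self) -> timedelta:
        return self.end - self.start

    def is_explained(self) -> bool:
        # An entry counts as explained only if all three fields are filled in.
        return all(s.strip() for s in (self.problem, self.cause, self.solution))


def missing_explanations(entries):
    """Return the entries longer than the threshold that still lack an explanation."""
    return [e for e in entries if e.duration > THRESHOLD and not e.is_explained()]


def format_explanation(entry):
    """Render one entry in the Problem / Cause / Solution format."""
    return (
        f"{entry.service} ({entry.start:%Y-%m-%d %H:%M} - {entry.end:%Y-%m-%d %H:%M})\n"
        f"Problem: {entry.problem}\n"
        f"Cause: {entry.cause}\n"
        f"Solution: {entry.solution}"
    )
```

Entries returned by `missing_explanations` are the ones for which information still has to be gathered and sent to the site admins.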

CDB/LANDB mapping

The service managers should check the mapping between the CDB and LANDB information for the machines they are responsible for, in particular for the following fields:

| LANDB | CDB | Description |
| Tag | /system/cluster/name | The name of your cluster (e.g. gridvoms). |
| Tag | /system/cluster/subname | The name of your subcluster, if any (e.g. gridwms is a subcluster of cluster grid). |
| Description | /system/cluster/description | The description of your cluster/node (e.g. "gLite WMS (Workload Management System) 3.1"). |
| Main user of the device | /system/cluster/usercontact | The name of the main service manager. |

Note that you can specify the value of the CDB variables at the node level (e.g. by editing template profile_wms101.tpl) or at the cluster level (e.g. by editing template pro_service_gridwms.tpl). Please contact SteveTraylen or Yvan.Calas@cernNOSPAMPLEASE.ch if you have any questions concerning CDB.
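As an illustration of the mapping check, the following Python sketch compares the two sets of values for one machine once they have been copied by hand into plain dictionaries. It does not query CDB or LANDB. It also assumes that the LANDB tag should equal the subcluster name when one is defined and the cluster name otherwise; that reading of the table above is an assumption, as are the function names.

```python
def expected_tag(cdb):
    """LANDB tag expected from CDB: subcluster name if set, else cluster name (assumption)."""
    return cdb.get("/system/cluster/subname") or cdb.get("/system/cluster/name")


def find_mismatches(landb, cdb):
    """Return one message per LANDB field that disagrees with the CDB value."""
    expected = {
        "Tag": expected_tag(cdb),
        "Description": cdb.get("/system/cluster/description"),
        "Main user of the device": cdb.get("/system/cluster/usercontact"),
    }
    return [
        f"{field}: LANDB={landb.get(field)!r} CDB={value!r}"
        for field, value in expected.items()
        if landb.get(field) != value
    ]


# Example values typed in by hand; the user contact is a made-up placeholder.
cdb = {
    "/system/cluster/name": "grid",
    "/system/cluster/subname": "gridwms",
    "/system/cluster/description": "gLite WMS (Workload Management System) 3.1",
    "/system/cluster/usercontact": "EXAMPLE MANAGER",
}
landb = {
    "Tag": "gridrb",  # stale tag, should be reported
    "Description": "gLite WMS (Workload Management System) 3.1",
    "Main user of the device": "EXAMPLE MANAGER",
}
for line in find_mismatches(landb, cdb):
    print(line)
```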

For the time being, GD is responsible for the following clusters:

| LANDB Tag | Description | Comments |
| gridwms | gLite WMS (Workload Management System) 3.1 | New Cluster. |
| gridlb | gLite LB (Logging and Bookkeeping) 3.1 | New Cluster. |
| gridrb | gLite WMS 3.0 and 3.1 | All the nodes belonging to this cluster will be moved to cluster gridwms in July 2007. |
| lcgrb | LCG RB (Resource Broker) | - |
| sam-mon | SAM clients and servers | - |
| sam-bdii | BDIIs for SAM | - |
| sam-dpm | DPM for SAM | - |
| gridvoms | VOMS nodes | - |
| gridfts | FTS nodes | landb to update. |
| gdui | GD-only official UI with incoming connectivity | For GD only. |

For example, to list all the machines belonging to a given cluster (lcgrb, for example), go to the netops web page and enter "lcgrb" in the "Tag" field.

There is also a wiki page on the current status of the WMS, LB and RB nodes here.

Finding Information about Clusters and Nodes

See GModClusterNodeQueries.

GMOD Reports/Presentations

Useful links

| Link | Purpose |
| Node Status page | Services managed by GD, OPS cluster |
| IT Status board | Service Status & Scheduled Interventions |
| CCAlarms | Detailed view of CC alarms |
| LEMON | CC Monitoring |
| SMOD | FIO SMOD Twiki |
| EGEE Broadcast | EGEE broadcast tool |
| SC4 Plans | Experiment SC4 Schedules and Plans |
| ServiceInterventions | Grid Service Interventions' template/check-list |

Useful e-mails and mailing lists

| E-mail | Scope |
| mod@cernNOSPAMPLEASE.ch | Manager of the day; contact to get an announcement on the IT service status page |
| it-dep-fio-smod@cernNOSPAMPLEASE.ch | FIO MOD |
| it-support-mm@cernNOSPAMPLEASE.ch | Mailing list to receive the minutes of the CD morning meetings |

How to use Remedy

You can use any of the following, but the Windows client is recommended:

  • the Remedy web interface
  • the Remedy client available for any Windows PC/Laptop
  • the Remedy client available on the Windows Terminal Service via remote desktop (if you do not have the GUI standard on CERN SL type "rdesktop cernts -a 15 -g 1280x1024" in a terminal window)
  • the mail-feed to Remedy, i.e. by sending an email to arsystem@sunar01NOSPAMPLEASE.cern.ch with special keywords in the message Subject. Instructions here.

It may happen that a ticket previously assigned to the GMOD or to a service manager in GD has to be re-routed to people in FIO. In such a case, FIO service managers have explicitly asked us not to assign tickets to the relevant expert Remedy sub-category (!) but to leave them in "General", because they want all their tickets to be processed by the SMOD.

The CERN ROC has set up a page with Remedy Tips and Tricks, where GMODs may find useful hints on using some advanced features effectively (e.g. Interaction with GGUS, Advanced Searches).

Some useful information (connection) can also be found in a FAQ page, which is addressed more specifically to the Remedy-GGUS interface and is therefore not directly in the scope of the GMOD.

The CERN Remedy homepage is http://service-it-remedy.web.cern.ch/service-it-remedy

List of GD service managers and service experts

http://egee-docs.web.cern.ch/egee-docs/ROC_CERN\gd-service-mgrs-experts.htm

Instructions for EGEE broadcasts

Remember to communicate the information concerning CERN production services to the SMOD (it-dep-fio-smod@cernNOSPAMPLEASE.ch) and to the MOD (mod@cernNOSPAMPLEASE.ch) to ensure that they are also aware.

Guidelines to send broadcasts:

  • Use the EGEE broadcast for this, following the standard templates as defined here
  • Follow WLCG procedures as specified in Scheduling of Service Interruptions at WLCG Sites, mainly regarding:
    • Timelines for announcements
    • Announcement for some cases to the operations meeting through the site reports
    • Use UTC time (or local + UTC)
  • Write it from the user point of view, mentioning the way the service will be affected:
    • e.g. "FTS service will be down" instead of "LGCR rack down" or "DNS service not available"
  • List affected grid production services and VOs
  • Put a meaningful title, starting with the official site name related to the intervention, e.g. CERN-PROD:
  • Short and concise messages are preferred
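To help with the "UTC time (or local + UTC)" guideline, here is a minimal Python sketch that formats an intervention window in both forms. The function name and the output wording are illustrative only. The example uses a fixed +01:00 offset to stand in for CERN winter time, which is an assumption; a DST-aware zone such as `zoneinfo.ZoneInfo("Europe/Zurich")` could be passed instead.

```python
from datetime import datetime, timedelta, timezone


def format_window(start, end):
    """Format an intervention window as local time followed by UTC.

    Both arguments must be timezone-aware datetimes in the same local zone.
    """
    if start.tzinfo is None or end.tzinfo is None:
        raise ValueError("start and end must be timezone-aware")
    start_utc = start.astimezone(timezone.utc)
    end_utc = end.astimezone(timezone.utc)
    return (
        f"{start:%Y-%m-%d} from {start:%H:%M} until {end:%H:%M} local time "
        f"({start_utc:%Y-%m-%d %H:%M} - {end_utc:%H:%M} UTC)"
    )


# Fixed offset standing in for Central European (winter) Time; an assumption.
CET = timezone(timedelta(hours=1))

window = format_window(
    datetime(2007, 2, 22, 9, 0, tzinfo=CET),
    datetime(2007, 2, 22, 12, 0, tzinfo=CET),
)
print(window)
```

Converting from an aware datetime rather than subtracting an hour by hand avoids mistakes around daylight-saving changes.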

Selection of the recipients:

  • Always set "News publication in all CIC portal views" to yes
  • If it ONLY affects T1s (and no other sites): To WLCG Tier-1 contacts
  • If it affects the COD activity: To CIC-on-duty (CIC-on-duty mailing list)
  • Always include: ROC Managers (ALL ROC Managers by default)
  • Always include: Affected VO managers; if these are not known, all VO managers (by default)
  • Affected VO users: only when they are affected. Do not SPAM VO users mailing lists!
  • If it affects all sites or a subset of T1s/T2s: Production Site Admin (All by default)
  • If it affects the PPS service: PPS Site Admin (All by default)
  • Examples:
    • SAM will be down: it affects the COD, all production sites, PPS, all VO managers (SAM is also used by the VOs), ROC managers
    • VOMS intervention: Affected VO managers, affected VO users, ROC managers
    • Castor intervention: WLCG Tier-1 contacts, ROC managers, VO managers, VO users
    • FTS intervention: WLCG Tier-1 contacts, ROC managers, VO managers, VO users
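The selection rules above can be read as a small decision function. The Python sketch below is only one reading of them: the function and parameter names are hypothetical, and the CIC portal broadcast form itself is not driven by this code.

```python
# "News publication in all CIC portal views" is always set to yes.
NEWS_PUBLICATION_ALL_VIEWS = True


def select_recipients(*, only_tier1=False, affects_cod=False,
                      affects_production_sites=False, affects_pps=False,
                      affected_vos=None, vo_users_affected=False):
    """Return the list of broadcast targets for an intervention."""
    # ROC managers are always included.
    recipients = ["ROC Managers"]

    # VO managers are always included; fall back to all of them if unknown.
    scope = ", ".join(affected_vos) if affected_vos else "all"
    recipients.append(f"VO managers ({scope})")

    # VO users are contacted only when they are really affected.
    if vo_users_affected:
        recipients.append(f"VO users ({scope})")

    if only_tier1:
        recipients.append("WLCG Tier-1 contacts")
    if affects_cod:
        recipients.append("CIC-on-duty")
    if affects_production_sites:
        recipients.append("Production Site Admin")
    if affects_pps:
        recipients.append("PPS Site Admin")
    return recipients


# The "SAM will be down" example from the list above.
sam = select_recipients(affects_cod=True, affects_production_sites=True,
                        affects_pps=True)
print(sam)
```

Keyword-only arguments force the caller to state each condition explicitly, which mirrors the checklist nature of the rules.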

Example of a good broadcast text:

Dear WLCG users,

On Thursday, February 22 from 8:00 am until 11:00 am UTC we are planning an 
intervention on our Oracle cluster.
During that time the following Grid services will be down at CERN:
  * FTS
  * LFC
  * VOMS/VOMRS (ALICE, ATLAS, CMS, LHCb, DTEAM, OPS, Sixt, Unosat, Geant4)
  * SAM and GridView
  * FCR

The intervention will take 3 hours and should be finished by 11:00 am UTC. 
Thank you for your understanding.

Grid Manager on Duty at CERN

VOMS Service interruption announcement template for GMOD use

Publish on the CIC portal with the following options:

  News on cic.gridops.org: YES

  Email to: 
       ROC managers, 
       VO managers of ALICE, ATLAS, CMS, LHCb, DTEAM, Geant4 and OPS *only!!*, 
       VO users of ALICE, ATLAS, CMS *only!!*,
       Production and PPS Site Admins **only if gridmap file generation is affected !!**
      
Add in copy on the CIC portal OSG contacts 
goc@opensciencegrid.org and rquick@iu.edu **NB!! There is no such button on the broadcast form!!**


Title: DATE TIME TIMEZONE scheduled interruption of the CERN vomrs and voms services

Text:
All voms and vomrs services (registration, gridmap file update and proxies) will not be accessible 
during DATE TIME TIMEZONE. Reason: TYPE THE REASON HERE.

This applies to VOname = ALICE, ATLAS, CMS, LHCb, DTEAM, OPS, Sixt, Unosat, Geant4

Please contact project-lcg-vo-dteam-admin@cern.ch in case of problem.

Thank you for your understanding.
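
The DATE TIME TIMEZONE and reason placeholders in the template above can be filled in programmatically. The following Python sketch shows one possible way; the function name is hypothetical, and the text still has to be pasted into the CIC portal form by hand.

```python
TITLE = "{when} scheduled interruption of the CERN vomrs and voms services"

TEXT = (
    "All voms and vomrs services (registration, gridmap file update and proxies) "
    "will not be accessible\n"
    "during {when}. Reason: {reason}.\n"
    "\n"
    "This applies to VOname = ALICE, ATLAS, CMS, LHCb, DTEAM, OPS, Sixt, Unosat, Geant4\n"
    "\n"
    "Please contact project-lcg-vo-dteam-admin@cern.ch in case of problem.\n"
    "\n"
    "Thank you for your understanding.\n"
)


def fill_voms_template(when, reason):
    """Return (title, text) with the DATE TIME TIMEZONE and reason placeholders filled."""
    if not when.strip() or not reason.strip():
        raise ValueError("both the time window and the reason are required")
    return TITLE.format(when=when), TEXT.format(when=when, reason=reason.rstrip("."))


title, text = fill_voms_template(
    "2007-02-22 08:00 - 11:00 UTC", "intervention on the Oracle cluster"
)
print(title)
print(text)
```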

-- Main.diana - 09 Oct 2006

Topic attachments
I Attachment History Action Size Date Who Comment
Texttxt 20060904_week_GMOD_report.txt r2 r1 manage 2.3 K 2006-09-08 - 12:05 UnknownUser GMOD Report on the week of 20060904
HTMLhtm C:\GMOD\20060911_week_GMOD_report.htm r1 manage 15.4 K 2006-09-15 - 18:49 UnknownUser  
PDFpdf SC4-scheduled-maintenance-June21.pdf r1 manage 22.9 K 2007-02-22 - 17:21 UnknownUser WLCG-scheduled-maintenance
HTMLhtm gd-service-mgrs-experts.htm r1 manage 5.5 K 2006-11-14 - 14:57 UnknownUser GD service managers and experts
Topic revision: r69 - 2009-03-30 - SophieLemaitre
 