WLCG GGUS Operations

Every day

Scan through the GGUS notifications in your inbox. They may concern GGUS tickets for:

  • ROC_CERN, i.e. Grid Services. Here it is important that the GGUS-SNow interface works well and that the right supporters follow up the tickets with a response time appropriate to the ticket priority and type. ALARM tickets should be in the hands of the experts in less than 1 hour.
  • The GGUS Support Unit (SU), i.e. incidents and requests related to the GGUS infrastructure itself.
  • Any other SU to which you belong. There you would typically act as a supporter.
  • Any GGUS ticket that was brought up at the WLCG Operations (Coordination) meetings as having a problem in routing or response.
  • Escalations: investigate the user complaint behind any email you receive as a member of the ggus-escalation-notifications e-group with Subject: REMINDER Escalation Level X.
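
As an illustration, escalation reminders could be picked out of a list of mail subjects by matching the subject line. The Python sketch below assumes that the X in the subject is a decimal number; the function name escalation_level is hypothetical and not part of any GGUS tooling.

```python
import re
from typing import Optional

# Assumption: the escalation level "X" is written as a decimal number.
ESCALATION_RE = re.compile(r"REMINDER Escalation Level\s+(\d+)")


def escalation_level(subject: str) -> Optional[int]:
    """Return the escalation level found in a mail subject, or None."""
    match = ESCALATION_RE.search(subject)
    return int(match.group(1)) if match else None
```

Sorting the matching mails by the returned level would put the most escalated tickets first.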

In case of GGUS downtime

The e-group ggus-downtimes contains four sub-e-groups, named ggus-downtimes-VOname (VOname = alice | atlas | cms | lhcb). When the GGUS developers publish a downtime (scheduled or not) in GOCDB, they should also email the top-level e-group. The sub-e-group members within the experiments decide whom to inform in their community.

Every Monday

Update the GGUS section in the appropriate page under WLCGOperationsMeetings with announcements of upcoming releases or debug info for relevant issues, if any. Participate in this meeting. If you can't be there, please read the meeting notes to check whether any GGUS tickets were reported as wrongly assigned or not properly followed up. There might also be new development requests, problems with the SNow or OSG interfaces, misunderstandings concerning the workflows, TEAM or ALARM ticket creators in the experiments who have lost their privileges, etc.

Before the WLCG MB (on Monday)

Prepare the graph of tickets

Update the file ggus-tickets.xlsx. Download the latest version from the WLCGOperationsMeetings page, where it should be permanently attached. This file contains weekly summaries so that the corresponding graph will show GGUS ticket traffic evolution over regular intervals. You need to cover the period from the Monday before the previous MB up to the Monday preceding the current MB, one week at a time. Open the ggus-tickets.xlsx file and:

  • move the mouse to the bottom right corner of the table where you will see a small mark;
  • click on that mark and drag the mouse downward to extend the table;
  • please add exactly the number of rows that are needed to cover the weeks for your report.

The dates and the totals per experiment will be filled in automatically. When needed, please move the graph area downward to make space for the additional rows.
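
The Monday-to-Monday weeks to cover can be listed with a few lines of Python. This is only a sketch: the function names are illustrative, "the Monday before" is taken to mean the latest Monday strictly before the MB date, and the two dates in the usage lines are arbitrary examples, not actual MB dates.

```python
from datetime import date, timedelta
from typing import List, Tuple


def monday_before(d: date) -> date:
    """Return the latest Monday strictly before d (assumed interpretation)."""
    # weekday() is 0 for Monday; a Monday maps to the Monday one week earlier.
    return d - timedelta(days=d.weekday() or 7)


def weekly_periods(previous_mb: date, current_mb: date) -> List[Tuple[date, date]]:
    """List (start, end) Monday-to-Monday pairs, one per spreadsheet row."""
    start = monday_before(previous_mb)
    end = monday_before(current_mb)
    periods = []
    while start < end:
        periods.append((start, start + timedelta(days=7)))
        start += timedelta(days=7)
    return periods


# Example with two arbitrary Tuesdays as MB dates.
for week_start, week_end in weekly_periods(date(2019, 3, 19), date(2019, 4, 2)):
    print(week_start, "to", week_end)
# 2019-03-18 to 2019-03-25
# 2019-03-25 to 2019-04-01
```

The number of pairs returned is the number of rows to add to the table.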

The GGUS Report Generator is used to obtain the numbers for each additional row, per experiment. Instructions:

  1. Open the GGUS Report Generator (full documentation here).
  2. Select the period from Monday-week-N to Monday-week-N+1.
  3. Select the 4 LHC VOs and click on Group by.
  4. Select ALL ticket types and click on Group by.
  5. Select ALL ticket categories except Test and CMS Internal.
  6. Select weekly aggregation.
  7. Click GO!
  8. Write the totals of each week in your local copy of file ggus-tickets.xlsx.
  9. To help avoid mistakes, compare the value in each column with the ones directly above:
    big changes in any of the columns ought to be rare.
  10. Pay special attention to alarm tickets, see below.
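
The comparison in step 9 could also be done mechanically. The Python sketch below flags columns whose value differs from the row above by more than a given factor. The column names, the factor of 3 and the minimum count of 10 are assumptions chosen for illustration; they do not come from the spreadsheet or from GGUS.

```python
from typing import Dict, List


def flag_big_changes(previous_row: Dict[str, int],
                     new_row: Dict[str, int],
                     factor: float = 3.0,
                     min_count: int = 10) -> List[str]:
    """Return the columns whose value changed by more than `factor`."""
    flagged = []
    for column, new_value in new_row.items():
        old_value = previous_row.get(column, 0)
        low, high = sorted((old_value, new_value))
        if high < min_count:
            continue  # small counts fluctuate a lot, so they are ignored
        if low == 0 or high / low > factor:
            flagged.append(column)
    return flagged
```

A flagged column is not necessarily wrong; it is only a hint to re-check the Report Generator selection for that week.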

When all weeks have been done, update the ggus-tickets.xlsx attachment on the WLCGOperationsMeetings page. Note that you will need the graph for the MB Service Report, as documented below.

Alarm tickets need special attention:

  • Test alarms (e.g. accompanying GGUS releases) are not always marked Test.
  • Use the GGUS search engine to list all alarm ticket candidates:
    • For Special attributes select ALARM-Tickets.
    • For Status ensure all is selected.
    • For Ticket category select Incident.
    • Select the appropriate time period and click GO!
  • Check the subjects of the list of tickets shown: test cases should be obvious
    and must neither be included in ggus-tickets.xlsx nor in the MB report.
    • The test alarm subjects for CERN start with FTS sends files in the wrong direction.
  • Each real alarm ticket should be briefly described in the MB report,
    usually in the operations section for the affected experiment and/or site.
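
If the alarm ticket list is copied into a script, the known test subject can be used to pre-sort it. The Python sketch below assumes each ticket is a dictionary with a "subject" key, which is a hypothetical structure and not a GGUS export format. Only the CERN test subject quoted above is known here, so the remaining tickets still need a manual look.

```python
from typing import Dict, List, Tuple

# Known test alarm subject prefix for CERN; other sites may use different ones.
TEST_SUBJECT_PREFIXES = ("FTS sends files in the wrong direction",)


def split_alarms(tickets: List[Dict[str, str]]) -> Tuple[List[Dict[str, str]], List[Dict[str, str]]]:
    """Split tickets into (candidate real alarms, recognised test alarms)."""
    prefixes = tuple(p.lower() for p in TEST_SUBJECT_PREFIXES)
    real, tests = [], []
    for ticket in tickets:
        subject = ticket["subject"].strip().lower()
        if subject.startswith(prefixes):
            tests.append(ticket)
        else:
            real.append(ticket)
    return real, tests
```

Only the tickets in the first list are candidates for ggus-tickets.xlsx and the MB report.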

Then, when all the relevant weeks have been done individually, to simplify filling out the table on the GGUS slide (see below):

  1. Select period from Monday-previous-MB-week to Monday-current-MB-week.
  2. Select yearly aggregation.
  3. Click GO!
  4. Write the totals per ticket type for each experiment in the table on the GGUS slide, see below.
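
As a cross-check, the totals for the whole period should normally equal the sum of the weekly rows, provided the same selections were used and test alarms were treated the same way in both. A minimal Python sketch of that comparison follows; the column labels in the usage note are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple


def sum_weekly_rows(rows: List[Dict[str, int]]) -> Dict[str, int]:
    """Add up the weekly rows column by column."""
    total = Counter()
    for row in rows:
        total.update(row)  # Counter.update adds counts rather than replacing them
    return dict(total)


def mismatches(weekly_rows: List[Dict[str, int]],
               period_totals: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Return {column: (sum of weeks, period total)} where the two differ."""
    summed = sum_weekly_rows(weekly_rows)
    columns = sorted(set(summed) | set(period_totals))
    return {
        c: (summed.get(c, 0), period_totals.get(c, 0))
        for c in columns
        if summed.get(c, 0) != period_totals.get(c, 0)
    }
```

For example, with weekly rows {"ATLAS team": 10} and {"ATLAS team": 7} and a period total of {"ATLAS team": 17}, the function returns an empty dictionary.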

Prepare the slide for the MB

  • Use the template attached to this page to make the slide for the service report.
  • Include the graph from the latest ggus-tickets.xlsx attached to WLCGOperationsMeetings. Beware: the legend info must be complete (all 4 experiments) and readable. NOTE: this may need to be done on a Windows host (e.g. the Windows Terminal Service cernts.cern.ch, available e.g. through the MS Remote Desktop client), because on macOS the graph legend date format may not work! A workaround is to import the graph as an image instead of as an Excel workbook.

Around GGUS release dates

  1. On the Monday two days before the release, at 3pm: announce the upcoming release in the WLCG Operations meeting minutes. Emphasize any important upcoming changes listed in the release notes.
  2. Assist the GGUS team with the follow-up of problematic test alarm tickets, if needed.

About the GGUS-SNow interface

Although the mappings were agreed in January 2011, the interface has suffered from unilateral SNow changes for which GGUS was given no advance notification.

Documentation:

About the GGUS Architecture

Historic documentation:

Before the end-of-year break

Publish this text in the weekly operations meeting:

For the end-of-year break: GGUS is monitored by a system connected to the on-call service. In case of total GGUS unavailability the on-call engineer (OCE) at KIT will be informed and will take appropriate action. If GGUS is available but there is a problem with the workflow (e.g. ALARM to CERN doesn't generate email notification to the operators), then WLCG should submit an ALARM ticket, notifying site FZK-LCG2 (DE-KIT), which triggers a phone call to the OCE. As a last resort, the FZK-LCG2 emergency e-mail or telephone number published in the GOCDB can be contacted.

Topic attachments

  • CHEP-Architecture-LE.docx (r1, 6.0 K, 2014-04-08 11:04, MariaDimou): GGUS Architecture description prepared by Oleg Dulov, KIT. Status of September 2013.
  • GGUS-template.pptx (r1, 1306.6 K, 2017-08-25 21:44, MaartenLitmaath): Template for the GGUS slide in each WLCG Service Report to the MB.

Topic revision: r30 - 2019-04-02 - MaartenLitmaath
 