Stefano's experience as Computing Run Coordinator

A small blog, Oct 29 - Nov 4. Beware: I am rearranging it continuously as I get more experience.

Organization, responsibility etc.

  • It feels like we have a running experiment. Hey, this is great !!!
  • need a procedure to pass control from one person to another at shift change
  • the CRC location should be defined. Ideally the CRC should show up in CmsCenter most of the day
    • need a seating place in CmsCenter, ideally a small table next to the computing ops
  • the left side of CmsCenter is offline+computing+DQM, needs a shift leader (not two: CRC + ORC)
  • responsibility, if any, should be clarified: which actions can the CRC take ? which decisions ? at which level ?
  • the CRC function covers two areas now; better to separate them
    • shift supervision (be in CmsCenter when taking data, one week shift or longer, cover all of the left side: computing, offline, DQM)
    • Computing Ops Manager / Offline Ops Manager, i.e. a deputy for the L1's as discussed. This needs a longer time vision/memory (3~6 months), should be located at CERN, and is only needed for policy making; the current L1's could do it
  • DataOps other than T0 is not coming to this thread at all; even the DataOps e-log only has T0 entries
  • it makes no sense that problems at T1's and T2's are found, tracked and fixed without contact with DataOps, actually with the possibility that they contact the sites independently

Communication flow

  • too much time goes into copying/pasting messages to/from e-logs/savannah/ggus/HN
    • whom/what is the e-log for ? if its use is limited, in the end nobody reads it, and then why write ?
  • CRC + CSP work in isolation; this is not good. Examples:
    • CSP reports that CAF has a huge job load. Noted in e-log. Instructions say to report. Whom do we report to ? Does anybody really need to be told ? Users and analysis groups should self-regulate by looking themselves at the length of the batch queues. Are we looking for situations that may endanger the system, so that we need to report to IT and close the queue, or to whom ?
    • simulation harvesting from Wisconsin and .... is failing. Whom do we tell ? And how ? And will they do anything ?

Instructions etc.

Shift(ers) management

  • whom do they report to ? what do they report ? which actions can/must they take and how is that tracked ?
  • is CSP simply a human replacement for an alert system ? But then whom do we alert ?
  • too many e-logs (including a hidden one at FNAL), no cross-references; data distribution topics are entered in the FNAL DataOps e-log instead of CERN Distributed Data Transfers
  • problem notices come via mail rather than e-log, due to the need to look at too many e-logs
  • getting one e-mail for every logbook entry is crazy
    • but actively watching 8 e-logs is also impossible
  • need a central e-log that the CRC always keeps open and watches for updates
  • the CMS Center room needs a local manager
  • some shifters are very good; we need to put more effort into making their effort and time more useful
  • there are many instances where shifters do not know what to do; a local supervisor would help
  • rules for when/whom to open/follow/close tickets should be defined, so that we get over the "3-shift experience"
  • simply stated, it is difficult to fill in a shift summary or a day summary, e.g. CSP are supposed to report issues they found, but not if they are still ongoing. The CRC should collect summaries from all shifters and flag important things to pass to experts, not create the list himself

-- StefanoBelforte - 29 Oct 2008

Topic revision: r13 - 2008-10-31 - StefanoBelforte
 