Week of 141117

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web

Monday

Attendance:

  • local: Stefan (SCOD), Maarten (ALICE), Alessandro (ATLAS), Michail (LHCb), Tsung-Hsun Wu (ASGC), Zbigniew (Databases), Ignazio (Grid Services)
  • remote: Lisa (FNAL), Sang Un (KISTI), Ulf (NDGF), Tiju (RAL), Onno (NL-T1), Antonio (CNAF), Rolf (IN2P3), Dimitri (KIT), Pepe (PIC), Kyle (OSG), Michael (BNL)

Experiments round table:

  • ATLAS
    • CentralService/T0/T1s
      • FZK-LCG2 GGUS:110157. ATLAS and FZK experts are investigating together. The problem needs further study. Future handling is to be discussed with FTS3.
    • Daily Activity overview
      • Fix Prodsys1 unfinished tasks older than 1 month (ADCSUPPORT-4049): 42% completed. The 465 tasks remaining are pile (117), reco (43), merge (135), evgen (108), simul (62).
      • Fix Prodsys1 tasks almost finished with a few jobs missing (ADCSUPPORT-4048): 10% completed. The 135 tasks remaining are pile (54), reco (5), merge (47) and simul (29).
      • Babysitting of the 8 TeV derivation production tasks in Prodsys2 (ADCSUPPORT-4045): 97% completed. The number of tasks has been reduced from 465 (12 November) to 15.
      • DQ2 clients testing: Doug reported various issues, including 2 blockers: one on containers, one on dq2-put, which requires lumiblock. The second is already fixed in the next release candidate; the first was reported only on Sunday afternoon. dq2-ls now mixes output with scope (if in Rucio) and without scope. This will be documented in the FAQ.

  • CMS
    • NR

  • ALICE
    • NTR

  • LHCb
    • MC and user jobs. "Legacy Run1 stripping campaign": stripping21 is now scheduled for Wednesday at the earliest
    • T0: NTR
    • T1: Replication of full.dst is complete at all sites except GridKa, where staging is progressing slowly

Sites / Services round table:

  • ASGC: NTR
  • BNL: NTR
  • CNAF: NTR
  • FNAL: NTR
  • GridPP: NR
  • IN2P3: NTR
  • JINR: NR
  • KISTI: The mailing list issue in ticket GGUS:109886 (reported in the last meeting) is fixed; ALARM tickets will be delivered correctly now
  • KIT: Working on fixing the staging problems discovered during the weekend; a report will be provided later
  • NDGF: NTR
  • NL-T1: NTR
  • OSG: NTR
  • PIC: NTR
  • RAL: NTR
  • RRC-KI: NR
  • TRIUMF: NR

  • CERN batch and grid services: Problem with one squid server which was stuck, problem on lxplus, fixed now
  • CERN storage services: NR
  • Databases: Today: migration of the GoldenGate cluster. LHCb and ATLAS will be partially unavailable, each for approx. 20 min; the migration will start after this meeting. Tomorrow: rolling patches for the ATLAS/CMS integration databases. Wednesday: switchover of the Active Data Guard service (ADCR DB) for ATLAS to new hardware, expected to be transparent.
  • GGUS: NR
  • Grid Monitoring: NR
  • MW Officer: The new Red Hat kernel 2.6.32-504, which ships with RHEL6.6 and is also delivered as a security patch for older RHEL6 installations, has a bug with respect to fuse. Due to the bug, cvmfs clients that are exported via NFS immediately provoke a kernel panic. The problem is not cvmfs specific: all fuse modules that are exported via NFS are affected. Sites that use the NFS-exported cvmfs client must not update to this kernel. The common cvmfs deployment mode as a fuse module on the worker nodes does not seem to be affected.
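The MW Officer's warning can be turned into a quick host check. The following is only a minimal sketch, not an official tool: it assumes that every build whose release string starts with 2.6.32-504 is affected, that cvmfs is mounted under the default /cvmfs path, and that NFS exports are listed in /etc/exports. All function names are hypothetical.

```python
import platform
import re


def kernel_is_affected(release):
    """True if the release string names a 2.6.32-504 kernel build.

    Assumption: every 2.6.32-504 build carries the fuse bug.
    """
    return re.match(r"^2\.6\.32-504(\.|$)", release) is not None


def exports_cvmfs(exports_text):
    """True if an /etc/exports-style text exports /cvmfs or a path below it.

    Assumption: cvmfs uses the default /cvmfs mount point.
    """
    for line in exports_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        path = line.split()[0]  # first field is the exported directory
        if path == "/cvmfs" or path.startswith("/cvmfs/"):
            return True
    return False


def host_at_risk(release, exports_text):
    """True if the host runs the affected kernel and NFS-exports cvmfs."""
    return kernel_is_affected(release) and exports_cvmfs(exports_text)


if __name__ == "__main__":
    try:
        with open("/etc/exports") as handle:
            exports = handle.read()
    except OSError:
        exports = ""  # no exports file: nothing is exported via NFS
    if host_at_risk(platform.release(), exports):
        print("WARNING: affected kernel with NFS-exported cvmfs")
    elif exports_cvmfs(exports):
        print("cvmfs is NFS-exported: do not update to kernel 2.6.32-504")
    else:
        print("no NFS-exported cvmfs found")
```

A host that only mounts cvmfs as a fuse module on a worker node falls into the last branch, in line with the report above.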

AOB:

Thursday

Attendance:

  • local: Tsung-Hsun Wu (ASGC), Ignazio (Grid Services), Michail (LHCb), Stefan (SCOD), Maarten (ALICE), Herve (Storage), Alessandro (ATLAS)
  • remote: Andrej (ATLAS), Rolf (IN2P3), John (RAL), Lisa (FNAL), Ulf (NDGF), Christoph (CMS), Dea-Han (KISTI), Thomas (KIT), Dennis (NL-T1), Michael (BNL), Kyle (OSG)

Experiments round table:

  • ATLAS
    • CentralService/T0/T1s
      • TAIWAN-LCG2: Transfer failures. The site is in a scheduled downtime that is declared as 'warning', so the switcher does not work. Downtime description: "DPM disk server memory upgrade and replace one hard drive"
      • Maarten: Is the ASGC issue still ongoing? Andrei: It's not critical
    • Daily Activity overview
      • Rucio migration progress will be exported to a twiki
      • FTS3 REST defining in ProdSys1

  • CMS
    • Some issues with SAM test submission earlier this week
      • Seems ok now
      • Maarten: All SAM instances were affected by this problem; a network issue is suspected. It went away around Monday afternoon but is not understood
    • Otherwise NTR

  • ALICE
    • NTR

  • LHCb
    • MC and user jobs. Staging files to disk buffer for Legacy Run1 stripping campaign has been restarted
    • T0: lbvobox18 instabilities yet to be understood, likely TCP flooding
      • Stefan: maybe related to the network issue reported above
    • T1: NTR

Sites / Services round table:

  • ASGC: Downtime today for DPM upgrade, now extended to tomorrow 11 UTC
  • BNL: NTR
  • CNAF: NR
  • FNAL: NTR
  • GridPP: NR
  • IN2P3: NTR
  • JINR: NR
  • KISTI: NTR
  • KIT: ATLAS dCache instance running stably again
  • NDGF: NTR
  • NL-T1: NTR
  • OSG: bank holidays Thu/Fri next week.
  • PIC: NR
  • RAL: CASTOR head nodes died for CMS and were covered by backup nodes; currently in WARNING downtime while moving back to the production node. 25 Nov: WARNING downtime for CASTOR DB updates for all VOs
  • RRC-KI: NR
  • TRIUMF: NR
  • CERN batch and grid services:
    • myproxy.cern.ch will be upgraded to 6.0-2 on Tuesday 25th November between 10:00 and 12:00 CET. Users are encouraged to validate the new version; see the ITSSB entry for more details.
    • The old VOMS servers 'voms.cern.ch' and 'lcg-voms.cern.ch' will be switched off for good and replaced by 'voms2.cern.ch' and 'lcg-voms2.cern.ch' on Wednesday 26th November at 15:00 CET. More info in the ITSSB entry.
  • CERN storage services: NR
  • Databases: NR
  • GGUS: NR
  • Grid Monitoring: NR
  • MW Officer: NR

AOB:

Topic attachments
  • MB-Nov-14.pptx (2877.4 K), 2014-11-17 17:03, PabloSaiz
Topic revision: r16 - 2014-11-20 - MichailSalichos