Week of 141124

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web

Monday

Attendance:

  • local: Alessandro (ATLAS), Maarten (SCOD + ALICE), Mark (LHCb), Tsung-Hsun (ASGC), Xavi (storage)
  • remote: Antonio (CNAF), Christoph (CMS), Dea Han (KISTI), Dennis (NLT1), Dmytro (NDGF), Kyle (OSG), Pepe (PIC), Rolf (IN2P3), Tiju (RAL)

Experiments round table:

  • ATLAS reports ( raw view) -
    • CentralService/T0/T1s
    • Daily Activity overview
      • Grid draining Saturday-Sunday due to a problem with Oracle DB procedures. Gancho and the IT DBAs fixed the issue. It is not clear whether the problem is completely solved (it seems so, but there could be other issues elsewhere).
      • validation tasks: HITmerging done, reconstruction ongoing. Some inputs are missing; DQ2 clients seem to get different answers for the same datasets. Rucio shares to be checked.
      • rucio doesn't work for the SamiKama user; we have to check the account. The cronjob needs to be checked too.
      • jobs in holding state increasing: DQ2 registration issue ("too many values to unpack").
      • Derivation Framework: under some circumstances tasks get aborted rather than finished, even when 95% done. Tadashi spotted the issue; now waiting for a DB schema update to fix it.
      • some jobs are in holding state because the groupdisk Top area at BNL is full. To be discussed with the group people and Nurcan.
    • Maarten: as requested by ATLAS, the old VOMS servers were blocked for ATLAS around 13:30 CET

  • ALICE -
    • NTR

  • LHCb reports ( raw view) -
    • MC and user jobs. Current schedule for "Legacy Run1 stripping campaign": Validate on a few K files end of week. Start full campaign next Monday if all well.
    • T0: NTR
    • T1: NTR

Sites / Services round table:

  • ASGC: ntr
  • BNL:
  • CNAF: ntr
  • FNAL:
  • GridPP:
  • IN2P3: ntr
  • JINR:
  • KISTI: ntr
  • KIT:
  • NDGF: ntr
  • NL-T1: ntr
  • OSG:
    • will not attend on Thu because of Thanksgiving holidays
    • the OSG ticket that keeps track of the VOMS situation for ATLAS and CMS should be updated with the latest info
      • Maarten: OK (done)
  • PIC: ntr
  • RAL: ntr
  • RRC-KI:
  • TRIUMF:

  • CERN batch and grid services:
    • Starting from Tuesday afternoon, various ATLAS squid aliases, all under atlast0fsquid.cern.ch, will be moved onto new hardware and ownership within CERN IT. Fuller details are available in GS-343. This will be completed by the end of Wednesday.
    • The old VOMS servers 'voms.cern.ch' and 'lcg-voms.cern.ch' will be switched off for good and replaced by 'voms2.cern.ch' and 'lcg-voms2.cern.ch' on Wednesday 26th November at 15:00 CET. More info in the ITSSB entry
      • Clarification: the VOMS daemons (contacted by voms-proxy-init) will become unavailable on the old hosts, while VOMRS and VOMS-Admin keep working
    • myproxy.cern.ch will be upgraded to 6.0-2 on Tuesday 25th November between 10:00 and 12:00 CET. Users are encouraged to validate the new version; see the ITSSB entry for more details.
  • CERN storage services:
    • there is a new CASTOR release fixing a few bugs: can we upgrade the ATLAS service Wed morning?
      • Alessandro: that is not OK because of the ongoing cosmic data taking!
        it may be OK one week from now, to be discussed
      • Mark: this week should be OK for LHCb (TBC)
      • Xavi: OK, we will negotiate with the others through the usual channels
    • the latest CASTOR client has been in QA for 2 weeks and will enter production tomorrow
      • this can in particular affect DAQ clients
      • experiments have been informed via e-mail
  • Databases:
  • GGUS:
  • Grid Monitoring:
  • MW Officer:

AOB:

Thursday

Attendance:

  • local: Andrea M (MW Officer), Maarten (SCOD + ALICE), Marcin (databases), Mark (LHCb), Massimo (storage), Nacho (grid services), Tsung-Hsun (ASGC), Xavi (storage)
  • remote: Alessandro (ATLAS), Christoph (CMS), Gareth (RAL), Jeremy (GridPP), Onno (NLT1), Sebastien (IN2P3), Thomas (KIT), Ulf (NDGF)

Experiments round table:

  • ATLAS reports ( raw view) -
    • Daily Activity overview
      • ProdSys2 postproduction node setup
      • registration for containers: template for container plus catching exceptions.
      • Deft/prodsys2 node aipanda15 to be upgraded by Monday.
      • ProdSys1 closure: 446 tasks remaining, of which only 73 are more than 80% complete.
      • validation still ongoing.
      • DQ2 clients: as of yesterday all the known issues have been fixed and released in stable ALRB by Asoka.
      • DaTri req page on pinkpandamon, which is on 2.5: what to do is still to be understood, as we need 2.6.
    • CentralService/T0/T1s
      • GGUS:110406 to BNL is most probably something ATLAS needs to investigate internally.

  • CMS reports ( raw view) -
    • NTR
    • Christoph: CMS was also affected by the dashboard glitch mentioned by LHCb

  • ALICE -
    • NTR

  • LHCb reports ( raw view) -
    • MC and user jobs. Current schedule for "Legacy Run1 stripping campaign": Davinci release is out today so validation should start very soon. Still hoping for full campaign next Monday if all well.
    • T0: NTR
    • T1: NTR
    • Other: Problem with the SAM dashboard not showing site availability properly overnight on Tuesday/Wednesday

Sites / Services round table:

  • ASGC: ntr
  • BNL:
  • CNAF:
  • FNAL:
  • GridPP: ntr
  • IN2P3: Problem with a switch on Wednesday morning. All WLCG voboxes were offline for a little more than 1 hour (between 8am and 10am). ALICE was the most impacted VO and lost most of its running jobs. Because of that, we are planning to write a SIR.
    • Maarten: as far as ALICE is concerned a SIR is not needed, but if you happen to create one anyway, it could be copied into the WLCG archive
  • JINR:
  • KISTI:
  • KIT:
    • we are testing our backup link to CERN; should be transparent
    • an upgrade of the dCache instance for ATLAS is foreseen for week 50, will be discussed with ATLAS
  • NDGF: ntr
  • NL-T1:
    • MSS will be down on Dec 10
  • OSG:
  • PIC:
  • RAL:
    • CASTOR upgrade for LHCb (to SL6) on Tue Dec 2
  • RRC-KI:
  • TRIUMF:

  • CERN batch and grid services:
    • The squid functionality of the ATLAS Frontier service has now moved under IT's scope, i.e. atlast0fsquid.cern.ch -> ca-proxy.cern.ch. See GS-343.
    • On Thu Nov 27th the CERN stratum one cvmfs-stratum-one.cern.ch is moving to new software versions (CentOS 7, squid 3.3.8). The migration is expected to be transparent to all users.
  • CERN storage services:
    • CASTOR update for LHCb went OK yesterday, the others will be done next week:
      • ALICE + public on Mon 10:00-11:30
      • ATLAS + CMS on Tue 08:30-10:00
    • power cut tests on Dec 16 06:00-06:30 may put older HW in CASTOR and EOS at risk: the affected machines will be kept read-only during the tests
  • Databases:
    • security patches were applied OK to CASTOR and CMS databases
  • GGUS:
  • Grid Monitoring:
  • MW Officer: ntr

AOB:

  • Maarten: we seem to have successfully transitioned to the new VOMS servers!
Topic revision: r13 - 2014-11-27 - MaartenLitmaath
 