Week of 141110

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions can be found here.

General Information

  • The purpose of the meeting is:
    • to report significant operational issues (i.e. issues which can or did degrade experiment or site operations) which are ongoing or were resolved after the previous meeting;
    • to announce or schedule interventions at Tier-1 sites;
    • to inform about recent or upcoming changes in the experiment activities or systems having a visible impact on sites;
    • to provide important news about the middleware;
    • to communicate any other information considered interesting for WLCG operations.
  • The meeting should run from 15:00 until 15:20, exceptionally to 15:30.
  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web
  • Whenever a particular topic that requires information from sites or experiments needs to be discussed at the daily meeting, it is highly recommended to announce it by email to wlcg-operations@cern.ch, so that the relevant parties have time to collect the required information or invite the right people to the meeting.

Monday

Attendance:

  • local: Belinda (storage), Hervé (storage), Lorena (databases), Maarten (SCOD + ALICE), Stefan (LHCb), Steve (grid services), Tsung-Hsun (ASGC)
  • remote: Antonio (CNAF), Christian (NDGF), Dea Han (KISTI), Dimitri (KIT), Lisa (FNAL), Onno (NLT1), Pepe (PIC), Rob (OSG), Rolf (IN2P3), Tiju (RAL), Tommaso (CMS)

Experiments round table:

  • ATLAS reports (raw view) -
    • Daily Activity overview
      • FZK DATADISK is still full (59 TB free); TAIWAN-LCG2 has 76 TB free.
        • some RDO data were moved from FZK to other places, but only 50 TB.
      • The DQ2 Central Catalog writer was stuck due to a log file issue. Fixed. N.B. there are now 3 nodes behind that alias; we will ask CS to put the spare one back with high weight.
      • Reprocessing tasks were submitted with inputs in Rucio and in DQ2. The DQ2 ones started, but the Rucio ones did not. Under investigation.
    • CentralService/T0/T1s
      • Nothing special to report.

  • CMS reports (raw view) -
    • A few-weeks-long global run (data taking) is ongoing; it also involves CERN and remote computing
    • Testing of new VOMS server infrastructure
      • CMS computing operations reminded about the Nov 26th deadline
    • Operational items
      • Reconfiguration campaign of Xrootd for European sites ongoing (> 1/3 seem compliant now)
      • We had a Frontier problem on Thu night, due to the migration of the launchpads to Puppetized machines, which did not have the correct firewall settings. Reverting to the old machines solved the problem.
    • GGUS open (T0, T1s only...)
      • GGUS:109876 : FS T1, a stuck submission; SOLVED
      • GGUS:109855 : a CREAM CE misbehaving at CERN; STILL WAITING FOR RESPONSE
      • GGUS:109919 : glExec problem at RAL_T1; SOLVED
      • GGUS:109812 : slow Xrootd access at CERN, which turned out to be VM-related; MOVED TO A NEW TICKET
    • Twiki is still often RED in SLS ...

  • ALICE -
    • KIT: due to local staging issues, raw data reprocessing jobs needed to read a lot of data from CERN, thereby putting a high load on the OPN link over the weekend
      • the staging should work again now

  • LHCb reports (raw view) -
    • MC and user jobs. The "Legacy Run1 stripping campaign" is to be launched today at T1 sites
    • T0: NTR
    • T1: the problem with SLC6.6 worker nodes and ROOT 6 based applications was fixed by the deployment of a new software stack. All new (MC) productions launched as of today will also be capable of running on SLC6.6 nodes (some T1 sites were affected)

Sites / Services round table:

  • ASGC: ntr
  • BNL:
  • CNAF: ntr
  • FNAL: ntr
  • GridPP:
  • IN2P3: ntr
  • JINR:
  • KISTI: ntr
  • KIT:
    • the tape issue mentioned by ALICE was solved; it affected all experiments
  • NDGF:
    • tomorrow IPv6 will be enabled on Norwegian pool nodes, with short interruptions potentially impacting access to some ALICE or ATLAS data
  • NL-T1: ntr
  • OSG:
    • reminder: tomorrow's OSG release will have the new VOMS servers enabled for voms-proxy-init
      • Maarten: ATLAS and CMS did make progress with validating their central services for use with the new servers, but could not declare victory just yet; users may get warnings when trying to obtain a proxy while the new servers are not yet open to the world, but the command should then fall back on the old servers
      • Rob: the uptake of this new release probably will be slow; we just need to have it available given the expiration of the old servers on Nov 26
  • PIC: ntr
  • RAL: ntr
  • RRC-KI:
  • TRIUMF:

  • CERN batch and grid services: ntr
  • CERN storage services: ntr
  • Databases:
    • we are applying security patches to the integration DBs
  • GGUS:
  • Grid Monitoring:
  • MW Officer:
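
The fallback behaviour mentioned in the OSG report above works because clients try, in order, the servers listed in their vomses configuration. As a hedged illustration (hostnames, ports and DNs below are placeholders, not the actual experiment values), a vomses file carrying both an old and a new server for one VO would look like:

```
"cms" "voms-old.example.cern.ch" "15002" "/DC=ch/DC=cern/OU=computers/CN=voms-old.example.cern.ch" "cms"
"cms" "voms-new.example.cern.ch" "15002" "/DC=ch/DC=cern/OU=computers/CN=voms-new.example.cern.ch" "cms"
```

voms-proxy-init tries the next entry when one server fails, which is why users may see warnings rather than errors while the new servers are not yet open to the world.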

AOB:

Thursday

Attendance:

  • local: Alessandro (ATLAS), Andrea M (MW Officer), Andrea S (WLCG), Belinda (storage), Felix (ASGC), Hervé (storage), Jerome (grid services), Kate (databases), Luca M (WLCG), Maarten (SCOD + ALICE), Pablo (GGUS + grid monitoring), Stefan (LHCb)
  • remote: Dennis (NLT1), Gareth (RAL), Jeremy (GridPP), Kyle (OSG), Michael (BNL), Rolf (IN2P3), Sang-Un (KISTI), Thomas (KIT), Ulf (NDGF)

Experiments round table:

  • ATLAS reports (raw view) -
    • Daily Activity overview -- today all ATLAS internal
    • CentralService/T0/T1s
      • TAIWAN file transfer issue GGUS:110075 -- significant packet drop and strange network activities -- can we check with perfSONAR?
        • Felix: we will follow up with our perfSONAR setup
        • Alessandro: this serves as an illustration of the importance of a reliable and usable perfSONAR infrastructure throughout WLCG
        • Maarten: with the latest perfSONAR version we should be getting there soon

  • ALICE -
    • NTR

  • LHCb reports (raw view) -
    • MC and user jobs. "Legacy Run1 stripping campaign": the first batch of jobs is currently being validated by physicists; running will continue early next week
    • T0: One VOMS server (old) crashed and needed to be restarted (GGUS:110068)
    • T1: a glitch with FTS transfers to RRC-KI on Tuesday, now fixed

Sites / Services round table:

  • ASGC:
    • router overload due to high traffic from Asia-Pacific, not yet understood, being investigated
    • downtime next Thu for DPM disk server upgrades
      • Andrea M: version 1.8.8 is OK, whereas for 1.8.9 the SAM SRM tests currently fail: that should be fixed later this month
  • BNL: ntr
  • CNAF:
  • FNAL:
  • GridPP: ntr
  • IN2P3: ntr
  • JINR:
  • KISTI:
    • the test alarm reply delay has been investigated: the message was not filtered as spam, yet did not arrive, so the site's e-mail address appears not to be working; we are following up
  • KIT: ntr
  • NDGF:
    • busy all day upgrading dCache servers to 2.10.9 or 2.10.10, so far all looks OK
  • NL-T1: ntr
  • OSG:
    • the OSG release with the new VOMS servers in the client configuration happened on Nov 11 as planned; minor issues in the configuration files will be fixed in the next release; ATLAS and CMS are not affected
  • PIC:
  • RAL: ntr
  • RRC-KI:
  • TRIUMF:

  • CERN batch and grid services: ntr
  • CERN storage services: ntr
  • Databases:
    • Golden Gate servers will be migrated to new HW on Mon, with 5 min downtime per instance
  • GGUS:
  • Grid Monitoring:
    • SAM3 is in production. This tool will replace SUM and MyWLCG.
  • MW Officer: UMD 3.9.0 has been released (http://repository.egi.eu/2014/11/10/release-umd-3-9-0/). Among the released packages are the new gfal2/gfal2-utils libraries (the old gfal/lcg-utils libraries are unsupported) and new UI and WN metapackages that include them.
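For those migrating off the now-unsupported legacy tools, the correspondence between the old lcg-utils commands and their gfal2-utils replacements can be sketched as below. The gfal2-utils command names are the real tools shipped with the release; the helper function itself is just an illustrative summary, not part of UMD.

```shell
# Hedged sketch: map a legacy lcg-utils command name to its gfal2-utils
# replacement shipped with UMD 3.9.0.
gfal_equivalent() {
    case "$1" in
        lcg-cp)  echo "gfal-copy" ;;  # copy files between storage elements
        lcg-ls)  echo "gfal-ls"   ;;  # list directories / stat files
        lcg-del) echo "gfal-rm"   ;;  # delete files or replicas
        *)       echo "unknown"   ;;
    esac
}

gfal_equivalent lcg-cp   # prints "gfal-copy"
```

Scripts still invoking lcg-cp and friends will keep working only as long as the old packages remain installed, so the mapping above is worth applying before the next node reinstallation.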

AOB:

Topic revision: r9 - 2014-11-13 - MaartenLitmaath
 