Week of 140901

WLCG Operations Call details

  • At CERN the meeting room is 513 R-068.

  • For remote participation we use the Vidyo system. Instructions to join the phone conference can be found here.

General Information

  • The SCOD rota for the next few weeks is at ScodRota
  • General information about the WLCG Service can be accessed from the Operations Web

Monday

Attendance:

  • local: Maria Alandes (chair, minutes), Zbigniew Baranowski (Databases), Ignacio Barrientos (Grid and Batch), Maarten Litmaath (ALICE), Andrea Manzi (MW officer), Luca Mascetti (Storage), Stefan Roiser (LHCb), Tsung-Hsun Wu (ASGC)
  • remote: Stefano Belforte (CMS), Michael Ernst (BNL), Tiju Idiculla (RAL), Rolf Rumler (IN2P3), Onno Zweers (NL-T1), Matteo (CNAF)

Experiments round table:

  • CMS reports (raw view) -
    • No major issues. No outstanding GGUS
    • Production and Analysis running full steam

  • ALICE
    • NTR

  • LHCb reports (raw view) -
    • MC and user jobs; the system was fully used by MC over the weekend
    • T0:
      • one lbvobox node cannot be accessed with user login (root works though)
      • no more GOCDB emails received since 7 Aug (GGUS:107812, INC:0629031) -> new ticket for CERN/egroups

Maarten adds that ALICE sees the same problem with GOCDB mails, but it is not an issue for them. Rolf explains that the mails come from the Operations portal at IN2P3 and that the IN2P3 sender address may need to be whitelisted to be able to post to the egroup. Stefan confirms that it is already configured this way, but it still does not work. They are waiting for an answer from the egroups team at CERN.

Sites / Services round table:

  • ASGC: NTR
  • BNL: NTR
  • CNAF: NTR
  • FNAL: Not present
  • GridPP: Not present
  • IN2P3: NTR
  • JINR: Not present
  • KISTI: Not present
  • KIT: Not present
  • NDGF: NTR (by mail)
  • NL-T1: Onno explains that to connect to the audio conference he had to call the provided number, as the call-back did not seem to work. Maria will open a ticket with the CERN team to report this. Onno also asks LHCb to update ticket GGUS:107655 about the problems with the Brazilian certificates so that the problem can be followed up. Stefan will update the ticket.
  • OSG: Not present
  • PIC: NTR (by mail)
  • RAL: Due to a network switch problem on Friday night, some services were unavailable until Saturday morning. Problem is now fixed.
  • RRC-KI: Not present
  • TRIUMF: Not present

  • CERN batch and grid services: NTR
  • CERN storage services:
    • There will be a CASTOR upgrade for CMS on 04.09. The service may be unavailable for 5 to 10 minutes. Stefano asks whether the CMS Ops team is aware of this, and Luca confirms that they are.
    • As of 01.09 only CMS authenticated users can access EOS CMS.
  • Databases:
    • ATLAS conditions T0 to T1 replication will migrate from Streams technology to GoldenGate. This will happen on Thursday for IN2P3, with RAL and TRIUMF following in the next weeks. The migration should be transparent, although some latency may be experienced.
  • GGUS: Not present
  • Grid Monitoring: Not present
  • MW Officer: For DPM sites running as part of the ATLAS FAX (xrootd) federation: there is a recent release of the N2N plugin (xrootd-server-atlas-n2n-plugin-2.0-5) which fixes a buffer overflow issue. It is available in the WLCG repo (http://linuxsoft.cern.ch/wlcg). Please update your site’s N2N rpm ASAP and restart the services.

AOB:

The problems with the Alcatel audio conference system have been reported in ticket INC0629273.

Thursday

Attendance:

  • local: Maria Alandes (chair, minutes), Zbigniew Baranowski (Databases), Jerome Belleman (Grid and Batch), Alessandro Di Girolamo (ATLAS), Felix Lee (ASGC), Maarten Litmaath (ALICE), Andrea Manzi (MW officer), Luca Mascetti (Storage), Stefan Roiser (LHCb)
  • remote: Stefano Belforte (CMS), Jeremy Coles (GridPP), Lisa Giacchetti (FNAL), Tiju Idiculla (RAL), Rolf Rumler (IN2P3), Dennis Van Dok (NL-T1), Ulf Tigerstedt (NDGF)

Experiments round table:

  • CMS reports (raw view) -
    • No major issues. No outstanding GGUS
    • Analysis running full steam. Production activity level is low due to lack of demand. Running tests of Prompt Reconstruction at T1s driven from T0.
    • Starting to exercise WAN data access (aka AAA, aka xrootd) at larger scale; sites should expect increased activity and increased visibility of xrootd-related problems

Alessandro asks whether this exercise could impact other VOs. Stefano cannot provide any numbers right now, but this can be followed up offline. Stefano adds that the exercise will mostly target US sites during September and then European sites towards the end of the month. The corresponding SAM test is now set to critical for CMS.

  • ALICE -
    • KISTI T1 unavailable due to network issues

  • LHCb reports (raw view) -
    • MC and User jobs
    • T0:
      • As of Tuesday, fetching CRLs produces an error on all voboxes (INC:0630629)
      • no more GOCDB emails received since 7 Aug (GGUS:107812, INC:0629031) -> ticket for CERN/egroups currently being discussed.

Sites / Services round table:

  • ASGC: NTR
  • BNL: Not present
  • CNAF: Not present
  • FNAL: NTR
  • GridPP: NTR
  • IN2P3: NTR
  • JINR: Not present
  • KISTI: (report sent by mail, as they could not attend the meeting) The link between KISTI and CERN has been down since 3 Sep. 09:48 CET. The reason is still unknown and the expert is investigating by contacting overseas providers.
  • KIT: Not present
  • NDGF:
    • The main network link between NDGF central and Norway was broken and the backup link had to be used instead. The backup link was not working properly and some problems were experienced. The equipment causing the problem has been identified and will be fixed soon.
    • The reboot of disk storage headnodes went OK
  • NL-T1: NTR
  • OSG: (Kyle Gross by mail) NTR
  • PIC: NTR
  • RAL: NTR
  • RRC-KI: Not present
  • TRIUMF: Not present

  • CERN batch and grid services: NTR
  • CERN storage services:
    • The CASTOR CMS intervention announced for this week has been postponed till Monday. More details in IT SSB.
    • On 15th September, due to physical movement of some racks inside the CC where CASTOR disk servers are hosted, some STAGED files on disk may be unavailable. All VOs are affected and the whole movement will take 5 days, but each disk server is expected to be offline for ~2h. More details in IT SSB.
  • Databases: Successful migration of ATLAS conditions T0 to T1 replication to GoldenGate for IN2P3.
  • GGUS: Not present
  • Grid Monitoring: Not present
  • MW Officer: The reported fix for the N2N plugin for the FAX federation (xrootd-server-atlas-n2n-plugin-2.0-5) affects all FAX xrootd installations and not only DPM-xrootd. It is available in the WLCG repo (http://linuxsoft.cern.ch/wlcg). Please update your site’s N2N rpm ASAP and restart the services.
    • Alessandro asks whether it is worth sending a broadcast to ATLAS sites. Andrea replies that this affects ATLAS sites belonging to the federation and since this is a security issue, it would be good that this is communicated to them so the rpm is upgraded ASAP.

AOB:

Topic revision: r14 - 2014-09-04 - MariaALANDESPRADILLO
 