Week of 131209
WLCG Operations Call details
To join the call, at 15.00 CE(S)T, by default on Monday and Thursday (at CERN in 513 R-068), do one of the following:
- Dial +41227676000 (Main) and enter access code 0119168, or
- To have the system call you, click here
The SCOD rota for the next few weeks is at ScodRota
WLCG Availability, Service Incidents, Broadcasts, Operations Web
General Information
Monday
Attendance:
- local: Simone (SCOD), Alessandro (ATLAS), Raja (LHCb), Przemek (CERN-DB), Vitor (CERN-PES), Maarten (ALICE)
- remote: Xavier (KIT), Pepe (PIC), Sang-Un (KISTI), Rolf (IN2P3), Michael (BNL), Tiju (RAL), Jeremy (GridPP), Roger (NDGF), Onno (NL-T1), Rob (OSG)
Experiments round table:
- ATLAS reports (raw view) -
- Central services
- ATLAS_DDM_VOBOXes were unstable on Dec. 5th. Stable again since 02:00 UTC on Dec. 6th.
- PilotFactories were also degraded, from 12:00 UTC to 24:00 UTC on Dec. 5th.
- T0/T1
- FZK-LCG2: Network trouble caused DNS lookup errors on Dec. 6th (GGUS:99571). Fixed.
- FZK-LCG2: Transfer failures due to 'RQueued' (reported last Thursday) are still happening, with a failure rate of around 10% since Dec. 8th.
- TAIWAN-LCG2: Recovered from disk server crash on Oct. 30th. GGUS:98482 closed.
- CMS reports (raw view) -
- It has been a very quiet few days, largely just some scattered issues at scattered T2 sites.
- The exception to this is CNAF, where the storage was down for several days. It is back now.
- I have just (13:40) learned that there is trouble with the CERN BDII that is making sites appear unavailable in SAM tests (GGUS:99521). Perhaps there will be an update by 15:00?
- LHCb reports (raw view) -
- Main activity is simulation at all sites.
- T0:
- T1:
- CNAF: storage is back in operation.
- IN2P3: Continuing problem with nagios probe (GGUS:99420). Waiting for support from SAM/dashboard administrators.
Sites / Services round table:
- KIT: there will be 3 downtimes tomorrow: CMS dCache, firewall, tape management software. On Thursday, dCache for ATLAS will be upgraded as well.
- PIC: finishing the SIR on the network incident that occurred last week. It will be provided ASAP.
- BNL: there will be a 2h network intervention one week from now (next Monday). Besides T1 services, it will also affect access to LFC and FTS (and therefore T2 activity). Next Tuesday, dCache will be upgraded to the SHA-2 compliant version.
- IN2P3: downtime tomorrow. Operations portal down from 8:30 to 10:30.
- NL-T1: downtime on December 17th: 24 hours of maintenance on the MSS. It will not be possible to stage files during that time. Maarten: what is the situation with the disk servers (which gave a lot of trouble in the past weeks)? Onno: they seem to be stable now after a lot of hardware replacement. New hardware should also arrive before the end of the year. Maarten: at the end of the process, a SIR should be provided (there was also some minimal data loss). Onno: will do.
- NDGF: in downtime this morning for an upgrade of central storage services. On Wednesday there will be a network intervention which will affect some pools; therefore some data might be unavailable.
- CERN DB: intervention on Wednesday (10 AM CET) on the WLCGR test and integration database.
- ASGC: During the weekend, our data center suffered from high temperatures due to problems with the air conditioning, which caused some CASTOR disks to become unstable. The situation should be improved on Monday morning.
- Maarten for PES: it is very urgent to upgrade the CERN and SAM BDII to the latest version to make sure the FCR mechanism does not affect SAM tests. T1s are also invited to upgrade.
AOB:
Thursday
Attendance:
Experiments round table:
Sites / Services round table:
- CERN CvmFS: The stratum 0 (cvmfs-stratum-zero.cern.ch) and stratum 1 (cvmfs-stratum-one.cern.ch) will migrate to new hardware and a new OS, and in the stratum 1 case also from 2.0.? to 2.1.15. The migration will be transparent for all stratum 1s that are replicating from the stratum 0. It will also be transparent for all CvmFS clients (both 2.0.* and 2.1.*) that are using the stratum 1 (ITSSB).
AOB: Middleware Readiness WG meeting TODAY at 4pm CET. Agenda and connection details in https://indico.cern.ch/conferenceDisplay.py?confId=285681