---+!! Week of 131216

%TOC%

---++ WLCG Operations Call details

To join the call, at 15.00 CE(S)T, by default on Monday and Thursday (at CERN in 513 R-068), do one of the following:
   1. Dial +41227676000 (Main) and enter access code 0119168, or
   1. To have the system call you, click [[https://audioconf.cern.ch/call/0119168][here]]

The SCOD rota for the next few weeks is at ScodRota

---++ WLCG Availability, Service Incidents, Broadcasts, Operations Web

| *VO Summaries of Site Usability* ||||*SIRs* |*Broadcasts* |*Operations Web* |
| [[http://dashb-alice-sum.cern.ch/dashboard/request.py/historicalsmryview-sum#view=siteavl&time%5B%5D=lastWeek&profile=ALICE_CRITICAL&group=all%2Bsites&site%5B%5D=CCIN2P3&site%5B%5D=CERN&site%5B%5D=CNAF&site%5B%5D=FZK&site%5B%5D=NIKHEF&site%5B%5D=RAL&site%5B%5D=SARA&type=quality][ALICE]] | [[http://dashb-atlas-sum.cern.ch/dashboard/request.py/historicalsmryview-sum#view=siteavl&time%5B%5D=lastWeek&profile=ATLAS_CRITICAL&group=All%2Bsites&site%5B%5D=BNL-ATLAS&site%5B%5D=CERN-PROD&site%5B%5D=FZK-LCG2&site%5B%5D=IN2P3-CC&site%5B%5D=INFN-T1&site%5B%5D=NDGF-T1&site%5B%5D=NIKHEF-ELPROD&site%5B%5D=pic&site%5B%5D=RAL-LCG2&site%5B%5D=SARA-MATRIX&site%5B%5D=Taiwan-LCG2&site%5B%5D=TRIUMF-LCG2&type=quality][ATLAS]] | [[http://dashb-cms-sum.cern.ch/dashboard/request.py/historicalsmryview-sum#view=siteavl&time%5B%5D=lastWeek&profile=CMS_CRITICAL_FULL&group=Tier1s%2B%252B%2BTier0&site%5B%5D=T0_CH_CERN&site%5B%5D=T1_CH_CERN&site%5B%5D=T1_DE_KIT&site%5B%5D=T1_ES_PIC&site%5B%5D=T1_FR_CCIN2P3&site%5B%5D=T1_IT_CNAF&site%5B%5D=T1_TW_ASGC&site%5B%5D=T1_UK_RAL&site%5B%5D=T1_US_FNAL&type=quality][CMS]] | [[http://dashb-lhcb-sum.cern.ch/dashboard/request.py/historicalsmryview-sum#view=siteavl&time%5B%5D=lastWeek&profile=LHCb_CRITICAL&group=Tier%2B0/1&site%5B%5D=LCG.CERN.ch&site%5B%5D=LCG.CNAF.it&site%5B%5D=LCG.GRIDKA.de&site%5B%5D=LCG.IN2P3.fr&site%5B%5D=LCG.NIKHEF.nl&site%5B%5D=LCG.PIC.es&site%5B%5D=LCG.RAL.uk&site%5B%5D=LCG.SARA.nl&type=quality][LHCb]] | [[https://twiki.cern.ch/twiki/bin/view/LCG/WLCGServiceIncidents][WLCG Service Incident Reports]] | [[https://operations-portal.egi.eu/broadcast/archive][Broadcast archive]] | [[WLCGOperationsWeb][Operations Web]] |

---++ General Information

| *General Information* ||| *GGUS Information* | *LHC Machine Information* |
| [[http://itssb.web.cern.ch/][CERN IT status board]] | [[https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions][WLCG Baseline Versions]] | [[http://cern.ch/planet-wlcg][WLCG Blogs]] | GgusInformation | [[https://espace.cern.ch/be-dep-op-lhc-machine-operation/default.aspx][Sharepoint site]] - [[http://op-webtools.web.cern.ch/op-webtools/vistar/vistars.php?usr=LHC1][LHC Page 1]] |

<HR>

---++ Monday

Attendance:
   * local: Alessandro, Belinda, Felix, Jerome, Maarten, Stefan, Steve, Xavier E
   * remote: Christian, Jose, Lisa, Michael, Onno, Pepe, Rob, Rolf, Sang-Un, Stefano, Tiju, Xavier M

Experiments round table:

   * ATLAS [[https://twiki.cern.ch/twiki/bin/view/Atlas/ADCOperationsDailyReports2013][reports]] ([[https://twiki.cern.ch/twiki/bin/view/Atlas/ADCOperationsDailyReports2013?raw=on][raw view]]) -
      * Central services
         * NTR
      * T0/T1
         * IN2P3-CC: SOURCE error during TRANSFER_PREPARATION phase: RQueued GGUS:99777, solved
         * INFN-T1: transfers failing with error "Request timeout" GGUS:99771, solved
         * RAL-LCG2: transfer failures with "source file doesn't exist" GGUS:99768, waiting for reply
         * FZK-LCG2: issue reading from tape; the site is working on it (FZK internal monitoring shows no activity: http://gridmon-kit.gridka.de/tapeview/atlas/index.html )
         * BNL-ATLAS is in scheduled maintenance; the US cloud is offline during the first part of the intervention (which affects the network)
      * openssl issue: https://operations-portal.egi.eu/broadcast/archive/id/1066
         * Maarten summarized the events leading up to the broadcast (further details there) and added that, besides CREAM, other SLC6.5 services can also be affected, e.g. WMS or even storage elements, as reported below by LHCb. As it looks unlikely that !RedHat will re-enable support for 512-bit proxies in a future update, we will need to fix all "client" instances that still generate such proxies.
         * Rob added that OSG experts are working on reducing the fallout on the OSG side
         * new "gridsite" versions have just been released:
            * [[http://www.eu-emi.eu/emi-2-matterhorn/updates/-/asset_publisher/9AgN/content/update-21-16-12-2013-v-2-10-5-1][EMI-2 Update 21]]
            * [[http://www.eu-emi.eu/releases/emi-3-monte-bianco/updates/-/asset_publisher/5Na8/content/update-12-16-12-2013-v-3-7-0-1][EMI-3 Update 12]]
   * CMS [[https://twiki.cern.ch/twiki/bin/view/CMS/FacOps_WLCGdailyreports][reports]] ([[https://twiki.cern.ch/twiki/bin/view/CMS/FacOps_WLCGdailyreports?raw=on][raw view]]) -
      * Very quiet weekend. No relevant issues to report.
      * Rob: the glideinWMS factory at Indiana University ran out of disk space on Fri and has temporarily been taken out of the list while a new SSD drive is awaited, which will probably not arrive before Jan
   * ALICE -
      * NTR
   * LHCb [[https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionOperationsWLCGdailyReports][reports]] ([[https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionOperationsWLCGdailyReports?raw=on][raw view]]) -
      * Reprocessing of Proton-Ion collisions started last week at GRIDKA/CERN
      * At other sites the main activities are simulation & user jobs
      * T0:
      * T1:
         * FZK: Pilot problems, solved (GGUS:99725)
         * FZK: Issue with the tape system over the weekend, now resolved; staging throughput increasing.
      * Other:
         * Problems with FTS3 transfers to CBPF, which is running SLC6.5. This Linux version produces SSL3 handshake problems (GGUS:99398)
            * Steve: the FTS-3 nodes have almost finished getting reinstalled with SLC6.4 (sic), which we can probably live with for a few weeks
            * After the meeting, FTS-3 project lead Michail Salichos clarified that both the FTS-3 client and server depend on the "gridsite" provided by EPEL-stable. Since the new version should get there soon and the few server instances can be kept on SL6.4 for now, standard updates can be done in Jan. The [[https://svnweb.cern.ch/trac/fts3/wiki][FTS-3 Wiki]] has been updated.

Sites / Services round table:
   * ASGC - ntr
   * BNL
      * network intervention ongoing
         * new switch installed, connectivity restarted
         * new spanning tree algorithm just started, being checked
      * tomorrow dCache upgrade to v2.6 for SHA-2 support
   * CNAF - ntr
   * FNAL - ntr
   * !IN2P3 - ntr
   * KISTI
      * last week's network problems were due to a chain of events:
         * a logical volume for hypervisor storage accidentally got overwritten in a test
         * VMs then could not mount their storage
         * as the DNS was running in a VM, it became unavailable, which caused all kinds of services to fail
         * the DNS is now running on a physical node and the services have been recovered
   * KIT
      * last week the SE for ATLAS was upgraded and ran into file system problems:
         * 90 TB are still unavailable; the tech support is coming from the US
         * reading from tape was not possible, but should be OK again now
   * NDGF
      * short downtime Wed ~noon CET to reboot some pool nodes and update them to dCache 2.6
   * NLT1
      * tomorrow evening at-risk downtime for the tape back-end; files only on tape will be unavailable for a while
   * OSG - nta
   * PIC
      * Thu Dec 19 downtime for cooling system maintenance plus various upgrades
   * RAL - ntr
   * grid services
      * CVMFS Stratum-0 and -1 have been migrated and upgraded OK
      * FTS-3 is being downgraded to SL6.4 because of the openssl issue (almost done)
   * storage
      * transparent EOS updates to improve http performance and e-groups support:
         * EOS-CMS ongoing
         * EOS-ATLAS tomorrow morning

AOB:

---++ Thursday

Attendance:
   * local:
   * remote:

Experiments round table:

   * ATLAS [[https://twiki.cern.ch/twiki/bin/view/Atlas/ADCOperationsDailyReports2013][reports]] ([[https://twiki.cern.ch/twiki/bin/view/Atlas/ADCOperationsDailyReports2013?raw=on][raw view]]) -
   * CMS [[https://twiki.cern.ch/twiki/bin/view/CMS/FacOps_WLCGdailyreports][reports]] ([[https://twiki.cern.ch/twiki/bin/view/CMS/FacOps_WLCGdailyreports?raw=on][raw view]]) -
      * Very quiet days. No relevant issues to report.
   * ALICE -
   * LHCb [[https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionOperationsWLCGdailyReports][reports]] ([[https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionOperationsWLCGdailyReports?raw=on][raw view]]) -

Sites / Services round table:
   * PIC
      * Apologies, I cannot attend today's meeting (Pepe). Today's downtime is going well. We expect to restart services even before the declared end time of today's downtime.
   * GGUS: (!MariaD) Reminder for the year-end period: GGUS is monitored by a system connected to the on-call service. In case of total GGUS unavailability, the on-call engineer (OCE) at KIT will be informed and will take appropriate action. If GGUS is available but there is a problem with the workflow, e.g. an ALARM to CERN does not generate an email notification to the operators, then WLCG should submit an ALARM ticket notifying site DE-KIT, which triggers a phone call to the OCE.

AOB:
---++ Attachments

| *Attachment* | *Version* | *Size* | *Date* | *Who* | *Comment* |
| MB-Dec.pptx | r1 | 2843.5 K | 2013-12-16 - 10:36 | PabloSaiz | |
Topic revision: r7 - 2013-12-19 - JosepFlix