WLCGDailyMeetingsWeek141117 (2014-11-20, MichailSalichos)
---+!! Week of 141117

%TOC%

---++ WLCG Operations Call details
   * At CERN the meeting room is [[https://maps.cern.ch/mapsearch/?centerX=2492565&centerY=1121070&centerScale=2500][513]] R-068.
   * For remote participation we use the Vidyo system. Instructions can be found [[https://indico.cern.ch/conferenceDisplay.py?confId=287280][here]].

---++ General Information
   * The SCOD rota for the next few weeks is at ScodRota.
   * General information about the WLCG Service can be accessed from the [[WLCGOperationsWeb][Operations Web]].

---++ Monday

Attendance:
   * local: Stefan (SCOD), Maarten (ALICE), Alessandro (ATLAS), Michail (LHCb), Tsung-Hsun Wu (ASGC), Zbigniew (Databases), Ignazio (Grid Services)
   * remote: Lisa (FNAL), Sang Un (KISTI), Ulf (NDGF), Tiju (RAL), Onno (NL-T1), Antonio (CNAF), Rolf (IN2P3), Dimitri (KIT), Pepe (PIC), Kyle (OSG), Michael (BNL)

Experiments round table:
   * ATLAS
      * !CentralService/T0/T1s
         * FZK-LCG2 GGUS:110157: ATLAS and FZK experts are investigating together. The problem needs further study. The approach for the future is to be discussed with FTS3.
      * Daily Activity overview
         * Fix Prodsys1 unfinished tasks older than 1 month (ADCSUPPORT-4049): 42% completed. The 465 remaining tasks are pile (117), reco (43), merge (135), evgen (108) and simul (62).
         * Fix Prodsys1 tasks that are almost finished with a few jobs missing (ADCSUPPORT-4048): 10% completed. The 135 remaining tasks are pile (54), reco (5), merge (47) and simul (29).
         * Babysitting of the 8 TeV derivation production tasks in Prodsys2 (ADCSUPPORT-4045): 97% completed. The number of tasks has been reduced from 465 (12 November) to 15.
         * DQ2 clients testing: Doug reported various issues, including 2 blockers. One concerns containers. The other concerns dq2-put, which requires lumiblock. The second is already fixed in the next release candidate, and the first was only reported on Sunday afternoon. dq2-ls now mixes output with scope (if the dataset is in Rucio) and without scope. This will be documented in the FAQ.
   * CMS
      * NR
   * ALICE
      * NTR
   * LHCb
      * MC and user jobs. "Legacy Run1 stripping campaign": the new schedule for stripping21 is Wednesday at the earliest.
      * T0: NTR
      * T1: Replication of full.dst is finished at all sites except GridKa, where it is delayed by slow staging progress.

Sites / Services round table:
   * ASGC: NTR
   * BNL: NTR
   * CNAF: NTR
   * FNAL: NTR
   * !GridPP: NR
   * !IN2P3: NTR
   * JINR: NR
   * KISTI: The mailing-list issue (GGUS:109886, reported in the last meeting) is fixed. ALARM tickets will now be delivered correctly.
   * KIT: Working on fixing the staging problems discovered during the weekend. A report will be provided later.
   * NDGF: NTR
   * NL-T1: NTR
   * OSG: NTR
   * PIC: NTR
   * !RAL: NTR
   * RRC-KI: NR
   * TRIUMF: NR
   * CERN batch and grid services: Problems with a squid server that was stuck and on lxplus. Fixed now.
   * CERN storage services: NR
   * Databases: The GoldenGate cluster migration takes place today and will start after this meeting. LHCb and ATLAS will each be partially unavailable, with a downtime of approx. 20 min each. Tomorrow: rolling patches for the ATLAS/CMS integration databases. Wednesday: switchover of the Active Data Guard service (ADCR DB) for ATLAS to new hardware. This is expected to be transparent.
   * GGUS: NR
   * Grid Monitoring: NR
   * MW Officer: The new Red Hat kernel 2.6.32-504 has a bug related to FUSE. This kernel comes with RHEL 6.6 and is also distributed as a security patch for older RHEL 6 installations. Because of the bug, cvmfs clients that are exported via NFS immediately provoke a kernel panic. The problem is not specific to cvmfs: all FUSE modules exported via NFS are affected. Sites that use an NFS-exported cvmfs client must not update to this kernel. The common cvmfs deployment mode, a FUSE module on the worker nodes, seems not to be affected.
AOB:

---++ Thursday

Attendance:
   * local: Tsung-Hsun Wu (ASGC), Ignazio (Grid Services), Michail (LHCb), Stefan (SCOD), Maarten (ALICE), Herve (Storage), Alessandro (ATLAS)
   * remote: Andrej (ATLAS), Rolf (IN2P3), John (RAL), Lisa (FNAL), Ulf (NDGF), Christoph (CMS), Dea-Han (KISTI), Thomas (KIT), Dennis (NL-T1), Michael (BNL), Kyle (OSG)

Experiments round table:
   * ATLAS
      * !CentralService/T0/T1s
         * TAIWAN-LCG2: Transfer failures. The site is in a scheduled downtime ("DPM disk server memory upgrade and replace one hard drive"). The downtime is declared as 'warning', so the switcher does not kick in.
         * _Maarten: Is the ASGC issue still ongoing? Andrei: It's not critical._
      * Daily Activity overview
         * Rucio migration progress will be exported to a twiki.
         * FTS3 REST being defined in ProdSys1.
   * CMS [[CMS.FacOps_WLCGdailyreports][reports]] ([[CMS.FacOps_WLCGdailyreports?raw=on][raw view]])
      * Some issues with SAM test submission earlier this week
         * Seems OK now
         * _Maarten: All SAM instances were affected by this problem, and a network issue is suspected. It went away around Monday afternoon but is not understood._
      * Otherwise NTR
   * ALICE
      * NTR
   * LHCb
      * MC and user jobs. Staging of files to the disk buffer for the Legacy Run1 stripping campaign has been restarted.
      * T0: lbvobox18 instabilities are yet to be understood, likely due to TCP flooding.
         * _Stefan: Maybe related to the network issue reported above._
      * T1: NTR

Sites / Services round table:
   * ASGC: Downtime today for the DPM upgrade, now extended to tomorrow 11:00 UTC.
   * BNL: NTR
   * CNAF: NR
   * FNAL: NTR
   * !GridPP: NR
   * !IN2P3: NTR
   * JINR: NR
   * KISTI: NTR
   * KIT: The ATLAS dCache instance is running stably again.
   * NDGF: NTR
   * NL-T1: NTR
   * OSG: Bank holidays on Thu/Fri next week.
   * PIC: NR
   * !RAL: The Castor head nodes for CMS died and were covered by backup nodes. The site is currently in a WARNING downtime while moving back to the production nodes. 25 Nov: WARNING downtime for Castor DB updates for all VOs.
   * RRC-KI: NR
   * TRIUMF: NR
   * CERN batch and grid services:
      * myproxy.cern.ch will be upgraded to 6.0-2 on Tuesday 25th November between 10:00 and 12:00 CET. Users are encouraged to validate the new version. See the [[https://cern.service-now.com/service-portal/view-outage.do?from=CSP-Service-Status-Board&&n=OTG0016007][ITSSB entry]] for more details.
      * The old VOMS servers 'voms.cern.ch' and 'lcg-voms.cern.ch' will be switched off for good on Wednesday 26th November at 15:00 CET. They will be replaced by 'voms2.cern.ch' and 'lcg-voms2.cern.ch'. More info in the [[https://cern.service-now.com/service-portal/view-outage.do?from=CSP-Service-Status-Board&&n=OTG0015989][ITSSB entry]].
   * CERN storage services: NR
   * Databases: NR
   * GGUS: NR
   * Grid Monitoring: NR
   * MW Officer: NR

AOB:
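Before the 26 November switch-off, grid clients and services could be scanned for configuration that still points to the old VOMS hosts. This is a minimal Python sketch. It assumes the usual vomses line format of five quoted fields: alias, host, port, host DN and VO name. On typical installations these lines sit in files such as those under /etc/vomses. The helper name and the sample entries are illustrative. Only the host names come from the announcement. The replacement entries (host, port, DN) should be taken from the official configuration rather than produced by a host-name substitution.

```python
import shlex

# Hosts retired on 26 November and their announced replacements.
RETIRED_HOSTS = {
    "voms.cern.ch": "voms2.cern.ch",
    "lcg-voms.cern.ch": "lcg-voms2.cern.ch",
}

def stale_vomses_entries(text):
    """Return (line_number, old_host, new_host) for lines using a retired host."""
    stale = []
    for n, line in enumerate(text.splitlines(), start=1):
        # shlex strips the double quotes around each vomses field
        fields = shlex.split(line, comments=True)
        if len(fields) >= 2 and fields[1] in RETIRED_HOSTS:
            stale.append((n, fields[1], RETIRED_HOSTS[fields[1]]))
    return stale

# Sample entries: port, DN and VO values are illustrative only.
sample = "\n".join([
    '"atlas" "voms.cern.ch" "15001" "/DC=ch/DC=cern/OU=computers/CN=voms.cern.ch" "atlas"',
    '"atlas" "voms2.cern.ch" "15001" "/DC=ch/DC=cern/OU=computers/CN=voms2.cern.ch" "atlas"',
])
print(stale_vomses_entries(sample))  # [(1, 'voms.cern.ch', 'voms2.cern.ch')]
```

The sketch only reports stale lines and does not rewrite them. The new servers may differ in more than the host name, so an automatic substitution could produce a broken configuration.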
Topic attachments:
| *Attachment* | *Revision* | *Size* | *Date* | *Who* | *Comment* |
| MB-Nov-14.pptx | r2 | 2877.4 K | 2014-11-17 - 17:03 | PabloSaiz | |
Topic revision: r16 - 2014-11-20 - MichailSalichos
Copyright © 2008-2023 by the contributing authors. All material on this collaboration platform is the property of the contributing authors.