---+!! Week of 140203

%TOC%

---++ WLCG Operations Call details

   * At CERN the meeting room is [[https://maps.cern.ch/mapsearch/?centerX=2492565&centerY=1121070&centerScale=2500][513]] R-068.
   * For remote participation we use the Alcatel system. At 15.00 CE(S)T on Monday and Thursday (by default) do one of the following:
      1 Dial +41227676000 (just 76000 from a CERN office) and enter access code 0119168, or
      1 To have the system call you, click [[https://audioconf.cern.ch/call/0119168][here]]
   * In case of problems with Alcatel, we will use Vidyo as backup. Instructions can be found [[https://indico.cern.ch/conferenceDisplay.py?confId=287280][here]]. The SCOD will email the WLCG operations list in case the Vidyo backup should be used.

---++ General Information

   * The SCOD rota for the next few weeks is at ScodRota
   * General information about the WLCG Service can be accessed from the [[WLCGOperationsWeb][Operations Web]]

---++ Monday

Attendance:
   * local: !MariaD (SCOD), Maarten (ALICE), Massimo (CERN Data Mgnt), Vitor (CERN Grid Services), Felix (ASGC).
   * remote: Roger (NDGF), Sang-Un (KISTI), Michael (BNL), Matteo (CNAF), Elena (ATLAS), Eric (CMS), Onno (NL_T1), Kyle (OSG), Tiju (RAL), Alexei (LHCb), Lisa (FNAL), Pepe (PIC).

Experiments round table:
   * ATLAS [[https://twiki.cern.ch/twiki/bin/view/AtlasComputing/ADCOperationsDailyReports2014][reports]] ([[https://twiki.cern.ch/twiki/bin/view/AtlasComputing/ADCOperationsDailyReports2014?raw=on][raw view]]) -
      * Central services/T0
         * CERN_PROD: Transfers were failing with permission denied errors on Monday morning. Noticed and fixed by CERN team. Thanks.
      * T1
         * TAIWAN: heavy SRM load caused transfer failures on Sunday (GGUS:100904). Fixed.
         * FZK: staging errors for DATATAPE on Friday (GGUS:100885). Fixed by issuing a retry for all outstanding stage requests for ATLAS and restarting tape storage software.
         * PIC: problem with one disk pool, which caused transfers to fail on Friday (GGUS:100874); dCache pool restarted.
   * CMS [[https://twiki.cern.ch/twiki/bin/view/CMS/FacOps_WLCGdailyreports][reports]] ([[https://twiki.cern.ch/twiki/bin/view/CMS/FacOps_WLCGdailyreports?raw=on][raw view]]) -
      * T1/T2/Others: Business as usual. Smooth running.
      * Preparing for DBS (data catalog) upgrade on Feb 10. That week will see little to no central processing.
      * One problem: ARGUS cluster issue(s) (DNS? and then a new, uninitialized node in the cluster) caused problems with analysis jobs running.
         * Debugged by CMS analysis operations. It would be better to have SLS monitoring of the ARGUS cluster. Ticket is GGUS:100870
   * ALICE -
      * sites please take note of the necessary WLCG VOBOX update announced last Fri
         * see details below
      * KIT
         * the number of corrupted files has _shrunk_ by 45% to 26126
         * 21k files have been salvaged after all, thanks very much!
   * LHCb [[https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionOperationsWLCGdailyReports][reports]] ([[https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionOperationsWLCGdailyReports?raw=on][raw view]]) -
      * Mostly simulation and user jobs. Smooth running over most of the grid.
      * T0: Pilots aborted at ce202.cern.ch today. Ticket is GGUS:100902
      * T1: NTR
      * T2: NTR

Sites / Services round table:
   * ASGC: ntr
   * BNL: ntr
   * FNAL: ntr
   * OSG: ntr
   * KISTI: ntr
   * NL_T1: ntr
   * CNAF: ntr
   * PIC: ntr
   * NDGF: ntr
   * !IN2P3: ntr (sent by email)
   * !RAL: Tomorrow, between 8 and 10 am UK time, tape system intervention. Site set at risk in GOCDB.
   * CERN:
      * Grid Services: ntr
      * Data Mgnt:
         * Problem accessing EOS from outside CERN. Now solved. Lasted for 1h 15'.
         * ROOT access to CASTOR is now switched off. Barely 10 users are affected. They have been informed about alternative access methods.

AOB:
   * WLCG VOBOX
      * as announced on the wlcg-operations list last Fri, please ensure your WLCG VOBOX instances generate host proxies with *1024-bit* keys!
      * preferably update Globus; correct minimal versions of the affected rpm:
         * =globus-proxy-utils-5.0-6= (Globus 5.0)
            * [[http://repository.egi.eu/sw/production/umd/3/sl6/x86_64/updates/globus-proxy-utils-5.0-6.el6.x86_64.rpm][UMD-3 SL6 rpm]] (can be installed manually also on UMD-2 machines)
            * [[http://repository.egi.eu/sw/production/umd/3/sl5/x86_64/updates/globus-proxy-utils-5.0-6.el5.x86_64.rpm][UMD-3 SL5 rpm]] (ditto)
         * =globus-proxy-utils-5.2-1= (Globus 5.2)
            * from EPEL for EMI-3 and EMI-2
      * otherwise one can apply this quick hack:
<verbatim>
perl -pi.bak -e 's/ -q / -bits 1024 $&/' \
    /etc/vobox/templates/voname-box-proxyrenewal \
    /etc/init.d/*-box-proxyrenewal
</verbatim>

---++ Thursday

Attendance:
   * local:
   * remote:

Experiments round table:
   * ATLAS [[https://twiki.cern.ch/twiki/bin/view/AtlasComputing/ADCOperationsDailyReports2014][reports]] ([[https://twiki.cern.ch/twiki/bin/view/AtlasComputing/ADCOperationsDailyReports2014?raw=on][raw view]]) -
      * Central services/T0
         * IT and DE clouds moved to FTS3
      * T1
         * CERN-PROD CVMFS inside CERN faulty GGUS:100928 https://cern.service-now.com/service-portal/view-outage.do?from=CSP-Service-Status-Board&&n=OTG7278
   * CMS [[https://twiki.cern.ch/twiki/bin/view/CMS/FacOps_WLCGdailyreports][reports]] ([[https://twiki.cern.ch/twiki/bin/view/CMS/FacOps_WLCGdailyreports?raw=on][raw view]]) -
   * ALICE -
      * CNAF
         * tape SE updated to xrootd v3.3.4 (on Jan 28) with new checksum plugin successfully validated (Feb 5) with test transfers, thanks!
      * KIT
         * investigating why many jobs read a lot of data remotely from CERN
   * LHCb [[https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionOperationsWLCGdailyReports][reports]] ([[https://twiki.cern.ch/twiki/bin/view/LHCb/ProductionOperationsWLCGdailyReports?raw=on][raw view]]) -

Sites / Services round table:
   * GGUS:
      * Suggestion to remove three fields from the 'Ticket Submission Form' (see [[https://twiki.cern.ch/twiki/pub/LCG/WLCGDailyMeetingsWeek140203/submitForm.png][attachment]]).
        Those fields are hardly ever used, and they are anyway concatenated to the body of the issue.

AOB:
   * !OpenSSL issue
      * [[https://operations-portal.egi.eu/broadcast/archive/id/1079][EGI broadcast]] sent Feb 4 describing current state of affairs and recipes for cures
      * Sites using *HTCondor as batch system* may need to apply one of these configuration changes for now:
         * =DELEGATE_JOB_GSI_CREDENTIALS = False=
         * =GSI_DELEGATION_KEYBITS = 1024=
            * !HTCondor v8.0.6 will have the default increased to 1024
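
For sites that pick the key-size route, the HTCondor change listed under the !OpenSSL item might look like the local configuration fragment below. This is only a sketch: the file name and its location are assumptions (packaged installs commonly read drop-in files from a directory such as =/etc/condor/config.d/=), and only the two knob names and values are taken from the notes.

```
# /etc/condor/config.d/99-gsi-keybits.conf   (hypothetical file name and path)
# Apply ONE of the two workarounds from the minutes, not both.

# Option 1: use 1024-bit keys when delegating job proxies.
GSI_DELEGATION_KEYBITS = 1024

# Option 2 (alternative): copy the job's GSI credentials instead of delegating them.
# DELEGATE_JOB_GSI_CREDENTIALS = False
```

After editing, =condor_reconfig= should make the daemons re-read their configuration, and =condor_config_val GSI_DELEGATION_KEYBITS= can be used to check which value is in effect.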
Topic attachments: submitForm.png (png, r1, 116.8 K, 2014-02-06 11:41, PabloSaiz)
Topic revision: r12 - 2014-02-06 - MariaDimou