Week of 110110

Daily WLCG Operations Call details

To join the call, at 15.00 CE(S)T Monday to Friday inclusive (in CERN 513 R-068) do one of the following:

  1. Dial +41227676000 (Main) and enter access code 0119168, or
  2. To have the system call you, click here
  3. The scod rota for the next few weeks is at ScodRota

WLCG Service Incidents, Interventions and Availability, Change / Risk Assessments

VO Summaries of Site Usability: ALICE, ATLAS, CMS, LHCb
SIRs, Open Issues & Broadcasts: WLCG Service Incident Reports, WLCG Service Open Issues, Broadcast archive
Change assessments: CASTOR Change Assessments

General Information

General Information: CERN IT status board, M/W PPSCoordinationWorkLog, WLCG Baseline Versions, WLCG Blogs
GGUS Information: GgusInformation
LHC Machine Information: Sharepoint site - Cooldown Status - News


Monday:

Attendance: local(Cedric, Julia, Maria, Jamie, Manuel, Dirk, Peter, Ueda, Stefan, Maarten, Massimo, Ignacio, MariaDZ, David, Luca);remote(Michael, Jon, Gareth, Alessandro, Rolf, Gonzalo, Ron, Tore, Suijan Zhou, Rob, Daniele).

Experiments round table:

  • ATLAS reports -
    • CERN-PROD_LOCALGROUPDISK SRM errors GGUS:65932
    • Problem with BNL VOMS server GGUS:65944. Fixed. [ Michael - the problem was with the VOMS service, not the server. An admin mistakenly thought that a particular certificate was not used; once the original cert was restored yesterday morning all was OK. ]
    • LFC problem at GridKa : GGUS:65942

  • CMS reports -
    • Experiment activity
      • Shutdown activities, Physics analysis of 2010 data, heavy preparation period for Winter Physics conferences
    • CERN and Tier0
    • Tier1
      • No outstanding issues
    • Tier-2
      • No outstanding issues
    • AOB
      • Mail Gateway issue at CERN: affected sendmail from CMS hypernews, not user mails. One of the gateways was overloaded by an lxplus machine sending over 100,000 mails. CERN/IT is investigating who caused this volume of traffic.
      • CRC this week (starting tomorrow) : Stefano Belforte (connecting remotely...)

  • ALICE reports -
    • T0 site
      • Efforts ongoing to make AliEn v2.19 work for users this week. Production continues and CAF is available for analysis on limited, pre-staged data sets (new data sets can be requested).
    • T1 sites
      • Nothing to report
    • T2 sites
      • Nothing to report

  • LHCb reports - MC productions ongoing. Need to rerun the stripping for two streams (CHARM FULL and CHARM CONTROL). This is a very large activity covering all 2010 data.
    • T0
      • NTR
    • T1 site issues:
      • IN2P3: still a problem installing software in their AFS area (vos release problem). An LHCb-IN2P3 meeting is currently being held to discuss the status of the shared area and plans for the future. [ VOS release has now been done ]

Sites / Services round table:

  • BNL - nta
  • FNAL - ntr
  • RAL - had been running with some ATLAS FTS channels turned down to 50% of nominal as one of the ATLAS areas was getting full. Changed this morning to 75% of nominal. Will restore asap.
  • CNAF - ntr
  • IN2P3 - nta
  • PIC - a pnfs glitch this morning due to human error. Caused dCache service to be down for a couple of hours.
  • NL-T1 - downtime next week Tuesday to attempt to switch Oracle over to new h/w (7 node RAC). Planned in December but had to be postponed due to faulty network h/w.
  • NDGF - ntr
  • ASGC - ntr
  • KIT - issue with FTS now fixed.
  • OSG - ntr

  • CERN DB - ntr
  • CERN storage - this morning did CASTOR CMS upgrade to 2.1.10. Also upgrading stager and srm DB to 10.2.0.5.
  • CERN Grid services - during last two weeks seen some degradation of batch system. Correlated to high rate of submission from grid ILC batch queues. Raised ticket to find out why. GGUS:65965

AOB:

Tuesday:

Attendance: local(Eddie, Roberto, Maarten, Jamie, Maria, Gavin, Miguel, Ueda, Stefan, Massimo, Julia, Simone, Jacek);remote(Tiju, Stefano, Jon, Ulf, Jeremy, Rolf, Suijan, Alessandro, Rob, Xavier).

Experiments round table:

  • ATLAS reports -
    • ATLAS distributed computing system downtime for database reorganization on 17-18 Jan,
      • start draining on 16 Jan, in the evening
    • ATLAS restarted a series of data transfer measurements of full matrix (every site - every site)
      • excepting the sites declared as not appropriate to the tests
      • sites would observe transfers from/to unusual sites
    • Please provide us the pointer to the CERN FTS monitor [ Gavin - it is https://fts-monitor.cern.ch/ ]

  • CMS reports -
    • Experiment activity
      • Shutdown activities, Physics analysis of 2010 data, heavy preparation period for Winter Physics conferences
    • CERN and Tier0
      • Tier-0 still down [ Miguel - is there any Tier0 issue? A - no, waiting for new version of CMS SW to start reprocessing when new cosmic ray run. Not processing data, but not broken. ]
    • Tier1
      • No outstanding issues
    • Tier-2
      • No outstanding issues
    • AOB
      • CRC-on-duty : Stefano Belforte
      • Meeting to discuss Dashboard problems during Xmas break. For scheduled downtimes, someone is working on it 100%; a new version of the SSB collector will be in production in one week. Will report at next CMS Facilities meeting on Monday.

  • ALICE reports -
    • T0 site
      • Efforts ongoing to make AliEn v2.19 work for users this week. Production continues and CAF is available for analysis on limited, pre-staged data sets (new data sets can be requested).
    • T1 sites
      • Nothing to report
    • T2 sites
      • Nothing to report
  • LHCb reports - MC jobs running at full steam (30-40K jobs per day). New requests coming almost continuously for Moriond conference.
    • T0
      • NTR
    • T1 site issues:
      • IN2P3: After the SW was installed yesterday, MC jobs ramped up at the IN2P3-CC and IN2P3-T2 centers.
    • AOB - conditions DB SAM test failing yesterday at 5 sites. (One of the tests in critical availability - for the time being taken out of the critical test list)

Sites / Services round table:

  • IN2P3 - ntr
  • RAL - FTS channels for ATLAS; updated now to full values
  • ASGC - ntr
  • FNAL - ntr
  • NL-T1 - downtime tomorrow for top level BDII, CREAM CE, WMS, to be migrated to more stable host
  • KIT - ntr
  • CNAF - ntr
  • GridPP - ntr
  • OSG - question: people at FNAL who run VOMS for CDF and D0 asked whether they should be registered in GOCDB. Who to contact? A: Tiziana Ferrari

  • NDGF - We have srm.ndgf.org pool software updates + tape system expansion tomorrow. AT_RISK has been scheduled, as some ATLAS and ALICE data might be unavailable. CSC T2 site has the ARC-CE node jade-cms.hip.fi down with hardware problems.

  • CERN VOMS - On Thursday 13th January at 10:00 CET the host certificate for lcg-voms.cern.ch hosting VOs dteam, cms, atlas, alice, lhcb and ops will be updated. A new lcg-vomscerts 6.3.0 was released before the new year.

  • CERN DB - this morning ALICE online DB down for 3h due to power cut at the pit.

AOB:

  • Next WLCG T1SCM on Thursday 20 January - agenda to be circulated shortly.

Wednesday:

Attendance: local();remote().

Experiments round table:

Sites / Services round table:

AOB:

Thursday:

Attendance: local();remote().

Experiments round table:

Sites / Services round table:

AOB:

Friday:

Attendance: local();remote().

Experiments round table:

Sites / Services round table:

AOB:

-- JamieShiers - 07-Jan-2011

Topic revision: r9 - 2011-01-11 - JamieShiers
 