WLCG Tier1 Service Coordination Minutes - 8th April 2010

Attendance

Site Name(s)
CERN  
ASGC  
BNL Carlos
CNAF Luca, Barbara
FNAL Jon
KIT  
IN2P3  
NDGF Vera
NL-T1 Ron
PIC Gonzalo
RAL Carmine, Andrew
TRIUMF Andrew

Experiment Name(s)
ALICE  
ATLAS Dario
CMS  
LHCb  

Interventions foreseen during LHC stop (26 - 28 April)

Site Intervention(s)
CERN  
ASGC no interventions planned
BNL no interventions planned
CNAF  
FNAL no interventions planned
KIT  
IN2P3  
NDGF no interventions planned
NL-T1 no interventions planned
PIC no interventions planned
RAL no interventions planned - may do a small network intervention (part of UPS room network)
TRIUMF no interventions planned

glexec deployment status

Site Status
CERN  
ASGC  
BNL  
CNAF  
FNAL  
KIT  
IN2P3  
NDGF  
NL-T1  
PIC  
RAL  
TRIUMF  

Data Management & Other Tier1 Service Issues

Storage systems: status, recent and planned changes

| Site | Status | Recent changes | Planned changes |
| CERN | CASTOR 2.1.9-4 (all); SRM 2.8-6 (ALICE, CMS, LHCb); SRM 2.9-2 (ATLAS) | None | None |
| ASGC | CASTOR 2.1.7-19 (stager, nameserver); CASTOR 2.1.8-14 (tapeserver); SRM 2.8-2 | | |
| BNL | dCache 1.9.4-3 | | |
| CNAF | CASTOR 2.1.7-27 (ALICE); SRM 2.8-5 (ALICE); StoRM 1.5.1-2 (ATLAS, CMS, LHCb) | | |
| FNAL | dCache 1.9.5-10 (admin nodes); dCache 1.9.5-12 (pool nodes) | none | none |
| IN2P3 | dCache 1.9.5-11 with Chimera | | |
| KIT | dCache 1.9.5-15 (admin nodes); dCache 1.9.5-5 to 1.9.5-15 (pool nodes) | | |
| NDGF | dCache 1.9.7 | | |
| NL-T1 | dCache 1.9.5-16 (SARA), DPM 1.7.3 (NIKHEF) | | |
| PIC | dCache 1.9.5-15 | xrootd doors enabled and published (request from LHCb) | none |
| RAL | CASTOR 2.1.7-27 (stagers); CASTOR 2.1.8-3 (nameserver central node); CASTOR 2.1.8-17 (nameserver local node on SRM machines); CASTOR 2.1.8-8, 2.1.8-14 and 2.1.9-1 (tape servers); SRM 2.8-2 | | |
| TRIUMF | dCache 1.9.5-11 with Chimera namespace | | |

Other Tier-0/1 issues

CASTOR news

dCache news

Nothing to report.

StoRM news

LFC news

The production version of LFC is now 1.7.3.

FTS

Experiment issues

WLCG Baseline Versions

Conditions data access and related services

Frontier/Squid

    • The minutes of the last meeting can be found at the usual URL: ATLAS weekly FroNTier meetings.
    • Release 2.7.STABLE9-3 of frontier-squid has been announced. The release notes can be found here. The corresponding rpm was made available for tests on Tuesday this week. Feedback was received from BNL and CMS and has been integrated. A new rpm release will be announced soon.
    • Squid caches are needed at CERN to alleviate stress on launchpads at other sites (namely Lyon). Information has been requested about the number of batch slots allocated to ATLAS and CMS analysis jobs, since the number of squid caches needed depends on the number of slots. Squid caches at CERN will be installed for ATLAS by the VOC as soon as this information and the new rpm are available.
    • Squid caches can be installed on VMs provided that the physical machine hosting the VMs has multi-Gigabit network connectivity (one 1 Gb/s link per Squid).
    • Dave Dykstra requested more resources to monitor Squid and Frontier launchpad in ATLAS. The request is being put forward by the ATLAS VOC.
    • Squid cache information will be stored in the ATLAS AGIS. Details on how to extract information from AGIS will be made public by the AGIS developers.
    • CNAF asked whether they should install a frontier server for ATLAS or just squid caches. The recommendation is to install squid caches. CNAF already has 2 squid caches installed for CMS. They can share them with ATLAS if the total number of job slots for the 2 experiments does not exceed 1000.
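The sharing rule quoted for CNAF can be sketched as a simple check. This is a minimal illustration, assuming the only criterion is the combined number of job slots for the two experiments; the function name, parameter names and example slot counts are hypothetical, and only the limit of 1000 comes from the minutes.

```python
# Limit quoted in the minutes for sharing CNAF's 2 existing squid caches.
MAX_SHARED_JOB_SLOTS = 1000


def can_share_squids(cms_slots: int, atlas_slots: int,
                     limit: int = MAX_SHARED_JOB_SLOTS) -> bool:
    """Return True if the combined job slots do not exceed the limit."""
    if cms_slots < 0 or atlas_slots < 0:
        raise ValueError("slot counts must be non-negative")
    return cms_slots + atlas_slots <= limit


# Hypothetical slot counts, for illustration only.
print(can_share_squids(600, 400))  # True: 1000 slots does not exceed the limit
print(can_share_squids(700, 400))  # False: 1100 slots exceeds the limit
```

A site above the limit would presumably need additional caches, but the minutes do not give a per-cache figure, so none is assumed here.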

COOL/CORAL

  • The LFC read-only instance at CERN for LHCb was unreachable on Tuesday, timing out all requests and causing many jobs to fail. This is again due to the sub-optimal use of LFC in the CORAL replica service component. The problem has been known for a long time and had been avoided with a workaround for production jobs, but it reappeared this week in the analysis jobs submitted by individual users. Various actions have been taken in parallel to mitigate and eventually fix the problem:
    • A workaround was deployed by LHCb on Wednesday to avoid LFC access from user analysis jobs submitted through the DIRAC backend of Ganga. If necessary, this might be extended next week to the whole LHCb software environment (including interactive jobs).
    • An SQLite snapshot produced on Thursday with all conditions taken so far will allow users to analyse the LHCb data collected before the LHC stop, bypassing the access to Oracle and hence to the LFC replica service.
    • A CORAL patch prepared last week passed preliminary tests on Wednesday and will be tested more thoroughly next week by LHCb when the relevant experts are back, in view of its release and deployment.
  • A new release of COOL, CORAL and POOL (LCGCMT_56f) was prepared for ATLAS last week. The main motivation for this new release was to pick up some bug fixes and enhancements in the POOL collections package. Several bug fixes and improvements in CORAL and COOL were also included. The release notes are available on https://sftweb.cern.ch/persistency/releases.
    • Some problems with hanging connections in CORAL were reported by ATLAS on Wednesday during the validation of the LCGCMT_56f release and are currently being investigated.
  • Two patches have been received from Oracle Support to fix issues reported in the 11.2.0.1.0 client software. The patch for the first issue ('cannot restore segment prot after reloc' when loading the 64bit OCI library with SELinux enabled) has been fully validated. The patch for the second issue (crashes in ATLAS production jobs on AMD Opteron quadcore nodes), which had triggered a downgrade to the 10g client for ATLAS a few weeks ago, has passed tests by the CORAL team on an ATLAS node in Ljubljana, but is still pending a more complete validation by ATLAS. A new client software installation '11.2.0.1.0p1', including these two patches and a third one previously received for the 32bit OCCI library on SELinux, has been prepared in the LCG AA software installation area in AFS.

Database services

AOB

-- JamieShiers - 30-Mar-2010

Topic revision: r12 - 2010-04-08 - AndreaValassi
 