WLCG-OSG-EGEE Ops Minutes Mon 14 Jan 2008

Attendance

EGEE

  • Asia Pacific ROC: Min Tsai
  • Central Europe ROC: Marcin Radecki
  • OCC / CERN ROC: John Shade, Antonio Retico, Nick Thackray, Steve Traylen
  • French ROC: Gilles + lots of people ???
  • German/Swiss ROC: Sven Hermann
  • Italian ROC: Alessandro Cavalli
  • Northern Europe ROC: Ron
  • Russian ROC: Lev Shamardin
  • South East Europe ROC: Kostas Koumantaros
  • South West Europe ROC: Kai Neuffer
  • UK/Ireland ROC: Matt Hodges, Derek Ross, Jeremy Coles
  • GGUS: Maria Dimou (for User Support)
  • OSCT: Absent

WLCG

  • WLCG Service Coordination: Harry, Jamie

WLCG Tier 1 Sites

  • ASGC: Min Tsai
  • BNL: Absent
  • CERN site: Harry Renshall
  • FNAL: Rob Quick
  • FZK: Sven Hermann
  • IN2P3: ???
  • INFN: Alessandro
  • NDGF:
  • PIC: Gonzalo
  • RAL: Absent
  • SARA/NIKHEF: Ron
  • TRIUMF: Rod Walker

Reports Not Received

  • WLCG Tier 1s:
  • VOs:
  • EGEE ROCs (Prod Sites): CERN
  • EGEE ROCs (PPS Sites): AP, CERN, IT

Happy New Year

The meeting got off to a late start due to static caused by the Fermilab connection. Steve (chair) wished everyone a Happy New Year.

Feedback on Last Week's Minutes

None were given.

EGEE Items

Grid Operator Hand Over on Duty

        Primary Team      Secondary Team
From    Italy             South West
To      Central Europe    France

PPS Reports

Business as usual

EGEE Items From ROC Reports

  1. (ROC CE): It looks like we have a central problem with accounting data. The list of sites not publishing accounting data contains about 40 sites which suddenly stopped publishing in Dec 2007: http://www3.egee.cesga.es/acctenfor/nodata.php Some sites in CE reported problems with APEL similar to this bug: https://savannah.cern.ch/bugs/?32435 Could the APEL people comment on that?
    • Derek (UKI) mentioned that the APEL box's certificate had expired; this should have been fixed sometime last week. CE will check that it's OK.

  2. (ROC CE): When can we expect the MON box on SL(C)4? For sites using SL4 this is one of the SL3 dependencies.
    • Marcin (CE): the SL4 MON box is needed in order to publish accounting records.
    • Steve will ask Oliver to come next week for an update. He will look for more info. Other gLite services based on Tomcat are deployed on SLC4, so it shouldn't take too long to get things into production.

Upcoming gLite releases

  • gL3.1 U10 --> Prod (~Thursday) will contain, in particular:
    • glite-PX for gLite 3.1
    • gLite-AMGA_postgres for gLite 3.1
    • VOBOX
    • edg-mkgridmap-3.0.0 compatible with OpenSSL 0.9.7
  • gL3.0 U38 --> Prod (~Thursday) will contain:
    • ~20 patches with bug fixes

WLCG Items

Tier1 Reports

  • none received (business as usual)

WLCG issues coming from ROC reports

  • Item 1

Upcoming WLCG Service Interventions

  • Sven (DECH) asked for the intervention at GridKa (6 Nov) to be removed from the template
  • Ron (NE): dCache off-line at SARA for re-configuration to implement the space management requirements from CCRC08

FTS Service Review

none

ATLAS Service (Alessandro Di Girolamo)

  1. Storage Space:
    Each site should publish up-to-date information in the Information System for the following fields:
      o GlueSAStateAvailableSpace
      o GlueSATotalOnlineSize
      o GlueSAUsedOnlineSize
    for:
      o each storage area with space tokens associated
      o each storage area associated with "default spaces" for a given storage class
    This information is crucial for CCRC08.
    Thanks in advance
       
    • Steve (OCC): Agrees that this is the correct solution but the publication of StorageSpace is not likely to be released
    • Alessandro (ATLAS): The issue concerns all VOs (LHCb agrees). We need correct storage space to be published in the information system. Currently ATLAS uses dpm-query or the VOBOX to get this information. It does not always match what is returned by lcg-infosites (RAL, with CASTOR, is an example of such a mismatch). Sites running out of space need to be blacklisted immediately, so it is important to be able to rely on the info provider. Currently ATLAS has to retrieve the information site by site, without any standard. At the same time some sites are publishing correctly, so at least the T1s should make their information converge.
    • Nick suggests starting with the T1s:
      • Agree with Flavia (GSSD) on a deterministic recipe for the T1s to follow
      • Open a set of tickets to the ROCs to follow up the change
    • Alessandro: the deadline for ATLAS is the end of January
    • Antonio (CERN ROC): with a good set of instructions (and a metric to verify against) the ROCs can do it
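
As an illustration of the kind of "metric to verify" mentioned above, here is a minimal Python sketch that checks whether each storage area publishes the three requested fields. It is only a sketch under stated assumptions: the input is assumed to be a plain dictionary already extracted from the Information System (no query is performed), and the function names, storage area names and values are hypothetical, not taken from any real site or tool.

```python
# The three attributes requested by ATLAS in the report above.
REQUIRED_FIELDS = (
    "GlueSAStateAvailableSpace",
    "GlueSATotalOnlineSize",
    "GlueSAUsedOnlineSize",
)


def missing_fields(storage_area):
    """Return the required fields that are absent or not a non-negative integer."""
    missing = []
    for field in REQUIRED_FIELDS:
        value = storage_area.get(field)
        try:
            ok = int(value) >= 0
        except (TypeError, ValueError):
            ok = False
        if not ok:
            missing.append(field)
    return missing


def site_report(storage_areas):
    """Map each storage area name to its missing fields (empty list = compliant)."""
    return {name: missing_fields(attrs) for name, attrs in storage_areas.items()}


def compliance(storage_areas):
    """Fraction of storage areas that publish all required fields."""
    report = site_report(storage_areas)
    if not report:
        return 0.0
    compliant = sum(1 for fields in report.values() if not fields)
    return compliant / len(report)


# Hypothetical example data (names and numbers are made up).
example_site = {
    "token-a": {
        "GlueSAStateAvailableSpace": "10",
        "GlueSATotalOnlineSize": "100",
        "GlueSAUsedOnlineSize": "90",
    },
    "default-disk": {
        "GlueSATotalOnlineSize": "100",
    },
}

if __name__ == "__main__":
    for name, fields in site_report(example_site).items():
        print(name, "OK" if not fields else "missing: " + ", ".join(fields))
    print("compliance:", compliance(example_site))
```

A check of this shape only tells whether the fields are present and well-formed; whether the published numbers are correct would still need to be compared against the storage system itself.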

  2. SE/SRM SAM critical tests for BNL Tier1 failing since mid December
    • Steve: GGUS 31218. To me at least, the ticket does not describe the actual problem, which is "No space for atlas"
    • Rob will raise the issue at the OSG ops meeting later in the afternoon

  3. ATLAS would like to know the status and time schedule for srmls on lxplus: right now it is deployed only for CERN PPS.
    • Steve: The command is apparently available
      /afs/cern.ch/project/gd/LCG-share/current/d-cache/srm/bin/srmls
    • Alessandro will verify if that is the one wanted

  4. Problems retrieving group attributes from DPM (point added during the meeting)
    dpm-query is now more user-friendly than before, as it returns more readable information than the bare UID or GID, but some sites in the French ROC still publish in a way that makes dpm-query return the old-style output
    • This is a good candidate for a GGUS ticket
    • Alessandro opened it on-line

ALICE Service

nothing reported

CMS Service

nothing reported

LHCb Service (Roberto Santinelli)

  1. Roberto (LHCb): rfio problems at CNAF (and now also at RAL).
    The problem (a hanging connection when a file on the SE is read from the WN
    using the rfio protocol) is under investigation by the CASTOR people with support from
    the CNAF people. However, since CNAF has been out of the production mask for months now
    (and its accounting is suffering), we are looking for the shortest way to get it fixed:
    accessing files through rootd rather than through rfiod. This has been proven to work
    at CERN (where it is happily used).

    With this report I'd like to remind everyone of this issue (which heavily penalizes the LHCb computing mask)
    and to set some actions that should be addressed consistently:
        1. CASTOR people + CNAF people to debug the rfio problem.
        2. CNAF people to install, configure and test rootd. They have support from the FIO
            and CASTOR people at CERN, and this is foreseen for this week.
        3. If the recipe works at CNAF, involve the RAL people for point 2.
    
    
  • Derek (RAL): We were not aware of CASTOR problems at RAL
  • Roberto: the issue happened very recently at RAL as well. The suggestion is to wait for CNAF to get back into production and then apply the same fix at RAL
  • Roberto: the aim of this report is to get commitment from the OPS team to follow up this particular problem, namely for one site (CNAF) to sum up the state-of-the-art solution and then for the other WLCG sites to apply the fix where needed
  • CNAF (with Luca dell'Agnello) is aware of this request and is actively working on it. BTW, LHCb has not been able to run at CNAF since September.
  • Steve asked for a representative from CNAF to join the meeting next week. A CASTOR expert from CERN will join as well
  • Alessandro Cavalli (CNAF) will transmit the request

WLCG Service Coordination

OSG Items

Maria (User Support) went through 4 old OSG tickets still open. Rob will check them.

Review of Action Items

AOB

  • Maria (User Support): the VO YAIM configurator tool is now linked to the CIC portal - does anyone use it?
  • ???: it needs to mature a bit first, and it needs a documentation page for sites to consult
  • Kai (PIC): we have used it successfully. The output files were used for production after some manual modifications concerning the local storage configuration
  • Nick: ROCs will be asked to volunteer to try out the tool

Next Meeting

The next meeting will be Monday, 21 Jan 2008 15:00 UTC (16:00 Swiss local time).

  • Attendees can join from 14:45 UTC (15:45 Swiss local time) onwards.
  • The meeting will start promptly at 15:00 UTC (16:00 Swiss local time).
  • The WLCG section will start at the fixed time of 15:30 UTC (16:30 Swiss local time).
  • To dial in to the conference:
    • Dial +41227676000
    • Enter access code 0157610


Topic revision: r3 - 2008-03-03 - SteveTraylen
 