EGEE Ops' Minutes Mon 02 Nov 2009

Summary

WLCG sites should now deploy a CREAM CE service in parallel with the production LCG-CEs in order to gain real production experience. Although the ability to submit from Condor clients is still missing, the site installation will not change when this is available. The CREAM CEs should also now be marked as “production” in the information system.

Attendance

EGEE

  • Asia Pacific ROC:
  • Central Europe ROC: Malgorzata Krakowian
  • OCC / CERN ROC: John Shade, Antonio Retico, Maite Barroso
  • French ROC: Pierre Girard, Rolf Rumler
  • German/Swiss ROC: Wen Mei, Sven Hermann
  • Italian ROC: Cristina Aiftimiei, Paolo Veronesi
  • Latin American ROC: Renato Santana
  • Northern Europe ROC: Michaela Lechner
  • Russian ROC:
  • South East Europe ROC: Marios Chatziangelou
  • South West Europe ROC: Christian Neissner
  • UK/Ireland ROC: Jeremy Coles
  • GGUS: Torsten Antoni
  • GOCDB: Gilles Mathieu

Feedback on Last Week's Minutes

None was given.

EGEE Items

Grid Operator Hand Over on Duty

  c-COD Team
From ROC Italy
To ROC France

Report from cCOD:

  • GGUS tickets 52012, 51135, and 51229 deal with some APEL issues that seem to be fixed, although the SAM tests do not yet show this. Since APEL also runs the SAM tests in question, the ball is in their court. Gilles requested that when similar problems are seen, the APEL support unit in GGUS be involved as soon as possible; this was not the case with some of these tickets. The problem is with APEL and the accounting portal; Gilles will have a look.

  • For the rest, all the CIC dashboard tabs (Dashboard, Tickets, Alarms) show nothing except the red line "The resource is not available". (Steve T. also reported a hanging CIC portal and no confirmation that the CERN ROC report had been correctly processed.) Anybody from the CIC portal present? No; please check offline (mail sent).

    PPS Reports and Issues

    • The first official staged rollout update took place last week, to the early adopter sites (the previous release testing sites); it was deployed at 4 sites out of 5. It contains only a new version of lcg-infosites, but it is a real release. New release pages have already been produced. One question was raised: are core services still pointing to PPS instances? Yes, let’s leave it like that at the moment and gradually move to production ones. Is there a date for changing to production ones? Nothing is set yet; we are only exercising the process in the old PPS environment, so there is no real change yet. Antonio hopes to involve production sites by the end of the year.

    gLite Release News

    • Release of UPDATE 58 to gLite 3.1, restoring the components that were rolled back in the previous update and were not affected by the critical issue.
    • lcg-vomscerts: a new method is supposed to replace it, so why are we still releasing it? We still have to distribute it because of dependencies on WMS that are being worked out.

    EGEE Items From ROC Reports

    • DECH: GGUS ticket #47944 (sBDII SAM tests failing for FZK-LCG2 though GIIS is working fine), no news. The Gstat developer (Joanna) had been removed from the support list; she is now back in and checking this ticket.
    • Italy: all RBs have been retired; all WMS have been upgraded (WMS @ INFN-FERRARA update is in progress, the WMS is closed).
    • SEE: There are a couple of concerns from our region about the glite-SWAT utilities/tools. The most important/critical of them are:
      • Does the tool use the user's credentials (without their knowledge)? It does not use user credentials; it just runs scripts on the WN to get information about the WN itself.
      • If so, what about the privacy of the user's data? Not applicable (see previous answer).
      • Can this tool make modifications to the user's job environment? No, the tool does not export anything back into the job environment; results are stored in the job directory of the job's user.
      • Is it possible that a failure of the monitoring tool could lead to the user's job failing? So far we have not had that case, but it could happen. The developers were careful to prevent this: the tool runs for a maximum of 30 seconds and, if there are no results by then, it terminates itself.
    • (From the previous meeting) Italy: Which version of each Storage Element implementation is compliant with the "Usage of Glue Schema v1.3 for WLCG Installed Capacity information"? As ROC, we could push and follow the upgrade of the old versions and validate the published data. The Baseline versions of services and client tools for WLCG operations (https://twiki.cern.ch/twiki/bin/view/LCG/WLCGBaselineVersions) seem to have been last updated on 02-Jun-2009. This useful page should be updated more frequently (at every gLite update?) to be sure that the recommendations are not out of date. ANSWER: We have contacted the WLCG team and they will update this page to include the SE versions compliant with Glue 1.3. Additionally, we are building the EGEE baseline and have plans to update it automatically with each release.

    Grid Service Interventions

    • Due to service improvements, the accounting portal at www3.egee.cesga.es may suffer intermittent interruptions of service for a few minutes on Monday, 2 November....
    • The GGUS system will go down for an unscheduled reboot today at 12:30 UTC

    CREAM-CE Deployment

    WLCG sites should now deploy a CREAM CE service in parallel with the production LCG-CEs in order to gain real production experience. Although the ability to submit from Condor clients is still missing, the site installation will not change when this is available. The CREAM CEs should also now be marked as “production” in the information system. On the EGEE side it will be discussed tomorrow at the SA1 coordination meeting.

    CREAM CE SAM tests are in production but non-critical. Please start to have a look at the results (need to check "NA - no status available").
    Timeline? For WLCG, now; no more details were given.

    Miscellaneous

    • There are still a few RBs and old versions of WMS out there; please check this page for the details: http://straylen.web.cern.ch/straylen/wms-count.txt
    • Please remember to keep the monthly availability comments Wiki up-to-date: https://twiki.cern.ch/twiki/bin/view/EGEE/MonthlyAvailability
    • A significant update to the SA1 OAT Nagios release was made today.
    • DECH: there are complaints from Nagios sites receiving GGUS tickets; what is this? We were not aware of anything related. Tickets 4150 and 4159 were opened against the regional dashboard. Wen will send more information to debug this incident.
    • The firewall port at CERN that allows direct update of job status to the CREAM-CE is closed, so a fallback mechanism that takes a lot longer is used. This only affects timing, not functionality, and the problem is in the process of being fixed.

    Newly Created Action Items


    Review of Open Action Items

    Open Action Items

    Id | Submitter | Description | Creation | Due | Assigned To

    Actions Closed in Last 20 Days

    Id | Submitter | Description | Creation | Due | Assigned To | Closed

    AOB

    Next Meeting

    The next meeting will be Monday, 09 Nov 2009 14:00 UTC (16:00 Swiss local time).

    • Attendees can join from 13:45 UTC (15:45 Swiss local time) onwards.
    • The meeting will start promptly at 14:00 UTC (16:00 Swiss local time).
    • To dial in to the conference:
      • Dial +41227676000
      • Enter access code 0148141


    Topic revision: r4 - 2009-11-05 - MaiteBarroso
     