WLCG-OSG-EGEE Ops' Minutes Mon 31 Aug 2009

Summary

Although MPI deployment problems exist with the gLite SL5 WN, the SAM MPI tests can be enabled, since a site's published information is checked first to see whether the site supports MPI.
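The check-before-test behaviour described above can be sketched as follows. This is only an illustration, not the actual SAM test code: the tag prefixes in `MPI_TAG_PREFIXES`, the function names and the `run_mpi_job` callback are all assumptions made for the example.

```python
# Hypothetical tag prefixes that would indicate published MPI support.
# The tags a real site publishes, and the ones SAM looks for, may differ.
MPI_TAG_PREFIXES = ("MPI-START", "MPICH", "OPENMPI")


def publishes_mpi(runtime_tags):
    """Return True if any published runtime tag advertises MPI support."""
    return any(tag.upper().startswith(MPI_TAG_PREFIXES) for tag in runtime_tags)


def mpi_test_outcome(runtime_tags, run_mpi_job):
    """Run the MPI job only when the site publishes MPI support.

    runtime_tags: software tags the site publishes (assumed already fetched).
    run_mpi_job:  hypothetical callable returning True when the job succeeds.
    """
    if not publishes_mpi(runtime_tags):
        # Site does not claim MPI support, so it cannot fail the MPI test.
        return "SKIPPED"
    return "OK" if run_mpi_job() else "FAILED"
```

Under these assumptions, a site with a broken MPI installation only fails if it also publishes an MPI tag.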

Attendance

EGEE

  • Asia Pacific ROC: Jason Shih
  • Central Europe ROC: Małgorzata Krakowian
  • OCC / CERN ROC: John Shade, Antonio Retico, Steve Traylen, Maite
  • French ROC:
  • German/Swiss ROC:
  • Italian ROC:
  • Northern Europe ROC:
  • Russian ROC:
  • South East Europe ROC: Marios Chatziangelou
  • South West Europe ROC: Christian Neissner
  • UK/Ireland ROC:
  • GGUS: Helmut
  • GOCDB:

OSG

  • Kyle

Feedback on Last Week's Minutes

None was given.

EGEE Items

Grid Operator Hand Over on Duty

  c-COD Team
From Northern
To Italy

  • The problems with Asia Pacific have now been resolved.
  • Two sites (NE and AP) have overdue alarms, but I have informed both of them about this.
  • No issues to report to the WLCG meeting, except to inform them that the AP problems are now resolved.

PPS Reports and Issues

  • Please find Issues from EGEE ROCs and general info in: https://twiki.cern.ch/twiki/bin/view/LCG/OpsMeetingPps
  • Site reports are now no longer given.
  • New BDII, ICE, CREAM and FTS releases, with a lot of changes.
  • FTS is now SL5 with many new features and some obsoleted. See page above.
  • CREAM at version 1.5 - support for LSF in blah.
    • APEL records working where the torque pbs_server is on a separate host.
    • Empty changes of information providers.
  • dCache - Security update and bug fixes.
  • New YAIM core and clients. A number of new variables have been added.
  • SWAT: renaming of GCM (the old SAM wrapper tests).

gLite Release News

EGEE Items From ROC Reports

Italy, France and UKI had not validated their ROC reports as of the 14:00 deadline. Reports show no major operational issues encountered during the reporting period, and no points to raise at this meeting.

  • FZK-LCG2 wishes to convey the following INFO: Planned downtime at FZK-LCG2 on 10-09-2009 07:00 - 08:00 UTC. The LFC service lfc-fzk.gridka.de (not the LHCb LFC) will be down while it is split into an ATLAS instance (atlas-lfc-fzk.gridka.de) and a non-ATLAS instance (lfc-fzk.gridka.de, as before).
  • SEE ROC: At the previous operations meeting the issue “WLCG MB agreed on 4th of August to ask for the SL5 migration at all Sites, including the Tier-2 Sites.” was briefly discussed. As far as we know, MPI is still not supported by glite-3.2 (see GGUS:47422). We understand that this affects only the WLCG sites at the moment, but since many users/teams in our region depend on the MPI capability of the Grid, we think that this issue could be given higher priority by the developers.
  • SWE ROC: We'd like to certify a site that runs only central services (WMS, LFC, etc.); the site has no storage or computing backend. Is this possible from the point of view of OPS? From meeting: Please go ahead and create the site. Update the action item. Please report any problems. This is certainly a valid configuration.
  • Problems with SL5 nodes? Is there a page somewhere with notes on SL5 for WLCG support? See SL4toSL5wnMigration and SL5DependencyRPM.

Grid Service Interventions

  • Consult links on the agenda page.

Misc Items.

SAM default DPM upgrade

Last reminder that the default DPM used for SAM tests will be upgraded to SL4 next Monday 7th of September, and that sites with obsolete client S/W will start failing tests.

  • SAM MPI tests will NOT be activated. There are pending tickets for SL5.
    • Following discussion in the meeting: the MPI tests do check what is published. Sites with ill-working MPI will therefore not fail the MPI tests as long as they do not publish that they support MPI, which is perfectly correct.
  • Notification of new gstat beta version (see attached material)
  • Seven sites are running legacy gLite releases. Those not upgraded by next week will be moved to suspended/uncertified until they do so:

Site               Host                        Version

EENet              kriit.eenet.ee              3.0.2
HK-HKU-CC-01       ce.grid.hku.hk              3.0.2
JP-KEK-CRC-01      dg10.cc.kek.jp              3.0.2
Taiwan-IPAS-LCG2   atlasce.phys.sinica.edu.tw  3.0.2
Taiwan-NCUCC-LCG2  ce.cc.ncu.edu.tw            3.0.2
TW-NTCU-HPC-01     host001.hpc.ntcu.edu.tw     3.0.2
UKI-LT2-RHUL       ce1.pp.rhul.ac.uk           3.0.2

OAT Items

GStat

GStat 2.0 Beta Release

The beta release of GStat is now available. Installation and configuration instructions: http://goc.grid.sinica.edu.tw/gocwiki/GSInstallationGuide

For any questions or comments, please email the GStat support list: project-grid-info-support@cernNOSPAMPLEASE.ch

Nagios

Update to the EGEE SA1 OAT release

An update to the EGEE SA1 OAT release has now been released and is available in the usual repositories.

No changes to the YAIM configuration are required, but after the "yum update" of your packages it is necessary at least to rerun ncg.pl, e.g. via a YAIM rerun.

Changes include:

  • Changes to grid-monitoring-probes-org.bdii probes with NCG providing configuration for them. Probe details: http://goc.grid.sinica.edu.tw/gocwiki/NagiosProbe
  • Addition of org.gstat.CE and org.gstat.SE probes. These provide the sanity checks similar to those the gstat1 web interface provided. These are the gstat2 probes. In particular these look for greater compliance to the WLCG/EGEE glue schema usage documents.
  • Nagios probe results that are collected via the messaging system now have their status prefixed with the hostname from which the test was executed. For example, when a ROC submits a WN test to a site via a CE, the probe result, once transmitted to the site Nagios via the msg service, appears as before as service "org.sam.WN-Bi-dteam-roc" on the CE node, but the status line contains the WN name, e.g. lxbra3908.cern.ch: OK: getCE: ce103.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam, indicating that lxbra3908 was the WN where the test was executed.
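For anyone post-processing these status lines, the new hostname prefix can be separated from the rest of the status roughly as follows. This is a minimal sketch: the function name `split_status` is hypothetical, and the rule that a prefix counts as a hostname only if it contains a dot and no spaces is an assumption, not something the release defines.

```python
def split_status(status_line):
    """Split 'host: rest' into (host, rest).

    Returns (None, status_line) when no hostname prefix is recognised.
    Assumption: a hostname prefix contains at least one dot and no spaces.
    """
    host, sep, rest = status_line.partition(": ")
    if not sep or " " in host or "." not in host:
        return None, status_line
    return host, rest


# Status line taken from the example in these minutes.
line = "lxbra3908.cern.ch: OK: getCE: ce103.cern.ch:2119/jobmanager-lcglsf-grid_2nh_dteam"
wn_host, status = split_status(line)
```

A status line without the prefix (as produced before this update) is returned unchanged with `None` as the host.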

Bug Fixes:

Install instructions via YAIM: GridMonitoringNcgYaim

Bug reports: https://savannah.cern.ch/projects/sa1tools/

Discussion mailing list, including pre-release announcements: join egee3-operations-automation-discuss@cernNOSPAMPLEASE.ch via https://groups.cern.ch

Description of the yum repositories, including repoview HTML pages and RSS feeds of package updates: EGEESA1PackageRepository

Known Problems: We plan to deploy shortly a fix to the production message brokers for a bug that at times causes consumers to fail to get messages.

OSG Items

The OSG supporter wrote in the diary of GGUS:49970 that the problem is solved, hence the ticket will be closed. However, the corresponding OIM ticket 7148 is in Status: Support Agency. Therefore the GGUS ticket cannot be closed.

Please adapt the ticket status and put a comprehensive text in the Solution field for the GGUS Knowledge Data Base.

Comments from the meeting suggest that everything is solved.

Newly Created Action Items

Review of Open Action Items

Both covered in the meeting.

Open Action Items

Id  Submitter  Description  Creation  Due  Assigned To

Actions Closed in Last 20 Days

Id  Submitter  Description  Creation  Due  Assigned To  Closed

AOB

Next Meeting

The next meeting will be Monday, dd mmm 2009 14:00 UTC (16:00 Swiss local time).

  • Attendees can join from 13:45 UTC (15:45 Swiss local time) onwards.
  • The meeting will start promptly at 14:00 UTC (16:00 Swiss local time).
  • To dial in to the conference:
    • Dial +41227676000
    • Enter access code 0148141


Topic revision: r3 - 2009-09-07 - SteveTraylen