WLCG-OSG-EGEE Operations Minutes Mon 3 Dec 2007

Attendance

There were many problems with the conferencing facilities, so it is likely that some of those listed as absent were in fact unable to connect.

EGEE

  • Asia Pacific ROC: Min
  • Central Europe ROC: Someone?
  • OCC / CERN ROC: John Shade, Antonio Retico, Nick Thackray, Steve Traylen
  • French ROC: Gilles
  • German/Swiss ROC: Clemens, Sven
  • Italian ROC: Alessandro
  • Northern Europe ROC: Apologies.
  • Russian ROC: Lev
  • South East Europe ROC: Kostas
  • South West Europe ROC: Kai, Gonzalo
  • UK/Ireland ROC: Unable to connect.
  • GGUS: Thorsten
  • OSCT: Absent

WLCG

  • WLCG Service Coordination: Absent

WLCG Tier 1 Sites

  • ASGC: Min
  • BNL: Absent
  • CERN site: Ignacio Reguero
  • FNAL: Absent
  • FZK: Sven
  • IN2P3: Absent
  • INFN: Alessandro, Alfrede
  • NDGF: Absent
  • PIC: Gonzalo
  • RAL: Unable to connect.
  • SARA/NIKHEF: Apologies
  • TRIUMF: Rod Walker

Reports Not Received

  • WLCG Tier 1s: None
  • VOs: CMS, ALICE, ATLAS
  • EGEE ROCs (Prod Sites): South Eastern Europe
  • EGEE ROCs (PPS Sites): AP, IT, SWE, SEE

Feedback on Last Week's Minutes

None were given.

EGEE Items

Grid Operator Hand Over on Duty

         Primary Team          Secondary Team
  From   Germany/Switzerland   Taiwan
  To     South West Europe     UK/Ireland

  • Quite a few nodes appear in the alarm table even though monitoring is disabled for them in GOCDB. The change may have been made only recently:
    1. srm-v2.cr.cnaf.infn.it
    2. gridse2.pg.infn.it
    3. dcsrmv2.usatlas.bnl.gov
  • Open a GGUS ticket and include as much information as possible.
  • 11/28: the CIC dashboard could not be accessed because of a connection problem with GOCDB.
  • Ru-Troitsk-INR-LCG2 and KTU-BG-GLITE had not updated their CA RPMs to the latest version. Mail was sent to the site managers.
  • Ticket statistics:
    • New tickets: 24
    • 2nd mail: 8
    • Quarantine: 14
    • Extend: 11
    • Close: 19

PPS Reports

Interested sites are invited to come forward and contact Mario David, coordinator of the pre-deployment activity, who will gladly provide the technical information they need. We would particularly welcome volunteers for this activity, within the framework of "Special support to PPS Operations", from among the certified PPS sites that do not yet appear in the lists at http://www.cern.ch/pps/index.php?dir=./panel/ , namely:

  • DESY-PPS
  • FZK-PPS
  • GSI-LCG2-PPS
  • SCAI-PPS
  • PPS-SiGNET
  • PreGR-01-UoM
  • PreGR-01-UPATRAS

Suggestions from the ROCs/PPS sites on possible deployment scenarios for the AMGA service in PPS are also very welcome. We are actively looking for a user community interested in trying out the newly released PostgreSQL-based version of AMGA.

Release News

  1. gLite3.1.0-PPS-UPDATE10 was released to PPS. This update introduces a number of new services to gLite 3.1 for SL4 (32-bit):
    • glite-AMGA_postgres
    • glite-LFC_mysql
    • glite-LFC_oracle
    • glite-PX
    • glite-SE_dpm_disk
    • glite-SE_dpm_mysql
    • glite-VOMS_mysql
    • glite-VOMS_oracle
  2. Records of the pre-deployment testing can be found at http://www.cern.ch/pps/index.php?dir=./release/testreports/
  3. The release of gLite3.1 Update07 to production is in preparation (to be announced early this week). This release will contain:
    • JobWrapper tests: new version with no R-GMA dependencies
    • glite-VOMS_mysql metapackage for gLite 3.1 and SL(C)4
    • glite-VOMS_oracle metapackage for gLite 3.1 and SL(C)4
    • Bug fixes for UI and WN

EGEE Items From ROC Reports

SAM APEL test
When is it scheduled to become a critical test? This was discussed at the ROC managers' meeting. ROC managers will follow up with sites that are not publishing accounting data. Once sites start publishing the data, the test will become critical; if it were made critical now, a lot of sites would fail. This will be discussed again at the next ROC managers' meeting (next Tuesday), by which time a reasonable number of sites should hopefully be publishing.
Availability report
CYFRONET-LCG2 (Tier-2) remarks that, when analysing availability reports, it is hard to determine the reason for decreased availability, because the tools that affect (FCR) and compute (GridView) availability are based on SAM results, which are available only for the last 7 days. We are aware that a longer history is a performance problem, but perhaps it would be possible to provide an interface showing SAM results for a short period in the past? We will check this with the GridView people and report back next week.
SFU-LCG
We have 400 queued atlasprd jobs for a 10-CPU cluster. Some SFT jobs fail because they could not be run for a long time. Rod Walker: the problem is fixed.
CERN-PROD
Soon after the GGUS release we received a number of update e-mails from GGUS concerning verification by users of (sometimes) very old tickets. As the corresponding tickets were already frozen in our internal TT system, this caused many new tickets to be opened. The issue was not systematic, in the sense that it did not affect tickets across the whole history, but it was nevertheless significant. We are asking the GGUS team whether they are aware of possible causes. We consider a post-mortem analysis advisable in order to record the issue correctly and address it in future releases. GGUS reported in the meeting that these e-mails were sent by mistake and should not have gone out; it was a one-off.
ROC-DECH
Some sites are unsure about the correct procedure for introducing new service nodes into the production environment. Now that GOCDB no longer allows sites to switch off monitoring, should sites initially put the nodes in 'maintenance'? This was discussed and seems logical. A request will be made to add this to the Operations Manual.
RUSSIA
It seems that some users are submitting jobs to sites bypassing the RB/WMS system, directly using the CE job submission APIs or Globus tools. What should we do about this (i.e. ignore it, encourage it, or prohibit it in some way)?
  • There was some discussion. In particular, Ireland blocks all submissions except those from trusted RBs, but this is not suitable everywhere. It comes down to tracking down the users and contacting them if they are being antisocial.
RUSSIA
ALICE have asked the Russian ROC to install a working PBS client on their VOBoxes. There was some discussion at the meeting, and a link is to be provided to the published description of the ALICE VO box (VoBoxesInfo).

WLCG Items

None

Tier1 Reports

None

WLCG issues coming from ROC reports

None

Upcoming WLCG Service Interventions

  • Major intervention on all VOMS and VOMRS production services at CERN on Monday. All components, as well as the database schema itself, will change substantially. All LHC VOMS and VOMRS services will be unavailable on Monday morning, 08:00-11:00 UTC.

FTS Service Review

See TransferOperationsWeeklyReports; there does not appear to be a report this week.

ATLAS Service

ALICE Service

CMS Service

LHCb Service

Last Friday the CNAF site admins carried out an extraordinary emergency intervention and, following the usual procedure described at http://cic.gridops.org/index.php?section=home&page=SDprocedure , put the CNAF batch farm into scheduled downtime (until today).

The procedure foresees a broadcast message being sent to the affected people (for LHCb this is the lhcb-production mailing list), but we did not receive any message. It would be good to understand why. Since the procedure is very well defined (which minimises the possibility of errors on the sysadmin side), I tend to believe that the broadcast tool did not work properly this time, causing some disruption to LHCb's daily activity. Could the people maintaining these tools look into this?

As agreed in the meeting, both CNAF and IN2P3 (CIC portal) are looking into it.

WLCG Service Coordination

OSG Items

Review of Action Items

Next Meeting

The next meeting will be Monday, 10 Dec 2007 14:00 UTC (16:00 Swiss local time).

  • Attendees can join from 13:45 UTC (15:45 Swiss local time) onwards.
  • The meeting will start promptly at 14:00 UTC.
  • The WLCG section will start at the fixed time of 16:30.
  • To dial in to the conference:
    • Dial +41227676000
    • Enter access code 0157610

Topic revision: r7 - 2008-03-03 - SteveTraylen