WLCG-OSG-EGEE Ops' Minutes Mon 07 Apr 2008

Attendance

EGEE

  • Asia Pacific ROC: Min Tsai???
  • Central Europe ROC: Malgorzata Krakowian
  • OCC / CERN ROC: Maite Barroso, Antonio Retico
  • French ROC: Gilles, Rolf Rumler
  • German/Swiss ROC: Clemens Koerdt
  • Italian ROC: Alessandro Cavalli???
  • Northern Europe ROC: Jules Wolfrat
  • Russian ROC: Lev Shamardin
  • South East Europe ROC: Ioannis Liabotis
  • South West Europe ROC: Gonzalo Merino
  • UK/Ireland ROC: ???
  • GGUS: Maria Dimou
  • OSCT: Absent

WLCG

  • WLCG Service Coordination: Harry Renshall

WLCG Tier 1 Sites

  • ASGC: Min Tsai???
  • BNL: Absent
  • CERN site: Ignacio Reguero
  • FNAL: Joe Kaiser
  • FZK: Clemens Koerdt
  • IN2P3: Pierre Girard
  • INFN: Alessandro Paolini
  • NDGF: Leif ???
  • PIC: Gonzalo
  • RAL: Derek Ross, Matt Hodges???
  • SARA/NIKHEF: Absent???
  • TRIUMF: Absent???

Reports Not Received

  • VOs:
  • EGEE ROCs (Prod Sites): AP, IT

Maite pointed out that, after a period in which all reports were submitted, some ROCs are again failing to send theirs. She stressed that the report is something the ROCs should be happy to provide in their own interest.

Feedback on Last Week's Minutes

None were given.

EGEE Items

Grid Operator Hand Over on Duty

        Primary Team    Secondary Team
From    ROC CE          ROC CERN
To      ROC SWE         ROC IT

Site ITPA-LCG2 was failing GSTAT. It publishes ScientificSL 5.0, which is not in the OS list used by GSTAT. In such cases it is up to the site/ROC to send a request to the mailing list roc-dev@listsNOSPAMPLEASE.grid.sinica.edu.tw to add the required OS version to the list. See http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_the_OS_name

Action item
  • Assigned to: SteveTraylen
  • Due date: 2008-04-14
  • Description: Define with gstat (roc-dev@listsNOSPAMPLEASE.grid.sinica.edu.tw) the new value to be set in the list of allowed OS describing the Scientific Linux 5 run at the site

Update 17th April
Min is looking at it, but the site should really submit a ticket, as per the instructions at
http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_the_OS_name

Closed: 2008-05-06

PPS Reports

An issue in the portal (confirmed after the meeting) did not allow the reports to be read. After the fix no issues were reported.

News about upcoming releases is in the agenda. Notably, the certification and imminent release to PPS of the WMS and LB for SL4 was announced.

  • Lev Shamardin (ROC Russia) asked for news about the outcome of certification.
    • Antonio (PPS): Release notes not received yet. Apparently there is still an open issue on bulk submission affecting DAG jobs, but collections are working fine.
  • Maite (OCC): is the WMS now supposed to go through the full PPS cycle?
    • Antonio: a pre-deployment test will certainly be done. A lot will depend on whether the VOs are available to test the WMS in the PPS environment. If they are not, a "pilot" service will have to be set up at some friendly production sites. This survey/decision will be made as soon as the release documentation is available.
  • Roberto Santinelli (LHCb): the two-month grace period foreseen by the gLite Middleware team for the support of the lcg-RB after the deployment of the new WMS is not enough for LHCb. Development of Dirac2 (the submission engine currently used for LHCb production) is frozen, and the framework can use only the RB. The VO is working intensively on the development of Dirac3, which supports the new system, but this will not be ready in two months.
    • Maite (OCC): there are no problems from the operations point of view. We will forward the point to the gLite team.
    • Oliver Keeble (gLite): commented later (off-line via e-mail):
      
      On Mon, 7 Apr 2008, Oliver Keeble wrote:
      
      > 
      > Well, this is a largely hypothetical discussion as we haven't released 
      > an RB update for months and there are in fact no RB developers left... 
      > So, I don't think we can meaningfully 'extend support', but this does 
      > not prevent LHCb from continuing to use their existing RBs.
      > 
      > Oliver Keeble                      Information Technology Department
      > oliver.keeble@cern.ch                                           CERN
      > +41 22 76 72360                                    CH-1211 Geneva 23
      > 
      > 
      > Maite Barroso Lopez wrote:
      > > Hi Oliver,
      > > 
      > >  At today's ops meeting, Roberto said that the 2 months time after 
      > > the SL4 WMS release to support the RBs might not be enough for LHCb:
      > > 
      > > Lhcb is moving to dirac3, but normal production uses dirac2, which 
      > > still uses the RB.
      > > 
      > >  Could you please discuss it internally and with the RB developers 
      > > so we extend the RB support till LHCB migrates to the new Dirac version?
      > > 
      > > We can check where they are in 2 months from now (action at the ops 
      > > meeting).
      > > 
      > >  Thanks,
      > > 
      > >                  Maite
      
      

Action item
  • Assigned to: Main.OCC
  • Due date: 2008-06-10
  • Description: Check with LHCb the status of the development of Dirac3 (the version of the submission engine interfaced to the WMS)

Update 17th April
Dirac3 will be released in two months at the earliest; close the action item for now.

Closed: 2008-04-17

EGEE Items From ROC Reports

  1. (ROC France): Some site administrators complained because their e-mail address was added to a VO mailing list without their agreement. The VO has been contacted and the problem is being solved, but the incident raises the more general problem of spam generated by the project itself. Could we agree on good administration rules for mailing lists? At least, except for some obvious and mandatory mailing lists, an actor should be able to unsubscribe from any mailing list by him/herself. The way to unsubscribe should be made clear by the mailing list.
    • Rolf (ROC France): it is annoying for sysadmins to be solicited for application and usage problems as well. This issue was also reported to the Coordinator of VO Management (Pierre Girard), so we hope that such incidents remain sporadic.
  2. (ROC DECH): Please reopen action item 150. The problem is still present. See GGUS:33850.
    • Maite (OCC): the action was discussed and closed last week. After an intervention on SAM's side, it turned out that also gstat needed a fix, so the ticket was re-opened and assigned to gstat. The action is re-opened to track the issue.
  3. (ROC SEE): https://gus.fzk.de/ws/ticket_info.php?ticket=33697 is long overdue, please put some pressure on the corresponding support unit to respond.
    • Ioannis: as the test affects the availability results of a site that is currently a candidate for suspension, it is very important for our ROC to be able to rely on correct SAM data in this case.
    • Maite (OCC): The ticket was wrongly assigned to the RB software support, where nobody was listening, and it stood idle for a long time. It has now been re-assigned to the SAM support, and there are already some replies from the supporters.
  4. (ROC SEE): LCG-TAU still has some problems, thus it is now in downtime for the next 7 days in order to upgrade to the latest gLite release.
  5. (ROC SWE): We would like to have an update on Top-BDII failover awareness in the gLite client tools. Is it possible to configure several BDIIs as a list with YAIM?
    • Lev Shamardin (ROC Russia): at least GFAL and lcg-utils should support a multi-BDII configuration, although YAIM does not support this option. A list of BDIIs can be used in the LCG_GFAL_INFOSYS variable since gfal version 1.10.6. No corresponding Savannah bug was opened.
    • The feature will be tried by CMS and Atlas (mostly interested in the option) and eventually documented in the release notes.

Action item
  • Assigned to: AndreaSciaba
  • Due date: 2008-04-21
  • Description: Verify and document in the User Guide the option to configure the GFAL client to use multiple BDIIs

Update 17th April, Maite will check.

Update 19th May, Andrea changed this on the same day the action was raised. This action can be closed.

Closed: 2008-05-22
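As an illustration of the multi-BDII option discussed under item 5 (ROC SWE), the sketch below shows how a client could walk a list of top-level BDIIs and fall back to the next one when an endpoint does not answer. It is only a sketch under stated assumptions: the comma-separated host:port format for LCG_GFAL_INFOSYS, the default port, and the helper names parse_infosys, tcp_probe and first_reachable are hypothetical and are not taken from the GFAL documentation.

```python
import socket
from typing import Callable, List, Optional, Tuple

Endpoint = Tuple[str, int]

# Port assumed for entries given without one (assumption for this sketch).
DEFAULT_BDII_PORT = 2170


def parse_infosys(value: str) -> List[Endpoint]:
    """Split an assumed comma-separated 'host[:port]' list into endpoints."""
    endpoints: List[Endpoint] = []
    for item in value.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        host, _, port = item.partition(":")
        endpoints.append((host, int(port) if port else DEFAULT_BDII_PORT))
    return endpoints


def tcp_probe(endpoint: Endpoint, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the endpoint can be opened."""
    try:
        with socket.create_connection(endpoint, timeout=timeout):
            return True
    except OSError:
        return False


def first_reachable(
    endpoints: List[Endpoint],
    probe: Callable[[Endpoint], bool] = tcp_probe,
) -> Optional[Endpoint]:
    """Return the first endpoint accepted by the probe, in list order."""
    for endpoint in endpoints:
        if probe(endpoint):
            return endpoint
    return None
```

A caller would read the variable from the environment, pass it to parse_infosys, and hand the result to first_reachable; the probe argument can be replaced by a real information-system query where a plain TCP check is too weak.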
  1. (ROC UKI): GGUS should respond whether the UKI-SOUTHGRID-CAM-HEP problem of 100 mails for the same ticket is a bug.
    • A reply was sent by GGUS via email:
      From: Grein, Guenter [mailto:Guenter.Grein@iwr.fzk.de] 
      Sent: Monday, April 07, 2008 3:58 PM
      To: Maite Barroso Lopez; Torsten Antoni
      Cc: Maria Dimou; Guenter Grein
      Subject: RE: GGUS issue for today's ops meeting
      [...] 
      Dear Maria and All,
      
      This huge traffic was caused by a guy from UK. He has entered the GGUS mail address helpdesk@ggus.org to the "Assign ticket to one person" field. As every update triggers an email to the mail addresses in this field, the system was looping until I got aware of it and stopped it.
      
      Meanwhile we have updated our mail parsing tool to avoid such things in future.
      
      Best regards 
      
      Guenter
            
  2. (ROC UKI): There have been many complaints in UKI about the new requirement to complete the site reports every day. Site admins often fill out the report for the whole week in one go, and this seems a sensible approach; at least they should be able to choose. Several sites have indicated that they will stop filling out the reports in this new format. On the positive side, the new interface seems better, with the graphical representation of downtime etc. However, it would be very welcome if the colours used were consistent between tools: previously grey represented downtime and red a failure, and now we have black. Sites would also like to see the past history of the report so that they can cross-reference previous failures, a feature lost in this upgrade.
    • Gilles Mathieu (CIC Portal): The developers have already been contacted and they will restore the functionality as it was, offering both options to the site admins.
    • Maite encourages the ROCs to provide written feedback for the new interface.
  3. (ROC UKI): The move to validating every use of a certificate on a site is becoming tedious. Is this a feature of the browser settings, or is everyone greeted with constant requests to use their certificate? Is it possible to have a compact view and a detailed view of site problems? I cannot see correlations between sites anymore.
    • Maite: This needs clarification: Is it a general comment or related in particular to the CIC portal?

Action item
  • Assigned to: Main.UKRoc
  • Due date: 2008-04-14
  • Description: Clarify the scope of the issue reported in WlcgOsgEgeeOpsMinutes2008x04x07 about continuous certificate requests. Is it a general comment or related in particular to the CIC portal?

Update 17th April, Gilles has done something.

Closed: 2008-04-17
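The mail loop reported for the UKI-SOUTHGRID-CAM-HEP ticket (the helpdesk's own address entered in the "Assign ticket to one person" field, so that every update triggered a new mail to the helpdesk) can be illustrated with a small guard. The sketch below is hypothetical: the function name notification_recipients and the filtering rule are assumptions, and it does not describe the actual change made to the GGUS mail parsing tool.

```python
from typing import Iterable, List

# Address of the helpdesk itself, as quoted in the GGUS reply.
HELPDESK_ADDRESS = "helpdesk@ggus.org"


def notification_recipients(
    assigned: Iterable[str],
    own_address: str = HELPDESK_ADDRESS,
) -> List[str]:
    """Return the addresses to notify, dropping the system's own address.

    Notifying the helpdesk's own mailbox would create a new ticket update,
    which would trigger another notification, and so on. Duplicates are
    dropped as well, so each person gets one mail per update.
    """
    own = own_address.strip().lower()
    seen = set()
    recipients: List[str] = []
    for address in assigned:
        cleaned = address.strip()
        key = cleaned.lower()
        if not key or key == own or key in seen:
            continue
        seen.add(key)
        recipients.append(cleaned)
    return recipients
```

Comparing addresses case-insensitively is a design choice of this sketch, made so that a variant such as "Helpdesk@GGUS.org" is caught too.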

WLCG Items

WLCG issues coming from ROC reports

* Item 1

Upcoming WLCG Service Interventions

Links in the agenda

WLCG Service Coordination

CCRC'08 Operational Review

Test of the Tier0 to Tier1 Optical Private Networks backup links from 15.00 to 19.00 CEST (13.00 to 17.00 UTC) on Wednesday 9 April.

More details in agenda.

ATLAS Service (Simone Campana, Alessandro Di Girolamo)

  1. Deployment of the new version of DPM (1.6.7-4): request for update (detailed request in the agenda)
    • Antonio (PPS, SA1): as explained in the release section, a technical issue in the creation of the repository prevented the deployment to production from happening last Wednesday as announced. It will definitely be done today.
    • P.S.: gLite 3.1 Update 18 was actually released a few hours after the meeting.
  2. ATLAS sites with lcg-utils for SRM2: request to the ROCs for follow-up (detailed request in the agenda)
    • Maite: from the SAM link, half the sites seem to have fixed the problem. The ROCs are kindly invited to finish the work. Are there issues at any sites which Atlas would like to address in particular?
    • Alessandro Di Girolamo (Atlas): No, the T1s are all working and this is the important thing for Atlas.
    • Are there any updates regarding the SAM test being developed by Atlas to test the size of the Atlas SW area?
    • Alessandro sent details in an e-mail:
      From: Alessandro Di Girolamo 
      Sent: Monday, April 07, 2008 4:46 PM
      To: Maite Barroso Lopez
      Subject: ATLAS issue: 100GB space in the sw area
      
      Ciao Maite,
      
      we are running the test CE-sft-vo-swspace that is trying to understand how much 
      space has been allocated for the ATLAS sw area.
      
      https://lcg-sam.cern.ch:8443/sam/sam.py?CE_atlas_disp_tests=CE-sft-lcg-version&CE_atlas_disp_tests=CE-sft-vo-swspace&order=RegionName&funct=ShowSensorTests&disp_status=na&disp_status=ok&disp_status=info&disp_status=note&disp_status=warn&disp_status=error&disp_status=crit&disp_status=maint
      
      I said "trying" since not for all the sites is possible to retrieve correctly this information, 
      but it is already a good starting point.
      
      Could be very useful if ROCs could have a look to the sites belonging to their clouds and 
      try to see the output one by one to see if sites match the 100GB ATLAS request (or if the 
      sites has problem in giving this number and in this case would be useful if the ROC would 
      directly ask to the site admin)
      
      Thanks
      CiaoCiao
                                                      Ale
      

ALICE Service

No report received

CMS Service (Andrea Sciaba')

Nothing to report

LHCb Service (Roberto Santinelli)

LHCb is planning week by week. The effort is currently focused on the development of Dirac3, to accelerate the commissioning of the framework. Work is in progress on some relevant new features of Dirac3 which were not implemented during the first phase in February but will be in place for May.

Last week a lot of sites were found in the SAM DB which are not in the production BDII. This is being analysed with the SAM experts (Judit). Apparently these sites are relics of the past and a clean-up is probably needed.

Antonio Retico: These sites are currently not shown in the SAM portal. Can the same flag not also be applied for LHCb applications? Roberto: in theory yes; however, there are actual issues recognised by Judit, and she is working on them.

CCRC08; Sites

No comment

Roberto (LHCb) asks PIC and SARA (or the relevant ROCs) for an update (ping) on the status of the installation of the LFC, needed by 18 April.

Goncalo Borges (PIC): The schedule is unchanged: we still foresee delivery next week.

Jules Wolfrat (NE ROC) will ask Ron Trompert to update the status.

OSG Items (Maria Dimou)

GGUS:31037 was closed; agreed it was a mistake

GGUS:33220: a long discussion between Steve Traylen and Rob Quick. The conclusion is still not clear. An interesting point is that UFRJ, already part of EELA, is managed by OSG. Can this rule somehow be generalised, and a FAQ be generated accordingly for the OSG support unit in GGUS?

Rob Quick: Definitely not. We will send a list of the resources outside the US supported by OSG.

An e-mail was received from Rob Quick following the discussion at the OPS meeting:

----------

Maria,

Here are the OSG resources not within the US borders.

Rob

Taiwan: osgc01.grid.sinica.edu.tw

Europe: rhilxs.ph.bham.ac.uk

South America: osgce.hepgrid.uerj.br, osg-ce.sprace.org.br, osg-se.sprace.org.br 
----------

GGUS:33220 will be closed and split internally into two GOC tickets, to be handled separately at the UFL and UFRJ sites.

Action Items

Newly Created Action Items

Action item
  • Assigned to: Main.OCC
  • Due date: 2007-03-05
  • Description: Example Action Item
  • Closed: 2007-03-06
  • Notify: SteveTraylen

Review of Open Action Items

Open Action Items

Id Submitter Description Creation Due Assigned To

Actions Closed in Last 20 Days

Id Submitter Description Creation Due Assigned To Closed

AOB

Maria announced the User Support Advisory Group (USAG) phone conference for VOs, ROCs and T1 sites, to be held on 10 April at 11:00 CEST. The USAG agenda is at: http://indico.cern.ch/conferenceDisplay.py?confId=30349

Next Meeting

The next meeting will be Monday, 14 Apr 2008 15:00 UTC (16:00 Swiss local time).

  • Attendees can join from 14:45 UTC (15:45 Swiss local time) onwards.
  • The meeting will start promptly at 15:00 UTC (16:00 Swiss local time).
  • The WLCG section will start at the fixed time of 15:30 UTC (16:30 Swiss local time).
  • To dial in to the conference:
    • Dial +41227676000
    • Enter access code 0157610


Topic revision: r7 - 2008-05-22 - MaiteBarroso
 