WLCG-OSG-EGEE Ops Meeting Minutes and Action Items

Agendas for all operations meetings are located in Indico.

Mailing List

Attendees of the meeting should join the mailing list: grid-operations-meeting@cernNOSPAMPLEASE.ch

Upcoming Meetings

Minutes for Meetings Modified In Last 20 Days

Agenda Date Minutes Taker Chair Last Edit

Minutes for Meetings Not Modified In Last 20 Days

Agenda Date Minutes Taker Chair Last Edit
88204 Mon 15 Mar 2010 WlcgOsgEgeeOpsMinutes2010x03x15 AntonioRetico NicholasThackray 3533 day(s) ago by UnknownUser
87567 Mon 08 Mar 2010 WlcgOsgEgeeOpsMinutes2010x03x08 AntonioRetico NicholasThackray 3543 day(s) ago by UnknownUser
86972 Mon 01 Mar 2010 WlcgOsgEgeeOpsMinutes2010x03x01 AntonioRetico NicholasThackray 3551 day(s) ago by UnknownUser
85275 Mon 15 Feb 2010 WlcgOsgEgeeOpsMinutes2010x02x15 NicholasThackray MaiteBarroso 3563 day(s) ago by UnknownUser
84350 Mon 08 Feb 2010 WlcgOsgEgeeOpsMinutes2010x02x08 MaiteBarroso AntonioRetico 3550 day(s) ago by MaiteBarroso
82497 Mon 25 Jan 2010 WlcgOsgEgeeOpsMinutes2010x01x25 NicholasThackray MaiteBarroso 3572 day(s) ago by MaiteBarroso
81101 Mon 18 Jan 2010 WlcgOsgEgeeOpsMinutes2010x01x18 MaiteBarroso NicholasThackray 3590 day(s) ago by MaiteBarroso
76178 Mon 07 Dec 2009 WlcgOsgEgeeOpsMinutes2009x12x07 MaiteBarroso JohnShade 3634 day(s) ago by MaiteBarroso
75507 Mon 30 Nov 2009 WlcgOsgEgeeOpsMinutes2009x11x30 DianaBosio SteveTraylen 3641 day(s) ago by UnknownUser
74987 Mon 23 Nov 2009 WlcgOsgEgeeOpsMinutes2009x11x23 AntonioRetico SteveTraylen 3645 day(s) ago by UnknownUser
74371 Mon 16 Nov 2009 WlcgOsgEgeeOpsMinutes2009x11x16 MaiteBarroso JohnShade 3653 day(s) ago by MaiteBarroso
73715 Mon 09 Nov 2009 WlcgOsgEgeeOpsMinutes2009x11x09 SteveTraylen MaiteBarroso 3663 day(s) ago by SteveTraylen
72954 Mon 02 Nov 2009 WlcgOsgEgeeOpsMinutes2009x11x02 MaiteBarroso JohnShade 3667 day(s) ago by MaiteBarroso
72091 Mon 26 Oct 2009 WlcgOsgEgeeOpsMinutes2009x10x26 DianaBosio MaiteBarroso 3675 day(s) ago by UnknownUser
71400 Mon 19 Oct 2009 WlcgOsgEgeeOpsMinutes2009x10x19 JohnShade SteveTraylen 3683 day(s) ago by JohnShade
70781 Mon 12 Oct 2009 WlcgOsgEgeeOpsMinutes2009x10x12 MaiteBarroso JohnShade 3684 day(s) ago by SteveTraylen
70007 Mon 05 Oct 2009 WlcgOsgEgeeOpsMinutes2009x10x05 DianaBosio SteveTraylen 3684 day(s) ago by SteveTraylen
69189 Mon 28 Sep 2009 WlcgOsgEgeeOpsMinutes2009x09x28 NicholasThackray JohnShade 3701 day(s) ago by UnknownUser
68261 Wed 16 Sep 2009 WlcgOsgEgeeOpsMinutes2009x09x14 MaiteBarroso DianaBosio 3717 day(s) ago by MaiteBarroso
67858 Mon 07 Sep 2009 WlcgOsgEgeeOpsMinutes2009x09x07 NicholasThackray SteveTraylen 3702 day(s) ago by UnknownUser
67352 Mon 31 Aug 2009 WlcgOsgEgeeOpsMinutes2009x08x31 SteveTraylen MaiteBarroso 3726 day(s) ago by SteveTraylen
66812 Mon 24 Aug 2009 WlcgOsgEgeeOpsMinutes2009x08x25 MaiteBarroso JohnShade 3677 day(s) ago by UnknownUser
65727 Mon 03 Aug 2009 WlcgOsgEgeeOpsMinutes2009x08x03 AntonioRetico SteveTraylen 3760 day(s) ago by AntonioRetico
65239 Mon 27 Jul 2009 WlcgOsgEgeeOpsMinutes2009x07x27 JohnShade SteveTraylen 3768 day(s) ago by JohnShade
64824 Mon 20 Jul 2009 WlcgOsgEgeeOpsMinutes2009x07x20 SteveTraylen MaiteBarroso 3771 day(s) ago by SteveTraylen
64288 Mon 13 Jul 2009 WlcgOsgEgeeOpsMinutes2009x07x13 JohnShade NicholasThackray 3739 day(s) ago by JohnShade
63913 Tue 07 Jul 2009 WlcgOsgEgeeOpsMinutes2009x07x06 MaiteBarroso JohnShade 3733 day(s) ago by SteveTraylen
63150 Mon 29 Jun 2009 WlcgOsgEgeeOpsMinutes2009x06x29 JohnShade DianaBosio 3794 day(s) ago by JohnShade
62665 Mon 22 Jun 2009 WlcgOsgEgeeOpsMinutes2009x06x22 DianaBosio NicholasThackray 3803 day(s) ago by DianaBosio
61238 Mon 08 Jun 2009 WlcgOsgEgeeOpsMinutes2009x06x08 DianaBosio AntonioRetico 3809 day(s) ago by DianaBosio
59253 Mon 18 May 2009 WlcgOsgEgeeOpsMinutes2009x05x18 NicholasThackray NicholasThackray 3834 day(s) ago by NickThackray
58811 Mon 11 May 2009 WlcgOsgEgeeOpsMinutes2009x05x11 DianaBosio JohnShade 3843 day(s) ago by DianaBosio
58154 Tue 05 May 2009 WlcgOsgEgeeOpsMinutes2009x05x04 MaiteBarroso NicholasThackray 3851 day(s) ago by MaiteBarroso
57797 Mon 27 Apr 2009 WlcgOsgEgeeOpsMinutes2009x04x27 SteveTraylen NicholasThackray 3858 day(s) ago by SteveTraylen
57117 Mon 20 Apr 2009 WlcgOsgEgeeOpsMinutes2009x04x20 JohnShade NicholasThackray 3851 day(s) ago by MaiteBarroso
56344 Mon 06 Apr 2009 WlcgOsgEgeeOpsMinutes2009x04x06 DianaBosio NicholasThackray 3834 day(s) ago by NickThackray
55792 Mon 30 Mar 2009 WlcgOsgEgeeOpsMinutes2009x03x30 SteveTraylen NicholasThackray 3887 day(s) ago by SteveTraylen
55286 Mon 23 Mar 2009 WlcgOsgEgeeOpsMinutes2009x03x23 JohnShade MaiteBarroso 3866 day(s) ago by JohnShade
54797 Mon 16 Mar 2009 WlcgOsgEgeeOpsMinutes2009x03x16 JohnShade SteveTraylen 3893 day(s) ago by JohnShade
53678 Mon 09 Mar 2009 WlcgOsgEgeeOpsMinutes2009x03x09 AntonioRetico JohnShade 3899 day(s) ago by JohnShade
53672 Mon 02 Mar 2009 WlcgOsgEgeeOpsMinutes2009x03x02 SteveTraylen NicholasThackray 3914 day(s) ago by SteveTraylen
53064 Mon 23 Feb 2009 WlcgOsgEgeeOpsMinutes2009x02x23 NicholasThackray MaiteBarroso 3907 day(s) ago by AntonioRetico
52812 Mon 16 Feb 2009 WlcgOsgEgeeOpsMinutes2009x02x16 SteveTraylen NicholasThackray 3907 day(s) ago by AntonioRetico
52457 Mon 09 Feb 2009 WlcgOsgEgeeOpsMinutes2009x02x09 SteveTraylen NicholasThackray 3933 day(s) ago by SteveTraylen
51947 Mon 02 Feb 2009 WlcgOsgEgeeOpsMinutes2009x02x02 JohnShade NicholasThackray 3918 day(s) ago by NickThackray
51058 Mon 26 Jan 2009 WlcgOsgEgeeOpsMinutes2009x01x26 JohnShade SteveTraylen 3933 day(s) ago by SteveTraylen
49102 Mon 19 Jan 2009 WlcgOsgEgeeOpsMinutes2009x01x19 JohnShade MaiteBarroso 3956 day(s) ago by JohnShade
48841 Mon 12 Jan 2009 WlcgOsgEgeeOpsMinutes2009x01x12 MaiteBarroso NicholasThackray 3914 day(s) ago by SteveTraylen
43734 Mon 08 Dec 2008 WlcgOsgEgeeOpsMinutes2008x12x08 JohnShade SteveTraylen 3998 day(s) ago by JohnShade
43733 Mon 01 Dec 2008 WlcgOsgEgeeOpsMinutes2008x12x01 JohnShade NicholasThackray 4005 day(s) ago by JohnShade
43732 Mon 24 Nov 2008 WlcgOsgEgeeOpsMinutes2008x11x24 SteveTraylen NicholasThackray 4005 day(s) ago by JohnShade
43731 Mon 17 Nov 2008 WlcgOsgEgeeOpsMinutes2008x11x17 JohnShade SteveTraylen 4018 day(s) ago by JohnShade
43730 Tue 11 Nov 2008 WlcgOsgEgeeOpsMinutes2008x11x10 MariaDimou NicholasThackray 4010 day(s) ago by SteveTraylen
43729 Mon 03 Nov 2008 WlcgOsgEgeeOpsMinutes2008x11x03 NicholasThackray JohnShade 4030 day(s) ago by NickThackray
43701 Mon 27 Oct 2008 WlcgOsgEgeeOpsMinutes2008x10x27 SteveTraylen NicholasThackray 4018 day(s) ago by SteveTraylen
43700 Mon 20 Oct 2008 WlcgOsgEgeeOpsMinutes2008x10x20 JohnShade NicholasThackray 4018 day(s) ago by SteveTraylen
43122 Mon 13 Oct 2008 WlcgOsgEgeeOpsMinutes2008x10x13 NicholasThackray MaiteBarroso 4030 day(s) ago by NickThackray
42750 Mon 06 Oct 2008 WlcgOsgEgeeOpsMinutes2008x10x06 JohnShade SteveTraylen 4058 day(s) ago by JohnShade
42178 Mon 29 Sep 2008 WlcgOsgEgeeOpsMinutes2008x09x29 AntonioRetico SteveTraylen 4068 day(s) ago by AntonioRetico
40864 Wed 10 Sep 2008 WlcgOsgEgeeOpsMinutes2008x09x08 JohnShade SteveTraylen 4086 day(s) ago by JohnShade
40432 Mon 01 Sep 2008 WlcgOsgEgeeOpsMinutes2008x09x01 MaiteBarroso NicholasThackray 4069 day(s) ago by SteveTraylen
40050 Mon 25 Aug 2008 WlcgOsgEgeeOpsMinutes2008x08x25 JohnShade NicholasThackray 4086 day(s) ago by JohnShade
39666 Mon 18 Aug 2008 WlcgOsgEgeeOpsMinutes2008x08x18 SteveTraylen NicholasThackray 4109 day(s) ago by SteveTraylen
39271 Mon 11 Aug 2008 WlcgOsgEgeeOpsMinutes2008x08x11 MaiteBarroso NicholasThackray 4118 day(s) ago by SteveTraylen
38938 Mon 04 Aug 2008 WlcgOsgEgeeOpsMinutes2008x08x04 NicholasThackray MaiteBarroso 4051 day(s) ago by NickThackray
38629 Tue 29 Jul 2008 WlcgOsgEgeeOpsMinutes2008x07x28 JohnShade NickThackray 4131 day(s) ago by JohnShade
38373 Mon 21 Jul 2008 WlcgOsgEgeeOpsMinutes2008x07x21 JohnShade SteveTraylen 4139 day(s) ago by JohnShade
37973 Mon 14 Jul 2008 WlcgOsgEgeeOpsMinutes2008x07x14 JohnShade SteveTraylen 4145 day(s) ago by JohnShade
37459 Mon 07 Jul 2008 WlcgOsgEgeeOpsMinutes2008x07x07 SteveTraylen SteveTraylen 4149 day(s) ago by SteveTraylen
36781 Tue 01 Jul 2008 WlcgOsgEgeeOpsMinutes2008x06x30 JohnShade SteveTraylen 4139 day(s) ago by JohnShade
36456 Mon 23 Jun 2008 WlcgOsgEgeeOpsMinutes2008x06x23 SteveTraylen NicholasThackray 4150 day(s) ago by SteveTraylen
35651 Mon 09 Jun 2008 WlcgOsgEgeeOpsMinutes2008x06x09 SteveTraylen NicholasThackray 4139 day(s) ago by JohnShade
35210 Mon 02 Jun 2008 WlcgOsgEgeeOpsMinutes2008x06x02 AntonioRetico MaiteBarroso 4097 day(s) ago by MaiteBarroso
34793 Mon 26 May 2008 WlcgOsgEgeeOpsMinutes2008x05x26 SteveTraylen NicholasThackray 4150 day(s) ago by SteveTraylen
34181 Mon 19 May 2008 WlcgOsgEgeeOpsMinutes2008x05x19 MaiteBarroso NicholasThackray 4146 day(s) ago by SteveTraylen
33334 Mon 05 May 2008 WlcgOsgEgeeOpsMinutes2008x05x05 JohnShade NicholasThackray 4150 day(s) ago by SteveTraylen
33014 Mon 28 Apr 2008 WlcgOsgEgeeOpsMinutes2008x04x28 SteveTraylen JohnShade 4179 day(s) ago by SteveTraylen
32602 Mon 21 Apr 2008 WlcgOsgEgeeOpsMinutes2008x04x21 JohnShade NicholasThackray 4200 day(s) ago by NicholasThackray
32253 Mon 14 Apr 2008 WlcgOsgEgeeOpsMinutes2008x04x14 SteveTraylen MaiteBarroso 4199 day(s) ago by MaiteBarroso
31870 Mon 07 Apr 2008 WlcgOsgEgeeOpsMinutes2008x04x07 AntonioRetico MaiteBarroso 4199 day(s) ago by MaiteBarroso
31469 Mon 31 Mar 2008 WlcgOsgEgeeOpsMinutes2008x03x31 NicholasThackray MaiteBarroso 4249 day(s) ago by NicholasThackray
30747 Mon 17 Mar 2008 WlcgOsgEgeeOpsMinutes2008x03x18 JohnShade NicholasThackray 4261 day(s) ago by JohnShade
30153 Mon 10 Mar 2008 WlcgOsgEgeeOpsMinutes2008x03x10 SteveTraylen MaiteBarroso 4215 day(s) ago by SteveTraylen
30001 Mon 03 Mar 2008 WlcgOsgEgeeOpsMinutes2008x03x03 SteveTraylen NicholasThackray 4179 day(s) ago by SteveTraylen
23810 Mon 25 Feb 2008 WlcgOsgEgeeOpsMinutes2008x02x25 JohnShade MaiteBarroso 4284 day(s) ago by JohnShade
23809 Mon 18 Feb 2008 WlcgOsgEgeeOpsMinutes2008x02x18 SteveTraylen NicholasThackray 4285 day(s) ago by SteveTraylen
23808 Mon 11 Feb 2008 WlcgOsgEgeeOpsMinutes2008x02x11 FaridaNaz MaiteBarroso 4279 day(s) ago by SteveTraylen
23807 Mon 04 Feb 2008 WlcgOsgEgeeOpsMinutes2008x02x04 DusanVudragovic NicholasThackray 4279 day(s) ago by SteveTraylen
23806 Mon 28 Jan 2008 WlcgOsgEgeeOpsMinutes2008x01x28 AntonioRetico JohnShade 4279 day(s) ago by SteveTraylen
23805 Mon 21 Jan 2008 WlcgOsgEgeeOpsMinutes2008x01x21 FaridaNaz SteveTraylen 4279 day(s) ago by SteveTraylen
23804 Mon 14 Jan 2008 WlcgOsgEgeeOpsMinutes2008x01x14 AntonioRetico SteveTraylen 4279 day(s) ago by SteveTraylen
23803 Mon 17 Dec 2007 WlcgOsgEgeeOpsMinutes2007x12x17 SteveTraylen NicholasThackray 4279 day(s) ago by SteveTraylen
23802 Mon 10 Dec 2007 WlcgOsgEgeeOpsMinutes2007x12x10 NicholasThackray SteveTraylen 4279 day(s) ago by SteveTraylen
23801 Mon 03 Dec 2007 WlcgOsgEgeeOpsMinutes2007x12x03 NicholasThackray SteveTraylen 4279 day(s) ago by SteveTraylen

Open Action Items from Operations Meeting

New action items should just be added in the meeting minutes themselves. There are comments in the minutes to describe how to do this.

Assigned to Due date Description State Closed Notify  

Closed Action Items from Operations Meeting

Assigned to Due date Description State Closed Notify  
Main.SAM 2009-08-31 Activate in production SAM MPI tests
12/10/2009: the MPI tests are in validation (go to action)
2009-10-26 edit
Main.OCC 2010-02-08 Wrong version detection command for the LB service (BUG:61586). This bug duplicates BUG:55482 from 2009-09-09 09:59, so it has gone uncorrected for 3(!) months.

UPDATE AT THE MEETING: This will be fixed in gLite 3.2 but not in gLite 3.1. OCC will follow up.
08/02/2010: There is now a fix for gLite 3.1 and the bug is set to "Fix Certified", so I think this action can be closed. (go to action)

2010-02-08 edit
Main.OCC 2010-02-15 Check MPI test status and request the move to critical next Friday, so from Monday alarms are sent to the regional operation teams

This is now done, started on Monday 15th of February. (go to action)

2010-03-02 edit
EgeeOCCGroup 2007-12-10 GGUS:28099 has been open for two weeks without comment.

Update Feb 11th set to unsolved (gLite Workload); Related to a MW bug BUG:32962 FQAN comparator does not work properly
status: integration candidate (go to action)
2008-02-12 edit
Main.OCC(John) 2008-02-04 Clarify "at risk" downtime & interaction with tools (esp. GridView)
Update Jan 31st: Done. Submitted Savannah bug 33104 against GridView. They fixed the GOCDB synchronizer code (gocdb3_query.php) to handle AT_RISK downtime (intervention) correctly. (go to action)
2008-02-11 edit
Main.OCC(John) 2008-02-04 What to do about FNAL & SAM timeouts?

*Update Jan 31st*: Piotr (Mr SAM) confirmed that site-specific timeouts are not an option. Also, modifying timeouts just for the DPM tests would take a while, and would require agreement from all VOs & ROCs (it would potentially increase the time to detect real DPM problems). One could argue that if the SRM tests are timing out after ten minutes, the SRM is probably not of much use to users at that time either. Therefore, tweaking SAM to mask the problem is not a good solution. Nevertheless, he suggested that FNAL investigate a local workaround, such as increasing the priority of ops monitoring jobs. Joe was notified of this, & we await his feedback.

*Update Feb 18th* More hardware was thrown at the problem and the situation is
resolved. (go to action)

2008-02-25 edit
Main.OCC(Nick) 2008-02-04 How to handle BDII/GOCDB mismatches, and the issue of introducing new sites?

*Update Jan 31st*: Will be discussed by the ROC managers in Lyon next week (Tuesday 5th)

*Update Feb 18th*: Will add a link to minutes of ROC managers meeting.

*Update Feb 25th*:
This is the link: https://edms.cern.ch/file/893655/1/ROC-mgrs-05-02-2008(ARM-11).htm
The conclusion was: Nick to ask the relevant development teams for an estimate of the effort required to implement the automatic removal of entries from the top-level BDII.

*Update Mar 3rd*
Being handled in the ROC managers meeting, closing here. (go to action)

2008-03-04 edit
Main.OCC(Antonio) 2008-02-04 Ensure instructions for publishing storage space reach sites (ATLAS)

*Update Feb 1st* : tickets GGUS:32064 (ROC UKI), GGUS:32065 (ROC Russia), GGUS:32067 (ROC DECH), GGUS:32068 (ROC AP), GGUS:32070 (ROC France) submitted to track the issue

*Update Feb 13th*:
GGUS:32064 (UKI) --> in progress
GGUS:32065 (ROC Russia) --> open. don't allow of queryconf
GGUS:32067 (ROC DECH) --> in progress
GGUS:32068 (ROC AP) --> solved
GGUS:32070 (ROC France) --> child tickets to sites GGUS:32071, GGUS:32072
- GGUS:32072 waiting for reply from Atlas with list of addresses

*Update Feb 18th*:
Ensure instructions about publishing storage reach sites... Lots of tickets submitted; close the item. (go to action)

2008-02-25 edit
Main.OCC(Antonio) 2008-02-04 Request all LHCb sites to provide a detailed SRMv2 status page

*Update Feb 1st* : Find it in the minutes

*Update Feb 11th*: Production sites seem in general not able to provide what was requested. The GGUS ticket GGUS:31800 has been set to 'unsolved' and the issue is being tracked with
http://lblogbook.cern.ch/CCRC08/38 (go to action)

2008-02-13 edit
GridView 2007-12-10 What are the implications of no SAM test results at a site for >24 hours? How does it affect availability/reliability calculations?

*Update 11th Dec:* Gridview team responded, added to next week's agenda (go to action)
2008-02-04 AntonioRetico   edit
Main.OCC 2007-12-10 GGUS:29208 has been open for a number of weeks without comment.

*Update Dec 10th* Will be raised at EMT

*Update Dec 13th* Now a confirmed BUG:32078. Already fixed for an upcoming release. (go to action)
2008-02-04 AntonioRetico   edit
Main.OCC 2007-12-17 SRM SAM tests only run once every two hours. Can this be increased to every hour?

*Update Dec 12th* SRM tests are now running once an hour. (go to action)
2008-02-04 AntonioRetico   edit
Main.OCC 2007-12-17 Any component which goes straight from certification to production, missing out testing in the PPS, should have this clearly stated in the release notes.

*Update Dec 13th* This has been discussed with the Integration & Deployment team, who agree to include this information in the release notes from now on. (go to action)
2008-02-04 AntonioRetico   edit
Main.all, Main.ROCs 2008-03-15 Request to Atlas sites to upgrade WNs to SL4

*15th Feb:* broadcast sent

*Update Feb 18th*: Request Atlas sites to upgrade WNs. Broadcast sent; leave open for a bit, as the deadline is the 15th March. Review 2 weeks before this.

*Update Mar 3rd* Steve to produce data of queues by OS.
http://straylen.web.cern.ch/straylen/tmp/atlas-gluece-by-os.txt

*Update Mar 10th* From Steve:
http://straylen.web.cern.ch/straylen/tmp/atlas-sites-by-os.txt

*Update Mar 12th* Steve to create a finer report, preferably by ROC... (if only that were possible; maybe via the SAM DB)

*Update Mar 19th* Reminder to all sites, time is running out...

*Update Mar 31st* From ATLAS (Alessandro): we have developed a SAM test to see which version of lcg-utils has been installed on the WNs of the ATLAS supporting sites. The results can be seen on the SAM web page by selecting ATLAS VO, CE, CE-sft-lcg-version. The sites that give ERROR in this test have not upgraded to the SRM2-compatible version of lcg-utils.
We hope this helps in following up the action of having the WNs upgraded to SRM2 at all ATLAS supporting sites.

*Update Apr 21st* A GGUS ticket should be opened against all ROCs to follow-up this issue with sites. Nick knows how to clone a ticket...

*Update May 5th* As soon as ATLAS can confirm that they've opened a GGUS ticket (cloned for all ROCs), we can close this item.

*Update May 19th* This action can be closed (go to action)

2008-05-22 edit
Main.all, Main.ROCs 2008-03-30 Request to Atlas sites to increase the shared sw installation area to 100 GB

*15th Feb:* broadcast sent

*18th Feb:* Raised at operations meeting, too soon after broadcast for any feedback.

*19th Mar:* Ongoing, but not obvious how to check compliance.

*31 Mar:* Ongoing. ATLAS will look into building a SAM test.

*Update Apr 21st* A GGUS ticket should be opened against all ROCs to follow-up this issue with sites. Nick knows how to clone a ticket...

*Update 5th May* Now a GGUS ticket, closing. (go to action)

2008-05-06 edit
Main.ROCs 2008-03-10 Is there a need to include DPM Oracle in the gLite distribution alongside DPM MySQL?
ROC Managers to check with their respective sites.

*Update 3rd March*
Closed as being tracked by ROC managers. (go to action)

2008-03-04 edit
Main.ROCs 2008-03-10 Input for consolidated prioritization of 64-bit porting of gLite components is requested. Feedback to Oliver Keeble, please.

*Update 12th March*
Received feedback from Italy, Southwest, and a few others. Close action. (go to action)

2008-03-12 edit
OliverKeeble 2008-03-10 Consolidated prioritization list for 32-bit releases will be provided by Oliver.

*Update: 3/3/08*
Oliver added a "priorities" section to the Node Tracker page:
https://twiki.cern.ch/twiki/bin/view/EGEE/Glite31NodeTracker

Closed. (go to action)

2008-03-04 edit
EgeeOCCGroup 2008-03-31 Broadcast that gLite 3.0 lcg-RB should henceforth be considered obsolete and unmaintained. It is replaced by WMS (preferably on SL4). Include link to user documentation in the broadcast.

*Update: 3/3/08*
Announcement should be made at the time of the release of the WMS/LB on SL4 (TBD), saying support will be dropped for the lcg-RB in 2(?) months.

*Update 12 March 2008:* No change.

*Update 31 Mar:* The RB will be obsoleted once the SL4 version of the WMS is available.

*Update 17th Apr:* Will be released in two months, closing. (go to action)

2008-04-17 edit
Main.SAM, Main.team 2008-05-27 Need to consider what SAM, the alarm system and the CIC portal should do to mitigate against a high-load CE.

*Update March 12th*
Had a discussion with Ulrich and also submitted a new test for SAM. Whether it is
a good idea needs some thought, but it would be a non-critical test on the
"GlueCEStateStatus: Production" attribute that the then-critical CE tests would depend on.
This is the same logic as the existing SE free space tests.
https://savannah.cern.ch/bugs/?34443

*Update 31 March:* Request for new SAM sensor passed to SAM team.

*update 21 April:* On SAM work-list (Savannah). John thought that the item could be closed as far as the ROC managers are concerned, but Kostas was worried that the issue risked being forgotten. He suggested the possibility of a pending state for items that get transferred to other tracking mechanisms. Nick will think about it.

*Update 5th of May* Now on the SAM worklist. Nothing changed for now, ignore for 3 weeks.

*Update 2nd of June* No progress recorded.

*Update 11th June* This is present as BUG:34443 anyway so close here. (go to action)

2008-06-11 edit
Main.OCC 2007-03-05 Example Action Item (go to action) 2007-03-06 SteveTraylen   edit
Main.OCC 2007-03-05 Extract from the information system the list of WMS 3.0
Update from Steve:
Does not look too bad, this is only those who are publishing at all.

Those with old WMS (SL3 in fact)

EENet (Estonia)
ITEP (Russia)
RTUETF ( Latvia)
UNI-FREIBURG (Germany)

Those with new WMS (SL4 in fact)

AEGIS01-PHY-SCL
Australia-ATLAS
BY-UIIP
CERN-PROD
CESGA-EGEE
CGG-LCG2
CNR-PROD-PISA
CY-01-KIMON
CYFRONET-LCG2
DESY-HH
FZK-LCG2
GR-01-AUTH
GRIF
HG-06-EKT
INFN-CNAF
INFN-PADOVA
ITEP
JINR-LCG2
KR-KISTI-GCRT-01
NCP-LCG2
pic
prague_cesnet_lcg2
RAL-LCG2
RO-03-UPB
RTUETF
RU-Phys-SPbSU
ru-PNPI
SARA-MATRIX
Taiwan-LCG2
TR-01-ULAKBIM
UKI-SCOTGRID-GLASGOW
Uniandes
VU-MIF-LCG2

Note there may well be other WMS not included by siteBDIIs out there we know nothing about.

Update 10/9/08: The four sites running WMS on SL3 were asked to upgrade ASAP. (go to action)

2007-03-06 SteveTraylen edit
SteveTraylen 2008-05-26 The T0 FTS server is configured with 0 retries by default, while T1s have 3 retries by default. This complicates the Atlas workflow: if a transfer fails, we try to find another source with the same file. Could we have 0 retries in all FTS servers at T1s (this affects all channels, all VOs)? What is the position of the other LHC VOs?
- Not a problem for LHCb
- Ron (SARA): I thought this could be set up per channel, per VO agent. To be checked with Gavin & co

* Answer from Gavin:

The ‘retry’ count is a VO policy, so needs to be set in the relevant VO agent config for the FTS server (the default is 3 retries separated by minimum 10 minutes).

I know CMS’ Phedex prefers to fail fast (and see the error as early as possible), so they have asked T1 sites to set the retry to 0. Phedex then retries externally (i.e. with another FTS job for the failed files).

LHCb and ALICE I think are still set to the default.

See: https://twiki.cern.ch/twiki/bin/view/LCG/FtsYaimValues20

Contact fts-support@cernNOSPAMPLEASE.ch in case of problems.
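
The two retry styles discussed above can be contrasted in a short sketch. It is not FTS or Phedex code: the function names, the callable arguments and the injected `sleep` are hypothetical, and only the defaults (3 retries, at least 10 minutes apart) come from Gavin's answer.

```python
import time


def transfer_with_retries(attempt, retries=3, min_delay_s=600, sleep=None):
    """Server-side style: retry the same source up to `retries` times.

    `attempt` is a hypothetical callable returning True on success.
    Returns the number of retries that were needed, or None on failure.
    """
    sleep = sleep or time.sleep
    for n in range(retries + 1):
        if attempt():
            return n
        if n < retries:
            sleep(min_delay_s)  # wait at least 10 minutes between tries
    return None


def transfer_fail_fast(sources, attempt_from):
    """Client-side style: 0 retries per job, move on to the next replica.

    `attempt_from(source)` is a hypothetical callable returning True on success.
    Returns the source that worked, or None if every source failed.
    """
    for source in sources:
        if attempt_from(source):
            return source
    return None
```

With retries set to 0 on the server, a failure is reported immediately and the client decides where to try next, which is the behaviour requested for the Atlas and CMS workflows.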

*Update June 11th* Steve should submit tickets to all FTS sites.

*Update June 13th* GGUS:37415 submitted and child tickets sent to ROCs of each Tier1.
Review in two weeks time.

*Update June 20th* GGUS:37415 has been responded to by all FTS instances that
the changes have been made.... Except for:

For USCMS-FNAL-WC1 in GGUS:37428
For BNL-LCG2 in GGUS:37427

Both will be contacted again this week.
*Update June 30th* Steve will escalate, two U.S. sites are problematic.

*Update July 7th* BNL and Fermi have now responded that they made the
configuration change. Action item to be closed after next meeting. Steve (go to action)

2007-03-06 SteveTraylen edit
SteveTraylen 2009-02-02 Check VO-card setting for WN local disk space requirements for all HEP VOs.

Reviewed 3rd February, Alice = 10 GB, Atlas = 15 GB, CMS = 10GB, LHCb = 2 GB.

All LHC VOs specify values for WN disk space. Close this action, if you see particular
VOs exceeding this then submit GGUS tickets for the VO.
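
A site could compare observed usage against these values with something like the sketch below. The per-VO numbers are the ones quoted above; the function name and the shape of the `observed_gb` input are hypothetical.

```python
# WN local disk space requirements (GB) as reviewed on 3rd February.
VO_CARD_GB = {"alice": 10, "atlas": 15, "cms": 10, "lhcb": 2}


def vos_over_limit(observed_gb, limits=VO_CARD_GB):
    """Return the VOs whose observed scratch usage exceeds the VO-card value.

    `observed_gb` maps a VO name to the disk space (GB) its jobs were seen using.
    VOs without a VO-card entry are ignored.
    """
    return sorted(
        vo for vo, used in observed_gb.items()
        if vo in limits and used > limits[vo]
    )
```

Any VO returned by `vos_over_limit` would be a candidate for a GGUS ticket.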

Close this after next operations meeting. (go to action)

2009-02-12 edit
Main.LHCb 2008-03-17 LHCb and Kostas to contact one another about middleware version tickets within SouthEast region.


** solved:
LHCb runs a custom SAM test that checks the version of lcg-utils and spots sites with an obsolete version installed.

The person in LHCb following these tickets twice submitted 24 tickets for 24 different sites because his first attempt (using the mail ticketing system of GGUS) failed to return the GGUS reference. For your information, this problem was due to a missing mapping between the submitter's mail address (used by GGUS for submissions of tickets via mail) and his certificate. (go to action)

2008-03-05 edit
Main.Marcin 2007-03-19 Marcin to produce a list of examples where a site failure is attributed to a central service failure.

*Update 19th March*: Marcin supplied some examples. Problem is well understood, solution is less obvious. John to work with SAM & GridView team. (go to action)

2008-03-21 edit
Main.SAM 2007-03-19 Sam team to investigate promptly the BDII2SRM script to recognise GlueServiceType/Version SRM/1.10 correctly. GGUS:33726, BUG:31940

*Update 13th March 2008*
BDII2SAM script now fixed, action should be closed following next meeting.

*Update 31 March:* Script is fixed. Close. (go to action)

2008-04-02 edit
GridView 2007-04-27 Please look into GGUS:33850 concerning transparent downtimes affecting site availability.

*update: 20/3/08* GridView team has fixed the bug (CVS tag gridview-synchronizer-20080318).

*update: 7/4/09*: ticket and action re-opened because also gstat needs a fix. (go to action)

2008-03-21 edit
SteveTraylen 2008-04-14 Define with gstat (roc-dev@listsNOSPAMPLEASE.grid.sinica.edu.tw) the new value to be set in the list of allowed OS describing the Scientific Linux 5 run at the site

Update 17th April
Min is looking at it, but the site should really submit a ticket, as per the instructions on
http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_the_OS_name (go to action)

2008-05-06 edit
Main.OCC 2008-06-10 Check with LHCb the status of the development of Dirac3 (version of the submission engine interfaced to WMS)

Update 17th April
Will be released in at least 2 months, close action item for now. (go to action)

2008-04-17 edit
AndreaSciaba 2008-04-21 Verify and document in the User Guide the option to configure the GFAL client to use multiple BDIIs

Update 17th April, Maite will check.

Update 19th May, Andrea changed this on the same day the action was raised. This action can be closed. (go to action)

2008-05-22 edit
Main.UKRoc 2008-04-14 Clarify the scope of the issue reported in WlcgOsgEgeeOpsMinutes2008x04x07 about continuous certificate requests. Is it a general comment or related in particular to the CIC portal?

Update 17th April, Gilles has done something. (go to action)

2008-04-17 edit
Main.CERNROC 2008-05-13 Follow up with YerPhi site to resolve or suspend site.

Update 5th May.
CERNROC to provide an update next week once they have been CIC on duty for a
week and cleaned everything up.


Update 9th May.
YerPhi is now in state suspended and all existing COD tickets have been closed.
This item should be closed after next week's meeting.

Update 19th May, this action can be closed (go to action)

2008-05-22 edit
Main.Atlas 2008-05-13 Atlas to provide details of tests they are running. Atlas have provided the name of the test. CE-sft-vo-swspace . This item should be closed next week.

Update 5th May: Small amount still to do but progress has been made. Revisit next week.

Update 19th May: this action can be closed (go to action)

2008-05-22 edit
SteveTraylen 2008-05-27 While GFAL works with a multi-valued LCG_GFAL_INFOSYS variable, there are other bits of software that may not, e.g. glite-service-discovery, lcg-infosites, lcg-info, ... These all need to be checked for their support level. Currently assigned to Andrea but someone else should really do this... (Perhaps Steve?)
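
To illustrate what supporting a multi-valued LCG_GFAL_INFOSYS involves, a client has to split the variable and try each endpoint in turn. The sketch below assumes a comma-separated list of host[:port] entries and a default port of 2170. The helper names and the `query` callable are hypothetical, and none of this is taken from the GFAL sources.

```python
import os

DEFAULT_PORT = 2170  # assumed default BDII port


def parse_infosys(value):
    """Split a comma-separated 'host[:port]' list into (host, port) tuples."""
    endpoints = []
    for entry in value.split(","):
        entry = entry.strip()
        if not entry:
            continue
        host, _, port = entry.partition(":")
        endpoints.append((host, int(port) if port else DEFAULT_PORT))
    return endpoints


def first_working(endpoints, query):
    """Return the result from the first endpoint whose query succeeds.

    `query(host, port)` is a hypothetical callable that raises OSError
    when the endpoint cannot be reached.
    """
    errors = []
    for host, port in endpoints:
        try:
            return query(host, port)
        except OSError as exc:
            errors.append(f"{host}:{port}: {exc}")
    raise RuntimeError("no BDII reachable: " + "; ".join(errors))


def endpoints_from_env(environ=os.environ):
    """Read the endpoint list from LCG_GFAL_INFOSYS (empty list if unset)."""
    return parse_infosys(environ.get("LCG_GFAL_INFOSYS", ""))
```

A tool that reads only the first entry, or that passes the whole string to a resolver, would fail in exactly the way this action item is meant to detect.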

*Update Monday June 2nd*
No progress made.

*Update Friday June 6th*
3 pieces of software identified as affected.
lcg-infosites BUG:37572 , lcg-mon-stdout BUG:37571 and glite-sd-query BUG:37569

All are submitted as bugs and this item should be closed after the next monday meeting.
Steve

*Update June 11th* Savannah tickets all submitted, close here. (go to action)

2008-06-11 edit
NicholasThackray, AntonioRetico 2008-06-06 Check with CMS VO Cards about WMS and Pool account support.

Update 28-May-08: CMS confirms that the use of pool accounts for SGM has proved not to work in many cases. The main problem is that the ACLs on the files are set by users and, if different accounts are used, one software manager could act on (e.g. uninstall) packages installed by another.

On the other hand this is not relevant for WMS, where the distinction of pools per VO is not needed.

The conclusion is that the recommended configuration of the accounts has indeed to be different between CEs and WMS, at least as far as CMS is concerned.
As far as I am concerned this action can be closed.

Antonio (go to action)

2008-06-02 edit
SteveTraylen 2008-06-06 Check if CRL lifetimes are monitored anywhere?

*Update 2nd June*
From Romain: There is a SAM test called "CE-wn-sec-crl". General results are public, but detailed results are available only to the ROC security contacts + SAM team.

Follow up question for Romain. CE-wn-sec-crl monitors CRL status on the WNs themselves.
What was being asked for was central monitoring of the CA's CRL URLs.... It would make
for an easy rrd plot.

*Update 9th June*
There is central monitoring of CRLs here http://nagios.eugridpma.org/.
Also I have requested that the WLCG Monitoring Group considers getting these
to sites via its alarm/nagios/messaging framework. BUG:37632.

*Update 11th June* Ask the CODS to look at the nagios alarms for CRLs twice
a week. (go to action)

2008-06-26 edit
SteveTraylen, EgeeSiteRepsGroup 2008-06-06 Look into why LHCb's files in /tmp are being deleted.

The reason is that python's tarfile unpacks files with a --preserve-atime so the
files are old as far as tmpwatch is concerned. A way forward is being discussed.
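
As an illustration only (the minutes do not record which way forward was chosen), one possible workaround is to reset the timestamps of the unpacked files to the current time, so that an age-based cleaner such as tmpwatch treats them as new. The helper name `extract_fresh` is hypothetical.

```python
import os
import tarfile
import time


def extract_fresh(archive_path, dest_dir):
    """Unpack a tarball, then stamp every extracted entry with the current time.

    Hypothetical helper: tarfile restores the timestamps stored in the
    archive, so files can look old to an age-based cleaner right after
    they have been unpacked.
    """
    now = time.time()
    with tarfile.open(archive_path) as tar:
        members = tar.getmembers()
        tar.extractall(dest_dir)
    for member in members:
        path = os.path.join(dest_dir, member.name)
        # Skip symlinks (os.utime would follow them) and anything not extracted.
        if os.path.islink(path) or not os.path.exists(path):
            continue
        os.utime(path, (now, now))  # (atime, mtime)
    return [m.name for m in members]
```

Calling `extract_fresh("software.tar", "/tmp/job")` would unpack the archive and leave every file with a current access and modification time; both paths are placeholders.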

Update 2-Jun: discussion in minutes --> closing (go to action)

2008-06-02 edit
MaiteBarroso 2008-06-06 Check with Gridview/SAM if the tier1 availability for 20th -> 26th May can be recalculated given the failure of the ATLAS sam UI from 20th to the 26th May. Assigned to Maite for now.

Update 2-Jun: Discussed in minutes --> closing (go to action)

2008-06-02 edit
JeremyColes(UKI) 2008-06-09 follow-up reported site UKI-LT2-QMUL (transferred to Political Instance by COD on 2-Jun-08).

30/6/08 - Jeremy reckoned this action can be closed. (go to action)

2008-07-01 edit
RonTrompert(NE) 2008-06-09 follow-up reported site VGTU-gLite (transferred to Political Instance by COD on 2-Jun-08)

30/6/08 Site is still failing SAM tests and should be suspended.

3/07/08 Ron reported that the site has now reacted and fixed the situation, close this action after the next operations meeting. (go to action)

2008-07-08 edit
Main.ROC_France 2008-06-09 follow-up the following issue reported by ROC France: With our UIs we got some problems with Python for several VOs because those VOs use their own Python version (> 2.3.x). Unfortunately, the UI installation provides standard python2.3 libraries within the externals directory, and sets the PYTHONPATH accordingly. As a result, to be able to use their own Python installation, VOs must update the PYTHONPATH variable appropriately to ensure that the right versions of the required libraries are picked up first. Make sure also that you call the right python binary.

*Update 11th June* Nick will look into this.

*Update 21st June* Waiting for Nick

*Update 28th July* Response from SA3 - _The tarball is produced to work with SL4, so python 2.3 has to be the default. To fully support python 2.5 (for example), you need to distribute the interpreter, reconfigure the environment and, ideally, have all your language extensions recompiled against the new python API. We are looking into how to do the last part, but the first two things are up to the site or VO._

Update 11th August: this was raised by a VO in France. Is the answer given by SA3 OK? How do we move on from here? Helene will pass the feedback to the relevant people.
Update 1st September: the action can be closed; in the end the real solution was in a Savannah bug and it was a problem with the YAIM environment (go to action)

2008-09-01 edit
JudiNovak 2008-06-09 Modify the SAM unavailability list on twiki adding a section for availability of clients run by the VOs

*Update 11th June* This will be followed up with John and Judit immediately after the meeting on the 9th. (go to action)

2008-06-26 edit
SteveTraylen 2008-06-18 Check GGUS:36373 concerning advertising queues for production roles only.

July 2nd, got in touch with the job priorities work group to see if they can give a good
example of what should be done.

July 7th, there is now a massive thread which does contain the answer. The answer must
be extracted and documented next.

July 16th. I've now written How_to_publish_queues_with_access_restricted_to_a_FQAN but it is unclear
to me what is wanted from the tickets that are assigned to me.

*Update 21/7/08* Now that Wiki page exists, Steve would like to close this item. Any problems should result in new tickets! (go to action)

2008-07-21 edit
Main.SAM 2008-06-30 Upgrade lcg-utils on SAM submission host.

Latest version of lcg_utils installed in SAM validation testbed & used against this site.
Previous version failed with: protocol not supported by Storage Element
Latest version fails with: CGSI-gSOAP: Error reading token data header: Connection reset by peer

Problem seems to be with the data that the site provides to the Information System. (go to action)

2008-07-08 edit
SteveTraylen 2008-06-30 Submit somewhere request for better downtime publishing as proposed by atlas sometime ago.

*Update 1st July* https://savannah.cern.ch/support/?104871 now submitted.

Steve (go to action)

2008-07-08 edit
JohnShade 2008-07-10 CIC portal uses the security certificate of a different site. Cyril will follow-up. John will submit a GGUS ticket.

Update: GGUS:38050 Problem is that Firefox doesn't recognize the French Certificate Authority. Solution is simply to define an exception in Firefox. (go to action)

2008-07-03 edit
SteveTraylen 2008-07-10 Steve to look at GGUS:37334 and escalate to someone.

2nd July - This may be resolved by a fix to BUG:37008, waiting for clarification.

8th July - Mentioned in the EMT yesterday, the fix is in an upcoming patch and
also a bug will be submitted to link to it. Add bug before next week and close.

14th July - Bug now submitted BUG:38820 . As I understand it this is already fixed in
an upcoming release. Close the action here after today's meeting since the BUG is
now present. (go to action)

2008-07-14 edit
Main.UKRoc 2008-07-10 UK/I ROC to look at GGUS:37890.

*14th July 2008* Jeremy will take a look.

*21st July 2008* BUG:38320 which was the related item is now fixed and closed.
This item is, consequently, also closed. (go to action)

2008-07-21 edit
SteveTraylen 2008-07-10 Steve should submit a GGUS requesting that gstat monitors for LFCs not publishing as compared to GOCDB.

*2nd July* GGUS:38053 now submitted, leave action item until a response is given. (go to action)

2008-07-08 edit
Main.OCC 2008-09-01 Follow up on GGUS:34338. Concerns gstat sanity error at FNAL.

Update 8th August. Solved: as was suggested in April, the CEs should not be present in the GOCDB. GOCDB contains a list of EGEE siteBDIIs, and services not under those siteBDIIs are not at the site as far as GOCDB/gstat or EGEE is concerned. Steve.

Close this action after next meeting. (go to action)

2008-08-11 edit
Main.OCC 2008-08-18 SAMAP is giving critical errors rather than warnings when sites do not update their CA RPMs 7 days prior to the deadline for update.

Update 25th August SAMAP will follow-up "later"
Update 8/9/08: Nick will follow-up.
Update 13th October The tool development team has fixed the bug. (go to action)

2008-10-17 edit
Main.OCC 2008-08-25 Find the probable release date of the CREAM CE.

Update 25th August: This will be released in the next update to gLite 3.1 - within 1-2 weeks.
1st September: After the update at today's meeting, this action can be closed: the EMT made the decision to delay the deployment of the CREAM CE (the certified patch). This is because a non-ICE-enabled WMS could accidentally match the CREAM CE and cause a submission failure. While waiting for the ICE-WMS to be deployed, as a workaround, CREAM will be released with a GlueServiceStatus = ‘Production’, to be changed again later. One issue is the old version of WMS on SL3 (unsupported). As these will not be integrated with ICE, they would fail to submit once the CREAM CE is advertised again in real production mode. In order to size this issue up we would like to get from the WLCG EGEE Operations Meeting an estimate of the number of old SL3 WMS still in production. (go to action)

2008-09-01 edit
Main.OCC 2008-08-18 Make the owners of the CA RPM release process aware of the issues raised by ROC France.

Update August 19th: Maite has some news?

Update 1st September: SAM agrees to extend the 7-day period in this specific case: the CA RPMs are not put in the repository in the 1 day scheduled for this. Technically it is feasible and already implemented. See diagram and explanations here:
https://twiki.cern.ch/twiki/bin/view/LCG/SAMSensorsTests#CE_sft_caver

In short, the diagram shows that it is possible to configure:
- time-stamp from which countdown of timeout starts
- delay of warning
- timeout before sites will get CRIT error
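
The three configurable values listed above can be read as a simple state calculation. The sketch below is only an interpretation, not the SAM sensor's actual code: the function name, the status strings `OK`/`WARN`, and the assumption that both the warning delay and the timeout are measured from the same countdown start are all hypothetical.

```python
from datetime import datetime, timedelta


def ca_release_status(now, countdown_start, warning_delay, crit_timeout, site_up_to_date):
    """Return the status a site would get for its CA RPM version.

    Assumption: both warning_delay and crit_timeout are counted from
    countdown_start, and a site that has already updated is always OK.
    """
    if site_up_to_date:
        return "OK"
    elapsed = now - countdown_start
    if elapsed >= crit_timeout:
        return "CRIT"
    if elapsed >= warning_delay:
        return "WARN"
    return "OK"


# Example values: the 7-day timeout comes from the minutes,
# the 3-day warning delay is made up for illustration.
start = datetime(2008, 9, 1)
status_day_4 = ca_release_status(
    start + timedelta(days=4), start, timedelta(days=3), timedelta(days=7), False
)
```

Extending the period for a late CA release then amounts to moving `countdown_start` or lengthening `crit_timeout`, without touching the rest of the logic.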

Update 10/9/08: Although Nick doesn't understand the text, he said that the ticket can be closed (SAM implemented what was asked). (go to action)

2008-09-12 edit
Main.ATLAS 2008-08-18 ATLAS to submit a GGUS ticket detailing the problems of slow response of the GOC DB seen in the evenings.

Update 19th August - Slow response no longer obvious, will close and reopen if need be. (go to action)

2008-08-19 edit
Main.OCC 2008-08-18 Submit a request to the FTS developers to provide suitable information providers for publishing the FTM end-points.
August 8th 2008, bug now submitted: GGUS:39906

The action can be closed (go to action)

2008-08-11 edit
Main.Alessandro_di_Girolamo 2008-08-18 Send details to Maite (Maria . Barroso . Lopez @ cern . ch) and Jeremy Coles (j . coles @ rl . ac . uk) of who to contact regarding the agenda of the ATLAS events at CERN during the week 25-29 August. (go to action) 2008-08-19 edit
NickThackray 2008-08-11 On the request of LHCb, escalate the bug BUG:39641 at the EMT.

the action can be closed (go to action)

2008-08-11 edit
SteveTraylen 2008-08-20 Check if KCA is still needed in the lcg-CA CA set.

Was raised at this week's LCG MB. Fermilab representatives are checking internally if it is still needed.

Update 19th of August: The LCG MB meets today and this will hopefully be resolved.

Update 25th August: The KCA will soon be officially approved as a trusted CA. Also, it is being used by the CDF VO. Therefore, KCA will remain in the list of CAs. (go to action)

2008-08-27 edit
Main.OCC 2008-09-01 Remind sites that the shared area is a critical service.

Update 10/9/08: Nick sent an EGEE broadcast about this. In fact, he sent two, to explain that this only concerned sites supporting VOs that had explicitly mentioned the shared software area in their respective VO cards. (go to action)

2008-09-12 edit
Main.OCC 2008-11-17 OCC to put an enhancement request into the GOCDB and CIC Portal for the following:
EGEE downtime announcement procedure:
1. Announcement of scheduled downtime with a mail "Announcement" at least 24h in advance as in the MoU.
2. Start of downtime (scheduled and unscheduled) as of the time when it starts with a mail "Start" (with correct time!)
3. End of downtime: mail "End" (with correct time)
Update 3 Nov: OCC has entered this enhancement request into the GOC DB "shopping list" in Savannah (https://savannah.cern.ch/support/?105977).
Close the item. (go to action)
2008-11-07 edit
Main.LHCb 2009-11-10 From UK/I but general, why are so many sites failing LHCb SAM tests. Please can LHCb give a summary. Roberto Santin. will check again on 11.11.2008. (go to action) 2008-11-19 edit
JohnShade 2009-11-10 Check there is progress on GGUS:42341

Update: Current monitoring architecture does not allow one service to be on multiple nodes/sites. A new database schema is being worked on (based on service end-points), but the restriction of binding a service to a specific site will probably remain. (go to action)

2008-11-12 edit
RocNorth 2009-11-10 Check IPTA-LCG2 for progress on GGUS:42015. This now led to suspension action on 2008-11-10. Closing this one. (go to action) 2008-11-12 edit
RocRussia 2009-11-10 Check RU-Phys-SPbSU for progress on GGUS:40521. RocRussia did suspend. (go to action) 2008-11-12 edit
DianaBosio 2008-12-08 Follow-up with Beijing site as per GGUS:40700 (go to action) 2008-11-19 SteveTraylen   edit
Main.ROC_NE 2008-12-08 Suspend ITPA site as per GGUS:42015.

*Update 17th November*, site has now responded.
North East ROC to respond next week.

*Update 27th November*, site has now corrected the problem, closing this. (go to action)

2008-11-27 edit
MariaDimou 2008-12-08 Follow-up escalation as per GGUS:42981. LCG CE update 33: thousands of defunct globus-gma (go to action) 2008-11-19 edit
MariaDimou 2008-12-08 Follow-up escalation as per GGUS:42999. WMS 3.1 update 34: ISM stops working (go to action) 2008-11-19 edit
NicholasThackray 2008-12-02 Nick to add hyperlink to agenda and minutes template for the Alcatel meeting call back.

Update: link was always there, but now uses a font for the blind. (go to action)

2008-12-02 edit
Main.OCC 2009-01-31 OCC to send broadcast to sites requesting to upgrade the GFAL version so it is higher than 1.10.6
More details about the issue can be found here: https://gus.fzk.de/ws/ticket_info.php?ticket=43994

Update 19/1/2009: Biomed GFAL version problem, Maite will send broadcast after the meeting (seems some sites are still on SL3 and need to upgrade the O/S as well as GFAL!)

Update 26/1/2009: no broadcast seen, OCC to follow up.

Update 2/2/2009: broadcast not sent, problem being followed up with sites through GGUS. Agreement to close item. (go to action)

2009-02-03 edit
Main.Akos 2009-01-31 The Data Management team (Akos) to provide a version of the LFC without list replica (related to the old GFAL version problem reported by Biomed)

Update 19/1/2009: (mail from Akos):
We have examined the issue and it does not look like a security problem, but a resource limitation: the number of threads in an LFC instance limits the number of clients that can connect concurrently and the Biomed usage pattern exceeds that limit.
When the clients would finish their work, LFC would be responsive again.

The same problem would occur with other iterator like operations, like opendir/readdir/closedir.

Removing these operations would cause old clients to fail, however it would not solve the problem, so in my opinion the upgrade of lcg_utils is the right solution.

Unfortunately nobody has contacted us from the Biomed community regarding the possibility and context of a special build, so we did not progress on that side.

Update 26/1/2009: Can be closed. (go to action)

2009-01-27 edit
Main.Biomed 2009-02-28 Long term solution to the old GFAL version problem reported by Biomed: develop VO specific SAM test to detect this, and then exclude the sites with the wrong version

Update 19/1/2009: Long-term solution could be SAM tests, or adding GFAL version collection to job-wrapper scripts. (go to action)

2009-01-27 edit
Main.SAM 2009-01-31 SAM and Atlas (Alessandro) to get together to understand how SAM-Atlas deals with sites with no close SE defined and see if this can be used in SAM-operations

Update: 19/1/2009:
The outcome of the get-together was:

>> Not having SE affects on passing by site RM SAM tests - those tests take closest SE (default).
This is incorrect – the defined SE doesn’t have to be at the site!

>> Also setting up site in such situation is not possible because yaim require SE.
Correct, but again the SE doesn’t have to be local to the site.

>> In case of putting SE in Scheduled downtime, site have to put also CE into downtime (otherwise will not pass RM tests) or chose (lack in procedures) other SE (from other site).

This is correct, and the only real issue. ATLAS doesn’t use Replica Management tests, but believes that they should be part of the ops infrastructure tests (which are more extensive). There may be a case for making the replica management tests non-critical, but they’ve been critical for two years now and most people seem happy with this.

The way for a site to change the defined SE is to modify the variable VO_OPS_DEFAULT_SE in the WNs’ site-info.def files. (go to action)
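
A minimal sketch of that change, assuming site-info.def holds plain shell-style `NAME=value` lines: the function below rewrites (or appends) the variable in the file's text. The function name and the host names in the usage example are hypothetical, and nothing here reconfigures a node by itself.

```python
import re


def set_default_se(site_info_text, se_host, variable="VO_OPS_DEFAULT_SE"):
    """Return site-info.def content with `variable` set to `se_host`.

    Assumes one shell-style NAME=value assignment per line.
    """
    new_line = f"{variable}={se_host}"
    pattern = re.compile(rf"^{re.escape(variable)}=.*$", re.MULTILINE)
    if pattern.search(site_info_text):
        # Replace the existing assignment in place.
        return pattern.sub(lambda _match: new_line, site_info_text)
    # Otherwise append the assignment on its own line.
    needs_newline = bool(site_info_text) and not site_info_text.endswith("\n")
    separator = "\n" if needs_newline else ""
    return f"{site_info_text}{separator}{new_line}\n"


original = "SITE_NAME=my-site\nVO_OPS_DEFAULT_SE=old-se.example.org\n"
updated = set_default_se(original, "se.other-site.example.org")
```

Here `updated` points the ops VO at an SE hosted by another site, which is the case the discussion above says is allowed.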

2009-01-27 edit
Main.CERN-ROC 2009-01-31 Check of existing cases of sites only hosting core services, without site services. This is to support a new site RedIRIS in SWE ROC

Update 19/1/2009: CERN ROC to check sites with only core services – no progress.

Update 2/2/2009: New SWE site RedIRIS will only host core services (BDII, WMS, etc.)

Problems until now:

1) GIIS performance error due to: GIIS Old Entries Found: 6 - ERROR
- This will make the SAM test gperf fail.

2) No Grid Version published: GridVersion: *NOTE* could not find valid LCG version
- This is just a warning in GSTAT at this moment

The other tests seem to work; only the gperf error is critical.

Update 12th February - Steve will take a look to understand what this is about.

Update 19th February - Steve - Confused , there is no RedIRIS site in gstat? http://gstat.gridops.org/gstat//SouthWesternEurope.html

Update at the meeting - Kai will check.

Update 3rd March: The gstat errors are caused by the WMS publishing only static information. The new info provider, just released, publishes dynamic information,
so this will fix itself. (go to action)

2009-03-03 edit
NickThackray 2009-02-09 Ask SA3 for a list of library packages needed for 32 to 64-bit migration.

*Update 12th Feb* - There is no list. There is a list per VO on the VO Cards; we may try to produce a common list. What next?

*Update at the meeting* - VOs will definitely have to maintain a list of the libraries they need, in their VO ID card.
Item closed. (go to action)

2009-02-27 edit
AllROCs 2009-02-09 All T1s to check and update list of FTM end-points. To be sent to Nick

12th Feb, TWiki page has been created. (go to action)

2009-02-12 edit
Main.ROCSE 2009-02-27 RO-03-UPB has been escalated to operations meeting for possible suspension ROC: SEE; GGUS:45038

Feb 23rd Ticket is now solved, site in quarantine. Close this action item after next operations meeting. (go to action)

2009-02-27 edit
Main.Nick 2009-02-27 Nick to check on CE status with respect to gLite 3.0/3.1 and Condor.

Update at the meeting - Confusion as to what this action was about. Whether Condor is supported on the gLite 3.1 LCG CE. Nick will follow up.

Update 27 Feb 09 - In theory the gLite 3.1 LCG CE should support the Condor batch system. Instructions on how to set it up are here: https://twiki.cern.ch/twiki/bin/view/EGEE/BatchSystems. If a site has problems, please submit a GGUS ticket and CC neissner@picNOSPAMPLEASE.es.

Update 9 Mar 09 - closing (go to action)

2009-03-10 edit
JohnShade 2009-03-09 When an individual service at a site is marked as "not in production" in the GOCDB, but the site is "in production", SAM continues to test the service. This is not the intended functionality. Check if there is a bug outstanding on this already, and if not, create one.

Update 27/2/09: It turns out that GridView does not synchronize on that particular GOCDB field, so it isn't available to SAM. The recommended workaround is to create a scheduled downtime - tests will still run, but no tickets will be raised. The requested functionality will be in the new Aggregated Topology Provider, and GOCDB will have a production attribute associated with each service.

Update 9/3/09 - closing during meeting (go to action)

2009-03-10 edit
AntonioRetico 2009-03-16 Follow-up with EMT the re-prioritisation of PATCH:2784 and possibly increase it to high

UPDATE 20-Mar: I discussed with the EMT Coordinator. There is a long list of services which now have priority both in certification and release preparation. They welcomed the idea of a pilot service to be run at some sites, which by providing real-usage records would help them make the certification faster. I am preparing a request for sites to join this activity, to be presented at the next OPS meeting. (go to action)

2009-03-18 edit
Main.All 2009-03-23 Note all problems linked to CERN outage of the 19th

Update 23/3/09: other than some SAM alarms due to temporary glitches with the central LFC and Top-level BDII, no problems were noted. (go to action)

2009-03-24 edit
AntonioRetico 2009-03-30 Check with EMT about plans for FTS with credentials.

Last meeting agreed to close this item, but it was not done at the time. (go to action)

2009-04-20 edit
DianaBosio 2009-03-30 Check validity of CERN & FNAL FTM points advertised in Wiki.

UPDATE 25/3/2009
Two GGUS tickets have been opened
for CERN 47367
for FNAL 47368

Update 20/4/09: This should have been closed last week; issue being tracked in GGUS. (go to action)

2009-04-20 edit
Main.OCC 2009-04-27 Check with the GOCDB if the RSS feed is updated when the downtime is modified (extended or shortened)

Update 20/4/09: Nick to check with Gilles (but he thinks that the answer is no).

Update 27/4/09: An update has been given to Nick by Gilles, this will be added here.

18/05/09: An RSS notification is sent by the Operations Portal whenever there is a change to a down-time (see minutes of today's meeting for more details). (go to action)

2009-05-22 edit
NickThackray 2009-04-27 Check with Romain impact of OSCT duty contact being different to that of the COD schedule

Update 27th April. A timetable will be provided by OSCT.

Update 4th May: No impact. OSCT will provide a timetable, carrying on the old schedule from the COD, to be applied to the OSCT until the end of EGEE III. (go to action)

2009-05-05 edit
SteveTraylen 2007-07-13 What installed capacity should be published for sites with only storage.


31st August - ROC should go ahead and certify the site once happy. It's a valid configuration and any problems in the monitoring or similar should be fixed.
At the moment none are expected.
(go to action)

2009-08-31 edit
NickThackray 2009-07-20 Check whether CERN's Quattor templates for VOMS could be useful to LAL

This wasn't done fast enough to be useful, so closing the action. John 25/8/09 (go to action)

2009-08-25 edit

Creation of New Minute Skeleton from Template.

To create a minutes page for a particular date edit the box below and submit.

Chair
Minute Taker
Indico Id
New Minutes Page (date format is YYYYxMMxDD)
 

The template used for new minutes is WlcgOsgEgeeOpsMinutesTemplate. Any changes made to the template will only influence new minutes.



Topic revision: r44 - 2010-02-08 - unknown
 