This is the list of Actions for the WLCG EGEE Operations meeting.

Assigned to Due date Description State Closed Notify  
Main.ROCs 2008-03-10 Is there a need to include DPM Oracle in the gLite distribution alongside DPM MySQL?
ROC Managers to check with their respective sites.

*Update 3rd March*
Closed as being tracked by ROC managers.

2008-03-04
Main.ROCs 2008-03-10 Input for consolidated prioritization of 64-bit porting of gLite components is requested. Feedback to Oliver Keeble, please.

*Update 12th March*
Received feedback from Italy, Southwest, and a few others. Close action.

2008-03-12
OliverKeeble 2008-03-10 Consolidated prioritization list for 32-bit releases will be provided by Oliver.

*Update: 3/3/08*
Oliver added a "priorities" section to the Node Tracker page:


2008-03-04
EgeeOCCGroup 2008-03-31 Broadcast that gLite 3.0 lcg-RB should henceforth be considered obsolete and unmaintained. It is replaced by WMS (preferably on SL4). Include link to user documentation in the broadcast.

*Update: 3/3/08*
Announcement should be made at the time of the release of the WMS/LB on SL4 (TBD), saying support will be dropped for the lcg-RB in 2(?) months.

*Update 12 March 2008:* No change.

*Update 31 Mar:* The RB will be obsoleted once the SL4 version of the WMS is available.

*Update 17th Apr:* Will be released in two months, closing.

2008-04-17
Main.all, Main.ROCs 2008-03-30 Request to ATLAS sites to increase the shared software installation area to 100 GB

*15th Feb:* broadcast sent

*18th Feb:* Raised at operations meeting, too soon after broadcast for any feedback.

*19th Mar:* Ongoing, but not obvious how to check compliance.

*31 Mar:* Ongoing. ATLAS will look into building a SAM test.

*Update Apr 21st* A GGUS ticket should be opened against all ROCs to follow up on this issue with sites. Nick knows how to clone a ticket...

*Update 5th May* Now a GGUS ticket, closing.

2008-05-06
Main.all, Main.ROCs 2008-03-15 Request to ATLAS sites to upgrade WNs to SL4

*15th Feb:* broadcast sent

*Update Feb 18th*: Request ATLAS sites to upgrade WNs. Broadcast sent; leave open for a bit. The deadline given was 15th March. Review two weeks before that.

*Update Mar 3rd* Steve to produce data of queues by OS.

*Update Mar 10th* From Steve:

*Update Mar 12th* Steve to create a finer report, preferably by ROC, ... (if only that were possible; maybe via the SAM DB)

*Update Mar 19th* Reminder to all sites, time is running out...

*Update Mar 31st* From ATLAS (Alessandro): We have developed a SAM test to see which version of lcg-utils is installed on the WNs of the ATLAS supporting sites. The results can be seen on the SAM web page by selecting ATLAS VO, CE, CE-sft-lcg-version. Sites that give ERROR in this test have not upgraded to the SRM2-compatible version of lcg-utils.
We hope this helps in following up the action of having the WNs upgraded to SRM2 at all ATLAS supporting sites.

*Update Apr 21st* A GGUS ticket should be opened against all ROCs to follow up on this issue with sites. Nick knows how to clone a ticket...

*Update May 5th* As soon as ATLAS can confirm that they have opened a GGUS ticket (cloned for all ROCs), we can close this item.

*Update May 19th* This action can be closed.

2008-05-22
EgeeOCCGroup 2007-12-10 GGUS:28099 has been open for two weeks without comment.

*Update Feb 11th* Set to unsolved (gLite Workload). Related to a middleware bug, BUG:32962: FQAN comparator does not work properly.
Status: integration candidate
2008-02-12
Main.OCC(John) 2008-02-04 Clarify "at risk" downtime & interaction with tools (esp. GridView)
*Update Jan 31st* Done. Submitted Savannah bug 33104 against GridView. They fixed the GOCDB synchronizer code (gocdb3_query.php) to handle AT_RISK downtime (intervention) correctly.
2008-02-11
Main.OCC(John) 2008-02-04 What to do about FNAL & SAM timeouts?

*Update Jan 31st*: Piotr (Mr SAM) confirmed that site-specific timeouts are not an option. Also, modifying timeouts just for the DPM tests would take a while and would require agreement from all VOs and ROCs (it would potentially increase the time to detect real DPM problems). One could argue that if the SRM tests are timing out after ten minutes, the SRM is probably not of much use to users at that time either. Therefore, tweaking SAM to mask the problem is not a good solution. Nevertheless, he suggested that FNAL investigate a local workaround, such as increasing the priority of ops monitoring jobs. Joe was notified of this, and we await his feedback.

*Update Feb 18th* More hardware was thrown at the problem and the situation is

2008-02-25
Main.OCC(Nick) 2008-02-04 How to handle BDII/GOCDB mismatches, and the issue of introducing new sites?

*Update Jan 31st*: Will be discussed by the ROC managers in Lyon next week (Tuesday 5th).

*Update Feb 18th*: Will add a link to minutes of ROC managers meeting.

*Update Feb 25th*:
This is the link: https://edms.cern.ch/file/893655/1/ROC-mgrs-05-02-2008(ARM-11).htm
The conclusion was: Nick to ask the relevant development teams for an estimate of the effort required to implement the automatic removal of entries from the top-level BDII.

*Update Mar 3rd*
Being handled in the ROC managers meeting, closing here.

2008-03-04
Main.OCC(Antonio) 2008-02-04 Ensure instructions for publishing storage space reach sites (ATLAS)

*Update Feb 1st* : tickets GGUS:32064 (ROC UKI), GGUS:32065 (ROC Russia), GGUS:32067 (ROC DECH), GGUS:32068 (ROC AP), GGUS:32070 (ROC France) submitted to track the issue

*Update Feb 13th*:
GGUS:32064 (UKI) --> in progress
GGUS:32065 (ROC Russia) --> open. don't allow of queryconf
GGUS:32067 (ROC DECH) --> in progress
GGUS:32068 (ROC AP) --> solved
GGUS:32070 (ROC France) --> child tickets to sites GGUS:32071, GGUS:37072
- GGUS:32072 waiting for reply from Atlas with list of addresses

*Update Feb 18th*:
Ensure instructions about publishing storage reach sites... Lots of tickets submitted, close the item.

2008-02-25
Main.OCC(Antonio) 2008-02-04 Request all LHCb sites to provide a detailed SRMv2 status page

*Update Feb 1st* : Find it in the minutes

*Update Feb 11th*: Production sites seem in general unable to provide what was requested. The GGUS ticket GGUS:31800 has been set to 'unsolved' and the issue is being tracked with

2008-02-13
GridView 2007-12-10 What are the implications of no SAM test results at a site for >24 hours? How does it affect availability/reliability calculations?

*Update 11th Dec:* GridView team responded; added to next week's agenda.
2008-02-04 AntonioRetico
Main.OCC 2007-12-10 GGUS:29208 has been open for a number of weeks without comment.

*Update Dec 10th* Will be raised at EMT

*Update Dec 13th* Now a confirmed BUG:32078. Already fixed for an upcoming release.
2008-02-04 AntonioRetico
Main.OCC 2007-12-17 SRM SAM tests only run once every two hours. Can the frequency be increased to once an hour?

*Update Dec 12th* SRM tests are now running once an hour.
2008-02-04 AntonioRetico
Main.OCC 2007-12-17 Any component which goes straight from certification to production, missing out testing in the PPS, should have this clearly stated in the release notes.

*Update Dec 13th* This has been discussed with the Integration & Deployment team, who agreed to include this information in the release notes from now on.
2008-02-04 AntonioRetico

