Assigned to | Due date | Description | Closed |
---|---|---|---|
Main.SAM | 2009-08-31 | Activate the SAM MPI tests in production. *Update 12/10/2009:* the MPI tests are in validation. | 2009-10-26 |
Main.OCC | 2010-02-08 | Wrong version detection command for the LB service: BUG:61586. *Update 08/02/2010:* There is now a fix for gLite 3.1 and the bug is set to "Fix Certified". I think this action can be closed. | 2010-02-08 |
Main.OCC | 2010-02-15 | Check the MPI test status and request that the tests become critical next Friday, so that alarms are sent to the regional operation teams from Monday onwards. This is now done; it started on Monday 15th February. | 2010-03-02 |
EgeeOCCGroup | 2007-12-10 | GGUS:28099. *Update Feb 11th:* Set to unsolved (gLite Workload). Related to a middleware bug, BUG:32962, which has the status "integration candidate". | 2008-02-12 |
Main.OCC(John) | 2008-02-04 | Clarify the "at risk" downtime and its interaction with tools (especially GridView). *Update Jan 31st:* Done. Submitted Savannah bug 33104 against GridView. They fixed the GOCDB synchronizer code (gocdb3_query.php) to handle AT_RISK downtimes (interventions) correctly. | 2008-02-11 |
Main.OCC(John) | 2008-02-04 | What to do about FNAL and SAM timeouts? *Update Jan 31st:* Piotr (Mr SAM) confirmed that site-specific timeouts are not an option. Modifying timeouts just for the DPM tests would also take a while and would require agreement from all VOs and ROCs, because it could increase the time needed to detect real DPM problems. One could argue that if the SRM tests are timing out after ten minutes, the SRM is probably not of much use to users at that time either. Tweaking SAM to mask the problem is therefore not a good solution. Nevertheless, he suggested that FNAL investigate a local workaround, such as increasing the priority of ops monitoring jobs. Joe was notified of this and we await his feedback. *Update Feb 18th:* More hardware was thrown at the problem and the situation is resolved. | 2008-02-25 |
Main.OCC(Nick) | 2008-02-04 | How should BDII/GOCDB mismatches and the introduction of new sites be handled? *Update Jan 31st:* This will be discussed by the ROC managers in Lyon next week (Tuesday 5th). *Update Feb 18th:* A link to the minutes of the ROC managers meeting will be added. *Update Feb 25th:* This is the link: https://edms.cern.ch/file/893655/1/ROC-mgrs-05-02-2008(ARM-11).htm The conclusion was that Nick will ask the relevant development teams for an estimate of the effort required to implement automatic removal of entries from the top-level BDII. *Update Mar 3rd:* Being handled in the ROC managers meeting; closing here. | 2008-03-04 |
Main.OCC(Antonio) | 2008-02-04 | Ensure that instructions for publishing storage space reach sites (ATLAS). *Update Feb 1st:* Tickets: GGUS:32064 (UKI) --> in progress; GGUS:32065 (ROC Russia) --> open, don't allow queryconf; GGUS:32067 (ROC DECH) --> in progress; GGUS:32068 (ROC AP) --> solved; GGUS:32070 (ROC France) --> child tickets to sites; GGUS:32071; GGUS:32072. Lots of tickets have been submitted; close the item. | 2008-02-25 |
Main.OCC(Antonio) | 2008-02-04 | Request all LHCb sites to provide a detailed SRMv2 status page. *Update Feb 1st:* See the minutes. *Update Feb 11th:* In general, production sites do not seem to be available to provide what was requested. The GGUS ticket: GGUS:31800. http://lblogbook.cern.ch/CCRC08/38 | 2008-02-13 |
GridView | 2007-12-10 | What are the implications of a site having no SAM test results for more than 24 hours? How does this affect availability/reliability calculations? *Update 11th Dec:* The GridView team responded; added to next week's agenda. | 2008-02-04 AntonioRetico |
Main.OCC | 2007-12-10 | GGUS:29208. *Update Dec 10th:* Will be raised at the EMT. *Update Dec 13th:* Now a confirmed bug, BUG:32078. | 2008-02-04 AntonioRetico |
Main.OCC | 2007-12-17 | SRM SAM tests only run once every two hours. Can they be run every hour instead? *Update Dec 12th:* SRM tests are now running once an hour. | 2008-02-04 AntonioRetico |
Main.OCC | 2007-12-17 | Any component that goes straight from certification to production, skipping testing in the PPS, should have this clearly stated in the release notes. *Update Dec 13th:* This has been discussed with the Integration & Deployment team, who agreed to include this information in the release notes from now on. | 2008-02-04 AntonioRetico |
Main.all, Main.ROCs | 2008-03-15 | Request ATLAS sites to upgrade their WNs to SL4. *15th Feb:* Broadcast sent. *Update Feb 18th:* ATLAS sites were asked to upgrade their WNs and a broadcast was sent. Leave this open for a while; the deadline is 15th March. Review 2 weeks before then. *Update Mar 3rd:* Steve to produce data on queues by OS: http://straylen.web.cern.ch/straylen/tmp/atlas-gluece-by-os.txt *Update Mar 10th:* From Steve: http://straylen.web.cern.ch/straylen/tmp/atlas-sites-by-os.txt *Update Mar 12th:* Steve to create a finer report, preferably by ROC... (if only that were possible; maybe via the SAM DB). *Update Mar 19th:* Reminder to all sites: time is running out... *Update Mar 31st:* From ATLAS (Alessandro): We have developed a SAM test to see which version of lcg-utils has been installed on the WNs of the ATLAS supporting sites. The results can be seen on the SAM web page by selecting the ATLAS VO, CE, CE-sft-lcg-version. Sites that give ERROR in this test did not upgrade to the SRM2-compatible version of lcg-utils. We hope this helps with following up the upgrade of WNs to SRM2 at all ATLAS supporting sites. *Update Apr 21st:* A GGUS ticket should be opened against all ROCs to follow up this issue with sites. Nick knows how to clone a ticket... *Update May 5th:* We can close this item as soon as ATLAS confirms that it has opened a GGUS ticket (cloned for all ROCs). *Update May 19th:* This action can be closed. | 2008-05-22 |
Main.all, Main.ROCs | 2008-03-30 | Request ATLAS sites to increase the shared software installation area to 100 GB. *15th Feb:* Broadcast sent. *18th Feb:* Raised at the operations meeting, but it was too soon after the broadcast for any feedback. *19th Mar:* Ongoing, but it is not obvious how to check compliance. *31st Mar:* Ongoing. ATLAS will look into building a SAM test. *Update Apr 21st:* A GGUS ticket should be opened against all ROCs to follow up this issue with sites. Nick knows how to clone a ticket... *Update 5th May:* Now a GGUS ticket; closing. | 2008-05-06 |
Main.ROCs | 2008-03-10 | Is there a need to include DPM Oracle in the gLite distribution alongside DPM MySQL? ROC managers to check with their respective sites. *Update 3rd March:* Closed, as it is being tracked by the ROC managers. | 2008-03-04 |
Main.ROCs | 2008-03-10 | Input is requested for a consolidated prioritization of the 64-bit porting of gLite components. Please send feedback to Oliver Keeble. *Update 12th March:* Feedback received from Italy, Southwest and a few others. Close action. | 2008-03-12 |
OliverKeeble | 2008-03-10 | A consolidated prioritization list for 32-bit releases will be provided by Oliver. *Update 3/3/08:* Oliver added a "priorities" section to the Node Tracker page: https://twiki.cern.ch/twiki/bin/view/EGEE/Glite31NodeTracker Closed. | 2008-03-04 |
EgeeOCCGroup | 2008-03-31 | Broadcast that the gLite 3.0 lcg-RB should henceforth be considered obsolete and unmaintained. It is replaced by the WMS (preferably on SL4). Include a link to the user documentation in the broadcast. *Update 3/3/08:* The announcement should be made when the WMS/LB on SL4 is released (TBD), stating that support for the lcg-RB will be dropped in 2(?) months. *Update 12 March 2008:* No change. *Update 31 Mar:* The RB will be obsoleted once the SL4 version of the WMS is available. *Update 17th Apr:* Will be released in two months; closing. | 2008-04-17 |
Main.SAM, Main.team | 2008-05-27 | Consider what SAM, the alarm system and the CIC portal should do to mitigate against a highly loaded CE. *Update March 12th:* Had a discussion with Ulrich and also submitted a new test for SAM. It needs some thought as to whether it is a good idea. The proposal is a non-critical test on the "GlueCEStateStatus: Production" attribute, on which the then-critical CE tests would depend. This follows the same logic as the existing SE free space tests: https://savannah.cern.ch/bugs/?34443 *Update 31 March:* Request for a new SAM sensor passed to the SAM team. *Update 21 April:* On the SAM work-list (Savannah). John thought that the item could be closed as far as the ROC managers are concerned, but Kostas was worried that the issue risked being forgotten. He suggested the possibility of a pending state for items that get transferred to other tracking mechanisms. Nick will think about it. *Update 5th May:* Now on the SAM work-list. Nothing has changed for now; ignore for 3 weeks. *Update 2nd June:* No progress recorded. *Update 11th June:* This is present as BUG:34443. | 2008-06-11 |
Main.OCC | 2007-03-05 | Example Action Item | 2007-03-06 SteveTraylen |
Main.OCC | 2007-03-05 | Extract from the information system the list of WMS 3.0. *Update from Steve:* It does not look too bad, although this only covers sites that are publishing at all. Sites with old WMS (SL3, in fact): EENet (Estonia), ITEP (Russia), RTUETF (Latvia), UNI-FREIBURG (Germany). Sites with new WMS (SL4, in fact): AEGIS01-PHY-SCL, Australia-ATLAS, BY-UIIP, CERN-PROD, CESGA-EGEE, CGG-LCG2, CNR-PROD-PISA, CY-01-KIMON, CYFRONET-LCG2, DESY-HH, FZK-LCG2, GR-01-AUTH, GRIF, HG-06-EKT, INFN-CNAF, INFN-PADOVA, ITEP, JINR-LCG2, KR-KISTI-GCRT-01, NCP-LCG2, pic, prague_cesnet_lcg2, RAL-LCG2, RO-03-UPB, RTUETF, RU-Phys-SPbSU, ru-PNPI, SARA-MATRIX, Taiwan-LCG2, TR-01-ULAKBIM, UKI-SCOTGRID-GLASGOW, Uniandes, VU-MIF-LCG2. Note that there may well be other WMS instances, not included by site BDIIs, that we know nothing about. *Update 10/9/08:* The four sites running WMS on SL3 were asked to upgrade ASAP. | 2007-03-06 SteveTraylen |
SteveTraylen | 2008-05-26 | The T0 FTS server is configured with 0 retries by default, while T1s have 3 retries by default. This complicates the ATLAS workflow: if a transfer fails, we try to find another source with the same file. Could all FTS servers at T1s be set to 0 retries (this affects all channels and all VOs)? What is the position of the other LHC VOs? LHCb: not a problem. Ron (SARA): I thought this could be set up per channel, per VO agent; to be checked with Gavin & co. *Answer from Gavin:* The 'retry' count is a VO policy, so it needs to be set in the relevant VO agent config for the FTS server (the default is 3 retries separated by a minimum of 10 minutes). I know CMS' PhEDEx prefers to fail fast (and see the error as early as possible), so CMS has asked T1 sites to set the retry count to 0. PhEDEx then retries externally (i.e. with another FTS job for the failed files). I think LHCb and ALICE are still set to the default. See: https://twiki.cern.ch/twiki/bin/view/LCG/FtsYaimValues20 Contact fts-support@cernNOSPAMPLEASE.ch in case of problems. *Update June 11th:* Steve should submit tickets to all FTS sites. *Update June 13th:* GGUS:37415. Review in two weeks' time. *Update June 20th:* GGUS:37415: the changes have been made, except at USCMS-FNAL-WC1 (GGUS:37428) and BNL-LCG2 (GGUS:37427). *Update June 30th:* Steve will escalate; two U.S. sites are problematic. *Update July 7th:* BNL and Fermi have now responded that they made the configuration change. Action item to be closed after the next meeting. Steve | 2007-03-06 SteveTraylen |
SteveTraylen | 2009-02-02 | Check the VO card settings for WN local disk space requirements for all HEP VOs. Reviewed 3rd February: ALICE = 10 GB, ATLAS = 15 GB, CMS = 10 GB, LHCb = 2 GB. All LHC VOs specify values for WN disk space. Close this action. If you see particular VOs exceeding these values, submit GGUS tickets for the VO. Close this after the next operations meeting. | 2009-02-12 |
Main.LHCb | 2008-03-17 | LHCb and Kostas to contact one another about middleware version tickets within the South East region. *Solved:* LHCb runs a custom SAM test that checks the version of lcg_utils and spots sites with an obsolete version installed. The person in LHCb following these tickets submitted the same 24 tickets (for 24 different sites) twice, because his first attempt, using the GGUS mail ticketing system, failed to return the GGUS reference. For your information, this problem was caused by a missing mapping between the submitter's mail address (used by GGUS for tickets submitted via mail) and his certificate. | 2008-03-05 |
Main.Marcin | 2007-03-19 | Marcin to produce a list of examples where a site failure is attributed to a central service failure. *Update 19th March:* Marcin supplied some examples. The problem is well understood, but the solution is less obvious. John to work with the SAM and GridView teams. | 2008-03-21 |
Main.SAM | 2007-03-19 | The SAM team to promptly investigate the BDII2SRM script so that it recognises GlueServiceType/Version SRM/1.10 correctly. GGUS:33726. The BDII2SAM script is now fixed; the action should be closed after the next meeting. *Update 31 March:* Script is fixed. Close. | 2008-04-02 |
GridView | 2007-04-27 | Please look into GGUS:33850. | 2008-03-21 |
SteveTraylen | 2008-04-14 | Define with gstat (roc-dev@listsNOSPAMPLEASE.grid.sinica.edu.tw) the new value to be added to the list of allowed OS names, describing Scientific Linux 5 when run at a site. *Update 17th April:* Min is looking at it, but the site should really submit a ticket, as per the instructions at http://goc.grid.sinica.edu.tw/gocwiki/How_to_publish_the_OS_name | 2008-05-06 |
Main.OCC | 2008-06-10 | Check with LHCb the status of the development of Dirac3 (the version of the submission engine interfaced to the WMS). *Update 17th April:* It will be released in at least 2 months; close the action item for now. | 2008-04-17 |
AndreaSciaba | 2008-04-21 | Verify and document in the User Guide the option to configure the GFAL client to use multiple BDIIs. *Update 17th April:* Maite will check. *Update 19th May:* Andrea changed this on the same day the action was raised. This action can be closed. | 2008-05-22 |
Main.UKRoc | 2008-04-14 | Clarify the scope of the issue reported in WlcgOsgEgeeOpsMinutes2008x04x07 about continuous certificate requests. Is it a general comment, or is it related specifically to the CIC portal? *Update 17th April:* Gilles has done something. | 2008-04-17 |
Main.CERNROC | 2008-05-13 | Follow up with the YerPhi site and either resolve its problems or suspend the site. *Update 5th May:* CERNROC to provide an update next week, once they have been CIC on duty for a week and cleaned everything up. *Update 9th May:* YerPhi is now in the suspended state and all existing COD tickets have been closed. This item should be closed after next week's meeting. *Update 19th May:* This action can be closed. | 2008-05-22 |
Main.Atlas | 2008-05-13 | ATLAS to provide details of the tests they are running. ATLAS have provided the name of the test: CE-sft-vo-swspace. This item should be closed next week. *Update 5th May:* A small amount of work is still to be done, but progress has been made. Revisit next week. *Update 19th May:* This action can be closed. | 2008-05-22 |
SteveTraylen | 2008-05-27 | GFAL works with a multi-valued LCG_GFAL_INFOSYS variable, but other pieces of software may not, e.g. glite-service-discovery, lcg-infosites, lcg-info... All of them need to be checked for their level of support. Currently assigned to Andrea, but someone else should really do this (perhaps Steve?). *Update Monday June 2nd:* No progress made. *Update Friday June 6th:* 3 pieces of software identified as affected: lcg-infosites BUG:37572. Steve *Update June 11th:* All Savannah tickets submitted; close here. | 2008-06-11 |
NicholasThackray, AntonioRetico | 2008-06-06 | Check with CMS (VO cards) about WMS and pool account support. *Update 28-May-08:* CMS confirms that using pool accounts for SGM has proved not to work in many cases. The main problem is that the ACLs on the files are set by users and, if different accounts are used, one software manager could act on (e.g. uninstall) packages installed by another. On the other hand, this is not relevant for the WMS, where the distinction of pools per VO is not needed. The conclusion is that the recommended account configuration does indeed have to differ between CEs and WMS, at least as far as CMS is concerned. As far as I am concerned, this action can be closed. Antonio | 2008-06-02 |
SteveTraylen | 2008-06-06 | Check whether CRL lifetimes are monitored anywhere. *Update 2nd June:* From Romain: There is a SAM test called "CE-wn-sec-crl". General results are public, but detailed results are available only to the ROC security contacts and the SAM team. Follow-up question for Romain: CE-wn-sec-crl monitors the CRL status on the WNs themselves. What was being asked for was central monitoring of the CAs' CRL URLs... It would make for an easy RRD plot. *Update 9th June:* There is central monitoring of CRLs here: http://nagios.eugridpma.org/ I have also requested that the WLCG Monitoring Group consider getting these to sites via its alarm/Nagios/messaging framework. BUG:37632 a week. | 2008-06-26 |
SteveTraylen, EgeeSiteRepsGroup | 2008-06-06 | Look into why LHCb's files in /tmp are being deleted. The reason is that Python's tarfile module preserves the archived timestamps (including the access time) when it unpacks files, so tmpwatch treats the files as old. A way forward is being discussed. *Update 2-Jun:* See the discussion in the minutes --> closing. | 2008-06-02 |
MaiteBarroso | 2008-06-06 | Check with GridView/SAM whether the Tier-1 availability for 20th -> 26th May can be recalculated, given that the ATLAS SAM UI failed from the 20th to the 26th of May. Assigned to Maite for now. *Update 2-Jun:* Discussed in the minutes --> closing. | 2008-06-02 |
JeremyColes(UKI) | 2008-06-09 | Follow up the reported site UKI-LT2-QMUL (transferred to the Political Instance by COD on 2-Jun-08). *30/6/08:* Jeremy reckoned this action can be closed. | 2008-07-01 |
RonTrompert(NE) | 2008-06-09 | Follow up the reported site VGTU-gLite (transferred to the Political Instance by COD on 2-Jun-08). *30/6/08:* The site is still failing SAM tests and should be suspended. *3/07/08:* Ron reported that the site has now reacted and fixed the situation; close this action after the next operations meeting. | 2008-07-08 |
Main.ROC_France | 2008-06-09 | Follow up the following issue reported by ROC France: With our UIs we had some problems with Python for several VOs, because those VOs use their own Python version (> 2.3.x). Unfortunately, the UI installation provides the standard python2.3 libraries within the externals directory and sets the PYTHONPATH accordingly. To use their own Python installation, VOs must therefore update the PYTHONPATH variable so that the right versions of the required libraries are picked up first. Also make sure that you call the right python binary. *Update 11th June:* Nick will look into this. *Update 21st June:* Waiting for Nick. *Update 28th July:* Response from SA3: "The tarball is produced to work with SL4, so python 2.3 has to be the default. To fully support python 2.5 (for example), you need to distribute the interpreter, reconfigure the environment and, ideally, have all your language extensions recompiled against the new python API. We are looking into how to do the last part, but the first two things are up to the site or VO." *Update 11th August:* This was raised by a VO in France. Is the answer given by SA3 OK? How do we move on from here? Helene will pass the feedback to the relevant people. *Update 1st September:* The action can be closed. In the end, the real solution was in a Savannah bug: it was a problem with the YAIM environment. | 2008-09-01 |
JudiNovak | 2008-06-09 | Modify the SAM unavailability list on the twiki by adding a section for the availability of clients run by the VOs. *Update 11th June:* This will be followed up with John and Judit immediately after the meeting on the 9th. | 2008-06-26 |
SteveTraylen | 2008-06-18 | Check GGUS:36373 example of what should be done. *July 7th:* There is now a massive thread that does contain the answer. The answer must now be extracted and documented. *July 16th:* I've now written How_to_publish_queues_with_access_restricted_to_a_FQAN to me what is wanted from the tickets that are assigned to me. *Update 21/7/08:* Now that the wiki page exists, Steve would like to close this item. Any problems should result in new tickets! | 2008-07-21 |
Main.SAM | 2008-06-30 | Upgrade lcg-utils on the SAM submission host. The latest version of lcg_utils was installed on the SAM validation testbed and used against this site. The previous version failed with "protocol not supported by Storage Element". The latest version fails with "CGSI-gSOAP: Error reading token data header: Connection reset by peer". The problem seems to lie in the data that the site provides to the Information System. | 2008-07-08 |
SteveTraylen | 2008-06-30 | Submit a request somewhere for better downtime publishing, as proposed by ATLAS some time ago. *Update 1st July:* https://savannah.cern.ch/support/?104871 | 2008-07-08 |
JohnShade | 2008-07-10 | The CIC portal uses the security certificate of a different site. Cyril will follow up. John will submit a GGUS ticket. *Update:* GGUS:38050. | 2008-07-03 |
SteveTraylen | 2008-07-10 | Steve to look at GGUS:37334; a bug will also be submitted to link to it. Add the bug before next week and close. *14th July:* Bug now submitted: BUG:38820 an upcoming release. Close the action here after today's meeting, since the bug now exists. | 2008-07-14 |
Main.UKRoc | 2008-07-10 | UK/I ROC to look at GGUS:37890. This item is, consequently, also closed. | 2008-07-21 |
SteveTraylen | 2008-07-10 | Steve should submit a GGUS ticket requesting that gstat monitor for LFCs that are registered in the GOCDB but not publishing. *2nd July:* GGUS:38053. | 2008-07-08 |
Main.OCC | 2008-09-01 | Follow up on GGUS:34338. | 2008-08-11 |
Main.OCC | 2008-08-18 | SAMAP gives critical errors rather than warnings when sites do not update their CA RPMs 7 days before the update deadline. *Update 25th August:* SAMAP will follow up "later". *Update 8/9/08:* Nick will follow up. *Update 13th October:* The tool development team has fixed the bug. | 2008-10-17 |
Main.OCC | 2008-08-25 | Find the probable release date of the CREAM CE. *Update 25th August:* It will be released in the next update to gLite 3.1, within 1-2 weeks. *1st September:* After the update at today's meeting, this action can be closed. The EMT decided to delay the deployment of the CREAM CE (the certified patch), because WMS instances that are not ICE-enabled could accidentally match the CREAM CE and cause a submission failure. While waiting for the ICE-enabled WMS to be deployed, CREAM will be released as a workaround with a GlueServiceStatus?? = 'Production', to be changed again later. One remaining issue is the old, unsupported version of the WMS on SL3. These instances will not be integrated with ICE, so once the CREAM CE is advertised again in real production mode, submissions through them would fail. To size this issue, we would like the WLCG EGEE Operations Meeting to provide an estimate of the number of old SL3 WMS still in production. | 2008-09-01 |
Main.OCC | 2008-08-18 | Make the owners of the CA RPM release process aware of the issues raised by ROC France. *Update August 19th:* Maite has some news? *Update 1st September:* SAM agrees to extend the 7-day period in this specific case, i.e. when the CA RPMs are not put in the repository on the day scheduled for this. This is technically feasible and already implemented. See the diagram and explanations here: https://twiki.cern.ch/twiki/bin/view/LCG/SAMSensorsTests#CE_sft_caver In short, the diagram shows that the following can be configured: the time-stamp from which the timeout countdown starts; the delay before a warning is issued; and the timeout before sites get a CRIT error. *Update 10/9/08:* Although Nick doesn't understand the text, he said that the ticket can be closed (SAM implemented what was asked). | 2008-09-12 |
Main.ATLAS | 2008-08-18 | ATLAS to submit a GGUS ticket detailing the slow responses from the GOC DB seen in the evenings. *Update 19th August:* The slow response is no longer obvious; will close and reopen if need be. | 2008-08-19 |
Main.OCC | 2008-08-18 | Submit a request to the FTS developers to provide suitable information providers for publishing the FTM end-points. *August 8th 2008:* Bug now submitted: GGUS:39906. | 2008-08-11 |
Main.Alessandro_di_Girolamo | 2008-08-18 | Send Maite (Maria . Barroso . Lopez @ cern . ch) and Jeremy Coles (j . coles @ rl . ac . uk) details of whom to contact regarding the agenda of the ATLAS events at CERN during the week of 25-29 August. | 2008-08-19 |
NickThackray | 2008-08-11 | At the request of LHCb, escalate BUG:39641. | 2008-08-11 |
SteveTraylen | 2008-08-20 | Check whether KCA is still needed in the lcg-CA CA set. This was raised at this week's LCG MB. Fermilab representatives are checking internally whether it is still needed. *Update 19th August:* The LCG MB meets today and this will hopefully be resolved. *Update 25th August:* KCA will soon be officially approved as a trusted CA, and it is being used by the CDF VO. KCA will therefore remain in the list of CAs. | 2008-08-27 |
Main.OCC | 2008-09-01 | Remind sites that the shared area is a critical service. *Update 10/9/08:* Nick sent an EGEE broadcast about this. In fact he sent two; the second explained that this only concerned sites supporting VOs that had explicitly mentioned the shared software area in their respective VO cards. | 2008-09-12 |
Main.OCC | 2008-11-17 | OCC to put an enhancement request into the GOCDB and CIC portal for the following EGEE downtime announcement procedure: 1. Announcement of a scheduled downtime with an "Announcement" mail at least 24h in advance, as in the MoU. 2. Start of a downtime (scheduled or unscheduled): a "Start" mail sent when the downtime starts (with the correct time!). 3. End of a downtime: an "End" mail (with the correct time). *Update 3 Nov:* OCC has entered this enhancement request into the GOC DB "shopping list" in Savannah (https://savannah.cern.ch/support/?105977). Close the item. | 2008-11-07 |
Main.LHCb | 2009-11-10 | From UK/I, but general: why are so many sites failing the LHCb SAM tests? Could LHCb please give a summary? Roberto Santin. will check again on 11.11.2008. | 2008-11-19 |
JohnShade | 2009-11-10 | Check that there is progress on GGUS:42341. | 2008-11-12 |
RocNorth | 2009-11-10 | Check IPTA-LCG2 for progress on GGUS:42015. | 2008-11-12 |
RocRussia | 2009-11-10 | Check RU-Phys-SPbSU for progress on GGUS:40521. | 2008-11-12 |
DianaBosio | 2008-12-08 | Follow up with the Beijing site as per GGUS:40700. | 2008-11-19 SteveTraylen |
Main.ROC_NE | 2008-12-08 | Suspend the ITPA site as per GGUS:42015. The North East ROC to respond next week. *Update 27th November:* The site has now corrected the problem; closing this. | 2008-11-27 |
MariaDimou | 2008-12-08 | Follow up the escalation as per GGUS:42981. | 2008-11-19 |
MariaDimou | 2008-12-08 | Follow up the escalation as per GGUS:42999. | 2008-11-19 |
NicholasThackray | 2008-12-02 | Nick to add a hyperlink for the Alcatel meeting call-back to the agenda and minutes template. *Update:* The link was always there, but it now uses a font for the blind. | 2008-12-02 |
Main.OCC | 2009-01-31 | OCC to send a broadcast asking sites to upgrade GFAL to a version higher than 1.10.6. More details about the issue can be found here: https://gus.fzk.de/ws/ticket_info.php?ticket=43994 |
2009-02-03 | | ||
Main.Akos | 2009-01-31 | The Data Management team (Akos) to provide a version of the LFC without the list replica operation (related to the old GFAL version problem reported by Biomed). Update 19/1/2009 (mail from Akos): We have examined the issue and it does not look like a security problem, but a resource limitation: the number of threads in an LFC instance limits the number of clients that can connect concurrently, and the Biomed usage pattern exceeds that limit. Once the clients finish their work, the LFC becomes responsive again. The same problem would occur with other iterator-like operations, such as opendir/readdir/closedir. Removing these operations would cause old clients to fail without solving the problem, so in my opinion upgrading lcg_utils is the right solution. Unfortunately, nobody from the Biomed community has contacted us about the possibility and context of a special build, so we have made no progress on that side. Update 26/1/2009: Can be closed. |
2009-01-27 | | ||
Main.Biomed | 2009-02-28 | Long-term solution to the old GFAL version problem reported by Biomed: develop a VO-specific SAM test to detect it, and then exclude sites running the wrong version. Update 19/1/2009: The long-term solution could be SAM tests, or adding GFAL version collection to the job-wrapper scripts. |
2009-01-27 | | ||
Main.SAM | 2009-01-31 | SAM and ATLAS (Alessandro) to get together to understand how SAM-ATLAS deals with sites with no close SE defined, and see whether this can be used in SAM operations. Update 19/1/2009: The outcome of the get-together was: >> Not having an SE affects whether a site passes the RM SAM tests, since those tests use the closest SE (default). This is incorrect: the defined SE doesn’t have to be at the site! >> Also, setting up a site in such a situation is not possible because yaim requires an SE. Correct, but again the SE doesn’t have to be local to the site. >> If the SE is put into scheduled downtime, the site also has to put its CE into downtime (otherwise it will not pass the RM tests) or choose another SE (from another site), which the procedures do not cover. This is correct, and the only real issue. ATLAS doesn’t use the Replica Management tests, but believes they should be part of the ops infrastructure tests (which are more extensive). There may be a case for making the Replica Management tests non-critical, but they’ve been critical for two years now and most people seem happy with this. A site can change the defined SE by modifying the variable VO_OPS_DEFAULT_SE in the WNs’ site-info.def files. |
2009-01-27 | | ||
Main.CERN-ROC | 2009-01-31 | Check for existing cases of sites hosting only core services, without site services. This is to support a new site, RedIRIS, in the SWE ROC. Update 19/1/2009: CERN ROC to check sites with only core services; no progress. Update 2/2/2009: The new SWE site RedIRIS will only host core services (BDII, WMS, etc.). Problems so far: 1) GIIS performance error due to "GIIS Old Entries Found: 6 - ERROR"; this will make the SAM gperf test fail. 2) No Grid version published: "GridVersion: *NOTE* could not find valid LCG version"; this is just a warning in GSTAT at the moment. The other tests seem to work; only the gperf error is critical. Update 12th February - Steve will take a look to understand what this is about. Update 19th February - Steve: Confused, there is no RedIRIS site in gstat? http://gstat.gridops.org/gstat//SouthWesternEurope.html so this will fix itself. |
2009-03-03 | | ||
NickThackray | 2009-02-09 | Ask SA3 for a list of library packages needed for the 32- to 64-bit migration. *Update 12th Feb* - There is no list. There are per-VO lists on the VO cards; we may try to produce a common list. What next? *Update at the meeting* - VOs will definitely have to maintain a list of the libraries they need in their VO ID card. Item closed. |
2009-02-27 | | ||
AllROCs | 2009-02-09 | All T1s to check and update the list of FTM end-points, to be sent to Nick. 12th Feb: a TWiki page has been created. |
2009-02-12 | | ||
Main.ROCSE | 2009-02-27 | RO-03-UPB has been escalated to the operations meeting for possible suspension. ROC: SEE; GGUS:45038 |
2009-02-27 | | ||
Main.Nick | 2009-02-27 | Nick to check on CE status with respect to gLite 3.0/3.1 and Condor. Update at the meeting - There was confusion as to what this action was about: whether Condor is supported on the gLite 3.1 LCG CE. Nick will follow up. Update 27 Feb 09 - In theory, the gLite 3.1 LCG CE should support the Condor batch system. Instructions on how to set it up are here: https://twiki.cern.ch/twiki/bin/view/EGEE/BatchSystems. If a site has problems, please submit a GGUS ticket and CC neissner@picNOSPAMPLEASE.es. Update 9 Mar 09 - closing |
2009-03-10 | | ||
Main.JohnShade | 2009-03-09 | When an individual service at a site is marked as "not in production" in the GOCDB, but the site is "in production", SAM continues to test the service. This is not the intended behaviour. Check whether there is already an outstanding bug on this, and if not, create one. Update 27/2/09: It turns out that GridView does not synchronize on that particular GOCDB field, so it isn't available to SAM. The recommended workaround is to create a scheduled downtime: tests will still run, but no tickets will be raised. The requested functionality will be in the new Aggregated Topology Provider, and the GOCDB will have a production attribute associated with each service. Update 9/3/09 - closed during the meeting |
2009-03-10 | | ||
AntonioRetico | 2009-03-16 | Follow up with the EMT on the re-prioritisation of PATCH:2784 |
2009-03-18 | | ||
Main.All | 2009-03-23 | Note all problems linked to the CERN outage of the 19th. Update 23/3/09: Other than some SAM alarms due to temporary glitches with the central LFC and top-level BDII, no problems were noted. |
2009-03-24 | | ||
AntonioRetico | 2009-03-30 | Check with the EMT about plans for FTS with credentials. The last meeting agreed to close this item, but this was not done at the time. |
2009-04-20 | | ||
DianaBosio | 2009-03-30 | Check the validity of the CERN & FNAL FTM end-points advertised in the wiki. UPDATE 25/3/2009: Two GGUS tickets have been opened, 47367 for CERN and 47368 for FNAL. |
2009-04-20 | | ||
Main.OCC | 2009-04-27 | Check with the GOCDB whether the RSS feed is updated when a downtime is modified (extended or shortened). Update 20/4/09: Nick to check with Gilles (but he thinks that the answer is no). Update 27/4/09: Gilles has given Nick an update, which will be added here. 18/05/09: The Operations Portal sends an RSS notification whenever a downtime changes (see the minutes of today's meeting for more details). |
2009-05-22 | | ||
NickThackray | 2009-04-27 | Check with Romain the impact of the OSCT duty contact being different from that of the COD schedule. Update 27th April: A timetable will be provided by OSCT. Update 4th May: No impact. OSCT will provide a timetable, carrying the old COD schedule over to the OSCT until the end of EGEE III. |
2009-05-05 | | ||
SteveTraylen | 2007-07-13 | What installed capacity should be published for sites with only storage? |
2009-08-31 | | ||
NickThackray | 2009-07-20 | Check whether CERN's Quattor templates for VOMS could be useful to LAL. This wasn't done fast enough to be useful, so closing the action. John 25/8/09 |
2009-08-25 | |