Week of 061023
Open Actions from last week: Check rb110 (with D.Qing) and put into production for CSA06. When done return rb101 to
GSSDATLAS (edit and release AFS wmsui.conf). Deploy the new FTS which fixes the DB lockup problem.
Chair: James
Gmod: Yvan
Smod: Jan
Monday:
Log: LFC NO_READs over the weekend. Gridview problem (RAC issue)
New Actions:
- FTS Upgrade for today (Gavin) - DONE
- LFC - follow up problem (James) - DONE
- Gridview - check on issue with sessions not failing over in Gridview app (Harry)
- Gridview - report on DB problem (Ioannis/Miguel)
Discussion:
- When will the LFC upgrade to new hardware happen? Machines coming tomorrow.
- What is the status of lcg-mon-gridftp re: rgma/python version? (Maarten to report)
Tuesday:
Log: LFC NO_READ continues
New Actions:
- Contact errant LFC User (Jan)
- Up timeouts to PIC on FTS (Gavin) - DONE
- gLite RBs not publishing job stats in gridview (Harry) *
Discussion:
- FTS intervention went OK
- LFC problem down to one user flooding the server for periods of 15 minutes several times a day
- Gridview DB node locked - under investigation with linux.support.
- lcg-mon-gridftp issues still under investigation by Maarten
- New kernel will be appearing for ES nodes (Jan)
- CMS see timeouts on gridftp to PIC and wonder if they are related to the FTS config; to be checked, and modified if needed
- ALICE problems are mostly happening in their software layer - if fixed should bring to ~97-99% reliability
- CASTOR upgrades announced
- castorpublic tomorrow
- alice, atlas, lhcb next week
- cms after CSA06
- new client on all nodes coming Monday
Wednesday:
Log:
New Actions:
- castorpublic upgrade today (Jan) - DONE
Discussion:
- Michael reported two problems they had with their central DB instance at CERN
- at noon they had lots of locks occurring which needed to be cleaned by hand
- at ~10pm all transfers stopped - this was due to the problem (for which we received alarms) on itrac09
Thursday:
Log:
New Actions:
Discussion:
Friday:
Log:
New Actions:
Discussion: