Week of 060717
Open Actions from last week: Check on daily LSF reconfig.
Chair: H.Renshall
SMOD: Miguel Dos Santos
GMOD: Yvan Calas
Monday:
Log: Problem with LFC, /var filling up.
New Actions:
Discussion:
LSF debug reconfig was done and daily reconfig is again disabled. ATLAS and CMS ran stably over the weekend, but FZK went down at 19:30 on Saturday with a chilled-water problem and does not expect to be back until Tuesday.
Tuesday:
Log: LHCb LFC node lxb1133 is having problems. GRIDVIEW has been down since 19:00 last night.
New Actions:
* lxb1133 should be taken out of the monitoring, and looked at by experts. (GMOD) - DONE
Discussion: To solve the /var problem on the LFC nodes, we need to reinstall all of them. We should check that the installation procedure works and then schedule the reinstallation.
Wednesday:
Log:
Actions: lxb1133 was removed from load balancing (with some difficulty).
New Actions: HRR to open a CERN Remedy ticket because the R-GMA archiver on monb001 is not working.
Discussion: The plan to reinstall the LFC nodes was to be discussed at the WLCG SCM (it was discussed and is approved). The Gridview team reports that they are still not getting all data from monb001 after its reboot. HRR to raise a ticket.
Thursday:
Log:
Actions: HRR found looping lb-bkserverd processes on rb101 and rb103 (Savannah 18123) and killed them. The CMS gLite UI was then pointed to rb103, leaving rb102 free for CMS RB fixes, patches, and testing (Calas, Qing, Sciaba).
Discussion: HRR talked to Steve Hicks at RAL and was told the R-GMA problems were not at CERN.
Friday:
Log: Gridview seems better.
Actions:
- Restart producer on ATLAS gridftp servers (James) - DONE
- New LSF version and config to be deployed on Monday (Ulrich) - DONE
Discussion:
- Gridview now looks more consistent; the producers on c2atlas need to be restarted in order to provide all data.