WLCG-OSG-EGEE Ops Minutes Mon 18 Feb 2008
Attendance
EGEE
- Asia Pacific ROC: Min
- Central Europe ROC: Marcin
- OCC / CERN ROC: Steve, Harry, Nick, Alessandro, Gavin, Patricia, Maite, Maria, Antonio, John
- French ROC: Gilles, Pierre
- German/Swiss ROC: Sven Hermann
- Italian ROC: Alessandro
- Northern Europe ROC:
- Russian ROC: Lev
- South East Europe ROC: Kostas
- South West Europe ROC: Kai, Gonzalo
- UK/Ireland ROC: Derek, Catalin, Jeremy
- GGUS: Torsten
- OSCT: Absent
WLCG
- WLCG Service Coordination: Harry, Jamie
WLCG Tier 1 Sites
- ASGC: Min
- BNL: Absent
- CERN site: Ignacio Reguero
- FNAL: Absent
- FZK: Sven Hermann
- IN2P3: Pierre
- INFN: Alessandro
- NDGF: Leif
- PIC: Gonzalo
- RAL: Catalin, Derek
- SARA/NIKHEF: Ron
- TRIUMF: Absent
Reports Not Received
- WLCG Tier 1s:
- VOs:
- EGEE ROCs (Prod Sites):
- EGEE ROCs (PPS Sites): AP, CERN, IT, NE, RU, SWE
Feedback on Last Week's Minutes
None were given.
EGEE Items
Grid Operator Hand Over on Duty
| | Primary Team | Secondary Team |
| From | SouthWest Europe | Russia |
| To | France | UK/I |
PPS Reports
EGEE Items From ROC Reports
- (ROC CE): A GGUS ticket from a Central European site, assigned to ROC CE, concerns inconsistencies in DPM: some files are in the DPM DB but not on disk. We would suggest removing these files manually from the DPM DB to clear the inconsistencies, but we are not sure whether we should inform the VO(s) about such changes, or what the procedure is when files are lost. For reference, the ticket is GGUS:33012.
- (ROC CE): The WARSAW-EGEE site is experiencing problems similar to those at FNAL that resulted in action 101, and would like to know whether there has been any progress on the issue. The site has a standing reservation in place for OPS VO jobs, but this did not help to avoid the RM test timeouts.
- (ROC SEE): While registering the new Serbian EGEE site (AEGIS07-PHY-ATLAS), we encountered a problem in GOCDB, which does not recognize existing country reps for newly created sites: GGUS:32910
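The DPM inconsistency reported by ROC CE above (entries in the DPM DB with no file on disk) amounts to a set comparison. The following is a minimal sketch only: it does not use any DPM tool or API, the function name and sample paths are hypothetical, and it assumes the two path lists have already been dumped from the database and from the disk servers by other means.

```python
def compare_catalogue_with_disk(catalogue_paths, disk_paths):
    """Compare two listings of file paths.

    Returns a tuple (missing_on_disk, unknown_to_catalogue), each a sorted
    list. Both inputs are plain iterables of path strings; how they are
    obtained is outside the scope of this sketch.
    """
    catalogue = set(catalogue_paths)
    disk = set(disk_paths)
    # Entries the database knows about but that have no file on disk.
    missing_on_disk = sorted(catalogue - disk)
    # Files on disk that the database does not know about.
    unknown_to_catalogue = sorted(disk - catalogue)
    return missing_on_disk, unknown_to_catalogue


if __name__ == "__main__":
    # Hypothetical sample data, for illustration only.
    db_dump = ["/pool/f1", "/pool/f2", "/pool/f3"]
    disk_dump = ["/pool/f2", "/pool/f3", "/pool/f4"]
    missing, unknown = compare_catalogue_with_disk(db_dump, disk_dump)
    print("In DB but not on disk:", missing)
    print("On disk but not in DB:", unknown)
```

A list produced this way could be attached to the ticket and sent to the affected VO(s) before any manual clean-up of the database.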
gLite Release News.
- gLite 3.1.0 PPS Update18 was released to pre-production on Wednesday. It has passed the pre-deployment test and is currently being deployed to the full PPS. The update contains:
- 64bit versions of SE_dpm_mysql/_disk
- dcache now installs with yum install (not groupinstall)
- bdii v. 3.9.1-5
- yaim-core update
- support for SGE CEs
As there is an update of YAIM core, all metapackages are reported as affected by this update.
More detailed info at: PPSReleaseNote
gLite WN 32bit vs 64bit.
Moving to yumgroups rather than meta packages to allow installation of 32 and 64 bit packages concurrently.
This will be released to PPS, but there are still things that need to be understood and perhaps altered before a final release can be made.
More Information.
Machines published as 64 bit all have 32 bit middleware, since that is all that has been released.
Various combinations will exist in the future though:
- 64 bit machine with 32 bit middleware.
- 64 bit machine with 32 and 64 bit middleware.
Possible solutions:
- Use the LCG version tag.
- Have the hybrid sites publish something special to suggest they are weird.
Come back next week with some input.
Co-installation of mw services on the same box: Known issues?
We would like a list of known broken combinations.
* Obvious ones are the MON and SE (port 8443).
* Ron - RFIO on UI and DPM do not match.
* Oliver - RFIO should be gone from the UI.
* DPM 1.6.10 will have combined RFIO.
* Sven - Site BDII and CE: GGUS:32473 is the bug; it has been or will be fixed.
Remove LDAP support for VOs.
YAIM will remove support for LDAP-based VOs.
- No response.
- CIC portal has already dropped support for LDAP VOs.
YAIM Exit Codes.
_We are about to implement YAIM exit codes, error codes, and associated error messages, in order to help the interaction between YAIM and fabric management systems.
In principle it will conform to /usr/include/sysexits.h, plus some additional exit codes of our own, all below 126.
If you are using a tool (e.g. Quattor) which could make use of this feature, and/or you have advice or opinions on how you would like to see this implemented, please shout now.
Instead of flooding Rollout, please send it to yaim-contact@cern.ch. Thanks, Gergo_
Please give feedback.
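To illustrate how a fabric management tool might consume such exit codes, here is a minimal sketch. The numeric values are the standard ones from /usr/include/sysexits.h; the function name and the category labels "tool-specific" and "reserved" are hypothetical, and nothing here reflects codes that YAIM actually defines.

```python
# Standard exit codes from /usr/include/sysexits.h.
SYSEXITS = {
    0: "EX_OK",
    64: "EX_USAGE",
    65: "EX_DATAERR",
    66: "EX_NOINPUT",
    67: "EX_NOUSER",
    68: "EX_NOHOST",
    69: "EX_UNAVAILABLE",
    70: "EX_SOFTWARE",
    71: "EX_OSERR",
    72: "EX_OSFILE",
    73: "EX_CANTCREAT",
    74: "EX_IOERR",
    75: "EX_TEMPFAIL",
    76: "EX_PROTOCOL",
    77: "EX_NOPERM",
    78: "EX_CONFIG",
}

# Shells use 126 and above themselves (command not executable, command not
# found, killed by signal), so tool-defined codes stay below this value.
FIRST_RESERVED = 126


def classify_exit_code(code):
    """Map an exit code to a sysexits name or a coarse category.

    "tool-specific" stands for the additional own codes below 126 mentioned
    in the proposal; their meaning is an assumption left open here.
    """
    if code in SYSEXITS:
        return SYSEXITS[code]
    if 0 < code < FIRST_RESERVED:
        return "tool-specific"
    # Everything else: shell-reserved values, or negative values as reported
    # by some process APIs when a child is killed by a signal.
    return "reserved"


if __name__ == "__main__":
    for code in (0, 75, 78, 100, 127):
        print(code, classify_exit_code(code))
```

A wrapper could, for example, retry on EX_TEMPFAIL and raise an operator alarm on EX_CONFIG, but that policy is a choice for each site's tooling.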
WLCG Items
Upcoming WLCG Service Interventions
- RAL FireHazard -> Sometime next week possible but not yet confirmed.
LHCb Service
Comments in the meeting:
- LHCb not receiving enough notice for them to be able to cope with downtimes.
- While there are documents that specify the minimum notice period it seems many periods of notice are not being adhered to...
- Need to be removed from the infosystem while in SD.
- Question from Gonzalo at PIC:
- What is the high load coming in now from LHCb?
- Rate ramped up now to half target rate.
- Please contact Roberto and follow up.
WLCG Service Coordination
ATLAS
- SRMv1 at CERN not working on Thursday.
- One disk server was failing and fixed on Saturday.
- FTS proxy issue still exists.
- ATLAS is now running at 30%; they hope to raise this.
Use of SRMv1 vs SRMv2.
Only LHCb is using SRMv2; CMS and ATLAS are not.
- ATLAS are not ready for SRMv2 yet.
- T0 export was always planned with SRMv1 for ATLAS.
- CMS - no comment.
LHCb
- Failure of 2 CCRC08 file transfers to Lyon on Saturday morning. Detailed and followed up in the eLog.
- LFC registration problems went away.
ALICE
- Disk full in LYON.
- But not running, because the software is being optimised.
CCRC08 Overview
The experiments are not running concurrently at the moment, which is something we want to achieve. Consequently, the running experiments may be asked to continue exercising, for instance, the T1 tape systems at full rate.
Database services - no comment.
None
Review of Action Items
- 101: Update, debugging on the SRM... Not much could be done so more hardware will be thrown at the problem. -> Close it.
- 102: BDII-GOCDB mismatches. Discussed at ROC managers... Will add a link to minutes of ROC managers meeting.
- 103: Ensure instructions reach sites about publishing storage... Lots of tickets submitted, close the item.
- 106: Request ATLAS sites to upgrade WN. Broadcast sent, leave open for a bit; the deadline given was 15 March. Review 2 weeks before this.
- 107: ATLAS software area needs to be 100 GB.
Next Meeting
The next meeting will be Monday, 25 Feb 2008 15:00 UTC (16:00 Swiss local time).
- Attendees can join from 14:45 UTC (15:45 Swiss local time) onwards.
- The meeting will start promptly at 15:00 UTC (16:00 Swiss local time).
- The WLCG section will start at the fixed time of 15:30 UTC (16:30 Swiss local time).
- To dial in to the conference:
- Dial +41227676000
- Enter access code 0157610