LCG Service Co-ordination Meeting Status - 2023-03-27
This page is dynamically constructed from each of the individual service reports.
Deployment Status
Last update of PPSCoordinationWorkLog 2010-03-25 by Main.unknown
Now in Production
Now in PPS
Soon in Production
PPS News and Info
Data Management Services
Last update of LcgScmStatusDms 2009-10-21 by GavinMcCance
Castor
- Currently running 2.1.8-12 on all VO instances
- DB hardware moves ongoing.
- Nameserver currently downgraded to 2.1.8. Waiting for a fixed version of 2.1.9, planning to deploy this before startup.
- Startup: 2.1.9-2 will be deployed on public and c2cernt3. Possible upgrade of LHC experiment instances to be discussed per experiment.
- xroot for Atlas Grid jobs (gssklog converting x509 cert to kerberos) - to be discussed.
- Issue with checksums reported by CMS (CDR) - to discuss at Castor meeting.
SRM
- All SRM instances at 2.8 (LHCb still at 2.8.0 rather than 2.8.1; transparent upgrade).
FTS
- Should aim for the fixed FTS 2.2 (using the old FTS/SRM interaction). New version in testing on PPS. ATLAS will already move. To discuss with other VOs.
- Leave FTS 2.1 service around for a while.
LFC
- Running version 1.7.2-4.
- Hardware and DB moves done.
- Plan to move support for CMS to prod-lfc-shared-central, and to decommission prod-lfc-cms-central. Important: no data import/export, so all CMS data will be "lost". No date fixed yet.
Database Services
Last update of LcgScmStatusDb 2009-11-03 by EvaDafonte
- April security patch update and Oracle recommended bundle patches have been successfully installed on all production databases. Some issues have been observed during the patching:
  - ATLAS offline DB (ATLR) has suffered from severe performance degradation between 16:40 and 17:20 (on Tuesday 12th May), due to high load generated by ATLAS DQ2.
  - CMS online DB (CMSONR) was temporarily (around 11:00, 14th May) not accessible to some applications due to wrong connection string.
- Oracle Enterprise Grid Control agents have been upgraded to version 10.2.0.5.
- Streams:
  - Replication from CERN to IN2P3 (Lyon) was stopped on Sunday 03.05 due to a cooling problem at the computing center. ATLAS, LHCb and LFC replication was resumed on Monday 04.05 around 14:00.
  - A network problem at RAL affected ATLAS replication from Sunday 03.05 until Tuesday 05.05.
  - ATLAS requested to stop the apply process at IN2P3 (Lyon) during part of the ATLAS Stress tests. Apply was disabled from Tuesday 12.05 to Wednesday 13.05.
Workload Management
Last update of LcgScmStatusWms 2009-10-21 by RicardoSilva
Status
- 4 LCG CEs are in production pointing to the SLC5 resources (2 running the latest version of the software, 3.1.35-0).
Work in progress
- Upgrade of SLC5 WN to gLite version 3.2.3-0 is being tested (preprod). It will be deployed with the next scheduled linux upgrades.
- 2 new LCG CEs will be put into production today, pointing to the SLC5 resources.
- 3 LCG CEs pointing to SLC4 will be retired, draining will start on the 26th Oct.
- SLC5 BDII (3.2.3-0) will be in production soon.
Issues
Authentication and Authorisation - VOMS, VOMRS.
Last update of LcgScmStatusAas 2009-10-07 by SteveTraylen
Submitted by SteveTraylen
- VOMS has been running without incident since April.
- As of the 30th September there has been a change in behaviour.
- A voms-proxy-init across 12 VOs used to take around 15 seconds and now takes 70 seconds.
- VomsPostMortem2009x10x01 - work in progress.
- No updates are pending at the moment.
- A voms and voms-admin are in certification, but not for SL5.
- It looks like a new VO, vo.delphi.cern.ch (?), will be added in the near future.
- MyProxy
- Hardware warranty issues: both myproxy and myproxy-fts will be moved this month.
- Would still aim to remove myproxy-fts - negotiate with CMS.
- The newer VOMS enabled MyProxy software is being looked at for SLC5. Not aiming for this for startup.
Monitoring, Logging and Reporting.
Last update of LcgScmStatusMLR 2009-04-29 by JamesCasey
- Ongoing work to get latest MoU pledges into a DB for use by the other monitoring applications. Updating with latest numbers post RRB
Miscellaneous Services
Last update of LcgScmStatusMisc 2008-11-26 by SteveTraylen
Not a service, but the index page for this meeting, LcgScmStatus, is becoming too slow to load.
I see how to speed it up. The START/END LCGSCM tags will change for the next meeting.
After today's meeting I will retrofit the change to this week's reports and then send a mail around before the following meeting.
Minutes
Last update of LcgScmStatusMinutes 2009-10-07 by SteveTraylen
Present
SteveT (minutes), Gavin, Jamie (chair), Maria, Andrea, Harry, Sophie, Riccardo, Antonio, Olaf, Eva, Miguel, Maarten, JanI
Introduction
- There has not been a meeting for a while partly because things are running smoothly
- Most of LHC at 2 or 3 degrees - something will happen soon.
LCG Quarterly Report
WLCG Service Reports
See above for individual reports.
Production Services
Production release, small but mega important for cream - 3.1 update 56 fixes two high risk vulnerabilities and some outstanding bugs in CREAM.
- One for accounting: records are now filled in the case of pbs_server/cream CE on split nodes.
- Other functional issues, mainly from CERN.
- New BLParser for LSF.
Future releases are going to PPS tomorrow, with the corresponding release to production presumably next week. Moving now to a new process, the staged rollout. First will be voms and the glue 2.0 enabled BDII. Unclear if the new VO-Box will go with the old or new process.
- Question: What is the staged roll-out process?
- Answer: Releases go essentially to production first, a club of sites will be early adopters. Interested sites can take part.
- Question: Is the cream CE now as good as the lcg-ce?
- Answer: Certainly one item is fixed but needs testing on the submission side. First release to consider production rather than special. There are now SAM tests in place.
Overnight 8 cream CEs were updated, with around 11 (that publish) to go. See Current Status of cream CEs.
- Question: Will there be a WMS release?
- Answer: There is a minor release on the way.
- Maarten: List of minimum versions will be revamped shortly.
- Olaf: Following releases as they go is not always easy.
- Jamie: Hopefully better in an SSC world.
- Olaf: Yesterday's cream CE may also contain config changes that slow deployment.
- Antonio: The Cream CE deployment is small and the current one is broken anyway. This also includes the rpath RPM fixes, which allowed time for the changes to have had extensive certification already.
- Jan: A workaround for current deployments is always appreciated.
- Antonio: Security fixes do in other cases block the addition of new features, e.g. rpath fixes.
Data Management
See Data Management Services above.
Castor
- Miguel: Main new castor feature is throttling, particularly interesting to CMS and ATLAS.
- Harry: What is the latest date for this change?
- Miguel: Feature not released, due for release first week of November.
- Jamie: That is late, so should it wait until the shutdown at Xmas?
- Miguel: We want to install on the non-LHC instances and be ready for LHC VOs if they demand it.
- Jamie: Gap is widening with castor version from Tier0 to Tier1.
- Miguel: Up to the Tier1s.
- Jamie: Does it mean that the old version will no longer be supported?
FTS
SRM interaction does change hence the need for scale testing which is now underway.
- Jamie: ATLAS need new FTS with checksum support at all Tier1s?
- Gavin: Certainly ATLAS are interested to have it.
- Jamie: It seems difficult to imagine Tier1s doing this before Xmas; the model was to run here for a month anyway with the new version.
LFC
CMS not using their prod LFC so will be merged into General LFC. CMS happy but needs some wider advertisement.
Database
See report above.
- Migration to RHEL4, despite validation on RHEL5, as problems were found there.
Workload Management.
See report above.
- Two new aliases for slc4/5 lxplus. Encourage users to move in this direction.
- Harry: Are there too many slc4 CEs over slc5 CEs?
- Riccardo: Work in progress, new CE nodes are being deployed for slc5.
- Maarten: Newer lcg-ces are better anyway from load point of view.
- Harry: How is the doubling of WN capacity going? Will it be there in October?
- Olaf: Delivery is delayed at the moment, but FIO is ready to deploy when it arrives.
Authentication
- See voms report above
- MyProxy - trying to remove myproxy-fts from production service.
- On the horizon the voms enabled version of myproxy.
AOB
How often do we meet? Every two weeks - usual clashes with GDB/F2F SSC/TMB.
4th November for sure and possibly in two weeks time.
Reports and Minutes from Previous Meetings