WLCG Operations Coordination Minutes, Feb 7, 2019

Highlights

Agenda

https://indico.cern.ch/event/795889/

Attendance

  • local: Giuseppe Bagliese, Christoph Wissing, Alessandro Di Girolamo, Julia Andreeva
  • remote: Catherine Biscarat, Mikhail Titov, Maria Grigorieva, Jeremy Coles, Di Qing, Vladimir Romanovski, Jean-Roch Vlimant, Alessandra Doria, Daniel Abercrombie, Jose Flix
  • apologies:

Operations News

  • End of support of CREAM CE. The CREAM working group has announced that official support for the CREAM-CE component will cease at the end of the EOSC-hub project, i.e. in Dec 2020. To prepare for this, EGI Foundation and CERN are working to minimise disruption. This will include helping users migrate to alternative solutions, i.e. ARC-CE or HTCondor-CE. The CREAM product team will provide full support until the end of 2019, including one minor release already scheduled. During 2020 only security updates will be released.
  • A workshop on migration from CREAM CE to other solutions will be held during the EGI Conference, which will take place on 6-8 May in Amsterdam. More details will follow. If you would like to participate or to share your experience or concerns, please get in touch with wlcg-ops-coord-chairpeople@cern.ch

Discussion

  • Julia asked Pepe whether PIC could share its experience of migrating to HTCondor. Pepe confirmed that PIC would contribute to the workshop.

Special topics

Preparation of the operational intelligence discussion during HOW 19 workshop

Discussion

  • Christoph and Daniel mentioned work in CMS in this area. The machinery needed to collect data has been set up. It might be interesting to share this experience.
  • Alessandro said that several projects had been started in ATLAS, but that not much progress had been demonstrated so far.
  • Alessandro encouraged people to send him ideas for the session.

Middleware News

Tier 0 News

  • OTG:0046088: CERN LSF public was decommissioned on Wednesday 30th January 2019. A few dedicated shares are being handled separately.

  • OTG:0047300: ~30% of the CERN batch capacity is now on CC7. The remaining capacity will be migrated with the following schedule:
    • end March 2019: ~50% public/grid will be CC7
    • 2nd April 2019: the lxplus.cern.ch alias changes to CC7 (the lxplus6 service will remain accessible on lxplus6.cern.ch), and the default HTCondor target changes to CC7 for local submission.
    • early June 2019: remainder of capacity will have been migrated.
    • OTG:0048002 ce511, ce512, ce513, ce514 can now be used to target CC7 (by default). Others will follow as we migrate more capacity.

  • Unprivileged user namespaces are being enabled on CERN CC7 capacity to support Singularity and other user-space container tools.
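
For sites or users who want to verify the setting on a worker node, the check can be sketched as below. This is a minimal sketch and not the CERN procedure: it assumes the kernel exposes the `user.max_user_namespaces` sysctl under `/proc/sys/user/max_user_namespaces`, and the helper names `parse_limit` and `user_namespaces_enabled` are hypothetical. A value above zero is necessary for unprivileged user namespaces but may not be sufficient on every kernel.

```python
from pathlib import Path

# Assumption: the per-user namespace limit is exposed at this sysctl path.
SYSCTL_PATH = Path("/proc/sys/user/max_user_namespaces")


def parse_limit(text):
    """Convert the raw sysctl file content into an integer limit."""
    return int(text.strip())


def user_namespaces_enabled(path=SYSCTL_PATH):
    """Return True if the limit is above zero, False if it is zero,
    or None if the file cannot be read or parsed."""
    try:
        return parse_limit(Path(path).read_text()) > 0
    except (OSError, ValueError):
        return None
```

Calling `user_namespaces_enabled()` on a node returns `None` when the sysctl is not available at all, which is worth distinguishing from an explicit limit of zero.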

Tier 1 Feedback

Tier 2 Feedback

Experiments Reports

ALICE

  • High activity since the end of Run 2
    • Lots of everything: MC, reconstruction, analysis trains, user jobs
  • No major issues

ATLAS

  • Smooth Grid production over the last weeks with ~300-330k concurrently running grid job slots, with the usual mix of MC generation, simulation, reconstruction, derivation production and analysis, and a small fraction of dedicated data reprocessing. Some periods of additional HPC contributions, with peaks of ~150k concurrently running job slots over the last months, and ~15-20k jobs from Boinc.
  • The HLT farm/Sim@P1 is undergoing system and hardware upgrades and will be available again only in April.
  • CentOS7: ATLAS would like to start a more forceful migration to CentOS7 and have the vast majority of resources, if not all, migrated by June 1.
  • SCRATCHDISK: ATLAS would like to increase the SCRATCHDISK quota to 100TB per 1000 analysis slots
  • IPv6: if sites update to IPv6 dual-stack please let us know in advance. SAM tests are under development (a new SAM IPv6 dev node is sending the "normal" tests to sites, results to be understood)
  • DPM DOME upgrade: ATLAS still sees instabilities at the DPM DOME sites already used in production. These sites first have to stabilize before a larger deployment of DPM DOME can be considered a few months from now. Discussions with the DPM team are ongoing to get a clear understanding of what would be best to suggest to sites.
  • ATLAS sites jamboree and HPC strategy meeting, 5-8 March at CERN, https://indico.cern.ch/event/770307/
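
As an illustration of the SCRATCHDISK request above, the target quota for a site scales linearly with its analysis slots. The sketch below is only a worked version of that ratio; the function name `scratchdisk_quota_tb` and the example slot count are hypothetical and not taken from the ATLAS request.

```python
def scratchdisk_quota_tb(analysis_slots, tb_per_1000_slots=100):
    """Return the SCRATCHDISK quota in TB for a given number of
    analysis slots, using the requested 100 TB per 1000 slots."""
    return analysis_slots * tb_per_1000_slots / 1000


# Hypothetical site with 2500 analysis slots.
example_quota = scratchdisk_quota_tb(2500)
```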

Discussion

  • IPv6. Alessandro mentioned that CMS is more advanced than ATLAS (CMS 65%, ATLAS 35-40%). Giuseppe mentioned that CMS occasionally experiences problems at sites which require intervention by site support (firewall problems, wrong configuration). CMS has set up SAM tests and ATLAS intends to do the same, so ATLAS can benefit from the CMS experience with IPv6 SAM tests. Di mentioned the problem of a misconfigured BNL server. Alessandro replied that it should have been fixed last week, but that this still had to be confirmed. One of the problems with the IPv6 deployment campaign is that ATLAS does not really know which sites have IPv6 and which ones do not.

  • DPM DOME migration. Alessandro stressed that a massive migration with a deadline can be started only once there is proof that migrated DPM sites work well and provide reliable storage. Alessandra Doria said that although the migration was not smooth, the site (Naples) is now working well. In her opinion it is better to wait for the next release (1.11), which should be out in the coming days. Julia suggested inviting DPM experts and reviewing this topic at the next meeting.
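
On the IPv6 point above (not knowing which sites have IPv6), a first rough survey could look at what DNS publishes for each service endpoint. The following is a minimal sketch under stated assumptions, not the SAM test: it only checks for A and AAAA records via `socket.getaddrinfo` and says nothing about whether the service is actually reachable over IPv6. The helper names `address_families` and `classify` and the default port are hypothetical.

```python
import socket


def address_families(host, port=443):
    """Return the set of address families that name resolution
    reports for host; an empty set if the name does not resolve."""
    try:
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror:
        return set()
    # Each entry is (family, type, proto, canonname, sockaddr).
    return {info[0] for info in infos}


def classify(families):
    """Map a set of address families to a short status label."""
    has_v4 = socket.AF_INET in families
    has_v6 = socket.AF_INET6 in families
    if has_v4 and has_v6:
        return "dual-stack"
    if has_v6:
        return "ipv6-only"
    if has_v4:
        return "ipv4-only"
    return "unresolved"
```

A per-site list could then be built with `classify(address_families(hostname))` for each storage or CE hostname. Firewall and configuration problems of the kind Giuseppe described would still need an actual connection test.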

CMS

  • smooth running, compute systems busy at about 220k cores
    • usual production/analysis mix (75%/25%)
  • 2018 data re-processing being wrapped up
  • Monte Carlo production ongoing
  • staging in B-parked data for reconstruction
    • ongoing with good performance
  • two incidents over the last months in which SAM3 got stuck
    • thanks to Nikolay and Simone for restoring service on the weekends
  • decoupling production services from EOS

Discussion

  • EOS instabilities were also experienced by ATLAS (fuse problem). This should be followed up with the EOS team, who should be invited to the next meeting.

LHCb

  • Data stripping, MC simulation and user analysis with ~100K jobs running
  • No major problems

Task Forces and Working Groups

GDPR and WLCG services

Accounting TF

Archival Storage WG

Update on providing tape info

PLEASE CHECK AND UPDATE THIS TABLE
| Site | Info enabled | Plans | Comments |
| CERN | YES | | |
| BNL | YES | | |
| CNAF | YES | | Space accounting info is integrated in the portal. Other metrics are on the way |
| FNAL | YES | | |
| IN2P3 | YES | | Space accounting info is integrated in the portal. Other metrics are on the way |
| JINR | YES | | |
| KISTI | YES | | KISTI has been contacted. Will work on in the second half of September |
| KIT | YES | | |
| NDGF | NO | | NDGF has a distributed storage which complicates the task. Discuss with NDGF possibility to do aggregation on the storage space accounting server side. Should be accomplished by the end of the year |
| NLT1 | YES | | Almost done, waiting for opening of the firewall, order of couple of days |
| NRC-KI | YES | | |
| PIC | YES | | Space accounting info is integrated in the portal. Other metrics are on the way |
| RAL | YES | | Space accounting info is integrated in the portal. Other metrics are on the way |
| TRIUMF | YES | | |

One can see all sites integrated in storage space accounting for tapes here

Information System Evolution TF

  • The IS Evolution task force met on the 24th of January. The main topic of discussion was the JSON structure for the description of the computing resource (CRR). The latest version of the CRR format can be found here

IPv6 Validation and Deployment TF

Detailed status here.

Machine/Job Features TF

Monitoring

MW Readiness WG

Network Throughput WG


  • perfSONAR infrastructure status - CC7/4.1 campaign ongoing
    • perfSONAR 4.0 and perfSONAR instances on SL6 have not been supported since Q4 2018 - please update ASAP
    • We have started ticketing sites, starting with T1s and major T2s
  • WG update will be presented at HEPiX in San Diego
  • WLCG/OSG network services were updated
  • WLCG Network Throughput Support Unit: see twiki for summary of recent activities.

Squid Monitoring and HTTP Proxy Discovery TFs

  • Nothing to report this month

Traceability WG

Container WG

Action list

Creation date Description Responsible Status Comments

Specific actions for experiments

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

Specific actions for sites

Creation date Description Affected VO Affected TF/WG Deadline Completion Comments

AOB

No AOB. The next meeting is planned for the 7th of March.

Topic revision: r15 - 2019-02-11 - JuliaAndreeva
 