Summary of December GDB, December 13, 2017 (CERN)

Agenda

https://indico.cern.ch/event/578992/

Introduction - I. Collier

presentation

  • Some scheduling updates for 2018 GDBs
  • Upcoming non-GDB meetings; the European HTCondor workshop is still to be
added.
  • A CVMFS workshop and a colocated pre-GDB on HPC usage: note that the
pre-GDB is colocated with the CVMFS workshop rather than held the day before the GDB.

WLCG SOC workshop report - D. Crooks

presentation

  • 27 attendees
  • Hands-on work where participants deployed MISP and then set up event
syncing with the WLCG instance (see the sketch after this list).
  • High level discussion on how to manage trust and intelligence sharing.
  • Several sites also deployed Bro.
  • The workshop was considered a success, with excellent discussions.
  • The next workshop is planned in about half a year, possibly extended to two
full days.
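
As an illustration of the syncing exercise mentioned above, here is a minimal PyMISP sketch that pulls recently published events from a MISP instance, for example ones synced in from the central WLCG instance. The URL, API key and time filter are placeholders, not values from the workshop; actual instance-to-instance syncing is configured server side in MISP itself.

<verbatim>
# Minimal sketch, assuming a reachable MISP instance and the PyMISP
# client library. URL, key and filter below are placeholders.
from pymisp import PyMISP

misp_url = "https://misp.example.site"   # hypothetical site instance
misp_key = "YOUR_API_KEY"                # per-user API key from MISP

misp = PyMISP(misp_url, misp_key, ssl=True)

# Fetch events published in the last 7 days, e.g. those pulled in
# from the central WLCG instance via a server-side sync.
events = misp.search(controller="events", publish_timestamp="7d")
for event in events:
    info = event["Event"]["info"]
    n_attr = len(event["Event"].get("Attribute", []))
    print(f"{info}: {n_attr} attributes")
</verbatim>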

Asia Tier Center Forum 3 Summary - Sang Un Ahn

presentation

  • The forum's original motivation was the need for network interconnectivity
between Asian sites, which remains the short-term focus. The long-term aim is to open up to other collaborative issues among Asian sites.
  • The latest event had 30 participants from 13 sites. Internetworking between
Asian sites has improved considerably in the couple of years since the first Asia Tier Center Forum.
  • Site reports: some smaller sites have very limited bandwidth, while others
have good connectivity.
  • Network provider reports:
    • Taiwan is planning an open exchange point in Taipei; 100G links across
the Pacific are plentiful. TEIN is upgrading its links to Hong Kong and Korea to 10G.
    • LHCONE connectivity has improved significantly since 2016, but a few
important links are still missing.
    • Since basic connectivity is starting to be solved, future meetings could
focus more on end-to-end connectivity issues.

Benchmarking update - M. Alef

presentation

  • More data has been gathered since the last update a couple of months ago on
the performance of LHC applications (and benchmarks) when oversubscribing physical cores; in particular, CMS data has been contributed.
  • The different experiment workloads scale differently, so a single benchmark
will not be a perfect fit; a procurement benchmark will have to match an average.
  • For the hyperthreading use case, it has been clearly shown that DB12 does
not scale the same way as experiment workloads when moving to more job slots per CPU on common CPU models (see the illustrative sketch after this list).
  • The benchmarking working group needs input on how sites are configured
with regard to job slots per CPU core.
  • Compiler flags for the new SPEC CPU based benchmark need to be defined;
feedback is needed from the experiments on which flags they use.
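
To illustrate the scaling point, the sketch below uses invented, purely illustrative multipliers to show how aggregate throughput can rise while the per-slot score falls when job slots exceed physical cores, and why a benchmark whose scaling curve differs from the workload's (as reported for DB12) mispredicts throughput under hyperthreading.

<verbatim>
# Illustrative sketch only: the scaling factors are invented
# placeholders, not measured DB12 or experiment-workload numbers.

physical_cores = 16

# Hypothetical aggregate-throughput multipliers relative to running
# one job slot per physical core.
scaling = {
    "experiment_workload": {16: 1.00, 24: 1.15, 32: 1.25},
    "db12_benchmark":      {16: 1.00, 24: 1.05, 32: 1.08},
}

for workload, curve in scaling.items():
    for slots, aggregate in curve.items():
        # Per-slot performance drops as slots are added, even though
        # the aggregate grows.
        per_slot = aggregate / (slots / physical_cores)
        print(f"{workload}: {slots} slots -> "
              f"aggregate x{aggregate:.2f}, per-slot x{per_slot:.2f}")

# If the benchmark's curve differs from the workload's, a score taken
# at one slots-per-core setting will not predict job throughput at
# another - hence the request above for site configuration input.
</verbatim>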

Indigo DataCloud outcomes - D. Salomoni

presentation

  • INDIGO-DataCloud had a vision.
  • They released a software distribution named ElectricIndigo, which provides components for building services that meet this vision.
  • Many improvements to free software have been developed and pushed upstream.

the eXtreme DataCloud project - D. Cesini

presentation

Notes

  • The project started in November, so this is a summary of the proposal
  • extension of INDIGO PaaS Orchestrator
  • Smart caching
  • storage systems that in some fashion ingest other storage systems
  • data in remote locations should be accessible quasi-transparently
  • Caching managed by storage system
  • 2. Cache federation at a single site
  • 3. Dynafed was shown, but the choice is still to be decided
  • 4. Several research communities have expressed interest in ONEDATA
    • biggest question is searchability
  • Metadata handling: data comes in different formats and from different sources
  • The metadata system is separate from all the source systems
  • high level architecture ("stratosphere")
    • shows components
    • orchestrator + data management

Questions

Ian: Looking forward to hearing more once there are more concrete outputs

EOSC hub introduction - D. Salomoni

presentation

Notes

  • What is EOSC?
  • each of the words behind E, O, S and C is not a given
  • limitations in EU-funded research infrastructures
  • any components in EOSC must be production ready
    • no dev, production and maintenance only
  • build a hub
    • system integrates "fragmented" scenario of EU resources
  • reuse solutions taken from common areas
  • EOSC-hub: free at the point of use (approximately true)
  • providers are funded by the project
  • common view of policies
  • tools to satisfy demand for solutions
  • Innovation does not mean dev of services
  • unified service catalogue
  • 4 main blocks
  • some thematic component services were preselected
  • internal call for thematic services
    • see outcome of selection process
  • DODAS
    • EOSC-hub, not so much CMS but more general
    • containers on HTCondor
    • aim: a standalone batch system
    • see earlier talk
    • modular - generic/expt free workflow
    • (Latchezar) Does that mean Docker is required for every DODAS element to work?
    • Discussion around services on services - what does DODAS need?
    • Davide: everything is designed to run on bare metal, whether provided with a VM or with bare metal
    • This is more general than, eg, Singularity for CMS
    • Azure, private cloud, open nebula...
    • Typical use case, exploit resources
    • Flexible - Docker, uDocker; performance impact is negligible
    • Q: CVMFS on Docker?
    • A: Currently CVMFS is on the VM and will move into the containers; interestingly, that gives an immediate performance benefit (see the sketch after these notes)
    • Not linked to specific experiment
  • Technical Coordination WP
  • Potentially interesting to all EU parties
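
On the CVMFS-in-Docker point above: a common pattern is to mount the host's /cvmfs inside the container. The sketch below uses the Docker SDK for Python; the image name and repository path are placeholder assumptions, not details from the talk.

<verbatim>
# Minimal sketch, assuming the host has CVMFS mounted at /cvmfs and
# the 'docker' Python SDK installed. Image and repository are examples.
import docker

client = docker.from_env()

output = client.containers.run(
    "centos:7",                    # placeholder image
    "ls /cvmfs/cms.cern.ch",       # placeholder CVMFS repository
    volumes={"/cvmfs": {"bind": "/cvmfs", "mode": "ro"}},
    remove=True,                   # clean up the container afterwards
)
print(output.decode())
</verbatim>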

Questions

  • Jeremy: good overview. EGI provides same things already. EGI vs EOSC-hub?
  • Davide: Good question. EGI provides services
    • Some funded by EGI fees
    • Some not fees but by EOSC-hub
    • eg helpdesk/argo funded by project
  • How to access EOSC/EGI?
  • blurred - EOSC meant to take over, merge services (EGI/EUDAT)
  • single entry point to make services discoverable
  • depends on community
  • some communities used orchestrator
  • vision EGI -> EOSC
  • Hannah: Single endpoint - you mention 3?
  • Davide: Don't have a clear answer
  • over the course of the project the entry points will be harmonised
  • all three use OIDC
  • together with token translation
  • not that difficult
  • decided not to decide [in first instance]
  • see what required by community
  • standards based, don't see problem; issue at social level

CNAF Status Report - D. Cesini

presentation

Notes

  • it was a flood, not rain
  • from [prior] rain events, flooding was known to be a risk, and provision was made
  • it worked - sealed room with pneumatics
  • access from under DC
    • floating floor
    • access underneath is difficult
  • fire brigade trapped by hole
  • the source: a 25 cm water main at 8 bar
  • the electrical room under the DC was under 1.5 m of water
  • in the DC itself, water rose 20 cm above the floating floor (about 2U)
  • the protection system disconnected the IT equipment
  • but there was still voltage on the main line, which discharged into the ground
  • steam formed around the racks -> the full extent of the damage is unknown because of this
  • mud was left
  • Bottom row of tapes affected
  • mud and dust on motherboards
  • Fibre channel cards -> expensive
  • Told that most tapes recoverable
  • a survey with the experiments; data will be copied back from other sites
  • Oracle library - rechecked.
  • 20000 euro to start up and check
  • nearly all storage disk systems involved
  • DDN - RAID groups are arranged vertically, so each group loses only its bottom disk
  • other systems had RAID groups created horizontally -> potential data loss (see the sketch after these notes)
  • hired company that specialises in this kind of intervention
  • 60 kW for NREN POP
  • the 2017 tender installation adds complexity, but it is important because it provides a place to transfer data to
  • now just have electricity -> then equipment
  • no UPS until the end of the year - in fact not until January
  • production will start to be added at the beginning of February, not before
  • 1000 wet disks
  • cleaning with special liquid, remove dust
  • special oven, pressurised, 60 degrees
  • statistics are not bad: ~80% [recovered] where a disk was broken, whether in the same JBOD or not
  • among the LHC experiments, CMS is confident there is no data loss; the others are in danger
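
To make the vertical-versus-horizontal RAID point concrete, the sketch below (shelf and group sizes invented for illustration, not CNAF's actual configuration) counts the disks lost per RAID group when the bottom shelf of a JBOD stack floods: vertical groups lose one disk each and can rebuild, while one horizontal group loses every disk.

<verbatim>
# Illustrative sketch only: layout parameters are invented.
SHELVES, SLOTS = 8, 8   # 8 shelves (rows), 8 disk slots per shelf

def vertical_groups():
    # One RAID group per slot column: one disk from each shelf.
    return [[(shelf, slot) for shelf in range(SHELVES)]
            for slot in range(SLOTS)]

def horizontal_groups():
    # One RAID group per shelf: all disks on the same shelf.
    return [[(shelf, slot) for slot in range(SLOTS)]
            for shelf in range(SHELVES)]

def losses_per_group(groups, flooded_shelf=0):
    # Disks lost in each RAID group when one shelf is destroyed.
    return [sum(1 for shelf, _ in group if shelf == flooded_shelf)
            for group in groups]

print("vertical:  ", losses_per_group(vertical_groups()))    # 1 disk per group
print("horizontal:", losses_per_group(horizontal_groups()))  # one group loses all 8
</verbatim>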

Questions

  • Jeremy: Recovery strategy assume can't happen again?
  • Daniele: First step to understand how water entered - through floor but how?
  • Have to understand and fix this
  • Matthias: Can't move to new building?
  • Daniele: Long term plan (2020) to move all to new location more suitable for DC
  • Possible to move to this new facility now? No.
  • Latchezar: Good luck, we understand that it's a huge job
  • Storage was affected in different ways. Lost 4 out of 4 front-end servers - will you replace them?
  • Daniele: the front ends are easy to replace and re-provision

Authz update - H. Short and B. Bockelman

AuthZ Update AARC2 Grid Community Forum/SciTokens

Notes

  • Hannah for 10 minutes, then Brian with other updates from the US
  • Hannah
  • AARC: EU research communities
  • EGI: EOSC-hub also there
  • likely policies based on WLCG/EGI
  • FIM4R: SAML, OIDC
  • white paper by 2018
  • then tech provider come up with solutions
  • Community proxies are important, e.g. CERN SSO
  • Model accepted as necessary
  • white paper express complexities
  • list of things we can't do ourselves
  • become reliant on eduGAIN
    • not built for operational support
    • it would make more sense to have more support, e.g. a helpdesk
  • Authz pre-GDB
  • set down on paper where we are and where we want to be
  • -> pilots
  • workflow diagram
  • where we are
  • where we want to be
  • new structure becomes a lot more complicated
  • does help the user
  • gaps shown
  • VOMS provisioning is the unsolved bit
  • CERN SSO community proxy for LHC
  • Now is a good time to comment
  • JWT
    • No big decision
    • Get down on paper
    • people identified
    • orgs identified
  • Brian
  • 2 updates
    • Globus retirement
  • Normal support ends 31 December
  • Security support ongoing
  • [Grid Community Forum] recent builds are going green on Travis
  • Maarten: until now, EPEL was closely related to the Globus project
    • now we are putting stuff that has nothing to do with them under their name - are they OK with that?
    • need our own repo?
  • Brian: EPEL maintainer is on board with this
  • Globus doesn't manage any of the repos we used
  • trademarks - trying to make clear not through globus org.
  • They have guidance on website, this seems a use case, I Am Not A Lawyer
  • Maarten: not too worried about that aspect.
    • What about other communities submitting pull requests - do what we can?
  • Brian: Other communities, PRACE, XSEDE that can accept pull requests.
  • Need 2 distinct people to put code in
  • easy to block
  • forces people to get along, worked out OK so far (Apache principles)
  • Think this will work out, so far OK, time will tell
  • Now, SciTokens
  • because the claims are well defined, optimistic that WLCG can come up with a profile largely compatible with SciTokens (see the sketch after this list)
  • GFAL: "hello world"-like transfers
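
To illustrate why well-defined claims make a common profile plausible, the sketch below creates and inspects a SciTokens-style JWT with the PyJWT library. All claim values, the signing secret and the scope string are placeholders invented for the example.

<verbatim>
# Sketch, assuming the PyJWT library. Real tokens would come from a
# VO's token issuer; everything here is a placeholder.
import jwt

claims = {
    "iss": "https://token-issuer.example.org",  # hypothetical issuer
    "sub": "some-user",
    "aud": "https://storage.example.org",
    "exp": 1600000000,
    "scope": "read:/store write:/store/user",   # SciTokens-style scope
}
token = jwt.encode(claims, "demo-secret", algorithm="HS256")

# A relying service decodes and inspects the claims. Signature
# verification is skipped in this sketch; a real service must verify
# against the issuer's published keys before trusting anything.
decoded = jwt.decode(token, options={"verify_signature": False})
for name in ("iss", "aud", "exp", "scope"):
    print(f"{name}: {decoded[name]}")
</verbatim>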

Questions

  • Daniele: Plans, HTCondor integration with SciTokens, expand?
  • Brian: contributors include the Condor team
  • want Condor to be able to manage OAuth tokens
  • Condor will manage the proxies, refresh them, etc.
  • by March a similar setup for OAuth tokens, including SciTokens, but we want it to be general (see the sketch after these questions)
  • Q: with JWT in HTCondor translated to X.509 - it would be nice to use tokens directly; an offer to be a beta tester
  • Brian: the idea would be to use tokens natively rather than going through a token translation service
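
As a sketch of where the Condor token work was heading: HTCondor's OAuth credentials support, as it later shipped, lets a submit description ask the credmon to obtain and refresh a token for the job. The example below uses the HTCondor Python bindings; the service name 'scitokens', the placeholder job, and the availability of the use_oauth_services submit command are assumptions based on the feature as eventually released, not on what existed at the time of this talk.

<verbatim>
# Hypothetical sketch, assuming HTCondor's Python bindings and an
# OAuth-enabled credmon configured on the submit host.
import htcondor

sub = htcondor.Submit({
    "executable": "/bin/hostname",       # placeholder job
    "output": "job.out",
    "error": "job.err",
    "log": "job.log",
    # Ask HTCondor to acquire and keep refreshed a token from a
    # configured OAuth service named 'scitokens' (placeholder name).
    "use_oauth_services": "scitokens",
})

print(sub)  # renders the equivalent plain submit-file text
</verbatim>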

-- DavidCrooks - 2017-12-20
