Summary of GDB meeting, December 11, 2013 (CERN)

Agenda

https://indico.cern.ch/conferenceDisplay.py?confId=251192

Welcome - M. Jouvin

Next meetings

  • March planned outside CERN: proposal to host it must be sent quickly to Michel
  • Pre-GDBs confirmed for January and February, probably one in March (outside CERN)

VO-based SAM tests: potential scheduling issue as test jobs are run in competition with production jobs

  • No easy solution
    • Do we care about the status of a site when the VO has no share available?
  • Agreement that we need consistent handling between VOs: all VOs seem happy not to consider a timeout as a failure

Security

Identity Federation pre-GDB - R. Wartel

25 participants

Identity federation

  • Follow-up discussions on the pilot
  • Not only a technical problem: the devil is in the details!

Need a policy for "attributes for WLCG": which attributes are required from the users, when/how they should be released

  • Persistent ID
  • Back channel to the user (e.g. email address)
  • VO membership and roles
  • Real name: not to be released except under very special circumstances like incident handling

We should accept losing the Common Name in credentials

  • Required to comply with data privacy policies

Transfer LoA (level of assurance) to VOs

  • Makes it easier for existing IdPs to contribute to WLCG

WLCG should build on existing federations, generally based on NRENs

  • eduGAIN seems the appropriate forum

Building blocks identified and several successful pilots done, but unclear how to fit them together

  • Concentrate on web applications, as CLI support is difficult given the current lack of ECP availability
  • Would be useful to have a pilot app from an experiment, like downloading a file with gridftp from a web portal
  • Use the experience gained with a few pilot apps to build a strawman architecture

New Authorization Profile - D. Kelsey

Current authorization profile is based on ID vetting done by CAs: it works well for structured communities like WLCG but doesn't fit all communities, IdPs and stronger authorization needs

IGTF proposal: Identifier-Only Trust Assurance with Secured Infrastructure Authentication Profile (IOTA)

  • Lower ID vetting by CAs
  • Transfer level of assurance to communities/VOs
  • Persistent unique identifier: the unique identifier will never be reused for a different user/entity (a minimal sketch of this property follows this list)
    • Generated by authorities using secured and trusted infrastructure
    • Renewal or re-keying must ensure that the requesting entity is the same as the original one
    • Nothing prevents an entity from requesting several certificates with different identifiers, but this is already the case
    • Recommendation to issue long lived certificates
  • Authorities are required to collect only the user data necessary to ensure ID uniqueness, not the full traceability
    • Must be used with complementary services/information managed by other authorities doing the strong vetting process
  • No requirement for incident handling
  • Current draft uploaded to GDB agenda (original page not publicly accessible)
  • Input from CILogon Basic and UK SARoNGS
    • Accept federations/IdPs that do not perform F2F identity vetting (photo-ID)
    • Accept federations/IdPs that refuse to release common names
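
A minimal sketch of the persistent-identifier property above (all names hypothetical, Python used only for illustration): once an identifier is bound to an entity it is never reissued to anyone else, and renewal must map back to the same entity.

    # Illustrative sketch only (hypothetical names): a registry that hands out
    # persistent unique identifiers and never reuses one for a different entity.
    import uuid

    class IdentifierRegistry:
        def __init__(self):
            self._by_entity = {}   # vetted entity key -> identifier
            self._issued = set()   # every identifier ever issued, never recycled

        def issue(self, entity_key):
            """Return the entity's identifier, creating one on first request."""
            if entity_key in self._by_entity:
                # Renewal/re-keying: same entity, same identifier
                return self._by_entity[entity_key]
            ident = uuid.uuid4().hex
            assert ident not in self._issued  # uniqueness invariant
            self._issued.add(ident)
            self._by_entity[entity_key] = ident
            return ident

    reg = IdentifierRegistry()
    assert reg.issue("alice@example-idp") == reg.issue("alice@example-idp")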

Many operational issues

  • Identity vetting: for WLCG VOs it could be done by CERN HR
  • Will sites accept not having the common name?
  • The VO should know the name of the user and have a way to contact them if necessary
  • IOTA CAs may operate trusted credential repositories that can be used by other services
  • It will be difficult for sites to have a mix of IOTA-compliant VOs and VOs that are not

TERENA is about to issue a call for tender for the renewal of the TCS service: IOTA may be added as a requirement for the next version of the service

EGI core services provisioning and UMD decommissioning plans - P. Solagna

UMD2: security updates only

  • End of support has been fixed at end of April 2014: according to policy this means upgrading by end of May
  • Jan. 31: First broadcast about EMI-3 upgrade
  • Feb. 28: start of alarms to sites
  • May 31: sites not upgraded become eligible for suspension
    • At least service downtime
  • UMD3: no specific problem known, except the VOMS client memory leak on the WN
    • Problem not fully understood yet
    • Not considered as a showstopper, workaround available for affected VOs (mainly ATLAS)

New core service provisioning will start May 1st

  • Affected services are services run by EGI.eu that will be run by NGIs in the future
  • Handover process, where applicable, is starting
  • Message brokers network will be provided by GRNET and SRCE
    • The main change is that CERN will not contribute anymore
    • Not much impact on the service except that the test infrastructure will be discontinued
  • Accounting: still provided/operated by STFC but with only one instance
    • Also 2nd level of support for APEL by STFC
  • Accounting/metrics portal: no change
  • SAM central service: SRCE, GRNET + CNRS replacing CERN
    • Also a migration plan to a new availability/reliability calculation engine for EGI (the usual A/R definitions are sketched after this list)
    • CERN will continue to operate the central SAM until the end of handover by CNRS
  • Monitoring central tools (e.g. probes for MW decommissioning and security): SRCE, GRNET
    • CERN stopping its contribution
  • GOCDB: no change (STFC)
  • Operational support services (COD, catch-all services, dteam): GRNET (catch-all VOMS/CA, dteam) + CYFRONET (COD)
    • SARA stopping its contribution
  • Security tools: NIKHEF + CESNET + SRCE/GRNET for security-related Nagios components
  • Security coordination: FOM, STFC as currently + SNIC
    • Security training to be funded by EGI mini-project (link with EGI 6-month extension)
  • UMD criteria: will continue as today by CSIC + FCTSG (IberGrid)
    • No new criteria added in 2014
    • Reducing the testbed
  • Staged rollout coordination: IberGrid rather than LIP (CSIC added)
    • No changes expected despite a small effort reduction
  • SW provisioning tools: CESGA in addition to GRNET
    • UMD repositories and release tools
    • Release management still handled at EGI.eu
    • Minimum requirement for availability reduced to 90%
  • Helpdesk
    • KIT committed to continue support/operation/development for GGUS
    • 1st and 2nd level support: CESNET + IBERGRID instead of CESNET + KIT + INFN, both levels merged, SAM and APEL maintained by the sites operating them
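
For context on the calculation engine mentioned in the SAM item above, the commonly used availability/reliability definitions can be sketched as follows (a simplified illustration, not the actual EGI engine):

    # Simplified sketch of the usual A/R definitions (not the actual EGI engine):
    #   availability = up / (total - unknown)
    #   reliability  = up / (total - scheduled_downtime - unknown)
    def availability(up, total, unknown):
        return up / (total - unknown)

    def reliability(up, total, scheduled_down, unknown):
        return up / (total - scheduled_down - unknown)

    # Example: 700 h up out of 744 h, 24 h scheduled downtime, 4 h unknown.
    print(round(availability(700.0, 744, 4), 3))     # 0.946
    print(round(reliability(700.0, 744, 24, 4), 3))  # 0.978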

Plan for core services provisioning is independent of the EGI-InSPIRE extension: funded by EGI partners

Discussion on the impact on WLCG of further reduction of services provided by EGI

  • Maarten: WLCG has been relying on UMD provisioning and 1st/2nd level support provided by EGI; it will be difficult to live without them
  • No consensus on whether WLCG can live without them in the future but WLCG will continue to use them as long as they are available

Actions in Progress

Ops Coord Report - A. Sciaba

TF changes

  • New TF approved on multicore deployment, led by A. Forti and A. Perez-Calero
    • Focused on grid resources
  • 2 TFs cancelled: data access and dynamic data placement

SL6 TF completed

  • 92% of the infrastructure upgraded at the end of October
  • EMI3 WN usable

CVMFS

  • ALICE deployment going more quickly than anticipated: new deadline advanced to end of this year
  • A few operational issues (in particular with caches): SAM probe being added
  • New baseline version: 2.1.15

glexec

  • Still 30 sites need to deploy it (some not yet migrated to SL6)
  • ATLAS and ALICE need developments to use it

perfSONAR

  • 3.3.1 upgrade deadline: April 1st
    • Sites with older version already received tickets

FTS3

  • Service stable in the last 2 months
    • 30% of production transfers for ATLAS, 100% for LHCb
  • Deployment scenario: single instance favoured

Machine/job features

  • Recent contact with I. Sfiligoi about his development in CMS to avoid wasting time during draining
    • Involves bi-directional communication between machine and pilot
    • Intention to merge both approaches
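
The machine/job features mechanism above exposes per-machine and per-job metadata as one-value-per-file keys in directories advertised through environment variables; a minimal consumer might look like the sketch below (key names such as hs06 and wall_limit_secs are examples from the draft specification).

    # Minimal machine/job features consumer (key names are examples from the
    # draft spec): each key is a small file under $MACHINEFEATURES/$JOBFEATURES.
    import os

    def read_feature(env_var, key):
        base = os.environ.get(env_var)
        if not base:
            return None  # feature directory not provided on this resource
        try:
            with open(os.path.join(base, key)) as f:
                return f.read().strip()
        except IOError:
            return None  # key not published

    hs06 = read_feature("MACHINEFEATURES", "hs06")         # machine HS06 power
    wall = read_feature("JOBFEATURES", "wall_limit_secs")  # job wall-clock limit
    print("HS06:", hs06, "wall limit:", wall)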

IPv6

  • CMS has started testing with promising results
  • ATLAS about to start DDM tests

WMS decommissioning

  • Still some small usage in CMS for analysis: migration to be done
  • LHCb: still 20 sites to move to direct submission

MW readiness

  • TF will have a kick-off meeting tomorrow...

Christmas break: all experiments will continue some activity but are happy with the best-effort support provided by most sites during this period

SHA-2 readiness - M. Litmaath

Main concern is SEs

  • Last EGI OMB, Nov. 28: 5 dCache and 7 StoRM instances still to be done
  • OSG: BNL planned Dec. 17, FNAL just started, aiming to be ready at the end of the month
  • dCache SRM client needs to be updated to 2.6.12 to be able to handle SHA-2 host certs
    • Not necessarily urgent...
    • Last client being released as part of EMI3 update 11, EMI2 update being prepared

Experiment frameworks: lots of testing, no problems found, so they look ready for SHA-2

  • Tested with the CERN test CA, needs to be confirmed with the first real user certs

By mid-January, WLCG infrastructure should be ready

  • Unlikely that SHA-2 certs will appear before next year

VOMRS: doesn't support SHA-2 and is not maintained anymore

  • Should be replaced by VOMS-Admin, but its different GUI and CLI have delayed adoption by experiments
  • VOMS-Admin test setup started a few weeks ago
    • Loaded with VOMRS data
    • Some instabilities being investigated
  • In the meantime (workaround), a SHA-2 certificate can be uploaded as a secondary cert in VOMRS if the user has a SHA-1 certificate

Migration to new generation of DM clients - A. Alvarez Ayllon

GFAL2 2.3.0 released

  • GFAL1 in maintenance mode
  • ABI and API incompatible with GFAL1's, but many advantages (protocol independence, fewer dependencies...)
  • 2.4.8 about to be released: mainly FTS3 related changes
  • 2.5 will bring LFC registration support and multiple-BDII support
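
As an indication of what the new API looks like, a minimal copy with the GFAL2 Python bindings (the rough equivalent of an lcg-cp; the URLs are placeholders and binding details may differ between releases):

    # Indicative GFAL2 Python-bindings sketch (placeholder URLs; binding details
    # may differ between releases): protocol-independent stat and copy.
    import gfal2

    ctx = gfal2.creat_context()
    src = "srm://source.example.org/dpm/example.org/home/vo/file"
    dst = "gsiftp://dest.example.org/storage/vo/file"

    print(ctx.stat(src).st_size)       # same call for any supported protocol

    params = ctx.transfer_parameters()
    params.overwrite = True
    params.timeout = 300
    ctx.filecopy(params, src, dst)     # lcg-cp-style copy, protocol independent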

gfal-utils ready for testing: feedback needed

  • Replaces lcg-utils
  • No replacement for lcg-stmd
  • Partial replacement for LFC related commands
    • lcg-cr is a 2-step operation now

Released only in EPEL

  • EPEL-testing first

lfn:// deprecated and replaced by lfc://

Discussion

  • Claudio: any plan for a GFAL2 plugin for ROOT to replace the GFAL1 one?
    • Oliver: no known plan, to be rediscussed if there is a real need
  • LHCb: will move to the new DM clients as part of the effort in progress to move to native FTS3 clients
  • CMS: no firm plan yet but no major difficulty expected
    • Using only the CLI
  • ATLAS: to be checked offline

Action

  • Experiments must update the development team with their concrete plans to move to the new DM clients (GFAL2 and FTS3)

Network

LHCOPN/ONE Report - E. Martelli

New T1s:

  • KISTI ready to connect to LHCOPN with a 2 Gb/s link
  • Russia: problems connecting to LHCOPN; aiming at connecting to LHCONE first using a connection to Starlight in Chicago... not optimal

OPN evolution

  • Triggered by I. Fisk's presentation on the evolution of computing models at the WLCG workshop
  • Discussion about opening LHCOPN to T2s and the possibility of merging LHCOPN and LHCONE
  • Workshop planned on Feb. 10-11
    • Overlap with Ops Coord F2F: adjust agendas to minimize overlap

LHCONE L3VPN: 50 sites connected

  • All French sites

P2P service: on-demand service available at most providers in Europe and US, would like to demonstrate their service with LHC use cases

  • Looking for sites wanting to be involved in this effort
  • Also need integration with the experiment SW
  • Coordinated by M. Ernst and the Network WG

100G transatlantic link just put in production

  • Currently demonstrating that it can be used

IPv6 WG Report - D. Kelsey

IPv6 usage taking off: Google reported 2.5%, and growing

New testbed sites: good site coverage now but still very few machines

  • Also increased involvement from all LHC experiments, CMS the most active
  • Lots of testing activities

WLCG Ops Coord TF created: working together with the HEPiX WG

  • Focusing on some concrete use case: see slides
  • Dual stack being tested at Imperial College

IPv6 file transfers for the last 8 months (T. Wildish, CMS)

  • 1 GB files transferred with gridftp (UberFTP)
  • Monitoring time to transfer and errors
  • Good success rate (87%) considering that this is pure best-effort
    • 2 PB transferred
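
A toy sketch of the bookkeeping behind such numbers (the transfer command is a placeholder, not the actual CMS setup): time each transfer, record success or failure, and derive the success rate.

    # Toy bookkeeping sketch (placeholder command, not the actual CMS setup).
    import subprocess, time

    results = []  # (succeeded, seconds) per transfer attempt

    def timed_transfer(cmd):
        start = time.time()
        ok = subprocess.call(cmd) == 0
        results.append((ok, time.time() - start))
        return ok

    # e.g. timed_transfer(["uberftp", "<src>", "<dst>"]) over IPv6 endpoints
    successes = sum(1 for ok, _ in results if ok)
    if results:
        print("success rate: %.0f%%" % (100.0 * successes / len(results)))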

CMS PhEDEx/FTS3/DPM

  • Dual-stack FTS3 server at Imperial College
  • 2 IPv6-only DPM at Imperial and Glasgow
  • No problems: PhEDEx is production-ready on IPv6

Stress testing important to find problems

  • E.g. FNAL had been using IPv6 for more than a year and never noticed a misconfiguration on their border router
  • Useful for a site to be in the testbed!

Dual-stacked production tests: Imperial configured almost all their services (including core services like DNS, NFS, SSH) as dual-stack

  • Using stateless autoconfiguration (SLAAC)
  • No problem observed: no need to turn off IPv6
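
A quick way to check the kind of dual-stack readiness tested here is to ask the resolver for both address families (standard library only; the host name is a placeholder):

    # Quick dual-stack check (placeholder host): a dual-stacked service should
    # resolve to both an A (IPv4) and an AAAA (IPv6) record.
    import socket

    def address_families(host, port=443):
        return set(info[0] for info in socket.getaddrinfo(host, port))

    fams = address_families("www.example.org")
    print("IPv4:", socket.AF_INET in fams, "IPv6:", socket.AF_INET6 in fams)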

SW and tools survey in progress: need to cover all apps

  • Where IPv6 readiness is known, it can be registered; otherwise it needs to be investigated further
  • http://hepix-ipv6.web.cern.ch/wlcg-applications
  • DPM, StoRM, dCache all work in some configuration
    • May require some specific configuration
  • The latest Globus release (5.2.5) fixes the gridftp client issues found by the WG at the beginning of its tests
  • xrootd: will have to wait for v4, expected at the beginning of next year
  • Batch systems: many known problems, work in progress

Testing activities planned for 2014

  • Try more T2s with dual-stacked services
  • Glasgow, CERN, KIT deploying larger test clusters
    • IPv6 WN at CERN in 2014?
  • Decide a target date for large deployment of dual-stacked services?

Next F2F meeting in Spring/Summer at CERN

Conclusion: good progress despite limited effort

  • Sites encouraged to join: contact Dave

IPv6@CERN - E. Martelli

IPv4 depletion foreseen during 2014, based on current usage and the growth seen in the last year

CERN approach to IPv6 decided one year ago

  • All machines/devices dual-stacked when possible
    • True for the GPN only; can be done on experiment networks when they want, but probably not on the technical network (accelerators) before the next LS
    • Every device with an IPv4 address gets an IPv6 address in CSDB, independently of whether it is actually used
    • DynDNS for dynamic/portable devices
  • Identical performance to IPv4
    • True everywhere except for the IPv6 firewall bypass (expected next year)
  • Same provisioning tools
    • True for the main tools: cfsmgr, CSDB, WebReq
  • Same network services
    • True but with some restrictions
    • An IPv6 address is returned from DNS only when querying the ipv6 zone (to avoid timeouts with devices without IPv6 connectivity) until the device is flagged as IPv6-ready
  • Common security policies

dhcpv6

  • For static and dynamic (portable) devices
  • IPv6 address returned only if the device is flagged as IPv6 enabled in CSDB
  • Registration of unknown portable devices can only be done over IPv4, but they can use IPv6 after registration
  • Known issue with CERN MAC address authentication, as a dhcpv6 client doesn't have to use the MAC address of the interface; will be fixed by a new RFC
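
For background on the MAC-authentication issue above: DHCPv6 clients identify themselves by DUID rather than by interface MAC, and the MAC is only recoverable from some DUID types, as this illustrative parser shows (layouts per RFC 3315).

    # Why MAC-based authentication breaks with DHCPv6: clients send a DUID,
    # and only some DUID types embed the interface MAC (layouts per RFC 3315).
    import struct

    def mac_from_duid(duid):
        (duid_type,) = struct.unpack_from("!H", duid, 0)
        if duid_type == 1:   # DUID-LLT: type, hw type, time, link-layer address
            return duid[8:]
        if duid_type == 3:   # DUID-LL: type, hw type, link-layer address
            return duid[4:]
        return None          # e.g. DUID-EN carries no MAC at all

    duid_ll = struct.pack("!HH6s", 3, 1, bytes.fromhex("001122334455"))
    print(":".join("%02x" % b for b in mac_from_duid(duid_ll)))  # 00:11:22:...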

The IPv6-ready flag triggers opening the appropriate firewall rules for IPv6

Next steps

  • January 2014: deploy SW for main firewall, training for support, user information
  • January: dhcpv6 for the IT department
  • February: dhcpv6 for static devices
  • March: dhcpv6 for dynamic devices
  • General IPv6 availability at the end of 2014Q1

WLCG Global Service Registry - M. Alandes Pradillo

VO information systems are the authoritative source for VO-specific information

  • Partly duplicating BDII information
  • Some specific VO information about services like internal VO names
  • Full control by the VO

GSR is an attempt to provide VOs with a central authoritative source of information, hiding the different sources and avoiding the duplication of effort by each VO

  • Dynamic aggregation of different sources
  • Unique entry point, single interface
  • Caching, including the ability to fix problems in the information published by sources
  • No intention to replace VO configuration databases/sources but to simplify their maintenance and increase their consistency
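
A minimal sketch of the aggregation-with-overrides idea (source names and fields are hypothetical): several sources are merged behind one entry point, a local override layer can fix badly published data, and the merged view is cached.

    # Minimal GSR-style aggregation sketch (hypothetical sources and fields).
    def fetch_gocdb():   # stand-ins for real GOCDB/OIM/BDII queries
        return {"CE1": {"endpoint": "ce1.example.org", "flavour": "CREAM"}}

    def fetch_bdii():
        return {"CE1": {"queue": "long"}, "SE1": {"endpoint": "se1.example.org"}}

    OVERRIDES = {"CE1": {"flavour": "ARC"}}  # local fix for bad published info

    _cache = None

    def services():
        global _cache
        if _cache is None:
            merged = {}
            for source in (fetch_gocdb(), fetch_bdii(), OVERRIDES):
                for name, attrs in source.items():
                    merged.setdefault(name, {}).update(attrs)
            _cache = merged
        return _cache

    print(services()["CE1"])  # endpoint from GOCDB, queue from BDII, flavour fixed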

ALICE and CMS are currently not interacting with information system services (GOCDB, OIM, BDII) but are interested in GSR

  • ALICE has no effort available in the short term
  • ATLAS has been strongly involved in the prototype phase and will keep on integrating GSR into AGIS
    • In fact GSR has been designed with the AGIS use case in mind...

Positive feedback after the first prototype, but questions remain about what the authoritative sources for VOs are

  • If a VO doesn't trust GOCDB, using it as a GSR source will not work
  • Need a real use case to make further progress
  • Need to evaluate the cost/motivation of VOs to migrate to GSR
    • Effort to move is not big, pretty simple.

Next steps proposed after this discussion: ATLAS and CMS should try to put some effort into evaluating GSR and into attempting to integrate it into their VO information systems, for a better assessment of benefits and shortcomings.

  • Specific meetings already exist to discuss this with VOs

HEPiX Report - H. Meinhard

Reminder: open to everybody interested (mainly sysadmins and service managers)

  • No formal procedure to apply: just subscribe to the mailing list and register for workshops

Last workshop in Ann Arbor, Michigan

  • 115 participants: record for North-American meetings
  • ~half from the US: many new faces from universities and T2s

Networking & Security

  • 100G now available for the WAN, with several successful tests demonstrating efficient usage
  • BNL mentioned looking at IPoIB as an alternative to 10G for the WNs: cost advantage, better performance

Storage

  • openAFS: complex situation, with 2 companies providing closed-source versions where most new features are added; no IPv6 support foreseen, but there is now confidence that it is not really needed in HEP (access restricted to within the site)
  • CEPH: very promising, several pre-prod services including CERN and RAL
    • Currently mainly distributed object storage (block devices)
  • Very interesting talk from WD about drive reliability and new features planned for improving predictions: see slides!

Batch systems: situation now clearer

  • Several large sites moved to UNIVA GE
    • Oracle sold all Grid Engine assets to UNIVA
    • Little uptake at scale for open-source projects
  • Several sites looking at HTCondor: scalability seems impressive
    • RAL moved its production CE, CERN investigating
  • SLURM: several disappointing experiences
    • Not so good scalability with a high number of jobs: more focused on a high number of nodes (large HPC clusters)

Configuration management: Puppet is the clear winner nowadays, adopted by most sites that are starting with a configuration tool

  • Other configuration management tools (CFEngine, Quattor, Chef) still present
  • WG in charge of establishing best practices and promoting collaboration between sites in module development

Several WGs in HEPiX

  • IPv6: see Dave's talk
  • Benchmarking: lots of results collected, new CPU benchmark from SPEC expected for October 2014
    • Need to prepare for a new HEP benchmark, starting now: long process to validate the benchmarks, requires experiment participation, need to identify the people wanting to contribute
    • Discuss boundary conditions: OS and compiler versions, optimisation level...
  • Configuration management: see above
  • Bit preservation: technical advice on bit preservation as input to the DPHEP project
  • Energy efficiency: nothing at Ann Arbor, but W. Salter in charge of a new attempt at the next meeting in Annecy

Next HEPiX meetings

  • LAPP, Annecy, May 19-23
  • Fall 2014: Univ. of Nebraska, dates to be defined soon
  • Spring 2015: a candidate site in Europe identified
  • Proposals to host meetings are always welcome

-- MichelJouvin - 03 Jan 2014
