Summary of GDB meeting, July 8, 2015 (CERN)
Agenda
https://indico.cern.ch/event/319749/
Introduction
Next pre-GDB/GDBs
- September meeting: many topics almost confirmed, agenda may be late due to holidays
- October meeting jointly with HEPiX at BNL
- No pre-GDB foreseen in September and October
- Probably pre-GDBs in November and December: several possible topics including HTCondor workshop, DPM workshop, Cloud and Storage
GDB evolution: still being discussed, more in September
ARGUS: EL7/Java8 support expected in September
- First tests already done and successful so far: need to build/test the compatibility matrix between server and client versions
- Packages available for all components
WLCG workshop: will take place in Lisbon first week of February
- 2.5 day, exact dates still being discussed
- More focused on in-depth discussion about WLCG future
Actions in progress
- Multicore accounting: almost done! ~10 sites in WLCG still having problems
- Demonstrated once again efficiency of GGUS tickets
- Experiments asked to fill the pages built on "class 2 services" and storage protocols: see slides for URLs
Forthcoming meetings
- HEPiX, BNL, October 12-16
- EGI Community Forum, Bari, November 10-13
- Call for papers deadline: July 12
- SuperComputing, Austin, November 15-20
Security
Update on activities about federated identities - R. Wartel
AARC (H2020 project) started
Sirtfi: WG on incident response for federations
- Significant US involvement
- Links with FIM4R and REFEDS
- Continue work on trust framework, agree how to express compliance in metadata
WLCG Pilot: currently only ATLAS participating, open to other VOs interested
User mapping difficulties
- Rely on email and ePPN (eduPersonPrincipalName) received from eduGAIN but ePPN is not guaranteed to be unique for a person over time and is recyclable
- eduPersonUniqueId: non-recyclable but nobody implements it
- Require also to register a nickname (CERN username) in the VOMS server of the VO: registering/checking takes ~9min...
- Requires a unique STS service shared by all the portals for a VO
- CERN username as the unique ID to push IOTA DN to VOMS
- Persistent ID format different from one SP to another one: makes parsing difficult...
- AARC is supposed to tackle this issue
- Influence over IDPs is currently limited, many do not implement or release certain identifiers
Next steps
- More apps
- Use of ePPN or better when available
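The "ePPN or better when available" step can be pictured with a short Python sketch: take the most stable identifier the IdP released and fall back to ePPN otherwise. This is an assumption-laden illustration, not the pilot's implementation; the preference order, the function name and the flat attribute dictionary are hypothetical.

```python
from typing import Mapping, Optional, Tuple

# Assumed preference order: non-recyclable identifier first, then ePPN.
PREFERRED_ATTRIBUTES = (
    "eduPersonUniqueId",
    "eduPersonPrincipalName",
)


def pick_identifier(attributes: Mapping[str, str]) -> Optional[Tuple[str, str]]:
    """Return (attribute name, value) for the best identifier released, or None."""
    for name in PREFERRED_ATTRIBUTES:
        value = attributes.get(name, "").strip()
        if value:
            return name, value
    return None  # the IdP released no usable identifier


# Hypothetical attribute set where only ePPN and mail were released
released = {"eduPersonPrincipalName": "jdoe@example.org", "mail": "jdoe@example.org"}
print(pick_identifier(released))
```

Returning the attribute name along with the value lets the caller record how strong the mapping is, which matters because ePPN may be reassigned over time.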
New Threats - R. Wartel
Landscape has changed
- Datacenter security is as important as laptop security
- Datacenter compromises mainly by administrator credential thefts
- Linux = Windows as far as security threats are concerned: main attacks target both platforms
Attacks more and more sophisticated
- Very customized to match information that may be expected by the users
- Includes fake conferences with sites very similar to an official conference
- Example: ICFNP in Istanbul, RD89 meeting (RD89 doesn't exist)...
- RD89 example (see slides) very sophisticated with cascaded malicious payloads advertised by an email without any sign of being a malicious email
Angler Exploit Kit: the most advanced/impressive EK available today
- Payload encryption, AV and VM detection, fileless, daily URL changes...
90%+ of breaches caused by spear phishing
Antivirus now highly ineffective
- Attackers prepare an undetected variant of the malware and send a short, high-intensity burst of spam: AV cannot cope with such short attacks
Objective: raise the bar as much as we can afford, no perfect security
- For the most sophisticated attacks and attacks by government security agencies, little chance to win... focus on protecting your people
Security Policies Update - D. Kelsey
Most work common between WLCG and EGI
AUP revision to include all EGI service offerings
- Infrastructure agnostic
- Includes requirement to acknowledge support in publications
- More work needed on data protection issues
VM endorsement and operation: adapt to reflect current use cases
- Important for EGI FedCloud
- 2 roles defined: VM Operator (privileged) and VM Consumer (unprivileged)
- Soon ready for distribution and public comments
Personal Data Protection
- Originally issues related to X509 DN (accounting) but need to generalise to cover all forms of logging
- WLCG's global scope makes this harder
- Users need to be informed of the policy whenever they register for or use a service
- GEANT Data Protection Code of Conduct: create a trust framework between SPs and IdPs for IdP to release attributes
- Transferring data outside the EU is even more complex: requires many bilateral contracts between SPs and IdPs
- Now evaluating the use of a single policy "Processing Personal Data"
- All EGI/WLCG participants are bound to this
- "Binding Corporate Rules" for international data transfers
- EU Data Protection: new regulation supposed to be agreed by the end of the year but currently 3 different drafts from Council of Ministers, Commission and Parliament...
- At some point, may need to adapt our policy again...
Federated identity relies on IGTF IOTA profile
- No F2F identity vetting done
- Robust identification done by the (LHC) VOs
- But trust in a CA is per site and not per VO: need mechanisms to restrict certificates to VO members
Discussion
- What's the OSG view on this?
- Dave: working on this, should involve them as early as possible
- What about Japan/BelleII, since they share a lot of sites and services and are willing to collaborate with WLCG?
- Dave: let's involve them too - aiming for a single doc with different details per use case (monitoring, accounting...)
- Remember that access to monitoring info is an operational issue - now sites struggle to see VO monitoring data
- Dave: policy would enable but not mandate this, should be reviewed on a case by case basis
- Romain: anonymising data would allow freer publication, currently too much data are either public or private
HEPiX Report - H. Meinhard
Last meeting in March, Oxford: record participation (134 registered), many first timers again
- Agenda: https://indico.cern.ch/event/346961
- IPv6 tutorial and Ceph BOF in addition to regular presentations
- IPv6 tutorial: many participant desktops configured to IPv6 successfully during the tutorial
Storage/FS: confirmed hype around Ceph/CephFS, storage remains a hot topic!
- Also BeeGFS at DESY
- AFS seems to have no long term future in HEP, in particular because of the absence of planned IPv6 support
Clouds: increasingly used for HEP workloads
- Private and public clouds
- Containers emerging as an interesting technology
Computing/batch
- Benchmarking: discussion on a fast benchmark to estimate the performance of a given (virtual) machine
- Use case different from capacity planning
- HTCondor gaining momentum
- cgroups support maturing, use increases
SL vs. CentOS: diversity not seen as an issue
Next meeting at BNL: grid/cloud session coorganized with GDB (Wednesday)
- Large participation from GDB attendees encouraged: attendance for the full week preferred
EGI
ARGO: new monitoring infrastructure - C. Kanellopoulos
Flexible and scalable framework for status, availability and reliability
- Multi-reports
- Multi-tenants
- Modular architecture
- Integration with external tools
- Relying on standard components/technologies
- Developed by GRNET, SRCE and CNRS
Service monitoring still relying on Nagios
- Using same probe conventions
- Some add-ons added
Availability and Reliability: several profiles possible
Support several deployment models
- Distributed monitoring with centralized reports
- Centralized model
- Distributed monitoring with local and centralized reporting
Status
- Running for 1 year in parallel with the SAM infrastructure: http://argo.egi.eu
- Comparison showed no major problems
- Using the Message Broker Network to report results
- Currently using distributed monitoring with centralized reporting: investigating migration to a fully centralized model
Discussion
- Topology: from external source, dynamically (daily) updated
- No built in topology
- No notion of service, site... built into ARGO: everything defined per customer
- Several infrastructures can use different topologies: important to support infrastructures as different as EGI, EUDAT, PRACE...
- The same tests can be used differently in different topology context (e.g. global EGI vs. NGI)
- Topology changes are taken into account transparently for tests run after the change
- A/R calculation: flexible aggregation of results
- Tests grouped into groups and aggregation rules defined on groups (AND, OR, percentage of available resources...)
- Contact between ARGO and SAM3 developers: both projects started at the same time because of the same limitations seen in SAM. Both projects designed along similar principles despite the absence of coordination/discussion. Discussing experience and implementations would be useful.
- One difference is the deployment model: SAM3 is built on a single, centralized deployment model, which led to some simplifications. Christos: not much effort is put into keeping the current distributed deployment model, most effort goes into offering a framework that is infrastructure-agnostic and customizable to each customer use case.
- Christos, Julia and Pablo will follow-up offline
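As an illustration of the flexible A/R aggregation mentioned in the discussion above (tests collected into groups, with an AND, OR or percentage rule per group), here is a minimal Python sketch. It is not ARGO code: the function names, the rule labels, the default 50% threshold and the top-level AND across groups are all assumptions made for the example.

```python
from typing import Dict, List, Tuple


def aggregate(results: List[bool], rule: str, threshold: float = 0.5) -> bool:
    """Combine individual test results (True = OK) with one aggregation rule."""
    if not results:
        return False  # assumption: a group without results counts as unavailable
    if rule == "AND":
        return all(results)
    if rule == "OR":
        return any(results)
    if rule == "PERCENT":
        # Available if at least `threshold` of the resources passed
        return sum(results) / len(results) >= threshold
    raise ValueError(f"unknown rule: {rule}")


def site_available(groups: Dict[str, Tuple[str, List[bool]]]) -> bool:
    """Assumed top-level rule: a site is available only if every group is."""
    return all(aggregate(results, rule) for rule, results in groups.values())


# Hypothetical profile: group name -> (rule, test results)
example = {
    "compute": ("OR", [True, False]),
    "storage": ("AND", [True, True]),
    "worker_nodes": ("PERCENT", [True, True, False, False]),
}
print(site_available(example))
```

Keeping the rule per group in data rather than in code is what lets the same tests be reused under different profiles, in the spirit of the per-customer definitions described above.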
FedCloud and Community Clouds - T. Ferrari
FedCloud: open hybrid cloud federation
- Different levels of federation possible offering various degrees of interoperability
- Multi-tenant model: common set of procedures, choose the interoperability level that you need
- Low barrier for joining
FedCloud common services
- SSO for authn and authz
- Federated accounting
- Service registry (GOCDB)
- Federated information discovery: compute and storage endpoints, list locally available VM images
- Users query AppDB, Platforms can use LDAP queries
- Federated monitoring: availability, reliability
- VM image catalog : EGI endorsed images, VM image management through a central registration point
EGI FedCloud made of Cloud Realms that are subsets of cloud providers exposing homogeneous resources
- E.g. Open Standard (OCCI) Cloud Realm
- Mandatory services: AAI, service registry, accounting, monitoring
- Also support Peer Realms: mandatory services reduced to AAI and policy compliance
- Base for federating worldwide, e.g. NeCTAR (Australia)
Cloud platforms: community-specific tools/data/apps built on one or several Cloud Realms
- E.g. VOSpace (defined by International Virtual Obs Alliance), Joint EGI/CANFAR (Canada) effort to build a distributed international cloud for astronomers
Discussion
- Tiziana: how could WLCG be part of this FedCloud landscape?
- Maarten: don't see it happening soon for CERN as we are lacking a use case. But WLCG VOs started to make use of FedCloud resources.
- Michel: positive step forward that the FedCloud model now includes this Cloud Realm concept, which does not make it mandatory to embrace all the initial technical choices made by FedCloud. Should make joining much easier.
- Tim: still have an issue with OpenStack with the AAI requirement (based on VOMS) as it relies on a component which is not mainstream and is in fact not working with OpenStack versions released in the last year.
- Ian: need to clarify that there is no concept of a WLCG cloud. There are cloud resources used by WLCG and clouds operated by WLCG sites. Each site may have its own reasons for joining or not joining FedCloud: WLCG has no specific role in this.
IPv6 Update - D. Kelsey
Significant IPv6 growth according to Google's global client statistics: 35% in Belgium, 20% in USA and Switzerland
- IPv4 address pool exhaustion progressing everywhere: Europe being the least affected
Recent news from the group
- Still IPv6 routing problems at CNAF: being worked on
- F. Prelz started work on XRootD for testbed
- ATLAS: 2 sites receiving test transfers on IPv6 space tokens
- LHCb: new DIRAC version released and about to be tested
- DESY: IPv6 enabled on EDUROAM
- NDGF about to move to dual-stack
FTS3 testbed
- gfal-copy fails to use IPv6: seemed coincident with a gfal2 upgrade but turned out to be related to a Globus upgrade
- Underlines that IPv6 support is not yet robust
Dual-stack services in production at several sites: need more testing before recommending widely
- 2% of endpoints in central BDII dual-stacked: slight increase in the last year
- Also an issue with SAM3: largely linked to IPv6 support not ready in SAM-Nagios
- The OS is still SL5, which does not fully support IPv6 by default and has indeed failed for us there
- The SAM port to SL6 is being worked on
- Not confident yet that there are no other problems: more tests planned next Fall
- Plan is to have a separate IPv4 and IPv6 infrastructure to help disentangle problems
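To make the "dual-stacked endpoint" notion above concrete, the following Python sketch classifies a host from the addresses it resolves to. It is only an illustration, not the procedure used by the working group: the function names and category labels are assumptions, and the sample addresses come from the reserved documentation ranges.

```python
import ipaddress
import socket
from typing import Iterable, Set


def resolve(host: str) -> Set[str]:
    """Return all addresses the local resolver gives for `host` (needs network)."""
    infos = socket.getaddrinfo(host, None)
    # sockaddr is the 5th field; its first element is the address string
    return {info[4][0] for info in infos}


def classify(addresses: Iterable[str]) -> str:
    """Label an endpoint from the IP versions of its addresses."""
    # Drop any "%scope" suffix that link-local IPv6 addresses may carry
    versions = {ipaddress.ip_address(a.split("%")[0]).version for a in addresses}
    if versions == {4, 6}:
        return "dual-stack"
    if versions == {6}:
        return "ipv6-only"
    if versions == {4}:
        return "ipv4-only"
    return "unresolved"


print(classify(["192.0.2.10", "2001:db8::10"]))
```

In practice one would call `classify(resolve(hostname))` for each endpoint. Note that this only says what is published in DNS; it does not show that the service actually works over IPv6, which is the harder problem reported above.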
Experiment news
- ATLAS: all T1s with a dual-stack perfSonar by last April, all T2Ds by August
- CMS: request to have a substantial fraction of CMS data available through IPv6 in AAA by the end of the year to enable IPv6-only WNs
LHCOPN: many sites with plans by the end of the year...
Next steps
- Add XRootD to the testbed
- More robust production dual-stacked testing
- Dual-stack SAM testing: probably a showstopper for wider dual-stack production services
- Another workshop/training event early 2016?
Discussion
- Maarten: IPv6 support not yet the highest priority topic but, from what was reported/seen this year, 2016 may be the IPv6 year in WLCG!
- Michel: progress may not seem very spectacular but we are now in a position where we should be able to do it if there was a need to rush. Most critical issues are now either solved or about to be solved (by the end of the year).
- When Run 2 reaches a steady state, sites may be more available to tackle the IPv6 issue: main work to be done is sometimes with the basic network infrastructure readiness rather than with grid services.
OSG Update - B. Bockelman
Mistake with time zone: Brian didn't join.
- Presentation postponed to September
--
MichelJouvin - 2015-07-08