Summary of pre-GDB meeting on Cloud Issues, September 14, 2014 (CERN)
Agenda
https://indico.cern.ch/event/272791/
Cloud Status in WLCG - L. Field
At first glance, a cloud is just an alternative way of provisioning compute resources in an environment using pilot jobs
Image management challenges
- The image provides the software, the configuration and the contextualization specific to the cloud used
- Must balance between pre- and post-instantiation operations: don't want to rebuild the image for every change
- CVMFS is the key component to decouple the base image from the experiment requirements; it is needed anyway
- Main feature required from an image is to be CVMFS-ready: (micro)CernVM is the foundation
- Transient perspective: an instance is not updated but rather destroyed/recreated
- Need for automated image build/management tools
- Contextualization is a VO issue (see the sketch below)
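A minimal sketch of what such VO-level, post-instantiation contextualization might do at first boot, assuming a CVMFS-ready base image (the hook itself, the repository names and the proxy URL are illustrative, not any project's actual tool):

    #!/usr/bin/env python
    # Hypothetical first-boot contextualization hook: configure and mount the
    # VO's CVMFS repositories so the base image never needs rebuilding for
    # VO-level changes.
    import subprocess

    VO_REPOS = ["atlas.cern.ch", "atlas-condb.cern.ch"]  # assumed VO choice

    def configure_cvmfs(repos):
        # Write the CVMFS client configuration, then set up and probe the mounts
        with open("/etc/cvmfs/default.local", "w") as f:
            f.write("CVMFS_REPOSITORIES=%s\n" % ",".join(repos))
            f.write("CVMFS_HTTP_PROXY=http://squid.example.org:3128\n")  # placeholder
        subprocess.check_call(["cvmfs_config", "setup"])
        for repo in repos:
            subprocess.check_call(["cvmfs_config", "probe", repo])

    if __name__ == "__main__":
        configure_cvmfs(VO_REPOS)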
Capacity management
- Not only need to start/manage VMs but also ensure that enough resources are started and that resources in excess are stopped
- Shift from resource to capacity management: ensure there are enough VMs running
- Requires a component with some intelligence to decide whether a VM must be started or stopped, and where (see the sketch below)
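A minimal sketch of that decision logic, with invented names, to make the shift from resource to capacity management concrete:

    # Hypothetical reconciliation step: compare the capacity target (e.g.
    # derived from the amount of queued work) with what is running, then
    # start or stop VMs accordingly.
    def reconcile(target, running_vms, start_vm, stop_vm):
        if len(running_vms) < target:
            for _ in range(target - len(running_vms)):
                start_vm()                  # not enough capacity: grow
        else:
            for vm in running_vms[target:]:
                stop_vm(vm)                 # resources in excess: shrink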
Accounting and commercial providers
- Helix Nebula as a pathfinder project: a great learning experience
- Need to understand how to cross check invoices with real usage
- Need to run our own consumer-side accounting: coarse granularity acceptable, no need for detailed accuracy
- Ganglia is a good tool to base it on: already well adopted in the cloud world
- Accounting records can be derived from the integral of historical data in Ganglia (sketched below)
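For instance, the record could be the trapezoidal integral of the sampled utilisation; names and numbers below are purely illustrative:

    # Derive CPU-seconds from Ganglia-style (unix_time, cpu_fraction) samples.
    # Coarse granularity is acceptable, so a trapezoidal sum is good enough.
    def integrate(samples):
        total = 0.0
        for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
            total += (v0 + v1) / 2.0 * (t1 - t0)
        return total

    # 0.8 of a CPU for one hour -> 2880 CPU-seconds
    print(integrate([(0, 0.8), (1800, 0.8), (3600, 0.8)]))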
Accounting and WLCG
- By default trust site accounting: no job information at the site level as there is no batch system
- Inefficient use of a VM must be accounted to the VO
- Job information in the VO domain
- Dashboard to correlate them
- Prototype developed at CERN
- Core metrics: time, capacity, power, total computing done, efficiency
- Need to ensure they are collected in a consistent way by sites and VOs: work in progress
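To make the consistency requirement concrete, a sketch of how the core metrics relate (field names and numbers are illustrative, not an agreed definition):

    # time: wall-clock consumed; capacity: cores allocated; power: HS06 per
    # core; the last two metrics are derived from the first three.
    wall_time_h = 24.0
    cores = 8
    hs06_per_core = 8.5
    cpu_time_h = 176.0

    total_computing = cores * hs06_per_core * wall_time_h  # HS06-hours done
    efficiency = cpu_time_h / (cores * wall_time_h)        # ~0.92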
- In commercial clouds, resources are provided by "flavour" (num of core, mem, ...), defined in SLAs
- Reuse the same approach with our community clouds?
- SLA monitoring ?
- Site rating against SLA?
- Benchmarking is not really required inside a VO: the VO has enough information on executed payloads. But benchmarks are required to compare VOs...
Volunteer computing has a role to play
- Should try to get as much as possible from it before going to commercial providers
Cloud adoption document being written: should be released in the next few weeks.
Target Shares
Vac and Vcycle - A. McNab
Both are lifecycle managers based on the same idea
- See previous meeting for Vac details
- Vcycle: manages VM lifecycle on IaaS clouds. Currently OpenStack only but being ported to other cloud MW (EC2, OCCI).
- Can be run by the site, the VO, the region...
- OCCI to take advantage of EGI Federated Cloud
- Main use case: pilot VM
- Need a boot image and a user_data file with the contextualization (see the boot sketch after this list)
- Compatible with machine/job features
- VM state is monitored
- The VM must execute the shutdown procedure when there is no more work to do
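A sketch of this main use case against the OpenStack API (via python-novaclient, current at the time; credentials, image and flavor names are placeholders, and the exact calls Vcycle makes are not detailed in the talk):

    from novaclient import client

    nova = client.Client("2", "user", "password", "tenant",
                         "https://keystone.example.org:5000/v2.0")

    with open("user_data") as f:    # VO contextualization file
        user_data = f.read()

    # Boot a pilot VM from the boot image; the lifecycle manager then watches
    # its state, and the VM itself runs the shutdown procedure when idle.
    vm = nova.servers.create(name="vo-pilot-001",
                             image=nova.images.find(name="cernvm-pilot"),
                             flavor=nova.flavors.find(name="m1.medium"),
                             userdata=user_data)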
Vac: no external orchestrator, a simpler infrastructure for sites that support only pilot-based VOs
- Every factory node has enough information to act autonomously
- Target shares implemented by querying other Vac nodes to discover what they are running: instantaneous
- No target share history implemented currently: target shares need to be adjusted by hand
- Accounting done with APEL by producing Torque/BLAH-compatible records, consumed by the APEL PBS parser
- Working on direct reporting through SSM
- Working on implementing the same possibility in Vcycle
Vcycle can support multiple projects
- Target shares at the level of project/tenancy
No interaction between Vcycle (or Vac) and the pilot framework: the VM will be in charge of contacting the pilot framework
Machine/job features is a very useful mechanism in this context
- Vac: information available to VM via NFS
- Vcycle: information available via http
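A sketch of how a payload might read one such value over either transport, assuming the standard MACHINEFEATURES pointer and a key such as shutdowntime from the machine/job features proposal:

    import os
    import urllib2  # Python 2, current at the time

    def get_feature(key):
        base = os.environ["MACHINEFEATURES"]
        if base.startswith(("http://", "https://")):
            # Vcycle: features served over http
            return urllib2.urlopen(base.rstrip("/") + "/" + key).read().strip()
        with open(os.path.join(base, key)) as f:
            # Vac: features in an NFS-mounted directory
            return f.read().strip()

    print(get_feature("shutdowntime"))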
Already used by several sites in the UK
- In particular connected to the UK DIRAC instance: commitment to run as much of the DIRAC payload as possible in VMs
Fairshare provisioning in OpenStack - L. Zangrando
In OpenStack, the default scheduler (nova-scheduler) is a pluggable component
- Implements FIFO scheduling, no queueing: if no resource is available, the request fails
Started to develop an alternate scheduler, FairShareScheduler, that can act as a drop-in replacement for nova-scheduler
- Dynamic priority assigned to users
- Based on the SLURM multifactor scheduler (priority model sketched after this list)
- Implements request queueing: a queued request remains in the "scheduling" state
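The underlying priority model, sketched with invented weights (SLURM computes a weighted sum of normalized factors; the factors actually used by the FairShareScheduler are not detailed in the talk):

    # Each factor is normalized to [0, 1]; the queued request with the
    # highest priority is scheduled first.
    def priority(age_factor, fairshare_factor,
                 w_age=1000, w_fairshare=10000):
        return w_age * age_factor + w_fairshare * fairshare_factor

    # A user far below their share beats an older request from a user
    # already over their share:
    print(priority(age_factor=0.2, fairshare_factor=0.9))  # 9200.0
    print(priority(age_factor=0.9, fairshare_factor=0.1))  # 1900.0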
Available for Havana and Icehouse
- Testing in progress in Bari and planned at UVic
- A few known issues
Trying to get FairShareScheduler into the official OpenStack distribution
- Failed to get it accepted into GANTT, the project that should replace nova-scheduler
- Now focusing on BLAZAR, a resource reservation service, based on the concept of lease
- Supports several types of lease (leases with different characteristics/lifetimes)
- Proposal: implement fairshare as a lease type. Drawback: limited lifetime.
Discussion
Andrew McNab: FairShareScheduler solves the scheduling problem but not capacity management, which is the decision to start a new VM if needed
Reintroducing queueing at sites
- Andrew: a source of problems for sites
- Philippe Charpentier: interested in clouds if they can provide long-lived VMs; in this case queueing will not bring much advantage
Complexity of cloud MW management for sites
- Cloud MW is not necessarily simpler to operate than grid MW
- Volunteer computing may be an alternative for small sites... but it currently also has limitations, for example for data access
- Philippe: Don't need all sites to be usable for all workloads, we have enough MC to fill small sites
A few OpenStack sites are interested in testing the FairShareScheduler
Accounting
CERN Approach - U. Schwickerath
Most resources virtualized at CERN but a large heterogeneity of HW
- The same heterogeneity is found in the VMs
- Most LSF WNs virtualized
VMs classified according to the OS-provided info (cpuinfo, dmidecode, OS, memspeed)
- More information on how to classify HW type: http://world.std.com/~swmcd/steven/tech/cpu.html
- Benchmark a large number of VMs in one given class; the benchmark (HS06) is run at the beginning of the payload
- A VM can look up a table giving the performance for a particular HW type
- Be pessimistic: no problem to deliver more
- The same for virtualized and physical WN
- A few drawbacks: emulated CPU, memspeed not passed to VM
- Experience at CERN: 10% or better accuracy of the VM performance
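A sketch of the classification and lookup step (the table contents are invented; a real table would come from benchmarking many VMs of each class):

    HS06_PER_CORE = {  # HW class -> pessimistic HS06 score per core
        "Intel(R) Xeon(R) CPU L5520  @ 2.27GHz": 8.0,
        "Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz": 8.5,
    }

    def cpu_model():
        # cpuinfo-based classification, as described above
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("model name"):
                    return line.split(":", 1)[1].strip()

    def hs06_estimate(default=7.0):
        # Be pessimistic: unknown classes fall back to a low default
        return HS06_PER_CORE.get(cpu_model(), default)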
Andrew: would be good to have the HS06 score of a VM available as machine/job features
- Lookup could occur as part of site contextualization
Cloud accounting based on ceilometer
- Published to APEL via SSM
- Problem: VM benchmark value cannot be published to APEL
- Also a few metrics missing like the number of cores
Long-lived VMs: need to produce usage records per day rather than per VM
- Not necessarily a problem: for example, ceilometer can do it
- Need to look at other cloud MW to see whether the same is possible
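A sketch of the per-day splitting (record fields are simplified; real APEL cloud records carry more information):

    from datetime import datetime, timedelta

    def daily_records(vm_id, start, end):
        # Yield (vm_id, chunk_start, chunk_end) tuples cut at midnight, so a
        # long-lived VM yields one usage record per day instead of one per VM.
        cur = start
        while cur < end:
            midnight = datetime(cur.year, cur.month, cur.day) + timedelta(days=1)
            nxt = min(midnight, end)
            yield (vm_id, cur, nxt)
            cur = nxt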
John Gordon: hope to see this implementable in the next 6 months
Security
Introduction - I. Collier
Follow-up to Vincent's presentation at the last meeting, in January
For efficient incident handling, we need to be able to answer: who, what, when, where
In grid, traceability covered by policies
- Sites
- Central collection of logs
- Archival policy
- Every service must log actions and the initiator of each action
In cloud, a site can no longer see
- Credentials
- Detailed application logs
- But we can isolate a VM... and even capture the image
Missing information will have to be provided by the VO
Need to address this before workflows are in place
- Easier to build it in early
- Proposal: create a WG made of sites and VOs to test different approaches for filling traceability gaps
Discussion
Andrew: LHCb already uploads application logs from VMs to a central server for application debugging; this could contribute to security traceability
- Vincent: but there is a risk that useful logs are deleted before upload by a malicious job
Vincent: a site sees only the VO, not the user
- Need the data from the VO
Romain/Michel/Andrew: no real change in the cooperation required between the VO and the user, but we need to ensure that VOs and sites archive the relevant information in central places, as the VM is transient
The WG is seen as a good idea to understand the minimum requirements that must be set on sites (and VOs), but is there any chance of getting active members?
- Propose the idea at the GDB, look for volunteers from VOs and sites, decide next month
- The WG could start by assessing current practices
Alessandro: need to see if CernVM could have a hook that would allow the VO to configure the security tools required
Wrap-Up
Milestones
- Target shares: find early adopters for Vcycle and FairShareScheduler
- Accounting: improve agents and portal
- Security: see if we can bootstrap the WG
pre-GDB on VM image management?
- What base image?
- How to configure tools like Ganglia, security tools...
- Contextualization
Next meeting: ~March 2015
- Work should progress in the meantime