Summary of pre-GDB on Cloud Traceability, February 2015 (CERN)


Introduction - I. Collier

In the Grid world there are well-established policies and incident response processes: we need to understand how to achieve the same in the cloud world

  • What are the new challenges?
  • Ensure legitimate use of the resources: misuse may affect reputation
  • Preserve the ability to use new types of resources with flexibility


U. Schwickerath

  • Batch, grid and cloud groups
  • Involved in traceability operations for a long time

D. Collados

  • Database group: not an expert on this topic!
  • In charge of collaboration with India: India involved in cloud and cloud operations (OpenStack)

D. Kelsey

  • Various security roles in WLCG and EGI
  • Importance of policies and procedures applicable to EGI Fed Cloud too
  • Convinced of the importance of fine grained traceability
    • Doesn't believe that VO suspension is really affordable...

S. Lin

  • Not directly involved but interested: take advantage of being at CERN

C. Grandi

  • Representing both INFN and CMS
  • INFN: understand what should be done with our cloud infrastructure

M. Litmaath

  • Involved in deployment, operations, security and interoperability with other infrastructures
  • Convinced of the importance of learning from the experience of security policies/practices we set up in the grid world

V. Brillaut

  • With cloud, a shift of responsibility from sites to VOs: the cloud is a black box for sites

A. Sciaba

  • Understand impact on WLCG operations

R. Lopes

  • Brunel University
  • Increase in risk level for T2 sites with cloud computing.
  • Processing of network traces.
  • Insider threat in the context of cloud computing.
  • Identification of reliable logging and formalization of threat/safety.

G. Ryall

  • STFC - RAL
  • In charge of deployment of new cloud services at RAL

D. Crooks

  • Glasgow site sysadmin
  • Also wants to report to UK T2s

A. McNab

  • In charge of building/providing VM images for LHCb and UK
  • Author of the Vac infrastructure to provision VMs from experiment frameworks
  • Main interest in what is supposed to be logged and how
  • Site access to VMs in a similar way as with WNs?

Sven Gabriel

  • Working on EGI FedCloud security/traceability

Sophie Ferry

  • French NGI Security Officer

Misha Salle

  • Developer of glexec, good experience with traceability in grid
  • Member of EGI CSIRT

P. Flix

  • PIC Tier-1 site manager
  • Not directly involved but interested: implications for eventual adoption of cloud computing solutions, in particular if the move from Grid to Cloud solidifies in the future.


Information sources


  • Rely mainly on network and hypervisor information: these cannot be tampered with by a malicious VM user
  • Importance of cross checking tools/sources: no one source can be trusted in itself
    • Difficult to require it: a significant infrastructure/effort needed
  • Network monitoring requires a lot of effort... not affordable for all sites


  • VM images controlled by VOs: this allows some trust, not possible with other communities


  • Do we need VO participation?
    • No: CSIRT is mostly about site issues
    • Yes: in a cloud world, VOs become sites (provisioning and managing resources)

Security context separation in VMs between "supervisor" (controlled by the VO) and the user payload

  • CMS testing using glexec inside a VM
  • WLCG: collaboration allows trust on the supervisor context

Site vs. VO central syslog

  • Probably keep logging data at site to make the cloud traceability data as similar to our grid data as possible

How can (extensive) VO logging contribute to global traceability?

How to configure the central syslog from the VM

  • Machine/job features could be used to pass the information; VOs would have to check it and configure the VM accordingly
  • Should also explore site contextualization tools
  • Andrew volunteers for this action; Ulrich can probably be contacted; ATLAS willing to participate
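The machine/job features idea above can be sketched as follows — a minimal illustration only, assuming a hypothetical `syslog_server` key under `$MACHINEFEATURES` (such a key is not part of the machine/job features specification and would need site/VO agreement):

```shell
# Sketch: a VM reads the site syslog endpoint from machine features and
# derives an rsyslog forwarding rule. "syslog_server" is a hypothetical
# key name, not defined by the machine/job features specification.
MACHINEFEATURES=$(mktemp -d)                                   # stand-in for the real directory
echo "syslog.example.org" > "$MACHINEFEATURES/syslog_server"   # simulate the site-provided value

target=$(cat "$MACHINEFEATURES/syslog_server")
# '@@host:port' is rsyslog's syntax for TCP forwarding; on a real VM this
# line would be written to /etc/rsyslog.d/ and rsyslog restarted.
rule="*.* @@${target}:514"
echo "$rule"
```

The VO-side check mentioned above would amount to verifying that the value read from machine features points at the expected site host before enabling forwarding.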

Information contents/quality

Syslog: pretty easy for sites to set up, a nice/useful complement to other sources (network, hypervisor); need to define the information expected from experiments

  • Used to track what happened inside the VM
  • How to relate entries to VMs (hostnames are reused): by timestamp?
    • Information about which VM used a given hostname at a given time should exist in the cloud logs, but are we able to correlate both sources?
    • Is the timestamp information trustworthy? Not necessarily, but entries are logged in order, so it should be possible to detect tampered timestamps. In fact rsyslog provides two timestamps: timereported (claimed by the sender) and timegenerated (when the message was received)
    • Having the UUID of the VM would be of great help: where to put it?
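The timestamp-based correlation discussed above can be sketched as follows, assuming the site can extract a (hostname, start, end, UUID) lease table from its cloud MW logs — the lease-table format here is invented purely for illustration:

```shell
# Sketch: recover which VM (UUID) produced a syslog line, given the
# hostname and timestamp of the entry and a lease table extracted from
# the cloud MW logs. The lease-table format is invented for illustration.
leases=$(mktemp)
cat > "$leases" <<'EOF'
vm-047 1423000000 1423100000 uuid-aaaa
vm-047 1423100001 1423200000 uuid-bbbb
EOF

host="vm-047"       # hostname found in the syslog entry
ts=1423150000       # entry timestamp, as epoch seconds

# Match the lease whose time interval contains the entry's timestamp.
uuid=$(awk -v h="$host" -v t="$ts" '$1==h && t>=$2 && t<=$3 {print $4}' "$leases")
echo "$uuid"        # prints uuid-bbbb
```

This only works if the two clocks are reasonably synchronized, which is one reason the trustworthiness of timestamps matters.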

Clearly syslog will not provide 100% coverage of all our needs, but it is far better than nothing

  • Accept corner cases as long as we provide reasonable coverage
  • Review corner cases when we have experience with the default tools

We may start with a recommended configuration for the central syslog machine

  • As much as possible, should include some basic log mining tools
  • RAL, CERN, Glasgow having some experience
    • Also INFN has done some work on this, try to establish contact
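A starting point for such a recommended configuration might look like the fragment below — an illustrative rsyslog (v7+) server setup that keeps one file per source host and day, so a given VM's logs can be retrieved easily; the paths and port are examples, not an agreed recommendation:

```
# Illustrative central syslog (rsyslog v7+) configuration -- not an
# agreed WLCG recommendation.
module(load="imtcp")
input(type="imtcp" port="514" ruleset="remote")

# One file per sending host and day, to ease per-VM retrieval.
template(name="PerHost" type="string"
         string="/var/log/remote/%HOSTNAME%/%$YEAR%-%$MONTH%-%$DAY%.log")

ruleset(name="remote") {
    action(type="omfile" dynaFile="PerHost")
}
```

Basic log mining could then start from simple per-host searches over these files.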

Information we'd like to log through syslog

  • Change of identity is an example
  • Remain pragmatic: things that are achievable without too much effort in every context

Netflow and network monitoring

  • Netflow very site (hardware) dependent
  • iptables in the VM: not necessarily trustworthy in general, but acceptable in the WLCG context, where 'root' can be used only by a trusted VO (production) user
  • iptables can also be used in the hypervisor to log the traffic from the VM: cannot be forged by a malicious VM user
    • Andrew has ideas on how to log net connections (at least TCP) at the hypervisor level this way. Need to test it.
  • Also some work happening at CERN, NIKHEF, RAL with Netflow
    • Michel will try to contact IN2P3 people developing applications around Netflow
  • Would be good to have a specific review with experts of what we could do (or not do) with Netflow: looking for somebody to convene this work
    • Sven may agree to convene this work if experts can be identified at other sites
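The hypervisor-level logging idea could look roughly like the rule below — an untested sketch, assuming VM traffic traverses the hypervisor's FORWARD chain through a bridge (here called virbr0, an illustrative name):

```
# Sketch (untested): log each new TCP connection originating from a VM,
# at the hypervisor, where a malicious VM user cannot alter the rules.
# "virbr0" is an illustrative bridge name.
iptables -I FORWARD -i virbr0 -p tcp --syn -m state --state NEW \
         -j LOG --log-prefix "VM-NEW-TCP: " --log-level info

# Matching connections then appear in the hypervisor's kernel/syslog
# stream and can be forwarded to the site central syslog like any
# other entry.
```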

Site Access to VM for Incident response

LHCb by policy allows site access to VMs

  • Implementation is MW dependent...

CMS would also probably have no problem with this

(root) SSH access is not really required for forensic analysis but generally makes it easier

  • Image quarantine is probably partly an alternative
  • Some problems are not strictly security threats but misbehaviours that can only be investigated on a live machine

How to pass site SSH keys to VMs: site contextualization? machine/job features?

  • Needs to be agnostic with respect to the instantiation mechanism and easy to implement on non-CernVM-based machines
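One mechanism-agnostic option would be a cloud-init fragment in the instance user data — a sketch only; the account name and key placeholder are illustrative, and sites using other contextualization tools would need an equivalent:

```
#cloud-config
# Sketch: inject a site incident-response SSH key at instantiation.
# The account name and the key value below are illustrative placeholders.
users:
  - name: site-csirt
    ssh_authorized_keys:
      - ssh-rsa AAAAB3... site-incident-response-key
```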

Need to classify workloads

Are we still in the paradigm where a "pilot" is in charge of finding a payload from the VO central queue, or are other use cases envisioned?

Claudio: CMS tends to build a "site" instantiated on cloud resources; this requires some services in addition to the "WN"

  • Example: squid
  • These additional services will/may be instantiated in the cloud too

Class of machines should not affect traceability requirements but only incident response

  • Service machines instantiated in the cloud are probably not a real challenge: managed by VO sysadmins instead of site sysadmins, but they are well defined and controlled
  • The general use case from EGI Fed Cloud (any user able to build their own VM) seems much more challenging: may take advantage of the Amsterdam pre-GDB to discuss with EGI and see if our work can be of any use in this context

Image upgrade management

Not really a traceability issue.

Need to keep it on a list of important topics to discuss

  • One example of the responsibility shift from site to VOs: sites will have no way to contribute to image upgrades in the cloud world

How to monitor that VM images are really managed, updated, and not carrying well-known vulnerabilities

The current WLCG endorsement policy is based on the assumption that a running image is never patched and that, if it needs to be upgraded, it is revoked and a new image is created

  • EGI Fed Cloud has the opposite view: base images may remain unpatched as long as they are updated at instantiation
  • At the same time, WLCG is not using the tools implementing this endorsement framework (vmcaster/vmcatcher, StratusLab Marketplace): do we need to have another look at them at some point?
    • At the current scale of our cloud work, not mandatory but may be necessary if going to production or larger scale
    • In fact probably not needed with the move to MicroCernVM, where the instantiated system is not in the (micro)image (just a bootstrap) but in a CVMFS repository: in some sense the problem becomes simpler, an operational procedure issue.

Image Quarantine

Seems an attractive feature for forensics; not clear whether it is available/feasible in all cloud MW

Policy Evolutions

Should describe the shift/evolution of responsibility between sites and VOs

  • VO as a source of information for security traceability should be recognized

VO logs and traceability

VOs are collecting a lot of information; we need to assess whether it can be used for security traceability and identify the gaps.

  • Example: CMS relies a lot on Condor and its logs; it would be interesting to challenge this and see how useful these logs are as a source of security information

Reuse the EGI CSIRT experience with challenges.

Summary presented to GDB February 11th
