TWiki > LCG Web > WLCGGDBDocs > GDBMeetingNotes20160307 (2016-04-15, IanCollier)
March 2016 GDB notes
Agenda
Introduction (Ian Collier)
NL EScience Centre Report (Daniela Remenska)
WLCG Workshop Report (Ian Bird)
First Suggestions for a WLCG Fast Benchmark (Manfred Alef)
Argus Central Suspension Update (Vincent Brillault)
Improving Traceability - Introduction (Dave Kelsey)
WLCG Risk Assessment revisited (Ian Neilson)
A new Model for traceability & separation (Vincent Brillault)
VO Perspective (Alessandro Di Girolamo)
Security Operations Centre update (David Crooks, Liviu Valsan)
Discussion
Wrap up & next steps (Ian Collier)
Agenda: http://indico.cern.ch/event/394780/
Introduction (Ian Collier)
Slides
NL EScience Centre Report (Daniela Remenska)
Slides
Oxana: you managed to create a common platform with people working together
Daniela: we gain from common approaches, generic solutions
Oxana: how do you impose that?
Daniela: project proposals are required to aim for generic results and partnerships
Jeff:
the BiGGrid predecessor project gave funding directly to projects
PhD students then did the work and not a lot was heard from them afterward
a culture shift was needed to allow the NLeSC paradigm to succeed
WLCG Workshop Report (Ian Bird)
Slides
Jeff:
a new Information System also needs agreement from sites
avoid a second place where information needs to be provided
the BDII will not go away for EGI sites
if the BDII is not used, some MW development may be needed to cover gaps
Peter:
the task to merge the accounting DB data is being finalized
some info is hard to get out, a discussion with WLCG Operations was started
Ian B: let's wait for the April GDB accounting discussions to see what we need
Jeff: clearer reports will also help the Scrutiny Group
Jeremy: what other activities have started, e.g. lightweight sites?
Ian B:
most started activities concern the medium term
for the longer term the study group will start
there is the team for Understanding Performance, and the Tech Lab
Operations Coordination can do prototyping
Jeff: the Technical Forum should include site people
Ian B:
the Technical Forum essentially is the GDB
but we need to ensure all relevant areas are tracked by an advisory panel
GDB Advisory Panel = GAP, aptly named for the gap analysis
First Suggestions for a WLCG Fast Benchmark (Manfred Alef)
Slides
Manfred: unexposed compiler flags might get changed under the hood
Maarten: is it good or bad that the Haswell and Sandy Bridge CPUs look different in the ROOTmarks test?
Manfred: good for ALICE, bad for ATLAS and LHCb
Maarten: what are CMS doing?
Helge: they should have something, e.g. for the Amazon campaign run by FNAL
Helge:
the good correlation between Whetstone and HS06 is quite remarkable
maybe the WN variety at KIT is not wide enough to spoil the correlation?
this should be tested at other sites
also the experiments should be involved to check this further
Jeff: can we have the scripts?
Manfred: they will be published on the WG web site
Jeff: store them in CVMFS to allow running them in grid jobs?
Manfred: we will look into that
Jeff: these investigations could make a nice CHEP paper
Jeff: HS06 does not represent the experiments, maybe due to different compiler flags?
Helge: you also need to know the HW you run on, which may be impossible in clouds
Mattias:
ATLAS and others know the number of events handled in an MC job
that can then be correlated with the benchmark
we need complete coverage, i.e. simulation, reco and analysis, per experiment
Manfred: we can keep running all 5 benchmarks via the KitValidation framework
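The cross-checks suggested above (Helge: test the Whetstone-HS06 correlation beyond KIT; Mattias: correlate benchmark scores with MC event throughput per WN type) can be sketched as a small script. All worker-node names and numbers below are hypothetical placeholders, not measured results.

```python
# Sketch of the correlation checks discussed above: compare a fast
# benchmark (e.g. Whetstone) against HS06 and against MC event
# throughput per worker-node type. All numbers are hypothetical.
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-WN-type results: (Whetstone score, HS06, MC events/hour)
wn_results = {
    "SandyBridge": (2100.0,  9.8, 410.0),
    "IvyBridge":   (2250.0, 10.6, 450.0),
    "Haswell":     (2600.0, 12.1, 520.0),
    "Broadwell":   (2750.0, 12.9, 545.0),
}

whet = [v[0] for v in wn_results.values()]
hs06 = [v[1] for v in wn_results.values()]
evts = [v[2] for v in wn_results.values()]

print("Whetstone vs HS06:       r = %.3f" % pearson(whet, hs06))
print("Whetstone vs events/h:   r = %.3f" % pearson(whet, evts))
```

A correlation that stays high across several sites' WN mixes would support Helge's point; a drop at a site with a wider hardware variety would confirm his suspicion about KIT's sample.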
Argus Central Suspension Update (Vincent Brillault)
Slides
Jeff: the VO frameworks definitely need to consume suspension rules
Ian B:
the VO needs to be able to ban its users
that responsibility has moved to them
if their reaction is untimely, sites can just ban the VO
Sven:
banned DNs should also be communicated upward to the central team
so that sites can ban direct access attempts by those DNs
Dave K: why didn't things work at sites?
Vincent: an NGI campaign was done, a site campaign not yet
Ian N: in the UK there also was an Argus version mismatch
Sven: we need to have automatic monitoring
Ian N: work in progress in the UK
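The automatic monitoring Sven asks for could start as simply as comparing the central suspension list against what a site has actually applied locally. A minimal sketch, with entirely hypothetical DNs and lists standing in for whatever the real Argus deployment exposes:

```python
# Minimal consistency check for central suspension (hypothetical data):
# flag DNs suspended centrally that a site has not yet banned locally.
central_suspensions = {
    "/DC=ch/DC=example/CN=compromised user 1",
    "/DC=ch/DC=example/CN=compromised user 2",
}

site_local_bans = {
    "/DC=ch/DC=example/CN=compromised user 1",
}

missing = sorted(central_suspensions - site_local_bans)
for dn in missing:
    print("WARNING: not banned locally:", dn)
```

A check like this, run regularly per site, would have caught both the configuration problems and the version-mismatch case mentioned for the UK.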
Improving Traceability - Introduction (Dave Kelsey)
Slides
WLCG Risk Assessment revisited (Ian Neilson)
Slides
Sven: misused identities are hard to detect if their activities stay under the radar
Jeff: admin identities and ordinary user identities may need different treatments
Vincent: attacks are propagated through common SW like ssh and possibly OpenStack
Ian C: standard components are better maintained, but also more popular for attacks
Vincent: a badly configured standard service is easier to attack than a non-standard service
Maarten: we can use the recently refreshed EGI Security Threats Assessment to update the one for WLCG
A new Model for traceability & separation (Vincent Brillault)
Slides
Maarten:
why would multi-user pilot-jobs be on the decline?
it would depend on whether each pilot commits itself to a single user
DIRAC used to do that and maybe still does today
the other pilot frameworks may not do that, e.g. AliEn does not
VO Perspective (Alessandro Di Girolamo)
Slides
Oxana: also the ARC Control Tower plays a role in traceability
Sven: in 2011 ATLAS said that a number of weeks might be needed to find the DN that submitted a particular job - how is it today?
Alessandro:
the bitcoin incident of last year was resolved during one morning
it took 6 or 7 people to work together on it, though
Ian C:
by aggregating logs we can make the problem tractable
this is not yet easy for all cases today
we will need to use ElasticSearch etc.
and scale down the amount of information
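The log aggregation Ian C describes amounts to collecting a compact trace record per job and indexing it centrally. A sketch of such a record; the field names are illustrative, not an agreed schema:

```python
# Sketch of a compact per-job trace record of the kind that could be
# aggregated centrally (field names are illustrative, not a standard):
import json

def job_trace_record(job_id, user_dn, pilot_dn, worker_node, payload_exit):
    return {
        "job_id": job_id,
        "user_dn": user_dn,          # who submitted the payload
        "pilot_dn": pilot_dn,        # identity the pilot ran with
        "worker_node": worker_node,
        "payload_exit": payload_exit,
    }

record = job_trace_record("12345", "/DC=ch/CN=some user",
                          "/DC=ch/CN=pilot robot", "wn042.example.org", 0)

# In an Elasticsearch deployment this document would then be indexed,
# e.g. via PUT /job-traces/_doc/12345 on the REST API.
print(json.dumps(record, indent=2))
```

Keeping the record small is one way to "scale down the amount of information" while still answering the who-ran-what question quickly.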
Vincent: could the user payload kill the wrapper?
Alessandro: yes
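Vincent's point can be partly mitigated by running the payload in its own session so the wrapper can supervise it and clean it up as a group. A sketch under that assumption; note this does not stop a hostile payload running as the same user from killing the wrapper by PID, which is why user separation, cgroups or containers come up in the wrap-up:

```python
# Sketch: run the payload in its own session so the wrapper can
# supervise it and kill the whole process group on timeout.
# This alone does NOT protect the wrapper from a same-user payload.
import os
import signal
import subprocess

payload = subprocess.Popen(
    ["/bin/sh", "-c", "echo payload running; sleep 1"],
    start_new_session=True,   # payload gets its own session/process group
)

try:
    ret = payload.wait(timeout=30)
except subprocess.TimeoutExpired:
    os.killpg(payload.pid, signal.SIGKILL)  # kill the whole group
    ret = payload.wait()
print("payload exited with", ret)
```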
Security Operations Centre update (David Crooks, Liviu Valsan)
Slides 1
Slides 2
Discussion
Jeff:
Bro would need to be installed on every node and monitor everything
that may be OK only where the batch infrastructure is owned by the WLCG site
at many sites WLCG jobs will run side by side with other jobs
many sites would anyway feel uncomfortable sending such data elsewhere
Ian C:
the idea is that each site should have its own SOC
at the moment such a facility is hard to deploy
therefore an appliance is being looked into
compare with perfSONAR
data would only be shared within a trust community
some information will then be forwarded to a central instance
Sven: did the IDS at CERN actually help detect incidents?
Liviu: so far we have only been flooded with false positives
Sven: it will not be easy to find a tool that avoids them for us
David C:
small sites need help in these matters
we will test possible solutions with artificial data
we then can identify which information can be shared
Vincent: what can we do with the central MISP data?
David C: we need to be cautious with the volume of that data
Vincent: the SOC must also be able to reprocess the past when more data arrives
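Vincent's last point implies keeping the raw logs so they can be re-scanned whenever the indicator set (e.g. from MISP) grows. A minimal sketch of that retrospective matching, with hypothetical connection logs and indicator IPs:

```python
# Sketch of retrospective matching: when new indicators arrive, re-scan
# the connection logs already collected. All data here is hypothetical.
stored_connections = [
    {"ts": "2016-03-01T10:00:00", "src": "wn042", "dst": "198.51.100.7"},
    {"ts": "2016-03-02T11:30:00", "src": "wn017", "dst": "203.0.113.9"},
    {"ts": "2016-03-03T09:15:00", "src": "wn042", "dst": "192.0.2.33"},
]

# Indicator set as of today; tomorrow it may grow, hence keep raw logs.
bad_ips = {"203.0.113.9"}

def rescan(connections, indicators):
    """Return past connections matching the current indicator set."""
    return [c for c in connections if c["dst"] in indicators]

hits = rescan(stored_connections, bad_ips)
for h in hits:
    print("retrospective hit:", h["ts"], h["src"], "->", h["dst"])
```

The design choice is that detection is a query over stored data rather than a one-shot filter at ingest time, so a late-arriving indicator still surfaces old incidents.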
Wrap up & next steps (Ian Collier)
Ian C:
a first WG needs to further explore SOC solutions
the ingestion of VO workflow data also needs to be looked into
a second WG needs to further explore traceability tools for jobs
containers, cgroups etc.
the WLCG Risk Assessment should be updated later this year
Jeff:
the SOC at CERN has an impressive infrastructure, yet it detected zero incidents
we should guard against a hasty deployment before we are sure it will actually work
Vincent: Bro was already useful for detection of compromised systems
Ian B: as CERN has an open infrastructure, we need to understand the network traffic
Ian C: we need to use big data tools to analyze network flows
Sven: as CERN attracts more attackers, its solution may be overkill for small sites
Ian C:
GridPP is well placed to tune solutions for small sites
when jobs run in opaque VMs, we need to analyze the network flows