---+!! Summary of GDB meeting, May 8, 2013 (CERN)

%TOC%

---+ Agenda

https://indico.cern.ch/conferenceDisplay.py?confId=197803

---+ Welcome - M. Jouvin

Upcoming meetings: in June we need to decide whether to hold another GDB outside CERN.
   * Michel finds people in favour, but these meetings have low attendance and in particular fewer participants from CERN.

News on actions:
   * Message that the new APEL publishers released in EMI-3 will also be available in EMI-2.
   * WN in CVMFS: have any experiments used it? Do we close the action?
   * Deployment of xrootd and WebDAV access at all ATLAS DPMs: any update? (WebDAV is to allow access for the new naming convention; this is not federated WebDAV.)
      * Simone: not yet started, an action for the coming months; no follow-up needed by GDB or Ops Coord at this point.
   * Are jobs with high memory requirements still an issue?

HEPiX news: a Puppet working group has been started, led by Ben Jones (CERN) and Yves Kemp (DESY).
   * More detailed report at the next GDB.

Ian Bird: after the end of EMI and EGI-InSPIRE SA2, the CERN IT/GD and IT/ES groups have merged under the leadership of Markus.
   * Jamie Shiers left the service coordination to lead the data preservation effort.
   * Maria Girone becomes the new service coordinator.
   * *Jamie deserves a great deal of thanks for the immense work he did in building the WLCG service.*
   * *Others agreed that Jamie's contribution and focus on such things as service challenges had a huge impact and the collaboration is very grateful.*

---+ LHCONE/OPN Update and Plans - E. Martelli

LHCONE: private routed network, currently based on L3VPN
   * 44 sites, 17 countries, 15 NRENs
   * Main operators: GEANT, CERN, Internet2, ESnet and CANARIE

LHCONE R&D activity: P2P service
   * On-demand P2P link established when two sites want to exchange data
   * Not needed today, but may be an attractive technology when/if the network becomes a bottleneck
   * Potential benefits: guaranteed bandwidth, deterministic behaviour
   * But a lot of challenges, including:
      * Need for an API used by software to provision the circuits
      * Challenge of multi-domain provisioning
      * Not covering the last mile, which can also be an issue and degrade the guaranteed bandwidth
      * Routing over these circuits
      * Billing: are users ready to pay for this service? How to do the billing?
   * A workshop last week decided to create a testbed based on static P2P links between volunteer sites (T2s) to compare performance with the currently existing infrastructure
      * Static links used to avoid changes in software

LHCONE is also exploring SDN
   * First use case: WAN load balancing over the 6 transatlantic links using OpenFlow
   * Goal: optimize the use of these expensive resources
   * Main challenge is coordination between multiple domains
   * Is the SDN work coordinated with the Network Working Group?
      * EM: not at the moment, as the activity is recent.

Simone: can the P2P service help solve the poor connectivity of a few sites?
   * Edoardo: not really... a site must first join LHCONE! Which is open to everybody...
   * It is an expensive service: it is for sites that already have good connectivity and want an even better one.
   * Ian: you need to convince people to improve their local/national connectivity. We have been successful in a few cases.

IPv6: CERN expects the IPv4 shortage to hit in 2014
   * WLCG community urged to consider using IPv6 and to help with the testing (a minimal connectivity check is sketched below)
   * Testbed activities driven by the HEPiX working group
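A minimal sketch, in Python, of the kind of basic IPv6 reachability check a site could run while starting to test dual-stack services. The hostname and port are placeholders, not official WLCG test endpoints; real validation would follow the HEPiX IPv6 working group test plans.

<verbatim>
# Minimal IPv6 reachability probe -- illustrative only.
# The hostname and port below are placeholders, not official test endpoints.
import socket

def check_ipv6(host, port=443, timeout=5):
    """Return True if an IPv6 TCP connection to host:port succeeds."""
    try:
        infos = socket.getaddrinfo(host, port, socket.AF_INET6, socket.SOCK_STREAM)
    except socket.gaierror:
        print("%s has no AAAA record" % host)
        return False
    for family, socktype, proto, _, sockaddr in infos:
        s = socket.socket(family, socktype, proto)
        s.settimeout(timeout)
        try:
            s.connect(sockaddr)
            print("IPv6 connection to %s %s OK" % (host, sockaddr))
            return True
        except socket.error as exc:
            print("IPv6 connection to %s failed: %s" % (sockaddr, exc))
        finally:
            s.close()
    return False

if __name__ == "__main__":
    check_ipv6("dual-stack-test.example.org")  # placeholder hostname
</verbatim>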
---+ GridKA Migration to Grid Engine - M. Alef

GridKA was using PBSpro: many stability problems, problematic fairshares.

Evaluated several alternatives, in particular Torque/MAUI and Grid Engine (GE)
   * Torque/MAUI proved to be less stable than PBSpro
   * Several testing stages since 2010; migration undertaken in 2012 to Univa GE

Current configuration
   * Single server, no failover
   * Flat files, no DB
   * Certificate Security Protocol enabled
   * Fairshare based on wall-clock time
      * Historical data can be mixed with other parameters to decide the actual job priority

Lessons learnt
   * Steep learning curve at the beginning
   * Documentation not always very clear
   * Very good support from Univa (based in Europe)

Luca: any figure about the scalability limit?
   * Manfred: successful tests done with 100K slots and 1M jobs

---+ MW Development and Provisioning after EMI

---++ MEDIA Initiative - A. Di Meglio

EMI legacy: EMI-3, released in March 2013
   * 61 products, including new ones: STS, EMIR, Hydra
   * "Supported until April 2014", "bug fixes until April 2015"
   * EMI-2 status: "bug fixes until April 2014"

Current situation: Product Teams (PTs) are now fully responsible for their SW build process
   * EMI-2 built using standard tools (mock, pbuilder) but ETICS still needed for some parts
   * EMI-3 relies only on standard tools

PTs now fully responsible for SW releases and announcements
   * EMI EMT officially stopped, but the mailing list will continue as an announcement list for PTs
   * A monthly summary of the announcements will be produced, but no technical coordination
   * EMI EMT responsibility transferred to EGI URT for the product teams who agree to join UMD
   * Cristina will co-chair EGI URT during a transition period

PTs now fully responsible for their product support
   * GGUS still used as the user-support tracking system
   * No monitoring of any SLA compliance
   * Each product has a lifetime defined at http://www.eu-emi.eu/retirement-calendar
   * Some products have unclear support
      * WN, UI and YAIM-core: discussion ongoing with INFN, who were responsible for them
      * Hydra, STS: a commercial company may handle the support

MEDIA: a lightweight collaboration that has been launched
   * Collaboration between interested PTs
   * All PTs from EMI confirmed their participation
   * Not restricted to former EMI PTs
   * 3 main objectives: forum for technical discussions, follow-up of the actions decided, coordination for future funding proposals
   * First co-chairs: Balazs Konya (LU) and P. Fuhrmann
   * Idea validated with all main EMI stakeholders: agreement as long as this remains lightweight and voluntary

---++ EGI after EMI - P. Solagna

During EMI and IGE, no direct (formal) contact between EGI and PTs: the EGI TCB was in contact with EMI and IGE.

With the end of the projects, EGI needs to deal with a larger number of technology providers (PTs)
   * New board created: URT (UMD Release Team)
   * URT will take over part of the release coordination work previously done by EMI
   * Membership: PTs (or groups of PTs), UMD provisioning team
   * Lightweight coordination: mainly wiki pages plus bi-weekly meetings
   * Will track only work from PTs relevant to UMD
   * PTs are invited to subscribe

URT mandate
   * Discuss the UMD roadmap and updates schedule
   * Report the status of products to the UMD provisioning pipeline
   * Agree on Quality Criteria
   * Gather development plans from PTs and anticipate changes that potentially affect other PTs
   * Track the status of critical bugs/requirements, e.g. SHA-2 support (a readiness-check sketch is given at the end of this section)
   * Evolve the UMD repository structure

TCB evolution
   * New representatives from the main MW providers
   * MEDIA representatives
   * Mandate: EGI technical roadmap/strategy, SLAs, high-level cross-product requirements

3rd level support in GGUS
   * Support units being reorganized, with new contacts, to match the PT structure after EMI
   * New process for handling unresponsive tickets
   * Minimum level of commitment requested from each PT, even if best effort
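The SHA-2 readiness item in the URT mandate above is concrete enough to illustrate with a small check: a sketch, in Python, that reports which signature algorithm a certificate uses by wrapping the standard openssl x509 command. The certificate path is just a typical example, not a prescribed location.

<verbatim>
# Quick check of a certificate's signature algorithm (SHA-1 vs SHA-2 family).
# Illustrative only: the path is an example and the parsing relies on the
# textual output of the standard "openssl x509" command.
import subprocess

def signature_algorithm(cert_path):
    """Return the signature algorithm string of a PEM certificate."""
    out = subprocess.check_output(
        ["openssl", "x509", "-noout", "-text", "-in", cert_path],
        universal_newlines=True)
    for line in out.splitlines():
        line = line.strip()
        if line.startswith("Signature Algorithm:"):
            return line.split(":", 1)[1].strip()
    raise ValueError("no signature algorithm found in %s" % cert_path)

if __name__ == "__main__":
    algo = signature_algorithm("/etc/grid-security/hostcert.pem")  # example path
    sha2 = any(h in algo for h in ("sha256", "sha384", "sha512"))
    print("%s -> %s" % (algo, "SHA-2 family" if sha2 else "not SHA-2"))
</verbatim>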
---++ Discussion

Ian: WLCG would like to deal directly with PTs, not with an integrated distribution. Not sure how the EGI vision matches the WLCG needs (described in a document discussed over the last 6 months). Would like to see products in EPEL, with MEDIA as the coordination.
   * Peter: not incompatible with the EGI vision/structure. PTs will remain independent, but EGI will try to take care of dependencies between products. They will release in EPEL and UMD will pull from it. MEDIA is more about technical evolution than the release process.
   * Maarten: the baseline from UMD can be helpful (a safe baseline). EGI has been taking responsibility for the non-US infrastructure; there are many things we would otherwise have to do ourselves. It would not help sites to receive conflicting messages, one from WLCG and another from EGI. We need to collaborate where we can. We do not want to return to big-bang releases, but we need to be careful of disruption caused by conflicts between packages or timelines.

Intense discussion on the EGI proposal vs. the flexibility WLCG needs as far as SW provisioning is concerned
   * WLCG insists that it may need to deal directly with PTs and pick up versions not yet released in UMD
   * EGI agrees this is not a problem as long as the version goes through a minimum certification
      * Always the case with WLCG
   * This worked well in the past and should work even better with the lighter coordination proposed

Review where we are in a few months...

---+ Actions in Progress

---++ Ops Coordination Report - M. Girone

CVMFS: target date (April 13) met
   * ALICE beginning evaluation
   * Very good site participation in evaluating new versions
   * A minimal client-side availability probe is sketched at the end of this report

SHA-2: good progress, ready to be tested; hope to have full support in production during the autumn.

glexec: now integrated into DIRAC; sorting out a few new problems at sites
   * No scalability problem seen so far
   * *Deployment is ready for pushing. The TF recommends to the MB to aim for 90% adoption by the end of 2013.*

perfSONAR: v3.3 RC3 being tested and a few issues found
   * Sites should wait before installing it
   * New modular dashboard: an alpha release available, very promising

FTS3: successful tests; still waiting for a deployment roadmap

HTTP proxy discovery: new TF started, led by D. Dykstra

MW
   * All sites still running an EMI-1 service must upgrade it ASAP
   * EMI-2 VOBOX now available
   * New WLCG repository created for things that cannot be hosted anywhere else

Conclusion
   * Good progress reported by several TFs, e.g. CVMFS deployment done in 6 months
   * Good collaboration with OSG and EGI
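A minimal sketch, in Python, of the kind of client-side probe a site could use to confirm that a CVMFS repository is mounted and readable on a worker node. The repository name is just an example; production monitoring uses the experiments' and sites' own probes.

<verbatim>
# Check that a CVMFS repository is mounted and readable on this node.
# Illustrative sketch only; the repository name is an example.
import os

def cvmfs_ok(repo="atlas.cern.ch"):
    """Return True if /cvmfs/<repo> can be listed."""
    path = os.path.join("/cvmfs", repo)
    try:
        entries = os.listdir(path)  # on a standard autofs setup, first access triggers the mount
    except OSError as exc:
        print("CVMFS repository %s not available: %s" % (repo, exc))
        return False
    print("CVMFS repository %s mounted, %d top-level entries" % (repo, len(entries)))
    return True

if __name__ == "__main__":
    cvmfs_ok()
</verbatim>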
---++ SL Migration Plans - A. Forti

TF membership: sites, experiments, EGI and IT/ES
   * Good representation
   * A mailing list plus a TWiki page (LCG/SL6Migration)

Timeline proposed
   * Before June 1st: sites supporting several VOs including ATLAS are encouraged to test without upgrading
   * Before the end of October: upgrade the bulk of WLCG resources to SL6 (5 months)
   * After that: discuss the move to SL6/EMI-3

Recommendations
   * Gradual migration rather than big bang
   * Do not mix SL5 and SL6 resources in the same queues
   * LHCb wants SL6 queues to be published in the BDII

HEP_OSlibs for SL6 is available and documented
   * The SL5 version is still available but unmaintained
   * Sites encouraged to install it on WNs

New WLCG repository created, in particular for HEP_OSlibs
   * http://linuxsoft.cern.ch/wlcg
   * Sites encouraged to enable it
   * Xrootd plugins will be added soon

Migration status
   * LXPLUS alias migrated to SL6 last Monday
   * VMs
   * lxbatch currently has 10% of resources migrated to SL6: migration will be done in line with lxplus usage growth
   * Details of the CERN plans can be found at http://itssb.web.cern.ch/service-change/lxplus-alias-migration-slc6/06-05-2013
   * T1s: most have already completed testing or are doing it; several have already planned the migration
   * T2s: 81 out of 131 forwarded their plans; many are awaiting the green light from experiments (particularly ATLAS)
   * *Need more country T2 representatives in the TF*

I. Fisk: CMS encourages sites to migrate their WNs to SL6 ASAP.
   * No problem to run SL5 binaries

S. Campana: would prefer a big-bang migration at each site (all site resources migrated at once)! Progressive migration requires the creation of additional queues, a time-consuming operation for just a migration.
   * ATLAS would prefer the risk of temporary problems or downtimes
   * Avoid migrating all sites at the same time...
   * MJ: for small sites it is okay to go ahead right now; for larger sites it is best to coordinate with the central experiment teams to decide the most appropriate timing and approach.

---++ Xrootd Monitoring Infrastructure - D. Giordano

Two monitoring streams from Xrootd (using UDP)
   * Summaries: aggregated by MonALISA (a minimal listener sketch is given at the end of this section)
   * Detailed (and f-stream): requires the use of the GLED collector for aggregation
      * The collector publishes into ActiveMQ: the information is processed by the dashboard, popularity calculation, WLCG transfer dashboard...
   * Both for EOS and federation activities
   * Require the use of new Xrootd plugins that will be distributed in the new WLCG repository: dCache-Xrootd monitoring, VOMS-Xrootd

Open issues
   * Concern about user privacy if user information is stored in the monitoring DB (and only there)
      * Detailed user information is published for all users irrespective of the VO to which they belong
      * Is it enough to filter at the collector level?
      * Is it necessary to do it at the site level (more work needed)?

Xrootd services must be published in GOCDB
   * Sites must declare downtimes when relevant

Xrootd federation services are being instrumented like all other WLCG services.
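As an illustration of the summary stream mentioned above, a minimal sketch, in Python, of a UDP listener that simply receives and prints the reports an Xrootd server has been configured to send to this host and port. The port is a placeholder; production aggregation is done by MonALISA and the GLED collector, not by ad-hoc scripts like this.

<verbatim>
# Minimal UDP listener for Xrootd summary reports -- illustrative only.
# The port is a placeholder; production aggregation uses MonALISA / GLED.
import socket

def listen(port=9931, bufsize=65535):
    """Receive and print summary report datagrams sent by an Xrootd server."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("", port))
    print("listening for Xrootd summary reports on UDP port %d" % port)
    while True:
        data, addr = sock.recvfrom(bufsize)
        print("report from %s:%d (%d bytes)" % (addr[0], addr[1], len(data)))
        print(data.decode("utf-8", "replace"))

if __name__ == "__main__":
    listen()
</verbatim>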
---++ DPM Community Status - Oliver Keeble

Presentation postponed as we were running late: an update talk will be given at the June GDB. The main points:
   * The collaboration started on 2nd May.
   * It includes contributions from CERN, the Czech Republic, France, Italy, Japan, Taiwan and the UK.
   * A collaboration agreement has been agreed and the allocation of tasks (development, testing, support) has started.
   * Support for non-HEP communities will be best effort.

---+ Computing Model Updates

---++ Introduction - I. Bird

Request from the LHCC in December, after the apparently significant increase of the resource requests for Run 2:
   * Describe the changes since the original computing TDRs (2005)
   * Emphasize what is being done to adapt to new technologies
   * Produce 1 document rather than 5 (TDRs): common format, tables...
   * The updated computing models should cover LS1 to LS2

Timescales
   * To prepare for 2015, a good draft needs to be discussed at the autumn RRB (October)
   * This requires a draft to be available at the end of the summer, to be first discussed at the LHCC meeting in September.

This document will be an opportunity to:
   * Describe significant changes and improvements already made
   * Insist on commonalities between experiments in recent evolutions and plans
   * Describe WLCG needs for the next 5 years
   * Review our organisation
   * Raise issues/concerns such as staffing and missing skills

A draft TOC is available: see slides.

Challenges
   * Need to make the best use of resources
   * Major investment needed in SW, but skills, tools... are missing

Resource Needs and Evolution
   * Basic assumptions on running conditions, desired physics and their impact on resources
   * Take flat funding as the most optimistic funding hypothesis: much less than the 20%/year growth for CPU (15% for disk) that we were used to until now (until 2015). A worked illustration of the cumulative gap is given after the Technology Trends section below.

Collaboration Organisation and Management: should we be more inclusive and open to other HEP experiments?
   * Also need to collaborate with other communities (funding and efficiency)

---++ Technology Trends - B. Panzer

See slides.

Tape: capacity increases by steps every 2-3 years
   * Currently: 2.5-5 TB/tape
   * Next cycle: 6-8 TB/tape

HDD: declining market
   * Notebooks, tablets and smartphones use flash memory
   * Cloud storage leads to consolidation (fewer disks used compared to individual disks)

Current HDD technology is reaching its limit: new technologies are just emerging and are very expensive
   * Price: x2 between the commodity and enterprise markets
   * Expect a slowdown in the space/price ratio
   * New technologies will mean more TB per spindle without much improvement in sequential/random access: impact on global performance?

Network: HEP IP traffic is 1/1000 of global IP traffic, so negligible.

Processors/servers: tablets and smartphones are the drivers
   * 2013: first year in which the number of tablets sold exceeds the number of laptops; smartphone sales are even higher and exceed standard phones
   * HEP buys mainly from the server market: dominated by a limited number of very big consumers, and dominated by Intel as a provider (96% of the market, almost no competition)
   * Emergence of the microserver market, based on ARM and Atom: competition between them is hard, no clear winner, picture changing frequently
   * Moore's law still applies for the main cores
   * Specific (co-)processors are something different

With flat budgets and increasing computing requests, the only solution is to increase efficiency
   * Combination of new HW and SW improvements
   * HW alone will not provide the 10x improvement we are looking for; coding efficiency has a major role to play
   * Must take into account the side effects on global efficiency of a higher processor usage factor: need to adopt a holistic view instead of concentrating on just one element
      * Impact on I/O and networks
   * High-end GPUs are a niche market: the shrinking desktop market negatively impacts the availability and price of discrete graphics cards
   * Look at the market, sites, programming etc. as a system

A lesson to be learnt from previous attempts to understand the impact of technology evolution on our computing: look at new technologies and evaluate them, but avoid concentrating on only one thing.
   * I. Bird: one of the lessons learnt is that we should have invested much more in SW than we did. *We have to do it now.*
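As a worked illustration of the gap that flat funding leaves, assuming for concreteness a five-year span between LS1 and LS2 and the historical growth rates quoted in the introduction above:

<verbatim>
% Capacity growth factors if the historical yearly rates were sustained
% over an assumed five-year LS1-to-LS2 span (illustrative assumption):
\[
  (1.20)^{5} \approx 2.5 \quad \text{(CPU, 20\%/year)} \qquad
  (1.15)^{5} \approx 2.0 \quad \text{(disk, 15\%/year)}
\]
% Under flat funding, this factor of roughly 2-2.5 has to come from
% efficiency gains in hardware usage and software, not from budget growth.
</verbatim>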
---++ Experiment Computing Models - M. Cattaneo

Important to emphasize what has been done since the initial TDRs (2005) to improve the efficiency of LHC experiment computing
   * Sometimes a factor of 10 has been achieved for some applications (e.g. CMS reconstruction)
   * Computing has not been a limiting factor during Run 1

Document overview for the Computing Model chapters
   * Will cover data flows, event rates, non-event data (e.g. DBs), software evolution, data preservation...
   * Need to document that many optimisations have been done.

Ian: already very substantial input from 3 of the experiments for this chapter.

---++ Application SW - F. Carminati

Massive parallelism is here: we got away with ignoring it many times, but now we probably can't anymore.

Different dimensions to performance
   * Multi-socket/core improves the memory footprint but not the throughput
   * Only micro-parallelism (instruction-level parallelism, instruction pipelining, vectors) can deliver a significant performance factor (see the sketch at the end of this section)
   * According to the OpenLab survey of industry best practices, a factor of 10 could be gained...

For post-LS1, must work on scenarios like pileup=140, bunch spacing=25ns.

The work to be done is very significant, but a few first successful experiences already exist
   * GEANT4 MT: providing a significant improvement in performance with a low penalty in the single-thread case; first example of moving a large code base
   * Several interesting results with GPU usage: NA62 L0 trigger, CMS tracking
   * GaudiHive very promising

ROOT6 will be a major step forward in this direction.

Need to evolve the current Concurrency Forum into a HEP SW Collaboration
   * Attract new skills/expertise
   * Attract new people from the community and acknowledge them for their contribution
   * Explore the possibility of a Horizon 2020 project on these challenges

TechPark idea: build upon the OpenLab success to provide HW and SW resources with the right connection to company engineers
   * A resource for the SW collaboration, driven by technology rather than demand
   * Exact relationship with OpenLab still being discussed: for a company, joining OpenLab is quite a heavy process, maybe too heavy to start with, hence the new approach being considered.
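A loose illustration of the micro-parallelism point above: the sketch below (Python/NumPy, with an arbitrary toy formula and array size) contrasts an element-by-element loop with a data-parallel formulation of the same calculation. It is only an analogy for the SIMD/vector-unit gains discussed here, which in practice concern the experiments' C++ code.

<verbatim>
# Toy contrast between a scalar loop and a data-parallel (vectorised)
# formulation -- an analogy for the vector-unit gains discussed above.
# Array size and the "transverse momentum" formula are arbitrary examples.
import time
import numpy as np

n = 2000000
px = np.random.normal(size=n)
py = np.random.normal(size=n)

# Scalar, element-by-element loop: no micro-parallelism exploited.
t0 = time.time()
pt_loop = [(x * x + y * y) ** 0.5 for x, y in zip(px, py)]
t_loop = time.time() - t0

# Vectorised formulation: one operation applied to whole arrays.
t0 = time.time()
pt_vec = np.sqrt(px * px + py * py)
t_vec = time.time() - t0

print("loop:       %.2f s" % t_loop)
print("vectorised: %.2f s (about %.0fx faster here)" % (t_vec, t_loop / t_vec))
</verbatim>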
---++ Computing Services - I. Bird

Contents will come mostly from the work done by the TEGs and related work since the Amsterdam meeting in 2010
   * Describe commonalities
   * Describe and justify differences between experiments

Review Tier definitions based on the functions required by each experiment
   * Be more flexible in the mapping of functions to sites: give "credit" for providing various services
   * How to make better use of opportunistic resources
   * Define the different levels of service expected: not the same for all functions and for all sites

Workflow management
   * Use of pilots, needs for interfaces at sites
   * Strategy for the future: CE vs. cloud, the glexec problem

Data management
   * Clarify the strategy for tape, disks, federations, and the role of data popularity
   * Access control and security must be clarified
   * Storage solutions/interfaces: role of HEP-specific solutions vs. standard solutions
   * Storage abstraction (caching) for opportunistic resource use

Also database needs, distributed services...

Operations and infrastructure services
   * Describe coordination, tools and monitoring

Security developments, in particular federated identities: are we advanced enough to mention them?

This chapter should focus on the strategy statement rather than detailed technical discussions.

Federico: don't forget data preservation!
   * Ian: will be added, based on the DPHEP proposal led by Jamie

Discussion about Tiers:
   * MD: is there work ongoing to develop definitions different from the hierarchy of Tiers?
   * IB: we need to be pragmatic but also consider the implications of changes, for example the prestige attached to historical contributions. Newer Tier-1s may contribute a subset of the functions.
   * Roger Jones: I would be wary of this because it would lead to a renegotiation of the MoU.
   * IB/General: we have sites coming in, and scalability and reliability may become assessment criteria. Lack of resources is an issue, and some sites may be excluded by a naming convention if we do not address this problem. The notion of credit associated with services may be developed. If it means you have to completely redevelop your computing model, then that may be an answer guiding what we can do in this area. This may include opportunistic use of other large resource offerings (via cloud or whatever mechanism) where certain other functions, such as large connected storage, are excluded. Beyond 2015 we may wish to revisit some of these wider questions; this would not be material for the present document, but it is a discussion worth having.

We'll come up with a further draft of this document through the various computing groups in the next month(s). A solid draft of the whole document must be ready for the end of August.
   * IB will write the summaries of all aspects of this chapter, based on input received from the TEGs, WGs and others.