
December 2016 GDB notes

DRAFT


Agenda

http://indico.cern.ch/event/394789/

Introduction (Ian Collier)

https://indico.cern.ch/event/394789/contributions/2392195/attachments/1388433/2114503/GDB-Introduction-20161214.pdf

Note: Feb Pre-GDB on benchmarking

Nordic countries go on vacation 23rd June: Discussion of WLCG Workshop location between Manchester and Naples. Manchester looking better for accessibility and timing? Hold Naples in reserve as first choice for the next non-CHEP meeting. Ratify at MB next week.

Downtime policies: A proposal (Maria Alandes Pradillo)

https://indico.cern.ch/event/394789/contributions/2392230/attachments/1388375/2113909/LongShutdowns.pdf

Q: Data migration, experiments decide on their own?

A: I guess yes, would like to have the time to do it

Ian Bird: take more than a month

Mattias: Small T2 could migrate in a couple of weeks

Maria: We can discuss this, CMS proposal, maybe data to be used in the next month?

See backup slides for experiment input

Q: Has anyone looked into the past years to see how many downtimes have been declared more than a month? I get the impression this happens rarely.

A: A few long downtimes, at least one was not announced in advance. Operational problems for some of the experiments. Could we do better, is there a policy? None except for A/R targets. Wanted to sort the issue once and for all; it grew into a bigger thing. Indeed, nominally on the same page, sites try to do an honest job. Even that downtime was due to external reasons, the site had no choice. The aim is to make this into a concrete policy.

Ian Collier: Just to be clear, examples that prompted this were not scheduled downtimes, no policy will change that.

Q: Cases of data migrations? Did it happen before, what happens when the site comes back?

Maarten: Migration of SARA, announced half a year in advance, pointed out that experiments should ensure affected data was on disk by the time of the downtime because it was on tape. Done very nicely. Not out of site, but experiments had to take nontrivial action. T1 can't be vacated just like that. Rare occurrence. In the migration statement we should say that this is primarily for T2s, they have a chance to be vacated if necessary.

Ian Collier: T1 moving into new machine rooms isn't so unusual but is usually planned and wouldn't involve DTs of more than a month.

Q: Surprised that 1 day warning of DTs is OK. Remember from KIT had 2 day downtime at the same time as another T1, major complaint.

A: Avoiding T1s going into downtime at the same time. True that we can extend the policy to cover this. LHCb insisted to please try to avoid clashes.

Q: Prompted by VO to notify in advance, looking at history big sites are announcing in advance.

Maarten: Complex matter, not an easy guilty party, far from it. Any site can say it won't let another site dictate when it can do a downtime. Site has the last word. LHCb always affected the most, most reliant on T1s, others can tolerate this more easily. Can't say first to book a DT gets it, external pressures. Try to avoid this when we can, then sites shouldn't be punished.

Q: May be good to state in the policy - we try to coordinate with experiments, have some flexibility. Did ask experiments, ask experiments in advance. In policy, put that it is welcome for sites to coordinate with experiments?

Dave Kelsey: Along that line, doesn't say just data migrated but agreement with experiment is what to do, may be case by case basis. Interesting to EGI as well. Would be really nice to have WLCG/EGI policies agree. VO specific calculation is getting more complex.

Ian Collier: Spoken to GocDB devs, could in principle have more complex policy engine per VO with different constraints per VO.

Maria: Talk to EGI but look at WLCG first. Would be good to have feedback and talk more with CMS.

Ian Collier: Whether this liaising should be part of the policy, may be too difficult to capture in a dependable way. It's the practice for T1s that we try to coordinate. Capturing that formally, this might be the way to do this.

Dave Kelsey: Short term might be security, want that done quickly

Ian Collier: Not entirely scheduled, that's effectively incident response.

Maria: Monday meetings are the place to talk about this kind of thing.

Ian Collier: As mentioned, a bit of analysis in the last couple of years might be useful. Don't want to put too much work into policy that only deals with one edge case.

Maria: Experiments are asking for this. Agreement that generally sites are doing very well.

Ian Collier: Comments by 5th January - people should be thinking about this before they go away.

HEPiX Report (Helge Meinhard)

https://indico.cern.ch/event/394789/contributions/2392202/attachments/1388333/2113814/go
https://indico.cern.ch/event/394789/contributions/2392202/attachments/1388334/2113824/2016-12-14-GDB-HEPiXReport.pdf

No questions

ALICE use of HPC Facilities (Pavlo Svirin)

https://indico.cern.ch/event/394789/contributions/2392205/attachments/1388279/2113711/GDB_meeting_14.12.2016.pdf

Q: What is the advantage that PanDA is bringing to you? Don't have to implement a lot of stuff/reinvent the wheel?

A: Yeah. Different approach, in next versions, yes.

A: Also consideration that ATLAS has a strong presence working with Titan, want to leverage that; they have a team, we have one part-time person from CERN. Very good response from ATLAS team. Long list of things that need to be adopted. Again reinventing a lot of things needed to run on a supercomputer, coming not only from a few corners but from people trying to find resources elsewhere. Everyone is doing this, probably not the most economical way to have everyone do that. Especially with this level of application code, submission system, all HPC are different, unique requirements. Perhaps have to make a cohesive effort for all experiments together, not have each experiment discover on their own. Tremendous amount of work. If we are going to start using these resources seriously, should have a central effort.

AARC Report (Hannah Short)

SOC Working Group Report (David Crooks)

Security Policy Update (Dave Kelsey)

Community White Paper (Peter Elmer)

Wrap up (Ian Collier)

Topic revision: r1 - 2016-12-19 - DavidCrooks