Planning for Ganga in 2010

Indico agenda: http://indico.cern.ch/conferenceTimeTable.py?confId=94195#20100927

Organisation (Monday)

Outstanding items from the Oslo meeting (Kuba)

Documentation (Kuba)

    • consolidation of the wikis required
      • review needed
        • delete old stuff
        • refactor good stuff
        • integrate wiki and main web (editing main web is hard / AFS text files)
    • update of tutorial and dev survival guide -> how to keep them updated? part of the release procedure?
    • links to (most-up-to-date-version-of) experiment tutorials from Ganga webpage?

Testing

    • system testing: testing framework
      • works, but is hard to maintain and extend; in the longer term it requires a reimplementation (Kuba)
      • should exploit parallelism better (still takes too long)
    • Core: 269/6, Atlas: 11/6, LHCb: 110/4, NG: 1/4, Panda: 4/1, Robot: 29/5
  • shall we go for functional testing (as opposed to unit testing)?

Functional testing? Usage of HammerCloud for "continuous" testing of the main Atlas and LHCb workflows (Dan)

    • how to handle Grid proxies in an automatic way? myproxy?

Error reporting tool (short demo by Ivan)

Extended spyware (Kuba)

Generic spyware to monitor all jobs, all backends, all applications. Extension of current spyware.

In ATLAS they are also interested in wallclock time used by the jobs...

Release manager schedule (Ulrik)

Top 5-10 savannah items

    • 173 open ones
      • Atlas: 81
      • Core: 68
      • LHCb: 11
    • shall we say that within a month/end of year all these lists will be reviewed?
    • shall we put a regular schedule to review/close at least 10 items a week?
    • some outstanding examples:
      • Atlas: bug #72056: Failed to load jobs from repository in 5.5.13
        • GangaJEM crashing repository: how to avoid that monitoring plugins (inactive) crash the repository?
      • LHCb: bug #71590: Function to remove a file from inputdata of job
        • general discussion on resubmit(): modify input data before resubmit()
      • better support for local backends (Gaudi on Remote, Atlas on SGE)
      • bug #60752: j.copy() does not work for latest version of GangaAtlas jobs
        • handling of filenames (appended trailing slash to the dataset name)
        • dev doc? test?
      • bug #49394: startup hangs if ganga.web.cern.ch is unreachable
        • dependencies on external services at startup
        • another: MSG spyware
        • dev doc? test?
    • bug #70438: Keyboard interrupt raised by user killed glite-command
    • bug #69653: Annotating all Objects in Ganga
      • requires discussion about R/W locking with Johannes Eb.
    • bug #53403: change defaults from EDG to GLITE
      • change of general documentation needed
    • bug #50607: ArgSplitter doesn't copy input sandbox info
      • is that a problem for other splitters too?

Overview of general functionalities (Kuba)

    • GangaSAGA
    • GangaKISTI
    • GangaPlotter

Migration of stuff from LHCb/ATLAS plugins to Core

    • Tasks package (by Johannes)
    • Atlas features (by Mark)
    • LHCb features (by Mike)

New developments (Tuesday)

WebGUI (demo by Ivan)

Best coding practices:

    • general points:
      • we should consider updating/creating documentation (FAQ, cookbook or developer survival guide)
      • we should go through the plugin code and clean it up (remove bad practices)
    • specific issues:
      • don't use object._name in if statements -> use isinstance() instead
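
The last point can be illustrated with a small sketch. The classes below are plain Python stand-ins invented for the example (they are not the real Ganga plugin classes), and the sketch assumes the check is done on the plugin objects themselves rather than on any proxy wrapping them. It shows why comparing `_name` is fragile: a subclass with a different name fails the name check even though it behaves like its parent.

```python
class IBackend:
    """Stand-in for a backend base class (hypothetical, not the Ganga class)."""
    _name = "IBackend"


class LCG(IBackend):
    _name = "LCG"


class MyLCG(LCG):
    """A derived plugin that inherits LCG behaviour under a new name."""
    _name = "MyLCG"


def is_lcg_by_name(backend):
    # Fragile: only matches the exact class name, so subclasses are missed.
    return backend._name == "LCG"


def is_lcg_by_type(backend):
    # Robust: matches LCG and anything derived from it.
    return isinstance(backend, LCG)


backend = MyLCG()
print(is_lcg_by_name(backend))  # False
print(is_lcg_by_type(backend))  # True
```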

Optimization of job submission time:

    • parallel submission of job slices? this could be implemented without touching IBackend, condition: master_submit() and submit() must be thread-safe!
    • parallel submission of subjobs?
    • profile FileWorkspace() creation time (and review if always needed, c.f. Panda backend)
    • Atlas-specific: replace DQ2 lock by DQ client instance in every thread?
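
A minimal sketch of the parallel-submission idea, written in present-day Python. `ToyBackend` and `submit_in_parallel` are hypothetical names invented for the example and do not reflect the actual IBackend interface. The point is only the condition stated above: `submit()` must be thread-safe, which here is achieved by guarding the shared state with a lock.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ToyBackend:
    """Hypothetical backend whose submit() is safe to call from many threads."""

    def __init__(self):
        self._lock = threading.Lock()
        self.submitted = []

    def submit(self, subjob_id):
        # Work that touches no shared state can run outside the lock.
        backend_id = f"backend-{subjob_id}"
        # Shared bookkeeping is protected by the lock.
        with self._lock:
            self.submitted.append(backend_id)
        return backend_id


def submit_in_parallel(backend, subjob_ids, max_workers=4):
    """Submit all subjobs from a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(backend.submit, subjob_ids))


backend = ToyBackend()
ids = submit_in_parallel(backend, range(10))
print(len(ids), "subjobs submitted")
```

A per-thread client instance (as suggested for DQ2) would follow the same pattern, with the lock replaced by one client object per worker thread.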

CONFIGURED STATE

This feature will probably be most useful for LHCb. In ATLAS, however, they have a huge tarfile in /tmp, so the job workspace is not self-contained (which sort of breaks the concept of the "configured" state).

We are just discussing this proposal. It looks reasonable; however, it may not be completely trivial to implement.

Our first impressions are:

  1. if an existing user does not use this feature it is transparent to him (so a new job may be directly submitted as before)
  2. j.configure() would put the job in the "configured" state
  3. "configured" = promise to run the same code, including the input sandbox files, as defined at the time of configure()
  4. j.copy() would always give a copy in a "new" state
  5. j.copy_configured() would give a copy in a "configured" state for all statuses of j except "new"
  6. a "configured" job is frozen, mostly read-only, except for a few attributes specified by a (new) schema property "reconfigurable"
  7. "reconfigurable" attributes include: inputdata, outputdata, backend, splitter, merger
  8. at the level of input file workspace, we would probably need to distinguish between files which are generated by configure and submit
  9. resubmit() could eventually be implemented taking into account "reconfigurable" attributes (via an internal transition to "configured")
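
The first impressions above can be sketched as a toy state model. Everything here is an assumption made for illustration: `ToyJob`, its `set()` method and the `RECONFIGURABLE` set are hypothetical and not part of Ganga; only the method names `configure()`, `copy()` and `copy_configured()` and the list of reconfigurable attributes come from the points above.

```python
import copy

# Attributes that stay writable in the "configured" state (point 7 above).
RECONFIGURABLE = {"inputdata", "outputdata", "backend", "splitter", "merger"}


class ToyJob:
    """Hypothetical job with a 'new' -> 'configured' transition."""

    def __init__(self, **attrs):
        self.status = "new"
        self.attrs = dict(attrs)

    def configure(self):
        if self.status != "new":
            raise ValueError("only a 'new' job can be configured")
        self.status = "configured"

    def set(self, name, value):
        # A configured job is frozen except for reconfigurable attributes.
        if self.status == "configured" and name not in RECONFIGURABLE:
            raise AttributeError(f"'{name}' is read-only once configured")
        self.attrs[name] = value

    def copy(self):
        # Always yields a job in the 'new' state (point 4).
        return ToyJob(**copy.deepcopy(self.attrs))

    def copy_configured(self):
        # Allowed for every status except 'new' (point 5).
        if self.status == "new":
            raise ValueError("job has not been configured")
        clone = self.copy()
        clone.status = "configured"
        return clone


j = ToyJob(application="Gaudi", inputdata=["a.dst"])
j.configure()
j.set("inputdata", ["b.dst"])      # allowed: reconfigurable
print(j.copy().status)             # new
print(j.copy_configured().status)  # configured
```

The sketch ignores the input workspace entirely; distinguishing files generated by configure from those generated by submit (point 8) would need additional bookkeeping.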

Metadata for Ganga objects

Ganga Tasks in core or as a separate runtime package (GangaTasks)

An idea: pulling out an "automatic resubmit" functionality, available for simple split jobs (no tasks), with smart strategies for deciding if and when to resubmit.
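
One possible shape of such a strategy, as a sketch only. The function name, the dictionary layout of a subjob and both thresholds are assumptions made for the example. The rule is: resubmit failed subjobs that still have attempts left, but do nothing when too large a fraction has failed, since that usually points to a problem resubmission will not fix.

```python
def select_for_resubmit(subjobs, max_attempts=3, max_failed_fraction=0.5):
    """Return ids of failed subjobs worth resubmitting.

    subjobs: list of dicts with keys 'id', 'status' and 'attempts'
    (a hypothetical layout chosen for this sketch).
    """
    failed = [sj for sj in subjobs if sj["status"] == "failed"]
    # Too many failures suggests a systematic problem: do not resubmit.
    if not subjobs or len(failed) / len(subjobs) > max_failed_fraction:
        return []
    # Otherwise retry only subjobs that have not used up their attempts.
    return [sj["id"] for sj in failed if sj["attempts"] < max_attempts]


subjobs = [
    {"id": 0, "status": "completed", "attempts": 1},
    {"id": 1, "status": "failed", "attempts": 1},
    {"id": 2, "status": "failed", "attempts": 3},
    {"id": 3, "status": "completed", "attempts": 1},
]
print(select_for_resubmit(subjobs))  # [1]
```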

Data management

Outreach:

    • EGI UF in April 2011: Mini Dev Days or User Days?
    • A Ganga blog? To attract new users, and to educate existing Ganga users and give them new ideas on how to use Ganga, we could ask active developers (and power users) to contribute to a Ganga blog that gives neat examples or announces new or little-known features.

Packaging: UBUNTU, DEBIAN, REDHAT?

Cleanup: Which modules are obsolete and can be removed from the release? also external pkgs?

ATLAS-specific

  • General strategy discussion on the future of Ganga in Atlas
  • Writing more, and more robust, test cases for all important workflows in GangaPanda and GangaAtlas
  • GangaPanda: review different workflows in Athena and Executable application
  • Better general TRF support either through AthenaMC or Athena.type=TRF on Panda and LCG
  • Review monitoring plug-ins in ATLAS applications
  • script/athena: add support for all Athena, DQ2 and backend options
  • Ganga for ATLAS T3s: US sites with Condor plugins and xrootd-splitter, integration with dashboard monitoring (Kuba)
  • DQ2JobSplitter is getting very complicated and difficult to validate. Should we factorize it into SplitByFileSize, SplitByNumFiles, SplitByNumEvents, etc.?
    • Wild idea: what about multi-step/multi-level/recursive splitting? This is a general Ganga question. It would work like this: j.splitter = [SplitByNumEvents(), SplitByFileSize()]. Here SplitByNumEvents would act on the master job, while SplitByFileSize would act on each subjob, splitting it further if its constraint isn't met.
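
The multi-level splitting idea can be sketched independently of Ganga. In the sketch below, splitters are plain functions over lists of file records; `split_by_limit`, `apply_splitters` and the record layout are hypothetical names invented for the example, and the greedy grouping is just one simple way to respect a limit.

```python
def split_by_limit(key, limit):
    """Build a splitter that groups files greedily so sum(file[key]) <= limit.

    A single file that exceeds the limit on its own still gets its own chunk.
    """
    def split(files):
        chunks, current, total = [], [], 0
        for f in files:
            if current and total + f[key] > limit:
                chunks.append(current)
                current, total = [], 0
            current.append(f)
            total += f[key]
        if current:
            chunks.append(current)
        return chunks
    return split


def apply_splitters(files, splitters):
    """Apply each splitter to every chunk produced by the previous one."""
    chunks = [files]
    for splitter in splitters:
        chunks = [piece for chunk in chunks for piece in splitter(chunk)]
    return chunks


files = [
    {"name": "a", "events": 100, "size": 4},
    {"name": "b", "events": 100, "size": 4},
    {"name": "c", "events": 100, "size": 1},
    {"name": "d", "events": 100, "size": 1},
]
# First level: at most 200 events per chunk; second level: at most size 5.
chunks = apply_splitters(
    files, [split_by_limit("events", 200), split_by_limit("size", 5)]
)
print([[f["name"] for f in chunk] for chunk in chunks])
# [['a'], ['b'], ['c', 'd']]
```

The first level yields two chunks of two files each; the second level splits only the chunk that violates the size constraint.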

Defined actions

The actions from the meeting were recorded as a set of short-term Savannah items, listed below, and a set of longer-term wishes for which no development effort could currently be identified.

Short term

Longer term

  • Rewrite testing framework
  • Make input and output more flexible
  • Add Ganga into standard Linux distributions such as Ubuntu and Fedora
  • Enhance the GangaTutorial package to become a better starting point for new developers.

Ideas discussed but not formalised

  • Ability to create command line options "ganga --backend=Batch ..." in an automatic way, maybe driven by schema
  • Generalise the client server model in the DiracServer to be a general tool
  • Let the monitoring spyware report running time of jobs
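
The first idea, command line options generated from a schema, might look roughly like the sketch below. The `SCHEMA` dictionary is a made-up stand-in for a plugin schema (it is not the Ganga Schema class), and `build_parser` is a hypothetical helper; only the standard-library `argparse` calls are real.

```python
import argparse

# Hypothetical schema: option name -> type, default and documentation.
SCHEMA = {
    "backend": {"type": str, "default": "Local", "doc": "backend plugin name"},
    "subjobs": {"type": int, "default": 1, "doc": "number of subjobs"},
}


def build_parser(schema):
    """Create one --option per schema item, with type, default and help text."""
    parser = argparse.ArgumentParser(prog="ganga")
    for name, item in schema.items():
        parser.add_argument(
            f"--{name}",
            type=item["type"],
            default=item["default"],
            help=item["doc"],
        )
    return parser


args = build_parser(SCHEMA).parse_args(["--backend=Batch"])
print(args.backend, args.subjobs)  # Batch 1
```

Adding a new item to the schema then makes the corresponding option appear without touching the parser code.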

-- UlrikEgede - 25-Aug-2010

Topic revision: r19 - 2011-12-01 - MichaelJohnKenyonExCern
 