Quarterly plan for August-October 2011

Job monitoring area

Historical view

ATLAS

  • Add new sorting attribute, namely the new definition of the generic activity (end of August) Done
  • Add # of users metric (end of August) Needs further modifications - ongoing
  • Changes to the UI following the suggestions of Torre (end of August) ongoing
  • Redefined the Success/Failures page so that cancelled jobs are excluded from the average efficiency and shown as a separate category on the success/failures plots. Done
  • Added three new plots on the Success/Failures page: Panda Failures Categories, Panda Failure ExitCodes, Transformation ExitCodes. These plots (requested by A. Forti) can be grouped by any of the available grouping options, e.g. ADC Activities, Sites, Groups. Done
  • Added CPU and WC HEPSPEC06 hours usage over the pledges on the Resource Utilisation page. Data retrieved from: http://gstat-wlcg.cern.ch/apps/topology/ Done
  • Normalisation of No. of Events Processed to Million Events and No. of Bytes to GB. Done
  • Various minor bug fixes Done
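
The redefinition of the Success/Failures page listed above can be illustrated with a minimal sketch. It assumes that the average efficiency is the ratio of successful jobs to successful plus failed jobs; this formula, the `JobCounts` type and the function names are hypothetical and are not taken from the dashboard code.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class JobCounts:
    """Hypothetical per-bin job counts (not the real dashboard schema)."""
    succeeded: int
    failed: int
    cancelled: int


def average_efficiency(counts: JobCounts) -> Optional[float]:
    """Efficiency over succeeded and failed jobs only.

    Assumption: cancelled jobs are left out of both numerator and
    denominator, so they cannot lower the efficiency.
    """
    finished = counts.succeeded + counts.failed
    if finished == 0:
        return None  # no succeeded or failed jobs, efficiency undefined
    return counts.succeeded / finished


def plot_categories(counts: JobCounts) -> Dict[str, int]:
    """Series for a success/failures plot, with cancelled jobs kept
    as a category of their own rather than merged into failures."""
    return {
        "succeeded": counts.succeeded,
        "failed": counts.failed,
        "cancelled": counts.cancelled,
    }
```

Under these assumptions, a bin with 80 successful, 20 failed and 50 cancelled jobs has an efficiency of 80 / (80 + 20), while the 50 cancelled jobs still appear on the plot as their own category.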

The new version of the application was presented in the Monitoring section of the ATLAS Distributed Computing Software & Computing Workshop and is currently under validation by the ATLAS community.

CMS

  • Adapt the new version of the historical view redesigned for ATLAS to CMS (end of September). This includes: 1) changing the summary tables to add new sorting attributes (data type, CMSSW version); 2) adding the # of users metric; 3) adding site/application failures to the interactive-like plot; 4) redoing the UI.

This task was postponed to the very end of 2011. A prototype of the new version should be ready by mid-February 2012.

  • Added monitoring of the query execution time of the CMS Job Summary application. Done
  • Enabled Google Analytics for the CMS Task Monitoring and Job Summary applications. Done

Data collectors

CMS

  • New version of the CMS job monitoring collectors. First prototype (mid of September, production end of October)

Implement a prototype that works at the job granularity level, followed by scale tests on the integration database to discover issues and bottlenecks. Iterate between development and tests until the expected scale of the production activities is reached. Then move from the prototype to the production system and repeat the tests. Validation tests are also expected. When the system is ready it will be moved into production. A period of overlap with the current production collectors is foreseen.

Due to changes in the responsibilities of the group members (decreasing manpower in the dashboard team), this task is postponed to next year.

ATLAS

Two new information collectors:

  • Information Collector that resolves ATLAS Sitenames with ATLAS Panda Queues and Computing Elements. Done.
  • CPU & WallClock HEPSPEC06 and ATLAS Federation Pledges collector. Done

Task monitoring

ATLAS

  • Introduce the task concept and properly implement it in the ATLAS analysis task monitoring (end of August). This includes changes on the DAO and UI levels.

  • Start work on the functionality that would allow jobs to be resubmitted and killed from the task monitoring UI. First step: authentication and authorisation (end of September).

The necessary changes in the DAO and collector were implemented, and the APIs for retrieving task-level information were implemented. The UI implementation is postponed, waiting for requirements and suggestions from the ATLAS community.

hBrowse framework

  • Start to develop hBrowse framework v2, planned changes:
    • Improved UI (end of August) (Completed, Testing)
      • more screen space for data
      • improved appearance
    • Filters reconfiguration, remove "time range" predefined filters (end of August) (Completed, Testing)
    • Possibility to handle hierarchical structures more than 2 levels deep (end of September) (Completed, Testing)
    • General refactoring (end of September) (Completed, Testing)

Transfer monitoring area

ATLAS DDM Dashboard

  • Periodic cleaner of event tables (mid-September) IN TESTING
  • 2.0 M2.2 release DEPLOYED 06-Oct-2011
    • Additional plots (pie charts, accumulative plots).
    • Additional filtering (exclusion, self reference).
  • 2.0 M2.3 release DEPLOYED 27-Oct-2011
    • UI bug fixes.
    • Maximum data series in plots.
  • 2.0 M3 FINAL release DEPLOYED 09-Nov-2011
    • Drill-down to event details.
    • Pin/unpin date interval.
  • 2.1 release POSTPONED (to be completed before ATLAS SW&C Week in March 2012)
    • Registration statistics.

Global Transfer Monitoring System

  • Development of the data transfer consumer (end of August) DONE
  • Development of the schema for transfer monitoring repository (end of September) DONE
  • Stress tests DONE

Google Earth

Try the Kinect sensor and software for Google Earth (end of September). Initial testing performed. Ongoing

Handling of the Dashboard cluster

  • Deprecate all the SLC4 machines (currently, only 3 left) (end of August)
  • Include code checkers in the svn, rejecting wrong code (at least for python and xml) (mid of August)
  • Configure awstats for all the machines in the cluster (end of September). It was decided to use Google Analytics instead of awstats.

SSB

  • Include monitoring information of the collectors (end of August)
  • Improve the plots of the historical values (end of August)
  • Investigate NoSQL alternatives for the historical data (end of October)
  • Investigate aggregation of old data (end of October)

Tasks performed but not foreseen in the quarterly plan

ATLAS DDM Dashboard

  • Major bug fix: Error statistics and details counts do not always match (#88495).

-- JuliaAndreeva - 26-Jul-2011

Topic revision: r18 - 2012-01-12 - DavidTuckett
 