Quarterly plan for November 2011-January 2012
Job monitoring area
Historical view
ATLAS
Multiple bug fixes and feature requests implemented. See report from Eddie for details.
CMS
- Adapt new version of the historical view redesigned for ATLAS to CMS
This includes: 1) changing summary tables by adding new sorting attributes (data type, CMSSW version); 2) adding a number-of-users metric; 3) adding site/application failures to the interactive-like plot; 4) redoing the UI. (Deploy prototype for validation: middle of February)
Interactive interface
CMS
- Develop new version of the interactive interface in the hBrowser framework (end of January) (done mid December, currently under validation)
- Evaluate interactive UI performance when querying a denormalized DB object (table or materialized view) (mid-January) (first checks done, next steps postponed pending migration to 11g and new hardware at the end of January)
- If the evaluation (point 2) gives positive results, rewrite the interactive UI DAO part to use the denormalized DB object (end of January)
Task monitoring
ATLAS
- Develop stored procedures for aggregating analysis task data to improve UI performance (?)
- Prototype first version of the analysis task monitoring (end of February?)
Transfer monitoring area
ATLAS DDM Dashboard
- 2.0 M3.1 release (DELIVERED 15-Dec-2011)
- Features:
- Details links are now all real (non-JavaScript) and can be opened in a new tab or window.
- Dates are locked by default when selecting details, and the user is alerted.
- Details queries tuned to achieve acceptable response times for full details time range (3 months).
- Bug fixes:
- Special characters in SURLs from the details page prevent effective copying and pasting.
- Tabs fail to refresh correctly when resizing browser window.
- Other:
- Google Analytics set up.
- Data model refactored to simplify development.
- 2.0 M3.2 release (POSTPONED awaiting modularization of UI code, see below)
- Links to monitoring data via Web API in JSON/XML formats.
- Help page for Web API.
- Live preview of endpoint selection.
- Modularization of UI code (ONGOING expected Feb-2012)
- Allow parallel development of UI for DDM Dashboard and Global Transfer Monitoring.
Global Transfer Monitoring System
- Deploy first prototype covering the complete data flow from the FTS data publisher to the UI. (DELIVERED 16-Nov-2011)
- Perform consistency checks between new WLCG Transfer Dashboard and PhEDEx and DDM Dashboard systems (DELIVERED 16-Dec-2011 and ongoing)
- Following deployment of FTS 2.2.8 to all T1 sites, make sure that information from all FTS instances is collected in the WLCG Transfer Dashboard (POSTPONED awaiting FTS 2.2.8 deployment)
- Enable alarms in case information of any FTS instance is missing or delayed (DELIVERED 23-Jan-2012)
- Development of the new features in the UI following the feedback of the LHC experiments
- Add filtering / grouping by country (DELIVERED 27-Jan-2012)
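The alarm item above (information from an FTS instance missing or delayed) could be checked along the following lines. This is only a minimal sketch: the function name, the instance names, the one-hour threshold and the input format are all assumptions made for illustration and do not describe the deployed alarm.

```python
from datetime import datetime, timedelta


def find_stale_instances(last_seen, expected, now, max_delay=timedelta(hours=1)):
    """Return {instance: reason} for expected instances that are missing or delayed.

    last_seen: dict mapping instance name -> datetime of the latest message received
    expected:  iterable of instance names that should be reporting
    max_delay: assumed tolerance before an instance counts as delayed
    """
    alarms = {}
    for name in expected:
        seen = last_seen.get(name)
        if seen is None:
            alarms[name] = "missing"   # nothing has ever been received
        elif now - seen > max_delay:
            alarms[name] = "delayed"   # latest message is too old
    return alarms


# Hypothetical instance names, used only for the example
now = datetime(2012, 1, 23, 12, 0)
last_seen = {
    "fts-a": datetime(2012, 1, 23, 11, 45),
    "fts-b": datetime(2012, 1, 23, 9, 0),
}
alarms = find_stale_instances(last_seen, ["fts-a", "fts-b", "fts-c"], now)
```

In this example only fts-b (last message three hours old) and fts-c (never seen) would raise an alarm.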
Google Earth
Try Kinect sensor and software for Google Earth (end of September). Initial testing performed; ongoing.
Handling of the Dashboard cluster
Tasks performed but not foreseen in the quarterly plan
Dashboard
ATLAS DDM Dashboard
- Testing for 11g database upgrade.
Global Transfer Monitoring System
- Switch to production MSG brokers with virtual queues. DELIVERED 16-Dec-2011
- Set up dashboard46 as showcase integration service with best-effort support. DELIVERED 27-Jan-2012
- Many improvements and bug fixes in the SiteView collectors and monitoring display; see details in Eddie's input. (January)
- SiteView upgraded to the latest SSB version, which makes it possible to enable another display with historical distribution (SSB-like), as requested by the operations TEG working group. (January)
Detailed input from team members
Eddie
Google Earth
Fixed a problem with Austrian grid sites not appearing in Google Earth GGUS #76061
Job monitoring area
CMS
Started the cleaning of the old CMS Dashboard job monitoring data.
Initial cleaning done, ongoing
- Job Summary / Interactive View: Created a denormalised table with one week's data for testing/benchmarking purposes. Created bitmap indices and other indices on that table and changed the Data Access Object (DAO) of the application to query that table.
ATLAS
- Fixed problems with ATLAS analysis build jobs (#88851 & #88887).
- Fixed bug #89936: Jobs with unresolved site
- Regular meetings with the DBAs about performance problems
- Optimised a procedure which calculates the hourly summaries. Created a procedure that adds daily partitions on the JOB table for up to a month.
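The partition-adding procedure mentioned above can be illustrated with a short sketch. The real procedure presumably lives in the database; the version below only generates the statements, and the partition naming scheme, the range-partitioning key and the 31-day horizon are assumptions made for illustration.

```python
from datetime import date, timedelta


def daily_partition_statements(last_existing, today, horizon_days=31, table="JOB"):
    """Build ALTER TABLE statements for the daily partitions still missing
    between the last existing partition and today + horizon_days.

    Assumes a range-partitioned table with one partition per day, named
    P_YYYYMMDD (hypothetical naming), bounded by the start of the next day.
    """
    statements = []
    day = last_existing + timedelta(days=1)
    end = today + timedelta(days=horizon_days)
    while day <= end:
        upper = day + timedelta(days=1)  # exclusive upper bound of the partition
        statements.append(
            f"ALTER TABLE {table} ADD PARTITION P_{day:%Y%m%d} "
            f"VALUES LESS THAN (TO_DATE('{upper:%Y-%m-%d}', 'YYYY-MM-DD'))"
        )
        day = upper
    return statements


# Partitions exist up to 30-Jan-2012; the horizon ends on 01-Feb-2012
stmts = daily_partition_statements(date(2012, 1, 30), date(2012, 1, 1))
```

Running such a job regularly keeps about a month of empty partitions ahead of the incoming data, and a run that finds nothing missing returns an empty list.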
Historical view
ATLAS
New version ready for testing: many bug fixes and new features:
#124503 thumbnail of successes/failures for gangarobot
#88668 ATLAS Hist.Views Prototype: Normalisation of running/pending jobs with monthly granularity is wrong.
#88696 ATLAS Hist.Views Prototype: Exception on the Resource Utilisation page
sr#124658 success/all efficiency > 1
sr#124739 legends too small with all sites selected
sr #124784: average efficiency > 1.
#89282: legends too large/difficult to read.
#89234: HEPSPEC Average Coefficient is supposed to fluctuate over time.
#89276: Increase the thickness of the 'pledges' line.
#89290: Wrong individual CPU consumption plot when grouping by ADC Activity.
#89296: Minor problem with the pledges summary table.
#89487: A user should be able to adjust the total number of legends in the plots.
#89814 - plot doesn't take into account the selected granularity on the resource utilisation page, running jobs plot.
#90204: Historical Views: Wrong sites<->pattern association for Romania federation
#90244: Add 3 missing gangarobot activities on the ATLAS Job Monitoring
#90394: ATLAS Historical Views: Generate a url hash for every user selection
- Fixed the HEPSPEC06 values of the IL-HEP Tier-2 Federation sites: GGUS 77079.
- Tweaked and improved the ATLAS HEPSPECs and Pledges collector.
Also added the possibility to select All T2s+T1s and All T2s+T1s+T0 from the selection menu.
CMS
Fixed bug #89516: Plots of the CPU Consumption should follow the selected granularity and present the results in 'hours/hour', 'days/day','weeks/week' and 'months/month'.
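One way to read the fix for #89516 is that the CPU time consumed in each bin is divided by the bin length, so the plotted value is expressed as 'hours/hour', 'days/day' and so on. The sketch below shows that arithmetic; the function and table names are hypothetical, the input is assumed to be CPU seconds per bin, and a month is approximated as 30 days.

```python
from datetime import timedelta

# Assumed bin widths; "month" is approximated as 30 days in this sketch
BIN_WIDTH = {
    "hour": timedelta(hours=1),
    "day": timedelta(days=1),
    "week": timedelta(weeks=1),
    "month": timedelta(days=30),
}
UNIT_LABEL = {
    "hour": "hours/hour",
    "day": "days/day",
    "week": "weeks/week",
    "month": "months/month",
}


def normalise_cpu(cpu_seconds_per_bin, granularity):
    """Divide the CPU seconds of each bin by the bin length in seconds.

    Returns the normalised values and the axis label for the granularity.
    """
    width = BIN_WIDTH[granularity].total_seconds()
    values = [seconds / width for seconds in cpu_seconds_per_bin]
    return values, UNIT_LABEL[granularity]


hourly_values, hourly_label = normalise_cpu([7200, 3600], "hour")
daily_values, daily_label = normalise_cpu([3 * 86400], "day")
```

The normalised value is dimensionless: 7200 CPU seconds in a one-hour bin gives 2.0 'hours/hour', which corresponds to two cores busy on average during that hour.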
Task monitoring
New minor release for CMS targeting bug #89568: issue with non-UTC time.
Fixed broken ATLAS and CMS collectors for the GridMap SiteView application (#88584 & #88633).
Fixed #89802 LHCb Site Status collector does not work
Fixed #89889:
- LHCb Collector was using an outdated topology
- ALICE Collector for the job processing was wrong: it collected the wrong number from MonALISA.
- ATLAS Collector for the job processing was wrong: it counted submitted jobs instead of terminated ones.
- CMS Collector for the job processing was wrong: it counted submitted jobs instead of terminated ones.
Fixed #90348: Direct links to ATLAS DDM 2 Dashboard.
Others
- EGI CF2011 dashboard abstract.