Testing and Metrics Guidelines

Internal Metrics Report

Metrics-Report-v0.20.docx: includes language-specific analyzers (7th September 2010)
Metrics-Report-v0.20.pdf: PDF version of the v0.20 docx (7th September 2010)
Metrics-Report-v0.19.docx: includes cyclomatic complexity and static code metrics (7th September 2010)
Metrics-Report-v0.19.pdf: PDF version; note that OpenOffice cannot handle the .docx version (7th September 2010)

Meetings to discuss outstanding issues

  • Meeting on August 12th at 10:30am CEST

Main action points

Bug-tracking metrics

  • UNICORE/ARC/dCache granularity issues: Can you please advise whether it is viable to change your bug tracking systems to address the following issues?
    • Granularity of the bug tracking systems must be sufficient to obtain all bug-tracking related metrics.
    • In particular, detection area, bug state and bug severity level must be defined as described in Table 3, page 9 of: https://twiki.cern.ch/twiki/pub/EMI/TSA23/Metrics-Report-v0.14.docx, if we want to produce all bug-tracking related statistics. Otherwise, we need to discuss alternatives that are agreeable across all middlewares.
  • SA2.3 issue: TotalBugDistribution still needs to be defined adequately. It may be possible to integrate into ETICS.

Build-system metrics

  • General Comment: The integration of the middlewares into one build system will make it possible to provide all build related metrics from one central location. This is an enormous plus for metric provision in terms of manpower overheads.
  • Developer related issue (No action yet): FindBugs can be made available for Java as a metric, but we may want to define other metrics and produce a new metric for each programming language. These metrics do not have to be produced in a QA report, but may be very useful for developers. The TESTCOVERAGE metric is currently more meaningful for the overall project.
  • SA2.2 related issue: Can you please express your preference among the 4 options below and explain your reasons? The supported platforms metric, per PEB-endorsed platform and per product team, can be obtained either:
    1. manually by stating which services/clients are available based on the information in the release pages.
    2. automatically by parsing the stated services/clients per platform in the release pages.
    3. automatically by ETICS, seeing how many tarballs exist on the release date, in the EMI repository, for a component/meta-package on each supported platform.
    4. automatically by ETICS, seeing what tarball and package management package are produced by a certified build (currently in gLite this is obtained from the glite_3_2_cert branch).
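
Options 3 and 4 both come down to counting, per component, the distinct endorsed platforms for which an artifact exists. The following is a minimal sketch of that counting step only. It assumes a hypothetical export of (component, platform) pairs from the repository; the platform names, component names and the supported_platforms helper are illustrative and are not part of ETICS.

```python
from collections import defaultdict


def supported_platforms(artifacts, endorsed):
    """Count, per component, the endorsed platforms that have an artifact.

    artifacts: iterable of (component, platform) pairs (hypothetical export).
    endorsed:  set of PEB-endorsed platform names (assumed values).
    """
    seen = defaultdict(set)
    for component, platform in artifacts:
        if platform in endorsed:
            seen[component].add(platform)
    return {component: len(platforms) for component, platforms in seen.items()}


# Illustrative data only; these names are made up for the example.
endorsed = {"sl5_x86_64", "sl5_i386"}
artifacts = [
    ("component-a", "sl5_x86_64"),
    ("component-a", "sl5_i386"),
    ("component-a", "sl5_x86_64"),   # duplicate tarball, counted once
    ("component-b", "sl5_x86_64"),
    ("component-b", "deb5_x86_64"),  # not endorsed, ignored
]
counts = supported_platforms(artifacts, endorsed)
print(counts)
```

The resulting count per component can then be compared with the number of PEB-endorsed platforms, whichever of the four options supplies the input.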

GGUS user related metrics

  • Defined to be beyond the remit of EMI.

Deferred/Removed metrics

  • SA2.3 deferred metrics: Reusability and interoperation metrics cannot be defined until a core service and interoperation technical plan is in progress. It would be good to see the evolution of this work in EMI-0, EMI-1,... going forward so that the results can be presented clearly in reviews, and to allow the QA group to monitor its progress over time.
  • MEMORYLEAK metric removed: Valgrind is not viable to implement. We need to document why this is the case, so that this explanation is available to this project and later projects.
  • Open POSIX test suite removed: the test suite is not useful for assessing the POSIX conformance of the components in each product team.

What Metrics?

Equation conventions for metrics table
| Value | Meaning |
| | Time a in days |
| | The set of all bugs in the bug tracker |
| $A\vert_{B}$ | A restricted to the condition B |
| $\lvert A \rvert$ | The cardinality of the set A (the total number of items in the set) |

The following table attempts to clarify the outstanding items. In particular, the goal of the table is to provide a description of what is to be measured, how it is defined as a metric, the threshold to set on the metric and how it is obtained from within each middleware. Whether each metric is collected automatically or manually should become clear as the table is populated.

| Grouping | Summary | Metric ID | Calculation of measurement | Metric calculation (and units) | Metric Thresholds | Availability: ARC | Availability: dCache | Availability: gLite | Availability: UNICORE |
| SA2.2 QA Process | Delay on the release schedule | DELAYONTHERELEASE | | Histogram and single value, pre-release (in days) | PEB-defined threshold; a number of working days might be a useful starting point | All middlewares: should attempt to produce the relevant repositories and documentation in time; this item depends on the metrics SUPPORTEDPLATFORMS and UPDATEDOC | | | |
| | Number of bugs closed relative to number of bugs opened, between previous release and current release | BACKLOG | | Single real-valued unit, for the whole project and per product team | Backlog is either acceptable or (per product team) needs investigation/explanation | ? | ? | Needs work: will require some modification of the Savannah statistics. Name: bugStateOpenClosedPage.php code example | ? |
| | Metric shows whether an up-to-date document is present with a release | UPDATEDOC | | Single rational (P/Q) value | | All middlewares: the value should be kept at a level that avoids delayed releases | | | |
| Bug Tracking | Bugs found during integration over bugs found in production, per release (may be better to do away with percentage?) | CERTIFICATIONTESTSEFFECTIVENESS | | Single rational value (P/Q) per release, per component/metapackage | | ? | ? | Needs work: will require some modification of the Savannah statistics. Name: bugHistoryPerDetectionArea.php code example | ? |
| | Average time to handle an immediate priority bug | PRIORITYBUG | Computed over the set of all bugs in the bug tracker | Single value (in hours, per component, per release) | <= a PEB recommended value (Caveat: zero if N=0) | ? | ? | Needs work: will use Savannah bug tracking statistics. Name: bugCategory.php code example | ? |
| | Distribution of bugs per severity level (see severity levels below) | BUGSEVERITYDISTRIBUTION | Set of bugs | 2-d histogram of severity level versus the number of bugs with that severity level | <= a PEB recommended value, or we could use the mean for the past two years from EGEE-III as a starting point | ? | ? | Needs work: will use Savannah bug tracking statistics. Name: bugServerityStatistics.php code example | ? |
| | Total Bug Density | TOTALBUGDENSITY | Number of bugs per KLOC; could be very difficult to obtain? | | | | | | |
| | Bug density per release | BUGDENSITYPERRELEASE | Number of bugs per component per release; could be achieved using patching info? | | | | | | |
| Build System | Number of supported platforms | SUPPORTEDPLATFORMS | PT produces meta-packages/component | Single integer value (a unit) per component or meta-package, per product team | Number of recommended PEB platforms | Packages from all 4 middlewares must be in the EMI repository in time | | | |
| | Test Coverage Metric | TESTCOVERAGE | (???) Should this include FindBugs, etc.? | Single real value (a unit), normalised | Some industry standards specify a range of acceptable values and a critical level. PEB could suggest up to 80% by end of project. | Need an ETICS tool to provide this information | | | |
| | Valgrind memory leakage metric | MEMORYLEAK | On hold? | | | | | | |
| | Percentage of failing components per platform per release of a product team (PT) | SUCCESSFULBUILDS | | Single-valued percentage per PEB-defined platform | Strict in the certification branch; in the development branch this needs ongoing monitoring, per release | ? | ? | Needs snapshot at time of release; needs information on number of components per PT | ? |
| | Code comments Metric | CODECOMMENTS | | Single real value per language, per component, per product team | A starting threshold could be set per component | SA2.5 and SA1 must monitor these values for components with badly lacking documentation | | | |
| | Core middleware metric | | Not currently defined | | | | | | |
| | Overlapping middleware metric | | Not currently defined | | | | | | |
| User incidents (GGUS) | Total user incidents per user month | TOTALUSERINCIDENTS | Outside of scope of EMI | N/A | N/A | N/A | N/A | N/A | N/A |
| | Training and support incident per user month | TRAININGSUPPORTINCIDENTS | Outside of scope of EMI | N/A | N/A | N/A | N/A | N/A | N/A |
| | Average time to deal with an incident at the 3rd level of user support | AVERAGETIMEFORUSERINCIDENTS | Outside of scope of EMI | N/A | N/A | N/A | N/A | N/A | N/A |
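
As an illustration of how three of the bug-tracking metrics in the table could be computed once bug records are available, here is a minimal sketch. The exact formulas are not given in this page, so the definitions below are assumptions read from the summaries: BACKLOG as bugs closed divided by bugs opened within the release window, PRIORITYBUG as the mean handling time in hours with the "zero if N=0" caveat, and CERTIFICATIONTESTSEFFECTIVENESS as a P/Q ratio of integration to production bugs. The Bug record and its field names are hypothetical and do not correspond to any tracker schema.

```python
from dataclasses import dataclass
from datetime import datetime
from fractions import Fraction
from typing import Optional


@dataclass
class Bug:
    opened: datetime
    closed: Optional[datetime]   # None while the bug is still open
    immediate: bool              # True for immediate-priority bugs
    detection_area: str          # assumed values: "integration" or "production"


def backlog(bugs, start, end):
    """Bugs closed divided by bugs opened in [start, end); None if none were opened."""
    opened = sum(1 for b in bugs if start <= b.opened < end)
    closed = sum(1 for b in bugs
                 if b.closed is not None and start <= b.closed < end)
    return closed / opened if opened else None


def priority_bug_hours(bugs):
    """Mean hours to close immediate-priority bugs; zero if there are none (N = 0)."""
    hours = [(b.closed - b.opened).total_seconds() / 3600
             for b in bugs if b.immediate and b.closed is not None]
    return sum(hours) / len(hours) if hours else 0.0


def certification_effectiveness(bugs):
    """Integration bugs over production bugs as P/Q; None if no production bugs."""
    integration = sum(1 for b in bugs if b.detection_area == "integration")
    production = sum(1 for b in bugs if b.detection_area == "production")
    return Fraction(integration, production) if production else None


# Illustrative data only.
previous_release = datetime(2010, 5, 1)
current_release = datetime(2010, 9, 1)
bugs = [
    Bug(datetime(2010, 5, 10, 9), datetime(2010, 5, 10, 21), True, "integration"),
    Bug(datetime(2010, 6, 1, 8), datetime(2010, 6, 2, 8), True, "production"),
    Bug(datetime(2010, 7, 1), None, False, "integration"),
    Bug(datetime(2010, 7, 15), datetime(2010, 8, 1), False, "integration"),
]
print(backlog(bugs, previous_release, current_release))
print(priority_bug_hours(bugs))
print(certification_effectiveness(bugs))
```

Returning None rather than a number when a denominator is empty keeps "no data" distinguishable from a genuine zero; only PRIORITYBUG follows the explicit zero caveat from the table.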

Severity levels "0" => "New feature request", "1" => "Cosmetic", "2" => "Minor", "3" => "Normal", "4" => "Major", "5" => "Critical"

  • The current list is not complete as specified in the DSA2.1 document.
  • It would be good to have combined middleware and core service metrics so that we can produce interoperability and reusability quality factors.
  • This needs future thought and discussion in a meeting.

Software, process, language specific?

  • Are test coverage and unit tests specific to each language enough?
  • Or do we need specific tools per language to gather the information?
  • Answer: See developer metrics issues at top of page.

How to define thresholds?

  • Must be very conservative initially in my opinion, based on some industry standard where appropriate, but needs PEB reassessment over time with feedback from SA2.5. As we've said, we can unit test 90% of a component and miss out on the important software, or we can produce unit tests that do essentially nothing. So the tests have to be spot checked.
  • Answer: from the table above it seems that SA2.2 and the PEB can fully stipulate any values, with SA2.3 providing suggested starting values.
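
One way to keep thresholds easy for the PEB to reassess is to hold them in a single table that is separate from the metric code. The sketch below assumes such a table. Only the 80% test coverage figure comes from the metrics table above; the PRIORITYBUG limit is a made-up placeholder, and the check helper is hypothetical.

```python
import operator

# Metric ID -> (comparison, limit). Values are starting points to be revised.
THRESHOLDS = {
    "TESTCOVERAGE": (operator.ge, 0.80),  # from the table: up to 80% by end of project
    "PRIORITYBUG": (operator.le, 48.0),   # hours; placeholder value, an assumption
}


def check(metric_id, value, thresholds=THRESHOLDS):
    """Return True or False against the threshold, or None if none is defined."""
    if metric_id not in thresholds:
        return None
    compare, limit = thresholds[metric_id]
    return compare(value, limit)


print(check("TESTCOVERAGE", 0.85))
print(check("PRIORITYBUG", 72.0))
print(check("BACKLOG", 1.0))
```

A passing value says nothing about test quality, so the spot checks mentioned above would still be needed.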

Is an integrated testing environment required? How?

  • In other words, do ETICS, Hudson, Maven and Mock allow us to track the results of unit testing, coverage testing, single machine certification testing, multi-machine product team and certification testing? And can each of these types of testing environments be integrated into something like ETICS?
  • Answer: it appears from the integration task-force proposal that there will be a centralised build in ETICS, so this could pave the way for an integrated testing environment.

How to generate metrics? Automatically or Manually?

  • This has to be a mixture of both. But we need to be clear on what has to be done manually so that we have individual parties to produce the metrics in time for each SA2 quality assurance report. This will need to be decided per individual metric.
  • Answer: It would be best if we can produce all the metrics automatically. In the case of the build-system, ETICS should be able to produce everything automatically. The case of the bug-tracker is still pending the feedback from UNICORE/ARC and dCache. Savannah statistics should provide everything automatically for gLite.

Who generates metrics?

  • If automatically, SA2.3 could assemble the information over time. If manually, we need an expert per middleware, or even better we need an expert to dedicate time to SA2.3 and provide the information internally. Do people agree on this?
  • Answer: Once the granularity of the bug-tracking systems has been changed to provide a uniform detection area, severity levels and bug states, SA2.3 may be able to generate all the metrics automatically. SA2.2 will be able to provide a number of statistics based on their release documentation and delays in releases.

How to uniformly extract metrics from Bug Trackers?

  • Answer: we don't know yet for sure. A query to every bug-tracker to produce single values would be useful. In the case of 2-d histograms a sequence of queries could be defined. It may be best to produce the results automatically per bug-tracker using some form of scripting language.
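
A minimal sketch of the "single-value query, then a sequence of queries for a histogram" idea follows. It assumes a hypothetical uniform interface (BugTracker) that each bug tracker would implement in its own way; the in-memory class merely stands in for a real tracker. The severity codes are the ones listed under the metrics table.

```python
from abc import ABC, abstractmethod

# Severity codes as listed on this page.
SEVERITY_LEVELS = {
    0: "New feature request",
    1: "Cosmetic",
    2: "Minor",
    3: "Normal",
    4: "Major",
    5: "Critical",
}


class BugTracker(ABC):
    """Uniform query interface; one subclass per bug tracker (hypothetical)."""

    @abstractmethod
    def count_bugs(self, severity: int) -> int:
        """Single-value query: number of bugs with the given severity code."""


class InMemoryTracker(BugTracker):
    """Stand-in for a real tracker, backed by a list of severity codes."""

    def __init__(self, severities):
        self._severities = list(severities)

    def count_bugs(self, severity):
        return self._severities.count(severity)


def severity_histogram(tracker):
    """Build the histogram with one single-value query per severity level."""
    return {label: tracker.count_bugs(level)
            for level, label in SEVERITY_LEVELS.items()}


tracker = InMemoryTracker([3, 3, 4, 5, 1, 3])  # illustrative data only
histogram = severity_histogram(tracker)
print(histogram)
```

With this split, the histogram code stays the same for every middleware and only the count_bugs implementation differs per tracker.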

What format to use to communicate, store and display metrics?

  • Answer: based on the units of measurements in the table above: a single value for non-time varying metrics or a bar-chart/histogram for time-varying distributions.
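
To make the two shapes concrete, the sketch below stores a metric either as a single value or as a histogram and serialises it to JSON. The MetricRecord type, its field names and the choice of JSON are assumptions for illustration; no storage format has been agreed in this page, and the sample values are made up.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class MetricRecord:
    metric_id: str
    release: str
    value: Optional[float] = None               # set for single-valued metrics
    histogram: Optional[Dict[str, int]] = None  # set for distribution metrics


def to_json(record):
    """Serialise a record with stable key order so that reports are easy to diff."""
    return json.dumps(asdict(record), sort_keys=True)


# Illustrative values only.
single = MetricRecord("BACKLOG", "EMI-1", value=0.75)
distribution = MetricRecord("BUGSEVERITYDISTRIBUTION", "EMI-1",
                            histogram={"Normal": 3, "Major": 1})
print(to_json(single))
print(to_json(distribution))
```

A record of this kind could feed both the single-value entries and the bar charts in a QA report without a second data source.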

What level of continuous integration? Are Nightly builds/tests OK? Test on commit?

  • Answer: Based on the table above, it's shaping up for metrics at the time of each release, with some metrics being performed nightly to glean extra information.

What tools?

  • Answer: Tools inside ETICS for the build-system, PHP for bug-tracking. The extra metric tools, Valgrind and the Open POSIX test suite, have been dropped because they are ineffective or unusable.

-- EamonnKenny - 12-Aug-2010
