Using Valgrind to check the Persistency software

Valgrind emulates a CPU and can therefore trace every instruction, checking for different kinds of errors. Several tools exist for Valgrind, each checking for different types of errors in a program. The most prominent is "Memcheck", which detects errors in memory management, but there are also tools for measuring and improving the performance of a program.

To run a program with valgrind and the "Memcheck" tool enabled, I suggest the following command line:

  valgrind -v --leak-check=full --show-reachable=yes --error-limit=no --log-file=cooltest.log \
    --num-callers=50 --suppressions=contrib/Valgrind/cool.supp test_RelationalCool_RalDatabase

This command line increases the verbosity, performs a full leak check with no error limit, and redirects the Valgrind output to the file "cooltest.log". Some errors in libraries are suppressed by the suppression file "cool.supp". It also increases the number of stack frames displayed in the stack traces. You may also add --track-origins=yes to track the origin of uninitialised values.
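
To avoid retyping these options for every test, they can be wrapped in a small shell function. The following is only a sketch: the function name run_memcheck, the VALGRIND_SUPP environment variable and the per-test log file name are assumptions made for illustration, not part of the COOL scripts. It also adds Valgrind's --error-exitcode option so that a script can detect that errors were reported.

```shell
#!/bin/sh
# Hypothetical helper (not part of COOL): run a test under Memcheck with the
# options suggested above. Usage: run_memcheck <test-binary> [arguments...]
run_memcheck() {
    test_bin=$1
    shift
    # Assumed naming scheme: one log file per test binary.
    log_file="$(basename "$test_bin").valgrind.log"
    # VALGRIND_SUPP is a hypothetical override for the suppression file path.
    valgrind -v --leak-check=full --show-reachable=yes --error-limit=no \
        --num-callers=50 --track-origins=yes \
        --suppressions="${VALGRIND_SUPP:-contrib/Valgrind/cool.supp}" \
        --log-file="$log_file" \
        --error-exitcode=99 \
        "$test_bin" "$@"
}
```

For example, "run_memcheck test_RelationalCool_RalDatabase" would write its report to test_RelationalCool_RalDatabase.valgrind.log and return 99 if Valgrind reported errors that were not suppressed.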

The memcheck tool from valgrind reports several types of errors:

  • use of uninitialised values (ValueN), or uninitialised CPU condition flags (Cond)
  • invalid addresses during memory access (AddrN) or jumps to unaddressable locations (Jump)
  • invalid system call parameters (Param)
  • invalid or mismatching calls to free (Free)
  • overlapping memory regions in memcpy calls or similar (Overlap)
  • memory leaks (Leak)
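
To get a quick overview of which of these categories appear in a log file, the report headlines can be counted with grep. This is a rough sketch: the function names are made up for illustration, and the message texts are the headlines Memcheck usually prints, which may differ between Valgrind versions.

```shell
#!/bin/sh
# Print "<label> <count>" for the lines of log file $3 that match the
# extended regular expression $2.
count_matches() {
    printf '%-8s %s\n' "$1" "$(grep -cE "$2" "$3")"
}

# Hypothetical helper: count Memcheck report headlines per error category.
# The patterns are assumptions based on typical Memcheck output.
summarize_memcheck_log() {
    log=$1
    count_matches Value   'Use of uninitialised value' "$log"
    count_matches Cond    'Conditional jump or move depends on uninitialised value' "$log"
    count_matches Addr    'Invalid (read|write) of size' "$log"
    count_matches Jump    'Jump to the invalid address' "$log"
    count_matches Param   'Syscall param' "$log"
    count_matches Free    '(Invalid|Mismatched) free\(\)' "$log"
    count_matches Overlap 'Source and destination overlap' "$log"
    count_matches Leak    'are (definitely|indirectly|possibly) lost|are still reachable' "$log"
}
```

Calling "summarize_memcheck_log cooltest.log" prints one line per category with the number of matching reports.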

Valgrind finds errors in all parts of a program, including libraries, and of course some of them are false positives. Because of the false positives, and because we cannot fix bugs in all the libraries, I created a suppression file which suppresses the output of certain errors (the full file can be found in the COOL CVS under contrib/Valgrind/cool.supp):

# Oracle errors
{
   <general_oracle_Addr8_suppression>
   Memcheck:Addr8
   ...
   obj:*libclntsh.so.11.1
}
{
   <oracle_slts_tls_getaddr>
   Memcheck:Addr4
   fun:slts_tls_getaddr
   fun:sltsqKeyAdd
   fun:sltskys
   obj:*libclntsh.so.11.1
   fun:kpeDbgProcessInit
   fun:kpummpin
   fun:kpuenvcr
}
[...]

Each suppression block is enclosed in '{' and '}'. The first line is the name of the suppression, followed by a line with the tool name and the type of the suppression. Then follow lines describing the call stack of the place where the error occurs. Comments are marked with a '#'.

When you add the parameter '--gen-suppressions=all', Valgrind will generate a suppression block for each error. Usually it makes sense to generalize these blocks so that they apply to whole classes of errors. You can use asterisks '*' and question marks '?' as wildcards in function and object file (library) names, and '...' as a wildcard for any number of functions or object files in the call stack. You should also replace the full path to the libraries by an asterisk.
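
The last step, replacing library paths by an asterisk, can be automated with sed. This is a minimal sketch: the function name generalize_supp_paths is an assumption, and it only rewrites the 'obj:' lines. Deciding which 'fun:' lines to drop or replace by '...' still has to be done by hand.

```shell
#!/bin/sh
# Hypothetical helper: read suppression blocks on stdin and replace the
# directory part of every absolute "obj:" path by an asterisk, so that
# e.g. "obj:/some/dir/libclntsh.so.11.1" becomes "obj:*libclntsh.so.11.1".
# Lines without an absolute obj: path are passed through unchanged.
generalize_supp_paths() {
    sed -E 's|^([[:space:]]*obj:)/(.*/)?([^/]+)$|\1*\3|'
}
```

It can be used as a filter on a file of generated blocks, for instance "generalize_supp_paths < generated.supp > generalized.supp" (both file names are placeholders).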

A good example of a catch-all for invalid address accesses of size 8 in the Oracle libraries is:

# Oracle errors
{
   <general_oracle_Addr8_suppression>
   Memcheck:Addr8
   ...
   obj:*libclntsh.so.11.1
}

It will suppress all invalid accesses of size 8 in Oracle's libclntsh.so library. When run with -v, Valgrind reports how many times each suppression block was used during the run.
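
These usage counts can be extracted from the log file to see which suppression blocks are actually needed. The sketch below assumes that the counts appear as lines of the form "--PID-- used_suppression: COUNT NAME ...", which may differ between Valgrind versions. The function name used_suppressions is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: sum the usage counts per suppression name in a
# Valgrind log file and print them, most frequently used first.
# Assumed line format: "--PID-- used_suppression:  COUNT NAME [FILE:LINE]"
used_suppressions() {
    awk '$2 == "used_suppression:" { total[$4] += $3 }
         END { for (name in total) printf "%d %s\n", total[name], name }' "$1" |
        sort -rn
}
```

For example, "used_suppressions cooltest.log" lists each suppression name with its total count. Blocks in the suppression file that never show up in this list are candidates for removal.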

For details on suppression file creation, see http://valgrind.org/docs/manual/manual-core.html#manual-core.suppress and http://valgrind.org/docs/manual/mc-manual.html#mc-manual.suppfiles.

Examples where Valgrind was successfully used

Valgrind was successfully used to analyze and fix crashes in OracleAccess due to the use of already deleted OCI handles (e.g. bug #94385). A suppression file was generated as suggested by Martin, using the '--gen-suppressions=yes' option and manually validating the suppression blocks so that a run with no crash would produce no Valgrind errors (the resulting suppression file is in the CORAL CVS under Tests/cmt/valgrind_libclntsh.supp). The crash was intermittent, so the test had to be rerun several times. When a crash appeared, Valgrind first reported that some memory used by an OCI call (e.g. OCIStmtExecute) had already been deleted by a previous OCI call (OCIHandleFree), making it possible to understand which specific handle (OCISvcCtx or OCISession) was responsible for that particular crash.

Valgrind false positives

As discussed in bug #49692, Valgrind reports of the form "Invalid read of size N. Address 0x... is M bytes inside a block of size O alloc'd" may be false positives due to optimized reads inside externally built libraries. Several such reports have been seen in connection with the Oracle client libraries, and should probably be suppressed.

-- MartinWache and AndreaValassi

Topic revision: r7 - 2012-11-11 - AndreaValassi
 