msparser 1.1

Valgrind massif.out parser



Massif Parser

Author: Mathieu Turcotte

The msparser module offers a simple interface to parse the Valgrind massif.out file format, i.e. the data files produced by Massif, the Valgrind heap profiler.

How do I use it?

Import the module

As usual, import the module:

>>> import msparser

Parse a massif.out file

To extract the data from a massif.out file, you simply have to give its path to the parse_file function:

>>> data = msparser.parse_file('massif.out')

You could also pass an already opened file directly to the msparser.parse function.

Understand the data

The parsed data is returned as a dictionary that closely follows the massif.out format. It looks like this:

>>> from pprint import pprint
>>> pprint(data, depth=1)
{'cmd': './a.out',
 'desc': '--time-unit=ms',
 'detailed_snapshots_index': [...],
 'peak_snapshot_index': 16,
 'snapshots': [...],
 'time_unit': 'ms'}

The detailed_snapshots_index and peak_snapshot_index fields make it easy to locate the detailed and peak snapshots in the snapshots list without scanning it. For example, to retrieve the peak snapshot from the snapshots list, we could do:

>>> peak_index = data['peak_snapshot_index']
>>> peak_snapshot = data['snapshots'][peak_index]
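The detailed snapshots can be collected in the same way. The sketch below assumes that each entry of detailed_snapshots_index is an integer position in the snapshots list, like peak_snapshot_index; the data dictionary is a hand-written stand-in with made-up values, not real parser output.

```python
# Hypothetical stand-in for the dictionary returned by msparser.parse_file;
# only the keys used in this example are filled in.
data = {
    'detailed_snapshots_index': [1, 3],
    'peak_snapshot_index': 3,
    'snapshots': [
        {'id': 0, 'heap_tree': None},
        {'id': 1, 'heap_tree': {'children': [], 'details': None, 'nbytes': 1000}},
        {'id': 2, 'heap_tree': None},
        {'id': 3, 'heap_tree': {'children': [], 'details': None, 'nbytes': 3000}},
    ],
}

# Assumption: each index is an integer position in data['snapshots'].
detailed = [data['snapshots'][i] for i in data['detailed_snapshots_index']]
detailed_ids = [snapshot['id'] for snapshot in detailed]
print(detailed_ids)
```

Looking snapshots up by index avoids testing the heap_tree field of every snapshot.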

The snapshots list stores one dictionary per snapshot:

>>> second_snapshot = data['snapshots'][1]
>>> pprint(second_snapshot)
{'heap_tree': None,
 'id': 1,
 'mem_heap': 1000,
 'mem_heap_extra': 8,
 'mem_stack': 0,
 'time': 183}

If the snapshot is detailed, the heap_tree field, instead of being None, will store a heap tree:

>>> peak_heap_tree = peak_snapshot['heap_tree']
>>> pprint(peak_heap_tree, depth=3)
{'children': [{'children': [...], 'details': {...}, 'nbytes': 12000},
              {'children': [], 'details': {...}, 'nbytes': 10000},
              {'children': [...], 'details': {...}, 'nbytes': 8000},
              {'children': [...], 'details': {...}, 'nbytes': 2000}],
 'details': None,
 'nbytes': 32000}

On the root node, the details field is always None, but on the children nodes it's a dictionary which looks like this:

>>> first_child = peak_snapshot['heap_tree']['children'][0]
>>> pprint(first_child['details'], width=1)
{'address': '0x8048404',
 'file': 'prog.c',
 'function': 'h',
 'line': 4}

Obviously, if the node is below the massif threshold, the details field will be None.
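Since every node has the same three keys, a heap tree can be traversed with a short recursive generator. This is only a sketch: the walk function is a hypothetical helper, not part of msparser, and the sample tree is hand-written (the first child's details are taken from the example above, the other values are made up).

```python
def walk(node, depth=0):
    """Yield (depth, nbytes, label) for a node and all of its descendants."""
    details = node['details']
    if details is None:
        # Root node, or a node below the massif threshold.
        label = '(no details)'
    else:
        label = '{function} ({file}:{line})'.format(**details)
    yield depth, node['nbytes'], label
    for child in node['children']:
        yield from walk(child, depth + 1)


# Hand-written sample tree using the same keys as the parsed heap tree.
heap_tree = {
    'nbytes': 32000,
    'details': None,
    'children': [
        {'nbytes': 12000,
         'details': {'address': '0x8048404', 'file': 'prog.c',
                     'function': 'h', 'line': 4},
         'children': [
             {'nbytes': 12000,
              'details': {'address': '0x8048432', 'file': 'prog.c',
                          'function': 'g', 'line': 9},
              'children': []},
         ]},
        {'nbytes': 2000, 'details': None, 'children': []},
    ],
}

rows = list(walk(heap_tree))
for depth, nbytes, label in rows:
    print('  ' * depth + str(nbytes), label)
```

Checking details for None before formatting covers both the root node and the nodes below the threshold.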

Putting It All Together

From this data structure, it's very easy to write a procedure that produces a data table ready for Gnuplot consumption:

print("# valgrind --tool=massif", data['desc'], data['cmd'])
print("# id", "time", "heap", "extra", "total", "stack", sep='\t')
for snapshot in data['snapshots']:
    id = snapshot['id']
    time = snapshot['time']
    heap = snapshot['mem_heap']
    extra = snapshot['mem_heap_extra']
    total = heap + extra
    stack = snapshot['mem_stack']
    print('  '+str(id), time, heap, extra, total, stack, sep='\t')

The output should look like this:

# valgrind --tool=massif --time-unit=ms ./a.out
# id    time    heap    extra   total   stack
  0     0       0       0       0       0
  1     183     1000    8       1008    0
  2     184     2000    16      2016    0
  3     184     3000    24      3024    0
  4     184     4000    32      4032    0
  5     184     5000    40      5040    0
  6     184     6000    48      6048    0
  7     184     7000    56      7056    0
  8     184     8000    64      8064    0
  9     184     9000    72      9072    0
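Gnuplot reads its data from a file, so it may be convenient to wrap the loop in a function that writes to any open text stream. The write_table helper below is a hypothetical sketch, not part of msparser, and the data dictionary is a stand-in built from the first two rows of the table above.

```python
import io


def write_table(data, stream):
    """Write one tab-separated row per snapshot to an open text stream."""
    print("# id", "time", "heap", "extra", "total", "stack",
          sep='\t', file=stream)
    for snapshot in data['snapshots']:
        heap = snapshot['mem_heap']
        extra = snapshot['mem_heap_extra']
        print(snapshot['id'], snapshot['time'], heap, extra, heap + extra,
              snapshot['mem_stack'], sep='\t', file=stream)


# Stand-in for parsed data, limited to the keys the function reads.
data = {'snapshots': [
    {'id': 0, 'time': 0, 'mem_heap': 0, 'mem_heap_extra': 0, 'mem_stack': 0},
    {'id': 1, 'time': 183, 'mem_heap': 1000, 'mem_heap_extra': 8,
     'mem_stack': 0},
]}

# An in-memory buffer keeps the example self-contained.
buffer = io.StringIO()
write_table(data, buffer)
table = buffer.getvalue()
print(table, end='')
```

To produce a file for Gnuplot, the same function can be given a file opened for writing instead of the in-memory buffer.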

Changelog

  • 1.1 [2011-01-13]
    • cleaned up exception throwing code
    • fixed and refactored some regular expressions
  • 1.0 [2011-01-11]
    • initial release
 