ListComparator

Compares ordered lists, xml and csv application

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Detailed Documentation

XML and CSV comparisons

Two scripts are provided xml_cmp and csv_cmp They both compares 2 files and outputs delta as file_suppr, file_addon and file_changes

the extension is forced to xml or csv respectively

List comparison

listcomparator provides a Comparator object that allows to find the differences between two lists provided the elements of the lists appear in the same order

>>> old = [1, 2, 3, 4, 5, 6]
>>> new = [1, 3, 4, 7, 6]

>>> from listcomparator.comparator import Comparator

Let’s create a Comparator object

>>> comp = Comparator(old,new)

The check method gives values to additions and deletions attributes

>>> comp.check()
>>> comp.additions
[7]
>>> comp.deletions
[2, 5]

We can also use lists of lists

>>> old_list = [['62145', 'azerty'], ['1234', 'qwerty'], ['9876', 'ipsum']]
>>> new_list = [['62145', 'azerty'], ['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp = Comparator(old_list, new_list)
>>> comp.check()
>>> comp.additions
[['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.deletions
[['1234', 'qwerty'], ['9876', 'ipsum']]

We can have an issue when a modification, in our case “qwerty” became “qwertz”, appears in both outputs, comp.additions and comp.deletions. You might want to consider this a change. Comparator can handle this and filter out such cases if you provide a function that tells Comparator how to recognize such cases In our example, we consider 2 elements to be the same if the first element of the list is the same, a kind of id.

>>> def my_key(x):
...     return x[0]
...

The getChanges methods then provides a new attribute : changes

>>> comp.getChanges(my_key)
>>> comp.changes
[['1234', 'qwertw']]

of course, additions and deletions stay unchanged

>>> comp.additions
[['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.deletions
[['1234', 'qwerty'], ['9876', 'ipsum']]

You might want to consider only ‘pure’ additions and deletions getChanges allows for a keyword argument ‘purge’ that does just that

>>> comp.getChanges(my_key, purge=True)
>>> comp.changes
[['1234', 'qwertw']]
>>> comp.additions
[['4865', 'lorem']]
>>> comp.deletions
[['9876', 'ipsum']]

The old and new attributes store the lists to be compared you might want to reset those, Comparator provides a purgeOldNew method to clear up memory

>>> comp.old
[['62145', 'azerty'], ['1234', 'qwerty'], ['9876', 'ipsum']]
>>> comp.new
[['62145', 'azerty'], ['1234', 'qwertw'], ['4865', 'lorem']]
>>> comp.purgeOldNew()
>>> comp.old
>>> comp.new

compare XML files

Comparator can be used to compare xml files let’s make two xml files describing books

>>> old='''<?xml version="1.0" ?>
... <infos>
... <book><title>White pages 1995</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Paris</title>
... <para>ABEL Antoine 82 23 44 12</para>
... <para>ABEL Pierre 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Yellow pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Zindep 82 23 44 12</para>
... <para>ZYM 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Dark pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Greves</title>
... <para>SNCF 82 23 44 12</para>
... </chapter>
... </book>
... </infos>
... '''

>>> new='''<?xml version="1.0"?>
... <infos>
... <book><title>White pages 1995</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Paris</title>
... <para>ABIL Antoine 82 23 44 12</para>
... <para>ABEL Pierre 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Yellow pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Zindep 82 23 44 12</para>
... <para>ZYM 82 67 23 12</para>
... </chapter>
... </book>
... <book><title>Blue pages 2007</title>
... <author>
... <surname>La Poste</surname>
... </author>
... <chapter><title>Bretagne</title>
... <para>Mer 82 23 44 12</para>
... <para>Ciel 82 67 23 12</para>
... </chapter>
... </book>
... </infos>
... '''

elementtree is required to parse xml

>>> from elementtree import ElementTree as ET

for this test we’ll use cStringIO rather than a file

>>> import cStringIO
>>> ex_old = cStringIO.StringIO(old)
>>> ex_new = cStringIO.StringIO(new)

we parse contents

>>> root_old = ET.parse(ex_old).getroot()
>>> root_new = ET.parse(ex_new).getroot()

the “book” tag identifies objects we want >>> objects_old = root_old.findall(‘book’) >>> objects_new = root_new.findall(‘book’)

as we can’t compare 2 objects, we stringify them

>>> objects_old = [ET.tostring(o) for o in objects_old]
>>> objects_new = [ET.tostring(o) for o in objects_new]

from there, Comparator is usefull

>>> my_comp = Comparator(objects_old, objects_new)
>>> my_comp.check()

>>> for e in my_comp.additions:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABIL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
<book><title>Blue pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Bretagne</title>
<para>Mer 82 23 44 12</para>
<para>Ciel 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>

>>> for e in my_comp.deletions:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABEL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>
<book><title>Dark pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Greves</title>
<para>SNCF 82 23 44 12</para>
</chapter>
</book>
<BLANKLINE>

we need to know wich tag is used to uniquely define an object here we choose to use the “title” tag

>>> def item_signature(xml_element):
...     title = xml_element.find('title')
...     return title.text
...

we build our custom function for use by the Comparator

>>> def my_key(str):
...     file_like = cStringIO.StringIO(str)
...     root = ET.parse(file_like)
...     return item_signature(root)
...

then the getChanges method of the Comparator becomes available

>>> my_comp.getChanges(my_key, purge=True)

What books have been exclusively added ?

>>> for e in my_comp.additions:
...     print e
...
<book><title>Blue pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Bretagne</title>
<para>Mer 82 23 44 12</para>
<para>Ciel 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>

what books have been exclusively removed ?

>>> for e in my_comp.deletions:
...     print e
...
<book><title>Dark pages 2007</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Greves</title>
<para>SNCF 82 23 44 12</para>
</chapter>
</book>
<BLANKLINE>

what books have changed ? that is have same title, but different other values

>>> for e in my_comp.changes:
...     print e
...
<book><title>White pages 1995</title>
<author>
<surname>La Poste</surname>
</author>
<chapter><title>Paris</title>
<para>ABIL Antoine 82 23 44 12</para>
<para>ABEL Pierre 82 67 23 12</para>
</chapter>
</book>
<BLANKLINE>

then we can put those results back in xml file

This code conforms to PEP8
It is fully tested, 100% coverage
A Buildbot runs tests at each commit

Contributors

Main developpers

Nicolas Laurance <nlaurance at zindep dot com>

with contributions of

Yves Mahe <ymahe at zindep dot com>

Change history

New in 0.1

First Release

Project details

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Release history Release notifications | RSS feed

This version

0.1

Dec 13, 2009

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ListComparator-0.1.tar.gz (10.9 kB view hashes)

Uploaded Dec 13, 2009 Source

Hashes for ListComparator-0.1.tar.gz

Hashes for ListComparator-0.1.tar.gz
Algorithm	Hash digest
SHA256	`1b17dad959d0963261a8e8e08c06018329d413461d0f0facd77fb206b66074fb`
MD5	`e8b3b57781101ab6eeeda7e9fb807e27`
BLAKE2b-256	`6ec629c3bbc181c6b24dcfb8b46a76d66bf2c83f3802f727de727bbb41841688`