Diff and patch tables

[![Build Status](https://travis-ci.org/paulfitz/daff.svg?branch=master)](https://travis-ci.org/paulfitz/daff)
[![NPM version](https://badge.fury.io/js/daff.svg)](http://badge.fury.io/js/daff)
[![Gem Version](https://badge.fury.io/rb/daff.svg)](http://badge.fury.io/rb/daff)
[![PyPI version](https://badge.fury.io/py/daff.svg)](http://badge.fury.io/py/daff)

daff: data diff
===============

This is a library for comparing tables, producing a summary of their
differences, and using such a summary as a patch file. It is
optimized for comparing tables that share a common origin, in other
words multiple versions of the "same" table.

For a live demo, see:
> http://paulfitz.github.com/daff/

Download the code for your preferred language here:
> https://github.com/paulfitz/daff/releases

For certain languages you can install `daff` from the command line:
````sh
npm install daff -g # node/javascript
pip3 install daff # python3
gem install daff # ruby
````

Or use the library to view csv diffs on github via a chrome extension:
> https://github.com/theodi/csvhub

The diff format used by `daff` is specified here:
> http://dataprotocols.org/tabular-diff-format/

This library is a stripped down version of the coopy toolbox (see
http://share.find.coop). To compare tables from different origins,
or with automatically generated IDs, or other complications, check out
the coopy toolbox.

The program
-----------

You can run `daff`/`daff.py`/`daff.rb` as a utility program:
````
$ daff
daff can produce and apply tabular diffs.
Call as:
daff [--output OUTPUT.csv] a.csv b.csv
daff [--output OUTPUT.csv] parent.csv a.csv b.csv
daff [--output OUTPUT.jsonbook] a.jsonbook b.jsonbook
daff patch [--output OUTPUT.csv] source.csv patch.csv
daff trim [--output OUTPUT.csv] source.csv
daff render [--output OUTPUT.html] diff.csv

If you need more control, here is the full list of flags:
daff diff [--output OUTPUT.csv] [--context NUM] [--all] [--act ACT] a.csv b.csv
--context NUM: show NUM rows of context
--all: do not prune unchanged rows
--act ACT: show only a certain kind of change (update, insert, delete)

daff render [--output OUTPUT.html] [--css CSS.css] [--fragment] [--plain] diff.csv
--css CSS.css: generate a suitable css file to go with the html
--fragment: generate just a html fragment rather than a page
--plain: do not use fancy utf8 characters to make arrows prettier
````

Using with git
--------------

Run `daff git csv` to see how to use daff to improve `git`'s handling
of csv files.

````
$ daff git csv
You can use daff to improve git's handling of csv files, by using it as a
diff driver (for showing what has changed) and as a merge driver (for merging
changes between multiple versions). Here is how.

Create and add a file called .gitattributes in the root directory of your
repository, containing:

*.csv diff=daff-diff
*.csv merge=daff-merge

Create a file called .gitconfig in your home directory (or alternatively
open .git/config for a particular repository) and add:

[merge "daff-merge"]
name = daff tabular merge
driver = daff merge --output %A %O %A %B

[diff "daff-diff"]
command = daff diff --git

Make sure you can run daff from the command-line as just "daff" - if not,
replace "daff" in the driver and command lines above with the correct way
to call it.
````

The library
-----------

You can use `daff` as a library from any supported language. We take
here the example of Javascript. To use `daff` on a webpage,
first include `daff.js`:
```html
<script src="daff.js"></script>
```
Or if using node outside the browser:
```js
var daff = require('daff');
```

For concreteness, assume we have two versions of a table,
`data1` and `data2`:
```js
var data1 = [
  ['Country','Capital'],
  ['Ireland','Dublin'],
  ['France','Paris'],
  ['Spain','Barcelona']
];
var data2 = [
  ['Country','Code','Capital'],
  ['Ireland','ie','Dublin'],
  ['France','fr','Paris'],
  ['Spain','es','Madrid'],
  ['Germany','de','Berlin']
];
```

To make those tables accessible to the library, we wrap them
in `daff.TableView`:
```js
var table1 = new daff.TableView(data1);
var table2 = new daff.TableView(data2);
```

We can now compute the alignment between the rows and columns
in the two tables:
```js
var alignment = daff.compareTables(table1,table2).align();
```

To produce a diff from the alignment, we first need a table
for the output:
```js
var data_diff = [];
var table_diff = new daff.TableView(data_diff);
```

Using default options for the diff:
```js
var flags = new daff.CompareFlags();
var highlighter = new daff.TableDiff(alignment,flags);
highlighter.hilite(table_diff);
```

The diff is now in `data_diff`, in highlighter format. The format
is specified here:
> http://share.find.coop/doc/spec_hilite.html

```js
[ [ '!', '', '+++', '' ],
[ '@@', 'Country', 'Code', 'Capital' ],
[ '+', 'Ireland', 'ie', 'Dublin' ],
[ '+', 'France', 'fr', 'Paris' ],
[ '->', 'Spain', 'es', 'Barcelona->Madrid' ],
[ '+++', 'Germany', 'de', 'Berlin' ] ]
```
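
To make the tags in the first column concrete, here is a minimal sketch in plain JavaScript (it does not call `daff`) that tallies the rows of a highlighter-format diff by tag. The `summarizeDiff` helper and its label names are hypothetical, and the tag meanings are assumptions inferred from the example above and the linked specification, not taken from the library's code.

```js
// Hypothetical helper, not part of daff: counts diff rows by their action tag.
// The tag meanings below are assumptions based on the example output above.
var TAG_LABELS = {
  '+++': 'inserted',   // whole row added
  '---': 'deleted',    // whole row removed
  '->': 'updated',     // at least one cell changed
  '+': 'cellAdded'     // row gained a cell because a column was added
};

function summarizeDiff(diffRows) {
  var counts = { inserted: 0, deleted: 0, updated: 0, cellAdded: 0, other: 0 };
  diffRows.forEach(function (row) {
    var tag = row[0];
    // Skip the schema row ('!') and the header row ('@@'): they are not data rows.
    if (tag === '!' || tag === '@@') return;
    var known = Object.prototype.hasOwnProperty.call(TAG_LABELS, tag);
    counts[known ? TAG_LABELS[tag] : 'other'] += 1;
  });
  return counts;
}

// The diff from the example above, written out as a literal.
var exampleDiff = [
  ['!', '', '+++', ''],
  ['@@', 'Country', 'Code', 'Capital'],
  ['+', 'Ireland', 'ie', 'Dublin'],
  ['+', 'France', 'fr', 'Paris'],
  ['->', 'Spain', 'es', 'Barcelona->Madrid'],
  ['+++', 'Germany', 'de', 'Berlin']
];
var summary = summarizeDiff(exampleDiff);
console.log(summary);
```

Tags the sketch does not recognize are counted under `other` rather than guessed at.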

For visualization, you may want to convert this to an HTML table
with appropriate classes on cells so you can color-code inserts,
deletes, updates, etc. You can do this with:
```js
var diff2html = new daff.DiffRender();
diff2html.render(table_diff);
var table_diff_html = diff2html.html();
```
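
To show the general idea of class-based color coding without relying on `daff`, here is a hand-rolled sketch in plain JavaScript. It is not how `daff.DiffRender` works: the `renderDiffRows` function, the `escapeHtml` helper and the class names are all hypothetical, and for brevity it puts one class on each row rather than on individual cells.

```js
// Hypothetical stand-in for illustration only; daff.DiffRender is the real renderer.
// Class names are invented here and need not match the CSS that daff generates.
var ROW_CLASSES = {
  '+++': 'add',
  '---': 'remove',
  '->': 'modify',
  '@@': 'header',
  '!': 'spec'
};

// Escape the characters that would otherwise be read as markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

function renderDiffRows(diffRows) {
  var lines = ['<table>'];
  diffRows.forEach(function (row) {
    var tag = row[0];
    var known = Object.prototype.hasOwnProperty.call(ROW_CLASSES, tag);
    var cls = known ? ROW_CLASSES[tag] : '';
    var cells = row.map(function (cell) {
      return '<td>' + escapeHtml(cell) + '</td>';
    }).join('');
    lines.push('<tr class="' + cls + '">' + cells + '</tr>');
  });
  lines.push('</table>');
  return lines.join('\n');
}

var html = renderDiffRows([
  ['@@', 'Country', 'Code', 'Capital'],
  ['->', 'Spain', 'es', 'Barcelona->Madrid'],
  ['+++', 'Germany', 'de', 'Berlin']
]);
console.log(html);
```

A stylesheet can then target the row classes, for example a green background for `add` and a red one for `remove`.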

For 3-way differences (that is, comparing two tables given knowledge
of a common ancestor) use `daff.compareTables3` (give ancestor
table as the first argument).

Here is how to apply the diff computed earlier (`table_diff`) as a patch:
```js
var patcher = new daff.HighlightPatch(table1,table_diff);
patcher.apply();
// table1 should now equal table2
```

For other languages, you should find sample code in
the packages on the [Releases](https://github.com/paulfitz/daff/releases) page.


Supported languages
-------------------

The `daff` library is written in [Haxe](http://haxe.org/), which
can be translated reasonably well into at least the following languages:

* Javascript
* PHP
* Python
* Java
* C#
* C++
* (via a hack, just for `daff`) Ruby

Some translations are done for you on the
[Releases](https://github.com/paulfitz/daff/releases) page.
To make another translation,
follow the
[Haxe getting started tutorial](http://haxe.org/doc/start) for the
language you care about, then do one of:

```
make js
make php
make py
make java
make cs
make cpp
```

[@Floppy](https://github.com/Floppy) has made a lovingly-hand-written [native Ruby port](https://github.com/theodi/coopy-ruby) that covers core functionality. I've made a brutally-machine-converted port that is a full translation but less idiomatic.

For each language, the `daff` library expects to be handed an interface to tables you create, rather than creating them
itself. This is to avoid inefficient copies from one format to another. You'll find a `SimpleTable` class you can use if
you find this awkward.
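
As an illustration of handing over an interface instead of a copy, the sketch below wraps an existing array of rows in a thin view object, in plain JavaScript. The name `ArrayTableView` and its methods (`getCell`, `setCell`, `width`, `height`) are hypothetical and chosen for readability. They are not claimed to match the interface `daff` actually expects, so consult the sample code packaged for your language.

```js
// Hypothetical thin wrapper over an array-of-rows table; not daff's actual API.
function ArrayTableView(rows) {
  this.rows = rows; // keep a reference to the caller's data, do not copy it
}

ArrayTableView.prototype.height = function () {
  return this.rows.length;
};

ArrayTableView.prototype.width = function () {
  return this.rows.length > 0 ? this.rows[0].length : 0;
};

// x is the column index, y is the row index (both zero-based).
ArrayTableView.prototype.getCell = function (x, y) {
  return this.rows[y][x];
};

ArrayTableView.prototype.setCell = function (x, y, value) {
  this.rows[y][x] = value;
};

var rows = [
  ['Country', 'Capital'],
  ['Spain', 'Barcelona']
];
var view = new ArrayTableView(rows);
view.setCell(1, 1, 'Madrid');
console.log(rows[1][1]); // reads the caller's own array, which the view edited in place
```

Because the view only holds a reference, writes made through it are visible in the caller's array, which is the point of passing an interface instead of converting formats.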

Reading material
----------------

* http://dataprotocols.org/tabular-diff-format/ : a specification of the diff format we use.
* http://theodi.org/blog/csvhub-github-diffs-for-csv-files : using this library with github.
* http://theodi.org/blog/adapting-git-simple-data : using this library with gitlab.
* http://okfnlabs.org/blog/2013/08/08/diffing-and-patching-data.html : a summary of where the library came from.
* http://blog.okfn.org/2013/07/02/git-and-github-for-data/ : a post about storing small data in git/github.
* http://blog.ouseful.info/2013/08/27/diff-or-chop-github-csv-data-files-and-openrefine/ : counterpoint - a post discussing tracked-changes rather than diffs.

License
-------

daff is distributed under the MIT License.
