fingerprint 0.1.0
Document fingerprint generator
What is it?
"fingerprint" module generates fingerprints of a document.
Fingerprint!!!! :(, What is that?
Fingerprint is like the signature of the document. More specifically, in our context, it is the subset of hash values calculated from the document.
Okay i now know about it a bit, tell me how it calculates fingerprints?
Generation of fingerprints of a document is a three stage process;
- (1st phase) generates the k-grams from the standard string
- (2nd phase) generates the hash values for each k-gram using rolling hash function
- (3rd phase) generates the fingerprints from the hash values using winnowing
How can i install it on my machine?
You can install it in basically two ways;
- using source
- git clone git@github.com:kailashbuki/fingerprint.git
- cd fingerprint
- sudo python setup.py install
- using pip
- sudo pip install fingerprint
Hmm! ... How can i use it?
It's plain simple. Here's an example for you;
from fingerprint.fingerprintgenerator import file_content_refiner, FingerprintGenerator
# You could get the standard string from a document as;
s = file_content_refiner("path/to/file")
# OR you could directly pass the standard string if you have
s = "some sample string"
fpg = FingerprintGenerator(input_string=s)
fpg.generate_fingerprints()
print fpg.fingerprints
>>Feel free to contact at kailash<DOT>buki<AT>gmail<DOT>com (kailash.buki@gmail.com)<<
| File | Type | Py Version | Uploaded on | Size | # downloads |
|---|---|---|---|---|---|
| fingerprint-0.1.0.tar.gz (md5) | Source | 2011-07-20 | 4KB | 255 | |
- Author: Kailash Budhathoki
- Home Page: http://github.com/kailashbuki/fingerprint
- License: LICENSE.txt
- Platform: any
- Categories
- Package Index Owner: kailashbuki
- DOAP record: fingerprint-0.1.0.xml
