Finding Duplicate Images
Finds equal or similar images in a directory containing (many) image files.
Requires Python 3 and the Pillow imaging library to run; the test suite additionally needs Wand.
Uses Poetry for dependency management.
Usage
$ pip install duplicate_images
$ find-dups -h
or simply
$ find-dups $IMAGE_ROOT
Image comparison algorithms
Use the --algorithm option to select how equal images are found.
exact
: marks only binary exactly equal files as equal. This is by far the fastest, but most restricted algorithm.

histogram
: checks the images' color histograms for equality. Faster than the image hashing algorithms, but tends to give a lot of false positives for images that are similar, but not equal. Use the --fuzziness and --aspect-fuzziness options to fine-tune its behavior.

ahash, colorhash, dhash and phash
: four different image hashing algorithms. See https://pypi.org/project/ImageHash for an introduction to image hashing and https://tech.okcupid.com/evaluating-perceptual-image-hashes-okcupid for some gory details on which image hashing algorithm performs best in which situation. For a start I recommend ahash.
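The contrast between the exact and the hashing approaches can be sketched in plain, stdlib-only Python (the helper names below are illustrative and not part of find-dups): byte-identical files share a cryptographic digest, while perceptual hashes are compared by counting differing bits (Hamming distance), so similar-but-not-identical images yield a small distance instead of requiring strict equality.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def exact_duplicates(paths):
    """Group byte-identical files -- the idea behind the 'exact' algorithm."""
    groups = defaultdict(list)
    for path in paths:
        # identical files always produce identical digests
        digest = hashlib.sha256(Path(path).read_bytes()).hexdigest()
        groups[digest].append(path)
    return [group for group in groups.values() if len(group) > 1]


def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Perceptual hashes (ahash, dhash, ...) are compared by differing bits."""
    return bin(hash_a ^ hash_b).count('1')
```

With perceptual hashes, two images count as "equal" when their distance stays below a threshold, which is what makes near-duplicates detectable at all.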
Development
Installation
From source:
$ git clone https://gitlab.com/lilacashes/DuplicateImages.git
$ cd DuplicateImages
$ pip3 install poetry
$ poetry install
Running
$ poetry run find-dups $PICTURE_DIR
or
$ poetry run find-dups -h
for a list of all possible options.
Testing
Running:
$ poetry run mypy duplicate_images tests
$ poetry run flake8
$ poetry run pytest
Publishing
$ poetry build
$ poetry publish --username $PYPI_USER --password $PYPI_PASSWORD --repository testpypi
$ poetry publish --username $PYPI_USER --password $PYPI_PASSWORD
Profiling
CPU time
To show the top functions by time spent in the function alone:
$ poetry run python -m cProfile -s tottime ./duplicate_images/duplicate.py \
    --algorithm $ALGORITHM --action-equal none $IMAGE_DIR 2>&1 | head -n 15
or, to show the top functions by time spent, including called functions:
$ poetry run python -m cProfile -s cumtime ./duplicate_images/duplicate.py \
    --algorithm $ALGORITHM --action-equal none $IMAGE_DIR 2>&1 | head -n 15
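The difference between the two sort keys can also be seen with a small stdlib-only sketch (the function names here are made up for illustration): sorting by tottime ranks functions by time spent in their own body, while cumtime includes time spent in everything they call.

```python
import cProfile
import io
import pstats


def inner(n):
    # does the actual work, so it dominates under 'tottime'
    return sum(i * i for i in range(n))


def outer(n):
    # mostly delegates; ranks high under 'cumtime', low under 'tottime'
    return inner(n)


profiler = cProfile.Profile()
profiler.enable()
outer(200_000)
profiler.disable()

stream = io.StringIO()
# swap 'tottime' for 'cumtime' to see outer() climb the ranking
pstats.Stats(profiler, stream=stream).sort_stats('tottime').print_stats(5)
report = stream.getvalue()
print(report)
```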
Memory usage
$ poetry run fil-profile run ./duplicate_images/duplicate.py \
--algorithm $ALGORITHM --action-equal none $IMAGE_DIR 2>&1
This will open a browser window showing the functions using the most memory (see https://pypi.org/project/filprofiler for more details).