Easily computing clip embeddings and building a clip retrieval system with them

Development Status
- 4 - Beta
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3.6
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

clip-retrieval

Easily computing clip embeddings and building a clip retrieval system with them.

- clip batch lets you quickly compute image and text embeddings and indices (about 1500 samples/s on a 3080 GPU)
- clip filter lets you filter the data using the clip embeddings
- clip back hosts the indices with a simple Flask service
- clip service is a simple UI that queries clip back

End to end, this makes it possible to build a simple semantic search system. Interested in learning about semantic search in general? You can read my Medium post on the topic.

Install

pip install clip-retrieval

Clip batch

Get some images into a folder (image_folder in this example), for instance by running:

pip install img2dataset
echo 'https://placekitten.com/200/305' >> myimglist.txt
echo 'https://placekitten.com/200/304' >> myimglist.txt
echo 'https://placekitten.com/200/303' >> myimglist.txt
img2dataset --url_list=myimglist.txt --output_folder=image_folder --thread_count=64 --image_size=256

You can also put text files with the same names as the images in that folder, to get the text embeddings.
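
As a minimal sketch of that pairing, the helper below writes one caption file next to each image. The function name `write_captions`, the `.txt` extension, and the set of image extensions are assumptions made for illustration; check them against the version of clip-retrieval you are using.

```python
from pathlib import Path

# Assumed set of image extensions; adjust to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def write_captions(folder, captions):
    """Write one caption file per image, named after the image.

    `captions` maps an image file name (e.g. "0.jpg") to its caption.
    Returns the list of caption files that were written.
    """
    folder = Path(folder)
    written = []
    for image_name, caption in sorted(captions.items()):
        image_path = folder / image_name
        if image_path.suffix.lower() not in IMAGE_SUFFIXES or not image_path.exists():
            continue  # skip entries that do not match an image on disk
        # Same name as the image, with a .txt extension (assumption).
        text_path = image_path.with_suffix(".txt")
        text_path.write_text(caption, encoding="utf-8")
        written.append(text_path)
    return written
```

For example, `write_captions("image_folder", {"0.jpg": "a kitten"})` would create `image_folder/0.txt` if `image_folder/0.jpg` exists.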

Then run clip-retrieval batch --dataset_path image_folder --output_folder indice_folder

The output folder will contain:

- description_list, containing the captions, one per line
- image_list, containing the image file paths, one per line
- img_emb.npy, containing the image embeddings as a NumPy array
- text_emb.npy, containing the text embeddings as a NumPy array
- image.index, containing a brute-force faiss index for images
- text.index, containing a brute-force faiss index for texts
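
To illustrate what a brute-force search over these outputs amounts to, here is a rough sketch that uses NumPy only, as a stand-in for the faiss index. The function names are hypothetical, and it assumes that image_list is a plain text file with one path per line and that row i of img_emb.npy corresponds to line i of image_list.

```python
import numpy as np


def load_embeddings(indice_folder):
    """Load the image embeddings and the matching image paths."""
    embeddings = np.load(f"{indice_folder}/img_emb.npy")
    with open(f"{indice_folder}/image_list", encoding="utf-8") as f:
        paths = [line.rstrip("\n") for line in f]
    return embeddings, paths


def brute_force_search(query, embeddings, k=5):
    """Return (indices, scores) of the k rows most similar to `query`."""
    # Normalize both sides so the inner product equals cosine similarity.
    embeddings = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = embeddings @ query  # one similarity score per stored embedding
    top = np.argsort(-scores)[:k]  # indices sorted by decreasing score
    return top, scores[top]
```

The query vector would normally be a CLIP embedding of a text or an image; comparing every stored vector against it is exactly what makes the search "brute force".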

Clip filter

Once the embeddings are computed, you may want to filter the data by a specific query. For that, run clip-retrieval filter --query "cat" --output_folder "cat/" --indice_folder "indice_folder". It copies the 100 best images for this query into the output folder. The --num_results and --threshold options may help refine the filter.
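
The selection step can be pictured roughly as follows. This is not the tool's implementation, only a sketch under two assumptions: that --num_results caps the number of images kept, and that --threshold is a minimum similarity score. The function names are hypothetical.

```python
import shutil
from pathlib import Path


def select_matches(scores, paths, num_results=100, threshold=None):
    """Return the best-scoring paths, highest similarity first."""
    ranked = sorted(zip(scores, paths), key=lambda pair: pair[0], reverse=True)
    if threshold is not None:
        # Assumed meaning of the threshold: drop anything scoring below it.
        ranked = [(score, path) for score, path in ranked if score >= threshold]
    return [path for _, path in ranked[:num_results]]


def copy_matches(selected_paths, output_folder):
    """Copy the selected images into the output folder."""
    output = Path(output_folder)
    output.mkdir(parents=True, exist_ok=True)
    for path in selected_paths:
        shutil.copy(path, output / Path(path).name)
```

Here `scores` would be the similarity between the query embedding and each image embedding, in the same order as `paths`.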

Clip back

Then run the following, where output_folder is the output folder of clip batch (indice_folder in the example above):

echo '{"example_index": "output_folder"}' > indices_paths.json
clip-retrieval back --port 1234 --indices-paths indices_paths.json

At this point you have a simple Flask server running on port 1234 that can answer these queries:

/indices-list -> returns a list of indices
/knn-service, which takes as input:

{
    "text": "a text query",
    "image": "a base64 image",
    "modality": "image", // image or text index to use
    "num_images": 4, // number of output images
    "indice_name": "example_index"
}

and returns:

[
    {
        "image": "base 64 of an image",
        "text": "some result text"
    },
    {
        "image": "base 64 of an image",
        "text": "some result text"
    }
]
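
A client for this endpoint might look like the sketch below, which uses only the Python standard library. The helper names are hypothetical, and sending the payload as a JSON body in a POST request is an assumption; the field names come from the request shown above. Note that real JSON does not allow `//` comments, so they are left out of the payload.

```python
import json
import urllib.request


def build_knn_payload(text, indice_name, modality="image", num_images=4):
    """Build the request body for a text query."""
    return {
        "text": text,
        "modality": modality,  # "image" or "text": which index to search
        "num_images": num_images,  # number of results to return
        "indice_name": indice_name,
    }


def query_knn(base_url, payload, timeout=10):
    """Send the payload to /knn-service and return the decoded response."""
    request = urllib.request.Request(
        f"{base_url}/knn-service",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",  # assumption: the service accepts POST with a JSON body
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server from the previous step running, `query_knn("http://localhost:1234", build_knn_payload("a cat", "example_index"))` would be expected to return a list of results in the shape shown above.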

For development

Either locally or in Gitpod (run export PIP_USER=false there).

Set up a virtualenv:

python3 -m venv .env
source .env/bin/activate
pip install -U pip
pip install -e .


Release history

Version	Release date
2.44.0	Jan 13, 2024
2.43.0	Jan 13, 2024
2.42.0	Jan 12, 2024
2.41.0	Jan 11, 2024
2.40.0	Jan 6, 2024
2.39.0	Jan 6, 2024
2.38.0	Jan 6, 2024
2.37.0	May 29, 2023
2.36.1	Jan 27, 2023
2.36.0	Jan 27, 2023
2.35.1	Nov 4, 2022
2.35.0	Oct 25, 2022
2.34.2	Jul 20, 2022
2.34.1	Jul 14, 2022
2.34.0	Jun 20, 2022
2.33.0	Jun 15, 2022
2.32.0	Jun 2, 2022
2.31.1	May 21, 2022
2.31.0	May 21, 2022
2.30.0	Apr 27, 2022
2.29.1	Apr 10, 2022
2.29.0	Apr 10, 2022
2.28.0	Mar 26, 2022
2.27.0	Mar 26, 2022
2.26.0	Mar 16, 2022
2.25.4	Mar 13, 2022
2.25.3	Mar 13, 2022
2.25.2	Mar 13, 2022
2.25.1	Mar 13, 2022
2.25.0	Mar 12, 2022
2.24.10	Mar 4, 2022
2.24.9	Mar 4, 2022
2.24.8	Mar 4, 2022
2.24.7	Feb 26, 2022
2.24.6	Feb 26, 2022
2.24.5	Feb 22, 2022
2.24.4	Feb 21, 2022
2.24.2	Feb 21, 2022
2.24.1	Feb 20, 2022
2.24.0	Feb 20, 2022
2.23.3	Feb 19, 2022
2.23.2	Feb 19, 2022
2.23.1	Feb 18, 2022
2.23.0	Feb 18, 2022
2.22.0	Feb 6, 2022
2.21.0	Dec 25, 2021
2.20.0	Dec 15, 2021
2.19.1	Nov 30, 2021
2.19.0	Nov 30, 2021
2.18.0	Nov 27, 2021
2.17.0	Nov 26, 2021
2.16.2	Nov 25, 2021
2.16.1	Nov 22, 2021
2.16.0	Nov 22, 2021
2.15.1	Nov 19, 2021
2.15.0	Nov 19, 2021
2.14.3	Nov 4, 2021
2.14.2	Nov 4, 2021
2.14.1	Nov 3, 2021
2.14.0	Nov 2, 2021
2.13.1	Oct 12, 2021
2.13.0	Oct 10, 2021
2.12.0	Sep 26, 2021
2.11.2	Sep 21, 2021
2.11.1	Sep 21, 2021
2.11.0	Sep 15, 2021
2.10.0	Sep 15, 2021
2.9.2	Sep 11, 2021
2.9.1	Sep 11, 2021
2.9.0	Sep 11, 2021
2.8.1	Sep 11, 2021
2.8.0	Sep 11, 2021
2.7.1	Sep 10, 2021
2.7.0	Sep 10, 2021
2.6.0	Sep 10, 2021
2.5.0	Sep 9, 2021
2.4.0	Sep 9, 2021
2.3.0	Sep 8, 2021
2.2.0	Sep 7, 2021
2.1.0	Sep 7, 2021
2.0.4	Sep 5, 2021
2.0.3	Sep 3, 2021
2.0.2	Sep 3, 2021
2.0.1	Sep 3, 2021
2.0.0	Sep 3, 2021
1.0.1	Aug 11, 2021 (this version)

Download files


Source Distribution

clip_retrieval-1.0.1.tar.gz (7.8 kB)

Uploaded Aug 11, 2021 Source

Built Distribution

clip_retrieval-1.0.1-py3-none-any.whl (10.2 kB)

Uploaded Aug 11, 2021 Python 3

Hashes for clip_retrieval-1.0.1.tar.gz

Algorithm	Hash digest
SHA256	`bc59afb51f67311a42ba5190f92a7ca1177d7fe901d0107c86b24c214046fbc6`
MD5	`5e639c59c2705a81deeec4b95bf3f366`
BLAKE2b-256	`04fe7ba7d62167e0665bff20203000ffb9f2290d922bde86b23db617bbb70935`

Hashes for clip_retrieval-1.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`4deb15f2db13ecc9442be65ed5b0d4c5af098b3069efa10f264f77f692517e3a`
MD5	`166f952d8420576c30fcc78b7f2721b7`
BLAKE2b-256	`7e2a3e4fff288520197d6c8af2d04c9b308daaff8b775e4591aa61aa1136ecb0`