Metric Learning for Humans

Development Status
- 5 - Production/Stable
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Programming Language
- Python :: 3
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

TensorFlow Similarity: Metric Learning for Humans

TensorFlow Similarity is a TensorFlow library for similarity learning, also known as metric learning or contrastive learning.

TensorFlow Similarity is still in beta.

Introduction

TensorFlow Similarity offers state-of-the-art algorithms for metric learning and all the necessary components to research, train, evaluate, and serve similarity-based models.

Example of nearest neighbors search performed on the embedding generated by a similarity model trained on the Oxford IIIT Pet Dataset.

With TensorFlow Similarity you can train and serve models that find similar items (such as images) in a large corpus of examples. For example, as visible above, you can train a similarity model to find and cluster similar-looking images of cats and dogs from the Oxford IIIT Pet Dataset by training on only a few classes. To train your own similarity model, see this notebook.

Metric learning differs from traditional classification in its objective. Instead of predicting a class, the model learns to minimize the distance between similar examples and maximize the distance between dissimilar examples, in a supervised or self-supervised fashion. Either way, TensorFlow Similarity provides the necessary losses, metrics, samplers, visualizers, and indexing sub-system to make this quick and easy.
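To make the objective concrete, here is a minimal pure-Python sketch of a triplet-style loss, one of the loss families listed under Supported Algorithms. It is only an illustration and not the library's implementation: the function names, the Euclidean distance, and the margin value are assumptions chosen for this example.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss (illustrative, hypothetical helper).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]        # same class as the anchor, distance 1
near_negative = [1.0, 0.0]   # different class, distance 1
far_negative = [3.0, 4.0]    # different class, distance 5

# The near negative is as close as the positive, so the loss is non-zero.
loss_hard = triplet_loss(anchor, positive, near_negative)
# The far negative is already well separated, so the loss is zero.
loss_easy = triplet_loss(anchor, positive, far_negative)
print(loss_hard, loss_easy)
```

Minimizing a loss of this shape pulls the positive toward the anchor and pushes the negative away, which is the behavior described above.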

Currently, TensorFlow Similarity supports supervised training. In future releases, it will support semi-supervised and self-supervised training.

To learn more about the benefits of using similarity training, you can check out the blog post.

What's new

[Oct 8]: Added the Samplers.* IO notebook detailing how to efficiently sample your data for successful training.
[Oct 8]: 0.14 is out, with various speed improvements and bug fixes following the initial release.

For previous changes, see the release changelog.

Getting Started

Installation

Use pip to install the library.

NOTE: The tensorflow extras_require key (the [tensorflow] suffix in the command below) can be omitted if you already have tensorflow>=2.4 installed.

pip install --upgrade-strategy=only-if-needed tensorflow_similarity[tensorflow]
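If you are unsure whether the tensorflow>=2.4 requirement from the note above is already met, a small standard-library check such as the following sketch can help. The helper names are hypothetical. It assumes TensorFlow was installed under the distribution name "tensorflow"; a differently named build would not be detected.

```python
import re
from importlib import metadata


def installed_tensorflow_version():
    """Return the installed 'tensorflow' distribution version, or None."""
    try:
        return metadata.version("tensorflow")
    except metadata.PackageNotFoundError:
        return None


def needs_tensorflow_extra(installed_version, minimum=(2, 4)):
    """True if the [tensorflow] extra should be requested (hypothetical helper)."""
    if installed_version is None:
        return True
    match = re.match(r"(\d+)\.(\d+)", installed_version)
    if match is None:
        # Unrecognized version string: be conservative and request the extra.
        return True
    return (int(match.group(1)), int(match.group(2))) < minimum


if needs_tensorflow_extra(installed_tensorflow_version()):
    print("install tensorflow_similarity[tensorflow]")
else:
    print("install tensorflow_similarity")
```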

Documentation

The detailed and narrated notebooks are a good way to get started with TensorFlow Similarity. There is likely to be one that is similar to your data or your problem (if not, let us know). You can start working with the examples immediately in Google Colab by clicking the Google Colab icon.

For more information about specific functions, check the API documentation.

To contribute to the project, please check out the contribution guidelines.

Minimal Example: MNIST similarity

Here is a bare-bones example demonstrating how to train a TensorFlow Similarity model on the MNIST data. This example illustrates some of the main components provided by TensorFlow Similarity and how they fit together. Please refer to the hello_world notebook for a more detailed introduction.

Preparing data

TensorFlow Similarity provides data samplers for various dataset types. These samplers balance the batches, so that each class in a batch is equally represented, to ensure smoother training. In this example, we are using the multi-shot sampler that integrates directly with the TensorFlow dataset catalog.

from tensorflow_similarity.samplers import TFDatasetMultiShotMemorySampler

# Data sampler that generates balanced batches from MNIST dataset
sampler = TFDatasetMultiShotMemorySampler(dataset_name='mnist', classes_per_batch=10)
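The idea behind a balanced batch can be sketched in a few lines of pure Python. This is not how the library's sampler is implemented. The function `balanced_batch` and its `examples_per_class` parameter are hypothetical names used only to illustrate drawing the same number of examples from each selected class.

```python
import random
from collections import defaultdict


def balanced_batch(labels, classes_per_batch, examples_per_class, rng):
    """Return example indices with an equal number of examples per sampled class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Pick which classes appear in this batch, then sample within each class.
    chosen_classes = rng.sample(sorted(by_class), classes_per_batch)
    batch = []
    for label in chosen_classes:
        batch.extend(rng.sample(by_class[label], examples_per_class))
    return batch


# Toy dataset with heavily imbalanced classes.
labels = [0] * 50 + [1] * 10 + [2] * 5
batch = balanced_batch(labels, classes_per_batch=3, examples_per_class=4,
                       rng=random.Random(0))

# Every class contributes the same number of examples despite the imbalance.
counts = {c: sum(1 for i in batch if labels[i] == c) for c in set(labels)}
print(counts)
```

Balanced batches guarantee that every example in the batch has both same-class and different-class partners, which pair-based and triplet-based losses rely on.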

Building a Similarity model

Building a TensorFlow Similarity model is similar to building a standard Keras model, with two differences: the output layer is usually a MetricEmbedding() layer, which enforces L2 normalization, and the model is instantiated as SimilarityModel(), a specialized subclass that supports additional functionality.

from tensorflow.keras import layers
from tensorflow_similarity.layers import MetricEmbedding
from tensorflow_similarity.models import SimilarityModel

# Build a Similarity model using standard Keras layers
inputs = layers.Input(shape=(28, 28, 1))
x = layers.experimental.preprocessing.Rescaling(1/255)(inputs)
x = layers.Conv2D(64, 3, activation='relu')(x)
x = layers.Flatten()(x)
x = layers.Dense(64, activation='relu')(x)
outputs = MetricEmbedding(64)(x)

# Build a specialized Similarity model
model = SimilarityModel(inputs, outputs)
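To see why L2 normalization of the output is useful, the following pure-Python sketch (independent of the library, with hypothetical helper names) normalizes two vectors and checks that, for unit-length vectors, squared Euclidean distance and cosine similarity carry the same information.

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against division by zero)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


a = l2_normalize([3.0, 4.0])   # approximately [0.6, 0.8]
b = l2_normalize([4.0, 3.0])   # approximately [0.8, 0.6]

# For unit vectors the dot product is the cosine similarity.
cosine_similarity = dot(a, b)
squared_distance = sum((x - y) ** 2 for x, y in zip(a, b))

# Identity for unit vectors: ||a - b||^2 = 2 - 2 * cos(a, b)
print(squared_distance, 2 - 2 * cosine_similarity)
```

Because of this identity, ranking neighbors by cosine similarity or by Euclidean distance gives the same order once embeddings are normalized.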

Training model via contrastive learning

To output metric embeddings that are searchable via approximate nearest neighbor search, the model needs to be trained using a similarity loss. Here we are using MultiSimilarityLoss(), which is one of the most efficient loss functions.

from tensorflow_similarity.losses import MultiSimilarityLoss

# Train Similarity model using contrastive loss
model.compile('adam', loss=MultiSimilarityLoss())
model.fit(sampler, epochs=5)

Building images index and querying it

Once the model is trained, reference examples must be indexed via the model index API to be searchable. After indexing, you can use the model lookup API to search the index for the K most similar items.

from tensorflow_similarity.visualization import viz_neigbors_imgs

# Index 100 embedded MNIST examples to make them searchable
sx, sy = sampler.get_slice(0,100)
model.index(x=sx, y=sy, data=sx)

# Find the top 5 most similar indexed MNIST examples for a given example
qx, qy = sampler.get_slice(3713, 1)
nns = model.single_lookup(qx[0])

# Visualize the query example and its top 5 neighbors
viz_neigbors_imgs(qx[0], qy[0], nns)
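Conceptually, indexing stores embeddings together with their labels, and a lookup returns the closest stored entries. The sketch below shows that idea with an exact brute-force search in pure Python. The class `BruteForceIndex` is hypothetical and is not the library's index, which the text above describes as using approximate nearest neighbor search.

```python
import math


class BruteForceIndex:
    """Toy exact nearest-neighbor index over (embedding, label) pairs."""

    def __init__(self):
        self._embeddings = []
        self._labels = []

    def add(self, embedding, label):
        """Store one embedding with its label."""
        self._embeddings.append(embedding)
        self._labels.append(label)

    def lookup(self, query, k=5):
        """Return the k closest entries as (distance, label), nearest first."""
        scored = [(math.dist(query, emb), label)
                  for emb, label in zip(self._embeddings, self._labels)]
        scored.sort(key=lambda pair: pair[0])
        return scored[:k]


index = BruteForceIndex()
index.add([0.0, 0.0], "a")
index.add([1.0, 0.0], "b")
index.add([0.0, 2.0], "c")
index.add([5.0, 5.0], "d")

neighbors = index.lookup([0.1, 0.0], k=2)
print([label for _, label in neighbors])
```

A brute-force scan compares the query with every stored embedding, so its cost grows linearly with the index size. Approximate methods trade a little accuracy for much faster lookups on large corpora.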

Supported Algorithms

Supervised Losses

Triplet Loss
PN Loss
Multi Sim Loss
Circle Loss

Metrics

TensorFlow Similarity offers many of the most common metrics used for classification and retrieval evaluation, including:

Name	Type
Precision	Classification
Recall	Classification
F1 Score	Classification
Recall@K	Retrieval
Binary NDCG	Retrieval
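For readers new to these metrics, here is a small pure-Python sketch of their textbook definitions. The function names are hypothetical and the library's own metric classes may differ in details. In particular, Recall@K is taken here to mean the fraction of queries with at least one correct label among their top K neighbors, which is one common convention in the retrieval literature.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


def recall_at_k(query_labels, neighbor_labels, k):
    """Fraction of queries whose label appears among their k nearest neighbors."""
    hits = sum(1 for label, neighbors in zip(query_labels, neighbor_labels)
               if label in neighbors[:k])
    return hits / len(query_labels)


precision, recall, f1 = precision_recall_f1(tp=6, fp=2, fn=6)

# Each inner list holds the neighbor labels for one query, nearest first.
query_labels = ["cat", "dog", "cat"]
neighbor_labels = [["dog", "cat"], ["dog", "dog"], ["dog", "dog"]]
r_at_1 = recall_at_k(query_labels, neighbor_labels, k=1)
r_at_2 = recall_at_k(query_labels, neighbor_labels, k=2)
print(precision, recall, f1, r_at_1, r_at_2)
```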

Citing

Please cite this reference if you use any part of TensorFlow Similarity in your research:

@article{EBSIM21,
  title={TensorFlow Similarity: A Usable, High-Performance Metric Learning Library},
  author={Elie Bursztein and James Long and Shun Lin and Owen Vallis and Francois Chollet},
  journal={Fixme},
  year={2021}
}

Disclaimer

This is not an official Google product.

Release history

0.17.1

May 31, 2023

0.16.10

Jan 16, 2023

0.16.9

Dec 2, 2022

0.16.8

Sep 23, 2022

0.16.7

Jul 21, 2022

0.16.6

Jun 28, 2022

0.16.5

Jun 16, 2022

0.16.4

Jun 13, 2022

0.16.3

Jun 9, 2022

0.16.2

Jun 8, 2022

0.16.0

May 28, 2022

0.15.8

Apr 2, 2022

0.15.7

Mar 20, 2022

0.15.6

Mar 15, 2022

0.15.5

Feb 28, 2022

0.15.4

Feb 23, 2022

0.15.3

Feb 23, 2022

0.15.2

Jan 21, 2022

0.15.1

Jan 11, 2022

0.15.0

Jan 11, 2022

0.14.11

Jan 3, 2022

0.14.10

Dec 30, 2021

This version

0.14.9

Dec 22, 2021

0.14.8

Oct 20, 2021

0.14.7

Oct 19, 2021

0.14.6

Oct 19, 2021

0.14.5

Oct 19, 2021

0.14.4

Oct 19, 2021

0.14.3

Oct 19, 2021

0.14.2

Oct 10, 2021

0.14.1

Oct 10, 2021

0.14

Oct 9, 2021

0.13.45

Sep 29, 2021

0.13.44

Sep 27, 2021

0.13.43

Sep 25, 2021

0.13.42

Sep 24, 2021

0.13.41

Sep 24, 2021

0.13.40

Sep 23, 2021

0.13.39

Sep 23, 2021

0.13.38

Sep 23, 2021

0.13.37

Sep 23, 2021

0.13.36

Sep 21, 2021

0.13.35

Sep 21, 2021

0.13.34

Sep 21, 2021

0.13.33

Sep 21, 2021

0.13.32

Sep 20, 2021

0.13.31

Sep 20, 2021

0.13.30

Sep 20, 2021

0.13.29

Sep 16, 2021

0.13.28

Sep 16, 2021

0.13.27

Sep 15, 2021

0.13.26

Sep 13, 2021

0.13.25

Sep 13, 2021

0.13.24

Sep 13, 2021

0.13.23

Sep 13, 2021

0.13.22

Sep 13, 2021

0.13.21

Sep 12, 2021

0.13.20

Sep 2, 2021

0.13.18

Sep 2, 2021

0.13.17

Sep 2, 2021

0.13.15

Sep 2, 2021

0.13.14

Sep 1, 2021

0.13.13

Sep 1, 2021

0.13.12

Sep 1, 2021

0.13.11

Sep 1, 2021

0.13.10

Aug 31, 2021

0.13.9

Aug 19, 2021

0.13.8

Aug 19, 2021

0.13.7

Aug 17, 2021

0.13.6

Aug 13, 2021

0.13.5

Aug 13, 2021

0.13.4.1

Aug 6, 2021

0.13.4

Aug 3, 2021

0.13.3

Aug 3, 2021

0.13.2

Aug 2, 2021

0.13.1

Jul 30, 2021

0.13.0

Jul 29, 2021

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tensorflow_similarity-0.14.9.tar.gz (102.8 kB)

Uploaded Dec 22, 2021 Source

Built Distribution

tensorflow_similarity-0.14.9-py3-none-any.whl (163.7 kB)

Uploaded Dec 22, 2021 Python 3

Hashes for tensorflow_similarity-0.14.9.tar.gz
Algorithm	Hash digest
SHA256	`1af2e30e1280ab04a6cccc54d693a1b9cd114ef767398c6ed8a336146a17b7ce`
MD5	`3ebe12ae85c8ec7c899f45edcb028cf6`
BLAKE2b-256	`784609eebe66884f612ca46760caa08947c748c016c83e1f5fc32ce8266c6348`

Hashes for tensorflow_similarity-0.14.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2f18c097f261dd5fb91a94deab95e66395735b5aed8de5538883c26385cf91b7`
MD5	`79cb0621586394cd5cccc0b00cfcefe1`
BLAKE2b-256	`fcde62356046cfcd93a8a1fe48be4142caeee3043a18d23e821a3ac80e511453`
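The digests listed above can be used to check a downloaded file. The following sketch uses only Python's standard hashlib module. The helper names are hypothetical, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    if algorithm == "blake2b-256":
        # BLAKE2b-256 is BLAKE2b with a 32-byte (256-bit) digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """True if the file's digest equals the expected hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Usage (assumes the source distribution was downloaded to the current directory):
# matches("tensorflow_similarity-0.14.9.tar.gz",
#         "1af2e30e1280ab04a6cccc54d693a1b9cd114ef767398c6ed8a336146a17b7ce")
```

A mismatch means the file differs from the one published, for example because the download was truncated or altered.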