word-embeddings-benchmarks

A Python library for training and evaluating incremental word embeddings.

Word Embeddings Benchmarks

Updated WEB version. Original repository: https://github.com/kudkudak/word-embeddings-benchmarks

The Word Embedding Benchmark (web) package provides methods for easily evaluating word embeddings and reporting results on common benchmarks (analogy, similarity, and categorization).

The research goal of the package is to help drive research in word embeddings by making reproducible results easily accessible, as the literature currently contains many contradictory results. This should also help answer the question of whether new methods for evaluating word embeddings are needed.

To evaluate your embedding (converted to word2vec format or a pickled Python dict) on all fast-running benchmarks, execute ./scripts/eval_on_all.py <path-to-file>.
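
As a minimal sketch of the pickled-dict option, the snippet below writes a toy embedding to disk and reads it back. It assumes the expected layout is a plain dict mapping each word to its vector; the helper `save_embedding_dict`, the file name `toy_embedding.pkl`, and the toy vectors are hypothetical, so check the package's loaders for the exact format before relying on it.

```python
import os
import pickle
import tempfile


def save_embedding_dict(embedding, path):
    """Pickle a {word: vector} dict (assumed layout, not confirmed by the package)."""
    with open(path, "wb") as f:
        pickle.dump(embedding, f)


# Toy 3-dimensional vectors, for illustration only.
toy_embedding = {
    "cat": [0.1, 0.3, 0.5],
    "dog": [0.2, 0.1, 0.4],
}

path = os.path.join(tempfile.gettempdir(), "toy_embedding.pkl")
save_embedding_dict(toy_embedding, path)

# Reload to confirm the round trip before handing the file to an evaluation script.
with open(path, "rb") as f:
    reloaded = pickle.load(f)
```

Under that assumption, the resulting file path is what would be passed as <path-to-file>.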

See here for results on the embeddings available in the package.

Warnings and Disclaimers:

The analogy test does not normalize word embeddings internally.
The package is currently under development, and we expect an official release within the next few months. The main issue you may encounter at the moment is long embedding loading times (especially if you use fetchers).
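
Because the analogy test does not normalize embeddings internally, you may want to normalize the vectors yourself before evaluating, if your setup calls for it. The following is a minimal sketch of L2 normalization on a plain dict of vectors; `l2_normalize` and the toy vectors are hypothetical and not part of the package.

```python
import math


def l2_normalize(embedding):
    """Return a new {word: vector} dict with every vector scaled to unit L2 norm."""
    normalized = {}
    for word, vector in embedding.items():
        norm = math.sqrt(sum(x * x for x in vector))
        # Leave all-zero vectors untouched to avoid dividing by zero.
        normalized[word] = [x / norm for x in vector] if norm > 0 else list(vector)
    return normalized


# Toy vectors, for illustration only.
toy_embedding = {"king": [3.0, 4.0], "queen": [6.0, 8.0], "zero": [0.0, 0.0]}
unit_embedding = l2_normalize(toy_embedding)
print(unit_embedding["king"])  # [0.6, 0.8]
```

After normalization, "king" and "queen" in this toy example map to the same unit vector, which shows that only direction, not length, is retained.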

Please also refer to our recent publication on evaluation methods: https://arxiv.org/abs/1702.02170.

Features:

scikit-learn API and conventions
18 popular datasets
11 word embeddings (word2vec, HPCA, morphoRNNLM, GloVe, LexVec, ConceptNet, HDC/PDC and others)
methods to solve analogy, similarity and categorization tasks
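
To illustrate what a similarity task measures, here is a minimal, dependency-free sketch of one common scoring scheme: compute the cosine similarity for each word pair and report the Spearman correlation with human ratings. It is a sketch of the general technique under stated assumptions, not the package's implementation; the helpers `cosine`, `average_ranks`, and `spearman`, the toy vectors, and the ratings are all made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Toy vectors and made-up human ratings, for illustration only.
vectors = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "bus": [0.2, 0.9],
}
pairs = [("cat", "dog"), ("car", "bus"), ("cat", "car")]
human_scores = [9.0, 8.0, 1.0]

model_scores = [cosine(vectors[a], vectors[b]) for a, b in pairs]
score = spearman(model_scores, human_scores)
```

In this toy example the model ranks the three pairs in the same order as the ratings, so the correlation is at its maximum; on real data you would more likely use a tested routine such as scipy.stats.spearmanr.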

Included datasets:

TR9856
WordRep
Google Analogy
MSR Analogy
SemEval2012
AP
BLESS
Battig
ESSLI (2b, 2a, 1c)
WS353
MTurk
RG65
RW
SimLex999
MEN
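
For the analogy datasets in this list (Google Analogy, MSR Analogy), questions take the form "a is to b as c is to ?". One common way to answer them is the vector-offset approach sketched below: return the word whose vector is closest, by cosine, to vec(b) - vec(a) + vec(c). This is a sketch of the general technique, not necessarily what the package does; `solve_analogy` and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def solve_analogy(embedding, a, b, c):
    """Return the word closest to vec(b) - vec(a) + vec(c), excluding a, b and c."""
    target = [vb - va + vc
              for va, vb, vc in zip(embedding[a], embedding[b], embedding[c])]
    candidates = (w for w in embedding if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(embedding[w], target))


# Toy 2-dimensional vectors, for illustration only.
toy_embedding = {
    "man": [1.0, 0.0],
    "woman": [1.0, 1.0],
    "king": [2.0, 0.0],
    "queen": [2.0, 2.0],
    "apple": [-1.0, 0.5],
}

answer = solve_analogy(toy_embedding, "man", "woman", "king")
print(answer)  # queen
```

Because the offset is computed on the raw vectors, vector lengths influence the target, which is one reason normalization matters for analogy tests.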

Note: the embeddings are not currently hosted on a proper server. If the download is too slow, consider downloading them manually from the original sources referenced in the docstrings.

Dependencies

Please see requirements.txt.

Install

This package uses setuptools. You can install it by running:

python setup.py install

If you have problems during installation, you may first need to install the dependencies:

pip install -r requirements.txt

If you already have the dependencies listed in requirements.txt installed and want to install into your home directory, use:

python setup.py install --user

To install for all users on Unix/Linux:

python setup.py build

sudo python setup.py install

You can also install it in development mode with:

python setup.py develop

Examples

See the examples folder.

License

The code is licensed under MIT; however, the embeddings distributed with the package might be under different licenses. If you are unsure, please reach out to the authors (references are included in the docstrings).

Project details

Release history

This version

0.0.1

Feb 18, 2023

Download files

Source Distribution

word-embeddings-benchmarks-0.0.1.tar.gz (41.4 kB)

Uploaded Feb 18, 2023 Source

Built Distribution

word_embeddings_benchmarks-0.0.1-py3-none-any.whl (42.5 kB)

Uploaded Feb 18, 2023 Python 3

Hashes for word-embeddings-benchmarks-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`085c9e803ca6921202361541a351fb890861137461bd39cc0ca2f2e0b2f87cb9`
MD5	`8a8192df0d44c7e27c48d6b1ca4a8feb`
BLAKE2b-256	`53f1585d92f2a8276dc9a6fd4daba86ee4755f2d2aef073a3bf8f1ea56c27d50`

Hashes for word_embeddings_benchmarks-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4813edf2ac47aa535fbf204320014b6a7ec3c02aa54765ae62edc3ba41662a8f`
MD5	`d563354205a275dde730d5b1630216a2`
BLAKE2b-256	`d331947a46db86268f57d7772c640d470f3de5c276c3833b6fc6082750a008a4`