Word2Vec implementation with TensorFlow Estimators and Datasets

Word2Vec

This is a re-implementation of Word2Vec built on TensorFlow Estimators and Datasets.

Works with Python >= 3.6 and TensorFlow v2.0.

Install

via pip:

pip3 install tf-word2vec

or, after a git clone:

python3 setup.py install

Get data

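You can download a 7z-compressed sample of the English Wikipedia here. Extract the archive before training, since the training command below expects the plain .txt file: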

wget http://129.194.21.122/~kabbach/enwiki.20190120.sample10.0.balanced.txt.7z

Train Word2Vec

w2v train \
  --data /absolute/path/to/enwiki.20190120.sample10.0.balanced.txt \
  --outputdir /absolute/path/to/word2vec/models \
  --alpha 0.025 \
  --neg 5 \
  --window 2 \
  --epochs 5 \
  --size 300 \
  --min-count 50 \
  --sample 1e-5 \
  --train-mode skipgram \
  --t-num-threads 20 \
  --p-num-threads 25 \
  --keep-checkpoint-max 3 \
  --batch 1 \
  --shuffling-buffer-size 10000 \
  --save-summary-steps 10000 \
  --save-checkpoints-steps 100000 \
  --log-step-count-steps 10000
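
To make the --window and --sample flags above more concrete, here is a minimal Python sketch of what they conventionally mean in skip-gram Word2Vec. It is an illustration only and is not taken from this package's code: the function names skipgram_pairs and keep_probability are hypothetical, the window is fixed rather than randomly shrunk as in the original C tool, and the subsampling formula is the one stated in Mikolov et al. (2013), which this implementation may or may not follow exactly.

```python
import math


def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs using a fixed symmetric window.

    Hypothetical helper for illustration; not part of tf-word2vec.
    """
    pairs = []
    for i, target in enumerate(tokens):
        # Context spans up to `window` tokens on each side of the target.
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def keep_probability(frequency, sample=1e-5):
    """Probability of keeping a word whose relative corpus frequency is `frequency`.

    Assumes the paper's rule: discard with probability 1 - sqrt(sample / frequency).
    `frequency` must be strictly positive.
    """
    return min(1.0, math.sqrt(sample / frequency))


if __name__ == "__main__":
    sentence = ["the", "quick", "brown", "fox"]
    for pair in skipgram_pairs(sentence, window=2):
        print(pair)
    # A word making up 1% of the corpus is kept only rarely at sample=1e-5.
    print(keep_probability(0.01, sample=1e-5))
```

Under these assumptions, a larger --window yields more training pairs per token, and a smaller --sample discards frequent words more aggressively.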

Source Distribution

tf-word2vec-1.0.7.tar.gz (31.6 kB)