Coreference resolution with e2e for Dutch

Development Status
- 4 - Beta
Environment
- Console
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

e2e-Dutch

Code for the end-to-end (e2e) coreference resolution model for Dutch. The code is based on the original e2e model for English and has been modified to work for Dutch. If you use this code, please cite it along with the original e2e paper.

Installation

Requirements:

Python 3.6 or 3.7
pip

In this repository, run:

pip install -r requirements.txt
./scripts/setup_all.sh
pip install .

The setup_all script downloads the word vector files to the data directories and builds the application-specific TensorFlow kernels.
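Since the requirements list Python 3.6 or 3.7, it can help to check the interpreter before running the install steps. The following is a minimal sketch; the helper name `is_supported_python` is hypothetical and not part of e2e-Dutch.

```python
import sys

# Versions listed under "Requirements" in this README.
SUPPORTED_VERSIONS = {(3, 6), (3, 7)}


def is_supported_python(version_info=None):
    """Return True if the (major, minor) version is listed as supported."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


if not is_supported_python():
    # Only a warning: other versions may or may not work.
    print("Warning: running Python %d.%d; the README lists 3.6 or 3.7."
          % tuple(sys.version_info[:2]))
```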

Quick start

A pretrained model is available to download:

python -m e2edutch.download

This downloads the model files. By default they are stored in the data directory inside the installed Python package. The location can also be set manually through the environment variable E2E_HOME or through the config file (see below).
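As an illustration of the E2E_HOME option, the sketch below prepares an environment and command line for the download step from Python. The helper names `download_env` and `download_command` and the directory `~/e2e-data` are assumptions made for this example; only the E2E_HOME variable and the `e2edutch.download` module come from this README.

```python
import os
import sys
from pathlib import Path


def download_env(model_dir, base_env=None):
    """Copy the environment and point E2E_HOME at model_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["E2E_HOME"] = str(Path(model_dir).expanduser())
    return env


def download_command():
    """Command line equivalent to `python -m e2edutch.download`."""
    return [sys.executable, "-m", "e2edutch.download"]


# To actually download (requires e2e-Dutch to be installed):
#   import subprocess
#   subprocess.run(download_command(), env=download_env("~/e2e-data"), check=True)
```

Later commands would presumably need the same E2E_HOME value in their environment to find the downloaded files.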

The pretrained model can be used to predict coreferences on CoNLL 2012 files, jsonlines files, NAF files, or plain text files (in the latter case, the nltk package is used for tokenization).

python -m e2edutch.predict [-h] [-o OUTPUT_FILE] [-f {conll,jsonlines,naf}]
                  [-c WORD_COL] [--cfg_file CFG_FILE] [-v]
                  config input_filename

positional arguments:
  config: name of the model to use for prediction ('final' for the pretrained)
  input_filename

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUT_FILE, --output_file OUTPUT_FILE
  -f {conll,jsonlines,naf}, --format_out {conll,jsonlines,naf}
  -c WORD_COL, --word_col WORD_COL
  --cfg_file CFG_FILE   config file
  -v, --verbose
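The usage text above can be wrapped in a small helper when predictions are driven from a script. This is a sketch under assumptions: `predict_command` is a hypothetical helper, and the file names in the usage comment are placeholders. The flags are the ones listed in the usage text.

```python
import sys

# Output formats accepted by -f/--format_out according to the usage text.
OUTPUT_FORMATS = ("conll", "jsonlines", "naf")


def predict_command(input_filename, config="final", output_file=None,
                    format_out=None, cfg_file=None, verbose=False):
    """Build the argument list for `python -m e2edutch.predict`."""
    if format_out is not None and format_out not in OUTPUT_FORMATS:
        raise ValueError("format_out must be one of %s" % (OUTPUT_FORMATS,))
    cmd = [sys.executable, "-m", "e2edutch.predict"]
    if output_file is not None:
        cmd += ["-o", output_file]
    if format_out is not None:
        cmd += ["-f", format_out]
    if cfg_file is not None:
        cmd += ["--cfg_file", cfg_file]
    if verbose:
        cmd.append("-v")
    # Positional arguments come last: model name, then input file.
    cmd += [config, input_filename]
    return cmd


# Example (placeholder file names; requires e2e-Dutch and the pretrained model):
#   import subprocess
#   subprocess.run(predict_command("document.txt", output_file="out.jsonlines",
#                                  format_out="jsonlines"), check=True)
```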

User-specific configuration (such as the data directory and data files) can be provided in a separate config file; the defaults are specified in cfg/defaults.conf.

Train your own model

To train a new model:

Make sure the model config file (default: e2edutch/cfg/models.conf) describes the model you wish to train
Make sure your config file (default: e2edutch/cfg/defaults.conf) includes the data files you want to use for training
Run scripts/setup_train.sh e2edutch/cfg/defaults.conf. This script converts the conll2012 data to jsonlines files, and caches the word and contextualized embeddings.
If you want to enable the use of a GPU, set the environment variable:

export GPU=0

Run the training script:

python -m e2edutch.train <model-name>
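The training steps above can also be scripted. The following sketch only assembles the commands and environment; `training_steps` and the model name `my_model` are hypothetical, and the paths are the defaults named in the steps above.

```python
import os
import sys


def training_steps(model_name, cfg_file="e2edutch/cfg/defaults.conf", gpu=None):
    """Return (commands, env) for the preparation and training steps."""
    env = dict(os.environ)
    if gpu is not None:
        # Same effect as `export GPU=0` in the shell.
        env["GPU"] = str(gpu)
    commands = [
        # Converts the data to jsonlines and caches the embeddings.
        ["scripts/setup_train.sh", cfg_file],
        # Trains the model described in the model config file.
        [sys.executable, "-m", "e2edutch.train", model_name],
    ]
    return commands, env


# Example, run from the repository root:
#   import subprocess
#   commands, env = training_steps("my_model", gpu=0)
#   for command in commands:
#       subprocess.run(command, env=env, check=True)
```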

Citing this code

If you use this code in your research, please cite it as follows:

@misc{YourReferenceHere,
author = {
            Dafne van Kuppevelt and
            Jisk Attema
         },
title  = {e2e-Dutch},
doi    = {10.5281/zenodo.4146960},
url    = {https://github.com/Filter-Bubble/e2e-Dutch}
}

As the code is largely based on the original e2e model for English, please make sure to also cite the original e2e paper.

Release history

- 0.4.1: Sep 27, 2021
- 0.4.0: Jan 28, 2021
- 0.3.1: Jan 18, 2021 (this version)
- 0.3.0: Jan 18, 2021
- 0.2.0: Jan 18, 2021

Download files

Source Distribution

- e2e-Dutch-0.3.1.tar.gz (28.0 kB), uploaded Jan 18, 2021, Source

Built Distributions

- e2e_Dutch-0.3.1-py3.8.egg (116.6 kB), uploaded Jan 28, 2021, Source
- e2e_Dutch-0.3.1-py3-none-any.whl (72.2 kB), uploaded Jan 18, 2021, Python 3

Hashes for e2e-Dutch-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`2a28edc8c2a3488f2eeba610925c9a6369eb57d58899856094b0d87079fbbabb`
MD5	`e9a8a6e46a3c67ffc3d2dc84d438eefb`
BLAKE2b-256	`19a31601861b57b4b63ba249842544aab523edbdd393daf5af281523d6ec1914`

Hashes for e2e_Dutch-0.3.1-py3.8.egg
Algorithm	Hash digest
SHA256	`66e4128c1761799b6d54273996374bd9cde73282c1210914efeac7efb3826a8d`
MD5	`94d6f899c64381b8f294a3318bce7187`
BLAKE2b-256	`c96e8e8c2a0a12d6fd64ca4b916d3346626f3ca33b3eb5ac334fe741925cda35`

Hashes for e2e_Dutch-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`281ab7a48fe9e46833d9662b18435e152a10f2b518dfed69369e3708ad697e71`
MD5	`cd31b53fdc182a5dcbf16cd4cd06fb7f`
BLAKE2b-256	`be7a60dc5589cf0582b3c60f153211737dcfc183c95c378ac7f4fd25045f064a`
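A downloaded file can be checked against the digests listed above. The sketch below uses Python's standard hashlib module and the SHA256 digest given for e2e-Dutch-0.3.1.tar.gz; the helper names and the local file path in the usage comment are assumptions.

```python
import hashlib

# SHA256 digest listed above for e2e-Dutch-0.3.1.tar.gz.
EXPECTED_SHA256 = "2a28edc8c2a3488f2eeba610925c9a6369eb57d58899856094b0d87079fbbabb"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected.lower()


# Example (assumes the archive was downloaded to the current directory):
#   print(matches_expected("e2e-Dutch-0.3.1.tar.gz"))
```

Reading in chunks keeps memory use constant regardless of file size.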