Skip to main content

A library for parsing multinational street addresses using deep learning.

Project description

License: LGPL v3 Continuous Integration

Here is deepparse.

Deepparse is a state-of-the-art library for parsing multinational street addresses using deep learning.

Use deepparse to:

  • Use the pre-trained models to parse multinational addresses.
  • Retrain our pre-trained models on new data to parse multinational addresses.

Read the documentation at deepparse.org.

Deepparse is compatible with the latest version of PyTorch and Python >= 3.6.

Countries and Results

The following table presents the accuracy on the 20 countries we used during training for both our models.

Country Fasttext (%) BPEmb (%) Country Fasttext (%) BPEmb (%)
Italy 99.66 99.73 United States 99.56 99.53
Germany 99.72 99.84 Austria 99.19 99.03
South Korea 99.96 100.00 Canada 99.76 99.80
Mexico 99.54 99.60 Australia 99.62 99.74
Finland 99.75 99.87 Netherlands 99.50 99.84
France 99.54 99.50 United Kingdom 99.54 99.62
Russia 98.71 99.49 Norway 99.40 98.71
Switzerland 99.48 99.61 Poland 99.64 99.83
Brazil 99.33 99.24 Denmark 99.65 99.84
Spain 99.70 99.79 Czechia 99.46 99.83

We have also made a zero-shot evaluation of our models using data from 41 other countries; the results are shown in the next table.

Country Fasttext (%) BPEmb (%) Country Fasttext (%) BPEmb (%)
Philippines 81.56 83.73 South Africa 92.69 95.03
Colombia 85.92 87.50 Venezuela 95.36 89.67
Bermuda 91.30 93.66 Lithuania 89.21 76.60
Moldova 88.51 89.13 India 66.91 77.26
Malaysia 81.31 92.78 Bosnia 88.91 84.33
Belgium 89.57 86.41 Ukraine 91.80 92.73
Greece 83.42 39.82 Algeria 86.93 80.62
Slovakia 81.00 91.28 Bangladesh 74.49 79.29
Latvia 93.80 80.18 Reunion 96.48 93.40
Romania 93.23 91.83 Singapore 84.55 81.68
Indonesia 63.15 67.97 Cyprus 97.69 98.30
Portugal 93.39 93.20 Serbia 95.62 94.69
Croatia 96.63 86.24 Japan 44.33 35.77
New Caledonia 99.42 99.01 New Zealand 97.04 98.86
Uzbekistan 87.63 71.93 Faroe Islands 71.73 85.46
Hungary 47.00 24.05 Slovenia 96.27 97.28
Paraguay 97.00 97.15 Iceland 95.76 98.01
Estonia 90.61 76.45 Argentina 89.47 88.55
Bulgaria 92.70 95.87 Sweden 77.29 87.77
Belarus 88.77 93.00 Kazakhstan 87.24 91.23
Ireland 86.35 87.49

Getting started:

from deepparse.parser import AddressParser

address_parser = AddressParser(model_type="bpemb", device=0)

# you can parse one address
parsed_address = address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6")

# or multiple addresses
parsed_address = address_parser(
    ["350 rue des Lilas Ouest Québec Québec G1L 1B6", "350 rue des Lilas Ouest Québec Québec G1L 1B6"])

# you can also get the probability of the predicted tags
parsed_address = address_parser("350 rue des Lilas Ouest Québec Québec G1L 1B6", with_prob=True)

Retrain a model

see here for a complete example.

# We will retrain the fasttext version of our pretrained model.
address_parser = AddressParser(model_type="fasttext", device=0)

address_parser.retrain(training_container, 0.8, epochs=5, batch_size=8)

Installation

Before installing deepparse, you must have the latest version of PyTorch in your environment.

  • Install the stable version of deepparse:
pip install deepparse
  • Install the latest development version of deepparse:
pip install -U git+https://github.com/GRAAL-Research/deepparse.git@dev

Cite

Use the following for the article;

@misc{yassine2020leveraging,
    title={{Leveraging Subword Embeddings for Multinational Address Parsing}},
    author={Marouane Yassine and David Beauchemin and François Laviolette and Luc Lamontagne},
    year={2020},
    eprint={2006.16152},
    archivePrefix={arXiv}
}

and this one for the package;

@misc{deepparse,
    author = {Marouane Yassine and David Beauchemin},
    title  = {{Deepparse: A state-of-the-art deep learning multinational addresses parser}},
    year   = {2020},
    note   = {\url{https://deepparse.org}}
}

Contributing to Deepparse

We welcome user input, whether it is regarding bugs found in the library or feature propositions ! Make sure to have a look at our contributing guidelines for more details on this matter.

License

Deepparse is LGPLv3 licensed, as found in the LICENSE file.


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

deepparse-0.2.3.tar.gz (52.5 kB view hashes)

Uploaded Source

Built Distribution

deepparse-0.2.3-py3-none-any.whl (82.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page