dependency-paraphraser

A sentence paraphraser based on dependency parsing and word embedding similarity.

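How the paraphraser works:

1. Create a random projection of the dependency tree, that is, flatten the tree back into a sentence with a randomly chosen word order.
2. Replace several words with similar ones, using word embedding similarity.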
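
To make the first step concrete, here is a minimal sketch of randomly flattening a dependency tree. It is not the library's implementation: the (word, children) tuple format and the function name random_projection are hypothetical, and the uniform shuffle stands in for the trained projector model described further down.

```python
import random

# Toy dependency tree for "the cat sat": each node is (word, [children]).
# This structure is hypothetical and used only for illustration.
tree = ("sat", [("cat", [("the", [])])])


def random_projection(node, rng):
    """Flatten a (word, children) tree into a list of words in random order.

    Every subtree stays contiguous; only the position of a head relative
    to its children's subtrees is shuffled.
    """
    word, children = node
    # The head itself and each flattened child subtree are the units to order.
    units = [[word]] + [random_projection(child, rng) for child in children]
    rng.shuffle(units)
    return [w for unit in units for w in unit]


rng = random.Random(0)
print(" ".join(random_projection(tree, rng)))
```

Because subtrees are kept together, "the" and "cat" always end up next to each other, while the verb may land before or after them.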

Basic usage (for Russian) relies on the Natasha library:

pip install dependency-paraphraser natasha

import dependency_paraphraser.natasha
import random
random.seed(42)
# English gloss: "every hunter wants to know where the pheasant sits"
text = 'каждый охотник желает знать где сидит фазан'
for i in range(3):
    print(dependency_paraphraser.natasha.paraphrase(text, tree_temperature=2))
# желает знать сидит фазан где каждый охотник
# каждый охотник желает знать где фазан сидит
# знать где фазан сидит каждый охотник желает
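
The text above does not say how tree_temperature is applied. Assuming it behaves like a conventional sampling temperature (an assumption, not something the source confirms), the sketch below shows the usual effect: raising the temperature flattens a probability distribution, so less likely word orders are sampled more often. The function name apply_temperature and the probabilities are made up for illustration.

```python
def apply_temperature(probs, temperature):
    """Rescale probabilities as p ** (1 / T) and renormalise them.

    This is equivalent to dividing the log-probabilities by T before a softmax.
    """
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    return [s / total for s in scaled]


# Made-up probabilities for three candidate word orders.
ordering_probs = [0.7, 0.2, 0.1]

print(apply_temperature(ordering_probs, 1))  # T = 1 leaves the distribution as it is
print(apply_temperature(ordering_probs, 2))  # T = 2 moves it closer to uniform
```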

You can provide your own word embedding model (the w2v argument) to replace words with similar ones:

import compress_fasttext
small_model = compress_fasttext.models.CompressedFastTextKeyedVectors.load(
    'https://github.com/avidale/compress-fasttext/releases/download/v0.0.1/ft_freqprune_100K_20K_pq_100.bin'
)
random.seed(42)
for i in range(3):
    print(dependency_paraphraser.natasha.paraphrase(text, w2v=small_model, p_rep=0.8, min_sim=0.55))
# стремится каждый охотник знать рябчик где усаживается
# каждый охотник хочет узнать фазан где просиживает
# каждый охотник хочет узнать фазан где восседает
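
The following sketch illustrates one plausible reading of the p_rep and min_sim arguments: each word is replaced with probability p_rep, and only by a neighbour whose similarity is at least min_sim. That reading is an assumption based on the parameter names. The NEIGHBOURS table, its similarity scores, and the replace_words function are hypothetical and stand in for a real embedding model.

```python
import random

# Hypothetical nearest-neighbour table: word -> [(neighbour, similarity)].
# A real embedding model would compute these; the numbers are made up.
NEIGHBOURS = {
    "wants": [("wishes", 0.71), ("needs", 0.50)],
    "know": [("learn", 0.62)],
}


def replace_words(words, p_rep, min_sim, rng):
    """Replace each word, with probability p_rep, by a sufficiently similar neighbour."""
    result = []
    for word in words:
        # Keep only neighbours that pass the similarity threshold.
        candidates = [n for n, sim in NEIGHBOURS.get(word, []) if sim >= min_sim]
        if candidates and rng.random() < p_rep:
            result.append(rng.choice(candidates))
        else:
            result.append(word)
    return result


rng = random.Random(0)
print(replace_words(["hunter", "wants", "to", "know"], p_rep=0.8, min_sim=0.55, rng=rng))
```

Under this reading, a higher min_sim keeps replacements closer in meaning to the original word, and a lower p_rep leaves more of the sentence untouched.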

Alternatively, you can expand the embedding model that ships with Natasha (navec) into gensim format and use it:

navec_model = dependency_paraphraser.natasha.emb.as_gensim
random.seed(42)
for i in range(3):
    print(dependency_paraphraser.natasha.paraphrase(text, w2v=navec_model, p_rep=0.5, min_sim=0.55))
# желает каждый охотник помнить фазан где лежит
# каждый охотник желает знать фазан где сидит
# каждый охотник оставляет понять где фазан лежит

For other languages, one way to use this paraphraser is with the UDPipe library:

pip install dependency-paraphraser ufal.udpipe pyconll

import dependency_paraphraser.udpipe
# Path to a UDPipe model file, which has to be downloaded separately
path = 'english-ewt-ud-2.5-191206.udpipe'
pipe = dependency_paraphraser.udpipe.Model(path)
projector = dependency_paraphraser.udpipe.en_udpipe_projector

text = 'in April 2012 they released the videoclip for a new single entitled Giorgio Mastrota'
for i in range(3):
    print(dependency_paraphraser.udpipe.paraphrase(text, pipe, projector=projector, tree_temperature=1))
# they released the videoclip in April 2012 for a new entitled Mastrota single Giorgio
# they released in April 2012 the videoclip for a entitled single new Giorgio Mastrota
# they released the videoclip in April 2012 for a new single Giorgio Mastrota entitled

Projectors (models for projecting dependency trees into a flat sentence) can be trained for any language, if you have a corpus of unlabeled sentences and a syntax parser to label them:

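import dependency_paraphraser.udpipe
import dependency_paraphraser.train_projector

# path_to_your_model is a placeholder: the path to a UDPipe model for your language
parser = dependency_paraphraser.udpipe.Model(path_to_your_model)

# your_corpus is a placeholder: your collection of unlabeled sentences
sents = dependency_paraphraser.train_projector.label_udpipe_sentences(
    texts=your_corpus,
    model=parser,
)
# Train a projector on the parsed sentences
projector = dependency_paraphraser.train_projector.train_projector(sents)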

Release history

0.0.5 (this version): Sep 21, 2021
0.0.4: Jun 23, 2021
0.0.3: May 16, 2020
0.0.2: May 16, 2020
0.0.1: May 16, 2020

Download files

Source Distribution

dependency-paraphraser-0.0.5.tar.gz (58.8 kB), uploaded Sep 21, 2021

Hashes for dependency-paraphraser-0.0.5.tar.gz

Algorithm	Hash digest
SHA256	`3697cdbc600ca5f77edbe5b5698ea1748bb01d9463d890d9041083868f9d253a`
MD5	`f6e6e01f8ac1d4526c4034bdaab4609c`
BLAKE2b-256	`fec6bc1b4e7faf1196dfe278c3bd778df7991ee8222230de803b65e29af7f8cb`