
Keyword extraction with spaCy

spacy_ke: Keyword Extraction with spaCy.

⏳ Installation

pip install spacy_ke

🚀 Quickstart

Usage as a spaCy pipeline component (spaCy v2.x.x)

import spacy

from spacy_ke import Yake

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe(Yake(nlp))

doc = nlp(
    "Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence "
    "concerned with the interactions between computers and human language, in particular how to program computers "
    "to process and analyze large amounts of natural language data. "
)

for keyword, score in doc._.extract_keywords(n=3):
    print(keyword, "-", score)

# computer science - 0.020279855002262884
# NLP - 0.035016746977200745
# Natural language processing - 0.04407186487965091

Usage as a spaCy pipeline component (spaCy v3.x.x)

import spacy

from spacy.language import Language
from spacy_ke import Yake


@Language.factory(
    "yake", default_config={"window": 2, "lemmatize": False, "candidate_selection": "ngram"}
)
def yake(nlp, name, window: int, lemmatize: bool, candidate_selection: str):
    return Yake(
        nlp, window=window, lemmatize=lemmatize, candidate_selection=candidate_selection
    )

nlp = spacy.load("en_core_web_sm")
nlp.add_pipe("yake")
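
Once the factory is registered and the component is added, keyword extraction works the same way as in the v2 example. A minimal sketch, assuming the component registers the same doc._.extract_keywords extension shown above:

doc = nlp(
    "Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence "
    "concerned with the interactions between computers and human language."
)

for keyword, score in doc._.extract_keywords(n=3):
    print(keyword, "-", score)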

Configure the pipeline component

Normally you'd want to configure the keyword extraction component according to the parameters its implementation exposes. For Yake, the options and their defaults are:

window: int = 2  # default
lemmatize: bool = False  # default
candidate_selection: str = "ngram"  # default; use "chunk" for noun phrase selection

nlp.add_pipe(
    Yake(
        nlp,
        window=window,
        lemmatize=lemmatize,
        candidate_selection=candidate_selection,
    )
)
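
Under spaCy v3, the same parameters are passed through the config argument of nlp.add_pipe instead of a component instance. A minimal sketch, assuming the "yake" factory registration from the quickstart above:

nlp.add_pipe(
    "yake",
    config={
        "window": 2,  # default
        "lemmatize": False,  # default
        "candidate_selection": "chunk",  # noun phrase selection instead of the default "ngram"
    },
)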

If you want to define a custom candidate selection, use the example below.

from typing import Iterable

from spacy.tokens import Doc

from spacy_ke.util import registry, Candidate


@registry.candidate_selection.register("custom")
def custom_selection(doc: Doc, n=3) -> Iterable[Candidate]:
    ...

nlp.add_pipe(
    Yake(
        nlp, 
        candidate_selection="custom"
    )
)
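
For example, a custom selector could reuse one of the built-in selections and post-process its output. The sketch below is purely illustrative and makes two assumptions that should be checked against spacy_ke.util before use: that the built-in noun phrase selector is registered under the name "chunk" (the name used by the candidate_selection option above), and that the registry supports a catalogue-style get() lookup. The "top_chunks" name is a hypothetical example.

from itertools import islice
from typing import Iterable

from spacy.tokens import Doc

from spacy_ke.util import registry, Candidate


@registry.candidate_selection.register("top_chunks")
def top_chunks_selection(doc: Doc, n=3) -> Iterable[Candidate]:
    # Illustrative sketch: delegate to the built-in "chunk" (noun phrase) selector
    # and keep only the first n candidates it yields.
    chunk_selection = registry.candidate_selection.get("chunk")  # assumed lookup API
    yield from islice(chunk_selection(doc, n=n), n)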

Development

Set up pip & virtualenv

$ pipenv sync -d

Run unit tests

$ pipenv run pytest

Run black (code formatter)

$ pipenv run black spacy_ke/ --config=pyproject.toml

Release package (via twine)

$ python setup.py upload

References

[1] A Review of Keyphrase Extraction

@article{DBLP:journals/corr/abs-1905-05044,
  author    = {Eirini Papagiannopoulou and
               Grigorios Tsoumakas},
  title     = {A Review of Keyphrase Extraction},
  journal   = {CoRR},
  volume    = {abs/1905.05044},
  year      = {2019},
  url       = {http://arxiv.org/abs/1905.05044},
  archivePrefix = {arXiv},
  eprint    = {1905.05044},
  timestamp = {Tue, 28 May 2019 12:48:08 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-1905-05044.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

[2] pke: an open source python-based keyphrase extraction toolkit.

@InProceedings{boudin:2016:COLINGDEMO,
  author    = {Boudin, Florian},
  title     = {pke: an open source python-based keyphrase extraction toolkit},
  booktitle = {Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: System Demonstrations},
  month     = {December},
  year      = {2016},
  address   = {Osaka, Japan},
  pages     = {69--73},
  url       = {http://aclweb.org/anthology/C16-2015}
}
