Wrappers for including pre-trained transformers in spaCy pipelines
spaCy-wrap: For Wrapping fine-tuned transformers in spaCy pipelines
spaCy-wrap is a minimal library for wrapping fine-tuned transformers from the Hugging Face Hub in your spaCy pipeline, allowing existing models to be included in spaCy workflows.
As far as possible, it follows an API similar to that of spacy-transformers.
Installation
Installing spacy-wrap is simple using pip:
pip install spacy_wrap
Examples
The following examples show how you can quickly add a fine-tuned transformer model from the Hugging Face Hub for text classification, named entity recognition, or token classification.
Sequence Classification
In this example, we will use a model fine-tuned for sentiment classification on SST-2. This model classifies whether a text is positive or negative. We will add this model to a blank English pipeline:
import spacy
import spacy_wrap
nlp = spacy.blank("en")
config = {
    "doc_extension_trf_data": "clf_trf_data",  # document extension for the forward pass
    "doc_extension_prediction": "sentiment",  # document extension for the prediction
    "model": {
        # the name or path of a Hugging Face model
        "name": "distilbert-base-uncased-finetuned-sst-2-english",
    },
}
transformer = nlp.add_pipe("sequence_classification_transformer", config=config)
doc = nlp("spaCy is a wonderful tool")
print(doc.cats)
# {'NEGATIVE': 0.001, 'POSITIVE': 0.999}
print(doc._.sentiment)
# 'POSITIVE'
print(doc._.clf_trf_data)
# TransformerData(wordpieces=...
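In the output above, doc.cats holds one probability per label and doc._.sentiment holds the most probable label. The relationship between the two can be sketched with the standard library alone, assuming the model emits one raw score (logit) per label and a softmax turns those scores into probabilities. The helpers `logits_to_cats` and `top_label` are hypothetical and not part of spacy-wrap's API, and the logits are illustrative values, not real model output.

```python
import math


def logits_to_cats(logits, labels):
    """Turn raw classifier scores into a {label: probability} dict (hypothetical helper)."""
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(score - largest) for score in logits]
    total = sum(exps)
    return {label: e / total for label, e in zip(labels, exps)}


def top_label(cats):
    """Return the label with the highest probability."""
    return max(cats, key=cats.get)


# Illustrative logits for ["NEGATIVE", "POSITIVE"], not real model output.
cats = logits_to_cats([-3.0, 3.5], ["NEGATIVE", "POSITIVE"])
print({label: round(p, 3) for label, p in cats.items()})
print(top_label(cats))
```

Because the softmax normalizes the scores, the values in the dict always sum to 1, which is why they can be read as probabilities.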
These pipelines can also be applied to multiple documents using nlp.pipe, as one would expect from a spaCy component:
docs = nlp.pipe(
    [
        "I hate wrapping my own models",
        "Isn't there a tool for this?!",
        "spacy-wrap is great for wrapping models",
    ]
)
for doc in docs:
    print(doc._.sentiment)
# 'NEGATIVE'
# 'NEGATIVE'
# 'POSITIVE'
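nlp.pipe accepts any iterable of texts and processes them in batches, whose size can be set with its batch_size argument (for example nlp.pipe(texts, batch_size=32)). The batching idea itself can be sketched with the standard library. Here `minibatch` is a hypothetical helper (spaCy ships a similar spacy.util.minibatch); this is only an illustration, not how spaCy or spacy-wrap is implemented internally.

```python
from itertools import islice


def minibatch(items, size):
    """Yield successive lists of at most `size` items from any iterable (hypothetical helper)."""
    iterator = iter(items)
    while True:
        # Pull up to `size` items; an empty list means the iterable is exhausted.
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


texts = [
    "I hate wrapping my own models",
    "Isn't there a tool for this?!",
    "spacy-wrap is great for wrapping models",
]
for batch in minibatch(texts, size=2):
    print(len(batch), batch)
```

Working lazily over an iterator means the texts never have to be loaded into memory all at once, which is the main reason to prefer nlp.pipe over calling nlp on each text in a loop.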
More Examples
It is always nice to have more than one example. Here is another one, in which we add a Danish hate speech detection model to a blank Danish pipeline:
import spacy
import spacy_wrap
nlp = spacy.blank("da")
config = {
    "doc_extension_trf_data": "clf_trf_data",  # document extension for the forward pass
    "doc_extension_prediction": "hate_speech",  # document extension for the prediction
    # choose custom labels
    "labels": ["Not hate Speech", "Hate speech"],
    "model": {
        "name": "DaNLP/da-bert-hatespeech-detection",  # the name or path of a Hugging Face model
    },
}
transformer = nlp.add_pipe("sequence_classification_transformer", config=config)
doc = nlp("Senile gamle idiot") # old senile idiot
doc._.clf_trf_data
# TransformerData(wordpieces=...
doc._.hate_speech
# "Hate speech"
doc._.hate_speech_prob
# {'prob': array([0.013, 0.987], dtype=float32), 'labels': ['Not hate Speech', 'Hate speech']}
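The labels option replaces the model's own label names. Assuming the custom labels are matched to the model's outputs by position (so the list order must follow the model's output order), the mapping can be sketched as below. `apply_labels` is a hypothetical helper, and plain lists stand in for the NumPy array shown above; the probabilities are the ones from the example output.

```python
def apply_labels(probs, labels):
    """Pair each probability with a custom label by position and pick the most likely one.

    Hypothetical helper for illustration; not part of spacy-wrap's API.
    """
    if len(probs) != len(labels):
        raise ValueError("need exactly one label per model output")
    # Index of the highest probability decides the predicted label.
    best = max(range(len(probs)), key=lambda i: probs[i])
    return labels[best], {"prob": list(probs), "labels": list(labels)}


label, prob_info = apply_labels([0.013, 0.987], ["Not hate Speech", "Hate speech"])
print(label)
print(prob_info)
```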
Token Classification
We can also use a model fine-tuned for token classification, here part-of-speech tagging:
import spacy
import spacy_wrap
nlp = spacy.blank("en")
config = {"model": {"name": "vblagoje/bert-english-uncased-finetuned-pos"}}
nlp.add_pipe("token_classification_transformer", config=config)
text = "My name is Wolfgang and I live in Berlin"
doc = nlp(text)
doc._.tok_clf_predictions
# one universal POS tag per token, e.g. 'PRON' for the first token, "My"
By default, spacy-wrap will also automatically detect whether the labels follow the universal POS tags. If so, it will assign them to token.pos, similar to regular spaCy pipelines:
doc[0].pos_
# 'PRON'
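One plausible way to make this check is to compare the model's label set (for Hugging Face models, typically the values of the model config's id2label mapping) against the 17 universal POS tags defined by Universal Dependencies. This is a sketch of the idea under that assumption, not necessarily the exact rule spacy-wrap applies; `labels_are_upos` is a hypothetical helper.

```python
# The 17 universal POS tags defined by Universal Dependencies.
UPOS_TAGS = frozenset({
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X",
})


def labels_are_upos(labels):
    """Return True if every label of a model is a universal POS tag (hypothetical helper)."""
    label_set = set(labels)
    # An empty label set should not count as a POS tagger.
    return bool(label_set) and label_set <= UPOS_TAGS


print(labels_are_upos(["PRON", "NOUN", "VERB", "PROPN"]))
print(labels_are_upos(["O", "B-PER", "I-PER", "B-LOC"]))
```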
Named Entity Recognition
In this example, we use a model fine-tuned for named entity recognition. In this case, spacy-wrap infers from the IOB tags that the model is intended for named entity recognition and assigns the predicted entities to doc.ents.
import spacy
import spacy_wrap
nlp = spacy.blank("en")
# specify model from the hub
config = {"model": {"name": "dslim/bert-base-NER"}}
# add it to the pipe
nlp.add_pipe("token_classification_transformer", config=config)
doc = nlp("My name is Wolfgang and I live in Berlin.")
print(doc.ents)
# (Wolfgang, Berlin)
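The inference described above can be sketched in two steps, assuming the model yields one IOB tag per spaCy token: first check that the labels follow the IOB scheme, then group the tags into entity spans of the kind spaCy stores in doc.ents. `looks_like_iob` and `iob_to_spans` are hypothetical helpers, not spacy-wrap's API, and the tag sequence is illustrative input for the sentence above, not real model output.

```python
def looks_like_iob(labels):
    """True if every label is 'O' or starts with 'B-' or 'I-' (hypothetical helper)."""
    return all(label == "O" or label.startswith(("B-", "I-")) for label in labels)


def iob_to_spans(tags):
    """Convert per-token IOB tags into (start, end, label) spans; `end` is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An 'I-' tag continues the open span only if its label matches.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the currently open span, if there is one.
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        # 'B-' (or a stray 'I-' without a matching open span) starts a new span.
        if tag.startswith(("B-", "I-")):
            start, label = i, tag[2:]
    # Close a span that runs to the end of the sequence.
    if label is not None:
        spans.append((start, len(tags), label))
    return spans


tokens = ["My", "name", "is", "Wolfgang", "and", "I", "live", "in", "Berlin", "."]
tags = ["O", "O", "O", "B-PER", "O", "O", "O", "O", "B-LOC", "O"]
print(looks_like_iob(tags))
for start, end, label in iob_to_spans(tags):
    print(" ".join(tokens[start:end]), label)
```

In spaCy itself, such (start, end, label) triples could be turned into spacy.tokens.Span(doc, start, end, label=label) objects and assigned to doc.ents.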
📖 Documentation
Documentation | |
---|---|
🔧 Installation | Installation instructions for spacy-wrap. |
📰 News and changelog | New additions, changes and version history. |
🎛 Documentation | The reference for spacy-wrap's API. |
💬 Where to ask questions
Type | |
---|---|
🚨 FAQ | FAQ |
🚨 Bug Reports | GitHub Issue Tracker |
🎁 Feature Requests & Ideas | GitHub Issue Tracker |
👩‍💻 Usage Questions | GitHub Discussions |
🗯 General Discussion | GitHub Discussions |
Built Distribution
Hashes for spacy_wrap-1.2.0-py2.py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | d8c7380772e559867c1d1b0664be4717cb18cd460b22338528986ee6d447575d |
MD5 | 956bbc1f3667f1627a053cda8299887b |
BLAKE2b-256 | 622f35d93a42f0e2cf9db1639a8939ec49d92ecd4ec65ba2687d3171f54d194b |