Happy Transformer

Happy Transformer is a package built on top of Hugging Face's Transformers library that makes it easy to utilize state-of-the-art NLP models.

News:

March 1st, 2021

Introducing Version 2.1.0! You can now use any model type available on Hugging Face's model distribution network for the implemented features. This includes BERT, ROBERTA, ALBERT, XLNET and more.

January 12, 2021

Introducing Version 2.0.0!

We fully redesigned Happy Transformer from the ground up.

New Features:

  • Question answering training
  • Multi label text classification training
  • Single predictions for text classification

Deprecated Features:

  • Masked word prediction training
  • Masked word prediction with multiple masks

Breaking changes:

  • Everything

Happy Transformer has been redesigned to promote scalability. Now it's easier than ever to add new models and features, and we encourage you to create PRs to contribute to the project.

Awards

Best Presentation at C-Search, Queen's University Student Research Conference. Best Paper at the Canadian Undergraduate Conference for AI. The paper can be found on page 67.

Features

Public Methods           | Basic Usage | Training
Word Prediction          | ✔           |
Text Classification      | ✔           | ✔
Question Answering       | ✔           | ✔
Next Sentence Prediction | ✔           |
Token Classification     | ✔           |

Installation

pip install happytransformer

Word Prediction

Initialization

See Medium article for a more in-depth explanation

Initialize a HappyWordPrediction object to perform word prediction.

Initialization Arguments:

  1. model_type (string): specify the model type in all caps, such as "ROBERTA" or "ALBERT"
  2. model_name (string): potential models can be found at the link below: MODELS

Note: For all Transformers, the masked token is "[MASK]"

We recommend using HappyWordPrediction("ALBERT", "albert-xxlarge-v2") for the best performance.

Example 1.0:

    from happytransformer import HappyWordPrediction
    # --------------------------------------#
    happy_wp_distilbert = HappyWordPrediction("DISTILBERT", "distilbert-base-uncased")  # default
    happy_wp_albert = HappyWordPrediction("ALBERT", "albert-base-v2")
    happy_wp_bert = HappyWordPrediction("BERT", "bert-base-uncased")
    happy_wp_roberta = HappyWordPrediction("ROBERTA", "roberta-base")

predict_mask()

The method predict_mask() takes three arguments:

  1. text (string): a body of text that contains a single masked token
  2. targets (list of strings): a list of potential answers. All other answers will be ignored
  3. top_k (int): the number of results that will be returned

Returns: A list of objects with fields "token" and "score"

Note: if targets are provided, then top_k will be ignored and a score for each target will be returned.

Example 1.1:

    from happytransformer import HappyWordPrediction
    # --------------------------------------#
    happy_wp = HappyWordPrediction()  # default uses distilbert-base-uncased
    result = happy_wp.predict_mask("I think therefore I [MASK]")
    print(type(result))  # <class 'list'>
    print(result)  # [WordPredictionResult(token='am', score=0.10172799974679947)]
    print(type(result[0]))  # <class 'happytransformer.happy_word_prediction.WordPredictionResult'>
    print(result[0])  # [WordPredictionResult(token='am', score=0.10172799974679947)]
    print(result[0].token)  # am
    print(result[0].score)  # 0.10172799974679947

Example 1.2:

    from happytransformer import HappyWordPrediction
    # --------------------------------------#
    happy_wp = HappyWordPrediction("ALBERT", "albert-xxlarge-v2")
    result = happy_wp.predict_mask("To better the world I would invest in [MASK] and education.", top_k=2)
    print(result)  # [WordPredictionResult(token='infrastructure', score=0.09270179271697998), WordPredictionResult(token='healthcare', score=0.07219093292951584)]
    print(result[1])  # WordPredictionResult(token='healthcare', score=0.07219093292951584)
    print(result[1].token)  # healthcare

Example 1.3:

    from happytransformer import HappyWordPrediction
    # --------------------------------------#
    happy_wp = HappyWordPrediction("ALBERT", "albert-xxlarge-v2")
    targets = ["technology", "healthcare"]
    result = happy_wp.predict_mask("To better the world I would invest in [MASK] and education.", targets=targets)
    print(result)  # [WordPredictionResult(token='healthcare', score=0.07219093292951584), WordPredictionResult(token='technology', score=0.032044216990470886)]
    print(result[1])  # WordPredictionResult(token='technology', score=0.032044216990470886)
    print(result[1].token)  # technology

Text Classification

Initialization

Initialize a HappyTextClassification object to perform text classification.

This model assigns a label to a given text string. For example, you can train a model to detect if an email is spam based on its text.

Initialization Arguments:

  1. model_type (string): specify the model type in all caps, such as "ROBERTA" or "ALBERT"
  2. model_name (string): the default is "distilbert-base-uncased"; potential models can be found at the link below: MODELS
  3. num_labels (int): the number of text categories. The default is 2

WARNING: If you try to load a pretrained model that has a different number of categories than num_labels, then you will get an error

NOTE: "albert-base-v2", "bert-base-uncased" and "distilbert-base-uncased" do not have a predefined number of labels, so if you use these models you can set num_labels freely

Example 2.0:

    from happytransformer import HappyTextClassification
    # --------------------------------------#
    happy_tc_distilbert = HappyTextClassification("DISTILBERT", "distilbert-base-uncased", num_labels=2)  # default 
    happy_tc_albert = HappyTextClassification(model_type="ALBERT", model_name="albert-base-v2")
    happy_tc_bert = HappyTextClassification("BERT", "bert-base-uncased")
    happy_tc_roberta = HappyTextClassification("ROBERTA", "roberta-base")

classify_text()

Input:

  1. text (string): Text that will be classified

Returns: An object with fields "label" and "score"

Example 2.1:

    from happytransformer import HappyTextClassification
    # --------------------------------------#
    happy_tc = HappyTextClassification(model_type="DISTILBERT",  model_name="distilbert-base-uncased-finetuned-sst-2-english")
    result = happy_tc.classify_text("Great movie! 5/5")
    print(type(result))  # <class 'happytransformer.happy_text_classification.TextClassificationResult'>
    print(result)  # TextClassificationResult(label='LABEL_1', score=0.9998761415481567)
    print(result.label)  # LABEL_1

Text Classification Training

HappyTextClassification contains three methods for training

  • train(): fine-tune the model to become better at a certain task
  • eval(): determine how well the model performs on a labeled dataset
  • test(): run the model on an unlabeled dataset to produce predictions

train()

Inputs:

  1. input_filepath (string): a path to a CSV file as described in table 2.1
  2. args (dictionary): a dictionary with the same keys and value types as shown below. The dictionary below shows the default values.

Information about what the keys mean can be accessed here

    ARGS_TC_TRAIN = {
        'learning_rate': 5e-5,
        'weight_decay': 0,
        'adam_beta1': 0.9,
        'adam_beta2': 0.999,
        'adam_epsilon': 1e-8,
        'max_grad_norm': 1.0,
        'num_train_epochs': 3.0,
    }

Output: None
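
To change a default, copy the dictionary above, adjust the desired keys, and pass it to train(). A minimal sketch, assuming train() accepts the dictionary through an args parameter as described above:

    from happytransformer import HappyTextClassification
    # --------------------------------------#
    happy_tc = HappyTextClassification("DISTILBERT", "distilbert-base-uncased", num_labels=2)
    # the defaults from above, with the number of epochs reduced to one
    custom_args = {
        'learning_rate': 5e-5,
        'weight_decay': 0,
        'adam_beta1': 0.9,
        'adam_beta2': 0.999,
        'adam_epsilon': 1e-8,
        'max_grad_norm': 1.0,
        'num_train_epochs': 1.0,
    }
    happy_tc.train("../../data/tc/train-eval.csv", args=custom_args)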

Table 2.1

  1. text (string): text to be classified
  2. label (int): the corresponding label

  text                          | label
  Wow what a great place to eat | 1
  Horrible food                 | 0
  Terrible service              | 0
  I'm coming here again         | 1
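
A file in this format can be generated with Python's standard csv module. A minimal sketch (the output filename is hypothetical):

    import csv

    rows = [
        ("Wow what a great place to eat", 1),
        ("Horrible food", 0),
        ("Terrible service", 0),
        ("I'm coming here again", 1),
    ]
    with open("train-eval.csv", "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["text", "label"])  # header matching table 2.1
        writer.writerows(rows)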

Example 2.2:

    from happytransformer import HappyTextClassification
    # --------------------------------------#
    happy_tc = HappyTextClassification(model_type="DISTILBERT",
                                       model_name="distilbert-base-uncased-finetuned-sst-2-english",
                                       num_labels=2)  # Don't forget to set num_labels! 
    happy_tc.train("../../data/tc/train-eval.csv")

eval()

Input:

  1. input_filepath (string): a path to a CSV file as described in table 2.1

Output:

An object with the field "loss"

Example 2.3:

    from happytransformer import HappyTextClassification
    # --------------------------------------#
    happy_tc = HappyTextClassification(model_type="DISTILBERT",
                                       model_name="distilbert-base-uncased-finetuned-sst-2-english",
                                       num_labels=2)  # Don't forget to set num_labels!
    result = happy_tc.eval("../../data/tc/train-eval.csv")
    print(type(result))  # <class 'happytransformer.happy_trainer.EvalResult'>
    print(result)  # EvalResult(eval_loss=0.007262040860950947)
    print(result.loss)  # 0.007262040860950947

test()

Input:

  1. input_filepath (string): a path to a CSV file as described in table 2.2

Output: A list of named tuples with keys: "label" and "score"

The list is in order by ascending csv index.

Table 2.2

  1. text (string): text that will be classified

  text
  Wow what a great place to eat
  Horrible food
  Terrible service
  I'm coming here again

Example 2.4:

    from happytransformer import HappyTextClassification
    # --------------------------------------#
    happy_tc = HappyTextClassification(model_type="DISTILBERT",
                                       model_name="distilbert-base-uncased-finetuned-sst-2-english",
                                       num_labels=2)  # Don't forget to set num_labels!
    result = happy_tc.test("../../data/tc/test.csv")
    print(type(result))  # <class 'list'>
    print(result)  # [TextClassificationResult(label='LABEL_1', score=0.9998401999473572), TextClassificationResult(label='LABEL_0', score=0.9772131443023682)...
    print(type(result[0]))  # <class 'happytransformer.happy_text_classification.TextClassificationResult'>
    print(result[0])  # TextClassificationResult(label='LABEL_1', score=0.9998401999473572)
    print(result[0].label)  # LABEL_1
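
Because the returned list follows the CSV row order, predictions can be paired back with the input texts. A minimal sketch, assuming the same test.csv in the table 2.2 format:

    import csv
    from happytransformer import HappyTextClassification
    # --------------------------------------#
    happy_tc = HappyTextClassification(model_type="DISTILBERT",
                                       model_name="distilbert-base-uncased-finetuned-sst-2-english",
                                       num_labels=2)
    results = happy_tc.test("../../data/tc/test.csv")
    # assumes the CSV has a "text" header column as in table 2.2
    with open("../../data/tc/test.csv", newline="") as csv_file:
        texts = [row["text"] for row in csv.DictReader(csv_file)]
    for text, result in zip(texts, results):
        print(text, "->", result.label)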

Example 2.5:

    from happytransformer import HappyTextClassification
    # --------------------------------------#
    happy_tc = HappyTextClassification(model_type="DISTILBERT",
                                       model_name="distilbert-base-uncased-finetuned-sst-2-english",
                                       num_labels=2)  # Don't forget to set num_labels!
    before_loss = happy_tc.eval("../../data/tc/train-eval.csv").loss
    happy_tc.train("../../data/tc/train-eval.csv")
    after_loss = happy_tc.eval("../../data/tc/train-eval.csv").loss
    print("Before loss: ", before_loss)  # 0.007262040860950947
    print("After loss: ", after_loss)  # 0.000162081079906784
    # Since after_loss < before_loss, the model learned!
    # Note: typically you evaluate with a separate dataset
    # but for simplicity we used the same one

Question Answering

Initialization

Initialize a HappyQuestionAnswering object to perform question answering.

This model answers a question given a body of text that is relevant to the question.

The outputted answer is always a span of text within the provided context.

Initialization Arguments:

  1. model_type (string): specify the model type in all caps, such as "ROBERTA" or "ALBERT"
  2. model_name (string): potential models can be found at the link below: MODELS

We recommend using HappyQuestionAnswering("ALBERT", "mfeb/albert-xxlarge-v2-squad2") for the best performance.

Example 3.0:

    from happytransformer import HappyQuestionAnswering
    # --------------------------------------#
    happy_qa_distilbert = HappyQuestionAnswering("DISTILBERT", "distilbert-base-cased-distilled-squad")  # default
    happy_qa_albert = HappyQuestionAnswering("ALBERT", "mfeb/albert-xxlarge-v2-squad2")
    # a good model for use with limited hardware
    happy_qa_bert = HappyQuestionAnswering("BERT", "mrm8488/bert-tiny-5-finetuned-squadv2")
    happy_qa_roberta = HappyQuestionAnswering("ROBERTA", "deepset/roberta-base-squad2")

answer_question()

Inputs:

  1. context (string): background information, which contains a text-span that is the answer
  2. question (string): the question that will be asked
  3. top_k (int): the number of results that will be returned (default=1)

Returns: A list of objects with fields: "answer", "score", "start" and "end". The list is in descending order by score.

Example 3.1:

    from happytransformer import HappyQuestionAnswering
    # --------------------------------------#
    happy_qa = HappyQuestionAnswering()
    result = happy_qa.answer_question("Today's date is January 10th, 2021", "What is the date?")
    print(type(result))  # <class 'list'>
    print(result)  # [QuestionAnsweringResult(answer='January 10th, 2021', score=0.9711642265319824, start=16, end=34)]
    print(type(result[0]))  # <class 'happytransformer.happy_question_answering.QuestionAnsweringResult'>
    print(result[0])  # QuestionAnsweringResult(answer='January 10th, 2021', score=0.9711642265319824, start=16, end=34)
    print(result[0].answer)  # January 10th, 2021
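
The start and end values printed above are character offsets into the context, so slicing the context reproduces the answer. A minimal sketch illustrating the relationship:

    from happytransformer import HappyQuestionAnswering
    # --------------------------------------#
    happy_qa = HappyQuestionAnswering()
    context = "Today's date is January 10th, 2021"
    result = happy_qa.answer_question(context, "What is the date?")
    top = result[0]
    # slicing with the start/end offsets reproduces the answer text
    print(context[top.start:top.end])  # January 10th, 2021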

Example 3.2:

    from happytransformer import HappyQuestionAnswering
    # --------------------------------------#
    happy_qa = HappyQuestionAnswering()
    result = happy_qa.answer_question("Today's date is January 10th, 2021", "What is the date?", top_k=2)
    print(type(result))  # <class 'list'>
    print(result)  # [QuestionAnsweringResult(answer='January 10th, 2021', score=0.9711642265319824, start=16, end=34), QuestionAnsweringResult(answer='January 10th', score=0.017306014895439148, start=16, end=28)]
    print(result[1].answer)  # January 10th

Question Answering Training

HappyQuestionAnswering contains three methods for training

  • train(): fine-tune a question answering model to become better at a certain task
  • eval(): determine how well the model performs on a labeled dataset
  • test(): run the model on an unlabeled dataset to produce predictions

train()

Inputs:

  1. input_filepath (string): a path to a CSV file as described in table 3.1
  2. args (dictionary): a dictionary with the same keys and value types as shown below. The dictionary below shows the default values.

Information about what the keys mean can be accessed here

    ARGS_QA_TRAIN = {
        'learning_rate': 5e-5,
        'weight_decay': 0,
        'adam_beta1': 0.9,
        'adam_beta2': 0.999,
        'adam_epsilon': 1e-8,
        'max_grad_norm': 1.0,
        'num_train_epochs': 3.0,
    }

Output: None
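
As with text classification, a modified copy of the defaults can be supplied. A minimal sketch, assuming train() accepts the dictionary through an args parameter:

    from happytransformer import HappyQuestionAnswering
    # --------------------------------------#
    happy_qa = HappyQuestionAnswering()
    # the defaults from above, with a smaller learning rate
    custom_args = {
        'learning_rate': 1e-5,
        'weight_decay': 0,
        'adam_beta1': 0.9,
        'adam_beta2': 0.999,
        'adam_epsilon': 1e-8,
        'max_grad_norm': 1.0,
        'num_train_epochs': 3.0,
    }
    happy_qa.train("../../data/qa/train-eval.csv", args=custom_args)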

Table 3.1

  1. context (string): background information used to answer the question
  2. question (string): the question that will be asked
  3. answer_text (string): the answer in string format
  4. answer_start (int): the character index of the start of the answer

  context                   | question          | answer_text   | answer_start
  October 31st is the date  | what is the date? | October 31st  | 0
  The date is November 23rd | what is the date? | November 23rd | 12
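
Since answer_start is a character index into context, it can be computed with str.find() when building the file. A minimal sketch (the output filename is hypothetical):

    import csv

    examples = [
        ("October 31st is the date", "what is the date?", "October 31st"),
        ("The date is November 23rd", "what is the date?", "November 23rd"),
    ]
    with open("train-eval.csv", "w", newline="") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["context", "question", "answer_text", "answer_start"])
        for context, question, answer in examples:
            # answer_start is the character index where the answer begins
            writer.writerow([context, question, answer, context.find(answer)])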

Example 3.3:

    from happytransformer import HappyQuestionAnswering
    # --------------------------------------#
    happy_qa = HappyQuestionAnswering()
    happy_qa.train("../../data/qa/train-eval.csv")

eval()

Input:

  1. input_filepath (string): a path to a CSV file as described in table 3.1

Output:

A dataclass with the field "loss"

Example 3.4:

    from happytransformer import HappyQuestionAnswering
    # --------------------------------------#
    happy_qa = HappyQuestionAnswering()
    result = happy_qa.eval("../../data/qa/train-eval.csv")
    print(type(result))  # <class 'happytransformer.happy_trainer.EvalResult'>
    print(result)  # EvalResult(eval_loss=0.11738169193267822)
    print(result.loss)  # 0.11738169193267822

test()

Input:

  1. input_filepath (string): a path to a CSV file as described in table 3.2

Output: A list of named tuples with keys: "answer", "score", "start" and "end"

The list is in order by ascending csv index.

Table 3.2

  1. context (string): background information used to answer the question
  2. question (string): the question that will be asked

  context                   | question
  October 31st is the date  | what is the date?
  The date is November 23rd | what is the date?

Example 3.5:

    from happytransformer import HappyQuestionAnswering
    # --------------------------------------#
    happy_qa = HappyQuestionAnswering()
    result = happy_qa.test("../../data/qa/test.csv")
    print(type(result))  # <class 'list'>
    print(result)  # [QuestionAnsweringResult(answer='October 31st', score=0.9939756989479065, start=0, end=12), QuestionAnsweringResult(answer='November 23rd', score=0.967872679233551, start=12, end=25)]
    print(result[0])  # QuestionAnsweringResult(answer='October 31st', score=0.9939756989479065, start=0, end=12)
    print(result[0].answer)  # October 31st

Example 3.6:

    from happytransformer import HappyQuestionAnswering
    # --------------------------------------#
    happy_qa = HappyQuestionAnswering()
    before_loss = happy_qa.eval("../../data/qa/train-eval.csv").loss
    happy_qa.train("../../data/qa/train-eval.csv")
    after_loss = happy_qa.eval("../../data/qa/train-eval.csv").loss
    print("Before loss: ", before_loss)  # 0.11738169193267822
    print("After loss: ", after_loss)  # 0.00037909045931883156
    # Since after_loss < before_loss, the model learned!
    # Note: typically you evaluate with a separate dataset
    # but for simplicity we used the same one 

Next Sentence Prediction

Initialization

Initialize a HappyNextSentence object to perform next sentence prediction.

Initialization Arguments:

  1. model_type (string): the default is "BERT", which is currently the only available model
  2. model_name (string): we recommend non-finetuned BERT models like "bert-base-uncased" and "bert-large-uncased"

Example 4.0:

    from happytransformer import HappyNextSentence
    # --------------------------------------#
    happy_ns = HappyNextSentence("BERT", "bert-base-uncased")  # default 
    happy_ns_large = HappyNextSentence("BERT", "bert-large-uncased") 

predict_next_sentence()

Inputs: We recommend keeping sentence_a and sentence_b to a single sentence each, but longer inputs still work.

  1. sentence_a (string): A sentence
  2. sentence_b (string): A sentence that may or may not follow sentence_a

Returns: A float between 0 and 1 that represents how likely it is that sentence_b follows sentence_a.

Example 4.1:

    from happytransformer import HappyNextSentence
    # --------------------------------------#
    happy_ns = HappyNextSentence()
    result = happy_ns.predict_next_sentence(
        "How old are you?",
        "I am 21 years old."
    )
    print(type(result))  # <class 'float'>
    print(result)  # 0.9999918937683105

Example 4.2:

    from happytransformer import HappyNextSentence
    # --------------------------------------#
    happy_ns = HappyNextSentence()
    result = happy_ns.predict_next_sentence(
        "How old are you?",
        "Queen's University is in Kingston Ontario Canada"
    )
    print(type(result))  # <class 'float'>
    print(result)  # 0.00018497584096621722
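
Since the result is a probability, a simple threshold turns it into a yes/no decision. A minimal sketch (the 0.5 cutoff is an arbitrary choice, not part of the library):

    from happytransformer import HappyNextSentence
    # --------------------------------------#
    happy_ns = HappyNextSentence()
    score = happy_ns.predict_next_sentence(
        "How old are you?",
        "I am 21 years old."
    )
    # 0.5 is a hypothetical cutoff; tune it for your application
    if score > 0.5:
        print("sentence_b likely follows sentence_a")
    else:
        print("sentence_b likely does not follow sentence_a")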

Token Classification

Initialization

Initialize a HappyTokenClassification object to perform token classification.

Initialization Arguments:

  1. model_type (string): specify the model type in all caps, such as "ROBERTA" or "ALBERT"
  2. model_name (string): potential models can be found here

Example 5.0:

    from happytransformer import HappyTokenClassification
    # --------------------------------------#
    happy_toc = HappyTokenClassification("BERT", "dslim/bert-base-NER")  # default
    happy_toc_large = HappyTokenClassification("XLM-ROBERTA", "xlm-roberta-large-finetuned-conll03-english")

classify_token()

Inputs:

  1. text (string): Text you wish to classify. Be sure to provide full sentences rather than individual words so that the model has more context.

Returns: A list of objects with the following fields:

  • word: the classified word
  • score: the probability of the entity
  • entity: the predicted entity. Each model has its own unique set of entities
  • index: the index of the token within the tokenized text
  • start: the index of the character in the string where the predicted word begins
  • end: the index of the character in the string immediately after where the predicted word ends

Example 5.1:

    from happytransformer import HappyTokenClassification
    # --------------------------------------#
    happy_toc = HappyTokenClassification(model_type="BERT", model_name="dslim/bert-base-NER")
    result = happy_toc.classify_token("My name is Geoffrey and I live in Toronto")
    print(type(result))  # <class 'list'>
    print(result[0].word)  # Geoffrey
    print(result[0].entity)  # B-PER
    print(result[0].score)  # 0.9988969564437866
    print(result[0].index)  # 4
    print(result[0].start) # 11
    print(result[0].end)  # 19

    print(result[1].word)  # Toronto
    print(result[1].entity)  # B-LOC
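
The start and end values above are character offsets into the input text, so each entity string can be recovered by slicing. A minimal sketch:

    from happytransformer import HappyTokenClassification
    # --------------------------------------#
    happy_toc = HappyTokenClassification("BERT", "dslim/bert-base-NER")
    text = "My name is Geoffrey and I live in Toronto"
    for token in happy_toc.classify_token(text):
        # slice the original string with the start/end offsets
        print(text[token.start:token.end], token.entity)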

Tech

Happy Transformer uses a number of open source projects:

  • transformers - State-of-the-art Natural Language Processing for TensorFlow 2.0 and PyTorch!
  • pytorch - Tensors and Dynamic neural networks in Python
  • tqdm - A Fast, Extensible Progress Bar for Python and CLI

HappyTransformer is also an open source project with this public repository on GitHub.

Call for contributors

Happy Transformer is a growing API. We're seeking more contributors to help accomplish our mission of making state-of-the-art AI easier to use.

Maintainers
