Few-Shot Named Entity Recognition using Span Markers

Project description

SpanMarker for Named Entity Recognition

🤗 Models | 🛠️ Getting Started In Google Colab | 📄 Documentation

SpanMarker is a framework for training powerful Named Entity Recognition models using familiar encoders such as BERT, RoBERTa and DeBERTa. Tightly implemented on top of the 🤗 Transformers library, SpanMarker can take advantage of its valuable functionality.

Based on the PL-Marker paper, SpanMarker breaks the mold through its accessibility and ease of use. Crucially, SpanMarker works out of the box with many common encoders such as bert-base-cased and roberta-large, and automatically works with datasets that use the IOB, IOB2, BIOES or BILOU labeling scheme, or no labeling scheme at all.
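
To make the labeling schemes concrete, here is a minimal sketch that converts IOB2 tags into (label, start, end) spans in plain Python. It is only illustrative: the helper name `iob2_to_spans` is hypothetical and not part of SpanMarker, and the example sentence and tags are hand-written.

```python
from typing import List, Tuple


def iob2_to_spans(tags: List[str]) -> List[Tuple[str, int, int]]:
    """Convert IOB2 tags to (label, start, end) spans; `end` is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # A span continues only if the tag is "I-" with the same label.
        continues = tag.startswith("I-") and label == tag[2:]
        if not continues:
            # Close the span that was open, if any.
            if label is not None:
                spans.append((label, start, i))
            start, label = None, None
            # "B-" opens a new span; a stray "I-" is tolerated and also opens one.
            if tag.startswith("B-") or tag.startswith("I-"):
                start, label = i, tag[2:]
    # Close a span that runs until the end of the sentence.
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


tokens = ["Amelia", "Earhart", "flew", "to", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O"]
print(iob2_to_spans(tags))  # [('PER', 0, 2), ('LOC', 4, 5)]
```

The other schemes differ mainly in how span boundaries are marked (for example, BIOES and BILOU add explicit end and single-token tags), but they describe the same underlying spans.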

Documentation

Feel free to have a look at the documentation.

Installation

You may install the span_marker Python module via pip like so:

pip install span_marker

Quick Start

Please have a look at our Getting Started notebook for details on how SpanMarker is commonly used. It explains the following snippet in more detail.

from datasets import load_dataset
from span_marker import SpanMarkerModel, Trainer
from transformers import TrainingArguments

def main():
    dataset = load_dataset("DFKI-SLT/few-nerd", "supervised")
    labels = dataset["train"].features["ner_tags"].feature.names

    model_name = "bert-base-cased"
    model = SpanMarkerModel.from_pretrained(model_name, labels=labels)

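    args = TrainingArguments(
        output_dir="my_span_marker_model",
        learning_rate=5e-5,
        gradient_accumulation_steps=2,
        per_device_train_batch_size=4,
        per_device_eval_batch_size=4,
        num_train_epochs=1,
        save_strategy="steps",
        # Note: eval_steps only takes effect if evaluation during training
        # is enabled with a "steps" strategy; it is not enabled here.
        eval_steps=200,
        logging_steps=50,
        # Mixed precision; expects hardware with fp16 support, typically a CUDA GPU.
        fp16=True,
        warmup_ratio=0.1,
        dataloader_num_workers=2,
    )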

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=dataset["train"].select(range(8000)),
        eval_dataset=dataset["validation"].select(range(2000)),
    )

    trainer.train()
    trainer.save_model("my_span_marker_model/checkpoint-final")

    metrics = trainer.evaluate()
    print(metrics)

if __name__ == "__main__":
    main()
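
As a side note on the training arguments: with gradient accumulation, the number of samples behind each optimizer update is the per-device batch size times the accumulation steps times the number of devices. A minimal sketch of that arithmetic, assuming a single device; the helper name `effective_batch_size` is hypothetical and not part of any library.

```python
def effective_batch_size(per_device: int, accumulation_steps: int, num_devices: int = 1) -> int:
    """Number of samples contributing to each optimizer update."""
    return per_device * accumulation_steps * num_devices


# Values taken from the snippet: per_device_train_batch_size=4, gradient_accumulation_steps=2.
print(effective_batch_size(per_device=4, accumulation_steps=2))  # 8
```

If memory allows, raising the per-device batch size and lowering the accumulation steps keeps the same effective batch size with fewer forward passes per update.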

Pretrained Models

- tomaarsen/span-marker-bert-base-fewnerd-fine-super is a model that I trained in 2 hours on the fine-grained, supervised Few-NERD dataset. It reached a Test F1 of 0.7053, which is competitive on the all-time Few-NERD leaderboard for models using bert-base. My training script resembles the one shown above.
  - Try the model out online using this 🤗 Space.
- tomaarsen/span-marker-roberta-large-fewnerd-fine-super was trained in 6 hours on the fine-grained, supervised Few-NERD dataset using roberta-large. It reached a Test F1 of 0.7103, which is very competitive on the all-time Few-NERD leaderboard.
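
Loading one of these checkpoints for inference might look like the sketch below. It assumes that `SpanMarkerModel` exposes a `predict` method that accepts a raw sentence and returns a list of dicts with "span", "label" and "score" keys; check this against the documentation for your installed version. The `format_entities` and `tag_sentence` helpers are hypothetical, and the sample entity at the bottom is a hand-written placeholder, not real model output.

```python
from typing import Dict, List


def format_entities(entities: List[Dict]) -> List[str]:
    """Render entities as 'text -> label (score)' lines.

    Assumes each entity is a dict with "span", "label" and "score" keys.
    """
    return [f'{e["span"]} -> {e["label"]} ({e["score"]:.2f})' for e in entities]


def tag_sentence(sentence: str, model_name: str = "tomaarsen/span-marker-bert-base-fewnerd-fine-super") -> List[str]:
    """Load a pretrained SpanMarker model and tag a single sentence."""
    # Imported lazily so that format_entities works without span_marker installed.
    from span_marker import SpanMarkerModel

    # Downloads the checkpoint from the Hugging Face Hub on first use.
    model = SpanMarkerModel.from_pretrained(model_name)
    # Assumption: predict() takes a raw string and returns a list of entity dicts.
    return format_entities(model.predict(sentence))


# Hand-written placeholder entity, only to show the output format.
placeholder = [{"span": "Paris", "label": "location-GPE", "score": 0.98}]
print(format_entities(placeholder))  # ['Paris -> location-GPE (0.98)']
```

Calling `tag_sentence("Amelia Earhart flew across the Atlantic to Paris.")` runs the real model, which requires span_marker to be installed and network access for the initial download.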

Changelog

See CHANGELOG.md for news on all SpanMarker versions.

License

See LICENSE for the current license.

Release history

Version	Release date
1.5.0	Oct 31, 2023
1.4.0	Sep 29, 2023
1.3.0	Aug 24, 2023
1.2.5	Aug 24, 2023
1.2.4	Jul 18, 2023
1.2.3	Jun 20, 2023
1.2.2	Jun 20, 2023
1.2.1	Jun 19, 2023
1.2.0	Jun 15, 2023
1.1.1	Jun 13, 2023
1.1.0	Jun 10, 2023
1.0.1 (this version)	May 1, 2023
1.0.0	May 1, 2023
0.2.2	Apr 13, 2023
0.2.1	Apr 7, 2023
0.2.0	Apr 6, 2023
0.1.1	Mar 31, 2023
0.1.0	Mar 30, 2023

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

span_marker-1.0.1.tar.gz (33.8 kB)

Uploaded May 1, 2023 Source

Built Distribution

span_marker-1.0.1-py3-none-any.whl (32.3 kB)

Uploaded May 1, 2023 Python 3

Hashes for span_marker-1.0.1.tar.gz

Algorithm	Hash digest
SHA256	`914f7d3f2200b015a21653ce39ea661420a42cc53b08c30b85436284677a8b57`
MD5	`fc2c18065d0162c6408adee2d2169420`
BLAKE2b-256	`c769342562fd7f92ffb301ce973b8c9d130920971593390e183d67e924ac3e92`

Hashes for span_marker-1.0.1-py3-none-any.whl

Algorithm	Hash digest
SHA256	`96eb230ae8787df5986d5f3ccec4f10d595bf4caf64bfd71fd314c49375f9af1`
MD5	`ecee1b00e82408d2d8ecc84db3fc7d54`
BLAKE2b-256	`a505e61938db70f86c4a5a5537aec1faec006c914d889a1a84b537a77fa03614`
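
A downloaded file can be checked against the SHA256 digests listed above with Python's standard `hashlib` module. The following is a minimal sketch: the helper names are hypothetical, and the local path assumes the source distribution was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest of the source distribution, copied from the table above.
EXPECTED_SDIST_SHA256 = "914f7d3f2200b015a21653ce39ea661420a42cc53b08c30b85436284677a8b57"

archive = Path("span_marker-1.0.1.tar.gz")  # assumed download location
if archive.exists():
    print(matches_digest(str(archive), EXPECTED_SDIST_SHA256))
```

pip can perform a similar check itself when hashes are pinned in a requirements file, which may be more convenient for automated installs.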