Generative Language Models for Paragraph-Level Question Generation

This is the official repository of the paper "Generative Language Models for Paragraph-Level Question Generation: A Unified Benchmark and Evaluation" (EMNLP 2022, main conference). This repository includes the following:

  • QG-Bench, the first multilingual and multidomain QG benchmark.
  • Multilingual and multidomain QG models fine-tuned on QG-Bench.
  • lmqg, a Python library to fine-tune and evaluate QG models.
  • AutoQG, a web application hosting the QG models, where users can test model outputs interactively.

Table of Contents

  1. QG-Bench: multilingual & multidomain QG datasets (+ fine-tuned models)
  2. LMQG: Python library to fine-tune/evaluate QG models
  3. AutoQG: web application hosting multilingual QG models
  4. REST API: run model predictions via a REST API
  5. Reproduce the Analysis of the Paper

Please cite the following paper if you use any of these resources:

@inproceedings{ushio-etal-2022-generative,
    title = "{G}enerative {L}anguage {M}odels for {P}aragraph-{L}evel {Q}uestion {G}eneration: {A} {U}nified {B}enchmark and {E}valuation",
    author = "Ushio, Asahi  and
        Alva-Manchego, Fernando  and
        Camacho-Collados, Jose",
    booktitle = "Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2022",
    address = "Abu Dhabi, United Arab Emirates",
    publisher = "Association for Computational Linguistics",
}

LMQG: Language Model for Question Generation 🚀

lmqg is a Python library for fine-tuning seq2seq language models (T5, BART) on the question generation task; it also provides an API for hosting model predictions via Hugging Face. First, install lmqg via pip:

pip install lmqg
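
As a quick check that the installation works, the snippet below generates question-answer pairs from a raw paragraph. It is a minimal sketch: TransformersQG and generate_qa follow the library's documented interface, but check the project README if your lmqg version differs; the model alias lmqg/t5-large-squad is the one used in the evaluation example below.

from lmqg import TransformersQG

# Load a QG model fine-tuned on SQuAD (any QG-Bench model alias should work).
model = TransformersQG(model='lmqg/t5-large-squad')

# Generate question-answer pairs from a raw paragraph.
context = ("William Turner was an English painter who specialised in "
           "watercolour landscapes. He is often known as William Turner of "
           "Oxford to distinguish him from his contemporary J. M. W. Turner.")
print(model.generate_qa(context))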

Model Evaluation

The evaluation tool reports BLEU-4, ROUGE-L, METEOR, BERTScore, and MoverScore, following QG-Bench. From the command line, run the following:

lmqg-eval -m "lmqg/t5-large-squad" -e "./eval_metrics" -d "lmqg/qg_squad" -l "en"

where -m is a model alias on the Hugging Face Hub or a path to a local checkpoint, -e is the directory to which the metric file is exported, -d is the dataset to evaluate on, and -l is the language of the test set. Alternatively, you can provide a prediction file, which avoids re-running model prediction each time:

lmqg-eval --hyp-test '{your prediction file}' -e "./eval_metrics" -d "lmqg/qg_squad" -l "en"

The prediction file should be a plain-text file with one model generation per line, in the order of the test split of the target dataset. Check lmqg-eval -h to display all the options.
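
To produce such a prediction file yourself, something like the sketch below could work. Treat it as a sketch under assumptions: the dataset column names 'paragraph' and 'answer' are guesses at the QG-Bench format, so verify them against the lmqg/qg_squad dataset card.

from datasets import load_dataset
from lmqg import TransformersQG

# Assumed column names ('paragraph', 'answer') -- verify on the dataset card.
test = load_dataset('lmqg/qg_squad', split='test')
model = TransformersQG(model='lmqg/t5-large-squad')

# One generated question per instance, preserving the test-split order.
questions = model.generate_q(list_context=test['paragraph'],
                             list_answer=test['answer'])

with open('prediction.txt', 'w') as f:
    f.write('\n'.join(questions))

The resulting prediction.txt can then be passed to lmqg-eval via --hyp-test.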

Model Training

To fine-tune a QG model, we employ a two-stage hyper-parameter optimization: every configuration in the search grid is first trained for a small number of epochs (--epoch-partial), and only the best --n-max-config configurations are then trained to completion. The following command runs fine-tuning with this parameter search (here, 2 gradient-accumulation values x 3 learning rates x 2 label-smoothing values give a grid of 12 configurations):

lmqg-train-search -c "tmp_ckpt" -d "lmqg/qg_squad" -m "t5-small" -b 64 --epoch-partial 5 -e 15 --language "en" --n-max-config 1 \
  -g 2 4 \
  --lr 1e-04 5e-04 1e-03 \
  --label-smoothing 0 0.15

Check lmqg-train-search -h to display all the options.

Fine-tuning in Python uses the same two-stage search through the GridSearcher class:

from lmqg import GridSearcher

# Two-stage search: all configurations are trained for `epoch_partial` epochs first,
# then the best `n_max_config` configurations are trained for the full `epoch` epochs.
trainer = GridSearcher(
    checkpoint_dir='tmp_ckpt', dataset_path='lmqg/qg_squad', model='t5-small',
    epoch=15, epoch_partial=5, batch=64, n_max_config=5,
    gradient_accumulation_steps=[2, 4], lr=[1e-04, 5e-04, 1e-03], label_smoothing=[0, 0.15])
trainer.run()

AutoQG

AutoQG (https://autoqg.net) is a free web application hosting our QG models. The available models are listed on the QG-Bench page.

REST API with the Hugging Face Inference API

We provide a REST API that serves model inference through the Hugging Face Inference API. To run your own instance, you need a Hugging Face API token and the extra dependencies:

pip install lmqg[api]

When you run the app locally, the Swagger UI is available at http://127.0.0.1:8080/docs (replace the address and port to match your deployment).

  • Build/Run locally (command line):

export API_TOKEN={Your Huggingface API Token}
uvicorn app:app --reload --port 8088          # development mode with auto-reload
uvicorn app:app --host 0.0.0.0 --port 8088    # expose the app to the network

  • Build/Run locally (Docker):

docker build -t lmqg/app:latest . --build-arg api_token={Your Huggingface API Token}
docker run -p 8080:8080 lmqg/app:latest
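
Once the server is up, you can query it over HTTP. The request below is a hypothetical sketch: the endpoint path and parameter names are illustrative assumptions, so consult the Swagger UI at /docs for the actual schema.

import requests

# Hypothetical endpoint and parameters -- check the Swagger UI for the real schema.
response = requests.post(
    'http://127.0.0.1:8080/question_generation',
    json={'input_text': 'William Turner was an English painter who '
                        'specialised in watercolour landscapes.',
          'language': 'en'})
print(response.json())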

Reproduce Analysis
