TextRL - use reinforcement learning to adjust text generation results.

These details have not been verified by PyPI

Development Status
- 4 - Beta
Intended Audience
- Science/Research
License
- OSI Approved :: Apache Software License
Programming Language
- Python :: 3.6
Topic
- Scientific/Engineering :: Artificial Intelligence

Project description

TextRL

Text generation with reinforcement learning using Hugging Face's transformers.

Introduction

This project uses reinforcement learning to adjust text generation results. It works with any text-generation model from Hugging Face's transformers library, together with PFRL and OpenAI Gym.

Installation

pip install

pip install textrl

Build from source

Clone the repository with git and cd into the project directory.

pip install -e .

Usage

initialize the agent and environment

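from textrl import TextRLEnv, TextRLActor
from transformers import AutoTokenizer, AutoModelWithLMHead

# "any models" is a placeholder: replace it with a model name or path.
# Note: AutoModelWithLMHead is deprecated in newer transformers releases.
tokenizer = AutoTokenizer.from_pretrained("any models")
model = AutoModelWithLMHead.from_pretrained("any models")
model.eval()  # switch to evaluation mode (disables dropout)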

set up the reward function for the environment

predicted_list (list[str]): the list of predicted tokens
finish (bool): whether the end of the sentence has been reached

class MyRLEnv(TextRLEnv):
    def get_reward(self, input_text, predicted_list, finish):  # predicted_list is the list of predicted tokens
        if "[UNK]" in predicted_list:
            reward = -1
        else:
            reward = 1
        return reward
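
Because get_reward only receives plain Python values, the scoring rule can be prototyped and tested as a standalone function before it is placed inside a TextRLEnv subclass. The sketch below is one possible variant of the rule above, not part of TextRL: the function name score_prediction, the max_len parameter, and the choice to give unfinished sentences a neutral reward are all assumptions made for illustration.

```python
from typing import List


def score_prediction(predicted_list: List[str], finish: bool, max_len: int = 20) -> float:
    """Hypothetical reward rule for illustration; not part of the TextRL API."""
    # Any unknown token makes the whole prediction undesirable.
    if "[UNK]" in predicted_list:
        return -1.0
    # Unfinished sentences get a neutral reward (an assumption of this sketch).
    if not finish:
        return 0.0
    # Finished sentences are rewarded, with a smaller reward past max_len tokens.
    return 1.0 if len(predicted_list) <= max_len else 0.5


print(score_prediction(["hello", "world"], finish=True))   # 1.0
print(score_prediction(["hello", "[UNK]"], finish=True))   # -1.0
print(score_prediction(["hello"], finish=False))           # 0.0
```

Inside get_reward, the body could then reduce to returning score_prediction(predicted_list, finish).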

prepare for training

observation_input should be a list of all possible input strings for model training

observation_list = ["input text"]  # placeholder: replace with your own training inputs
env = MyRLEnv(model, tokenizer, observation_input=observation_list)
actor = TextRLActor(env, model, tokenizer)
agent = actor.agent_ppo(update_interval=10, minibatch_size=2000, epochs=20)
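
One way to build the list passed as observation_input is to clean raw lines of text first. This is a minimal sketch under stated assumptions: the helper name load_observations and the sample inputs are hypothetical, and stripping blanks and duplicates is a convenience, not something TextRL is known to require.

```python
from typing import Iterable, List


def load_observations(lines: Iterable[str]) -> List[str]:
    """Hypothetical helper: strip whitespace, drop blank lines, remove duplicates, keep order."""
    seen = set()
    result = []
    for line in lines:
        text = line.strip()
        if text and text not in seen:
            seen.add(text)
            result.append(text)
    return result


# Made-up sample inputs; in practice these might come from open("inputs.txt").
raw = ["translate: hello\n", "\n", "translate: world\n", "translate: hello\n"]
observation_list = load_observations(raw)
print(observation_list)  # ['translate: hello', 'translate: world']
```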

Train

n_episodes = 1000
max_episode_len = 200 # max sentence length

for i in range(1, n_episodes + 1):
    obs = env.reset()
    R = 0  # return (sum of rewards) for this episode
    t = 0  # time step within this episode
    while True:
        action = agent.act(obs)
        obs, reward, done, pred = env.step(action)
        R += reward
        t += 1
        reset = t == max_episode_len  # cut the episode off at the length limit
        agent.observe(obs, reward, done, reset)
        if done or reset:
            break
    if i % 10 == 0:
        print('episode:', i, 'R:', R)
    if i % 50 == 0:
        print('statistics:', agent.get_statistics())
print('Finished.')
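
The loop above treats two endings differently: done means the environment ended the episode itself, while reset means the episode was cut off at max_episode_len. The sketch below factors the inner loop into a helper so that this control flow can be checked without loading a model. ToyEnv, RandomAgent and run_episode are hypothetical stand-ins written for this illustration; they only mimic the reset/step and act/observe call shapes used in the loop and are not part of TextRL or PFRL.

```python
import random


class ToyEnv:
    """Hypothetical stand-in environment that ends after `length` steps."""

    def __init__(self, length):
        self.length = length
        self.t = 0

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        self.t += 1
        done = self.t >= self.length
        # Same 4-tuple shape as in the loop above: obs, reward, done, extra.
        return self.t, 1.0, done, {}


class RandomAgent:
    """Hypothetical stand-in agent exposing only act() and observe()."""

    def act(self, obs):
        return random.randint(0, 1)

    def observe(self, obs, reward, done, reset):
        pass  # a real agent would record the transition and learn from it


def run_episode(agent, env, max_episode_len):
    """Run one episode; return (total_reward, steps, truncated)."""
    obs = env.reset()
    total, t = 0.0, 0
    while True:
        action = agent.act(obs)
        obs, reward, done, _ = env.step(action)
        total += reward
        t += 1
        reset = t == max_episode_len
        agent.observe(obs, reward, done, reset)
        if done or reset:
            # truncated is True only when the length limit ended the episode.
            return total, t, reset and not done


print(run_episode(RandomAgent(), ToyEnv(length=5), max_episode_len=200))    # (5.0, 5, False)
print(run_episode(RandomAgent(), ToyEnv(length=500), max_episode_len=200))  # (200.0, 200, True)
```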

another way to train

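import logging
import sys

import pfrl

# Send PFRL's training log messages to stdout.
logging.basicConfig(level=logging.INFO, stream=sys.stdout, format='')

pfrl.experiments.train_agent_with_evaluation(
    agent,
    env,
    steps=1000,                # total number of training steps
    eval_n_steps=None,         # evaluate by episode count rather than step count
    eval_n_episodes=1500,      # episodes per evaluation run
    train_max_episode_len=50,  # max sentence length during training
    eval_interval=10000,       # training steps between evaluations
    outdir='somewhere',        # output directory for results
)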

prediction

actor.predict("input text")

Release history

0.2.22

Aug 6, 2023

0.2.21

Aug 3, 2023

0.2.20

Jul 25, 2023

0.2.19

Jul 25, 2023

0.2.18

Apr 26, 2023

0.2.17

Apr 26, 2023

0.2.16

Apr 13, 2023

0.2.15

Mar 27, 2023

0.2.13

Mar 1, 2023

0.2.12

Mar 1, 2023

0.2.11

Feb 13, 2023

0.2.10

Feb 13, 2023

0.2.1

Feb 9, 2023

0.2.0

Feb 6, 2023

0.1.10

Feb 2, 2023

0.1.9

Jan 11, 2023

0.1.8

Jan 10, 2023

0.1.7

Jan 9, 2023

0.1.6

Dec 9, 2022

0.1.5

Dec 9, 2022

0.1.4

Dec 5, 2022

0.1.2

Dec 5, 2022

0.1.1

Dec 6, 2021

0.1.0

Jul 28, 2021

0.0.9

Jul 1, 2021

0.0.8

Jun 17, 2021

0.0.7

Jun 17, 2021

0.0.6

Jun 17, 2021

0.0.5

Jun 13, 2021

0.0.4

Jun 13, 2021

This version

0.0.3

Jun 13, 2021

0.0.2

Jun 9, 2021

0.0.1

Apr 22, 2021

Download files

Source Distribution

textrl-0.0.3.tar.gz (6.2 kB)

Uploaded Jun 13, 2021 Source

Built Distributions

textrl-0.0.3-py3.7.egg (7.9 kB)

Uploaded Jun 13, 2021 Source

textrl-0.0.3-py3-none-any.whl (4.5 kB)

Uploaded Jun 13, 2021 Python 3

Hashes for textrl-0.0.3.tar.gz
Algorithm	Hash digest
SHA256	`5853d1604c499e4068a6b4d9dcaf6a54332589921d615cf340a14980eeb3dd4d`
MD5	`22271bce77c7040937fa3dd6b38f7c13`
BLAKE2b-256	`fae29a66a3120d8888091055cc65484ad1a1bda98180c054ada6516c7e9652c9`

Hashes for textrl-0.0.3-py3.7.egg
Algorithm	Hash digest
SHA256	`34a550d235e7ed31226fde0efcacf2778aff85b93082d35d47105cf53bcccaab`
MD5	`b73d504aab6380b5c06f58ccec2a5c84`
BLAKE2b-256	`556d1b0669684ddc13669aa398573169220b4f508404132154edbe028f95d2a1`

Hashes for textrl-0.0.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`4a09645a784905efea9d46aaa8238aaf1c28bcf713c516ab9a094ae361cee2cb`
MD5	`6c810cb87ad1378554eec53269dc5218`
BLAKE2b-256	`9fb2deec0dd0cd994acbc14afc29eb0104d07dcb3090e1496dc845c4ee2603a1`