Simple Representations

Easy-to-use text representations extraction library based on the Transformers library.

This library is based on the Transformers library by HuggingFace. Using this library, you can quickly extract text representations from Transformer models. Only two lines of code are needed to initialize the required model and extract the text representations from it.

Table of contents

Installation
  With pip
  From source
Usage
  Minimal Start
  Default Settings
Current Pretrained Models
Acknowledgements

Installation

This repository is tested on Python 3.6.8 and PyTorch 1.2.0.

With pip

First you need to install PyTorch. Please refer to the PyTorch installation page for the specific install command for your platform.

Once PyTorch has been installed, Simple Representations can be installed using pip as follows:

pip install simplerepresentations

From source

Here too, you first need to install PyTorch; refer to the PyTorch installation page for the specific install command for your platform.

Once PyTorch has been installed, you can install from source by cloning the repository and running:

pip install .
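
For example, assuming the repository's GitHub URL:

git clone https://github.com/AliOsm/simplerepresentations
cd simplerepresentations
pip install .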

Usage

Minimal Start

The following example extracts text representations from the BERT Base Uncased model for two sentences: "Hello Transformers!" and "It's very simple."

from simplerepresentations import RepresentationModel


def load_data():
	return ['Hello Transformers!', 'It\'s very simple.']


if __name__ == '__main__':
	model_type = 'bert'
	model_name = 'bert-base-uncased'

	representation_model = RepresentationModel(
		model_type=model_type,
		model_name=model_name,
		batch_size=32,
		max_seq_length=10, # truncate sentences to be less than or equal to 10 tokens
		combination_method='cat', # concatenate the last `last_hidden_to_use` hidden states
		last_hidden_to_use=4 # use the last 4 hidden states to build tokens representations
	)

	text_a = load_data()

	all_sentences_representations, all_tokens_representations = representation_model(text_a=text_a)

	print(all_sentences_representations.shape) # (2, 768) => (number of sentences, hidden size)
	print(all_tokens_representations.shape) # (2, 10, 3072) => (number of sentences, max_seq_length, 4 x hidden size, from concatenating the last 4 hidden states)
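
As a quick sanity check, you can compare the two sentence vectors. A minimal sketch, assuming the returned representations behave like NumPy arrays (if they are torch tensors, call .numpy() on them first):

import numpy as np


def cosine_similarity(a, b):
	# cosine similarity between two 1-D vectors
	return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))


# `all_sentences_representations` comes from the snippet above
print(cosine_similarity(all_sentences_representations[0], all_sentences_representations[1]))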

You can change the code in the load_data function to load your own data from any source you want (e.g. a CSV file).
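
For instance, a minimal load_data that reads sentences from a one-column CSV file could look like the sketch below (the sentences.csv file name and the text column header are hypothetical):

import csv


def load_data(path='sentences.csv'):
	# hypothetical CSV with a 'text' header and one sentence per row
	with open(path, newline='', encoding='utf-8') as f:
		return [row['text'] for row in csv.DictReader(f)]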

Default Settings

The default settings for the RepresentationModel class are given below:

batch_size (32): integer

The batch size to be used while extracting representations.

max_seq_length (128): integer

The maximum sequence length the model will support.

last_hidden_to_use (1): integer

How many of the last hidden states will be used to build the representations.

combination_method ('sum'): string ('sum' or 'cat')

The method used to combine the last last_hidden_to_use hidden states.

use_cuda (True): boolean

Whether or not to use CUDA.

process_count (cpu_count() - 2 if cpu_count() > 2 else 1): integer

The number of CPU cores (processes) to use when converting examples to features; defaults to (number of cores - 2), or 1 if the machine has 2 or fewer cores.

chunksize (500): integer

The size of the chunks into which the examples will be divided when converting them to features.
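
Putting these together, here is a sketch of a constructor call that overrides a few of the defaults above (the parameter names come from the list; the values are illustrative):

representation_model = RepresentationModel(
	model_type='bert',
	model_name='bert-base-uncased',
	batch_size=16, # default: 32
	max_seq_length=64, # default: 128
	last_hidden_to_use=2, # default: 1
	combination_method='sum', # default: 'sum'; summing keeps the hidden size at 768
	use_cuda=False # default: True; set to False to run on CPU
)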

Current Pretrained Models

You can find the complete list of currently available pretrained models in the Transformers library documentation.
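
For example, assuming the library accepts the same (model_type, model_name) pairs as Transformers, switching to DistilBERT would only change the two model arguments (the pair below is an assumption based on the Transformers model list):

representation_model = RepresentationModel(
	model_type='distilbert', # assumed supported model type
	model_name='distilbert-base-uncased' # identifier from the Transformers model list
)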

Acknowledgements

None of this would have been possible without the hard work by the HuggingFace team in developing the Transformers library.

Also, many of the ideas used in this repository were inspired by the Simple Transformers library.
