skip to navigation
skip to content

sentencepiece 0.0.2

SentencePiece python wrapper

# SentencePiece Python Wrapper

Python wrapper for SentencePiece with SWIG. This module wraps sentencepiece::SentencePieceProcessor class with the following modifications: * Encode and Decode methods are re-defined as EncodeAsIds, EncodeAsPieces, DecodeIds and DecodePieces respectevely. * SentencePieceText proto is not supported. * Added __len__ and __getitem__ methods. len(obj) and obj[key] returns vocab size and vocab id respectively.

## Build and Install SentencePiece You need to install SentencePiece before installing this python wrapper.

You can simply use pip comand to install SentencePiece python module.

` % pip install sentencepiece `

To install the wrapper manually, try the following commands: ` % python setup.py build % sudo python setup.py install `

If you don’t have write permission to the global site-packages directory or don’t want to install into it, please try: ` % python setup.py install --user `

## Usage

` % python >>> import sentencepiece as spm >>> sp = spm.SentencePieceProcessor() >>> sp.Load("test/test_model.model") True >>> sp.EncodeAsPieces("This is a test") ['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est'] >>> sp.EncodeAsIds("This is a test") [284, 47, 11, 4, 15, 400] >>> sp.DecodePieces(['\xe2\x96\x81This', '\xe2\x96\x81is', '\xe2\x96\x81a', '\xe2\x96\x81', 't', 'est']) 'This is a test' >>> sp.DecodeIds([284, 47, 11, 4, 15, 400]) 'This is a test' >>> sp.GetPieceSize() 1000 >>> sp.IdToPiece(2) '</s>' >>> sp.PieceToId('</s>') 2 >>> len(sp) 1000 >>> sp['</s>'] 2 `

 
File Type Py Version Uploaded on Size
sentencepiece-0.0.2-cp27-cp27m-manylinux1_x86_64.whl (md5) Python Wheel cp27 2017-11-08 1MB
sentencepiece-0.0.2-cp27-cp27mu-manylinux1_x86_64.whl (md5) Python Wheel cp27 2017-11-08 1MB
sentencepiece-0.0.2-cp33-cp33m-manylinux1_x86_64.whl (md5) Python Wheel cp33 2017-11-08 1MB
sentencepiece-0.0.2-cp34-cp34m-manylinux1_x86_64.whl (md5) Python Wheel cp34 2017-11-08 1MB
sentencepiece-0.0.2-cp35-cp35m-manylinux1_x86_64.whl (md5) Python Wheel cp35 2017-11-08 1MB
sentencepiece-0.0.2-cp36-cp36m-manylinux1_x86_64.whl (md5) Python Wheel cp36 2017-11-08 1MB
sentencepiece-0.0.2.tar.gz (md5) Source 2017-11-08 369KB