NLP Pipelines for Tagalog
calamanCy: NLP pipelines for Tagalog
calamanCy is a Tagalog natural language preprocessing framework made with spaCy. Its goal is to provide pipelines and datasets for downstream NLP tasks. This repository contains materials for using calamanCy, instructions for reproducing results, and usage guides.
calamanCy takes inspiration from other language-specific spaCy Universe frameworks such as DaCy, huSpaCy, and graCy. The name is based on calamansi, a citrus fruit native to the Philippines that is used in traditional Filipino cuisine.
🔧 Installation
To get started with calamanCy, install it with pip by running the following line in your terminal:
pip install calamanCy
Development
If you are developing calamanCy, first clone the repository:
git clone git@github.com:ljvmiranda921/calamanCy.git
Then, create a virtual environment and install the dependencies:
python -m venv venv
venv/bin/pip install -e . # requires pip>=23.0
venv/bin/pip install ".[dev]"  # development extras; quoted so shells such as zsh don't glob the brackets
# Activate the virtual environment
source venv/bin/activate
Alternatively, run make dev.
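After setting up the environment, a quick sanity check can confirm that you are running inside a virtual environment and that the package is installed. This is a minimal sketch using only the standard library (Python 3.8+); the helper names in_virtualenv and installed_version are hypothetical and not part of calamanCy. It relies on the fact that a venv sets sys.prefix to the environment directory while sys.base_prefix keeps pointing at the base interpreter.

```python
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv."""
    # Inside a venv, sys.prefix is the environment directory, while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("inside venv:", in_virtualenv())
    # "calamanCy" is the distribution name used with pip install above.
    print("calamanCy version:", installed_version("calamanCy"))
```

Run it with venv/bin/python so that the check reports on the environment you just created rather than on your system interpreter.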
👩‍💻 Usage
To use calamanCy, you first have to download either the medium, large, or transformer model. To see a list of all available models, run:
import calamancy

# List the names of all available calamanCy pipelines
for model in calamancy.models():
    print(model)
# ..
# tl_calamancy_md-0.1.0
# tl_calamancy_lg-0.1.0
# tl_calamancy_trf-0.1.0
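The model names above follow the pattern tl_calamancy_<size>-<version>. When several releases are listed, it can help to pick the newest one programmatically. The sketch below is a hypothetical helper (latest_model and parse_model_name are not part of the calamancy library), and it assumes that calamancy.models() yields name strings like those printed above. Versions are compared numerically, so 0.10.0 ranks above 0.9.0, and names that do not match the pattern are skipped.

```python
import re
from typing import Iterable, Optional, Tuple

# Hypothetical pattern for names such as "tl_calamancy_md-0.1.0"
_NAME = re.compile(r"^tl_calamancy_(?P<size>[a-z]+)-(?P<version>\d+(?:\.\d+)*)$")


def parse_model_name(name: str) -> Optional[Tuple[str, Tuple[int, ...]]]:
    """Split a pipeline name into (size, numeric version tuple), or None."""
    match = _NAME.match(name)
    if match is None:
        return None  # not a calamanCy pipeline name in the expected format
    version = tuple(int(part) for part in match.group("version").split("."))
    return match.group("size"), version


def latest_model(names: Iterable[str], size: str) -> Optional[str]:
    """Return the highest-versioned name for a size such as "md", "lg" or "trf"."""
    best_name, best_version = None, None
    for name in names:
        text = str(name)  # no-op for plain strings
        parsed = parse_model_name(text)
        if parsed is None or parsed[0] != size:
            continue
        if best_version is None or parsed[1] > best_version:
            best_name, best_version = text, parsed[1]
    return best_name
```

Under that assumption, the result can be passed straight to the loader, for example calamancy.load(latest_model(calamancy.models(), "md")).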
To download and load a model, run:
nlp = calamancy.load("tl_calamancy_md-0.1.0")
doc = nlp("Ako si Juan de la Cruz")
The nlp object is an instance of spaCy's Language class, so you can use it like any other spaCy pipeline.
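Because the result is a standard spaCy Doc, the usual spaCy attributes apply. The sketch below collects per-token annotations and named entities using the standard Token.text, Token.pos_, Token.ent_type_, Doc.ents and Span.label_ attributes. The helper names token_table and entity_spans are hypothetical, and which annotations are actually filled in depends on the components each calamanCy pipeline ships with; attributes for missing components come back as empty strings.

```python
from typing import List, Tuple


def token_table(doc) -> List[Tuple[str, str, str]]:
    """(text, coarse POS tag, entity type) for every token in a spaCy Doc."""
    # pos_ and ent_type_ are empty strings when the pipeline has no
    # component that sets them.
    return [(token.text, token.pos_, token.ent_type_) for token in doc]


def entity_spans(doc) -> List[Tuple[str, str]]:
    """(text, label) for each named-entity span in doc.ents."""
    return [(ent.text, ent.label_) for ent in doc.ents]
```

For example, after loading a pipeline as shown above, nlp.pipe can process several texts in a batch, and each resulting doc can be passed to token_table and entity_spans. No output is shown here because the predicted tags depend on the model.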
📦 Models and Datasets
calamanCy provides Tagalog models and datasets that you can use in your spaCy pipelines. You can download them directly or access them through the calamancy Python library. The training procedure for each pipeline can be found in the models/ directory, which is further subdivided by version. Each folder is an instance of a spaCy project.
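A spaCy project is defined by a project.yml file at its top level, so the versioned training folders can be located by searching for that file. This is a minimal standard-library sketch; find_spacy_projects is a hypothetical helper, and it assumes you run it from the root of a cloned calamanCy repository so that models/ is a relative path.

```python
from pathlib import Path
from typing import List


def find_spacy_projects(models_dir: str = "models") -> List[Path]:
    """List every directory under models_dir that contains a project.yml."""
    root = Path(models_dir)
    # Each spaCy project keeps its workflow definition in project.yml,
    # so the parent directory of that file is the project folder.
    return sorted(path.parent for path in root.rglob("project.yml"))


if __name__ == "__main__":
    for project in find_spacy_projects():
        print(project)
```

Inside any folder it reports, spaCy's own CLI can execute the project, for example python -m spacy project run followed by a workflow or command name; the available names are defined in that folder's project.yml and are not assumed here.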