nagisa

Japanese word segmentation/POS tagging tool


![Alt text](/nagisa/data/nagisa_image.jpg 'An image of title')

# nagisa

Nagisa is a Python module for Japanese word segmentation and POS tagging.
It is designed to be a simple and easy-to-use tool.

This tool has the following features.
- Based on recurrent neural networks.
- The word segmentation model uses character- and word-level features [[池田+]](http://www.anlp.jp/proceedings/annual_meeting/2017/pdf_dir/B6-2.pdf).
- The POS-tagging model uses tag dictionary information [[Inoue+]](http://www.aclweb.org/anthology/K17-1042).

Requirements
========
[DyNet](https://github.com/clab/dynet) (Neural Network Toolkit) is required.
Nagisa is compatible with Python 2.7-3.6.
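
A script that depends on nagisa can check the interpreter version before importing it. The following is only a minimal sketch of such a guard, assuming the range stated above; `is_supported`, `MIN_VERSION`, and `MAX_VERSION` are hypothetical names for illustration and are not part of nagisa.

```python
import sys

# Supported range taken from the requirement above: Python 2.7 through 3.6.
MIN_VERSION = (2, 7)
MAX_VERSION = (3, 6)


def is_supported(version_info=None):
    """Return True if (major, minor) lies within the documented range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if not is_supported():
    sys.stderr.write(
        'Warning: this Python version is outside the range listed for nagisa\n'
    )
```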

Installation
========

```bash
# Install from GitHub
git clone https://github.com/taishi-i/nagisa
cd nagisa
# If you get a permission-denied error,
# run the following line instead:
# sudo python setup.py install
python setup.py install
```

Usage
====

```python
import nagisa
tagger = nagisa.Tagger()

# Sample of word segmentation and POS-tagging for Japanese
text = 'Pythonで簡単に使えるツールです'
words = tagger.tagging(text)
print(words) # Python/名詞 で/助詞 簡単/形状詞 に/助動詞 使える/動詞 ツール/名詞 です/助動詞

# Get a list of words
print(words.words) # ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
# Get a list of POS-tags
print(words.postags) # ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']

# A list of available POS-tags
print(tagger.postags) # ['補助記号', '名詞', ... , 'URL']

# Extracting all nouns from a text
words = tagger.extract(text, ['名詞'])
print(words) # Python/名詞 ツール/名詞

# Filtering specific POS-tags from a text
words = tagger.filter(text, ['助詞', '助動詞'])
print(words) # Python/名詞 簡単/形状詞 使える/動詞 ツール/名詞
```
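
The `words` and `postags` attributes shown above are parallel lists, so they can be combined with ordinary Python tools. The following is a minimal sketch, assuming the two lists hold the values printed in the example; they are hard-coded here so the snippet runs without the model, and `group_by_postag` is a hypothetical helper, not part of nagisa.

```python
from collections import OrderedDict

# Values copied from the example output above; in practice they would
# come from `words.words` and `words.postags`.
tokens = ['Python', 'で', '簡単', 'に', '使える', 'ツール', 'です']
postags = ['名詞', '助詞', '形状詞', '助動詞', '動詞', '名詞', '助動詞']


def group_by_postag(tokens, postags):
    """Group tokens by POS tag, keeping tags in first-seen order."""
    groups = OrderedDict()
    for token, tag in zip(tokens, postags):
        groups.setdefault(tag, []).append(token)
    return groups


groups = group_by_postag(tokens, postags)
for tag, members in groups.items():
    print(tag, members)
```

Grouping this way keeps each tag's tokens in sentence order, which is convenient for counting or inspecting a single part of speech.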

Features
====

```python
# Nagisa is good at capturing URLs and emoticons from a text.
text = '(人•ᴗ•♡)こんばんは♪'
words = tagger.tagging(text)
print(words) # (人•ᴗ•♡)/補助記号 こんばんは/感動詞 ♪/補助記号

url = 'https://github.com/taishi-i/nagisaでコードを公開中(๑¯ω¯๑)'
words = tagger.tagging(url)
print(words) # https://github.com/taishi-i/nagisa/URL で/助詞 コード/名詞 を/助詞 公開/名詞 中/接尾辞 (๑　̄ω　̄๑)/補助記号

words = tagger.filter(url, ['URL', '補助記号', '助詞'])
print(words) # コード/名詞 公開/名詞 中/接尾辞
```
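
The effect of keeping or dropping POS tags, as in the examples above, can be illustrated on plain (word, tag) pairs. This is a sketch of the selection logic only, not nagisa's implementation: the pairs are copied from the URL example output, and `keep_postags` and `drop_postags` are hypothetical helpers.

```python
# (word, POS-tag) pairs copied from the URL example output above.
tagged = [
    ('https://github.com/taishi-i/nagisa', 'URL'),
    ('で', '助詞'),
    ('コード', '名詞'),
    ('を', '助詞'),
    ('公開', '名詞'),
    ('中', '接尾辞'),
    ('(๑　̄ω　̄๑)', '補助記号'),
]


def keep_postags(pairs, tags):
    """Keep only the pairs whose tag is in `tags`."""
    wanted = set(tags)
    return [(word, tag) for word, tag in pairs if tag in wanted]


def drop_postags(pairs, tags):
    """Remove the pairs whose tag is in `tags`."""
    unwanted = set(tags)
    return [(word, tag) for word, tag in pairs if tag not in unwanted]


content = drop_postags(tagged, ['URL', '補助記号', '助詞'])
print(' '.join(word + '/' + tag for word, tag in content))
```

Dropping URLs, symbols, and particles leaves the same three content words as the filtered output in the example.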

