tfds-korean
A collection of Korean text datasets, ready to use with TensorFlow Datasets.
Usage
Installation
pip install tfds-korean
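To confirm the installation before loading anything, a small check like the following may help. It is only a sketch: `is_installed` is a hypothetical helper written for this example, not part of tfds-korean, and it assumes the importable module names are `tensorflow_datasets` and `tfds_korean`, as used in the loading example.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Report which of the modules used in the loading example are available.
for name in ("tensorflow_datasets", "tfds_korean"):
    status = "found" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If either module is reported as missing, rerun the pip command in the same Python environment you use to run your code.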
Loading a dataset
import tensorflow_datasets as tfds
import tfds_korean.nsmc # register nsmc dataset
ds = tfds.load('nsmc')
train_ds = ds['train'].batch(32)
test_ds = ds['test'].batch(128)
# Define and compile `model` (for example, a Keras model) here.
# ....
model.fit(train_ds)
model.evaluate(test_ds)
Examples
Licenses
The license for this repository and the licenses for the individual datasets apply separately. Before using a dataset, check its license and its website. Note that this library does not host or distribute the datasets themselves.