simplechinese

Chinese text processing, representation, and visualization.

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English

Project description

SimpleChinese

Chinese text processing, representation, and visualization.

Free software: MIT license
Documentation: https://simplechinese.readthedocs.io.

Features

Read the data from a CSV file.

import pandas as pd
df = pd.read_csv("test.csv")

https://github.com/chenmingxiang110/SimpleChinese/raw/master/pics/raw.png

Clean the data.

import simplechinese as sc  # module name assumed to match the package name
sc.clean(df)

https://github.com/chenmingxiang110/SimpleChinese/raw/master/pics/clean.png

The clean function applies the following steps:

fillna(): Fill missing values (N/A) in a pandas.DataFrame with an empty string.

toLower(): Convert letters to lowercase.

remove_punctuations(): Remove all punctuation from a string or a pandas.DataFrame.

remove_space(): Remove all spaces from a string or a pandas.DataFrame.
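
To make the four cleaning steps concrete, here is a minimal standard-library sketch that applies them to a single string. It is not the package's implementation: the function name clean_text is hypothetical, and treating "punctuation" as the Unicode categories starting with "P" is an assumption made for illustration.

```python
import unicodedata


def clean_text(value):
    """Hypothetical stand-in for the four cleaning steps, applied to one value."""
    # Step 1 (fillna): treat None and NaN as an empty string.
    # NaN is the only value that is not equal to itself.
    if value is None or value != value:
        return ""
    text = str(value)
    # Step 2 (toLower): lowercase Latin letters; Chinese characters are unaffected.
    text = text.lower()
    # Step 3 (remove_punctuations): drop characters whose Unicode category
    # starts with "P". This covers ASCII and full-width punctuation (assumption).
    text = "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))
    # Step 4 (remove_space): drop every whitespace character.
    text = "".join(ch for ch in text if not ch.isspace())
    return text


print(clean_text("Hello, 世界！ "))
print(repr(clean_text(None)))
```

On a DataFrame, the same function could be applied cell by cell, but the package may well handle this differently.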

Extract words from the data

sc.extract_words(sc.clean(df))

https://github.com/chenmingxiang110/SimpleChinese/raw/master/pics/extract_words.png
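
Chinese text has no spaces between words, so word extraction needs a segmentation step. The source does not say how sc.extract_words segments text. The sketch below shows one classic approach, forward maximum matching against a small word list, purely as an illustration. The function name segment, the vocabulary, and the max_len parameter are all hypothetical.

```python
def segment(text, vocabulary, max_len=4):
    """Split text into words by forward maximum matching (illustrative only)."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in vocabulary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy vocabulary; real segmenters use much larger dictionaries.
vocabulary = {"我们", "喜欢", "自然", "语言", "自然语言", "处理"}
print(segment("我们喜欢自然语言处理", vocabulary))
```

Because the longest match wins, "自然语言" is kept as one word even though "自然" and "语言" are also in the vocabulary.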

Vectorization

sc.pca(sc.tfidf(sc.clean(df).iloc[:,0]))

https://github.com/chenmingxiang110/SimpleChinese/raw/master/pics/vectorization.png
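
The call above chains TF-IDF weighting with PCA. As a rough illustration of the first half only, the sketch below computes plain TF-IDF vectors for documents that are already split into words. The function name tfidf_vectors is hypothetical, and the weighting used here (term frequency times log(N / document frequency), with no smoothing or normalization) is an assumption; the package's sc.tfidf may differ. The PCA step is not shown.

```python
import math
from collections import Counter


def tfidf_vectors(documents):
    """Return (vocabulary, vectors) for a list of tokenized documents."""
    n_docs = len(documents)
    # Fixed, sorted vocabulary so every vector has the same column order.
    vocabulary = sorted({term for doc in documents for term in doc})
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([
            (counts[term] / total) * math.log(n_docs / doc_freq[term])
            for term in vocabulary
        ])
    return vocabulary, vectors


docs = [["自然", "语言"], ["自然", "处理"]]
vocabulary, vectors = tfidf_vectors(docs)
for row in vectors:
    print(dict(zip(vocabulary, row)))
```

A term that appears in every document, such as "自然" here, gets weight 0, which is why TF-IDF emphasizes terms that distinguish one document from another.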

Word cloud

sc.wordcloud(sc.clean(df).iloc[:,0], font_path="yahei.ttc")

https://github.com/chenmingxiang110/SimpleChinese/raw/master/pics/wordcloud.png

Credits

This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

0.1.0 (2020-07-10)

First release on PyPI.

Release history

- 0.2.15 (2022-07-12)
- 0.2.14 (2021-11-09)
- 0.2.12 (2021-11-09)
- 0.2.11 (2021-08-22)
- 0.2.10 (2021-07-01)
- 0.2.9 (2021-07-01)
- 0.2.8 (2021-06-23)
- 0.2.7 (2021-06-22)
- 0.2.6 (2021-06-22)
- 0.2.1 (2021-06-21)
- 0.2.0 (2021-06-21)
- 0.1.0 (2020-07-10), this version

Download files


Source Distribution

simplechinese-0.1.0.tar.gz (13.6 kB)

Uploaded Jul 10, 2020

Hashes for simplechinese-0.1.0.tar.gz

Algorithm	Hash digest
SHA256	`6ed60d5cdc66e8d151167a13fd0e7aace19edebc48d7183bfddfb548b1cc3aca`
MD5	`7bd57294213a726234447896878238c2`
BLAKE2b-256	`f0ef7b0580d485a556a5c6549e22abfa5f053305dcc137d613b326fdfa13e41a`
