
A simple Python (2 or 3) script to generate a PNG word-cloud image from a bunch of text files. Based on word_cloud.

Project description

A simple Python script to generate a square wordcloud from one or more text files. It supports both Python 2 and 3 (2.7+ and 3.4+).

generate-word-cloud example meta

Based on the great word_cloud module by @amueller.



How to use it?

1. Requirements

The usual module matplotlib is needed for the plotting, docopt is needed for the command line interface, and word_cloud is needed for the actual work (generating the cloud of words after reading the files).

The required Python (2 or 3) modules can be installed with pip, either directly:

# Directly (word_cloud is published on PyPI under the name "wordcloud"):
sudo pip install matplotlib docopt wordcloud

Or with the requirements.txt file:

sudo pip install -r requirements.txt

Note: if ansicolortags is available, it will be used to print nice colors in the help and during the generation of word clouds.
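An optional dependency like this is usually handled with a guarded import. The snippet below is a minimal Python 3 sketch of that pattern, not the script's actual code: it assumes ansicolortags exposes a `printc` function and that color tags look like `<green>` or `<reset>`; the helper `strip_tags` is a hypothetical name introduced for illustration.

```python
import re


def strip_tags(text):
    """Remove color tags such as <green> or <reset> (assumed tag format)."""
    return re.sub(r"</?[A-Za-z]+>", "", text)


try:
    # Optional dependency: use it for colored output when it is installed.
    from ansicolortags import printc
except ImportError:
    def printc(*args, **kwargs):
        """Fallback: print the same text with the color tags stripped."""
        print(*(strip_tags(str(arg)) for arg in args), **kwargs)
```

With this in place, the rest of a script can call `printc("<green>Done<reset>")` without caring whether ansicolortags is installed.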

2. Installation

Clone the repository, then copy the script (generate-word-cloud.py) somewhere in your PATH (e.g., ~/.local/bin/).

You can also just download the script itself:

$ wget https://raw.githubusercontent.com/Naereen/generate-word-cloud.py/master/generate-word-cloud.py
$ cp generate-word-cloud.py /path/to/a/directory/in/your/PATH/

Note: the script is also available from PyPI: pypi.python.org/pypi/generatewordcloud. You can install it using pip:

$ sudo pip install generatewordcloud


3. Usage

Help:

$ generate-word-cloud.py --help

From one or two files

Generate a wordcloud from two txt files in the current directory, save it to wordcloud_txt.png.

$ generate-word-cloud.py -o ./wordcloud_txt.png ./file1.txt ./file2.txt

Generate a wordcloud from the textfile hamlet.txt (~ 8000 lines), saving to hamlet.png:

$ generate-word-cloud.py -o ./hamlet.png ./hamlet.txt

generate-word-cloud example hamlet

(It should work on pretty big text files without any issue.)


Other examples

From a lot of Python scripts (~ 200)

generate-word-cloud example python

From a lot of Bash scripts (~ 150)

generate-word-cloud example bash

From a lot of LaTeX files (~ 180)

generate-word-cloud example LaTeX

Meta example

Generate a wordcloud from the README.md and generate-word-cloud.py files of this very project, save it to wordcloud_meta.png!

$ generate-word-cloud.py -o ./wordcloud_meta.png ./*.md ./*.py

generate-word-cloud example meta


Features

  • [x] Support one or more input file(s), will cleanly skip any file it fails to find or fails to read,

  • [x] Custom output file, won’t be overwritten (except with -f flag),

  • [x] Nice command line interface, powered by docopt (it was first written with argparse, but I switched to docopt after realizing how awesome it is!),

  • [x] Has a command line option for every important parameter (maximum number of words, width, height, etc.),

  • [x] Input filenames containing spaces (e.g. "this file.txt") used to be seen as several files. FIXED with the switch to docopt.
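The first two features can be sketched in a few lines of Python 3. This is only an illustration under stated assumptions, not the script's real implementation: the function names `read_all` and `may_write` are hypothetical, and the real script asks for confirmation before overwriting rather than simply refusing.

```python
import os
import sys


def read_all(paths, encoding="utf-8"):
    """Concatenate the content of every readable file, skipping the others."""
    chunks = []
    for path in paths:
        try:
            # errors="replace" keeps going on bytes that are not valid text.
            with open(path, encoding=encoding, errors="replace") as handle:
                chunks.append(handle.read())
        except OSError as exc:
            # Missing file, directory, permission problem: warn and continue.
            print("Skipping {!r}: {}".format(path, exc), file=sys.stderr)
    return "\n".join(chunks)


def may_write(outfile, force=False):
    """Return True if outfile does not exist yet, or if force is set."""
    return force or not os.path.exists(outfile)
```

For instance, `read_all(["hamlet.txt", "missing.txt"])` returns the text of hamlet.txt and prints a warning about missing.txt on stderr instead of failing.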


Complete documentation (--help)

$ generate-word-cloud.py --help
Usage:
  generate-word-cloud.py [-s | --show] [-f | --force] [-o OUTFILE | --outfile=OUTFILE]
                         [-t TITLE | --title=TITLE] [-m MAX | --max=MAX]
                         [-w WIDTH | --width=WIDTH] [-H HEIGHT | --height=HEIGHT]
                         INFILE...
  generate-word-cloud.py (-h | --help)
  generate-word-cloud.py (-v | --version)

Options:
  -h --help            Show this help message and exit.
  -v --version         Show program's version number and exit.
  -s --show            Show the image but do not save it [default False].
  -f --force           Force to write the image, even if present (default is to ask before overwriting an existing file) [default False].
  -o OUTFILE --outfile=OUTFILE
                       Filename for the generated image [default 'wordcloud.png'].
  -t TITLE --title=TITLE
                       Title for the image [default None].
  -m MAX --max MAX
                       Max number of words to display on the cloud word [default 150].
  -w WIDTH --width WIDTH
                       Width of the generate image [default 400].
  -H HEIGHT --height HEIGHT
                       Height of the generate image [default 300].
  INFILE               A text file to read.
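The numeric options map naturally onto the `WordCloud` class of word_cloud. The following Python 3 sketch shows one way that mapping could look; it is an assumption about the approach, not an extract of the script. The helper names `wordcloud_kwargs` and `generate` are hypothetical, and the `args` dictionary mimics what docopt returns (string values, or None when an option is absent).

```python
def wordcloud_kwargs(args):
    """Translate docopt-style options into WordCloud keyword arguments.

    Missing options fall back to the defaults listed in the help above.
    """
    return {
        "max_words": int(args.get("--max") or 150),
        "width": int(args.get("--width") or 400),
        "height": int(args.get("--height") or 300),
    }


def generate(text, outfile, args):
    """Build the cloud from text and save it as an image file."""
    # Imported lazily, so the option handling works without word_cloud.
    from wordcloud import WordCloud

    cloud = WordCloud(**wordcloud_kwargs(args)).generate(text)
    cloud.to_file(outfile)
    return outfile
```

A call such as `generate(open("hamlet.txt").read(), "hamlet.png", {"--max": "100"})` would then produce a 400x300 image with at most 100 words.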

TODO

  • [x] Start it, from this example,

  • [x] Run it on some interesting examples, embed them here (as images),

  • [X] Check on weird encodings (i.e., not UTF-8)? It works fine!

  • [X] Test it against VERY large files (millions of lines)? It works fine, slowly but fine.

  • [X] Test it against LOTS of files (several thousands)? It works fine, slowly but fine.

  • [X] Publish it on PyPI: it is available at pypi.python.org/pypi/generatewordcloud/.

  • [ ] Write a small article about it for my blog.

Known issues

  • [ ] Only tested on (X)Ubuntu (15.10), but it should work on other GNU/Linux distributions and Mac OS X (and probably Windows), as long as both docopt and word_cloud can be installed there.

Unknown issues?

Use the issue tracker to notify me of a bug!


About

Why write this script?

There are already a lot of good word cloud generators online, e.g. wordle.net.

  1. I wanted a way to visualize the major keywords of Bash and Python (my two favorite programming languages) and of Markdown/Strapdown, reStructuredText and LaTeX (my favorite document typesetting systems),

  2. The original project word_cloud seemed cool. And it is. Great job @amueller !

  3. Clouds of words are interesting! And Python is awesome!

Author

Lilian Besson (Naereen).

License

This script is published under the terms of the GPLv3 License (file LICENSE.txt), © Lilian Besson, 2016.
