textdata

Get clean line or text data from multi-line strings

| |travisci| |version| |downloads| |supported-versions| |supported-implementations|

.. |travisci| image:: https://travis-ci.org/jonathaneunice/textdata.png?branch=master
    :alt: Travis CI build status
    :target: https://travis-ci.org/jonathaneunice/textdata

.. |version| image:: http://img.shields.io/pypi/v/textdata.png?style=flat
    :alt: PyPI Package latest release
    :target: https://pypi.python.org/pypi/textdata

.. |downloads| image:: http://img.shields.io/pypi/dm/textdata.png?style=flat
    :alt: PyPI Package monthly downloads
    :target: https://pypi.python.org/pypi/textdata

.. |supported-versions| image:: https://img.shields.io/pypi/pyversions/textdata.svg
    :alt: Supported versions
    :target: https://pypi.python.org/pypi/textdata

.. |supported-implementations| image:: https://img.shields.io/pypi/implementation/textdata.svg
    :alt: Supported implementations
    :target: https://pypi.python.org/pypi/textdata

It's very common to need to extract text or text lines from within
program source. The way Python likes to have its text indented,
however, means that there will often be extra spaces at
the beginning of each line, as well as possibly extra lines at the
start and end of the text. They're there to make things look and work
right in the program source, but they're not useful in the resulting data.

Python string methods give easy ways to clean this text up, but
it's no joy reinventing that particular text-cleanup wheel every
time you need it--especially since many of the details are nitsy,
dropping the code down into low-level constructs rather than
just "give me the text!"

This module helps clean up included text (or text lines) in a simple,
reusable way that won't muck up your programs with extra code, and won't
require constant wheel-reinvention.

Usage
=====

::

    from textdata import lines

    data = lines("""
        There was an old woman who lived in a shoe.
        She had so many children, she didn't know what to do;
        She gave them some broth without any bread;
        Then whipped them all soundly and put them to bed.
    """)

will result in::

    ['There was an old woman who lived in a shoe.',
     "She had so many children, she didn't know what to do;",
     'She gave them some broth without any bread;',
     'Then whipped them all soundly and put them to bed.']

If instead you used ``textlines()``, the result is the same, but
joined by newlines into a single string::

    "There was an old woman who lived in a shoe.\nShe ... to bed."
    # where the ... abbreviates exactly the characters you'd expect

``textlines`` is an optional entry point, as ``lines`` has a ``join``
kwarg that, if set, joins the lines with that string.

Both routines provide typically-desired cleanups:

* remove blank lines (by default all of them; the first and last blanks,
  which usually appear due to Python formatting, are always removed)
* remove common line prefix (default)
* strip leading/trailing spaces other than the common prefix
  (leading by request, trailing by default)
* (optionally) join the lines together with your choice of separator string
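
To make the list above concrete, here is a minimal standard-library sketch of the same cleanups. The ``clean_lines`` helper is hypothetical and is not ``textdata``'s implementation; it only handles whitespace indentation (via ``textwrap.dedent``) and always strips trailing space.

```python
import textwrap


def clean_lines(text, noblanks=True, join=False):
    """Hypothetical stand-in for the cleanups listed above (not textdata's code)."""
    # Remove the common leading whitespace, then split into lines.
    raw = textwrap.dedent(text).splitlines()
    # Strip trailing whitespace from every line.
    raw = [line.rstrip() for line in raw]
    if noblanks:
        # Drop every blank line.
        cleaned = [line for line in raw if line]
    else:
        # Drop only the leading and trailing blank lines.
        cleaned = list(raw)
        while cleaned and not cleaned[0]:
            cleaned.pop(0)
        while cleaned and not cleaned[-1]:
            cleaned.pop()
    if join is False:
        return cleaned
    # True means plain concatenation; a string is used as the separator.
    sep = "" if join is True else join
    return sep.join(cleaned)


rhyme = """
    There was an old woman who lived in a shoe.

    She had so many children, she didn't know what to do;
"""
print(clean_lines(rhyme))
print(repr(clean_lines(rhyme, noblanks=False, join="\n")))
```

With ``noblanks=False`` the interior blank line survives, while the blanks introduced by the triple-quote layout are still dropped.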

The API
=======

``lines(text, noblanks=True, dedent=True, lstrip=False, rstrip=True, join=False)``

Returns text as a series of cleaned-up lines.

* ``text`` is the text to be processed.
* ``noblanks`` => all blank lines are eliminated, not just starting and ending ones (default ``True``).
* ``dedent`` => strip a common prefix (usually whitespace) from each line (default ``True``).
* ``lstrip`` => strip all left (leading) space from each line (default ``False``).
  Note that ``lstrip`` and ``dedent`` are mutually exclusive ways of handling leading space.
* ``rstrip`` => strip all right (trailing) space from each line (default ``True``).
* ``join`` => either ``False`` (do nothing), ``True`` (concatenate lines), or a string that will be used to join the resulting lines (default ``False``).
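
The difference between the ``dedent`` and ``lstrip`` styles of handling leading space matters most when the text has nested indentation. The following sketch uses only the standard library (``textwrap.dedent`` and ``str.lstrip``) to illustrate the two ideas; it is an approximation of the distinction, not a call into ``textdata`` itself.

```python
import textwrap

snippet = """
    def greet():
        print("hi")
"""

# Drop the blank edges that the triple-quote layout introduces.
body = snippet.strip("\n")

# Dedent style: remove only the indentation shared by all lines,
# so relative (nested) indentation is preserved.
dedented = textwrap.dedent(body).splitlines()

# Lstrip style: remove all leading space, flattening any nesting.
lstripped = [line.lstrip() for line in body.splitlines()]

print(dedented)
print(lstripped)
```

Dedenting keeps the inner ``print`` line indented relative to the ``def`` line, whereas left-stripping puts both lines at column zero.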

``textlines(text, noblanks=True, dedent=True, lstrip=False, rstrip=True, join=False)``

Does the same helpful cleanups as ``lines()``, but returns the
result as a single string, with lines separated by newlines (by
default) and without a trailing newline.

Unicode and Encodings
=====================

.. |star| unicode:: 0x2605 .. star
:star:

``textdata`` doesn't have any unique friction with Unicode
characters and encodings, but any time you use Unicode characters
in Python source files--especially in Python 2--care is warranted.

If your text includes Unicode characters, in Python 2 make sure to
mark the string with a "u" prefix: ``u""" |star| """``. You can
also do this in Python 3.3 and later. Sadly, Python 3.0 through 3.2
did not accept the prefix, making it much harder to
maintain a unified source base with them in the mix. (A
compatibility function such as ``six.u`` from
`six <http://pypi.python.org/pypi/six>`_
can help alleviate much--though certainly not all--of the pain.)

It can also be helpful to declare your source encoding: put
a specially-formatted comment as the first or second line of the source code::

    # -*- coding: <encoding name> -*-

This will usually be ``# -*- coding: utf-8 -*-``, but other encodings are
possible. Python 3 defaults to a UTF-8 encoding, but Python 2 assumes
ASCII.
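
As a small illustration of both points, the sketch below combines an encoding declaration with a ``u``-prefixed string containing a literal star. The text and variable names are made up for the example, and the cleanup uses plain string methods rather than ``textdata``.

```python
# -*- coding: utf-8 -*-
# The declaration above tells Python 2 that this file is UTF-8 encoded.

# The u prefix is required in Python 2 and accepted again from Python 3.3 on.
verse = u"""
    ★ Twinkle, twinkle
"""

# Plain string methods stand in for the cleanup step here.
cleaned = [line.strip() for line in verse.splitlines() if line.strip()]
print(cleaned)
```

The star survives as a single character, U+2605, at the start of the one cleaned line.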

Notes
=====

* Automated multi-version testing managed with the wonderful
  `pytest <http://pypi.python.org/pypi/pytest>`_,
  `pytest-cov <http://pypi.python.org/pypi/pytest-cov>`_,
  and `tox <http://pypi.python.org/pypi/tox>`_.
  Successfully packaged for, and tested against, all late-model versions of
  Python: 2.6, 2.7, 3.3, 3.4, as well as PyPy 2.5.1 (based on 2.7.9)
  and PyPy3 2.4.0 (based on 3.2.5). (The module should work on Python 3.2,
  but that version was dropped from the testing matrix due to its age and
  its lack of a Unicode literal, which makes test specification much more
  difficult.)

* Common line prefix is now computed without considering blank
  lines, so blank lines need not have any indentation on them
  just to "make things work."

* The tricky case where all lines have a common prefix, but it's
  not entirely composed of whitespace, is now properly handled.
  This is useful for lines that are already "quoted" such as
  with leading ``"|"`` or ``">"`` symbols (common in Markdown
  and old-school email usage styles).

* ``textlines()`` is somewhat superfluous now that ``lines()``
  has a ``join`` kwarg. But you may prefer it for the implicit
  indication that it's turning lines into text.

* It's tempting to define a constant such as ``Dedent`` that might
  be the default for the ``lstrip`` parameter, instead of having
  separate ``dedent`` and ``lstrip`` Booleans. The more I use
  singleton classes in Python as designated special values, the
  more useful they seem.

* The author, `Jonathan Eunice <mailto:jonathan.eunice@gmail.com>`_
  (or `@jeunice on Twitter <http://twitter.com/jeunice>`_), welcomes
  your comments and suggestions.
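
The common-prefix behavior described in the notes above (blank lines ignored, non-whitespace prefixes allowed) could be approximated as follows. This is a sketch under assumptions: ``strip_common_prefix`` is a hypothetical helper built on the standard library's ``os.path.commonprefix``, and it does not show how ``textdata`` actually computes the prefix.

```python
import os.path


def strip_common_prefix(lines):
    """Hypothetical helper: remove the prefix shared by all non-blank lines."""
    # Blank lines are ignored when computing the prefix, so they need
    # no indentation or quoting of their own.
    nonblank = [line for line in lines if line.strip()]
    # commonprefix compares character by character across all strings.
    prefix = os.path.commonprefix(nonblank)
    return [line[len(prefix):] if line.startswith(prefix) else line
            for line in lines]


quoted = [
    "> first quoted line",
    "",
    "> second quoted line",
]
print(strip_common_prefix(quoted))
```

A purely character-wise prefix like this one would also consume any leading characters the content lines happen to share, so a more careful version would limit the prefix to whitespace and quoting symbols.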

Installation
============

::

    pip install -U textdata

To ``easy_install`` under a specific Python version (3.3 in this example)::

    python3.3 -m easy_install --upgrade textdata

(You may need to prefix these with "sudo " to authorize installation.)
