Sets of integers like 1,3-7,33

A set subclass that conveniently stores sets of integers. Sets can be created from and displayed as integer spans such as 1-3,14,29,92-97 rather than exhaustive member listings.

When iterating, pop()-ing an item, or converting to a list, intspan behaves as if it were an ordered–in fact, sorted–collection.

The main draw is having a convenient way to specify (possibly discontinuous) ranges–for example, rows to process in a spreadsheet. It can also help you quickly identify or report which items were not successfully processed in a large dataset.

Usage

from intspan import intspan

s = intspan('1-3,14,29,92-97')
s.discard('2,13,92')
print(s)
print(repr(s))
print(list(s))

yields:

1,3,14,29,93-97
intspan('1,3,14,29,93-97')
[1, 3, 14, 29, 93, 94, 95, 96, 97]

While:

for n in intspan('1-3,5'):
    print(n)

yields:

1
2
3
5
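
To make the span notation concrete, here is a minimal standard-library sketch of how a specification string might be expanded into individual integers. It is not intspan's actual parser: the helper name expand_spans and the regular expression are assumptions made purely for illustration.

```python
import re

# Hypothetical helper for illustration; not part of intspan's API.
# One item is either a single integer or "low-high"; either end may be negative.
_ITEM = re.compile(r'^\s*(-?\d+)\s*(?:-\s*(-?\d+))?\s*$')

def expand_spans(spec):
    """Return the sorted list of integers described by a spec like '1-3,5'."""
    members = set()
    for item in spec.split(','):
        match = _ITEM.match(item)
        if match is None:
            raise ValueError('bad span item: %r' % item)
        low = int(match.group(1))
        # A missing second group means the item is a single integer.
        high = int(match.group(2)) if match.group(2) is not None else low
        members.update(range(low, high + 1))   # spans are closed intervals
    return sorted(members)

print(expand_spans('1-3,5'))   # prints [1, 2, 3, 5]
```

Collecting members in a set before sorting mirrors the behavior described above: duplicates collapse, and iteration comes out in sorted order.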

Most set operations such as intersection, union, and so on are available just as they are in Python’s set. In addition, if you wish to extract the contiguous ranges:

for r in intspan('1-3,5,7-9,10,21-22,23,24').ranges():
    print(r)

yields:

(1, 3)
(5, 5)
(7, 10)
(21, 24)
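
For readers curious how contiguous runs like these can be found, the following is a sketch of a common standard-library idiom built on itertools.groupby. It is offered as an illustration only, with no claim that intspan's ranges() is implemented this way; the function name closed_ranges is hypothetical.

```python
from itertools import groupby

def closed_ranges(numbers):
    """Yield (start, stop) closed intervals for each run of consecutive integers."""
    ordered = sorted(set(numbers))
    # Within a run of consecutive integers, value minus position is constant,
    # so grouping on that difference splits the sequence into runs.
    for _, run in groupby(enumerate(ordered), key=lambda pair: pair[1] - pair[0]):
        run = list(run)
        yield (run[0][1], run[-1][1])

print(list(closed_ranges([1, 2, 3, 5, 7, 8, 9, 10, 21, 22, 23, 24])))
# prints [(1, 3), (5, 5), (7, 10), (21, 24)]
```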

Note that these endpoints represent closed intervals, rather than the half-open intervals commonly used with Python’s range(). If you combine intspan ranges with range() or with Python generators, you’ll have to increment the stop value by one yourself to create a suitable “half-open interval.”
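
The stop-plus-one adjustment can be sketched as follows. The closed pairs are hard-coded in the shape shown above, so the example runs without intspan installed; the variable names are illustrative only.

```python
# Closed (start, stop) pairs in the shape ranges() is shown to produce above;
# hard-coded here so the sketch does not depend on intspan being installed.
closed = [(1, 3), (5, 5), (7, 10)]

# range() excludes its stop value, so add one to include the closed endpoint.
expanded = [list(range(start, stop + 1)) for start, stop in closed]
print(expanded)   # prints [[1, 2, 3], [5], [7, 8, 9, 10]]
```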

There is a corresponding range-oriented constructor:

>>> intspan.from_ranges([ (4,6), (10,12) ])
intspan('4-6,10-12')

A convenience from_range method creates a contiguous intspan from a given low value to a given high value:

>>> intspan.from_range(8, 12)
intspan('8-12')

To find the elements not included, you can use the complement method:

>>> items = intspan('1-3,5,7-9,10,21-24')
>>> items.complement()
intspan('4,6,11-20')

The “missing” elements are computed as any integers between the intspan’s minimum and maximum values that aren’t included. If you’d like to customize the intended low and high bounds, you can give those explicitly:

>>> items.complement(high=30)
intspan('4,6,11-20,25-30')

You can use the difference method or - operator to find the complement with respect to an arbitrary set, rather than just an expected contiguous range.
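
Because intspan follows the conventions of Python’s set, both ideas can be sketched with built-in sets. The example below uses plain sets as stand-ins (an assumption made so it runs without intspan installed), and the variable names are illustrative.

```python
# Plain built-in sets stand in for intspan objects here (an assumption made so
# the sketch runs without intspan installed).
items = {1, 2, 3, 5, 7, 8, 9, 10, 21, 22, 23, 24}

# Complement against a contiguous range with an explicit upper bound of 30.
within_bounds = set(range(min(items), 30 + 1)) - items
print(sorted(within_bounds))
# prints [4, 6, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 25, 26, 27, 28, 29, 30]

# Complement against an arbitrary, non-contiguous set of expected items.
expected = {2, 3, 5, 6, 8, 13}
missing = expected - items        # same as expected.difference(items)
print(sorted(missing))            # prints [6, 13]
```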

Experimental

As of version 1.2, an experimental function spanlist is provided. It returns a list from the same kind of specification string intspan does, but ordered as given rather than fully sorted. A corresponding intspanlist subclasses list in the same way that intspan subclasses set:

>>> intspanlist('4,1-5,5')  # note order preserved
intspanlist('4,1-3,5')

>>> list(intspanlist('4,1-5,5'))
[4, 1, 2, 3, 5]

>>> spanlist('4,1-5,5')
[4, 1, 2, 3, 5]

So the spanlist function creates a plain list, whereas intspanlist creates a similar object, but one that has a more sophisticated representation and more specific update methods.

The intended use for this strictly-ordered version of intspan is to help users and developers specify an ordering of elements. For example, a program might have 20 items, 1-20. If you wanted to process item 7, then item 3, then “all the rest,” intspanlist('7,3,1-20') would be a convenient way to specify this. You could loop over that object in the desired order.

Note that intspanlist objects do not necessarily display as they are entered:

>>> intspanlist('7,3,1-20')
intspanlist('7,3,1-2,4-6,8-20')

This is an equivalent, though lower-level and more verbose, representation that more explicitly maps to the gaps in their ranges.
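
The “7, then 3, then all the rest” ordering can be reproduced with a short first-occurrence-wins loop. This is a plain-Python sketch of the idea, not intspanlist’s implementation, and the variable names are hypothetical.

```python
# Hypothetical illustration with plain Python; not intspanlist's implementation.
requested = [7, 3] + list(range(1, 21))     # '7,3,1-20' written out in order

# Keep the first occurrence of each item, preserving the order given.
seen = set()
ordered = []
for n in requested:
    if n not in seen:
        seen.add(n)
        ordered.append(n)

print(ordered[:6])   # prints [7, 3, 1, 2, 4, 5]
```

The later occurrences of 7 and 3 inside 1-20 are dropped, which is why the displayed form splits that range into 1-2, 4-6, and 8-20.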

Note: Whereas intspan attempts to faithfully implement the attributes and all the methods of a Python set, intspanlist is a thin shim over list. It works fine as an immutable type, but modifications are more problematic. append and extend operations work to maintain a “set-ish,” no-repeats nature (by discarding any additions that are already in the container). insert and other list update methods, however, provide no such promises. Indeed, it’s not entirely clear what update behavior should be, given the use case. If a duplicate is appended or inserted somewhere, should an exception be raised? Should items already seen be silently refused (the current default)? Or something else? You read the part about “experimental,” right?

Also important: Unlike this module’s intspan core, intspanlist tests are not yet complete. Swim at your own risk.
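
The difference between guarded and unguarded update methods can be illustrated with a toy list subclass. NoRepeatList below is a hypothetical class written for this sketch; it only mimics the behavior described in the note and is not intspanlist’s actual code.

```python
class NoRepeatList(list):
    """Hypothetical list subclass; a sketch, not intspanlist's actual code."""

    def append(self, item):
        # Discard additions that are already in the container.
        if item not in self:
            list.append(self, item)

    def extend(self, items):
        for item in items:
            self.append(item)

items = NoRepeatList([4, 1, 2])
items.append(1)          # already present, so ignored
items.extend([2, 5, 5])  # only the first 5 is added
items.insert(0, 4)       # insert is not overridden, so a duplicate slips in
print(items)             # prints [4, 4, 1, 2, 5]
```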

Performance and Alternatives

intspan piggybacks on Python’s set and list types, so it stores every integer individually. Unlike Perl’s Set::IntSpan, it is not optimized for long contiguous runs. For sets of several hundred or even many thousands of members, you will probably never notice the difference.

But if you’re doing extensive processing of large sets (e.g. with 100K, 1M, or more elements), or doing lots of set operations on them (e.g. union, intersection), a data structure based on lists of ranges, run length encoding, or Judy arrays might perform and scale better. Horses for courses.
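
As a rough illustration of the “lists of ranges” alternative, the sketch below stores a few closed ranges instead of millions of individual integers and tests membership by binary search. It is a minimal example under stated assumptions (sorted, non-overlapping ranges; the names ranges, starts, and contains are made up here), not a description of any of the packages listed below.

```python
from bisect import bisect_right

# Sorted, non-overlapping closed ranges: three tuples stand in for roughly
# two million individual integers.
ranges = [(1, 1000000), (2000000, 3000000), (5000000, 5000000)]
starts = [low for low, _ in ranges]

def contains(n):
    """Return True if n falls inside one of the closed ranges."""
    i = bisect_right(starts, n) - 1      # last range whose start is <= n
    return i >= 0 and n <= ranges[i][1]

print(contains(2500000))   # prints True
print(contains(1500000))   # prints False
```

Memory use here grows with the number of runs rather than the number of members, which is the trade-off the paragraph above alludes to.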

There are several modules you might want to consider as alternatives or supplements. AFAIK, none of them provide the convenient integer span specification that intspan does, but they have other virtues:

  • cowboy provides generalized ranges and multi-ranges. Bonus points for the package tagline: “It works on ranges.”

  • ranger is a generalized range and range set module. It supports open and closed ranges, and includes mapping objects that attach one or more objects to range sets.

  • rangeset is a generalized range set module. It also supports infinite ranges.

  • judy is a Python wrapper around Judy arrays, which are implemented in C. No docs or tests to speak of.

Notes

  • Version 1.2 adds an experimental spanlist constructor and intspanlist type.

  • Version 1.1.0 adds from_range and complement methods, improves error handling of pop on an empty set, and tweaks testing.

  • Patch versions through 1.0.3 are minor bumps, with small testing and documentation improvements.

  • Version 1.0 immediately follows 0.73. Bumped to institute a cleaner “semantic versioning” scheme. Upgraded from “beta” to “production” status.

  • Version 0.73 updates testing to include the latest Python 3.4.

  • Version 0.7 fixed parsing of spans including negative numbers, and added the ranges() method. As of 0.71, the from_ranges() constructor appeared.

  • Though inspired by Perl’s Set::IntSpan, that’s where the similarity stops. intspan supports only finite sets, and it follows the methods and conventions of Python’s set.

  • intspan methods and operations such as add(), discard(), and >= take integer span strings, lists, and sets as arguments. Facilities that used to take only one item now take multiples, including arguments that are technically string specifications rather than proper intspan objects.

  • String representation and ranges() method based on Jeff Mercado’s concise answer to this StackOverflow question. Thank you, Jeff!

  • Automated multi-version testing managed with the wonderful pytest, pytest-cov, and tox. Successfully packaged for, and tested against, all late-model versions of Python: 2.6, 2.7, 3.2, 3.3, and 3.4, as well as PyPy 2.6.0 (based on 2.7.9) and PyPy3 2.4.0 (based on 3.2.5). Should run fine on Python 3.5, though py.test was broken on its pre-release iterations. Test line coverage ~100% (for intspan objects, not experimental intspanlist features).

  • The author, Jonathan Eunice (@jeunice on Twitter), welcomes your comments and suggestions.

Installation

To install the latest version:

pip install -U intspan

To easy_install under a specific Python version (3.3 in this example):

python3.3 -m easy_install --upgrade intspan

(You may need to prefix these with the sudo command to authorize installation. In environments without super-user privileges, you may want to use pip’s --user option to install only for a single user, rather than system-wide.)
