Bottleneck

Fast NumPy array functions written in Cython

Development Status
- 2 - Pre-Alpha
Environment
- Console
Intended Audience
- Science/Research
License
- OSI Approved :: BSD License
Operating System
- OS Independent
Programming Language
- Cython
- Python
Topic
- Scientific/Engineering

Project description

Bottleneck is a collection of fast NumPy array functions written in Cython.

The three categories of Bottleneck functions:

Faster replacements for NumPy and SciPy functions
Moving window functions
Group functions that bin calculations by like-labeled elements

Function signatures (using nanmean as an example):

Functions	nanmean(arr, axis=None)
Moving window	move_nanmean(arr, window, axis=0)
Group by	group_nanmean(arr, label, order=None, axis=0)

Let’s give it a try. Create a NumPy array:

>>> import numpy as np
>>> arr = np.array([1, 2, np.nan, 4, 5])

Find the nanmean:

>>> import bottleneck as bn
>>> bn.nanmean(arr)
3.0

Moving window nanmean:

>>> bn.move_nanmean(arr, window=2)
array([ nan,  1.5,  2. ,  4. ,  4.5])
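To make the window semantics concrete, here is a slow pure-NumPy sketch of a 1d moving-window nanmean. The name move_nanmean_reference is hypothetical and this is not how Bottleneck implements it; it only aims to reproduce the values shown above, assuming that positions before the first full window are NaN and that NaNs inside a window are ignored.

```python
import numpy as np

def move_nanmean_reference(arr, window):
    """Slow reference for a 1d moving-window nanmean (illustrative only)."""
    arr = np.asarray(arr, dtype=np.float64)
    out = np.full(arr.shape, np.nan)
    # The first full window ends at index window - 1; earlier outputs stay NaN.
    for i in range(window - 1, arr.size):
        chunk = arr[i - window + 1 : i + 1]
        valid = chunk[~np.isnan(chunk)]
        if valid.size:  # leave NaN when every element in the window is NaN
            out[i] = valid.mean()
    return out

arr = np.array([1, 2, np.nan, 4, 5])
result = move_nanmean_reference(arr, window=2)
print(result)
```

With window=2 the values are NaN, 1.5, 2.0, 4.0 and 4.5, the same as the Bottleneck output above.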

Group nanmean:

>>> label = ['a', 'a', 'b', 'b', 'a']
>>> bn.group_nanmean(arr, label)
(array([ 2.66666667,  4.        ]), ['a', 'b'])
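The grouping step can be sketched in plain NumPy as well. The helper below is hypothetical (group_nanmean_reference is not part of Bottleneck) and assumes 1d input with groups returned in sorted label order, which is consistent with the output shown above but not confirmed as the library's rule.

```python
import numpy as np

def group_nanmean_reference(arr, label):
    """Reference: nanmean of 1d `arr` within each distinct label (illustrative only)."""
    arr = np.asarray(arr, dtype=np.float64)
    label = np.asarray(label)
    order = sorted(set(label.tolist()))  # assumption: groups come back in sorted order
    means = []
    for key in order:
        values = arr[label == key]
        valid = values[~np.isnan(values)]
        means.append(valid.mean() if valid.size else np.nan)
    return np.array(means), order

arr = np.array([1, 2, np.nan, 4, 5])
label = ['a', 'a', 'b', 'b', 'a']
means, order = group_nanmean_reference(arr, label)
print(means, order)
```

Group 'a' averages 1, 2 and 5; group 'b' averages only 4 because its other element is NaN.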

Fast

Bottleneck is fast:

>>> arr = np.random.rand(100, 100)
>>> timeit np.nanmax(arr)
10000 loops, best of 3: 99.6 us per loop
>>> timeit bn.nanmax(arr)
100000 loops, best of 3: 15.3 us per loop

Let’s not forget to add some NaNs:

>>> arr[arr > 0.5] = np.nan
>>> timeit np.nanmax(arr)
10000 loops, best of 3: 146 us per loop
>>> timeit bn.nanmax(arr)
100000 loops, best of 3: 15.2 us per loop
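The timeit lines above rely on the IPython magic. Outside IPython, a comparable measurement can be sketched with the standard-library timeit module. The helper best_time is a hypothetical name, the Bottleneck import is optional so the script still runs without it, and the numbers will differ by machine and version.

```python
import timeit
import numpy as np

def best_time(func, arr, repeat=3, number=100):
    """Best per-call time in seconds over `repeat` runs of `number` calls each."""
    timer = timeit.Timer(lambda: func(arr))
    return min(timer.repeat(repeat=repeat, number=number)) / number

rng = np.random.default_rng(0)
arr = rng.random((100, 100))
arr[arr > 0.5] = np.nan  # replace roughly half the values with NaN

t_numpy = best_time(np.nanmax, arr)
print(f"np.nanmax: {t_numpy * 1e6:.1f} us per call")

try:
    import bottleneck as bn
except ImportError:
    bn = None  # Bottleneck not installed; skip the comparison

if bn is not None:
    t_bn = best_time(bn.nanmax, arr)
    print(f"bn.nanmax: {t_bn * 1e6:.1f} us per call")
    print(f"speed (NumPy time / Bottleneck time): {t_numpy / t_bn:.1f}")
```

The final ratio is NumPy time divided by Bottleneck time, so values above 1 mean Bottleneck is faster.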

Bottleneck comes with a benchmark suite that compares each Bottleneck function that has a NumPy/SciPy equivalent against that equivalent. To run the benchmark:

>>> bn.benchit(verbose=False)
Bottleneck performance benchmark
    Bottleneck  0.1.0dev
    Numpy       1.5.1
    Scipy       0.8.0
    Speed is numpy (or scipy) time divided by Bottleneck time
    NaN means all NaNs
   Speed   Test                  Shape        dtype    NaN?
   2.4019  median(a, axis=-1)    (500,500)    float64
   2.2668  median(a, axis=-1)    (500,500)    float64  NaN
   4.1235  median(a, axis=-1)    (10000,)     float64
   4.3498  median(a, axis=-1)    (10000,)     float64  NaN
   9.8184  nanmax(a, axis=-1)    (500,500)    float64
   7.9157  nanmax(a, axis=-1)    (500,500)    float64  NaN
   9.2306  nanmax(a, axis=-1)    (10000,)     float64
   8.1635  nanmax(a, axis=-1)    (10000,)     float64  NaN
   6.7218  nanmin(a, axis=-1)    (500,500)    float64
   7.9112  nanmin(a, axis=-1)    (500,500)    float64  NaN
   6.4950  nanmin(a, axis=-1)    (10000,)     float64
   8.0791  nanmin(a, axis=-1)    (10000,)     float64  NaN
  12.3650  nanmean(a, axis=-1)   (500,500)    float64
  42.0738  nanmean(a, axis=-1)   (500,500)    float64  NaN
  12.2769  nanmean(a, axis=-1)   (10000,)     float64
  22.1285  nanmean(a, axis=-1)   (10000,)     float64  NaN
   9.5515  nanstd(a, axis=-1)    (500,500)    float64
  68.9192  nanstd(a, axis=-1)    (500,500)    float64  NaN
   9.2174  nanstd(a, axis=-1)    (10000,)     float64
  26.1753  nanstd(a, axis=-1)    (10000,)     float64  NaN

Faster

Under the hood Bottleneck uses a separate Cython function for each combination of ndim, dtype, and axis. A lot of the overhead in bn.nanmax(), for example, is in checking that the axis is within range, converting non-array data to an array, and selecting the function to use to calculate the maximum.

You can get rid of the overhead by doing all this before you, say, enter an inner loop:

>>> arr = np.random.rand(10,10)
>>> func, a = bn.func.nanmax_selector(arr, axis=0)
>>> func
<built-in function nanmax_2d_float64_axis0>

Let’s see how much faster that runs:

>>> timeit np.nanmax(arr, axis=0)
10000 loops, best of 3: 25.7 us per loop
>>> timeit bn.nanmax(arr, axis=0)
100000 loops, best of 3: 5.25 us per loop
>>> timeit func(a)
100000 loops, best of 3: 2.5 us per loop

Note that func is faster than NumPy’s non-NaN version of max:

>>> timeit arr.max(axis=0)
100000 loops, best of 3: 3.28 us per loop

So adding NaN protection to your inner loops comes at a negative cost!
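The selector pattern described in this section can be sketched in pure Python. Everything below is a stand-in: the two "specialized" functions simply call NumPy, and the lookup table and this nanmax_selector are hypothetical illustrations rather than Bottleneck's own code. The point is only that validation and function lookup happen once, outside the loop.

```python
import numpy as np

# Hypothetical stand-ins for the specialized Cython functions; here they are plain NumPy.
def nanmax_2d_float64_axis0(a):
    return np.nanmax(a, axis=0)

def nanmax_2d_float64_axis1(a):
    return np.nanmax(a, axis=1)

# One entry per supported combination of (ndim, dtype, axis).
NANMAX_TABLE = {
    (2, np.dtype("float64"), 0): nanmax_2d_float64_axis0,
    (2, np.dtype("float64"), 1): nanmax_2d_float64_axis1,
}

def nanmax_selector(arr, axis):
    """Validate the input once and return (specialized function, array)."""
    a = np.asarray(arr)  # convert non-array data to an array
    if axis < 0:
        axis += a.ndim
    if not 0 <= axis < a.ndim:
        raise ValueError(f"axis out of range for a {a.ndim}d array")
    try:
        func = NANMAX_TABLE[(a.ndim, a.dtype, axis)]
    except KeyError:
        raise TypeError(f"unsupported combination: {a.ndim}d, {a.dtype}, axis={axis}") from None
    return func, a

arr = np.random.rand(10, 10)
func, a = nanmax_selector(arr, axis=0)  # pay the checking cost once
for _ in range(1000):                   # the inner loop calls the selected function directly
    result = func(a)
```

A dictionary keyed on (ndim, dtype, axis) keeps the lookup cheap and makes unsupported combinations fail early with a clear error.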

Functions

Bottleneck is in the prototype stage.

Bottleneck contains the following functions:

median
nanmean	move_nanmean	group_nanmean
nanvar
nanstd
nanmin
nanmax

Currently only 1d, 2d, and 3d NumPy arrays with dtype int32, int64, and float64 are supported.
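Given that restriction, it can be useful to check an array before passing it to Bottleneck. The helper below is a hypothetical sketch (is_supported is not a Bottleneck function) that encodes only the limits stated above.

```python
import numpy as np

SUPPORTED_NDIM = (1, 2, 3)
SUPPORTED_DTYPES = (np.dtype("int32"), np.dtype("int64"), np.dtype("float64"))

def is_supported(arr):
    """Return True if `arr` is a 1d, 2d, or 3d array of int32, int64, or float64."""
    a = np.asarray(arr)
    return a.ndim in SUPPORTED_NDIM and a.dtype in SUPPORTED_DTYPES

print(is_supported(np.zeros((2, 3))))                # 2d float64
print(is_supported(np.zeros(3, dtype=np.float32)))   # float32 is not in the list
print(is_supported(np.zeros((2, 2, 2, 2))))          # 4d is not in the list
```

An unsupported array could be converted with astype(np.float64) or handled by the NumPy equivalent instead.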

License

Bottleneck is distributed under a Simplified BSD license. Parts of NumPy, SciPy and numpydoc, all of which have BSD licenses, are included in Bottleneck. See the LICENSE file, which is distributed with Bottleneck, for details.

URLs

download	http://pypi.python.org/pypi/Bottleneck
docs	http://berkeleyanalytics.com/bottleneck
code	http://github.com/kwgoodman/bottleneck
mailing list	http://groups.google.com/group/bottle-neck

Install

Requirements:

Bottleneck	Python, NumPy 1.5.1+, SciPy 0.8.0+
Unit tests	nose
Compile	gcc or MinGW

GNU/Linux, Mac OS X, et al.

To install Bottleneck:

$ python setup.py build
$ sudo python setup.py install

Or, if you wish to specify where Bottleneck is installed, for example inside /usr/local:

$ python setup.py build
$ sudo python setup.py install --prefix=/usr/local

Windows

To compile the C code in Bottleneck you need a Windows version of the gcc compiler. MinGW (Minimalist GNU for Windows) contains gcc and has been used to compile Bottleneck successfully on Windows.

Install MinGW and add it to your system path. Then install Bottleneck with the commands:

python setup.py build --compiler=mingw32
python setup.py install

Post install

After you have installed Bottleneck, run the suite of unit tests:

>>> import bottleneck as bn
>>> bn.test()
<snip>
Ran 10 tests in 13.756s
OK
<nose.result.TextTestResult run=10 errors=0 failures=0>


Release history

1.3.8

Feb 25, 2024

1.3.8rc5 pre-release

Feb 24, 2024

1.3.7

Mar 8, 2023

1.3.7rc1 pre-release

Jan 20, 2023

1.3.6

Jan 20, 2023

1.3.6rc1 pre-release

Jan 19, 2023

1.3.5

Jul 2, 2022

1.3.5rc2 pre-release

Jul 2, 2022

1.3.5rc1 pre-release

Jul 2, 2022

1.3.4

Feb 22, 2022

1.3.3

Feb 22, 2022

1.3.3rc14 pre-release

Feb 22, 2022

1.3.3rc13 pre-release

Feb 21, 2022

1.3.3rc12.post0.dev6 pre-release

Feb 21, 2022

1.3.3rc12.post0.dev5 pre-release

Feb 21, 2022

1.3.3rc12.post0.dev4 pre-release

Feb 21, 2022

1.3.3rc12.post0.dev1 pre-release

Feb 20, 2022

1.3.3rc2 pre-release

Feb 20, 2022

1.3.2

Feb 21, 2020

1.3.1

Nov 19, 2019

1.3.0

Nov 13, 2019

1.3.0rc2 pre-release

Nov 9, 2019

1.3.0rc1 pre-release

Nov 3, 2019

1.2.1

May 15, 2017

1.2.0

Oct 20, 2016

1.1.0

Jun 22, 2016

1.0.0

Feb 6, 2015

0.8.0

Jan 21, 2014

0.7.0

Sep 10, 2013

0.6.0

Jun 4, 2012

0.5.0

Jun 13, 2011

0.4.3

Mar 17, 2011

0.4.2

Mar 8, 2011

0.4.1

Mar 8, 2011

0.4.0

Mar 8, 2011

0.3.0

Jan 19, 2011

0.2.0

Dec 27, 2010

This version

0.1.0

Dec 1, 2010

0.1.0dev pre-release

Nov 28, 2010

Download files


Source Distribution

bottleneck-0.1.0.tar.gz (324.5 kB)

Uploaded Dec 1, 2010 Source

Hashes for bottleneck-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`cf508f0a61daa2111c1eb5f22cd6760b71922c2c027172ccae53784547e5151a`
MD5	`b33647bcfdc4d39740923adf617ca9c0`
BLAKE2b-256	`384682f29b4db836f1571952feb14e1e78a33d2a400a71a3e0b79b7221ed944c`
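A downloaded archive can be checked against the SHA256 digest listed above with the standard-library hashlib module. This is a minimal sketch; the function names are hypothetical and the file path is whatever location you saved the archive to.

```python
import hashlib

# SHA256 digest listed above for bottleneck-0.1.0.tar.gz
EXPECTED_SHA256 = "cf508f0a61daa2111c1eb5f22cd6760b71922c2c027172ccae53784547e5151a"

def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_download(path, expected=EXPECTED_SHA256):
    """Return True if the file at `path` matches the expected digest."""
    return sha256_of_file(path) == expected
```

For example, verify_download("bottleneck-0.1.0.tar.gz") returns True only if the file's digest matches the one listed.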