pebaystats

Descriptive statistics using Pébay's results

Information about repository and package maintenance actions can be found on the Wiki.

Install the package from PyPI using pip:

bash> pip install pebaystats

pebaystats

Provides single-pass generation of statistical moments. This package is based on the formulas described in the document “Formulas for Robust, One-Pass Parallel Computation of Covariances and Arbitrary-Order Statistical Moments”, Philippe Pébay, Sandia National Laboratories.

Read “The Full Manual” for a more detailed description of this package.

The current implementation of this package allows computation of statistical moments for more than one data set (column) at a time. Currently, only the first four moments are computed; the general-purpose algorithm from the source paper is not yet implemented.

This Python implementation evolved from my C++ code, which can also remove (disaggregate) data from the accumulators. That feature will eventually be migrated here.
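For intuition about what "single pass" means here, the following is a minimal sketch of the well-known one-pass update for the mean and the second central moment sum (the n * Var quantity) of a single column. It is not the package's code: the function name update_moments is hypothetical, and only the first two moments are shown.

```python
def update_moments(n, mean, m2, x):
    """Fold one new value x into the running (count, mean, M2) of one column.

    Hypothetical helper for illustration only; not part of pebaystats.
    M2 is the sum of squared deviations from the mean, i.e. n * Var.
    """
    n += 1
    delta = x - mean          # deviation from the old mean
    mean += delta / n         # updated mean
    m2 += delta * (x - mean)  # uses both the old and the new mean
    return n, mean, m2


# Each value is seen exactly once; no second pass over the data is needed.
n, mean, m2 = 0, 0.0, 0.0
for value in [2.0, 6.0, 0.0, 4.0]:
    n, mean, m2 = update_moments(n, mean, m2, value)

print(n, mean, m2)
```

The values 2, 6, 0, 4 are the second column of the stats2 example in the Quick Start, for which the package output lists a mean of 3 and a second moment of 20.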

Quick Start

from __future__ import print_function

Import the aggregation object from the module.

from pebaystats import dstats

Create a few objects with various depths (number of moments) and widths (number of columns to compute statistics for). Here the stats1 and stats3 objects each accumulate two moments for a single column of data, and the stats2 object collects 4 statistical moments for 4 columns of data.

stats1 = dstats(2,1)
stats2 = dstats(4,4)
stats3 = dstats(2,1)

Add individual data values to the single column accumulation of the stats1 object. Print the object to view its state, which includes the moment values so far accumulated. Also, print the list of lists returned from the statistics() method call. Here you can see that the mean is 2.0 and the variance is 0.0.

stats1.add(2)
stats1.add(2)
stats1.add(2)
print('stats1: %s' % stats1)
print('statistics: %s' % stats1.statistics())

stats1: dstats: 2 moments, 1 columns, 3 rows
[[ 2.]
 [ 0.]]
statistics: [[ 2.]
 [ 0.]]

Add entire rows (multiple columns) of values to the stats2 object. View the accumulated results. Note that when the second moment (n * Var) is 0, equivalent to a deviation of 0, the higher moments are left in their initial 0 state. The higher statistics are set to NaN in this case.

stats2.add([1.2,2,3,9])
stats2.add([4.5,6,7,9])
stats2.add([8.9,0,1,9])
stats2.add([2.3,4,5,9])
print('stats2: %s' % stats2)
print('statistics: %s' % stats2.statistics(True))

stats2: dstats: 4 moments, 4 columns, 4 rows
[[  4.22500000e+00   3.00000000e+00   4.00000000e+00   9.00000000e+00]
 [  3.47875000e+01   2.00000000e+01   2.00000000e+01   0.00000000e+00]
 [  6.73818750e+01   7.10542736e-15   7.10542736e-15   0.00000000e+00]
 [  5.75139658e+02   1.64000000e+02   1.64000000e+02   0.00000000e+00]]
statistics: [[  4.22500000e+00   3.00000000e+00   4.00000000e+00   9.00000000e+00]
 [  2.94904646e+00   2.23606798e+00   2.23606798e+00   0.00000000e+00]
 [  6.56807734e-01   1.58882186e-16   1.58882186e-16              nan]
 [ -1.09897921e+00  -1.36000000e+00  -1.36000000e+00              nan]]
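As a cross-check, the statistics above can be recomputed with plain NumPy in the conventional two-pass way. This is a sketch and not how the package computes them. Judging from the printed values, the rows appear to be the mean, the population standard deviation, the skewness and the excess kurtosis; that reading is an assumption based on the output, not on the package documentation.

```python
import numpy as np

# The same four rows that are added to stats2 above.
data = np.array([[1.2, 2, 3, 9],
                 [4.5, 6, 7, 9],
                 [8.9, 0, 1, 9],
                 [2.3, 4, 5, 9]], dtype=float)

n = data.shape[0]
mean = data.mean(axis=0)
dev = data - mean

# Central moment sums, comparable to rows 2-4 of the printed object state.
m2 = (dev ** 2).sum(axis=0)
m3 = (dev ** 3).sum(axis=0)
m4 = (dev ** 4).sum(axis=0)

# The zero-variance column gives 0/0; silence the warning and keep the NaN.
with np.errstate(divide='ignore', invalid='ignore'):
    std = np.sqrt(m2 / n)                   # population standard deviation
    skew = (m3 / n) / std ** 3              # skewness
    kurt = (m4 / n) / (m2 / n) ** 2 - 3.0   # excess kurtosis

print(np.vstack([mean, std, skew, kurt]))
```

The two-pass third moment of the symmetric second and third columns comes out as 0, whereas the one-pass accumulation above shows a rounding residue of about 7e-15. The last column has zero variance, so its skewness and kurtosis are NaN here as well.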

Remove data (UNIMPLEMENTED) from the stats2 object.

# stats2.remove(1.2,2,3,9)

Load the stats3 object with data and view the results.

stats3.add(4)
stats3.add(4)
stats3.add(4)
print('stats3: %s' % stats3)
print('statistics: %s' % stats3.statistics())

stats3: dstats: 2 moments, 1 columns, 3 rows
[[ 4.]
 [ 0.]]
statistics: [[ 4.]
 [ 0.]]

Now aggregate that object onto the first. This only works when both objects have the same shape, that is, the same number of moments and the same number of columns.

stats1.aggregate(stats3)
print('stats1: %s' % stats1)
print('statistics: %s' % stats1.statistics(True))

stats1: dstats: 2 moments, 1 columns, 6 rows
[[ 3.]
 [ 6.]]
statistics: [[ 3.]
 [ 1.]]
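The aggregated values can be checked by hand with the standard pairwise formula for combining two partial results. The sketch below covers only the mean and the second moment of a single column; the function name combine is hypothetical and this is not the package's implementation.

```python
def combine(n_a, mean_a, m2_a, n_b, mean_b, m2_b):
    """Merge two (count, mean, M2) summaries of the same column.

    Hypothetical helper for illustration only; not part of pebaystats.
    """
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    # The cross term accounts for the distance between the two means.
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


# stats1 held three 2s and stats3 held three 4s before the aggregation.
n, mean, m2 = combine(3, 2.0, 0.0, 3, 4.0, 0.0)
print(n, mean, m2, m2 / n)
```

This reproduces the 6 rows, the mean of 3 and the second moment of 6 shown in the output above; dividing the second moment by the row count gives a variance of 1.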

History

0.1 (2016-11-13)

First release on PyPI

0.2 (2016-11-13)

Corrected some setup configuration issues

0.3 (2016-11-14)

Added support and tests for serialization

0.4 (2017-1-4)

Added repr() and str() support
Added exceptions for unsupported methods and unsupported moments
Handle divide by zero on a per column basis
Improved setup processing
Extended testing
- started migrating to factored test dependencies
- test columns with 0 variance
- added SciPy for evaluating expected skew and kurtosis values
- raise exceptions for unsupported moments
Extensive documentation updates
- added Makefile to generate documentation and create README
- removed optional files
- changed to classic theme
- extended content and examples

Release history

0.4 (Jan 5, 2017)
0.3 (Nov 15, 2016)
0.2 (Nov 14, 2016)
0.1 (Nov 14, 2016)

Download files

Source Distribution

pebaystats-0.4.tar.gz (18.8 kB), uploaded Jan 5, 2017

Hashes for pebaystats-0.4.tar.gz

Algorithm	Hash digest
SHA256	`965da3c7514f396d64a970499a1305b7380f60f0a0c78a7ca35e3b833ebd921b`
MD5	`42f8014ef7dc7733886f809c88cb74d4`
BLAKE2b-256	`908d7ce656032ebba382028333e1b3b1a8fb2ab7be452fde348c5a425209e271`
