

Project description

Overview


Lossy compression with controlled error tolerance for smooth data series


Quick Start Guide

limesqueezer uses NumPy ndarrays for input and output. The package import name is limesqueezer. The author recommends the abbreviation ls; the rest of the documentation uses this abbreviation.

    import numpy as np
    import limesqueezer as ls

For example, let's make some mock data: an array of x values (x_data) and an array of dependent values (y_data). You can imagine that these have come e.g. from some simulation or from measurements.

    x_data = np.linspace(0, 1, int(1e4))
    y_data = np.sin(24 * x_data ** 2)

These can be compressed simply with an absolute tolerance of e.g. 0.05:

    tolerance = 0.05
    x_compressed, y_compressed = ls.compress(x_data, y_data, tolerances = tolerance)

Or maybe you have some generator-like source that yields values one at a time, e.g. a simulation step. For the sake of example, here is a mock generator:

    x0, y0 = x_data[0], y_data[0]
    generator = zip(x_data[1:], y_data[1:])

Here you use the context manager Stream. It needs to be initialised with the first values of the series; here I am just going to use the first values from the mock x and y data.

    with ls.Stream(x0, y0, tolerances = tolerance) as record:
        for x_value, y_value in generator:
            record(x_value, y_value)

Compressed values can then be accessed from the record:

    x_compressed, y_compressed = record.x, record.y

These can then be decompressed into a spline interpolation function.

    function = ls.decompress(x_compressed, y_compressed)
    y_decompressed = function(x_data)

With this function, e.g. residuals can be computed and checked against the tolerance:

    residuals = y_decompressed - y_data
    maximum_error = np.amax(np.abs(residuals))
    print(f'Maximum error should be ~= {tolerance}: {maximum_error:.5f}')

and when plotting the output with matplotlib you should see the following:

    from matplotlib import pyplot as plt

    fig, axs = plt.subplots(2, 1, sharex=True)

    # Data and compressed
    axs[0].plot(x_data, y_data, label='Original')
    axs[0].plot(x_compressed, y_compressed, '-o', label='Compressed')
    axs[0].legend()

    # Residuals to tolerance
    axs[1].plot(x_data, residuals, label='Residuals')
    axs[1].axhline(tolerance, label='Total tolerance', color='red')
    axs[1].axhline(-tolerance, color='red')
    axs[1].legend()

    fig.tight_layout()
    plt.show()

Quick start output

User Guide

VERY MUCH WORK IN PROGRESS

Compression

Parameters

Tolerances

Keyword tolerances

Tolerances consist of an Absolute Tolerance, a Relative Tolerance, and a Falloff to smooth between them.

Absolute tolerance only is the default.

Absolute tolerance only

The keyword tolerances takes one to three values. Falloff determines how much the absolute error tolerance is reduced as the absolute y value grows. If 3 values are given, they are interpreted as (relative, absolute, falloff); if 2 values, as (relative, absolute, 0); if 1 value, as (0, absolute, 0).

The allowed deviation is calculated with the following function:

$$ deviation = Relative \cdot |Y_{data}| + \frac{Absolute}{Falloff \cdot |Y_{data}| + 1} $$

$$ D_{|Y|}^1 deviation = Relative - \frac{Absolute \cdot Falloff}{(Falloff \cdot |Y_{data}| + 1)^2} $$

For the constraint

$$ D_{|Y|}^1 deviation(Y = 0) \geq 0 $$

to hold, the tolerance parameters must satisfy

$$ Relative \geq Absolute \cdot Falloff $$

Relative tolerance only

Relative and absolute tolerance without falloff

Smooth falloff is at $$ Falloff = \frac{Relative}{Absolute} $$

Relative and absolute tolerance with smooth falloff

If you go over the smooth falloff limit, you make the tolerance function non-monotonic, so it first decreases as the absolute y value increases and then starts to increase.

Relative tolerance with too much falloff

Recommended

errorfunction

You have some data from a system of equations. For this example, let's make 10 000 datapoints along some function:

    input_x = np.linspace(0, 1, int(1e4))
    input_y = np.sin(24 * input_x ** 2)

Example of the data, compression output, and residuals

Or maybe you have some generator-like source that yields values one at a time, e.g. a simulation step. Here you use the context manager Stream, initialised with the first values of the series; here I am just going to use the first values from the example data.

Errorfunctions

The default is MaxAbs: the maximum of the absolute residuals.

$$ \max(|residual| - tolerance) $$

MaxAbs

$$ \max(mean(|residuals|)) $$

MaxMAbs

$$ \max(\max(mean(|residuals| - tolerance)), (|residuals| - tolerance)[-1]) $$

MaxMAbs_AbsEnd

Here the residuals are squared, i.e. $residuals^2$ is used.

$$ \max(mean(residuals^2 - tolerance)) $$

MaxMS

$$ \max(\max(mean(residuals^2 - tolerance)), (residuals^2 - tolerance)[-1]) $$

MaxMS_SEnd

Stream

Context manager and a class.

  • Data is fed one point at the time.
  • Context manager is used to ensure proper finishing of the compression process.
    with ls.Stream(example_x0, example_y0, tolerances = 1e-3) as record:
        for example_x_value, example_y_value in generator:
            record(example_x_value, example_y_value)

Using record.x or record.y inside the with block raises an AttributeError, as those attributes are generated only when the record is closed.

If you want to access the data fed to the record, you can use

    x_compressed, y_compressed = record.xc, record.yc

to access the already compressed data and

    x_buffered, y_buffered = record.xb, record.yb

to access the buffered data that is waiting for more values, or for the record to be closed, before being compressed.

Once the record has been closed, the compressed output and the state of the record can be examined:

    output_x, output_y = record.x, record.y
    print(record.state)
    print(record)

A side note: in English the word 'record' can be either a verb or a noun, and since the object performs this double role of both taking in data and being storage for the data, it is a fitting name.

Decompression

Decompression is done in two main steps using interpolation: first an interpolation function is created, then that function is called.

This two-step approach allows more flexible use of the data.


Combining compression methods

This compression method can be combined with lossless compression to achieve even higher compression ratios. The lossless compression should be applied only after the lossy compression this package provides.

Meta

Version Numbering

Version code is composed of three numbers: Major, Minor, Micro

Experimental, alpha, or beta versions are indicated by a 0 in one of those three positions.

The first public release starts the Major Version. Incrementing the Major Version indicates a backwards-compatibility-breaking change in the API or functionality.

The Minor Version indicates the design.

While the Minor Version is 0, the package is in the alpha stage. That means features and the API may still change.

Later incrementation of the Minor Version signifies upgrades to the features and interfaces. In general, changes here mean changes in the design and specification, but not such that they break backwards compatibility, i.e. code that works with documented features of X.n.x will work with X.n+1.y.

The Micro Version indicates the implementation: bugfixes, typo corrections, and documentation clarifications. In a Micro Version incrementation the intention, i.e. the intended documented specification of the Minor Version, is not changed, only the implementation.

Changelog

1.0.12 2022-07-16

  • Changed README to Markdown-only
  • Updated documentation
  • Some refactoring
  • Fixed type hints to pass MyPy type checking

1.0.11 2022-07-16

  • Debug plotting improvements
  • Added undocumented API for other fitting functions
  • More tests
  • Profiling and benchmarking from tests

1.0.10 2022-05-08

  • Cleaned documentation

1.0.9 2022-04-03

  • Block and stream compression are much more uniform
  • Restructuring
  • Tests
  • Profiling

1.0.8 2022-03-20

  • Step-by-step style plotting of the compression.

1.0.7 2021-12-07

1.0.6 2021-12-02

1.0.5 2021-12-02

1.0.4 2021-12-01

1.0.3 2021-11-30

  • First release on PyPI.
