skip to navigation
skip to content

Not Logged In

madoka 0.4

Memory-efficient CountMin Sketch key-value structure (based on Madoka C++ library)

Latest Version: 0.6

madoka

Madoka is an implementation of a Count-Min sketch data structure for summarizing data streams.

String-int pairs in a Madoka-Sketch may take less memory than in a standard Python dict.

Based on madoka C++ library.

NOTE: Madoka-Sketch does not have index of keys. so Madoka-Sketch can not dump all keys such as Python dict’s dict.keys().

Installation

$ pip install madoka

Usage

Create a new sketch

>>> import madoka
>>> sketch = madoka.Sketch()
  • Sketch madoka.Sketch([width=0, max_value=0, path=”, flags=0, seed=0])
    • madoka.Sketch() calls madoka.Sketch.create(), so you don’t have to explicitly call create()

Increment a key value

>>> sketch['mami'] += 1

or

>>> sketch.inc('mami')
  • int inc(key[, key_length=0])
    • Note that key_length is automatically determined when not giving key_length. Thus, the order of parameters differs from original madoka C++ library.

Add a value to the current key value

>>> sketch['mami'] += 6

or

>>> sketch.add('mami', 6)
  • int add(key, value[, key_length=0])
    • Note that key_length is automatically determined when not giving key_length. Thus, the order of parameters differs from original madoka C++ library.

Update a key value

>>> sketch['mami'] = 6

or

>>> sketch.set('mami', 6)
  • void set(key, value[, key_length=0])
    • Note that set() does nothing when the given value is not greater than the current key value.
    • Also note that the new value is saturated when the given value is greater than the upper limit.
    • Additionally note that key_length is automatically determined when not giving key_length. Thus, the order of parameters differs from original madoka C++ library.

Get a key value

>>> sketch['mami']

or

>>> sketch.get('mami')
  • int get(key[, key_length=0])
    • Note that key_length is automatically determined when not giving key_length. Thus, the order of parameters differs from original madoka C++ library.

Get all values

>>> sketch.values()
  • generator<int> values()
    • Note that processing time increases according to sketch’s width. If you feel slow, I recommend setting width to less than 1000000 when creating sketch.

Save a sketch to a file

>>> sketch.save('example.madoka')
  • void save(filename)

Load a sketch from a file

>>> sketch.load('example.madoka')
  • void load(filename)

Clear a sketch

>>> sketch.clear()
  • void clear()
    • Delete all key-value pairs. It differs from create() in maintaining current settings.

Initialize a sketch with settings change

>>> sketch.create()
  • void create([width=0, max_value=0, path=NULL, flags=0, seed=0])

Copy a sketch

>>> sketch.copy(othersketch)
  • void copy(Sketch)

Merge two sketches

>>> sketch += other_sketch

or

>>> sketch.merge(othersketch)
  • void merge(Sketch[, lhs_filter=None, rhs_filter=None])
    • lhs_filter is applied for self.sketch, rhs_filter is applied for given sketch

Shrink a sketch

>>> sketch.shrink(sketch, width=1000)
  • void shrink(Sketch[, width=0, max_value=0, filter=None, path=None, flags=0])
    • When width > 0, width must be less than source sketch

Get summed sketch

>>> summed_sketch = sketch + other_sketch
  • It does not break original sketches

Check whether sketch contains key value

>>> 'mami' in sketch

Get inner product of two sketches

>>> sketch.inner_product(other_sketch)
  • float inner_product(Sketch)

Apply filter into all values

>>> sketch.filter(lambda x: x + 1)
  • void filter(Callable[, only_nonzero=False])
    • Note that processing time increases according to sketch’s width. If you feel this method is slow, I recommend setting width to less than 1000000 when creating sketch

Contributions are welcome!

License

  • Wrapper code is licensed under New BSD License.
  • Bundled madoka C++ library is licensed under the Simplified BSD License.

CHANGES

0.4 (2014-03-30)

  • Implement dict-like interface (e.g., sketch[‘key’])
  • Add filter() method
  • Add values() method for dumping all values

0.3 (2014-03-14)

  • Key length is automatically determined when it is not given
  • Remove filter function
  • Slightly decreasing amount of memory usage

0.2 (2013-10-12)

Simplify the step of creating new sketch.

0.1 (2013-10-11)

Initial release.

 
File Type Py Version Uploaded on Size
madoka-0.4.tar.gz (md5) Source 2014-03-30 59KB
  • Downloads (All Versions):
  • 0 downloads in the last day
  • 77 downloads in the last week
  • 379 downloads in the last month