Skip to main content

Streamline your Kafka data processing, this tool aims to standardize streaming data from multiple Kafka clusters. With a pub-sub approach, multiple functions can easily subscribe to incoming messages, serialization can be specified per topic, and data is automatically processed by data sink functions.

Project description

Snapstream

A tiny data-flow model with a user-friendly interface that provides sensible defaults for Kafka integration, message serialization/deserialization, and data caching.

In response to a challenge of performing a "merge-as-of", "nearby join", or "merge by key distance" operation on multiple kafka streams while reading topics from separate kafka clusters, this package was born.

No actual stream or external database is required, cached data is persisted to disk using rocksdb, applications using snapstream are more inclined to be; self-contained, easy to extend, less complex, easy to test using regular iterables:

Installation

pip install snapstream

Usage

In the example below, snap decorates the handle function, binding the iterable range(5) to it:

from snapstream import snap, stream

r = range(5)

@snap(r, sink=[print])
    def handler(msg):
        return f'Hello {msg}'

stream()
Hello 0
Hello 1
Hello 2
Hello 3
Hello 4

Features

  • snapstream.Topic: an iterable/callable to consume and produce (default: confluent-kafka)
  • snapstream.Cache: a callable/dict to persist data (default: rocksdict)
  • snapstream.Conf: a singleton object, can be used to store common kafka configurations
  • snapstream.snap: a function to bind streams (iterables) and sinks (callables) to user defined handler functions
  • snapstream.stream: a function to start the streams

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

snapstream-0.0.0.dev3.tar.gz (6.9 kB view hashes)

Uploaded Source

Built Distribution

snapstream-0.0.0.dev3-py3-none-any.whl (8.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page