Skip to main content

Kafka implementation for Open edX event bus.

Project description

edX Event Bus - Kafka

Kafka implementation for Open edX event bus.

PyPI CI Codecov Documentation Supported Python versions License

Overview

This package implements an event bus for Open EdX using Kafka.

The event bus acts as a broker between services publishing events and other services that consume these events. Implementing the event bus will allow for asynchronous messaging across services which greatly improves efficiency as we don’t have to wait for scheduled batch synchronizations. Additionally, since the services all speak to the event bus, they are independent of one another and can still function if one service crashes.

This package contains both the publishing code, which processes events into messages to send to the broker, and the consumer code, which polls the broker using a while True loop in order to turn messages back into event data to be emitted. The actual Kafka host will be configurable.

The goal for event-bus-kafka is to eventually have a flexible event bus that can be easily brought into other apps and repositories to produce and consume arbitrary topics. Ideally, the event bus itself will also be an abstraction behind which platform maintainers can use non-Kafka implementations (Redis, Pulsar, etc.). The documentation/ADRs may also be moved to more appropriate places as the process matures.

The repository works together with the openedx/openedx-events repository to make the fully functional event bus.

For manual testing, see docs/how_tos/manual_testing.rst.

Documentation

  • Main API: edx_event_bus_kafka exposes get_producer and a Producer API class. See https://github.com/openedx/openedx-events/issues/87 for how these will be documented and used in the future.

  • Django management commands: edx_event_bus_kafka.management.commands.* expose Command classes

OEP-52 documentation: https://open-edx-proposals.readthedocs.io/en/latest/architectural-decisions/oep-0052-arch-event-bus-architecture.html (TODO: Set up documentation)

Development Workflow

One Time Setup

# Clone the repository
git clone git@github.com:openedx/event-bus-kafka.git
cd event-bus-kafka

# Set up a virtualenv using virtualenvwrapper with the same name as the repo and activate it
mkvirtualenv -p python3.8 event-bus-kafka

Every time you develop something in this repo

# Activate the virtualenv
workon event-bus-kafka

# Grab the latest code
git checkout main
git pull

# Install/update the dev requirements
make requirements

# Run the tests and quality checks (to verify the status before you make any changes)
make validate

# Make a new branch for your changes
git checkout -b <your_github_username>/<short_description>

# Using your favorite editor, edit the code to make your change.
vim …

# Run your new tests
pytest ./path/to/new/tests

# Run all the tests and quality checks
make validate

# Commit all your changes
git commit …
git push

# Open a PR and ask for review.

Installation

This library is not intended to be a direct dependency of any service. Instead, it will likely be installed by some private mechanism. Unless the platform gains new processes for installing private dependencies, upgrades will need to be manually managed via a pin.

License

The code in this repository is licensed under the AGPL 3.0 unless otherwise noted.

Please see LICENSE.txt for details.

How To Contribute

Contributions are very welcome. Please read How To Contribute for details. Even though they were written with edx-platform in mind, the guidelines should be followed for all Open edX projects.

The pull request description template should be automatically applied if you are creating a pull request from GitHub. Otherwise you can find it at PULL_REQUEST_TEMPLATE.md.

The issue report template should be automatically applied if you are creating an issue on GitHub as well. Otherwise you can find it at ISSUE_TEMPLATE.md.

Reporting Security Issues

Please do not report security issues in public. Please email security@edx.org.

Getting Help

If you’re having trouble, we have discussion forums at https://discuss.openedx.org where you can connect with others in the community.

Our real-time conversations are on Slack. You can request a Slack invitation, then join our community Slack workspace.

For more information about these options, see the Getting Help page.

Change Log

Unreleased

[1.8.1] - 2022-11-10

Changed

  • Commit consumer offset asynchronously

[1.8.0] - 2022-11-09

Added

  • Consumer logs a warning for receivers that fail with an exception

[1.7.0] - 2022-11-04

Changed

  • Manually manage commits instead of using auto-commit on the consumer

  • Catch Exception instead of BaseException on both producer and consumer

[1.6.0] - 2022-11-04

Changed

  • Enhanced error logging in consumer, including telemetry for exceptions

  • Consumer loop will no longer exit when an error is encountered

[1.5.0] - 2022-11-01

Changed

  • Log full event data on all producer errors

[1.4.3] - 2022-10-31

Fixed

  • Upgrade openedx-events and fastavro to bring in a fix for schema creation

[1.4.2] - 2022-10-31

Fixed

  • Removed proof-of-concept code that logged user-login events

[1.4.1] - 2022-10-28

Fixed

  • Correct and clarify management command help strings (some copy-paste errors)

  • Update TODO comments

[1.4.0] - 2022-10-21

Changed

  • Remove override of auto.offset.reset on consumer (which will default to “latest”). New consumer groups will consume only messages that are sent after the group was initialized.

  • Remove redundant lookup of signal in consumer loop (should not have any effect)

  • Explicitly encode message header values as UTF-8 (no change in behavior)

[1.3.0] - 2022-10-20

Changed

  • Upgrade openedx-events. When AvroSignalSerializer gets event schemas, it will get whatever is currently defined in openedx-events, so this will update the COURSE_CATALOG_EVENT_CHANGED schema (dropping effort field)

[1.2.0] - 2022-10-13

Changed

  • EVENT_BUS_KAFKA_CONSUMERS_ENABLED now defaults to True instead of False

  • Removed manual monitoring since New Relic tracks these now.

[1.1.0] - 2022-10-06

Changed

  • Added monitoring for consumption tasks.

[1.0.0] - 2022-10-03

Changed

  • Fixed bug in schema registry that was sending schemas to the wrong topic

  • Bump version to 1.x to acknowledge that this is in use in production

[0.7.0] - 2022-09-08

Changed

  • Breaking changes EventProducerKafka is now KafkaEventProducer

  • KafkaEventConsumer is now part of the public API

[0.6.2] - 2022-09-08

Added

  • Topic names can be autoprefixed by setting EVENT_BUS_TOPIC_PREFIX

[0.6.1] - 2022-09-06

Added

  • Producer now polls on an interval, improving callback reliability. Configurable with EVENT_BUS_KAFKA_POLL_INTERVAL_SEC.

[0.6.0] - 2022-09-01

Changed

  • Breaking change: Public API is now defined in edx_event_bus_kafka package and edx_event_bus_kafka.management.commands package; all other modules should be considered unstable and not for external use.

[0.5.1] - 2022-08-31

Fixed

  • Various lint issues (and missing __init__.py files.)

[0.5.0] - 2022-08-31

Changed

  • Breaking changes in the producer module, refactored to expose a better API:

    • Rather than send_to_event_bus(...), relying code should now call get_producer().send(...).

    • The sync kwarg is gone; to flush and sync messages before shutdown, call get_producer().prepare_for_shutdown() instead.

  • Clarify that config module is for internal use only.

  • Implementation changes: Only a single Producer is created, and is used for all signals.

[0.4.4] - 2022-08-26

Fixed

  • Fixed bug in test module for when confluent-kafka isn’t present

[0.4.3] - 2022-08-24

Fixed

[0.4.2] - 2022-08-24

Fixed

  • Properly load auth settings for producer/consumer. (Auth settings were ignored since 0.3.1.)

[0.4.1] - 2022-08-18

Changed

  • Remove confluent-kafka as a formal dependency of the repository.

    • Note: This library will not work without confluent-kafka.

  • Add an ADR to explain why this work was done.

[0.4.0] - 2022-08-15

Changed

  • Rename settings to have consistent prefix.

    • KAFKA_CONSUMERS_ENABLED becomes EVENT_BUS_KAFKA_CONSUMERS_ENABLED

    • CONSUMER_POLL_TIMEOUT becomes EVENT_BUS_KAFKA_CONSUMER_POLL_TIMEOUT

    • Updates to documentation and tests for various settings previously renamed

[0.3.1] - 2022-08-11

Changed

  • Refactored consumer to use common configuration.

[0.3.0] - 2022-08-10

Changed

  • Moved configuration onto separate file.

  • Updated configuration settings to have EVENT_BUS_KAFKA prefix.

[0.2.0] - 2022-08-09

Fixed

  • Cache producers so that they don’t lose data.

[0.1.0] - 2022-06-16

Added

  • First release on PyPI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

edx_event_bus_kafka-1.8.1.tar.gz (44.0 kB view hashes)

Uploaded Source

Built Distribution

edx_event_bus_kafka-1.8.1-py2.py3-none-any.whl (42.4 kB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page