
Python daemon that munches on logs and sends their contents to Logstash

Requirements

  • Python 2.7 (untested on other versions)

  • Optional zeromq support: install libzmq (brew install zmq or apt-get install libzmq-dev) and pyzmq (pip install pyzmq==2.1.11)

Installation

Using pip:

From GitHub:

pip install git+git://github.com/josegonzalez/beaver.git#egg=beaver

From PyPI:

pip install beaver==11

Usage

usage:

beaver [-h] [-m {bind,connect}] [-p PATH] [-f FILES [FILES ...]]
          [-t {rabbitmq,redis,stdout,zmq,udp}] [-c CONFIG] [-d DEBUG] [--fqdn]

optional arguments:

-h, --help            show this help message and exit
-m {bind,connect}, --mode {bind,connect}
                    bind or connect mode
-p PATH, --path PATH  path to log files
-f FILES [FILES ...], --files FILES [FILES ...]
                    space-separated filelist to watch, can include globs
                    (*.log). Overrides --path argument
-t {rabbitmq,redis,stdout,zmq,udp}, --transport {rabbitmq,redis,stdout,zmq,udp}
                    log transport method
-c CONFIG, --configfile CONFIG
                    ini config file path
-d DEBUG, --debug DEBUG
                    enable debug mode
--fqdn
                    use the machine's FQDN for source_host
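
The `--fqdn` flag only changes which hostname ends up in `source_host`. Below is a minimal sketch of that choice using the standard library's `socket.gethostname()` and `socket.getfqdn()`. `resolve_source_host` is a hypothetical helper name, not Beaver's own function.

```python
import socket


def resolve_source_host(use_fqdn=False):
    """Return the hostname to report as source_host (hypothetical helper).

    gethostname() may or may not include the domain part, depending on how
    the machine is configured; getfqdn() asks the resolver for the fully
    qualified name, which is what --fqdn opts in to.
    """
    if use_fqdn:
        return socket.getfqdn()
    return socket.gethostname()


if __name__ == "__main__":
    print("default:", resolve_source_host())
    print("--fqdn: ", resolve_source_host(use_fqdn=True))
```

Making this opt-in keeps the default unchanged for setups that already rely on the short hostname.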

Background

Beaver provides a lightweight method for shipping local log files to Logstash. It does this using redis, RabbitMQ, zeromq, UDP, or stdout as the transport. For every transport except stdout, this means you’ll need a matching input (redis, amqp, zeromq, or udp) somewhere down the road to receive the events.

Events are sent in logstash’s json_event format. Options can also be set as environment variables.
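
For orientation, here is a sketch of what a single event might look like. It makes two assumptions:

- The event follows the pre-1.2 Logstash `json_event` schema (`@source`, `@type`, `@tags`, `@fields`, `@timestamp`, `@source_host`, `@source_path`, `@message`).
- The timestamp is UTC ISO 8601 with a `Z` suffix, as the changelog describes.

`build_event`, the `file://host/path` form of `@source`, and the `"file"` default type are illustrative assumptions, not Beaver's actual code. The code is written for Python 3.

```python
import datetime
import json


def build_event(line, path, host, event_type="file", tags=None, fields=None,
                now=None):
    """Build one json_event-style dict for a single log line (hypothetical)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    return {
        "@source": "file://{0}{1}".format(host, path),  # assumed format
        "@type": event_type,
        "@tags": tags or [],
        "@fields": fields or {},
        # UTC, ISO 8601, millisecond precision, "Z" suffix
        "@timestamp": now.strftime("%Y-%m-%dT%H:%M:%S.")
                      + "{0:03d}Z".format(now.microsecond // 1000),
        "@source_host": host,
        "@source_path": path,
        "@message": line.rstrip("\r\n"),  # drop the trailing newline only
    }


if __name__ == "__main__":
    event = build_event("GET /index.html 200\n", "/var/log/nginx/access.log",
                        "web01",
                        now=datetime.datetime(2012, 12, 16, 8, 30, 0, 123456))
    print(json.dumps(event, sort_keys=True))
```

Passing `now` explicitly makes the output deterministic. The example event gets the timestamp `2012-12-16T08:30:00.123Z`.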

NOTE: the redis transport uses a namespace of logstash:beaver by default. You will need to update your logstash indexer to match this.
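
The namespace is simply the Redis list key that events are appended to. The indexer's `key` must therefore equal it; compare `REDIS_NAMESPACE` with `key => 'app:unmappable'` in Example 8. The changelog notes that Beaver switched from LPUSH to RPUSH so that Logstash reads events in order.

Below is a sketch of the shipping side. It assumes a redis-py-style client exposing `rpush(key, value)` and a consumer that pops from the head of the list. `RecordingClient` is a hypothetical in-memory stand-in so the example runs without a Redis server.

```python
import json
import os

DEFAULT_NAMESPACE = "logstash:beaver"  # default key named in the note above


def redis_namespace(environ=None):
    """Return the list key to push to: REDIS_NAMESPACE or the default."""
    environ = os.environ if environ is None else environ
    return environ.get("REDIS_NAMESPACE", DEFAULT_NAMESPACE)


def ship(client, events, namespace):
    """Append each serialized event to the tail of the list with RPUSH.

    A consumer that pops from the head then sees lines in original order.
    """
    for event in events:
        client.rpush(namespace, json.dumps(event))


class RecordingClient(object):
    """Hypothetical stand-in for a redis-py client; stores lists in memory."""

    def __init__(self):
        self.lists = {}

    def rpush(self, key, value):
        self.lists.setdefault(key, []).append(value)
        return len(self.lists[key])
```

With redis-py installed, `redis.StrictRedis.from_url(os.environ["REDIS_URL"])` returns a client with the same `rpush` method.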

Examples

Example 1: Watch all files in the default path of /var/log and print their events to standard out as JSON:

beaver

Example 2: Watch all files in the default path of /var/log and print their events to standard out as msgpack:

BEAVER_FORMAT='msgpack' beaver

Example 3: Watch all files in the default path of /var/log and print their events to standard out as plain strings:

BEAVER_FORMAT='string' beaver
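
`BEAVER_FORMAT` selects the serializer. Below is a rough sketch of that dispatch under three assumptions:

- `json` is the default.
- `string` emits only the message text.
- msgpack is an optional third-party dependency, used through its `msgpack.packb` function.

`get_formatter` is a hypothetical name, and Beaver's real string format may include more fields.

```python
import json
import os


def get_formatter(name=None):
    """Map a BEAVER_FORMAT value to a callable serializing one event dict."""
    name = name or os.environ.get("BEAVER_FORMAT", "json")
    if name == "json":
        return json.dumps
    if name == "string":
        # assumption: the string format is just the raw log line
        return lambda event: event["@message"]
    if name == "msgpack":
        import msgpack  # optional dependency, imported only when requested
        return msgpack.packb
    raise ValueError("unknown BEAVER_FORMAT: {0!r}".format(name))
```

Importing msgpack lazily keeps it optional: users of the json and string formats never need it installed.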

Example 4: Send logs from files in /var/log to a redis list:

REDIS_URL='redis://localhost:6379/0' beaver -t redis

Example 5: Use environment variables to send logs from /var/log files to a redis list:

REDIS_URL='redis://localhost:6379/0' BEAVER_PATH='/var/log' BEAVER_TRANSPORT=redis beaver
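
Example 5 works because command-line options fall back to environment variables. One way to express that with `argparse` is to read the environment when setting each option's default:

- The variable names are the ones used in these examples.
- The `/var/log` path and `stdout` transport defaults follow Examples 1 to 3.
- `build_parser` is a hypothetical helper covering only two options.

```python
import argparse
import os

TRANSPORTS = ["rabbitmq", "redis", "stdout", "zmq", "udp"]


def build_parser(environ=None):
    """Build a parser whose defaults come from BEAVER_* variables (sketch)."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="beaver")
    parser.add_argument("-p", "--path",
                        default=environ.get("BEAVER_PATH", "/var/log"),
                        help="path to log files")
    # note: argparse does not check a default against `choices`, so an
    # invalid BEAVER_TRANSPORT value would need separate validation
    parser.add_argument("-t", "--transport", choices=TRANSPORTS,
                        default=environ.get("BEAVER_TRANSPORT", "stdout"),
                        help="log transport method")
    return parser
```

argparse only uses a default when the flag is absent, so an explicit `-t udp` still overrides `BEAVER_TRANSPORT`.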

Example 6: Zeromq listening on port 5556 (all interfaces):

ZEROMQ_ADDRESS='tcp://*:5556' beaver -m bind -t zmq

# logstash config:
input {
  zeromq {
    type => 'shipper-input'
    mode => 'client'
    topology => 'pushpull'
    address => 'tcp://shipperhost:5556'
  }
}
output { stdout { debug => true } }

Example 7: Zeromq connecting to remote port 5556 on indexer:

ZEROMQ_ADDRESS='tcp://indexer:5556' beaver -m connect -t zmq

# logstash config:
input {
  zeromq {
    type => 'shipper-input'
    mode => 'server'
    topology => 'pushpull'
    address => 'tcp://*:5556'
  }
}
output { stdout { debug => true } }

Example 8: Real-world usage of Redis as a transport:

# in /etc/hosts
192.168.0.10 redis-internal

# From the commandline
REDIS_NAMESPACE='app:unmappable' REDIS_URL='redis://redis-internal:6379/0' beaver -f /var/log/unmappable.log -t redis

# logstash indexer config:
input {
  redis {
    host => 'redis-internal'
    data_type => 'list'
    key => 'app:unmappable'
    type => 'app:unmappable'
  }
}
output { stdout { debug => true } }

As you can see, beaver is pretty flexible as to how you can use/abuse it in production.

Example 9: RabbitMQ connecting to defaults on remote broker:

# From the commandline
RABBITMQ_HOST='10.0.0.1' beaver -t rabbitmq

# logstash config:
input {
  amqp {
    name => 'logstash-queue'
    type => 'direct'
    host => '10.0.0.1'
    exchange => 'logstash-exchange'
    key => 'logstash-key'
    exclusive => false
    durable => false
    auto_delete => false
  }
}
output { stdout { debug => true } }

Example 10: Read config from config.ini and print events to stdout:

# From the commandline
beaver -c config.ini -t stdout

# config.ini content:
[/tmp/somefile]
type: mytype
tags: tag1,tag2
add_field: fieldname1,fieldvalue1[,fieldname2,fieldvalue2, ...]

[/var/log/*log]
type: syslog
tags: sys

[/var/log/{secure,messages}.log]
type: syslog
tags: sys

Example 11: UDP transport:

# From the commandline
UDP_HOST='127.0.0.1' UDP_PORT='9999' beaver -t udp

# logstash config:
input {
  udp {
    type => 'shipper-input'
    host => '127.0.0.1'
    port => '9999'
  }
}
output { stdout { debug => true } }
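
Over UDP, each event can travel as one datagram. Below is a minimal sketch of the sending side with the standard `socket` module. It assumes one JSON-encoded event per datagram, sent to `UDP_HOST`/`UDP_PORT`. `send_udp` is a hypothetical helper, and the receiver in the `__main__` block only stands in for the Logstash `udp` input so the example runs locally.

```python
import json
import socket


def send_udp(event, host, port):
    """Serialize one event as JSON and send it as a single datagram."""
    payload = json.dumps(event).encode("utf-8")
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload, (host, port))
    finally:
        sock.close()


if __name__ == "__main__":
    # Local stand-in for the Logstash udp input, on an ephemeral port.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(5)
    send_udp({"@message": "hello"}, "127.0.0.1", receiver.getsockname()[1])
    data, _ = receiver.recvfrom(65535)
    print(json.loads(data.decode("utf-8")))
    receiver.close()
```

UDP gives no delivery guarantee, and a single datagram is capped at roughly 64 KiB. Very long log lines or a lossy network can therefore drop events silently.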

Todo

  • Use python threading + subprocess in order to support usage of yield across all operating systems

  • Fix usage on non-Linux platforms: file.readline() does not work as expected on OS X. See the previous item for a potential solution

  • More transports

  • Ability to specify files, tags, and other metadata within a configuration file (done)

Caveats

When using copytruncate style log rotation, two race conditions can occur:

  1. Any log data written prior to truncation which beaver has not yet read and processed is lost. Nothing we can do about that.

  2. Should the file be truncated, rewritten, and end up being larger than the original file during the sleep interval, beaver won’t detect this. After some experimentation, this behavior also exists in GNU tail, so I’m going to call this a “don’t do that then” bug :)

    Additionally, the files beaver will most likely be called upon to watch which may be truncated are generally going to be large enough and slow-filling enough that this won’t crop up in the wild.
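
The truncation check boils down to comparing the file's current size with the offset already read. Below is a sketch of that idea with `os.stat`, assuming a polling loop that remembers one offset per file. `TailState` is a hypothetical name. The second race condition is visible in the code: if the file regrows past the saved offset between polls, `size < self.offset` never becomes true.

```python
import os


class TailState(object):
    """Track how far into one file we have read (hypothetical sketch)."""

    def __init__(self, path):
        self.path = path
        self.offset = 0

    def read_new_lines(self):
        """Return complete lines appended since the last call.

        If the file is now smaller than our offset, it was truncated
        (e.g. by copytruncate), so start again from the beginning.
        """
        size = os.stat(self.path).st_size
        if size < self.offset:
            self.offset = 0  # truncation detected: reset the watch
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            data = handle.read()
        # consume only up to the last newline; a partial line waits
        end = data.rfind(b"\n") + 1
        self.offset += end
        return [line.decode("utf-8", "replace")
                for line in data[:end].splitlines()]
```

Any lines written between the last poll and the truncation are never seen, which is the first race condition above.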

Credits

Based on work from Giampaolo and Lusis:

Real time log files watcher supporting log rotation.

Original Author: Giampaolo Rodola' <g.rodola [AT] gmail [DOT] com>
http://code.activestate.com/recipes/577968-log-watcher-tail-f-log/

License: MIT

Other hacks (ZMQ, JSON, optparse, ...): lusis

Changelog

11 (2012-12-16)

  • Add optional support for socket.getfqdn. [Jeremy Kitchen]

    For my setup I need to have the fqdn used at all times since my hostnames are the same but the environment (among other things) is found in the rest of the FQDN.

    Since just changing socket.gethostname to socket.getfqdn has lots of potential for breakage, and socket.gethostname doesn’t always return an FQDN, it’s now an option to explicitly always use the fqdn.

    Fixes #68

  • Check for log file truncation. Fixes #55. [Jeremy Kitchen]

    This adds a simple check for log file truncation and resets the watch when detected.

    There are two race conditions here:

    1. Any log data written prior to truncation which beaver has not yet read and processed is lost. Nothing we can do about that.

    2. Should the file be truncated, rewritten, and end up being larger than the original file during the sleep interval, beaver won’t detect this. After some experimentation, this behavior also exists in GNU tail, so I’m going to call this a “don’t do that then” bug :)

    Additionally, the files beaver will most likely be called upon to watch which may be truncated are generally going to be large enough and slow filling enough that this won’t crop up in the wild.

  • Add a version number to beaver. [Jose Diaz-Gonzalez]

10 (2012-12-15)

  • Fixed package name. [Jose Diaz-Gonzalez]

  • Regenerate CHANGES.rst on release. [Jose Diaz-Gonzalez]

  • Adding support for /path/{foo,bar}.log. [Josh Braegger]

  • Ignore file errors in unwatch method; the file might not exist. [Josh Braegger]

  • Unwatch file when encountering a stale NFS handle. When an NFS file handle became stale (i.e., the file was removed), beaver crashed; now the file is simply unwatched. [Josh Braegger]

  • Consistency. [Chris Faulkner]

  • Pull install requirements from requirements/base.txt so they don’t get out of sync. [Chris Faulkner]

  • Include changelog in setup. [Chris Faulkner]

  • Convert changelog to RST. [Chris Faulkner]

  • Actually show the license. [Chris Faulkner]

  • Consistent casing. [Chris Faulkner]

  • Consistency. [Chris Faulkner]

  • Stating the obvious. [Chris Faulkner]

  • Grist for the mill. [Chris Faulkner]

  • Drop redundant README.txt. [Chris Faulkner]

  • Don’t use empty string for tag when no tags configured in config file. [Stylianos Modes]

  • Making ‘mode’ option work for zmqtransport. Adding setuptools and tests (use ./setup.py nosetests). Adding .gitignore. [Josh Braegger]

9 (2012-11-28)

  • More release changes. [Jose Diaz-Gonzalez]

  • Fixed deprecated warning when declaring exchange type. [Rafael Fonseca]

7 (2012-11-28)

  • Added a helper script for creating releases. [Jose Diaz-Gonzalez]

  • Partial fix for crashes caused by globbed files. [Jose Diaz-Gonzalez]

  • Removed deprecated usage of e.message. [Rafael Fonseca]

  • Fixed exception trapping code. [Rafael Fonseca]

  • Added some resiliency code to rabbitmq transport. [Rafael Fonseca]

6 (2012-11-26)

  • Fix issue where polling for files was done incorrectly. [Jose Diaz-Gonzalez]

  • Added ubuntu init.d example config. [Jose Diaz-Gonzalez]

5 (2012-11-26)

  • Try to poll for files on startup instead of throwing exceptions. Closes #45. [Jose Diaz-Gonzalez]

  • Added python 2.6 to classifiers. [Jose Diaz-Gonzalez]

4 (2012-11-26)

  • Remove unused local vars. [Jose Diaz-Gonzalez]

  • Allow rabbitmq exchange type and durability to be configured. [Jose Diaz-Gonzalez]

  • Remove unused import. [Jose Diaz-Gonzalez]

  • Formatted code to fix PEP8 violations. [Jose Diaz-Gonzalez]

  • Use alternate dict syntax for Python 2.6 support. Closes #43. [Jose Diaz-Gonzalez]

  • Fixed release date for version 3. [Jose Diaz-Gonzalez]

3 (2012-11-25)

  • Added requirements files to manifest. [Jose Diaz-Gonzalez]

  • Include all contrib files in release. [Jose Diaz-Gonzalez]

  • Revert “removed redundant README.txt” to follow pypi standards. [Jose Diaz-Gonzalez]

    This reverts commit e667f63706e0af8bc82c0eac6eac43318144e107.

  • Added bash startup script. Closes #35. [Jose Diaz-Gonzalez]

  • Added an example supervisor config for redis. Closes #34. [Jose Diaz-Gonzalez]

  • Removed redundant README.txt. [Jose Diaz-Gonzalez]

  • Added classifiers to package. [Jose Diaz-Gonzalez]

  • Re-order workers. [Jose Diaz-Gonzalez]

  • Re-require pika. [Jose Diaz-Gonzalez]

  • Make zeromq installation optional. [Morgan Delagrange]

  • Formatting. [Jose Diaz-Gonzalez]

  • Added changes to changelog for version 3. [Jose Diaz-Gonzalez]

  • Timestamp in ISO 8601 format with the “Z” suffix to express UTC. [Xabier de Zuazo]

  • Adding udp support. [Morgan Delagrange]

  • Lpush changed to rpush on redis transport. This is required to always read the events in the correct order on the logstash side. See: https://github.com/logstash/logstash/blob/6f745110671b5d9d66bf082fbfed99d145af4620/lib/logstash/outputs/redis.rb#L4. [Xabier de Zuazo]

2 (2012-10-25)

  • Example upstart script. [Michael D’Auria]

  • Fixed a few more import statements. [Jose Diaz-Gonzalez]

  • Fixed binary call. [Jose Diaz-Gonzalez]

  • Refactored logging. [Jose Diaz-Gonzalez]

  • Improve logging. [Michael D’Auria]

  • Removed unnecessary print statements. [Jose Diaz-Gonzalez]

  • Add default stream handler when transport is stdout. Closes #26. [bear (Mike Taylor)]

  • Handle the case where the config file is not present. [Michael D’Auria]

  • Better exception handling for unhandled exceptions. [Michael D’Auria]

  • Fix wrong addfield values. [Alexander Fortin]

  • Add add_field to config example. [Alexander Fortin]

  • Add support for add_field into config file. [Alexander Fortin]

  • Minor readme updates. [Jose Diaz-Gonzalez]

  • Add support for type reading from INI config file. [Alexander Fortin]

    Add support for symlinks in config file

    Add support for file globbing in config file

    Add support for tags

    a little bit of refactoring, move type and tags check down into transport class

    create config object (reading /dev/null) even if no config file has been given via cli

    Add documentation for INI file to readme

    Remove unused json library

    Conflicts: README.rst

  • When sending data over the wire, use UTC timestamps. [Darren Worrall]

  • Support globs in file paths. [Darren Worrall]

  • Added msgpack support. [Jose Diaz-Gonzalez]

  • Use the python logging framework. [Jose Diaz-Gonzalez]

  • Fixed Transport.format() method. [Jose Diaz-Gonzalez]

  • Properly parse BEAVER_FILES env var. [Jose Diaz-Gonzalez]

  • Refactor transports. [Jose Diaz-Gonzalez]

    Fix the json import to use the fastest json module available

    Move formatting into Transport class

  • Attempt to fix defaults from env variables. [Jose Diaz-Gonzalez]

  • Fix README and beaver CLI help to reference correct RABBITMQ_HOST environment variable. [jdutton]

  • Add RabbitMQ support. [Alexander Fortin]

  • Added real-world example of beaver usage for tailing a file. [Jose Diaz-Gonzalez]

  • Removed unused argument. [Jose Diaz-Gonzalez]

  • Ensure that python-compatible readme is included in package. [Jose Diaz-Gonzalez]

  • Fix variable naming and timeout for redis transport. [Jose Diaz-Gonzalez]

  • Installation instructions. [Jose Diaz-Gonzalez]

  • Use reStructuredText for readme instead of markdown. [Jose Diaz-Gonzalez]

  • Removed unnecessary .gitignore. [Jose Diaz-Gonzalez]

1 (2012-08-06)

  • Moved app into python package format. [Jose Diaz-Gonzalez]

  • Moved binary beaver.py to bin/beaver, as per python packaging. [Jose Diaz-Gonzalez]

  • Moved around transports to be independent of each other. [Jose Diaz-Gonzalez]

  • Reorder transports. [Jose Diaz-Gonzalez]

  • Rewrote run_worker to throw exception if all transport options have been exhausted. [Jose Diaz-Gonzalez]

  • Rename Amqp -> Zmq to avoid confusion with RabbitMQ. [Alexander Fortin]

  • Added choices to the --transport argument. [Jose Diaz-Gonzalez]

  • Fixed derpy formatting. [Jose Diaz-Gonzalez]

  • Added usage to the readme. [Jose Diaz-Gonzalez]

  • Support usage of environment variables instead of arguments. [Jose Diaz-Gonzalez]

  • Fixed files argument parsing. [Jose Diaz-Gonzalez]

  • One does not simply license all the things. [Jose Diaz-Gonzalez]

  • Add todo to readme. [Jose Diaz-Gonzalez]

  • Added version to pyzmq. [Jose Diaz-Gonzalez]

  • Added license. [Jose Diaz-Gonzalez]

  • Reordered imports. [Jose Diaz-Gonzalez]

  • Moved all transports to beaver/transports.py. [Jose Diaz-Gonzalez]

  • Calculate current timestamp at most once per callback fired. [Jose Diaz-Gonzalez]

  • Modified transports to include proper information for ingestion in logstash. [Jose Diaz-Gonzalez]

  • Fixed package imports. [Jose Diaz-Gonzalez]

  • Removed another compiled python file. [Jose Diaz-Gonzalez]

  • Use ujson instead of simplejson. [Jose Diaz-Gonzalez]

  • Ignore compiled python files. [Jose Diaz-Gonzalez]

  • Fixed imports. [Jose Diaz-Gonzalez]

  • Fixed up readme instructions. [Jose Diaz-Gonzalez]

  • Refactor transports so that connections are no longer global. [Jose Diaz-Gonzalez]

  • Readme and License. [Jose Diaz-Gonzalez]
