Bugzilla-ETL 0.3.13353

Mozilla Bugzilla Bug Version ETL


Python version of Metric's Bugzilla ETL

Motivation and Details


-  PyPy 2.1.0 using Python 2.7 (CPython is way too slow)
-  A MySQL/Maria database with Mozilla's Bugzilla schema (`old public
   version can be found
   here <>`__)
-  A timezone database
   (`instructions <./tests/resources/mySQL/>`__)
-  An ElasticSearch (v 0.20.5) cluster to hold the bug version documents


PyPy and SetupTools are required. If you are installing on Windows
please `follow instructions to get these
installed <>`__.
When done, installation is easy:


    pip install Bugzilla-ETL


You must prepare a ``settings.json`` file to reference the resources,
and its filename must be provided as an argument on the command line.
Examples of settings files can be found in
`resources/settings <resources/settings>`__.

Bugzilla-ETL keeps local run state in the form of two files:
``first_run_time`` and ``last_run_time``. These are both parameters in
the ``settings.json`` file.

-  ``first_run_time`` is written only if it does not exist, and triggers
   a full ETL refresh. Delete this file if you want to create a new ES
   index and start ETL from the beginning.
-  ``last_run_time`` is recorded whenever there has been a successful
   ETL. This file will not exist until the initial full ETL has
   completed successfully. Deleting this file should have no net
   effect, other than making the program work harder than it should.
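
The rules above can be sketched as a small check you could run before
starting the ETL. This is only an illustration, not part of Bugzilla-ETL:
the function name and the returned labels are hypothetical, and the two
paths are assumed to be the values you configured in ``settings.json``.

```python
import os


def describe_run_state(first_run_path, last_run_path):
    """Guess what the next run will do, based on which state files exist.

    Hypothetical helper: the labels below are illustrative names only,
    not output produced by Bugzilla-ETL itself.
    """
    if not os.path.exists(first_run_path):
        # No first_run_time file: a full ETL refresh is triggered.
        return "full refresh"
    if not os.path.exists(last_run_path):
        # The initial ETL has not finished, or the file was deleted;
        # the program will have to redo more work than usual.
        return "extra work"
    # Both files exist: the run continues from the last successful ETL.
    return "continue from last run"
```

Deleting the ``first_run_time`` file moves this check back to the first
branch, which matches the advice above for starting from the beginning.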

Running bzetl

Assuming your ``settings.json`` file is in ``~/Bugzilla_ETL``:


    cd ~/Bugzilla_ETL
    bzetl --settings=settings.json

Use ``--help`` for more options, and see the `example command line
script <resources/scripts/bz_etl.bat>`__.

Got it working?

The initial ETL will take over two hours. If you want something quicker
to confirm your configuration is correct, add the ``--reset --quick``
arguments on the command line. This limits the ETL to the first 1000
and last 1000 bugs.


    bzetl --settings=settings.json --reset --quick
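
If you launch the ETL from a Python script (a scheduled job, for
example), the two invocations above can be assembled as follows. This is
a minimal sketch: ``build_bzetl_command`` is a hypothetical helper, not
part of Bugzilla-ETL, and it uses only the ``--settings``, ``--reset``
and ``--quick`` arguments shown in this document.

```python
def build_bzetl_command(settings="settings.json", reset=False, quick=False):
    """Return the bzetl command line as a list of arguments.

    Hypothetical helper for illustration; only flags mentioned in this
    README are used.
    """
    command = ["bzetl", "--settings=" + settings]
    if reset:
        command.append("--reset")
    if quick:
        command.append("--quick")
    return command


if __name__ == "__main__":
    # Show the quick confirmation run as a single command line.
    print(" ".join(build_bzetl_command(reset=True, quick=True)))
```

The resulting list can be passed to ``subprocess.call`` with ``cwd`` set
to the directory that holds your settings file.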

Developer Installation

If you plan to help improve this software, or if you enjoy working from
source, you can clone from GitHub:


    git clone

Install requirements:


    pip install -e

It is best you install on Linux, but if you do install on Windows you
can find further Windows-specific Python installation instructions at
one of my other projects:

Running Tests

The Git clone will include test code. You can run those tests, but you
must first:

-  Have MySQL installed (no Bugzilla schema required)
-  Have the timezone database installed
   (`instructions <./tests/resources/mySQL/>`__)
-  Have a complete ``test_settings.json`` file that points to the
   resources
   (`example <./resources/settings/test_settings_example.json>`__)

Use PyPy for 4x the speed:
``pypy .\tests\ --settings=test_settings.json``

More on ElasticSearch

If you are new to ElasticSearch, I recommend using `ElasticSearch
Head <>`__ for getting cluster
status, current schema definitions, viewing individual records, and
more. Clone it off of GitHub, and open the ``index.html`` file in
your browser. Here are some alternate
`instructions <>`__.
