skip to navigation
skip to content

Not Logged In

hemlock 0.1.6

Hemlock is a way of providing a common data access layer.

Hemlock
=======
[![PyPI version](https://badge.fury.io/py/hemlock.png)](http://badge.fury.io/py/hemlock) [![Build Status](https://travis-ci.org/Lab41/Hemlock.png?branch=master)](https://travis-ci.org/Lab41/Hemlock) [![downloads](https://pypip.in/d/hemlock/badge.png)](http://crate.io/packages/hemlock/) [![Coverage Status](https://coveralls.io/repos/Lab41/Hemlock/badge.png?branch=master)](https://coveralls.io/r/Lab41/Hemlock?branch=master)

Hemlock is an open-source project exploring ways to create a common data access
layer that eliminates the need to understand underlying data topologies but
still preserving the requirements of each data source such as access control,
performance, and formats.

![Hemlock L](https://raw.github.com/Lab41/Hemlock/master/docs/images/overview_hemlock.png "Hemlock")

Install instructions
====================

Option A, install using pip:

```bash
sudo pip install hemlock
```

Option B, build from source:

```bash
git clone https://github.com/Lab41/Hemlock.git
cd Hemlock
sudo python setup.py install
```

Required Dependencies
---------------------

Python modules:
- [MySQLdb](http://mysql-python.sourceforge.net/MySQLdb.html)
- [texttable](https://pypi.python.org/pypi/texttable)
- [couchbase](http://www.couchbase.com/communities/python/getting-started) >= 1.0
- [APScheduler](https://pypi.python.org/pypi/APScheduler)

Build a server running [MySQL](http://www.mysql.com/) to store user accounts, tenants, and registered
systems.


Build a [Couchbase 2.0](http://www.couchbase.com/) cluster to store metadata and data of registered systems.

Build an [ElasticSearch 0.90.2](http://www.elasticsearch.org/) cluster to store the index of Couchbase.

Add XDCR one-way replication from Couchbase to ElasticSearch using this [plugin](https://github.com/couchbaselabs/elasticsearch-transport-couchbase) (Note, grab version 1.1.0).

Once the plugin is installed, be sure and update the couchbase_template.json under plugins/transport-couchbase/ to have the following:

```json
{
    "template" : "*",
    "order" : 10,
    "mappings" : {
        "couchbaseCheckpoint" : {
            "_source" : {
                "includes" : ["doc.*"]
            },
            "date_detection" : false,
            "dynamic_templates": [
                {
                    "store_no_index": {
                        "match": "*",
                        "mapping": {
                            "store" : "no",
                            "index" : "no",
                            "include_in_all" : false
                        }
                    }
                }
            ]
        },
        "_default_" : {
            "_source" : {
                "includes" : ["meta.*"]
            },
            "date_detection" : false,
            "properties" : {
                "meta" : {
                    "type" : "object",
                    "include_in_all" : false
                }
            }
        }
    }
}
```

Once that is added, start up ElasticSearch with ``bin/elasticsearch`` and then perform the following the first time:

```bash
curl -XPUT http://localhost:9200/_template/couchbase -d @plugins/transport-couchbase/couchbase_template.json
```

Installing required databases
-----------------------------

1. Create database ``hemlock`` in [MySQL](http://www.mysql.com/).
2. Create bucket ``hemlock`` in [Couchbase](http://www.couchbase.com/).
3. Create index ``hemlock`` in [ElasticSearch](http://www.elasticsearch.org/).


Getting started
----------------

1. Create Hemlock credentials (see 'Credential files')
   ```bash
   HEMLOCK_MYSQL_SERVER=192.168.1.10
   HEMLOCK_MYSQL_USERNAME=user
   HEMLOCK_MYSQL_DB=hemlock
   HEMLOCK_MYSQL_PW=pass
   HEMLOCK_COUCHBASE_SERVER=192.168.1.20
   HEMLOCK_COUCHBASE_BUCKET=hemlock
   HEMLOCK_COUCHBASE_USERNAME=hemlock
   HEMLOCK_COUCHBASE_PW=pass
   HEMLOCK_ELASTICSEARCH_ENDPOINT=192.168.1.30
   ```

   (if you'd like these to persist, consider adding export before each line and performing ``source`` on the file)
2. Create a tenant, role, user, and data source system
   ```bash
   hemlock tenant-create --name Project1

   hemlock tenant-list

   hemlock role-create --name User

   hemlock role-list

   hemlock user-create --name User1 \
                        --username Username1 \
                        --email user1@email.com \
                        --rold_id 42ba73f9-0ab6-4a50-908c-1585955754f4 \
                        --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6

   hemlock user-list

   hemlock register-local-system --name System1 \
                                  --data_type csv \
                                  --description "description" \
                                  --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
                                  --hostname system1.fqdn \
                                  --endpoint http://hemlock.server/ \
                                  --poc_name user1 \
                                  --poc_email user1@email.com

   hemlock system-list
   ```
3. Add credentials for data source system, for example: mysql_creds
   ```bash
   MYSQL_SERVER=192.168.1.30
   MYSQL_DB=db1
   #MYSQL_TABLE=table1
   MYSQL_USERNAME=user
   MYSQL_PW=pass
   ```
4. Store a client
   ```bash
   hemlock client-store --name mysql_client_1 --type mysql --system_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 --credential_file /path/to/mysql_creds

   hemlock client-list
   ```
5. Add credentials for hemlock
   ```bash
   hemlock hemlock-server-store --credential_file /path/to/hemlock_creds
   ```
6. Create a schedule server (optional)
   ```bash
   hemlock schedule-server-create --name schedule_server_1

   hemlock schedule-server-list
   ```
7. Add a schedule for the data source system to run (optional)
   ```bash
   hemlock client-schedule --name schedule1 \
                          --minute "54" \
                          --hour "12" \
                          --day_of_month "*" \
                          --month "*" \
                          --day_of_week "*" \
                          --client_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
                          --schedule_server_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6

   hemlock schedule-list
   ```
8. Perform a test run for pulling data from the data source system
   ```bash
   hemlock client-run --uuid 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
   ```
9. Search for data that has been loaded into Hemlock
   ```bash
   hemlock query-data --user 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 --query foo
   ```
   or
   ```bash
   Direct with elasticsearch:

   http://elasticsearch.fqdn:9200/hemlock/_search?q=foo

   Which returns something the following:

   {
    "took": 14,
    "timed_out": false,
    "_shards": {
        "total": 20,
        "successful": 20,
        "failed": 0
    },
    "hits": {
        "total": 1,
        "max_score": 3.6582048,
        "hits": [
            {
                "_index": "hemlock",
                "_type": "couchbaseDocument",
                "_id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
                "_score": 3.6582048,
                "_source": {
                    "meta": {
                        "id": "865f458b4421ae5fd758e3c81aca9f8d8b4696b6",
                        "rev": "1-0010f1ac6045ccf40000000000000000",
                        "flags": 0,
                        "expiration": 0
                    }
                }
            }
        ]
    }
   }

   Now we can feed the 'id' into Couchbase to return the full document:

   http://couchbase.fqdn:8092/hemlock/865f458b4421ae5fd758e3c81aca9f8d8b4696b6

   Which returns something like the following:

   {
    "hemlock-system": "a50b86c2-59f7-42a3-aa67-3367579189fe",
    "hemlock-date": "2013-09-03 16:10:20",
    "stream": "DOYLIE"
   }
   ```

Credential files
----------------

1. Create a ``hemlock_creds`` file (see hemlock_creds_sample for an example):

   ```bash
   HEMLOCK_MYSQL_SERVER=192.168.1.10
   HEMLOCK_MYSQL_USERNAME=user
   HEMLOCK_MYSQL_DB=hemlock
   HEMLOCK_MYSQL_PW=pass
   HEMLOCK_COUCHBASE_SERVER=192.168.1.20
   HEMLOCK_COUCHBASE_BUCKET=hemlock
   HEMLOCK_COUCHBASE_USERNAME=hemlock
   HEMLOCK_COUCHBASE_PW=pass
   ```

2. Create credential files for each client you intend to use ([examples](https://github.com/Lab41/Hemlock/tree/master/hemlock/clients/)).


Currently supported data sources
================================

Technology | Parameter | Python Module Dependencies
---------- | --------- | ------------
MySQL      | mysql     | MySQLdb
MongoDB    | mongo     | pymongo
Redis      | redis     | redis
Local FileSystem | fs  | magic, pdfminer, xmltodict
RESTful API | rest     |
Streams    | stream_odd |


Adding a new data source type
-----------------------------

Create a new class under the clients folder for each new data source type.  Most
classes will need two methods defined: ``connect_client`` and ``get_data``.

The following is a template that can be used to work from:

```python
class HMyclient:
    def connect_client(self, client_dict):
        # return a handle that can be used to get data from the data source
        return c_server
    def get_data(self, client_dict, c_server, h_server, client_uuid):
        # data_list is an array of arrays to contain the data
        data_list = [[]]
        # desc_list is an array that contains the schema (if exists or known)
        desc_list = []
        return data_list, desc_list
```

Usage examples
==============

- Create a tenant

    ```bash
    hemlock tenant-create --name Project1
    ```
- Create a role

    ```bash
    hemlock role-create --name User
    ```
- Create a user

    ```bash
    hemlock user-create --name User1 \
                        --username Username1 \
                        --email user1@email.com \
                        --rold_id 42ba73f9-0ab6-4a50-908c-1585955754f4 \
                        --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6
    ```
- Register a local system

    ```bash
    hemlock register-local-system --name System1 \
                                  --data_type csv \
                                  --description "description" \
                                  --tenant_id 7d0f6b0d-334a-4d89-bd1a-70e8e1c04aa6 \
                                  --hostname system1.fqdn \
                                  --endpoint http://hemlock.server/ \
                                  --poc_name user1 \
                                  --poc_email user1@email.com
    ```
- List registered systems

    ```bash
    hemlock system-list
    ```
- List created users

    ```bash
    hemlock user-list
    ```
- Lists created tenants

    ```bash
    hemlock tenant-list
    ```
- [Connecting to a client](https://github.com/Lab41/Hemlock/tree/master/hemlock/clients/)
- [Full CLI API list](https://github.com/Lab41/Hemlock/blob/master/docs/CLI.md)


Related repositories
====================

 - [Hemlock-REST](http://lab41.github.io/Hemlock-REST/)
 - [Hemlock-Frontend](http://lab41.github.io/Hemlock-Frontend/)

Documentation
=============

 - [Docs](http://lab41.github.io/Hemlock/docs/_build/html/index.html)

Tests
=====

The tests for this project use [py.test](http://pytest.org/latest/)


Contributing to Hemlock
=======================

What to contribute?  Awesome!  Issue a pull request or see more details [here](https://github.com/Lab41/Hemlock/blob/master/CONTRIBUTING.md).
 
File Type Py Version Uploaded on Size
hemlock-0.1.6.tar.gz (md5) Source 2013-10-21 1MB
  • Downloads (All Versions):
  • 21 downloads in the last day
  • 83 downloads in the last week
  • 541 downloads in the last month