Open Source Data Lineage Tool for Redshift, Snowflake and many other databases
Tokern Lineage Engine
Tokern Lineage Engine is a fast, easy-to-use application to collect, visualize and analyze column-level data lineage in databases, data warehouses and data lakes in AWS and GCP.
Tokern Lineage helps you browse column-level data lineage:
- visually, using kedro-viz
- programmatically, by analyzing lineage graphs with the networkx graph library
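As a rough illustration of programmatic analysis, the sketch below builds a tiny column-level lineage graph by hand with networkx and answers two common questions: which columns feed a given column (upstream), and which columns would be affected if it changed (downstream). It does not use Tokern's own API. The column names and the helper functions `upstream` and `downstream` are hypothetical, and the sketch assumes each edge points from a source column to the column derived from it.

```python
import networkx as nx

# Hypothetical column-level lineage (names are illustrative, not from Tokern).
# Each edge points from a source column to a column derived from it.
lineage = nx.DiGraph()
lineage.add_edges_from([
    ("raw.orders.amount", "staging.orders.amount_usd"),
    ("raw.fx_rates.rate", "staging.orders.amount_usd"),
    ("staging.orders.amount_usd", "mart.daily_revenue.revenue"),
    ("raw.orders.order_date", "mart.daily_revenue.day"),
])


def upstream(graph, column):
    """Every column that `column` is derived from, directly or transitively."""
    return nx.ancestors(graph, column)


def downstream(graph, column):
    """Every column affected if `column` changes (impact analysis)."""
    return nx.descendants(graph, column)


print(sorted(upstream(lineage, "mart.daily_revenue.revenue")))
# ['raw.fx_rates.rate', 'raw.orders.amount', 'staging.orders.amount_usd']
print(sorted(downstream(lineage, "raw.orders.amount")))
# ['mart.daily_revenue.revenue', 'staging.orders.amount_usd']
```

Because `nx.ancestors` and `nx.descendants` follow edges transitively, indirect dependencies such as `raw.orders.amount` feeding `mart.daily_revenue.revenue` are found without extra traversal code.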
Resources
- Demo of Tokern Lineage App
- Check out an example data lineage notebook.
- Check out the post on using data lineage for cost control for an example of how data lineage can be used in production.
Quick Start
Install a demo using Docker and Docker Compose
Download the docker-compose file from the GitHub repository.
# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -O docker-compose.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/catalog-demo.yml -o docker-compose.yml
Run docker-compose
docker-compose up -d
Check that the containers are running.
docker ps
CONTAINER ID   IMAGE                            COMMAND   CREATED       STATUS       PORTS                  NAMES
3f4e77845b81   tokern/data-lineage-viz:latest   ...       4 hours ago   Up 4 hours   0.0.0.0:8000->80/tcp   tokern-data-lineage-visualizer
1e1ce4efd792   tokern/data-lineage:latest       ...       5 days ago    Up 5 days                           tokern-data-lineage
38be15bedd39   tokern/demodb:latest             ...       2 weeks ago   Up 2 weeks                          tokern-demodb
Try out Tokern Lineage App
Head to http://localhost:8000/ to open the Tokern Lineage app.
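`docker-compose up -d` returns once the containers are started in the background, so the app may need a few moments before it answers. If you script the demo, a small standard-library poll can wait for the URL to respond. This is a minimal sketch: the helper name `wait_for_app` and its default timeout and interval are illustrative assumptions, not part of Tokern.

```python
import time
import urllib.error
import urllib.request


def wait_for_app(url="http://localhost:8000/", timeout=60.0, interval=2.0):
    """Poll `url` until the server answers or `timeout` seconds elapse.

    Returns True as soon as any HTTP response arrives, False on timeout.
    (Hypothetical helper for scripting the demo; not a Tokern API.)
    """
    # Ignore proxy settings from the environment: the app runs locally.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with opener.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, just with an error status (e.g. 404).
            return True
        except OSError:
            # Connection refused or timed out (URLError subclasses OSError):
            # the container is probably still starting, so try again.
            time.sleep(interval)
    return False
```

For example, calling `wait_for_app()` after `docker-compose up -d` returns True once http://localhost:8000/ responds, and False if nothing answers within 60 seconds.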
Install Tokern Lineage Engine
# in a new directory run
wget https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/tokern-lineage-engine.yml
# or run
curl https://raw.githubusercontent.com/tokern/data-lineage/master/install-manifests/docker-compose/tokern-lineage-engine.yml -o tokern-lineage-engine.yml
Run docker-compose
docker-compose -f tokern-lineage-engine.yml up -d
If you want to use an external Postgres database as the catalog, change the following parameters in tokern-lineage-engine.yml:
- CATALOG_HOST
- CATALOG_USER
- CATALOG_PASSWORD
- CATALOG_DB
You can also override the default values using environment variables.
CATALOG_HOST=... CATALOG_USER=... CATALOG_PASSWORD=... CATALOG_DB=... docker-compose -f ... up -d
For more advanced usage of environment variables with docker-compose, refer to the docker-compose documentation.
Pro-tip
If you want to connect to a database on the host machine, set
CATALOG_HOST: host.docker.internal # For Mac or Windows
# OR
CATALOG_HOST: 172.17.0.1 # For Linux (default Docker bridge gateway)
Supported Technologies
- Postgres
- AWS Redshift
- Snowflake
Coming Soon
- SparkSQL
- Presto
Documentation
For advanced usage, please refer to the data-lineage documentation.
Survey
Please take this survey if you use data-lineage or are considering it. Your responses will help us prioritize features.