A column lineage tool
Project description
LineageX
A Column Level Lineage Graph for Postgres
Have you ever wondered what is the column level relationship among your SQL scripts and base tables? Don't worry, this tool is intended to help you by creating an interactive graph on a webpage to explore the column level lineage among them(Currently only supports Postgres, other connection types or dialects are under development).
What you need is one line of code:
from lineagex.lineagex import lineagex
lineagex("/path/to/SQL/" or [a_list_of_SQL_string])
That is the bare minimum input, the input can be a path to a SQL file, a path to the folder containing many SQL files or simply a list of SQL strings in Python.
Optionally, you can provide more information such as the schemas to the "search_path" in Postgres and the connection string to the database to achieve a better result.
from lineagex.lineagex import lineagex
lineagex("/path/to/SQL/", "search, path, schemas", "postgresql://username:password@server:port/database")
The output would be a output.json and a index.html file in the folder. Start a local http server and you would be able to see the interactive graph.
Installation
pip install lineagex
Parameter and output format
When there are dependencies between the SQL files, please have the first part of the "search_path" being the schema that the dependant table is created(default is "public"). Also, the name assumption of the table is either the file name if there is only 1 SQL in that file or the name extracted from "CREATE TABLE/VIEW".
Example:
table1.sql - SELECT column1, column2 FROM schema1.other_table WHERE column3 IS NOT NULL;
table2.sql - SELECT column1 AS new_column1, column2 AS new_column2 from schema1.table1;
In that example, the call should be like this, note that "schema1" is the first element in the "search path" parameter
lineagex("/path/to/SQL/", "schema1, public", "postgresql://username:password@server:port/database")
In the output.json file, it can be read by other programs and analyzed for other uses, the general format is as follows (using the example from above):
{
schema1.other_table: {
tables: [],
columns: {
column1: [], column2: [], column3: []
},
table_name: schema1.other_table
},
schema1.table1: {
tables: [schema1.other_table],
columns: {
column1: [schema1.other_table.columns1, schema1.other_table.columns3], column2: [schema1.other_table.columns2, schema1.other_table.columns3]
},
table_name: schema1.table1
},
schema1.table2: {
tables: [schema1.table1],
columns: {
new_column1: [schema1.table1.columns1], new_column2: [schema1.table1.column2]
},
table_name: schema1.table2
},
}
How to Navigate the Webpage
- Start by clicking the star on the right(search) and input a model name that you want to start with.
- It should show a table on the canvas with table names and its columns, by clicking the "explore" button on the top right, it will show all the downstream and upstream tables that are related to the columns.
- Hovering over a column will highlight its downstream and upstream columns as well.
- You can navigate through the canvas by clicking "explore" on other tables.
FAQ
"not init data"
in the webpage: Possibly due to the content of the JSON in the index.html, please check if it is in valid JSON format, and that all keys are in string format.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for lineagex-0.0.2a1-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | a6c73c4017c20c853e415585f1e3c3a09c06ded7bfdf6c0455a91f58f55d02e9 |
|
MD5 | f66ccaad345deb072ae9ca612818342b |
|
BLAKE2b-256 | 24750f3bf347fafb2a4b46c04f0051b498fea96241cef74d6a67bb9c4799cb5a |