A column lineage tool

Project description

LineageX

A Column Level Lineage Graph for Postgres

Have you ever wondered about the column-level relationships among your SQL scripts and base tables? This tool creates an interactive graph on a webpage so that you can explore the column-level lineage among them. (Currently only Postgres is supported; other connection types and dialects are under development.)

What you need is one line of code:

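from lineagex.lineagex import lineagex

# Pass a path to a SQL file or folder; a list of SQL strings also works,
# e.g. lineagex(["SELECT column1 FROM schema1.other_table"])
lineagex("/path/to/SQL/")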

That is the bare minimum. The input can be a path to a SQL file, a path to a folder containing multiple SQL files, or a list of SQL strings in Python.

Optionally, you can provide more information to get a better result: the schemas in the Postgres "search_path" and the connection string to the database.

from lineagex.lineagex import lineagex

lineagex("/path/to/SQL/", "search, path, schemas", "postgresql://username:password@server:port/database")

The output is an output.json file and an index.html file in the folder. Start a local HTTP server to view the interactive graph.
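
One way to start such a local server is with Python's standard library. The sketch below is not part of LineageX; the function names `make_server` and `serve`, the port 8000, and the assumption that index.html sits in the served directory are all illustrative.

```python
import functools
import http.server


def make_server(directory=".", port=8000):
    """Build an HTTP server that serves the files in `directory` on localhost."""
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=directory
    )
    return http.server.HTTPServer(("127.0.0.1", port), handler)


def serve(directory=".", port=8000):
    """Serve `directory` until interrupted with Ctrl+C."""
    with make_server(directory, port) as httpd:
        print(f"Open http://127.0.0.1:{port}/index.html")
        httpd.serve_forever()
```

Calling `serve()` from the folder that holds the output files should make the graph reachable in a browser. Running `python -m http.server` in that folder has the same effect.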

Installation

pip install lineagex

Parameter and output format

When there are dependencies between the SQL files, make the first element of the "search_path" the schema in which the dependent tables are created (the default is "public"). The table name is assumed to be either the file name, if the file contains only one SQL statement, or the name extracted from "CREATE TABLE/VIEW".

Example:

table1.sql - SELECT column1, column2 FROM schema1.other_table WHERE column3 IS NOT NULL;
table2.sql - SELECT column1 AS new_column1, column2 AS new_column2 FROM schema1.table1;

In that example, the call should look like this. Note that "schema1" is the first element of the "search_path" parameter:

lineagex("/path/to/SQL/", "schema1, public", "postgresql://username:password@server:port/database")
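
The naming assumption described above can be illustrated with a small sketch. This is not LineageX's actual implementation: the function `infer_table_name`, the regular expression, and the rule of prefixing unqualified names with the first search-path schema are assumptions made for illustration only.

```python
import re
from pathlib import Path

# Hypothetical pattern: captures the name that follows CREATE TABLE / CREATE VIEW.
CREATE_RE = re.compile(
    r"\bCREATE\s+(?:OR\s+REPLACE\s+)?(?:MATERIALIZED\s+)?(?:TABLE|VIEW)\s+"
    r"(?:IF\s+NOT\s+EXISTS\s+)?([\w.]+)",
    re.IGNORECASE,
)


def infer_table_name(file_path, sql, default_schema="public"):
    """Use the CREATE TABLE/VIEW name if present, otherwise the file name."""
    match = CREATE_RE.search(sql)
    name = match.group(1) if match else Path(file_path).stem
    # Unqualified names are placed in the first schema of the search path.
    return name if "." in name else f"{default_schema}.{name}"
```

Under these assumptions, `infer_table_name("table1.sql", "SELECT column1 FROM schema1.other_table;", "schema1")` yields `schema1.table1`, which is why table2.sql can refer to `schema1.table1` in the example.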

The output.json file can be read and analyzed by other programs. Its general format is as follows (using the example above):

{
  "schema1.other_table": {
    "tables": [],
    "columns": {
      "column1": [], "column2": [], "column3": []
    },
    "table_name": "schema1.other_table"
  },
  "schema1.table1": {
    "tables": ["schema1.other_table"],
    "columns": {
      "column1": ["schema1.other_table.column1", "schema1.other_table.column3"],
      "column2": ["schema1.other_table.column2", "schema1.other_table.column3"]
    },
    "table_name": "schema1.table1"
  },
  "schema1.table2": {
    "tables": ["schema1.table1"],
    "columns": {
      "new_column1": ["schema1.table1.column1"],
      "new_column2": ["schema1.table1.column2"]
    },
    "table_name": "schema1.table2"
  }
}
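
As an illustration of how another program might consume this file, the sketch below traces every upstream column of a given column. It assumes the format shown above; the helper names `load_lineage` and `trace_upstream` and the `EXAMPLE` dictionary are hypothetical and not part of LineageX.

```python
import json

# A trimmed-down copy of the example output, for illustration only.
EXAMPLE = {
    "schema1.other_table": {"columns": {"column1": [], "column3": []}},
    "schema1.table1": {
        "columns": {
            "column1": ["schema1.other_table.column1", "schema1.other_table.column3"]
        }
    },
    "schema1.table2": {"columns": {"new_column1": ["schema1.table1.column1"]}},
}


def load_lineage(path="output.json"):
    """Read the lineage dictionary from an output.json file."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def trace_upstream(lineage, table, column, seen=None):
    """Return the set of all upstream columns reachable from table.column."""
    seen = set() if seen is None else seen
    for source in lineage.get(table, {}).get("columns", {}).get(column, []):
        if source in seen:
            continue
        seen.add(source)
        # "schema.table.column" splits into "schema.table" and "column".
        src_table, _, src_column = source.rpartition(".")
        trace_upstream(lineage, src_table, src_column, seen)
    return seen
```

For instance, `trace_upstream(EXAMPLE, "schema1.table2", "new_column1")` walks back through `schema1.table1.column1` to the two base columns in `schema1.other_table`.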

How to Navigate the Webpage

Start by clicking the star on the right (search) and entering the name of the model you want to start with.
The canvas should then show a table with its name and columns. Clicking the "explore" button at the top right of the table shows all the downstream and upstream tables related to its columns.
Hovering over a column highlights its downstream and upstream columns as well.
You can navigate through the canvas by clicking "explore" on other tables.

FAQ

"not init data" in the webpage: This is possibly caused by the content of the JSON in index.html. Check that it is valid JSON and that all keys are strings.
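
A quick way to run that check is with Python's built-in json module, which rejects non-string (unquoted) keys as invalid JSON. This is a minimal sketch: the helper name `find_json_error` is hypothetical, and it assumes you have copied the JSON text out of index.html (or are checking output.json) into a string.

```python
import json


def find_json_error(text):
    """Return None if `text` is valid JSON, else a description of the first error."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return None
```

A return value of `None` means the text parsed cleanly; otherwise the message points to the first position the parser could not accept.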

Release history

0.0.26

Apr 28, 2024

0.0.25

Apr 28, 2024

0.0.24

Apr 8, 2024

0.0.23

Apr 8, 2024

0.0.22

Apr 7, 2024

0.0.21

Apr 7, 2024

0.0.20

Apr 7, 2024

0.0.19

Feb 20, 2024

0.0.18

Feb 19, 2024

0.0.17

Feb 12, 2024

0.0.16

Jan 30, 2024

0.0.15

Jan 29, 2024

0.0.14

Jan 29, 2024

0.0.13

Jan 29, 2024

0.0.12

Jan 29, 2024

0.0.11

Jan 29, 2024

0.0.10

Jan 29, 2024

0.0.9

Jan 17, 2024

0.0.8

Jan 17, 2024

0.0.7

May 26, 2023

0.0.6

May 24, 2023

0.0.5

May 24, 2023

0.0.5a5 pre-release

May 24, 2023

0.0.5a4 pre-release

May 24, 2023

0.0.5a3 pre-release

May 24, 2023

0.0.5a2 pre-release

May 24, 2023

0.0.5a1 pre-release

May 24, 2023

0.0.4

May 20, 2023

0.0.3

May 20, 2023

0.0.3a2 pre-release

May 19, 2023

0.0.3a1 pre-release

May 19, 2023

0.0.2

May 17, 2023

0.0.2a2 pre-release

May 14, 2023

This version

0.0.2a1 pre-release

May 12, 2023

0.0.1

May 12, 2023

0.0.1a1 pre-release

May 12, 2023

Download files

Source Distribution

lineagex-0.0.2a1.tar.gz (909.4 kB)

Uploaded May 12, 2023 Source

Built Distribution

lineagex-0.0.2a1-py3-none-any.whl (917.1 kB)

Uploaded May 12, 2023 Python 3

Hashes for lineagex-0.0.2a1.tar.gz
Algorithm	Hash digest
SHA256	`bfd2eb288a73777120e60375d3ccbc4a51d7c0b243aa47b2af0b1f291a198006`
MD5	`db578e5ebea1df6df82fcef7bd3d480c`
BLAKE2b-256	`984940efe29a102d9ec8e7170257f8e9b733714177e06933492f12afbe26b6e2`

Hashes for lineagex-0.0.2a1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a6c73c4017c20c853e415585f1e3c3a09c06ded7bfdf6c0455a91f58f55d02e9`
MD5	`f66ccaad345deb072ae9ca612818342b`
BLAKE2b-256	`24750f3bf347fafb2a4b46c04f0051b498fea96241cef74d6a67bb9c4799cb5a`
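
The published digests can be used to check a downloaded file. The sketch below uses Python's hashlib; the helper names `sha256_of` and `matches_published_hash` are illustrative, and the file path you pass is assumed to point at your local copy of the download.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, `matches_published_hash("lineagex-0.0.2a1.tar.gz", "bfd2eb288a73777120e60375d3ccbc4a51d7c0b243aa47b2af0b1f291a198006")` should return `True` for an intact download.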