Skip to main content

A column lineage tool

Project description

LineageX

A Column Level Lineage Graph for Postgres

Have you ever wondered what is the column level relationship among your SQL scripts and base tables? Don't worry, this tool is intended to help you by creating an interactive graph on a webpage to explore the column level lineage among them(Currently only supports Postgres, other connection types or dialects are under development).

What you need is one line of code:

from lineagex.lineagex import lineagex

lineagex("/path/to/SQL/" or [a_list_of_SQL_string])

That is the bare minimum input, the input can be a path to a SQL file, a path to the folder containing many SQL files or simply a list of SQL strings in Python.

Optionally, you can provide more information such as the schemas to the "search_path" in Postgres and the connection string to the database to achieve a better result.

from lineagex.lineagex import lineagex

lineagex("/path/to/SQL/", "search, path, schemas", "postgresql://username:password@server:port/database")

The output would be a output.json and a index.html file in the folder. Start a local http server and you would be able to see the interactive graph.

Installation

pip install lineagex

Parameter and output format

When there are dependencies between the SQL files, please have the first part of the "search_path" being the schema that the dependant table is created(default is "public"). Also, the name assumption of the table is either the file name if there is only 1 SQL in that file or the name extracted from "CREATE TABLE/VIEW".

Example:

table1.sql - SELECT column1, column2 FROM schema1.other_table WHERE column3 IS NOT NULL;
table2.sql - SELECT column1 AS new_column1, column2 AS new_column2 from schema1.table1;

In that example, the call should be like this, note that "schema1" is the first element in the "search path" parameter

lineagex("/path/to/SQL/", "schema1, public", "postgresql://username:password@server:port/database")

In the output.json file, it can be read by other programs and analyzed for other uses, the general format is as follows (using the example from above):

{
  schema1.other_table: {
    tables: [], 
    columns: {
      column1: [], column2: [], column3: []
    }, 
    table_name: schema1.other_table
  }, 
  schema1.table1: {
    tables: [schema1.other_table], 
    columns: {
      column1: [schema1.other_table.columns1, schema1.other_table.columns3], column2: [schema1.other_table.columns2, schema1.other_table.columns3]
    }, 
    table_name: schema1.table1
  }, 
  schema1.table2: {
    tables: [schema1.table1], 
    columns: {
      new_column1: [schema1.table1.columns1], new_column2: [schema1.table1.column2]
    }, 
    table_name: schema1.table2
  }, 
}

How to Navigate the Webpage

  • Start by clicking the star on the right(search) and input a model name that you want to start with.
  • It should show a table on the canvas with table names and its columns, by clicking the "explore" button on the top right, it will show all the downstream and upstream tables that are related to the columns.
  • Hovering over a column will highlight its downstream and upstream columns as well.
  • You can navigate through the canvas by clicking "explore" on other tables.

FAQ

  • "not init data" in the webpage: Possibly due to the content of the JSON in the index.html, please check if it is in valid JSON format, and that all keys are in string format.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lineagex-0.0.2a1.tar.gz (909.4 kB view hashes)

Uploaded Source

Built Distribution

lineagex-0.0.2a1-py3-none-any.whl (917.1 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page