Skip to main content

A Powerful suite of fast and intuitive Python ETL utilities

Project description


pyblux: A suite of fast, easy-to-use, and intuitive Python ETL utilities.

PyPI Latest Release Package Status

What is it?

pyblux is a Python package that provides a suite of ETL utilities built to make the interactions with databases in the cloud as well as on-premise fast, easy and intuitive.

Features

  • Support multiple databases, including Postgres, MySql, MS SQL, SQLIte, Teradata and Oracle.
  • The get_engine method makes it easy to connect to databases in a simple and intuitive manner.
  • Blux.sql method from the Blux class helps run fast queries. It Povides output results as namedtuple or dictionary and it supports parameterised queries and in-flight transformation of data.
  • Logger class helps setup logging via log file or console.
  • send_teams_notification method provides an easy way to send alerts to a MS Teams channel via an incoming webhook.
  • send_email method helps to send email with HTML content
  • Helpful error messages display the failed query SQL
  • DBAPI2 specification was used in order to simplify coding for queries on relational database systems using Python.

Where to get it

The source code is currently hosted on GitHub at: https://github.com/bertin.nono/pyblux

Binary installers for the latest released version are available at the Python Package Index (PyPI)

# or PyPI
pip install pyblux

pyblux provides support for the databases below:

Dependencies

Depending on the use case, the database package should be installed.

  • Teradata : Install the Teradata SQL Driver Dialect for SQLAlchemy
pip install teradatasqlalchemy
  • PostgreSQL: Install Psycopg which is the most popular PostgreSQL database adapter for the Python programming language.
pip install psycopg2-binary
pip install PyMySQL
  • [SQLite]: No install required
  • Oracle: Intall cx_Oracle which is a Python extension module that enables access to Oracle Database.
pip install cx-Oracle
  • Microsoft SQL Server: Install the pyodbc which is an open source Python module that makes accessing ODBC databases simple.
pip install pyodbc

Documentation

Classes

  • Blux:

Establishes a connection engine to a database system referenced by the dialect attribute run fast queries.

Note: The dialect is the system SQLAlchemy uses to communicate with various types of DBAPI implementations and databases.

class Blux:
    """
    This class connects to a local database session using the db `dialect` library.
    """

    def __init__(self, engine=None,dialect=None):
        """
        Args:
            engine (str): Database connection engine.
            dialect (str): database system name(postgres, oracle, teradata,...)
        Note: Database Connection package must be installed in order to use this backend.
        """
        self.engine = engine
        self._dialect = dialect

        self.__errlimit = 1
        self.__warnings = []
        self.__errors = []
        self.__logons = []

Example:

table_list = """SELECT table_schema, table_name FROM information_schema.tables"""
postgres_engine = get_engine(dialect='PG', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
blux= Blux(engine=postgres_engine, dialect='postgres')
dataframe = blux.sql(query=table_list)
table = 'test'
database = 'postgres'
#load dataframe to table
blux.sql(dataframe=dataframe,database=database,table=table, dialect='postgres')
  • Logger:

provides a custom logging handler called logger. Helps Debug SQL and monitor progress with logging.

class Logger:
    """
    This class connects to a local database session using the db `connnection` library.
    """

    def __init__(self, logname:str, filename:str, level=logging.INFO, console:bool=True):
        """
        Args:
            logname (str): Logger Name.
            filename (str): log file path 
            level (str): Logger Level (DEBUG, INFO, WARNING, ERROR)
            console (cool): print to console
        """
        self._logname = logname
        self._filename = filename
        self._level = level
        self._console = console

Example:

import logging
from pyblux import logger

pyblux_logger = Logger(logname=ETL.NAME, filename=log_file,level=logging.INFO, console=True)
logger=pyblux_logger.logger( verbose=True)

Output from a call for get_engine will look like:

2021-07-07 15:06:22,411 get_engine: 
2021-07-07 15:06:22,413 get_engine: 
2021-07-07 15:06:22,416 get_engine: 

Methods:

  • get_engine:

Creates a database connection engine.

get_engine(user:str,password:str,host:str,port:int,database:str,dialect:str,verbose:bool=False,parameter:str=None,raw_engine:bool=True,logger:Callable=print)

Database connection details are defined by get_engine objects (see below).

Example:

import pandas
from pyblux.utils import get_engine
from pyblux.blux import Blux

oracle_engine = get_engine(dialect='oracle', host="localhost", port=1521,database="mydata", user="oracle_user", password="123")

teradata_engine = get_engine(dialect='mssql', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")

postgres_engine = get_engine(dialect='mysql', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")

postgres_engine = get_engine(dialect='teradata', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")

postgres_engine = get_engine(dialect='postgres', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")

Passwords

It is best practice for Database passwords to be stored in environment variables. This can be done on the command line via:

  • export password=secret-password on Linux
  • set password=secret-password on Windows

Or in a Python terminal via:

import os
os.environ['password'] = 'secret-password'

No password is required for SQLite databases.

Example: Connection_String = postgresql+psycopg2://user:p@ssword%to%encode@hosturl/defaultdb The above password encoded using urllib.parse:

   pwd=urllib.parse.quote_plus("p@ssword%to%encode")
   print(pwd)
   Connection_String = postgresql+psycopg2://user:urllib.parse.quote_plus("p@ssword%to%encode")@hosturl/defaultdb
  • get_connection:

Gets a regular database connection.

get_connection(user:str,password:str,host:str,port:int,database:str,dialect:str,verbose:bool=False,parameter:str=None,logger:Callable=print):
    """
    Get a regular connection for Teradata , Oracle, Aurora/Postgres, Aurora/MySql/MariaDB, SQLite, and  Microsoft SQL Server
    Returns
    -------
    Connection Object
    """

Example:

from pyblux.utils import get_connection

oracle_conn = get_connection(dialect='oracle', host="localhost", port=1521,database="mydata", user="oracle_user", password="123")

teradata_conn = get_connection(dialect='mssql', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")

postgres_conn = get_connection(dialect='mysql', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")

postgres_conn = get_connection(dialect='teradata', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")

postgres_conn = get_connection(dialect='postgres', host="localhost", port=5432, database="mydata", user="postgres_user", password="123")
  • create_table_from_dataframe:

Creates table from a dataframe attributes and fastload data into in it.

create_table_from_dataframe(dataframe:pd.DataFrame=None,table:str=None,Blux:Blux=None,verbose:bool=False,logger:Callable=print)
  • is_exist:

Checks is a table or view exist.

is_exist(table:str='', Blux:Blux=None,verbose:bool=False,logger:Callable=print)
  • drop_table:

Checks is a table or view exist and then drops it if it exists.

drop_table(table:str=None,Blux:Blux=None,verbose:bool=False,logger:Callable=print)
  • send_teams_notifications:

Send a Card to a MS Teams Channel

send_teams_notification ( hookurl: str, title: str='' , text: str='', message: str ='', status: str ='', error_message: str='', activitySubtitle: str='', activityText: str='')
  • send_email:

Send an HTML formated email that can include a dataframe

send_email(server:str, port:int,sender: str, receivers: list, subject: str, body_text: str, attachment: any = None,df: pd.DataFrame = None)

Maintainers:

  • Bertin Nono

Development status

Stable

Licence

MIT

References

Change Log

(09/12/2021)

  • Initial Release

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyblux-0.0.5.tar.gz (14.0 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page