Skip to main content

No project description provided

Project description

qabot

Query local or remote files or databases with natural language queries powered by langchain and openai.

Works on local CSV files:

remote CSV files:

$ qabot \
    -f https://www.stats.govt.nz/assets/Uploads/Environmental-economic-accounts/Environmental-economic-accounts-data-to-2020/renewable-energy-stock-account-2007-2020-csv.csv \
    -q "How many Gigawatt hours of generation was there for Solar resources in 2015 through to 2020?"

as well as on real databases:

Even on (public) data stored in S3:

Quickstart

You need to set the OPENAI_API_KEY environment variable to your OpenAI API key, which you can get from here.

Install the qabot command line tool using pip/poetry:

$ poetry install qabot

Then run the qabot command with either files or a database connection string.

Examples

Local CSV file/s

$ qabot -q "how many passengers survived by gender?" -f data/titanic.csv
🦆 Loading data from files...
Loading data/titanic.csv into table titanic...

Query: how many passengers survived by gender?
Result:
There were 233 female passengers and 109 male passengers who survived.


 🚀 any further questions? [y/n] (y): y

 🚀 Query: what was the largest family who did not survive? 
Query: what was the largest family who did not survive?
Result:
The largest family who did not survive was the Sage family, with 8 members.

 🚀 any further questions? [y/n] (y): n

Database

Install any required drivers for your database, e.g. pip install psycopg2-binary for postgres.

For example to connect and query directly from the trains database in the relational dataset repository:

$ pip install mysqlclient

$ qabot -d mysql+mysqldb://guest:relational@relational.fit.cvut.cz:3306/trains -q "what are the unique load shapes of cars, what are the maximum number of cars per train?" 
Query: what are the unique load shapes of cars, what are the maximum number of cars per train?
Result:
The unique load shapes of cars are circle, diamond, hexagon, rectangle, and triangle, and the maximum number of cars per train is 3.

Note you can also supply a database connection string via the environment variable QABOT_DATABASE_URI.

Limit the tables

You can limit the tables that are queried by passing the -t flag - this will save your tokens! For example, to only query the cars table:

$ export QABOT_DATABASE_URI=mysql+mysqldb://guest:relational@relational.fit.cvut.cz:3306/trains
$ qabot -q "what are the unique load shapes of cars?" -t cars

The unique load shapes of cars are circle, hexagon, triangle, rectangle, and diamond.

Features

Intermediate steps and database queries

Use the -v flag to see the intermediate steps and database queries:

$ qabot -d mysql+mysqldb://guest:relational@relational.fit.cvut.cz:3306/trains -q "what are the unique load shapes of cars, what are the maximum number of cars per train?" -v
Query: what are the unique load shapes of cars, what are the maximum number of cars per train?
Intermediate Steps: 
  Step 1

    list_tables_sql_db(
      
    )

    Output:
    trains, cars

  Step 2

    schema_sql_db(
      trains, cars
    )

    Output:
    CREATE TABLE trains (
        id INTEGER(11) NOT NULL, 
        direction VARCHAR(4), 
        PRIMARY KEY (id)
    )ENGINE=InnoDB DEFAULT CHARSET=latin1

    SELECT * FROM 'trains' LIMIT 3;
    id  direction
    1   east
    2   east
    3   east


    CREATE TABLE cars (
        id INTEGER(11) NOT NULL, 
        train_id INTEGER(11), 
        `position` INTEGER(11), 
        shape VARCHAR(255), 
        len VARCHAR(255), 
        sides VARCHAR(255), 
        roof VARCHAR(255), 
        wheels INTEGER(11), 
        load_shape VARCHAR(255), 
        load_num INTEGER(11), 
        PRIMARY KEY (id), 
        CONSTRAINT cars_ibfk_1 FOREIGN KEY(train_id) REFERENCES trains (id) ON DELETE CASCADE ON UPDATE CASCADE
    )ENGINE=InnoDB DEFAULT CHARSET=latin1

    SELECT * FROM 'cars' LIMIT 3;
    id  train_id        position        shape   len     sides   roof    wheels  load_shape      load_num
    1   1       1       rectangle       short   not_double      none    2       circle  1
    2   1       2       rectangle       long    not_double      none    3       hexagon 1
    3   1       3       rectangle       short   not_double      peaked  2       triangle        1

  Step 3

    query_sql_db(
      SELECT load_shape, MAX(load_num) FROM cars GROUP BY load_shape
    )

    Output:
    [('circle', 3), ('diamond', 1), ('hexagon', 1), ('rectangle', 3), ('triangle', 3)]


Result:
The unique load shapes of cars are circle, diamond, hexagon, rectangle, and triangle, and the maximum number of cars per train is 3.

Data accessed via http/s3

Use the -f <url> flag to load data from a url, e.g. a csv file on s3:

$ qabot -f s3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv -q "how many confirmed cases are there" -v
🦆 Loading data from files...
create table jhu_csse_covid_19_timeseries_merged as select * from 's3://covid19-lake/enigma-jhu-timeseries/csv/jhu_csse_covid_19_timeseries_merged.csv';

Result:
264308334 confirmed cases

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

qabot-0.1.4.tar.gz (11.0 kB view hashes)

Uploaded Source

Built Distribution

qabot-0.1.4-py3-none-any.whl (11.3 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page