An error notification system for remote Jupyter notebooks

Project description

remote-notebook-error-collection

This weekend project is an alpha release. It may stay like that forever.

This aims to collect errors raised by other users running a notebook that I have shared. The three classes presented here (ErrorStore, ErrorSlack and ErrorServer) are successive steps in its construction.

In a Jupyter notebook, this is what is run:

!pip install notebook-error-reporter
from notebook_error_reporter import ErrorServer

es = ErrorServer(url='https://errors.matteoferla.com', notebook='test')
es.enable()

Then, if an error is raised, it gets logged (see the lengthy privacy discussion below) and can be inspected:

es.retrieve_errors()

Install

pip install notebook-error-reporter

Examples in action

Aims

I have shared a few notebooks on Twitter, and I occasionally get an email telling me that the repo they use is broken or that some case causes an error. Similar in concept to Sentry.io, I would like to know when errors happen. Most users will not email about errors, so one only sees the tip of the iceberg. This is because:

1. it is something silly they did
2. they worry it may be something silly they did
3. they deem the code crap

Point 1 implies there is a problem with user experience: it could have been clearer. The user is never wrong: they have simply been misled.

Points 2 and 3 indicate an error that needs fixing. Point 2 in particular means that better error handling is needed. As for point 3: okay, the user is never wrong, but instead of obfuscating the crappiness, one can document the issue.

I do not want any private or confidential data from the user or from user-given fields: someone's target protein might be confidential. Therefore, the notebook code should not raise errors whose messages contain someone's password, credit card number or mutation.
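In practice this means error messages should describe what is wrong without echoing the offending value. A minimal sketch follows; `check_mutation` and the mutation format it accepts are hypothetical, chosen only to illustrate the guideline.

```python
import re

# Hypothetical format: wild-type residue, position, mutant residue, e.g. "A123G"
_MUTATION_PATTERN = re.compile(r"[ACDEFGHIKLMNPQRSTVWY]\d+[ACDEFGHIKLMNPQRSTVWY]")


def check_mutation(mutation: str) -> str:
    """Validate a user-given mutation without leaking it into the error message."""
    if not _MUTATION_PATTERN.fullmatch(mutation):
        # Avoid f"Invalid mutation {mutation}": that text would be reported.
        # A generic description of the problem is safe to send.
        raise ValueError("Invalid mutation: expected a format like A123G")
    return mutation
```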

I only want to receive:

the error type
the error message
some traceback details (line number, function name and filename minus path)
the notebook name
the cell's first line
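As a rough sketch of how those fields can be pulled out of an exception with the standard library's `traceback` module: `summarise_error` is a hypothetical name, not the package's own code, and the cell source is simply passed in. The notebook name is already known to the reporter through its `notebook=` argument, so it is left out here.

```python
import os
import traceback
from typing import Any, Dict


def summarise_error(error: BaseException, cell_source: str) -> Dict[str, Any]:
    """Reduce an exception to the non-identifying fields listed above."""
    frames = traceback.extract_tb(error.__traceback__)
    lines = cell_source.splitlines()
    return {
        "error_name": type(error).__name__,
        "error_message": str(error),
        # Keep only each file's basename so no user directories leak out
        "traceback": [
            {"filename": os.path.basename(frame.filename),
             "fun_name": frame.name,
             "lineno": frame.lineno}
            for frame in frames
        ],
        "first_line": lines[0] if lines else "",
    }
```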

In a regular, locally hosted notebook, there is the issue that the server collects IP addresses, which point to a user's location. This may not quite count as GDPR data, but it is still a concern. Not collecting IP addresses at all is a terrible idea, though, as fail2ban and similar tools rely on IP addresses to block wannabe hackers.

In a Colab notebook this is rather straightforward, as the IP of the request is that of the server running the kernel, not the browser (getting the browser's IP would require a JavaScript function to pass it over).

Data that is not sent includes:

inputted values
most importantly, the content of a mounted Google Drive

Store

An alternative option is storing the error details in the error_details attribute.

from notebook_error_reporter import ErrorStore
es = ErrorStore()
es.enable()
es.error_details

Slack

The easiest way is to get a Slack message in a channel on error. A Slack webhook is easy to set up (just remember that the subdomain to do so is api, not app).

import os
os.environ['SLACK_WEBHOOK'] = "https://hooks.slack.com/services/XXXXXXXX"

from notebook_error_reporter import ErrorSlack
es = ErrorSlack(os.environ['SLACK_WEBHOOK'])
es.enable()

A cell that runs successfully does nothing, but one that raises an error will send a Slack message like this:

{"error_name": "ValueError", 
 "error_message": "foo", 
 "traceback": [{"filename": "foo.py",
                "fun_name": "run_code", 
                "lineno": 666}, 
                ...
               ], 
 "first_line": "# cell that does foo",
 "execution_count": 111}
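Slack incoming webhooks accept an HTTP POST whose JSON body carries the message in a `text` field. A minimal standard-library sketch of sending such a message; the function names are hypothetical, and ErrorSlack may format or send the message differently.

```python
import json
import urllib.request
from typing import Any, Dict


def build_slack_payload(details: Dict[str, Any]) -> Dict[str, str]:
    """Wrap the error details as the text of a Slack message."""
    return {"text": json.dumps(details, indent=1)}


def post_to_slack(webhook_url: str, details: Dict[str, Any]) -> int:
    """POST the payload to an incoming webhook and return the HTTP status."""
    body = json.dumps(build_slack_payload(details)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Keeping the payload builder separate from the network call makes it easy to check what would be sent without hitting the webhook.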

The 'filename' is stripped of the dist-packages path, because in Colab that path may contain a username, which could be personally identifiable data.
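One way to do that stripping, as a sketch: cut the path at the `dist-packages` (or `site-packages`) directory so only the package-relative part remains. The helper name and the fallback to the bare filename are assumptions, not the package's code.

```python
from typing import Iterable

# Directory names after which the rest of the path is considered safe to share
_PACKAGE_DIRS = ("dist-packages", "site-packages")


def strip_package_path(filename: str, markers: Iterable[str] = _PACKAGE_DIRS) -> str:
    """Drop everything up to and including the packages directory."""
    normalised = filename.replace("\\", "/")
    for marker in markers:
        _, sep, tail = normalised.partition(f"/{marker}/")
        if sep:
            return tail
    # Not inside a packages directory (e.g. a notebook cell): keep only the basename
    return normalised.rsplit("/", 1)[-1]
```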

If a Slack webhook is shared on GitHub, there are users who search GitHub for exposed webhooks and spam them with adverts for their cybersecurity courses. A single prankster could also make it really annoying. Therefore, ideally a server needs to be set up to collect the errors.

Server

For myself, I have set up https://errors.matteoferla.com. This is an Intel NUC acting as my home server, connected to my router; it could even be a Raspberry Pi. (This is a weekend project, so it is outside the University's network, but privacy and confidentiality are valued just as much!) If you like this project and want to replicate it or use this server, just drop me an email.

A FastAPI app to receive the errors is also included. It needs to be set up on a server exposed to the internet.

This option has the largest risk of vandalism.

So the server host would run run_app.py, which contains this code:

import uvicorn
from fastapi import FastAPI
from notebook_error_reporter.serverside import create_db, create_app

create_db()
app: FastAPI = create_app(debug=False, max_transparency=True, colab_only=False)

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=8000, log_level="info")
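What `create_db()` actually sets up is not described here. Purely as a hypothetical illustration, the server side could persist each incoming message in SQLite along these lines; the table and column names are assumptions. Keeping the IP in its own column lets it be used against vandalism without ever being returned when errors are listed.

```python
import json
import sqlite3
from typing import Any, Dict, List


def create_error_table(connection: sqlite3.Connection) -> None:
    """Create a hypothetical table for incoming error messages."""
    # The traceback column holds the JSON-encoded list of frames
    connection.execute(
        """CREATE TABLE IF NOT EXISTS errors (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               notebook TEXT NOT NULL,
               ip TEXT NOT NULL,
               error_name TEXT NOT NULL,
               error_message TEXT NOT NULL,
               first_line TEXT,
               execution_count INTEGER,
               traceback TEXT NOT NULL
           )"""
    )


def store_error(connection: sqlite3.Connection, notebook: str, ip: str,
                message: Dict[str, Any]) -> None:
    """Insert one EventMessageType-shaped dictionary."""
    with connection:  # commits on success, rolls back on error
        connection.execute(
            "INSERT INTO errors (notebook, ip, error_name, error_message, "
            "first_line, execution_count, traceback) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (notebook, ip, message["error_name"], message["error_message"],
             message.get("first_line"), message.get("execution_count"),
             json.dumps(message["traceback"])),
        )


def errors_for(connection: sqlite3.Connection, notebook: str) -> List[Dict[str, Any]]:
    """Return the stored errors of one notebook, without the IP column."""
    rows = connection.execute(
        "SELECT error_name, error_message, first_line, execution_count, traceback "
        "FROM errors WHERE notebook = ? ORDER BY id",
        (notebook,),
    ).fetchall()
    return [
        {"error_name": name, "error_message": msg, "first_line": first,
         "execution_count": count, "traceback": json.loads(tb)}
        for name, msg, first, count, tb in rows
    ]
```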

On the notebook side, a user then activates logging thusly:

from notebook_error_reporter import ErrorServer

es = ErrorServer(url='http://127.0.0.1:8000', notebook='mine')
es.enable()

On error, a dictionary type-hinted as EventMessageType is sent:

from notebook_error_reporter import EventMessageType

EventMessageType.__annotations__

{'execution_count': int,
 'first_line': str,
 'error_name': str,
 'error_message': str,
 'traceback': typing.List[notebook_error_reporter.error_event._traceback.TracebackDetailsType]}

and TracebackDetailsType.__annotations__ is:

{'filename': str, 'fun_name': str, 'lineno': int}
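Reconstructed from the printed annotations, the two types are equivalent to `typing.TypedDict` definitions like the following (Python 3.8+). This is a reconstruction for readability, not the package's source, which may define them differently (for example with `total=False`).

```python
from typing import List, TypedDict


class TracebackDetailsType(TypedDict):
    filename: str  # path-stripped filename
    fun_name: str
    lineno: int


class EventMessageType(TypedDict):
    execution_count: int
    first_line: str
    error_name: str
    error_message: str
    traceback: List[TracebackDetailsType]
```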

The server does keep track of IP addresses to prevent vandalism, but in Colab this is the IP address of the machine running the kernel, not of the user. No JavaScript call is made to get the browser's IP. (Annoyingly, I'd love to make some JS calls to get some useful data, but best not to obfuscate!) Therefore, the IP will be in the range 142.250.0.0 to 142.251.255.255.
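A hypothetical sketch of two checks this suggests: whether an address falls in that range (142.250.0.0/15 in CIDR notation; tying this to the `colab_only` argument is an assumption), and a simple per-IP sliding-window rate limit as one anti-vandalism measure. Neither is taken from the package's server code, and Google's address ranges can change, so the IP check is only a heuristic.

```python
import ipaddress
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional

# The range quoted above: 142.250.0.0 - 142.251.255.255
COLAB_NETWORK = ipaddress.ip_network("142.250.0.0/15")


def looks_like_colab(ip: str) -> bool:
    """True if the request comes from the range quoted above."""
    return ipaddress.ip_address(ip) in COLAB_NETWORK


class RateLimiter:
    """Allow at most `limit` requests per IP within a sliding `window` in seconds."""

    def __init__(self, limit: int = 20, window: float = 60.0) -> None:
        self.limit = limit
        self.window = window
        self._hits: Dict[str, Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        hits = self._hits[ip]
        # Forget requests that have fallen out of the window
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # rejected requests are not counted
        hits.append(now)
        return True
```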

To see the errors sent:

es.retrieve_errors()

I am unsure whether to allow everyone to see the sessions and errors, hence the max_transparency argument. For an internal server this makes sense, but for a public one, revealing the session IDs may result in vandals adding errors to random sessions.

Colab

Colab runs on an ancient version of IPython (5.5, compared with 8.2 at the time of writing). As a result, things are done a bit differently.

.enable calls either load_ipython_extension or monkeypatch_extension, depending on the IPython version. The former adds an event callback function (shell.events.callbacks), which is all proper and good. The latter monkeypatches a decorating function around shell.showtraceback; this wrapper knows about the ErrorEvent/ErrorSlack/ErrorServer/ErrorStore instance because it was created in a factory method of that instance. As it does not have a result object, it knows neither the execution count nor the first line of the cell.
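A simplified sketch of the two routes, using a hypothetical `ErrorCollector` rather than the package's classes. In recent IPython versions, a `post_run_cell` callback registered with `shell.events.register` receives an `ExecutionResult`, whose `error_in_exec`, `execution_count` and `info.raw_cell` supply everything; the monkeypatched `showtraceback` only gets the exception, so those two fields stay empty.

```python
import sys
from typing import Any, Callable, Dict, List


class ErrorCollector:
    """Hypothetical stand-in for the reporter classes: keeps what it is given."""

    def __init__(self) -> None:
        self.error_details: List[Dict[str, Any]] = []

    # Route 1: event callback, e.g. shell.events.register("post_run_cell", c.post_run_cell)
    def post_run_cell(self, result: Any) -> None:
        error = result.error_in_exec
        if error is None:
            return  # successful cells are ignored
        lines = result.info.raw_cell.splitlines()
        self.error_details.append({
            "error_name": type(error).__name__,
            "error_message": str(error),
            "execution_count": result.execution_count,
            "first_line": lines[0] if lines else "",
        })

    # Route 2: factory method that wraps shell.showtraceback (old IPython)
    def patch_showtraceback(self, shell: Any) -> None:
        original: Callable[..., Any] = shell.showtraceback

        def showtraceback(*args: Any, **kwargs: Any) -> Any:
            exc_tuple = args[0] if args else kwargs.get("exc_tuple")
            _, error, _ = exc_tuple if exc_tuple is not None else sys.exc_info()
            if error is not None:
                # No ExecutionResult here, so count and first line are unknown
                self.error_details.append({
                    "error_name": type(error).__name__,
                    "error_message": str(error),
                    "execution_count": None,
                    "first_line": None,
                })
            return original(*args, **kwargs)  # still display the traceback

        shell.showtraceback = showtraceback
```

The closure over `self` is what lets the wrapped `showtraceback` reach the reporter instance, mirroring the factory-method design described above.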

!pip install notebook-error-reporter
from notebook_error_reporter import ErrorServer

es = ErrorServer(url='https://errors.matteoferla.com', notebook='test')
es.enable()
# raise an error:
raise ValueError('Foo')

The error can be seen to have been sent successfully:

es.retrieve_errors()

However, as I am not a Seattle/Arlington multinational hell-bent on collecting data, I prefer to make it opt-in:

#@markdown Send error messages to errors.matteoferla.com for logging?
#@markdown See [notebook-error-reporter repo for more](https://github.com/matteoferla/notebook-error-reporter)
report_errors = False #@param {type:"boolean"}
if report_errors:
    !pip install notebook-error-reporter
    from notebook_error_reporter import ErrorServer

    es = ErrorServer(url='https://errors.matteoferla.com', notebook='fragmenstein')
    es.enable()

Project details

Release history

0.1.1 (this version): Nov 30, 2022

0.1: Apr 13, 2022

Download files

Source Distribution

notebook-error-reporter-0.1.1.tar.gz (16.9 kB)

Uploaded Nov 30, 2022 (source)

Hashes for notebook-error-reporter-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`85c3b5964c5a0dca39cd028439fbdd2db31db8a5ba18b72047210b3a0c82e7ee`
MD5	`aede457a5470d2e0bc0fdc841f7567a6`
BLAKE2b-256	`dc9b557ac629be623b5d8ef1b8238779844242062cc9f4d76e1a77e50104c77a`