tbzuploader - Generic HTTP Uploading
A lot of daily work is based on regular files.
tbzuploader is a command line tool which detects uploadable files and posts them via HTTP while conforming to the standardized HTTP Status Codes.
Upload Protocol
tbzuploader conforms to the generally accepted upload protocol.
201 Created
If the HTTP upload is successful, the server responds with 201 Created. The files will then be moved to a done directory.
400 Bad Request
If the HTTP upload is not successful and it is a client error (such as wrong files or corrupted files), the server responds with 400 Bad Request. The files will then be moved to a failed directory.
Failed files are not retried. If an admin should be informed when this happens, specify an email address to be notified (see --mail-to).
500 Internal Server Error and others
If the HTTP upload is not successful for any other reason (such as a login page, outage, programming error or overload), the server responds with another status code (such as 500 Internal Server Error). tbzuploader will then retry posting the files on its next run.
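The three cases above can be summarized as a small decision function. This is only a sketch of the protocol as described here, not tbzuploader's own code; the names Action and action_for_status are made up for illustration, and it assumes that every status code other than 201 and 400 falls into the retry case.

```python
from enum import Enum


class Action(Enum):
    MOVE_TO_DONE = "done"      # upload accepted by the server
    MOVE_TO_FAILED = "failed"  # files rejected, they will not be retried
    RETRY_LATER = "retry"      # leave files in place for the next run


def action_for_status(status_code: int) -> Action:
    """Map an HTTP status code to the follow-up action described above."""
    if status_code == 201:
        return Action.MOVE_TO_DONE
    if status_code == 400:
        return Action.MOVE_TO_FAILED
    # Assumption: anything else (500, 503, even a plain 200) means "try again".
    return Action.RETRY_LATER
```

Note that a plain 200 is deliberately not treated as success in this sketch, which matches the reasoning in the "Why 201?" section.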
Features
Groups of files of arbitrary size (pairs, triplets, etc.)
For example, suppose you have four files: a.pdf, a.xml, b.pdf, b.xml.
The first upload should take a.pdf and a.xml, and the second upload b.pdf and b.xml.
See the docs for --patterns.
mail to admin if broken files are uploaded
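To make the grouping feature concrete, here is a minimal sketch of how files could be grouped by patterns. It is not taken from tbzuploader; group_files is a hypothetical helper. It assumes that files belong together when they share the name before the extension, and that incomplete groups are simply skipped (the text above does not say what the tool does with them).

```python
import fnmatch
import os
from collections import defaultdict


def group_files(filenames, patterns):
    """Group file names that share a stem and together match every pattern."""
    by_stem = defaultdict(dict)
    for name in filenames:
        stem, _ext = os.path.splitext(name)
        for pattern in patterns:
            if fnmatch.fnmatch(name, pattern):
                by_stem[stem][pattern] = name
    # Keep only complete groups; order each group like the patterns.
    groups = []
    for stem in sorted(by_stem):
        found = by_stem[stem]
        if len(found) == len(patterns):
            groups.append(tuple(found[p] for p in patterns))
    return groups
```

With the four files from the example and the patterns "*.pdf" and "*.xml", this yields one group for a and one for b, so each group could go into its own upload.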
Use Case
Imagine you provide a modern solution (ReST/HTTP/SaaS) with a nice API and many man-hours invested in it. Unfortunately, many of your customers don’t have any programming skills. The only thing they can do is provide files such as PDF documents, Excel workbooks, CSV tables, XML files etc. In the past, these files were imported using protocols like FTP, SCP, Windows shares (SMB) and others.
The main problem with these dated protocols is the missing data validation on the receiving side (on your side!).
tbzuploader helps overcome this obstacle:
First, you write a simple HTTP service which validates the uploaded files. If the data is valid, return 201 Created.
Second, you tell the customer to use tbzuploader. It is a simple command line tool which works everywhere (on Linux, Windows, Mac, …)
If the data of the customer is valid, the data will be imported.
If the data of the customer is not valid, the issue will stay where it belongs: on the sending side (on the client’s side!).
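The receiving side could look roughly like the following sketch, which uses only the Python standard library. Everything in it is an assumption for illustration: the names validate_csv, UploadHandler and make_server, the validation rule itself, and above all the idea that the request body is one raw CSV file. How tbzuploader encodes its requests is not described here, so a real service would have to parse whatever the client actually sends (for example multipart form data).

```python
import csv
import io
from http.server import BaseHTTPRequestHandler, HTTPServer


def validate_csv(body: bytes):
    """Return (status, message) for an uploaded CSV body.

    Hypothetical rule: UTF-8 text, a header plus at least one data row,
    and every data row has as many columns as the header.
    """
    try:
        text = body.decode("utf-8")
        rows = list(csv.reader(io.StringIO(text, newline="")))
    except (UnicodeDecodeError, csv.Error):
        return 400, "not a readable UTF-8 CSV file"
    if len(rows) < 2:
        return 400, "need a header and at least one data row"
    if any(len(row) != len(rows[0]) for row in rows[1:]):
        return 400, "row length differs from header"
    return 201, "created"


class UploadHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Assumption: the whole request body is the file to validate.
        length = int(self.headers.get("Content-Length", 0))
        status, message = validate_csv(self.rfile.read(length))
        payload = message.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def make_server(port=8000):
    """Create (but do not start) the server; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), UploadHandler)
```

Keeping the validation in a plain function makes it easy to test without starting a server. Unexpected exceptions in a real service would surface as 500 responses, which the client treats as "retry later".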
Example
user@host> tbzuploader my-local-dir https://user:password@myhost/upload-files
This will upload files from directory my-local-dir to the specified URL.
If the upload was successful (server returned HTTP status 201 Created), then the local files in my-local-dir get moved to my-local-dir/done.
If the upload failed because the server rejects the files (400 Bad Request), then the local files in my-local-dir get moved to my-local-dir/failed.
If there was another error (network timeout, server overload, …), the files stay in the current location and the next call of the command line tool will try to upload the files again.
Usage
>>> bin/tbzuploader --help
usage: tbzuploader [-h] [--patterns= LIST_OF_PATTERNS]
                   [--min-age-seconds MIN_AGE_SECONDS]
                   [--done-directory DONE_DIRECTORY]
                   [--failed-directory FAILED_DIRECTORY]
                   [--smtp-server SMTP_SERVER] [--mail-from MAIL_FROM]
                   [--mail-to MAIL_TO] [--all-files-in-one-request]
                   [--all-files-in-n-requests] [--no-ssl-cert-verification]
                   [--ca-bundle CA_BUNDLE] [--dry-run]
                   local_directory url

positional arguments:
  local_directory
  url                   URL can contain http-basic-auth like this:
                        https://apiuser:mypwd@example.com/input-process-output/

optional arguments:
  -h, --help            show this help message and exit
  --patterns= LIST_OF_PATTERNS
                        List of file endings which should get uploaded
                        together. Example: --patterns="*.pdf *.xml" The pairs
                        (a.pdf, a.xml) and (b.pdf, b.xml) get uploaded
                        together
  --min-age-seconds MIN_AGE_SECONDS
                        Skip files which are too young. Default: 60
  --done-directory DONE_DIRECTORY
                        files get moved to this directory after successful
                        upload. Defaults to {local_directory}/done
  --failed-directory FAILED_DIRECTORY
                        files get moved to this directory after failed upload
                        due to broken files. Defaults to
                        {local_directory}/failed
  --smtp-server SMTP_SERVER
                        SMTP server which sends mails in case broken files
                        were tried to be uploaded.
  --mail-from MAIL_FROM
                        Sender of mails in case broken files were tried to be
                        uploaded.
  --mail-to MAIL_TO     Recipient of mails in case broken files were tried to
                        be uploaded.
  --all-files-in-one-request
                        Upload all files in one request (if you give not
                        --pattern). Upload all matching files in one request
                        (if you give --pattern)
  --all-files-in-n-requests
                        Upload all files in N requests (if you give not
                        --pattern). Upload all matching files in N requests
                        (if you give --pattern)
  --no-ssl-cert-verification
  --ca-bundle CA_BUNDLE
  --dry-run             Do not upload. Just print the pair of files which
                        would get uploaded together
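The --min-age-seconds option skips files that are too young, presumably so that files which are still being written do not get uploaded half-finished (the help text does not state the reason). The idea can be sketched as follows. This is an illustration only: is_old_enough and ready_files are hypothetical names, and using the modification time as the measure of age is an assumption.

```python
import os
import time


def is_old_enough(mtime, now, min_age_seconds=60):
    """True if a file last modified at `mtime` is at least min_age_seconds old."""
    return now - mtime >= min_age_seconds


def ready_files(directory, min_age_seconds=60):
    """Return the sorted names of regular files in `directory` that are old enough."""
    now = time.time()
    ready = []
    for entry in os.scandir(directory):
        if entry.is_file() and is_old_enough(entry.stat().st_mtime, now, min_age_seconds):
            ready.append(entry.name)
    return sorted(ready)
```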
Install
Install from PyPI for regular use:
pip install tbzuploader
Development Install on Python2
Install tbzuploader for development on Python2:
virtualenv tbzuploader-env
cd tbzuploader-env
. ./bin/activate
pip install -e git+https://github.com/guettli/tbzuploader.git#egg=tbzuploader
Development Install on Python3
Install tbzuploader for development on Python3:
python3 -m venv tbzuploader-py3env
cd tbzuploader-py3env
. ./bin/activate
pip install --upgrade pip
pip install -e git+https://github.com/guettli/tbzuploader.git#egg=tbzuploader
Development Testing
Testing:
pip install -r src/tbzuploader/requirements.txt
cd src/tbzuploader
pytest # all test ok?
pyCharm src/tbzuploader/...
pytest # all test still ok?
.... I am waiting for your pull request :-)
Protocol for resumable uploads
Unfortunately, tbzuploader does not support resumable uploads yet.
There is already a spec for it.
It would be very cool if tbzuploader could support this spec: https://tus.io/
Pull requests are welcome.
Trivia: Why 201?
Why use 201 Created instead of 200 OK?
In the beginning, we used 200 OK for “successful upload”. A server misconfiguration caused a redirect to the login page, so the uploaded files were ignored and the response was still 200 OK. Since the upload looked “successful”, the files were erroneously moved into done.
That’s why 201 Created gets used.