Amazon Textract Overlay tools
Textract-Overlayer
amazon-textract-overlayer provides functions to help overlay bounding boxes on documents.
Install
> python -m pip install amazon-textract-overlayer
Make sure your environment is set up with AWS credentials, through configuration files, environment variables, or an attached role (https://docs.aws.amazon.com/cli/latest/userguide/cli-chap-configure.html).
Samples
The primary method provided is get_bounding_boxes, which returns bounding boxes for the Textract_Types passed in. It is mostly taken from the amazon-textract command in the package amazon-textract-helper.
The following returns the bounding boxes for the WORD and CELL data types.
from PIL import Image
from textractcaller.t_call import Textract_Features, Textract_Types, call_textract
from textractoverlayer.t_overlay import DocumentDimensions, get_bounding_boxes
# placeholder path (an assumption); point this at your own image
input_document = "document.png"
# CELL blocks are only returned when table analysis is requested
features = [Textract_Features.TABLES]
doc = call_textract(input_document=input_document, features=features)
# image is a PIL.Image.Image in this case
image = Image.open(input_document)
document_dimension: DocumentDimensions = DocumentDimensions(doc_width=image.size[0], doc_height=image.size[1])
overlay = [Textract_Types.WORD, Textract_Types.CELL]
bounding_box_list = get_bounding_boxes(textract_json=doc, document_dimensions=document_dimension, overlay_features=overlay)
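Textract reports each block's geometry as ratios of the page size (Left, Top, Width, Height between 0 and 1), which is presumably why the document dimensions are passed in. The sketch below is not the library's implementation; it only illustrates the kind of scaling involved. The names PixelBox and to_pixel_box are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class PixelBox:
    """Hypothetical stand-in for a bounding box in pixel coordinates."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def to_pixel_box(geometry_box: dict, doc_width: int, doc_height: int) -> PixelBox:
    """Scale a Textract-style ratio box (Left, Top, Width, Height) to pixels."""
    left = geometry_box["Left"] * doc_width
    top = geometry_box["Top"] * doc_height
    return PixelBox(
        xmin=left,
        ymin=top,
        xmax=left + geometry_box["Width"] * doc_width,
        ymax=top + geometry_box["Height"] * doc_height,
    )


# A block covering the middle half of a 1000 x 800 pixel page.
box = to_pixel_box({"Left": 0.25, "Top": 0.25, "Width": 0.5, "Height": 0.5}, 1000, 800)
print(box)
```

The resulting xmin, ymin, xmax and ymax values are in the same pixel space that PIL drawing calls expect.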
The actual drawing of the bounding-box overlay for images is done in the amazon-textract command from the package amazon-textract-helper and looks like this:
from PIL import Image, ImageDraw
image = Image.open(input_document)
rgb_im = image.convert('RGB')
draw = ImageDraw.Draw(rgb_im)
# check the impl in amazon-textract-helper for ways to associate different colors to types
for bbox in bounding_box_list:
    draw.rectangle(xy=[bbox.xmin, bbox.ymin, bbox.xmax, bbox.ymax], outline=(128, 128, 0), width=2)
rgb_im.show()
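One simple way to associate different colors with types is a lookup table with a fallback color. This is only a sketch and not the amazon-textract-helper implementation; TYPE_COLORS, DEFAULT_COLOR, TypedBox and outline_for are hypothetical names, and the assumption that each box exposes its type as a box_type attribute should be checked against the actual BoundingBox class.

```python
from dataclasses import dataclass

# Hypothetical color table: block type name -> RGB outline color.
TYPE_COLORS = {
    "WORD": (128, 128, 0),
    "CELL": (0, 128, 128),
}
# Used for any type that has no entry in the table.
DEFAULT_COLOR = (255, 0, 0)


@dataclass
class TypedBox:
    """Hypothetical box carrying a type name; the real class may differ."""
    box_type: str
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def outline_for(box: TypedBox) -> tuple:
    """Return the outline color for a box, falling back to DEFAULT_COLOR."""
    return TYPE_COLORS.get(box.box_type, DEFAULT_COLOR)


boxes = [
    TypedBox("WORD", 10, 10, 60, 30),
    TypedBox("CELL", 5, 5, 200, 40),
    TypedBox("LINE", 5, 50, 200, 70),
]
outlines = [outline_for(b) for b in boxes]
print(outlines)
```

Inside the drawing loop, the fixed outline tuple could then be replaced by a call such as outline_for(bbox), provided the box objects carry a comparable type field.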