snatch

Simple image scraping in Python

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Configurable, extensible image scraping for Python. Inspired by the design and internals of Kenneth Reitz's Requests library.

>>> from snatch import snatch
>>> images = snatch('http://octodex.github.com/pythocat/')
>>> images.extensions
[u'png']
>>> images[1]
<Image ["pythocat.png"]>
>>> images[1].url
u'http://octodex.github.com/images/pythocat.png'

Easily usable, easily configurable:

>>> url = 'url/with/54/images'
>>> snatch(url)
<ImageList [54]>

# reduce your results by extension:
>>> _.with_extension('gif')
<ImageList [2]>

# or limit the extension explicitly in the initial API call:
>>> snatch(url, with_extension=('gif',))
<ImageList [2]>
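
For readers curious what filtering by extension amounts to, here is a minimal sketch on plain URL strings. It does not use snatch and is not its implementation; `filter_by_extension` is a hypothetical helper that compares the lowercased suffix of each URL path against the requested extensions.

```python
import posixpath
from urllib.parse import urlparse


def filter_by_extension(urls, extensions):
    """Keep only URLs whose path ends in one of the given extensions.

    Hypothetical helper for illustration; not part of snatch.
    """
    wanted = {ext.lower().lstrip('.') for ext in extensions}
    kept = []
    for url in urls:
        # Look at the path only, so query strings do not confuse the match.
        path = urlparse(url).path
        ext = posixpath.splitext(path)[1].lower().lstrip('.')
        if ext in wanted:
            kept.append(url)
    return kept


urls = [
    'http://example.com/a.png',
    'http://example.com/b.GIF?size=large',
    'http://example.com/c.jpg',
]
gifs = filter_by_extension(urls, ('gif',))
```

The match is case-insensitive by choice here; whether snatch treats `GIF` and `gif` alike is not stated in the source.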

It’s also easy to hook your own filters or operations into Snatch’s callback system. Say you only want to keep images wider than 250 px:

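import requests
from io import BytesIO
from PIL import Image  # Pillow; the bare "import Image" form is deprecated
from snatch import snatch

def wider_than_250(images):
    """Callback: keep only images wider than 250 px."""
    def filter_fn(image):
        if image.width is None:
            # Width unknown: download the image and measure it.
            res = requests.get(image.url)
            img = Image.open(BytesIO(res.content))
            image.width = img.size[0]
        return image.width > 250
    # Return a list so the result behaves the same on Python 2 and 3.
    return [image for image in images if filter_fn(image)]

url = 'http://octodex.github.com/images/pythocat.png'
callbacks = {'complete': wider_than_250}
images = snatch(url, callbacks=callbacks)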
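
The callback above hard-codes the 250 px threshold. One possible generalization, sketched here with stand-in objects rather than real snatch images, is a factory that builds a callback for any minimum width. `min_width_callback` and `FakeImage` are hypothetical names; the only assumption taken from the example above is that a callback receives the image list and returns the filtered list.

```python
class FakeImage(object):
    """Stand-in for a scraped image; only carries a name and a width in pixels."""

    def __init__(self, name, width):
        self.name = name
        self.width = width


def min_width_callback(min_width):
    """Build a callback that keeps images strictly wider than min_width."""
    def callback(images):
        # Images with an unknown width (None) are dropped in this sketch;
        # the example above downloads them to measure instead.
        return [img for img in images
                if img.width is not None and img.width > min_width]
    return callback


images = [
    FakeImage('small', 100),
    FakeImage('big', 400),
    FakeImage('unknown', None),
]
wide = min_width_callback(250)(images)
names = [img.name for img in wide]
```

Following the pattern shown above, such a callback would presumably be registered as `callbacks = {'complete': min_width_callback(250)}`.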

Downloading all images from a URL is even simpler:

import os
import requests
from snatch import snatch

directory = 'snatched-images'

if not os.path.exists(directory):
    os.mkdir(directory)

for image in snatch('http://octodex.github.com/pythocat/'):
    contents = requests.get(image.url).content
    # Image data is binary, so open the file in 'wb' mode.
    with open(os.path.join(directory, image.filename), 'wb') as image_file:
        image_file.write(contents)
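
The loop above joins `image.filename` straight onto the target directory. As a defensive sketch, assuming filenames scraped from a page could contain path separators (the source does not say whether snatch already sanitizes them), a small hypothetical helper can reduce the name to its final component first.

```python
import os


def destination_path(directory, filename):
    """Return a path inside directory for filename.

    Hypothetical helper, not part of snatch: strips any directory
    components so the file cannot land outside the target directory.
    """
    name = os.path.basename(filename)
    if name in ('', '.', '..'):
        raise ValueError('unusable filename: %r' % (filename,))
    return os.path.join(directory, name)


safe = destination_path('snatched-images', '../../pythocat.png')
```

In the download loop this would replace the direct join, for example `open(destination_path(directory, image.filename), 'wb')`.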

Release History

0.1.0 (2013-10-12)

Initial write/scaffold, lots to fix/improve upon

Project details

This version

0.1.0

Dec 4, 2013

Download files

Source Distribution

snatch-0.1.0.tar.gz (6.7 kB)

Uploaded Dec 4, 2013 Source

Hashes for snatch-0.1.0.tar.gz

Algorithm	Hash digest
SHA256	`32e7e86b14de2064ee9860c4a99caa99a2a471fd5dc8201de83717d630f94aeb`
MD5	`da988461a3cb4b5761b51bf9b0ce76d9`
BLAKE2b-256	`25109d44219c75316c268b334b3cf9becc673d811c61b04cca59d622e49684eb`
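
A downloaded copy of the source distribution can be checked against the SHA256 digest in the table above. The following is a minimal sketch using only the standard library; `sha256_of_file` and `matches_published_hash` are hypothetical helper names, and the local path to the tarball is up to the reader.

```python
import hashlib

# SHA256 digest published for snatch-0.1.0.tar.gz in the table above.
EXPECTED_SHA256 = '32e7e86b14de2064ee9860c4a99caa99a2a471fd5dc8201de83717d630f94aeb'


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file at path has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_published_hash('snatch-0.1.0.tar.gz')` on the downloaded file should return True if the download is intact.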