snatch 0.1.0

Simple image scraping in Python

Configurable, extensible image scraping for Python. Inspired by the design and internals of Kenneth Reitz' Requests library.

>>> from snatch import snatch
>>> images = snatch('http://octodex.github.com/pythocat/')
>>> images.extensions
[u'png']
>>> images[1]
<Image ["pythocat.png"]>
>>> images[1].url
u'http://octodex.github.com/images/pythocat.png'

Easily usable, easily configurable:

>>> url = 'url/with/54/images'
>>> snatch(url)
<ImageList [54]>

# reduce your results by extension:
>>> _.with_extension('gif')
<ImageList [2]>

# or, more explicitly, limit the extensions in the initial API call:
>>> snatch(url, with_extension=('gif',))
<ImageList [2]>
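
For intuition, filtering by extension amounts to comparing the suffix of each image URL's path against a set of wanted extensions. The sketch below is not Snatch's implementation: `extension_of` and `filter_by_extension` are hypothetical helpers that work on plain URL strings, using only the standard library.

```python
import posixpath

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2


def extension_of(url):
    """Return the lowercase extension of a URL's path, without the dot."""
    path = urlparse(url).path  # drops any query string or fragment
    return posixpath.splitext(path)[1].lstrip('.').lower()


def filter_by_extension(urls, extensions):
    """Keep only the URLs whose extension appears in `extensions`."""
    wanted = set(ext.lower() for ext in extensions)
    return [url for url in urls if extension_of(url) in wanted]


# Hypothetical URLs, for illustration only.
urls = [
    'http://example.com/a.png',
    'http://example.com/b.GIF?size=large',
    'http://example.com/c.gif',
]
gifs = filter_by_extension(urls, ('gif',))
```

Parsing the URL first keeps query strings out of the comparison, and lowercasing makes `GIF` and `gif` equivalent.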

It's also very easy to hook your own filters or operations into Snatch's callbacks system. Let's say you only wanted to capture images that were larger than 250 px wide:

import requests
from PIL import Image
from StringIO import StringIO
from snatch import snatch

def wider_than_250(images):
    def filter_fn(image):
        if image.width is None:
            res = requests.get(image.url)
            img = Image.open(StringIO(res.content))
            image.width = img.size[0]
        return image.width > 250
    return filter(filter_fn, images)

url = 'http://octodex.github.com/pythocat/'
callbacks = {'complete': wider_than_250}
images = snatch(url, callbacks=callbacks)
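
The same callback shape can be parameterized. The following is a minimal sketch that assumes, as in the example above, that a 'complete' callback receives the list of images and returns the filtered list. `wider_than` and `FakeImage` are hypothetical names; `FakeImage` is only a stand-in for Snatch's image objects so the sketch runs without network access.

```python
from collections import namedtuple

# Stand-in for snatch's image objects, used only so this sketch runs offline.
FakeImage = namedtuple('FakeImage', ['url', 'width'])


def wider_than(min_width):
    """Build a 'complete'-style callback keeping images wider than min_width."""
    def callback(images):
        # Images whose width is unknown (None) are dropped rather than fetched.
        return [img for img in images
                if img.width is not None and img.width > min_width]
    return callback


images = [
    FakeImage('http://example.com/small.png', 100),
    FakeImage('http://example.com/large.png', 400),
    FakeImage('http://example.com/unknown.png', None),
]
callbacks = {'complete': wider_than(250)}
kept = callbacks['complete'](images)
```

Building the callback from a factory avoids hard-coding the threshold in the function name and returns a list rather than relying on `filter`, whose return type differs between Python 2 and Python 3.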

It is even simpler to download all images from a URL:

import os
import requests
from snatch import snatch

directory = 'snatched-images'

if not os.path.exists(directory):
    os.mkdir(directory)

for image in snatch('http://octodex.github.com/pythocat/'):
    contents = requests.get(image.url).content
    with open(os.path.join(directory, image.filename), 'wb') as image_file:
        image_file.write(contents)
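
The download loop can also be wrapped in a reusable function. This is a sketch under stated assumptions, not part of Snatch: `save_images` is a hypothetical helper that takes `(filename, url)` pairs and a `fetch` callable, so the HTTP client can be swapped out or stubbed.

```python
import os


def save_images(images, directory, fetch):
    """Write each (filename, url) pair into `directory`; return the paths written.

    `fetch` is any callable mapping a URL to bytes, for example
    `lambda url: requests.get(url).content`.
    """
    if not os.path.isdir(directory):
        os.makedirs(directory)
    written = []
    for filename, url in images:
        # basename() guards against filenames that contain path separators.
        path = os.path.join(directory, os.path.basename(filename))
        with open(path, 'wb') as image_file:  # binary mode for image data
            image_file.write(fetch(url))
        written.append(path)
    return written
```

With Snatch, the pairs could be built as `[(image.filename, image.url) for image in snatch(url)]`, using the attributes shown in the examples above.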

Release History

0.1.0 (2013-10-12)

  • Initial write/scaffold, lots to fix/improve upon
 
File                       Type    Py Version  Uploaded on  Size
snatch-0.1.0.tar.gz (md5)  Source              2013-12-04   6KB
  • Author: maisano
  • Home Page: https://github.com/maisano/snatch
  • Keywords: image scraping
  • License:
    Copyright 2013 Richard Maisano
    
    Licensed under the Apache License, Version 2.0 (the "License");
    you may not use this file except in compliance with the License.
    You may obtain a copy of the License at
    
       http://www.apache.org/licenses/LICENSE-2.0
    
    Unless required by applicable law or agreed to in writing, software
    distributed under the License is distributed on an "AS IS" BASIS,
    WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
    See the License for the specific language governing permissions and
    limitations under the License.
  • Package Index Owner: maisano