gImageGrabber

Tools to download images from Google search

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

It provides tools to grab images from a Google search by extracting the image links and downloading the original images.

This module is written for Windows 10 or Ubuntu 16.0 on a 64-bit processor. It uses Selenium to open a browser and scroll down the results page, which loads more images than would otherwise be available, so it needs a browser to work correctly. By default it uses Chrome; if Chrome cannot be opened, Firefox is used instead. The package ships with chromedriver and geckodriver.

Installation

To install gImageGrabber, run:

$ pip install gImageGrabber

The package contains two Python modules, imgScrape and imgTools.

imgScrape has all the utilities needed to run the script. If you want additional control over the functions, explore imgTools.

Importing

To import the modules into your script:

from gimagegrabber import imgScrape
from gimagegrabber import imgTools

Functions

Building URL

imgScrape.build_url(search)

This composes a Google search URL for your search term. Pass the term as the search argument of the function.

Usage :

from gimagegrabber import imgScrape

searchTerm = "kamikaze eminem"

url = imgScrape.build_url(searchTerm)
print(url) #FOR DEBUG PURPOSE
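
To illustrate what composing such a URL involves, here is a minimal sketch using only the standard library. It is not gImageGrabber's own code: the helper name sketch_build_url and the query parameters are assumptions (tbm=isch is the parameter Google has used to select image results), and the URL that imgScrape.build_url returns may differ.

```python
from urllib.parse import urlencode


def sketch_build_url(search):
    """Hypothetical stand-in for imgScrape.build_url; not the package's code."""
    # urlencode escapes spaces and special characters in the search term.
    query = urlencode({"q": search, "tbm": "isch"})
    return "https://www.google.com/search?" + query


print(sketch_build_url("kamikaze eminem"))
```

Encoding the term matters because a raw space or ampersand in the search text would otherwise break the query string.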

Getting Source Data

imgScrape.browser(url, test=False)

This starts a browser window and scrolls down the webpage so that more pictures load. It returns the raw source code of the webpage, encoded as UTF-8. It takes two arguments, url and test.

url is the URL of the page to open.
test makes the browser scroll down less, so the source code is returned sooner. This is useful when you are writing or debugging your script.

It uses Chrome or Firefox, so make sure Google Chrome or Firefox is installed in its default directory.

Sometimes you may need to click "show more images" on the webpage yourself to load even more images.

If the browser does not open, make sure you are on a 64-bit OS and that Chrome or Firefox is installed.

If you are on a 32-bit processor, you need to use Firefox. You also have to download the 32-bit geckodriver and use it to replace the geckodriver.exe saved in the driver folder of the gImageGrabber module folder.

Usage :

from gimagegrabber import imgScrape

searchTerm = "kamikaze eminem"

url = imgScrape.build_url(searchTerm)
raw_data = imgScrape.browser(url)
print(raw_data) #FOR DEBUG PURPOSE
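
The scroll-to-load-more idea can be sketched with Selenium roughly as follows. This is an illustration under stated assumptions, not the package's implementation: open_browser and scroll_and_get_source are hypothetical names, the drivers are assumed to be discoverable by Selenium, and max_scrolls merely plays the role that the test argument is described as playing (a smaller value returns sooner).

```python
import time


def open_browser():
    """Try Chrome first and fall back to Firefox, as described above."""
    # Imported lazily so the scrolling helper works without Selenium installed.
    from selenium import webdriver
    from selenium.common.exceptions import WebDriverException

    try:
        return webdriver.Chrome()
    except WebDriverException:
        return webdriver.Firefox()


def scroll_and_get_source(driver, max_scrolls=20, pause=1.0):
    """Scroll to the bottom repeatedly, then return the page source."""
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded images time to appear
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new loaded, so stop early
        last_height = new_height
    return driver.page_source
```

A caller would pass the result of open_browser() after calling driver.get(url). The sketch returns a str; call .encode("utf-8") on it if bytes are needed.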

Extracting Links

imgScrape.imageLink(html)

This extracts the original links of the images from the html (source code) provided. html is the source code of the Google image search page. It returns a dict in the format [link: file extension]. If you want it in the format [file extension: link], use the imgTools.invDict() function from imgTools.

Usage :

from gimagegrabber import imgScrape
from gimagegrabber import imgTools

searchTerm = "kamikaze eminem"
debug = False

url = imgScrape.build_url(searchTerm)
raw_data = imgScrape.browser(url, debug)
links = imgScrape.imageLink(raw_data)
print(links)  # for debugging
print(imgTools.invDict(links))  # for debugging
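
When working with the [link: file extension] dict yourself, note that a plain key/value swap keeps only one link per extension, because dict keys are unique. The sketch below groups links by extension instead. The helper name group_links_by_extension and the sample URLs are hypothetical, and the source does not state how imgTools.invDict handles repeated extensions, so its output may differ.

```python
from collections import defaultdict


def group_links_by_extension(links):
    """Hypothetical helper: turn {link: extension} into {extension: [links]}."""
    grouped = defaultdict(list)
    for link, extension in links.items():
        grouped[extension].append(link)
    return dict(grouped)


# Made-up sample data in the [link: file extension] format.
sample = {
    "https://example.com/a.jpg": "jpg",
    "https://example.com/b.png": "png",
    "https://example.com/c.jpg": "jpg",
}
print(group_links_by_extension(sample))
```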

Saving Images

imgScrape.saveImages(data, name, onlyType)

This saves all the images passed to it as a dictionary in the format [link: file extension].

It takes 3 arguments:

data The dictionary containing links to images, in the format [link: file extension].
name The name of the folder under which the images will be saved.
onlyType The file extension to keep if you want only one type of image. To save every type, pass an empty string or omit the argument.

Images are saved in the following layout:

Root folder
|-- Search Term
    |-- file extension(eg 'jpg')
        |-- 000001.jpg
        |-- 000002.jpg

Usage :

from gimagegrabber import imgScrape

searchTerm = "Kamikaze"
extension = '' #save all types of images

url = imgScrape.build_url(searchTerm)
raw_data = imgScrape.browser(url)
links = imgScrape.imageLink(raw_data)
imgScrape.saveImages(links,searchTerm,extension)
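
The folder layout and the onlyType filter can be illustrated without downloading anything. The sketch below only computes where each image would go. It is not the package's code: plan_image_paths, its root parameter, and the choice to number files separately per extension are assumptions made for illustration.

```python
from pathlib import Path


def plan_image_paths(data, name, only_type="", root="."):
    """Hypothetical helper: map each link to root/name/extension/NNNNNN.extension."""
    counters = {}  # running file number, kept separately for each extension
    planned = {}
    for link, extension in data.items():
        if only_type and extension != only_type:
            continue  # skip extensions the caller did not ask for
        counters[extension] = counters.get(extension, 0) + 1
        filename = f"{counters[extension]:06d}.{extension}"
        planned[link] = Path(root) / name / extension / filename
    return planned
```

Passing an empty string for only_type keeps every extension, matching the behaviour described for onlyType.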

Example Code

This code is included in the package as simpleScript.py.

from gimagegrabber import imgScrape

# Search term
search = 'kamikaze eminem'
fType = ''  # leave as an empty string to save all file types
debug = False

html = imgScrape.browser(imgScrape.build_url(search), debug)
data = imgScrape.imageLink(html)
imgScrape.saveImages(data, search, fType)

Author

Saksham Sharma

Project details

Release history

Version	Date
0.1.16.3 (this version)	Feb 1, 2020
0.1.16.2	Feb 1, 2020
0.1.16.1	Feb 1, 2020
0.1.16	Feb 1, 2020
0.1.15	Jun 21, 2019
0.1.14.1	Jun 20, 2019
0.1.14	Jun 20, 2019
0.1.13.1	Jun 20, 2019
0.1.13	Jun 20, 2019
0.1.12	Oct 23, 2018
0.1.11	Oct 22, 2018
0.1.10	Oct 22, 2018
0.1.10b0 (pre-release)	Oct 22, 2018
0.1.9	Oct 22, 2018
0.1.8	Oct 22, 2018
0.1.7	Oct 22, 2018
0.1.6	Oct 22, 2018
0.1.5	Oct 22, 2018
0.1.4	Oct 22, 2018
0.1.3	Oct 22, 2018
0.1.2	Oct 22, 2018
0.1.1	Oct 22, 2018
0.1.0	Oct 22, 2018
0.0.8	Oct 22, 2018
0.0.7	Oct 21, 2018
0.0.6	Oct 21, 2018
0.0.5	Oct 21, 2018
0.0.4	Oct 21, 2018
0.0.3	Oct 21, 2018
0.0.1	Oct 21, 2018

Download files

Source Distribution

gImageGrabber-0.1.16.3.tar.gz (8.8 MB)

Uploaded Feb 1, 2020 Source

Hashes for gImageGrabber-0.1.16.3.tar.gz

Algorithm	Hash digest
SHA256	38feeed573ee954ada76d9ddf56d086becc120d7b724ca8f17e0aeb97c0f16c5
MD5	9cfde31ba51b65514677acb933bb792d
BLAKE2b-256	34da007f0cfecf38ad8c7f910afb0ecf247cd345e6e431258029e6575a26c895