Pythonic HTTP Client and Middleware Library for SnapSearch

SnapSearch Client Python is a Python-based, framework-agnostic HTTP client library for SnapSearch (https://snapsearch.io/).

SnapSearch provides similar libraries in other languages: https://github.com/SnapSearch/SnapSearch-Clients

Installation

The Pythonic snapsearch-client requires a dependable HTTP library that can verify SSL certificates for HTTPS requests. In practice, this means having either requests or PycURL installed.
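
To check which of the two HTTP libraries is importable in the current environment, something like the following can be used. This is only a sketch: the helper name available_http_backends is made up for illustration, and it does not reproduce how the client itself picks a backend.

```python
import importlib.util


def available_http_backends(candidates=("requests", "pycurl")):
    """Return the candidate module names that can be imported here.

    "requests" and "pycurl" are the import names of the two packages
    mentioned above; the helper itself is hypothetical.
    """
    return [name for name in candidates
            if importlib.util.find_spec(name) is not None]


if __name__ == "__main__":
    found = available_http_backends()
    print("HTTP backends found:", ", ".join(found) or "none")
```

If the list comes back empty, installing one of the two packages with pip before installing the client should be enough.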

To install with pip:

$ pip install snapsearch-client-python

Or, if you prefer easy_install:

$ easy_install snapsearch-client-python

Usage

The Pythonic SnapSearch Client provides WSGI and CGI middleware for integrating SnapSearch with the corresponding kinds of Python web applications. It also provides framework-agnostic core objects that can be used independently.
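
For readers unfamiliar with WSGI middleware, the general idea can be sketched with the standard library alone. Everything below is hypothetical and is not the library's InterceptorMiddleware: the class SketchInterceptorMiddleware, the fake_intercept callable, and the "bot" substring check are stand-ins chosen for illustration.

```python
from wsgiref.util import setup_testing_defaults


class SketchInterceptorMiddleware:
    """Hypothetical WSGI middleware; NOT the library's implementation."""

    def __init__(self, application, intercept):
        self.application = application
        self.intercept = intercept  # callable(environ) -> HTML str or None

    def __call__(self, environ, start_response):
        html = self.intercept(environ)
        if html is None:
            # Nothing to intercept: hand the request to the wrapped app.
            return self.application(environ, start_response)
        body = html.encode("utf-8")
        start_response("200 OK", [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ])
        return [body]


def hello_app(environ, start_response):
    """Plain WSGI application used as the wrapped app."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World!\r\n"]


def fake_intercept(environ):
    """Stand-in for 'robot detected and snapshot fetched' (assumption)."""
    if "bot" in environ.get("HTTP_USER_AGENT", "").lower():
        return "<html>snapshot</html>"
    return None


def call(app, user_agent):
    """Invoke a WSGI app once and return (status, body)."""
    environ = {"HTTP_USER_AGENT": user_agent}
    setup_testing_defaults(environ)
    statuses = []
    chunks = app(environ, lambda status, headers: statuses.append(status))
    return statuses[0], b"".join(chunks)
```

Wrapping hello_app with this class and calling it with a robot-like user agent returns the snapshot, while any other user agent reaches the wrapped application unchanged.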

The following examples walk through, step by step, how to use the Pythonic SnapSearch Client in a Python web application.

For full documentation on the API and API request parameters see: https://snapsearch.io/documentation

Basic Usage

The instructions below are an abridged version of the Flask example. The following Python script serves a simple "Hello World" page on all network interfaces of the machine running it, including any public IP addresses.

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return "Hello World!\r\n"

if __name__ == '__main__':
    app.run(host="0.0.0.0", port=5000)

To start the application, save the script as main.py and run:

$ pip install Flask
$ pip install snapsearch-client-python
$ python main.py
 * Running on http://0.0.0.0:5000/

To enable SnapSearch-based interception for this application:

  1. Initialize an Interceptor.

from SnapSearch import Client, Detector, Interceptor
interceptor = Interceptor(Client(api_email, api_key), Detector())

  2. Deploy the Interceptor.

from SnapSearch.wsgi import InterceptorMiddleware
app.wsgi_app = InterceptorMiddleware(app.wsgi_app, interceptor)

  3. Put it all together.

from flask import Flask
app = Flask(__name__)

@app.route('/')
def hello_world():
    return "Hello World!\r\n"

if __name__ == '__main__':
    # API credentials
    api_email = "<email>"  # change this to the registered email
    api_key = "<key>"  # change this to the real api credential

    # initialize the interceptor
    from SnapSearch import Client, Detector, Interceptor
    interceptor = Interceptor(Client(api_email, api_key), Detector())

    # deploy the interceptor
    from SnapSearch.wsgi import InterceptorMiddleware
    app.wsgi_app = InterceptorMiddleware(app.wsgi_app, interceptor)

    # start servicing
    app.run(host="0.0.0.0", port=5000)

Advanced Topics

Customizing the Detector

The Detector class can take ignored_routes and matched_routes as optional arguments to its constructor and perform interception detection on a per-route basis. For example, the following detector will bypass interception for any access to http://<server_name>/ignored.*, and enforce interception for any access to http://<server_name>/matched.*.

from SnapSearch import Detector
# raw strings keep the backslash without an invalid-escape warning
detector = Detector(ignored_routes=[r"^\/ignored"],
                    matched_routes=[r"^\/matched"])
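
One way to picture per-route detection is as a pair of regular-expression checks against the request path. The function below is a hypothetical sketch, not the Detector's code. It assumes that an ignored route always wins and that, when matched_routes is non-empty, only matching paths are intercepted; neither rule is confirmed by this document.

```python
import re


def route_allows_interception(path, ignored_routes=(), matched_routes=()):
    """Hypothetical per-route check; the precedence rules are assumptions."""
    # Assumption: any ignored route disables interception outright.
    if any(re.search(pattern, path) for pattern in ignored_routes):
        return False
    # Assumption: a non-empty matched_routes list acts as a whitelist.
    if matched_routes:
        return any(re.search(pattern, path) for pattern in matched_routes)
    # No route restrictions configured.
    return True


if __name__ == "__main__":
    ignored = [r"^\/ignored"]
    matched = [r"^\/matched"]
    for path in ("/ignored/page", "/matched/page", "/other"):
        print(path, route_allows_interception(path, ignored, matched))
```

The patterns are anchored with ^ so that they apply to the start of the path rather than to any substring.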

The Detector class can also take paths to external robots.json and extensions.json files as optional arguments to its constructor. For example:

from SnapSearch import Detector
detector = Detector(robots_json="path/to/external/robots.json",
                    extensions_json="path/to/external/extensions.json")

You can also modify the lists of robots and extensions through the robots and extensions properties of the detector object. For example, the following customization bypasses interception for Googlebot.

from SnapSearch import Detector
detector = Detector(robots_json="path/to/external/robots.json",
                    extensions_json="path/to/external/extensions.json")
detector.robots['ignore'].append("Googlebot")

Customizing the Client

The Client class can take an optional dict of request_parameters containing additional parameters defined at https://snapsearch.io/documentation#parameters. Note that the url parameter is always overwritten by the Interceptor with the encoded URL from the associated Detector object. The Client can also take optional api_url and ca_path arguments to communicate with an alternative backend service.
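
The effect of the url override can be illustrated with plain dicts. This is a sketch only: build_request_body is a hypothetical helper and example_parameter is a placeholder key, not a real SnapSearch parameter; consult the parameter documentation for the actual names.

```python
def build_request_body(request_parameters, encoded_url):
    """Hypothetical merge step: copy the user parameters, then force url."""
    body = dict(request_parameters or {})  # copy so the caller's dict is untouched
    body["url"] = encoded_url              # any user-supplied url is overwritten
    return body


if __name__ == "__main__":
    params = {
        "example_parameter": "value",       # placeholder key (assumption)
        "url": "http://will-be-replaced/",  # ignored, per the note above
    }
    print(build_request_body(params, "http://example.com/page"))
```

Because the url key is always replaced, there is no point in setting it in request_parameters.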

Customizing the Interceptor

The Interceptor class can take two optional callback functions, namely before_intercept() and after_intercept().

When before_intercept() is provided, the Interceptor object bypasses any communication with the SnapSearch backend service and returns the result of before_intercept() as if it had been returned by the associated Client object.

def before_intercept(url):
    ...
    return result

As for after_intercept(), the Interceptor passes the response from the Client object to after_intercept(), which can perform, say, data extraction or logging as appropriate.

def after_intercept(url, response):
    ...
    return None

The return value of after_intercept() is ignored by the Interceptor and does not affect the interception process.

Developers’ Resources

Source Distribution

snapsearch-client-python-0.0.7.tar.gz (311.6 kB)
