gugle.bot 1.0dev-r629

Small and Dumb Spider

Introduction

gugle.bot is a highly experimental web spider. It collects everything it finds and makes no attempt to tell useful content from junk.

gugle.bot is intended only for experiments that require collecting the links found in web pages.

Installation

Just type:

easy_install gugle.bot

That should be enough. If gugle.bot fails to work because of an undeclared dependency, please report it to me.

Usage of this package

This package provides a console script, guglebot, which is a handy shortcut for starting the spider.

Get help by typing:

guglebot --help

What data is collected?

Currently, we keep a list of URLs and a list of (referrer, target) pairs, nothing else.
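To make the shape of that data concrete, here is a minimal sketch in modern Python of how one page could yield a list of URLs and a list of (referrer, target) pairs. It is not gugle.bot's actual code: the names LinkCollector and collect, the sample page and the example.com addresses are all made up for illustration, and only the standard library is used.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def collect(referrer, html):
    """Return (urls, pairs) for one page, resolving relative links."""
    parser = LinkCollector()
    parser.feed(html)
    # Relative links are resolved against the page they were found on.
    targets = [urljoin(referrer, href) for href in parser.hrefs]
    urls = sorted(set([referrer] + targets))
    pairs = [(referrer, target) for target in targets]
    return urls, pairs


# Made-up sample page, not real crawl output.
page = '<a href="/about">About</a> <a href="http://example.org/">Other</a>'
urls, pairs = collect("http://example.com/index.html", page)
```

Here urls holds every address seen, including the page itself, and pairs records which page linked to which. A real spider would also fetch the pages and keep track of what it has already visited.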

How to inspect the collected data

Although we provide several scripts, you may need to craft your own in order to get all the information you need.

The following is a very compact list of the current scripts:

gbdomains
    Shows every domain collected
gbinspect
    Simply prints a summary of all collected data
gblist
    Prints every collected URL
gbgraph
    Prints every pair of (referrer, target)
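If you need to craft your own script, the following sketch shows the kind of processing involved. It assumes you have already loaded the (referrer, target) pairs into a Python list; how gugle.bot stores them is not described here, so the sample data and the function names domains and outgoing_counts are hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit

# Made-up sample data standing in for the collected pairs.
pairs = [
    ("http://example.com/", "http://example.com/about"),
    ("http://example.com/", "http://example.org/"),
    ("http://example.org/", "http://example.com/"),
]


def domains(pairs):
    """Return the sorted set of domains appearing on either side of a pair."""
    seen = set()
    for referrer, target in pairs:
        seen.add(urlsplit(referrer).netloc)
        seen.add(urlsplit(target).netloc)
    return sorted(seen)


def outgoing_counts(pairs):
    """Count how many links each referrer page points to."""
    return Counter(referrer for referrer, _ in pairs)


for domain in domains(pairs):
    print(domain)

for referrer, count in outgoing_counts(pairs).most_common():
    print(count, referrer)
```

The first loop gives a result comparable in spirit to gbdomains, and the second is an example of a question the bundled scripts do not answer directly.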

Changelog

1.0 - Unreleased

  • Initial release

Downloads

File: gugle.bot-1.0dev-r629.tar.gz
Type: Source
Uploaded on: 2008-05-12
Size: 9KB
Number of downloads: 733