crawler 0.1.0
A Python crawler.
Latest Version: 0.1.2
## Example

    from crawler.crawler import Crawler

    mycrawler = Crawler()
    seeds = ['http://www.example.com/']  # list of seed URLs
    mycrawler.add_seeds(seeds)
    url_patterns = ['^(.+example\.com)(.+)$']  # list of regular expressions for the URLs the crawler will work on
    mycrawler.start(url_patterns)  # start crawling

## Data files

Three Berkeley DB database files will be generated:

- queue.db
- webpage.db
- duplcheck.db

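To see which URLs a pattern list such as the one in the example would accept, the following standalone sketch may help. It uses only the standard `re` module and does not call the crawler package. The helper names `compile_patterns` and `is_allowed` are hypothetical, and matching with `re.match` is an assumption, since the package description does not say how `start()` applies the patterns.

```python
import re


def compile_patterns(url_patterns):
    """Compile each regular-expression string once, up front."""
    return [re.compile(p) for p in url_patterns]


def is_allowed(url, compiled):
    """Return True if the URL matches at least one compiled pattern.

    Assumption: patterns are applied with re.match (anchored at the start).
    """
    return any(p.match(url) for p in compiled)


# Same pattern as in the example, written as a raw string.
url_patterns = [r'^(.+example\.com)(.+)$']
compiled = compile_patterns(url_patterns)

candidates = [
    'http://www.example.com/',       # seed URL from the example
    'http://www.example.com/about',
    'http://www.example.org/about',  # different domain
    'http://example.com',            # nothing after the domain
]

allowed = [u for u in candidates if is_allowed(u, compiled)]
for url in allowed:
    print(url)
```

Under these assumptions the pattern requires at least one character after `example.com`, so a bare `http://example.com` with no trailing slash would not match. Writing the pattern as a raw string keeps `\.` from being read as a string escape.
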
| File | Type | Py Version | Uploaded on | Size | # downloads |
|---|---|---|---|---|---|
| crawler-0.1.0.tar.gz (md5) | Source | | 2011-01-17 | 4KB | 420 |
| crawler-0.1.0.win32.exe (md5) | MS Windows installer | 2.7 | 2011-01-17 | 65KB | 246 |
- Author: Yifei Jiang
- Home Page: http://crawler.yifeijiang.com
- Download URL: http://code.google.com/p/python-crawler/downloads/list
- Keywords: python crawler spider
- License: Apache License 2.0
- Requires: lxml, bsddb3
Categories
- Development Status :: 4 - Beta
- Environment :: Other Environment
- Intended Audience :: Developers
- License :: OSI Approved :: Apache Software License
- Operating System :: OS Independent
- Programming Language :: Python
- Programming Language :: Python :: 2.6
- Topic :: Internet :: WWW/HTTP
- Topic :: Software Development :: Libraries :: Python Modules
- Package Index Owner: Yifei.Jiang
- DOAP record: crawler-0.1.0.xml
