webstemmer 0.7.1
Web crawler and HTML layout analyzer
Webstemmer is a web crawler and HTML layout analyzer that automatically extracts the main text of a news site, leaving out banners, ads, and navigation links.
- Author: Yusuke Shinyama <yusuke at cs dot nyu dot edu>
- Home Page: http://www.unixuser.org/~euske/python/webstemmer/index.html
- Download URL: http://www.unixuser.org/~euske/python/webstemmer/webstemmer-dist-0.7.1.tar.gz
- License: MIT/X
Categories
- Development Status :: 4 - Beta
- Environment :: Console
- Intended Audience :: Developers
- Intended Audience :: Information Technology
- Intended Audience :: Science/Research
- License :: OSI Approved :: MIT License
- Topic :: Internet :: WWW/HTTP
- Topic :: Scientific/Engineering :: Information Analysis
- Topic :: Text Processing :: Markup :: HTML
- Package Index Owner: euske
- DOAP record: webstemmer-0.7.1.xml
