webstemmer 0.5.0

A web crawler and HTML layout analyzer

Latest Version: 0.7.1

Webstemmer is a web crawler and HTML layout analyzer. It extracts articles from news sites as plain text, automatically removing banners, ads, and navigation links. You only need to supply the URL of a site's top page; from there it runs almost fully automatically, with little human intervention.