
Stop word lists in many languages

Project description

Simple Python package that provides a single function for loading sets of stop words for different languages.
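As a rough illustration of what a single loading function of this kind might look like, the sketch below reads one word per line from a per-language text file and returns a set. It is not the package's actual implementation: the names `load_stop_words` and `remove_stop_words`, the `stopwords_<lang>.txt` file layout, and the sample word list are all assumptions made for the example.

```python
import tempfile
from pathlib import Path


def load_stop_words(lang, data_dir):
    """Return the stop words for `lang` as a set (hypothetical helper)."""
    # Assumed layout: one file per language, e.g. stopwords_en.txt.
    path = Path(data_dir) / f"stopwords_{lang}.txt"
    if not path.is_file():
        raise ValueError(f"no stop word list for language {lang!r}")
    with path.open(encoding="utf-8") as handle:
        # One word per line; strip whitespace and skip blank lines.
        return {line.strip() for line in handle if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not stop words (case-insensitive)."""
    return [tok for tok in tokens if tok.lower() not in stop_words]


# Demo with a tiny made-up English list written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "stopwords_en.txt").write_text(
        "the\na\nof\n\nand\n", encoding="utf-8"
    )
    english = load_stop_words("en", tmp)

filtered = remove_stop_words("The history of a language".split(), english)
print(filtered)
```

Returning a set keeps membership checks fast when filtering long token lists, which is the usual reason to load stop words this way.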

Stop words in English, French, German, Finnish, Hungarian, Turkish, Russian, Czech, Greek, Arabic, Chinese, Japanese, Korean, Catalan, Polish, Hebrew, Norwegian, Swedish, Italian, Portuguese and Spanish were retrieved from the following sources:

The directory called orig contains the original files used to compile the stop word lists. The directory called not_used contains raw data for creating stop word lists for languages that are not yet available in many_stop_words.available_languages.
