Classify contact form messages as spam or not.

Django Spam Classifier

Contact form spam getting you down? We know the feeling. It's demeaning, draining and relentless.

This is a very basic Django app that uses the dbacl Bayesian text classification tool to filter out contact form spam. It's not perfect, but it works very well at blocking the really offensive English-language spam. The app was written to avoid depending on external services like reCAPTCHA or Akismet. These services work well enough, but they introduce some privacy concerns.

Limitations

The classifier currently doesn't work so well on non-English text, very short input, garbage input, or HTML containing only a single hyperlink. dbacl may have options that deal with these cases more effectively.

Additionally, dbacl does not appear to be actively maintained and is currently not available on Debian Bullseye. I may switch to bogofilter or another Bayesian filtering option in the future.

Getting started

  • Install django-spam-classifier

  • Install dbacl via your OS package manager

  • Add a BASE_DIR setting

  • Enable Django's django.contrib.sites app and configure your site domain via the Django Admin (used for training links in emails)

  • Add 'classifier' to your INSTALLED_APPS setting

  • Add path('', include('classifier.urls')), to your project's urls.py

  • Run python manage.py migrate

  • Create the classifier_data directory to hold the classifier database

  • In your contact form, call classifier.is_spam() on all the text accepted by your form:

    spam, submission = is_spam('\n'.join(submission_fields))
    if spam:
        # Throw away the form submission and don't notify anyone.
        pass
    else:
        # Process the form submission as normal.
        pass
    

    Doing so internally uses dbacl to classify the submission as spam or not spam and to generate a confidence from 0 to 100. Spam and not-spam results with a high confidence are processed as you'd expect. If the confidence is below RECORD_AND_DISCARD_CONFIDENCE, the submission is treated as not spam, because the confidence is too low to make a safe decision; the body is recorded in the Submissions model so that it can be manually classified via the Django Admin. If the confidence is above RECORD_AND_DISCARD_CONFIDENCE but below SILENTLY_DISCARD_CONFIDENCE, the submission is treated as spam, but it is also recorded in the Submissions model for manual classification.

  • Add a training link to the footer of any notification email you send:

    email_body = email_body + spam_footer(submission, site)
    

    Which will output something like:

    --
    Spam score: spam (15% confidence)
    Train as spam: https://example.com/classifier/1704/spam/
    Train as not spam: https://example.com/classifier/1704/not-spam/
    
  • Ensure you have a logging configuration set up so you can see log messages

  • Add a cron job to regularly (e.g. daily) update the training database with any new manual classifications you've made:

    python manage.py train
    
  • Visit the Django Admin and classify the low-confidence submissions you receive.

  • Tune the Django settings as desired (optional):

    CLASSIFIER = {
        'SILENTLY_DISCARD_CONFIDENCE': 90,  # Defaults to 80
        'RECORD_AND_DISCARD_CONFIDENCE': 75,  # Defaults to 60
    }
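
To make the two confidence thresholds easier to reason about when tuning them, here is a minimal, standalone sketch of the decision rules described above. It is not the app's actual code: the `Decision` class and `decide()` function are hypothetical names used only for illustration, the handling of a confidence exactly equal to a threshold is an assumption, and so is the choice not to record confident not-spam results.

```python
from dataclasses import dataclass

# Default thresholds, as quoted in the settings example above.
SILENTLY_DISCARD_CONFIDENCE = 80
RECORD_AND_DISCARD_CONFIDENCE = 60


@dataclass(frozen=True)
class Decision:
    """Hypothetical result type, for illustration only."""
    treat_as_spam: bool  # discard the submission instead of delivering it
    record: bool         # keep the body for manual classification


def decide(label: str, confidence: int,
           record_at: int = RECORD_AND_DISCARD_CONFIDENCE,
           silent_at: int = SILENTLY_DISCARD_CONFIDENCE) -> Decision:
    """Map a classifier label ("spam" or "not-spam") and a 0-100 confidence
    to an action, following the rules described in the text."""
    if confidence < record_at:
        # Too uncertain to act on: deliver as not spam, but record it
        # so it can be classified manually and used for training.
        return Decision(treat_as_spam=False, record=True)
    if label != "spam":
        # Confident not-spam: deliver as normal.
        # Assumption: nothing is recorded in this case.
        return Decision(treat_as_spam=False, record=False)
    if confidence < silent_at:
        # Confident enough to discard, but still recorded for review.
        return Decision(treat_as_spam=True, record=True)
    # Very confident spam: discard silently.
    return Decision(treat_as_spam=True, record=False)
```

With the defaults, a "spam" result at 15% confidence (as in the email footer example) is delivered and recorded, one at 70% is discarded but recorded, and one at 95% is discarded silently. Raising both thresholds, as in the settings example, widens the range of submissions that reach your inbox for manual training.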
    

Development

Create a venv and install the development requirements:

python3 -m venv --system-site-packages [VENV_PATH]
source [VENV_PATH]/bin/activate
python -m pip install Django pytz

TODO: There is undoubtedly a better way of installing dev-dependencies. Perhaps poetry or flit? Are they the only tools that handle this? What's generally accepted?

Run tests with tox or:

PYTHONPATH=src:.:$PYTHONPATH DJANGO_SETTINGS_MODULE=tests.test_settings pytest tests

Create migrations with:

DJANGO_SETTINGS_MODULE=tests.test_settings python -m django makemigrations
