Skip to main content

Interactive Classification System (ICS): a tool for machine learning-supported labeling of text

Project description

ICS - Interactive Classification System

The Interactive Classification System (ICS), is a web-based application that supports the activity of manual text classification, i.e., labeling documents according to their content.

The system is designed to give total freedom of action to its users: they can at any time modify any classification schema and any label assignment, possibly reusing any relevant information from previous activities.

The application uses machine learning to actively support its users with classification suggestions The machine learning component of the system is an unobtrusive observer of the users' activities, never interrupting them, constantly adapting and updating its models in response to their actions, and always available to perform automatic classifications.

Publication

ICS is described in the paper:

A. Esuli, "ICS: Total Freedom in Manual Text Classification Supported by Unobtrusive Machine Learning," in IEEE Access, vol. 10, pp. 64741-64760, 2022, doi: 10.1109/ACCESS.2022.3184009

Installation

You can have a working installation of ICS in many ways:

Single file executable

Executable files of ICS are downloadable from the releases page. Once downloaded it can be run and have a working instance of ICS, provided a database is configured.

ics-webapp

The executable are from source using pyinstaller:

pyinstaller -F ics\scripts\webapp.py --add-data="ics\apps\media;ics\apps\media" --collect-all sklearn --name ics-webapp

Docker

A quick way have a running instance of ICS is to use Docker.

docker run -p 8080:8080 ghcr.io/aesuli/ics

This command pulls the ICS image from Docker hub and runs it, publishing the application on port 8080 of the host machine, accessible from any interface. Once started ICS is accessible from the host machine using a browser at the address http://127.0.0.1:8080

To have ICS accessible only from the local host machine add local ip address:

docker run -p 127.0.0.1:8080:8080 ghcr.io/aesuli/ics

NOTE: by default the ICS image uses the SQLite database engine, which may result in reduced efficiency and functionalities. A configuration using PostgreSQL is strongly recommended. It can be easily set up using docker compose.

Data persistence

ICS image use volumes to keep information persistent:

  • ics-db stores the sqlite file, this is the only volume that should be saved to keep the state of the application.
  • ics-data stores the files that are uploaded or downloaded from the system. It is defined for inspection in case of failures, it is not necessary to save it.
  • ics-log stores the log files. It is defined for inspection in case of failures, it is not necessary to save it.

Docker compose

An instance of ICS using PostgreSQL can be obtained downloading the docker-compose.yml file to a local directory and running

docker compose up

from that directory.

Host and port

The environment variables ICS_HOST and ICS_PORT define the interface and port on which ICS is accessible on the host machine. Default is 127.0.0.1 and 8080.

Data persistence

The compose-based version of ICS use volumes to keep information persistent:

  • db-data stores the PostgreSQL, this is the only volume that should be saved to keep the state of the application.
  • ics-data stores the files that are uploaded or downloaded from the system. It is defined for inspection in case of failures, it is not necessary to save it.
  • ics-log stores the log files. It is defined for inspection in case of failures, it is not necessary to save it.

A volume can be linked to a path on the host machine by defining an environment variable (or by editing the docker-compose.yml file):

  • DB_DATA for the db-data volume (recommended)
  • ICS_DATA for the ics-data volume (not necessary)
  • ICS_LOG for the ics-log volume (not necessary)

For example, on Windows:

set DB_DATA=D:\ics_db_data
docker compose up

On Linux/Mac:

DB_DATA=/var/lib/ics/data docker compose up

Pip

The suggested way to quickly set up the python environment is to use the Anaconda/Miniconda distribution and the conda package manager to create the virtual enviroment.

conda create -n ics python
conda activate ics

ICS is published as a pip package.

pip install ics-pkg

The last required step is to configure a database.

From source

Download source code from GitHub repo. Create a virtual environment and install the required packages.

cd [directory with ICS code]
conda create -n ics python
conda activate ics
pip install -r requirements.txt

The last required step is to configure a database.

DB configuration

The Docker compose installation already includes the setup of the PostgreSQL database, so you can skip this section. Any another requires to have a database available to connect to. The use of PostgreSQL is strongly recommended.

PostgreSQL

To connect to PostgreSQL, a dedicated DB must be created. These are the SQL commands to create the required user and database on PostgreSQL.

CREATE USER ics WITH PASSWORD 'ics';
CREATE DATABASE ics;
GRANT ALL PRIVILEGES ON DATABASE ics to ics;

These commands can be issued using the psql SQL shell (or using pgAdmin, or similar db frontends). The tables required by ICS are created automatically at the first run.

Then ICS can be launched passing the DB connection string:

ics-webapp --db_connection_string postgresql://ics:ics@localhost:5432/ics

The above connection string is the correct one for a locally running database, change it according to your configuration.

SQLite

By default ICS uses SQLite as the DB, yet please note that the use of SQLite is intended only for a first exploration of ICS and that using PostgreSQL is strongly recommended. Using SQLite can result in reduced efficiency and some functionalities may be missing or not properly working.

To use SQLite use the following --db_connection_string argument to the launch script:

ics-webapp --db_connection_string sqlite:///ics.sqlite

This is the default connection string, it creates the DB file in the current working directory. Change it to point to the path where you want to store your file.

Again, PostgreSQL is the recommended database.

The main app

Running the docker image automatically starts the main application, which can be accessed with a browser at the ip and port defined with the docker launch command or docker compose file. Installations that do not use docker can run ics by using the ics-webapp script.

Activate the virtual environment:

conda activate ics

When installed using pip, the main application can be started with the command:

ics-webapp

When working on source code, it can be launched from the ics-webapp.py script:

Linux/Mac:

PYTHONPATH=. python ics/scripts/ics-webapp.py

Windows:

set PYTHONPATH=. 
python ics/scripts/ics-webapp.py

When launched, the app will print the URL at which it is accessible.

[30/Mar/2022:15:31:59] ENGINE Bus STARTING
[30/Mar/2022:15:31:59] ENGINE Started monitor thread 'Autoreloader'.
[30/Mar/2022:15:31:59] ENGINE Serving on http://127.0.0.1:8080
[30/Mar/2022:15:31:59] ENGINE Bus STARTED
[30/Mar/2022:15:31:59] ENGINE Started monitor thread 'Session cleanup'.

Login

After the installation, only the admin user is defined, with password adminadmin. Change the default password on the first run.

Configuration

A configuration for ics-webapp can be saved to a file using the -s argument with the filename to use. For example, this command creates a default.conf file that lists all the default values (if any other argument is used in the command, the value of the argument is saved in the configuration file).

ics-webapp -s default.conf

A configuration file can be used to set the launch arguments, using the -c argument:

ics-webapp -c myinstance.conf

Any additional argument passed on the command line overrides the one specified in the configuration file.

Additional apps

These apps are clients that connect to the ICS web applications.

If you run ICS from Docker you must install them in a local python environment (pip install ics-pkg, note that you don't need to set up the DB for them)

If ICS is not running on the local machine with default port, you must use the --host [ip address or name] and/or the --port [number] arguments.

Command line interface

When the ics-webapp is running, ICS can be also accessed from command line

> ics-cli
Welcome, type help to have a list of commands
> login admin
Password: 
'Ok'
>

Twitter stream collector

A command line app, based on TwiGet, automatically upload to ICS the tweets collected from filtered stream queries.

> ics-twitter-uploader
Logging into http://127.0.0.1:8080/service/userauth/
Username: admin
Password: 
TwiGet 0.1.5

Available commands (type help <command> for details):
create, delete, exit, help, list, refresh, start, stop

Reminder: add -is:retweet to a rule to exclude retweets from results, and to get only original content.
Registered queries:
        no registered queries

[not collecting (0 since last start)]>

Video tutorials

This YouTube playlist collects videos showing what you can do with ICS.

License

This software is licensed under the 3-Clause BSD license unless otherwise noted.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

ics-pkg-0.2.0.tar.gz (91.0 kB view hashes)

Uploaded Source

Built Distribution

ics_pkg-0.2.0-py3-none-any.whl (117.5 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page