
Interactive Classification System (ICS): a tool for machine learning-supported labeling of text

ICS - Interactive Classification System

The Interactive Classification System (ICS) is a web-based application that supports the activity of manual text classification, i.e., labeling documents according to their content.

The system is designed to give total freedom of action to its users: they can at any time modify any classification schema and any label assignment, possibly reusing any relevant information from previous activities.

The application uses machine learning to actively support its users with classification suggestions. The machine learning component of the system is an unobtrusive observer of the users' activities: it never interrupts them, it constantly adapts and updates its models in response to their actions, and it is always available to perform automatic classifications.

Publication

ICS is described in the paper:

A. Esuli, "ICS: Total Freedom in Manual Text Classification Supported by Unobtrusive Machine Learning," in IEEE Access, vol. 10, pp. 64741-64760, 2022, doi: 10.1109/ACCESS.2022.3184009

Installation

You can set up a working installation of ICS in several ways:

Docker

A quick way to have a running instance of ICS is to use Docker.

docker run -p 8080:8080 andreaesuli/ics

This command pulls the ICS image from Docker Hub and runs it, publishing the application on port 8080 of the host machine, accessible from any interface. Once started, ICS is accessible from the host machine using a browser at the address http://127.0.0.1:8080

To make ICS accessible only from the local host machine, add the local IP address to the port mapping:

docker run -p 127.0.0.1:8080:8080 andreaesuli/ics
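Once the container is running, a quick way to confirm that the published port accepts connections is a plain TCP check. The following is a minimal sketch using only the Python standard library; the function name `is_reachable` is hypothetical and not part of ICS, and a successful connection only shows that something is listening on the port, not that ICS has finished starting.

```python
import socket


def is_reachable(host: str = "127.0.0.1", port: int = 8080, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout.

    Hypothetical helper, not part of ICS. The defaults match the address
    used in the docker run examples above.
    """
    try:
        # create_connection raises OSError (refused, timed out, unreachable) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_reachable()` on the host machine checks 127.0.0.1:8080; pass another host or port if the `-p` mapping was changed.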

NOTE: by default the ICS image uses the SQLite database engine, which can result in performance drops caused by access to the DB, especially when multiple users access the system. A configuration using PostgreSQL is recommended. It can be easily set up using docker compose.

Data persistence

The ICS image uses volumes to keep information persistent:

  • ics-db stores the SQLite file. This is the only volume that should be saved to keep the state of the application.
  • ics-data stores the files that are uploaded to or downloaded from the system. It is defined for inspection in case of failures; it is not necessary to save it.
  • ics-log stores the log files. It is defined for inspection in case of failures; it is not necessary to save it.

Docker compose

An instance of ICS using PostgreSQL can be obtained by downloading the docker-compose.yml file to a local directory and running

docker compose up

from that directory.

Host and port

The environment variables ICS_HOST and ICS_PORT define the interface and port on which ICS is accessible on the host machine. The defaults are 127.0.0.1 and 8080.

Data persistence

The compose-based version of ICS uses volumes to keep information persistent:

  • db-data stores the PostgreSQL database. This is the only volume that should be saved to keep the state of the application.
  • ics-data stores the files that are uploaded to or downloaded from the system. It is defined for inspection in case of failures; it is not necessary to save it.
  • ics-log stores the log files. It is defined for inspection in case of failures; it is not necessary to save it.

A volume can be linked to a path on the host machine by defining an environment variable (or by editing the docker-compose.yml file):

  • DB_DATA for the db-data volume (recommended)
  • ICS_DATA for the ics-data volume (not necessary)
  • ICS_LOG for the ics-log volume (not necessary)

For example, on Windows:

set DB_DATA=D:\ics_db_data
docker compose up

On Linux/Mac:

DB_DATA=/var/lib/ics/data docker compose up
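The same variables can be set in a platform-independent way from a small launcher script. This is a sketch under assumptions: the helper names `compose_env` and `run_compose_up` are hypothetical, the script is assumed to run from the directory that contains docker-compose.yml, and only the variable names DB_DATA, ICS_HOST and ICS_PORT come from the text above.

```python
import os
import subprocess


def compose_env(db_data=None, ics_host=None, ics_port=None, base=None):
    """Return an environment mapping with the ICS compose variables set.

    Hypothetical helper. Only values that are not None are set, so anything
    left unset falls back to the defaults of the compose file.
    """
    env = dict(os.environ if base is None else base)
    if db_data is not None:
        env["DB_DATA"] = str(db_data)
    if ics_host is not None:
        env["ICS_HOST"] = str(ics_host)
    if ics_port is not None:
        env["ICS_PORT"] = str(ics_port)
    return env


def run_compose_up(env):
    """Run `docker compose up` in the current directory with the given environment."""
    subprocess.run(["docker", "compose", "up"], env=env, check=True)
```

For example, `run_compose_up(compose_env(db_data="/var/lib/ics/data"))` corresponds to the Linux/Mac command shown above.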

Pip

The suggested way to quickly set up the Python environment is to use the Anaconda/Miniconda distribution and the conda package manager to create the virtual environment.

conda create -n ics python
conda activate ics

ICS is published as a pip package.

pip install ics-pkg

The last required step is to configure a database.

From source

Download the source code from the GitHub repo. Create a virtual environment and install the required packages.

cd [directory with ICS code]
conda create -n ics python
conda activate ics
pip install -r requirements.txt

The last required step is to configure a database.

DB configuration

The Docker installation already includes the setup of the DB, so you can skip this section. If you installed ICS using pip or from the source code, you must set up a DB.

The use of PostgreSQL is recommended to avoid performance drops caused by access to the DB, especially when multiple users access the system. However, ICS can also work using other DB engines, such as SQLite.

SQLite

Running ICS using SQLite as the DB only requires passing a --db_connection_string argument to the launch script:

ics-webapp --db_connection_string sqlite:///ics.sqlite
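The form of the connection string suggests the SQLAlchemy URL convention, in which three slashes introduce a path relative to the working directory and four slashes introduce an absolute path. Assuming that convention applies here (the text above does not state it), the following sketch shows which file a given string points to. The function name is hypothetical and the code uses only the standard library.

```python
from urllib.parse import urlsplit


def sqlite_file_from_connection_string(conn: str) -> str:
    """Return the database file path encoded in a sqlite:/// connection string.

    Hypothetical helper; assumes the SQLAlchemy-style URL convention.
    """
    parts = urlsplit(conn)
    if parts.scheme != "sqlite":
        raise ValueError(f"not a SQLite connection string: {conn!r}")
    # urlsplit keeps one leading slash from the '///' separator; drop it.
    # What remains is relative ('ics.sqlite') or absolute ('/var/...').
    return parts.path[1:]
```

Under this assumption, `sqlite:///ics.sqlite` refers to a file named ics.sqlite in the directory from which ics-webapp is launched, so launching from a different directory would use a different file.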

SQLite is

PostgreSQL

By default, ICS assumes a connection to PostgreSQL, using a database named 'ics' and a user named 'ics' (with password 'ics').

These are the SQL commands to create the required user and database on PostgreSQL.

CREATE USER ics WITH PASSWORD 'ics';
CREATE DATABASE ics;
GRANT ALL PRIVILEGES ON DATABASE ics to ics;

These commands can be issued using the psql SQL shell (or using pgAdmin, or similar DB frontends).

The tables required by ICS are created automatically at the first run.
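If the database uses a different user, password or host, a custom connection string may be needed. The sketch below composes one in the common `postgresql://user:password@host:port/database` URL form. It rests on several assumptions that the text above does not confirm: that --db_connection_string accepts this form for PostgreSQL, that the server runs on localhost, and that it listens on 5432 (the standard PostgreSQL port). The function name is hypothetical.

```python
from urllib.parse import quote


def postgres_connection_string(user="ics", password="ics",
                               host="localhost", port=5432, database="ics"):
    """Compose a postgresql:// URL from its parts (hypothetical helper).

    User and password are percent-encoded so that characters such as
    '@' or '/' cannot be mistaken for URL delimiters.
    """
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{database}"
    )
```

With the default arguments, which mirror the user, password and database created by the SQL commands above, the result is `postgresql://ics:ics@localhost:5432/ics`.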

The main app

Running the Docker image automatically starts the main application, which can be accessed with a browser at the IP address and port defined in the docker launch command or the docker compose file. Installations that do not use Docker can run ICS by using the ics-webapp script.

Activate the virtual environment:

conda activate ics

When installed using pip, the main application can be started with the command:

ics-webapp

When working on source code, it can be launched from the ics-webapp.py script:

Linux/Mac:

PYTHONPATH=. python ics/scripts/ics-webapp.py

Windows:

set PYTHONPATH=.
python ics/scripts/ics-webapp.py

When launched, the app will print the URL at which it is accessible.

[30/Mar/2022:15:31:59] ENGINE Bus STARTING
[30/Mar/2022:15:31:59] ENGINE Started monitor thread 'Autoreloader'.
[30/Mar/2022:15:31:59] ENGINE Serving on http://127.0.0.1:8080
[30/Mar/2022:15:31:59] ENGINE Bus STARTED
[30/Mar/2022:15:31:59] ENGINE Started monitor thread 'Session cleanup'.
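When ICS is started by another script, the URL can be picked out of this output automatically. The following sketch assumes the log keeps the "ENGINE Serving on" wording shown above; the names `SERVING_RE` and `find_serving_url` are hypothetical.

```python
import re

# Matches the line that announces the serving address, as in the log above.
SERVING_RE = re.compile(r"ENGINE Serving on (\S+)")


def find_serving_url(log_text: str):
    """Return the first URL announced by a 'Serving on' line, or None."""
    match = SERVING_RE.search(log_text)
    return match.group(1) if match else None
```

Applied to the log lines above, the function returns the address http://127.0.0.1:8080.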

Login

After the installation, only the admin user is defined, with password adminadmin. Change the default password on the first run.

Configuration

A configuration for ics-webapp can be saved to a file using the -s argument followed by the filename to use. For example, this command creates a default.conf file that lists all the default values. If any other argument is given in the same command, its value is saved in the configuration file in place of the default.

ics-webapp -s default.conf

A configuration file can be used to set the launch arguments, using the -c argument:

ics-webapp -c myinstance.conf

Any additional argument passed on the command line overrides the one specified in the configuration file.
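The resulting precedence is: command line first, then configuration file, then built-in defaults. The sketch below only illustrates that rule with plain dictionaries; it is not how ics-webapp is implemented. The function name and the `port` setting are hypothetical, while `db_connection_string` is the argument name used in the DB configuration section.

```python
from collections import ChainMap


def effective_settings(defaults: dict, config_file: dict, command_line: dict) -> dict:
    """Merge settings so that earlier sources win over later ones.

    Lookup order: command line, then configuration file, then defaults.
    Illustration only; names and values are hypothetical.
    """
    return dict(ChainMap(command_line, config_file, defaults))


settings = effective_settings(
    defaults={"db_connection_string": "(built-in default)", "port": 8080},
    config_file={"db_connection_string": "sqlite:///ics.sqlite", "port": 9090},
    command_line={"port": 9191},
)
```

Here `settings["port"]` comes from the command line, while `settings["db_connection_string"]` comes from the configuration file because the command line does not set it.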

Additional apps

These apps are clients that connect to the ICS web application.

If you run ICS from Docker, you must install them in a local Python environment (pip install ics-pkg; note that you don't need to set up the DB for them).

If ICS is not running on the local machine with default port, you must use the --host [ip address or name] and/or the --port [number] arguments.

Command line interface

When ics-webapp is running, ICS can also be accessed from the command line:

> ics-cli
Welcome, type help to have a list of commands
> login admin
Password: 
'Ok'
>

Twitter stream collector

A command line app, based on TwiGet, automatically uploads to ICS the tweets collected from filtered stream queries.

> ics-twitter-uploader
Logging into http://127.0.0.1:8080/service/userauth/
Username: admin
Password: 
TwiGet 0.1.5

Available commands (type help <command> for details):
create, delete, exit, help, list, refresh, start, stop

Reminder: add -is:retweet to a rule to exclude retweets from results, and to get only original content.
Registered queries:
        no registered queries

[not collecting (0 since last start)]>

Video tutorials

This YouTube playlist collects videos showing what you can do with ICS.

License

This software is licensed under the 3-Clause BSD license unless otherwise noted.
