Interactive Classification System (ICS): a tool for machine learning-supported labeling of text
ICS - Interactive Classification System
The Interactive Classification System (ICS) is a web-based application that supports manual text classification, i.e., labeling documents according to their content.
The system is designed to give total freedom of action to its users: they can at any time modify any classification schema and any label assignment, possibly reusing any relevant information from previous activities.
The application uses machine learning to actively support its users with classification suggestions. The machine learning component of the system is an unobtrusive observer of the users' activities: it never interrupts them, it constantly adapts and updates its models in response to their actions, and it is always available to perform automatic classifications.
Installation
Installation: using pip (recommended)
The suggested way to quickly set up the Python environment is to use the Anaconda/Miniconda distribution and the conda package manager to create the virtual environment.
ICS is published as a pip package.
> conda create -n ics python
> conda activate ics
> pip install ics-pkg
Installation: from source
Download the source code from the GitHub repository.
> cd [directory with ICS code]
> conda create -n ics --file requirements.txt
> conda activate ics
Note: twiget is not listed as a requirement, as it is needed only by the Twitter uploader script (pip install twiget).
DB configuration
ICS requires a database to store its data.
By default ICS assumes the use of a database named 'ics' by a user named 'ics' (with password 'ics').
ICS is tested to work with PostgreSQL. These are the SQL commands to create the required user and database on PostgreSQL.
CREATE USER ics WITH PASSWORD 'ics';
CREATE DATABASE ics;
GRANT ALL PRIVILEGES ON DATABASE ics to ics;
These commands can be issued using the psql SQL shell (or using pgAdmin, or a similar database frontend).
The tables required by ICS are created automatically at the first run.
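To check the database from Python, a client library typically needs a connection URI. The following is a minimal sketch that builds one from the default user, password, and database name given above. The host (localhost) and port (5432) are assumptions based on PostgreSQL's usual local defaults, and the helper name ics_db_url is hypothetical; this is not taken from the ICS code.

```python
from urllib.parse import quote


def ics_db_url(user="ics", password="ics", database="ics",
               host="localhost", port=5432):
    """Build a libpq-style PostgreSQL connection URI.

    user, password and database default to the values from this README;
    host and port are assumptions (PostgreSQL's usual local defaults).
    """
    # Percent-encode credentials so special characters do not break the URI.
    return (
        f"postgresql://{quote(user, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(database, safe='')}"
    )


print(ics_db_url())  # postgresql://ics:ics@localhost:5432/ics
```

If the default password is changed, as is advisable outside a local test setup, pass the new value to the helper rather than editing the string by hand.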
Starting the main app
Activate the virtual environment:
> conda activate ics
When installed using pip, the main application can be started with the command:
> ics-webapp
When working on the source code, it can be launched from the ics-webapp.py script:
Linux/Mac:
> PYTHONPATH=. python ics/scripts/ics-webapp.py
Windows:
> set PYTHONPATH=.
> python ics/scripts/ics-webapp.py
When launched, the app will print the URL at which it is accessible.
[30/Mar/2022:15:31:59] ENGINE Bus STARTING
[30/Mar/2022:15:31:59] ENGINE Started monitor thread 'Autoreloader'.
[30/Mar/2022:15:31:59] ENGINE Serving on http://127.0.0.1:8080
[30/Mar/2022:15:31:59] ENGINE Bus STARTED
[30/Mar/2022:15:31:59] ENGINE Started monitor thread 'Session cleanup'.
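When the app is started from a script, it can be handy to pick the URL out of the startup log automatically. The sketch below assumes the log keeps the "ENGINE Serving on <url>" format shown above; the function name find_serving_url is hypothetical and not part of ICS.

```python
import re

# Matches the "Serving on" line of the startup log shown above.
SERVING_RE = re.compile(r"ENGINE Serving on (https?://\S+)")


def find_serving_url(log_text):
    """Return the first URL announced in a 'Serving on' line, or None."""
    match = SERVING_RE.search(log_text)
    return match.group(1) if match else None


sample_log = """\
[30/Mar/2022:15:31:59] ENGINE Bus STARTING
[30/Mar/2022:15:31:59] ENGINE Serving on http://127.0.0.1:8080
[30/Mar/2022:15:31:59] ENGINE Bus STARTED
"""

print(find_serving_url(sample_log))  # http://127.0.0.1:8080
```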
Login
After the installation, only the admin user is defined, with password adminadmin.
Configuration
A configuration for ics-start can be saved to a file using the -s argument followed by the filename to use. For example, the following command creates a default.conf file that lists all the default values (if any other argument is given in the command, its value is saved in the configuration file).
> ics-start -s default.conf
A configuration file can be used to set the launch arguments, using the -c argument:
> ics-start -c myinstance.conf
Any additional argument passed on the command line overrides the one specified in the configuration file.
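The precedence described above (built-in defaults, then the configuration file, then the command line) can be illustrated with a small sketch. It is not the ICS implementation: the option names --host and --port, the default values, and the function resolve_settings are hypothetical, and the configuration file is represented by an already-parsed dictionary because its on-disk format is not described here.

```python
import argparse


def resolve_settings(argv, file_values):
    """Merge settings so that command-line values win over file values.

    --host and --port are hypothetical option names used only for
    illustration; they are not taken from the ICS documentation.
    """
    parser = argparse.ArgumentParser()
    # default=None lets us tell "not given on the command line"
    # apart from an explicitly passed value.
    parser.add_argument("--host", default=None)
    parser.add_argument("--port", type=int, default=None)
    cli = vars(parser.parse_args(argv))

    defaults = {"host": "127.0.0.1", "port": 8080}  # illustrative defaults
    merged = dict(defaults)       # 1. built-in defaults
    merged.update(file_values)    # 2. values from the configuration file
    # 3. only the arguments actually passed on the command line
    merged.update({k: v for k, v in cli.items() if v is not None})
    return merged


print(resolve_settings(["--port", "9000"], {"host": "0.0.0.0", "port": 8081}))
# {'host': '0.0.0.0', 'port': 9000}
```

Here the port from the command line replaces the one from the file, while the host from the file is kept because no --host argument was passed.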
Additional apps
Command line interface
When ics-webapp is running, ICS can also be accessed from the command line:
> ics-cli
Welcome, type help to have a list of commands
> login admin
Password:
'Ok'
>
Twitter stream collector
A command line app, based on TwiGet, automatically uploads to ICS the tweets collected from filtered stream queries.
> ics-twitter-uploader
Logging into http://127.0.0.1:8080/service/userauth/
Username: admin
Password:
TwiGet 0.1.5
Available commands (type help <command> for details):
create, delete, exit, help, list, refresh, start, stop
Reminder: add -is:retweet to a rule to exclude retweets from results, and to get only original content.
Registered queries:
no registered queries
[not collecting (0 since last start)]>
Video tutorials
This YouTube playlist collects videos showing what you can do with ICS.
License
This software is licensed under the 3-Clause BSD license unless otherwise noted.