Learn and Infer Non-Compensatory Sorting
lincs is a collection of MCDA algorithms, usable as a C++ library, a Python package and a command-line utility.
lincs is licensed under the GNU Lesser General Public License v3.0, as indicated by the two files COPYING and COPYING.LESSER. It's available on the Python Package Index. Its documentation and its source code are on GitHub.
@todo (When we have a paper to actually cite) Add a note asking academics to kindly cite our work.
Questions? Remarks? Bugs? Want to contribute? Open an issue or a discussion!
Contributors and previous work
lincs is developed by the MICS research team at CentraleSupélec.
Its main authors are, in alphabetical order:
Laurent Cabaret (performance optimization)
Vincent Jacques (engineering)
Vincent Mousseau (domain expertise)
Wassila Ouerdane (domain expertise)
It’s based on work by:
Olivier Sobrie (the “weights, profiles, breed” learning strategy for MR-Sort models, and the profiles improvement heuristic, developed in his Ph.D. thesis and implemented in Python)
Emma Dixneuf, Thibault Monsel and Thomas Vindard (C++ implementation of Sobrie’s heuristic)
Project goals
Provide MCDA tools usable out of the box
You should be able to use lincs without being a specialist in MCDA and/or NCS models. Just follow the Get started section below.
Provide a base for developing new MCDA algorithms
lincs is designed to be easy to extend with new algorithms, or even to replace parts of existing algorithms. @todo Write doc about that use case.
lincs also provides a benchmark framework to compare algorithms (@todo Write and document). This should make it easier to understand the relative strengths and weaknesses of each algorithm.
Get started
Install
First, you need to install a few dependencies (@todo build binary wheel distributions to make installation easier):
# System packages
sudo apt-get install --yes g++ libboost-python-dev python3-dev libyaml-cpp-dev
# CUDA
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/3bf863cc.pub
sudo add-apt-repository 'deb https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ /'
sudo apt-get update
sudo apt-get install --yes cuda-cudart-dev-12-1 cuda-nvcc-12-1
# OR-tools
wget https://github.com/google/or-tools/releases/download/v8.2/or-tools_ubuntu-20.04_v8.2.8710.tar.gz
tar xf or-tools_ubuntu-20.04_v8.2.8710.tar.gz
sudo cp -r or-tools_Ubuntu-20.04-64bit_v8.2.8710/include/* /usr/local/include
sudo cp -r or-tools_Ubuntu-20.04-64bit_v8.2.8710/lib/*.so /usr/local/lib
sudo ldconfig
rm -r or-tools_Ubuntu-20.04-64bit_v8.2.8710 or-tools_ubuntu-20.04_v8.2.8710.tar.gz
# Header-only libraries
cd /usr/local/include
sudo wget https://raw.githubusercontent.com/Neargye/magic_enum/v0.8.2/include/magic_enum.hpp
sudo wget https://raw.githubusercontent.com/d99kris/rapidcsv/v8.75/src/rapidcsv.h
sudo wget https://raw.githubusercontent.com/jacquev6/lov-e-cuda/13e45bc/lov-e.hpp
sudo wget https://raw.githubusercontent.com/doctest/doctest/v2.4.11/doctest/doctest.h
Finally, install lincs itself from the Python Package Index with pip install lincs.
Concepts and files
lincs is based on the following concepts:
a “domain” describes the objects to be classified (a.k.a. the “alternatives”), the criteria used to classify them, and the existing categories they can belong to;
a “model” is used to actually assign a category to each alternative, based on the values of the criteria for that alternative;
a “classified alternative” is an alternative together with the category it belongs to.
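To make these three concepts concrete, here is a minimal Python sketch of how they relate. The class and function names (Criterion, Domain, ClassifiedAlternative, classify, toy_model) are hypothetical and chosen for illustration only; they loosely mirror the YAML and CSV files of the Get started walk-through but are not lincs' actual Python API. The toy model is likewise an assumption, used only to show the shape of a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; they are NOT lincs' actual API.

@dataclass
class Criterion:
    name: str
    value_type: str = "real"               # field names mirror domain.yml
    category_correlation: str = "growing"  # higher values lead to better categories

@dataclass
class Domain:
    criteria: List[Criterion]
    categories: List[str]  # assumed ordered from worst to best

@dataclass
class ClassifiedAlternative:
    name: str
    values: Dict[str, float]  # criterion name -> value
    category: str

# A "model" is anything that assigns a category to an alternative's criterion values.
Model = Callable[[Dict[str, float]], str]

def classify(model: Model, alternatives: Dict[str, Dict[str, float]]) -> List[ClassifiedAlternative]:
    """Turn plain alternatives into classified alternatives using `model`."""
    return [ClassifiedAlternative(name, values, model(values)) for name, values in alternatives.items()]

domain = Domain(
    criteria=[Criterion(f"Criterion {i}") for i in range(1, 5)],
    categories=["Category 1", "Category 2", "Category 3"],
)

# Toy model (hypothetical): best category if at least three criteria reach 0.5, else worst.
def toy_model(values: Dict[str, float]) -> str:
    good = sum(1 for v in values.values() if v >= 0.5)
    return domain.categories[-1] if good >= 3 else domain.categories[0]

classified = classify(toy_model, {
    "Alternative A": {c.name: 0.9 for c in domain.criteria},
    "Alternative B": {c.name: 0.1 for c in domain.criteria},
})
print([(a.name, a.category) for a in classified])
# [('Alternative A', 'Category 3'), ('Alternative B', 'Category 1')]
```

Keeping the model as a plain callable separates what is classified (the domain) from how it is classified (the model), which is the distinction the three concepts above draw.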
Start using lincs’ command-line interface
The command-line interface is the easiest way to get started with lincs. Begin with lincs --help, which should output something like:
Usage: lincs [OPTIONS] COMMAND [ARGS]...
  lincs (Learn and Infer Non-Compensatory Sorting) is a set of tools for training and using MCDA models.
Options:
  --help  Show this message and exit.
Commands:
  classification-accuracy  Compute a classification accuracy.
  classify                 Classify alternatives.
  generate                 Generate synthetic data.
  learn                    Learn a model.
  visualize                Make graphs from data.
It's organized into sub-commands. The first one used here is generate, which produces synthetic pseudo-random data.
Generate a classification domain with 4 criteria and 3 categories (@todo Link to concepts and file formats):
lincs generate classification-domain 4 3 --output-domain domain.yml
The generated domain.yml should look like:
kind: classification-domain
format_version: 1
criteria:
  - name: Criterion 1
    value_type: real
    category_correlation: growing
  - name: Criterion 2
    value_type: real
    category_correlation: growing
  - name: Criterion 3
    value_type: real
    category_correlation: growing
  - name: Criterion 4
    value_type: real
    category_correlation: growing
categories:
  - name: Category 1
  - name: Category 2
  - name: Category 3
Then generate a classification model (@todo Link to concepts and file formats):
lincs generate classification-model domain.yml --output-model model.yml
It should look like:
kind: classification-model
format_version: 1
boundaries:
  - profile:
      - 0.255905151
      - 0.0551739037
      - 0.162252158
      - 0.0526000932
    sufficient_coalitions:
      kind: weights
      criterion_weights:
        - 0.147771254
        - 0.618687689
        - 0.406786472
        - 0.0960085914
  - profile:
      - 0.676961303
      - 0.324553937
      - 0.673279881
      - 0.598555863
    sufficient_coalitions:
      kind: weights
      criterion_weights:
        - 0.147771254
        - 0.618687689
        - 0.406786472
        - 0.0960085914
@todo Use YAML anchors and references to avoid repeating the same sufficient coalitions in all profiles
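The model above is an MR-Sort model: each boundary has a profile (one value per criterion) and weights-based sufficient coalitions. As a minimal sketch, assuming the usual MR-Sort rule where an alternative passes a boundary when the summed weights of the criteria on which its value is at least the profile's value reach 1 (our assumption about lincs' threshold, consistent with the example files), the assignment can be written as follows. The function names weighted_support and mr_sort_classify are hypothetical, not part of lincs.

```python
from typing import List, Sequence, Tuple

def weighted_support(values: Sequence[float], profile: Sequence[float], weights: Sequence[float]) -> float:
    """Sum of the weights of the criteria on which the alternative is at least as good as the profile."""
    return sum(w for v, p, w in zip(values, profile, weights) if v >= p)

def mr_sort_classify(values: Sequence[float],
                     boundaries: List[Tuple[Sequence[float], Sequence[float]]],
                     categories: List[str]) -> str:
    """Pessimistic MR-Sort assignment for 'growing' criteria (sketch).

    boundaries: (profile, criterion_weights) pairs, ordered from lowest to highest.
    categories: ordered from worst to best, with one more entry than boundaries.
    Assumes a coalition is sufficient when its total weight is >= 1.
    """
    category_index = 0
    for profile, weights in boundaries:
        if weighted_support(values, profile, weights) >= 1:
            category_index += 1  # the alternative passes this boundary
        else:
            break  # pessimistic rule: stop at the first boundary not passed
    return categories[category_index]

# The generated model.yml above: both boundaries share the same weights.
weights = [0.147771254, 0.618687689, 0.406786472, 0.0960085914]
boundaries = [
    ([0.255905151, 0.0551739037, 0.162252158, 0.0526000932], weights),
    ([0.676961303, 0.324553937, 0.673279881, 0.598555863], weights),
]
categories = ["Category 1", "Category 2", "Category 3"]

# "Alternative 1" from the learning set excerpt further down
print(mr_sort_classify([0.37454012, 0.796543002, 0.95071429, 0.183434784], boundaries, categories))
# Category 3
```

Because the sum of weights only has to reach a threshold, no amount of excess on one criterion can make up for too few criteria passing the profile. This is what "non-compensatory" refers to.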
You can visualize the model using:
lincs visualize classification-model domain.yml model.yml model.png
It should output something like:
Finally, generate a set of classified alternatives (@todo Link to concepts and file formats):
lincs generate classified-alternatives domain.yml model.yml 1000 --output-classified-alternatives learning-set.csv
It should contain 1000 alternatives and start with something like this:
name,"Criterion 1","Criterion 2","Criterion 3","Criterion 4",category
"Alternative 1",0.37454012,0.796543002,0.95071429,0.183434784,"Category 3"
"Alternative 2",0.731993914,0.779690981,0.598658502,0.596850157,"Category 2"
"Alternative 3",0.156018645,0.445832759,0.15599452,0.0999749228,"Category 1"
"Alternative 4",0.0580836125,0.4592489,0.866176128,0.333708614,"Category 3"
"Alternative 5",0.601114988,0.14286682,0.708072603,0.650888503,"Category 2"
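The classified-alternatives file is plain CSV: a name column, one column per criterion, and a category column. As a sketch, Python's standard csv module reads it directly. The snippet below parses the five rows shown above from an in-memory string and tallies the categories; the helper name read_classified_alternatives is hypothetical.

```python
import csv
import io
from collections import Counter

# First rows of learning-set.csv, copied from the excerpt above.
SAMPLE = """\
name,"Criterion 1","Criterion 2","Criterion 3","Criterion 4",category
"Alternative 1",0.37454012,0.796543002,0.95071429,0.183434784,"Category 3"
"Alternative 2",0.731993914,0.779690981,0.598658502,0.596850157,"Category 2"
"Alternative 3",0.156018645,0.445832759,0.15599452,0.0999749228,"Category 1"
"Alternative 4",0.0580836125,0.4592489,0.866176128,0.333708614,"Category 3"
"Alternative 5",0.601114988,0.14286682,0.708072603,0.650888503,"Category 2"
"""

def read_classified_alternatives(stream):
    """Yield (name, criterion values, category) for each row of a classified-alternatives CSV."""
    reader = csv.DictReader(stream)
    # Every column other than "name" and "category" is a criterion, in file order.
    criteria = [field for field in reader.fieldnames if field not in ("name", "category")]
    for row in reader:
        yield row["name"], [float(row[c]) for c in criteria], row["category"]

rows = list(read_classified_alternatives(io.StringIO(SAMPLE)))
print(Counter(category for _, _, category in rows))
```

With the real file, pass open("learning-set.csv", newline="") instead of the io.StringIO object.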
You can visualize the first five alternatives of the learning set using:
lincs visualize classification-model domain.yml model.yml --alternatives learning-set.csv --alternatives-count 5 alternatives.png
It should output something like:
@todo Improve how this graph looks:
display categories as stacked solid colors
display alternatives in a color that matches their assigned category
remove the legend, place names (categories and alternatives) directly on the graph
You now have a (synthetic) learning set.
You can use it to train a new model:
# @todo Rename the command to `train`?
lincs learn classification-model domain.yml learning-set.csv --output-model trained-model.yml
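The trained model has the same structure as the original (synthetic) model because both are MR-Sort models for the same domain. It is numerically different because information was lost in the process: a finite learning set cannot pin down the exact profiles and weights of the original model, since many different values classify it equally well.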
kind: classification-model
format_version: 1
boundaries:
  - profile:
      - 0.00751833664
      - 0.0549556538
      - 0.162616938
      - 0.193127945
    sufficient_coalitions:
      kind: weights
      criterion_weights:
        - 0.499998987
        - 0.5
        - 0.5
        - 0
  - profile:
      - 0.0340298451
      - 0.324480206
      - 0.672487617
      - 0.427051842
    sufficient_coalitions:
      kind: weights
      criterion_weights:
        - 0.499998987
        - 0.5
        - 0.5
        - 0
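To see concretely how two such models can disagree, this sketch computes, for one alternative of the testing set (Alternative 2594, whose category differs in the diff further down), the weighted support of each model's lower boundary. It assumes, as in standard MR-Sort, that an alternative passes a boundary when that support is at least 1; the helper name support is hypothetical.

```python
def support(values, profile, weights):
    """Total weight of the criteria on which `values` reaches `profile` (growing criteria)."""
    return sum(w for v, p, w in zip(values, profile, weights) if v >= p)

# "Alternative 2594" from the testing set
alternative = [0.234433308, 0.780464768, 0.162389532, 0.622178912]

# Lower boundary (between Category 1 and Category 2) of each model: (profile, weights)
original = ([0.255905151, 0.0551739037, 0.162252158, 0.0526000932],
            [0.147771254, 0.618687689, 0.406786472, 0.0960085914])
trained = ([0.00751833664, 0.0549556538, 0.162616938, 0.193127945],
           [0.499998987, 0.5, 0.5, 0.0])

for label, (profile, weights) in [("original", original), ("trained", trained)]:
    s = support(alternative, profile, weights)
    # Assumed rule: the alternative passes the boundary when s >= 1
    print(f"{label}: support {s:.6f}, passes lower boundary: {s >= 1}")
```

The value on Criterion 3 (0.162389532) lies between the two models' profile values (0.162252158 and 0.162616938), so the trained model is left with a support of about 0.999999, just below the threshold. It therefore assigns Category 1, whereas the original model passes the boundary.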
If the training is effective, the trained model should behave much like the original one. To check how close it is, you can reclassify a testing set.
First, generate a testing set:
lincs generate classified-alternatives domain.yml model.yml 10000 --output-classified-alternatives testing-set.csv
And ask the trained model to classify it:
lincs classify domain.yml trained-model.yml testing-set.csv --output-classified-alternatives reclassified-testing-set.csv
To see where the original testing set and the reclassified one differ, compare them:
diff testing-set.csv reclassified-testing-set.csv
That command should show a few alternatives that are not classified the same way by the original and the trained model:
2595c2595
< "Alternative 2594",0.234433308,0.780464768,0.162389532,0.622178912,"Category 2"
---
> "Alternative 2594",0.234433308,0.780464768,0.162389532,0.622178912,"Category 1"
5000c5000
< "Alternative 4999",0.074135974,0.496049821,0.672853291,0.782560945,"Category 2"
---
> "Alternative 4999",0.074135974,0.496049821,0.672853291,0.782560945,"Category 3"
5346c5346
< "Alternative 5345",0.815349102,0.580399215,0.162403136,0.995580792,"Category 2"
---
> "Alternative 5345",0.815349102,0.580399215,0.162403136,0.995580792,"Category 1"
9639c9639
< "Alternative 9638",0.939305425,0.0550933145,0.247014269,0.265170485,"Category 1"
---
> "Alternative 9638",0.939305425,0.0550933145,0.247014269,0.265170485,"Category 2"
9689c9689
< "Alternative 9688",0.940304875,0.885046899,0.162586793,0.515185535,"Category 2"
---
> "Alternative 9688",0.940304875,0.885046899,0.162586793,0.515185535,"Category 1"
9934c9934
< "Alternative 9933",0.705289483,0.11529737,0.162508503,0.0438248962,"Category 2"
---
> "Alternative 9933",0.705289483,0.11529737,0.162508503,0.0438248962,"Category 1"
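If you prefer a summary to the raw diff output, a short Python sketch using only the standard library can list the alternatives whose category changed. It assumes that both files list the same alternatives in the same order, which the matching line numbers in the diff suggest. The function name changed_categories is hypothetical.

```python
import csv
import io

def changed_categories(original_stream, reclassified_stream):
    """Return (name, original category, new category) for each alternative whose category differs.

    Assumes both CSV files list the same alternatives in the same order.
    """
    changes = []
    for before, after in zip(csv.DictReader(original_stream), csv.DictReader(reclassified_stream)):
        if before["name"] != after["name"]:
            raise ValueError(f"files are not aligned: {before['name']} vs {after['name']}")
        if before["category"] != after["category"]:
            changes.append((before["name"], before["category"], after["category"]))
    return changes

# Tiny in-memory excerpt; in practice, open testing-set.csv and reclassified-testing-set.csv.
HEADER = 'name,"Criterion 1","Criterion 2","Criterion 3","Criterion 4",category\n'
before = HEADER + '"Alternative 2594",0.234433308,0.780464768,0.162389532,0.622178912,"Category 2"\n'
after = HEADER + '"Alternative 2594",0.234433308,0.780464768,0.162389532,0.622178912,"Category 1"\n'
print(changed_categories(io.StringIO(before), io.StringIO(after)))
# [('Alternative 2594', 'Category 2', 'Category 1')]
```

With the real files, open each one with newline="" as recommended for the csv module.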
You can also measure the classification accuracy of the trained model on that testing set:
lincs classification-accuracy domain.yml trained-model.yml testing-set.csv
It should be close to 100%:
9994/10000
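The figure printed by classification-accuracy reads as the number of correctly classified alternatives over the total. It agrees with the diff above: 6 of the 10000 testing alternatives changed category, and 10000 - 6 = 9994. Here is a minimal Python sketch of that computation. The helpers classification_accuracy and format_accuracy are our own illustration, not lincs' implementation.

```python
from typing import Sequence, Tuple

def classification_accuracy(expected: Sequence[str], predicted: Sequence[str]) -> Tuple[int, int]:
    """Count the alternatives whose predicted category equals the expected one."""
    if len(expected) != len(predicted):
        raise ValueError("both sequences must describe the same alternatives")
    correct = sum(1 for e, p in zip(expected, predicted) if e == p)
    return correct, len(expected)

def format_accuracy(correct: int, total: int) -> str:
    """Render 'correct/total (percentage)'."""
    return f"{correct}/{total} ({correct / total:.2%})"

# Toy example with three alternatives, one of which is misclassified.
print(format_accuracy(*classification_accuracy(
    ["Category 2", "Category 1", "Category 3"],
    ["Category 1", "Category 1", "Category 3"],
)))  # 2/3 (66.67%)

# The figures from the walk-through: 6 of the 10000 testing alternatives changed category.
print(format_accuracy(10000 - 6, 10000))  # 9994/10000 (99.94%)
```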
Once you’re comfortable with the tooling, you can use a learning set based on real-world data and train a model that you can use to classify new real-world alternatives.
User guide
@todo Write the user guide.
Reference
@todo Generate a reference documentation using Sphinx:
Python using autodoc
C++ using Doxygen+Breathe
YAML file formats using JSON Schema and https://sphinx-jsonschema.readthedocs.io/en/latest/
Develop lincs itself
Run ./run-development-cycle.sh.