Skip to main content

scikit-learn classes for molecule transformation

Project description

scikit-mol

Fancy logo Fancy logo

Scikit-Learn classes for molecular vectorization using RDKit

The intended usage is to be able to add molecular vectorization directly into scikit-learn pipelines, so that the final model directly predict on RDKit molecules or SMILES strings

As example with the needed scikit-learn and -mol imports and RDKit mol objects in the mol_list_train and _test lists:

pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])

>>> array([4.93858815])

The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learns utilities

The first draft for the project was created at the RDKIT UGM 2022 hackathon 2022-October-14

Implemented

  • Descriptors
    • MolecularDescriptorTransformer

  • Fingerprints
    • MorganFingerprintTransformer
    • MACCSKeysFingerprintTransformer
    • RDKitFingerprintTransformer
    • AtomPairFingerprintTransformer
    • TopologicalTorsionFingerprintTransformer
    • MHFingerprintTransformer
    • SECFingerprintTransformer
    • AvalonFingerprintTransformer

  • Conversions
    • SmilesToMol

  • Standardizer
    • Standardizer

  • Utilities
    • CheckSmilesSanitazion

Installation

Users can install latest tagged release from pip

pip install scikit-mol

Bleeding edge

pip install git+https://github.com:EBjerrum/scikit-mol.git

Documentation

There are a collection of notebooks in the notebooks directory which demonstrates some different aspects and use cases

Contributing

There are more information about how to contribute to the project in CONTRIBUTION.md

BUGS

Probably still, please check issues at GitHub and report there

Contributers:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

scikit_mol-0.3.1.tar.gz (3.2 MB view hashes)

Uploaded Source

Built Distribution

scikit_mol-0.3.1-py3-none-any.whl (18.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page