scikit-learn classes for molecule transformation
scikit-mol
Scikit-Learn classes for molecular vectorization using RDKit
The intended usage is to add molecular vectorization directly into scikit-learn pipelines, so that the final model predicts directly on RDKit molecules or SMILES strings.
For example, with the needed scikit-learn and scikit-mol imports in place and RDKit mol objects in the lists mol_list_train and mol_list_test:
pipe = Pipeline([('mol_transformer', MorganFingerprintTransformer()), ('Regressor', Ridge())])
pipe.fit(mol_list_train, y_train)
pipe.score(mol_list_test, y_test)
pipe.predict([Chem.MolFromSmiles('c1ccccc1C(=O)C')])
>>> array([4.93858815])
The scikit-learn compatibility should also make it easier to include the fingerprinting step in hyperparameter tuning with scikit-learn's utilities.
The first draft of the project was created at the RDKit UGM 2022 hackathon on 14 October 2022.
Implemented
- descriptors
  - MolecularDescriptorTransformer
- fingerprints
  - MorganFingerprintTransformer
  - MACCSKeysFingerprintTransformer
  - RDKitFingerprintTransformer
  - AtomPairFingerprintTransformer
  - TopologicalTorsionFingerprintTransformer
  - MHFingerprintTransformer
  - SECFingerprintTransformer
  - AvalonFingerprintTransformer
- conversions
  - SmilesToMol
- standardizer
  - Standardizer
- utilities
  - CheckSmilesSanitazion
Installation
Users can install the latest tagged release with pip
pip install scikit-mol
Bleeding edge
pip install git+https://github.com/EBjerrum/scikit-mol.git
Developers
git clone git@github.com:EBjerrum/scikit-mol.git
cd scikit-mol
pip install -e .
Documentation
The notebooks directory contains a collection of notebooks that demonstrate different aspects and use cases:
- Basic Usage and fingerprint transformers
- Descriptor transformer
- Pipelining with Scikit-Learn classes
- Molecular standardization
- Sanitizing SMILES input
- Integrated hyperparameter tuning of Scikit-Learn estimator and Scikit-Mol transformer
- Using parallel execution to speed up descriptor and fingerprint calculations
BUGS
There are probably still some. Please check the issues on GitHub and report new ones there.
Contributors:
- Esben Jannik Bjerrum @ebjerrum, esben@cheminformania.com
- Carmen Esposito @cespos
- Son Ha, sonha@uni-mainz.de
- Oh-hyeon Choung, ohhyeon.choung@gmail.com
- Andreas Poehlmann, @ap--
- Ya Chen, @anya-chen
- Rafał Bachorz @rafalbachorz
- Adrien Chaton @adrienchaton
Hashes for scikit_mol-0.2.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | de3c2b07815c7a03604178dcc9cb97568fd1a32a63f3efc3020dd602a0e50786
MD5 | 832c5a5ae907f147f63c091e4bd96139
BLAKE2b-256 | e1be10a8d29b33bb008cbd5ebdea990fe8bf8bc5d22251ec99946e5462184eb2