A Python library to featurize molecules.
molfeat - all your molecular featurizers in one place.
Molfeat is a Python library to simplify molecular featurization. It supports a wide variety of molecular featurizers out-of-the-box and can be easily extended to add your own.
- 🐍 Simple pythonic API.
- 🚀 Fast and efficient featurization.
- 🔄 Unifies pre-trained embeddings and hand-crafted featurizers in a single package.
- ➕ Easily extend Molfeat with your own featurizers through plugins.
- 📈 Benefit from increased performance through a trouble-free caching system.
Visit our website at https://molfeat.datamol.io.
Installation
Installing Molfeat
Use mamba:
mamba install -c conda-forge molfeat
Tip: You can replace mamba with conda.
Note: We highly recommend using a Conda Python distribution to install Molfeat. The package is also pip installable if you need it: pip install molfeat
Installing Plugins
The functionality of Molfeat can be extended through plugins. The usage of a plugin system ensures that the core package remains easy to install and as light as possible, while making it easy to extend its functionality with plug-and-play components. Additionally, it ensures that plugins can be developed independently from the core package, removing the bottleneck of a central party that reviews and approves new plugins. Consult the Molfeat documentation for more details on how to create your own plugins.
As a consequence, installation steps differ from plugin to plugin. Please consult each plugin's own documentation to learn how to install it.
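One common way for Python packages to expose plug-and-play components is through entry points declared in package metadata. The sketch below lists the entry points registered under a given group using only the standard library. It illustrates the general pattern and is not a description of Molfeat's actual plugin loader; `list_plugins` and the group name `example.plugins` are hypothetical.

```python
from importlib.metadata import entry_points


def list_plugins(group):
    """Return the sorted names of all entry points registered under `group`."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+: entry points are filtered with .select()
        selected = eps.select(group=group)
    else:
        # Python 3.8 / 3.9: entry_points() returns a dict keyed by group
        selected = eps.get(group, [])
    return sorted(ep.name for ep in selected)


# "example.plugins" is a made-up group name used for illustration only.
print(list_plugins("example.plugins"))
```

Because discovery goes through package metadata, a plugin becomes visible as soon as it is installed in the environment, which is why each plugin can ship and document its own installation steps.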
Optional dependencies
Not all featurizers of the Molfeat core package are supported by default: some require additional dependencies. If you try to use a featurizer whose dependencies are not installed, Molfeat will raise an error that tells you which dependencies are missing and how to install them.
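To check ahead of time whether an optional dependency is present, a small standard-library helper along these lines can be used. This is a sketch and not part of Molfeat; `missing_modules` and the module names passed to it are hypothetical, and Molfeat's own error message remains the authoritative source for what to install.

```python
from importlib.util import find_spec


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if find_spec(name) is None]


# Hypothetical list of requirements, for illustration only.
required = ["json", "a_module_that_does_not_exist_123"]
print(missing_modules(required))
```

The helper only accepts top-level module names (no dots) and does not import anything, so it is cheap to call before constructing a featurizer.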
API tour
import datamol as dm
from molfeat.calc import FPCalculator
from molfeat.trans import MoleculeTransformer
from molfeat.store.modelstore import ModelStore
# Load some dummy data
data = dm.data.freesolv().sample(500).smiles.values
# Featurize a single molecule
calc = FPCalculator("ecfp")
calc(data[0])
# Define a parallelized featurization pipeline
trans = MoleculeTransformer(calc, n_jobs=-1)
trans(data)
# Easily save and load featurizers
trans.to_state_yaml_file("state_dict.yml")
trans = MoleculeTransformer.from_state_yaml_file("state_dict.yml")
trans(data)
# List all available featurizers
store = ModelStore()
store.available_models
# Find a featurizer and learn how to use it
model_card = store.search(name="ChemBERTa-77M-MLM")[0]
model_card.usage()
# Load a featurizer through the store
trans, model_info = store.load(model_card)
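The feature vectors produced in the tour can be used directly in downstream code. As one illustration, the sketch below computes the Tanimoto similarity between two binary fingerprints. It assumes the fingerprints are fixed-length 0/1 vectors (an assumption about the "ecfp" output, not something stated above) and uses short toy vectors rather than real Molfeat output; `tanimoto` is a hypothetical helper, not a Molfeat function.

```python
def tanimoto(a, b):
    """Tanimoto similarity of two equal-length binary vectors.

    Defined as |bits set in both| / |bits set in either|.
    Returns 0.0 when neither vector has any bit set (a choice made here).
    """
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


# Toy stand-ins for two fingerprints; real fingerprints are much longer.
fp_a = [1, 1, 0, 0]
fp_b = [1, 0, 1, 0]
print(tanimoto(fp_a, fp_b))
```

With real data, the two arguments would be two rows of the output of `trans(data)`.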
How to cite
Please cite Molfeat if you use it in your research.
Changelogs
See the latest changelogs at CHANGELOG.rst.
License
Under the Apache-2.0 license. See LICENSE.
Authors
See AUTHORS.rst.