Tools for creating machine-learning datasets from macromolecular structure
Project description
Macromolecule Census is a set of tools for creating machine-learning datasets from macromolecular structure data, especially the structures made available by the Protein Data Bank (PDB). The tools are meant to do the following:
Filter for high-quality (e.g. high resolution, low R-factor), low-redundancy (i.e. sequence identity cutoffs) structures.
Make robust training/validation/test splits by accounting for domain-level structural similarities.
Store atomic coordinates in a compact, portable, standard format (SQLite).
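The three goals above can be illustrated with a minimal sketch. This is not the Macromolecule Census API: the `Structure` record, the function names, the quality thresholds, the cluster labels, the table schema, and the PDB-style IDs are all hypothetical placeholders chosen for illustration. It uses only the Python standard library.

```python
import sqlite3
from dataclasses import dataclass


@dataclass
class Structure:
    # Hypothetical record; the fields are assumptions, not the package's schema.
    pdb_id: str
    resolution_A: float
    r_free: float
    cluster: str   # label shared by structurally similar entries
    atoms: list    # list of (element, x, y, z) tuples


def is_high_quality(s, max_resolution_A=2.5, max_r_free=0.25):
    # Illustrative thresholds only; lower values are better for both metrics.
    return s.resolution_A <= max_resolution_A and s.r_free <= max_r_free


def split_by_cluster(structures, test_clusters):
    # Assign whole clusters to one side, so that similar structures
    # never end up on both sides of the split.
    train = [s for s in structures if s.cluster not in test_clusters]
    test = [s for s in structures if s.cluster in test_clusters]
    return train, test


def store_atoms(conn, structures):
    # Write one row per atom into a single SQLite table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS atom "
        "(pdb_id TEXT, element TEXT, x REAL, y REAL, z REAL)"
    )
    rows = [
        (s.pdb_id, element, x, y, z)
        for s in structures
        for element, x, y, z in s.atoms
    ]
    conn.executemany("INSERT INTO atom VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


# Made-up entries; the IDs and values are placeholders, not real PDB data.
structures = [
    Structure("1abc", 1.8, 0.21, "fam1", [("C", 0.0, 0.0, 0.0), ("N", 1.5, 0.0, 0.0)]),
    Structure("2def", 3.4, 0.31, "fam1", [("C", 0.0, 1.0, 0.0)]),
    Structure("3ghi", 2.0, 0.23, "fam2", [("O", 2.0, 2.0, 2.0)]),
]

kept = [s for s in structures if is_high_quality(s)]
train, test = split_by_cluster(kept, test_clusters={"fam2"})

conn = sqlite3.connect(":memory:")
store_atoms(conn, train)
n_atoms = conn.execute("SELECT COUNT(*) FROM atom").fetchone()[0]
```

Splitting on a cluster label rather than on individual entries is what keeps near-duplicate structures from leaking between the training and test sets; how those clusters are computed (sequence identity, domain classification, or something else) is left open here.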
Download files
Source Distribution
macromol_census-0.1.0.tar.gz (26.3 kB)
Built Distribution
Hashes for macromol_census-0.1.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 25e64dc1e98ef36e9bfa13bc3b7d74af2fe0ceb6b29d131f5cf7735008f78da1 |
MD5 | f2c14f94132c98fc925be9988f6ef1ba |
BLAKE2b-256 | 34ff8deb21d82ca19fbb9c692df789d32a7530976d8c36111bfb75918ee4f87e |