xenosite-fragment

Library for molecule fragment operations.

Intended Audience
- Education
- Science/Research
Natural Language
- English
Operating System
- MacOS
- POSIX :: Linux
Programming Language
Topic
- Scientific/Engineering :: Chemistry
Typing
- Typed

Project description

Library for processing molecule fragments.

Create a fragment from a SMILES or a fragment SMILES string:

>>> str(Fragment("CCCC"))  # valid SMILES
'C-C-C-C'

>>> str(Fragment("ccCC"))  # not a valid SMILES
'c:c-C-C'

A fragment can also be created from a molecule string together with an optional list of the node IDs that make up the fragment.

>>> F = Fragment("CCCCCCOc1ccccc1", [0,1,2,3,4,5])
>>> str(F)  # hexane
'C-C-C-C-C-C'

If IDs are provided, they MUST select a connected fragment.

>>> F = Fragment("CCCCCCOc1ccccc1", [0,10])
Traceback (most recent call last):
  ...
ValueError: Multiple components in graph are not allowed.
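The connectivity requirement can be illustrated without the library. The following is a minimal sketch, not the library's implementation: the bond list is a hand-written assumption for "CCCCCCOc1ccccc1" with atoms numbered in SMILES order, and `is_connected` is a hypothetical helper that runs a breadth-first search over the selected atoms.

```python
from collections import deque

# Assumed bond list for "CCCCCCOc1ccccc1", atoms numbered in SMILES order:
# 0-5 are the hexyl carbons, 6 is the oxygen, 7-12 are the aromatic ring.
BONDS = [(i, i + 1) for i in range(12)] + [(12, 7)]


def is_connected(bonds, ids):
    """Return True if the atoms in `ids` form a single connected component."""
    selected = set(ids)
    if not selected:
        return False
    # Keep only bonds whose two ends are both selected.
    neighbors = {i: set() for i in selected}
    for a, b in bonds:
        if a in selected and b in selected:
            neighbors[a].add(b)
            neighbors[b].add(a)
    # Breadth-first search from an arbitrary selected atom.
    start = next(iter(selected))
    seen = {start}
    queue = deque([start])
    while queue:
        atom = queue.popleft()
        for nxt in neighbors[atom] - seen:
            seen.add(nxt)
            queue.append(nxt)
    return seen == selected


print(is_connected(BONDS, [0, 1, 2, 3, 4, 5]))  # True: the hexyl chain
print(is_connected(BONDS, [0, 10]))             # False: two separate atoms
```

A selection such as [0, 10] fails this check because no bond joins the two atoms, which corresponds to the "Multiple components" error above.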

Get the canonical representation of a fragment:

>>> Fragment("O-C").canonical().string
'C-O'
>>> Fragment("OC").canonical().string
'C-O'
>>> Fragment("CO").canonical().string
'C-O'

Get the reordering of nodes used to create the canonical string representation. If remap=True, the IDs are remapped to the input representation used to initialize the Fragment.

>>> Fragment("COC", [1,2]).canonical(remap=True).reordering
[2, 1]
>>> Fragment("COC", [1,2]).canonical().reordering
[1, 0]
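One way to read the two results is that the unmapped reordering indexes into the fragment's own node list, while the remapped one looks each position up in the IDs passed at construction. The sketch below shows that lookup in plain Python. `remap_reordering` is a hypothetical helper written for illustration, not part of the library.

```python
def remap_reordering(local_order, input_ids):
    """Translate fragment-local positions back to the caller's atom IDs."""
    return [input_ids[i] for i in local_order]


input_ids = [1, 2]    # IDs passed to Fragment("COC", [1, 2])
local_order = [1, 0]  # reordering reported without remap

print(remap_reordering(local_order, input_ids))  # [2, 1]
```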

Match a fragment to a molecule. By default, the returned IDs correspond to fragment IDs. If remap=True, the IDs correspond to the input representation used when the Fragment was initialized.

>>> smiles = "CCCC1CCOCN1"
>>> F = Fragment("CCCCCC")  # hexane as a string
>>> list(F.matches(smiles))  # SMILES string (least efficient)
[(0, 1, 2, 3, 4, 5)]

>>> import rdkit.Chem
>>> mol = rdkit.Chem.MolFromSmiles(smiles)
>>> list(F.matches(mol))  # RDKit mol
[(0, 1, 2, 3, 4, 5)]

>>> mol_graph = Graph.from_molecule(mol)
>>> list(F.matches(mol, mol_graph))  # RDKit mol and Graph (most efficient)
[(0, 1, 2, 3, 4, 5)]

Matching requires that the fragment string of each match is the same as that of the fragment itself. This differs from standard SMARTS matching and prevents rings from matching unclosed ring patterns:

>>> str(Fragment("C1CCCCC1"))  # cyclohexane
'C1-C-C-C-C-C-1'

>>> assert str(Fragment("C1CCCCC1")) != str(F)  # cyclohexane is not hexane
>>> list(F.matches("C1CCCCC1"))  # Unlike SMARTS, no match!
[]
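The difference from SMARTS-style matching can be sketched on bare bond lists. This is a simplified illustration, not the library's algorithm: it ignores atom and bond types and compares only connectivity, and both helper functions are hypothetical names. A substructure-style check asks only that every pattern bond exists in the target, whereas the stricter check also rejects extra bonds among the matched atoms, such as a ring closure.

```python
def as_bond_set(bonds):
    """Represent undirected bonds as a set of frozensets."""
    return {frozenset(bond) for bond in bonds}


def substructure_style_match(pattern_bonds, target_bonds, mapping):
    """True if every pattern bond exists in the target under `mapping`."""
    target = as_bond_set(target_bonds)
    return all(
        frozenset((mapping[a], mapping[b])) in target for a, b in pattern_bonds
    )


def exact_fragment_match(pattern_bonds, target_bonds, mapping):
    """True only if the matched atoms carry exactly the pattern's bonds."""
    matched = set(mapping.values())
    induced = {bond for bond in as_bond_set(target_bonds) if bond <= matched}
    mapped = {frozenset((mapping[a], mapping[b])) for a, b in pattern_bonds}
    return mapped == induced


HEXANE = [(i, i + 1) for i in range(5)]  # open chain of six atoms
CYCLOHEXANE = HEXANE + [(5, 0)]          # same chain plus a ring closure
IDENTITY = {i: i for i in range(6)}      # pattern atom i maps to target atom i

print(substructure_style_match(HEXANE, CYCLOHEXANE, IDENTITY))  # True
print(exact_fragment_match(HEXANE, CYCLOHEXANE, IDENTITY))      # False
print(exact_fragment_match(HEXANE, HEXANE, IDENTITY))           # True
```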

Efficiently create multiple fragments by reusing a precomputed graph:

>>> import rdkit.Chem

>>> mol = rdkit.Chem.MolFromSmiles("c1ccccc1OCCC")
>>> mol_graph = Graph.from_molecule(mol)

>>> f1 = Fragment(mol_graph, [0])
>>> f2 = Fragment(mol_graph, [6,5,4])

Find matches to fragments:

>>> list(f1.matches(mol))
[(0,), (1,), (2,), (3,), (4,), (5,)]

>>> list(f2.matches(mol))
[(6, 5, 4), (6, 5, 0)]

Fragments can be compared for canonical equality with other fragments or with strings:

>>> Fragment("CCO") == Fragment("OCC")
True
>>> Fragment("CCO") == "C-C-O"
True

Note, however, that strings are not converted to canonical form. Therefore,

>>> Fragment("CCO") == "CCO"
False

Enumerate and compute statistics on all the subgraphs in a molecule:

>>> from xenosite.fragment.net import SubGraphFragmentNetwork
>>> N = SubGraphFragmentNetwork("CC1COC1")
>>> fragments = N.to_pandas()
>>> list(fragments.index)
['C-C', 'C', 'C-O-C', 'C-O', 'O', 'C-C1-C-O-C-1', 'C1-C-O-C-1', 'C-C-C-O', 'C-C(-C)-C', 'C-C-O', 'C-C-C']
>>> fragments["size"].to_numpy()
array([2, 1, 3, 2, 1, 5, 4, 4, 4, 3, 3])
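To see where these fragments come from, the connected atom subsets of "CC1COC1" can be enumerated by brute force. This is only a sketch under stated assumptions: the atom numbering and bond list are written by hand in SMILES order, and the exhaustive search is chosen for clarity rather than taken from the library. It finds 21 connected subsets, which reduce to the 11 distinct fragment strings above once equivalent subsets are merged.

```python
from collections import Counter
from itertools import combinations

# Assumed numbering for "CC1COC1": 0 is the methyl carbon, 1-4 form the ring.
ATOMS = ["C", "C", "C", "O", "C"]
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1)]


def is_connected(atoms, bonds):
    """Depth-first search restricted to the chosen atoms."""
    atoms = set(atoms)
    start = next(iter(atoms))
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for a, b in bonds:
            if cur in (a, b):
                other = b if cur == a else a
                if other in atoms and other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen == atoms


def connected_subsets(n_atoms, bonds):
    """Yield every non-empty set of atoms that forms a connected subgraph."""
    for size in range(1, n_atoms + 1):
        for subset in combinations(range(n_atoms), size):
            if is_connected(subset, bonds):
                yield subset


subsets = list(connected_subsets(len(ATOMS), BONDS))
print(len(subsets))                       # 21
print(Counter(len(s) for s in subsets))   # subsets per fragment size
```

Brute force is exponential in the number of atoms, so it is only suitable for small molecules like this one.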

To avoid fragments that break rings open, collapse all atoms in a ring into a single node during subgraph enumeration:

>>> from xenosite.fragment.net import RingFragmentNetwork
>>> N = RingFragmentNetwork("CC1COC1")
>>> fragments = N.to_pandas()
>>> list(fragments.index)
['C-C1-C-O-C-1', 'C', 'C1-C-O-C-1']
>>> fragments["size"].to_numpy()
array([5, 1, 4])
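The effect of ring collapsing can be sketched in plain Python as well. The code below is an illustration under assumptions, not the library's method: the bond list for "CC1COC1" is written by hand, a bond counts as a ring bond if its ends stay connected when it is removed, and atoms joined by ring bonds are grouped into one node (so fused rings would also be merged, which may differ from the library). Enumerating connected subsets of the collapsed graph then gives three fragments of sizes 1, 4 and 5.

```python
from itertools import combinations

# Assumed numbering for "CC1COC1": 0 is the methyl carbon, 1-4 form the ring.
BONDS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 1)]
N_ATOMS = 5


def reachable(start, bonds):
    """Return all nodes reachable from `start` over `bonds`."""
    seen, stack = {start}, [start]
    while stack:
        cur = stack.pop()
        for a, b in bonds:
            if cur in (a, b):
                other = b if cur == a else a
                if other not in seen:
                    seen.add(other)
                    stack.append(other)
    return seen


def collapse_rings(n_atoms, bonds):
    """Group atoms into nodes: one per ring system, one per non-ring atom."""
    # A bond lies in a ring if its ends stay connected after removing it.
    ring_bonds = [
        (a, b) for a, b in bonds
        if b in reachable(a, [x for x in bonds if x != (a, b)])
    ]
    nodes, assigned = [], set()
    for atom in range(n_atoms):
        if atom not in assigned:
            group = frozenset(reachable(atom, ring_bonds))
            nodes.append(group)
            assigned |= group
    return nodes


def collapsed_fragments(n_atoms, bonds):
    """Enumerate connected subsets of the ring-collapsed graph as atom lists."""
    nodes = collapse_rings(n_atoms, bonds)
    owner = {atom: i for i, group in enumerate(nodes) for atom in group}
    node_bonds = {
        (min(owner[a], owner[b]), max(owner[a], owner[b]))
        for a, b in bonds if owner[a] != owner[b]
    }
    fragments = []
    for size in range(1, len(nodes) + 1):
        for subset in combinations(range(len(nodes)), size):
            chosen = set(subset)
            inner = [(a, b) for a, b in node_bonds if a in chosen and b in chosen]
            if reachable(subset[0], inner) == chosen:
                fragments.append(sorted(set().union(*(nodes[i] for i in subset))))
    return fragments


print(collapsed_fragments(N_ATOMS, BONDS))
# [[0], [1, 2, 3, 4], [0, 1, 2, 3, 4]]
```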

Release history

- 0a11 pre-release, Jul 23, 2023
- 0a10 pre-release, Jul 5, 2023
- 0a9 pre-release, Jul 1, 2023
- 0a8 pre-release, Jun 23, 2023
- 0a7 pre-release, Jun 17, 2023
- 0a5 pre-release, Jun 14, 2023
- 0a4 pre-release, Jun 9, 2023
- 0a3 pre-release, Jun 9, 2023 (this version)

Download files

Source Distribution

xenosite-fragment-0a3.tar.gz (23.7 kB)

Uploaded Jun 9, 2023 Source

Hashes for xenosite-fragment-0a3.tar.gz
Algorithm	Hash digest
SHA256	`d4d1b91b4cbe135183352a3e47850fc92706432f783feb78f80ad37966e0eef7`
MD5	`3b339ad914e091fd2156533f00e14a62`
BLAKE2b-256	`257fde4c3d0a8f367f373c4c6459878af85d92d62b3a154dfffe2e9218e1b60d`