Manipulate macromolecular coordinate data using data frames.

Macromolecular Data Frames

Macromol Dataframe is a library for processing macromolecular coordinate data, e.g. mmCIF files downloaded from the Protein Data Bank (PDB). The key idea behind this library is that such data is best handled with data frames, specifically polars.DataFrame. The advantages of this approach are:

  • Flexibility: Data frames are general-purpose data processing tools. Filtering, grouping, joining, and aggregating are all built in, so they can accommodate most kinds of analysis.

  • Performance: Data frames are meant for processing huge quantities of data, and are accordingly well-optimized. Polars in particular achieves very good performance by using techniques such as execution planning, SIMD instructions, and multi-threading.

  • Familiarity: Data scientists work with data frames all the time, so using them here lowers the learning curve and makes this library easy to get started with. There's not much to learn!

Here's an example showing how to load a specific biological assembly from an mmCIF file:

>>> import macromol_dataframe as mmdf
>>> df = mmdf.read_biological_assembly('6uad.cif.gz', model_id='1', assembly_id='1')
>>> df.select('seq_id', 'comp_id', 'atom_id', 'x', 'y', 'z')
shape: (2_312, 6)
┌────────┬─────────┬─────────┬───────────┬──────────┬──────────┐
│ seq_id ┆ comp_id ┆ atom_id ┆ x         ┆ y        ┆ z        │
│ ---    ┆ ---     ┆ ---     ┆ ---       ┆ ---      ┆ ---      │
│ i64    ┆ str     ┆ str     ┆ f64       ┆ f64      ┆ f64      │
╞════════╪═════════╪═════════╪═══════════╪══════════╪══════════╡
│ 2      ┆ ASN     ┆ N       ┆ -9.89268  ┆ 25.4788  ┆ 9.32073  │
│ 2      ┆ ASN     ┆ CA      ┆ -11.30656 ┆ 25.42029 ┆ 8.91019  │
│ 2      ┆ ASN     ┆ C       ┆ -12.19303 ┆ 26.2788  ┆ 9.79681  │
│ 2      ┆ ASN     ┆ O       ┆ -12.48258 ┆ 25.8771  ┆ 10.91766 │
│ 2      ┆ ASN     ┆ CB      ┆ -11.82931 ┆ 23.99427 ┆ 8.9393   │
│ …      ┆ …       ┆ …       ┆ …         ┆ …        ┆ …        │
│ null   ┆ HOH     ┆ O       ┆ -41.101   ┆ 23.389   ┆ 7.03     │
│ null   ┆ HOH     ┆ O       ┆ -4.60757  ┆ 22.48844 ┆ 9.93407  │
│ null   ┆ HOH     ┆ O       ┆ -22.48104 ┆ 27.68223 ┆ -4.26327 │
│ null   ┆ HOH     ┆ O       ┆ -38.8232  ┆ 17.99957 ┆ 9.24767  │
│ null   ┆ HOH     ┆ O       ┆ -40.22527 ┆ 15.63538 ┆ 7.88049  │
└────────┴─────────┴─────────┴───────────┴──────────┴──────────┘

Download files

Source Distribution: macromol_dataframe-0.3.0.tar.gz (14.9 kB)

Built Distribution: macromol_dataframe-0.3.0-py3-none-any.whl (15.6 kB, Python 3)
