
Markov chains and Hidden Markov models


mchmm is a Python package implementing Markov chains and Hidden Markov models in pure NumPy and SciPy. It can also visualize Markov chains (see below).

Dependencies

  * NumPy
  * SciPy
  * Graphviz (used by graph_make() for visualization)

Installation

  1. Install from PyPI:

$ pip install mchmm

  2. Or clone the GitHub repository and install from source:

$ git clone https://github.com/maximtrp/mchmm.git
$ cd mchmm
$ pip install . --user

Features

Discrete Markov chains

Initializing a Markov chain using some data:

>>> import mchmm as mc
>>> a = mc.MarkovChain().from_data('AABCABCBAAAACBCBACBABCABCBACBACBABABCBACBBCBBCBCBCBACBABABCBCBAAACABABCBBCBCBCBCBCBAABCBBCBCBCCCBABCBCBBABCBABCABCCABABCBABC')

Now, we can look at the observed transition frequency matrix:

>>> a.observed_matrix
array([[ 7., 18.,  7.],
       [19.,  5., 29.],
       [ 5., 30.,  3.]])

And the observed transition probability matrix:

>>> a.observed_p_matrix
array([[0.21875   , 0.5625    , 0.21875   ],
       [0.35849057, 0.09433962, 0.54716981],
       [0.13157895, 0.78947368, 0.07894737]])
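
The probability matrix is simply the frequency matrix with each row normalized to sum to one. A quick NumPy check (an illustration, not part of the mchmm API):

>>> import numpy as np
>>> freq = a.observed_matrix
>>> np.allclose(freq / freq.sum(axis=1, keepdims=True), a.observed_p_matrix)
True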

You can visualize your Markov chain. First, build a directed graph with the graph_make() method of a MarkovChain object, then render() it.

>>> graph = a.graph_make(
      format="png",
      graph_attr=[("rankdir", "LR")],
      node_attr=[("fontname", "Roboto bold"), ("fontsize", "20")],
      edge_attr=[("fontname", "Iosevka"), ("fontsize", "12")]
    )
>>> graph.render()

Here is the result:

[images/mc.png: directed graph of the Markov chain]

Pandas can help us annotate columns and rows:

>>> import pandas as pd
>>> pd.DataFrame(a.observed_matrix, index=a.states, columns=a.states, dtype=int)
    A   B   C
A   7  18   7
B  19   5  29
C   5  30   3

Viewing the expected transition frequency matrix:

>>> a.expected_matrix
array([[ 8.06504065, 13.78861789, 10.14634146],
       [13.35772358, 22.83739837, 16.80487805],
       [ 9.57723577, 16.37398374, 12.04878049]])
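
These are the usual contingency-table expected counts: row total times column total divided by the grand total. The matrix can be reproduced with plain NumPy (an illustration of the formula, not necessarily how mchmm computes it):

>>> np.outer(a.observed_matrix.sum(axis=1), a.observed_matrix.sum(axis=0)) / a.observed_matrix.sum()
array([[ 8.06504065, 13.78861789, 10.14634146],
       [13.35772358, 22.83739837, 16.80487805],
       [ 9.57723577, 16.37398374, 12.04878049]])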

Calculating Nth order transition probability matrix:

>>> a.n_order_matrix(a.observed_p_matrix, order=2)
array([[0.2782854 , 0.34881028, 0.37290432],
       [0.1842357 , 0.64252707, 0.17323722],
       [0.32218957, 0.21081868, 0.46699175]])
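
By the Chapman-Kolmogorov equation, an Nth order transition matrix is just the Nth matrix power of the one-step matrix, so the result can be verified with NumPy:

>>> np.allclose(np.linalg.matrix_power(a.observed_p_matrix, 2),
                a.n_order_matrix(a.observed_p_matrix, order=2))
True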

Carrying out a chi-squared test:

>>> a.chisquare(a.observed_matrix, a.expected_matrix, axis=None)
Power_divergenceResult(statistic=47.89038802624337, pvalue=1.0367838347591701e-07)
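
For comparison, SciPy's chi2_contingency computes the same statistic along with the expected frequencies and the (r - 1)(c - 1) = 4 degrees of freedom appropriate for a contingency table, so its p-value differs from the call above, which evaluates the statistic at 8 degrees of freedom (a cross-check, not part of mchmm):

>>> from scipy.stats import chi2_contingency
>>> chi2, p, dof, expected = chi2_contingency(a.observed_matrix)
>>> dof
4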

Finally, let’s simulate a Markov chain given our data.

>>> import numpy as np
>>> ids, states = a.simulate(10, start='A', seed=np.random.randint(0, 10, 10))
>>> ids
array([0, 2, 1, 0, 2, 1, 0, 2, 1, 0])
>>> states
array(['A', 'C', 'B', 'A', 'C', 'B', 'A', 'C', 'B', 'A'], dtype='<U1')
>>> "".join(states)
'ACBACBACBA'
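
A quick sanity check on the simulation: fit a second chain to a long simulated sequence and compare its transition probabilities with the original (a sketch assuming the seed argument is optional):

>>> ids, states = a.simulate(10000, start='A')
>>> b = mc.MarkovChain().from_data("".join(states))
>>> np.abs(b.observed_p_matrix - a.observed_p_matrix).max()  # should be small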

Hidden Markov models

We will use a fragment of a DNA sequence containing a TATA box as an example. Initializing a hidden Markov model with sequences of observations and states:

>>> import mchmm as mc
>>> obs_seq = 'AGACTGCATATATAAGGGGCAGGCTG'
>>> sts_seq = '00000000111111100000000000'
>>> a = mc.HiddenMarkovModel().from_seq(obs_seq, sts_seq)

Unique states and observations are automatically inferred:

>>> a.states
['0' '1']
>>> a.observations
['A' 'C' 'G' 'T']

The transition probability matrix for all states can be accessed using the tp attribute:

>>> a.tp
[[0.94444444 0.05555556]
 [0.14285714 0.85714286]]
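
These probabilities are maximum likelihood estimates: transitions are counted along the state sequence and each row is normalized. They can be reproduced by hand (an illustration, not mchmm's internals):

>>> import numpy as np
>>> counts = np.zeros((2, 2))
>>> for i, j in zip(sts_seq, sts_seq[1:]):
...     counts[int(i), int(j)] += 1
>>> counts / counts.sum(axis=1, keepdims=True)
array([[0.94444444, 0.05555556],
       [0.14285714, 0.85714286]])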

There is also an ep attribute for the emission probability matrix for all states and observations:

>>> a.ep
[[0.21052632 0.21052632 0.47368421 0.10526316]
 [0.57142857 0.         0.         0.42857143]]

Converting the emission matrix to a Pandas DataFrame:

>>> import pandas as pd
>>> pd.DataFrame(a.ep, index=a.states, columns=a.observations)
          A         C         G         T
0  0.210526  0.210526  0.473684  0.105263
1  0.571429  0.000000  0.000000  0.428571
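
The same table can be built directly from the raw sequences with pandas' crosstab, confirming that ep holds per-state observation frequencies (a cross-check, not part of mchmm):

>>> pd.crosstab(list(sts_seq), list(obs_seq), normalize='index')
col_0         A         C         G         T
row_0
0      0.210526  0.210526  0.473684  0.105263
1      0.571429  0.000000  0.000000  0.428571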

Directed graph of the hidden Markov model:

[images/hmm.png]

The graph can be visualized using the graph_make() method of a HiddenMarkovModel object:

>>> graph = a.graph_make(
      format="png",
      graph_attr=[("rankdir", "LR"), ("ranksep", "1"), ("rank", "same")]
    )
>>> graph.render()

Viterbi algorithm

Running the Viterbi algorithm on new observations:

>>> new_obs = "GGCATTGGGCTATAAGAGGAGCTTG"
>>> vs, vsi = a.viterbi(new_obs)
>>> # states sequence
>>> print("VI", "".join(vs))
>>> # observations
>>> print("NO", new_obs)
VI 0000000001111100000000000
NO GGCATTGGGCTATAAGAGGAGCTTG
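
For reference, the decoding above can be reproduced with a short log-space Viterbi implementation in plain NumPy. The sketch below is generic code, not mchmm's own, and it assumes a uniform initial state distribution, since none was specified when fitting from sequences:

import numpy as np

def viterbi_sketch(obs, states, observations, tp, ep, pi=None):
    # Map each observed symbol to its column in the emission matrix.
    idx = [list(observations).index(o) for o in obs]
    n = len(states)
    pi = np.full(n, 1 / n) if pi is None else np.asarray(pi)
    with np.errstate(divide='ignore'):  # log(0) -> -inf is intended here
        ltp, lep, lpi = np.log(tp), np.log(ep), np.log(pi)
    # delta[t, s]: best log-probability of any state path ending in s at step t.
    delta = np.empty((len(idx), n))
    psi = np.empty((len(idx), n), dtype=int)  # backpointers
    delta[0] = lpi + lep[:, idx[0]]
    for t in range(1, len(idx)):
        scores = delta[t - 1][:, None] + ltp  # rows: previous state, columns: next
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + lep[:, idx[t]]
    # Backtrack from the best final state.
    path = [int(delta[-1].argmax())]
    for t in range(len(idx) - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return ''.join(states[s] for s in reversed(path))

Calling viterbi_sketch(new_obs, a.states, a.observations, a.tp, a.ep) should agree with a.viterbi(new_obs) up to ties.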

Baum-Welch algorithm

Using the Baum-Welch algorithm to infer the parameters of a hidden Markov model:

>>> import mchmm as mc
>>> obs_seq = 'AGACTGCATATATAAGGGGCAGGCTG'
>>> a = mc.HiddenMarkovModel().from_baum_welch(obs_seq, states=['0', '1'])
>>> # training log: KL divergence values for all iterations
>>> a.log
{
  'tp': [0.008646969455670256, 0.0012397829805491124, 0.0003950986109761759],
  'ep': [0.09078874423746826, 0.0022734816599056084, 0.0010118204023946836],
  'pi': [0.009030829793043593, 0.016658391248503462, 0.0038894983546756065]
}
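
The last entry of each list shows how much the corresponding parameters changed on the final iteration, which makes a quick convergence check:

>>> {k: v[-1] for k, v in a.log.items()}
{'tp': 0.0003950986109761759, 'ep': 0.0010118204023946836, 'pi': 0.0038894983546756065}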

The inferred transition (tp) and emission (ep) probability matrices and the initial state distribution (pi) can be accessed as shown:

>>> a.ep, a.tp, a.pi

This model can be decoded using the Viterbi algorithm:

>>> new_obs = "GGCATTGGGCTATAAGAGGAGCTTG"
>>> vs, vsi = a.viterbi(new_obs)
>>> print("VI", "".join(vs))
>>> print("NO", new_obs)
VI 0011100001111100000001100
NO GGCATTGGGCTATAAGAGGAGCTTG
