Skip to main content

No project description provided

Project description

Non-linear Phylogenetic regression using regularized kernels

Installation

pip install phylokrr

Quick overview

Data simulation

This simulation is based on a given covariance matrix

import random
import numpy as np

# seed for reproducibility
seed = 12038 
np.random.seed(seed)
random.seed(seed)


# cov. matrix obtained from the phylogenetic tree
vcv = np.loadtxt("./data/test_cov2.csv", delimiter=',') 

# Trait simulation under Brownian motion
n = vcv.shape[0]
mean = np.zeros(n)
X = np.random.multivariate_normal(cov=vcv, mean=mean).reshape(-1,1)
# Non-linear response variable (sine curve)
y = np.sin(X*2.1).ravel() + 5 

# Add noise to the response variable
y[::10] += 4 * (0.5 - np.random.rand(X.shape[0] // 10)) 

We then split data into training and testing sets, including their covariances

from phylokrr.utils import split_data_vcv

# split data into training and testing sets 
num_test = round(0.5*n)

(X_train  , X_test,  
 y_train  , y_test,  
 vcv_train, vcv_test) = split_data_vcv(X, y, vcv, num_test, seed = seed) # seed defined above

Simple model fitting without Cross-Validation (CV)

from phylokrr.kernels import KRR

# set model
model = KRR(kernel='rbf', fit_intercept= True)

# arbitrarily proposed hyperparameters
params = {'lambda': 2, 'gamma': 2}

# set hyperparamters
model.set_params(**params)

# fit model with phylogenetic covariance matrix
model.fit(X_train, y_train, vcv = vcv_train)
y_pred1 = model.predict(X_test)

Let's compare it with the standard phylogenetic regression (i.e., PGLS)

import matplotlib.pyplot as plt

from phylokrr.utils import PGLS

# fit standard phylogenetic regression
b_wls = PGLS(X_train, y_train, vcv_train)
y_pred3 = np.hstack((np.ones((X_test.shape[0],1)), X_test)) @ b_wls

plt.scatter(X_test, y_test , color = 'blue' , alpha=0.5, label = 'Testing (unseen) data')
plt.scatter(X_test, y_pred1, color = 'green', alpha=0.5, label = 'phyloKRR predictions w\o CV')
plt.scatter(X_test, y_pred3, color = 'red', alpha=0.5, label = 'PGLS predictions')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

drawing

Hyperparameter tuning with CV

from phylokrr.utils import k_fold_cv_random

params = {
    'lambda' : np.logspace(-5, 5, 200, base=2),
    'gamma' : np.logspace(-5, 5, 200,  base=2),
}

best_params = k_fold_cv_random(X_train, y_train, vcv_train,
                                model, 
                                params,
                                folds = 2, 
                                sample = 50)

model.set_params(**best_params)
model.fit(X_train, y_train, vcv = vcv_train)
y_pred_cv = model.predict(X_test)

plt.scatter(X_test, y_test, color = 'blue' , alpha=0.5, label = 'Testing (unseen) data')
plt.scatter(X_test, y_pred_cv, color = 'green', alpha=0.5, label = 'phyloKRR predictions \w CV')
plt.scatter(X_test, y_pred3, color = 'red', alpha=0.5, label = 'PGLS predictions') # y_pred3 defined above
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

drawing

Reference

Rosas-Puchuri, U., Santaquiteria, A., Khanmohammadi, S., Solis-Lemus, C., & Betancur-R, R. (2023). Non-linear phylogenetic regression using regularized kernels. bioRxiv, 2023-10.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

phylokrr-0.4.2.tar.gz (8.2 kB view hashes)

Uploaded Source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page