
SMART-BS-Seq 2.1.9

Specific Methylation Analysis and Report Tool 2

README for SMART

Time-stamp: <2017-06-16 11:30:00 Hongbo Liu>

Introduction

SMART2 is a newly developed tool for in-depth analysis of DNA methylation data generated by bisulfite sequencing platforms. It focuses on three main functions: de novo identification of differentially methylated regions (DMRs) by genome segmentation, identification of DMRs from predefined regions of interest, and identification of differentially methylated CpG sites. DNA methylation is known to play important roles in regulating cell development and differentiation. Although the methylation and demethylation machinery is common to all tissues and cell types, different cell types that share the same genome have different methylomes. High-throughput sequencing combined with bisulfite treatment (Bisulfite-Seq) has recently been used to generate genome-wide DNA methylomes for a wide range of human tissues and cell types.

To identify DMRs de novo across different biological groups, SMART2 uses entropy-based procedures to quantify the methylation specificity of each CpG and to compute the Euclidean distance and similarity entropy for each pair of neighboring CpGs. Genome segmentation based on these parameters then divides the genome into primary segments made up of CpG sites whose methylation is highly similar across all groups. Next, primary segments that are close to each other and share similar methylation patterns are merged into larger segments of different types, including DMRs and non-DMRs, which are distinguished using methylation specificity and one-way ANOVA. Finally, DMRs that are specifically hypomethylated or hypermethylated in a minority of groups, namely group-specific hypomethylation marks (HypoMarks) and group-specific hypermethylation marks (HyperMarks), are identified using a one-sample t-test. Together, these steps are intended to facilitate the mining of methylation marks across cell types and species.
In addition, SMART2 also supports the identification of DMRs from pre-defined regions of interest and differentially methylated CpG sites.
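The entropy-based methylation specificity mentioned above can be illustrated with a small sketch. It assumes that specificity is one minus the normalised Shannon entropy of a CpG's group-mean methylation profile; this is a common textbook form chosen for illustration, not necessarily the exact formula SMART2 uses (see the citation and the project website for that).

```python
import math
from typing import Sequence


def methylation_specificity(group_means: Sequence[float]) -> float:
    """Illustrative entropy-based specificity in [0, 1]: 1 - H(p) / log2(G).

    p is the group-mean methylation profile normalised to sum to 1 and G is
    the number of groups. Assumed form, not necessarily SMART2's formula.
    """
    n_groups = len(group_means)
    total = sum(group_means)
    if n_groups < 2 or total <= 0:
        # A single group or an all-zero profile carries no specificity here.
        return 0.0
    probs = [m / total for m in group_means if m > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return 1.0 - entropy / math.log2(n_groups)


print(methylation_specificity([1.0, 0.0, 0.0, 0.0]))  # 1.0: methylated in one group only
print(methylation_specificity([0.5, 0.5, 0.5, 0.5]))  # 0.0: identical in all groups
```

Because this simplified score normalises methylation levels only, it is high when methylation is concentrated in a few groups; scoring group-specific hypomethylation would need a complementary term.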

Detailed information about SMART2 is available at http://fame.edbc.org/smart.

New Features of SMART2

  • Provides more functions ~ Identification of differentially methylated regions of interest or differentially methylated CpG sites.
  • Provides statistical analysis ~ One-way ANOVA has been added to make DMR identification more reliable.
  • Supports various BS-Seq data ~ SMART2 supports analysis of WGBS, RRBS and targeted bisulfite sequencing data, including TruSeqEPIC, SureSelect and CpGiant.
  • Supports replicate samples ~ SMART2 supports replicates within a group, such as samples from the same clinical group.
  • Supports missing value replacement ~ SMART2 can replace a missing value with the median methylation value of the other available samples in the same group.
  • Applicable to any species ~ The redesigned algorithm workflow makes SMART2 applicable to any species.
  • Multiprocessing speeds up analysis ~ The redesigned algorithm workflow makes SMART2 faster on large datasets.
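One-way ANOVA compares the between-group and within-group variance of the methylation values in a segment. The sketch below computes the F statistic with the standard library only; it is a generic textbook implementation, not SMART2's code, and turning F into a p-value requires the F distribution (for example, `scipy.stats.f_oneway` returns both the statistic and the p-value).

```python
import math
from statistics import mean
from typing import Sequence


def anova_f(groups: Sequence[Sequence[float]]) -> float:
    """One-way ANOVA F statistic: between-group mean square / within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more values than groups")
    grand = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    if ms_within == 0:
        # No within-group variation: any group difference is "infinitely" significant;
        # identical groups give no evidence of a difference.
        return math.inf if ms_between > 0 else 0.0
    return ms_between / ms_within


# Methylation values of one segment in two groups with three replicates each.
print(anova_f([[1, 2, 3], [4, 5, 6]]))  # 13.5
```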

Install

$ pip install SMART-BS-Seq

Detailed information can be found in the file ‘INSTALL’ in the distribution.

Usage of SMART2

$ SMART MethylMatrix [-t {DeNovoDMR,DMROI,DMC}] [-g GENOMEREGIONS] [-n PROJECTNAME] [-o OUTPUTFOLDER]
                     [-MR MISSREPLACE] [-MS MSTHRESHOLD] [-ED EDTHRESHOLD] [-SM SMTHRESHOLD] [-CD CDTHRESHOLD]
                     [-SC SCTHRESHOLD] [-SL SLTHRESHOLD] [-pD P_DMR] [-pM P_METHYLMARK] [-v] [-h]

If the above command doesn’t work, try one of the following solutions:

(1) Add SMART command to system path

Linux$ export PATH=/usr/local/bin:$PATH
MacOS$ export PATH=/Library/Frameworks/Python.framework/Versions/2.7/bin:$PATH

Then rerun SMART command as described above.

(2) Run the source code, which can be found in the installation directory

Installation directory of SMART:

  • Linux (Ubuntu 16.04): /usr/local/lib/python2.7/dist-packages/SMART/
  • macOS (Sierra 10.12): /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/SMART/
$ cd /usr/local/lib/python2.7/dist-packages/SMART/
$ python SMART.py MethylMatrix [-t {DeNovoDMR,DMROI,DMC}] [-g GENOMEREGIONS] [-n PROJECTNAME] [-o OUTPUTFOLDER]
                               [-MR MISSREPLACE] [-MS MSTHRESHOLD] [-ED EDTHRESHOLD] [-SM SMTHRESHOLD] [-CD CDTHRESHOLD]
                               [-SC SCTHRESHOLD] [-SL SLTHRESHOLD] [-pD P_DMR] [-pM P_METHYLMARK] [-v] [-h]

Positional arguments

MethylMatrix

The input methylation file (such as /WGBS/MethylMatrix.txt) containing the methylation values of all samples to compare (REQUIRED). The methylation data should be arranged as a tab-separated matrix in which each row represents a CpG site. The first line must contain the column names, and the first three columns give the location of each CpG site: chrom, start and end. Methylation values start from the fourth column and must range from 0 (unmethylated) to 1 (fully methylated). Missing values should be written as a hyphen (-). Samples should be named G1_1, G1_2, G2_1, G2_2, G3_1, G3_2, G3_3, where Gi denotes group i. The methylation matrix can be built from per-sample bed files (chrom, start, end, beta value) with bedtools: bedtools unionbedg -i G1_1.bed G1_2.bed G2_1.bed G2_2.bed G3_1.bed G3_2.bed G3_3.bed -header -names G1_1 G1_2 G2_1 G2_2 G3_1 G3_2 G3_3 -filler - > MethylMatrix.txt. [Type: file]

The example data is also available here.
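A minimal sketch of reading a matrix in this layout and filling missing values follows. It groups sample columns by the Gi prefix of names such as G1_1, treats "-" as missing, and fills a missing value with the median of the available values in the same group when the fraction of available samples in that group reaches a threshold standing in for -MR. Computing that fraction per group, and leaving unfillable values as None, are assumptions for illustration; the helper names are hypothetical and not part of SMART2.

```python
import csv
import io
from statistics import median
from typing import Dict, List, Optional, TextIO, Tuple

Values = List[Optional[float]]
Row = Tuple[str, int, int, Values]


def read_methyl_matrix(handle: TextIO) -> Tuple[List[str], List[Row]]:
    """Parse chrom, start, end, then one beta value per sample ('-' = missing)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    rows: List[Row] = []
    for fields in reader:
        if not fields:  # skip blank lines
            continue
        values = [None if v == "-" else float(v) for v in fields[3:]]
        rows.append((fields[0], int(fields[1]), int(fields[2]), values))
    return header[3:], rows


def group_columns(samples: List[str]) -> Dict[str, List[int]]:
    """Map a group prefix (e.g. 'G1' from 'G1_2') to the column indices of its samples."""
    groups: Dict[str, List[int]] = {}
    for idx, name in enumerate(samples):
        groups.setdefault(name.split("_")[0], []).append(idx)
    return groups


def fill_missing(values: Values, groups: Dict[str, List[int]],
                 min_fraction: float = 0.5) -> Values:
    """Hypothetical -MR-style rule: use the group median if enough samples are present."""
    filled = list(values)
    for cols in groups.values():
        present = [values[c] for c in cols if values[c] is not None]
        if present and len(present) / len(cols) >= min_fraction:
            med = median(present)
            for c in cols:
                if filled[c] is None:
                    filled[c] = med
    return filled


text = "chrom\tstart\tend\tG1_1\tG1_2\tG2_1\tG2_2\nchr1\t100\t101\t0.9\t-\t0.1\t0.2\n"
samples, rows = read_methyl_matrix(io.StringIO(text))
print(fill_missing(rows[0][3], group_columns(samples)))  # [0.9, 0.9, 0.1, 0.2]
```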

Optional arguments

-t {DeNovoDMR,DMROI,DMC}
Type of project: ‘DeNovoDMR’, ‘DMROI’ or ‘DMC’. DeNovoDMR performs de novo identification of differentially methylated regions (DMRs) based on genome segmentation. DMROI compares methylation differences in regions of interest (ROIs) across multiple groups. DMC identifies differentially methylated CpG sites (DMCs). [Type: string] [DEFAULT: ‘DeNovoDMR’]
-g GENOMEREGIONS
Genome regions of interest in bed format without column names (such as /WGBS/Regions_of_interest.bed), used by project type DMROI. The regions should be sorted by chromosome and then by start position (e.g., sort -k1,1 -k2,2n in.bed > in.sorted.bed). If this file is provided, SMART treats each region as a unit and compares its mean methylation across groups using methylation specificity and ANOVA. [Type: string] [DEFAULT: ‘’]
-n PROJECTNAME
Project name, which will be used to generate output file names. [Type: string] [DEFAULT: ‘SMART’]
-o OUTPUTFOLDER
The folder to which results are written. If specified, all output files will be written to that directory. [Type: folder] [DEFAULT: a directory in the current working directory named after the project name and the current time (such as SMART20140801132559)]
-MR MISSREPLACE
Replace a missing value with the median methylation value of the available samples in the corresponding group. This parameter controls when replacement is allowed, from 0.01 (methylation values must be available in at least 1% of samples) to 1.0 (methylation values must be available in 100% of samples, i.e., there are no missing values). [Type: float] [Range: 0.01 ~ 1.0] [DEFAULT: 0.5]
-MS MSTHRESHOLD
Methylation specificity threshold for DMC or DMR calling. A CpG site or region is identified as a DMC or DMR if its methylation specificity is greater than this threshold. [Type: float] [Range: 0.2 ~ 1.0] [DEFAULT: 0.5]
-ED EDTHRESHOLD
Euclidean distance threshold for methylation similarity between neighboring CpGs, used in genome segmentation for de novo identification of DMRs. The methylation similarity between neighboring CpGs is considered high if their Euclidean distance is less than this threshold. [Type: float] [Range: 0.01 ~ 0.5] [DEFAULT: 0.2]
-SM SMTHRESHOLD
Similarity entropy threshold for methylation similarity between neighboring CpGs, used in genome segmentation for de novo identification of DMRs. The methylation similarity between neighboring CpGs is considered high if their similarity entropy is less than this threshold. [Type: float] [Range: 0.01 ~ 1.0] [DEFAULT: 0.6]
-CD CDTHRESHOLD
CpG distance threshold: the maximal distance between neighboring CpGs, used in genome segmentation for de novo identification of DMRs. Neighboring CpGs will be merged if the distance between them is less than this threshold. [Type: int] [Range: 1 ~ 2000] [DEFAULT: 500]
-SC SCTHRESHOLD
Segment CpG number threshold: the minimal number of CpGs in a merged segment or de novo identified DMR. Segments/DMRs with more CpGs than this threshold will be output for further analysis. [Type: int] [Range: > 1] [DEFAULT: 5]
-SL SLTHRESHOLD
Segment length threshold: the minimal length of a merged segment or de novo identified DMR. Segments/DMRs longer than this threshold will be output for further analysis. [Type: int] [Range: > 1] [DEFAULT: 20]
-pD P_DMR
p-value threshold for the one-way analysis of variance (ANOVA) used to identify DMRs across multiple groups. Segments with a p-value less than this threshold are identified as DMRs. [Type: float] [Range: 1.0e-100 ~ 0.05] [DEFAULT: 0.05]
-pM P_METHYLMARK
p-value threshold for the one-sample t-test used to identify group-specific methylation marks among the identified DMRs. DMRs with a p-value less than this threshold are identified as group-specific methylation marks (hypermethylation marks or hypomethylation marks). [Type: float] [Range: 1.0e-100 ~ 0.05] [DEFAULT: 0.05]
-v, --version
Show program’s version number and exit
-h, --help
Show this help message and exit
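The roles of -ED, -CD, -SC and -SL in genome segmentation can be sketched as a greedy pass over sorted CpGs: a new segment starts whenever the gap to the previous CpG reaches -CD or the distance between their group-mean profiles reaches -ED, and segments are then filtered by CpG count and length. This is a simplified illustration, not SMART2's implementation: it omits the -SM similarity entropy criterion and the later merging of primary segments, and the scaling of the distance by the square root of the number of groups, the span definition, and the function names are assumptions.

```python
import math
from typing import List, Sequence, Tuple

Profile = Sequence[float]  # group-mean methylation values of one CpG


def neighbour_distance(a: Profile, b: Profile) -> float:
    """Euclidean distance between two profiles, divided by sqrt(G) so it lies in [0, 1]."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


def segment_cpgs(positions: Sequence[int], profiles: Sequence[Profile],
                 ed_threshold: float = 0.2,
                 cd_threshold: int = 500) -> List[Tuple[int, int]]:
    """Split sorted CpGs into runs; returns (first_index, last_index) pairs."""
    segments: List[Tuple[int, int]] = []
    if not positions:
        return segments
    start = 0
    for i in range(1, len(positions)):
        too_far = positions[i] - positions[i - 1] >= cd_threshold
        too_different = neighbour_distance(profiles[i - 1], profiles[i]) >= ed_threshold
        if too_far or too_different:
            segments.append((start, i - 1))
            start = i
    segments.append((start, len(positions) - 1))
    return segments


def keep_segments(segments: List[Tuple[int, int]], positions: Sequence[int],
                  min_cpgs: int = 5, min_length: int = 20) -> List[Tuple[int, int]]:
    """Keep segments with more than min_cpgs CpGs and a span longer than min_length."""
    kept = []
    for first, last in segments:
        n_cpgs = last - first + 1
        span = positions[last] - positions[first]  # first to last CpG start (assumed definition)
        if n_cpgs > min_cpgs and span > min_length:
            kept.append((first, last))
    return kept


positions = [100, 150, 200, 1000]
profiles = [[0.9, 0.1], [0.85, 0.15], [0.1, 0.9], [0.1, 0.9]]
segs = segment_cpgs(positions, profiles)
print(segs)  # [(0, 1), (2, 2), (3, 3)]
```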

Example

Example data

The example data can be found in the Example directory under the SMART installation directory, and is also available here. In this example, 10,000 CpG sites were extracted from each human chromosome to test SMART. Use the following commands to test SMART. Note that the location of the SMART installation directory may differ between operating systems.

  • Linux (Ubuntu 16.04): /usr/local/lib/python2.7/dist-packages/SMART/
  • macOS (Sierra 10.12): /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/SMART/

Example command

Linux$ cd /usr/local/lib/python2.7/dist-packages/SMART/
macOS$ cd /Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/SMART/
$ SMART ./Example/MethylMatrix_Test.txt -t DeNovoDMR -o ./Example/
$ SMART ./Example/MethylMatrix_Test.txt -t DMROI -g ./Example/CpGisland_hg19.bed -o ./Example/
$ SMART ./Example/MethylMatrix_Test.txt -t DMC -o ./Example/
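The commands above can also be driven from a Python script, for example to loop over several matrices. The sketch below only assembles the documented options (-t, -g, -o) and runs the SMART command with the standard subprocess module; the helper names and the check that DMROI is given a regions file are illustrative assumptions, not part of SMART2.

```python
import subprocess
from typing import List, Optional

PROJECT_TYPES = ("DeNovoDMR", "DMROI", "DMC")


def build_smart_command(matrix: str, project_type: str = "DeNovoDMR",
                        regions: Optional[str] = None,
                        output: Optional[str] = None) -> List[str]:
    """Assemble the argument list for the SMART command-line tool (hypothetical helper)."""
    if project_type not in PROJECT_TYPES:
        raise ValueError(f"unknown project type: {project_type}")
    if project_type == "DMROI" and regions is None:
        raise ValueError("project type DMROI needs a regions file (-g)")
    cmd = ["SMART", matrix, "-t", project_type]
    if regions is not None:
        cmd += ["-g", regions]
    if output is not None:
        cmd += ["-o", output]
    return cmd


def run_smart(cmd: List[str]) -> None:
    """Run SMART; raises subprocess.CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)
```

For example, run_smart(build_smart_command("./Example/MethylMatrix_Test.txt", "DMROI", "./Example/CpGisland_hg19.bed", "./Example/")) reproduces the second command above, assuming SMART is on the PATH.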

Output Files

The results for DeNovoDMR are written to the DeNovoDMR folder, which includes:

  • 1_DifferMethlCpGs.txt.gz ~ Differentially methylated CpG sites
  • 2_MergedSegment.bed.gz ~ Merged segments based on small segments for visualization in UCSC browser
  • 3_MergedSegment.txt.gz ~ Merged segments based on small segments for further analysis
  • 4_MergedSegmentwithmethylation.txt.gz ~ Merged segments with methylation values for further analysis
  • 5_MergedSegment_GroupSpecificity.txt.gz ~ Merged segments with group specificity for further analysis
  • 6_GroupSpecific_Methylmark.txt.gz ~ Group specific methylation marks for further analysis
  • Summary.txt ~ Summary of SMART2 analysis

The results for DMROI are written to the DMROI folder, which includes:

  • 1_DifferMethlCpGs.txt.gz ~ Differentially methylated CpG sites
  • 2_DifferMethlROIs.bed.gz ~ Differentially methylated ROIs for visualization in UCSC browser
  • 3_DifferMethlROIs.txt.gz ~ Differentially methylated ROIs for further analysis
  • 4_DifferMethlROIs_withmethylation.txt.gz ~ Differentially methylated ROIs with methylation values for further analysis
  • 5_DifferMethlROIs_GroupSpecificity.txt.gz ~ Differentially methylated ROIs with group specificity for further analysis
  • 6_DifferMethlROIs_Methylmark.txt.gz ~ Group specific methylation marks of DifferMethlROIs for further analysis
  • Summary.txt ~ Summary of SMART2 analysis

The results for DMC are written to the DifferMethlCpG folder, which includes:

  • DifferMethlCpGs.txt.gz ~ Differentially methylated CpG sites
  • Summary.txt ~ Summary of SMART2 analysis
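The gzip-compressed result files can be loaded for further analysis with the Python standard library. This sketch assumes the files are tab-separated text; whether a given file starts with a header line is not stated here, so it is a parameter (inspect a file first, e.g. with zcat file | head). The helper name is hypothetical.

```python
import csv
import gzip
from typing import List, Tuple


def read_gz_table(path: str, has_header: bool = True) -> Tuple[List[str], List[List[str]]]:
    """Read a gzip-compressed, tab-separated result file into (header, rows)."""
    with gzip.open(path, "rt", newline="") as handle:
        rows = [r for r in csv.reader(handle, delimiter="\t") if r]  # drop blank lines
    if has_header and rows:
        return rows[0], rows[1:]
    return [], rows
```

All fields are returned as strings; convert coordinates and methylation values as needed for each file.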

Citation

Hongbo Liu et al. Systematic identification and annotation of human methylation marks based on bisulfite sequencing methylomes reveals distinct roles of cell type-specific hypomethylation in the regulation of cell identity genes. Nucleic Acids Res. 2016;44(1):75-94.

Contact

For help, please write to Hongbo Liu (hongbo919@gmail.com); see also http://cce.edbc.org/members/HongboLiu.html.
 
Files (uploaded 2017-06-16):

  • SMART-BS-Seq-2.1.9.tar.gz ~ Source, 906KB
  • SMART_BS_Seq-2.1.9-py2-none-any.whl ~ Python Wheel (py2), 934KB