A proteomics search engine for LC-MS1 spectra.

ms1searchpy - a DirectMS1 proteomics search engine for LC-MS1 spectra

ms1searchpy consumes LC-MS data (mzML) or peptide features (tsv) and performs protein identification and quantitation.

Basic usage

Basic command for protein identification:

ms1searchpy *.mzML -d path_to.FASTA

or

ms1searchpy *_peptideFeatures.tsv -d path_to.FASTA

Read further for detailed info, including quantitative analysis.

Citing ms1searchpy

Ivanov et al. DirectMS1Quant: Ultrafast Quantitative Proteomics with MS/MS-Free Mass Spectrometry. https://pubs.acs.org/doi/10.1021/acs.analchem.2c02255

Ivanov et al. Boosting MS1-only Proteomics with Machine Learning Allows 2000 Protein Identifications in Single-Shot Human Proteome Analysis Using 5 min HPLC Gradient. https://doi.org/10.1021/acs.jproteome.0c00863

Ivanov et al. DirectMS1: MS/MS-free identification of 1000 proteins of cellular proteomes in 5 minutes. https://doi.org/10.1021/acs.analchem.9b05095

Installation

Using pip:

pip install ms1searchpy

It is recommended to additionally install DeepLC, either version 1.1.2 (official) or 1.1.2.2 (an unofficial fork with small changes). Newer versions currently have some issues.

pip install deeplc==1.1.2

Or

pip install https://github.com/markmipt/DeepLC/archive/refs/heads/alternative_best_model.zip

This should work on recent versions of Python (3.8-3.10).

Usage tutorial: protein identification

The script used for protein identification is called ms1searchpy. It needs input files (mzML or tsv) and a FASTA database.

Input files

If mzML files are provided, ms1searchpy will invoke biosaur2 to generate the features table. You can also use other software such as Dinosaur or Biosaur, but biosaur2 is recommended. You can also build the table yourself; it must contain the columns 'massCalib', 'rtApex', 'charge' and 'nIsotopes'.
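If you build the features table yourself, a quick check of the header can catch a missing column before the search starts. The following is a minimal sketch, not part of ms1searchpy: the function name is hypothetical, the table is assumed to be tab-separated with a single header row, and the example values are made up.

```python
import csv
import io

# Column names taken from the requirement stated above.
REQUIRED_COLUMNS = ("massCalib", "rtApex", "charge", "nIsotopes")


def missing_feature_columns(tsv_handle):
    """Return the required columns that are absent from a features table.

    `tsv_handle` is any open text file (or file-like object) holding a
    tab-separated table whose first row is the header.
    """
    reader = csv.reader(tsv_handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return [name for name in REQUIRED_COLUMNS if name not in header]


# In-memory example table that lacks the 'nIsotopes' column.
example = io.StringIO("massCalib\trtApex\tcharge\n1000.5\t12.3\t2\n")
print(missing_feature_columns(example))  # ['nIsotopes']
```

For a file on disk, the same function can be called as missing_feature_columns(open("my_features.tsv", newline="")), where the file name is a placeholder.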

How to get mzML files

To get mzML from RAW files, you can use Proteowizard MSConvert...

msconvert path_to_file.raw -o path_to_output_folder --mzML --filter "peakPicking true 1-" --filter "MS2Deisotope" --filter "zeroSamples removeExtra" --filter "threshold absolute 1 most-intense"

...or compomics ThermoRawFileParser, which produces suitable files with default parameters.
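When many RAW files need converting, the msconvert call shown above can be assembled programmatically. This is only a sketch under stated assumptions: the function name, file names and output folder are hypothetical, and msconvert is assumed to be on PATH if the command is actually executed. The options and filters are copied from the command above.

```python
# Filters copied verbatim from the msconvert command above.
MSCONVERT_FILTERS = (
    "peakPicking true 1-",
    "MS2Deisotope",
    "zeroSamples removeExtra",
    "threshold absolute 1 most-intense",
)


def msconvert_command(raw_file, output_folder):
    """Build the msconvert argument list for one RAW file."""
    cmd = ["msconvert", str(raw_file), "-o", str(output_folder), "--mzML"]
    for filter_spec in MSCONVERT_FILTERS:
        # Each filter is passed as one argument, so no shell quoting is needed.
        cmd += ["--filter", filter_spec]
    return cmd


# Hypothetical file names, for illustration only.
for raw in ["run_01.raw", "run_02.raw"]:
    print(msconvert_command(raw, "mzml_out"))
```

Each list can be handed to subprocess.run(cmd, check=True) to perform the conversion.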

RT predictor

For protein identification, ms1searchpy needs a retention time prediction model. The recommended one is DeepLC, but you can also use the built-in additive model (default).

Examples

ms1searchpy test.mzML -d sprot_human.fasta -deeplc 1 -ad 1

This command runs ms1searchpy with the DeepLC RT predictor, available as deeplc (this should work if you install DeepLC alongside ms1searchpy). The -ad 1 option creates a shuffled decoy database for FDR estimation. Use it only once, then reuse the created database for other searches.

ms1searchpy test.features.tsv -d sprot_human_shuffled.fasta -deeplc 1

Here, instead of mzML file, a file with peptide features is used.
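The two examples above follow a pattern: create the decoy database in the first search, then point later searches at the shuffled database. A minimal sketch of that pattern is given below. The helper names are hypothetical, and the "_shuffled" naming of the decoy database is an assumption inferred from the file names in the examples, not a documented rule.

```python
from pathlib import Path


def shuffled_database_name(fasta):
    """Assumed naming: 'sprot_human.fasta' -> 'sprot_human_shuffled.fasta'."""
    path = Path(fasta)
    return str(path.with_name(f"{path.stem}_shuffled{path.suffix}"))


def search_command(input_file, fasta, create_decoys=False):
    """Build an ms1searchpy call that uses DeepLC, as in the examples above."""
    cmd = ["ms1searchpy", input_file, "-d", fasta, "-deeplc", "1"]
    if create_decoys:
        cmd += ["-ad", "1"]  # only for the first search against this database
    return cmd


first = search_command("test.mzML", "sprot_human.fasta", create_decoys=True)
later = search_command("test.features.tsv",
                       shuffled_database_name("sprot_human.fasta"))
print(" ".join(first))
print(" ".join(later))
```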

Output files

ms1searchpy produces several tables:

  • identified proteins, FDR-filtered (sample.features_proteins.tsv) - this is the main result;
  • all identified proteins (sample.features_proteins_full.tsv);
  • all identified proteins based on all PFMs (sample.features_proteins_full_noexclusion.tsv);
  • all matched peptide match fingerprints, or peptide-feature matches (sample.features_PFMs.tsv);
  • all PFMs with features prepared for Machine Learning (sample.features_PFMs_ML.tsv);
  • number of theoretical peptides per protein (sample.features_protsN.tsv);
  • log file with estimated mass and RT accuracies (sample.features_log.txt).
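For downstream scripting it can help to derive the output names from the features table name. The sketch below is a hypothetical helper, not part of ms1searchpy: the suffixes are copied from the list above, while the assumption that the outputs sit next to the features table is mine.

```python
from pathlib import Path

# Suffixes copied from the list of output tables above.
OUTPUT_SUFFIXES = {
    "proteins": "_proteins.tsv",
    "proteins_full": "_proteins_full.tsv",
    "proteins_full_noexclusion": "_proteins_full_noexclusion.tsv",
    "PFMs": "_PFMs.tsv",
    "PFMs_ML": "_PFMs_ML.tsv",
    "protsN": "_protsN.tsv",
    "log": "_log.txt",
}


def expected_outputs(features_tsv):
    """Map each output kind to its expected path for one features table.

    Assumes the input name ends in '.tsv' (e.g. 'sample.features.tsv') and
    that the outputs are written to the same folder.
    """
    path = Path(features_tsv)
    base = path.stem  # 'sample.features.tsv' -> 'sample.features'
    return {kind: path.with_name(base + suffix)
            for kind, suffix in OUTPUT_SUFFIXES.items()}


for kind, out_path in expected_outputs("sample.features.tsv").items():
    print(kind, out_path)
```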

Combine results from replicates

You can combine the results from several replicate runs with ms1combine by feeding it _PFMs_ML.tsv tables:

ms1combine sample_rep_*.features_PFMs_ML.tsv

Usage tutorial: Quantitation

After obtaining the protein identification results, you can proceed to compare your samples using LFQ.

Using directms1quant

The new LFQ method, designed specifically for DirectMS1, is invoked like this:

directms1quant -S1 sample1_r{1,2,3}.features_proteins_full.tsv -S2 sample2_r{1,2,3}.features_proteins_full.tsv

It produces a filtered table of significantly changed proteins with p-values and fold changes, the full protein table, and a separate file that simply lists the IDs of all significantly changed proteins (e.g. for easy copy-paste into a StringDB search window).
