sequali

Fast sequencing quality metrics

These details have not been verified by PyPI

Project links

Homepage

GitHub Statistics

View statistics for this project via Libraries.io, or by using our public dataset on Google BigQuery

Project description

Sequence quality metrics

Features:

Low memory footprint, small install size and fast execution times.
Informative graphs that allow for judging the quality of a sequence at a quick glance.
Overrepresentation analysis using 31 bp sequence fragments. Overrepresented sequences are checked against the NCBI UniVec database.
Estimate duplication rate using a fingerprint subsampling technique which is also used in filesystem duplication estimation.
Checks for 6 Illumina adapter sequences and 15 nanopore adapter sequences.
Per tile quality plots for Illumina reads.
Channel and other plots for nanopore reads.
FASTQ and unaligned BAM are supported. See “Supported formats”.

Supported formats

FASTQ. Only the Sanger variant, with a phred offset of 33 and an error rate calculation of 10 ^ (-phred/10), is supported. Current sequencers all use this format.
- For sequences called by Illumina base callers, an additional plot with the per tile quality will be provided.
- For sequences called by guppy, additional plots for nanopore specific data will be provided.
Unaligned BAM. Any alignment flags are currently ignored.
- For uBAM data as delivered by dorado, additional nanopore plots will be provided.
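
The phred arithmetic described above can be sketched in a few lines of Python. This is only an illustration of the offset and error rate formula, not code taken from sequali; the function names are hypothetical. The averaging step assumes the usual approach of averaging error rates rather than the phred scores themselves.

```python
import math

PHRED_OFFSET = 33  # Sanger encoding, the only variant supported


def phred_scores(quality_string: str) -> list[int]:
    """Decode an ASCII FASTQ quality string into phred scores."""
    return [ord(char) - PHRED_OFFSET for char in quality_string]


def error_rate(phred: int) -> float:
    """Error probability of a single base call: 10 ^ (-phred / 10)."""
    return 10 ** (-phred / 10)


def mean_phred(quality_string: str) -> float:
    """Average on the error rate scale, then convert back to phred.

    Assumption: this is the conventional way to average phred scores;
    it is not copied from the sequali source.
    """
    rates = [error_rate(q) for q in phred_scores(quality_string)]
    mean_rate = sum(rates) / len(rates)
    return -10 * math.log10(mean_rate)
```

For a read with one base at phred 40 and one at phred 10, the arithmetic mean of the scores is 25, but averaging the error rates gives roughly phred 13, because the low quality base dominates the expected number of errors.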

Installation

pip install git+https://github.com/rhpvorderman/sequali.git

Usage

usage: sequali [-h] [--json JSON] [--html HTML] [--dir DIR]
               [--overrepresentation-threshold-fraction OVERREPRESENTATION_THRESHOLD_FRACTION]
               [--overrepresentation-min-threshold OVERREPRESENTATION_MIN_THRESHOLD]
               [--overrepresentation-max-threshold OVERREPRESENTATION_MAX_THRESHOLD]
               [--max-unique-sequences MAX_UNIQUE_SEQUENCES]
               [--overrepresentation-fragment-length OVERREPRESENTATION_FRAGMENT_LENGTH]
               [--overrepresentation-sample-every OVERREPRESENTATION_SAMPLE_EVERY]
               [--deduplication-estimate-bits DEDUPLICATION_ESTIMATE_BITS]
               input

positional arguments:

input Input FASTQ file

options:

-h, --help

show this help message and exit

--json JSON

JSON output file. default: ‘<input>.json’

--html HTML

HTML output file. default: ‘<input>.html’

--dir DIR

Output directory. default: current working directory

--overrepresentation-threshold-fraction OVERREPRESENTATION_THRESHOLD_FRACTION

The fraction at which a sequence is considered overrepresented. Default: 0.0001 (1 in 10 000).

--overrepresentation-min-threshold OVERREPRESENTATION_MIN_THRESHOLD

The minimum number of times a sequence must be present before it is considered overrepresented, even if the threshold fraction is surpassed. Useful for smaller files. Default: 100

--overrepresentation-max-threshold OVERREPRESENTATION_MAX_THRESHOLD

The threshold above which a sequence is considered overrepresented even if the threshold fraction is not surpassed. Useful for very large files. Default: unlimited.

--max-unique-sequences MAX_UNIQUE_SEQUENCES

The maximum number of unique fragments to gather. Larger numbers increase the sensitivity of finding overrepresented sequences at the cost of increased memory usage. Default: 5,000,000

--overrepresentation-fragment-length OVERREPRESENTATION_FRAGMENT_LENGTH

The length of the fragments to sample. The maximum is 31. Default: 31.

--overrepresentation-sample-every OVERREPRESENTATION_SAMPLE_EVERY

How often a read should be sampled. Default: 1 in 8. More samples lead to better precision and lower speed, and also to more bias towards the beginning of the file, as the fragment store fills up with more sequences from the beginning.

--deduplication-estimate-bits DEDUPLICATION_ESTIMATE_BITS

Determines the maximum number of sequences stored to estimate the deduplication rate. Maximum stored sequences: 2 ** bits * 7 // 10. Memory required: 2 ** bits * 24. Default: 21.
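
The numeric options above interact in ways that a short Python sketch may make easier to follow. It is a reading of the option descriptions, not sequali's actual implementation; the function names are hypothetical, and `total` stands for whatever count sequali applies the fraction to, which the help text does not specify.

```python
def overrepresentation_threshold(total, fraction=0.0001,
                                 min_threshold=100, max_threshold=None):
    """Count a sequence must exceed to be reported as overrepresented.

    Assumption: the fraction-based threshold is clamped between the
    minimum and maximum thresholds, as the option descriptions suggest.
    """
    threshold = total * fraction
    # Small files: never report below the minimum threshold.
    threshold = max(threshold, min_threshold)
    # Very large files: cap the threshold if a maximum was given.
    if max_threshold is not None:
        threshold = min(threshold, max_threshold)
    return threshold


def deduplication_store_limits(bits=21):
    """Apply the two formulas given for --deduplication-estimate-bits."""
    slots = 2 ** bits
    max_stored_sequences = slots * 7 // 10
    memory_required = slots * 24  # unit not stated in the help text; bytes is an assumption
    return max_stored_sequences, memory_required
```

With the default of 21 bits the formulas give 1,468,006 stored sequences and a memory figure of 50,331,648, which is 48 MiB if the unit is bytes. With the default fraction and minimum threshold, the minimum of 100 is the effective threshold for any `total` below one million.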

Acknowledgements

FastQC for its excellent selection of relevant metrics. For this reason these metrics are also gathered by sequali.
Wouter de Coster for his excellent post on how to correctly average phred scores.

License

This project is licensed under the GNU Affero General Public License v3, mainly to prevent commercial parties from using it without notifying users that they can run it themselves. If you want to include code from sequali in your open source project, but your project's license is not compatible with the AGPL, please contact me and we can discuss a separate license.

Project details


Release history

0.9.1: May 22, 2024
0.9.0: May 21, 2024
0.8.0: May 8, 2024
0.7.1: Apr 17, 2024
0.7.0: Apr 10, 2024
0.6.0: Mar 29, 2024
0.5.1: Mar 22, 2024
0.5.0: Mar 15, 2024
0.4.1: Dec 1, 2023
0.4.0: Dec 1, 2023
0.3.0: Nov 22, 2023
0.2.0: Nov 15, 2023
0.1.0: Nov 9, 2023 (this version)

Download files

Source Distribution

sequali-0.1.0.tar.gz (516.3 kB)

Uploaded Nov 9, 2023 Source

Built Distributions

sequali-0.1.0-cp312-cp312-musllinux_1_1_x86_64.whl (561.9 kB)

Uploaded Nov 9, 2023 CPython 3.12 musllinux: musl 1.1+ x86-64

sequali-0.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (561.6 kB)

Uploaded Nov 9, 2023 CPython 3.12 manylinux: glibc 2.17+ x86-64

sequali-0.1.0-cp311-cp311-musllinux_1_1_x86_64.whl (561.5 kB)

Uploaded Nov 9, 2023 CPython 3.11 musllinux: musl 1.1+ x86-64

sequali-0.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (561.2 kB)

Uploaded Nov 9, 2023 CPython 3.11 manylinux: glibc 2.17+ x86-64

sequali-0.1.0-cp310-cp310-musllinux_1_1_x86_64.whl (561.6 kB)

Uploaded Nov 9, 2023 CPython 3.10 musllinux: musl 1.1+ x86-64

sequali-0.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (561.2 kB)

Uploaded Nov 9, 2023 CPython 3.10 manylinux: glibc 2.17+ x86-64

sequali-0.1.0-cp39-cp39-musllinux_1_1_x86_64.whl (561.6 kB)

Uploaded Nov 9, 2023 CPython 3.9 musllinux: musl 1.1+ x86-64

sequali-0.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (561.3 kB)

Uploaded Nov 9, 2023 CPython 3.9 manylinux: glibc 2.17+ x86-64

sequali-0.1.0-cp38-cp38-musllinux_1_1_x86_64.whl (561.6 kB)

Uploaded Nov 9, 2023 CPython 3.8 musllinux: musl 1.1+ x86-64

sequali-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (561.3 kB)

Uploaded Nov 9, 2023 CPython 3.8 manylinux: glibc 2.17+ x86-64

Hashes for sequali-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`ed32729962b2f0631b7892ccfd783b86ab60fb2ab10d2045a42c353adb6eb679`
MD5	`9280d9d6f00363f3a740f811c648181b`
BLAKE2b-256	`29cdedc0444454a87a33a124836b63fab50bb738a0924bbc38971f33af77a4f8`

Hashes for sequali-0.1.0-cp312-cp312-musllinux_1_1_x86_64.whl
Algorithm	Hash digest
SHA256	`7f2a399fe08fef991c5763e4e63262a4e51065263b4bb579314b611997ec9e9b`
MD5	`caf809754f288bf87f78dc11c5c82349`
BLAKE2b-256	`44b71f5c8db64347f6c6e00f8678091882cedcf8fbaad372f0ec1a0cfc886ef3`

Hashes for sequali-0.1.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`9333247f3d78c64e43965ac947f917aa5c2b15d57665a4c1fa0a24b1e825dbfb`
MD5	`22ba98be5c553114afe38ca6ab3841d1`
BLAKE2b-256	`8394bcf27509db6d7e2db1d99458b5a8d140a004a070744c07e12eb45cca6569`

Hashes for sequali-0.1.0-cp311-cp311-musllinux_1_1_x86_64.whl
Algorithm	Hash digest
SHA256	`8aff55b57066ff209f5c1ea16f515e4f8785001e0da646e4f6be26e5cf9281fc`
MD5	`e0aebe778022b873ea5a1cf0ccce389c`
BLAKE2b-256	`a17defc153024c58cce71d08df1d746cf6ed87fa4ebafbd54c90eed023fa57d9`

Hashes for sequali-0.1.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`07b72c365808caf64353df1b94a3d22aa17458156297b9b29556d36a6c21f60c`
MD5	`f3ac11a8248d5b52330c3a202555214f`
BLAKE2b-256	`10f637a1d15f10f369bac5460eceec7b51e49f2a4bb028ec8cb2fdb6a6da7b93`

Hashes for sequali-0.1.0-cp310-cp310-musllinux_1_1_x86_64.whl
Algorithm	Hash digest
SHA256	`7d0e5eea4c2016c332258e9991f6bf14ec5a2a54399a5fac2fcf348178a4c05f`
MD5	`0a4694b1aea946aec334713a8937c32f`
BLAKE2b-256	`87e690b182c3d9ade51dceb0e1b96369d7de0c5b238cdbb9f73334ffdf8e1732`

Hashes for sequali-0.1.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`5c0afea9e0be5a344d0595242a474b948daeaf547fcb44cf58262ff649a51216`
MD5	`cee4c010b6ac2215385db67e3a66e472`
BLAKE2b-256	`b5106e2f6c9d158ae4b45f67b7e3e4fcb33fd78382fc4f90797e86b3d2fa1822`

Hashes for sequali-0.1.0-cp39-cp39-musllinux_1_1_x86_64.whl
Algorithm	Hash digest
SHA256	`a5b0c509c7b2beb5408aed37b47dfea5a77e94af63dd82b012dd4430cfaf43bb`
MD5	`53325896b983329d514bc1cb420d4b34`
BLAKE2b-256	`339253b2fdd39365cdf4aef8244401bcb4ecedd85d60be8040b76a2ab02af0c0`

Hashes for sequali-0.1.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`10b320f3683f196a158b03eef71270c62561fe716109d2ee37bbc93026bfcc67`
MD5	`ff326bc84034d2e9d4c40c541c60b677`
BLAKE2b-256	`3434710310b1e76d53c123c326945c9fbb0413140b0c1451b035bc377892176c`

Hashes for sequali-0.1.0-cp38-cp38-musllinux_1_1_x86_64.whl
Algorithm	Hash digest
SHA256	`cdebd32dbbffab5a757cf0a888dd9d654d9e28570c1a1417f7ce5f0b747c14d0`
MD5	`3f9777b6c287157ded5e5e6fa52f7e2c`
BLAKE2b-256	`c382333e312427e6b7b250589dd2d048aa5cbaba05a3506a327daf4276a01d01`

Hashes for sequali-0.1.0-cp38-cp38-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`ba3ff199a35ed55162060b6f52fc9079c476c47cbb19b3cb1050121ee673c2d4`
MD5	`b8d493833da8c89dea4eb7629eb41cbf`
BLAKE2b-256	`4fcd2614dfd07c7043e9e11e3c6e8992393fbb95dd21cbfa97406435c709c98a`
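
A downloaded file can be checked against the digests listed above with Python's standard `hashlib` module. The sketch below is one possible way to do this; the function names are hypothetical, and it assumes that BLAKE2b-256 means BLAKE2b with a 32 byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path):
    """Compute the three digest types listed in the hash tables."""
    data = Path(path).read_bytes()
    return {
        "SHA256": hashlib.sha256(data).hexdigest(),
        "MD5": hashlib.md5(data).hexdigest(),
        # Assumption: BLAKE2b-256 is BLAKE2b with digest_size=32 (256 bits).
        "BLAKE2b-256": hashlib.blake2b(data, digest_size=32).hexdigest(),
    }


def matches_published(path, published):
    """Return True if every published digest matches the file on disk.

    `published` maps an algorithm name from the tables to its hex digest.
    """
    actual = file_digests(path)
    return all(actual[name] == digest.lower()
               for name, digest in published.items())
```

For example, calling `matches_published("sequali-0.1.0.tar.gz", {"SHA256": "..."})` with the SHA256 value from the source distribution table should return `True` for an intact download. Checking SHA256 alone is usually sufficient; MD5 is included only because the tables list it.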