
sourmash plugin to calculate common hashes across multiple sketches.


sourmash_plugin_commonhash

If you have sketched many samples and want to remove "rare" k-mers (those present in only one or a few samples), this plugin is for you! Filtering them out helps reduce noise in Jaccard comparisons between samples.
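To make the idea concrete, here is a minimal pure-Python sketch of the filtering step and its effect on a Jaccard comparison. It is an illustration only, not the plugin's actual code: the toy hash values, the sample names, and the `min_samples` variable are assumptions made up for this example, with the threshold of 2 chosen to match the example output further down.

```python
from collections import Counter


def jaccard(a, b):
    """Jaccard similarity of two hash sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy hash sets standing in for sketches (made-up values).
samples = {
    "s1": {1, 2, 3, 4, 10},
    "s2": {1, 2, 3, 5, 11},
    "s3": {1, 2, 4, 5, 12},
}

# Count in how many samples each hash occurs.
counts = Counter(h for hashes in samples.values() for h in hashes)

# Keep only hashes present in at least `min_samples` samples.
min_samples = 2
common = {h for h, n in counts.items() if n >= min_samples}
filtered = {name: hashes & common for name, hashes in samples.items()}

before = jaccard(samples["s1"], samples["s2"])
after = jaccard(filtered["s1"], filtered["s2"])
print(f"Jaccard s1 vs s2: {before:.3f} before, {after:.3f} after filtering")
```

In this toy data the hashes 10, 11 and 12 each occur in a single sample, so dropping them shrinks the union without touching the intersection, and the similarity between s1 and s2 rises from 3/7 to 3/5.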

See sourmash#2383 for an extended discussion!

Thanks to Taylor Reiter and Jessica Lumian for all their work on this!

Installation

pip install sourmash_plugin_commonhash

Usage

sourmash scripts commonhash <multiple sketches> -o commonhashes.zip

commonhash will output one filtered sketch for each input sketch. You can then use the various sourmash sig commands to union these sketches, extract individual ones, etc.
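The one-output-per-input behavior can be modeled with plain Python sets. This is a hedged sketch of the shape of the result, not the plugin's implementation: `filter_common` and its `min_samples` parameter are hypothetical names introduced here, and real sketches would be loaded with sourmash rather than written out by hand.

```python
from collections import Counter


def filter_common(sketches, min_samples=2):
    """Return one filtered hash set per input set, in the same order."""
    # Count the number of input sets that contain each hash.
    counts = Counter(h for hashes in sketches for h in set(hashes))
    keep = {h for h, n in counts.items() if n >= min_samples}
    return [set(hashes) & keep for hashes in sketches]


inputs = [{1, 2, 3}, {2, 3, 4}, {3, 4, 5}]
outputs = filter_common(inputs)

# Same number of sketches out as in.
print(len(outputs))
# Union of the filtered sketches, analogous to merging them afterwards.
print(set().union(*outputs))
```

Each output set is a subset of its corresponding input, so per-sample identity is preserved and any merging is left as a separate, later step.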

Example

sourmash scripts commonhash examples/*.sig.gz -o commonhash.zip

should yield:

...

Selecting k=31, DNA
Loaded 10587 hashes from 3 sketches in 3 files.
Of 10587 hashes, keeping 2529 that are in 2 or more samples.
Saved 3 signatures to 'commonhash.zip'

Support

We suggest filing issues in the main sourmash issue tracker, as it receives more attention!

Dev docs

commonhash is developed at https://github.com/ctb/sourmash_plugin_commonhash.

Generating a release

Bump version number in pyproject.toml and push.

Make a new release on GitHub.

Then pull, and:

python -m build

followed by twine upload dist/....

Source Distribution: sourmash_plugin_commonhash-0.4.tar.gz (4.8 kB)

Built Distribution: sourmash_plugin_commonhash-0.4-py3-none-any.whl (5.0 kB, Python 3)
