No project description provided
Project description
pyo3_branchwater
tl;dr Do fast and low-memory search/gather of many sourmash sketches via a sourmash plugin.
Details
This repo contains a PyO3-based Python wrapper around the core branchwater code. Branchwater is a fast, low-memory and multithreaded application for searching very large collections of FracMinHash sketches as generated by sourmash.
For details, see the Rust code in src/
and Python wrapper in python/
.
Uses pyo3 for the Python-to-Rust wrapping.
This functionality can be used from within sourmash as a command-line plugin; see below quickstart.
Quickstart for manysearch
.
To try out, you'll need to install a branch of sourmash that contains sourmash#2438.
This quickstart demonstrates manysearch
using
the 64 genomes from Awad et al., 2017.
First, install this code.
Install this repo in developer mode:
pip install -e .
Second, download sketches.
The following commands will download sourmash sketches for them and
unpack them into the directory podar-ref/
:
mkdir -p podar-ref
curl -JLO https://osf.io/4t6cq/download
unzip -u podar-reference-genomes-updated-sigs-2017.06.10.zip
Third, create lists of query and subject files.
manysearch
takes in lists of signatures to search, so we need to
create those files:
ls -1 podar-ref/{2,47,63}.* > query-list.txt
ls -1 podar-ref/* > podar-ref-list.txt
Fourth: Execute!
Now run manysearch
:
sourmash scripts manysearch query-list.txt podar-ref-list.txt -o results.csv
You will (hopefully ;) see a set of results in results.csv
.
Debugging help
If your file lists are not working properly, try running:
sourmash sig summarize query-list.txt
sourmash sig summarize podar-ref-list.txt
Future thoughts
The speed and functions of this code will probably be brought into sourmash core in the future, most likely as part of sourmash#2230. However, in the meantime, this is a fun side project that makes use of sourmash plugins and Rust to provide some fast functionality that may be of use to some people, and it can serve as a testbed for future sourmash functionality.
CTB Feb 2023
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.