Oxidizing sourmash: WebAssembly
sourmash calculates MinHash signatures for genomic datasets, meaning we are reducing the data (via subsampling) to a small representative subset (a signature) capable of answering one question: how similar is this dataset to another one? The key here is that a dataset with 10-100 GB will be reduced to something in the megabytes range, and two approaches for doing that are:
- The user install our software in their computer. This is not so bad anymore (yay bioconda!), but still requires knowledge about command line interfaces and how to install all this stuff. The user data never leaves their computer, and they can share the signatures later if they want to.
- Provide a web service to calculate signatures. In this case no software need to be installed, but it's up to someone (me?) to maintain a server running with an API and frontend to interact with the users. On top of requiring more maintenance, another drawback is that the user need to send me the data, which is very inefficient network-wise and lead to questions about what I can do with their raw data (and I'm not into surveillance capitalism, TYVM).
But... what if there is a third way?
In "Oxidizing sourmash: Python and FFI" I described my road to learn Rust,
but something that I omitted was that around the same time the
support in Rust started to look better and better and was a huge influence in
my decision to learn Rust. Reimplementing the sourmash C++ extension in Rust and
use the same codebase in the browser sounded very attractive,
and now that it was working I started looking into how to use the WebAssembly
target in Rust.
From the official site,
WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.
Rust is not the only language targeting WebAssembly: Go 1.11 includes experimental support for WebAssembly, and there are even projects bringing the scientific Python to the web using WebAssembly.
But does it work?
With the Rust implementation in place and with all tests working on sourmash, I
added the finishing touches using
wasm-bindgen and built an NPM package using
wasm-pack: sourmash is a Rust codebase compiled to WebAssembly and ready
(Many thanks to Madicken Munk, who also presented during SciPy about how they used Rust and WebAssembly to do interactive visualization in Jupyter and helped with a good example on how to do this properly =] )
Since I already had the working UI from the previous PoC, I refactored the code to use the new WebAssembly module and voilà! It works!2. 3 But that was the demo from a year ago with updated code and I got a bit better with frontend development since then, so here is the new demo:
sourmash + Wasm
Drag & drop a FASTA or FASTQ file here to calculate the sourmash signature.
For the source code for this demo, check the sourmash-wasm directory.
The proof of concept works, but it is pretty useless right now. I'm thinking about building it as a Web Component and making it really easy to add to any webpage4.
Another interesting feature would be supporting more input formats (the GMOD project implemented a lot of those!), but more features are probably better after something simple but functional is released =P
Where we will go next? Maybe explore some decentralized web technologies like IPFS and dat, hmm? =]
- 2018-08-30: Added a demo in the blog post.
even if horrible, I need to get some design classes =P ↩
the first version of this demo only worked in Chrome because they implemented the BigInt proposal, which is not in the official language yet. The funny thing is that BigInt would have made the JS implementation of sourmash viable, and I probably wouldn't have written the Rust implementation =P. Turns out that I didn't need the BigInt support if I didn't expose any 64-bit integers to JS, and that is what I'm doing now. ↩
Along the way I ended up writing a new FASTQ parser... because it wouldn't be bioinformatics if it didn't otherwise, right? =P ↩
or maybe a React component? I really would like to have something that works independent of framework, but not sure what is the best option in this case... ↩