Gabbleblotchits

Mastodon icon github projects my shared stories on newsblur RSS feed icon Vogon Poetry, Computers and (some) biology

Oxidizing sourmash: WebAssembly

sourmash calculates MinHash signatures for genomic datasets, meaning we are reducing the data (via subsampling) to a small representative subset (a signature) capable of answering one question: how similar is this dataset to another one? The key here is that a dataset with 10-100 GB will be reduced to something in the megabytes range, and two approaches for doing that are:

But... what if there is a third way?

What if we could keep the frontend code from the web service (very user-friendly) but do all the calculations client-side (and avoid the network bottleneck)? The main hurdle here is that our software is implemented in Python (and C++), which are not supported in browsers. My first solution was to write the core features of sourmash in JavaScript, but that quickly started hitting annoying things like JavaScript not supporting 64-bit integers. There is also the issue of having another codebase to maintain and keep in sync with the original sourmash, which would be a relevant burden for us. I gave a lab meeting about this approach, using a drag-and-drop UI as proof of concept. It did work but it was finicky (dealing with the 64-bit integer hashes is not fun). The good thing is that at least I had a working UI for further testing1

In "Oxidizing sourmash: Python and FFI" I described my road to learn Rust, but something that I omitted was that around the same time the WebAssembly support in Rust started to look better and better and was a huge influence in my decision to learn Rust. Reimplementing the sourmash C++ extension in Rust and use the same codebase in the browser sounded very attractive, and now that it was working I started looking into how to use the WebAssembly target in Rust.

WebAssembly?

From the official site,

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable target for compilation of high-level languages like C/C++/Rust, enabling deployment on the web for client and server applications.

You can write WebAssembly by hand, but the goal is to have it as lower level target for other languages. For me the obvious benefit is being able to use something that is not JavaScript in the browser, even though the goal is not to replace JS completely but complement it in a big pain point: performance. This also frees JavaScript from being the target language for other toolchains, allowing it to grow into other important areas (like language ergonomics).

Rust is not the only language targeting WebAssembly: Go 1.11 includes experimental support for WebAssembly, and there are even projects bringing the scientific Python to the web using WebAssembly.

But does it work?

With the Rust implementation in place and with all tests working on sourmash, I added the finishing touches using wasm-bindgen and built an NPM package using wasm-pack: sourmash is a Rust codebase compiled to WebAssembly and ready to use in JavaScript projects.

(Many thanks to Madicken Munk, who also presented during SciPy about how they used Rust and WebAssembly to do interactive visualization in Jupyter and helped with a good example on how to do this properly =] )

Since I already had the working UI from the previous PoC, I refactored the code to use the new WebAssembly module and voilà! It works!2. 3 But that was the demo from a year ago with updated code and I got a bit better with frontend development since then, so here is the new demo:

sourmash + Wasm

Drag & drop a FASTA or FASTQ file here to calculate the sourmash signature.

For the source code for this demo, check the sourmash-wasm directory.

Next steps

The proof of concept works, but it is pretty useless right now. I'm thinking about building it as a Web Component and making it really easy to add to any webpage4.

Another interesting feature would be supporting more input formats (the GMOD project implemented a lot of those!), but more features are probably better after something simple but functional is released =P

Next time!

Where we will go next? Maybe explore some decentralized web technologies like IPFS and dat, hmm? =]

Comments?

Updates


Footnotes

  1. even if horrible, I need to get some design classes =P

  2. the first version of this demo only worked in Chrome because they implemented the BigInt proposal, which is not in the official language yet. The funny thing is that BigInt would have made the JS implementation of sourmash viable, and I probably wouldn't have written the Rust implementation =P. Turns out that I didn't need the BigInt support if I didn't expose any 64-bit integers to JS, and that is what I'm doing now.

  3. Along the way I ended up writing a new FASTQ parser... because it wouldn't be bioinformatics if it didn't otherwise, right? =P

  4. or maybe a React component? I really would like to have something that works independent of framework, but not sure what is the best option in this case...

Tags: rust webassembly