Selected coding projects

Nonconsumptive

A standard and a set of Python libraries for distributing large textual collections with fast random access, using the Apache Arrow format.
python
Website GitHub

Deepscatter

Fast, animated, interactive online maps that scale easily to billions, not millions, of points using WebGL and Apache Arrow.
typescript
Website GitHub

Stable Random Projection

General-purpose, lightweight dimensionality reduction for book or article-length texts. A trick involving cryptographic hashes makes it possible to use the same space for any language without a pre-trained model or dictionary.
python, javascript
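
The hashing trick described above can be illustrated with a minimal sketch. This is not the library's actual code: the function names `word_signs` and `project`, the use of SHA-1, the whitespace tokenizer, and the log weighting of counts are all assumptions chosen for illustration. The idea is that each word's hash bits give it a fixed vector of +1/-1 values, so no dictionary or trained model is needed.

```python
import hashlib
import math
from collections import Counter


def word_signs(word: str, dims: int = 160) -> list[int]:
    """Derive a fixed +1/-1 vector for a word from the bits of its hash.

    Hypothetical helper: SHA-1 and the ':counter' suffix used to get
    more than 160 bits are illustrative assumptions.
    """
    signs: list[int] = []
    counter = 0
    while len(signs) < dims:
        digest = hashlib.sha1(f"{word}:{counter}".encode("utf-8")).digest()
        for byte in digest:
            for shift in range(8):
                signs.append(1 if (byte >> shift) & 1 else -1)
        counter += 1
    return signs[:dims]


def project(text: str, dims: int = 160) -> list[float]:
    """Sum the sign vectors of every word in a text, weighted by log count."""
    counts = Counter(text.lower().split())  # naive tokenizer (assumption)
    vector = [0.0] * dims
    for word, n in counts.items():
        weight = math.log1p(n)  # damp very frequent words (assumption)
        for i, sign in enumerate(word_signs(word, dims)):
            vector[i] += weight * sign
    return vector
```

Because a word's signs depend only on its bytes, a text in any language can be projected into the same space without a shared vocabulary file, and the same word always lands in the same place.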

WordVectors

An R package for training and exploring word2vec models with a fluent vocabulary that takes advantage of R's ability to add, subtract, and perform other vector-space operations.
R

Quires

An implementation of djot's rich document model as Svelte components, allowing the creation of rich interactive documents from Markdown files. This is the software that renders the blog posts here!
typescript
GitHub

Bookworm

Tools for tokenizing and visually exploring large textual collections, backed by an extremely fast MySQL architecture and served over the web through an expressive API.
python, javascript
Website GitHub

Markdown Lectures

Document transformation scripts for writing talks and course lectures that simultaneously generate their own slide decks and outlines with identifying terms, to keep everything aligned.
Haskell