Konubinix' opinionated web of thoughts

Simple PWA Piano Hero

Fleeting

I have an acoustic piano and a phone. I’d like a game that scrolls notes towards a line, the way Guitar Hero does, and tells me whether I actually played them — not by tapping a button, but by listening to the piano through the phone’s microphone and checking that the right pitches sounded at the right moment.

A piano is polyphonic: I play several notes at once. General “what am I playing?” transcription of overlapping notes is a research problem. But a game already knows the answer — the chart says which notes are due now. So the app never has to transcribe the open-ended sound; it only has to verify that the notes the chart expects are present in what the microphone hears. Verification against a known target is tractable with plain signal processing; transcription is not. That reframing is what makes this buildable without a machine-learning model.
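
To make that reframing concrete, here is a minimal sketch of what verification could look like. It rests on assumptions that are not in the text above: the spectrum is an array of decibel magnitudes, one per FFT bin, and a note counts as present when the bin nearest its fundamental, or a direct neighbour, rises above a floor. The names midiToFreq, notePresent and expectedPresent are hypothetical, and the sketch ignores harmonics, which a real piano would force the game to deal with.

```javascript
// Hypothetical sketch: check that every expected note shows energy in a spectrum.
// binsDb: magnitudes in dB, one per FFT bin; hzPerBin: sampleRate / fftSize.
function midiToFreq(midi) {
    return 440 * Math.pow(2, (midi - 69) / 12);
}

function notePresent(binsDb, hzPerBin, midi, floorDb) {
    const centre = Math.round(midiToFreq(midi) / hzPerBin);
    // Accept the nearest bin or one of its two neighbours (assumed tolerance).
    for (let i = centre - 1; i <= centre + 1; i++) {
        if (i >= 0 && i < binsDb.length && binsDb[i] >= floorDb) return true;
    }
    return false;
}

function expectedPresent(binsDb, hzPerBin, expectedMidis, floorDb = -80) {
    return expectedMidis.every(m => notePresent(binsDb, hzPerBin, m, floorDb));
}

// Usage: a made-up spectrum with energy at C4 (60) and E4 (64) only.
const hzPerBin = 48000 / 8192;
const spectrum = new Float32Array(4096).fill(-120);
for (const m of [60, 64]) {
    spectrum[Math.round(midiToFreq(m) / hzPerBin)] = -30;
}
console.log(expectedPresent(spectrum, hzPerBin, [60, 64])); // true
console.log(expectedPresent(spectrum, hzPerBin, [60, 67])); // false: G4 is missing
```

The point of the sketch is the shape of the question: the function never asks what is sounding, only whether a short list of known pitches is there.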

Live analysis only — nothing is recorded or stored.

Live: https://sam.konubinix.eu/piano/.

Choice of technology

The organiser drove a Lit render layer from a single TinyBase store so that one source of truth fed both the renderer and cross-device sync. This app has nothing to sync — no inventory, no relations, no second device that would want to merge. The charts it plays are part of the build, and a game in progress is meaningless to persist. So TinyBase goes, and with it the widget kit: there are no dialogs or menus here, only a canvas and a couple of buttons.

For the chrome around the game — the heading, the buttons, a live readout — I reach for VanJS, the same kilobyte-sized reactive layer the karate beep trainer settled on: a reactive state cell is the source of truth, and the DOM is plain functions that read it. No build step, one ES module from a CDN.

But the heart of the app is neither chrome nor a store: it is a stream of numbers arriving sixty times a second (the spectrum of the sound in the room) and a stream of notes scrolling towards a line. Neither belongs in a reactive DOM: re-rendering elements on every frame would thrash. Both live on a <canvas> painted from a requestAnimationFrame loop, and the sound comes from the Web Audio API, with the microphone feeding an AnalyserNode whose spectrum the loop reads each frame. VanJS owns the chrome; the loop owns the game.

That microphone is also what the document cannot honestly test: a suite can’t play a piano. So the analysis is split from its source. The pipeline that turns a spectrum into “which notes are sounding” is pure and fully testable; what feeds it a spectrum is swappable. In the app the source is the microphone; under test it is a set of oscillators synthesised in the page to a pitch the test names — the faithful analog of the organiser’s make_png, which handed a file input known bytes rather than a real photo. The seam is a ?synth query parameter, the same shape as the trainer’s ?demo hook (Hear a note).

Setting up the app

One thing on this screen changes reactively for now: the note the app currently hears. That is a VanJS state cell — read it inside a render function and VanJS re-runs that function when it changes. It starts empty: nothing has sounded yet.

const heard = van.state(null);

VanJS builds DOM from functions named after tags, so the setup line is the import plus the handful of tag builders the views use.

import van from 'vanjs-core';
const { header, main, h1, div } = van.tags;

The whole app mounts into one element.

<div id="app"></div>

Boot mounts the view into #app, raises the data-app-ready flag the tests wait on, and — only when the page carries a ?synth parameter — starts listening to the synthesised source that parameter describes (Hear a note). The microphone path waits for a later chapter and a user gesture; the synth path needs neither, so it can start on load.

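van.add(document.getElementById('app'), App());
document.body.setAttribute('data-app-ready', '1');
const synthSpec = new URLSearchParams(location.search).get('synth'); // null when ?synth is absent
if(synthSpec !== null) startListening(synthSpec);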
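
The ?synth seam distinguishes three cases: the parameter is absent (microphone path), present but empty (a silent synthesised source), or present with a list of MIDI numbers. A small sketch of how that distinction could be read from the query string follows. The helper name parseSynthSpec is hypothetical, and dropping non-numeric entries is my assumption rather than something the chapter specifies.

```javascript
// Hypothetical helper: turn a location.search string into a list of MIDI numbers.
// Returns null when ?synth is absent, [] when it is present but empty.
function parseSynthSpec(search) {
    const spec = new URLSearchParams(search).get('synth');
    if (spec === null) return null;
    return spec.split(',')
        .filter(s => s.trim() !== '')   // "?synth=" yields no entries
        .map(Number)
        .filter(Number.isFinite);       // assumed: silently drop non-numeric entries
}

console.log(parseSynthSpec('?synth=69'));       // one note: 69
console.log(parseSynthSpec('?synth=60,64,67')); // a chord: 60, 64, 67
console.log(parseSynthSpec('?synth='));         // empty list
console.log(parseSynthSpec(''));                // null
```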

Visual basics

The same dark palette as the organiser and the trainer, pinned to CSS custom properties so the feature chapters reach for var(--accent) by name. The live readout is large and centred — at a glance from the piano bench I can see what the app thinks I just played.

:root{
    --bg:#1b1d2e; --card:#262a40; --fg:#e8e8f0; --muted:#8a8ea5;
    --accent:#f9a826;
}
*{box-sizing:border-box}
body{margin:0;background:var(--bg);color:var(--fg);
     font-family:system-ui,sans-serif;line-height:1.4}
.app-bar{padding:12px 16px;border-bottom:1px solid #2a2d44}
.app-bar h1{font-size:1.1rem;margin:0}
.screen{padding:24px 16px}
.readout{font-size:4rem;font-weight:700;text-align:center;
         color:var(--accent);min-height:1.2em}

Hear a note

Before any game, I want proof the app hears me correctly: a live readout of the note it currently detects, like a tuner. It is the foundation every later chapter builds on, and the first thing the suite can pin down — feed a known pitch in, read the right note name out.

The test feeds a synthesised A above middle C — MIDI note 69, 440 Hz — through the ?synth seam and waits for the readout to show A4. No microphone, no permission, no real sound: the oscillator the page builds is the test’s “played note”, the same way make_png was the organiser’s “uploaded photo”.

@testcase
def test_hears_single_note(page):
    """A synthesised A4 (MIDI 69) makes the live readout show A4."""
    page.goto(BASE_URL + "?synth=69")
    page.wait_for_selector("[data-app-ready]")
    page.get_by_text("A4", exact=True).wait_for()
    print("  PASS: hears single note")

The readout is a single reactive element: it shows the note name in heard, or an em dash while the app has heard nothing.

function Readout(){
    return div({ class: 'readout' }, () => heard.val ?? '—');
}

function App(){
    return [
        header({ class: 'app-bar' }, h1('Piano hero')),
        main({ class: 'screen' }, Readout(), Targets()),
    ];
}

A note name is just a MIDI number dressed up: twelve names repeating every octave, the octave number sliding every twelve semitones, with MIDI 60 landing on middle C (C4). And a frequency maps to the nearest MIDI number through the standard equal-temperament formula, anchored at A4 = 440 Hz.

const NOTE_NAMES = ['C','C#','D','D#','E','F','F#','G','G#','A','A#','B'];
function noteName(midi){
    return NOTE_NAMES[((midi % 12) + 12) % 12] + (Math.floor(midi / 12) - 1);
}
function freqToMidi(freq){
    return Math.round(69 + 12 * Math.log2(freq / 440));
}
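
A few worked values make the mapping easier to trust. The block below restates the two functions so it runs on its own and adds an inverse, midiToFreq, which is a name chosen here for illustration.

```javascript
// Restated from the text so the example is self-contained.
const NOTE_NAMES = ['C','C#','D','D#','E','F','F#','G','G#','A','A#','B'];
function noteName(midi){
    return NOTE_NAMES[((midi % 12) + 12) % 12] + (Math.floor(midi / 12) - 1);
}
function freqToMidi(freq){
    return Math.round(69 + 12 * Math.log2(freq / 440));
}
// Illustrative inverse of freqToMidi: the equal-temperament frequency of a MIDI note.
function midiToFreq(midi){
    return 440 * Math.pow(2, (midi - 69) / 12);
}

console.log(noteName(60));               // C4, middle C
console.log(noteName(69));               // A4
console.log(midiToFreq(60).toFixed(2));  // 261.63
console.log(noteName(freqToMidi(445)));  // A4: a slightly sharp A still rounds to 69
console.log(noteName(freqToMidi(455)));  // A#4: past the halfway point to the next semitone
```

The rounding in freqToMidi is what gives each note a catchment area of half a semitone on either side.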

The sound reaches the analyser through whatever source the boot picked. The context is created lazily and resumed in case the browser handed it back suspended.

let audioCtx = null;
function ensureAudio(){
    if(!audioCtx){
        const Ctx = window.AudioContext || window.webkitAudioContext;
        audioCtx = new Ctx();
    }
    if(audioCtx.state === 'suspended') audioCtx.resume();
    return audioCtx;
}

The synthesised source is the test’s instrument: for each MIDI note in the ?synth list it starts an oscillator at that note’s frequency and mixes them into one output. A single note is the degenerate case of the list, so the same builder will serve chords unchanged in a later chapter.

function synthSource(ctx, spec){
    const midis = spec.split(',').filter(s => s !== '').map(Number);
    const mix = ctx.createGain();
    for(const m of midis){
        const osc = ctx.createOscillator();
        osc.frequency.value = 440 * Math.pow(2, (m - 69) / 12);
        osc.connect(mix);
        osc.start();
    }
    return mix;
}

Listening wires that source into an AnalyserNode and starts the read loop. The analyser also drives a muted gain into the speakers. The Web Audio specification lets an analyser’s output stay unconnected, so this is a belt-and-braces measure: routing it through a zero gain guarantees the graph is pulled toward a destination without making a sound. The window size is 8192 samples: at a 48 kHz sample rate each bin spans about 5.9 Hz, narrower than the gap between adjacent semitones from about G#2 (104 Hz) upward, which covers most of the range a piano is usually played in.

const FFT_SIZE = 8192;
function startListening(synthSpec){
    const ctx = ensureAudio();
    const analyser = ctx.createAnalyser();
    analyser.fftSize = FFT_SIZE;
    synthSource(ctx, synthSpec).connect(analyser);
    const mute = ctx.createGain();
    mute.gain.value = 0;
    analyser.connect(mute);
    mute.connect(ctx.destination);
    listen(analyser);
}
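
How far down the keyboard an 8192-sample window can tell neighbouring semitones apart depends on the sample rate. The sketch below works it out with a rough rule of thumb that I am assuming here: two semitones are separable once the gap between them exceeds one bin width. That ignores the spreading caused by the analyser’s window function, so the real limit is somewhat higher. All function names are hypothetical.

```javascript
// Width of one FFT bin in Hz for a given sample rate and window size.
function binWidthHz(sampleRate, fftSize) {
    return sampleRate / fftSize;
}

// Gap in Hz between a MIDI note and the semitone directly above it.
function semitoneGapHz(midi) {
    const f = 440 * Math.pow(2, (midi - 69) / 12);
    return f * (Math.pow(2, 1 / 12) - 1);
}

// Lowest note on an 88-key piano (MIDI 21..108) whose gap to the next
// semitone exceeds one bin width; null if none does.
function lowestSeparableMidi(sampleRate, fftSize) {
    const width = binWidthHz(sampleRate, fftSize);
    for (let m = 21; m <= 108; m++) {
        if (semitoneGapHz(m) > width) return m;
    }
    return null;
}

console.log(binWidthHz(48000, 8192).toFixed(2)); // 5.86
console.log(lowestSeparableMidi(48000, 8192));   // 44, i.e. G#2
console.log(lowestSeparableMidi(44100, 8192));   // 42, i.e. F#2
```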

Each frame, the loop reads the spectrum and names the loudest pitch in it. A single clean tone is one tall peak, so for now “the note” is simply the highest-magnitude bin, converted from its bin frequency to a name. A floor in decibels keeps silence from being read as a phantom note. Polyphony and harmonic disambiguation arrive with the game; the tuner only owes a single clear pitch.

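const FLOOR_DB = -80;  // anything quieter than this counts as silence
function detect(analyser){
    // One magnitude in decibels per frequency bin.
    const bins = new Float32Array(analyser.frequencyBinCount);
    analyser.getFloatFrequencyData(bins);
    const hzPerBin = audioCtx.sampleRate / analyser.fftSize;
    // Find the loudest bin, skipping bin 0 (the DC component).
    let peak = -Infinity, peakBin = -1;
    for(let i = 1; i < bins.length; i++){
        if(bins[i] > peak){ peak = bins[i]; peakBin = i; }
    }
    if(peak < FLOOR_DB) return null;
    // The centre frequency of the loudest bin names the note.
    return noteName(freqToMidi(peakBin * hzPerBin));
}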

function listen(analyser){
    (function tick(){
        heard.val = detect(analyser);
        requestAnimationFrame(tick);
    })();
}
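
Taking the centre of the loudest bin only locates a pitch to within one bin, and for low notes that can be enough to land on the wrong semitone. One common refinement, which is not what the code above does, is to fit a parabola through the peak bin and its two neighbours and use the vertex as a fractional bin. The sketch below shows the idea on made-up data shaped as an exact parabola; real spectra are only approximately parabolic near a peak, so treat it as an illustration. The name refinedPeakBin is hypothetical.

```javascript
// Hypothetical refinement: parabolic interpolation around the loudest bin.
// binsDb: magnitudes in dB; returns a fractional bin index, or null for silence.
function refinedPeakBin(binsDb, floorDb = -80) {
    let peak = -Infinity, k = -1;
    for (let i = 1; i < binsDb.length - 1; i++) {
        if (binsDb[i] > peak) { peak = binsDb[i]; k = i; }
    }
    if (k < 0 || peak < floorDb) return null;             // nothing above the floor
    const a = binsDb[k - 1], b = binsDb[k], c = binsDb[k + 1];
    const denom = a - 2 * b + c;
    if (!Number.isFinite(denom) || denom === 0) return k; // flat or silent neighbours
    return k + 0.5 * (a - c) / denom;                     // vertex of the fitted parabola
}

function freqToMidi(freq) {
    return Math.round(69 + 12 * Math.log2(freq / 440));
}

// Made-up spectrum: a parabola in dB peaking at bin 9.39, where A1 (55 Hz)
// would fall with 48 kHz sampling and an 8192-sample window.
const hzPerBin = 48000 / 8192;
const bins = Array.from({ length: 64 }, (_, i) => -20 - 3 * (i - 9.39) ** 2);

console.log(freqToMidi(9 * hzPerBin));                    // 32: the raw bin says G#1
console.log(freqToMidi(refinedPeakBin(bins) * hzPerBin)); // 33: the refined bin says A1
```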

The converse guards the readout against an empty room: with no oscillator on the seam — an empty ?synth list — nothing rises above the floor, so detect returns nothing and the readout holds its em dash rather than chase noise into a phantom pitch.

@testcase
def test_silence_reads_as_nothing(page):
    """An empty ?synth list feeds no tone, so the readout stays an em dash."""
    page.goto(BASE_URL + "?synth=")
    page.wait_for_selector("[data-app-ready]")
    page.wait_for_timeout(300)
    assert page.get_by_text("—", exact=True).is_visible(), \
        "a silent room registered a phantom note"
    print("  PASS: silence reads as nothing")

PWA shell

I want to add the app to my phone’s home screen and open it straight at the piano, no browser chrome. Three small things turn the web app into a PWA, the same three the organiser used.

The first is a manifest: the metadata the OS reads when I add the app to my home screen — name, theme colour, and an icon drawn inline as an SVG data-URL so there is no separate asset to ship.

{
    "name": "Piano hero",
    "short_name": "Piano",
    "description": "Listen to an acoustic piano and score the notes you play.",
    "start_url": ".",
    "display": "standalone",
    "background_color": "#1b1d2e",
    "theme_color": "#1b1d2e",
    "icons": [
        {
            "src": "data:image/svg+xml,<svg xmlns='http://www.w3.org/2000/svg' viewBox='0 0 512 512'><rect width='512' height='512' rx='96' fill='%231b1d2e'/><text x='256' y='350' font-size='280' text-anchor='middle' fill='%23f9a826'>P</text></svg>",
            "sizes": "512x512",
            "type": "image/svg+xml",
            "purpose": "any maskable"
        }
    ]
}

The second is a service worker. The cache-first-with-write-through logic is the same as in every other PWA here, so it lives in a shared block; what is app-specific is which files to cache and the cache name, keyed on the build hash so a deploy drops the old cache on the next activation.

const CACHES = [
    { name: 'piano-nil' },
];
const ASSETS = ['./', './index.html', './app.js', './manifest.json'];
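
The shared block itself is not shown in this chapter. As a reading aid, here is a minimal sketch of one plausible meaning of cache-first with write-through: answer from the cache when it has the request, otherwise go to the network and store the answer on the way back. The cache and the network are passed in so the sketch runs outside a service worker; cacheFirst, fakeCache and fakeFetch are hypothetical names, and in a real worker the cache would come from caches.open and the stored value would be a clone of the response.

```javascript
// Hypothetical sketch of cache-first with write-through, dependencies injected.
async function cacheFirst(request, cache, fetchFn) {
    const hit = await cache.match(request);
    if (hit !== undefined) return hit;        // serve from cache when possible
    const response = await fetchFn(request);  // otherwise ask the network
    await cache.put(request, response);       // and write the answer through
    return response;
}

// A Map-backed stand-in offering only the match and put methods of a cache.
function fakeCache() {
    const store = new Map();
    return {
        match: async key => store.get(key),
        put: async (key, value) => { store.set(key, value); },
    };
}

let networkCalls = 0;
const fakeFetch = async url => { networkCalls++; return 'body of ' + url; };
const cache = fakeCache();

const demo = (async () => {
    const first = await cacheFirst('./app.js', cache, fakeFetch);
    const second = await cacheFirst('./app.js', cache, fakeFetch);
    console.log(first === second, networkCalls); // true 1: the second call never hit the network
    return { first, second, networkCalls };
})();
```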

The third is a loading ring — a spinner covering the page until the boot raises data-app-ready, so the moment between tapping the icon and the app appearing doesn’t read as broken.

<div id="loading"></div>

#loading{position:fixed;inset:0;background:var(--bg);z-index:9999}
body[data-app-ready] #loading{display:none}

And a tiny build tag in the corner, to confirm at a glance which build a device actually picked up after a deploy.

<span class="build-tag" title="Build">nil</span>

.build-tag{position:fixed;top:8px;left:8px;font-size:.65rem;color:var(--muted);
           font-family:monospace;pointer-events:none;z-index:50}

Playwright tests

The per-feature tests need a runner, shaped by the same two rules the organiser and the trainer settled on: assert on the user-facing surface (visible text, roles, labels) over DOM internals, and let each test register itself with a decorator so there is no central list to forget to update.

TESTS = []

def testcase(fn):
    """Registration decorator: each @testcase appends to TESTS in source order."""
    TESTS.append(fn)
    return fn

The imports block does a little setup beyond importing. The Nix shell doesn’t put the Playwright browsers on a path the loader knows, so it points PLAYWRIGHT_BROWSERS_PATH at the playwright-browsers entry in buildInputs unless already set. It pins a portrait phone viewport, since the app is phone-first, and names the dev slot’s served URL as the default target.

import os, sys, time
if "PLAYWRIGHT_BROWSERS_PATH" not in os.environ:
    for p in os.environ.get("buildInputs", "").split():
        if "playwright-browsers" in p:
            os.environ["PLAYWRIGHT_BROWSERS_PATH"] = p
            break
from playwright.sync_api import sync_playwright
BASE_URL = os.environ.get("PIANO_URL", "http://localhost:9682/debug/piano/")
PHONE_VIEWPORT = {"width": 400, "height": 800}

The runner mirrors the trainer’s — whole suite by default, name-substring filter from positional args, -x to stop on first failure, --headed to watch — with one addition. Web Audio won’t run a context until the user has interacted with the page, which would freeze the synthesised source before it ever sounded; launching Chromium with --autoplay-policy=no-user-gesture-required lifts that gate so the ?synth seam can start on load.

def main():
    argv = sys.argv[1:]
    headed = "--headed" in argv
    stop_on_fail = "-x" in argv or "--exitfirst" in argv
    patterns = [a for a in argv if a not in ("--headed", "-x", "--exitfirst")]
    selected = [t for t in TESTS
                if not patterns or any(p in t.__name__ for p in patterns)]
    if not selected:
        print(f"no test matches {patterns}")
        sys.exit(2)
    with sync_playwright() as pw:
        browser = pw.chromium.launch(
            headless=not headed,
            args=["--autoplay-policy=no-user-gesture-required"],
        )
        ctx = browser.new_context(viewport=PHONE_VIEWPORT)
        page = ctx.new_page()
        page.set_default_timeout(5000)
        passed = failed = 0
        for t in selected:
            try:
                t(page); passed += 1
            except Exception as e:
                print(f"  FAIL: {t.__name__}: {e}"); failed += 1
                if stop_on_fail: break
        browser.close()
        print(f"\n{passed} passed, {failed} failed out of {len(selected)}")
        sys.exit(1 if failed else 0)

if __name__ == "__main__":
    main()