Konubinix' opinionated web of thoughts

A Duplicate-Photo Batch Resolver With Hyperapp

Fleeting

Why this note

Doublons clears the backlog of duplicate photos in the archive. The same perceptualhash shows up across many imports — a WhatsApp re-share, a re-download, the same shot pulled twice — and the archive has a standing pile of them (≈135 hash clusters / 271 docs at the time of writing). An earlier one-at-a-time helper paged through the clusters and let you click the keeper, hard-deleting the rest — nice for the odd cluster, hopeless for a backlog (clicking 135 times is the slow thing). This note supersedes it and owns the view it relied on.

This tool batches it. It loads the whole pile at once, applies a keep-best heuristic to pre-pick the keeper in every cluster (prefer a non-WhatsApp camera_type, then the doc with the most metadata, then the more precise time, then the bigger size), shows the result as a scannable wall, and asks for one confirmation: good to go. The common case is zero clicks before that button; the rare misfire is one tap to flip the keeper in that cluster. Resolving soft-deletes the losers (state='delete'), which is reversible and makes a resolved cluster drop out of the view (it already excludes state='delete'), unlike the old helper's hard DELETE, which is unforgiving over a 271-doc batch.

It is the third sibling of the frise and memories, on the same shared data layer — PostGraphile (/graphql) over the docs Postgres, the same LAN gate, the same /ipfs/ gateway — and, on purpose, a third render technology. The frise is Preact+HTM+urql, memories is Solid; this one is Hyperapp — the Elm architecture in ~1 KB: one immutable state, a pure view(state), actions that return new state, and side-effects (the GraphQL read, the soft-delete) declared as data. No build, no bundler: an import map pulls Hyperapp + hyperlit (an htm-style tagged template, so the view reads like HTML) from esm.sh.

Same discipline as the siblings: each feature is a chapter — prose, a Playwright test written first, the code, the CSS — blocks short, rewrite over patch.

Choice of technology

The photos live in Postgres; the database is the source of truth and this tool only reads it, then issues one batch of soft-deletes on confirm. There is no live concurrent editing to reconcile — data comes down, one decision goes up — so the stack can be tiny.

  • PostGraphile exposes the duplicates_photovideo view as GraphQL, same /graphql the siblings use. No new backend.
  • Hyperapp (~1 KB) is the render layer, picked deliberately as a third paradigm next to the frise’s Preact and memories’ Solid. It is the Elm architecture: a single immutable state, a pure view(state), actions that fold an event into a new state, and effects (the GraphQL read, the resolve mutation) expressed as data rather than called imperatively. For a bounded tool with a clear state — the clusters, the per-cluster keeper, the resolve status — that shape is a near-exact fit, and the whole behaviour stays in one traceable module.
  • hyperlit gives Hyperapp an htm-style tagged template, so the view reads as HTML (html`<main>…</main>`) instead of nested h(...) calls — no JSX, no build. It is pulled with ?external=hyperapp so it shares the one Hyperapp instance.

No timeline, no normalised cache, no router: the page is one wall of clusters and one button. Everything loads from esm.sh through an import map.
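
As a rough illustration of the shape described above, and not the tool's actual code, the sketch below folds one event into a plain state object and renders it with a pure function. The string returned by `view` is a stand-in chosen for the example; the real view returns hyperlit templates.

```javascript
// Minimal sketch of the Elm-architecture shape, without Hyperapp itself.
// An action is a pure function (state, payload) -> new state.
const SetClusters = (state, clusters) => ({ ...state, clusters, loading: false });

// The view is a pure function of state; a string stands in for the DOM here.
const view = (state) =>
  state.loading ? "Loading" : `${state.clusters.length} clusters`;

// Frozen to show that the action never mutates its input.
const initial = Object.freeze({ clusters: [], loading: true });
const loaded = SetClusters(initial, [{ hash: "h1" }, { hash: "h2" }]);

console.log(view(initial)); // Loading
console.log(view(loaded));  // 2 clusters
```

Because both functions are pure, the same state always renders the same wall, which is what makes the later chapters testable one rule at a time.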

It boots

Before a single cluster has loaded, prove the no-build Hyperapp stack mounts: the import map resolves Hyperapp + hyperlit from esm.sh, the html template renders, and a data-app-ready flag is raised for the tests to wait on. The first cycle pins only that the shell mounts and names itself, by its heading role — a stable handle for every later state.

@testcase
def test_boots_into_titled_shell(page):
    """The Hyperapp app loads from the import map and renders its title."""
    open_app(page)
    assert page.get_by_role("heading", name="Doublons").is_visible(), \
        "the app title isn't visible after load"
    print("  PASS: boots into titled shell")

import { app } from "hyperapp";
import html from "hyperlit";
import { client, subscribeAuth } from '../shared/gql.js';
import { UPDATE_PHOTO } from '../shared/data.js';

The data layer — the shared urql client the siblings use — comes with the first data chapter; Hyperapp runs it as an effect, and a 401/403 from the gate flips the client’s shared auth store, surfaced as a state flag later. For now the boot just mounts the titled shell.

app({
  init: [{ clusters: [], loading: true, decided: {}, skipped: {} }, [loadClusters], [authSub]],
  view: (state) => html`
    <main>
      <h1>Doublons</h1>
      ${state.authNeeded
        ? html`<div class="authwall" role="alert">
            <strong>Authorization required.</strong>
            This device can't read your photos yet — open a fresh access link on it.
          </div>`
        : state.loading
        ? html`<p>Loading…</p>`
        : state.clusters.length === 0
        ? html`<p>No duplicates 🎉</p>`
        : html`<div>
            <button class="resolve" onclick=${Resolve}>
              Good to go — resolve ${state.clusters.length} clusters
            </button>
            <ul class="clusters" aria-label="duplicate clusters">
              ${state.clusters.map((c) => {
                const ranked = rankDuplicates(c.duplicates);
                const defaultKid = ranked[0].cid;
                const skipped = !!state.skipped[c.hash];
                return html`<li class="cluster ${skipped ? "skipped" : ""}">
                  <button class="skip" onclick=${[ToggleSkip, { hash: c.hash }]}>
                    ${skipped ? "skipped — include" : "skip"}
                  </button>
                  ${skipped ? "" : html`<button class="resolve-through" onclick=${[ResolveThrough, { hash: c.hash }]}>
                    resolve up to here
                  </button>`}
                  ${ranked.map((d) => {
                    const decision = state.decided[d.cid];
                    const drop = !skipped && (decision ? decision === "drop" : d.cid !== defaultKid);
                    const keep = !skipped && !drop;
                    return html`<figure class="copy">
                      <img
                        class="thumb ${keep ? "keep" : drop ? "drop" : ""}"
                        src="${IPFS}${d.thumbnail_cid}"
                        alt="${d.cid}"
                        aria-label="${keep ? "keep" : drop ? "will be deleted" : "kept"}"
                        onclick=${[SetDrop, { cid: d.cid, drop: !drop }]} />
                      <figcaption class="meta">
                        <div>${d.owner} · ${fmtSize(d.size)}</div>
                        <div>${fmtDate(d.date)}</div>
                        <div>${d.camera_type || "—"}</div>
                        ${d.labels
                          ? html`<div class="tags">${d.labels}</div>`
                          : ""}
                      </figcaption>
                    </figure>`;
                  })}
                </li>`;
              })}
            </ul>
          </div>`}
    </main>`,
  node: document.getElementById("app"),
});
document.body.setAttribute("data-app-ready", "1");
if ("serviceWorker" in navigator) navigator.serviceWorker.register("sw.js").catch(() => {});

The init value and the body fragment grow per chapter; the boot cycle’s trivial pair (init: {}, an empty body) is rewritten by Load the duplicate clusters into the loading/clusters/empty view below.

:root{ --bg:#1b1d2e; --fg:#e8e8f0; }
body{ background:var(--bg); color:var(--fg); font-family:system-ui,sans-serif; margin:0; padding:12px; }
h1{ font-size:18px; margin:0 0 12px; }

Load the duplicate clusters

With the shell up, the first real job is to pull the backlog. PostGraphile exposes the duplicates_photovideo view, one row per perceptualhash cluster (count > 1, state != 'delete'), as the connection duplicatesPhotovideos; each node carries an opaque result jsonb, {count, hash, duplicates:[…]}. Hyperapp loads it the idiomatic way: an effect on init fires the fetch and dispatches the rows into state, so the view stays a pure function of “loading / clusters / empty”.
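
To make the payload concrete, here is a hypothetical response shaped after the description above, with the unwrapping step applied to it. The field names inside each duplicate are the ones the view reads elsewhere in this note; the values are invented for illustration.

```javascript
// Hypothetical GraphQL response; values are made up, the shape follows the note.
const response = {
  data: {
    duplicatesPhotovideos: {
      nodes: [
        { result: { count: 2, hash: "h1", duplicates: [
          { cid: "wa", camera_type: "whatsapp", size: 90000 },
          { cid: "cam", camera_type: "iPhone 12", size: 2500000 },
        ] } },
      ],
    },
  },
};

// Unwrap each node to its result jsonb, which is what the wall iterates over.
const unwrapClusters = (data) =>
  data.duplicatesPhotovideos.nodes.map((n) => n.result);

const clusters = unwrapClusters(response.data);
console.log(clusters.map((c) => `${c.hash}: ${c.duplicates.length} copies`).join("; "));
// h1: 2 copies
```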

The test stubs /graphql with a crafted payload (a page route wins over the fixture forward, so the clusters are deterministic and no perceptualhash rows are needed) and asserts the wall lists one item per cluster.

@testcase
def test_lists_each_duplicate_cluster(page):
    """On boot the app fetches duplicatesPhotovideos and lists one item per cluster."""
    stub_clusters(page, [cluster(["a", "b"]), cluster(["c", "d", "e"])])
    open_app(page)
    items = page.get_by_role("list", name="duplicate clusters").get_by_role("listitem")
    expect(items).to_have_count(2)
    print("  PASS: lists each duplicate cluster")

The data layer is the shared urql client — the one memories and the frise reach — behind a small gql helper: a mutation goes one way, a read the other (always fresh), and a GraphQL error throws. The request is same-origin /graphql, so the browser attaches the gate cookie.

const IPFS = "";   // same origin; thumbnail_cid already starts with /ipfs/
async function gql(query, variables) {
  const r = await (/^\s*mutation\b/.test(query)
    ? client.mutation(query, variables)
    : client.query(query, variables, { requestPolicy: 'network-only' })).toPromise();
  if (r.error) throw new Error(r.error.message);
  return r.data;
}
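
The routing test inside the helper can be tried on its own. This sketch only exercises the regular expression; the sample documents are invented, and `isMutation` is a name introduced for the example.

```javascript
// Same check the helper applies: does the document start with the keyword "mutation"?
const isMutation = (query) => /^\s*mutation\b/.test(query);

console.log(isMutation("mutation ($cid: String!) { noop }"));              // true
console.log(isMutation("{ duplicatesPhotovideos { nodes { result } } }")); // false
console.log(isMutation("query { mutationLog }"));                          // false
console.log(isMutation("# a comment\nmutation { noop }"));                 // false
```

The last case is worth keeping in mind: the pattern only skips leading whitespace, so a document that opens with a comment line would be sent as a read.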

The query asks the connection for every cluster’s result jsonb. SetClusters folds the unwrapped rows into state; loadClusters is the effect that runs the query and dispatches. Hyperapp effects are [fn, payload] tuples and the effecter gets (dispatch, payload) — side-effects stay declared as data, called by the runtime, never inline in the view. (The SetAuthNeeded action and the authSub effect beside them belong to the authorization chapter.)
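
A toy runtime shows how such tuples are consumed. It is a sketch under simplifying assumptions (no rendering, no subscriptions, a synchronous stand-in for the loader) and says nothing about how Hyperapp is implemented internally; `createRuntime`, `fakeLoad` and `Boot` are names invented here.

```javascript
// A toy runtime, only to show how [fn, payload] effect tuples are consumed.
const createRuntime = (initial) => {
  let state = initial;
  const dispatch = (action, payload) => {
    const result = action(state, payload);
    // An action returns either a state object, or [state, ...effects].
    const [next, ...effects] = Array.isArray(result) ? result : [result];
    state = next;
    // The runtime, not the action, calls each effecter with (dispatch, payload).
    effects.forEach(([fn, data]) => fn(dispatch, data));
  };
  return { dispatch, getState: () => state };
};

const SetClusters = (state, clusters) => ({ ...state, clusters, loading: false });
// Synchronous stand-in for the real loader, which waits on a GraphQL read.
const fakeLoad = (dispatch, { rows }) => dispatch(SetClusters, rows);
const Boot = (state) => [{ ...state, loading: true }, [fakeLoad, { rows: ["h1"] }]];

const rt = createRuntime({ clusters: [], loading: false });
rt.dispatch(Boot);
console.log(JSON.stringify(rt.getState())); // {"clusters":["h1"],"loading":false}
```

The action only describes the load; the fetch itself happens when the runtime decides to run the tuple, which keeps actions testable without a network.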

const CLUSTERS_QUERY = `{ duplicatesPhotovideos(first: 500) { nodes { result } } }`;

const SetClusters = (state, clusters) => ({ ...state, clusters, loading: false });
const SetAuthNeeded = (state, authNeeded) => ({ ...state, authNeeded });
const authSub = (dispatch) => subscribeAuth((v) => dispatch(SetAuthNeeded, v));

const loadClusters = (dispatch) => {
  gql(CLUSTERS_QUERY)
    .then((data) => dispatch(SetClusters, data.duplicatesPhotovideos.nodes.map((n) => n.result)))
    .catch(() => dispatch(SetClusters, []));
};

init seeds the loading state and fires its effects — the cluster load and the auth bridge; the body renders the three states. Each cluster is a listitem in a list named for assistive tech (aria-label), so the test reaches it by role, not a CSS hook — the wall’s real layout comes next chapter.

[{ clusters: [], loading: true, decided: {}, skipped: {} }, [loadClusters], [authSub]]

${state.authNeeded
  ? html`<div class="authwall" role="alert">
      <strong>Authorization required.</strong>
      This device can't read your photos yet — open a fresh access link on it.
    </div>`
  : state.loading
  ? html`<p>Loading…</p>`
  : state.clusters.length === 0
  ? html`<p>No duplicates 🎉</p>`
  : html`<div>
      <button class="resolve" onclick=${Resolve}>
        Good to go — resolve ${state.clusters.length} clusters
      </button>
      <ul class="clusters" aria-label="duplicate clusters">
        ${state.clusters.map((c) => {
          const ranked = rankDuplicates(c.duplicates);
          const defaultKid = ranked[0].cid;
          const skipped = !!state.skipped[c.hash];
          return html`<li class="cluster ${skipped ? "skipped" : ""}">
            <button class="skip" onclick=${[ToggleSkip, { hash: c.hash }]}>
              ${skipped ? "skipped — include" : "skip"}
            </button>
            ${skipped ? "" : html`<button class="resolve-through" onclick=${[ResolveThrough, { hash: c.hash }]}>
              resolve up to here
            </button>`}
            ${ranked.map((d) => {
              const decision = state.decided[d.cid];
              const drop = !skipped && (decision ? decision === "drop" : d.cid !== defaultKid);
              const keep = !skipped && !drop;
              return html`<figure class="copy">
                <img
                  class="thumb ${keep ? "keep" : drop ? "drop" : ""}"
                  src="${IPFS}${d.thumbnail_cid}"
                  alt="${d.cid}"
                  aria-label="${keep ? "keep" : drop ? "will be deleted" : "kept"}"
                  onclick=${[SetDrop, { cid: d.cid, drop: !drop }]} />
                <figcaption class="meta">
                  <div>${d.owner} · ${fmtSize(d.size)}</div>
                  <div>${fmtDate(d.date)}</div>
                  <div>${d.camera_type || "—"}</div>
                  ${d.labels
                    ? html`<div class="tags">${d.labels}</div>`
                    : ""}
                </figcaption>
              </figure>`;
            })}
          </li>`;
        })}
      </ul>
    </div>`}

The keep-best heuristic

The point of batching is that the machine pre-picks the keeper in every cluster, so the common case is zero clicks. The heuristic is the user’s stated order of preference, applied as a chain of comparators — the first that discriminates wins:

  1. non-WhatsApp camera: a real camera_type (not null, not whatsapp) beats a WhatsApp re-share, which is recompressed and stripped.
  2. most labels: more labels (the free-text photovideo.labels, counted by ;-separated tokens) means a better-described copy. (Labels superseded the old tag/tagmap metadata.)
  3. more precise time: a real capture time beats a date-only import (which lands at midnight).
  4. bigger size: more bytes is the resolution/quality proxy (there is no width/height column).

rankDuplicates sorts a cluster’s duplicates best-first; [0] is the keeper. It is a pure function, so each rule is pinned by a cluster crafted to isolate it. First rule first: a WhatsApp copy and a real-camera copy, the WhatsApp one listed first so a missing heuristic would keep the wrong one.

@testcase
def test_keeper_prefers_non_whatsapp(page):
    """The heuristic keeps a real-camera copy over a WhatsApp re-share."""
    stub_clusters(page, [cluster([
        dup("wa", camera_type="whatsapp"),
        dup("cam", camera_type="iPhone 12"),
    ])])
    open_app(page)
    assert keeper_cid(page) == "cam", keeper_cid(page)
    print("  PASS: keeper prefers non-whatsapp")

cameraRank scores a duplicate: 2 a real camera, 1 unknown, 0 WhatsApp. rankDuplicates sorts descending by it; the rest of the chain is added rule by rule below.

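// 2 = real camera, 1 = unknown (no camera_type), 0 = WhatsApp re-share
const cameraRank = (d) =>
  !d.camera_type ? 1 : d.camera_type.toLowerCase() === "whatsapp" ? 0 : 2;
// number of non-empty ;-separated tokens in the free-text labels
const labelCount = (d) =>
  d.labels ? d.labels.split(";").filter((s) => s.trim()).length : 0;
// a date-only import lands at midnight, so T00:00:00 counts as imprecise
const isPreciseTime = (d) => !!d.date && !/T00:00:00/.test(d.date);

// best-first: the first comparator that returns non-zero decides
const rankDuplicates = (dups) =>
  [...dups].sort((a, b) =>
    cameraRank(b) - cameraRank(a) ||
    labelCount(b) - labelCount(a) ||
    isPreciseTime(b) - isPreciseTime(a) ||
    (b.size || 0) - (a.size || 0));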

The body marks each cluster’s keeper ([0] after ranking) so the choice is visible — a keeper-labelled element the test reaches by role, the ringed thumbnail of the next chapter in waiting.
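
A worked example may help show that the chain is lexicographic: an earlier rule beats any number of later ones. The comparators are repeated from this note so the block runs on its own; the three copies are invented.

```javascript
// Comparators repeated from the note so the example is self-contained.
const cameraRank = (d) =>
  !d.camera_type ? 1 : d.camera_type.toLowerCase() === "whatsapp" ? 0 : 2;
const labelCount = (d) =>
  d.labels ? d.labels.split(";").filter((s) => s.trim()).length : 0;
const isPreciseTime = (d) => !!d.date && !/T00:00:00/.test(d.date);
const rankDuplicates = (dups) =>
  [...dups].sort((a, b) =>
    cameraRank(b) - cameraRank(a) ||
    labelCount(b) - labelCount(a) ||
    isPreciseTime(b) - isPreciseTime(a) ||
    (b.size || 0) - (a.size || 0));

// Invented cluster: the WhatsApp copy wins on labels, time and size, yet loses on rule 1.
const cluster = [
  { cid: "wa", camera_type: "whatsapp", labels: "beach; 2016; aurelie",
    date: "2016-08-03T21:49:44+00:00", size: 5000000 },
  { cid: "bare", camera_type: "iPhone 12", labels: "",
    date: "2016-08-03T00:00:00+00:00", size: 900000 },
  { cid: "rich", camera_type: "iPhone 12", labels: "beach; 2016",
    date: "2016-08-03T00:00:00+00:00", size: 120000 },
];

const order = rankDuplicates(cluster).map((d) => d.cid);
console.log(order.join(" > ")); // rich > bare > wa
```

Between the two iPhone copies the label count decides, so the smaller but better-described file is kept.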

Next rule: with cameras equal, the better-described copy wins. The bare copy is listed first, so without the metadata comparator the wrong one would be kept.

@testcase
def test_keeper_prefers_most_metadata(page):
    """With cameras equal, the copy with more labels is kept."""
    stub_clusters(page, [cluster([
        dup("bare", labels=""),
        dup("rich", labels="beach; 2016; aurelie"),
    ])])
    open_app(page)
    assert keeper_cid(page) == "rich", keeper_cid(page)
    print("  PASS: keeper prefers most metadata")

Next: a real capture time beats a date-only import (which the importer stamps at midnight). Equal cameras and metadata, the midnight copy first.

@testcase
def test_keeper_prefers_precise_time(page):
    """Cameras and metadata equal: a real capture time beats a midnight import."""
    stub_clusters(page, [cluster([
        dup("midnight", date="2016-08-03T00:00:00+00:00"),
        dup("captured", date="2016-08-03T21:49:44+00:00"),
    ])])
    open_app(page)
    assert keeper_cid(page) == "captured", keeper_cid(page)
    print("  PASS: keeper prefers precise time")

Last rule, the tie-breaker: with everything else equal, the bigger file wins (the resolution/quality proxy — there is no width/height column). The smaller copy first.

@testcase
def test_keeper_prefers_bigger_size(page):
    """All else equal, the bigger file (resolution proxy) is kept."""
    stub_clusters(page, [cluster([
        dup("small", size=120000),
        dup("big", size=900000),
    ])])
    open_app(page)
    assert keeper_cid(page) == "big", keeper_cid(page)
    print("  PASS: keeper prefers bigger size")

The batch wall

Now make the decision visible and scannable. Each cluster becomes a row of thumbnails, one per copy, loaded from /ipfs/, with the heuristic keeper ringed and the losers dimmed and tagged “will be deleted”, so the whole batch can be eyeballed at a glance and the rare misfire spotted. Each thumbnail carries the copy’s cid in its alt (its stable identity) and a keep / will be deleted accessible name, so a screen reader, and the test, can tell the kept one from the doomed ones.

@testcase
def test_wall_rings_keeper_and_marks_losers(page):
    """A cluster shows a thumbnail per copy; exactly one keeper, the rest marked to delete."""
    stub_clusters(page, [cluster([
        dup("wa", camera_type="whatsapp"),
        dup("cam", camera_type="iPhone 12", size=900000),
        dup("old", camera_type="iPhone 12", size=10),
    ])])
    open_app(page)
    cl = page.get_by_role("list", name="duplicate clusters").get_by_role("listitem").first
    expect(cl.get_by_role("img")).to_have_count(3)
    expect(cl.get_by_role("img", name="keep", exact=True)).to_have_count(1)
    expect(cl.get_by_role("img", name="will be deleted", exact=True)).to_have_count(2)
    print("  PASS: wall rings keeper and marks losers")

Each cluster is a wrapped row; the keeper thumbnail is ringed green and the losers are dimmed, so the whole batch reads at a glance and the ring is what you scan for.

.clusters{ list-style:none; margin:0; padding:0; display:flex; flex-direction:column; gap:14px; }
.cluster{ display:flex; flex-wrap:wrap; gap:8px; align-items:flex-start;
          padding:8px; background:#23263a; border-radius:10px; }
.thumb{ width:120px; height:120px; object-fit:contain; background:#11131f; border-radius:6px; cursor:pointer; }
.thumb.keep{ outline:3px solid #4ade80; outline-offset:2px; }
.thumb.drop{ opacity:.45; filter:grayscale(.4); }

Keep or drop any copy

“Keep exactly one” is the common case, not the only one — sometimes two copies are both worth keeping, sometimes the heuristic’s keeper is itself the one to drop. So the unit of control is the copy, not the cluster: clicking a thumbnail toggles whether it will be deleted, independently of the others. The heuristic still seeds every copy (its keeper kept, the rest marked to delete); a click just overrides that one copy’s fate, recorded in a decided map keyed by cid. Keeping all but the one you don’t want, or dropping the heuristic’s pick, are now both a single click.

@testcase
def test_toggle_a_copy_between_keep_and_drop(page):
    """Clicking a copy toggles whether it will be deleted, independently per copy."""
    stub_clusters(page, [cluster([
        dup("wa", camera_type="whatsapp"),
        dup("cam", camera_type="iPhone 12"),
    ])])
    open_app(page)
    # heuristic seeds: cam kept, wa marked to delete
    expect(page.get_by_role("img", name="keep", exact=True)).to_have_attribute("alt", "cam")
    expect(page.get_by_role("img", name="will be deleted", exact=True)).to_have_attribute("alt", "wa")
    # drop the heuristic's keeper too → both copies now marked to delete
    page.get_by_alt_text("cam").click()
    expect(page.get_by_role("img", name="will be deleted", exact=True)).to_have_count(2)
    # rescue wa → it is kept again, on its own
    page.get_by_alt_text("wa").click()
    expect(page.get_by_role("img", name="keep", exact=True)).to_have_attribute("alt", "wa")
    print("  PASS: toggle a copy between keep and drop")

SetDrop records the per-copy decision; the view reads decided[cid] before falling back to the heuristic default (drop everything but ranked[0]). A thumbnail’s onclick passes the desired new fate (drop: !drop), so a click simply flips it.

const SetDrop = (state, { cid, drop }) => ({
  ...state,
  decided: { ...state.decided, [cid]: drop ? "drop" : "keep" },
});
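
The three steps of the test can be replayed against the action alone. This is a sketch: `isDropped` restates the rule the view applies inline (ignoring skipped clusters), and `fates` is a helper invented for the printout.

```javascript
const SetDrop = (state, { cid, drop }) => ({
  ...state,
  decided: { ...state.decided, [cid]: drop ? "drop" : "keep" },
});

// An explicit decision wins; otherwise everything but the heuristic keeper is dropped.
const isDropped = (state, cid, defaultKid) => {
  const decision = state.decided[cid];
  return decision ? decision === "drop" : cid !== defaultKid;
};

// Hypothetical helper: print the fate of both copies, with "cam" as heuristic keeper.
const fates = (state) =>
  ["wa", "cam"].map((cid) => `${cid}:${isDropped(state, cid, "cam") ? "drop" : "keep"}`).join(" ");

const s0 = { decided: {} };
const s1 = SetDrop(s0, { cid: "cam", drop: true });  // drop the heuristic's keeper too
const s2 = SetDrop(s1, { cid: "wa", drop: false });  // rescue the WhatsApp copy

console.log(fates(s0)); // wa:drop cam:keep
console.log(fates(s1)); // wa:drop cam:drop
console.log(fates(s2)); // wa:keep cam:drop
```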

Good to go

The payoff: one button resolves the whole batch. Good to go soft-deletes every cluster’s losers, meaning every copy currently marked to delete (the heuristic’s picks plus any per-copy overrides), by setting state='delete', which is reversible and drops the cluster from the view (which already excludes deleted rows). The losers are computed from the same rule the wall shows, so what you see is what gets deleted; the kept copies are never touched. Resolving clears the wall to the empty state.

The test serves two clusters and captures the mutations: cluster h1 keeps the real camera (deletes wa), cluster h2 is all-equal so the heuristic keeps the first (deletes y, z). After good to go exactly those three copies are marked delete and nothing else.

@testcase
def test_good_to_go_soft_deletes_losers(page):
    """Good to go marks every cluster's losers state='delete' and spares the keepers."""
    muts = stub_with_capture(page, [
        cluster([dup("wa", camera_type="whatsapp"), dup("cam", camera_type="iPhone 12")], hash="h1"),
        cluster([dup("x"), dup("y"), dup("z")], hash="h2"),
    ])
    open_app(page)
    page.get_by_role("button", name="good to go").click()
    wait_until(page, lambda: len(muts) == 3)
    assert sorted(cid for cid, st in muts if st == "delete") == ["wa", "y", "z"], muts
    expect(page.get_by_role("list", name="duplicate clusters")).to_have_count(0)
    print("  PASS: good to go soft-deletes losers")

droppedIn reads each copy’s fate within one cluster (the per-copy override, else the heuristic default) and returns the cids to drop; droppedOf collects them across every non-skipped cluster. Resolve is an action that returns the next state (the wall cleared) plus one softDelete effect per dropped copy; the runtime runs the effects, each a fire-and-forget shared updatePhotovideo setting state='delete'. Soft, not hard: reversible, and a deleted row falls out of the view, so re-loading shows the cluster gone.

const softDelete = (dispatch, { cid }) => {
  gql(UPDATE_PHOTO, { cid, patch: { state: "delete" } }).catch(() => {});
};

const droppedIn = (state, c) => {
  const defaultKid = rankDuplicates(c.duplicates)[0].cid;
  return c.duplicates
    .filter((d) => {
      const decision = state.decided[d.cid];
      return decision ? decision === "drop" : d.cid !== defaultKid;
    })
    .map((d) => d.cid);
};

const droppedOf = (state) =>
  state.clusters.filter((c) => !state.skipped[c.hash]).flatMap((c) => droppedIn(state, c));

const Resolve = (state) => [
  { ...state, clusters: [] },
  ...droppedOf(state).map((cid) => [softDelete, { cid }]),
];

// commit the reviewed prefix: every cluster from the top through the clicked one,
// leaving the clusters below it on the wall for a later session
const ResolveThrough = (state, { hash }) => {
  const idx = state.clusters.findIndex((x) => x.hash === hash);
  const prefix = state.clusters.slice(0, idx + 1);
  const cids = prefix
    .filter((c) => !state.skipped[c.hash])
    .flatMap((c) => droppedIn(state, c));
  return [
    { ...state, clusters: state.clusters.slice(idx + 1) },
    ...cids.map((cid) => [softDelete, { cid }]),
  ];
};
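
To see what Resolve hands back, the sketch below runs it on the two clusters of the test. Two things are assumed for the example only: duplicates arrive already ranked best-first (so `[0]` stands in for rankDuplicates), and softDelete merely records the cid instead of calling GraphQL.

```javascript
// Stand-in effecter: records what it was asked to delete.
const deleted = [];
const softDelete = (_dispatch, { cid }) => { deleted.push(cid); };

const droppedIn = (state, c) => {
  const defaultKid = c.duplicates[0].cid; // assumption: [0] is the heuristic keeper
  return c.duplicates
    .filter((d) => {
      const decision = state.decided[d.cid];
      return decision ? decision === "drop" : d.cid !== defaultKid;
    })
    .map((d) => d.cid);
};

const Resolve = (state) => [
  { ...state, clusters: [] },
  ...state.clusters
    .filter((c) => !state.skipped[c.hash])
    .flatMap((c) => droppedIn(state, c))
    .map((cid) => [softDelete, { cid }]),
];

const before = {
  decided: {}, skipped: {},
  clusters: [
    { hash: "h1", duplicates: [{ cid: "cam" }, { cid: "wa" }] },
    { hash: "h2", duplicates: [{ cid: "x" }, { cid: "y" }, { cid: "z" }] },
  ],
};

const [after, ...effects] = Resolve(before);
effects.forEach(([fn, payload]) => fn(() => {}, payload)); // what the runtime would do
console.log(after.clusters.length, deleted.join(",")); // 0 wa,y,z
```

Until the runtime runs the tuples nothing is deleted, so the action itself stays a pure description of the batch.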

.resolve{ display:block; margin:0 0 12px; padding:10px 16px;
          font-size:15px; font-weight:600; color:#0b0d16; background:#4ade80;
          border:0; border-radius:8px; cursor:pointer; }

Skipping a cluster

Not every cluster wants resolving: sometimes the copies are genuinely worth keeping, or the call is unclear and best left for later. Each cluster gets a skip toggle: a skipped cluster drops its keeper/loser marks (nothing in it is bound for deletion) and is left out of good to go entirely. It is per-cluster state, a skipped map keyed by hash, parallel to the decided map of per-copy overrides.

The test skips the first cluster, checks its delete-marks vanish, then resolves and confirms only the other cluster’s loser is deleted — the skipped copies are spared.

@testcase
def test_skip_spares_a_cluster(page):
    """Skipping a cluster clears its delete-marks and leaves its copies untouched on resolve."""
    muts = stub_with_capture(page, [
        cluster([dup("wa", camera_type="whatsapp"), dup("cam", camera_type="iPhone 12")], hash="h1"),
        cluster([dup("x"), dup("y")], hash="h2"),
    ])
    open_app(page)
    first = page.get_by_role("list", name="duplicate clusters").get_by_role("listitem").first
    first.get_by_role("button", name="skip", exact=True).click()
    expect(first.get_by_role("img", name="will be deleted", exact=True)).to_have_count(0)
    page.get_by_role("button", name="good to go").click()
    wait_until(page, lambda: ("y", "delete") in muts)
    assert all(cid != "wa" for cid, _ in muts), muts
    print("  PASS: skip spares a cluster")

ToggleSkip flips the cluster’s flag; the view (above) reads it to drop the marks, and droppedOf filters skipped clusters out before computing what to delete.

const ToggleSkip = (state, { hash }) => ({
  ...state,
  skipped: { ...state.skipped, [hash]: !state.skipped[hash] },
});

.skip{ align-self:center; margin-right:4px; padding:6px 10px; font-size:12px;
       color:#e8e8f0; background:#3a3f5c; border:0; border-radius:6px; cursor:pointer; }
.resolve-through{ align-self:center; margin-right:4px; padding:6px 10px; font-size:12px;
                  font-weight:600; color:#0b0d16; background:#4ade80; border:0;
                  border-radius:6px; cursor:pointer; }
.cluster.skipped{ opacity:.5; }

When the device isn’t authorized

The proxy gates /graphql; a device with no grant gets a 401/403. Without handling, the failed fetch would fall through to the empty state — a misleading “no duplicates” on a device that simply can’t read. So a 401/403 raises an authNeeded flag the view turns into a banner, the same shape as memories’. The app can’t mint a grant; it only says a fresh access link is needed here.

@testcase
def test_shows_auth_required_when_unauthorized(page):
    """A 401 from /graphql surfaces a clear 'authorization required' banner."""
    page.route("**/graphql", lambda route: route.fulfill(
        status=401, content_type="text/plain", body="Unauthorized"))
    open_app(page)
    expect(page.get_by_role("alert")).to_contain_text(re.compile("authoriz", re.I))
    print("  PASS: shows auth required when unauthorized")

A 401/403 makes the shared client flip its auth store; a boot effect (authSub) subscribes to that store and dispatches SetAuthNeeded on every change, since an external source can reach the app only through dispatch. The body shows the banner ahead of every other state.
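
The shared gql.js is not shown in this note, so the store below is a guess at its shape, offered only to make the bridge concrete: `createAuthStore`, its `set` method and the recording dispatch are all hypothetical. Only SetAuthNeeded and authSub are taken from this note.

```javascript
// Hypothetical auth store; the real one lives in shared/gql.js and may differ.
const createAuthStore = () => {
  let needed = false;
  const listeners = new Set();
  return {
    // Assumed to be called by the client when the gate answers 401/403.
    set(value) {
      if (value === needed) return;
      needed = value;
      listeners.forEach((fn) => fn(needed));
    },
    // Registers a listener and returns an unsubscribe function.
    subscribe(fn) {
      listeners.add(fn);
      return () => listeners.delete(fn);
    },
  };
};

const store = createAuthStore();
const subscribeAuth = store.subscribe;

const SetAuthNeeded = (state, authNeeded) => ({ ...state, authNeeded });
const authSub = (dispatch) => subscribeAuth((v) => dispatch(SetAuthNeeded, v));

// A recording dispatch stands in for Hyperapp's.
const calls = [];
authSub((action, payload) => calls.push([action.name, payload]));
store.set(true); // e.g. after a 401
console.log(JSON.stringify(calls)); // [["SetAuthNeeded",true]]
```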

.authwall{ background:#f9a826; color:#1b1d2e; padding:10px 14px; border-radius:8px;
           margin:0 0 12px; line-height:1.45; }
.authwall strong{ display:block; }

Installable, fullscreen (PWA)

Doublons is a sit-down cleanup session — nice to launch chrome-free and fullscreen from a home-screen icon. A web-app manifest (display:fullscreen, an SVG icon, the dark theme_color) plus a minimal service worker make it installable. The worker is deliberately trivial — it claims the page and passes fetches straight to the network (no offline caching of a live archive); its only job is installability.

@testcase
def test_pwa_installable(page):
    """The app ships a fullscreen manifest and a registered service worker."""
    open_app(page)
    assert page.locator("link[rel='manifest']").get_attribute("href"), "no manifest linked"
    man = page.evaluate("() => fetch('manifest.json').then(r => r.json())")
    assert man["display"] == "fullscreen", f"display is {man.get('display')!r}"
    assert man["icons"], "manifest has no icons"
    ready = page.evaluate("""() => Promise.race([
        navigator.serviceWorker.ready.then(r => !!r.active),
        new Promise(res => setTimeout(() => res(false), 6000))])""")
    assert ready, "service worker did not become ready"
    print("  PASS: pwa installable")

{
  "name": "Doublons",
  "short_name": "Doublons",
  "start_url": ".",
  "scope": ".",
  "display": "fullscreen",
  "background_color": "#1b1d2e",
  "theme_color": "#1b1d2e",
  "icons": [
    { "src": "icon.svg", "sizes": "any", "type": "image/svg+xml", "purpose": "any maskable" }
  ]
}

<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
  <rect width="512" height="512" fill="#1b1d2e"/>
  <g fill="#4ade80">
    <rect x="120" y="120" width="180" height="180" rx="16"/>
    <rect x="212" y="212" width="180" height="180" rx="16" fill="#6cf"/>
  </g>
</svg>

self.addEventListener('install', () => self.skipWaiting());
self.addEventListener('activate', e => e.waitUntil(self.clients.claim()));
self.addEventListener('fetch', () => {});   // pass-through; installability only

The shell links the manifest and an Apple touch icon; app-boot registers the worker (the registration line added to the boot block above).

A tiny monospaced build-tag in the corner shows the short build SHA, so after a deploy a glance at a device confirms which build it picked up (PWAs cache hard). Its value changes every build, so the test pins the shape — a 7-char hex — not the digits.

@testcase
def test_build_tag_shows_hash(page):
    """A build-hash tag names the loaded build, so a device's version is legible."""
    open_app(page)
    tag = page.locator(".build-tag")
    assert tag.count() == 1, "no build tag"
    h = (tag.text_content() or "").strip()
    assert re.fullmatch(r"[0-9a-f]{7}", h), f"build tag isn't a 7-hex hash: {h!r}"
    print("  PASS: build tag shows hash")

.build-tag{ position:fixed; top:4px; right:6px; z-index:50; pointer-events:none;
            font:10px/1 monospace; color:#5b6080; }

See each copy’s metadata

The heuristic’s keeper is a guess; to overrule it you need to see why. So each thumbnail grows a caption with the facts the choice turns on — owner, capture date, camera, file size, and the metadata tag values the view aggregates. Near-identical photos become distinguishable: a WhatsApp re-share next to a 2 MB original, a dated capture next to a midnight import.

@testcase
def test_each_copy_shows_its_metadata(page):
    """Each copy captions its owner, camera and size so the choice is informed."""
    stub_clusters(page, [cluster([
        dup("a", owner="konubinix", camera_type="iPhone 12", size=2_500_000,
            labels="beach; 2016"),
        dup("b", owner="ayla", camera_type="whatsapp", size=90_000),
    ])])
    open_app(page)
    cl = page.get_by_role("list", name="duplicate clusters").get_by_role("listitem").first
    expect(cl).to_contain_text("iPhone 12")
    expect(cl).to_contain_text("ayla")
    expect(cl).to_contain_text("whatsapp")
    expect(cl).to_contain_text("beach")
    print("  PASS: each copy shows its metadata")

fmtSize renders bytes as KB/MB, fmtDate trims the timestamp to the minute; the caption lists owner · size, the date, the camera, and the tag values when present.

const fmtSize = (n) =>
  !n ? "?" : n >= 1e6 ? (n / 1e6).toFixed(1) + " MB" : Math.round(n / 1e3) + " KB";
const fmtDate = (s) => (s ? s.slice(0, 16).replace("T", " ") : "—");
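
A few sample values show what the two formatters print. The functions are repeated from above (with the dash written as an escape) so the block runs on its own; the inputs are invented.

```javascript
const fmtSize = (n) =>
  !n ? "?" : n >= 1e6 ? (n / 1e6).toFixed(1) + " MB" : Math.round(n / 1e3) + " KB";
const fmtDate = (s) => (s ? s.slice(0, 16).replace("T", " ") : "\u2014");

console.log(fmtSize(2500000)); // 2.5 MB
console.log(fmtSize(90000));   // 90 KB
console.log(fmtSize(400));     // 0 KB
console.log(fmtSize(0));       // ?
console.log(fmtDate("2016-08-03T21:49:44+00:00")); // 2016-08-03 21:49
```

Units are decimal (1 MB = 1e6 bytes), and anything under 500 bytes rounds down to 0 KB, which is fine for photos but worth knowing when reading a caption.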

.copy{ margin:0; display:flex; flex-direction:column; gap:4px; width:120px; }
.meta{ font-size:10px; line-height:1.3; color:#9aa0c0; word-break:break-word; }
.meta .tags{ color:#c9b6f0; }

Resolve from the top down to here

135 clusters is more than one sitting, and you review them top to bottom. So each cluster carries a resolve up to here button: it commits every cluster from the top of the wall down through this one — the reviewed prefix — and leaves everything below for next time. Stop wherever you ran out of time and click that cluster’s button; the rest stays put. The global “good to go” is just this applied to the last cluster.

@testcase
def test_resolve_up_to_here_keeps_the_rest_below(page):
    """'Resolve up to here' commits the whole prefix through this cluster; those below stay."""
    muts = stub_with_capture(page, [
        cluster([dup("a1", camera_type="whatsapp"), dup("a2", camera_type="iPhone")], hash="h1"),
        cluster([dup("b1", camera_type="whatsapp"), dup("b2", camera_type="iPhone")], hash="h2"),
        cluster([dup("c1", camera_type="whatsapp"), dup("c2", camera_type="iPhone")], hash="h3"),
    ])
    open_app(page)
    items = page.get_by_role("list", name="duplicate clusters").get_by_role("listitem")
    items.nth(1).get_by_role("button", name="resolve up to here", exact=True).click()
    wait_until(page, lambda: ("a1", "delete") in muts and ("b1", "delete") in muts)
    expect(items).to_have_count(1)                                  # only the 3rd cluster remains
    assert all(cid not in ("c1", "c2") for cid, _ in muts), muts    # the 3rd is untouched
    print("  PASS: resolve up to here keeps the rest below")

ResolveThrough takes the clicked cluster’s index, soft-deletes the dropped copies of every non-skipped cluster in the prefix (top through here, the same per-copy computation the global button uses), and keeps only the clusters below it — so a partial session commits exactly the run you reviewed.
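
The slicing can be checked in isolation. `splitAt` is a helper invented for this sketch; it performs the same findIndex and slice calls as ResolveThrough, minus the deletions.

```javascript
// Split the wall at a clicked cluster, the way ResolveThrough slices it.
const splitAt = (clusters, hash) => {
  const idx = clusters.findIndex((c) => c.hash === hash);
  return { prefix: clusters.slice(0, idx + 1), rest: clusters.slice(idx + 1) };
};

const wall = [{ hash: "h1" }, { hash: "h2" }, { hash: "h3" }];
const hashes = (cs) => cs.map((c) => c.hash).join(",");

const middle = splitAt(wall, "h2");
const last = splitAt(wall, "h3");     // equivalent to the global button
const unknown = splitAt(wall, "nope"); // findIndex returns -1

console.log(hashes(middle.prefix), "|", hashes(middle.rest));   // h1,h2 | h3
console.log(hashes(last.prefix), "|", hashes(last.rest));       // h1,h2,h3 |
console.log(hashes(unknown.prefix), "|", hashes(unknown.rest)); //  | h1,h2,h3
```

The last line shows a convenient property of the arithmetic: a hash that is no longer on the wall yields an empty prefix, so a stale click commits nothing.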
