A Duplicate-Photo Batch Resolver With Hyperapp
Fleeting
Why this note
Doublons clears the backlog of duplicate photos in the archive. The same
perceptualhash shows up across many imports — a WhatsApp re-share, a re-download,
the same shot pulled twice — and the archive has a standing pile of them (≈135
hash clusters / 271 docs at the time of writing). An earlier one-at-a-time helper paged
through the clusters and let you click the keeper, hard-deleting the rest: fine for the
odd cluster, hopeless for a backlog, since 135 clusters means at least 135 clicks. This note
supersedes it and owns the view it relied on.
This tool batches it. It loads the whole pile at once, applies a keep-best
heuristic to pre-pick the keeper in every cluster — prefer a non-WhatsApp
camera_type, then the doc with the most metadata, then the more precise time,
then the bigger size — shows the result as a scannable wall, and asks for one
confirmation: good to go. The common case is zero clicks before that button; the
rare misfire is one tap to flip the keeper in that cluster. Resolving soft-deletes
the losers (state='delete'), which is reversible and makes a resolved cluster drop out of the view (it already excludes state='delete'), unlike the old helper's hard DELETE, which is unforgiving over a 271-doc batch.
It is the third sibling of the frise and memories, on the same shared data layer —
PostGraphile (/graphql) over the docs Postgres, the same LAN gate, the same
/ipfs/ gateway — and, on purpose, a third render technology. The frise is
Preact+HTM+urql, memories is Solid; this one is Hyperapp — the Elm architecture in
~1 KB: one immutable state, a pure view(state), actions that return new state,
and side-effects (the GraphQL read, the soft-delete) declared as data. No build, no
bundler: an import map pulls Hyperapp + hyperlit (an htm-style tagged template, so
the view reads like HTML) from esm.sh.
Same discipline as the siblings: each feature is a chapter — prose, a Playwright test written first, the code, the CSS — blocks short, rewrite over patch.
Choice of technology
The photos live in Postgres; the database is the source of truth and this tool only reads it, then issues one batch of soft-deletes on confirm. There is no live concurrent editing to reconcile — data comes down, one decision goes up — so the stack can be tiny.
- PostGraphile exposes the duplicates_photovideo view as GraphQL, on the same /graphql the siblings use. No new backend.
- Hyperapp (~1 KB) is the render layer, picked deliberately as a third paradigm next to the frise’s Preact and memories’ Solid. It is the Elm architecture: a single immutable state, a pure view(state), actions that fold an event into a new state, and effects (the GraphQL read, the resolve mutation) expressed as data rather than called imperatively. For a bounded tool with a clear state (the clusters, the per-cluster keeper, the resolve status) that shape is a near-exact fit, and the whole behaviour stays in one traceable module.
- hyperlit gives Hyperapp an htm-style tagged template, so the view reads as HTML (html`<main>…</main>`) instead of nested h(...) calls: no JSX, no build. It is pulled with ?external=hyperapp so it shares the one Hyperapp instance.
No timeline, no normalised cache, no router: the page is one wall of clusters and one button. Everything loads from esm.sh through an import map.
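As a rough illustration of that shape, here is a minimal dispatch loop written without Hyperapp. It is not Hyperapp's implementation: makeStore, Add, AddAndRecord and record are hypothetical names invented for this sketch. It only shows the contract the note relies on, namely that an action returns either a new state or [state, ...effects], and that the runtime, not the action, runs the effects.

```javascript
// Hypothetical mini-runtime, NOT Hyperapp's code: it only mimics the contract
// "action -> next state, or [next state, ...effects]".
// (As a consequence the state itself must be an object, not an array.)
function makeStore(initial) {
  let state = initial;
  const dispatch = (action, payload) => {
    const result = action(state, payload);
    if (Array.isArray(result)) {
      const [next, ...effects] = result;
      state = next;
      // Effects are data: [effecter, payload]. The runtime calls them.
      for (const [effecter, props] of effects) effecter(dispatch, props);
    } else {
      state = result;
    }
  };
  return { dispatch, getState: () => state };
}

// A pure action: returns a new object, never mutates the old one.
const Add = (state, n) => ({ ...state, total: state.total + n });

// An action that also declares a side-effect instead of performing it.
const log = [];
const record = (_dispatch, { message }) => log.push(message);
const AddAndRecord = (state, n) => [
  { ...state, total: state.total + n },
  [record, { message: `added ${n}` }],
];

const store = makeStore({ total: 0 });
const before = store.getState();
store.dispatch(Add, 2);
store.dispatch(AddAndRecord, 3);
console.log(store.getState().total, before.total, log);
```

The earlier state object is left untouched by both dispatches, which is what lets the view be a pure function of whatever state it is handed.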
It boots
Before a single cluster has loaded, prove the no-build Hyperapp stack mounts: the
import map resolves Hyperapp + hyperlit from esm.sh, the html template renders, and
a data-app-ready flag is raised for the tests to wait on. The first cycle pins only
that the shell mounts and names itself, by its heading role — a stable handle for
every later state.
@testcase
def test_boots_into_titled_shell(page):
    """The Hyperapp app loads from the import map and renders its title."""
    open_app(page)
    assert page.get_by_role("heading", name="Doublons").is_visible(), \
        "the app title isn't visible after load"
    print(" PASS: boots into titled shell")
import { app } from "hyperapp";
import html from "hyperlit";
import { client, subscribeAuth } from '../shared/gql.js';
import { UPDATE_PHOTO } from '../shared/data.js';
The data layer — the shared urql client the siblings use — comes with the first data chapter; Hyperapp runs it as an effect, and a 401/403 from the gate flips the client’s shared auth store, surfaced as a state flag later. For now the boot just mounts the titled shell.
app({
init: [{ clusters: [], loading: true, decided: {}, skipped: {} }, [loadClusters], [authSub]],
view: (state) => html`
<main>
<h1>Doublons</h1>
${state.authNeeded
? html`<div class="authwall" role="alert">
<strong>Authorization required.</strong>
This device can't read your photos yet — open a fresh access link on it.
</div>`
: state.loading
? html`<p>Loading…</p>`
: state.clusters.length === 0
? html`<p>No duplicates 🎉</p>`
: html`<div>
<button class="resolve" onclick=${Resolve}>
Good to go — resolve ${state.clusters.length} clusters
</button>
<ul class="clusters" aria-label="duplicate clusters">
${state.clusters.map((c) => {
const ranked = rankDuplicates(c.duplicates);
const defaultKid = ranked[0].cid;
const skipped = !!state.skipped[c.hash];
return html`<li class="cluster ${skipped ? "skipped" : ""}">
<button class="skip" onclick=${[ToggleSkip, { hash: c.hash }]}>
${skipped ? "skipped — include" : "skip"}
</button>
${skipped ? "" : html`<button class="resolve-through" onclick=${[ResolveThrough, { hash: c.hash }]}>
resolve up to here
</button>`}
${ranked.map((d) => {
const decision = state.decided[d.cid];
const drop = !skipped && (decision ? decision === "drop" : d.cid !== defaultKid);
const keep = !skipped && !drop;
return html`<figure class="copy">
<img
class="thumb ${keep ? "keep" : drop ? "drop" : ""}"
src="${IPFS}${d.thumbnail_cid}"
alt="${d.cid}"
aria-label="${keep ? "keep" : drop ? "will be deleted" : "kept"}"
onclick=${[SetDrop, { cid: d.cid, drop: !drop }]} />
<figcaption class="meta">
<div>${d.owner} · ${fmtSize(d.size)}</div>
<div>${fmtDate(d.date)}</div>
<div>${d.camera_type || "—"}</div>
${d.labels
? html`<div class="tags">${d.labels}</div>`
: ""}
</figcaption>
</figure>`;
})}
</li>`;
})}
</ul>
</div>`}
</main>`,
node: document.getElementById("app"),
});
document.body.setAttribute("data-app-ready", "1");
if ("serviceWorker" in navigator) navigator.serviceWorker.register("sw.js").catch(() => {});
The init value and the body fragment grow per chapter; the boot cycle’s trivial pair
(init: {}, an empty body) is rewritten by Load the duplicate clusters into the
loading/clusters/empty view below.
:root{ --bg:#1b1d2e; --fg:#e8e8f0; }
body{ background:var(--bg); color:var(--fg); font-family:system-ui,sans-serif; margin:0; padding:12px; }
h1{ font-size:18px; margin:0 0 12px; }
Load the duplicate clusters
With the shell up, the first real job is to pull the backlog. PostGraphile exposes the
duplicates_photovideo view (one row per perceptualhash cluster with count > 1 and
state != 'delete') as the connection duplicatesPhotovideos; each node carries an
opaque result jsonb, {count, hash, duplicates:[…]}. Hyperapp loads it the idiomatic
way: an effect on init fires the fetch and dispatches the rows into state, so the
view stays a pure function of “loading / clusters / empty”.
The test stubs /graphql with a crafted payload (a page route wins over the fixture
forward, so the clusters are deterministic and no perceptualhash rows are needed) and
asserts the wall lists one item per cluster.
@testcase
def test_lists_each_duplicate_cluster(page):
    """On boot the app fetches duplicatesPhotovideos and lists one item per cluster."""
    stub_clusters(page, [cluster(["a", "b"]), cluster(["c", "d", "e"])])
    open_app(page)
    items = page.get_by_role("list", name="duplicate clusters").get_by_role("listitem")
    expect(items).to_have_count(2)
    print(" PASS: lists each duplicate cluster")
The data layer is the shared urql client — the one memories and the frise reach — behind a
small gql helper: a mutation goes one way, a read the other (always fresh), and a GraphQL
error throws. The request is same-origin /graphql, so the browser attaches the gate cookie.
const IPFS = ""; // same origin; thumbnail_cid already starts with /ipfs/
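// Run a GraphQL document through the shared urql client and return its data.
async function gql(query, variables) {
  // Mutations go through client.mutation; reads always hit the network.
  const r = await (/^\s*mutation\b/.test(query)
    ? client.mutation(query, variables)
    : client.query(query, variables, { requestPolicy: 'network-only' })).toPromise();
  if (r.error) throw new Error(r.error.message);
  return r.data;
}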
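The routing test inside gql is just a regular expression on the document text, so its behaviour can be checked in isolation. The sketch below pulls it out as isMutation (a hypothetical name) and runs it over a few made-up documents; the mutation bodies are placeholders, not the archive's real mutations.

```javascript
// Same regex the gql helper uses to choose client.mutation over client.query.
const isMutation = (query) => /^\s*mutation\b/.test(query);

const examples = {
  read: `{ duplicatesPhotovideos(first: 500) { nodes { result } } }`,
  // Hypothetical mutation text, only here to exercise the regex.
  write: `\n  mutation Example { __typename }`,
  // A read whose first field merely starts with the word "mutation".
  lookalike: `query { mutationLog }`,
  // A mutation preceded by a GraphQL comment line.
  commented: `# soft-delete\nmutation Example { __typename }`,
};

const routed = Object.fromEntries(
  Object.entries(examples).map(([name, q]) => [name, isMutation(q) ? "mutation" : "query"]));
console.log(routed);
```

Leading whitespace is tolerated, but a leading comment is not: the commented document would be sent down the read path. That is harmless as long as the shared documents start with the mutation keyword.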
The query asks the connection for every cluster’s result jsonb. SetClusters folds the
unwrapped rows into state; loadClusters is the effect that runs the query and dispatches.
Hyperapp effects are [fn, payload] tuples and the effecter gets (dispatch, payload) —
side-effects stay declared as data, called by the runtime, never inline in the view. (The
SetAuthNeeded action and the authSub effect beside them belong to the authorization
chapter.)
const CLUSTERS_QUERY = `{ duplicatesPhotovideos(first: 500) { nodes { result } } }`;
const SetClusters = (state, clusters) => ({ ...state, clusters, loading: false });
const SetAuthNeeded = (state, authNeeded) => ({ ...state, authNeeded });
const authSub = (dispatch) => subscribeAuth((v) => dispatch(SetAuthNeeded, v));
const loadClusters = (dispatch) => {
gql(CLUSTERS_QUERY)
.then((data) => dispatch(SetClusters, data.duplicatesPhotovideos.nodes.map((n) => n.result)))
.catch(() => dispatch(SetClusters, []));
};
init seeds the loading state and fires its effects — the cluster load and the auth bridge;
the body renders the three states.
Each cluster is a listitem in a list named for assistive tech (aria-label), so the
test reaches it by role, not a CSS hook — the wall’s real layout comes next chapter.
[{ clusters: [], loading: true, decided: {}, skipped: {} }, [loadClusters], [authSub]]
${state.authNeeded
? html`<div class="authwall" role="alert">
<strong>Authorization required.</strong>
This device can't read your photos yet — open a fresh access link on it.
</div>`
: state.loading
? html`<p>Loading…</p>`
: state.clusters.length === 0
? html`<p>No duplicates 🎉</p>`
: html`<div>
<button class="resolve" onclick=${Resolve}>
Good to go — resolve ${state.clusters.length} clusters
</button>
<ul class="clusters" aria-label="duplicate clusters">
${state.clusters.map((c) => {
const ranked = rankDuplicates(c.duplicates);
const defaultKid = ranked[0].cid;
const skipped = !!state.skipped[c.hash];
return html`<li class="cluster ${skipped ? "skipped" : ""}">
<button class="skip" onclick=${[ToggleSkip, { hash: c.hash }]}>
${skipped ? "skipped — include" : "skip"}
</button>
${skipped ? "" : html`<button class="resolve-through" onclick=${[ResolveThrough, { hash: c.hash }]}>
resolve up to here
</button>`}
${ranked.map((d) => {
const decision = state.decided[d.cid];
const drop = !skipped && (decision ? decision === "drop" : d.cid !== defaultKid);
const keep = !skipped && !drop;
return html`<figure class="copy">
<img
class="thumb ${keep ? "keep" : drop ? "drop" : ""}"
src="${IPFS}${d.thumbnail_cid}"
alt="${d.cid}"
aria-label="${keep ? "keep" : drop ? "will be deleted" : "kept"}"
onclick=${[SetDrop, { cid: d.cid, drop: !drop }]} />
<figcaption class="meta">
<div>${d.owner} · ${fmtSize(d.size)}</div>
<div>${fmtDate(d.date)}</div>
<div>${d.camera_type || "—"}</div>
${d.labels
? html`<div class="tags">${d.labels}</div>`
: ""}
</figcaption>
</figure>`;
})}
</li>`;
})}
</ul>
</div>`}
The keep-best heuristic
The point of batching is that the machine pre-picks the keeper in every cluster, so the common case is zero clicks. The heuristic is the user’s stated order of preference, applied as a chain of comparators — the first that discriminates wins:
- non-WhatsApp camera: a real camera_type (not null, not whatsapp) beats a WhatsApp re-share, which is recompressed and stripped.
- most labels: more labels (the free-text photovideo.labels, counted by ;-separated tokens) means a better-described copy. (Labels superseded the old tag/tagmap metadata.)
- more precise time: a real capture time beats a date-only import (which lands at midnight).
- bigger size: more bytes is the resolution/quality proxy (there is no width/height column).
rankDuplicates sorts a cluster’s duplicates best-first; [0] is the keeper. It is a
pure function, so each rule is pinned by a cluster crafted to isolate it. First rule
first: a WhatsApp copy and a real-camera copy, the WhatsApp one listed first so a missing
heuristic would keep the wrong one.
@testcase
def test_keeper_prefers_non_whatsapp(page):
    """The heuristic keeps a real-camera copy over a WhatsApp re-share."""
    stub_clusters(page, [cluster([
        dup("wa", camera_type="whatsapp"),
        dup("cam", camera_type="iPhone 12"),
    ])])
    open_app(page)
    assert keeper_cid(page) == "cam", keeper_cid(page)
    print(" PASS: keeper prefers non-whatsapp")
cameraRank scores a duplicate: 2 a real camera, 1 unknown, 0 WhatsApp.
rankDuplicates sorts descending by it; the rest of the chain is added rule by rule
below.
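// 2 = real camera, 1 = unknown (no camera_type), 0 = WhatsApp re-share
const cameraRank = (d) =>
  !d.camera_type ? 1 : d.camera_type.toLowerCase() === "whatsapp" ? 0 : 2;
// number of non-empty ;-separated tokens in the free-text labels
const labelCount = (d) =>
  d.labels ? d.labels.split(";").filter((s) => s.trim()).length : 0;
// a date-only import is stamped at midnight, so midnight counts as imprecise
const isPreciseTime = (d) => !!d.date && !/T00:00:00/.test(d.date);
// best-first copy of the cluster; the first comparator that discriminates wins
const rankDuplicates = (dups) =>
  [...dups].sort((a, b) =>
    cameraRank(b) - cameraRank(a) ||
    labelCount(b) - labelCount(a) ||
    isPreciseTime(b) - isPreciseTime(a) ||
    (b.size || 0) - (a.size || 0));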
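To see the chain in action, the sketch below repeats the four definitions so it runs on its own and ranks a made-up cluster. The cids and field values are invented for illustration; the point is that an earlier rule dominates every later one.

```javascript
// Definitions repeated from the chapter so the example is self-contained.
const cameraRank = (d) =>
  !d.camera_type ? 1 : d.camera_type.toLowerCase() === "whatsapp" ? 0 : 2;
const labelCount = (d) =>
  d.labels ? d.labels.split(";").filter((s) => s.trim()).length : 0;
const isPreciseTime = (d) => !!d.date && !/T00:00:00/.test(d.date);
const rankDuplicates = (dups) =>
  [...dups].sort((a, b) =>
    cameraRank(b) - cameraRank(a) ||
    labelCount(b) - labelCount(a) ||
    isPreciseTime(b) - isPreciseTime(a) ||
    (b.size || 0) - (a.size || 0));

// Hypothetical cluster: every value below is made up.
const precise = "2016-08-03T21:49:44+00:00";
const midnight = "2016-08-03T00:00:00+00:00";
const cluster = [
  { cid: "wa",    camera_type: "WhatsApp",  labels: "beach; 2016; family", date: precise,  size: 900000 },
  { cid: "bare",  camera_type: "iPhone 12", labels: "",                    date: precise,  size: 900000 },
  { cid: "rich",  camera_type: "iPhone 12", labels: "beach; 2016",         date: midnight, size: 120000 },
  { cid: "noCam", camera_type: null,        labels: "beach; 2016; family", date: precise,  size: 900000 },
];

const order = rankDuplicates(cluster).map((d) => d.cid);
console.log(order); // ["rich", "bare", "noCam", "wa"]
```

"rich" is kept even though it is the smallest file with a midnight timestamp: among the real-camera copies it has more labels, and the time and size rules are only consulted on a tie. The WhatsApp copy comes last despite having the most labels, because the camera rule is checked first.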
The body marks each cluster’s keeper ([0] after ranking) so the choice is visible — a
keeper-labelled element the test reaches by role, the ringed thumbnail of the next
chapter in waiting.
Next rule: with cameras equal, the better-described copy wins. The bare copy is listed first, so without the metadata comparator the wrong one would be kept.
@testcase
def test_keeper_prefers_most_metadata(page):
    """With cameras equal, the copy with more labels is kept."""
    stub_clusters(page, [cluster([
        dup("bare", labels=""),
        dup("rich", labels="beach; 2016; aurelie"),
    ])])
    open_app(page)
    assert keeper_cid(page) == "rich", keeper_cid(page)
    print(" PASS: keeper prefers most metadata")
Next: a real capture time beats a date-only import (which the importer stamps at midnight). Equal cameras and metadata, the midnight copy first.
@testcase
def test_keeper_prefers_precise_time(page):
    """Cameras and metadata equal: a real capture time beats a midnight import."""
    stub_clusters(page, [cluster([
        dup("midnight", date="2016-08-03T00:00:00+00:00"),
        dup("captured", date="2016-08-03T21:49:44+00:00"),
    ])])
    open_app(page)
    assert keeper_cid(page) == "captured", keeper_cid(page)
    print(" PASS: keeper prefers precise time")
Last rule, the tie-breaker: with everything else equal, the bigger file wins (the resolution/quality proxy — there is no width/height column). The smaller copy first.
@testcase
def test_keeper_prefers_bigger_size(page):
    """All else equal, the bigger file (resolution proxy) is kept."""
    stub_clusters(page, [cluster([
        dup("small", size=120000),
        dup("big", size=900000),
    ])])
    open_app(page)
    assert keeper_cid(page) == "big", keeper_cid(page)
    print(" PASS: keeper prefers bigger size")
The batch wall
Now make the decision visible and scannable. Each cluster becomes a row of thumbnails —
one per copy, loaded from /ipfs/ — with the heuristic keeper ringed and the losers
dimmed and tagged “will be deleted”, so the whole batch can be eyeballed at a glance and
the rare misfire spotted. Each thumbnail carries the copy’s cid in its alt (its stable
identity) and a keep / will be deleted accessible name (its aria-label), so a screen
reader, and the test, can tell the kept one from the doomed ones.
@testcase
def test_wall_rings_keeper_and_marks_losers(page):
    """A cluster shows a thumbnail per copy; exactly one keeper, the rest marked to delete."""
    stub_clusters(page, [cluster([
        dup("wa", camera_type="whatsapp"),
        dup("cam", camera_type="iPhone 12", size=900000),
        dup("old", camera_type="iPhone 12", size=10),
    ])])
    open_app(page)
    cl = page.get_by_role("list", name="duplicate clusters").get_by_role("listitem").first
    expect(cl.get_by_role("img")).to_have_count(3)
    expect(cl.get_by_role("img", name="keep", exact=True)).to_have_count(1)
    expect(cl.get_by_role("img", name="will be deleted", exact=True)).to_have_count(2)
    print(" PASS: wall rings keeper and marks losers")
Each cluster is a wrapped row; the keeper thumbnail is ringed green, the losers dimmed — the whole batch reads at a glance, the ring is what you scan for.
.clusters{ list-style:none; margin:0; padding:0; display:flex; flex-direction:column; gap:14px; }
.cluster{ display:flex; flex-wrap:wrap; gap:8px; align-items:flex-start;
padding:8px; background:#23263a; border-radius:10px; }
.thumb{ width:120px; height:120px; object-fit:contain; background:#11131f; border-radius:6px; cursor:pointer; }
.thumb.keep{ outline:3px solid #4ade80; outline-offset:2px; }
.thumb.drop{ opacity:.45; filter:grayscale(.4); }
Keep or drop any copy
“Keep exactly one” is the common case, not the only one — sometimes two copies are both
worth keeping, sometimes the heuristic’s keeper is itself the one to drop. So the unit of
control is the copy, not the cluster: clicking a thumbnail toggles whether it will be
deleted, independently of the others. The heuristic still seeds every copy (its keeper
kept, the rest marked to delete); a click just overrides that one copy’s fate, recorded in
a decided map keyed by cid. Keeping all but the one you don’t want, or dropping the
heuristic’s pick, are now both a single click.
@testcase
def test_toggle_a_copy_between_keep_and_drop(page):
    """Clicking a copy toggles whether it will be deleted, independently per copy."""
    stub_clusters(page, [cluster([
        dup("wa", camera_type="whatsapp"),
        dup("cam", camera_type="iPhone 12"),
    ])])
    open_app(page)
    # heuristic seeds: cam kept, wa marked to delete
    expect(page.get_by_role("img", name="keep", exact=True)).to_have_attribute("alt", "cam")
    expect(page.get_by_role("img", name="will be deleted", exact=True)).to_have_attribute("alt", "wa")
    # drop the heuristic's keeper too → both copies now marked to delete
    page.get_by_alt_text("cam").click()
    expect(page.get_by_role("img", name="will be deleted", exact=True)).to_have_count(2)
    # rescue wa → it is kept again, on its own
    page.get_by_alt_text("wa").click()
    expect(page.get_by_role("img", name="keep", exact=True)).to_have_attribute("alt", "wa")
    print(" PASS: toggle a copy between keep and drop")
SetDrop records the per-copy decision; the view reads decided[cid] before falling back
to the heuristic default (drop everything but ranked[0]). A thumbnail’s onclick passes
the desired new fate (drop: !drop), so a click simply flips it.
const SetDrop = (state, { cid, drop }) => ({
...state,
decided: { ...state.decided, [cid]: drop ? "drop" : "keep" },
});
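The interplay between the heuristic default and the decided map can be traced without a browser. In the sketch below, willDrop is a hypothetical helper that mirrors the view's inline expression, and the default keeper is passed in by hand instead of coming from rankDuplicates; the sequence follows the same steps as the test above.

```javascript
// Repeated from the chapter so the example runs on its own.
const SetDrop = (state, { cid, drop }) => ({
  ...state,
  decided: { ...state.decided, [cid]: drop ? "drop" : "keep" },
});

// Hypothetical helper mirroring the view: an explicit decision wins,
// otherwise every copy except the default keeper is dropped.
const willDrop = (state, cid, defaultKeeper) => {
  const decision = state.decided[cid];
  return decision ? decision === "drop" : cid !== defaultKeeper;
};
const fates = (state) => ["wa", "cam"].map((cid) => willDrop(state, cid, "cam"));

let state = { decided: {} };
const seeded = fates(state);                          // heuristic only
state = SetDrop(state, { cid: "cam", drop: true });   // click the keeper
const afterDrop = fates(state);
state = SetDrop(state, { cid: "wa", drop: false });   // rescue the loser
const afterRescue = fates(state);
console.log(seeded, afterDrop, afterRescue);
// [true, false] [true, true] [false, true]
```

Nothing enforces "at least one keeper per cluster": the middle step marks both copies for deletion, which is the behaviour the test pins.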
Good to go
The payoff: one button resolves the whole batch. Good to go soft-deletes every cluster’s
losers, meaning every copy that isn’t the (possibly flipped) keeper, by setting state='delete',
which is reversible and drops the cluster from the view (which already excludes deleted
rows). The losers are computed from the same keeper rule the wall shows, so what you see is
what gets deleted; the keepers are never touched. Resolving clears the wall to the empty
state.
The test serves two clusters and captures the mutations: cluster h1 keeps the real camera
(deletes wa), cluster h2 is all-equal so the heuristic keeps the first (deletes y,
z). After good to go exactly those three copies are marked delete and nothing else.
@testcase
def test_good_to_go_soft_deletes_losers(page):
    """Good to go marks every cluster's losers state='delete' and spares the keepers."""
    muts = stub_with_capture(page, [
        cluster([dup("wa", camera_type="whatsapp"), dup("cam", camera_type="iPhone 12")], hash="h1"),
        cluster([dup("x"), dup("y"), dup("z")], hash="h2"),
    ])
    open_app(page)
    page.get_by_role("button", name="good to go").click()
    wait_until(page, lambda: len(muts) == 3)
    assert sorted(cid for cid, st in muts if st == "delete") == ["wa", "y", "z"], muts
    expect(page.get_by_role("list", name="duplicate clusters")).to_have_count(0)
    print(" PASS: good to go soft-deletes losers")
droppedIn applies each copy’s decision (an override from decided, else the heuristic
default) and returns the cids marked to drop; droppedOf collects them across every
non-skipped cluster. Resolve is an action that returns the next state (the wall cleared)
plus one softDelete effect per dropped copy; the runtime runs the effects, each a
fire-and-forget shared updatePhotovideo setting state='delete'. Soft, not hard: reversible,
and a deleted row falls out of the view, so re-loading shows the cluster gone.
// effect: fire-and-forget soft-delete of one copy (errors are swallowed)
const softDelete = (dispatch, { cid }) => {
  gql(UPDATE_PHOTO, { cid, patch: { state: "delete" } }).catch(() => {});
};
// cids to drop in one cluster: explicit decision first, heuristic default otherwise
const droppedIn = (state, c) => {
  const defaultKid = rankDuplicates(c.duplicates)[0].cid;
  return c.duplicates
    .filter((d) => {
      const decision = state.decided[d.cid];
      return decision ? decision === "drop" : d.cid !== defaultKid;
    })
    .map((d) => d.cid);
};
// cids to drop across the whole wall, skipped clusters excluded
const droppedOf = (state) =>
  state.clusters.filter((c) => !state.skipped[c.hash]).flatMap((c) => droppedIn(state, c));
const Resolve = (state) => [
  { ...state, clusters: [] },
  ...droppedOf(state).map((cid) => [softDelete, { cid }]),
];
// commit the reviewed prefix: every cluster from the top through the clicked one,
// leaving the clusters below it on the wall for a later session
const ResolveThrough = (state, { hash }) => {
  const idx = state.clusters.findIndex((x) => x.hash === hash);
  const prefix = state.clusters.slice(0, idx + 1);
  const cids = prefix
    .filter((c) => !state.skipped[c.hash])
    .flatMap((c) => droppedIn(state, c));
  return [
    { ...state, clusters: state.clusters.slice(idx + 1) },
    ...cids.map((cid) => [softDelete, { cid }]),
  ];
};
.resolve{ display:block; margin:0 0 12px; padding:10px 16px;
font-size:15px; font-weight:600; color:#0b0d16; background:#4ade80;
border:0; border-radius:8px; cursor:pointer; }
Skipping a cluster
Not every cluster wants resolving — sometimes the copies are genuinely worth keeping, or
the call is unclear and best left for later. Each cluster gets a skip toggle: a skipped
cluster drops its keeper/loser marks (nothing in it is bound for deletion) and is left out
of good to go entirely. It is per-cluster state, a skipped map keyed by hash, parallel
to the decided map of per-copy overrides.
The test skips the first cluster, checks its delete-marks vanish, then resolves and confirms only the other cluster’s loser is deleted — the skipped copies are spared.
@testcase
def test_skip_spares_a_cluster(page):
    """Skipping a cluster clears its delete-marks and leaves its copies untouched on resolve."""
    muts = stub_with_capture(page, [
        cluster([dup("wa", camera_type="whatsapp"), dup("cam", camera_type="iPhone 12")], hash="h1"),
        cluster([dup("x"), dup("y")], hash="h2"),
    ])
    open_app(page)
    first = page.get_by_role("list", name="duplicate clusters").get_by_role("listitem").first
    first.get_by_role("button", name="skip", exact=True).click()
    expect(first.get_by_role("img", name="will be deleted", exact=True)).to_have_count(0)
    page.get_by_role("button", name="good to go").click()
    wait_until(page, lambda: ("y", "delete") in muts)
    assert all(cid != "wa" for cid, _ in muts), muts
    print(" PASS: skip spares a cluster")
ToggleSkip flips the cluster’s flag; the view (above) reads it to drop the marks, and
droppedOf filters skipped clusters out before computing what to delete.
const ToggleSkip = (state, { hash }) => ({
...state,
skipped: { ...state.skipped, [hash]: !state.skipped[hash] },
});
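A quick trace of the flag, as a sketch: ToggleSkip is repeated so the block runs alone, and included is a hypothetical helper that applies the same !state.skipped[c.hash] filter the resolve code uses, here over two made-up cluster stubs.

```javascript
// Repeated from the chapter so the example runs on its own.
const ToggleSkip = (state, { hash }) => ({
  ...state,
  skipped: { ...state.skipped, [hash]: !state.skipped[hash] },
});

// Hypothetical stubs: only the hash matters for this filter.
const clusters = [{ hash: "h1" }, { hash: "h2" }];
const included = (state) =>
  clusters.filter((c) => !state.skipped[c.hash]).map((c) => c.hash);

const s0 = { skipped: {} };
const s1 = ToggleSkip(s0, { hash: "h1" }); // skip h1
const s2 = ToggleSkip(s1, { hash: "h1" }); // include it again

console.log(included(s0), included(s1), included(s2));
// ["h1", "h2"] ["h2"] ["h1", "h2"]
```

Un-skipping leaves the key in the map with the value false instead of removing it, which the truthiness check treats the same as absent.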
.skip{ align-self:center; margin-right:4px; padding:6px 10px; font-size:12px;
color:#e8e8f0; background:#3a3f5c; border:0; border-radius:6px; cursor:pointer; }
.resolve-through{ align-self:center; margin-right:4px; padding:6px 10px; font-size:12px;
font-weight:600; color:#0b0d16; background:#4ade80; border:0;
border-radius:6px; cursor:pointer; }
.cluster.skipped{ opacity:.5; }
When the device isn’t authorized
The proxy gates /graphql; a device with no grant gets a 401/403. Without handling, the
failed fetch would fall through to the empty state — a misleading “no duplicates” on a
device that simply can’t read. So a 401/403 raises an authNeeded flag the view turns into
a banner, the same shape as memories’. The app can’t mint a grant; it only says a fresh
access link is needed here.
@testcase
def test_shows_auth_required_when_unauthorized(page):
    """A 401 from /graphql surfaces a clear 'authorization required' banner."""
    page.route("**/graphql", lambda route: route.fulfill(
        status=401, content_type="text/plain", body="Unauthorized"))
    open_app(page)
    expect(page.get_by_role("alert")).to_contain_text(re.compile("authoriz", re.I))
    print(" PASS: shows auth required when unauthorized")
A 401/403 makes the shared client flip its auth store; a boot effect (authSub) subscribes to
that store and dispatches SetAuthNeeded on every change, since an external source can reach
the app only through dispatch. The body shows the banner ahead of every other state.
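The shared store itself lives in ../shared/gql.js and is not shown in this note, so the following is only a guess at its shape: createAuthStore is a hypothetical stand-in (a value plus a listener set), and whether the real subscribeAuth calls back immediately with the current value is an assumption. What the sketch does show is the bridge pattern: the external source never touches state, it only reaches it through dispatch.

```javascript
// Hypothetical stand-in for the shared client's auth store (an assumption,
// the real one is not shown in this note).
function createAuthStore() {
  let needed = false;
  const listeners = new Set();
  return {
    set(value) { needed = value; listeners.forEach((fn) => fn(needed)); },
    // Assumed behaviour: call back once with the current value, return an unsubscribe.
    subscribe(fn) { listeners.add(fn); fn(needed); return () => listeners.delete(fn); },
  };
}

// Minimal dispatch standing in for the Hyperapp runtime.
let state = { clusters: [], loading: true };
const dispatch = (action, payload) => { state = action(state, payload); };

const SetAuthNeeded = (s, authNeeded) => ({ ...s, authNeeded });
const store = createAuthStore();
const authSub = (dispatchFn) => store.subscribe((v) => dispatchFn(SetAuthNeeded, v));

const unsubscribe = authSub(dispatch);
const atBoot = state.authNeeded;    // false: nothing has failed yet
store.set(true);                    // what a 401/403 would trigger
const afterReject = state.authNeeded;
unsubscribe();
store.set(false);                   // no longer observed
console.log(atBoot, afterReject, state.authNeeded); // false true true
```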
.authwall{ background:#f9a826; color:#1b1d2e; padding:10px 14px; border-radius:8px;
margin:0 0 12px; line-height:1.45; }
.authwall strong{ display:block; }
Installable, fullscreen (PWA)
Doublons is a sit-down cleanup session — nice to launch chrome-free and fullscreen from a
home-screen icon. A web-app manifest (display:fullscreen, an SVG icon, the dark
theme_color) plus a minimal service worker make it installable. The worker is
deliberately trivial — it claims the page and passes fetches straight to the network (no
offline caching of a live archive); its only job is installability.
@testcase
def test_pwa_installable(page):
    """The app ships a fullscreen manifest and a registered service worker."""
    open_app(page)
    assert page.locator("link[rel='manifest']").get_attribute("href"), "no manifest linked"
    man = page.evaluate("() => fetch('manifest.json').then(r => r.json())")
    assert man["display"] == "fullscreen", f"display is {man.get('display')!r}"
    assert man["icons"], "manifest has no icons"
    ready = page.evaluate("""() => Promise.race([
        navigator.serviceWorker.ready.then(r => !!r.active),
        new Promise(res => setTimeout(() => res(false), 6000))])""")
    assert ready, "service worker did not become ready"
    print(" PASS: pwa installable")
{
"name": "Doublons",
"short_name": "Doublons",
"start_url": ".",
"scope": ".",
"display": "fullscreen",
"background_color": "#1b1d2e",
"theme_color": "#1b1d2e",
"icons": [
{ "src": "icon.svg", "sizes": "any", "type": "image/svg+xml", "purpose": "any maskable" }
]
}
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512">
<rect width="512" height="512" fill="#1b1d2e"/>
<g fill="#4ade80">
<rect x="120" y="120" width="180" height="180" rx="16"/>
<rect x="212" y="212" width="180" height="180" rx="16" fill="#6cf"/>
</g>
</svg>
self.addEventListener('install', () => self.skipWaiting());
self.addEventListener('activate', e => e.waitUntil(self.clients.claim()));
self.addEventListener('fetch', () => {}); // pass-through; installability only
The shell links the manifest and an Apple touch icon; app-boot registers the worker
(the registration line added to the boot block above).
A tiny monospaced build-tag in the corner shows the short build SHA, so after a deploy a
glance at a device confirms which build it picked up (PWAs cache hard). Its value changes
every build, so the test pins the shape — a 7-char hex — not the digits.
@testcase
def test_build_tag_shows_hash(page):
    """A build-hash tag names the loaded build, so a device's version is legible."""
    open_app(page)
    tag = page.locator(".build-tag")
    assert tag.count() == 1, "no build tag"
    h = (tag.text_content() or "").strip()
    assert re.fullmatch(r"[0-9a-f]{7}", h), f"build tag isn't a 7-hex hash: {h!r}"
    print(" PASS: build tag shows hash")
.build-tag{ position:fixed; top:4px; right:6px; z-index:50; pointer-events:none;
font:10px/1 monospace; color:#5b6080; }
See each copy’s metadata
The heuristic’s keeper is a guess; to overrule it you need to see what it was based on. So each thumbnail grows a caption with the facts the choice turns on: owner, capture date, camera, file size, and the labels when present. Near-identical photos become distinguishable: a WhatsApp re-share next to a 2 MB original, a dated capture next to a midnight import.
@testcase
def test_each_copy_shows_its_metadata(page):
    """Each copy captions its owner, camera and size so the choice is informed."""
    stub_clusters(page, [cluster([
        dup("a", owner="konubinix", camera_type="iPhone 12", size=2_500_000,
            labels="beach; 2016"),
        dup("b", owner="ayla", camera_type="whatsapp", size=90_000),
    ])])
    open_app(page)
    cl = page.get_by_role("list", name="duplicate clusters").get_by_role("listitem").first
    expect(cl).to_contain_text("iPhone 12")
    expect(cl).to_contain_text("ayla")
    expect(cl).to_contain_text("whatsapp")
    expect(cl).to_contain_text("beach")
    print(" PASS: each copy shows its metadata")
fmtSize renders bytes as KB/MB, fmtDate trims the timestamp to the minute; the
caption lists owner · size, the date, the camera, and the labels when present.
const fmtSize = (n) =>
!n ? "?" : n >= 1e6 ? (n / 1e6).toFixed(1) + " MB" : Math.round(n / 1e3) + " KB";
const fmtDate = (s) => (s ? s.slice(0, 16).replace("T", " ") : "—");
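A few sample calls make the formatting rules concrete. The two helpers are repeated so the block runs alone (the dash placeholder is written as an escape, same character); the inputs are the sizes and dates used in the tests, plus two edge cases.

```javascript
// Repeated from the chapter so the example runs on its own.
const fmtSize = (n) =>
  !n ? "?" : n >= 1e6 ? (n / 1e6).toFixed(1) + " MB" : Math.round(n / 1e3) + " KB";
const fmtDate = (s) => (s ? s.slice(0, 16).replace("T", " ") : "\u2014");

const samples = {
  original: fmtSize(2500000),                      // "2.5 MB"
  reshare: fmtSize(90000),                         // "90 KB"
  missing: fmtSize(undefined),                     // "?"
  tiny: fmtSize(10),                               // "0 KB" (rounds down)
  captured: fmtDate("2016-08-03T21:49:44+00:00"),  // "2016-08-03 21:49"
  noDate: fmtDate(null),                           // dash placeholder
};
console.log(samples);
```

fmtDate slices the string and does no date arithmetic, so the caption shows the time in whatever offset the stored timestamp carries, not the device's local time.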
.copy{ margin:0; display:flex; flex-direction:column; gap:4px; width:120px; }
.meta{ font-size:10px; line-height:1.3; color:#9aa0c0; word-break:break-word; }
.meta .tags{ color:#c9b6f0; }
Resolve from the top down to here
135 clusters is more than one sitting, and you review them top to bottom. So each cluster carries a resolve up to here button: it commits every cluster from the top of the wall down through this one — the reviewed prefix — and leaves everything below for next time. Stop wherever you ran out of time and click that cluster’s button; the rest stays put. The global “good to go” is just this applied to the last cluster.
@testcase
def test_resolve_up_to_here_keeps_the_rest_below(page):
    """'Resolve up to here' commits the whole prefix through this cluster; those below stay."""
    muts = stub_with_capture(page, [
        cluster([dup("a1", camera_type="whatsapp"), dup("a2", camera_type="iPhone")], hash="h1"),
        cluster([dup("b1", camera_type="whatsapp"), dup("b2", camera_type="iPhone")], hash="h2"),
        cluster([dup("c1", camera_type="whatsapp"), dup("c2", camera_type="iPhone")], hash="h3"),
    ])
    open_app(page)
    items = page.get_by_role("list", name="duplicate clusters").get_by_role("listitem")
    items.nth(1).get_by_role("button", name="resolve up to here", exact=True).click()
    wait_until(page, lambda: ("a1", "delete") in muts and ("b1", "delete") in muts)
    expect(items).to_have_count(1)  # only the 3rd cluster remains
    assert all(cid not in ("c1", "c2") for cid, _ in muts), muts  # the 3rd is untouched
    print(" PASS: resolve up to here keeps the rest below")
ResolveThrough takes the clicked cluster’s index, soft-deletes the dropped copies of every
non-skipped cluster in the prefix (top through here, the same per-copy computation the global
button uses), and keeps only the clusters below it — so a partial session commits exactly the
run you reviewed.
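The prefix arithmetic is easy to check on its own. The sketch below is a simplification, not the app's code: splitAt is a hypothetical helper, and each cluster carries a precomputed drops list standing in for droppedIn(state, c), so only the slicing and the skip filter are exercised.

```javascript
// Hypothetical simplification of ResolveThrough: `drops` stands in for
// droppedIn(state, c); only the prefix/remainder split is shown.
const splitAt = (clusters, skipped, hash) => {
  const idx = clusters.findIndex((c) => c.hash === hash);
  const prefix = clusters.slice(0, idx + 1);
  return {
    toDelete: prefix.filter((c) => !skipped[c.hash]).flatMap((c) => c.drops),
    remaining: clusters.slice(idx + 1).map((c) => c.hash),
  };
};

// Made-up wall of three clusters.
const wall = [
  { hash: "h1", drops: ["a1"] },
  { hash: "h2", drops: ["b1"] },
  { hash: "h3", drops: ["c1"] },
];

const middle = splitAt(wall, {}, "h2");             // review stopped at h2
const withSkip = splitAt(wall, { h1: true }, "h2"); // h1 skipped inside the prefix
const last = splitAt(wall, {}, "h3");               // same as the global button
const unknown = splitAt(wall, {}, "nope");          // hash not on the wall

console.log(middle, withSkip, last, unknown);
```

Two details fall out of the slicing. A skipped cluster inside the prefix is spared from deletion but still leaves the wall with the rest of the prefix; since nothing was deleted from it, it should reappear on the next load. An unknown hash gives findIndex a result of -1, so the prefix is empty and the wall is left as it was.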