2 Commits

Author SHA1 Message Date
Evan bca980f572 Shrink the per-tick worker → main update payload by ~90% (#4244)
Stacked on #4243 (the `perf:client` harness) — first step of fixing the
every-100ms main-thread stutter: make the per-tick burst small before
spreading what remains across frames.

## Problem

The harness showed the main-thread burst was dominated by
`structuredClone` of the `updates` object, and the clone was dominated
by three kinds of per-tick churn that re-sent object payloads every tick:

- `gold` / `troops` / `tilesOwned` change for nearly every alive player
every tick → ~278 partial `PlayerUpdate` objects per tick (world/400
bots), ~508 on giantworldmap.
- Attack troop counts tick down every tick → whole
`outgoingAttacks`/`incomingAttacks` arrays re-cloned for every fighting
player every tick.
- `playerNameViewData` (an all-players record) was cloned every tick but
only recomputed every 30 ticks.

## Change

Three additions to the worker → main protocol (all transferable,
zero-clone):

1. **`packedPlayerUpdates`** — `[smallID, tilesOwned, gold, troops]`
float64 quads for players whose stats changed. These fields no longer
appear in `PlayerUpdate` diffs (first emissions still carry the full
snapshot). Gold stays exact because a float64 represents every integer
up to 2^53 and game values are ≪ 2^53.
2. **`packedAttackUpdates`** — `[ownerSmallID, direction, index,
troops]` quads. Attack arrays are only resent when
membership/order/retreating changes — which is exactly the condition
that keeps the patch indexes valid (a tick either resends an array or
patches it, never both).
3. **`playerNameViewData` is now optional** — attached only on
placement-rebuild ticks (spawn ticks, first ticks, every 30th, spawn
end). The client keeps the last applied values; dead players' name
placements freeze at death (matching the previous effective behavior).
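
As a rough illustration of the quad layout in item 1, the sketch below packs changed player stats into a single `Float64Array` and applies them on the receiving side. Only the `[smallID, tilesOwned, gold, troops]` ordering comes from the description above; `PlayerStats`, `packPlayerStats`, and `applyPackedPlayerStats` are hypothetical names, not the functions in this PR.

```typescript
// Illustrative only: the quad layout [smallID, tilesOwned, gold, troops]
// comes from the description above; every name here is hypothetical.
interface PlayerStats {
  tilesOwned: number;
  gold: number;
  troops: number;
}

const FIELDS_PER_PLAYER = 4;

// Worker side: flatten the changed players into one Float64Array.
function packPlayerStats(changed: Map<number, PlayerStats>): Float64Array {
  const packed = new Float64Array(changed.size * FIELDS_PER_PLAYER);
  let offset = 0;
  for (const [smallID, stats] of changed) {
    packed[offset++] = smallID;
    packed[offset++] = stats.tilesOwned;
    packed[offset++] = stats.gold;
    packed[offset++] = stats.troops;
  }
  return packed;
}

// Main-thread side: walk the quads and patch the matching player in place.
// Returns how many quads matched a known player.
function applyPackedPlayerStats(
  packed: Float64Array,
  players: Map<number, PlayerStats>,
): number {
  let applied = 0;
  for (let i = 0; i + FIELDS_PER_PLAYER <= packed.length; i += FIELDS_PER_PLAYER) {
    const player = players.get(packed[i]);
    if (player === undefined) continue; // unknown smallID: skip this quad
    player.tilesOwned = packed[i + 1];
    player.gold = packed[i + 2];
    player.troops = packed[i + 3];
    applied++;
  }
  return applied;
}
```

Because the payload is one typed array, its underlying buffer can go in the `postMessage` transfer list instead of being structured-cloned field by field.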

On the client, `GameView.populateFrame` now also rebuilds `names` /
`relationMatrix` / `allianceClusters` only when their inputs changed
that tick — field presence on a partial `PlayerUpdate` marks them dirty.
(`playerStatus`, nuke telegraphs, and attack rings still recompute every
tick; they're tick- or unit-dependent.)
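
The "field presence marks dirty" idea can be sketched as follows. This is a minimal sketch under assumptions: the field names `displayName` and `allies`, and which derived structure each one invalidates, are made up for illustration and are not taken from `GameView.populateFrame`.

```typescript
// Hypothetical partial update: optional fields are present only when they
// changed on this tick. Field names are assumptions for illustration.
interface PartialPlayerUpdate {
  id: number;
  displayName?: string;
  allies?: number[];
}

class DerivedViewState {
  nameRebuilds = 0;
  relationRebuilds = 0;

  // Rebuild a derived structure only if some update this tick carried
  // one of the fields it depends on.
  applyTick(updates: PartialPlayerUpdate[]): void {
    let namesDirty = false;
    let relationsDirty = false;
    for (const update of updates) {
      if (update.displayName !== undefined) namesDirty = true;
      if (update.allies !== undefined) relationsDirty = true;
    }
    if (namesDirty) this.rebuildNames();
    if (relationsDirty) this.rebuildRelations();
  }

  // Stand-ins for the real derivations; they only count invocations.
  private rebuildNames(): void {
    this.nameRebuilds++;
  }

  private rebuildRelations(): void {
    this.relationRebuilds++;
  }
}
```

A tick whose updates carry none of the tracked fields leaves both derived structures untouched, which is where the per-tick saving comes from.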

## Results (perf:client, this machine; low-end devices ~5–20× slower)

Default run (world, 400 bots, 1800 ticks):

| stage | before | after |
|---|---|---|
| clone (serialize+deserialize) | 1.02ms | **0.09ms** |
| GameView.update | 0.62ms | **0.29ms** |
| WebGLFrameBuilder.update | 0.04ms | 0.04ms |
| **TOTAL burst mean** | **1.67ms** | **0.42ms** |
| TOTAL p99 / max | 3.47 / 10.3ms | **1.21 / 3.92ms** |

giantworldmap/600t: 2.54 → 0.68ms mean. Player update objects: 278 → 6.5
per tick (world), 508 → 12 (giant). The remaining burst is mostly tile
apply + per-tick derivations — the part that frame-spreading (next step)
addresses.

## Verification

- **Sim final hash unchanged** on all three reference configs
(`5607618202213430`, `29309648281599524`, `39945089450032050`) — no
simulation behavior change.
- **View hash unchanged** on all three configs (`942106e9`, `a3aae227`,
`cbaaf265`): the rendered view state hashes identically tick-for-tick,
including the name-freeze semantics.
- New tests: `tests/PackedPlayerUpdates.test.ts` (drain + GameRunner
cadence), packed-channel and freeze-at-death cases in
`tests/client/view/GameView.test.ts`, `packAttackTroopDeltas` unit tests
and updated diff contract in `tests/GameUpdateUtils.test.ts` /
`tests/PlayerUpdateDiff.test.ts`.
- `npm test` (1490 tests), `eslint`, `prettier`, `tsc --noEmit` all
pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 16:50:56 -07:00
Evan aa22339f96 Add a main-thread perf harness for the worker → client update pipeline (#4243)
## What

`npm run perf:client` — a headless harness (companion to `npm run
perf:game` from #4228) that measures the **main-thread burst** the
client runs every simulation tick. The sim ticks at 10Hz in a worker;
each tick the main thread synchronously runs deserialization →
`GameView.update()` → `WebGLFrameBuilder.update()` → HUD ticks. On
low-end devices that burst exceeds the 16.7ms frame budget and shows up
as a stutter every 100ms. Before optimizing that path, this gives us
numbers.

Per tick it runs the real pipeline end to end and times three stages:

- **clone** — `structuredClone` of the `GameUpdateViewData` with the
same transfer list `Worker.worker.ts` uses (serialize+deserialize, an
upper bound on the main-thread share of the real `postMessage`)
- **view** — the real client `GameView.update()`, including all
`populateFrame()` derivations
- **builder** — the real `WebGLFrameBuilder.update()` against a no-op GL
stub that counts payload sizes
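
To make the **clone** stage concrete, here is a minimal sketch of timing a `structuredClone` with a transfer list, assuming a runtime where `structuredClone` and `performance.now()` are global (recent Node versions and browsers). `DemoUpdate`, `timeStage`, and `cloneLikePostMessage` are hypothetical names; the real harness clones a `GameUpdateViewData`.

```typescript
interface StageTiming<T> {
  result: T;
  ms: number;
}

// Run one stage synchronously and report its wall-clock duration.
function timeStage<T>(stage: () => T): StageTiming<T> {
  const start = performance.now();
  const result = stage();
  return { result, ms: performance.now() - start };
}

// Hypothetical stand-in for the per-tick message: a small object graph
// (cloned) plus a packed buffer (transferred, not copied).
interface DemoUpdate {
  tick: number;
  updates: { id: number; gold: number }[];
  packedTiles: ArrayBuffer;
}

// Approximates serialize+deserialize: objects are deep-copied, while the
// buffer named in the transfer list is moved and detached from the source.
function cloneLikePostMessage(update: DemoUpdate): StageTiming<DemoUpdate> {
  return timeStage(() =>
    structuredClone(update, { transfer: [update.packedTiles] }),
  );
}
```

After the call, the source object's `packedTiles` is detached (its `byteLength` is 0), so a harness following this pattern has to treat each update as consumed once it has been cloned.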

It reports mean/p50/p95/p99/max per stage, slowest bursts with their
tile counts, payload stats, a filtered V8 CPU profile table, and writes
a `.cpuprofile`. Not covered (browser-only): CPU inside the WebGL view's
`update*()` methods and HUD layer ticks.
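
The per-stage statistics could be computed along these lines. This sketch assumes nearest-rank percentiles; the harness may use a different percentile definition, and `summarize` is a hypothetical name.

```typescript
interface StageSummary {
  mean: number;
  p50: number;
  p95: number;
  p99: number;
  max: number;
}

// Summarize per-tick durations (in ms) using nearest-rank percentiles.
function summarize(samplesMs: number[]): StageSummary {
  if (samplesMs.length === 0) {
    throw new Error("summarize needs at least one sample");
  }
  const sorted = [...samplesMs].sort((a, b) => a - b);
  const total = sorted.reduce((sum, value) => sum + value, 0);
  // Nearest rank: the smallest sample with at least p% of samples at or below it.
  const percentile = (p: number): number => {
    const rank = Math.ceil((p * sorted.length) / 100);
    return sorted[Math.min(sorted.length - 1, Math.max(0, rank - 1))];
  };
  return {
    mean: total / sorted.length,
    p50: percentile(50),
    p95: percentile(95),
    p99: percentile(99),
    max: sorted[sorted.length - 1],
  };
}
```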

Same flags as `perf:game`: `--map --ticks --bots --nations --seed --top
--no-cpu-profile`.

## Determinism

- Prints the sim **Final hash**, which matches the `perf:game`
references on all three standard configs (world/200t/100b →
`5607618202213430`, default → `29309648281599524`, giantworldmap/600t →
`39945089450032050`) — the harness's worker side is faithful.
- Prints a **View hash** (FNV over the tile-state buffer, FrameData
deriveds, and per-player/unit view state) — verified stable across runs.
Client-side optimizations should keep it identical, the same workflow as
the sim hash.
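
For readers unfamiliar with FNV, a minimal sketch of a 32-bit FNV-1a hash over bytes follows. The text only says "FNV"; the 32-bit FNV-1a variant, the seed-chaining approach for hashing several buffers, and the names `fnv1a32` and `toHex8` are assumptions here, not the harness's actual code.

```typescript
const FNV_OFFSET_BASIS_32 = 0x811c9dc5;
const FNV_PRIME_32 = 0x01000193;

// 32-bit FNV-1a: XOR each byte into the state, then multiply by the prime.
// Passing a previous result as `seed` continues the hash across buffers.
function fnv1a32(bytes: Uint8Array, seed: number = FNV_OFFSET_BASIS_32): number {
  let hash = seed >>> 0;
  for (let i = 0; i < bytes.length; i++) {
    hash ^= bytes[i];
    // Math.imul keeps the multiply in 32-bit integer space; >>> 0 makes it unsigned.
    hash = Math.imul(hash, FNV_PRIME_32) >>> 0;
  }
  return hash;
}

// Format as 8 lowercase hex digits.
function toHex8(hash: number): string {
  return (hash >>> 0).toString(16).padStart(8, "0");
}
```

Chaining through `seed` lets a caller fold the tile-state buffer and other view state into one digest without concatenating them first.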

## Baseline (this machine; low-end devices are ~5–20× slower)

Default run (world, 400 bots, 1800 ticks):

| stage | mean | p50 | p95 | p99 | max |
|---|---|---|---|---|---|
| clone (serialize+deserialize) | 1.02ms | 0.96 | 1.53 | 2.11 | 9.15 |
| GameView.update | 0.62ms | 0.58 | 0.93 | 1.25 | 5.09 |
| WebGLFrameBuilder.update | 0.04ms | 0.04 | 0.05 | 0.07 | 0.17 |
| **TOTAL burst** | **1.67ms** | **1.60** | **2.46** | **3.47** | **10.3** |

giantworldmap/600t: TOTAL mean 2.54ms, p99 5.65ms, max 6.42ms.

Notable: the clone is the largest stage (~60%) — the packed tile/motion
buffers transfer for free, so the cost is structured-cloning the
`updates` object (~278 partial player updates/tick on world, ~508 on
giantworldmap). Inside `view`, the recurring cost is `populateFrame`'s
derivations (`computePlayerStatus`, the O(players²) relation matrix,
alliance clusters); tile apply dominates the land-grab spikes.

## Code changes outside the harness

- `WebGLFrameBuilder`: the `./render/gl` import is now `import type` so
the module loads under Node — a value import pulls `GPURenderer` and its
`.glsl?raw` shader imports. No behavior change (the symbols were only
used in type positions).
- `tests/perf/client/Shims.ts`: an in-memory `localStorage` shim so
`UserSettings`/theme code runs under Node (all settings resolve to
defaults, which is also the deterministic choice).
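
An in-memory `localStorage` shim of the kind described in the second bullet might look like the sketch below. It is not the contents of `tests/perf/client/Shims.ts`; `MemoryStorage` and `installLocalStorageShim` are hypothetical names, and the install helper takes an explicit target object (for example `globalThis`) rather than assuming how the real shim is wired in.

```typescript
// Minimal in-memory stand-in for the Web Storage API surface.
class MemoryStorage {
  private readonly items = new Map<string, string>();

  get length(): number {
    return this.items.size;
  }

  // Missing keys return null, matching Web Storage, so callers fall back
  // to their defaults.
  getItem(key: string): string | null {
    return this.items.get(key) ?? null;
  }

  setItem(key: string, value: string): void {
    this.items.set(key, String(value));
  }

  removeItem(key: string): void {
    this.items.delete(key);
  }

  clear(): void {
    this.items.clear();
  }

  key(index: number): string | null {
    return Array.from(this.items.keys())[index] ?? null;
  }
}

// Attach a fresh MemoryStorage as `localStorage` on the target. A harness
// would pass globalThis; the target is a parameter so the sketch stays testable.
function installLocalStorageShim(target: { localStorage?: unknown }): MemoryStorage {
  const storage = new MemoryStorage();
  target.localStorage = storage;
  return storage;
}
```

Starting every run from an empty store is what makes all settings resolve to their defaults.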

## Verification

- Sim + view hashes identical on repeat runs.
- `npm test` (1474 tests), `eslint`, `prettier --check`, `tsc --noEmit`
all pass.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-12 12:25:54 -07:00