Commit Graph

2 Commits

Author SHA1 Message Date
Evan 5e4b2791aa perf: reduce core-sim GC churn 42% and add GC-churn profiling to the perf harness (#4494)
## Summary

Reduces core-simulation GC churn by **42%** on a 20-game-minute Giant
World Map run, and extends the headless full-game perf harness so churn
is measurable and regressions are visible.

### 1. GC-churn measurement (`tests/perf/fullgame/GcProfiler.ts`)

`npm run perf:game` now reports:

- **GC pauses** by kind (minor/major/incremental) via a
`PerformanceObserver` on `'gc'` entries, bucketed into tick windows by
timestamp (V8 only delivers these entries on a timer task, so they're
flushed after the run)
- **Allocation rate** per `--window N` ticks (default 1000) from
used-heap deltas sampled every tick, so churn can be tracked across game
phases
- **Top allocating functions** from the V8 sampling heap profiler with
`includeObjectsCollectedBy{Major,Minor}GC` — i.e. actual churn including
short-lived garbage, not live memory — plus a `.heapprofile` loadable in
Chrome DevTools (Memory → Allocation sampling)

New flags: `--window N`, `--no-gc-profile`, `--no-alloc-profile`.
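
The timestamp bucketing described above can be sketched roughly as follows. This is an illustration under assumptions, not the code in `GcProfiler.ts`: `GcEvent`, `WindowStats`, `bucketGcByWindow`, and `tickStartTimes` are hypothetical names, and it assumes the harness records one start timestamp per tick on the same clock as the `'gc'` entries' `startTime`.

```typescript
// Hypothetical shapes for illustration; the real profiler's types may differ.
interface GcEvent {
  startTime: number; // ms, assumed to share a clock with tickStartTimes
  duration: number; // pause length in ms
  kind: "minor" | "major" | "incremental";
}

interface WindowStats {
  count: number;
  pauseMs: number;
}

// Assign each GC event to the window that contains the last tick
// which started at or before the event's timestamp.
function bucketGcByWindow(
  events: readonly GcEvent[],
  tickStartTimes: readonly number[], // ascending, one entry per tick
  windowSize: number,
): WindowStats[] {
  const windows: WindowStats[] = [];
  const windowCount = Math.ceil(tickStartTimes.length / windowSize);
  for (let i = 0; i < windowCount; i++) {
    windows.push({ count: 0, pauseMs: 0 });
  }
  for (const ev of events) {
    // Binary search for the last tick with start <= ev.startTime.
    let lo = 0;
    let hi = tickStartTimes.length - 1;
    let tick = -1;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (tickStartTimes[mid] <= ev.startTime) {
        tick = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    if (tick < 0) continue; // GC happened before the first tick; skip it
    const w = windows[Math.floor(tick / windowSize)];
    w.count++;
    w.pauseMs += ev.duration;
  }
  return windows;
}
```

Because the entries arrive late, on a timer task, bucketing by timestamp after the run (rather than by "current tick" at delivery time) is what keeps each pause attributed to the window in which it actually occurred.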

### 2. Allocation reductions in the hot paths it found

| Site | Change |
|---|---|
| `GameMap.bfs` | inline neighbor enumeration instead of an array per visited tile |
| `GameMap`/`Game` | new `forEachNeighborNSWE`: allocation-free iterator matching `neighbors()` N,S,W,E order for order-sensitive callers (`forEachNeighbor` visits W,E,N,S, so substituting it would change sim behavior) |
| `PlayerImpl.nearby` / `sharesBorderWith` / `shoreReachableNeighbors` | no per-call neighbor arrays; no materialized shore-tile array |
| `PlayerImpl.units(types)` | gather into a reusable scratch buffer, return one exact-size slice (still a fresh snapshot array per call) |
| `AiAttackBehavior.maybeAttack` | single pass over border neighbors replacing the `flatMap`/`filter`/`map` chain over every border tile |
| `AiAttackBehavior.isBorderingNukedTerritory` | reusable `neighbors4` buffer with early exit |
| `SharedWaterCache.build` | allocation-free neighbor iteration |
| `SpatialQuery.bfsNearest` | first-minimum scan instead of collect-then-stable-sort (identical result incl. tie-breaking) |
### Results (Giant World Map, 400 bots, 12,000 ticks ≈ 20 game-minutes, seed `perf-default`)

| Metric | Before | After |
|---|---|---|
| Sampled allocations (incl. collected) | 97.7 GB | **56.9 GB (−42%)** |
| GC count / total pause | 1,682 / 3,313 ms (1.8% of wall) | 1,058 / 2,087 ms (1.2%) |
| Ticks/sec | 66 | 70 |
| p99 / max tick | 49.9 ms / 988 ms | 43.5 ms / 689 ms |
| Ticks over 100 ms budget | 31 | 19 |

## Determinism

Every rewrite preserves exact iteration order (the new NSWE iterator
exists precisely for the order-sensitive sites). Verified by identical
before/after final game-state hashes on three seeded runs: Giant World
Map 12,000 ticks (`67286276735690560`), Giant World Map 2,000 ticks, and
World 1,800 ticks.
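
To make the order sensitivity concrete, here is a small sketch of the two visit orders. `Grid` is a hypothetical stand-in for the real map classes; it assumes a row-major layout (`ref = y * width + x`) with north at `y - 1`, which may not match `GameMapImpl` exactly.

```typescript
// Hypothetical row-major grid; an assumption for illustration only.
class Grid {
  constructor(
    readonly width: number,
    readonly height: number,
  ) {}

  // Allocating baseline: returns neighbors in N, S, W, E order.
  neighbors(ref: number): number[] {
    const x = ref % this.width;
    const y = (ref - x) / this.width;
    const out: number[] = [];
    if (y > 0) out.push(ref - this.width);
    if (y < this.height - 1) out.push(ref + this.width);
    if (x > 0) out.push(ref - 1);
    if (x < this.width - 1) out.push(ref + 1);
    return out;
  }

  // Allocation-free visit in the same N, S, W, E order as neighbors().
  forEachNeighborNSWE(ref: number, fn: (n: number) => void): void {
    const x = ref % this.width;
    const y = (ref - x) / this.width;
    if (y > 0) fn(ref - this.width);
    if (y < this.height - 1) fn(ref + this.width);
    if (x > 0) fn(ref - 1);
    if (x < this.width - 1) fn(ref + 1);
  }

  // Allocation-free visit in W, E, N, S order.
  forEachNeighbor(ref: number, fn: (n: number) => void): void {
    const x = ref % this.width;
    const y = (ref - x) / this.width;
    if (x > 0) fn(ref - 1);
    if (x < this.width - 1) fn(ref + 1);
    if (y > 0) fn(ref - this.width);
    if (y < this.height - 1) fn(ref + this.width);
  }
}

// Helper that records the visit order of either iterator.
function collect(
  visit: (fn: (n: number) => void) => void,
): number[] {
  const seen: number[] = [];
  visit((n) => {
    seen.push(n);
  });
  return seen;
}
```

On a 3×3 grid, the center tile (ref 4) is visited as 1, 7, 3, 5 by the NSWE iterator and as 3, 5, 1, 7 by the W,E,N,S one. Any caller that stops at the first match, or consumes RNG per neighbor, would diverge between the two.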

## Test plan

- [x] Full suite green (1,896 tests)
- [x] New tests: `forEachNeighborNSWE` order contract vs `neighbors()`
over every tile; `units()` filtering semantics (insertion order,
fresh-array guarantee, duplicate types, Set path)
- [x] Final-hash equality on 3 seeded headless runs (2 maps)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-07-03 12:30:28 -07:00
Evan 2789db8b96 Optimize core simulation hot paths (no behavior change) (#4230)
## Summary

Pure performance optimizations to the attack/conquer/cluster hot paths
in `src/core`, driven by the full-game perf harness from #4228. **No
behavior change**: the final game-state hash is identical before/after
on every config tested — world quick run (2 different seeds),
giantworldmap, and the default 1800-tick run.

### Changes

- **Flat-arithmetic neighbor iteration**: `forEachNeighbor` /
`forEachNeighborWithDiag` / `isBorder` / `isOceanShore` are now
implemented inside `GameMapImpl` using raw `ref±1` / `ref±width` index
math, skipping the per-neighbor `ref()` coordinate validation
(`Number.isInteger` etc.). `GameImpl` and `GameView` delegate.
- **New `neighbors4(ref, out)`**: zero-allocation, callback-free
neighbor query for hot loops (W, E, N, S — same order as
`forEachNeighbor`).
- **`AttackExecution`**: the per-tile closures in `tick()` /
`addNeighbors()` are replaced with reusable neighbor buffers, a cached
`GameMap` reference, and integer `smallID()` owner comparisons instead
of owner-object lookups.
- **`GameImpl`**: the per-conquer `updateBorders` closure is hoisted to
a method with a reusable buffer; `removeInactiveExecutions` compacts the
executions array in place instead of allocating a new ~4200-element
array every tick.
- **`PlayerExecution`**: `surroundedBySamePlayer` / `isSurrounded` /
`getCapturingPlayer` de-closured (`neighbors4` + integer compares;
neighbor visit order preserved, so `getCapturingPlayer`'s
Map-insertion-order tie-breaking is unchanged); flood-fill visit closure
hoisted out of the while loop.
- **`FlatBinaryHeap.dequeue`**: returns the tile directly instead of
allocating a `[tile, priority]` tuple per dequeued tile (AttackExecution
is the only caller).
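
The in-place compaction mentioned for `removeInactiveExecutions` can be sketched like this. It is a generic read/write-cursor compaction, not the actual `GameImpl` code; the `ExecutionLike` interface, its `isActive()` method, and `compactActive` are assumptions made for the example.

```typescript
// Hypothetical minimal interface; the real Execution type may differ.
interface ExecutionLike {
  isActive(): boolean;
}

// Keep only active executions, preserving their relative order,
// without allocating a new array.
function compactActive<T extends ExecutionLike>(execs: T[]): void {
  let write = 0;
  for (let read = 0; read < execs.length; read++) {
    const e = execs[read];
    if (e.isActive()) {
      execs[write] = e; // shift survivors toward the front
      write++;
    }
  }
  execs.length = write; // truncate the leftover tail
}
```

Compared with `execs = execs.filter(...)`, this reuses the existing backing array every tick and keeps survivors in their original order, so anything that depends on execution order is unaffected.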

### Performance (`npm run perf:game`, same machine, before → after)

| run | mean tick | ticks/sec | max tick |
|---|---|---|---|
| default (world, 400 bots, 1800 ticks) | 9.04 → **7.98 ms** | 111 → **125** | 31.7 → 35.7 ms |
| giantworldmap, 600 ticks | 22.5 → **17.4 ms** | 44 → **58** | 52.8 → **36.2 ms** |

The giantworldmap tail improvement (max tick −31%) is the most relevant
for the 100 ms tick budget.

### Determinism verification

Identical `Final hash` before and after on all configs:

| config | hash |
|---|---|
| `--map world --ticks 200 --bots 100` | `5455008589403520` |
| same + `--seed second-seed-check` | `5580840142777488` |
| `--map giantworldmap --ticks 600` | `37373734953428430` |
| default run | `26773450321979388` |

### Tests

- New `tests/NeighborIteration.test.ts` pins the exact neighbor
iteration orders (W,E,N,S cardinal; dx-major diagonal — conquest order
and RNG consumption depend on them) and conquer/border-tile invariants
checked mid-battle.
- New `tests/FlatBinaryHeap.test.ts` covers heap ordering, clear, and
growth.
- Full suite passes (122 files / 1386 tests + server tests); lint and
prettier clean.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
2026-06-11 19:58:42 -07:00