mirror of
https://github.com/openfrontio/OpenFrontIO.git
synced 2026-07-05 12:10:45 +00:00
7fa81c6bb92cc154476ed4384ee5833c5669c93c
3 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
7fa81c6bb9 |
perf: reduce core live-memory footprint by 45% on large maps (#4507)
## Summary
Reduces the simulation's steady-state memory footprint. On Giant World
Map at 20 game-minutes (12 000 ticks, 400 bots, seed `perf-default`),
live memory after a full GC drops **293 MB → 161 MB (−45%)**; unforced
peak heap drops **326 MB → 165 MB**. The simulation also runs ~10%
faster (85 → 94 ticks/s). The final game-state hash is **bit-identical**
(`57830793797434300`) — no behavior change.
## Measurement (first commit)
The full-game perf harness gains a footprint mode:
- `--footprint` — forces a full GC at every `--window` boundary and
records the live heap / ArrayBuffer / RSS curve across the game
(requires `NODE_OPTIONS=--expose-gc`).
- `--snapshot-at 0,2000,12000` — writes V8 `.heapsnapshot` files at
chosen ticks.
- `HeapSnapshotRetainers.ts` — attributes every heap node to its nearest
meaningfully-named retainer (e.g. `PlayerImpl._tiles`), plus prints
retainer chains for all nodes ≥128 KB. `HeapSnapshotSummary.ts` is a
streaming fallback for snapshots too large to `JSON.parse`.
Baseline attribution at tick 12 000: player `_tiles`/`_borderTiles` Sets
**83 MB**, GameMap `refToX`/`refToY` lookup tables **38 MB**, two
duplicate 30.5 MB visited-scratch arrays, trade-ship stepper paths **15
MB**, a construction-only flood-fill queue **9.5 MB**.
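The footprint mode described above can be approximated as follows. This is a minimal sketch, not the harness's code: `sampleFootprint` and `runWithFootprint` are hypothetical names. The only real APIs used are `gc()` (defined only when Node runs with `--expose-gc`) and `process.memoryUsage()`.

```typescript
// Hypothetical helper names; only gc() and process.memoryUsage() are real Node APIs.
interface FootprintSample {
  tick: number;
  heapUsedMB: number;
  arrayBuffersMB: number;
  rssMB: number;
}

const MB = 1024 * 1024;

function sampleFootprint(tick: number): FootprintSample {
  const g = globalThis as any;
  // gc() exists only under --expose-gc; without it the sample still
  // works but includes garbage that has not been collected yet.
  if (typeof g.gc === "function") g.gc();
  const m = g.process.memoryUsage();
  return {
    tick,
    heapUsedMB: m.heapUsed / MB,
    arrayBuffersMB: m.arrayBuffers / MB,
    rssMB: m.rss / MB,
  };
}

// Runs `step` once per tick and samples memory at every window boundary.
function runWithFootprint(
  totalTicks: number,
  window: number,
  step: (tick: number) => void,
): FootprintSample[] {
  const samples: FootprintSample[] = [];
  for (let tick = 1; tick <= totalTicks; tick++) {
    step(tick);
    if (tick % window === 0) samples.push(sampleFootprint(tick));
  }
  return samples;
}
```

Sampling only at window boundaries keeps the cost of the forced collections off the per-tick timings in between.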
## Optimizations
**Map-sized buffers (second commit):**
- `GameMap.x()/y()` compute `ref % width` / `(ref / width) | 0` instead
of reading two per-tile Uint16 tables (−38 MB). The arithmetic is
cheaper than the tables' random-access cache misses — this is where the
speedup comes from.
- `PlayerExecution` and `SpatialQuery` each kept their own per-game
generation-stamped visited `Uint32Array`; both now share one via
`TileTraversalScratch` (−30 MB).
- `PathFinderStepper` stores numeric paths as `Uint32Array` (half the
bytes; steppers hold their full path for a unit's whole journey).
- `ConnectedComponents` frees its flood-fill queue after `initialize()`.
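Two of the ideas above lend themselves to a short sketch. It assumes tile refs are row-major indices (`ref = y * width + x`), which is what the `ref % width` arithmetic implies. `GridCoords` and `VisitedScratch` are hypothetical stand-ins, not the actual `GameMap` or `TileTraversalScratch` classes.

```typescript
// Hypothetical stand-in for the coordinate part of a map.
class GridCoords {
  readonly width: number;
  readonly height: number;
  constructor(width: number, height: number) {
    this.width = width;
    this.height = height;
  }
  ref(x: number, y: number): number {
    return y * this.width + x;
  }
  // Arithmetic replaces two per-tile lookup tables.
  x(ref: number): number {
    return ref % this.width;
  }
  y(ref: number): number {
    return (ref / this.width) | 0;
  }
}

// Hypothetical generation-stamped visited set that several traversals can share.
class VisitedScratch {
  private readonly stamps: Uint32Array;
  private generation = 0;
  constructor(tileCount: number) {
    this.stamps = new Uint32Array(tileCount);
  }
  // Starting a traversal bumps the generation, which invalidates every
  // earlier mark in O(1) instead of clearing the whole array.
  begin(): void {
    if (this.generation === 0xffffffff) {
      // The stamp would overflow Uint32, so reset once and start over.
      this.stamps.fill(0);
      this.generation = 0;
    }
    this.generation++;
  }
  // Returns true the first time a ref is seen in the current traversal.
  visit(ref: number): boolean {
    if (this.stamps[ref] === this.generation) return false;
    this.stamps[ref] = this.generation;
    return true;
  }
}
```

Sharing one scratch array is only safe if traversals never interleave: a nested `begin()` would invalidate the outer traversal's marks.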
**Player tile sets (third commit):**
- New `TileSet`: insertion-ordered set of tile refs backed by a dense
`Uint32Array` plus an open-addressing hash index — ~12 bytes/element vs
~34 for a native `Set<number>`. Deletes tombstone; compaction is
deferred while iteration is in progress so positions never shift under
an iterator.
- Iteration semantics match `Set` exactly (insertion order, entries
added mid-iteration visited, deleted ones skipped, delete+re-add moves
to end) — the simulation relies on this order for determinism, and the
unchanged hash confirms it.
- `Player.borderTiles()` now returns `ReadonlyTileSet` (a native `Set`
still satisfies it structurally); `GameRunner.playerBorderTiles` copies
into a real `Set` since that result crosses the worker boundary via
structured clone.
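The data structure described above might look roughly like the sketch below. It is an illustration under stated assumptions, not the repository's `TileSet`. The class name, the hash constant, the growth policy, and the use of `0xffffffff` as a tombstone (so it can never be a valid tile ref) are all choices made for this example, and iteration is exposed through `forEach` rather than an iterator to keep it short.

```typescript
const TOMBSTONE = 0xffffffff; // assumed never to be a valid tile ref
const EMPTY = -1;

class TileSetSketch {
  private dense = new Uint32Array(8); // refs in insertion order, plus tombstones
  private index = new Int32Array(16).fill(EMPTY); // hash slot -> position in dense
  private used = 0; // dense slots consumed, tombstones included
  private live = 0; // refs currently in the set
  private iterating = 0; // active forEach depth; compaction waits for 0

  get size(): number {
    return this.live;
  }

  // Linear probing: stops at the slot holding `ref` or at the first empty slot.
  private slotOf(ref: number): number {
    const mask = this.index.length - 1;
    let slot = (Math.imul(ref, 0x9e3779b1) >>> 0) & mask;
    while (this.index[slot] !== EMPTY && this.dense[this.index[slot]] !== ref) {
      slot = (slot + 1) & mask;
    }
    return slot;
  }

  has(ref: number): boolean {
    return this.index[this.slotOf(ref)] !== EMPTY;
  }

  add(ref: number): void {
    if (this.has(ref)) return;
    if (this.used === this.dense.length) this.rebuild();
    const pos = this.used++;
    this.dense[pos] = ref;
    this.index[this.slotOf(ref)] = pos;
    this.live++;
  }

  delete(ref: number): boolean {
    const pos = this.index[this.slotOf(ref)];
    if (pos === EMPTY) return false;
    // The index entry stays put; it can no longer match any valid ref.
    this.dense[pos] = TOMBSTONE;
    this.live--;
    return true;
  }

  forEach(fn: (ref: number) => void): void {
    this.iterating++;
    try {
      // `used` is re-read each step, so refs appended by fn are visited too.
      for (let i = 0; i < this.used; i++) {
        const ref = this.dense[i];
        if (ref !== TOMBSTONE) fn(ref);
      }
    } finally {
      this.iterating--;
    }
  }

  toArray(): number[] {
    const out: number[] = [];
    this.forEach((ref) => out.push(ref));
    return out;
  }

  // Grows the arrays. Tombstones are dropped only when nothing is iterating,
  // so positions never shift under an in-progress forEach.
  private rebuild(): void {
    const old = this.dense;
    const oldUsed = this.used;
    const compact = this.iterating === 0;
    const keep = compact ? this.live : oldUsed;
    let cap = 8;
    while (cap < keep * 2) cap *= 2;
    this.dense = new Uint32Array(cap);
    this.index = new Int32Array(cap * 2).fill(EMPTY);
    this.used = 0;
    for (let i = 0; i < oldUsed; i++) {
      const ref = old[i];
      if (compact && ref === TOMBSTONE) continue;
      const pos = this.used++;
      this.dense[pos] = ref;
      if (ref !== TOMBSTONE) this.index[this.slotOf(ref)] = pos;
    }
  }
}
```

Because a delete leaves a tombstone and a re-add appends, deleting a ref and adding it again moves it to the end, the same as a native `Set`.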
## Footprint curve (Giant World Map, live MB after forced GC)
| checkpoint | before (heap + buf) | after (heap + buf) |
|---|---|---|
| spawn end | 20 + 100 buf | 20 + 55 buf |
| tick 6301 | 119 + 161 buf | 29 + 127 buf |
| tick 12301 | 130 + 161 buf | 32 + 129 buf |
## Validation
- Final hash `57830793797434300` identical across baseline / round 1 /
round 2 runs (12 000 ticks).
- Full suite passes (1798 + 126 tests), including new `TileSet` tests:
order semantics, mutation-during-iteration parity with `Set`, tombstone
compaction, and a 20 000-op randomized differential test against native
`Set`.
- Runs recorded in
`tests/perf/output/footprint-{baseline,round1,round2}-giant.txt`.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
|
||
|
|
5e4b2791aa |
perf: reduce core-sim GC churn 42% and add GC-churn profiling to the perf harness (#4494)
## Summary
Reduces core-simulation GC churn by **42%** on a 20-game-minute Giant
World Map run, and extends the headless full-game perf harness so churn
is measurable and regressions are visible.
### 1. GC-churn measurement (`tests/perf/fullgame/GcProfiler.ts`)
`npm run perf:game` now reports:
- **GC pauses** by kind (minor/major/incremental) via a
`PerformanceObserver` on `'gc'` entries, bucketed into tick windows by
timestamp (V8 only delivers these entries on a timer task, so they're
flushed after the run)
- **Allocation rate** per `--window N` ticks (default 1000) from
used-heap deltas sampled every tick, so churn can be tracked across game
phases
- **Top allocating functions** from the V8 sampling heap profiler with
`includeObjectsCollectedBy{Major,Minor}GC` — i.e. actual churn including
short-lived garbage, not live memory — plus a `.heapprofile` loadable in
Chrome DevTools (Memory → Allocation sampling)
New flags: `--window N`, `--no-gc-profile`, `--no-alloc-profile`.
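The "bucketed into tick windows by timestamp" step can be sketched as a pure function. It assumes the harness recorded the start timestamp of every tick on the same clock as the GC entries; that array, the `GcPause` shape, and the function name are hypothetical. Mapping V8's GC kind flags to `minor`/`major`/`incremental` is assumed to have happened already.

```typescript
type GcKind = "minor" | "major" | "incremental";

interface GcPause {
  startTime: number; // ms, same clock as tickStartTimes
  duration: number; // ms
  kind: GcKind;
}

interface WindowStats {
  firstTick: number;
  minor: number;
  major: number;
  incremental: number;
  pauseMs: number;
}

// tickStartTimes[i] is when tick i began; values must be ascending.
function bucketGcPauses(
  pauses: GcPause[],
  tickStartTimes: number[],
  window: number,
): WindowStats[] {
  const windows: WindowStats[] = [];
  for (let t = 0; t < tickStartTimes.length; t += window) {
    windows.push({ firstTick: t, minor: 0, major: 0, incremental: 0, pauseMs: 0 });
  }
  for (const p of pauses) {
    // Binary search for the last tick that started at or before the pause.
    let lo = 0;
    let hi = tickStartTimes.length - 1;
    let tick = 0;
    while (lo <= hi) {
      const mid = (lo + hi) >> 1;
      if (tickStartTimes[mid] <= p.startTime) {
        tick = mid;
        lo = mid + 1;
      } else {
        hi = mid - 1;
      }
    }
    const w = windows[Math.floor(tick / window)];
    if (!w) continue; // no ticks were recorded
    w[p.kind]++;
    w.pauseMs += p.duration;
  }
  return windows;
}
```

Since the entries only arrive after the run, attributing them by timestamp is what makes per-window reporting possible at all. Pauses that start before the first tick fall into the first window in this sketch.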
### 2. Allocation reductions in the hot paths it found
| Site | Change |
|---|---|
| `GameMap.bfs` | inline neighbor enumeration instead of an array per visited tile |
| `GameMap`/`Game` | new `forEachNeighborNSWE` — allocation-free iterator matching `neighbors()` N,S,W,E order for order-sensitive callers (`forEachNeighbor` visits W,E,N,S, so substituting it would change sim behavior) |
| `PlayerImpl.nearby` / `sharesBorderWith` / `shoreReachableNeighbors` | no per-call neighbor arrays; no materialized shore-tile array |
| `PlayerImpl.units(types)` | gather into a reusable scratch buffer, return one exact-size slice (still a fresh snapshot array per call) |
| `AiAttackBehavior.maybeAttack` | single pass over border neighbors replacing the `flatMap`/`filter`/`map` chain over every border tile |
| `AiAttackBehavior.isBorderingNukedTerritory` | reusable `neighbors4` buffer with early exit |
| `SharedWaterCache.build` | allocation-free neighbor iteration |
| `SpatialQuery.bfsNearest` | first-minimum scan instead of collect-then-stable-sort (identical result incl. tie-breaking) |
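The difference between an allocating `neighbors()` and a callback-based iterator in the same order can be sketched as follows. `GridSketch` is a hypothetical stand-in that assumes row-major tile refs and takes north to mean `ref - width`; the real `GameMap` may differ in both respects.

```typescript
// Hypothetical grid; assumes ref = y * width + x and that north is ref - width.
class GridSketch {
  readonly width: number;
  readonly height: number;
  constructor(width: number, height: number) {
    this.width = width;
    this.height = height;
  }

  // Allocating version: builds a fresh array on every call.
  neighbors(ref: number): number[] {
    const x = ref % this.width;
    const y = (ref / this.width) | 0;
    const out: number[] = [];
    if (y > 0) out.push(ref - this.width); // N
    if (y < this.height - 1) out.push(ref + this.width); // S
    if (x > 0) out.push(ref - 1); // W
    if (x < this.width - 1) out.push(ref + 1); // E
    return out;
  }

  // Allocation-free version: same N,S,W,E order, no intermediate array.
  forEachNeighborNSWE(ref: number, fn: (neighbor: number) => void): void {
    const x = ref % this.width;
    const y = (ref / this.width) | 0;
    if (y > 0) fn(ref - this.width);
    if (y < this.height - 1) fn(ref + this.width);
    if (x > 0) fn(ref - 1);
    if (x < this.width - 1) fn(ref + 1);
  }
}
```

The iterator only avoids allocation if the callback itself is not re-created per call, so a hot caller would typically hoist the closure or use a preallocated one.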
### Results (Giant World Map, 400 bots, 12,000 ticks ≈ 20 game-minutes, seed `perf-default`)
| Metric | Before | After |
|---|---|---|
| Sampled allocations (incl. collected) | 97.7 GB | **56.9 GB (−42%)** |
| GC count / total pause | 1,682 / 3,313 ms (1.8% of wall) | 1,058 / 2,087 ms (1.2%) |
| Ticks/sec | 66 | 70 |
| p99 / max tick | 49.9 ms / 988 ms | 43.5 ms / 689 ms |
| Ticks over 100 ms budget | 31 | 19 |
## Determinism
Every rewrite preserves exact iteration order (the new NSWE iterator
exists precisely for the order-sensitive sites). Verified by identical
final game-state hashes on three runs: Giant World Map 12,000 ticks
(`67286276735690560`), Giant World Map 2,000 ticks, and World 1,800
ticks.
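The tie-breaking claim for the first-minimum scan can be illustrated with a small sketch. A stable sort keeps equal keys in input order, so its first element is the earliest minimum, and a scan with a strict `<` picks that same element. The `Candidate` type and both function names are hypothetical, and distances are assumed to be finite numbers.

```typescript
interface Candidate {
  id: string;
  distance: number;
}

// Collect-then-sort: copies the array, sorts it, takes the head.
// Array.prototype.sort is guaranteed stable since ES2019.
function nearestBySort(cs: Candidate[]): Candidate | undefined {
  return cs.slice().sort((a, b) => a.distance - b.distance)[0];
}

// First-minimum scan: the strict `<` keeps the earliest of equal minima,
// which is the element a stable sort would put first.
function nearestByScan(cs: Candidate[]): Candidate | undefined {
  let best: Candidate | undefined;
  for (const c of cs) {
    if (best === undefined || c.distance < best.distance) best = c;
  }
  return best;
}
```

Using `<=` instead would return the last of the tied candidates and could change which target the simulation picks.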
## Test plan
- [x] Full suite green (1,896 tests)
- [x] New tests: `forEachNeighborNSWE` order contract vs `neighbors()`
over every tile; `units()` filtering semantics (insertion order,
fresh-array guarantee, duplicate types, Set path)
- [x] Final-hash equality on 3 seeded headless runs (2 maps)
🤖 Generated with [Claude Code](https://claude.com/claude-code)
---------
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
|
||
|
|
8da2291a49 |
Add full-game perf harness for the core simulation (#4228)
## Summary
Adds a full-game performance harness under `tests/perf/fullgame/` that runs the **real simulation pipeline** headlessly — `GameRunner` + `Executor` with the real `Config`, nations from the map manifest, and bots on a production map from `resources/maps/` — for a configurable number of ticks, then reports where the time goes.
```bash
npm run perf:game # world, 400 bots, 1800 ticks
npm run perf:game -- --map giantworldmap --ticks 3600
npm run perf:game -- --no-exec-profile # purest CPU profile (no timing wrappers)
```
## What it reports
1. **Per-tick wall time** — mean / p50 / p95 / p99 / max, count of ticks over the 100ms budget, and the slowest ticks by tick number.
2. **Time per Execution class** — every `Execution`'s `init()`/`tick()` is timed and aggregated by class name (`AttackExecution`, `NationExecution`, …).
3. **Top functions by self time** — via the V8 sampling profiler (`node:inspector`), so no instrumentation skew. Also writes a `.cpuprofile` to `tests/perf/output/` (gitignored) that opens in Chrome DevTools as a flame graph.
## Determinism
The run is fully deterministic for a given `--seed`/`--map`/`--bots` (verified: identical final hashes across runs), and the final game-state hash is printed — so an optimization can be checked to not change simulation behavior.
## Sample output (world, 400 bots, 1800 ticks)
```
--- Per-tick wall time (game phase) ---
mean 9.04ms | p50 7.90ms | p95 17.1ms | p99 21.5ms | max 31.7ms
Over 100ms budget: 0 / 1800 ticks
--- Time by Execution class ---
execution               total ms     %  tick ms  init ms   ticks  instances
AttackExecution             6568  48.8     6288      280  212536       4200
PlayerExecution             2832  21.0     2832     0.36  492049        472
NationExecution             2508  18.6     2508     0.23  144654         72
TransportShipExecution       703   5.2     96.0      607   30440        257
...
--- Top functions by self time (V8 sampling profiler) ---
self ms    %  function                 location
   1065  6.5  forEachNeighborWithDiag  src/core/game/GameImpl.ts
    979  6.0  conquer                  src/core/game/GameImpl.ts
    948  5.8  (anonymous)              src/core/execution/AttackExecution.ts
    595  3.6  toFullUpdate             src/core/game/PlayerImpl.ts
...
```
The harness lives in a subdirectory so the existing `npm run perf` micro-benchmark runner (which globs `tests/perf/*.ts`) doesn't pick it up.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-authored-by: Claude Fable 5 <noreply@anthropic.com>
|