OpenFrontIO

alois/OpenFrontIO

mirror of https://github.com/openfrontio/OpenFrontIO.git synced 2026-07-05 12:10:45 +00:00

Commit Graph

Author	SHA1	Message	Date

Evan

7fa81c6bb9

perf: reduce core live-memory footprint by 45% on large maps (#4507)

## Summary

Reduces the simulation's steady-state memory footprint. On Giant World
Map at 20 game-minutes (12 000 ticks, 400 bots, seed `perf-default`),
live memory after a full GC drops **293 MB → 161 MB (−45%)**; unforced
peak heap drops **326 MB → 165 MB**. The simulation also runs ~10%
faster (85 → 94 ticks/s). The final game-state hash is **bit-identical**
(`57830793797434300`) — no behavior change.

## Measurement (first commit)

The full-game perf harness gains a footprint mode:

- `--footprint` — forces a full GC at every `--window` boundary and
records the live heap / ArrayBuffer / RSS curve across the game
(requires `NODE_OPTIONS=--expose-gc`).
- `--snapshot-at 0,2000,12000` — writes V8 `.heapsnapshot` files at
chosen ticks.
- `HeapSnapshotRetainers.ts` — attributes every heap node to its nearest
meaningfully-named retainer (e.g. `PlayerImpl._tiles`), plus prints
retainer chains for all nodes ≥128 KB. `HeapSnapshotSummary.ts` is a
streaming fallback for snapshots too large to `JSON.parse`.

Baseline attribution at tick 12 000: player `_tiles`/`_borderTiles` Sets
**83 MB**, GameMap `refToX`/`refToY` lookup tables **38 MB**, two
duplicate 30.5 MB visited-scratch arrays, trade-ship stepper paths **15
MB**, a construction-only flood-fill queue **9.5 MB**.

## Optimizations

**Map-sized buffers (second commit):**
- `GameMap.x()/y()` compute `ref % width` / `(ref / width) | 0` instead
of reading two per-tile Uint16 tables (−38 MB). The arithmetic is
cheaper than the tables' random-access cache misses — this is where the
speedup comes from.
- `PlayerExecution` and `SpatialQuery` each kept their own per-game
generation-stamped visited `Uint32Array`; both now share one via
`TileTraversalScratch` (−30 MB).
- `PathFinderStepper` stores numeric paths as `Uint32Array` (half the
bytes; steppers hold their full path for a unit's whole journey).
- `ConnectedComponents` frees its flood-fill queue after `initialize()`.
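
Two of the map-sized-buffer ideas above can be sketched as follows. This is a minimal illustration, not the repository's code: `VisitedScratch`, `xOf`, and `yOf` are hypothetical names, and the sketch assumes tile refs are row-major indices into a `width`-wide grid. It shows coordinates derived by arithmetic instead of lookup tables, and a generation-stamped visited array that one owner can lend to several traversals without clearing it between them.

```typescript
// Hypothetical names throughout; a sketch of the technique, not the actual classes.

// Coordinates derived from a row-major tile ref, replacing per-tile lookup tables.
function xOf(ref: number, width: number): number {
  return ref % width;
}

function yOf(ref: number, width: number): number {
  return (ref / width) | 0; // truncating division
}

// One shared visited buffer. Each traversal bumps the generation instead of
// clearing the array, so starting a traversal costs O(1).
class VisitedScratch {
  private readonly stamps: Uint32Array;
  private generation = 0;

  constructor(tileCount: number) {
    this.stamps = new Uint32Array(tileCount);
  }

  // Begin a new traversal; every tile becomes "unvisited" again.
  begin(): void {
    this.generation++;
    if (this.generation === 0xffffffff) {
      // Stamp space is nearly exhausted: clear once so old stamps cannot collide.
      this.stamps.fill(0);
      this.generation = 1;
    }
  }

  // Marks the tile and reports whether it had already been visited
  // during the current traversal.
  visit(ref: number): boolean {
    if (this.stamps[ref] === this.generation) return true;
    this.stamps[ref] = this.generation;
    return false;
  }
}
```

Sharing one such buffer is only safe if traversals never interleave; that is an assumption of this sketch.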

**Player tile sets (third commit):**
- New `TileSet`: insertion-ordered set of tile refs backed by a dense
`Uint32Array` plus an open-addressing hash index — ~12 bytes/element vs
~34 for a native `Set<number>`. Deletes tombstone; compaction is
deferred while iteration is in progress so positions never shift under
an iterator.
- Iteration semantics match `Set` exactly (insertion order, entries
added mid-iteration visited, deleted ones skipped, delete+re-add moves
to end) — the simulation relies on this order for determinism, and the
unchanged hash confirms it.
- `Player.borderTiles()` now returns `ReadonlyTileSet` (a native `Set`
still satisfies it structurally); `GameRunner.playerBorderTiles` copies
into a real `Set` since that result crosses the worker boundary via
structured clone.
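
The tombstone and deferred-compaction behavior described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the real `TileSet`: the class name `SketchTileSet` is hypothetical, a native `Map` stands in for the open-addressing hash index (so it does not reproduce the memory savings), and `0xffffffff` is assumed never to be a valid tile ref.

```typescript
// Hypothetical sketch; a Map replaces the open-addressing index for brevity.
const TOMBSTONE = 0xffffffff; // assumed never to be a real tile ref

class SketchTileSet {
  private slots = new Uint32Array(8); // dense, insertion-ordered storage
  private used = 0; // slots written so far, tombstones included
  private dead = 0; // number of tombstoned slots
  private index = new Map<number, number>(); // ref -> slot position
  private activeIterators = 0;

  get size(): number {
    return this.index.size;
  }

  has(ref: number): boolean {
    return this.index.has(ref);
  }

  add(ref: number): void {
    if (this.index.has(ref)) return;
    if (this.used === this.slots.length) {
      const grown = new Uint32Array(this.slots.length * 2);
      grown.set(this.slots);
      this.slots = grown;
    }
    this.slots[this.used] = ref;
    this.index.set(ref, this.used++);
  }

  delete(ref: number): boolean {
    const pos = this.index.get(ref);
    if (pos === undefined) return false;
    this.slots[pos] = TOMBSTONE; // leave a hole; later positions do not shift
    this.index.delete(ref);
    this.dead++;
    this.maybeCompact();
    return true;
  }

  *[Symbol.iterator](): Generator<number> {
    this.activeIterators++;
    try {
      // Re-read `used` each step so entries appended mid-iteration are visited.
      for (let i = 0; i < this.used; i++) {
        const ref = this.slots[i];
        if (ref !== TOMBSTONE) yield ref;
      }
    } finally {
      this.activeIterators--;
      this.maybeCompact();
    }
  }

  // Squeeze out tombstones, but only when no iterator could observe the shift.
  private maybeCompact(): void {
    if (this.activeIterators > 0 || this.dead * 2 <= this.used) return;
    let write = 0;
    for (let read = 0; read < this.used; read++) {
      const ref = this.slots[read];
      if (ref === TOMBSTONE) continue;
      this.slots[write] = ref;
      this.index.set(ref, write++);
    }
    this.used = write;
    this.dead = 0;
  }
}
```

Iterating `{1, 2, 3}` and, while visiting `1`, deleting `2`, adding `4`, then deleting and re-adding `3` yields `1, 4, 3`, the same order a native `Set` produces. The iterator count is only released when the loop finishes or exits through `break`/`return`; an iterator abandoned after manual `next()` calls would keep compaction deferred in this sketch.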

## Footprint curve (Giant World Map, live MB after forced GC)

| checkpoint | before | after |
|---|---|---|
| spawn end | 20 + 100 buf | 20 + 55 buf |
| tick 6301 | 119 + 161 buf | 29 + 127 buf |
| tick 12301 | 130 + 161 buf | 32 + 129 buf |

## Validation

- Final hash `57830793797434300` identical across baseline / round 1 /
round 2 runs (12 000 ticks).
- Full suite passes (1798 + 126 tests), including new `TileSet` tests:
order semantics, mutation-during-iteration parity with `Set`, tombstone
compaction, and a 20 000-op randomized differential test against native
`Set`.
- Runs recorded in
`tests/perf/output/footprint-{baseline,round1,round2}-giant.txt`.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

2026-07-04 15:25:29 -07:00

Evan

5e4b2791aa

perf: reduce core-sim GC churn 42% and add GC-churn profiling to the perf harness (#4494)

## Summary

Reduces core-simulation GC churn by **42%** on a 20-game-minute Giant
World Map run, and extends the headless full-game perf harness so churn
is measurable and regressions are visible.

### 1. GC-churn measurement (`tests/perf/fullgame/GcProfiler.ts`)

`npm run perf:game` now reports:

- **GC pauses** by kind (minor/major/incremental) via a
`PerformanceObserver` on `'gc'` entries, bucketed into tick windows by
timestamp (V8 only delivers these entries on a timer task, so they're
flushed after the run)
- **Allocation rate** per `--window N` ticks (default 1000) from
used-heap deltas sampled every tick, so churn can be tracked across game
phases
- **Top allocating functions** from the V8 sampling heap profiler with
`includeObjectsCollectedBy{Major,Minor}GC` — i.e. actual churn including
short-lived garbage, not live memory — plus a `.heapprofile` loadable in
Chrome DevTools (Memory → Allocation sampling)
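
As a rough illustration of the allocation-rate bullet, the sketch below turns per-tick used-heap samples into bytes allocated per window. The function name `allocatedPerWindow` is hypothetical, and how the real profiler treats ticks where the heap shrinks is not stated here; this sketch assumes negative deltas (a GC ran) are simply ignored, which undercounts allocation on those ticks.

```typescript
// Hypothetical helper; the treatment of negative deltas is an assumption.
function allocatedPerWindow(
  heapUsedPerTick: readonly number[],
  windowTicks: number,
): number[] {
  const windows: number[] = [];
  for (let t = 1; t < heapUsedPerTick.length; t++) {
    const delta = heapUsedPerTick[t] - heapUsedPerTick[t - 1];
    const w = Math.floor((t - 1) / windowTicks);
    if (windows.length <= w) windows.push(0);
    // A drop in used heap means a collection happened during this tick;
    // the bytes allocated before it cannot be recovered from the delta.
    if (delta > 0) windows[w] += delta;
  }
  return windows;
}

// Five samples, two ticks per window.
const perWindow = allocatedPerWindow([100, 150, 120, 180, 200], 2);
console.log(perWindow); // [50, 80]
```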

New flags: `--window N`, `--no-gc-profile`, `--no-alloc-profile`.

### 2. Allocation reductions in the hot paths it found

| Site | Change |
|---|---|
| `GameMap.bfs` | inline neighbor enumeration instead of an array per visited tile |
| `GameMap`/`Game` | new `forEachNeighborNSWE` — allocation-free iterator matching `neighbors()` N,S,W,E order for order-sensitive callers (`forEachNeighbor` visits W,E,N,S, so substituting it would change sim behavior) |
| `PlayerImpl.nearby` / `sharesBorderWith` / `shoreReachableNeighbors` | no per-call neighbor arrays; no materialized shore-tile array |
| `PlayerImpl.units(types)` | gather into a reusable scratch buffer, return one exact-size slice (still a fresh snapshot array per call) |
| `AiAttackBehavior.maybeAttack` | single pass over border neighbors replacing the `flatMap`/`filter`/`map` chain over every border tile |
| `AiAttackBehavior.isBorderingNukedTerritory` | reusable `neighbors4` buffer with early exit |
| `SharedWaterCache.build` | allocation-free neighbor iteration |
| `SpatialQuery.bfsNearest` | first-minimum scan instead of collect-then-stable-sort (identical result incl. tie-breaking) |
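
A callback-style neighbor iterator of the kind the table describes might look like the sketch below. It is an assumption-laden illustration rather than the real `forEachNeighborNSWE`: the standalone signature is hypothetical, refs are assumed to be row-major indices, and "north" is assumed to mean the row with the smaller `y`.

```typescript
// Hypothetical standalone version; the real method lives on GameMap/Game.
// Visits the 4-connected neighbors in N, S, W, E order without building an array.
function forEachNeighborNSWE(
  ref: number,
  width: number,
  height: number,
  visit: (neighbor: number) => void,
): void {
  const x = ref % width;
  const y = (ref / width) | 0;
  if (y > 0) visit(ref - width); // N (assumed: smaller y)
  if (y < height - 1) visit(ref + width); // S
  if (x > 0) visit(ref - 1); // W
  if (x < width - 1) visit(ref + 1); // E
}

// Usage: collect the order for the center tile of a 3x3 grid.
const order: number[] = [];
forEachNeighborNSWE(4, 3, 3, (n) => order.push(n));
console.log(order); // [1, 7, 3, 5]
```

Because the visiting order is fixed in the function body, any caller whose result depends on which neighbor is seen first behaves the same as it did with an array returned in N,S,W,E order.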

### Results (Giant World Map, 400 bots, 12,000 ticks ≈ 20 game-minutes, seed `perf-default`)

| Metric | Before | After |
|---|---|---|
| Sampled allocations (incl. collected) | 97.7 GB | **56.9 GB (−42%)** |
| GC count / total pause | 1,682 / 3,313 ms (1.8% of wall) | 1,058 / 2,087 ms (1.2%) |
| Ticks/sec | 66 | 70 |
| p99 / max tick | 49.9 ms / 988 ms | 43.5 ms / 689 ms |
| Ticks over 100 ms budget | 31 | 19 |

## Determinism

Every rewrite preserves exact iteration order (the new NSWE iterator
exists precisely for the order-sensitive sites). Verified by identical
final game-state hashes on three runs: Giant World Map 12,000 ticks
(`67286276735690560`), Giant World Map 2,000 ticks, and World 1,800
ticks.
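
The `SpatialQuery.bfsNearest` change is a useful example of why order is preserved. The sketch below, with hypothetical names `firstMinimum` and `viaStableSort`, shows that a single scan using a strict `<` comparison returns the same element as sorting with a stable sort and taking the first entry, including when costs tie. It assumes costs are ordinary finite numbers (no `NaN`).

```typescript
// Hypothetical helpers illustrating the equivalence; not the repository's code.

// Single pass: strict `<` keeps the earliest element among equal costs.
function firstMinimum<T>(
  items: readonly T[],
  cost: (item: T) => number,
): T | undefined {
  let best: T | undefined;
  let bestCost = Infinity;
  let found = false;
  for (const item of items) {
    const c = cost(item);
    if (!found || c < bestCost) {
      best = item;
      bestCost = c;
      found = true;
    }
  }
  return best;
}

// Baseline: copy, stable-sort by cost, take the head.
// Array.prototype.sort has been required to be stable since ES2019.
function viaStableSort<T>(
  items: readonly T[],
  cost: (item: T) => number,
): T | undefined {
  return [...items].sort((a, b) => cost(a) - cost(b))[0];
}

const candidates = [
  { id: "a", dist: 3 },
  { id: "b", dist: 1 },
  { id: "c", dist: 1 }, // ties with "b"; "b" came first, so "b" must win
  { id: "d", dist: 2 },
];
console.log(firstMinimum(candidates, (c) => c.dist)?.id); // "b"
console.log(viaStableSort(candidates, (c) => c.dist)?.id); // "b"
```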

## Test plan

- [x] Full suite green (1,896 tests)
- [x] New tests: `forEachNeighborNSWE` order contract vs `neighbors()`
over every tile; `units()` filtering semantics (insertion order,
fresh-array guarantee, duplicate types, Set path)
- [x] Final-hash equality on 3 seeded headless runs (2 maps)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

2026-07-03 12:30:28 -07:00

Evan

8da2291a49

Add full-game perf harness for the core simulation (#4228)

## Summary

Adds a full-game performance harness under `tests/perf/fullgame/` that
runs the **real simulation pipeline** headlessly — `GameRunner` +
`Executor` with the real `Config`, nations from the map manifest, and
bots on a production map from `resources/maps/` — for a configurable
number of ticks, then reports where the time goes.

```bash
npm run perf:game                                        # world, 400 bots, 1800 ticks
npm run perf:game -- --map giantworldmap --ticks 3600
npm run perf:game -- --no-exec-profile                   # purest CPU profile (no timing wrappers)
```

## What it reports

1. **Per-tick wall time** — mean / p50 / p95 / p99 / max, count of ticks
over the 100ms budget, and the slowest ticks by tick number.
2. **Time per Execution class** — every `Execution`'s `init()`/`tick()`
is timed and aggregated by class name (`AttackExecution`,
`NationExecution`, …).
3. **Top functions by self time** — via the V8 sampling profiler
(`node:inspector`), so no instrumentation skew. Also writes a
`.cpuprofile` to `tests/perf/output/` (gitignored) that opens in Chrome
DevTools as a flame graph.
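
For readers who want to reproduce the per-tick summary from their own timings, a minimal sketch follows. The name `summarizeTicks` and the nearest-rank percentile definition are assumptions; the harness may compute percentiles differently.

```typescript
// Hypothetical helper; nearest-rank percentiles are an assumption of this sketch.
interface TickStats {
  mean: number;
  p50: number;
  p95: number;
  p99: number;
  max: number;
  overBudget: number; // ticks strictly above the budget
}

function summarizeTicks(tickMs: readonly number[], budgetMs = 100): TickStats {
  if (tickMs.length === 0) throw new Error("no ticks recorded");
  const sorted = [...tickMs].sort((a, b) => a - b);
  const n = sorted.length;
  // Nearest-rank: the smallest value with at least p% of samples at or below it.
  const pct = (p: number): number => {
    const rank = Math.ceil((p * n) / 100);
    return sorted[Math.min(n - 1, Math.max(0, rank - 1))];
  };
  let sum = 0;
  let overBudget = 0;
  for (const ms of tickMs) {
    sum += ms;
    if (ms > budgetMs) overBudget++;
  }
  return {
    mean: sum / n,
    p50: pct(50),
    p95: pct(95),
    p99: pct(99),
    max: sorted[n - 1],
    overBudget,
  };
}
```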

## Determinism

The run is fully deterministic for a given `--seed`/`--map`/`--bots`
(verified: identical final hashes across runs), and the final game-state
hash is printed — so an optimization can be checked to not change
simulation behavior.

## Sample output (world, 400 bots, 1800 ticks)

```
--- Per-tick wall time (game phase) ---
mean 9.04ms | p50 7.90ms | p95 17.1ms | p99 21.5ms | max 31.7ms
Over 100ms budget: 0 / 1800 ticks

--- Time by Execution class ---
execution                      total ms  %     tick ms  init ms  ticks   instances
AttackExecution                6568      48.8  6288     280      212536  4200
PlayerExecution                2832      21.0  2832     0.36     492049  472
NationExecution                2508      18.6  2508     0.23     144654  72
TransportShipExecution         703       5.2   96.0     607      30440   257
...

--- Top functions by self time (V8 sampling profiler) ---
self ms  %    function                 location
1065     6.5  forEachNeighborWithDiag  src/core/game/GameImpl.ts
979      6.0  conquer                  src/core/game/GameImpl.ts
948      5.8  (anonymous)              src/core/execution/AttackExecution.ts
595      3.6  toFullUpdate             src/core/game/PlayerImpl.ts
...
```

The harness lives in a subdirectory so the existing `npm run perf`
micro-benchmark runner (which globs `tests/perf/*.ts`) doesn't pick it
up.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-authored-by: Claude Fable 5 <noreply@anthropic.com>

2026-06-11 18:52:18 -07:00

3 Commits