852 Commits over 243 Days - 0.15cph!
Optim: replacing GetWaterLevels hot path with burst jobs
- Converted some internal NativeArrays to persistent, lazily growing ones to reduce allocation overhead
A 25-player case shows a 25% improvement (5 microseconds), though the sample size is too small (not enough players to check). I need a good way to test this at larger scales, or these 5-minute waiting times will murder me.
Tests: ran unit tests, ran staging demo multiple times, water check counters are in the expected range.
Bugfix: properly access count from CoarseQueryGridBoundsJobIndirect
Tests: ran unit tests
Merge: from erosion
Bringing across TerrainMap NativeArray conversion that I need for my batch checks
Tests: reran TerrainMap tests
Merge: from main
Aligning with erosion branch parent CL
Tests: none, no conflicts
Update: added GetTopologies jobs for TopologyMap
- Also covered with tests
- Sprinkled [WriteOnly]/[ReadOnly] attributes into jobs
- Fixed constant string allocation in tests
Tests: ran unit tests
Update: adding WaterMap height sampling burst jobs
- Also covered them with tests to validate output against non-job calls
Tests: ran unit tests
Merge: from terrainmap_nativearray2
- Fixed a bug of invalid reinterpret
Bringing across my tests since we're both working on the same thing.
Tests: ran the new tests
Add: covering TerrainMap public api in tests
Prep for switching over to NativeArray
Tests: ran the new tests
Optim: skip issuing 0-length WaterCollision.GetIgnore jobs
Tests: ran unit tests
Optim: make TerrainCollision.GetIgnore and WaterCollision.GetIgnore use indirect Burst jobs
- Also added a bunch of optim TODOs
Starting to build up an indirect collection of methods. Next up, I'll convert the related GamePhysics calls
Tests: ran unit tests - they passed. Played back staging demo multiple times with analyzedemo - got comparable in-water counts
New: add WaterStateProcessor for full server demo analysis
- Also redid ViolationProcessor to avoid leaking internal implementation to other files
Tracks how many players across all server frames were in water - using it to track consistency of water checks while modifying internals
Tests: played back staging demo - got consistent-enough results
Merge: from texttable_allocs
Previously merged into Aux, but the Aux2 one seems to be fresher
Tests: none, no conflicts
Merge: from main
Tests: none, no conflicts
Merge: from concurrentquueue_leak
- Fixes an edge-case on high-pop servers that can cause a 10MB/s garbage allocation rate
Tests: validated fix works via synthetic test, then had a 2-player session on craggy to validate network traffic works as intended
Clean: remove false-sharing todo
Don't have proof of how impactful it is now in this area, so not going to jump the gun for now
Tests: none, trivial change
Undo: pick the right version of ProjectSettings from history
Tests: none, trivial change
Undo: revert ProjectSettings
Tests: none, trivial change
Bugfix/Optim: propagate fix to other ConcurrentQueues in the file
Tests: local 2-player session on Craggy in editors
Clean: remove the hack test
Now that the bug was validated this doesn't serve any purpose
Tests: compiled in editor
Bugfix/Optim: Don't force ConcurrentQueue to allocate new segments on every push
My hack/forced test no longer allocates - now I just need to cover left-over cases of this problem
Tests: on Craggy in editor took a snapshot - no more allocs in the forced test area.
Hack: improved the runaway test
Now I can see it via server profiler - ~90KB across 192 allocs for 192 packets
Tests: Craggy in editor, took a snapshot
Hack: synthetic test to proc ConcurrentQueue memory runaway
Managed to reproduce high memory allocation edge case of ConcurrentQueue. Need to rip it out after applying the fix.
Tests: ran the code and checked state of ConcurrentQueue with a debugger
Merge: from texttable_allocs
- Replacing old TextTable with the new one that allows deferred formatting and avoids allocs
Tests: new unit tests and manual invoke of server.playerlistpos and status commands on Craggy
Optim: replacing old TextTable with the new one
- Updated Server.GetPlayerListPosTable to new APIs
Synthetic test of `playerlistpos` for 200 players on Craggy runs in 0.5ms (instead of prev 5ms) and 99% less allocs.
Tests: Started Craggy in editor with a synthetic test. Also used a couple TextTable rcon commands
Update: make logic match `shouldPadColumns` meaning
It was using inverted logic before, but that didn't affect the tests since they used the old values from before the rename.
Tests: ran unit tests.
Update: adding extra perf test to track shouldPadColumns influence
- Also renamed isForJson to shouldPadColumns
Shows 6x perf impact between no pad and pad. Makes sense, since for some types we need to do string formatting and that's heavy.
Tests: ran the tests.
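A minimal sketch of why padding can cost more than unpadded output, written in Python for illustration only. The `render` function and its layout rules are assumptions, not the actual TextTable code: the point is that aligned output has to format every cell up front to measure column widths, while unpadded output can format each cell as it is written.

```python
def render(rows, should_pad_columns):
    """Render rows of raw values as text lines (hypothetical helper)."""
    if not should_pad_columns:
        # No alignment: each cell is formatted once, right when it is written.
        return [" ".join(str(value) for value in row) for row in rows]

    # Alignment: every cell must be formatted first so widths can be measured.
    formatted = [[str(value) for value in row] for row in rows]
    widths = [max(len(cell) for cell in column) for column in zip(*formatted)]
    return [
        " ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in formatted
    ]


rows = [["alice", 12], ["bob", 7]]
unpadded = render(rows, should_pad_columns=False)
padded = render(rows, should_pad_columns=True)
```

Here `padded` lines up the second column, while `unpadded` simply separates cells with one space.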
Update: Add deferred formatting for more types (uint, long, ulong, double, vec3)
- Extended tests to cover these cases
Tests: ran unit tests
Optim: don't allocate when writing values via JsonTextWriter
- also exposed a `stringify` param that can avoid string conversion
Tests: ran unit tests
Update: revert back to Newtonsoft.Json
- Temporarily reverted the TextTable in use to original version to validate via tests
We have too many json serialization impls, so going to avoid trying to add a new one. I'll have to check if I can rip out the stale dll or not.
Tests: ran the unit tests
Update: Replace old TextTable with new
- Also did a light convert of `GetPlayerListPosTable` - it was able to handle 300 players instead of 200 in 4.7ms
- Updated ServerProfiler to exclude new System.Text.Json assemblies (and a couple extra bits) - built from f5b849e4
Tests: synthetic test in editor via constantly running getplayerlistpos command for 300 players, built server and client standalones (win64), booted server standalone
Clean: remove extra comments and finish TODOs
Just gotta swap out the implementations and test it all builds/runs
Tests: ran tests
Optim: store all row values into a contiguous array
- Inserts are now 50% faster than the old table, 99% less allocs
- ToText is 10% faster than old table, 99% less allocs
- ToJson is 47% slower than old table, 99% less allocs
Json serialization is a head scratcher - need to look through the utf8jsonwriter.
Tests: unit tests
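A rough sketch of the flat storage idea from the commit above, in Python for illustration only. `FlatTable` and its methods are hypothetical names, not the real TextTable API: all cells live in one row-major list, so adding a row does not create a per-row container.

```python
class FlatTable:
    """Hypothetical table that keeps every cell in one flat, row-major list."""

    def __init__(self, column_count):
        self.column_count = column_count
        self.cells = []  # one list for all rows instead of one list per row

    def add_row(self, *values):
        if len(values) != self.column_count:
            raise ValueError("row width must match column count")
        self.cells.extend(values)  # appended in place, no per-row object

    @property
    def row_count(self):
        return len(self.cells) // self.column_count

    def cell(self, row, column):
        # Row-major addressing: rows are laid out back to back.
        return self.cells[row * self.column_count + column]


table = FlatTable(column_count=3)
table.add_row("alice", 12, True)
table.add_row("bob", 7, False)
```

With this layout, `table.cell(1, 0)` reads index `1 * 3 + 0` of the flat list.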
Optim: don't serialize bool in memory - we can rely on const strings instead
Brings us a smidge closer to original api, but still slower (-12%).
Tests: unit tests
Optim: allow pre-allocating of columns and rows
- Rows were supported previously, but I didn't realize the call was going to the params overload, breaking the optim
Inserts are 22% faster now, with 6 allocs per test run (except the first run, for whatever reason - need to explore) instead of the previous 12k
Tests: unit tests
Update: update perf tests to use new apis
Rip#2 - was testing old compat apis. And surprisingly, they were faster than new APIs. Well, there are still options that I haven't tapped into.
Tests: unit tests
Optim: allow user to provide an optim hint if we'll be serializing to json
This allows skipping the table alignment logic. Still no effect on json perf tests (wat), I'm suspicious.
Tests: ran the unit tests.
Optim: disable json validation when streaming it with the new table
Surprisingly has 0 effect on 4k perf - the bottleneck must be somewhere else.
Tests: ran unit tests
Bugfix: Fixing json serialization
- Switching to System.Text.Json as it allows streamed serialization
- Updated binaries, System.Text.Json deps got stale
Text serialization is 2x faster in 4k perf test (62.5% less allocs), Json serialization is 12% slower (20% less allocs)
Tests: ran the unit tests
Bugfix: Fix ToString not aligning correctly for new table
- Also increased correctness test data set to 128 rows
Float size calculation could be improved, but I'll leave that till later as it's not trivial. Now to fix json serialization
Tests: ran the unit tests
Update: add deferred formatting overloads to TextTable
- Also expanded tests to cover backwards compatibility checks as well as new API correctness checks
- Refactored tests as they were becoming unwieldy
New API correctness checks are failing - will fix next.
Tests: ran new unit tests
Optim: Pool internal lists
- This is a breaking change - can no longer double call `ToJson` or `ToString` as we clean up resources during it
New version is 12k allocs at 4k records test vs 16k allocs. Still losing on time, but hoping future serialization streaming changes will catch us up.
Tests: ran the unit tests
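To illustrate the breaking change described in the commit above, here is a small Python sketch under stated assumptions. `ListPool` and `PooledTable` are hypothetical, and raising on the second call is my choice for the sketch; the log only says a double call is no longer possible because resources are cleaned up during serialization.

```python
class ListPool:
    """Hypothetical pool that hands out reusable lists."""

    def __init__(self):
        self._free = []

    def rent(self):
        return self._free.pop() if self._free else []

    def give_back(self, items):
        items.clear()  # drop contents so the next user starts empty
        self._free.append(items)


class PooledTable:
    """Hypothetical table whose backing list returns to the pool on output."""

    def __init__(self, pool):
        self._pool = pool
        self._rows = pool.rent()

    def add_row(self, text):
        self._rows.append(text)

    def to_text(self):
        if self._rows is None:
            raise RuntimeError("table was already serialized")
        text = "\n".join(self._rows)
        self._pool.give_back(self._rows)  # storage is released here
        self._rows = None
        return text


pool = ListPool()
first = PooledTable(pool)
first.add_row("a")
first.add_row("b")
output = first.to_text()
```

A second `first.to_text()` fails in this sketch, while a new `PooledTable(pool)` reuses the list that was just returned.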
Update: store type unions instead of strings as row cell
Right now this is a pessimization (-25% perf on 4k "Record" test) - we do more work to manage it, but it should open up more optim opportunities.
Tests: ran the now correct unit tests
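A hedged sketch of what storing a tagged value per cell, rather than a pre-built string, might look like. It is written in Python for illustration; `CellKind`, `Cell`, and the two-decimal float format are assumptions, not the real implementation. Formatting is deferred until output, so each output path can pick its own representation, and bools map to constant strings.

```python
import json
from dataclasses import dataclass
from enum import Enum, auto


class CellKind(Enum):
    STRING = auto()
    INT = auto()
    FLOAT = auto()
    BOOL = auto()


@dataclass(frozen=True)
class Cell:
    """Hypothetical cell: a type tag plus the raw, unformatted value."""
    kind: CellKind
    value: object

    def to_text(self):
        # Formatting happens only when text output is requested.
        if self.kind is CellKind.BOOL:
            return "True" if self.value else "False"  # constant strings
        if self.kind is CellKind.FLOAT:
            return f"{self.value:.2f}"  # assumed display precision
        return str(self.value)

    def to_json(self):
        if self.kind is CellKind.BOOL:
            return "true" if self.value else "false"  # constant strings
        return json.dumps(self.value)  # numbers stay numbers, strings get quoted


cells = [Cell(CellKind.STRING, "bob"), Cell(CellKind.FLOAT, 1.5), Cell(CellKind.BOOL, True)]
text_row = " ".join(cell.to_text() for cell in cells)
json_row = "[" + ",".join(cell.to_json() for cell in cells) + "]"
```

The same stored cells yield different text and JSON forms without any string being created at insert time.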
Bugfix: fixing unit tests
Rip. Missed the fact that unit tests were running on empty data. At least there weren't any issues with previous optims.
Tests: ran the unit tests
Optim: avoid row allocations
Tests: ran unit tests
Optim: get rid of small Column allocs
- also removed all private qualifiers
Tests: ran unit tests
Update: set out optim plans
- reuse string builder via pool
- Renamed TextTableOriginal to TextTableNew (otherwise I'll accidentally break the game before I mean to)
Tests: none, trivial changes
Update: initial test setup to validate optimizations for TextTable
Tests: ran the new unit and perf tests
Merge: from eventrecord_allocs
- Reduces the number of allocations caused by our server-side analytics
- New "analytics.small_buffer_send_limit" persistent ServerVar to reduce task scheduling overhead. Set -1 to return original behavior.
Tests: ran existing analytics unit tests, booted server in editor.
Merge: from main
Tests: none, no conflicts
Clean: remove my testing setup
Tests: none, trivial change