964 Commits over 243 Days - 0.17cph!
Update: replace Spans with NativeArray in WaterLevel.GetWaterInfos
This will allow pursuing Burst jobs internally.
Tests: tests passed
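A minimal sketch of what this change likely looks like (method shape and field names are hypothetical, assuming Unity.Collections): unlike Span<T>, NativeArray<T> can be stored as a field on a Burst-compiled job struct, which is what unblocks jobifying the internals later.

```csharp
// Sketch only - illustrative shape of the Span -> NativeArray swap.
using Unity.Collections;
using UnityEngine;

public struct WaterInfo
{
    public float SurfaceLevel;
    public float Depth;
}

public static class WaterLevelSketch
{
    // Before (hypothetical): GetWaterInfos(Span<Vector3> positions, Span<WaterInfo> results)
    // After: NativeArray buffers can flow straight into IJob/IJobParallelFor fields.
    public static void GetWaterInfos(NativeArray<Vector3> positions,
                                     NativeArray<WaterInfo> results)
    {
        for (int i = 0; i < positions.Length; i++)
        {
            // ... existing per-position sampling, unchanged ...
        }
    }
}
```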
Update: leaving an optim todo idea comment
Tests: not applicable
Update: sprinkle some profiling scopes
Tests: ran unit tests
Optim: use persistent allocs in GetWaterLevels
- I need to properly clean those up at server shutdown, but I'll solve that later
1k waves perf test shows ~100micros savings and no allocs - 1.45ms vs old 1.55ms. This is the final optim in the area for now, making us ~80% faster than vanilla managed code (8.6ms).
Tests: ran unit tests
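The persistent-allocation idea can be sketched as a lazy-growing scratch buffer (type and member names invented here): Allocator.Persistent removes the per-call allocate/dispose churn, at the cost of needing an explicit Dispose() at server shutdown, as noted above.

```csharp
// Sketch of a lazy-growing persistent scratch buffer (names hypothetical).
using Unity.Collections;

public struct GrowableBuffer<T> where T : struct
{
    private NativeArray<T> _array;

    // Returns a view of at least `length` elements, growing only when needed.
    public NativeArray<T> Request(int length)
    {
        if (!_array.IsCreated || _array.Length < length)
        {
            if (_array.IsCreated)
                _array.Dispose();
            _array = new NativeArray<T>(length, Allocator.Persistent,
                                        NativeArrayOptions.UninitializedMemory);
        }
        return _array.GetSubArray(0, length);
    }

    // Must be called at server shutdown to avoid a leak report.
    public void Dispose()
    {
        if (_array.IsCreated)
            _array.Dispose();
    }
}
```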
Optim: remove last managed loop that picks between dynamic waves and static water
1k test with waves shaves off ~0.25ms - 1.55ms vs previous 1.8ms
Tests: ran unit tests
Optim: gather coarse distances to shore via indirect batch
1k waves perf test shows another ~0.5ms shaved - from 2.3ms to 1.8ms
Tests: ran unit tests
Optim: grab TerrainHeights via indirect batch
1k waves perf test used to take 3ms, now 2.3ms
Tests: ran unit tests
Update: gathering OceanSim's water heights in indirect way
- Also fixed the OceanSim's GetHeightsJobIndirect job as it works on world positions, not uvs
This allows me to convert the rest of the logic to Burst jobs.
Tests: ran unit tests
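A rough sketch of the "indirect" pattern used across these gather steps (all names hypothetical): the job reads its work count from a NativeArray written by an earlier job, so the whole chain can be scheduled up front without a main-thread sync in between.

```csharp
// Illustrative only - not the actual GetHeightsJobIndirect implementation.
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct GetHeightsJobIndirectSketch : IJob
{
    [ReadOnly] public NativeArray<int> Count;        // written by a prior job
    [ReadOnly] public NativeArray<float3> Positions; // world positions, not UVs
    [WriteOnly] public NativeArray<float> Heights;

    public void Execute()
    {
        int count = Count[0]; // resolved at execute time, not schedule time
        for (int i = 0; i < count; i++)
        {
            Heights[i] = SampleHeight(Positions[i].xz); // placeholder sampler
        }
    }

    private static float SampleHeight(float2 worldXZ) => 0f;
}
```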
Update: Bunch of utility Burst jobs for WaterLevel.GetWaterLevels outstanding jobification
Tests: none, they are not plugged in
Update: Use NativeArray for TerrainTexturing.ShoreVector storage of distances and vectors
- Had to add an editor-only safety check for WaterCamera for a super rare exception
Tests: Played procgen and Craggy in editor. Forced a bunch of domain reloads to validate WaterCamera doesn't break anymore
Update: perf tests for WaterInfo/-s
For 1k sample points - batch version is 4x (no waves) / 3x (with waves) faster than serial
Tests: not applicable
Update: adding perf tests to GetWaterLevel/-s
- Doesn't have cases that have water volumes, so slowest path not stressed in both cases
On 1k locations the batch version is 10x faster than serial (130micros vs 1.18ms) with no waves; with waves it's 3x faster (3ms vs 9ms)
Tests: not applicable
Optim: replacing GetWaterLevels hot path with burst jobs
- Converted some internal NativeArrays to persistent, lazy growing ones to reduce allocation overhead
On a 25 player case this shows a 25% improvement (~5micros), though the sample size is too small (not enough players to be sure). I need a good way to test this at larger scales, or these 5min waiting times will murder me.
Tests: ran unit tests, ran staging demo multiple times, water checks counters in the expected range.
Bugfix: properly access count from CoarseQueryGridBoundsJobIndirect
Tests: ran unit tests
Merge: from erosion
Bringing across TerrainMap NativeArray conversion that I need for my batch checks
Tests: reran TerrainMap tests
Merge: from main
Aligning with erosion branch parent CL
Tests: none, no conflicts
Update: added GetTopologies jobs for TopologyMap
- Also covered with tests
- Sprinkled [WriteOnly]/[ReadOnly] attributes into jobs
- Fixed constant string allocation in tests
Tests: ran unit tests
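The [ReadOnly]/[WriteOnly] sprinkle can be illustrated with a parallel sampling job (the map layout and names below are invented): the attributes tell the safety system which buffers are only read or only written, letting jobs run concurrently without false aliasing errors.

```csharp
// Illustrative only - not the actual TopologyMap job.
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

[BurstCompile]
public struct GetTopologiesJobSketch : IJobParallelFor
{
    [ReadOnly] public NativeArray<int> TopologyGrid;       // bitmask per cell
    [ReadOnly] public NativeArray<float2> NormalizedPositions;
    public int Resolution;
    [WriteOnly] public NativeArray<int> Results;

    public void Execute(int index)
    {
        float2 uv = math.saturate(NormalizedPositions[index]);
        int x = math.min((int)(uv.x * Resolution), Resolution - 1);
        int z = math.min((int)(uv.y * Resolution), Resolution - 1);
        Results[index] = TopologyGrid[z * Resolution + x];
    }
}
```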
Update: adding WaterMap height sampling burst jobs
- Also covered them with tests to validate output against non-job calls
Tests: ran unit tests
Merge: from terrainmap_nativearray2
- Fixed a bug of invalid reinterpret
Bringing across my tests since we're both working on the same thing.
Tests: ran the new tests
Add: covering TerrainMap public api in tests
Prep for switching over to NativeArray
Tests: ran the new tests
Optim: skip issuing 0-length WaterCollision.GetIgnore jobs
Tests: ran unit tests
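The guard itself is tiny (sketched here with hypothetical names): scheduling a job has fixed overhead even when there is zero work, so the early-out pays for itself whenever empty batches are common.

```csharp
// Sketch of the zero-length guard - not the actual WaterCollision code.
using Unity.Collections;
using Unity.Jobs;

public static class GetIgnoreSketch
{
    public static JobHandle Schedule(NativeArray<int> targets, JobHandle dependsOn)
    {
        if (targets.Length == 0)
            return dependsOn; // nothing to do - skip job-system overhead entirely

        // ... build and schedule the actual GetIgnore job here ...
        return dependsOn;
    }
}
```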
Optim: make TerrainCollision.GetIgnore and WaterCollision.GetIgnore use indirect Burst jobs
- Also added a bunch of optim TODOs
Starting to build up an indirect collection of methods. Next up will convert related GamePhysics calls
Tests: ran unit tests - they passed. Played back staging demo multiple times with analyzedemo - got comparable in-water counts
New: add WaterStateProcessor for full server demo analysis
- Also redid ViolationProcessor to avoid leaking internal implementation to other files
Tracks how many players across all server frames were in water - using it to track consistency of water checks while modifying internals
Tests: played back staging demo - got consistent-enough results
Merge: from texttable_allocs
Previously merged into Aux, but the Aux2 one seems fresher
Tests: none, no conflicts
Merge: from main
Tests: none, no conflicts
Merge: from concurrentquueue_leak
- Fixes an edge-case on high-pop servers that can cause a 10MB/s garbage allocation rate
Tests: validated fix works via synthetic test, then had a 2-player session on craggy to validate network traffic works as intended
Clean: remove false-sharing todo
Don't have proof of how impactful it is now in this area, so not going to jump the gun for now
Tests: none, trivial change
Undo: pick the right version of ProjectSettings from history
Tests: none, trivial change
Undo: revert ProjectSettings
Tests: none, trivial change
Bugfix/Optim: propagate fix to other ConcurrentQueues in the file
Tests: local 2-player session on Craggy in editors
Clean: remove the hack test
Now that the fix has been validated, this no longer serves any purpose
Tests: compiled in editor
Bugfix/Optim: Don't force ConcurrentQueue to allocate new segments on every push
My hack/forced test no longer allocates - now I just need to cover left-over cases of this problem
Tests: on Craggy in editor took a snapshot - no more allocs in the forced test area.
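A hypothetical repro of this class of bug (not the actual code): on .NET Core, snapshot-style operations on ConcurrentQueue<T> (ToArray, enumeration, and TryPeek on some runtime versions) "freeze" the active segment for observation. A frozen segment can't accept further enqueues, so a snapshot-then-enqueue pattern on the per-packet path can force a fresh segment allocation on every push.

```csharp
// Sketch of the pitfall - names and loop are illustrative only.
using System.Collections.Concurrent;

var queue = new ConcurrentQueue<int>();

// Bad: peeking/snapshotting every iteration may freeze the active segment,
// making each subsequent Enqueue allocate a brand-new segment.
for (int i = 0; i < 1000; i++)
{
    queue.TryPeek(out _);
    queue.Enqueue(i);
}

// Better: keep snapshot operations (enumeration, ToArray) off the hot path
// and let the consumer only TryDequeue, so segments get reused normally.
```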
Hack: improved the runaway test
Now I can see it via server profiler - ~90KB across 192 allocs for 192 packets
Tests: Craggy in editor, took a snapshot
Hack: synthetic test to proc ConcurrentQueue memory runaway
Managed to reproduce high memory allocation edge case of ConcurrentQueue. Need to rip it out after applying the fix.
Tests: ran the code and checked state of ConcurrentQueue with a debugger
Merge: from texttable_allocs
- Replacing old TextTable with the new one that allows deferred formatting and avoids allocs
Tests: new unit tests and manual invoke of server.playerlistpos and status commands on Craggy
Optim: replacing old TextTable with the new one
- Updated Server.GetPlayerListPosTable to new APIs
Synthetic test of `playerlistpos` for 200 players on Craggy runs in 0.5ms (instead of prev 5ms) and 99% less allocs.
Tests: Started Craggy in editor with a synthetic test. Also used a couple TextTable rcon commands
Update: make logic match `shouldPadColumns` meaning
It was doing the inverted logic before, but this didn't affect tests since they used the old values from before the rename.
Tests: ran unit tests.
Update: adding extra perf test to track shouldPadColumns influence
- Also renamed isForJson to shouldPadColumns
Shows 6x perf impact between no pad and pad. Makes sense, since for some types we need to do string formatting and that's heavy.
Tests: ran the tests.
Update: Add deferred formatting for more types (uint, long, ulong, double, vec3)
- Extended tests to cover these cases
Tests: ran unit tests
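One way deferred formatting for these primitives can be stored (the union type below is invented for illustration): the cell keeps the raw value in an 8-byte overlapped slot plus a type tag, and only renders it when the table is actually printed, so building a table that is never formatted costs no string work per value.

```csharp
// Sketch only - hypothetical cell storage for deferred formatting.
using System.Runtime.InteropServices;
using System.Text;

[StructLayout(LayoutKind.Explicit)]
public struct CellValue
{
    public enum Kind : byte { Int, UInt, Long, ULong, Double }

    [FieldOffset(0)] public long AsLong;     // also holds int
    [FieldOffset(0)] public ulong AsULong;   // also holds uint
    [FieldOffset(0)] public double AsDouble;
    [FieldOffset(8)] public Kind Type;

    // Formatting happens only at render time, into a shared builder.
    public void Format(StringBuilder sb)
    {
        switch (Type)
        {
            case Kind.Int:
            case Kind.Long: sb.Append(AsLong); break;
            case Kind.UInt:
            case Kind.ULong: sb.Append(AsULong); break;
            case Kind.Double: sb.Append(AsDouble); break;
        }
    }
}
```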
Optim: don't allocate when writing values via JsonTextWriter
- also exposed a `stringify` param that can avoid string conversion
Tests: ran unit tests
Update: revert back to Newtonsoft.Json
- Temporarily reverted the TextTable in use to original version to validate via tests
We have too many JSON serialization impls, so going to avoid adding a new one. I'll have to check whether I can rip out the stale dll or not.
Tests: ran the unit tests
Update: Replace old TextTable with new
- Also did a light convert of `GetPlayerListPosTable` - it was able to handle 300 players instead of 200 in 4.7ms
- Updated ServerProfiler to exclude new System.Text.Json assemblies (and a couple extra bits) - built from f5b849e4
Tests: synthetic test in editor via constantly running getplayerlistpos command for 300 players, built server and client standalones(win64), booted server standalone
Clean: remove extra comments and finish TODOs
Just gotta swap out the implementations and test it all builds/runs
Tests: ran tests
Optim: store all row values in a contiguous array
- Inserts are now 50% faster than the old table, 99% less allocs
- ToText is 10% faster than old table, 99% less allocs
- ToJson is 47% slower than old table, 99% less allocs
Json serialization is a head scratcher - need to look through the utf8jsonwriter.
Tests: unit tests
Optim: don't serialize bool in memory - we can rely on const strings instead
Brings us a smidge closer to the original API, but still slower (-12%).
Tests: unit tests
Optim: allow pre-allocating of columns and rows
- Rows were supported previously, but I didn't realize the call was going to the params overload, breaking the optim
Inserts are 22% faster now, with 6 allocs per test run (except the first run, for whatever reason - need to explore) instead of the previous 12k
Tests: unit tests
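The params gotcha mentioned above can be shown with a toy pair of overloads (both invented for illustration): with a compatible non-params overload present, an exact-match argument binds to it, but a small type change in the call site can silently reroute to the params overload, which allocates an object[] on every call and defeats the pre-allocation.

```csharp
// Illustrative only - demonstrates how a capacity hint can silently
// route to a params overload and allocate per call.
using System;

public class TableSketch
{
    public void AddRow(int capacityHint)
    {
        Console.WriteLine("capacity overload - no array allocation");
    }

    public void AddRow(params object[] values)
    {
        Console.WriteLine($"params overload - allocates object[{values.Length}]");
    }
}

public static class Demo
{
    public static void Main()
    {
        var t = new TableSketch();
        t.AddRow(16);          // binds to the int overload - cheap
        t.AddRow((object)16);  // binds to params - allocates object[1]
    }
}
```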
Update: update perf tests to use new apis
Rip #2 - was testing the old compat APIs. And surprisingly, they were faster than the new APIs. Well, there are still options I haven't tapped into.
Tests: unit tests
Optim: allow user to provide an optim hint if we'll be serializing to json
This allows skipping the table alignment logic. Still no effect on json perf tests (wat), I'm suspicious.
Tests: ran the unit tests.
Optim: disable json validation when streaming it with the new table
Surprisingly has 0 effect on 4k perf - the bottleneck must be somewhere else.
Tests: ran unit tests
Bugfix: Fixing json serialization
- Switching to System.Text.Json as it allows streamed serialization
- Updated binaries, System.Text.Json deps got stale
Text serialization is 2x faster in 4k perf test(62.5% less allocs), Json serialization is 12% slower(20% less allocs)
Tests: ran the unit tests