branchrust_reboot/main/baseplayer_serverupdateparallelcancel
141 Commits over 92 Days - 0.06cph!
Update: new ServerProfiler with more filters
- Built on 276b03cf
Excludes a bunch of BasePlayer, BaseEntity, BaseNetworkable light functions, Native calls to RakNet and EAC, StableCache iterators.
Tests: booted in editor, took a snapshot of 10 frames
Optim: Set Pool<BufferStream> capacity to 32k (sum of NetRead and NetWrite caps)
We end up eating up default capacity on busy servers(1.1k BufferStreams on 330p server), causing allocs. This should help avoid that.
Tests: none, trivial change
Optim: UsePlayerTasks - pre-allocate NetWrites when sending snapshot queues
This should reduce contention on the Pool<NetWrite>'s lock and let worker threads blast as fast as possible. Will need to revisit after measuring the pre-alloc overhead for main thread.
Tests: On Craggy with UsePlayerUpdateJobs 2 flew away from island's network range then back - entities were where there have been before flying away.
Update: UsePlayerTasks - ensure SendEntitySnapshotsWithChildren_AsyncState have equally sized batches
Previously we counter players for batch limit (skipping it's hierarchy), but that could lead to lopsided unequal tasks
Tests: couldn't test this one, but it's similar code to other places so should be gud
Optim: UseOcclusionV2 - ServerOcclusion caches unoccluded connections
- removed couple redundant if checks
This allows us to completelly skip faster-but-still-slow ShouldNetworkTo that have internal hash lookups in SendNetworkPositions - hoping it'll speed it up 50%
Tests: ran in and out of occlusion on Craggy. disconnected to validate sleeper, then killed sleeper and reconnected - all works
Update: add extra TODO for UseOcclusionV2
Tests: none, trivial change
Update: UseOcclusionV2 - propagate networkTime to OcclusionGetRecentlySeen
Should be a smidge faster, but it's primarily to enable threading if we go that way. Also saves on profiler inhibiting the call (need to exclude it on the profiler side)
Tests: none, trivial change
Clean: restructure UseOcclusionV2 code to make profiling easier
Tests: compiles in editor
Clean: fix ShouldSkipServerOcclusSion typo
English is hard
Tests: editor compiles
Optim: UseOcclusionV2 - OcclusionFrameCache uses NetIDs instead of BasePlayer pairs
This avoids ObjectEqualityComparer which has overhead because of unity-null checks
Tests: none, trivial change
Optim: UseOcclusionV2 - reduce lastPlayerVisibility hash lookups
Should help OcclusionGatherLostPairsToSend internal loop to stay hot. Also gets rid of small funcs that are inhibited by profiler
Tests: ran in and out of occlusion
Clean: add a comment explaining the extra OcclusionUpdateLostVisibility
Tests: none, trivial change
Optim: UseOcclusionV2 - invert how we process occludees
- instead of checking if we need to send player to occludees, we process occludees against player
This caused us to miss sleepers previously, which we corrected with extra checks by reversing pairs(and potentially generating more netowrk messages). But we don't need to do all of that extra work if we do it in the right order. Hoping this cuts downstream queue 2x.
Tests: 2p on craggy - checked occlusion works, going invis works, spectating works, sleepers replicate
Clean: introduce BasePlayer.UseOcclusionV2 use it as a switch instead of UsePlayerTasks (works same way)
Makes it easier to track what's related to what
Tests: editor compiles
Optim: ServerOcclusion - use network IDs instead of BasePlayer for dictionary key
Helps avoid unity-null check overhead when looking up recently seen players
Tests: 2p session on craggy - disconnected as second and killed sleeper
Bugfix: ServerOcclusion - remove destroyed players from lastPlayerVisibility cache
We were leaking baseplayers and it was slowing down hash lookups
Tests: 2p session on craggy, 2nd player disconnected, killed their sleeper - lastPlayerVisibility was empty for the remaining player
Update: UsePlayerTasks - lifting skipping logic out of OcclusionLineOfSight to caller
- left a couple more TODOs as I found a weirdness in the original code
Only thing left in OcclusionLineOfSight is grid-cell caching and deduplication.
Tests: none, discovered unit tests are broken since I upgraded occlusion cache version a month ago. Will fix and test later
Optim: UsePlayerTasks - ServerOcclusion uses CustomShouldNetworkTo and CustomShouldSkipServerOcclusion
Microoptim - this reduces hashset look ups and moves some checks to be run earlier. Main benefit is lifting more code out of BasePlayer.OcclusionLineOfSight
Tests: none, skipping tests for now
Update: refactor ServerUpdateOcclusionParallel to allow for experimental changes
Tests: compiles in editor
Update: add a bunch of optimization TODO notes for UsePlayerUpdateJobs 2
Gathered from inspecting 330p server snapshot from a late-in-the-patch-cycle server. Will clean-up before merge.
Tests: none, trivial merge
Clean: comments, spans and names
Tests: none, trivial changes
Optim: UsePlayerUpdateJobs 2 - merge partially ordered chains of snapshots
- This reduces allocations (since we create less tasks), and should help with parallelization (reducing task scheduling overhead), but requires preprocess cost
Tests: on craggy with UsePlayerUpdateJbos 2, flew out of bounds and respawned(this causes 500+ entity snapshots to be sent) 10 times in a row - no exceptions, disconnects and all entities seemed to be present
Bugfix: fix a missing profile scope in FoliageGrid
Tests: used the profiler in editor, no more errors
Bugfix: UsePlayerUpdateJobs 2 - maintain partial ordering of snapshots when sending to player using tasks
- This fixes rare parenting desync that can manifest as a weapon attachment world model not hiding (or something much worse)
Need to optimize it by merging tasks, as this can create too many tasks for the server to process
Tests: flew around on Craggy with fast noclip - attachments didn't "duplicate". Shot animals, switched active items. Tested occlusion with 2p and geared guns
Optim: UsePlayerUpdateJobs - limit duplicate snapshot check to per-player only
Snapshot queues can grow up to 1k per player, so this should reduce the lookup overhead a smidge.
Tests: 2p session on Craggy with UsePlayerUpdateJobs 2
Update: gate parallel snapshot queue behind UsePlayerUpdateJobs 2
All other parts have separation of no tasks under UsePlayerUpdateJobs 1 and tasks allowed under 2, but this slipped my net
Tests: used both in local session on Craggy
Update: move OcclusionCache reset to always happen for disabled UsePlayerUpdateJobs
Don't think this fixes anything, but helps avoid invalid emergency shutdown fallout if it happens in the future.
Tests: none, trivial change
Bugfix: StableObjectCache - properly drop references when clearing
- PlayerCache emergency shutdown logic inspects internal buffer, which tripped up. Change also allows GC
- Added a unit test that validates Clear call.
Tests: Ran the new unit test - it was borked before the fix, gud afterwards
Bugfix: invalidate all non-null managed objects
- Previously it allowed for UnityNull objects to be skipped, leading to desync of StableObjectCache
Tests: ran PlayerCacheTests.StressTest - it now passes
Update: ammend PlayerCacheTests.StressTest to trigger cleanup causing desync with StableObjectCache
Tests: ran test, failing as expected
Optim: UsePlayerUpdateJobs 2 - get rid of ExecutionContext copies when aggregating tasks in SendEntitySnapshots and -WithChildren
Tests: none, trivial change
Optim: UsePlayerUpdateJobs 2 - BasePlayer.WaitForTasks no longer allocs
Could be slower to wait like this, but I doubtr we'll have more than 30 tasks at sync points even on high-pop servers, so it doesn't matter
Tests: 2p on Craggy with UsePlayerUpdateJobs 2
Optim: UsePlayerUpdateJobs 2 - reduce garbage generation of BasePlayer.ApplyChangesParallel
Can't eliminate it fully till we get UniTask, but this should make a dent
Tests: couldn't test thoroughly as missing credentials, but it runs(and doesn't send) under UsePlayerUpdateJobs 2. Profiler confirmed less allocs
Optim: UsePlayerUpdateJobs 2 - stop constantly resizing array, only grow it if needed
Tests: 1p on Craggy with UsePlayerUpdateJobs 2
Optim: UsePlayerUpdateJobs 2 - cache continuation actions
Prevents allocs when no batches are sent, while also makes it cheaper to send when we do have batches.
Tests: craggy with UsePlayerUpdateJobs 2 in editor + checked IL
Bugfix: UsePlayerUpdateJobs 1/2 - fix metabolism being slow
- Fixed by using correct delta times - use tick time length instead of deltaTime (as we accumulate deltaTime until we have a whole tick's worth, then step tick's duration)
- Also fixed marking player hostile if weapon drawn for long enough - same fix as above
Noticed same issue with another bit of code, but this one's wrong in vanilla flow as well - so leaving it as is till later.
Tests: compared rate of healing with log.level Combat 2 after using large medkit on UsePlayerUpdateJobs 0 and 2
Clean: extra "can kick" annotation
Tests: none, trivial change
Clean: fix formatting in BasePlayerr-Server.cs after merges
Tests: none, trivial change
Optim: skip running server occlusion between a pair of same players
Saves us a bunch of extra dictionary entries, running extra occlusion jobs
Tests: 2p sessions with UsePlayerUpdateJobs 0/1/2 on Craggy (with reconnects, kills, occlusion culling)
Bugfix: UsePlayerUpdateJobs 2 - skip trying to run occlusion checks between 2 same players
A little slapdash, but it fixes editor-only CLIENT+SERVER bug of sometimes failing to notify player that they're awake. I'll exclude it from all modes next CL.
Tests: loaded into 2k procgen world with UsePlayerUpdateJobs 2 set from the start 3 tmes - all times no weird zombie state
Optim: UsePlayerUpdateJobs 2 - reduce GC allocations around creating tasks
Surprisingly, this was a struggle to test. Suspicious about a potential bug in unrelated part of UsePlayerUpdateJobs 1/2.
Tests: 2p on Craggy and 2k Procgen map with UsePlayerUpdateJobs 2