602 Commits over 184 Days - 0.14cph!
Update: DemoServer - Isolated all client demo logic into its own player
This is prep for full server demo support.
Tests: played back a short Craggy demo - it played through to the end without issues. Tried without a demo - it started as expected
Buildfix: removing unused variable
Tests: editor compile
Merge: from profiling_improvements
Further exclude small methods/utility classes that are fast 95% of the time.
Tests: Took a snapshot on a default ProcGen map in Editor (Client+Server). ~13% reduction in uncompressed JSON size.
Update: more profiling exclusions
- Don't track NetRead and NetWrite
- Don't track Facepunch.System's containers (including pooling), StringPool and ArrayPool
- Don't track EntityRef
- Don't track all Enumerators (previously only Facepunch's was excluded)
- Don't track all GetHashCode
- Don't track TimeWarning (debug-only calls, but can be frequent)
Tests: Took a snapshot of a default ProcGen map in Editor (Client+Server), confirmed about a 13% reduction in uncompressed JSON size.
Merge: from main
Tests: none
Update: DemoServer - ripping out fixed timestep logic
I experimented with slowing playback down below normal play speed. It did reduce the number of violations, but the timing inconsistency between demo playback and server simulation led to more issues.
Tests: none, simple change
Update: DemoServer - implement fixed step playback
- Should keep demo stream consumption stable
Right now we're streaming too much data (at 30 Hz with a ~200 Hz editor sim), which trips a number of violation checks. Going to try tweaking the number to see if it helps with reproducible results.
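A minimal sketch of the fixed-step idea, assuming a simple time accumulator: demo data is consumed in whole 30 Hz steps no matter how often the editor simulation updates. `FixedStepPlayer` and `consume_demo_step` are hypothetical names, not the DemoServer API.

```python
class FixedStepPlayer:
    """Hypothetical fixed-step consumer for a demo stream."""

    def __init__(self, step_hz: float = 30.0):
        self.step = 1.0 / step_hz  # seconds of demo time per playback step
        self.accumulator = 0.0     # elapsed time not yet spent on steps
        self.steps_run = 0

    def update(self, frame_dt: float) -> int:
        """Called once per simulation frame; returns how many demo steps ran."""
        self.accumulator += frame_dt
        ran = 0
        # Run as many whole steps as the accumulated time allows;
        # the remainder carries over to the next frame.
        while self.accumulator >= self.step:
            self.accumulator -= self.step
            self.consume_demo_step()
            ran += 1
        return ran

    def consume_demo_step(self) -> None:
        # Placeholder: read and dispatch one step's worth of demo messages.
        self.steps_run += 1
```

At ~200 Hz each frame adds ~5 ms, so one ~33 ms demo step runs roughly every 6-7 frames; leftover time stays in the accumulator instead of being dropped.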
Tests: played the demo twice - the step count was the same, but the result numbers were different.
▍▉▇▍▇█: ▅▄▉▋▋▇▉██▆ - ▅▋▍▆▌ ▆▌▉▍▋▆█ ▌█▋▆▌▄▋▄▋ ▉▌▅▇█▋▇▇▆
- █▌▌▌ ▍█▌▄▍▋▊▋▄▊▆ ▋▆▄▍██▉▊ █▉▊▍▋▍███ ▆▊▆▇▉▍▉▇ ▇▋▄ ▍▇▍▉ ▅▌▊▆▊▇▌▌▍▅.
- █▆▇█ ▄▅▉▇▆█▇ ▇▍▅▆▉▊▅▊▉ ▋▅▉▅▋▌-▆▅▋▄ ▇▌▇▆ ▅▋▆▌▆▊█▊▅█▅▌▍
█▄▅▊▋▅█ ▇▆▋▆ ▊▆▌█ ▉▅▅▉ ▍▊▄█▇▌▍▄ ▄▄▄▄▍▍▍▌▅ ▊▇▊-▋▍▌▌▍▍▅▄▋▄▄▄ ▇▉▍▇█▉▋(▌▋▄▇▆▆ ▊▆▋ ▇▄ ▆▍▋█▌▄▋ ▅▉▇▊▌▇▄▄ ▄▊▇▄▉█ ▊▌▉▌▊▊&▉▊▇▋█▋▋ ▋▅▆▍) - ▋▍▊▆ ▄█ ▆▍▋▆▆▆ ▍▌▋ ▋▊▊ ▅▅ ▉▅▇▅▉▅ ▊▆ ▅▊▆▍▋▋ ▉▄▍▆▊█▇ ▍▍ ▇▆▇▇ ▌▊▍▄▅▍ ▄▇▊▉█.
▊▌▋▄▋: ▅▍▇▉▆▆ ▊▌█▍ ▋ ▌▉▅ ▄▌▆▌ ▋▉▅▉▅ ▍▊▇▍-▉▉-▉▋▊▇ - ▅▌▇ ▍▌▅▊▄▅▊▅▆ ▅▋▉▉▉▇▄ (▄▉▊▉ ▌█ ▌▌▇▄ ▇▌▆▅▉█▆▊▍█)
Update: DemoServer - bypass failed validation
- In some situations we can't reconstruct the correct tick history, so instead we'll keep the failed validations as data to compare against
Tests: played the original long demo - 1711 total violations across ~18 players.
Update: DemoServer - rudimentary tick visualization using gizmos
- Temporary while working on server demo reconstruction - will rip out once done with the feature
Tests: used on a demo recording on Craggy + the original demo that started it all.
Update: DemoServer - spawn entities with the right initial flags
Turns out doors in a base were being spawned closed, leading to tick violations - this fixes it. There are more violations to go.
Tests: Played the demo, checked that the relevant door is now open.
Update: DemoServer - hook up metabolism and make every player invincible
I thought metabolism would fix the main player drowning, but the recorded data contains empty oxygen values. Instead, we treat every player as invincible unless there's a replication message to destroy them.
Tests: Played the demo to the end - no more logs about the main player drowning
Update: DemoServer handles a number of RPC messages
- Only properly implementing model flags for now
- Adding a bunch of RPCs to ignore to avoid heavy spam during playback
- Also renaming player game objects during playback to make it easier to track and inspect their state
This reveals that during playback we're triggering a bunch of tick violations, which prevent position updates. Need to figure out how to deal with them.
Tests: ran the same demo, this time with warnings not filtered out - once the map loaded, the rate of warnings decreased substantially.
Clean: removing no longer relevant comment
Update: DemoServer improvements and fixes
- All ticks are now accepted
- exposed an editor-only API to inject ticks (avoids a serialization round trip)
- cleaned away tick logging - it generated too many logs
Ticks are now caught, which is nice, but it looks like it's not validating them all outside of demo playback (saw only 2 players doing it on a perf capture). That'll be next.
Tests: added temp debug assertions that would catch any discarded tick - played the new demo, and there were no more assertions.
Update: DemoServer - split ticks by distance instead of time
- Splitting by time didn't guarantee that they were in valid distance ranges
- Also handle case where we get positional data while the player is still initializing
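Splitting by distance can be sketched as below, assuming straight-line interpolation between two recorded positions and a hypothetical per-tick limit `max_step` (the real validation thresholds aren't given here). Because the segment count is derived from the distance, no segment can exceed the limit, whereas fixed time slices can when a player moves fast.

```python
import math


def split_by_distance(start, end, max_step):
    """Split straight-line movement from start to end into positions no more
    than max_step apart. Returns intermediate and final positions (start
    excluded). Hypothetical helper for illustration only."""
    if max_step <= 0:
        raise ValueError("max_step must be positive")
    dist = math.dist(start, end)
    # Enough segments that each one is <= max_step long (at least one).
    segments = max(1, math.ceil(dist / max_step))
    return [
        tuple(s + (e - s) * i / segments for s, e in zip(start, end))
        for i in range(1, segments + 1)
    ]
```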
Getting closer, according to logs most ticks get accepted, but there are still a bunch that get filtered out - investigating.
Tests: played the same demo, observed the logs.
Update: Server-Editor tries to synthesize position ticks for other players in client-demos
- Also supports movement of other, non-player entities
- Only handling positions for now
Doing this to allow for more thorough testing. Some ticks get rejected despite being in the same position - need to investigate why.
Tests: played the same demo as before - checked logs to see the injection and acceptance of ticks.
Bugfix: more NRE reductions in server-demo
- Skip VoiceData and other messages that we can't support in editor environment (or don't want to)
- Properly "disconnect" player when entity is being destroyed
- "shutdown" the demo server when at the end of the demo to avoid unnecessary replication attempts/NREs
This brings the NRE count during playback and shutdown down from 40+ to 4. Next up: figure out whether tick processing works correctly (it ticks, but the main player doesn't move).
Tests: played back the same client demo, saw the reduction in errors
Bugfix: no more duplicate players when playing a client demo on server-editor
Now there's an issue with disconnecting/destroyed players - about 8 NREs from accessing something dead during BasePlayer.ServerCycle
Tests: played the same demo - max players was 20 instead of 1k
Update: properly initialize players when playing a client-demo in server-editor
- Also log when creating a main player
- Report kick reasons as errors
No more unexpected kicks for players. But it looks like we're duplicating players - by the end of playback we had 1k players, far more than I expected
Tests: played back the same demo to completion, accumulated errors are only related to some invalid packets that we don't care about (like Voice)
Update: Server-editor is able to see ticks from the player when playing a client-demo
- now also handling flag messages
- skip server demos and warn user that it's not supported for now
There are a couple of things left to investigate and validate: why kicks happen for being under terrain, and whether I can restore the full initialization flow
Tests: ran the same demo, was able to verify that the main player is identified and its tick history is being stepped through
Update: Server demo playback now creates entities for players as it first encounters them
- Added demo progress logging
- Avoided a number of kick reasons (as we don't fully set up entity simulation)
I can see more activity now - next up is making sure the important history is also replicated/present.
Tests: played the same demo from before - logs confirmed players were present.
New: Editor-Server can playback a server demo
Mimics how client demos are played back - streams commands to the server for execution. Currently doesn't spawn players/has some entities missing - that's next to investigate
Tests: Took an old 5 min demo and played it back until it stopped the editor play session.
Merge: from profiling_improvements
Avoids recording methods that are tiny/fast - helps with overhead.
Tests: in editor on Craggy generated a new snapshot and opened in Perfetto, couldn't find my methods.
Merge: from main
Tests: none
Update: Further reduce what methods we annotate
- Removes get_* property accessors, as they are frequent but usually quick
- Removes various storage classes and math utilities (ByteExtensions, BitUtility, Facepunch.System.Enumerator, all of Unity.Mathematics, etc)
- Removes operator invocation (any op_* method)
- Removes comparison method calls (as they are usually quick)
This should reduce performance degradation in tight loops that frequently invoke these methods and produce smaller snapshots (6.7 MB -> 6.1 MB).
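The exclusions above amount to a name-based filter. A hypothetical sketch of such a predicate follows; the rule format, the exact type list, and which comparison methods count (`Equals`, `CompareTo`, `Compare` here) are assumptions, not the profiler's actual configuration.

```python
import re

# Hypothetical exclusion rules modelled on the list above.
EXCLUDED_TYPE_PREFIXES = (
    "Unity.Mathematics.",
    "ByteExtensions",
    "BitUtility",
    "Facepunch.System.Enumerator",
)
EXCLUDED_METHOD_PATTERNS = (
    re.compile(r"^get_"),                          # property getters
    re.compile(r"^op_"),                           # operator overloads
    re.compile(r"^(Equals|CompareTo|Compare)$"),   # assumed comparison calls
)


def should_annotate(type_name: str, method_name: str) -> bool:
    """Return True if the method should get profiler instrumentation."""
    if type_name.startswith(EXCLUDED_TYPE_PREFIXES):
        return False
    return not any(p.match(method_name) for p in EXCLUDED_METHOD_PATTERNS)
```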
Tests: in editor on Craggy generated a new snapshot and opened in Perfetto, couldn't find my methods.
Merge: from buildingprivilegeretrotool_recycling
Fixes invalid pooling of protobuf type when replicating data.
Tests: On Craggy setup a tiny box base and placed retro cupboard - before fix it immediately reported negatives via pool.print_memory, after fix - stayed >= 0
Bugfix: don't flood pool with ProtoBuf.BuildingPrivelegeRetroTool
Tests: On Craggy setup a tiny box base and placed retro cupboard - before fix it immediately reported negatives via pool.print_memory, after fix - stayed >= 0
Buildfix: define symbol on Mac Server
Tests: compiled editor, then compiled linux DGS
Merge: from profiler_improvements
- Adds Linux support (tested on Ubuntu 22.04 via WSL)
- Optimizations for JSON export
- Added debug utility to export binary snapshot - run `perfsnapshot <delay> <name> <frames> <shouldBinExport>`
- Added Tools/Profiler Bin Viewer, an editor only tool to inspect binary snapshots
- Reduced default frames captured to 4 from 10
- Profiler now skips annotating UnityEngine.CoreModule methods (reduces capture overhead)
- Works around Perfetto visualization issue with Complete events (https://github.com/google/perfetto/issues/970)
Tests:
- Exported a number of editor snapshots with binary snapshots to test bin viewer
- Using WSL, tested exporting a snapshot on Ubuntu - 3k procgen world
Merge: from main
Tests: editor compiles
Bugfix: Workaround Perfetto's "Complete" event hierarchy bug
- Reported issue on their repo: https://github.com/google/perfetto/issues/970
Tests: exported snapshot from a linux server (running on WSL Ubuntu), 3k procgen world. Exported from editor as well.
Update: Binary export no longer pre-processes the stream
- Saves time on the export
- Also added if-deffed out extra checks, disabled by default
My previous checks were wrong and produced false positives. Also, I think I have an idea what jumbles the JSON visualization - will fix in the next CL.
Tests: used the extra-debug version to export a Linux snapshot - it succeeded
Update: ProfileBinViewer - report found exceptions in thread stream
Still looking into what's wrong with the Linux snapshot
Tests: opened a borked linux snapshot
Buildfix: Disable ProfileBinViewer if we're not in Server mode
Tests: switched to Client in editor
Update: ProfileBinViewer now shows thread summary
Tests: opened a snapshot from editor
WIP: rewriting stream processing to gather frame data during pre-process step
- Should fix invalid mark placement within a frame and be more efficient to generate, as we only do 2 scans per thread stream instead of 4.
Not complete, need to track down why alloc offsets are invalid for the last frame.
Merge: from amvienceemitter_recycle
Fixes an intermittent bug on client disconnect from a server, caused by trying to reactivate a GameObject.
Tests: validated it doesn't affect the entity pool warmup sequence (as we create->retire there). Using `log.level Audio 2` and a bit of code to force the issue 100% of the time, disconnected 3 times:
- without the fix, it was 100% generating an error on disconnect
- with the fix, had 0 error reports
Bugfix: don't try to reactivate a destroyed spawnpoint in SERVER+CLIENT
- Reimplemented the spawn point status based on new internal state - this retains original behavior
Editor-only bug caused by my previous change of pooling behavior (destroy object instead of trying to pool it on scene unload).
Tests: on Craggy spawned and disconnected - no errors. Checked BaseSpawnPoint inheritance hierarchy to ensure there's no other places trying to activate gameobjects
Merge: from main
Tests: compiled in editor
Update: added search support to bin snapshot viewer
I think I have all I need to explore the broken profile
Tests: opened the borked profile snapshot
Bugfix: fixed reading strings from the binary snapshot
- Forgot that they're not null terminated - this fixes random characters at the ends
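Since the strings aren't null terminated, the reader has to take the length from the stream rather than scan for a terminator; scanning runs into whatever bytes follow, hence the random characters. A minimal sketch, assuming (hypothetically) a little-endian uint32 byte count in front of UTF-8 data; the actual snapshot layout isn't described here.

```python
import struct
from typing import Tuple


def read_string(buf: bytes, offset: int) -> Tuple[str, int]:
    """Read a length-prefixed UTF-8 string; return (text, next_offset).
    Assumed layout: little-endian uint32 byte count, then the bytes,
    with no null terminator."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    if end > len(buf):
        raise ValueError("string runs past end of buffer")
    return buf[start:end].decode("utf-8"), end
```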
Tests: Opened borked editor snapshot
Update: added ability to display sub range of thread track in Bin viewer
- also supports rudimentary [N] input to resolve syncpoint indices
- added mark index to view as well
Tests: visualized borked editor snapshot
Update: display call depth for marks in bin viewer
Makes it easier to track callstack consistency at sync points.
Tests: opened borked snapshot from editor
Update: reworked the bin visualizer to have a different layout
- Able to jump to sync points in the list
- Able to view specific thread's stream
Couldn't figure out how to do nested dynamic scrollviews, so went with a different approach. This already raised a question about some names having invalid characters at the end, though I doubt that's the contributing factor
Tests: loaded up a borked binary snapshot, was able to inspect it
Bugfix: don't double up threads in bin viewer
Also got lucky and captured a snapshot in editor where frames weren't properly aligned.
Tests: opened an existing snapshot, saw no duplicate threads
New: Editor viewer for binary profile snapshots
- Very rudimentary, needs more work
- Also added the number of marks in each thread profile to the binary snapshot export
Reveals that I have a bug in binary exporter - looks like I double up the threads somewhere.
Tests: opened a snapshot from the editor in the tool
Update: Exposing binary exporter
- Added missing features from ServerProfiler.Core
- Can be triggered via profile.perfsnapshot last argument
I need it to be able to investigate and fix hard-to-reproduce issues - hopefully it'll speed up the workflow.
Tests: in Editor on Craggy took a binary snapshot - it exported successfully
Bugfix: don't access invalid memory when exporting Linux snapshot
Same GCC vs MSVC issue in the native libs, but this time on Managed side (since I have a copy of these structs for name resolving purposes).
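The managed copies only work if their layout matches what the native code writes. The log doesn't say which layout rule differed between GCC and MSVC, so this ctypes sketch only illustrates the general failure mode: the same two fields under natural alignment vs 1-byte packing land at different offsets, so a reader using the wrong layout reads `value` from the wrong bytes (or past the end of the buffer). The struct itself is hypothetical.

```python
import ctypes


class NaturalLayout(ctypes.Structure):
    # Default alignment: padding is inserted after `kind` so `value` is aligned.
    _fields_ = [("kind", ctypes.c_uint8), ("value", ctypes.c_uint64)]


class PackedLayout(ctypes.Structure):
    # 1-byte packing (like #pragma pack(1)): no padding, `value` follows `kind`.
    _pack_ = 1
    _fields_ = [("kind", ctypes.c_uint8), ("value", ctypes.c_uint64)]


for layout in (NaturalLayout, PackedLayout):
    print(layout.__name__, "size:", ctypes.sizeof(layout),
          "value offset:", layout.value.offset)
```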
Tests: Built a Linux server, ran it on WSL and triggered a snapshot - it generated. However, some frames periodically export incorrectly; investigating further
Bugfix: new ServerProfiler.Core dynamic libs
- Fixes Linux exporting symbols with name mangling
- Fixes GCC vs MSVC struct packing inconsistency, causing Linux server to crash when instrumenting functions
Tests: DLL tested in editor on Craggy, SO tested in WSL standalone server (snapshot export fails though)