branchrust_reboot/main/profiling_improvementscancel
148 Commits over 243 Days - 0.03cph!
Merge: from main
Tests: built all modes in editor, exported snapshot from editor's Craggy 5 times in a row, built standalone release server and exported snapshot 3 times
Update: Add chat feedback when perf snapshot is being taken
It'll warn users if they're in the middle of something important
Tests: exported in editor with no delay and default standalone delay
Clean: remove unnecessary params in ProfilerExporter
Tests: none, trivial change
Update: Add commandline argument support to explicitly turn on profiler instrumentation
- Added log to explicitly confirm if it's enabled or disabled
Tests: ran in editor and server standalone with and without it being enabled
Bugfix: avoid a rare case of dealocating main thread's Allocs storage
- Code is written with the assumption that it's always there, but if 1 frame didn't record any allocs, it would nuke the storage, tripping up the profiler.
Discovered when doing additional testing in standalone (somehow editor was unaffected)
Tests: did 6 snapshots of standalone server with 6k map - no crashes
Update: Don't allocate storage for alloc marks on worker threads
Tests: exported a couple snapshots in the editor
Update: truncate snapshot names to 32 chars
Tests: none, trivial change
Update: export worker threads in the json snapshot
- Also fixed a bug I introduced in previous submit that led to sporadic exceptions
Tests: exported 5 profiles in a row from Craggy in Editor, exported 2 in standalone, checked in perfetto
Update: record marks from worker threads
- Had to leave allocation tracking enabled for main thread only - there's a comment explaining why
Need to implement export for worker threads - that's next
Tests: exported snapshot from Craggy in editor and opened in Perfetto
Update: generate snapshots under server root
Tests: exported in editor, found in the right location
Undo: unintentional change to ProjectSettings, reverted by hand
Tests: none, trivial change
Update: rewrote ServerProfiler TLS storage
- Instead of having per-frame storage, we now have one big buffer
- Rewrote ProfilerExporter to support changes
- Removes a weird stall on EndOfFrame invoke in standalone
I couldn't find a way to implement lock-free perf mark recording with previous approach, but now I should have a way - will attempt next.
Tests: exported profile from Craggy in editor and standalone 6k server - both open in Perfetto and look coherent
Update: export snapshots into separate folder
Makes it easier to build tooling for it
Tests: generated an editor snapshot
Update: allow specifying the name and how many frames to collect for perf snapshot
- We support max 10 frames of recording, so frames input gets clamped
- Also left a note for future maintenance
Tests: exported multiple snapshots in editor with 1 and 11 frames
Update: adding extra logging and sanity checking when exporting a profile snapshot
For some reason one of frames from standalone gets borked - hoping this'll help track it down.
Tests: exported craggy in editor
Update: Gather GC.Collect activity into perf snapshot
- Avoid thread id checks reaching to mono runtime to resolve it, as it's unsafe during GC.Collect
Tests: did a 6k standalone perf snapshot, but didn't catch a collection event. Forced one in editor on 3rd frame, confirmed visible in perfetto.
Update: export Allocs as process-wide events
This puts them on a separate track, making it easier to spot them.
Tests: checked craggy snapshot in perfetto
Clean: removing some unnecessary sanitization logic
One of recent updates now guarantees storage is enough to house all snapshot data
Tests: loaded craggy snapshot in perfetto - no missing names/allocs, no asserts
Update: export Alloc events to json snapshot
They get a bit lost in the sea of all other instantaneous events, so will need to somehow improve this
Tests: loaded craggy snapshot in perfetto
Update: record GC alloc events in the snapshot
Currently don't emit them in the json, but that'll be the next thing
Tests: took a perf snapshot on craggy
Optim: reduce profiling capacity, instead lazy-grow it
- When frame didn't fit the capacity, drop it and rerecord it
This should help avoid large stutters in editor (and hopefully on the server as well).
Tests: tested craggy in editor - spikes gone. Tested on 6k world in standalone server - spikes still present. Also noticed one export failed, but think it's unrelated to current changes
Optim: Avoid allocations when generating method names
- Also cleaned up a couple TODOs
- Added extra logging to track stages of export progress
Getting very close to completing all outstanding TODOs
Tests: exported craggy and checked scope names in perfetto
Update: add cross-frame stitching of torn scopes
Tests: snapshotted craggy, loaded in perfetto
Update: offloading snapshot export to a task thread
- Stopped exporting binary snapshot, and left a comment explaining why it's not in use (but not deleted)
- Only export compressed json (saves a bit of time on iteration)
Hope is to reduce stutter, but my tests in editor show that it's inconsistent (it's less, but for some reason main thread has frames that somtimes seem to wait for worker thread to finish - need to investigate)
Tests: exported snapshot from craggy, unzipped and loaded in perfetto
Bugfix: exclude 'length' bytes from content end offset
Tests: none, trivial change
Update: Add support to export snapshot as a binary blob
- Also emitting compressed version of json and bin snapshots
- Minor code reorganization + TODOs
Surprisingly, despite tighter binary format, it compressed to a larger size than compressed json
Tests: loaded compressed snapshot to perfetto. Didn't trest the binary as it doesn't have any readers yet, so it's untested
Bugfix: patch sheared frames when exporting json snapshot
- In Editor EndOfFrame is called as part of nested GUI, which our profiler shears apart. For now we inject additional marks to maintain callstack structure
- left a TODO to properly reconstruct a sheared frame
- For now expanding frame scope to cover sheared period
Tests: exported craggy snapshot and opened in perfetto - no more randomly trashed frames
Update: Factor out profielr exporting logic to a separate script
- Also cleaned up a couple log outputs, as the collection seems sensible
- Cleaned a couple already-done TODOs
Tests: Did an export from editor and standalone server (6k size, 0 pop), loaded in perfetto
Undo: removing ignore.conf accidental submit
Tests: none
Update: ServerProfiler snapshot now contains full names for scopes
Tests: opened the new profile in perfetto
Update: Emit all 10 frames of snapshot
- Also emitting UnityFrame as a CompleteEvent, so it's always visible in the profiler
- UnityFrames are now numbered
This produces a 160mb snapshot on Craggy (taken immediately after spawning), will need testing on larger worlds (and this is before proper labels)
Tests: opened snapshot in perfetto - saw all 10 frames
Update: Synthesize missing OnEnter profiling marks
This restore the profile's structure
Test: opened snapshot in perfetto
Update: first export of profiling snapshot
- Also filtering out System and System.Core from annotation (otherwise we produce too much data, even for 1 frame)
- Also added timestamp utility functions (as trace format requires micros)
Not super reliable - need to post-process gathered data to get rid of torn marks
Tests: opened trace in perfetto - although wonky, it's there
Update: minor improvements to ServerProfiler
- Added a smidge more debug to track the growth of per-frame lazy storage
- Added output of how long each frame took
Tests: ran the perfsnapshot command
Update: add profile.perfsnapshot command
Currently just outputs gathered marks telemetry for sanity checking - will expand more once happy with results.
Tests: ran the command and confirmed output
Update: more improvements and debug to ServerProfiler
- Added tracking of thread IDs
- Only gather main thread marks (have thread races in my lock-free approach, so delaying this till later)
- Add tracking of assemblies to which the method belongs
- Filter our mscorlib assembly from instrumenting
- rewrote code to use ref semantics
Still chasing why consecurtive snapshots seem to double-up in size on frames 2-10.
Tests: ran a bunch of profile snapshots (next frame) and looked at the output. It works but has above weird behavior.
Bugfix: various ServerProfiler fixes and improvements
- Account that native strings are not null terminated when doing unmanaged string comparison
- Filter out native profiler funcs from annotation
- Unregister profiler callbacks when stopping play in editor (this would cause mono crash on repeat plays due to GCed callbacks)
- Reduce scope of unsafe keyword (as some funcs are safe)
- Replace test-run annotation with fully enabled one
- Hook in on frame end to count frames
Tests: Started, stopped and started the game in the editor. Wrote some simple test code to validate StrEq. Validated in logs that native serverprofiler functionality is ignored.
Update: Filter instrumenting anotations based on class and namespaces
- Added reporting of class and namespace into annotation log
Currently only skipping ServerProfiler itself to avoid a recursion when invoking callbacks (will be tested in next submit).
Tests: pressed play in editor - saw the instrumentation logs
Update: Invoking of profiler method annotation filter works
- Instead of hooking in we rely on Unity's assembly initialization to invoke us first - seems to be early enough, but we'll see
- Store profiler callbacks to avoid GC cleaning them up and crashes in mono
- Add compile guards (for now allowing to run editor to test)
Tests: launched in editor, no crashes and a lot(112k) of debug logs indicating what gets annotated.
Update: skeleton of a server profiler
- Most of the native imports in place
- Implemented the general flow of per-frame tracking
In theory should be able to track function entry/exits, but can't test until a decide on an early enough method of initialization.
Tests: none, too early
Merge: from main
Tests: none (trivial merge)
Merge: from main
Tests: none
Update: Make Pool capacity waste tracking off by default
- Added an editor-only command to enable it, "pool.memory_overhead_tracking_enabled"
- Cleaned up implementation a smidge, left a comment explaining where the box is coming from
- "pool.print_memory_overhead" skips entries with 0 overhead
I couldn't find a way to avoid boxing in a generic call, so I decided to disable this telemetry gathering by default to avoid muddying profiler data.
Tests: in editor on Craggy started a normal session, couldn't see allocations in the profiler. Overrode the startup params and was able to see the overhead table printed out when invoking cmd.
Merge: from main
Tests: none
Merge: from main
Tests: none
Update: Gamephysics now uses TryGetComponent instead of GetComponent
- Same perf in standalone builds, better perf in editor builds (avoids string format alloc for null obj description)
Tests: Rode a horse on craggy and check profiler to confirm GamePhysics routines were triggered
Update: Adding missing ChangeLOD profiling annotation
Tests: loaded 6k world and found it in profile data
Update: Annotating all VehicleFixedUpdate calls
- Replacing existing Profiler.BeginSample with TimeWarning
- Moved some samplers around to build a nested flamegraph (consolidates vehicles in the profile)
Tests: loaded 6k world and looked through the profiling info