branch rust_reboot/main/profiling_improvements

85 Commits over 92 Days - 0.04cph!

5 Days Ago
Update: more profiling exclusions - Don't track NetRead and NetWrite - Don't track Facepunch.System's containers (including pooling), StringPool and ArrayPool - Don't track EntityRef - Don't track all Enumerators (previously only Facepunch's was excluded) - Don't track all GetHashCode - Don't track TimeWarning (debug-only calls, but can be frequent) Tests: Took a snapshot of default procgen map in Editor (Client+Server), confirmed about 13% reduction in uncompressed json size.
5 Days Ago
Merge: from main Tests: none
15 Days Ago
Merge: from main Tests: none
15 Days Ago
Update: Further reduce what methods we annotate - Removes get_* property accessors, as they are frequent but usually quick - Removes various storage classes and math utilities (ByteExtensions, BitUtility, Facepunch.System.Enumerator, all of Unity.Mathematics, etc) - Removes operator invocation (any op_* method) - Removes comparison method calls (as they are usually quick) This should reduce performance degradation in tight loops that frequently invoke these methods and produce a smaller snapshot (6.7mb -> 6.1mb). Tests: in editor on Craggy generated a new snapshot and opened in Perfetto, couldn't find my methods.
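The exclusion rules listed in this commit amount to a name-based filter applied before a method is annotated. Below is a minimal Python sketch of such a filter. The function name `should_annotate`, the constant names, and the use of plain strings for type and method names are illustrative assumptions, not the profiler's actual code. Comparison methods are left out because the commit does not name them.

```python
# Hypothetical sketch: decide whether a method should receive profiling marks.
# The rule lists mirror the commit message; names and structure are assumed.

EXCLUDED_METHOD_PREFIXES = ("get_", "op_")   # property getters, operators
EXCLUDED_TYPE_PREFIXES = (
    "Unity.Mathematics.",                    # all of Unity.Mathematics
    "Facepunch.System.Enumerator",
)
EXCLUDED_TYPE_NAMES = {"ByteExtensions", "BitUtility"}


def should_annotate(type_name: str, method_name: str) -> bool:
    """Return False for methods that are frequent but usually quick."""
    if method_name.startswith(EXCLUDED_METHOD_PREFIXES):
        return False
    if type_name.startswith(EXCLUDED_TYPE_PREFIXES):
        return False
    # Compare on the short type name so namespaced types also match.
    if type_name.rsplit(".", 1)[-1] in EXCLUDED_TYPE_NAMES:
        return False
    return True
```

Keeping the rules in flat tuples and sets makes it cheap to add further exclusions later without touching the predicate.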
33 Days Ago
Merge: from main Tests: editor compiles
33 Days Ago
Bugfix: Workaround Perfetto's "Complete" event hierarchy bug - Reported issue on their repo: https://github.com/google/perfetto/issues/970 Tests: exported snapshot from a linux server (running on WSL Ubuntu), 3k procgen world. Exported from editor as well.
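The commit does not describe the workaround itself. For context, the Chrome Trace Event JSON format that Perfetto reads can express a scope either as one Complete event (`"ph": "X"` with `ts` and `dur`) or as a Begin/End pair (`"ph": "B"` and `"ph": "E"`). The Python sketch below shows one possible way to sidestep hierarchy problems with Complete events by expanding them into explicitly nested Begin/End pairs. The function name is hypothetical, and the sketch assumes all events belong to a single thread and are all Complete events.

```python
# Hypothetical sketch: rewrite "X" (Complete) events as nested "B"/"E" pairs.
# Field names follow the Chrome Trace Event JSON format. This is not
# necessarily the workaround used in the commit.

def split_complete_events(events):
    """Expand single-thread Complete events into ordered Begin/End events."""
    out = []
    stack = []  # pending End events for scopes that are still open
    # Parents first: earlier start, and at equal start the longer event.
    for ev in sorted(events, key=lambda e: (e["ts"], -e["dur"])):
        # Close every open scope that ended before this event starts.
        while stack and stack[-1]["ts"] <= ev["ts"]:
            out.append(stack.pop())
        base = {k: v for k, v in ev.items() if k not in ("ph", "dur")}
        out.append({**base, "ph": "B"})
        stack.append({**base, "ph": "E", "ts": ev["ts"] + ev["dur"]})
    # Close whatever is still open, innermost scope first.
    while stack:
        out.append(stack.pop())
    return out
```

With Begin/End pairs the nesting is carried by event order rather than inferred from overlapping durations.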
33 Days Ago
Update: Binary export no longer pre-processes the stream - Saves time on the export - Also added if-deffed out extra checks, disabled by default My previous checks were wrong and produced false positives. Also, think I got an idea what jumbles the json visualization - will fix in next CL. Tests: used the extra-debug version to export linux snapshot - it succeeded
33 Days Ago
Update: ProfileBinViewer - report found exceptions in thread stream Still looking for why things are wrong with linux snapshot Tests: opened a borked linux snapshot
33 Days Ago
Buildfix: Disable ProfileBinViewer if we're not in Server mode Tests: switched to Client in editor
33 Days Ago
Update: ProfileBinViewer now shows thread summary Tests: opened a snapshot from editor
34 Days Ago
WIP: rewriting stream processing to gather frame data during pre-process step - Should fix invalid mark placement in a frame + be more efficient to generate, as we only do 2 stream scans per thread stream instead of 4. Not complete, need to track down why alloc offsets are invalid for last frame.
35 Days Ago
Update: added search support to bin snapshot viewer I think I have all I need to explore the broken profile Tests: opened the borked profile snapshot
35 Days Ago
Bugfix: fixed reading strings from the binary snapshot - Forgot that they're not null terminated - this fixes random characters at the ends Tests: Opened borked editor snapshot
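When strings are not null terminated, the reader has to take exactly the stored number of bytes and stop; scanning for a terminator picks up whatever follows in the buffer, which matches the "random characters at the ends" symptom. A minimal Python sketch of a length-based read is shown below. The layout (a little-endian 16-bit length followed by UTF-8 bytes) and the function name are assumptions for illustration; the commit does not describe the snapshot's real string encoding.

```python
import struct

# Hypothetical layout: uint16 little-endian length, then that many UTF-8 bytes.

def read_string(buf: bytes, offset: int):
    """Read one length-prefixed string and return (text, next_offset)."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("string runs past end of buffer")
    # Slice by length only; bytes after `end` belong to the next record.
    return buf[start:end].decode("utf-8"), end
```

Returning the next offset lets the caller keep walking the stream without recomputing sizes.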
35 Days Ago
Update: added ability to display sub range of thread track in Bin viewer - also supports rudimentary [N] input to resolve syncpoint indices - added mark index to view as well Tests: visualized borked editor snapshot
35 Days Ago
Update: display call depth for marks in bin viewer Makes it easier to track callstack consistency at sync points. Tests: opened borked snapshot from editor
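Call depth can be derived from the mark stream alone by keeping a running counter. The Python sketch below illustrates the idea. The mark kind `MethodEnter` and the choice to treat `MethodException` as closing a scope are assumptions; only `MethodExit` and `MethodException` are named elsewhere in this log.

```python
# Hypothetical mark kinds; the real snapshot format may encode them differently.
ENTER, EXIT, EXCEPTION = "MethodEnter", "MethodExit", "MethodException"


def call_depths(marks):
    """Return the call depth at which each mark in the stream sits."""
    depths = []
    depth = 0
    for kind in marks:
        if kind == ENTER:
            depths.append(depth)   # the scope opens at the current depth
            depth += 1
        elif kind in (EXIT, EXCEPTION):
            depth -= 1             # the scope closes at the depth it opened
            if depth < 0:
                raise ValueError("exit mark without matching enter")
            depths.append(depth)
        else:
            depths.append(depth)   # other marks do not change depth
    return depths
```

A stream whose depth goes negative, or does not return to its starting value where it is expected to, is a quick signal that the callstack is inconsistent at that point.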
35 Days Ago
Update: reworked the bin visualizer to have a different layout - Able to jump to sync points in the list - Able to view specific thread's stream Couldn't figure out how to do nested dynamic scrollviews, so went for a different approach. Already revealed a question mark about some names having invalid characters at the end, though doubt it's the contributing factor Tests: loaded up a borked binary snapshot, was able to inspect it
35 Days Ago
Bugfix: don't double up threads in bin viewer Also got lucky and captured a snapshot in editor where frames weren't properly aligned. Tests: opened an existing snapshot, saw no duplicate threads
35 Days Ago
New: Editor viewer for binary profile snapshots - Very rudimentary, needs more work - Also added how many marks there are in a thread profile for binary snapshot export Reveals that I have a bug in binary exporter - looks like I double up the threads somewhere. Tests: opened a snapshot from the editor in the tool
36 Days Ago
Update: Exposing binary exporter - Added missing features from ServerProfiler.Core - Can be triggered via profile.perfsnapshot last argument I need it to be able to investigate and fix hard-to-reproduce issues - hopefully it'll speed up the workflow. Tests: in Editor on craggy took a binary snapshot - it exported successfully
36 Days Ago
Bugfix: don't access invalid memory when exporting Linux snapshot Same GCC vs MSVC issue in the native libs, but this time on Managed side (since I have a copy of these structs for name resolving purposes). Tests: Built a linux server, ran it on WSL and triggered a snapshot - it generated. But periodically select frames export incorrectly, investigating further
36 Days Ago
Bugfix: new ServerProfiler.Core dynamic libs - Fixes Linux exporting symbols with name mangling - Fixes GCC vs MSVC struct packing inconsistency, causing Linux server to crash when instrumenting functions Tests: DLL tested in editor on Craggy, SO tested in WSL standalone server (snapshot export fails though)
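The commit does not say which structs or which packing rule differed between GCC and MSVC. The general failure mode is that two sides agree on the fields but not on the padding between them, so one side reads from the wrong offsets. The Python sketch below illustrates that with the standard `struct` module; the two-field layout and all names are hypothetical.

```python
import struct

# Same two fields in both layouts: a 1-byte flag, then a 4-byte unsigned int.
PACKED_LAYOUT = "=BI"    # native byte order, no padding between fields
ALIGNED_LAYOUT = "@BI"   # native byte order, padding added for alignment


def layout_size(fmt: str) -> int:
    """Size in bytes of a record with the given layout."""
    return struct.calcsize(fmt)


def read_value(fmt: str, raw: bytes) -> int:
    """Read the integer field, assuming the given layout."""
    return struct.unpack_from(fmt, raw)[1]


# The writer uses the aligned layout; a reader assuming the packed layout
# looks for the integer at the wrong offset and gets a different value.
raw = struct.pack(ALIGNED_LAYOUT, 1, 0xAABBCCDD)
mismatched = read_value(PACKED_LAYOUT, raw)
matched = read_value(ALIGNED_LAYOUT, raw)
```

On common platforms the two layouts differ in size, which is why a mismatch tends to show up as crashes or invalid memory access rather than a clean error.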
40 Days Ago
Update: borked linux ServerProfiler.Core It gets loaded, but fails to find an entry point, so it just falls over. I'll continue tomorrow. Tests: built a standalone linux server and tried launching it via WSL - my profiler script throws an exception
40 Days Ago
Update: ServerProfiler.Core separates platform specific code Required for Linux support - still figuring out how to organize msbuild projects Tests: took snapshot in Editor's Craggy, and did a couple start-stops of playing to ensure no errors
41 Days Ago
Clean: fixing typo in "ServerProfiler::AppendNameTo" Noticed when working on now-discarded assembly name skipping Tests: scripts build in editor
42 Days Ago
Clean: remove unused "Profile.cs" script Tests: made sure editor compiles
42 Days Ago
Update: don't annotate UnityEngine.CoreModule funcs It has a lot of small functions, so it inflates the profile by quite a bit as well as adds overhead to small loops. Tests: tested in editor on Craggy
42 Days Ago
Update: Reduce default frame count for perf snapshot to 4 Going to max(10) by default is too much - on a 150 pop release server that tried to produce 3.5GB json snapshot. I'm expecting 4 to give us a 1 GB profile Tests: took a snapshot on Craggy in editor - it took 4 frames
42 Days Ago
Update: Reduce main buffer relocations when generating json snapshot - Also report total marks instead of max to allow predicting how big the snapshot is in case of failure Will help with reducing GC activity on the server, hopefully also reducing the generation time. Tests: locally on Craggy in editor took a couple snapshots - no issues
42 Days Ago
Merge: from main Tests: none
2 Months Ago
Update: Avoid shipping ServerProfiler to clients - Done by deleting the dll after it's built Tests: Build Win64 Client Debug and Release - no more dll
2 Months Ago
Update: added Assembly name to the snapshot marks Makes it a bit clearer where calls are coming from at the expense of a larger snapshot (compressed: 2mb -> 7mb, json: 50mb -> 140mb). But, since the profiler is much faster now, the snapshots are smaller on 6k servers. Tests: exported 2 snapshots from Editor's Craggy, and exported 5 from standalone 6k server
2 Months Ago
Bugfix: Prohibit constexpr initialization of mutexes in ServerProfiler.Core Turns out the toolset I'm using produces non-binary compatible assembly - using an escape hatch to avoid this issue. This only manifested in standalone builds. Tests: exported 5 snapshots from procgen 6k world in a row over 2 minutes. Used both borked and good Harmony mods as well.
2 Months Ago
Bugfix: fixed incorrect time scale in reports - Also replaced the debug dll with a release one (interestingly, this also removed quite a lot of scopes in the report) Tests: exported snapshot from Craggy in editor
2 Months Ago
Bugfix: various issues that snuck in during rewrite - Turns out unmanaged threads get destroyed by engine, so needed to add deferred clean up of collected telemetry - this fixed editor crashes and crashes in export thread - I accidentally recorded allocations for worker threads only - flipped to record main thread only - I was emitting methods for MethodExit and MethodException marks, which is unnecessary - cleaned up Still need to fix timestamp conversion on export, that'll be next Tests: exported snapshot from Editor's Craggy
2 Months Ago
Update: moved profiler code to unmanaged assembly - submitting debug DLL for now, will ship release once ready Can take snapshot, but export doesn't finish - likely an exception on export thread. Tests: taken snapshot in editor on Craggy - saw logs of data accumulated, but no final "compressed" message
2 Months Ago
Bugfix: avoid leaking repeating invoke when taking snapshots - Also avoids ambiguity of taking multiple snapshots Tests: generated 2 snapshots in editor back to back with a delay
2 Months Ago
Update: notify that a snapshot was taken when no delay was requested Tests: none, trivial change
2 Months Ago
Merge: from main Tests: built all modes in editor, exported snapshot from editor's Craggy 5 times in a row, built standalone release server and exported snapshot 3 times
2 Months Ago
Update: Add chat feedback when perf snapshot is being taken It'll warn users if they're in the middle of something important Tests: exported in editor with no delay and default standalone delay
2 Months Ago
Clean: remove unnecessary params in ProfilerExporter Tests: none, trivial change
2 Months Ago
Update: Add commandline argument support to explicitly turn on profiler instrumentation - Added log to explicitly confirm if it's enabled or disabled Tests: ran in editor and server standalone with and without it being enabled
2 Months Ago
Bugfix: avoid a rare case of deallocating main thread's Allocs storage - Code is written with the assumption that it's always there, but if 1 frame didn't record any allocs, it would nuke the storage, tripping up the profiler. Discovered when doing additional testing in standalone (somehow editor was unaffected) Tests: did 6 snapshots of standalone server with 6k map - no crashes
2 Months Ago
Update: Don't allocate storage for alloc marks on worker threads Tests: exported a couple snapshots in the editor
2 Months Ago
Update: truncate snapshot names to 32 chars Tests: none, trivial change
2 Months Ago
Update: export worker threads in the json snapshot - Also fixed a bug I introduced in previous submit that led to sporadic exceptions Tests: exported 5 profiles in a row from Craggy in Editor, exported 2 in standalone, checked in perfetto
2 Months Ago
Update: record marks from worker threads - Had to leave allocation tracking enabled for main thread only - there's a comment explaining why Need to implement export for worker threads - that's next Tests: exported snapshot from Craggy in editor and opened in Perfetto
2 Months Ago
Update: generate snapshots under server root Tests: exported in editor, found in the right location
2 Months Ago
Undo: unintentional change to ProjectSettings, reverted by hand Tests: none, trivial change
2 Months Ago
Update: rewrote ServerProfiler TLS storage - Instead of having per-frame storage, we now have one big buffer - Rewrote ProfilerExporter to support changes - Removes a weird stall on EndOfFrame invoke in standalone I couldn't find a way to implement lock-free perf mark recording with previous approach, but now I should have a way - will attempt next. Tests: exported profile from Craggy in editor and standalone 6k server - both open in Perfetto and look coherent
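The move from per-frame storage to one big buffer can be pictured as each thread appending marks to a single sequence and remembering only where each frame starts. The Python sketch below shows that shape. The class, its methods, and the use of `threading.local` are illustrative assumptions; the commit does not describe how the native buffer is actually laid out.

```python
import threading


class ThreadMarks:
    """One contiguous mark buffer per thread, with frames stored as offsets."""

    def __init__(self):
        self.marks = []          # every mark for this thread, in record order
        self.frame_starts = [0]  # index into `marks` where each frame begins

    def record(self, mark):
        self.marks.append(mark)

    def end_frame(self):
        # A frame boundary is just an offset; no new buffer is allocated.
        self.frame_starts.append(len(self.marks))

    def frame(self, index):
        start = self.frame_starts[index]
        if index + 1 < len(self.frame_starts):
            end = self.frame_starts[index + 1]
        else:
            end = len(self.marks)  # the last frame is still open
        return self.marks[start:end]


_tls = threading.local()


def thread_marks() -> ThreadMarks:
    """Return the calling thread's storage, creating it on first use."""
    storage = getattr(_tls, "storage", None)
    if storage is None:
        storage = _tls.storage = ThreadMarks()
    return storage
```

Because recording only ever appends to the owning thread's buffer, this layout is a plausible starting point for the lock-free recording the commit mentions as a next step.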
2 Months Ago
Update: export snapshots into separate folder Makes it easier to build tooling for it Tests: generated an editor snapshot