1,159 Commits over 304 Days - 0.16cph!
Update: Merging asset loading flows together
- still editor only + debug code to early out
- still slow (there's a number of issues left to resolve)
Discovered that mixing Sync + Async loads causes an integration queue flush (a big stall for us). This'll be a tricky problem to address, since SoundDefinition (and I presume others) load assets as part of OnValidate
Tests: procgen in editor
Update: hooking up gameobject spawning to async load logic
- Contains a bunch of testing code used for profiling, will clean up in next update
Needs a bit of rework to ensure both the original flow and new flow can work together.
Tests: ran procgen
Bugfix: fix out of bounds access during prefab shuffling
Tests: ran procgen, no exceptions
Update: implement missing logic for both GatherAssets and Process
- GatherAssets now respects all relevant settings and sorts paths
- implemented Process that works on a batch of objects
Tests: only GatherAssets has been checked (confirmed reduction of assets due to config use)
Update: exposing prefab preprocessing from GameManager
Tests: none, simple change
Update: List and Array Shuffle range overloads
Tests: none, trivial code
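A range overload of this kind can be sketched as a Fisher-Yates shuffle restricted to a sub-span. This is an illustration in Python rather than the project's own code; `shuffle_range`, its parameters, and the optional seeded `rng` are hypothetical names, not the actual overloads from this commit.

```python
import random

def shuffle_range(items, start, count, rng=None):
    """Shuffle items[start:start + count] in place; leave the rest untouched."""
    if rng is None:
        rng = random.Random()
    # Reject ranges that fall outside the list instead of touching bad indices.
    if start < 0 or count < 0 or start + count > len(items):
        raise IndexError("shuffle range is out of bounds")
    # Fisher-Yates over the sub-span: walk down from the last slot and swap
    # each slot with a random slot at or before it, never leaving the span.
    for i in range(start + count - 1, start, -1):
        j = rng.randint(start, i)  # randint is inclusive on both ends
        items[i], items[j] = items[j], items[i]
    return items
```

The explicit bounds check is there because an unchecked range is an easy way to end up with an out-of-bounds access while shuffling.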
Optim: replace prefab search logic with editor manifest lookups
- commented out a bunch of code for quicker iteration, will revert later
- doesn't account for monument duplication/probability
Significantly faster because we don't load any assets in the process - goes from 30s+ down to 15ms
Test: tried to procgen default editor map
Update: Sort editor manifest by path
Allows faster lookups
Tests: ran in the editor
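Why sorting helps can be sketched with a binary search over the sorted paths. This is a Python illustration under the assumption that the manifest can be treated as a list of path strings sorted ordinally; `find_by_prefix` and the sample paths are hypothetical and not taken from the editor manifest itself.

```python
from bisect import bisect_left

def find_by_prefix(sorted_paths, prefix):
    """Return every path that starts with prefix, given paths sorted ordinally."""
    # Binary search for the first entry that is >= prefix. Every path that
    # starts with the prefix sorts at or after this position.
    lo = bisect_left(sorted_paths, prefix)
    hi = lo
    # Matches are contiguous in a sorted list, so scan until the prefix breaks.
    while hi < len(sorted_paths) and sorted_paths[hi].startswith(prefix):
        hi += 1
    return sorted_paths[lo:hi]

# Hypothetical manifest contents, for illustration only.
manifest = sorted([
    "assets/prefabs/tree/oak.prefab",
    "assets/prefabs/monument/b.prefab",
    "assets/content/x.mat",
    "assets/prefabs/monument/a.prefab",
])
monuments = find_by_prefix(manifest, "assets/prefabs/monument/")
```

The lookup costs a logarithmic search plus the number of matches, instead of a scan over the whole manifest.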
Update: initial work on parallelizing prefab loading during editor procgen
Loads too quickly (0.2s instead of 90s) - I feel like it only loads the root-level gameobject, instead of the entire hierarchy. Will continue later.
Tests: ran it once, got some telemetry, but already certain it's wrong.
Merge: from parallel_validatemove
- Extra validation checks exposed via server.EmergencyDisablePlayerJobs (default to true). In case of error, shuts down UsePlayerUpdateJobs and goes back to vanilla flow
These are cheap to run and should help us track down any problems in the future.
Tests: compilation tests, unit tests and played back server demo
Update: Another validity check for UsePlayerUpdateJobs
- validates player counts between PlayerCache and activePlayerList
Tests: played back server demo
Update: promote some UsePlayerUpdateJobs validation logic from DEBUG only to release
- Hidden behind EmergencyDisablePlayerJobs switch (on by default) and UsePlayerUpdateJobs (off by default)
- ValidatePlayerCache checks whole range instead of just up to player count (in case we got more than expected)
Tests: played back server demo
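The whole-range check can be sketched as follows. This is a Python illustration under a loud assumption: the cache is modelled as a fixed-capacity array that is densely packed, so the first `player_count` slots are occupied and every slot after them is empty. The real PlayerCache layout is not described here, and `validate_player_cache` is a hypothetical name.

```python
def validate_player_cache(slots, player_count):
    """Return True when exactly the first player_count slots are occupied."""
    if player_count < 0 or player_count > len(slots):
        return False
    # Walk the whole capacity, not just [0, player_count): a stale entry
    # past the count means the cache holds more players than expected.
    for index, player in enumerate(slots):
        expected_occupied = index < player_count
        if (player is not None) != expected_occupied:
            return False
    return True
```

Stopping at `player_count` would only catch gaps inside the expected range, while scanning the full capacity also catches leftover entries beyond it.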
Update: turn server.EmergencyDisablePlayerJobs const into a servervar
Allows running some extra validation
Tests: editor compiles
Test: test case for missing player removal from PlayerCache
Tests: ran the new unit test
Bugfix: ContinuousProfiler now atomically updates its write indices
- internal fix in ServerProfiler.Core, based on e39afb43
- Removed now-unhelpful check for right mark type at the start of main thread perf stream (all cases confirmed legal now)
After soaking it for 15 minutes total, only main thread export gets lost in the binary sauce. Hopefully last bug.
Tests: soaked 3 times on craggy, only hit unexpected mark type on main thread
Update: ContinuousProfiling will emergency stop if it fails to export
Yet to pin down the worker thread telemetry stream seeing stale/garbage data.
Tests: soaked on Craggy in editor
Bugfix: fix main source of invalid profiling stream from ServerProfiler
- Drop dead threads on every successful frame
- Reset all writing indices on new frame and on resuming continuous profiling post-export
- release binary built using 019295b4
There's still another issue hiding somewhere, but it's much more stable now.
Tests: on craggy exporting every 3rd frame for 5 minutes straight. Previously would trip after 20 seconds.
Update: prototype of continuous allocation tracking is working
- started with profile.watchallocs [Name, default="Allocs"]
- can be stopped with profile.stopwatchingallocs
Dumps [Name].json.gz with data about allocs and associated callstacks. Still hardcoded to trigger export every 3 frames. Exporter frequently gets lost in the bin stream, and needs to be optimized (craggy has noticeable stutter).
Tests: in CLIENT+SERVER editor on Craggy with allocation tracking.
Bugfix: fix missing callstacks
- Updated binary that contains the fix internally to 6440ec7 (still hardcoded to always capture callstacks) - was due to ABI mismatch
- Updated continuous profiler unit test to check for AllocsWithStack and a bit of the data set
- Partially updated ProfileExporter, and definitely borked ProfilerBinViewer
Got updated overhead numbers - always capturing a callstack costs 9 micros per allocation (though that's inflated, since the tests add 38 calls per alloc).
Tests: unit tests + perf test
Update: initial stack gathering support for allocs in Continuous mode
- using release libs based on d48bcf49, with hardcoded stack gathering for now
Somehow it's 15% faster than mono_get_last_method, which doesn't make sense - need to update the exporter to figure out what's being generated.
Tests: none
Profiling shows
Test: adding a profiler-allocation overhead estimate test
- Switched to release binaries of d340789f
Without profiler recording, allocs cost us ~0.3micros, with recording it costs 1micro. Next will see if we can afford gathering full callstacks for each alloc.
Tests: unit tests
Merge from: main
Tests: none (no conflicts)
Bugfix: NotSupportedException when trying to use NetWrite.Read
- Fixed by going directly via underlying buffer of NetRead/NetWrite
- Removed generic Stream call path for recording of packets
Tests: ran a server-side client demo recording in editor - before exceptions, now clean
Update: Continuous profiling that only captures allocations (for now)
- using debug binaries built from d340789f, it triggers a snapshot every 3rd frame for testing
- added a test to validate the loop of capture-and-resume
- Native.StartRecording -> Native.TakeSnapshot
Pretty barebones for now, need to profile callstack gathering to see how expensive it is for continuous profiling.
Tests: unit test
Merge: from parallel_validatemove
- Optim to reduce physics cast scheduling overhead
Tests: unit tests
Optim: reduce physics casts scheduling overhead when using batches
- Added a helper function that subdivides workload across equal batches, with potential for work stealing
On a 10-player test case (40 ticks total), reduces parallel processing time from 0.2ms to 0.12ms. Still slower than serial execution at 0.06ms.
Tests: unit tests
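The subdivision step can be sketched as splitting a workload into contiguous ranges whose sizes differ by at most one. This is a Python illustration only; `split_batches` is a hypothetical name, and the work-stealing part mentioned above is not modelled.

```python
def split_batches(total, batch_count):
    """Split `total` items into `batch_count` contiguous (start, length) ranges."""
    if total < 0 or batch_count <= 0:
        raise ValueError("total must be >= 0 and batch_count must be > 0")
    # Every batch gets `base` items; the first `extra` batches get one more,
    # so batch sizes never differ by more than a single item.
    base, extra = divmod(total, batch_count)
    batches = []
    start = 0
    for i in range(batch_count):
        length = base + (1 if i < extra else 0)
        batches.append((start, length))
        start += length
    return batches
```

For example, `split_batches(40, 4)` yields four ranges of ten items each, while a total that does not divide evenly puts the remainder into the leading batches.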
Merge: from profiling_improvements
- New: experimental perfsnapshot_stream [Name, MainCap(MB), WorkerCap(MB), Debug] server command that streams perf data into a user-defined buffer. Limited to 256MB per thread. Stable up to 32MB, past that might fail to export.
- New: profile.quiet persistent server-var - controls whether perfsnapshot commands notify chat of incoming stutters
- Bugfix: fix snapshots failing to export in standalone (introduced yesterday to staging)
Tests: build tests, unit tests, taking snapshots in editor, and snapshots in standalone server build
Update: updated binaries after merge of task branch
Tests: snapshot in editor on craggy
Bugfix: ProfileExport.JSON - correctly identify frame start when there's 0 or 1 callstack depth at the start of recording
This is a standalone-specific issue
Tests: built standalone, did a perfsnapshot there
Merge: from main
Tests: none, no conflicts
Bugfix: ServerProfiler - ensure worker threads that get created initialize with the right fixed buffer size
- built from 76319c30 commit
Previously they would initialize with the default 16KB.
Tests: perfsnapshot_stream in editor on craggy with varying main thread buffer sizes
Update: profile.quiet persistent server command to control whether perfsnapshot commands should post chat messages
Tests: ran perfsnapshot_stream with quiet set to true - no chat messages
Update: perfsnapshot_stream [Name, MainThreadBuffer, WorkerThreadBuffer, Debug] server command
- Fixed string buffer over-allocating (need to replace it with a memory stream)
- Fixed frame index wrapping, since it can now be larger than byte.MaxValue
- Fixed invalid final mark reconstruction that would lead to 180d+ slices
Allows streaming of up to 128MB of performance data before generating a snapshot. Seems to be stable up to 64MB, but afterwards it's a bit of a dice-roll. Haven't caught where it's failing yet.
Tests: perf snapshot in editor on craggy.
Update: ServerProfiler - initial FixedStorage support
- Added test for FixedStorage snapshot recording
- update ProfileExporter to handle FixedStorage binary stream quirks
- updated binaries based on 2ce19cfe
New mode will allow us to stream profiling info until the buffer fills up. RCon commands will come next
Tests: unit tests
Merge: from profiling_improvements
- Buildfix for tests trying to use ServerProfiler in non-server env
Tests: scripts compile in editor CLIENT mode
Buildfix: exclude ServerProfiler tests from non-server builds
Tests: built CLIENT config in editor
Merge: from profiling_improvements
- Rewrote internal storage of profiling data to use 1 buffer per thread
- Bugfix for allocation graphs not properly resetting and having gaps at the start
- Bugfix for failing to generate export in rare cases where mono runtime allocates with no managed callstack
Tests: unit tests and generating snapshots in editor
Bugfix: ProfileExporter.JSON - don't spam allocs-0 counter for worker threads
Tests: snapshot in editor on craggy
Update: ServerProfiler.Core binaries
- Release bins built from 66537fcc
Didn't remember if I snuck in debug bins at one point, so updating them to be safe
Tests: unit tests
Bugfix: ProfileExporter.JSON - reset allocation graphs
- Reset when a new frame starts
- Reset on worker threads if it allocates
Tests: snapshot in editor on craggy
Clean: ProfileExporter.JSON - don't cache per-frame callstack depths
Was never used, so don't waste allocs.
Tests: none, trivial change
Clean: ProfileExporter.JSON - remove debug logs
Tests: none, trivial change
Bugfix: ProfileExporter.JSON - gracefully handle managed allocations coming from native runtime
- Emit "<mono-native-runtime>" if we don't have managed callstack
Finally caught it - this can happen when mono tries to invoke a managed callback which requires a managed allocation (the callback accepts string[], for example) as the first method in managed code. Was able to repro in editor due to its script compilation callbacks.
Tests: triggered perfsnapshot 40 times without issues
Bugfix: ProfileExporter - avoid reading allocs at the start of the frame as method-entries
- Added a bunch of temporary logging to help track down last issue
Rare, but legal due to our filtering of code.
Tests: snapshotted a bunch of times in editor (there's still one issue with main thread export)
Update: ProfileBinViewer displays binary offsets for marks
Tests: opened a couple bin snapshots
Bugfix: avoid out-of-bounds access when scanning for alloc-only threads
Fixes perf snapshot export failing to generate while processing worker threads. There's still a rare case of main thread crashing - investigating
Tests: Exported a snapshot in editor