Daniel P

1,177 Commits over 304 Days - 0.16cph!

15 Days Ago
Update: Allow user to control how big of a callstack to record when tracking allocations - Defaults to 16, which should be enough to track where in code an allocation originates - Updated description - Windows binary built with commit 1a176138 Tests: used it on craggy. Discovered an issue with the preceding commit, but this change works as expected
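A rough managed sketch of the idea (the real capture happens inside the native ServerProfiler.Core library; the helper name and shape below are purely illustrative):

```csharp
using System;
using System.Diagnostics;

public static class CallstackDepthSketch
{
    // Illustrative only - records at most maxDepth frames (default 16)
    // to identify where in code an allocation originates.
    public static string[] CaptureAllocationStack(int maxDepth = 16)
    {
        var trace = new StackTrace(1, false); // skip this helper frame, no file info
        int frames = Math.Min(maxDepth, trace.FrameCount);
        var result = new string[frames];
        for (int i = 0; i < frames; i++)
            result[i] = trace.GetFrame(i)?.GetMethod()?.ToString() ?? "<unknown>";
        return result;
    }
}
```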
15 Days Ago
Optim: ProfilerExporter.Json - export now uses streaming compression Avoids the need to allocate a massive StringBuilder. Running watchallocs for 2 mins caused only 3-4 GC collection events in total, instead of 1 during each export. Tests: took a perfsnapshot and ran watchallocs for a couple of minutes
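A minimal sketch of the streaming approach, assuming a hypothetical helper shape (not the actual ProfilerExporter API): the JSON is written through a StreamWriter layered over a GZipStream, so no large intermediate StringBuilder is ever allocated.

```csharp
using System.IO;
using System.IO.Compression;
using System.Text;

public static class JsonExportSketch
{
    // Hypothetical helper: streams JSON straight into a .json.gz file.
    // Every write goes through the compressor, so no giant in-memory
    // StringBuilder holding the whole document is ever built.
    public static void ExportCompressed(string path, System.Action<TextWriter> writeJson)
    {
        using (var file = File.Create(path))
        using (var gzip = new GZipStream(file, CompressionLevel.Fastest))
        using (var writer = new StreamWriter(gzip, Encoding.UTF8, bufferSize: 64 * 1024))
        {
            writeJson(writer); // e.g. emit "{\"allocs\":[...]}" piece by piece
        }
    }
}
```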
15 Days Ago
Bugfix: ContinuousProfiler - don't record Sync marks when paused for export - Based off commit 23b9590b This was the last known bug - the lib was still writing Sync marks for each new frame, eventually leading to main thread buffer growth, which invalidated pointers during export. Tests: soaked for almost 1 hour with watchallocs - no more unrecognized reads on main thread
15 Days Ago
Merge: from main
16 Days Ago
Debug: add logging of prefab path to track down server bundle NRE Tests: update game manifest (optimized)
16 Days Ago
Backout of CL 121065 - should reintroduce merge from prefab_process_optim Tests: update game manifest (optimized)
16 Days Ago
Merge: from main Catching up assets to the point of failure (I hope?) - still trying to reproduce NRE during server bundle generation
16 Days Ago
Backout CL 121063 due to failing server bundle generation
16 Days Ago
Merge: from main
16 Days Ago
Merge: from prefab_process_optim - Optimizes component checks during Prefab Processing (speeds up Asset Warmup and monument spawning) Tests: with temp old code that throws exceptions on result mismatch, ran Asset Warmup and ran Scene2Prefab on all large and xlarge monuments
16 Days Ago
Tests: perf test for FileSystem Warmup Recent optims show prefab processing cost for the entire server warmup goes from 39s down to 4.5s (averaged across 5 runs) Tests: ran the perf test
16 Days Ago
Cherrypick(hackweek_procgen_async) Optim: PrefabPreProcess.FindComponents is now using GetComponentsInChildren With the profiler, this ended up 2x faster than the old way (lighthouse monument goes from 96ms to 46ms) Tests: used old code inline to validate outputs of new code
16 Days Ago
Cherrypick(hackweek_procgen_async) Optim: PrefabPreProcess - replace GetComponent with TryGetComponent These are cheaper since they do fewer allocations and less text formatting. Saves ~35s (but the new flow is still slower). Tests: ran procgen with early out
19 Days Ago
Optim: PrefabPreProcess.FindComponents is now using GetComponentsInChildren With the profiler, this ended up 2x faster than the old way (lighthouse monument goes from 96ms to 46ms) Tests: used old code inline to validate outputs of new code
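A sketch of the before/after shape this commit describes: a manual hierarchy walk calling GetComponent on every transform versus a single GetComponentsInChildren call. The method names below are illustrative, not the actual PrefabPreProcess code.

```csharp
using System.Collections.Generic;
using UnityEngine;

public static class FindComponentsSketch
{
    // Old-style approach: recurse over every transform and call GetComponent on each.
    public static void FindManually<T>(Transform root, List<T> results) where T : Component
    {
        var c = root.GetComponent<T>();
        if (c != null) results.Add(c);
        for (int i = 0; i < root.childCount; i++)
            FindManually(root.GetChild(i), results);
    }

    // Single engine call that gathers the whole hierarchy at once,
    // including inactive objects - the faster path from the commit.
    public static void FindViaChildren<T>(GameObject root, List<T> results) where T : Component
    {
        root.GetComponentsInChildren(true, results);
    }
}
```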
19 Days Ago
Optim: PrefabPreProcess - replace GetComponent with TryGetComponent These are cheaper since they do fewer allocations and less text formatting. Saves ~35s (but the new flow is still slower). Tests: ran procgen with early out
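A small illustration of the swap; Rigidbody is just a stand-in component for the example.

```csharp
using UnityEngine;

public static class TryGetComponentSketch
{
    public static void Example(GameObject go)
    {
        // Old: on a miss, editor GetComponent does extra allocation and
        // error-string formatting work before handing back "null".
        var rbOld = go.GetComponent<Rigidbody>();
        if (rbOld != null) rbOld.WakeUp();

        // New: TryGetComponent skips that cost on the miss path.
        if (go.TryGetComponent<Rigidbody>(out var rb))
            rb.WakeUp();
    }
}
```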
19 Days Ago
Update: Merge prefab loading and preprocessing to run concurrently Surprisingly leads to worse timings than running them separately (120s prev CL vs 144s new). Might be overhead from doing a single prefab per frame. Tests: ran procgen with early out
19 Days Ago
Update: move prefab processing to WorldSetup Tests: ran procgen with early out
19 Days Ago
Update: Prefab<T> gains a convenience (Prefab, T) constructor Tests: compiles
20 Days Ago
Update: Merging asset loading flows together - still editor only + debug code to early out - still slow (there are a number of issues left to resolve) Discovered that mixing Sync + Async loads causes an integration queue flush (a big stall for us). This'll be a tricky problem to address, since SoundDefinition (and I presume others) loads assets as part of OnValidate Tests: procgen in editor
20 Days Ago
Update: hooking up gameobject spawning to async load logic - Contains a bunch of testing code used for profiling, will clean up in the next update Needs a bit of rework to ensure both the original flow and the new flow can work together. Tests: ran procgen
20 Days Ago
Bugfix: fix out of bounds access during prefab shuffling Tests: ran procgen, no exceptions
20 Days Ago
Update: implement missing logic for both GatherAssets and Process - GatherAssets now respects all relevant settings and sorts paths - implemented Process that works on a batch of objects Tests: only GatherAssets has been checked (confirmed reduction of assets due to config use)
20 Days Ago
Update: exposing prefab preprocessing from GameManager Tests: none, simple change
20 Days Ago
Update: List and Array Shuffle range overloads Tests: none, trivial code
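A hedged sketch of what such a range overload could look like (Fisher-Yates over a sub-range; the actual extension methods in the codebase are not shown here):

```csharp
using System;
using System.Collections.Generic;

public static class ShuffleRangeSketch
{
    // Hypothetical range overload: shuffles only [startIndex, startIndex + count).
    // The random source is illustrative.
    public static void Shuffle<T>(this IList<T> list, int startIndex, int count, Random rng)
    {
        if (startIndex < 0 || count < 0 || startIndex + count > list.Count)
            throw new ArgumentOutOfRangeException(nameof(count));

        for (int i = count - 1; i > 0; i--)
        {
            int j = rng.Next(i + 1);                 // pick from 0..i inclusive
            int a = startIndex + i, b = startIndex + j;
            (list[a], list[b]) = (list[b], list[a]); // swap in place
        }
    }
}
```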
20 Days Ago
Optim: replace prefab search logic with editor manifest lookups - commented out a bunch of code for quicker iteration, will revert later - doesn't account for monument duplication/probability Significantly faster because we don't load any assets in the process - goes from 30s+ down to 15ms Test: tried to procgen default editor map
20 Days Ago
Update: Sort editor manifest by path Allows faster lookups Tests: ran in the editor
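A minimal sketch of the lookup a sorted manifest enables, assuming the paths are sorted with the same comparer used for searching:

```csharp
using System;
using System.Collections.Generic;

public static class ManifestLookupSketch
{
    // Sort once up front; the comparer must match the one used by the search.
    public static void SortPaths(List<string> paths) =>
        paths.Sort(StringComparer.OrdinalIgnoreCase);

    // O(log n) binary search instead of a linear scan over the manifest.
    public static int FindPathIndex(List<string> sortedPaths, string path)
    {
        int index = sortedPaths.BinarySearch(path, StringComparer.OrdinalIgnoreCase);
        return index >= 0 ? index : -1; // -1 when the path is not in the manifest
    }
}
```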
21 Days Ago
Update: initial work on parallelizing prefab loading during editor procgen Loads too quickly (0.2s instead of 90s) - I feel like it only loads the root-level gameobject instead of the entire hierarchy. Will continue later. Tests: ran it once, got some telemetry, but already certain it's wrong.
22 Days Ago
Merge: from parallel_validatemove - Extra validation checks exposed via server.EmergencyDisablePlayerJobs (defaults to true). In case of error, shuts down UsePlayerUpdateJobs and goes back to the vanilla flow. These are cheap to run and should help us track down any problems in the future. Tests: compilation tests, unit tests and played back server demo
22 Days Ago
Update: Another validity check for UsePlayerUpdateJobs - validates player counts between PlayerCache and activePlayerList Tests: played back server demo
22 Days Ago
Update: promote some UsePlayerUpdateJobs validation logic from DEBUG only to release - Hidden behind the EmergencyDisablePlayerJobs switch (on by default) and UsePlayerUpdateJobs (off by default) - ValidatePlayerCache checks the whole range instead of just up to the player count (in case we got more than expected) Tests: played back server demo
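An illustrative shape only for the count check mentioned above - the real PlayerCache, server var plumbing, and ValidatePlayerCache are not shown here:

```csharp
// Hypothetical sketch of the emergency fallback behaviour.
public static class PlayerJobsValidationSketch
{
    public static bool EmergencyDisablePlayerJobs = true; // servervar, on by default
    public static bool UsePlayerUpdateJobs = false;       // off by default

    // Cheap per-tick sanity check: if the cached player count ever disagrees
    // with the live player list, revert to the vanilla (serial) flow.
    public static void ValidatePlayerCounts(int cachedCount, int activePlayerCount)
    {
        if (!EmergencyDisablePlayerJobs || !UsePlayerUpdateJobs)
            return;

        if (cachedCount != activePlayerCount)
        {
            UnityEngine.Debug.LogError(
                $"PlayerCache count {cachedCount} != activePlayerList count {activePlayerCount}; disabling player update jobs");
            UsePlayerUpdateJobs = false; // back to the vanilla path
        }
    }
}
```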
22 Days Ago
Clean: fix formatting
22 Days Ago
Update: turn server.EmergencyDisablePlayerJobs const into a servervar Allows running some extra validation Tests: editor compiles
22 Days Ago
Test: test case for missing player removal from PlayerCache Tests: ran the new unit test
22 Days Ago
Merge: from main
26 Days Ago
Bugfix: ContinuousProfiler now atomically updates its write indices - internal fix in ServerProfiler.Core, based on e39afb43 - Removed the now-unhelpful check for the right mark type at the start of the main thread perf stream (all cases confirmed legal now) After soaking it for 15 minutes total, only main thread export gets lost in the binary sauce. Hopefully the last bug. Tests: soaked 3 times on craggy, only hit unexpected mark type on main thread
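The actual fix lives in the native ServerProfiler.Core library; a rough managed analogue of "publish the write index atomically, only after the payload is fully written" looks like this (bounds/wrap handling omitted):

```csharp
using System.Threading;

// Sketch: the producer writes the mark payload first, then publishes the new
// write index with a single atomic store, so the exporter never observes a
// half-updated index pointing at unwritten data.
public sealed class MarkBufferSketch
{
    private readonly byte[] _buffer = new byte[1 << 20];
    private long _writeIndex; // read by the export thread

    public void WriteMark(byte[] mark)
    {
        long start = Volatile.Read(ref _writeIndex);
        System.Array.Copy(mark, 0, _buffer, start, mark.Length); // payload first
        Volatile.Write(ref _writeIndex, start + mark.Length);    // then publish
    }

    public long SnapshotWriteIndex() => Volatile.Read(ref _writeIndex);
}
```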
27 Days Ago
Update: ContinuousProfiling will emergency stop if it fails to export Yet to pin down why the worker thread telemetry stream is seeing stale/garbage data. Tests: soaked on Craggy in editor
27 Days Ago
Bugfix: fix main source of invalid profiling stream from ServerProfiler - Drop dead threads on every successful frame - Reset all writing indices on new frame and on resuming continuous profiling post-export - release binary built using 019295b4 There's still another issue hiding somewhere, but it's much more stable now. Tests: on craggy, exporting every 3rd frame for 5 minutes straight. Previously would trip after 20 seconds.
27 Days Ago
Update: prototype of continuous allocation tracking is working - started with profile.watchallocs [Name, default="Allocs"] - can be stopped with profile.stopwatchingallocs Dumps [Name].json.gz with data about allocs and associated callstacks. Still hardcoded to trigger export every 3 frames. Exporter frequently gets lost in the bin stream, and needs to be optimized (craggy has noticeable stutter). Tests: in CLIENT+SERVER editor on Craggy with allocation tracking.
28 Days Ago
Bugfix: fix missing callstacks - Updated binary that contains the fix internally to 6440ec7 (still hardcoded to always capture callstacks) - was due to ABI mismatch - Updated continuous profiler unit test to check for AllocsWithStack and a bit of the data set - Partially updated ProfileExporter, and definitely borked ProfilerBinViewer Got updated overhead numbers - always capturing a callstack leads to a 9 microsecond per-allocation cost (even though this is inflated by the tests adding 38 calls per alloc). Tests: unit tests + perf test
28 Days Ago
Update: initial stack gathering support for allocs in Continuous mode - using release libs based on d48bcf49, with hardcoded stack gathering for now Somehow it's 15% faster than mono_get_last_method, which doesn't make sense - need to update the exporter to figure out what's being generated. Tests: none Profiling shows
28 Days Ago
Test: adding a profiler-allocation overhead estimate test - Switched to release binaries of d340789f Without profiler recording, allocs cost us ~0.3 microseconds; with recording it costs ~1 microsecond. Next we'll see if we can afford gathering full callstacks for each alloc. Tests: unit tests
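A minimal sketch of how such an overhead estimate can be taken; the real test lives in the unit test suite, and the numbers quoted above come from that test, not from this sketch.

```csharp
using System;
using System.Diagnostics;

public static class AllocOverheadSketch
{
    // Rough micro-benchmark: average time per small managed allocation,
    // measured with and without the profiler recording enabled.
    public static double MeasureMicrosPerAlloc(int count = 1_000_000)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < count; i++)
        {
            var bytes = new byte[64]; // small managed allocation
            GC.KeepAlive(bytes);      // keep the allocation observable
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds * 1000.0 / count; // microseconds per alloc
    }
}
```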
28 Days Ago
Merge from: main Tests: none (no conflicts)
28 Days Ago
Bugfix: NotSupportedException when trying to use NetWrite.Read - Fixed by going directly via the underlying buffer of NetRead/NetWrite - Removed generic Stream call path for recording of packets Tests: ran a server-side client demo recording in editor - previously threw exceptions, now clean
28 Days Ago
Update: Continuous profiling that only captures allocations (for now) - using debug binaries built from d340789f, it triggers a snapshot every 3rd frame for testing - added a test to validate the loop of capture-and-resume - Native.StartRecording -> Native.TakeSnapshot Pretty barebones for now, need to profile callstack gathering to see how expensive it is for continuous profiling. Tests: unit test
30 Days Ago
Merge: from parallel_validatemove - Optim to reduce physics cast scheduling overhead Tests: unit tests
30 Days Ago
Optim: reduce physics cast scheduling overhead when using batches - Added a helper function that subdivides the workload across equal batches with potential for work stealing On a 10-player test case (40 ticks total), reduces parallel processing time from 0.2ms to 0.12ms. Still slower than serial execution (0.06ms). Tests: unit tests
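A hedged sketch of the kind of helper described; the name and return shape are illustrative, not the actual codebase API.

```csharp
public static class BatchSubdivisionSketch
{
    // Splits itemCount items into batchCount near-equal contiguous ranges,
    // so each worker schedules one range and idle workers can steal whole
    // ranges. Returns (start, length) per batch.
    public static (int start, int length)[] Subdivide(int itemCount, int batchCount)
    {
        var batches = new (int start, int length)[batchCount];
        int baseSize = itemCount / batchCount;
        int remainder = itemCount % batchCount; // first `remainder` batches get one extra item
        int start = 0;
        for (int i = 0; i < batchCount; i++)
        {
            int length = baseSize + (i < remainder ? 1 : 0);
            batches[i] = (start, length);
            start += length;
        }
        return batches;
    }
}
```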
30 Days Ago
Merge: from main
33 Days Ago
Merge: from profiling_improvements - New: experimental perfsnapshot_stream [Name, MainCap(MB), WorkerCap(MB), Debug] server command that streams perf data into a user-defined buffer. Limited to 256MB per thread. Stable up to 32MB; past that it might fail to export. - New: profile.quiet persistent server-var - controls whether perfsnapshot commands notify chat of incoming stutters - Bugfix: fix snapshots failing to export in standalone (introduced to staging yesterday) Tests: build tests, unit tests, taking snapshots in editor, and snapshots in standalone server build
33 Days Ago
Update: updated binaries after merge of task branch Tests: snapshot in editor on craggy
33 Days Ago
Bugfix: ProfileExport.JSON - correctly identify the frame start when there's a callstack depth of 0 or 1 at the start of recording This is a standalone-specific issue. Tests: built standalone, did a perfsnapshot there