user: Daniel P
repo: rust_reboot

1,623 Commits over 396 Days - 0.17cph!

4 Months Ago
Debug: add logging of prefab path to track down server bundle NRE Tests: update game manifest (optimized)
4 Months Ago
Backout of CL 121065 - should reintroduce merge from prefab_process_optim Tests: update game manifest (optimized)
4 Months Ago
Merge: from main Catching up assets to the point of failure (I hope?) - still trying to reproduce NRE during server bundle generation
4 Months Ago
Backout CL 121063 due to failing server bundle generation
4 Months Ago
Merge: from main
4 Months Ago
Merge: from prefab_process_optim - Optimizes component checks during Prefab Processing (speeds up Asset Warmup and monument spawning) Tests: with temp old code that throws exceptions on result mismatch, ran Asset Warmup and ran Scene2Prefab on all large and xlarge monuments
4 Months Ago
Tests: perf test for FileSystem Warmup Recent optims show prefab processing cost for entire server-warmup goes from 39s down to 4.5s (averages across 5 runs) Tests: ran the perf test
4 Months Ago
Cherrypick(hackweek_procgen_async) Optim: PrefabPreProcess.FindComponents is now using GetComponentsInChildren With profiler, this ended up 2x faster than old way (lighthouse monument goes from 96ms to 46ms) Tests: used old code inline to validate outputs of new code
4 Months Ago
Cherrypick(hackweek_procgen_async) Optim: PrefabPreProcess - replace GetComponent with TryGetComponent Those are cheaper since they do fewer allocations and less text formatting. Saves ~35s (but new flow is still slower). Tests: ran procgen with early out
4 Months Ago
Optim: PrefabPreProcess.FindComponents is now using GetComponentsInChildren With profiler, this ended up 2x faster than old way (lighthouse monument goes from 96ms to 46ms) Tests: used old code inline to validate outputs of new code
4 Months Ago
Optim: PrefabPreProcess - replace GetComponent with TryGetComponent Those are cheaper since they do fewer allocations and less text formatting. Saves ~35s (but new flow is still slower). Tests: ran procgen with early out
4 Months Ago
Update: Merge prefab loading and preprocessing to run concurrently Surprisingly leads to worse timings than them being separate (120s prev CL vs 144s new). Might be overhead from doing a single prefab per frame Tests: ran procgen with early out
4 Months Ago
Update: move prefab processing to WorldSetup Tests: ran procgen with early out
4 Months Ago
Update: Prefab<T> gains a convenience (Prefab, T) constructor Tests: compiles
4 Months Ago
Update: Merging asset loading flows together - still editor only + debug code to early out - still slow (there's a number of issues left to resolve) Discovered that mixing Sync + Async loads causes an integration queue flush (big stall for us). This'll be a tricky problem to address, since SoundDefinition (and I presume others) load assets as part of OnValidate Tests: procgen in editor
4 Months Ago
Update: hooking up gameobject spawning to async load logic - Contains a bunch of testing code used for profiling, will clean up in next update Needs a bit of rework to ensure both the original flow and new flow can work together. Tests: ran procgen
4 Months Ago
Bugfix: fix out of bounds access during prefab shuffling Tests: ran procgen, no exceptions
4 Months Ago
Update: implement missing logic for both GatherAssets and Process - GatherAssets now respects all relevant settings and sorts paths - implemented Process that works on a batch of objects Tests: only GatherAssets has been checked (confirmed reduction of assets due to config use)
4 Months Ago
Update: exposing prefab preprocessing from GameManager Tests: none, simple change
4 Months Ago
Update: List and Array Shuffle range overloads Tests: none, trivial code
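A range overload for a shuffle usually means shuffling only a slice of the collection in place and leaving the rest untouched. The sketch below illustrates that idea with a Fisher-Yates pass in Python; it is not the project's C# code, and the name `shuffle_range` and its `(start, count)` parameters are assumptions made for illustration.

```python
import random


def shuffle_range(items, start, count, rng=None):
    """Shuffle items[start:start + count] in place (hypothetical helper)."""
    rng = rng or random.Random()
    # Reject ranges that fall outside the list instead of silently wrapping.
    if start < 0 or count < 0 or start + count > len(items):
        raise IndexError("shuffle range is out of bounds")
    # Fisher-Yates: walk from the last index of the range down to start + 1,
    # swapping each element with a random one at or before it in the range.
    for i in range(start + count - 1, start, -1):
        j = rng.randint(start, i)  # randint is inclusive on both ends
        items[i], items[j] = items[j], items[i]
```

Calling `shuffle_range(values, 3, 4)` on a ten-element list would reorder only indices 3 to 6.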
4 Months Ago
Optim: replace prefab search logic with editor manifest lookups - commented out a bunch of code for quicker iteration, will revert later - doesn't account for monument duplication/probability Significantly faster because we don't load any assets in the process - goes from 30s+ down to 15ms Test: tried to procgen default editor map
4 Months Ago
Update: Sort editor manifest by path Allows faster lookups Tests: ran in the editor
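Sorting by path presumably helps because a sorted list can be binary-searched rather than scanned. The following Python sketch shows that general technique only; the function names and the example paths are hypothetical, and the real manifest structure is not described in this log.

```python
import bisect


def contains_path(sorted_paths, path):
    """Exact lookup in O(log n) via binary search (hypothetical helper)."""
    i = bisect.bisect_left(sorted_paths, path)
    return i < len(sorted_paths) and sorted_paths[i] == path


def find_by_prefix(sorted_paths, prefix):
    """Return every path starting with prefix (hypothetical helper).

    In a sorted list all matches are contiguous, so binary search finds the
    first candidate and a short forward walk collects the rest.
    """
    lo = bisect.bisect_left(sorted_paths, prefix)
    hi = lo
    while hi < len(sorted_paths) and sorted_paths[hi].startswith(prefix):
        hi += 1
    return sorted_paths[lo:hi]
```

With this layout, a folder-style query such as `find_by_prefix(paths, "assets/a/")` touches only the matching entries plus the binary search steps.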
4 Months Ago
Update: initial work on parallelizing prefab loading during editor procgen Loads too quickly(0.2s instead of 90) - I feel like it only loads the root-level gameobject, instead of the entire hierarchy. Will continue later. Tests: ran it once, got some telemetry, but already certain it's wrong.
4 Months Ago
Merge: from parallel_validatemove - Extra validation checks exposed via server.EmergencyDisablePlayerJobs (default to true). In case of error, shuts down UsePlayerUpdateJobs and goes back to vanilla flow These are cheap to run and should help us track down any problems in the future. Tests: compilation tests, unit tests and played back server demo
4 Months Ago
Update: Another validity check for UsePlayerUpdateJobs - validates player counts between PlayerCache and activePlayerList Tests: played back server demo
4 Months Ago
Update: promote some UsePlayerUpdateJobs validation logic from DEBUG only to release - Hidden behind EmergencyDisablePlayerJobs switch(on by default) and UsePlayerUpdateJobs(off by default) - ValidatePlayerCache checks whole range instead of just up to player count (in case we got more than expected) Tests: played back server demo
4 Months Ago
Clean: fix formatting
4 Months Ago
Update: turn server.EmergencyDisablePlayerJobs const into a servervar Allows to run some extra validation Tests: editor compiles
4 Months Ago
Test: test case for missing player removal from PlayerCache Tests: ran the new unit test
4 Months Ago
Merge: from main
4 Months Ago
Bugfix: ContinuousProfiler now atomically updates its write indices - internal fix in ServerProfiler.Core, based on e39afb43 - Removed now-unhelpful check for right mark type at the start of main thread perf stream (all cases confirmed legal now) After soaking it for 15 minutes total, only main thread export gets lost in the binary sauce. Hopefully last bug. Tests: soaked 3 times on craggy, only hit unexpected mark type on main thread
4 Months Ago
Update: ContinuousProfiling will emergency stop if fails to export Yet to pin down the worker thread telemetry stream seeing stale/garbage data. Tests: soaked on Craggy in editor
4 Months Ago
Bugfix: fix main source of invalid profiling stream from ServerProfiler - Drop dead threads on every successful frame - Reset all writing indices on new frame and on resuming continuous profiling post-export - release binary built using 019295b4 There's still another issue hiding somewhere, but it's much more stable now. Tests: on craggy exporting every 3rd frame for 5 minutes straight. Previously would trip after 20 seconds.
4 Months Ago
Update: prototype of continuous allocation tracking is working - started with profile.watchallocs [Name, default="Allocs"] - can be stopped with profile.stopwatchingallocs Dumps [Name].json.gz with data about allocs and associated callstacks. Still hardcoded to trigger export every 3 frames. Exporter frequently gets lost in the bin stream, and needs to be optimized (craggy has noticeable stutter). Tests: in CLIENT+SERVER editor on Craggy with allocation tracking.
4 Months Ago
Bugfix: fix missing callstacks - Updated binary that contains the fix internally to 6440ec7 (still hardcoded to always capture callstacks) - was due to ABI mismatch - Updated continuous profiler unit test to check for AllocsWithStack and a bit of the data set - Partially updated ProfileExporter, and definitely borked ProfilerBinViewer Got updated overhead numbers - always capturing a callstack leads to 9micros per allocation cost (even though inflated due to tests adding 38 calls per alloc). Tests: unit tests + perf test
4 Months Ago
Update: initial stack gathering support for allocs in Continuous mode - using release libs based on d48bcf49, with hardcoded stack gathering for now Somehow it's 15% faster than mono_get_last_method, which doesn't make sense - need to update the exporter to figure out what's being generated. Tests: none Profiling shows
4 Months Ago
Test: adding a profiler-allocation overhead estimate test - Switched to release binaries of d340789f Without profiler recording, allocs cost us ~0.3micros, with recording it costs 1micro. Next will see if we can afford gathering full callstacks for each alloc. Tests: unit tests
4 Months Ago
Merge from: main Tests: none (no conflicts)
4 Months Ago
Bugfix: NotSupportedException when trying to use NetWrite.Read - Fixed by going directly via underlying buffer of NetRead/NetWrite - Removed generic Stream call path for recording of packets Tests: ran a server-side client demo recording in editor - before exceptions, now clean
4 Months Ago
Update: Continuous profiling that only captures allocations (for now) - using debug binaries built from d340789f, it triggers a snapshot every 3rd frame for testing - added a test to validate the loop of capture-and-resume - Native.StartRecording -> Native.TakeSnapshot Pretty barebones for now, need to profile callstack gathering to see how expensive it is for continuous profiling. Tests: unit test
4 Months Ago
Merge: from parallel_validatemove - Optim to reduce physics cast scheduling overhead Tests: unit tests
4 Months Ago
Optim: reduce physics casts scheduling overhead when using batches - Added a helper function that subdivides workload across equal batches and potential for work stealing On a 10-player test case (40 ticks total), reduces parallel processing time from 0.2ms to 0.12ms. Still slower than serial execution 0.06ms. Tests: unit tests
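Subdividing a workload across equal batches can be sketched as splitting an item count into contiguous ranges whose sizes differ by at most one. The Python below is an illustration of that arithmetic under assumed names (`split_batches` is hypothetical); it does not model the work-stealing part or the actual job scheduling used by the helper in this commit.

```python
def split_batches(item_count, batch_count):
    """Split item_count items into up to batch_count (start, size) ranges.

    Hypothetical helper: sizes differ by at most one, and empty batches are
    dropped when there are fewer items than batches.
    """
    if item_count < 0 or batch_count <= 0:
        raise ValueError("item_count must be >= 0 and batch_count > 0")
    base, extra = divmod(item_count, batch_count)
    batches = []
    start = 0
    for i in range(batch_count):
        # The first `extra` batches take one additional item each.
        size = base + (1 if i < extra else 0)
        if size == 0:
            break
        batches.append((start, size))
        start += size
    return batches
```

For the 40-tick case mentioned above, splitting across three workers would give ranges of 14, 13 and 13 items.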
4 Months Ago
Merge: from main
4 Months Ago
Merge: from profiling_improvements - New: experimental perfsnapshot_stream [Name, MainCap(MB), WorkerCap(MB), Debug] server command that streams perf data into a user-defined buffer. Limited to 256MB per thread. Stable up to 32MB, past that might fail to export. - New: profile.quiet persistent server-var - controls whether perfsnapshot commands notify chat of incoming stutters - Bugfix: fix snapshots failing to export in standalone(introduced yesterday to staging) Tests: build tests, unit tests, taking snapshots in editor, and snapshots in standalone server build
4 Months Ago
Update: updated binaries after merge of task branch Tests: snapshot in editor on craggy
4 Months Ago
Bugfix: ProfileExport.JSON - correctly identify frame start when there's 0 or 1 callstack depth at the start of recording This is a standalone-specific issue Tests: built standalone, did a perfsnapshot there
4 Months Ago
Merge: from main Tests: none, no conflicts
4 Months Ago
Bugfix: ServerProfiler - ensure worker threads that get created initialize with the right fixed buffer size - built from 76319c30 commit Previously they would initialize with the default 16KB. Tests: perfsnapshot_stream in editor on craggy with varying main thread buffer sizes
4 Months Ago
Update: profile.quiet persistent server command to control whether perfsnapshot commands should post chat messages Tests: ran perfsnapshot_stream with quiet set to true - no chat messages
4 Months Ago
Update: perfsnapshot_stream [Name, MainThreadBuffer, WorkerThreadBuffer, Debug] server command - Fixed string buffer over-allocating (need to replace it with a memory stream) - Fixed frame index wrapping due to it now being able to be larger than byte.MaxValue - Fixed invalid final mark reconstruction that would lead to 180d+ slices Allows streaming of up to 128MB of performance data before generating a snapshot. Seems to be stable up to 64MB, but afterwards it's a bit of a dice-roll. Haven't caught where it's failing yet. Tests: perf snapshot in editor on craggy.