Optim: a couple of micro-optimizations
- rewrote a couple of branches into branchless logic
- inlined IsTurnTicketSameIgnoreProgress
- got rid of a modulo via use of MADD
- added extra comments explaining what's going on
AllocDeallocST: 2.82ms -> 2.77ms for 10k allocs. Tiny, but every bit helps.
Tests: ran unit and perf tests
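
Illustration only, not the code from this change: a minimal sketch of the kind of branch/modulo-to-branchless rewrite described above, assuming a ring index that wraps at n. NextIndexBranchy and NextIndexBranchless are hypothetical names, and the sketch does not try to reproduce the MADD trick itself.

```cpp
#include <cstdint>

// Hypothetical helper: advance a ring index, wrapping to 0 at n.
// Assumes n > 0 and i < n. Equivalent to (i + 1) % n under that assumption.
constexpr std::uint32_t NextIndexBranchy(std::uint32_t i, std::uint32_t n) {
    if (i + 1 == n) {
        return 0;
    }
    return i + 1;
}

// Branchless variant: the comparison yields 0 or 1, and multiplying by it
// either keeps the incremented index or zeroes it. No branch, no division.
constexpr std::uint32_t NextIndexBranchless(std::uint32_t i, std::uint32_t n) {
    const std::uint32_t next = i + 1;
    return next * static_cast<std::uint32_t>(next != n);
}
```

Whether this form is actually faster depends on the target and on how predictable the original branch was, so it is worth measuring as was done here.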