Optim: TraceRaysUnordered - run water traces in parallel to raycasts
Not the best implementation, but it improves timings for smaller ray counts (where the cost is not dominated by Verify):
* TraceRaysUnordered - 128 rays: 0.44ms -> 0.37ms; 1k+ rays: unchanged
* TraceRays - 128 rays: 0.56ms -> 0.50ms; 1k+ rays: unchanged
The same approach can be applied to sphere casts as well.
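
Rough sketch of the idea, not the actual engine code: start the water trace on a worker thread, run the raycast on the calling thread, then merge per ray by nearest hit. All names here (Ray, Hit, TracePlane, TraceRaysSketch, kWallX, kWaterX) are hypothetical stand-ins, and the 1D "planes" only exist to keep the example self-contained.

```cpp
#include <cstddef>
#include <future>
#include <vector>

// Hypothetical types for illustration; the real trace API is not shown in this change.
struct Ray { float origin; float dir; };      // 1D ray, dir is +1 or -1
struct Hit { bool valid; float distance; };

constexpr float kWallX  = 4.0f;   // stand-in for solid geometry
constexpr float kWaterX = 10.0f;  // stand-in for the water surface

// Stand-in trace: intersect every ray with a plane at x = planeX.
static std::vector<Hit> TracePlane(const std::vector<Ray>& rays, float planeX) {
    std::vector<Hit> hits;
    hits.reserve(rays.size());
    for (const Ray& r : rays) {
        const float d = (planeX - r.origin) * r.dir;
        hits.push_back({d >= 0.0f, d});  // negative distance means the plane is behind the ray
    }
    return hits;
}

std::vector<Hit> TraceRaysSketch(const std::vector<Ray>& rays) {
    // Kick off the water trace on another thread...
    std::future<std::vector<Hit>> waterJob = std::async(
        std::launch::async, [&rays] { return TracePlane(rays, kWaterX); });

    // ...and do the geometry raycast on this thread in the meantime.
    std::vector<Hit> result = TracePlane(rays, kWallX);

    // Join, then keep the nearest valid hit per ray.
    const std::vector<Hit> water = waterJob.get();
    for (std::size_t i = 0; i < result.size(); ++i) {
        const bool waterCloser =
            water[i].valid && (!result[i].valid || water[i].distance < result[i].distance);
        if (waterCloser) {
            result[i] = water[i];
        }
    }
    return result;
}
```

Both traces only read the ray list, so no locking is needed until the merge. Thread launch overhead is fixed per call, so whether this pays off at a given ray count has to be measured.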
Tests: unit tests