Optim: run gather network groups in parallel
- Changed the net grid dimensions in DummyServer: it now uses a 512 grid with 64 cells
128 players: 0.54 ms in parallel vs 1.15 ms serial. This relies on a prealloc hack that still needs proper support.
Tests: ran unit tests