The overall performance of the v0400 design was satisfactory;
only the nps per worker was too low.
But maybe I can couple up to 1024 GPU threads to one worker
and use them to infer the ANN faster? Benchmarks will tell...
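
A rough sketch of what coupling many threads to one worker could look like, written as a CPU simulation in plain Python rather than real GPU code. Everything here is an assumption for illustration: the function name, the dense layer with ReLU, and the strided split of output neurons across threads are not taken from the actual engine.

```python
def dense_layer_partitioned(weights, bias, inputs, num_threads):
    """Compute one dense layer, splitting the output neurons across
    num_threads simulated threads of a single worker.

    Thread t handles neurons t, t + num_threads, t + 2 * num_threads, ...
    """
    outputs = [0.0] * len(weights)
    # On a GPU the iterations of this outer loop would run in parallel;
    # here they are simply executed one after another.
    for t in range(num_threads):
        for n in range(t, len(weights), num_threads):
            acc = bias[n]
            for w, x in zip(weights[n], inputs):
                acc += w * x
            outputs[n] = max(0.0, acc)  # ReLU, an assumed activation
    return outputs


# Tiny hypothetical layer: 3 output neurons, 2 inputs.
weights = [[1, 2], [3, 4], [-1, -1]]
bias = [0, 1, 0]
inputs = [1, 1]

one_thread = dense_layer_partitioned(weights, bias, inputs, 1)
many_threads = dense_layer_partitioned(weights, bias, inputs, 1024)
```

The result does not depend on the thread count, only the amount of work per thread does. Whether that actually raises the nps per worker on real hardware is exactly what the benchmarks would have to show.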