The overall performance of the v0400 design was satisfactory; only the nps per worker was too low. But maybe I can couple up to 1024 GPU threads to one worker and use them to run the ANN inference faster? Benchmarks will tell...