Eta Chess

Eta - v0302 - batches

LC0 uses batch sizes of 256 or 512 to utilize a gpu, i did a quick bench with 256 positions to be evaluated per run...

4096 nps on Nvidia GTX 750
16640 nps on AMD Fury X

Note that nn cache could double these values, but this is still far less than i could achieve when doing all computations directly on gpu device, wo host-device interaction.

And waiting for 256 positions to be evaluated at once is against the serial nature of AlphaBeta search...

Eta - v0301 - host-device latencies

One reason gpus are not used as accelerators for chess is the host-device latency.

Afaik the latencies are in the range of 5 to 10s or even 100s of microseconds, so you can get max 200K kernel calls per second per thread, even if the gpu is able to process its task much faster.

Therefore, Eta v0300, a cpu based AlphaBeta search with gpu as ANN accelerator, is flawed by design.

Eta - v0301

Back to cpu based AlphaBeta search with gpu ANN evaluation.

On Nvidia GTX 750 i achieve with one single cpu thread about 2 Knps, and up to ~20 Knps with 256 parallel cpu threads.

This sounds far too slow for an AlphaBeta search...

Home - Top