I figured I could run maybe 32 gpu-workers per thread efficient, to let one CPU thread iterate the game tree in memory for 32 workers who perform AB-playouts on GPU, to use one OpenCL command-queue per thread.

Project is in my pipe again, will work on Zeta NNUE first, if it runs, I will use the Zeta AB-NNUE framework for Eta, will take some time.