Eta Chess

Yet Another Turing Test

Now, in the context of generative AIs and the switch from pattern recognition to pattern creation with neural networks, I would like to propose my own kind of Turing Test:

An AI which is able to code a chess engine and outperforms humans in this task.

1A) With hand-crafted eval. 1B) With neural networks.

2A) Outperforms non-programmers. 2B) Outperforms average chess-programmers. 2C) Outperforms top chess-programmers.

3A) An un-self-aware AI, the "RI", restricted intelligence. 3B) A self-aware AI, the "SI", sentient intelligence.

***update 2024-02-14***

4A) An AI based on expert-systems. 4B) An AI based on neural networks. 4C) A merger of both.

The Chinese Room Argument applied to this test would claim that no consciousness is needed to perform such a task, hence this test is not meant to measure self-awareness, consciousness or sentience, but what we call human intelligence.

The first test candidate was already posted by Thomas Zipproth, Dec 08, 2022:

Provide me with a minimal working source code of a chess engine

***update 2024-06-08***

Second test candidate posted by Darko Markovic 2024-06-08 on TalkChess:

GPT-4o made a chess engine

The Next Big Thing in Computer Chess?

We are getting closer to the perfect chess oracle, a chess engine with perfect play and 100% draw rate.

The Centaurs have already reported that their game is dead. Centaurs participate in tournaments and use all kinds of computer assistance to choose the best move: big hardware, multiple engines, huge opening books, endgame tables. Meanwhile they get close to a 100% draw rate even with common hardware, and therefore unbalanced opening books were introduced, where one side has a slight advantage, but the games are again drawn.

The #1 open source engine Stockfish has lowered the effective branching factor of its search algorithm over the past years from ~2 to ~1.5 to now ~1.25. This indicates that the selective search heuristics and evaluation heuristics are getting closer to the optimum, where only one move per position has to be considered.
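To get a feeling for what these effective branching factors mean, here is a minimal sketch under the common simplification that a search to depth d visits roughly EBF^d nodes. The function names are made up for illustration, and real engines deviate from this model.

```python
import math

def nodes_for_depth(ebf: float, depth: int) -> float:
    """Approximate node count for a search to `depth`, assuming nodes ~ EBF**depth."""
    return ebf ** depth

def depth_for_budget(ebf: float, node_budget: float) -> float:
    """Approximate depth reachable with `node_budget` nodes under the same model."""
    return math.log(node_budget) / math.log(ebf)

# The three EBF values mentioned in the text.
for ebf in (2.0, 1.5, 1.25):
    nodes = nodes_for_depth(ebf, 30)
    print(f"EBF {ebf}: depth 30 needs roughly {nodes:.3g} nodes")
```

Under this model a depth 30 search costs about a billion nodes at EBF 2, but fewer than a thousand at EBF 1.25.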

About a decade ago it was estimated that at about ~4000 Elo points we will have a 100% draw rate amongst engines on our computer rating lists. Now the best engines are in the range of ~3750 Elo (CCRL), which translates to an estimated ~3600 human FIDE Elo points (Magnus Carlsen is rated 2852 Elo in Blitz today). Larry Kaufman (grandmaster and computer chess legend) mentioned that with the current techniques we might still have ~50 Elo to gain, and it seems everybody is waiting for the next big thing in computer chess to happen.
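The rating gaps above can be turned into expected scores with the standard logistic Elo formula. This is only a sketch: the function name is made up, and putting the estimated ~3600 next to 2852 is purely illustrative, since engine rating lists and FIDE ratings are separate pools.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

# Illustrative only: estimated FIDE-equivalent engine rating vs. the Blitz rating from the text.
print(f"3600 vs 2852: {expected_score(3600, 2852):.3f}")
# Two equally rated players are expected to score 50%.
print(f"3750 vs 3750: {expected_score(3750, 3750):.3f}")
```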

We replaced the HCE, the handcrafted evaluation function, of our computer chess engines with neural networks. We now train neural networks with billions of labeled chess positions, and they evaluate chess positions via pattern recognition better than what a human is able to encode by hand. The NNUE technique, neural networks used in AlphaBeta search engines, gave a boost of 100 to 200 Elo points.

What could be the next thing, the next boost?

If we assume we still have 100 to 200 Elo points to go until perfect play (normal chess with standard opening and a draw), and if we assume an effective branching factor of ~1.25 with HCSH, handcrafted search heuristics, and that neural networks are superior in this regard, we could imagine replacing HCSH with neural networks too and lowering the EBF further, closer to 1.

Such a technique was already proposed, NNOM++, Move Ordering Neural Networks, but so far it seems that the additional computational effort does not pay off.

What else?

In today's chess engines we use neural networks in the classic way, for pattern recognition, but now the shift is to pattern creation, the so-called generative AIs. They generate text, source code, images, audio, video and 3D models. I would say the race is now on for the next level: an AI which is able to code a chess engine and outperforms humans in this task.

An AI coding a chess engine also has a philosophical implication: such an event is what the Transhumanists call the takeoff of the Technological Singularity, when the AI starts to feed its own development in a feedback loop and exceeds human understanding.

Moore's Law still has something in the pipeline, from currently 5nm to 3nm to maybe 2nm and 1+nm, so we can expect even larger and more performant neural networks for generative AIs in the future. Maybe in ~6 years there will be a kind of peak or silicon sweet spot (current transistor density/efficiency vs. needed financial investment in fab process/research), but currently there is so much money flowing into this domain that progress for the next couple of years seems assured.

Interesting times ahead.

CPU Vector Unit, the new jam for NNs...

Heyho, you are already aware of it, NNUE uses the CPU Vector Unit to boost NNs,
so here a lil biased overview of SIMD units in CPUs...

- the terms SIMD unit and Vector Unit can be used interchangeably
- a SIMD unit executes n times the same instruction/operation on different data
- SIMD units differ in bit width, for example from 64 to 512 bits
- SIMD units differ in support for different instructions/operations
- SIMD units differ in support for different data types
- SIMD units may run with a lower frequency than the main CPU ALUs
- SIMD units increase power usage and TDP of the CPU under load

Simplified, older CPUs have 128-bit SSE units, newer ones 256-bit AVX2, ARM
mobile processors for example 128-bit NEON.

A 128-bit SSE unit can perform for example 4x 32-bit FP32 operations at once, a
256-bit AVX2 unit can perform 16x 16-bit INT16 operations at once. The broader
the bit width and the smaller the data-types, the more operations you can run
at once, the more throughput you get. NNs can run for example with FP16,
floating-point 16-bit, or also with INT8, integer 8-bit, inference.
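The lane arithmetic above is just register width divided by element width. A minimal sketch, with a made-up helper name, that reproduces the two examples and adds a 512-bit INT8 case:

```python
def simd_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements a SIMD register of `register_bits` holds at `element_bits` each."""
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of the element width")
    return register_bits // element_bits

# (label, register width in bits, element width in bits)
cases = [
    ("SSE, FP32", 128, 32),
    ("AVX2, INT16", 256, 16),
    ("AVX-512, INT8", 512, 8),
]
for label, reg_bits, elem_bits in cases:
    print(f"{label}: {simd_lanes(reg_bits, elem_bits)} operations at once")
```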

Currently Intel's AVX-512 clocks down significantly under load, so there is no
speed gain from the broader bit width compared to AVX2; this may change in the
future. Also, there is a trend towards multiple Vector Units per CPU core underway.

Eta - v0600 - The next step for LC0?

I know, LC0's primary goal was an open source adaptation of A0, and I am not into the Discord development discussions and the like; anyway, my 2 cents on this:

  • MCTS-PUCT search is a descendant of AlphaGo, generalized to be applied to Go, Shogi and Chess. It can utilize a GPU via batches but has its known weaknesses: tactics in the form of "shallow traps" in a row, and the endgame.
  • A CPU AB search will not work with NN on GPU via batches.
  • NNUE makes no sense on GPU.
  • LC0 has a GPU-cloud-cluster to play Reinforcement Learning games.
  • LC0 already plays at ~2400(?) Elo with a depth 1 search alone.
  • It is estimated that the NN eval is worth 4 plies AB search.
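For readers who have not seen it, the PUCT selection rule mentioned in the first point can be sketched as follows. This is the simplified form known from the AlphaGo Zero and AlphaZero papers, not LC0's actual code; the dictionary layout and the `c_puct` value of 1.5 are assumptions for illustration.

```python
import math

def puct_score(q: float, prior: float, parent_visits: int,
               child_visits: int, c_puct: float = 1.5) -> float:
    """Mean value Q plus an exploration bonus driven by the network prior."""
    exploration = c_puct * prior * math.sqrt(parent_visits) / (1 + child_visits)
    return q + exploration

def select_child(children: list, c_puct: float = 1.5) -> dict:
    """Pick the child with the highest PUCT score."""
    parent_visits = sum(child["visits"] for child in children)
    return max(
        children,
        key=lambda c: puct_score(c["q"], c["prior"], parent_visits, c["visits"], c_puct),
    )

# Hypothetical statistics for two root moves.
children = [
    {"move": "e2e4", "q": 0.1, "prior": 0.6, "visits": 10},
    {"move": "d2d4", "q": 0.0, "prior": 0.4, "visits": 0},
]
print(select_child(children)["move"])
```

In this toy case the unvisited move gets the larger exploration bonus and is selected next, which is how the prior from the network steers the search.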

Looking at the above points, it seems pretty obvious what the next step for LC0 could be: drop the weak part, the MCTS-PUCT search, ignore AB and NNUE, and focus on what LC0 is good at. Increase the plies encoded in the NN, increase the Elo of the depth 1 eval.

To take it to an extreme: drop the search part completely, increase the CNN size 1000-fold, decrease NPS from ~50K to ~50, and add multiple NNs of increasing size to be queried stepwise for Time Control.
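One possible reading of "queried stepwise for Time Control" is an anytime scheme: ask the smallest net first, then ever larger ones, and play the answer of the last net queried before the time budget ran out. The sketch below is only that reading; the function, the callable nets and the injectable clock are all hypothetical, and a real engine would also need to predict whether the next net can finish in time.

```python
import time

def choose_move(position, nets, budget_seconds, clock=time.monotonic):
    """Query nets from smallest to largest; return the last answer obtained in budget.

    `nets` is a list of callables mapping a position to a move (assumption).
    The first net is always queried so that some move is available.
    """
    deadline = clock() + budget_seconds
    best = None
    for net in nets:
        if best is not None and clock() >= deadline:
            break  # out of time: keep the answer of the previous, smaller net
        best = net(position)
    return best

# Toy stand-ins for a small, a medium and a large network.
toy_nets = [lambda pos: "small", lambda pos: "medium", lambda pos: "large"]
print(choose_move("startpos", toy_nets, budget_seconds=1.0))
```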

Just thinking out loud...

LC0 vs. NNUE - some tech details...

- LC0 uses CNNs, Convolutional Neural Networks, for position evaluation
- NNUE is currently a kind of MLP, Multi-Layer-Perceptron, with incremental updates for the first layer

- A0 used originally about 50 million neural network weights
- NNUE uses currently about 10 million weights? Or more, depending on net size

- LC0 uses a MCTS-PUCT search
- NNUE uses the Alpha-Beta search of its "host" engine

- LC0 uses the Zero approach with Reinforcement Learning on a GPU-Cloud-Cluster
- NNUE uses initial RL with addition of SL, Supervised Learning, with engine-engine games

- LC0 runs the NN part well on GPU (up to hundreds of Vector-Units) via batches
- NNUE runs on the Vector-Unit of the CPU (SSE, AVX, NEON), no batches needed

Because NNUE runs a smaller kind of NN efficiently on a CPU, it gains more NPS in an AB search than previous approaches like Giraffe. You can view it as combining both worlds, the LC0 NN part and the SF AB search part, on a CPU.
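The "incremental updates for the first layer" mentioned above are what make NNUE cheap per node: a move flips only a few input features, so the first-layer sums can be patched instead of recomputed. A toy sketch in plain Python follows; the sizes, weights and function names are made up, and real NNUE nets are far larger and use SIMD integer math.

```python
NUM_FEATURES = 8  # number of binary input features (toy assumption)
HIDDEN = 4        # width of the first layer (toy assumption)

# Deterministic toy weights: WEIGHTS[f][h] connects feature f to hidden unit h.
WEIGHTS = [[(f * 3 + h * 5) % 7 - 3 for h in range(HIDDEN)]
           for f in range(NUM_FEATURES)]

def full_refresh(active_features):
    """Recompute the first-layer accumulator from scratch."""
    acc = [0] * HIDDEN
    for f in active_features:
        for h in range(HIDDEN):
            acc[h] += WEIGHTS[f][h]
    return acc

def incremental_update(acc, removed, added):
    """Patch the accumulator for the few features a move switched off or on."""
    new_acc = list(acc)
    for f in removed:
        for h in range(HIDDEN):
            new_acc[h] -= WEIGHTS[f][h]
    for f in added:
        for h in range(HIDDEN):
            new_acc[h] += WEIGHTS[f][h]
    return new_acc

before = {0, 2, 5}
after = {0, 3, 5}  # a "move": feature 2 turns off, feature 3 turns on
acc = incremental_update(full_refresh(before), removed={2}, added={3})
print(acc == full_refresh(after))
```

The patched accumulator matches the full recomputation, while touching only two weight rows instead of all active ones.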

Transhuman Chess with NN and RL...

Some people argue that the art of writing a chess engine lies in the evaluation function. A programmer gets into the expert knowledge of the domain of chess and encodes this via evaluation terms in his engine. We had the division between chess advisor and chess programmer, and with speedy computers our search algorithms were able to reach super-human level chess and outperform any human. We developed automatic tuning methods for the values of our evaluation functions, but now, with Neural Networks and Reinforcement Learning present, I wish to point out that we have entered another kind of level. I call it trans-human level chess.

If we look at the game of Go this seems pretty obvious; I recall one master calling the play of A0 "Go from another dimension". A super-human level engine still relies on handcrafted evaluation terms that humans come up with (and which then get tuned), but a Neural Network is able to encode evaluation terms humans simply do not come up with, to 'see' relations and patterns we cannot see, which are beyond our scope, trans-human, and the Reinforcement Learning technique discovers lines which are as yet uncommon for humans, trans-human.

As mentioned, pretty obvious for Go, less obvious for chess, but still applicable. NNs replacing the evaluation function is just one part of the game; people will come up with NN based pruning, move selection, reduction and extension. What is left is the search algorithm, and we already saw the successful mix of NNs with MCTS and classic eval with MCTS, so I am pretty sure we will see different kinds of mixtures of already known (search) techniques and upcoming NN techniques. Summing the above up, the switch is now from encoding the expert knowledge of chess in evaluation terms to encoding the knowledge into NNs and using them in a search algorithm. That is what the paradigm shift since A0 and Lc0 and recently NNUE is about, and that is the shift to what I call trans-human chess.

NNs are also called 'black-boxes' because we cannot decode what the layers of weights represent in a human-readable form, so I see some room here for the classic approach: can we decode the black-box and express the knowledge via handcrafted evaluation terms in our common programming languages?

Currently NNs outperform human expert-systems in many domains; this is not chess or Go specific. But maybe the time for the question of reasoning will come, a time to decode the black-boxes, or maybe the black-box will decode itself, yet another level. Time will tell.

Eta - v0600

Okay, let's do a time warp jump back to the year 2008 and figure out how we could have used the hardware back then for a neural network based chess engine.

Reinforcement Learning on a GPU-Cluster is probably a no-go (the Titan supercomputer with 18,688 K20Xs went operational in 2012), so we stick to Supervised Learning from a database of quality games or the like. A neural network as used in A0 with ~50 million parameters, queried by an MCTS-PUCT like search with ~80 knps, is also not doable: we had only ~336 GFLOPS on an Nvidia 8800 GT back then, compared to ~108 TFLOPS on an RTX 2080 Ti via Tensor Cores nowadays. So we have to skip the MCTS-PUCT part and rethink the search.

Instead of going for NPS, we could build a really big CNN, but the memory on a GPU back then was only about 512 MB, so we are stuck at ~128 Mega parameters. So we have to split the CNN, for example by piece count: let us use 30 distinct neural networks indexed by piece count, so we get an accumulated ~3840 Mega parameters, which already sounds better. Maybe this would already be enough to skip the search part and do only a depth 1 search for NN eval. If not, we could split the CNN further, layer by layer, inferred via different waves on the GPU, loaded layer-wise from disk to GPU memory via PCIe or the like, and hence increase the total number of parameters.

What is the drawback if we could run a CNN with several billion parameters? Obviously the training of such a monster: not only the horsepower needed to train, but the training data, the games. A0 used about 40 million RL games to reach top-notch computer chess level, for only ~50 million parameters, and the Chess Base Mega Database contains ~8 million quality games. We simply do not have enough games to train such a CNN monster via Supervised Learning; we rely on Reinforcement Learning, and therefore on some kind of GPU-Cluster to play RL games... nowadays, and also back in 2008.
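The back-of-the-envelope numbers in this thought experiment can be checked with a few lines. The sketch assumes 4 bytes per parameter (FP32 weights) and binary megabytes; both are assumptions, the post does not state them.

```python
BYTES_PER_PARAM = 4                  # assumption: FP32 weights
GPU_MEMORY_BYTES = 512 * 1024**2     # ~512 MB of GPU memory, as in the text
NUM_NETS = 30                        # one net per piece count, as in the text

params_per_net = GPU_MEMORY_BYTES // BYTES_PER_PARAM
total_params = params_per_net * NUM_NETS

# Compute gap between the two GPUs quoted in the text.
flops_2008 = 336e9    # ~336 GFLOPS, Nvidia 8800 GT
flops_now = 108e12    # ~108 TFLOPS, RTX 2080 Ti via Tensor Cores
flops_ratio = flops_now / flops_2008

print(f"parameters per net: {params_per_net / 1024**2:.0f} Mega")
print(f"parameters in total: {total_params / 1024**2:.0f} Mega")
print(f"compute gap: ~{flops_ratio:.0f}x")
```

So the ~128 Mega parameters per net and ~3840 Mega in total follow directly from the memory size, and the raw compute gap between the two GPUs is a factor of a few hundred.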

I see...

I see three ways which neural networks for chess may take in future...

1. With more processing power available, the network size will rise, and we will have really big nets on one side of the extreme, which drop the search algorithm part and perform only a depth 1 search for evaluation.

2. With neural network accelerators with less latency, we will see engines with multiple, smaller neural networks, which perform deeper AlphaBeta searches on the other side.

3. Something in between 1. and 2.

Eta - a neural network based chess engine

Ever since I read the paper about NeuroChess by Sebastian Thrun, I have pondered how to improve on his results.

It was obvious that the compute power available in the 90s limited his approach, in training and in inference.

So he had only 120K games for training, a relatively small neural network, and could test his approach only with limited search depths.

Recent results with A0 and LC0 show how Deep Learning methods profit from GPGPU, so I think the time has come to give a GPU ANN based engine a try...

