In the previous post I discussed the history of chess engines and why they don’t “think” like we think. Trading interpretability for computation cycles ultimately led to the engines we have today, fairly alien in nature and perhaps less pedagogically useful because of it. At the time, though, the goal was to beat human grandmasters by any means necessary, a great engineering feat that the field had been working on for decades.
This post contains two related proposals. The first is a chess engine tournament that is unusual in the type of engine it will permit to enter and is likely to reward. Importantly, the vast majority of engines currently holding the highest performance ratings will likely not be effective under its rules.
The second proposal outlines a chess engine that is likely to be successful in this tournament by taking advantage of the highly qualitative nature of chess position evaluation. Although such an engine is unlikely to perform as strongly as top-performing engines, it has several distinct advantages. In short, there is likely to be a great deal of educational value, as well as financial incentive, driving the construction of highly successful qualitative chess engines.
Qualitative chess analysis
Qualitatively, there are many aspects to a chess game that may be captured. Let’s take a look at the way a grandmaster analyzes a position. It will become quite apparent that at the highest levels, the qualitative aspects of position analysis dominate over quantitative aspects (i.e. the number and value of each piece).
In the selected lecture, Grandmaster Varuzhan Akobian walks through a game he played previously. At a critical moment of the game, Akobian sacrificed his rook for a key pawn in the center of the board. The resulting position is reproduced for convenience in Figure 1.
I would like you to spend a minute or two just to give me the evaluation of this position. It may not seem that clear because I’m down the exchange. [Novice chess players are taught that chess pieces have quantitative values, which may come into consideration when exchanging one piece for another. These values are measured in terms of pawns. Knights and bishops are generally understood to be worth 3 pawns, rooks are worth 5, and queens are worth 9.] I have a knight and a pawn for a rook. Rook is valued 5, knight and a pawn is 4. It may seem like I’m down a pawn here. But what do you think is the proper evaluation of this position?
…Basically white is very active. There are a few other things we can mention about white’s position, that it’s very strong. What else is very strong? White’s king is very safe, he cannot attack me. But how about the black king? Do you think the black king is very safe? [No.] For example, I could put my queen here [the e4 square] then I have a battery! Remember when we have a queen and a bishop on the same diagonal we call that a battery. And suddenly if I can deflect this queen [black queen on the g7 square] I will just go queen takes pawn, checkmate!
His dark square bishop is basically trapped behind his own pawns so it’s ineffective… . My bishop is very active… . And one more thing that you can mention. Passed pawn, exactly! And it’s a very strong passed pawn because with a knight on d6 very quickly [the pawn] will turn into [a queen].
… How much advantage does white have here? Big advantage, slight advantage, maybe winning? … We’re not going to use Houdini [a chess engine], Houdini will probably say black is slightly worse. But in practical play, I would be very comfortable to play this against anybody, and pretty comfortable I can win this position for white.
Note that quantitative analysis is almost entirely absent from GM Akobian’s evaluation. Towards the beginning he mentions that he has sacrificed his rook for a knight and a pawn, and consequently is at a material deficit. However, he quickly discards this shallow evaluation, going so far as to label his subsequent qualitative evaluation as the “proper” evaluation.
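The shallow material count that GM Akobian sets aside is easy to mechanize. The minimal Python sketch below makes the contrast concrete. It uses only the novice piece values quoted in the lecture, and the function names are illustrative rather than taken from any real engine.

```python
# Novice piece values in pawns, as quoted in the lecture (kings are never traded).
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}


def material(pieces):
    """Total pawn-unit value of a list of piece letters such as ["N", "P"]."""
    return sum(PIECE_VALUES[p.upper()] for p in pieces)


def exchange_balance(gained, given):
    """Material balance of a trade from the trading side's point of view."""
    return material(gained) - material(given)


# White's sacrifice: a rook given up for a knight and a pawn.
print(material(["R"]), material(["N", "P"]), exchange_balance(["N", "P"], ["R"]))
# prints: 5 4 -1
```

That single number, -1, is the entire content of the quantitative view. Everything else in Akobian's assessment lies outside it.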
GM Akobian goes on to mention several other qualitative features of the position that are difficult to assign a quantitative value to. First, the activity of his pieces makes the position much easier to play as white, because his pieces stand on better squares, including some deep in black’s half of the board. Black’s lack of activity comes up later, when he notes that black’s dark-squared bishop is essentially trapped behind its own pawns.
King safety is another quality that is difficult to quantify. In the given position, it is difficult to find any way for black even to check the white king. Moving the d8 rook to b1 would take two moves, and even then the b1 square is guarded by the bishop on d3, so the white king is indeed quite safe. In contrast, the black king is quite vulnerable, guarded mainly by the black queen, who is herself vulnerable to deflection or direct attack. (Deflection is a chess tactic that involves “distracting” an opponent’s piece which plays an important defensive role. For example, a piece that is defending two pieces simultaneously may be deflected by capturing one of the defended pieces.)
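The deflection idea can be phrased as a check over defensive relations: a piece with two or more defensive duties is a candidate for deflection. The sketch below assumes those relations have already been extracted as (defender, target) pairs. The labels in the example are hypothetical and are not read off Figure 1.

```python
from collections import defaultdict


def overloaded_defenders(defends, attacked=None):
    """Return {defender: [targets]} for pieces guarding two or more targets.

    defends:  iterable of (defender, target) pairs, e.g. ("Qg7", "h7").
    attacked: optional set of targets currently under attack; when given,
              only duties toward attacked targets are counted.
    """
    duties = defaultdict(set)
    for defender, target in defends:
        if attacked is None or target in attacked:
            duties[defender].add(target)
    return {d: sorted(ts) for d, ts in duties.items() if len(ts) >= 2}


# Hypothetical relations, not taken from Figure 1.
defends = [("Qg7", "h7"), ("Qg7", "f6"), ("Rd8", "d7")]
print(overloaded_defenders(defends))                   # {'Qg7': ['f6', 'h7']}
print(overloaded_defenders(defends, attacked={"h7"}))  # {}
```

The optional `attacked` filter captures a practical refinement. A duty matters only if the opponent is actually pressing the defended target.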
GM Akobian emphasizes the weakness of black’s king by sketching out a simple game-winning checkmate plan: arrange the bishop and queen in a battery which attacks the h7 pawn, deflect the black queen, and deliver checkmate with the queen by taking the h7 pawn. Although it is not immediately clear how to implement the plan, this type of simple plan creates a well-defined long-term “threat” that black must contend with.
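Recognizing a battery is a small geometric check. The sketch below assumes a hypothetical board representation: a dict from square names to FEN piece letters, uppercase for white. It handles only the queen-in-front-of-bishop diagonal battery Akobian describes. The example board contains just the pieces named in the lecture (bishop on d3, queen moved to e4, black queen on g7, pawn on h7), not the full Figure 1 position.

```python
FILES = "abcdefgh"


def to_coords(square):
    """'e4' -> (4, 3): zero-based file and rank indices."""
    return FILES.index(square[0]), int(square[1]) - 1


def to_square(x, y):
    return FILES[x] + str(y + 1)


def battery_target(board, front, back):
    """Return the first occupied square the queen on `front` reaches along the
    diagonal it shares with a friendly bishop on `back`, or None.

    board: dict of square -> FEN letter (uppercase white, lowercase black).
    """
    queen, bishop = board.get(front, ""), board.get(back, "")
    if queen.upper() != "Q" or bishop.upper() != "B":
        return None
    if queen.isupper() != bishop.isupper():
        return None  # a battery needs two pieces of the same color
    (fx, fy), (bx, by) = to_coords(front), to_coords(back)
    dx, dy = fx - bx, fy - by
    if dx == 0 or abs(dx) != abs(dy):
        return None  # not on a shared diagonal
    sx, sy = dx // abs(dx), dy // abs(dy)
    # Every square strictly between bishop and queen must be empty.
    for i in range(1, abs(dx)):
        if to_square(bx + sx * i, by + sy * i) in board:
            return None
    # Walk past the queen until we hit a piece or leave the board.
    x, y = fx + sx, fy + sy
    while 0 <= x < 8 and 0 <= y < 8:
        if to_square(x, y) in board:
            return to_square(x, y)
        x, y = x + sx, y + sy
    return None


board = {"d3": "B", "e4": "Q", "g7": "q", "h7": "p"}
print(battery_target(board, "e4", "d3"))  # h7
```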
Another threat he mentions is white’s passed pawn on c5. This pawn may become a queen, which would give white an insurmountable advantage, so it represents another long-term vulnerability for black. (A passed pawn is a pawn that cannot be blocked or captured by any of the opponent’s pawns. This occurs when no opponent pawn stands ahead of it on its own “file” (vertical column) or on the files to its left and right, if applicable. For example, a pawn on the C file is a passed pawn if there are no opponent pawns ahead of it on the B, C or D files. A pawn on the H file is passed if there are no opponent pawns ahead of it on the G or H files.)
Finally, note that GM Akobian does not assign a quantitative value to the board position, but rather a “very comfortable to win” assessment. Very little of this analysis involves quantities. Instead, it identifies qualitative situations that must be dealt with. Consequently, it seems that qualitative reasoning is an ideal tool for a chess engine to use.
Qualitative analysis is more human
The nature of the expert-level perception described by GM Akobian was studied directly in a 1973 paper by Chase and Simon. Participants at three different levels of chess ability (a master-level player, an experienced club-level player, and a novice) were asked to complete two chess-related cognitive tasks. The first was a perception task, requiring participants to reproduce a chess position on an adjacent board as quickly as possible, with the model board in plain view. The second was a memory task, requiring participants to reproduce a position from memory after viewing it for only 5 seconds.
Importantly, the perception task examined chess players’ tendency to “chunk” the board as they reproduced positions, placing groups of interrelated pieces together. The authors characterized the relationships between such pieces in five ways: a piece attacks another (A), a piece defends another (D), two pieces are adjacent (P, for spatial proximity), two pieces are the same color (C), and two pieces are the same type (S). A pair of pieces with none of these relationships is counted as “null”.
The results of the study found that “the C, S, and null relations are low because subjects are placing pieces which usually have multiple relations. Thus, from the within-glance relations, it appears that subjects are noticing the pawn structure, clusters of pieces of the same color, and attack and defense relations over small spatial distances.” In other words, it seems likely that human players use the qualitative relationships between pieces to remember the board.
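The five Chase and Simon relations can be computed mechanically for any ordered pair of pieces. The Python sketch below is one possible encoding under simplifying assumptions. It uses a hypothetical square-to-letter board dict and ignores pins, en passant and castling. “Defends” is taken to mean that a piece bears on the square of a friendly piece, and an empty result corresponds to a null relation.

```python
FILES = "abcdefgh"
KNIGHT = [(1, 2), (2, 1), (2, -1), (1, -2), (-1, -2), (-2, -1), (-2, 1), (-1, 2)]
KING = [(dx, dy) for dx in (-1, 0, 1) for dy in (-1, 0, 1) if (dx, dy) != (0, 0)]
DIAGONAL = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
STRAIGHT = [(1, 0), (-1, 0), (0, 1), (0, -1)]
RAYS = {"B": DIAGONAL, "R": STRAIGHT, "Q": DIAGONAL + STRAIGHT}


def coords(sq):
    return FILES.index(sq[0]), int(sq[1]) - 1


def name(x, y):
    return FILES[x] + str(y + 1)


def bears_on(board, src, dst):
    """True if the piece on `src` attacks the square `dst` (pins ignored)."""
    piece = board[src]
    kind = piece.upper()
    (x, y), (tx, ty) = coords(src), coords(dst)
    if kind == "P":
        forward = 1 if piece.isupper() else -1
        return ty == y + forward and abs(tx - x) == 1
    if kind == "N":
        return (tx - x, ty - y) in KNIGHT
    if kind == "K":
        return (tx - x, ty - y) in KING
    for dx, dy in RAYS[kind]:
        cx, cy = x + dx, y + dy
        while 0 <= cx < 8 and 0 <= cy < 8:
            if (cx, cy) == (tx, ty):
                return True
            if name(cx, cy) in board:
                break  # line blocked
            cx, cy = cx + dx, cy + dy
    return False


def relations(board, a, b):
    """Chase and Simon style labels for the ordered pair of pieces (a, b)."""
    pa, pb = board[a], board[b]
    same_color = pa.isupper() == pb.isupper()
    (ax, ay), (bx, by) = coords(a), coords(b)
    labels = set()
    if bears_on(board, a, b):
        labels.add("D" if same_color else "A")
    if max(abs(ax - bx), abs(ay - by)) == 1:
        labels.add("P")
    if same_color:
        labels.add("C")
    if pa.upper() == pb.upper():
        labels.add("S")
    return labels


board = {"d3": "B", "e4": "Q", "g7": "q", "h7": "p"}
print(sorted(relations(board, "d3", "e4")))  # ['C', 'D', 'P']
print(sorted(relations(board, "e4", "h7")))  # ['A']
print(sorted(relations(board, "g7", "h7")))  # ['C', 'D', 'P']
print(sorted(relations(board, "d3", "h7")))  # [] (null: the queen blocks the line)
```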
A qualitative chess engine
It is unlikely that a qualitative chess engine will be able to do away entirely with the basic structural algorithm behind chess calculation, i.e. minimax. We would like our qualitative engine to calculate in a way most similar to humans, so it will still need to search some number of plies ahead. However, a qualitative engine should have a much stronger sense of the “flow” of the game, and will thus explore fewer branches. Rather than treating each position as discrete, a qualitative engine should note how each move guides the evolution of the position.
Note that a qualitative chess engine may not be the most computationally efficient. Efficiency was the primary concern when top chess engines had to run on supercomputers and every ounce of performance needed to be squeezed out of the machine. A qualitative engine should instead favor explainability over performance whenever possible, producing a good explanation of which moves were considered and why a particular move was chosen.
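One way to picture “exploring fewer branches” is minimax with a width limit. At every node, only the few moves ranked highest by some judgment are expanded. In the sketch below, `score_move` is a hypothetical stand-in for that qualitative judgment, and the game tree is a toy dictionary rather than real chess positions.

```python
def selective_minimax(node, depth, maximizing, children, evaluate, score_move, width):
    """Minimax that expands only the `width` highest-ranked children per node.

    children(node)          -> list of successor nodes
    evaluate(node)          -> static score from the maximizing side's view
    score_move(node, child) -> priority; higher means "explore first"
    Returns (value, best_child).
    """
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node), None
    ranked = sorted(kids, key=lambda c: score_move(node, c), reverse=True)[:width]
    best_value = float("-inf") if maximizing else float("inf")
    best_child = None
    for child in ranked:
        value, _ = selective_minimax(child, depth - 1, not maximizing,
                                     children, evaluate, score_move, width)
        better = value > best_value if maximizing else value < best_value
        if better:
            best_value, best_child = value, child
    return best_value, best_child


# Toy two-ply tree; `priority` plays the role of "this move fits the plan".
tree = {"root": ["a", "b", "c"], "a": ["a1", "a2"], "b": ["b1", "b2"], "c": ["c1", "c2"]}
leaf = {"a1": 3, "a2": 5, "b1": 2, "b2": 9, "c1": 0, "c2": 1}
priority = {"a": 2, "b": 1, "c": 0}
evaluated = []


def evaluate(node):
    evaluated.append(node)
    return leaf.get(node, 0)


value, move = selective_minimax("root", 2, True, lambda n: tree.get(n, []), evaluate,
                                lambda parent, child: priority.get(child, 0), width=2)
print(value, move, len(evaluated))  # 3 a 4
```

Here the narrowed search reaches the same answer as full minimax (move `a`, value 3) while evaluating 4 of the 6 leaves. Of course, a poor ranking function would prune away good moves, which is why the quality of the qualitative judgment matters.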
More modern qualitative research can improve upon Wilkins’ knowledge-based PARADISE approach. His knowledge base is quite similar in nature to the FAC component of the MAC/FAC retrieval model presented by Forbus et al. in 1995, but it does not take advantage of the performance speedups presented there. Because of the large number of positional examples available online, there is now a huge opportunity for a performant analogical retrieval system, and MAC/FAC could pay large performance dividends if applied to this problem.
Specifically, the Lichess database referenced above contains 1,737,489 chess “puzzles” as of the time of writing. A chess puzzle is simply a board position in which players are challenged to find the best move. Each puzzle relates to one or more chess “themes” (e.g. “mate in 1”, “pin”, “discovered attack”), analogous to Wilkins’ concepts outlined above. Each puzzle also includes the best move to make in the position. Some research will be needed to derive meaning from this best move and to relate it by analogy to the position currently being evaluated by the engine.
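The two-stage structure of MAC/FAC can be sketched in a few lines. In the original model, the MAC stage compares cheap content vectors (counts of the predicates in each description) using a dot product. The FAC stage then runs the Structure-Mapping Engine on the few survivors. The sketch below keeps the MAC stage but swaps in a simple fact-overlap score for FAC, purely for illustration. The relational descriptions are hand-written and hypothetical, not derived from actual Lichess puzzles.

```python
from collections import Counter


def content_vector(facts):
    """MAC-stage summary: how often each predicate occurs in a description."""
    return Counter(fact[0] for fact in facts)


def dot(u, v):
    return sum(count * v[key] for key, count in u.items())


def overlap_score(probe, case):
    """Illustrative stand-in for FAC (MAC/FAC itself uses SME here):
    Jaccard overlap of whole relational facts."""
    p, c = set(probe), set(case)
    return len(p & c) / len(p | c) if p | c else 0.0


def mac_fac(probe, memory, k=2):
    """Return the id of the stored case that best matches `probe`.

    memory: dict of case id -> list of (predicate, arg, arg) facts.
    """
    pv = content_vector(probe)
    # MAC: cheap, structure-blind filter applied to every stored case.
    called = sorted(memory, key=lambda cid: dot(pv, content_vector(memory[cid])),
                    reverse=True)[:k]
    # FAC: the more careful comparison runs only on the k survivors.
    return max(called, key=lambda cid: overlap_score(probe, memory[cid]))


memory = {
    "case-battery": [("attacks", "Q", "p"), ("defends", "B", "Q"), ("defends", "q", "p")],
    "case-fork": [("attacks", "N", "k"), ("attacks", "N", "r")],
    "case-other": [("attacks", "B", "n"), ("defends", "n", "k"), ("attacks", "R", "p")],
}
probe = [("attacks", "Q", "p"), ("defends", "B", "Q"), ("defends", "q", "p"),
         ("attacks", "R", "p")]
print(mac_fac(probe, memory))  # case-battery
```

The performance argument rests on the MAC stage. Its cost grows with the size of memory but each comparison is cheap, while the expensive structural comparison runs only `k` times per query.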
Qualitative spatial calculi may also be used to construct more psychologically plausible models of chess positions than simply noting which pieces occupy which squares, seeking to emulate the models suggested by Chase and Simon. Chess pieces have intricate relationships which can be captured, and which change whenever a piece moves to another square. Importantly, however, not all relationships are affected by the movement of a single chess piece, suggesting that performance gains may be realized by recomputing only those relationships which have changed.
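The observation that a single move leaves most relationships untouched suggests an invalidation rule. This sketch assumes a hypothetical board dict, relations cached per ordered pair of pieces, and a board given after the move. Under those assumptions, a pair needs recomputing only in two cases: it involves the moved piece, or a sliding piece’s line to the other piece passes through the vacated or newly occupied square. Pairs involving a captured piece simply disappear from the board. The example assumes a hypothetical queen move from e2 to e4.

```python
FILES = "abcdefgh"


def coords(sq):
    return FILES.index(sq[0]), int(sq[1]) - 1


def strictly_between(a, b, sq):
    """True if `sq` lies strictly between squares `a` and `b` on a shared
    rank, file or diagonal."""
    (ax, ay), (bx, by) = coords(a), coords(b)
    dx, dy = bx - ax, by - ay
    if (dx, dy) == (0, 0) or not (dx == 0 or dy == 0 or abs(dx) == abs(dy)):
        return False
    n = max(abs(dx), abs(dy))
    sx, sy = dx // n, dy // n
    return any((ax + sx * i, ay + sy * i) == coords(sq) for i in range(1, n))


def stale_pairs(board, frm, to):
    """Ordered pairs whose cached relations may be out of date after the
    piece on `frm` moved to `to`. `board` is the position after the move."""
    stale = set()
    for a in board:
        for b in board:
            if a == b:
                continue
            if to in (a, b):
                stale.add((a, b))  # anything involving the moved piece
            elif board[a].upper() in ("B", "R", "Q") and (
                strictly_between(a, b, frm) or strictly_between(a, b, to)
            ):
                stale.add((a, b))  # a line that was just opened or closed
    return stale


after = {"d3": "B", "e4": "Q", "g7": "q", "h7": "p", "a8": "r", "g1": "K"}
stale = stale_pairs(after, "e2", "e4")
n = len(after)
print(len(stale), "of", n * (n - 1))  # 11 of 30
print(("d3", "h7") in stale)          # True: the queen now blocks the bishop
```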
Low-level piece relationships are likely to give rise to higher-level relationships and tactics. For example, the concept of “capturing the defender” arises from attacking a piece A which defends another piece B. It works when both A and B are attacked by opposing pieces (say, C and D), so that capturing A leaves B undefended. And in the case of Figure 2 below, a defender may be “deflected” to win the piece it is defending. Defensive relationships may be thought of as a chain or directed graph, with each piece defending another and the safety of a piece judged by its connection to a defensive group.
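Treating defense as a directed graph makes “capturing the defender” a simple query: find an attacked piece whose only defender is itself attacked. The sketch below assumes attack and defense edges are already known and that each label names a single piece. Labels such as `bN` are hypothetical.

```python
from collections import defaultdict


def capture_the_defender(attacks, defends):
    """Return (defender, protected) pairs where capturing the defender would
    leave an attacked piece with no remaining protection.

    attacks: iterable of (attacker, target) edges between opposing pieces.
    defends: iterable of (defender, protected) edges between friendly pieces.
    """
    attacked = {target for _, target in attacks}
    defenders_of = defaultdict(set)
    for defender, protected in defends:
        defenders_of[protected].add(defender)
    found = []
    for protected, defenders in defenders_of.items():
        if protected in attacked and len(defenders) == 1:
            (defender,) = defenders
            if defender in attacked:
                found.append((defender, protected))
    return sorted(found)


# Hypothetical edges: w = white, b = black, letter = piece type.
attacks = [("wR", "bN"), ("wN", "bB"), ("wQ", "bP")]
defends = [("bB", "bN"), ("bK", "bB"), ("bQ", "bP"), ("bR", "bP")]
print(capture_the_defender(attacks, defends))  # [('bB', 'bN')]
```

The pawn `bP` is skipped because it has two defenders. The bishop `bB` is skipped as a target because its only defender, the king, is not attacked.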
Applications of a qualitative chess engine
There are many benefits to reopening the pursuit of qualitative reasoning in chess. The first and clearest is that qualitative reasoning is likely to serve as a more plausible model of how humans think about the game. As Chase and Simon found, chess players do not “see” the whole board at once, but rather perceive it in chunks of interrelated pieces. Even if the details of human mental models differ from the implementation of a qualitative reasoning engine, such an engine will be able to provide a traceable account of its decision-making process, an important step towards explainability.
As we saw in part 1, current top chess engines reason about chess in ways that are quite contrary to human intuition. Stockfish relies on brute-force alpha-beta search, considering essentially every move in each position (subject to its pruning heuristics) and assigning a numerical value to each position. As GM Akobian’s analysis showed, qualitative evaluations are far more meaningful to humans.
Other chess engines approach chess in an even more alien way. Specifically, it is unlikely that any engine which makes heavy use of neural evaluation functions will model human-derived organic strategies in ways that chess players will recognize. At the far end of this spectrum is the fully neural Maia chess engine, but even Lc0’s Monte Carlo tree search leaves little room for cognitive plausibility.
Qualitative chess engines which are able to better reproduce the types of chess reasoning used by top human chess players are also likely to serve as better pedagogical tools for those interested in studying chess. This applies at every level, from beginner to grandmaster. The skill level of such a chess engine would be quite easily tunable simply by disabling more advanced knowledge from the knowledge base.
This is a far more natural method of “handicapping” than the search depth limitations used in current chess engines. Each piece of knowledge becomes a tunable parameter to the engine. As students learn concepts, the corresponding representations in the knowledge base could be enabled, allowing for gradual learning in a far more accessible way. In fact, it is likely true that a qualitative chess engine could outperform human grandmasters (who often teach chess to others) in this respect.
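A knowledge base whose entries double as tuning knobs could look something like the sketch below. The concept names, the detector functions, and the assumption that features arrive precomputed in a dict are all hypothetical. The point is only that enabling a concept is what lets the engine “see” it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class KnowledgeBase:
    """Concept detectors that can be switched on as a student progresses."""
    detectors: Dict[str, Callable[[dict], bool]]
    enabled: Set[str] = field(default_factory=set)

    def enable(self, *concepts: str) -> None:
        unknown = set(concepts) - set(self.detectors)
        if unknown:
            raise KeyError(f"unknown concepts: {sorted(unknown)}")
        self.enabled.update(concepts)

    def observe(self, position: dict) -> List[str]:
        """Enabled concepts whose detector fires on `position`, sorted by name."""
        return sorted(c for c in self.enabled if self.detectors[c](position))


kb = KnowledgeBase(detectors={
    "hanging piece": lambda pos: bool(pos.get("undefended_attacked")),
    "passed pawn": lambda pos: bool(pos.get("passed_pawns")),
    "battery": lambda pos: bool(pos.get("batteries")),
})
position = {"undefended_attacked": [], "passed_pawns": ["c5"], "batteries": [("d3", "e4")]}

kb.enable("hanging piece", "passed pawn")  # a beginner's view
print(kb.observe(position))                # ['passed pawn']
kb.enable("battery")                       # after a lesson on batteries
print(kb.observe(position))                # ['battery', 'passed pawn']
```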
Finally, it is likely that a qualitative engine would become a key component of a first line of defense against cheating in chess. Most cheating is performed by using assistance from a chess engine during online games with unsuspecting opponents. Consulting a functionally omniscient computer program can thus provide a cheater with a theoretically insurmountable advantage.
In an interview with the Perpetual Chess Podcast, Chris Callahan of the popular chess website Lichess.org stated that the majority of the website’s employees work primarily on detecting cheaters, and yet the problem persists. By exploiting the difference between conventional engines like Stockfish and a qualitative evaluation, those working to detect cheaters could be better equipped to flag “suspicious” moves. However, qualitative chess engines are unlikely to be able to replace human moderation completely.
Nerf the Engine!
A computer chess tournament held between chess engines can encourage the type of reasoning and gameplay that resembles human games. However, because we are interested not in the best overall chess engine but in one which can reason like a human might, the rules of the tournament will be adjusted in several key ways to discourage brute-force computational methods. We already know that calculating millions of positions can find the optimal move. But what happens when an engine is limited to, say, 1000 positions?
Because we expect few entrants in early iterations of this special tournament, engineering an automatic enforcement mechanism for the limitations stipulated in this document is likely to be unnecessary. Engine compliance may simply be verified through manual inspection. Future iterations may include further safeguards, such as separating out the position evaluation function and counting its invocations during arbitration to verify compliance directly.
Firstly, competing chess engines will be limited in the number of board positions they can evaluate during any one move. Because human grandmasters evaluate around 100 positions before making a move, the tournament arbitration system will artificially impose this limitation on all competing engines.
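The arbitration idea of counting invocations of a separated evaluation function can be sketched as a wrapper. This sketch assumes that every competing engine must call an evaluation function supplied by the arbiter, which is a hypothetical arrangement. The class and exception names are illustrative.

```python
class BudgetExceeded(Exception):
    """Raised when an engine asks for more evaluations than the rules allow."""


class CountingEvaluator:
    """Arbiter-side wrapper that counts evaluations and enforces a per-move cap."""

    def __init__(self, evaluate, limit=100):
        self.evaluate = evaluate
        self.limit = limit
        self.calls = 0

    def new_move(self):
        """Reset the counter at the start of each move."""
        self.calls = 0

    def __call__(self, position):
        if self.calls >= self.limit:
            raise BudgetExceeded(f"more than {self.limit} evaluations this move")
        self.calls += 1
        return self.evaluate(position)


arbiter = CountingEvaluator(lambda position: 0.0, limit=100)
arbiter.new_move()
for candidate in range(100):
    arbiter(candidate)  # the 100 permitted evaluations
try:
    arbiter("one too many")
except BudgetExceeded as err:
    print("rejected:", err)  # rejected: more than 100 evaluations this move
```

Raising an exception is the simplest policy. An arbiter could instead forfeit the game or return a neutral score, depending on how strict the rules end up being.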
This cap immediately creates a problem for full-width chess engines because of chess’s high branching factor. An engine that evaluated every possible move would perform quite poorly in positions with many possible moves and replies, rarely searching deeper than 2 plies. As a result, any engine which naively assesses a chess board would struggle in this setup.
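Some rough arithmetic shows how tight the cap is. The sketch below assumes a uniform tree with a branching factor of about 35, a commonly cited average for chess, and counts only leaf evaluations. It compares a full-width search with the Knuth and Moore best case for alpha-beta pruning, b^⌈d/2⌉ + b^⌊d/2⌋ − 1 leaves. It then reports the deepest search that fits under 100 evaluations.

```python
import math

B = 35     # assumed average branching factor
CAP = 100  # evaluations allowed per move under the proposed rules


def full_width_leaves(b, d):
    """Leaf positions in a uniform tree searched exhaustively to depth d."""
    return b ** d


def alpha_beta_best_case(b, d):
    """Knuth and Moore's minimal leaf count for alpha-beta with perfect ordering."""
    return b ** math.ceil(d / 2) + b ** (d // 2) - 1


def deepest_within(cap, leaves, b=B):
    """Largest depth whose leaf count does not exceed `cap`."""
    depth = 0
    while leaves(b, depth + 1) <= cap:
        depth += 1
    return depth


for d in range(1, 5):
    print(d, full_width_leaves(B, d), alpha_beta_best_case(B, d))
print(deepest_within(CAP, full_width_leaves), deepest_within(CAP, alpha_beta_best_case))
# 1 35 35
# 2 1225 69
# 3 42875 1259
# 4 1500625 2449
# 1 2
```

Under these assumptions, a full-width search cannot complete even 2 plies within the cap. Perfect move ordering buys exactly one extra ply.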
The practical upshot of the position limitation is that the engine will be incentivized to gather as much relevant information about a position as possible rather than optimizing for the maximum number of positions.
Additionally, engines will be required to implement scheduling logic that takes the time remaining into consideration. Beyond the immediate problem of how an engine should allocate its time, this creates the ancillary challenge of evaluating a position’s quiescence. Positions which are “quiet” and have few forcing moves require less evaluation than positions with many forcing moves.
This requirement immediately motivates a qualitative chess engine to avoid falling prey to the Horizon Effect. The Horizon Effect occurs when an engine searching to a fixed depth plays delaying moves that merely push an unavoidable loss beyond its search horizon, wasting many position calculations on hopeless rabbit trails. Instead, an engine should recognize that quiescence has to do with the relationships between pieces. Humans understand this and can quickly see the futility of a line and stop searching it. A qualitative analysis which takes this factor into consideration will be able to save a great deal of position calculation, behaving more like a human player.
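A scheduler of this kind could start from an even split of the remaining clock and scale it by how forcing the position is. The sketch below is one hypothetical policy. The scaling factors and the cap of half the remaining time are arbitrary illustrative choices. Counting forcing moves (checks, captures, threats) is left to the engine.

```python
def allocate_seconds(remaining, moves_to_go, forcing, legal,
                     quiet_factor=0.5, sharp_factor=2.0):
    """Hypothetical per-move time budget.

    Starts from an even split of the remaining clock, then scales it between
    `quiet_factor` (no forcing moves) and `sharp_factor` (every legal move is
    forcing), never planning to spend more than half the clock on one move.
    """
    base = remaining / max(moves_to_go, 1)
    sharpness = forcing / legal if legal else 0.0
    factor = quiet_factor + (sharp_factor - quiet_factor) * sharpness
    return min(base * factor, remaining / 2)


# Ten minutes left, about 30 moves to play, 30 legal moves in each position.
print(allocate_seconds(600, 30, forcing=0, legal=30))   # 10.0  (quiet)
print(allocate_seconds(600, 30, forcing=15, legal=30))  # 25.0  (sharp)
```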
Given that computers have achieved and sustained superhuman capabilities in the domain of chess, the next frontier is not in building increasingly strong engines, but harnessing the present computational power to reason about the game in ways that humans do. Qualitative reasoning can provide novel and intuitive ways to reason about previously seen moves and think about the game.