What is a blunder in chess? The tension between the qualitative and quantitative answers to this question is at the heart of different approaches to chess, and more broadly, of how quantitative metrics may lack context while qualitative metrics lack precision.

Qualitative answer

There are many qualitative answers to this question, especially when comparing “blunders” and “mistakes”:

  • “a move that negatively affects their position in a significant way” ~ chess.com
  • “severely worsens the player’s situation by allowing a loss of material, checkmate, or anything similar” ~ Wikipedia
  • “Blunders tend to be immediately refutable, while mistakes require planning to capitalize on.” ~ r/chess

An issue with these qualitative answers is that even when the words are right, smart people can still disagree about how they apply at the margins. For a suboptimal move to have a “significant” negative effect, the opponent has to notice it and take advantage of it.

Quantitative answer

The quantitative answer considers a move that causes a significant drop in winning probability to be a blunder. What counts as “significant”? A drop of 14% or more.
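
As a rough sketch of this definition (the function name and calling convention here are hypothetical, the 0.14 threshold is the one quoted above, and the win probabilities are assumed to come from a model like the one described in the next section):

```python
BLUNDER_THRESHOLD = 0.14  # a "significant" drop: 14 percentage points

def is_blunder(win_prob_before: float, win_prob_after: float) -> bool:
    """A move is a blunder if the mover's winning probability drops by 14%+."""
    return win_prob_before - win_prob_after >= BLUNDER_THRESHOLD

print(is_blunder(0.60, 0.42))  # True: an 18-point drop
print(is_blunder(0.60, 0.52))  # False: only an 8-point drop
```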

How is winning probability in a chess game calculated? Objectively, since there are only three possible outcomes in a game (win, draw, loss), by definition any real advantage will lead to a win with perfect play. But in practice, humans aren’t perfect. Even grandmasters can let an advantage slip. If Magnus Carlsen doesn’t capitalize on your blunder, was it really a blunder?

From a machine learning perspective, we can view winning probability as a logistic regression problem, where the centipawn evaluation is the feature and the game outcome is the label. If we further limit the training data to games between 2300+ rated players, this is essentially what Lichess uses.
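
As a toy sketch of that framing, here is what the fit could look like with scikit-learn. The data below is invented for illustration, and draws are omitted for simplicity (in practice they might be scored as half a win); Lichess’s actual model is fit on real games.

```python
# A toy sketch of the logistic-regression framing: one feature
# (centipawn evaluation) and a binary label (did White win?).
# The data is invented; draws are left out for simplicity.
import numpy as np
from sklearn.linear_model import LogisticRegression

centipawns = np.array([[-400], [-250], [-120], [-30], [0], [40], [130], [260], [410]])
white_won  = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression().fit(centipawns, white_won)

def win_probability(evaluation_cp: float) -> float:
    """Empirical probability that White wins, given an engine evaluation."""
    return model.predict_proba([[evaluation_cp]])[0, 1]

print(f"{win_probability(100):.2f}")  # win probability at a +1.00 (100 cp) evaluation
```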

Of course, this isn’t perfect, and there’s an argument to be made that outcomes for 2300+ Elo players may not be representative of lower-rated games. It also doesn’t take time pressure into consideration. But there is a tradeoff between the accuracy of our metric and the generalization power of the model.

One other important objection to this line of inquiry is that the centipawn evaluation of a position is not a constant. The evaluation varies by search depth and between engines. So if Stockfish 15 evaluates a position at +1 and Stockfish 16 evaluates the same position at +1.5, White’s actual winning chances haven’t changed at all. The evaluation is not anchored to any real value, especially with the introduction of NNUE evaluation functions.
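
To make the objection concrete: push both evaluations through the same eval-to-win-probability curve, and the “winning chances” move even though the position on the board is identical. The logistic mapping below uses the coefficient published on the Lichess accuracy page linked in the references; treat the exact numbers as illustrative.

```python
import math

def win_percent(centipawns: float) -> float:
    # The logistic eval-to-win% mapping from the Lichess accuracy page:
    # Win% = 50 + 50 * (2 / (1 + exp(-0.00368208 * centipawns)) - 1)
    return 50 + 50 * (2 / (1 + math.exp(-0.00368208 * centipawns)) - 1)

# The same position, evaluated by two engine versions:
print(f"Stockfish 15: +1.0 -> {win_percent(100):.1f}% for White")
print(f"Stockfish 16: +1.5 -> {win_percent(150):.1f}% for White")
# The roughly 4-point gap is an artifact of the engines, not the board.
```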

References

https://lichess.org/page/accuracy