What is Perplexity?

TLDR: NLP metric ranging from 1 to infinity. Lower is better.

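In natural language processing, perplexity is one of the most common metrics used to measure the performance of a language model. It summarizes how much probability the model assigns to the tokens of a sequence: the higher those probabilities, the lower the perplexity. To calculate perplexity, we use the following formula: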

$ perplexity = e^z $

where

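$ z = -{1 \over N} \sum_{i=1}^N \ln(P_i) $

Here N is the number of tokens in the sequence and P_i is the probability the model assigns to the i-th token.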

Typically we use base e when calculating perplexity, but this is not required. Any base gives the same result, as long as the exponent and the logarithm use the same base, so sometimes the formula will use base 2 or base 10, along with logarithms to the corresponding base.
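The formula can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `perplexity`, its `base` parameter, and the sample probabilities are all made up for this example. It also shows that the choice of base does not change the result.

```python
import math

def perplexity(probs, base=math.e):
    """Perplexity of a sequence, given the probability assigned to each token.

    `probs` and `base` are illustrative names, not part of any library.
    Probabilities must be greater than 0; math.log raises ValueError otherwise.
    """
    if not probs:
        raise ValueError("need at least one token probability")
    # z = -(1/N) * sum of log_base(P_i)
    z = -sum(math.log(p, base) for p in probs) / len(probs)
    # Exponentiate with the same base that was used for the logarithm.
    return base ** z

# Hypothetical token probabilities, chosen so the answer is easy to check by hand.
probs = [0.5, 0.25, 0.125]

for b in (math.e, 2, 10):
    print(b, perplexity(probs, base=b))
```

All three bases give a perplexity of approximately 4, up to floating-point rounding. That matches a hand calculation: the product of the probabilities is 2^-6, and with N = 3 the perplexity is 2^(6/3) = 4.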

Example

Imagine that we have a language model which generates the following sequence of tokens:

`<start> jack and jill went up the hill`

And suppose that the conditional probabilities for each of the tokens are as follows:

token	probability
`<start>`	15%
`jack`	5%
`and`	12%
`jill`	18%
`went`	25%
`up`	40%
`the`	33%
`hill`	50%

For the purposes of calculating perplexity, it doesn't matter how the sequence was generated. The model could be an n-gram model, an LSTM, or a transformer. All that matters is the probability the model assigns to each token. To calculate perplexity, we first take the natural logarithm of each of the values above:

token	P	ln(P)
`<start>`	15%	-1.897
`jack`	5%	-2.996
`and`	12%	-2.120
`jill`	18%	-1.715
`went`	25%	-1.386
`up`	40%	-0.916
`the`	33%	-1.109
`hill`	50%	-0.693

Summing the logs, we get -12.832. Since there are 8 tokens, we divide -12.832 by 8 to get -1.604. Negating that gives z = 1.604, which we exponentiate to get the final perplexity:

$ perplexity = e^{1.604} \approx 4.973 $

Therefore the perplexity of this sequence is about 4.973.
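The arithmetic in this example can be checked with a short script. This is only a sketch: the variable names (`token_probs`, `log_probs`, and so on) are chosen for illustration, and the probabilities are the ones from the table, written as fractions.

```python
import math

# Conditional probabilities from the example table, as fractions.
token_probs = [
    ("<start>", 0.15),
    ("jack", 0.05),
    ("and", 0.12),
    ("jill", 0.18),
    ("went", 0.25),
    ("up", 0.40),
    ("the", 0.33),
    ("hill", 0.50),
]

# Natural log of each token's probability.
log_probs = [math.log(p) for _, p in token_probs]

total = sum(log_probs)          # sum of the logs
z = -total / len(log_probs)     # negated average log probability
ppl = math.exp(z)               # perplexity = e^z

print(f"sum of logs: {total:.3f}")   # sum of logs: -12.832
print(f"z:           {z:.3f}")       # z:           1.604
print(f"perplexity:  {ppl:.3f}")     # perplexity:  4.973
```

The script works from the probabilities alone and never looks at the model that produced them, which is why the type of model does not matter for the calculation.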