What is Perplexity?
TLDR: NLP metric ranging from 1 to infinity. Lower is better.
In natural language processing, perplexity is the most common metric used to measure the performance of a language model. To calculate perplexity, we use the following formula:
$ perplexity = e^z $
where
$ z = -{1 \over N} \sum_{i=1}^{N} \ln(P_i) $
Here $N$ is the number of tokens in the sequence and $P_i$ is the probability the model assigns to the $i$-th token. Typically we use base e when calculating perplexity, but this is not required: any base will do, as long as the logarithm and the exponent use the same base, so you will sometimes see the formula written with base 2 or base 10 instead.
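To make the formula concrete, here is a minimal Python sketch. The function name `perplexity` and its signature are my own choices for illustration, not something from the original post; it simply takes a list of per-token probabilities and applies the formula above.

```python
import math

def perplexity(probabilities, base=math.e):
    """Perplexity of a sequence given its per-token probabilities P_i.

    probabilities: iterable of P(token_i | context), each in (0, 1].
    base: base used for both the logarithm and the exponent.
    """
    log_probs = [math.log(p, base) for p in probabilities]
    z = -sum(log_probs) / len(log_probs)  # average negative log-probability
    return base ** z                      # perplexity = base^z
```

Because the same base is used for both the logarithm and the exponent, the two cancel out, which is why base e, base 2, and base 10 all give the same perplexity.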
Example
Imagine that we have a language model which generates the following sequence of tokens:
<start> jack and jill went up the hill
And suppose that the conditional probabilities for each of the tokens are as follows:
| token | probability |
|---|---|
| <start> | 15% |
| jack | 5% |
| and | 12% |
| jill | 18% |
| went | 25% |
| up | 40% |
| the | 33% |
| hill | 50% |
For the purposes of calculating perplexity, it doesn't matter how the sequence was generated: the model may be an n-gram model, an LSTM, or a transformer. All that matters is the probability the model assigns to each token. To calculate perplexity, we first take the logarithm of each of the values above:
| token | P | ln(P) |
|---|---|---|
| <start> | 15% | -1.897 |
| jack | 5% | -2.996 |
| and | 12% | -2.120 |
| jill | 18% | -1.715 |
| went | 25% | -1.386 |
| up | 40% | -0.916 |
| the | 33% | -1.109 |
| hill | 50% | -0.693 |
Summing the logs, we get -12.832. Since there are N = 8 tokens, we divide -12.832 by 8 to get -1.604, and negating that gives z = 1.604. The final perplexity is then:
$ perplexity = e^{1.604} = 4.973 $
Therefore the perplexity of this sequence is about 4.973.
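As a sanity check, the whole example can be reproduced in a few lines of Python; the probabilities below are simply the values from the table above.

```python
import math

# Conditional probabilities from the example table.
probs = [0.15, 0.05, 0.12, 0.18, 0.25, 0.40, 0.33, 0.50]

log_probs = [math.log(p) for p in probs]   # natural log of each P_i
total = sum(log_probs)                     # -12.832
z = -total / len(probs)                    #  1.604
print(round(total, 3), round(z, 3), round(math.exp(z), 3))
# -12.832 1.604 4.973
```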