For example, in language modeling a lower perplexity generally indicates a better model. My questions are: (1) What exactly are we measuring when we calculate the codebook perplexity in VQ models? (2) Why would we want a large codebook perplexity, and what would the ideal perplexity be for a VQ model? Sorry if my questions are unclear.
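For context, codebook perplexity is typically computed as the exponential of the entropy of the empirical distribution of code usage over a batch, so it ranges from 1 (all inputs mapped to one code) up to the codebook size K (all codes used uniformly). A minimal numpy sketch of this computation (the function name `codebook_perplexity` is my own, not from any particular library):

```python
import numpy as np

def codebook_perplexity(indices, codebook_size):
    """exp(entropy) of the empirical code-usage distribution.

    indices: 1-D array of codebook indices chosen for a batch.
    """
    counts = np.bincount(indices, minlength=codebook_size)
    probs = counts / counts.sum()
    # Entropy of the usage distribution; 0 * log(0) is treated as 0
    # for codes that are never selected.
    log_probs = np.log(probs, where=probs > 0, out=np.zeros_like(probs))
    entropy = -np.sum(probs * log_probs)
    return np.exp(entropy)

K = 8
uniform = np.repeat(np.arange(K), 10)    # every code used equally often
collapsed = np.zeros(80, dtype=int)      # only code 0 ever used
print(codebook_perplexity(uniform, K))   # ~8.0: maximum, full utilization
print(codebook_perplexity(collapsed, K)) # ~1.0: minimum, codebook collapse
```

Under this reading, a perplexity near K means the model is making effective use of the whole codebook, while a value near 1 signals codebook collapse, which is why a larger value is usually considered desirable.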
https://stats.stackexchange.com/questions/600948/codebook-perplexity-in-vq-vae