Perplexity (PPX) is a standard metric for measuring how well a language model predicts the next word in a sequence, given the preceding context. It is a way to evaluate the quality of a language model's predictions.
In other words, perplexity measures how "surprised" the model is by the words that actually appear, given the patterns it has learned. A lower perplexity score means the model assigns higher probability to the observed words and is therefore better at predicting the next word, while a higher score indicates that the model is less accurate.
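For concreteness, here is a minimal sketch of how perplexity is typically computed from a model's per-token log-probabilities. This helper is illustrative only and is not part of the OpenPerplexity codebase:

```python
import math

def perplexity(token_log_probs):
    """Compute perplexity from per-token natural-log probabilities.

    Perplexity is the exponential of the average negative log-likelihood:
    a lower value means the model assigned higher probability to the
    observed tokens, i.e. it was less "surprised" by them.
    """
    avg_neg_log_likelihood = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_neg_log_likelihood)

# Example: a model that assigns probability 0.25 to each of 4 tokens
# has a perplexity of 4 -- as uncertain as a fair 4-way choice.
print(perplexity([math.log(0.25)] * 4))  # 4.0
```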
OpenPerplexity is intended to help evaluate the Perplexity (PPX) of different models by letting human users provide feedback on the responses they receive, capturing at the level of whole responses whether the model's output was a "surprise" or a "good job."
We do this by providing a simple open-source Retrieval Augmented Generation (RAG) implementation that works with various open-source and other models (Mistral, Llama3, and others). The system answers the user's query using the fine-tuned model, the additional data retrieved by the RAG pipeline, and data from the Internet. The user then evaluates the resulting responses, which helps establish an OpenPerplexity score for each model. These scores will be published to the community under an open-source license to help evaluate the models and, eventually, to fine-tune future models using the best OpenPerplexity-scored data.
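The sketch below shows one way such a feedback loop could be wired together: retrieve context, generate a response, collect a human rating, and aggregate it into a per-model score. All names here (`ModelScore`, `answer_with_rag`, the 0..1 rating scale) are hypothetical assumptions for illustration, not the project's actual API:

```python
# Hypothetical sketch of the OpenPerplexity feedback loop: retrieve context,
# query a model, collect a human rating, and fold it into a running score.
# Names and the rating scale are illustrative, not the project's actual API.
from dataclasses import dataclass, field

@dataclass
class ModelScore:
    ratings: list = field(default_factory=list)

    def add(self, rating: float):
        # rating in [0, 1], where 1 means "good job" and 0 means "surprise"
        self.ratings.append(rating)

    @property
    def score(self) -> float:
        # Running aggregate used as the model's OpenPerplexity-style score
        return sum(self.ratings) / len(self.ratings) if self.ratings else 0.0

def answer_with_rag(query: str, retrieve, generate) -> str:
    """Combine retrieved documents (RAG + Internet snippets) into the prompt."""
    context = "\n".join(retrieve(query))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)

# Usage sketch: one feedback round for a single model.
scores = {"mistral": ModelScore(), "llama3": ModelScore()}
# response = answer_with_rag(user_query, retrieve_fn, mistral_generate)
# scores["mistral"].add(user_rating(response))  # human feedback, 0..1
```

In this sketch the human rating is simply averaged per model; the published OpenPerplexity scores could of course use a different aggregation.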