Skip to Content

Tag: LLMs

A better method for identifying overconfident large language models

Large language models (LLMs) can generate credible but inaccurate responses, so researchers have developed uncertainty quantification methods to check the reliability of predictions. One popular method involves submitting the same prompt multiple times to see if the model generates the same answer.

But this method measures self-confidence, and even the most impressive LLM might be confidently wrong. Overconfidence can mislead users about the accuracy of a prediction, which might result in devastating consequences in high-stakes settings like health care or finance.

To address this shortcoming, MIT researchers introduced a new method for measuring a different type of uncertainty that more reliably identifies confident but incorrect LLM responses. Learn more

Making it easier to verify an AI model’s responses

Despite their impressive capabilities, large language models are far from perfect. These artificial intelligence models sometimes “hallucinate” by generating incorrect or unsupported information in response to a query.

Due to this hallucination problem, an LLM’s responses are often verified by human fact-checkers, especially if a model is deployed in a high-stakes setting like health care or finance. However, validation processes typically require people to read through long documents cited by the model, a task so onerous and error-prone it may prevent some users from deploying generative AI models in the first place.

To help human validators, MIT researchers created a user-friendly system that enables people to verify an LLM’s responses much more quickly. With this tool, called SymGen, an LLM generates responses with citations that point directly to the place in a source document, such as a given cell in a database. Learn more
image description