
The latest OpenAI models outperform competitors from Google, Anthropic, xAI, and Meta at sticking to the facts, according to new rankings. The results show striking differences in "hallucination rate," or how often these AI models invent details.
The results come from Vectara's Hughes Hallucination Evaluation Model (HHEM) leaderboard, which measures the "rate of summaries that hallucinate" across leading large language models in head-to-head tests.
How the top AI tools compare when the facts count
Vectara's HHEM leaderboard is based on a large-scale test designed to determine whether AI models can stick to the facts when summarizing real news articles. Each AI model received the same set of short documents and was scored on how often its summaries included information not found in the original text.
Refusal rates were also tracked, capturing how often an AI model declined to respond. Conditions were kept identical across the board, so the results reveal which AI tools handle the truth best under the same pressure. Here's how they performed.
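To make the two metrics concrete, here is a minimal sketch of how a hallucination rate and a refusal rate could be computed from per-document judgments. The data, labels, and `score_model` function are hypothetical illustrations, not Vectara's actual pipeline, which uses its HHEM classifier to judge each summary.

```python
# Hypothetical scoring sketch: each document gets one judgment label.
# "ok" = summary consistent with the source, "hallucinated" = summary
# contains information not in the source, "refused" = model declined.

def score_model(judgments):
    """Return (hallucination_rate_pct, refusal_rate_pct) for one model."""
    answered = [j for j in judgments if j != "refused"]
    refusal_rate = (len(judgments) - len(answered)) / len(judgments)
    # Hallucination rate is computed over answered documents only.
    halluc_rate = sum(j == "hallucinated" for j in answered) / len(answered)
    return round(halluc_rate * 100, 2), round(refusal_rate * 100, 2)

# Hypothetical run over 10 documents: 8 faithful, 1 hallucinated, 1 refusal.
halluc, refusal = score_model(["ok"] * 8 + ["hallucinated"] + ["refused"])
print(halluc, refusal)  # → 11.11 10.0
```

The key design point this illustrates is that refusals are excluded from the hallucination denominator, so a model cannot lower its error rate simply by refusing, since that shows up in the separate refusal metric.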
OpenAI
OpenAI holds five of the lowest hallucination rates in the ranking, with ChatGPT-o3 mini at 0.795%, followed by ChatGPT-4.5, ChatGPT-5, ChatGPT-o1 mini, and ChatGPT-4o, all clustered between 1.2% and 1.49%.
That accuracy in fact made the launch of ChatGPT-5 as the default model a strong move for the AI giant, until users pushed back, demanding the return of ChatGPT-4o. CEO Sam Altman relented, letting subscribers choose their model again.
But there is a trade-off. Once free users hit their GPT-5 limit, they are switched to ChatGPT-5 mini, a sharp drop in accuracy with a hallucination rate of 4.9%, among the highest in the OpenAI lineup. That can mean a sudden slide in how much you can trust the answers you get.
Google
Google's Gemini 2.5 Pro Preview and Gemini 2.5 Flash Lite scored 2.6% and 2.9% respectively. Not as low as OpenAI's leaders, but well clear of the riskiest models. Pro Preview replaced the now-retired Gemini 2.5 Pro Experimental, which had previously posted one of the lowest scores on the leaderboard at 1.1%.
Anthropic
Anthropic's new models, Claude Opus 4.1 and Claude Sonnet 4, posted hallucination rates of 4.2% and 4.5%. Those scores place both models among the most error-prone on the leaderboard, well behind leaders such as ChatGPT and Gemini.
Meta
Meta's Llama 4 Maverick and Llama 4 Scout had hallucination rates of 4.6% and 4.7%, putting them in the same tier as the latest Claude models and outside the group of the most accurate performers in the table.
xAI
Grok 4 posted a high hallucination rate of 4.8%, placing it among the least accurate models in the ranking. Elon Musk promoted the newly released model as "smarter than all graduate students, in all disciplines," highlighting its 26.9% score on Humanity's Last Exam.
The chatbot has also faced criticism for harmful and inappropriate outputs. The combination of a high error rate and ongoing content problems could make Grok a risky choice for factual answers.
Keeping track of the truth in the AI era
When AI gets things wrong, it can still sound right. And when those invented details pass unnoticed as fact, bending the truth and spreading misinformation, the result can be serious risks in fields such as health, law, finance, and politics. That is why ongoing, transparent testing matters more than ever.
Vectara's HHEM leaderboard is updated with each model release, tracking in real time which AIs are improving and which are falling behind. As these systems make their way into search, messaging, and everyday tools, knowing which AI model stays closest to the truth means knowing what to trust.
In our closer look at OpenAI's GPT-5, we focus on the AI model's health-related benchmarks and guidelines.