AI Accuracy Scoreboard

Live rankings based on real multi-model consensus data.

13,516 facts checked
Last updated: June 29, 2026 at 06:08 AM UTC

#	Model	Accuracy	Claims	Best Category	Worst Category	Verified	Disputed
1	Gemini 3.5	71%	2,002	Other	—	26%	8%
2	Claude Opus 4.8	71%	2,204	Other	—	26%	13%
3	Gemini 2.5 Flash	70%	1,804	Other	—	42%	23%
4	GPT 5.5	64%	7,865	Other	—	33%	19%
5	Grok 4	62%	7,333	Other	—	32%	19%
6	Grok 3	49%	111	Other	—	16%	13%

Insufficient data

These models have fewer than 100 verified claims so far and are excluded from the ranking. They will appear above once enough data is collected.

Claude Haiku 4.5: 64 claims
Gemini 2.5 Flash Lite: 62 claims
Grok 3 Mini: 51 claims
GPT-5: 1 claim
Claude Sonnet 4.6: 1 claim
GPT-5 Nano: 1 claim
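
The exclusion rule above can be sketched in a few lines. This is an illustration, not NoParrot's actual code: the `ModelStats` type and `split_ranked` function are hypothetical names, and the sketch assumes the 100-claim threshold is applied to the claim count shown next to each model. Accuracy is left as `None` for the excluded models because the scoreboard does not list an overall figure for them.

```python
from __future__ import annotations

from typing import NamedTuple, Optional

MIN_CLAIMS = 100  # threshold stated on the scoreboard


class ModelStats(NamedTuple):
    """Hypothetical record for one model's row."""
    name: str
    accuracy: Optional[float]  # None when no overall figure is published
    claims: int


def split_ranked(stats: list[ModelStats], min_claims: int = MIN_CLAIMS):
    """Split models into a ranked list and an 'insufficient data' list."""
    ranked = [s for s in stats if s.claims >= min_claims]
    pending = [s for s in stats if s.claims < min_claims]
    # Ranked models: highest accuracy first (stable sort keeps input order on ties).
    ranked.sort(key=lambda s: s.accuracy if s.accuracy is not None else -1.0,
                reverse=True)
    # Pending models: most claims first, matching the order of the list above.
    pending.sort(key=lambda s: s.claims, reverse=True)
    return ranked, pending


rows = [
    ModelStats("Grok 3", 0.49, 111),
    ModelStats("GPT-5", None, 1),
    ModelStats("Gemini 3.5", 0.71, 2002),
    ModelStats("Claude Haiku 4.5", None, 64),
]
ranked, pending = split_ranked(rows)
print([s.name for s in ranked])
print([s.name for s in pending])
```

With these four rows, Gemini 3.5 and Grok 3 land in the ranking, while Claude Haiku 4.5 and GPT-5 stay in the insufficient-data list until they pass the threshold.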

Accuracy by Category

Categories with fewer than 50 verified claims are hidden due to insufficient data.

#	Model	Accuracy	Claims
1	Grok 3 Mini	56%	25
2	Claude Haiku 4.5	52%	31
3	Gemini 2.5 Flash Lite	38%	29

#	Model	Accuracy	Claims
1	Gemini 3.5	71%	2,002
2	Claude Opus 4.8	71%	2,204
3	Gemini 2.5 Flash	70%	1,804
4	GPT 5.5	64%	7,865
5	Grok 4	62%	7,333
6	Grok 3 Mini	56%	16
7	Grok 3	49%	109
8	Claude Haiku 4.5	45%	22
9	Gemini 2.5 Flash Lite	30%	20

Methodology

Accuracy is measured by cross-model consensus. A model is accurate when its claims are corroborated by other independent models. Each question is sent to multiple AI models simultaneously, and their answers are compared at the claim level using algorithmic semantic matching.
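
A minimal sketch of consensus scoring for a single question follows. It is an illustration under stated assumptions, not NoParrot's implementation: the function names are hypothetical, and the "semantic matching" step is replaced with a simple token-overlap (Jaccard) test, which is much cruder than a real semantic matcher. The `threshold` and `min_supporters` parameters are also assumptions.

```python
from __future__ import annotations

import re


def tokens(claim: str) -> frozenset[str]:
    """Lowercased word tokens of a claim."""
    return frozenset(re.findall(r"[a-z0-9]+", claim.lower()))


def claims_match(a: str, b: str, threshold: float = 0.6) -> bool:
    """Stand-in for semantic matching: Jaccard overlap of word tokens."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return False
    return len(ta & tb) / len(ta | tb) >= threshold


def consensus_accuracy(answers: dict[str, list[str]],
                       min_supporters: int = 1) -> dict[str, float]:
    """Share of each model's claims that other models corroborate."""
    scores: dict[str, float] = {}
    for model, claims in answers.items():
        if not claims:
            continue  # no claims, no score
        corroborated = 0
        for claim in claims:
            # Count the other models that make at least one matching claim.
            supporters = sum(
                any(claims_match(claim, other) for other in other_claims)
                for other_model, other_claims in answers.items()
                if other_model != model
            )
            if supporters >= min_supporters:
                corroborated += 1
        scores[model] = corroborated / len(claims)
    return scores


# Invented answers to one question, used only to exercise the sketch.
answers = {
    "model_a": ["Paris is the capital of France",
                "France has 67 million people"],
    "model_b": ["The capital of France is Paris"],
    "model_c": ["Paris is the capital of France",
                "The Eiffel Tower is 500 m tall"],
}
scores = consensus_accuracy(answers)
print(scores)
```

Note that a claim no other model repeats is scored as uncorroborated whether or not it is true, so a score of this kind measures agreement between models rather than agreement with ground truth.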

Accuracy varies by question type and model version. Rankings reflect data collected through NoParrot.

Contribute to the scoreboard

Every question you ask helps build more accurate rankings. Try NoParrot and see how AI models compare on your questions.
