OpenAI’s o3 and o4-mini hallucinate far more than previous models

A troubling issue nestled in OpenAI’s technical report.

By OpenAI’s own testing, its newest reasoning models, o3 and o4-mini, hallucinate significantly more often than o1.

As first reported by TechCrunch, OpenAI’s system card details results from the PersonQA evaluation, which is designed to test for hallucinations. According to those results, o3’s …
