A troubling issue nestled in OpenAI’s technical report.
By OpenAI’s own testing, its newest reasoning models, o3 and o4-mini, hallucinate significantly more often than o1.
As first reported by TechCrunch, OpenAI’s system card details the results of PersonQA, an evaluation designed to test for hallucinations. From the results of this evaluation, o3’s …