Don't use LLMs for accounting measures

yuvaltaylor · October 20, 2025, 7:48pm

Gemini performed significantly worse than Claude and ChatGPT on the finance exam referenced by parallaxblue above, FWIW. I don't know. Every time I switch to a new AI model, I end up getting garbage answers to straightforward questions. Claude Sonnet was fine until I asked it a legal/tax question and it cited cases in which the ruling was the opposite of what it was arguing for. I simply can't understand why nobody seems to have trained these AI models to check their sources when they're presenting information. It seems like it would be so easy to program that one little extra step.