When OpenAI launched its personal finance feature for ChatGPT last week, it led with a number: 82.5 out of 100 on an internal personal finance benchmark. Developed in collaboration with 50 finance professionals, the benchmark was designed to evaluate how well ChatGPT handles the kinds of questions real people ask about money.
82.5 sounds pretty good. Until you ask who wrote the test.
OpenAI built the benchmark. OpenAI administered it. OpenAI reported the results. That's not a knock on their methodology — it's just worth knowing when you're evaluating a claim about financial competence, because in any other context, we'd call that grading your own homework.
The CFP® exam is not like that. It's the industry standard for human Certified Financial Planners — an independent, standardized test covering investment planning, tax planning, estate planning, retirement, insurance, and financial analysis. It's the bar that human advisors have to clear to give advice professionally. The average human CFP® scores around 79.5%.
Origin's AI Advisor took that test. Across 6,000 unique sample questions administered over 432 hours, it scored 98.3% — with variance held between 95–97% across multiple runs. GPT-5 scored 93.8% on the same questions. Gemini 2.5 Pro scored 93.1%. Every major frontier model was tested under the same controlled conditions: identical question sets, randomized order, no access to external tools or retrieval, no prompt engineering advantages.
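For readers who want to picture what "same controlled conditions" means in practice, here is a minimal sketch of that kind of comparison: one question set, one randomized order shared by every model, and a plain percent-correct score. It is an illustration only, not Origin's actual evaluation harness. The names `score_models`, `toy_questions`, and `toy_models` are hypothetical, and the toy data stands in for real exam questions and real models.

```python
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical types: a question is (prompt, correct_choice), and a model is
# any callable that maps a prompt to a chosen answer. Neither comes from Origin.
Question = Tuple[str, str]
Model = Callable[[str], str]


def score_models(
    questions: List[Question],
    models: Dict[str, Model],
    seed: int = 0,
) -> Dict[str, float]:
    """Return each model's percent correct on the same shuffled question set."""
    order = list(questions)
    # One randomized order, shared by every model, so no model sees an easier sequence.
    random.Random(seed).shuffle(order)
    scores = {}
    for name, answer in models.items():
        correct = sum(1 for prompt, key in order if answer(prompt) == key)
        scores[name] = 100.0 * correct / len(order)
    return scores


# Toy data for illustration only: not real exam questions or real models.
toy_questions = [("Q1", "A"), ("Q2", "B"), ("Q3", "C"), ("Q4", "D")]
toy_models = {
    "always_a": lambda prompt: "A",
    "answer_key": lambda prompt: dict(toy_questions)[prompt],
}

print(score_models(toy_questions, toy_models))
```

The point of the sketch is that the scoring function never knows which model it is grading. Every model gets the same questions in the same order and is judged against the same answer key.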
Origin scored highest. By a significant margin. On a test none of them wrote.
The difference between an internal benchmark and an independent one isn't a technicality. In finance specifically, it's the whole ballgame. A benchmark can be designed to emphasize what a model does well and minimize what it doesn't. An independent exam can't be gamed that way — it tests what it tests, and you either know it or you don't.
ChatGPT's 82.5 might be a genuine reflection of its financial reasoning ability. It might also reflect a benchmark calibrated to what GPT-5.5 handles well. We don't have enough information to know, because OpenAI hasn't published the methodology at the level that would let an outside party verify it.
What we do know is that on the same CFP® sample questions, under the same conditions, GPT-5 scored 93.8% — which is lower than Origin's 98.3% and notably higher than ChatGPT's self-reported 82.5 on its own test. That gap is hard to explain unless the internal benchmark and the CFP® exam are measuring meaningfully different things.
The CFP® exam isn't multiple-choice trivia about financial concepts. It includes scenario-based questions that require multi-step reasoning, the kind where you have to hold several pieces of information in context simultaneously and arrive at a recommendation that's numerically precise and situationally appropriate.
One test case from Origin's evaluation: when asked about RSUs at a non-public company, the model flagged an inconsistency in the question itself and responded: "Are you sure? If the company isn't public, you may have stock options instead." That's not pattern matching. That's context-aware reasoning — catching something the question didn't explicitly surface. Generic models miss that. They answer what's asked, not what's actually going on.
That kind of reasoning is what Origin's multi-agent architecture was built to produce. The CFP® score is the external validation that it's working.
82.5 on a proprietary benchmark from the company that built the product being tested. 98.3 on an independent professional exam used to certify the humans who give financial advice for a living.
Both numbers are real. Only one of them tells you something you can actually rely on.