

Here's the quiet part out loud.
The Generative AI industry is making things worse faster, and measuring progress by how fast we get things wrong.
This isn't cynicism. It's not Luddism. It's what the data now clearly shows, what the smartest people in AI are quietly acknowledging, and what everyone inside the industry privately knows but cannot publicly say.
Two years ago, ChatGPT felt like magic. You could have a conversation with it. It would take risks. It had a point of view. Today? Endless hedging. Verbose to the point of uselessness. Confident wrongness wrapped in perfect grammar.
The benchmarks went up. The experience went down.
We've spent hundreds of billions of dollars making AI faster at being wrong. And we've called it progress.
The Data Nobody Wants to See
Let's start with what we can measure, since that's apparently all that matters.
Gartner predicts at least 30% of GenAI projects will be abandoned after proof of concept by the end of 2025, citing poor data quality, inadequate risk controls, escalating costs, and unclear business value.
S&P Global reports that the share of companies abandoning the majority of their AI initiatives has surged from 17% to 42% in a single year. Nearly half of organizations are now killing their AI initiatives before they reach production.
RAND Corporation research shows 70–85% of AI initiatives fail to deliver their expected outcomes. Not "fell a little short." Failed.
McKinsey's 2025 Global Survey found that over 80% of organizations reported no meaningful impact on enterprise-wide EBIT from their AI investments. And only 1% — one percent — view their GenAI strategies as mature.
But here's the data point that should terrify investors: the models themselves are getting worse.
Stanford and Berkeley researchers tracked GPT-4's behavior over 2023. On a basic math task (identifying whether a number is prime), accuracy dropped from 97.6% in March to 2.4% in June. The share of generated code that ran without edits fell from 52% to 10%. The models weren't improving. They were degrading, while the companies behind them insisted each update made them "smarter."
It gets worse.
OpenAI's own technical report on their newest reasoning models, o3 and o4-mini, admits they hallucinate MORE than previous versions. And here's the remarkable part: OpenAI says "more research is needed" to understand why. They don't know why their models are getting worse. They just know they are.
A comprehensive study published in 2025 confirms that AI hallucinations are "not only persistent but potentially increasing in frequency across leading language models."
We didn't solve hallucinations. We made them more persuasive.
The Confession That Explains Everything
If you want to understand why all of this is happening, you only need to read one sentence. It comes from OpenAI's own researchers, published in 2025:
"Hallucinations are not an artifact of neural networks. They are a predictable outcome of how we train and evaluate language models: we reward guessing over admitting ignorance."
Read that again.
They didn't optimize for accuracy. They optimized for confident guessing. They didn't measure success by whether answers were true. They measured it by whether answers sounded true.
The models didn't learn to be right. They learned to seem right.
And "seeming right" is exactly what gets rewarded — by users who accept fluent responses, by benchmarks that measure confidence, and by investors who fund impressive demos.
I had a conversation recently with GPT where I caught it citing a non-existent source. It defended the hallucination. I pushed back. It doubled down. When I finally proved it wrong, it said: "But I was so close, right?"
That's not a bug. That's the product. The model was trained to negotiate with reality rather than admit error. To salvage validation even in failure. To optimize for user satisfaction rather than understanding.
This is gaslighting at scale. And we built it on purpose.
The Ideology: Bigger Is Better
How did we get here?
The GenAI industry has one core belief, an ideology so deeply embedded that it's rarely questioned: Bigger is Better.
More parameters equals more intelligence. More data equals more capability. More compute equals more progress. More speed equals more value.
This belief was never tested. It was funded.
Venture capitalists needed a thesis they could put in a pitch deck. "Bigger is Better" fits on a slide. It sounds like physics, inputs and outputs, cause and effect. And it defines progress as a capital race, which means the race needs VCs, which means VCs are positioned to capture whatever returns it produces.
So engineers optimized what they could measure: latency, throughput, benchmark scores, parameter counts. Not because these metrics correlate with value, but because they correlate with fundraising.
The entire system became optimized for pitch decks rather than products. For demos rather than deployment. For funding rounds rather than solving problems.
And here's the result: we've invested hundreds of billions of dollars in making the wrong thing faster.
The Measurability Trap
There's a deeper problem beneath the ideology.
Engineers are trained to optimize what can be formalized. Give them a metric, they'll improve it. Give them a benchmark, they'll beat it. That's their superpower, and their blind spot.
Ask an engineer "what should we measure?" and they're lost. Ask them "what's valuable?" and they'll give you a proxy. Ask them "what do humans actually need?" and they'll build you a faster autocomplete.
The things that are easy to measure — speed, latency, benchmark scores, parameter counts — get optimized relentlessly. The things that matter — understanding, judgment, reliability, trust — get ignored. Not because they're impossible to improve, but because they're hard to quantify. And what's hard to quantify doesn't get funded.
This is Goodhart's Law in its purest form: when a measure becomes a target, it ceases to be a good measure. The industry optimized for benchmarks until benchmarks stopped meaning anything. They optimized for speed until speed became the enemy of quality. They optimized for confidence until confidence became indistinguishable from delusion.
We've confused measurability with value. And we've built an entire industry on that confusion.
The Herd Mentality
Every major lab is chasing the same benchmarks. The same architecture patterns. The same latency targets. The same scaling curves.
And everyone is shocked — shocked — that nothing fundamentally new is emerging.
This isn't a mystery. It's math.
Breakthroughs require divergence. They require someone to optimize for something different. But nobody wants to be that someone. Because "different" doesn't fit on a leaderboard. "Different" is hard to explain to a board. "Different" means leaving behind the metrics your investors already understand and the benchmarks your competitors already use.
So the herd keeps running. Faster and faster. In the same direction. Toward the same cliff.
When everyone optimizes for the same thing, the industry doesn't advance. It just gets faster at standing still.
Why They Can't Stop
Here's the question that haunts this entire industry: why do smart people keep doing obviously dumb things?
The answer is simple and brutal: they can't stop.
They raised billions on "Bigger is Better." They hired thousands on it. They promised investors returns on it. They built careers, companies, and identities on it.
To stop is to admit the thesis was wrong. The valuations were wrong. The roadmaps were wrong. The last five years were wrong.
No one can afford to admit that. Not the founders who raised on the thesis. Not the VCs who funded it. Not the engineers whose careers depend on it. Not the board members who approved the spending. Not the journalists who wrote the hype.
So they keep going. Not because they believe it's working. Because stopping is more expensive than continuing.
Every additional dollar spent makes it harder to turn around. Every new hire deepens the commitment. Every press release locks in the narrative.
They're not optimizing for success anymore. They're optimizing for "not being the one who admits failure."
This is how bubbles work. Everyone privately knows. No one publicly says. And the music keeps playing until it doesn't.
The Divergence
But some are starting to say it.
Mira Murati spent years as OpenAI's CTO, overseeing the development of ChatGPT, DALL-E, and Sora. She left in 2024 to build something different. Her new company, Thinking Machines Lab, is explicitly focused on AI that "works with people collaboratively" rather than AI that replaces them. Instead of chasing autonomous systems, she's building what she calls "collaborative general intelligence."
Ilya Sutskever co-founded OpenAI. He was its chief scientist. He helped create the technology that started this entire wave. He left in 2024 saying OpenAI had abandoned its original focus on safety in favor of pursuing opportunities for commercialization.
His company, Safe Superintelligence Inc., is built on a heretical premise: the "age of scaling" is over. We've entered an "age of research" where "ideas beat scale." He told Dwarkesh Patel in late 2025 that SSI doesn't need the same massive compute as other labs because its approach is fundamentally different. "If you are doing something different," he said, "do you really need the absolute maximal scale to prove it? I don't think it's true at all."
Then there's Apple. While Google and OpenAI chase ever-larger cloud-based models, Apple is betting on something different: privacy-first, on-device intelligence. Their thesis is that practical utility matters more than raw parameter counts. That smaller, more efficient models running locally will beat massive cloud models in the real world. Apple isn't trying to win the arms race for the biggest brain. They're redefining what "good AI" looks like.
The smartest people are quietly abandoning "Bigger is Better." They just can't say it as loudly as the hype machine demands.
The Engineering Phase Is Over
Here's the uncomfortable truth the industry doesn't want to face: the engineering phase of GenAI is over.
The engineers did their job. They built the infrastructure. They scaled the models. They optimized inference. They squeezed every ounce of performance out of the architecture.
And now they're done. Not because they failed — because they succeeded. They solved the engineering problem. What remains isn't an engineering problem.
The next breakthroughs won't come from faster inference or larger parameter counts. They'll come from asking different questions entirely. Questions like: What should AI actually do? For whom? Toward what end? How do we measure success in ways that matter?
Those aren't engineering questions. And engineers don't know how to ask questions that aren't engineering questions.
We don't need more engineers to move GenAI forward. We need engineers who understand business and value. People who ask "what should this do?" before asking "how fast can we do it?" People who understand that slower, more thoughtful systems might outperform faster, more confident ones.
The future of AI won't be built by people who understand transformers. It will be built by people who understand humans.
The GenAI industry spent hundreds of billions optimizing for confidence over accuracy, benchmarks over value, and speed over understanding. The data is now clear: projects are failing, models are degrading, and hallucinations are getting worse. The engineering phase is over. What comes next requires a different kind of intelligence — the human kind.
Written by Stephen Klein, Founder/CEO of Curiouser.AI
Sources
- Gartner, "Gartner Predicts 30% of Generative AI Projects Will Be Abandoned After Proof of Concept By End of 2025," July 2024
- S&P Global, "Voice of the Enterprise: AI & Machine Learning, Use Cases 2025," May 2025
- RAND Corporation, AI Project Failure Research, cited in multiple 2024–2025 industry analyses
- McKinsey Global Survey, "The State of AI in 2025," November 2025
- Chen, Zaharia, Zou — Stanford University and UC Berkeley, "How Is ChatGPT's Behavior Changing over Time?" July 2023
- OpenAI Technical Report on o3 and o4-mini models, April 2025
- Kalai, Nachum, Vempala, Zhang — "Why Language Models Hallucinate," 2025
- IEEE ComSoc Technology Blog, "AI is Getting Smarter, but Hallucinations Are Getting Worse," citing PHARE dataset research, May 2025
- Bloomberg, "Former OpenAI CTO Murati Unveils Plans for New AI Startup," February 2025
- Dwarkesh Patel interview with Ilya Sutskever, November 2025
- WebProNews, "Apple's Privacy-First AI Strategy: On-Device LLMs by 2026," December 2025
Stephen Klein is Founder/CEO of Curiouser.AI — building AI to amplify human intelligence, not replace it. He teaches at Berkeley and is writing a book with Georgetown on post-automation strategy. Curiouser is community-funded on WeFunder.