

This week, DeepSeek released a paper describing manifold-constrained hyper-connections (mHC), a new architectural approach that dramatically improves training stability and reasoning performance without brute-force scale.¹
Most coverage has framed this as a China-vs-U.S. story. That framing misses what actually matters.
This is a capital structure story. And it has implications that extend far beyond any single paper or company.
The Problem They Solved
One of the persistent challenges in training large language models is signal amplification — the tendency for gradients to explode or vanish during training, creating instability that compounds as models grow larger.
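To make the compounding concrete, here is a toy illustration (my own, not the paper's setup): a per-layer signal gain only slightly above one multiplies across depth into the kind of amplification the paper reports. The specific gain and depth below are hypothetical round numbers chosen to land near the ~3000× figure.

```python
# Toy illustration of compounding signal gain; numbers are hypothetical.
gain_per_layer = 1.05   # hypothetical 5% amplification per layer
layers = 160            # hypothetical network depth

total = gain_per_layer ** layers  # gains multiply layer by layer
print(f"total amplification: {total:.0f}x")  # ≈ 2.5e3, i.e. thousands-fold
```

The point is the exponent: a per-layer imbalance too small to notice in any one layer becomes a thousands-fold swing across a deep network, which is why constraining the mixing at every layer, rather than patching symptoms, matters.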
The industry's dominant response has been compensation through scale: more parameters, more compute, more capital. If instability emerges, throw resources at it until it stabilizes.
DeepSeek took a different approach. They asked whether the instability could be removed architecturally rather than compensated for financially.
Their solution is elegant: constrain the residual mixing matrices to live on a mathematical manifold called the Birkhoff polytope, where all entries are non-negative and each row and column sums to one. They enforce this using the Sinkhorn-Knopp algorithm, a technique from 1967 that alternates row and column normalizations to maintain doubly stochastic matrices.²
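The enforcement step can be sketched in a few lines. This is a generic Sinkhorn-Knopp iteration, not DeepSeek's implementation; the function name, the exponentiation used to make entries positive, and the 4×4 toy matrix are my own illustrative choices. The 20-iteration count comes from the paper (see footnote 2).

```python
import numpy as np

def sinkhorn_knopp(logits, n_iters=20):
    """Push a square matrix toward the Birkhoff polytope
    (doubly stochastic: every row and column sums to 1)
    by alternating row and column normalizations."""
    m = np.exp(logits)  # exponentiate so all entries are strictly positive
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # normalize each row to sum to 1
        m = m / m.sum(axis=0, keepdims=True)  # normalize each column to sum to 1
    return m

rng = np.random.default_rng(0)
m = sinkhorn_knopp(rng.normal(size=(4, 4)))
print(np.allclose(m.sum(axis=1), 1, atol=1e-3))  # rows sum to ~1
print(np.allclose(m.sum(axis=0), 1, atol=1e-3))  # columns sum to ~1
```

Because every row and column of the mixing matrix sums to one, the residual mixing can neither amplify nor attenuate the aggregate signal, which is the mechanism behind the stability result.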
The results:³
• Signal amplification reduced from roughly 3000× to a stable, bounded level
• 2.1% improvement on the BBH reasoning benchmark for the 27B model
• Demonstrated across 3B, 9B, and 27B parameter models
• Roughly 6.7% compute overhead
The benchmarks matter less than the economics. A 6.7% compute overhead for dramatically improved training stability represents a fundamentally different cost curve than the one U.S. incumbents built their businesses around.
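A hypothetical back-of-envelope comparison makes the cost-curve point concrete. All figures below are illustrative placeholders of my own, not from the paper or any cited source; the only grounded number is the 6.7% overhead.

```python
# Back-of-envelope comparison; all baseline figures are hypothetical.
base_cost = 100.0                # baseline training cost (arbitrary units)

mhc_cost = base_cost * 1.067     # architectural fix: +6.7% compute overhead
scale_cost = base_cost * 2.0     # hypothetical brute-force fix: e.g. 2x the compute

print(f"architectural fix: {mhc_cost:.1f}, scale-based fix: {scale_cost:.1f}")
```

Whatever the true multiplier on the brute-force side, paying a single-digit percentage for stability sits on a different cost curve than paying for it with additional scale.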
Why This Is Hard for Incumbents to Absorb
The major U.S. GenAI players — OpenAI, Google, Meta, and others — are no longer purely technology companies. They have become capital structures.
Consider the scale of commitments already made:
• Hyperscalers are projected to spend $342 billion on capex in 2025 alone, a 62% increase from 2024⁴
• The Stargate project has committed $500 billion, with $400+ billion already secured and nearly 7 gigawatts of data center capacity planned⁵
• Data center construction hit $40 billion in June 2025, a single month, up 30% year-over-year⁶
• Investment in AI infrastructure contributed an estimated 92% of U.S. GDP growth in the first half of 2025⁷
A genuine architectural pivot — one that suggests smaller, more efficient models can achieve comparable or superior results — doesn't just require engineering changes. It requires unwinding financial commitments, revising investor expectations, and acknowledging that the moat you've been building may not be the moat that wins.
That's not impossible. But it's very, very hard when you're in the middle of a capital cycle.
The Innovator's Dilemma, Updated
Clayton Christensen's framework remains relevant: incumbents struggle to adopt disruptive innovations not because they can't see them, but because their existing business models create rational incentives to ignore them.
The twist in AI is that the "business model" isn't just revenue. It's capital allocation.
When you've raised $10 billion on the thesis that scale is the path to AGI, you can't easily pivot to "actually, architectural efficiency might matter more." Your investors, your board, your recruiting pitch, your partnerships — they're all built on the scale narrative.
This doesn't mean U.S. labs are doomed. They have extraordinary talent, resources, and the ability to incorporate new ideas. But clean, first-principles architectural pivots almost never come from incumbents at the height of a capital cycle.
They come from constraint.
Historical Pattern Recognition
This pattern has precedent:
Japanese automotive manufacturing (1970s–80s): Detroit's response to competition was bigger factories and more automation. Toyota's response was architectural — lean manufacturing, just-in-time inventory, continuous improvement. The constraint of limited capital and space forced a fundamentally different approach that eventually reshaped the entire industry.
ARM vs. Intel (1990s–2010s): Intel dominated through performance-at-any-cost chip design. ARM, constrained by the power and thermal requirements of mobile devices, optimized for efficiency. When mobile became the dominant computing platform, ARM's architecture won — not because it was more powerful, but because it was more appropriate.
SpaceX vs. Legacy Aerospace: Boeing and Lockheed had decades of expertise and government relationships. SpaceX, forced to compete on cost, redesigned launch economics from first principles. Reusability wasn't just an engineering achievement; it was a capital-constrained response to an incumbent cost structure.
In each case, the incumbent had more resources. The challenger had more freedom.
What This Means for the Next Phase of AI
For the past three years, the implicit assumption in GenAI has been:
Bigger models → More intelligence → More value
DeepSeek's work suggests a different framing:
Better architecture → More efficient intelligence → Broader deployment
These aren't mutually exclusive, but they lead to very different investment theses, competitive dynamics, and market structures.
If intelligence-per-dollar becomes the relevant metric — rather than raw capability at any cost — then the players who win will be those who can deliver useful AI economically, not just impressively.
That shift has profound implications:
For enterprises: The AI procurement decision changes from "who has the most powerful model?" to "who delivers the best results for our budget?" This favors efficiency-focused architectures and may commoditize raw capability faster than incumbents expect.
For startups: The barrier to entry potentially drops. If you don't need billions in GPU access to build competitive models, the playing field widens. This could enable a Cambrian explosion of specialized, efficient AI applications.
For investors: The "picks and shovels" thesis around GPU infrastructure may face headwinds. If architectural innovation reduces compute requirements, the assumption that AI growth automatically means GPU growth becomes questionable.
For the U.S.-China dynamic: This isn't primarily a national competition story, but it has national implications. If efficiency-focused architectures can achieve frontier-level results, export controls on high-end chips become less decisive. You can't embargo architectural innovation.
The Question of Generalization
One important caveat: DeepSeek's results were demonstrated on models up to 27B parameters. The relevant question is whether these architectural insights generalize to frontier-scale models (70B+, 175B+, etc.).
If they do, the implications are transformative.
If they don't — if there's something about extreme scale that reintroduces the instabilities this architecture addresses — then this becomes an interesting approach for certain use cases rather than a paradigm shift.
The honest answer is: we don't know yet. But the direction of inquiry matters. DeepSeek has demonstrated that asking "can we achieve this more efficiently?" yields real results. That question will now be asked more frequently, by more teams, across more architectures.
What I'm Watching
Over the next 12–18 months, I'll be paying attention to:
• Replication and extension: Do other labs confirm and build on these results? Does the approach generalize to larger scales?
• Incumbent response: How do the major U.S. labs respond? Do they incorporate these ideas, dismiss them, or attempt to leapfrog with their own innovations?
• Deployment economics: Do we see meaningful cost reductions in AI inference and training? Does the cost curve bend in ways that change competitive dynamics?
• Narrative shift: Does the investor and media narrative around AI begin to emphasize efficiency alongside scale? How quickly?
• Startup activity: Do we see new companies forming around efficiency-first architectures? Does funding flow toward this thesis?
The Deeper Point
The specific technical details of manifold-constrained hyper-connections matter less than what they represent: a demonstrated alternative to the brute-force scaling paradigm that has dominated AI development.
That alternative may or may not prove decisive. But its existence changes the conversation.
For three years, the implicit message from frontier AI labs has been: "Give us more capital, more compute, more scale, and we'll deliver more intelligence."
DeepSeek's implicit message is different: "What if you don't need all that?"
That's not just a technical question. It's a question about how this industry allocates resources, who gets to compete, and what kind of AI future we're building toward.
The answer isn't yet clear. But the question is now on the table.
And once it's on the table, it doesn't go away.
Written by Stephen Klein, Founder/CEO of Curiouser.AI
Footnotes
¹ Xie, Z., Wei, Y., Cao, H., et al. "mHC: Manifold-Constrained Hyper-Connections." arXiv:2512.24880, December 31, 2025.
² The Sinkhorn-Knopp algorithm was originally published in: Sinkhorn, R. and Knopp, P. "Concerning nonnegative matrices and doubly stochastic matrices." Pacific Journal of Mathematics, 1967. DeepSeek uses 20 iterations per layer during training.
³ Performance metrics from the DeepSeek paper and analyses by MarkTechPost, Analytics Vidhya, and alphaXiv.
⁴ J.P. Morgan Asset Management, "Is AI already driving U.S. growth?" 2025.
⁵ OpenAI, "OpenAI, Oracle, and SoftBank expand Stargate with five new AI data center sites," October 22, 2025.
⁶ Bank of America Institute report, September 2025.
⁷ Harvard economist Jason Furman analysis, September 27, 2025, cited in Fortune.
Stephen Klein is Founder/CEO of Curiouser.AI — building AI to amplify human intelligence, not replace it. He teaches at Berkeley and is writing a book with Georgetown on post-automation strategy. Curiouser is community-funded on WeFunder.