Scaling Law Uncertainty

The assumption that AI capability improves predictably with more compute and data (the “scaling law”) is increasingly in doubt. Evidence of diminishing returns, together with alternative pathways to capability improvement, undermines investment theses built on raw scale.

What It Is

The “scaling hypothesis”, that simply training larger models on more data reliably produces more capable AI, dominated AI research and investment from roughly 2018 to 2024. It justified the enormous infrastructure buildout: if capability scales with compute, then the company that can spend the most on compute will have the most capable models.
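
The compute-to-capability link described above is commonly written as a power law. The sketch below is illustrative only: the functional form (an irreducible floor plus a power-law term) and every constant in it are assumptions chosen for readability, not values taken from Hooker, CSET, or any other source cited here. It shows why, under such a curve, each further 10x of compute buys a smaller absolute improvement.

```python
def loss(compute: float, a: float = 10.0, b: float = 0.05, floor: float = 1.5) -> float:
    """Hypothetical power-law loss curve: floor + a * compute**(-b).

    a, b and floor are made-up illustrative constants, not fitted values.
    """
    return floor + a * compute ** (-b)


def gains_per_decade(start_exp: int, decades: int) -> list[float]:
    """Absolute loss reduction bought by each successive 10x increase in compute."""
    gains = []
    for k in range(start_exp, start_exp + decades):
        gains.append(loss(10.0 ** k) - loss(10.0 ** (k + 1)))
    return gains


if __name__ == "__main__":
    # Each line covers one order of magnitude of (hypothetical) training compute.
    for k, gain in zip(range(20, 25), gains_per_decade(20, 5)):
        print(f"1e{k} -> 1e{k + 1} FLOP: loss falls by {gain:.4f}")
```

Under these assumptions every extra order of magnitude costs ten times more but yields a constant fraction (10**-b) of the previous gain, which is the diminishing-returns pattern the section refers to. Whether real frontier models follow such a curve is exactly what is in dispute.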

Sara Hooker (SSRN, 2026) argues this era is ending: “The relationship between training compute and performance is highly uncertain and rapidly changing.” Alternative levers (architectural innovations, data efficiency, reasoning methods, specialised models) are gaining relative importance. Scaling remains relevant on this view, but raw compute is no longer the dominant determinant of frontier capability.

CSET’s January 2026 report on AI R&D automation adds a further uncertainty: even within AI R&D itself, there is no consensus on whether increasing automation of the research process will accelerate progress or lead to a plateau. Workshop participants who held different assumptions about how AI R&D works disagreed on trajectories even when shown the same empirical evidence.

Why It Matters (for Investors)

If scaling laws are weakening, several investment implications follow. First, NVIDIA’s pricing power depends in part on the assumption that frontier labs will continue scaling compute indefinitely; if scaling becomes less productive, commoditisation risk increases. Second, smaller labs and academic groups may be able to close capability gaps with frontier labs through architectural innovation rather than raw compute spend, which reduces the competitive advantage of massive capital. Third, the timeline for AI reaching various capability thresholds becomes more uncertain, so investments that depend on specific capability milestones carry higher risk.

The VUB “Benchmarks Saturate” study (January 2026) adds a measurement problem: as models approach human-level performance on existing benchmarks, the benchmarks themselves become unreliable measures of true capability improvement, which makes it harder to track progress independently.
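
One way to picture the saturation problem is as a ceiling effect. The sketch below assumes a simple logistic relationship between a model's latent ability and its benchmark accuracy; that model, the `difficulty` parameter, and the ability scale are all hypothetical and are not the method used in the VUB study. It shows how a fixed gap in underlying ability produces a shrinking gap in observed scores as models near the top of the benchmark.

```python
import math


def benchmark_score(ability: float, difficulty: float = 0.0) -> float:
    """Expected accuracy under a hypothetical logistic item-response model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def score_gap(ability: float, delta: float = 1.0) -> float:
    """Observed score gap between two models whose latent ability differs by delta."""
    return benchmark_score(ability + delta) - benchmark_score(ability)


if __name__ == "__main__":
    # The latent gap is held fixed at delta=1.0; only the baseline ability rises.
    for ability in (0.0, 2.0, 5.0):
        print(
            f"baseline ability {ability:.1f}: "
            f"score {benchmark_score(ability):.3f}, "
            f"gap to a stronger model {score_gap(ability):.4f}"
        )
```

In this toy setting the same underlying improvement becomes nearly invisible once scores are close to 100%, so differences between strong models can fall within measurement noise.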

Evidence & Examples

  • Hooker: “Academia has been marginalized from meaningfully participating in AI progress and industry labs have stopped publishing” — creating opacity precisely when measurement is most needed (ssrn-5877662 (1).pdf)
  • Hooker’s thesis: scaling has produced a “massive windfall in capital for industry labs” and “fundamentally reshaped the culture of conducting science” — but the scaling formula is changing, and “key disruptions lie ahead” (ssrn-5877662 (1).pdf)
  • CSET workshop (July 2025): experts disagree on whether increasingly automated AI R&D will accelerate AI progress or cause it to plateau; “new data on how AI R&D automation is progressing in practice may be insufficient to resolve conflicting perspectives”, because the camps start from different assumptions and so interpret the same evidence differently (CSET-When-AI-Builds-AI.pdf)
  • CSET: existing benchmark evaluations are “insufficient for measuring, understanding, and forecasting the trajectory of automated AI R&D” — the measurement infrastructure doesn’t exist to reliably track progress (CSET-When-AI-Builds-AI.pdf)
  • VUB “Benchmarks Saturate” (Jan 2026): when models surpass human-level performance, human-judged benchmarks lose discriminative power, because human judges can no longer distinguish between models that are better than they are (2601.19532v1.pdf)
  • Epoch AI estimates that effective compute for training AI models is rising 10x annually, but Jones (2026) notes that capability per unit of compute has been rising at a similar rate, which makes raw compute spending a less clear signal of frontier advantage (AIandEconomicFuture.pdf)
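
The last bullet can be turned into back-of-envelope arithmetic. The sketch below is a simplification under loud assumptions: it reads “a similar rate” as roughly 10x per year of efficiency gain, it assumes those gains are available to every lab, and it holds the smaller lab's raw compute fixed. The function name and parameters are hypothetical. It estimates how long a lab with less raw compute would need to reach the effective compute a leader has today.

```python
import math


def years_to_match(compute_gap: float, efficiency_growth_per_year: float) -> float:
    """Years until efficiency gains alone offset a raw-compute disadvantage.

    compute_gap: how many times more raw compute the leader has today (>= 1).
    efficiency_growth_per_year: assumed yearly multiplier on capability per
        unit of compute (> 1). Both inputs are illustrative assumptions.
    """
    if compute_gap < 1.0:
        raise ValueError("compute_gap must be at least 1")
    if efficiency_growth_per_year <= 1.0:
        raise ValueError("efficiency_growth_per_year must exceed 1")
    # Solve efficiency_growth_per_year ** t == compute_gap for t.
    return math.log(compute_gap) / math.log(efficiency_growth_per_year)


if __name__ == "__main__":
    for gap in (10.0, 100.0, 1000.0):
        t = years_to_match(gap, efficiency_growth_per_year=10.0)
        print(f"{gap:>6.0f}x less compute: about {t:.1f} years behind today's frontier")
```

The leader keeps moving during that time, so this measures a lag rather than parity. It does illustrate why fast efficiency gains make a given level of compute spending a weaker signal of durable advantage.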

Tensions & Open Questions

  • The “intelligence explosion” uncertainty: CSET documents that scenarios in which AI rapidly self-improves are neither confirmed nor ruled out by current evidence. Such scenarios are low-probability but non-negligible, and difficult to detect in advance. This is the tail risk that most investment analyses do not adequately price.
  • Measurement opacity: If scaling is uncertain, benchmarks are saturating, and industry labs have stopped publishing, then investors are pricing AI capability trajectories with very limited public information. This creates both risk (prices could be very wrong) and opportunity (information advantages are possible for those closest to frontier labs).
  • Non-scaling alternatives: Hooker argues that “more interesting levers of progress” beyond scaling are emerging — but doesn’t specify which. CSET notes that architectural innovation, data efficiency, and improved training methods are candidates. Investments that benefit from architectural improvement rather than raw scale may be better positioned if Hooker is right.
  • Competitive implications for open-source: If scaling advantages diminish, open-source models (which benefit from architectural innovation shared publicly) may close the gap with proprietary frontier models faster. This would accelerate LLM commoditisation.

LLM Commoditization · AI Investment Thesis Capex and Returns · Recursive Self-Improvement and AI R&D · AI Capability Measurement