Something Big Is Happening
TL;DR: AI agents now autonomously complete multi-hour expert tasks. The capability curve doubles every 4–7 months. Shumer compares this moment to the 'this seems overblown' phase of Covid, but with far greater implications.
A Reasoning Seed is a structured prompt that you can copy into your AI reasoning tool (Claude, ChatGPT, Obsidian, Notion). It contains the article's thesis and the central tension, ready for your own analysis.
If the doubling rate holds — who decides which expert tasks fall next?
Key Insights
1 — The February 2026 Quality Leap: A New Era
By late 2025, according to Shumer, the best engineers had already delegated the majority of their coding work to AI. On February 5, 2026, models arrived that “make everything before look like a different epoch.” Anyone who hasn’t tried AI in recent months wouldn’t recognize today’s state of the art.
2 — METR Data: The Doubling Rate Is Accelerating
METR measures how long a task takes a human expert when a model can complete that task end-to-end without human help. A year ago, the figure was about 10 minutes; it then rose to 1 hour and later to several hours. The latest result (Claude Opus 4.5, Nov 2025): tasks that take experts nearly 5 hours to complete. Doubling time: about 7 months, trending toward 4 months.
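To make the doubling arithmetic concrete, here is a minimal sketch of what the two quoted doubling times would imply if they simply held constant. That constancy is a simplifying assumption for illustration, not METR's methodology or a forecast. The function name `projected_horizon_hours` is hypothetical, and the 5-hour starting point is the article's "nearly 5 hours" figure, rounded.

```python
def projected_horizon_hours(current_hours: float,
                            months_ahead: float,
                            doubling_months: float) -> float:
    """Task horizon after `months_ahead` months, ASSUMING a constant doubling time.

    Plain exponential growth: horizon(t) = horizon(0) * 2 ** (t / T),
    where T is the doubling time in months.
    """
    return current_hours * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    start_hours = 5.0  # assumption: the article's "nearly 5 hours", rounded
    for doubling_months in (7.0, 4.0):  # the two doubling times quoted above
        for months_ahead in (12, 24):
            hours = projected_horizon_hours(start_hours, months_ahead, doubling_months)
            print(f"doubling every {doubling_months:.0f} months, "
                  f"{months_ahead} months out: about {hours:.0f} hours")
```

Under these assumptions, a 4-month doubling time gives three doublings per year, so a 5-hour horizon becomes 40 hours after 12 months. With a 7-month doubling time, the same year yields only about 1.7 doublings. The gap between the two scenarios widens quickly, which is why the choice of doubling time matters more than the starting value.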
3 — AI Builds Itself: GPT-5.3 Codex
OpenAI wrote in the technical documentation for GPT-5.3 Codex (Feb 5, 2026): “Our first model that was instrumental in creating itself.” Early versions debugged their own training and managed deployment. For Shumer, this is a symbolically decisive threshold — self-improving systems have arrived.
4 — Judgment, Not Just Correctness
The latest models make decisions that feel like judgment — “an intuitive sense for the right call, not just the technically correct one.” Shumer describes his own workflow: he formulates in plain English what he wants, leaves for 4 hours, and comes back to finished output — not a draft, but the final product.
5 — The Covid Analogy
“I think we’re in the ‘this seems overblown’ phase of something much, much bigger than Covid.” Shumer explicitly addresses the text to “non-tech friends and family,” which makes it accessible but also vulnerable to accusations of alarmism.
6 — Recommendation: Experiment, Now
Core message on CNBC: “People in the workforce should start to use and experiment with AI tools so they can understand what’s coming.” He implies that access to premium models becomes a differentiating factor — those who use paid tools will be faster than those who don’t.
Critical Assessment
What Holds Up
- The capability curve is real and data-backed (METR)
- The self-improvement threshold at GPT-5.3 is documented, not speculative
- The call to experiment is pragmatic and responsible
- Fortune, Microsoft, DEV Community confirm: “The conversation the industry needed”
What Needs Context
- Conflict of interest: Shumer is an AI CEO — Forbes calls parts of the text “a sales pitch”
- Tone: Fortune criticizes “doomsday packaging” that stifles innovative energy
- Track record: The Guardian recalls his “world’s top open-source model” claim, which didn’t hold up
- Agency question: DEV Community emphasizes that there is “still a human hand on the tiller.” The trajectory depends on human decisions (funding, regulation, infrastructure)
Discussion Questions for the Next Lab
01 Matching Our Own Experience: Does the described quality curve align with what we see in our projects? Where are the gaps between Shumer’s portrayal and our reality?
02 Service Model Implications: If 5-hour tasks become autonomously solvable — what changes in pricing, staffing, and scoping of our fractional engagements?
03 Judgment vs. Craft: Shumer’s “AI now has judgment” thesis — does this apply to design decisions? Where does human judgment remain irreplaceable?
04 Client Enablement: How do we prepare our clients for these shifts without falling into the alarmism criticized by Fortune?
Sources
- Original: Matt Shumer — Something Big Is Happening
- Fortune — Something big is happening in AI
- Fortune — Counterpoint: the only thing he got right
- DEV Community — A Response
- Wikipedia — Something Big Is Happening
Glossary
METR: An organization that measures AI model capabilities through real-world tasks. Its metric captures how long a task takes a human expert when a model can complete that task autonomously, without human assistance.
Self-Improvement: The ability of an AI system to contribute to its own improvement, such as debugging its own training or managing its own deployment. GPT-5.3 Codex is considered the first documented example.
Doubling Rate: The time interval at which measurable AI model capabilities double. According to METR data, currently at approximately 7 months, trending toward 4 months.
Keep thinking.
This article ends here, but the discussion does not. On ✳︎ Panoptia Labs you will find structured discussion questions that you can copy directly into your reasoning tool.