David Latz


[Image: a massive, monochrome wave about to break, seen from slightly below]

Original: Matt Shumer (OthersideAI) · February 9, 2026

Something Big Is Happening

TLDR

AI agents now autonomously complete multi-hour expert tasks. Measured capability doubles roughly every 4–7 months. Shumer compares this moment to the 'this seems overblown' phase of Covid, but with far greater implications.

Reasoning Seed

A Reasoning Seed is a structured prompt that you can copy into your AI reasoning tool (Claude, ChatGPT, Obsidian, Notion). It contains the article's thesis and its central tension, ready for your own analysis.

Tension

If the doubling rate holds — who decides which expert tasks fall next?


Key Insights

1 — The February 2026 Quality Leap: A New Era

By late 2025, according to Shumer, the best engineers had already delegated the majority of their coding work to AI. On February 5, 2026, models arrived that “make everything before look like a different epoch.” Anyone who hasn’t tried AI in recent months wouldn’t recognize today’s state of the art.

2 — METR Data: The Doubling Rate Is Accelerating

METR measures the length of real-world tasks, gauged by how long they take human experts, that a model can solve end-to-end without human help. A year ago: ~10 minutes. Then 1 hour, then several hours. The latest result (Claude Opus 4.5, Nov 2025): tasks that take experts nearly 5 hours to complete. Doubling rate: ~7 months, trending toward 4 months.
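To make the arithmetic behind these figures concrete, here is a minimal sketch of what a constant doubling interval would imply. It assumes simple exponential growth, which the article does not guarantee; the function name `projected_horizon_hours` and the 5-hour starting point are illustrative choices, not METR's methodology.

```python
def projected_horizon_hours(current_hours: float,
                            months_ahead: float,
                            doubling_months: float) -> float:
    """Task horizon after `months_ahead`, ASSUMING a constant doubling interval.

    This is a plain exponential extrapolation, not a forecast:
    horizon(t) = current_hours * 2 ** (t / doubling_months)
    """
    if doubling_months <= 0:
        raise ValueError("doubling_months must be positive")
    return current_hours * 2 ** (months_ahead / doubling_months)


if __name__ == "__main__":
    # Starting point of ~5 hours, taken from the summary above.
    for doubling in (7, 4):
        for months in (12, 24):
            hours = projected_horizon_hours(5.0, months, doubling)
            print(f"doubling every {doubling} months, "
                  f"+{months} months: ~{hours:.0f} hours")
```

Under the 4-month assumption, a 5-hour horizon would reach 40 hours after one year and 320 hours after two; under the 7-month assumption the growth is markedly slower. The gap between the two scenarios is why the question of whether the interval really is shortening matters so much.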

3 — AI Builds Itself: GPT-5.3 Codex

OpenAI wrote in the technical documentation for GPT-5.3 Codex (Feb 5, 2026): “Our first model that was instrumental in creating itself.” Early versions debugged their own training and managed deployment. For Shumer, this is a symbolically decisive threshold — self-improving systems have arrived.

4 — Judgment, Not Just Correctness

The latest models make decisions that feel like judgment — “an intuitive sense for the right call, not just the technically correct one.” Shumer describes his own workflow: he formulates in plain English what he wants, leaves for 4 hours, and comes back to finished output — not a draft, but the final product.

5 — The Covid Analogy

“I think we’re in the ‘this seems overblown’ phase of something much, much bigger than Covid.” Shumer explicitly addresses the text to “non-tech friends and family,” which makes it accessible but also vulnerable to accusations of alarmism.

6 — Recommendation: Experiment, Now

Core message on CNBC: “People in the workforce should start to use and experiment with AI tools so they can understand what’s coming.” He implies that access to premium models becomes a differentiating factor — those who use paid tools will be faster than those who don’t.

Critical Assessment

What Holds Up

The capability curve is real and data-backed (METR)
The self-improvement threshold at GPT-5.3 is documented, not speculative
The call to experiment is pragmatic and responsible
Fortune, Microsoft, DEV Community confirm: “The conversation the industry needed”

What Needs Context

Conflict of interest: Shumer is an AI CEO — Forbes calls parts of the text “a sales pitch”
Tone: Fortune criticizes “doomsday packaging” that stifles innovative energy
Track record: The Guardian recalls his “world’s top open-source model” claim, which didn’t hold up
Agency question: DEV Community emphasizes — there is “still a human hand on the tiller.” Trajectory depends on human decisions (funding, regulation, infrastructure)

Discussion Questions for the Next Lab

01 Matching Our Own Experience: Does the described quality curve align with what we see in our projects? Where are the gaps between Shumer’s portrayal and our reality?

02 Service Model Implications: If 5-hour tasks become autonomously solvable — what changes in pricing, staffing, and scoping of our fractional engagements?

03 Judgment vs. Craft: Shumer’s “AI now has judgment” thesis — does this apply to design decisions? Where does human judgment remain irreplaceable?

04 Client Enablement: How do we prepare our clients for these shifts without falling into the alarmism criticized by Fortune?


Glossary

METR An organization that measures AI model capabilities through real-world tasks. The metric captures the length of tasks, gauged by how long they take human experts, that a model can solve autonomously, without human assistance.

Self-Improvement The ability of an AI system to contribute to its own improvement — such as debugging its own training or managing its own deployment. GPT-5.3 Codex is considered the first documented example.

Doubling Rate The time interval at which measurable AI model capabilities double. According to METR data, currently at approximately 7 months, trending toward 4 months.
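As a rough illustration of how such a doubling interval can be derived from two measurements, the sketch below solves the exponential growth relation for the interval. The function name `doubling_time_months` and the example values are hypothetical and chosen only for round numbers; they are not METR data.

```python
import math


def doubling_time_months(horizon_start: float,
                         horizon_end: float,
                         months_elapsed: float) -> float:
    """Doubling interval implied by two horizon measurements.

    Assumes exponential growth between the two points:
    horizon_end = horizon_start * 2 ** (months_elapsed / doubling_time)
    """
    if horizon_start <= 0 or horizon_end <= horizon_start:
        raise ValueError("horizon must be positive and growing")
    if months_elapsed <= 0:
        raise ValueError("months_elapsed must be positive")
    return months_elapsed / math.log2(horizon_end / horizon_start)


if __name__ == "__main__":
    # Hypothetical example: the horizon grows from 1 hour to 4 hours
    # (two doublings) over 14 months.
    print(doubling_time_months(1.0, 4.0, 14.0))  # 7.0
```

Two doublings in 14 months give a 7-month interval. If later measurements imply a shorter interval than earlier ones, that is what "trending toward 4 months" would mean in practice.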

Keep thinking.

This article ends here, but the discussion does not. Panoptia Labs offers structured discussion questions that you can copy directly into your reasoning tool.
