
Something Big Is Happening

AI agents now autonomously complete tasks that take human experts several hours, and the length of task they can handle doubles every 4 to 7 months. Shumer compares this moment to the 'this seems overblown' phase of Covid, but with far greater implications.

Key Insights

1 — The February 2026 Quality Leap: A New Era

By late 2025, according to Shumer, the best engineers had already delegated the majority of their coding work to AI. On February 5, 2026, models arrived that “make everything before look like a different epoch.” Anyone who hasn’t tried AI in recent months wouldn’t recognize today’s state of the art.

2 — METR Data: The Doubling Rate Is Accelerating

METR measures the length of real-world tasks, expressed as the time a human expert would need, that a model can complete end-to-end without human help. A year ago, that length was ~10 minutes. Then it was 1 hour, then several hours. The latest result (Claude Opus 4.5, Nov 2025): tasks that take experts nearly 5 hours to complete. Doubling rate: ~7 months, trending toward 4 months.
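
To make the doubling figures above concrete, here is a minimal sketch of the arithmetic, assuming the growth is a steady exponential. Neither Shumer nor METR publishes this code. The function names and the 12-month window are illustrative assumptions, and the 5-hour starting point is rounded from the "nearly 5 hours" figure.

```python
import math


def projected_horizon(current_hours: float, months_ahead: float,
                      doubling_months: float) -> float:
    """Task length (in expert hours) a model could handle after `months_ahead`,
    assuming the horizon doubles every `doubling_months` (an assumption)."""
    return current_hours * 2 ** (months_ahead / doubling_months)


def months_until(current_hours: float, target_hours: float,
                 doubling_months: float) -> float:
    """Months needed to grow from `current_hours` to `target_hours`
    under the same steady-doubling assumption."""
    return doubling_months * math.log2(target_hours / current_hours)


if __name__ == "__main__":
    start_hours = 5.0  # rounded from "nearly 5 hours" in the text
    for doubling in (7, 4):  # the two doubling rates quoted above
        after_year = projected_horizon(start_hours, 12, doubling)
        to_work_week = months_until(start_hours, 40, doubling)
        print(f"doubling every {doubling} months: "
              f"{after_year:.1f} h after 12 months, "
              f"40 h reached in {to_work_week:.1f} months")
```

Under these assumptions, a 7-month doubling rate takes a 5-hour horizon to roughly 16 hours within a year, while a 4-month rate takes it to 40 hours. The gap between the two scenarios is the practical meaning of "trending toward 4 months."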

3 — AI Builds Itself: GPT-5.3 Codex

OpenAI wrote in the technical documentation for GPT-5.3 Codex (Feb 5, 2026): “Our first model that was instrumental in creating itself.” Early versions debugged their own training and managed deployment. For Shumer, this is a symbolically decisive threshold — self-improving systems have arrived.

4 — Judgment, Not Just Correctness

The latest models make decisions that feel like judgment — “an intuitive sense for the right call, not just the technically correct one.” Shumer describes his own workflow: he formulates in plain English what he wants, leaves for 4 hours, and comes back to finished output — not a draft, but the final product.

5 — The Covid Analogy

“I think we’re in the ‘this seems overblown’ phase of something much, much bigger than Covid.” Shumer explicitly addresses the text to “non-tech friends and family,” which makes it accessible but also vulnerable to accusations of alarmism.

6 — Recommendation: Experiment, Now

Core message on CNBC: “People in the workforce should start to use and experiment with AI tools so they can understand what’s coming.” He implies that access to premium models becomes a differentiating factor — those who use paid tools will be faster than those who don’t.

Critical Assessment

What Holds Up

What Needs Context

Discussion Questions for the Next Lab

01 Matching Our Own Experience: Does the described quality curve align with what we see in our projects? Where are the gaps between Shumer’s portrayal and our reality?

02 Service Model Implications: If 5-hour tasks become autonomously solvable — what changes in pricing, staffing, and scoping of our fractional engagements?

03 Judgment vs. Craft: Shumer’s “AI now has judgment” thesis — does this apply to design decisions? Where does human judgment remain irreplaceable?

04 Client Enablement: How do we prepare our clients for these shifts without falling into the alarmism criticized by Fortune?

Sources

Glossary

METR: An organization that measures AI model capabilities through real-world tasks. Its metric captures the length of task, expressed as the time a human expert would need, that a model can solve autonomously without human assistance.

Self-Improvement: The ability of an AI system to contribute to its own improvement, such as debugging its own training or managing its own deployment. Shumer's example is GPT-5.3 Codex, which OpenAI describes as its first model that was instrumental in creating itself.

Doubling Rate: The time it takes for the task length a model can complete autonomously to double. According to the METR data cited above, this is currently about 7 months, trending toward 4 months.