Everything Changed While You Were Sleeping
Last week Andrej Karpathy left an AI agent running on a single GPU before going to bed. By morning, the agent had run 700 experiments. And they actually worked.
The way Karpathy described it stuck with me: "I've been doing this iterative optimization manually for two decades. You come up with ideas, implement them, check if they work, come up with new ideas. Seeing the agent do this entire workflow end-to-end and all by itself is wild."
Wild. Because this isn't an "AI helped me" story. This is an "AI worked without me" story.
And this story is no longer confined to research labs.
The last two weeks may have been the moment AI shifted from "what's coming" to "what's here." But the interesting part isn't the model names or the benchmarks. The interesting part is watching the nature of work transform in real time.
Boris Cherny, the creator of Claude Code, said something on Lenny Rachitsky's podcast that stopped me mid-sentence: he hasn't edited a single line of code by hand since November 2025. He ships 10 to 30 pull requests every day. All written by AI. He's still one of the most prolific engineers at Anthropic, just like he was at Instagram. Except now he never touches a keyboard for code.
The first time you hear that, it sounds like hype. Then you look at your own workflow and realize you're heading in the same direction. I know I am.
At Flalingo, the last few months have been exactly this transformation. We replaced Intercom, Pipedrive, and our entire calling system with custom AI pipelines. But the real thing I noticed wasn't about the tools we replaced. It was how I spend my time now. Most of my day isn't writing code anymore. It's defining how AI should work: which step does what, what it checks, where it decides.
That distinction sounds small. It changes everything.
Karpathy calls his approach "autoresearch." You give an AI agent a real training setup. You write your instructions in a markdown file. You start the agent. It modifies the code, runs a 5-minute training cycle, keeps the result if it improved, discards it if it didn't. Repeats all night.
You wake up to an experiment log and a better model.
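The loop Karpathy describes is, at its core, simple hill climbing: propose a change, measure, keep it only if the metric improved. Here's a minimal sketch in Python — `propose_change` and `run_experiment` are hypothetical stand-ins for the agent's code edits and the 5-minute training run, reduced here to nudging two toy hyperparameters:

```python
import random

def propose_change(config):
    # Stand-in for the agent proposing a modification.
    # Here: multiply one hyperparameter by a random factor.
    key = random.choice(list(config))
    candidate = dict(config)
    candidate[key] = candidate[key] * random.uniform(0.5, 2.0)
    return candidate

def run_experiment(config):
    # Stand-in for the short training run; returns a score to maximize.
    # Toy objective: peaks at lr ≈ 0.01 and beta2 ≈ 0.95.
    return -((config["lr"] - 0.01) ** 2 + (config["beta2"] - 0.95) ** 2)

def autoresearch_loop(config, n_experiments=700):
    # Keep a change only if it improved the metric; log every attempt.
    best, best_score, log = config, run_experiment(config), []
    for _ in range(n_experiments):
        candidate = propose_change(best)
        score = run_experiment(candidate)
        if score > best_score:
            best, best_score = candidate, score
        log.append((candidate, score))
    return best, best_score, log

random.seed(0)
best, score, log = autoresearch_loop({"lr": 0.05, "beta2": 0.8})
```

The real version swaps the toy objective for an actual training run and the random nudge for an LLM agent editing code, but the shape of the overnight loop is exactly this.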
The things the agent found weren't trivial either. A missing multiplier in the attention mechanism that made it too diffuse. No regularization on Value Embeddings. Messed up AdamW betas. A banded attention window that was too conservative. These were real oversights in a project Karpathy himself had already spent serious time tuning. They stacked up and dropped the leaderboard time by 11%.
His next step is even more interesting: multiple agents collaborating in parallel. You spin up a swarm, have them find the most promising ideas on smaller models, then promote winners to increasingly larger scales. Humans? They "contribute on the edges."
Shopify CEO Tobi Lutke saw this, adapted it to his own project that same night. Woke up to a 19% improvement in quality scores. The agent-optimized smaller model had outperformed a larger model that was manually configured.
Stop and think about the trajectory here. A year ago we said "vibe coding" — tell AI what you want, it writes code. Six months ago we moved to "agentic engineering" — orchestrate agents, provide oversight. Now Karpathy is showing something one step further: the human's only job is writing a markdown file. Describing how the agent should think. The agent does the rest.
Each step removes another layer of human involvement.
Is This Just for Research Labs?
Not even close.
Claude Code now writes 4% of all public GitHub commits. It went from zero to the most-used AI coding tool in eight months. Cursor doubled its revenue in three months. 73% of engineering teams use AI coding tools daily now — two years ago it was 18%. 41% of all code written globally is AI-generated.
But the data point that really made me think was this one: METR's study found that experienced developers actually work 19% slower with AI tools. Even though they believe they're 20% faster.
That paradox tells you everything. Speed no longer comes from the tool itself; it comes from knowing how to work with AI. Giving the right instructions, setting the right constraints, building the right feedback loops. OpenAI calls this "harness engineering" — the discipline of designing the system around the model. One researcher improved the coding performance of 15 different LLMs in a single afternoon just by changing the harness.
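A "harness" here just means the scaffolding around the model: the loop that assembles context, verifies output, and retries on failure. A toy sketch, with a hypothetical `call_model` standing in for any real LLM API — the point is the verify-and-retry structure, not the model:

```python
def call_model(prompt, attempt):
    # Hypothetical stand-in for an LLM API call; returns a code string.
    # Simulates a model that only produces working code on attempt 2+.
    if attempt >= 2:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b): pass"

def check(code):
    # The harness's automated feedback: run the candidate against a test.
    ns = {}
    try:
        exec(code, ns)
        return ns["add"](2, 3) == 5
    except Exception:
        return False

def harness(task, max_attempts=5):
    # The loop around the model: call, verify, retry on failure.
    for attempt in range(1, max_attempts + 1):
        code = call_model(task, attempt)
        if check(code):
            return code, attempt
    return None, max_attempts

code, attempts = harness("write add(a, b)")
print(attempts)  # → 2
```

Changing the harness — better checks, richer context, smarter retries — can improve results without touching the model at all, which is exactly why one afternoon of harness work can lift 15 different LLMs at once.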
I see this clearly in my own work. At Flalingo, the people who get the best results from AI aren't the best coders. They're the ones who give AI the best context, define the clearest constraints, evaluate output most effectively. It's a new kind of literacy. And it's developing incredibly fast because the tools improve every single week.
Beyond Software
This transformation isn't limited to software either. It started there, but the wave is spreading everywhere.
A few days ago, Anthropic launched Claude Cowork for enterprises. AI agents that work across Google Workspace, DocuSign, Apollo, Clay. HR, finance, legal, engineering. With their own plugin marketplace. Microsoft built their Copilot Cowork on top of Anthropic's Claude technology and announced it days later.
Enterprise software stocks took serious hits in a short period. Adobe, Salesforce, SAP, ServiceNow — all affected. Satya Nadella himself said business applications will collapse in the agentic era. Do you realize he was talking about one of his own company's biggest revenue streams?
Software used to sell "tools." A seat, a license, a subscription. Now the direction is selling "completed work." A company pays maybe $10K a year for accounting software and $120K for an accountant. Next-gen AI is starting to handle both. This is the "services as software" shift and it's no longer speculation.
EdTech is living the same story. The AI education market is growing rapidly and is expected to multiply many times over by 2034. AI-powered platforms are reporting dramatically higher course completion rates. But the real question is which side of a widening gap you're on: tools that sell seats per student, or AI-native platforms that sell learning outcomes. Being on the right side of that gap is probably the most critical strategic decision of the next few years.
The Emotional Dimension
There's an emotional dimension to this too. And I think it's the least discussed but most important part.
Read the prologue Karpathy wrote in his autoresearch repo:
"One day, frontier AI research used to be done by meat computers in between eating, sleeping, having other fun, and synchronizing once in a while using sound wave interconnect in the ritual of 'group meeting.' That era is long gone."
He wrote it as a joke. But like every good joke, there's a kernel of truth in it.
I've been building software for 20 years. My brother and I built Flalingo from zero to where it is today without outside investment. Every line of code, every architectural decision, every integration — all built with my own hands. Now, the moments where I create the most value aren't the moments my hands are on the keyboard. They're the moments I'm thinking about how an AI pipeline should behave under certain conditions, designing an agent's decision tree, evaluating output quality and building feedback loops.
This transition isn't easy. Recalibrating 20 years of reflexes is hard. Suppressing the "let me just do it myself, it'll be faster" instinct is hard. But the reality is: the bottleneck isn't model capability anymore. Models are improving at an incredible pace. The bottleneck is the system design around the model. Asking the right question, providing the right context, defining the right evaluation criteria.
That's why I believe the most valuable skill in the next 12 months won't even be "prompt engineering." That's already becoming table stakes. The most valuable skill will be "AI systems design." The ability to decompose a problem into AI-solvable pieces, build the right feedback mechanisms for autonomous agents, and most importantly: deciding what stays with AI and what stays with humans.
A Few Predictions
By the end of 2026, "AI-assisted" will feel outdated. Even "AI-first" won't be enough. The winning companies will be the ones with "AI-autonomous" processes. Every step requiring human intervention will be seen as an inefficiency to be optimized away.
Karpathy's autoresearch approach won't stay confined to software development. Sales, customer success, content production, product management — any job with a measurable metric that can be reasonably efficiently evaluated will become autonomously optimizable by an agent swarm. As Karpathy put it: "It's worth thinking about whether your problem falls into this bucket too."
"Harness engineering" will go mainstream. Just like "DevOps" emerged as its own discipline a decade ago, the practice of designing environments, constraints, and feedback loops for AI systems will become a standalone expertise.
And maybe most importantly: the companies that adopt this fastest won't be the large enterprises. They'll be small, agile, founder-led teams. Because bureaucracy and the "this is how we do things" reflex are the biggest enemies of AI transformation. A startup CTO can change an AI pipeline overnight. At an enterprise company, that takes six months and three committee meetings.
Karpathy woke up to 700 experiments. Shopify's CEO got a 19% improvement from an agent he started before bed. Boris Cherny hasn't written a line of code in months.
These sound like anecdotes. But they're all different faces of the same thing:
The human is no longer the one doing the work. The human is the one defining how the work gets done.
And the ones who define it best will win.
Simple question: What are your agents doing while you sleep?
Hayreddin Tüzel
CTO & Co-Founder @ Flalingo