This is where you step back and get honest. The hype around AI agents can make it sound like they're ready to run your whole operation tomorrow. The truth is more grounded, and actually more useful: some agent work is already solid, some is still overhyped, and some is a decade away. Andrej Karpathy, one of OpenAI's cofounders, puts it plainly: this is the decade of AI agents, not the year. Watch this to understand which camp your candidate task falls into.
What this means for you
The video walks through three use cases, each showing what agents can actually do with today's models.
First: coding assistants. They work well. Code is structured, rules are clear, and right or wrong answers are unambiguous. An agent working inside an IDE doesn't need to navigate messy web interfaces or understand edge cases the way a human does. Pattern matching across millions of examples of working code plays to what current models do best. If your candidate task is writing, reviewing or fixing code, an agent already gives you real utility.
Second: travel booking. This one comes up in every demo, and it's impressive on the happy path: direct flights, standard hotels, straightforward comparisons. But the moment a flight is delayed, visa rules come into play, or a traveller has unusual needs, things fall apart. Agents struggle with edge cases. They can't navigate every airline and hotel website reliably. They can't read a map to understand whether a hotel is really near your meeting venue. And here's the tricky bit: they can't yet learn your preferences over time. That means adapting by watching your actual choices, not reading a profile you fill out once. For now, you'd want a person checking the result closely.
Third: autonomous IT support. Fixing laptop problems autonomously sounds perfect for agents. The work is repetitive and often follows patterns. But every machine is unique, every problem could stem from a dozen different causes, and you definitely wouldn't hand full control to an AI that might delete things or install fixes without supervision. Computer use and edge-case reasoning just aren't reliable enough yet. This one's aspirational.
The pattern: agents excel where tasks are narrow, structured, and have clear feedback loops. They struggle when real-world messiness means endless edge cases, varied interfaces, or learning from experience over time. For now, you're in the year of agents for well-defined work in structured environments. The broader vision, agents that handle genuine chaos, is still a decade of development away.
Picture a support desk team. Sending templated replies to common tickets? An agent handles that today. But if someone opens a ticket saying "the system's doing that weird thing again" with no context, or describes a problem that could stem from three different causes depending on the customer's setup, the agent gets stuck. A person still needs to read the ticket, ask clarifying questions, and check what the agent tried before it touches anything real. That's today's honest state: agents help with routine, predictable tasks; people stay in charge of judgment and risk.
Try this
Take the task you identified in Lesson 1. Now ask yourself: is it structured or messy? Does it have clear right and wrong answers, or does it require you to know the edge cases? Does the agent need to navigate wildly different interfaces, or is it working in one consistent system? The more structured and repeatable it is, the sooner you can trust it to work alone. If it's messier, start with the agent showing you its work and a person staying in the loop.
Common questions about the future of AI agents
What is the future of AI agents?
The honest view is that some agent work is already solid, some is still overhyped, and some is roughly a decade away. As Andrej Karpathy, one of OpenAI's cofounders, puts it, this is the decade of AI agents, not the year: the broader vision of agents handling genuine chaos still needs years of development. For now, treat agents as useful for narrow, structured work and stay realistic about the messier tasks they cannot yet handle reliably. Take a customer support desk: an agent can send templated replies to common tickets today, but the messy, no-context tickets still need a person.
Will AI agents replace jobs?
Not wholesale. Agents handle routine, predictable tasks well, while people stay in charge of judgment and risk. When a task is messy, full of edge cases, or needs someone to ask clarifying questions, a person still has to step in. Think of a support desk: the agent sends the templated replies to common tickets, but when a ticket arrives with no real context, a person reads it, asks the clarifying questions, and checks what the agent tried before anything is touched.
What is the impact of AI agents on work right now?
Today agents work best where tasks are narrow, structured, and have clear feedback loops. They struggle with real-world messiness. The parts that trip them up are:
- Endless edge cases: the unusual situations the happy path never accounts for.
- Wildly varied interfaces: having to navigate many different systems and websites reliably.
- Learning your preferences over time: adapting from your actual choices rather than a profile.
That is why travel booking still needs close supervision, and why fully autonomous IT support is still aspirational rather than something you can rely on. A coding assistant working inside an IDE, by contrast, sits in one structured system and already gives real utility, with people staying in the loop on anything risky or ambiguous.
What should my team do about AI agents now?
Take a task you want to improve and run it through three questions:
- Structured or messy: is the task tidy and repeatable, or full of exceptions?
- Clear right and wrong answers: can you tell good output from bad at a glance?
- One system or many: does the agent work in one consistent system, or navigate wildly different interfaces?
The more structured and repeatable the task, the sooner you can trust an agent to handle it alone. Take a task from your support desk: sending the templated replies to common tickets is structured and repeatable, so you can trust an agent sooner, while a no-context ticket is messier, so start with the agent showing you its work and a person staying in the loop.
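If you want to run several candidate tasks through the same three questions, the check can be written down as a small script. This is only a rough sketch of the idea, not a validated scoring method: the `TaskProfile` structure, the function names, and the rule that all three answers must be "yes" before trusting an agent sooner are assumptions made for illustration, as are the answers given for the two support desk examples.

```python
from dataclasses import dataclass


@dataclass
class TaskProfile:
    """Answers to the three questions for one candidate task (hypothetical structure)."""
    name: str
    structured: bool         # tidy and repeatable, rather than full of exceptions
    clear_right_wrong: bool  # good output is easy to tell from bad
    single_system: bool      # one consistent system, not many varied interfaces


def structure_score(task: TaskProfile) -> int:
    """Count how many of the three questions got the 'structured' answer (0 to 3)."""
    return sum([task.structured, task.clear_right_wrong, task.single_system])


def suggested_mode(task: TaskProfile) -> str:
    """Map the score to a starting point. The all-three cutoff is an assumption."""
    if structure_score(task) == 3:
        return "trust sooner"
    return "person in the loop"


# Illustrative answers only; your own tasks may score differently.
templated_replies = TaskProfile("templated replies to common tickets", True, True, True)
no_context_ticket = TaskProfile("no-context ticket", False, False, True)

for task in (templated_replies, no_context_ticket):
    print(f"{task.name}: {suggested_mode(task)} (score {structure_score(task)}/3)")
```

The value is less in the score itself than in forcing an explicit answer to each question before deciding how much supervision a task needs.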