deep-dive · 2026.06.16

What Anthropic's 235,000-user AI study means for your rollout

by paul thomas · 7 min · 1,674 words

In June 2026, Anthropic published a study of how people actually use its coding agent: roughly 235,000 users across 400,000 sessions, over seven months. It is, on the surface, a report about software. Read properly, it is one of the clearest pieces of evidence yet on how organisations get value from AI, and it lands on a conclusion most rollouts learn the hard way.

The headline is not that AI replaces skilled people. It is that AI pays the biggest dividends to people who understand the problem in front of them, in any field, whether or not they can code. If you are a CEO deciding how to put AI to work, that changes what you spend money on, how you scale it, and how you keep it safe.

Here is the detailed read, and what to do with it.

The shape of the work: people decide what, the agent decides how

The first finding is about who is in charge. Anthropic separated the decisions in each session into planning (what to do, which approach, what counts as done) and execution (which files to change, what to write, what to run). The split is consistent:

| Decision type | Made by the person | Made by the AI |
| --- | --- | --- |
| Planning (what to build) | ~70% | ~30% |
| Execution (how to build it) | ~20% | ~80% |

Humans make about 70% of planning decisions; the AI makes about 80% of execution decisions. Recreated from Anthropic's data (2026).

In the report's own words, "people decide what to build, and the agent decides how to build it." The more the person directs the planning, the more work the agent does per instruction: an expert's single prompt triggers around 12 actions and 3,200 words of output, while a novice's triggers around 5 actions and 600 words.

What this means for you. AI is a collaborator that executes; the judgment stays with your people. A rollout that treats AI as a self-driving replacement misreads the data. A rollout that treats it as leverage for people who know what they want is reading it right. The collaboration is the strategy, and it has to be designed, not assumed.

Expertise wins, but it is domain expertise, not coding

The study rated each user's expertise at the specific task, and tracked whether sessions succeeded (with hard evidence like passing tests or committed work). More expertise, more success:

| User expertise | Verified success | At least partial success |
| --- | --- | --- |
| Novice | 15% | 77% |
| Intermediate to expert | 28 to 33% | 91 to 92% |

Verified and partial success rates rising with user expertise, from novice to expert. Recreated from Anthropic's data (2026).

The striking part is what kind of expertise. When Anthropic inferred users' actual occupations, software professionals reached verified success about 30% of the time, and everyone else about 26%. In sessions that produced code, every one of the ten largest occupation groups landed within seven points of software engineers. Management occupations scored highest of all.

The report's conclusion is blunt: "coding agents are making a coding background less relevant to successful programming," while "the ability to steer Claude toward success comes more from command of a domain than from the ability to write code." People without domain understanding got far less from the same tool.

What this means for you (scalability). You do not need to hire a cadre of technical people to capture this, and you should not gate AI to the IT function. The people best placed to get value from AI across your business are the ones who already understand your business: your finance leads, your operations managers, your clinicians, your lawyers. The scalable move is to put capable AI tools in the hands of your existing domain experts in every function, not to build a central "AI team" that the rest of the company queues behind.

Competence beats mastery, which is where the money is

The gains are front-loaded. Moving someone from novice to intermediate produces a bigger jump in success than moving them from intermediate to expert. The report puts it plainly: "a working grasp of the domain captures most of the benefit, while deep specialisation adds only a bit more beyond that."
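
The front-loading can be checked against the verified-success figures in the expertise table earlier in this piece. What follows is a minimal sketch, not the report's own method. It assumes the "28 to 33%" band means roughly 28% for intermediate users and 33% for experts; the article gives only the range, so that split is an assumption.

```python
# Verified-success rates (percent) from the expertise table in this article.
# Assumption: the "28 to 33%" band is read as intermediate = 28, expert = 33.
verified_success = {"novice": 15, "intermediate": 28, "expert": 33}

# Gain at each step, in percentage points.
novice_to_intermediate = verified_success["intermediate"] - verified_success["novice"]
intermediate_to_expert = verified_success["expert"] - verified_success["intermediate"]
total_gain = verified_success["expert"] - verified_success["novice"]

# Share of the full novice-to-expert gain captured by the first step alone.
share_captured = novice_to_intermediate / total_gain

print(f"Novice -> intermediate: +{novice_to_intermediate} points")
print(f"Intermediate -> expert: +{intermediate_to_expert} points")
print(f"Share of total gain from the first step: {share_captured:.0%}")
```

On that reading, the first step is worth 13 points and the second 5, so getting someone to a working grasp captures roughly 72% of the total improvement available.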

What this means for you (cost). This is the most important line in the report for a budget. The cheapest, highest-return investment is not creating a handful of AI specialists. It is getting your whole workforce from novice to competent. Broad, basic capability across many people beats deep expertise in a few, both in impact and in cost per unit of value. If you are weighing an expensive centre of excellence against a programme that lifts everyone to a working grasp, the data favours the second.

Expertise shows up most when things go wrong

The clearest evidence for the value of expertise is in recovery. When sessions hit trouble, experienced users salvaged them and novices gave up:

| When a session hit trouble | Novice | Intermediate to expert |
| --- | --- | --- |
| Still reached verified success | 4% | 15% |
| Abandoned (gave up, nothing produced) | 19% | 5 to 7% |

Novices abandoned troubled sessions roughly three to four times as often. As the report puts it, "part of the value of expertise appears to be the ability to steer the agent in the right direction."
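
For readers who want to see where "three to four times" comes from, here is a short sketch of the arithmetic using only the figures in the recovery table above. The variable names are illustrative.

```python
# Figures from the recovery table above (percent of troubled sessions).
novice_abandoned = 19
experienced_abandoned = (5, 7)   # range reported for intermediate to expert
novice_recovered = 4
experienced_recovered = 15

# Abandonment: novice rate divided by each end of the experienced range.
abandon_ratio_low = novice_abandoned / max(experienced_abandoned)   # 19 / 7
abandon_ratio_high = novice_abandoned / min(experienced_abandoned)  # 19 / 5

# Recovery: how much more often experienced users still reached verified success.
recovery_ratio = experienced_recovered / novice_recovered           # 15 / 4

print(f"Novices abandon {abandon_ratio_low:.1f}x to {abandon_ratio_high:.1f}x as often")
print(f"Experienced users recover {recovery_ratio:.2f}x as often")
```

The abandonment ratio works out at about 2.7 to 3.8, and the recovery ratio at 3.75, so the gap runs in the same direction and at a similar size on both measures.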

There is a second safety signal buried in the method. Anthropic distinguished judged success (it looks done) from verified success (there is hard proof it worked). The gap between the two is the gap every AI rollout has to manage: AI output that looks finished but is subtly wrong.

What this means for you (safety). Safety here is not mainly about the model. It is about keeping experienced human judgment in the loop, especially at two moments: reviewing AI output before it counts, and recovering when the AI goes off track. The teams that get burned are the ones who let inexperienced people ship unverified AI work. Practically, that means training your managers and reviewers to assess AI-assisted work, and treating "looks done" as the start of checking, not the end.
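
One way to make "looks done is the start of checking" concrete is to write it down as a rule that work has to pass. The sketch below is purely illustrative: the `AssistedWork` record, its fields and the status names are hypothetical, not anything from Anthropic's report or from any particular tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssistedWork:
    """One piece of AI-assisted work moving through review (hypothetical model)."""
    title: str
    looks_done: bool                                   # judged success: it appears finished
    evidence: List[str] = field(default_factory=list)  # hard proof, e.g. "tests passed"
    reviewed_by: Optional[str] = None                  # experienced reviewer, if any


def review_status(work: AssistedWork) -> str:
    """Return where the work stands; 'looks done' alone never means accepted."""
    if not work.looks_done:
        return "in progress"
    if not work.evidence:
        return "needs verification"   # looks finished, but nothing proves it works
    if work.reviewed_by is None:
        return "needs review"         # verified, but no experienced sign-off yet
    return "accepted"


draft = AssistedWork("Q3 cash-flow model", looks_done=True)
print(review_status(draft))  # needs verification

draft.evidence.append("reconciled against ledger totals")
draft.reviewed_by = "finance lead"
print(review_status(draft))  # accepted
```

The design choice that matters is the ordering: appearance comes first and counts for nothing on its own, evidence comes second, and an experienced person signs off last.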

The work keeps growing

Over the seven months, what people did with the agent shifted, and the value rose with it:

| Work mode | Oct 2025 | Apr 2026 |
| --- | --- | --- |
| Fixing broken code | 33% | 19% |
| Operating software (deploy, run, monitor) | 14% | 21% |
| Writing and data analysis | ~10% | ~20% |

Sessions fixing code fell from 33% to 19% while operating software and writing and analysis rose, October 2025 to April 2026. Recreated from Anthropic's data (2026).

The estimated value of the average task rose about 27% over the period, with building, operating and fixing tasks all up by roughly a third. People moved from reactive debugging towards end-to-end work.
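
It helps to separate percentage-point shifts from relative change when reading the work-mode table above, because the two tell different stories. A quick sketch using the table's own figures follows; the writing and data analysis numbers are approximate in the source, so treat that row as indicative only.

```python
# Session shares from the work-mode table above, in percent (Oct 2025, Apr 2026).
# The writing and data analysis figures are approximate in the source ("~10%", "~20%").
shares = {
    "Fixing broken code": (33, 19),
    "Operating software": (14, 21),
    "Writing and data analysis": (10, 20),
}

changes = {}
for mode, (start, end) in shares.items():
    point_change = end - start                 # shift in percentage points
    relative_change = (end - start) / start    # change relative to the starting share
    changes[mode] = (point_change, relative_change)
    print(f"{mode}: {point_change:+d} points ({relative_change:+.0%} relative)")
```

Fixing broken code lost 14 points, about 42% of its starting share. Operating software grew by half, and writing and data analysis roughly doubled.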

What this means for you. Whatever your teams attempt with AI this quarter, expect the ambition to climb next quarter. Budget and govern for AI moving up your value chain, not just for it tidying up small tasks. The capability compounds, and so does the value at stake, which raises the cost of getting the human-side controls wrong.

The strategy this points to: collaboration, designed on purpose

Step back and the whole report is about one thing: a working division of labour between human judgment and AI execution. Every finding reinforces it. The tool executes; the person supplies the understanding, the direction and the recovery. Value appears where that collaboration is well set up, and stalls where it isn't.

That is the strategy a CEO should build, and it is not a technology strategy. It is a collaboration strategy. Here is the concrete version.

| Your concern | What the data says | What to do |
| --- | --- | --- |
| Collaboration | People plan, AI executes; value follows good direction | Define explicitly, per team, what AI is trusted to execute and what stays a human decision. Make the division of labour a stated thing, not an accident. |
| Scalability | Domain experts in every field succeed, not just coders | Put AI tools in the hands of your existing experts across all functions. Don't gate it to IT, and don't hire "AI people" to do the thinking for your domain people. |
| Cost | Competence beats mastery; the big gain is novice to intermediate | Spend on lifting the whole workforce to a working grasp, not on a small centre of excellence. That is where the return concentrates. |
| Safety | Experts recover and verify; novices ship and abandon | Keep experienced judgment on review and recovery. Train managers to assess AI work. Treat "looks done" as the cue to verify, never the conclusion. |
| Growth | Tasks got more valuable and more end-to-end over time | Plan for AI to move up your value chain. Budget and govern for rising ambition, not just small wins. |
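
Making the division of labour "a stated thing" can be as simple as writing it down in a form a team can read and a tool can check. The sketch below is one hypothetical shape for that, not a recommendation from the report: the team name, the task names and the `who_decides` helper are all invented for illustration.

```python
# Hypothetical per-team policy: the tasks the AI may execute on its own.
# Anything not listed stays a human decision. All names are illustrative.
POLICY = {
    "finance": {
        "ai_may_execute": {"draft reconciliation", "format report"},
        "reviewer": "finance lead",
    },
}


def who_decides(team: str, decision: str) -> str:
    """Return 'ai' for explicitly delegated tasks, otherwise the team's human reviewer."""
    rules = POLICY[team]
    if decision in rules["ai_may_execute"]:
        return "ai"
    # Default to a human: unlisted decisions are never silently delegated.
    return rules["reviewer"]


print(who_decides("finance", "format report"))    # ai
print(who_decides("finance", "approve payment"))  # finance lead
```

The default is the point. Delegation to the AI has to be explicit, and everything else falls back to a named person.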

The bottom line

The most reassuring finding in the report, if you run a business with experienced people, is that AI does not make their experience worthless. It makes it more valuable. The data says success comes from understanding the problem, and that AI rewards the people who have that understanding with more leverage, not less.

That is also why AI rollouts stall where they stall. Not on the model, which is mostly fine, but on whether the organisation puts judgment in the right place and designs the collaboration deliberately. Anthropic now has around 400,000 sessions' worth of evidence for it. It is not a technology problem. It never was.


Sources

  • Zoe Hitzig, Maxim Massenkoff, Eva Lyubich, Ryan Heller and Peter McCrory, Agentic coding and persistent returns to expertise, Anthropic, 16 June 2026. All figures and quotations are from this report. Charts in this piece are recreated from the report's data and credited accordingly.