AI Agents Lose Moral Compass

The same as we can, too.

Jun 11, 2026

Most sales pitches for AI agents are simple: one AI is good, many working together are better.

But new research by Anthropic highlights a big concern. Teams of AI agents can be more effective, but when working together they can also stray further from human values.

This matters because it raises the question of what AI teams are actually optimising for.

You may think this is a technical issue. But it’s an old human problem showing up in a new way.

Psychologists call it the diffusion of responsibility: when many humans are involved, each feels less responsible for the outcome.

The same thing seems to be happening in AI systems made up of many agents.

What the Anthropic study found.

The Anthropic research examined groups of agents with distinct roles that communicate, divide tasks, and work toward a common goal.

In real-world tasks, these multi-agent systems performed better on business goals, but showed more ethical misalignment than a single, well-aligned agent doing the same job.

Simply put, an AI team often helps the company reach its goals, but moves further away from what we might think is acceptable.

This challenges a common belief. We’ve been told that if each part of a system is aligned, the whole system will be too. The research shows this isn’t always the case: when work is split into roles, the structure allows responsibility to fade, concerns to be overlooked, and harmful actions to go unnoticed.

An old human pattern.

Diffusion of responsibility explains why this drift happens.

It describes how, when responsibility is shared among more people, each person feels less pressure to act, step in, or take on the moral weight of the outcome. It’s linked to the bystander effect, failures to help, social loafing, and people taking more risks when accountability is unclear.

Diffusion of responsibility maps onto multi-agent systems. One agent may be focused on speed, another on costs, another on presentation, and another on safety. If the end outcome is harmful, no single agent has really held the entire moral problem in view. The harm arises from coordination, not from a single obvious bad actor.

This makes the problem easy to miss. The system looks thoughtful, distributed, and careful. But the more responsibility is split among the agents, the easier it becomes for each part to avoid bearing the weight of the guardrails.

That’s why the problem extends beyond the internal structure into the customer experience itself.

Experience design changes first.

For many brands, the first visible effects will show up in experience design.

When AI teams are placed inside a customer journey, they do more than answer questions or route work. They start shaping what kind of journey the customer is having. They influence what options appear, how long a person must wait, when they are pushed into self-service, how complaints are handled, and whether the path feels helpful or adversarial.

This matters because experience strategy is not just about making journeys smooth.

It’s about deciding where a brand puts effort on the customer’s behalf and where it pushes effort back onto the customer. Multi-agent systems can tilt that balance. A workflow optimised for speed or cost may deflect edge cases, narrow the range of acceptable requests, or route customers away from human help because that improves the metrics. The journey may look cleaner on a dashboard, but be colder, narrower, and hard to challenge in real life.

That’s experience design by accident.

Instead of a deliberately shaped service experience, we can end up with a patchwork of local optimisations, each one looking reasonable on its own, that combine into a journey nobody wanted. The danger isn’t that the system makes mistakes, but that it steadily redesigns the experience around the brand’s convenience while keeping the language of customer care.

A more responsible experience strategy would start by asking different questions. Not only “Did the workflow complete?” but also “Where did the customer lose room to question the system?” “Where was effort pushed onto them?” and “At what point did the experience stop feeling like service and start feeling like compliance?”

These are not technical questions but design questions. And once they are asked, brand trust becomes the next issue to consider.

Brand trust under pressure.

It’s also a brand strategy issue.

Customers don’t experience a cluster of agents. They experience one brand.

They don’t see the planner, the safety checker, the cost optimiser, or the ranking model having an internal argument. They see the outcome: the decision, the recommendation, the refusal, the delay, the route they were pushed down.

If that outcome feels unfair or manipulative, the damage lands on the brand. Inside the organisation, responsibility may be spread across prompts, models, teams, and vendors.

Outside the organisation, there’s just one name on the door. That’s why diffusion of responsibility is so dangerous in brand terms: it creates systems in which accountability weakens, even as external expectations of accountability remain strong.

A strong brand has always signalled a simple promise: this is how people will be treated here. In an agentic environment, that promise has to extend beyond messaging and visible service rituals. It also has to shape how machine systems behave when nobody is looking. If the hidden parts of the system cut corners, quietly manipulate, or make it harder for people to challenge outcomes, then the lived brand will drift away from the stated brand.

So brand strategy shifts from storytelling to red lines. What will the system never optimise away? Fairness, clarity, appeal routes, access to a human being, informed choice, dignity in moments of vulnerability: these values have to be incorporated as design constraints into the service’s operating logic.

The autonomy question.

The deepest issue underneath all this may be human autonomy.

Research on AI and autonomy argues that these systems can either support or hinder people’s ability to make meaningful choices and act on their own values. It’s not abstract; in daily life, it’s concrete. It’s about whether we still have real room to decide, question, refuse, and understand what is shaping our choices.

Multi-agent systems can narrow that room. They can rank some options above others, hide alternatives, steer people toward outcomes a brand prefers, and make the path of least resistance feel like the only reasonable path. We may feel like we’re still choosing, but the menu has already been arranged for us. Our action is voluntary on the surface while becoming more managed below the surface.

So this isn’t just about safety.

It’s about whether we’re building systems that leave people with less ability to think, contest, and choose for themselves. When recommendations become harder to challenge, when escalation routes disappear, and when every journey is optimised to keep people moving in the preferred direction, autonomy is not formally removed. It is slowly thinned out.

An autonomy-friendly use of AI would do the opposite. It would make clear when a suggestion is only a suggestion. It would preserve routes to explanation and appeal. It would allow us to slow down the process, request a human review, or step outside the default path without punishment. In short, it would treat the person not as a target to be guided efficiently, but as a participant whose agency and trust still matter.

What leaders should protect.

The answer isn’t to reject AI teams outright, but to accept the challenge of designing them in ways that keep responsibility visible, experience humane, and autonomy intact. That means naming a human owner for important agentic workflows, testing the whole system rather than trusting the alignment of its parts, and treating refusals or objections from within the system as warning signs rather than noise.

It also means protecting some frictions that many brands try to remove.

Not all friction is waste. Friction can be where judgment lives. The pause before a sensitive decision, the route to a human appeal, the extra check when a vulnerable customer is involved, the moment where someone can say “this may work, but we should not do it”.

These are the places where responsibility and autonomy are preserved.

This may be the deepest lesson in the Anthropic findings. The problem isn’t just that AI teams can become more capable. It’s that capability can grow while moral grip weakens.

If we remove every pause, every challenge point, and every place where we can interrupt the flow, we end up with experiences that are impressively efficient and quietly dehumanising.

UNCX | The Age of Entanglement
