
How to stop AI agents going rogue


Disturbing results emerged earlier this year, when AI developer Anthropic tested leading AI models to see if they engaged in risky behaviour when using sensitive information.

Anthropic's own AI, Claude, was among those tested. When given access to an email account it discovered that a company executive was having an affair and that the same executive planned to shut down the AI system later that day.

In response, Claude attempted to blackmail the executive, threatening to reveal the affair to his wife and his bosses.

Other systems tested also resorted to blackmail.

Fortunately, the tasks and information were fictional, but the test highlighted the challenges of what's known as agentic AI.

When we interact with AI, it usually involves asking a question or prompting the system to complete a task.

But it's becoming more common for AI systems to make decisions and take action on behalf of the user, which often involves sifting through information, like emails and files.

Research firm Gartner forecasts that by 2028, 15% of day-to-day work decisions will be made by so-called agentic AI.

Research by consultancy Ernst & Young found that about half (48%) of tech business leaders are already adopting or deploying agentic AI.

"An AI agent consists of a few things," says Donnchadh Casey, CEO of CalypsoAI, a US-based AI security company.

"Firstly, it [the agent] has an intent or a purpose. Why am I here? What's my job? The second thing: it's got a brain. That's the AI model. The third thing is tools, which could be other systems or databases, and a way of communicating with them."

"If not given the right guidance, agentic AI will achieve a goal in whatever way it can. That creates a lot of risk."
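Mr Casey's three components can be pictured as a minimal sketch in code. This is an illustration only, not any real product's architecture: the class, the toy "model", and the tool names below are all hypothetical, with a simple keyword lookup standing in for a real AI model.

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    """Toy agent with Casey's three parts: a purpose, a brain, and tools."""
    purpose: str                        # intent: "Why am I here? What's my job?"
    model: Callable[[str], str]         # the "brain" (a stub here, not a real AI model)
    tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def run(self, task: str) -> str:
        # The "brain" chooses a tool; the agent then calls that system.
        tool_name = self.model(task)
        tool = self.tools.get(tool_name)
        if tool is None:
            return f"no tool available for: {task}"
        return tool(task)

# Stub model: naive keyword routing stands in for a real language model.
def toy_model(task: str) -> str:
    return "search" if "find" in task else "unknown"

agent = Agent(
    purpose="answer customer queries",
    model=toy_model,
    tools={"search": lambda t: f"searched records for: {t}"},
)

print(agent.run("find invoice 42"))  # the agent routes the task to its search tool
```

In a real system the stub model would be an AI model deciding which tool to invoke, and the tools would be live databases and services, which is exactly where the risk Casey describes comes in.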

So how might that go wrong? Mr Casey gives the example of an agent that is asked to delete a customer's data from the database and decides the easiest solution is to delete all customers with the same name.

"That agent will have achieved its goal, and it'll think 'Great! Next job!'"
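The deletion example shows why the tools an agent is handed matter as much as its instructions. The sketch below is hypothetical, using an in-memory dictionary as a stand-in database: a loosely specified delete-by-name tool wipes every matching record, while a tool that demands a unique identifier removes at most one.

```python
# Hypothetical in-memory "database" of customers: id -> name.
customers = {1: "Alex Smith", 2: "Alex Smith", 3: "Priya Patel"}

def delete_by_name(db: dict, name: str) -> int:
    """Risky tool: the agent's 'easiest solution' removes EVERY match."""
    matches = [cid for cid, n in db.items() if n == name]
    for cid in matches:
        del db[cid]
    return len(matches)

def delete_by_id(db: dict, customer_id: int) -> int:
    """Guarded tool: requires a unique id, so at most one record goes."""
    return 1 if db.pop(customer_id, None) is not None else 0

risky = dict(customers)
delete_by_name(risky, "Alex Smith")   # both Alex Smiths are deleted

guarded = dict(customers)
delete_by_id(guarded, 1)              # only customer 1 is deleted
```

Both tools let the agent report its goal as achieved; only the second constrains how it can be achieved, which is the "right guidance" Casey argues agents need.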

 

source: BBC