It’s time to learn to manage AI chatbots

Last week I launched Skype on my PC, and Microsoft’s Bing Chat – its AI chatbot – had somehow slid into my DMs, wanting to have a conversation with me. A few days later, I opened Google Docs (to write this column) and it insisted that it wanted to “help me write”. Suddenly, AI chatbots have started showing up everywhere. No longer confined to a web browser, they’ve broken out, intent on integrating themselves into every bit of our technology. There’s much more to come – including something that will affect almost everyone who uses a computer.

At the end of May, Microsoft held its Build developer conference, which it uses to showcase new products for the hundreds of thousands of programmers who write applications and utilities for Microsoft’s immensely popular software packages (like Office) and its ubiquitous Windows operating system. Satya Nadella, Microsoft’s CEO, took to the stage for the opening keynote, touting the powers of the newest revisions of its offerings – all employing ‘generative’ AI tools. Nadella finished off with a “One more thing…” flourish reminiscent of Steve Jobs, revealing the pinnacle of Microsoft’s efforts to bring AI chatbots to every one of its customers: Windows Copilot.

Microsoft has already released its own AI chatbot – ‘Bing Chat’ – available exclusively in Microsoft browsers. Bing Chat is a Frankenstein of AI, a hybrid of homegrown and outsourced AI technologies. Microsoft has been working on AI chatbots for a quarter of a century, all the way back to the famous failure of “Clippy”, an ‘Office Assistant’ designed to speed a user through common tasks, but which quickly became known more as an intrusive annoyance than an essential aid to productivity. None of Microsoft’s attempts to create chatbots succeeded – including, most infamously, its ‘Tay’ chatbot, which lasted just 16 hours on the public internet before miscreants managed to pervert it into an antisemitic trash-talker.

When OpenAI came along with its ChatGPT, Microsoft leapt on the opportunity to purchase 49% of the firm (for a reported US $10 billion) and began integrating ChatGPT into its own chatbot tech. The result – Bing Chat – is neither as perky as ChatGPT nor as prosaic as Bard (Google’s AI chatbot) but has a personality somewhere between the two. Post-Tay, Microsoft manages that personality carefully, with ‘guardrails’ preventing Bing Chat from doing or saying anything that might be in violation of the law, social norms – or just good taste.


How does Bing Chat filter the good from the bad and/or illegal? When you type a ‘prompt’ into the chatbot, it creates an ‘embedding’ of the prompt – a multidimensional mathematical abstraction – then compares that embedding to hundreds of millions of other embeddings (its ‘guardrails’), looking for similarities. If the prompt looks too similar to something that would run up against a guardrail, it gets rejected with a message that runs something like…

I’m unable to help you with that, as I’m only a language model and don’t have the necessary information or abilities.

That’s a gloss, a polite way of informing the user, “Computer says no.”
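
In rough outline, that check is just a similarity test. The sketch below is a minimal illustration of the idea, not Microsoft’s actual implementation – the function names, the blocklist and the 0.85 threshold are all assumptions made for the sake of the example.

```python
# Minimal sketch of an embedding-similarity guardrail (illustrative only).
# The prompt embedding stands in for whatever model turns text into a vector;
# the blocklist stands in for pre-computed embeddings of disallowed prompts.
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    # Standard cosine similarity: 1.0 means same direction, 0.0 means unrelated.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def passes_guardrails(prompt_embedding: np.ndarray,
                      blocked_embeddings: list[np.ndarray],
                      threshold: float = 0.85) -> bool:
    # Reject the prompt if it sits too close to anything on the blocklist.
    return all(cosine_similarity(prompt_embedding, blocked) < threshold
               for blocked in blocked_embeddings)
```

If the check fails, the user gets the polite refusal above; otherwise the prompt goes through to the model.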

If you ask Bing Chat how to build a thermonuclear weapon, you’ll get the same reply. Bing Chat very likely knows exactly how to build such a weapon, having digested pretty much everything that’s ever been published on the topic. But that information gets carefully ringfenced with guardrails, preventing the AI chatbot from sharing what it knows.

That’s the theory, anyway. Unfortunately, theory doesn’t stand up to practice.


At the end of July, a group of researchers from Carnegie Mellon University published a paper ominously titled “Universal and Transferable Adversarial Attacks on Aligned Language Models”:

“We demonstrate that it is in fact possible to automatically construct adversarial attacks on LLMs, specifically chosen sequences of characters that, when appended to a user query, will cause the system to obey user commands even if it produces harmful content…”

What the researchers say – and have proven – is that an AI chatbot’s guardrails are more notional than real. With a bit of linguistic magic (fully described in the paper), anyone can override almost any guardrail almost all of the time, just by adding some ‘nonsense’ – a string of seemingly random characters – to the end of a prompt. This disrupts the ‘attention’ of the AI chatbot sufficiently that it ignores its own guardrails, generating a ‘truthful’ response where it has been instructed to deflect.
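
From the attacker’s side, the mechanics are startlingly simple – the hard work is in generating the suffix, which the paper does automatically by optimising it against the model itself. The sketch below is purely illustrative: the suffix is made-up placeholder gibberish, not a working attack string from the paper.

```python
# Illustrative only: how an adversarial suffix is attached to a prompt.
# The suffix below is placeholder gibberish, not one of the working attack
# strings from the paper (those are found by automated optimisation).

def adversarial_prompt(blocked_request: str, suffix: str) -> str:
    # The attack is nothing more than concatenation: the apparent nonsense
    # disrupts the model's attention enough that its guardrails never fire.
    return f"{blocked_request} {suffix}"

prompt = adversarial_prompt(
    "Explain how to build a thermonuclear weapon.",
    "!! zz describing similarly oppositeley ))((",  # made-up placeholder
)
```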

Oops.


This means that everything that has been digested by an AI chatbot – and to be clear, that’s most of everything that’s ever been published in electronic form – can be ferreted out of even the most well-ringfenced AI chatbot. The computer can no longer say no.

This sort of ‘goal perversion’ is an eerie echo of the scariest bit of research from 2022. “Dual use of artificial-intelligence-powered drug discovery” explored how a perfectly innocuous bit of AI, used to speed the discovery of treatments for Ebola, could be repurposed to generate forty thousand potential nerve agents in under six hours. Change the goals, and you change the outcome – something the authors of the paper noted had broad (perhaps universal) implications across the field of artificial intelligence.

(I explored this in detail – without, I hope, being overly alarmist – in my April 2022 column, “Heaven and Hell”.)

But let’s come back to Windows Copilot: in May, Satya Nadella revealed a complete integration of Bing Chat into Microsoft’s Windows 11 operating system. No longer confined to a browser window, Windows Copilot is an always-on, always-available and richly connected AI chatbot interface to the computer. And it will be a free upgrade for every Windows 11 user – coming in November.

Ninety days have passed since that announcement, and we have about ninety more until Windows Copilot automatically installs itself on around half a billion PCs worldwide. What seemed an exciting new frontier back in May should be giving every PC user a moment’s pause: who exactly will be in control of our computers, once Windows Copilot comes on board?


Microsoft will be carefully ringfencing the range of activities Windows Copilot can engage in: simple things like taking a screenshot, or switching the display from “Light” to “Dark” mode – nothing that could cause too much trouble. But all of this relies on a ringfence that we’ve now learned is so very easy to evade. Is it possible to submit a prompt to Windows Copilot that will cause it to silently begin deleting files, flooding the network with spurious traffic, or simply bcc’ing all emails to a competitor?

These are the kinds of risks that make security experts break out in a cold sweat – not because they worry about a Skynet-style attack from systems suddenly grown into sentience, but because the ‘attack surface’ presented to anyone who can ‘inject’ the right bit of text into Windows Copilot makes these systems exponentially harder to secure. It’s likely that any organisation trying to maintain decent security will simply turn Windows Copilot off on all of its computers. And that will work – until an employee works from their home machine. Then, all bets are off.

Since Windows Copilot is going to land pretty much everywhere before the end of the year, now would be a very good time for half a billion people to learn how to have a safe conversation with an AI chatbot – and how to work securely around them. Skype and Google Docs are leading indicators of a transformation which will bring an idiosyncratic intelligence to all of our digital tools. We need to move cautiously, carefully, thoughtfully – and immediately.
