Chatbots Are Entering Their Stone Age

Anthropic and other big AI startups are teaching chatbots “tool use” to make them more useful in the workplace.
[Image: hand tools flying through the air with a yellow and blue overlay effect. Photo-illustration: WIRED Staff; Dual Dual/Getty Images]

For all the bluster about generative artificial intelligence upending the world, the technology has yet to meaningfully transform white-collar work. Workers are dabbling with chatbots for tasks such as drafting emails, and companies are launching countless experiments, but office work hasn’t undergone a major AI reboot.

Perhaps that’s only because we haven’t given chatbots like Google’s Gemini and OpenAI’s ChatGPT the right tools for the job yet; they’re generally restricted to taking in and spitting out text via a chat interface. Things might get more interesting in business settings as AI companies start deploying so-called “AI agents,” which can take action by operating other software on a computer or via the internet.

Anthropic, a competitor to OpenAI, announced a major new product today that attempts to prove the thesis that tool use is needed for AI’s next leap in usefulness. The startup is allowing developers to direct its chatbot Claude to access outside services and software in order to perform more useful tasks. Claude can, for instance, use a calculator to solve the kinds of math problems that vex large language models; be required to access a database containing customer information; or be compelled to make use of other programs on a user’s computer when it would help.
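To make the calculator example concrete, here is a minimal sketch of what sits on the developer's side of tool use: a small registry of functions, plus a dispatcher that runs whichever one the model asks for. Everything here is hypothetical and for illustration only. The request shape (a dict with "name" and "input"), the `TOOLS` registry, and `handle_tool_request` are assumptions, not Anthropic's actual API.

```python
import ast
import operator

# Arithmetic operators the hypothetical calculator tool will accept.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""

    def ev(node):
        # Numeric literals (bools are excluded on purpose).
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        # Binary operations such as 2 + 3 or 4 * 5.
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        # Unary minus, e.g. -7.
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError("unsupported expression")

    return ev(ast.parse(expression, mode="eval").body)


# Hypothetical registry: tool name -> Python function.
TOOLS = {"calculator": calculator}


def handle_tool_request(request: dict):
    """Run the tool a model asked for and return its result.

    The request format is an assumption made for this sketch.
    """
    name = request["name"]
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](**request["input"])


result = handle_tool_request(
    {"name": "calculator", "input": {"expression": "1234 * 5678"}}
)
print(result)
```

In a real deployment the result would be passed back to the model so it could finish its answer. The point of the sketch is only that the arithmetic is done by ordinary code rather than guessed by the language model.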

I’ve written before about how important AI agents that can take action may prove to be, both for the drive to make AI more useful and the quest to create more intelligent machines. Claude’s tool use is a small step toward that goal, and it joins a wave of more useful AI helpers being launched into the world right now.

Anthropic has been working with several companies to help them build Claude-based helpers for their workers. Online tutoring company Study Fetch, for instance, has developed a way for Claude to use different features of its platform to modify the user interface and syllabus content a student is shown.

Other companies are also entering the AI Stone Age. Google demonstrated a handful of prototype AI agents at its I/O developer conference earlier this month, among many other new AI doodads. One of the agents was designed to handle online shopping returns, by hunting for the receipt in a person’s Gmail account, filling out the return form, and scheduling a package pickup.

Google has yet to launch its return-bot for use by the masses, and other companies are also moving cautiously. This is probably in part because getting AI agents to behave is tricky. LLMs do not always correctly identify what they are being asked to achieve, and can make incorrect guesses that break the chain of steps needed to successfully complete a task.

Restricting early AI agents to a particular task or role in a company’s workflow may prove a canny way to make the technology useful. Just as physical robots are typically deployed in carefully controlled environments that minimize the chances they will mess up, keeping AI agents on a tight leash could reduce the potential for mishaps.
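One simple way to picture that tight leash is an allowlist: the agent can see only the tools its role requires, and anything else is refused before it runs. The sketch below is an assumption-laden illustration, not any vendor's implementation. The class `ScopedAgentTools` and the tools `lookup_order` and `issue_refund` are all invented names.

```python
class ScopedAgentTools:
    """Expose only an approved subset of tools to an agent (hypothetical)."""

    def __init__(self, tools, allowed):
        self._tools = dict(tools)
        self._allowed = frozenset(allowed)

    def call(self, name, **kwargs):
        # Refuse anything outside the agent's assigned role.
        if name not in self._allowed:
            raise PermissionError(f"tool not permitted for this agent: {name}")
        return self._tools[name](**kwargs)


# Two made-up tools a customer-service system might have.
def lookup_order(order_id: str) -> dict:
    return {"order_id": order_id, "status": "shipped"}


def issue_refund(order_id: str, amount: float) -> str:
    return f"refunded {amount} for {order_id}"


all_tools = {"lookup_order": lookup_order, "issue_refund": issue_refund}

# This agent's role is limited to answering order-status questions.
status_agent = ScopedAgentTools(all_tools, allowed={"lookup_order"})

print(status_agent.call("lookup_order", order_id="A100"))

try:
    status_agent.call("issue_refund", order_id="A100", amount=25.0)
except PermissionError as err:
    print(err)
```

The design choice is to enforce the restriction in ordinary code outside the model, so a wrong guess by the LLM produces a refusal rather than an unintended action.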

Even those early use cases could prove extremely lucrative. Some big companies already automate common office tasks through what’s known as robotic process automation, or RPA. It often involves recording human workers’ onscreen actions and breaking them into steps that can be repeated by software. AI agents built on the broad capabilities of LLMs could allow a lot more work to be automated. IDC, an analyst firm, says that the RPA market is already worth a tidy $29 billion, but expects an infusion of AI to more than double that to around $65 billion by 2027.

Adept AI, a company cofounded by David Luan, formerly VP of engineering at OpenAI, has been honing AI agents for office work for more than a year. Adept is cagey about who it works with and what its agents do, but the strategy is clear.

“Our agents are already in the 90s [percent] for reliability for our enterprise customers,” Luan says. “The way we did that was to limit the scope of deployment a bit. All the new research we do is to improve reliability for new use cases that we don’t yet do well on.”

A key part of Adept’s plan is to train its AI agents to be better at understanding the goal at hand and the steps required to achieve it. The company hopes that will make the technology flexible enough to help out in all kinds of workplaces. “They need to understand the reward of the actual task at hand,” Luan says. “Not just have the ability to copy existing human behavior.”

The core capabilities needed to make AI agents more useful are also necessary to advance on the grander vision of making machine intelligence more powerful. Right now, the ability to make plans to achieve specific goals is a hallmark of natural intelligence that is notably lacking in LLMs.

It may be an extremely long time before machines attain humanlike intelligence, but the concept of tool use being crucial is evocative given the evolutionary path of Homo sapiens. In the natural world, prehuman hominids began handling crude stone tools for tasks such as cutting animal hides. The fossil record shows how increasingly sophisticated tool use blossomed alongside advancing intelligence, as humans’ dexterity, bipedalism, vision, and brain size progressed. Maybe now it’s time for one of humankind’s most sophisticated tools to develop tool use of its own.