Chatbots Are Entering Their Stone Age

REPORTHY

For all the bluster about generative artificial intelligence upending the world, the technology has yet to meaningfully transform white-collar work. Workers are dabbling with chatbots for tasks such as drafting emails, and companies are launching countless experiments, but office work hasn’t undergone a major AI reboot.

Perhaps that’s only because we haven’t given chatbots like Google’s Gemini and OpenAI’s ChatGPT the right tools for the job yet; they’re generally restricted to taking in and spitting out text via a chat interface. Things might get more interesting in business settings as AI companies start deploying so-called “AI agents,” which can take action by operating other software on a computer or via the internet.

Anthropic, a competitor to OpenAI, announced a major new product today that attempts to prove the thesis that tool use is needed for AI’s next leap in usefulness. The startup is allowing developers to direct its chatbot Claude to access outside services and software in order to perform more useful tasks. Claude can, for instance, use a calculator to solve the kinds of math problems that vex large language models; be required to access a database containing customer information; or be compelled to make use of other programs on a user’s computer when it would help.

I’ve written before about how important AI agents that can take action may prove to be, both for the drive to make AI more useful and the quest to create more intelligent machines. Claude’s tool use is a small step toward the goal of developing these more useful AI helpers being launched into the world right now.

Anthropic has been working with several companies to help them build Claude-based helpers for their workers. Online tutoring company Study Fetch, for instance, has developed a way for Claude to use different features of its platform to modify the user interface and syllabus content a student is shown.

Other companies are also entering the AI Stone Age. Google demonstrated a handful of prototype AI agents at its I/O developer conference earlier this month, among many other new AI doodads. One of the agents was designed to handle online shopping returns, by hunting for the receipt in a person’s Gmail account, filling out the return form, and scheduling a package pickup.

Google has yet to launch its return-bot for use by the masses, and other companies are also moving cautiously. This is probably in part because getting AI agents to behave is tricky. LLMs do not always correctly identify what they are being asked to achieve, and can make incorrect guesses that break the chain of steps needed to successfully complete a task.

Restricting early AI agents to a particular task or role in a company’s workflow may prove a canny way to make the technology useful. Just as physical robots are typically deployed in carefully controlled environments that minimize the chances they will mess up, keeping AI agents on a tight leash could reduce the potential for mishaps.

Even those early use cases could prove extremely lucrative. Some big companies already automate common office tasks through what’s known as robotic process automation, or RPA. It often involves recording human workers’ onscreen actions and breaking them into steps that can be repeated by software. AI agents built on the broad capabilities of LLMs could allow a lot more work to be automated. IDC, an analyst firm, says that the RPA market is already worth a tidy $29 billion, but expects an infusion of AI to more than double that to around $65 billion by 2027.

source