Is math the path to chatbots that don’t make stuff up?

SAN FRANCISCO: On a recent afternoon, Tudor Achim gave a brain teaser to an artificial intelligence bot called Aristotle.

The question involved a 10-by-10 table filled with a hundred numbers. If you collected the smallest number in each row and the largest number in each column, he asked, could the largest of the small numbers ever be greater than the smallest of the large numbers?

The bot correctly answered “No.” But that was not surprising. Popular chatbots such as ChatGPT may give the right answer, too. The difference was that Aristotle had proved that its answer was right. The bot generated a detailed computer program that verified “No” was the correct response.
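The reasoning behind "No" is short. Take the row with the largest row minimum and the column with the smallest column maximum. The entry where they cross is at least that row's minimum and at most that column's maximum, so the first number can never exceed the second. The Python sketch below only spot-checks this on random grids. It is not a proof and not Aristotle's output, and the function names are illustrative assumptions.

```python
import random


def max_of_row_minima(grid):
    """Largest of the smallest numbers found in each row."""
    return max(min(row) for row in grid)


def min_of_column_maxima(grid):
    """Smallest of the largest numbers found in each column."""
    return min(max(col) for col in zip(*grid))


def random_grid(size=10, seed=None):
    """Build a size-by-size grid of random integers (illustrative helper)."""
    rng = random.Random(seed)
    return [[rng.randint(0, 999) for _ in range(size)] for _ in range(size)]


def check_many(trials=1000):
    """Return True if no random grid violates the inequality."""
    for seed in range(trials):
        grid = random_grid(seed=seed)
        if max_of_row_minima(grid) > min_of_column_maxima(grid):
            return False
    return True
```

A random search like this can only fail to find a counterexample. A formal proof, the kind the article describes, rules one out for every possible grid.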

Chatbots including ChatGPT from OpenAI and Gemini from Google can answer questions, write poetry, summarise news articles and generate images. But they also make mistakes that defy common sense. Sometimes, they make stuff up – a phenomenon called hallucination.

Achim, CEO and co-founder of a Silicon Valley startup called Harmonic, is part of a growing effort to build a new kind of AI that never hallucinates. Today, this technology is focused on mathematics. But many leading researchers believe they can extend the same techniques into computer programming and other areas.

Because math is a rigid discipline with formal ways of proving whether an answer is right or wrong, companies such as Harmonic can build AI technologies that check their own answers and learn to produce reliable information.

Google DeepMind, the tech giant’s central AI lab, recently unveiled a system called AlphaProof that operates in this way. Competing in the International Mathematical Olympiad, the premier math competition for high schoolers, the system achieved “silver medal” performance, solving four of the competition’s six problems. It was the first time a machine had reached that level.

“This is a path around hallucinations,” said David Silver, a principal research scientist at Google DeepMind. “Proof is a form of truth.”

Using similar techniques, some researchers believe they can eventually build an AI system that is better at math than any human. That’s the goal of Achim and his co-founder Vlad Tenev, better known as CEO of online stock trading company Robinhood. Their new company, Harmonic, has raised US$75mil (RM312.44mil) in funding from Sequoia Capital and other investors.

Others, such as Silver, believe these techniques can extend even further, leading to AI systems that can verify physical truths as well as mathematical ones.

Around 2017, companies including Google, Microsoft and OpenAI began building large language models. These AI systems often spent months analysing digital text culled from across the Internet, including books, Wikipedia articles and chat logs. (The New York Times sued OpenAI and Microsoft in December for copyright infringement of news content related to AI systems.)

By pinpointing patterns in all that text, these systems learned to generate text of their own, including term papers, poetry and computer code. They could even carry on a conversation.

But the technology also seemed dopey at times. It seemed to just spit out what it had learned from the Internet – unable to verify whether the information was right or wrong, real or completely made-up.

This month, OpenAI unveiled a new version of ChatGPT that was designed to reason through questions. It spends time “thinking”, trying different strategies in an effort to reach the right answer. But it still gets things wrong and makes stuff up.

Researchers such as Achim are beginning to address these problems through mathematics. With math, you can formally prove whether an answer is right or wrong.

About a decade ago, a Microsoft researcher named Leonardo de Moura created a computer programming language specifically for proving mathematical statements. Called Lean, this programming language was originally a tool for human mathematicians. But now that AI systems are skillful enough to generate their own computer code, they can also use Lean.

Harmonic is designing a large language model that can generate its own Lean proofs. The Lean code it generates is not always perfect. But through trial and error, it can learn to verify a solution.

“It is a lot like a human,” Achim said. “If you are trying to solve a math problem, you try certain steps. And if they fail, you try others, until you get them right.”

When Aristotle is asked to answer math problems, it can check the answers. These might be simple questions like “What is 2+2?” Or they might be more complex brain teasers like the one with the 10-by-10 grid of numbers.
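For a sense of what a machine-checked answer looks like, here is the simplest case written in Lean 4. This is an illustrative sketch, not code produced by Aristotle. Lean accepts the line only if both sides evaluate to the same number.

```lean
-- Lean 4: the checker confirms this by evaluating both sides.
example : 2 + 2 = 4 := rfl
```

If the claim were false, say `2 + 2 = 5`, Lean would reject the file rather than return a wrong answer.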

“If the system can output an answer, it is basically guaranteed to be correct,” Achim said.

As Aristotle checks its own answers, it becomes a way of generating enormous amounts of trustworthy digital data that can be used to teach AI systems. In other words, Aristotle can generate data that can be used to improve itself.

Researchers call this “synthetic data” – data produced by AI that can then be used to train AI. Many researchers believe this concept will be a vital part of AI development.
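The loop described above, trial and error followed by keeping only the answers that pass a check, can be sketched in a few lines of Python. Everything here is a toy assumption: `propose_answer` stands in for a language model that is sometimes wrong, and `verify` stands in for a formal checker such as Lean. It is not Harmonic's system.

```python
import random


def propose_answer(a, b, rng):
    """Hypothetical stand-in for a model: usually right, sometimes off by one."""
    return a + b + rng.choice([0, 0, 0, 1, -1])


def verify(a, b, answer):
    """Hypothetical stand-in for a formal checker; here, plain arithmetic."""
    return answer == a + b


def generate_verified_examples(n_problems, max_attempts=5, seed=0):
    """Keep only (question, answer) pairs that pass verification."""
    rng = random.Random(seed)
    dataset = []
    for _ in range(n_problems):
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        for _ in range(max_attempts):
            answer = propose_answer(a, b, rng)
            if verify(a, b, answer):
                # Only verified answers become training data.
                dataset.append((f"{a}+{b}", answer))
                break
    return dataset
```

Because unverified guesses are discarded, every pair in the resulting dataset is correct by construction, which is what makes such data usable for further training.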

Achim and Tenev believe that after years of training, Aristotle will be better at math than any human. “We want it to be as smart as the collection of all the mathematicians in the world,” Tenev said. “We want it to solve problems that have never been solved.”

AI systems can use the same techniques to verify their own computer code, which relies heavily on mathematical logic. And if a system can generate reliable code, it can take actions on the Internet. It becomes what researchers call an AI agent. As these AI systems improve, many researchers say, they could automate almost any digital work.

But researchers are quick to add that these AI systems have limits. Lean code can prove math theorems and verify computer code, but it cannot handle the complex ins and outs of daily life.

“Once you step out of the mathematical realm, things are very different,” said Meta research scientist Angela Fan. There is often no absolute right and wrong that AI systems can learn to work toward as they do in mathematics.

Silver acknowledges this problem. But he also says there are verifiable truths in the real world. A rock is a rock. Sound travels at 343 meters per second. The sun sets in the west. If AI systems pull information from physical reality, they can verify these truths, too.

“Truth can come from the world,” Silver said. “If you can get feedback from the world, you can improve and improve and improve.” – The New York Times