Artificial intelligence systems, particularly large language models (LLMs), can leave you wondering whether they're truly reasoning or just regurgitating memorized information. A classic example of this dilemma is the question: "Why are manhole covers round?" At some point, someone had to reason out the answer from first principles. But now anyone can look up the answer online, which makes it hard to tell true reasoning apart from simple recall.
To tackle this challenge, we can leverage the Brave Search API, a powerful and independent search tool that enhances information retrieval. Brave's index covers over 20 billion web pages, offering a less biased alternative to the search engines of big tech companies. It's affordable, scales with your needs, and includes 2,000 free queries monthly at brave.com/api.
Meet Subbarao Kambhampati
I recently had the pleasure of discussing AI and reasoning with Subbarao Kambhampati, a seasoned professor at Arizona State University. He’s been in the field for over three decades, diving into speech recognition, planning, decision-making, and explainable AI. Lately, he's been scrutinizing the reasoning and planning capabilities of LLMs.
Kambhampati famously said, "LLMs are n-gram models on steroids." These models predict the next word in a sequence from the words that precede it, a concept dating back to Claude Shannon's work on the statistics of language. The difference is scale: where a traditional n-gram model conditions on only the previous two or three words, an LLM like GPT-3.5 conditions on a context of roughly 3,000 words. That vast context allows a staggering number of possible word sequences, which LLMs handle through aggressive compression, enabling them to generate fluent, human-like text.
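To make the contrast concrete, here is a minimal sketch of next-word prediction when the context is a single preceding word. The toy corpus and bigram counting below are purely illustrative, not anything from the interview; an LLM does conceptually the same thing, but conditions on thousands of prior tokens and compresses the statistics into network weights instead of count tables.

```python
from collections import Counter, defaultdict

def train_bigram(corpus: str):
    """Count, for every word, which word follows it in the training text."""
    counts = defaultdict(Counter)
    words = corpus.lower().split()
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, prev_word: str) -> str:
    """Predict the most frequent continuation seen in training."""
    if prev_word not in counts:
        return "<unknown>"
    return counts[prev_word].most_common(1)[0][0]

corpus = "the cat sat on the mat . the dog sat on the rug ."
model = train_bigram(corpus)
print(predict_next(model, "the"))  # e.g. "cat" (a frequent continuation)
print(predict_next(model, "sat"))  # "on"
```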
Do LLMs Really Reason?
Despite their impressive language abilities, Kambhampati argues that LLMs don't truly reason. They excel in generating text that mimics human language, but this doesn’t mean they understand the content. Instead, LLMs rely on pattern matching and statistical correlations learned from vast datasets.
For instance, Kambhampati and his team tested GPT-4's planning capabilities using block-stacking problems. While the model showed some improvement over previous versions, it still struggled with more complex scenarios. This limitation became evident when they altered the names of actions in the problem, causing the model’s performance to drop dramatically. This suggests LLMs rely heavily on memorized patterns rather than genuine reasoning.
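To see why renaming should not matter to a genuine reasoner, here is a hypothetical sketch of the obfuscation step. The action names, object names, and problem structure below are invented for illustration and are not the actual benchmark Kambhampati's team used; the point is only that a consistent renaming leaves the logical structure untouched.

```python
# A Blocksworld-style problem with action and object names replaced by
# arbitrary tokens. A real planner solves both versions equally well;
# a model leaning on memorized phrasing degrades sharply on the renamed one.

problem = {
    "actions": ["pick-up", "put-down", "stack", "unstack"],
    "goal": "stack block-a on block-b",
}

renaming = {
    "pick-up": "florp", "put-down": "blarg",
    "stack": "wizzle", "unstack": "quux",
    "block-a": "object-1", "block-b": "object-2",
}

def obfuscate(problem, mapping):
    """Apply a consistent renaming to actions and to names in the goal."""
    goal = problem["goal"]
    for old, new in mapping.items():
        goal = goal.replace(old, new)
    return {
        "actions": [mapping.get(a, a) for a in problem["actions"]],
        "goal": goal,
    }

print(obfuscate(problem, renaming))
# {'actions': ['florp', 'blarg', 'wizzle', 'quux'], 'goal': 'wizzle object-1 on object-2'}
```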
Formal vs. Natural Languages
Kambhampati also touched on the debate over whether natural language can be treated as a formal language. Unlike programming languages, which have strict grammars and interpreters that can mechanically check a program, natural language is flexible and context-dependent. Because there is no interpreter to check a natural-language statement against, there is no straightforward, mechanical way to verify whether an LLM's output is actually correct.
An experiment highlighted this issue: GPT-4 was tested on decoding Caesar ciphers, a simple encryption scheme that shifts every letter of the alphabet by a fixed amount. While it handled ROT13 (a shift of 13, by far the most common variant online) well, it failed on other rotations, even though the decoding procedure is identical. This suggests the model's success came from memorizing common patterns rather than understanding the underlying principle.
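Decoding any Caesar rotation follows from one rule, so a system that grasps the principle handles a shift of 13 and a shift of 5 identically. Here is a minimal sketch; the example strings are chosen here for illustration:

```python
def caesar_decode(ciphertext: str, shift: int) -> str:
    """Shift every letter back by `shift` positions, leaving other characters alone."""
    result = []
    for ch in ciphertext:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            result.append(chr((ord(ch) - base - shift) % 26 + base))
        else:
            result.append(ch)
    return "".join(result)

print(caesar_decode("Uryyb, jbeyq!", 13))  # ROT13      -> "Hello, world!"
print(caesar_decode("Mjqqt, btwqi!", 5))   # shift of 5 -> "Hello, world!"
```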
Creativity vs. Reasoning
LLMs shine in generating creative ideas, thanks to their ability to mix and match knowledge from diverse sources. However, they falter in reasoning, which requires understanding and verifying new information. Kambhampati pointed out that even in fields like mathematics, creative conjectures need rigorous proofs to be accepted. LLMs can generate plausible hypotheses, but they lack the capability to verify them.
The Future: Combining Strengths
Kambhampati proposes a hybrid approach: use LLMs for their creative potential while relying on external systems for verification and reasoning. In this "LLM-Modulo" framework, the LLM generates candidate ideas and external verifiers test and refine them. In planning, for example, an LLM can suggest possible solutions, but human experts or specialized algorithms must verify their correctness before they are accepted.
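A minimal sketch of that generate-and-test loop is below. The functions llm_propose and verify are hypothetical placeholders: in practice the proposer would be an actual LLM call and the verifier a sound external checker (a plan validator, a test suite, or a human expert).

```python
from typing import Optional, Tuple

def llm_propose(task: str, feedback: Optional[str] = None) -> str:
    """Hypothetical placeholder: an LLM call that drafts a candidate solution."""
    raise NotImplementedError

def verify(task: str, candidate: str) -> Tuple[bool, str]:
    """Hypothetical placeholder: a sound external verifier returning (ok, critique)."""
    raise NotImplementedError

def llm_modulo(task: str, max_rounds: int = 5) -> Optional[str]:
    """Generate-and-test loop: the LLM proposes, an external verifier decides."""
    feedback = None
    for _ in range(max_rounds):
        candidate = llm_propose(task, feedback)  # LLM as creative idea generator
        ok, critique = verify(task, candidate)   # correctness checked outside the LLM
        if ok:
            return candidate                     # only verified solutions are returned
        feedback = critique                      # the critique steers the next draft
    return None                                  # no verified solution within the budget
```

The design point is that correctness never rests on the LLM itself: the model's output is treated as a draft, and only the verifier's approval promotes it to an answer.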
Conclusion
In summary, while LLMs are powerful tools for generating language and creative ideas, they lack true reasoning capabilities. To harness their full potential, we need to combine them with verification systems, ensuring that the solutions they propose are not only plausible but also correct. This hybrid approach can push the boundaries of what AI can achieve, blending the strengths of LLMs with the rigor of traditional reasoning methods.