The introduction of ChatGPT has brought large language models (LLMs) into widespread use across both tech and non-tech industries. This popularity is primarily due to two factors:
LLMs as a knowledge storehouse: LLMs are trained on a vast amount of internet data and are updated at regular intervals (GPT-3, GPT-3.5, GPT-4, GPT-4o and so on);
Emergent abilities: As LLMs grow, they display abilities not found in smaller models.
Does this mean we have already reached human-level intelligence, which we call artificial general intelligence (AGI)? Gartner defines AGI as a form of AI that possesses the ability to understand, learn and apply knowledge across a wide range of tasks and domains. The road to AGI is long, with one key hurdle being the auto-regressive nature of LLM training, which predicts words based on past sequences. As Yann LeCun, one of the pioneers of AI research, points out, LLMs can drift away from accurate responses because of this auto-regressive nature. Consequently, LLMs have several limitations:
Limited knowledge: While trained on vast data, LLMs lack up-to-date world knowledge.
Limited reasoning: LLMs have limited reasoning capability. As Subbarao Kambhampati points out, LLMs are good knowledge retrievers but not good reasoners.
No dynamicity: LLMs are static and unable to access real-time information.
To overcome these challenges, a more advanced approach is required. This is where agents become crucial.
Agents to the rescue
The concept of the intelligent agent in AI has evolved over two decades, with implementations changing over time. Today, agents are discussed in the context of LLMs. Simply put, an agent is like a Swiss Army knife for LLM challenges: It can help with reasoning, provide a means to get up-to-date information from the internet (solving the dynamicity issue) and can accomplish tasks autonomously. With an LLM as its backbone, an agent formally comprises tools, memory, reasoning (or planning) and action components.
Tools enable agents to access external information -- whether from the internet, databases, or APIs -- allowing them to gather necessary data.
Memory can be short or long-term. Agents use scratchpad memory to temporarily hold results from various sources, while chat history is an example of long-term memory.
The Reasoner allows agents to think methodically, breaking complex tasks into manageable subtasks for effective processing.
Actions: Agents perform actions based on their environment and reasoning, adapting and solving tasks iteratively through feedback; a minimal sketch of how these pieces fit together follows below.
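To make the loop concrete, here is a minimal sketch in Python, assuming hypothetical placeholder names (call_llm, search_web, run_agent) rather than any specific framework's API; a real agent would replace the stubbed LLM call and tool with actual implementations.

# Minimal, illustrative agent loop; all names are hypothetical placeholders.

def call_llm(prompt: str) -> str:
    """Placeholder for the LLM backbone; swap in a real model call."""
    return "FINISH: example answer"

def search_web(query: str) -> str:
    """Placeholder tool that would fetch up-to-date information."""
    return f"(search results for '{query}')"

TOOLS = {"search_web": search_web}  # tools component: access to external data

def run_agent(task: str, max_steps: int = 5) -> str:
    scratchpad = []          # short-term memory for intermediate results
    chat_history = [task]    # long-term memory across the conversation
    for _ in range(max_steps):
        # Reasoner: ask the LLM to plan the next subtask or finish.
        prompt = f"Task: {task}\nScratchpad: {scratchpad}\nNext step?"
        decision = call_llm(prompt)
        if decision.startswith("FINISH:"):
            answer = decision[len("FINISH:"):].strip()
            chat_history.append(answer)
            return answer
        # Action: call the requested tool, e.g. "search_web: latest GPT model".
        tool_name, _, tool_input = decision.partition(":")
        tool = TOOLS.get(tool_name.strip())
        observation = tool(tool_input.strip()) if tool else f"Unknown tool: {tool_name}"
        scratchpad.append(observation)  # feed the result back into memory
    return "No answer within the step budget."

print(run_agent("What is the latest GPT model?"))

In practice, the scratchpad holds tool observations between steps while the chat history persists across turns, mirroring the short-term and long-term memory split described above.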