Glossary
Everything you need to understand the world of AI agents, from fundamentals to advanced concepts.
The systematic testing and measurement of AI agent performance against defined benchmarks, scenarios, and quality metrics.
The transfer of an ongoing task or conversation from one AI agent to another, including the relevant context needed for the receiving agent to continue seamlessly.
The mechanisms by which an AI agent stores and retrieves information across interactions, enabling it to maintain context, learn from past actions, and build knowledge over time.
The ability to understand what an AI agent is doing and why, through traces, logs, metrics, and visualizations of the agent's decision-making process.
The coordination of multiple AI agents working together on a complex task, including routing, handoffs, shared memory, and workflow management.
The ability of an AI agent to decompose a complex goal into a sequence of actionable steps and execute them in the right order, adapting the plan as new information emerges.
The process of directing incoming requests or subtasks to the most appropriate specialized agent based on the content, intent, or requirements of the task.
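A minimal sketch of routing, assuming simple keyword rules stand in for a real intent classifier. The three agent functions and the `ROUTES` table are hypothetical names introduced only for illustration.

```python
from typing import Callable

# Hypothetical specialist agents; each just labels the request it receives.
def billing_agent(request: str) -> str:
    return f"billing: {request}"

def support_agent(request: str) -> str:
    return f"support: {request}"

def general_agent(request: str) -> str:
    return f"general: {request}"

# Assumption: keyword sets approximate "content or intent" for this sketch.
ROUTES: list[tuple[set[str], Callable[[str], str]]] = [
    ({"invoice", "refund", "charge"}, billing_agent),
    ({"error", "crash", "bug"}, support_agent),
]

def route(request: str) -> Callable[[str], str]:
    """Return the first agent whose keywords appear in the request."""
    words = set(request.lower().split())
    for keywords, agent in ROUTES:
        if words & keywords:
            return agent
    return general_agent  # fallback when no rule matches

print(route("I need a refund please")("I need a refund please"))
```

Production routers typically replace the keyword match with a classifier or an LLM call, but the shape (inspect the request, pick one handler, fall back to a default) stays the same.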
The execution environment that runs AI agents, managing the loop of observation, reasoning, and action along with tool execution, memory, and error handling.
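The observe, reason, act loop described above can be sketched roughly as follows. This is an illustration under stated assumptions, not any particular framework's runtime: `run_agent`, `scripted_decide`, and the tool table are hypothetical, and a scripted function stands in for the language model.

```python
from typing import Callable

def run_agent(
    decide: Callable[[list[str]], tuple[str, str]],
    tools: dict[str, Callable[[str], str]],
    goal: str,
    max_steps: int = 5,
) -> list[str]:
    """Repeatedly ask `decide` for an action, run the tool, and record the result."""
    history = [f"goal: {goal}"]            # history doubles as simple memory
    for _ in range(max_steps):             # step limit guards against endless loops
        action, argument = decide(history)  # reasoning step
        if action == "finish":
            history.append(f"answer: {argument}")
            break
        try:
            observation = tools[action](argument)  # action step
        except Exception as exc:                   # error handling: record, do not crash
            observation = f"error: {exc!r}"
        history.append(f"{action}({argument}) -> {observation}")  # observation step
    return history

# Stand-in for an LLM: call one tool, then finish with its result.
def scripted_decide(history: list[str]) -> tuple[str, str]:
    if len(history) == 1:
        return ("upper", "hello")
    return ("finish", history[-1].split(" -> ")[-1])

trace = run_agent(scripted_decide, {"upper": str.upper}, "shout hello")
print(trace)
```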
The set of practices, mechanisms, and design patterns that ensure AI agents behave reliably, don't cause harm, and operate within defined boundaries.
An approach to AI systems where models operate with agency — making autonomous decisions, using tools, and pursuing goals over multiple steps without constant human direction.
An autonomous software system that uses a large language model to perceive its environment, make decisions, and take actions to achieve specified goals.
An AI agent that can operate independently over extended periods, making decisions, executing tasks, and recovering from errors without human intervention.
A prompting technique where an AI model is guided to break down complex problems into intermediate reasoning steps before arriving at a final answer.
The maximum amount of text (measured in tokens) that a language model can process in a single interaction, including both input and output.
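Because the limit covers both input and output, callers usually reserve part of the window for the reply and trim older input to fit the rest. A minimal sketch, assuming a whitespace split as a crude stand-in for a real tokenizer (`count_tokens` and `fit_to_window` are hypothetical helpers):

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace words approximate tokens. Real tokenizers differ.
    return len(text.split())

def fit_to_window(messages: list[str], max_tokens: int, reserve_for_output: int) -> list[str]:
    """Keep the most recent messages whose combined size fits the input budget."""
    budget = max_tokens - reserve_for_output
    kept: list[str] = []
    for message in reversed(messages):  # walk from newest to oldest
        cost = count_tokens(message)
        if cost > budget:
            break                       # older messages no longer fit
        kept.append(message)
        budget -= cost
    return list(reversed(kept))         # restore chronological order

messages = ["a b c", "d e", "f g h i"]
print(fit_to_window(messages, max_tokens=10, reserve_for_output=3))
```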
A numerical vector representation of text, images, or other data that captures semantic meaning in a high-dimensional space, enabling similarity comparisons.
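Similarity between two embeddings is commonly measured with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Made-up vectors (assumption): related concepts point in similar directions.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.2, 0.9]

print(cosine_similarity(cat, kitten) > cosine_similarity(cat, invoice))
```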
The process of further training a pre-trained language model on a specific dataset to improve its performance on particular tasks or domains.
A capability of language models that allows them to generate structured function calls with typed parameters, enabling reliable interaction with external APIs and tools.
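On the application side, a structured call has to be parsed, checked, and dispatched to real code. The sketch below assumes the model emits a JSON object with `name` and `arguments` keys. That shape, the `get_weather` tool, and the `REGISTRY` table are illustrative assumptions, not a specific vendor's format.

```python
import inspect
import json
from typing import Any, Callable

def get_weather(city: str, unit: str = "celsius") -> str:
    # Hypothetical tool that returns canned data instead of calling an API.
    return f"22 degrees {unit} in {city}"

REGISTRY: dict[str, Callable[..., Any]] = {"get_weather": get_weather}

def dispatch(call_json: str) -> Any:
    """Parse a function call, check argument names and types, then run it."""
    call = json.loads(call_json)
    func = REGISTRY[call["name"]]
    params = inspect.signature(func).parameters
    for key, value in call["arguments"].items():
        if key not in params:
            raise TypeError(f"unexpected argument: {key}")
        expected = params[key].annotation  # this sketch handles plain classes only
        if not isinstance(value, expected):
            raise TypeError(f"{key} should be {expected.__name__}")
    return func(**call["arguments"])

print(dispatch('{"name": "get_weather", "arguments": {"city": "Paris"}}'))
```

Validating before calling matters because the arguments are model-generated text: a wrong type or an unknown parameter is rejected instead of reaching the tool.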
Safety mechanisms that constrain AI agent behavior, preventing harmful actions, enforcing policies, and ensuring outputs meet quality and compliance standards.
When an AI model generates information that sounds plausible but is factually incorrect, fabricated, or not grounded in the provided context.
A design pattern where an AI agent pauses and requests human approval or input before taking high-stakes or irreversible actions.
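One way to sketch this pattern is an approval gate in front of action execution. The action names in `HIGH_STAKES` and the `execute` helper are hypothetical. In a real system the `approve` callback would prompt a person rather than return a fixed value.

```python
from typing import Callable

# Assumption: the application decides which actions count as high-stakes.
HIGH_STAKES = {"delete_records", "send_payment"}

def execute(action: str, run: Callable[[], str], approve: Callable[[str], bool]) -> str:
    """Run low-risk actions directly; ask a human before high-stakes ones."""
    if action in HIGH_STAKES and not approve(action):
        return f"{action}: blocked, approval denied"
    return run()

# A reviewer who denies everything still lets low-risk actions through.
deny_all = lambda action: False
print(execute("read_report", lambda: "report", deny_all))
print(execute("send_payment", lambda: "paid", deny_all))
```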
A neural network trained on vast amounts of text data that can understand and generate human language, serving as the reasoning engine for AI agents.
An architecture where multiple specialized AI agents collaborate to accomplish complex tasks, each handling a specific part of the workflow.
The practice of designing and optimizing the instructions given to a language model to achieve desired outputs, including system prompts, few-shot examples, and formatting guidelines.
An attack where malicious input attempts to override an AI agent's instructions, causing it to ignore its system prompt and follow attacker-controlled instructions instead.
A technique that enhances LLM responses by retrieving relevant documents from an external knowledge base and including them in the model's context.
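The retrieve-then-include flow might look like the sketch below. To stay self-contained it ranks documents by word overlap, which is a deliberately crude stand-in for embedding-based retrieval. The `retrieve` and `build_prompt` helpers and the prompt wording are assumptions for illustration.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Rank documents by shared words with the query (toy relevance score)."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(query: str, documents: list[str], k: int = 2) -> str:
    """Place the retrieved documents in the context ahead of the question."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "Refunds are issued within 14 days.",
    "The office is closed on Sundays.",
    "Refunds require a receipt.",
]
print(build_prompt("how are refunds issued", docs))
```

The resulting string is what would be sent to the model. Only the top-ranked documents reach the context, so unrelated material (here, the office hours) is left out.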
A search technique that finds results based on meaning rather than exact keyword matching, using vector embeddings to understand the intent behind queries.
A technique for constraining LLM responses to follow a specific format or schema, such as JSON, XML, or typed objects, ensuring reliable downstream processing.
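Constrained generation happens on the model side, but downstream code usually still validates what it receives. A minimal sketch of that check, assuming a flat schema that maps field names to Python types (`SCHEMA` and `parse_structured` are hypothetical, and real projects often use a schema library instead):

```python
import json

# Hypothetical schema: field name -> required Python type.
SCHEMA = {"title": str, "priority": int, "tags": list}

def parse_structured(raw: str, schema: dict[str, type]) -> dict:
    """Parse model output as JSON and verify required fields and their types."""
    data = json.loads(raw)  # raises a ValueError subclass on malformed JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, expected in schema.items():
        if key not in data:
            raise ValueError(f"missing field: {key}")
        # Note: bool is a subclass of int in Python, so true/false pass an int check.
        if not isinstance(data[key], expected):
            raise ValueError(f"{key} should be {expected.__name__}")
    return data

ticket = parse_structured('{"title": "Login fails", "priority": 2, "tags": ["auth"]}', SCHEMA)
print(ticket["priority"])
```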
The basic unit of text that language models process — roughly corresponding to a word or word fragment — used to measure input length, output length, and API costs.
The ability of an AI agent to invoke external tools — APIs, databases, code interpreters, web browsers — to gather information or take actions in the real world.
A specialized database that stores and efficiently searches high-dimensional vector embeddings, enabling semantic similarity search for AI applications.