🚀 Welcome to the forefront of multi-agent AI development! In this tutorial we dive deep into AutoGen, Microsoft's framework for building complex, sophisticated applications from conversational AI agents. We cover the essential concepts and techniques you need for a complete, practical understanding of AutoGen, so you can harness its full potential.
AutoGen v0.4 is a framework for building multi-agent AI systems with an asynchronous, event-driven architecture. Key improvements over previous versions include:
A redesigned Core layer for scalability.
AgentChat, a simplified high-level API for agent interactions.
Extensions for integrating external tools and data.

Figure: AutoGen Ecosystem (reference: https://github.com/microsoft/autogen)
The architecture is divided into three main sections:
APPS: Pre-built and custom applications.
AG Framework: Core components powering agents.
Developer Tools: Tools for testing and building.
Applications Layer
Magentic-One
Purpose: A ready-to-use app showcasing multi-agent collaboration for tasks like group problem-solving.
Role: Acts as a reference implementation for developers to learn from.
Code Reference: Check out microsoft/autogen.
Custom Apps
Purpose: A space for creating your own AI-driven applications.
Use Cases:
Customer support chatbots
AI research assistants
Automated content creation workflows
Flexibility: Can connect with LLMs (like GPT-4 or local models) through the AutoGen framework.
AG Framework Layer
Core
Design: Event-driven, asynchronous system using an actor-style architecture.
Features:
Agents operate independently and respond to events (messages, API calls).
Supports parallel execution for scalability.
Reference: Core Architecture Documentation
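The event-driven, actor-style idea can be sketched with plain `asyncio`. This is a toy illustration, not AutoGen's actual runtime: each "agent" owns a mailbox (a queue) and reacts to events as they arrive.

```python
import asyncio

# Toy actor: each agent owns a mailbox (queue) and reacts to incoming events.
async def actor(name: str, mailbox: asyncio.Queue, log: list) -> None:
    while True:
        event = await mailbox.get()
        if event is None:  # shutdown signal
            break
        log.append(f"{name} handled: {event}")

async def main() -> list:
    log: list = []
    mailbox: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(actor("agent1", mailbox, log))
    await mailbox.put("user message")  # events arrive asynchronously
    await mailbox.put("api call")
    await mailbox.put(None)            # tell the actor to stop
    await worker
    return log

log = asyncio.run(main())
print(log)  # ['agent1 handled: user message', 'agent1 handled: api call']
```

Because each actor only reads from its own mailbox, many actors can run concurrently on one event loop, which is the scalability property the Core layer builds on.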
AgentChat
Purpose: Makes building conversational agents easier.
Features:
Predefined agent types like `AssistantAgent` and `UserProxyAgent`.
Automatically handles message flow and state management.
Reference: AgentChat Guide
Extensions
Purpose: Add extra capabilities to agents.
Types:
Official: Connectors for OpenAI/Azure, math tools, etc.
Custom: Integrate external APIs like weather data or databases.
Reference: Extensions Guide
AgentChat vs. AutoGen Core
| Feature | AgentChat | AutoGen Core |
|---|---|---|
| Abstraction Level | High | Low |
| Primary Focus | LLM-driven chat, multi-modal, tools | Agent orchestration, message routing |
| Complexity | Easier to start | More setup, more control |
| Tool Integration | Built-in (LangChain, custom tools) | Manual, via handlers |
| Multi-Agent Teams | Supported (RoundRobinGroupChat, etc.) | Supported, but manual via runtime & handlers |
| Memory & Context | Built-in support | Must be implemented manually |
Now, let’s start with creating the basic AI Agent.
1️⃣ Load Environment Variables
First, we load API keys securely from a .env file.
```python
from dotenv import load_dotenv

load_dotenv(override=True)
```
2️⃣ Choose Your AI Model
We can use different models as the brain of our assistant. Here’s an example with OpenAI’s GPT-4o-mini and Ollama’s LLaMA 3.2:
```python
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_ext.models.ollama import OllamaChatCompletionClient

model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
ollama_client = OllamaChatCompletionClient(model="llama3.2")
```
3️⃣ Define a User Message
Wrap user input in a TextMessage object:
```python
from autogen_agentchat.messages import TextMessage

message = TextMessage(content="I would like to go to Paris", source="user")
```
4️⃣ Create a Basic Agent
This agent can chat with the user:
```python
from autogen_agentchat.agents import AssistantAgent

agent = AssistantAgent(
    name="airline_agent",
    model_client=model_client,
    system_message="You are a professional and highly knowledgeable assistant for an airline. You provide accurate and detailed information regarding flights, policies, and reservations.",
    model_client_stream=True
)
```
The agent is your AI assistant that interprets messages and generates responses.
5️⃣ Send Messages to the Agent
```python
from autogen_core import CancellationToken

response = await agent.on_messages([message], cancellation_token=CancellationToken())
print(response.chat_message.content)
```
This is where the agent "talks back" to the user.
Here, a cancellation token is passed to the agent. If at some point you call `token.cancel()`, the agent stops processing immediately.
Example: Cancelling mid-stream
```python
import asyncio

async def stop_agent_later(token, delay=2):
    await asyncio.sleep(delay)
    token.cancel()
    print("Token cancelled!")

token = CancellationToken()
asyncio.create_task(stop_agent_later(token))
response = await agent.on_messages([message], cancellation_token=token)
```
After 2 seconds, the token is cancelled. The agent stops generating text even if it hasn't finished.
6️⃣ Create a Local Database of Ticket Prices
We’ll store roundtrip ticket prices for different cities.
```python
import sqlite3, os

# --- Database Setup ---
# Delete the existing database if it exists
if os.path.exists("tickets.db"):
    os.remove("tickets.db")

# Create the database and table
conn = sqlite3.connect("tickets.db")
c = conn.cursor()
c.execute("CREATE TABLE destinations (destination_name TEXT PRIMARY KEY, round_trip_cost REAL)")
conn.commit()
conn.close()

# --- Functions ---
# Save (or update) the round-trip cost for a destination
def save_destination_cost(destination_name, round_trip_cost):
    conn = sqlite3.connect("tickets.db")
    c = conn.cursor()
    c.execute(
        "REPLACE INTO destinations (destination_name, round_trip_cost) VALUES (?, ?)",
        (destination_name.lower(), round_trip_cost),
    )
    conn.commit()
    conn.close()

# Add sample destinations
save_destination_cost("Tokyo", 950)
save_destination_cost("Sydney", 1200)
save_destination_cost("Rio de Janeiro", 800)
save_destination_cost("Cape Town", 1150)
save_destination_cost("Bangkok", 750)
save_destination_cost("Dubai", 680)

# Fetch the round-trip cost for a destination (None if unknown)
def get_destination_cost(destination_name: str) -> float | None:
    conn = sqlite3.connect("tickets.db")
    c = conn.cursor()
    c.execute(
        "SELECT round_trip_cost FROM destinations WHERE destination_name = ?",
        (destination_name.lower(),),
    )
    result = c.fetchone()
    conn.close()
    return result[0] if result else None

print(get_destination_cost("Tokyo"))   # 950.0
print(get_destination_cost("Sydney"))  # 1200.0
```
7️⃣ Create a Smart Agent With Tools
Now the agent can fetch ticket prices using our database function.
```python
smart_agent = AssistantAgent(
    name="smart_airline_agent",
    model_client=model_client,
    system_message="You are a helpful assistant for an airline. You give short, knowledgeable, well-researched answers that include the price of a round-trip ticket.",
    model_client_stream=True,
    tools=[get_destination_cost],
    reflect_on_tool_use=True
)
```
`tools`: Functions the agent can call (like fetching ticket prices from our database).
`reflect_on_tool_use`: Lets the agent explain how it used tools in its reasoning.
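To see why a plain function like `get_destination_cost` is enough, note that tool frameworks generally derive a tool description from the function's signature and docstring. Here is a rough, hypothetical sketch of that idea (not AutoGen's actual schema format):

```python
import inspect

def get_destination_cost(destination_name: str) -> float:
    """Fetch the round-trip ticket cost for a destination."""
    return 0.0  # stub body; only the signature matters here

# Build a tool description from the function's name, docstring, and type hints
sig = inspect.signature(get_destination_cost)
tool_schema = {
    "name": get_destination_cost.__name__,
    "description": inspect.getdoc(get_destination_cost),
    "parameters": {name: param.annotation.__name__ for name, param in sig.parameters.items()},
}
print(tool_schema["name"])        # get_destination_cost
print(tool_schema["parameters"])  # {'destination_name': 'str'}
```

This is why descriptive names, docstrings, and type hints matter: they are what the model "sees" when deciding whether and how to call your tool.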
8️⃣ Ask the Smart Agent a Question
```python
response = await smart_agent.on_messages([message], cancellation_token=CancellationToken())

for inner_message in response.inner_messages:
    print(inner_message.content)

print("Final answer:", response.chat_message.content)
```
You can see both the reasoning steps (`inner_messages`) and the final answer.
Now, let's dive deeper into the AutoGen framework and cover the important concepts:
Multi-modal messages
Structured outputs
LangChain tools
Teams
A. Multi-modal Conversation
You can send images + text to an agent and get a descriptive response.
```python
from io import BytesIO

import requests
from PIL import Image
from autogen_core import Image as AGImage, CancellationToken
from autogen_agentchat.messages import MultiModalMessage
from autogen_agentchat.agents import AssistantAgent
from autogen_ext.models.openai import OpenAIChatCompletionClient

# Load an image from the web
url = "https://www.promptingguide.ai/research/rag"
pil_image = Image.open(BytesIO(requests.get(url).content))
img = AGImage(pil_image)

# Create a multi-modal message
multi_modal_message = MultiModalMessage(
    content=["Describe the content of this image in detail", img],
    source="user"
)

# Create the agent
model_client = OpenAIChatCompletionClient(model="gpt-4o-mini")
image_describer = AssistantAgent(
    name="description_agent",
    model_client=model_client,
    system_message="You are good at providing details from the images"
)

# Get the response
response = await image_describer.on_messages([multi_modal_message], cancellation_token=CancellationToken())
print(response.chat_message.content)
```
Multi-modal messages
In AutoGen, a MultiModalMessage allows you to send more than just text to an agent. It can contain:
Text — instructions, questions, prompts
Images — photos, diagrams, screenshots
(In the future: possibly audio, video, etc., depending on support)
The agent receives a text instruction: "Describe the content of this image in detail". It also receives an image object (`img`) to process alongside the text.
Does GPT-4o-mini read the image?
Yes. `gpt-4o-mini` is a vision-capable model, so it can process images sent this way. AutoGen's `AGImage` wrapper provides the glue: it converts the image into the encoded format the OpenAI API expects and sends it alongside your text instruction.
Essentially, AutoGen handles the image preprocessing for you, so multi-modal prompts are as easy to send as plain text.
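As an illustration of what that preprocessing amounts to (a simplified sketch, not AutoGen's actual code), an image can be encoded as a base64 PNG data URL, which is the shape vision-capable OpenAI endpoints accept:

```python
import base64
from io import BytesIO

from PIL import Image

def to_data_url(pil_image: Image.Image) -> str:
    """Encode a PIL image as a base64 PNG data URL."""
    buffer = BytesIO()
    pil_image.save(buffer, format="PNG")
    encoded = base64.b64encode(buffer.getvalue()).decode("utf-8")
    return f"data:image/png;base64,{encoded}"

# A tiny in-memory image stands in for a real photo or screenshot
sample = Image.new("RGB", (2, 2), color="red")
data_url = to_data_url(sample)
print(data_url[:22])  # data:image/png;base64,
```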
B. Structured Outputs
Agents can return structured data instead of plain text.
```python
from pydantic import BaseModel, Field
from typing import Literal

class ImageDescription(BaseModel):
    scene: str = Field(description="Overall scene of the image")
    message: str = Field(description="What the image is trying to convey")
    style: str = Field(description="Artistic style")
    orientation: Literal["portrait", "landscape", "square"] = Field(description="Image orientation")

image_describer = AssistantAgent(
    name="description_agent",
    model_client=model_client,
    system_message="Provide a description of the image",
    output_content_type=ImageDescription
)

response = await image_describer.on_messages([multi_modal_message], cancellation_token=CancellationToken())
print(response.chat_message.content)
```
You now get a structured `ImageDescription` object with typed fields (scene, message, style, and orientation) instead of free-form text.
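To make the "structured" part concrete, here is how a raw dict (like the JSON the model returns) validates against the same Pydantic schema. The sample values below are invented for illustration:

```python
from typing import Literal

from pydantic import BaseModel, Field

class ImageDescription(BaseModel):
    scene: str = Field(description="Overall scene of the image")
    message: str = Field(description="What the image is trying to convey")
    style: str = Field(description="Artistic style")
    orientation: Literal["portrait", "landscape", "square"] = Field(description="Image orientation")

# Invented sample values standing in for the model's JSON reply
raw_reply = {
    "scene": "A diagram of a retrieval-augmented generation pipeline",
    "message": "How retrieved context feeds into an LLM",
    "style": "flat technical illustration",
    "orientation": "landscape",
}
desc = ImageDescription.model_validate(raw_reply)
print(desc.orientation)  # landscape
```

If the model replied with an orientation outside the allowed `Literal` values, validation would raise an error instead of silently passing bad data downstream.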
C. Using LangChain Tools
Agents can call external tools like search engines or file managers. AutoGen wraps these into agents.
```python
from autogen_ext.tools.langchain import LangChainToolAdapter
from langchain_community.utilities import GoogleSerperAPIWrapper
from langchain_community.agent_toolkits import FileManagementToolkit
from langchain.agents import Tool
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient
from autogen_core import CancellationToken

# Set up LangChain tools
serper = GoogleSerperAPIWrapper()
search_tool = Tool(name="internet_search", func=serper.run, description="Search the internet")
autogen_search_tool = LangChainToolAdapter(search_tool)
file_tools = [LangChainToolAdapter(t) for t in FileManagementToolkit(root_dir="sandbox").get_tools()]

# Create an agent with these tools
agent = AssistantAgent(
    name="researcher",
    model_client=OpenAIChatCompletionClient(model="gpt-4o-mini"),
    tools=[autogen_search_tool] + file_tools,
    reflect_on_tool_use=True
)

prompt = """Find the latest quarterly earnings reports for Tesla and Nvidia.
Search online, write the key financial highlights for both companies to a file called earnings_summary.txt, then summarize which company had a stronger quarter."""

message = TextMessage(content=prompt, source="user")
result = await agent.on_messages([message], cancellation_token=CancellationToken())
print(result.chat_message.content)
```
The agent can search online, write to a file, and compare the two companies in its summary.
D. Team Interactions
AutoGen supports multi-agent collaboration with teams.
```python
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.teams import RoundRobinGroupChat
from autogen_agentchat.conditions import TextMentionTermination

# Create the agents
coder = AssistantAgent(
    "coder",
    model_client=model_client,
    tools=[autogen_search_tool],  # retains the search tool for general knowledge lookup
    system_message="You are an expert Python programmer. Your goal is to write clean, efficient, and well-documented code."
)

reviewer = AssistantAgent(
    "reviewer",
    model_client=model_client,
    system_message="You are a senior code reviewer. You analyze the provided code for bugs, efficiency, and adherence to best practices. Respond with 'APPROVE' when the code is perfect.",
)

# Termination condition: stop once someone says "APPROVE"
termination = TextMentionTermination("APPROVE")

# Round-robin team
team = RoundRobinGroupChat([coder, reviewer], termination_condition=termination, max_turns=20)

prompt = "Write a Python function to efficiently reverse a list in-place and then explain the time complexity. When satisfied, the reviewer must respond with 'APPROVE'."

result = await team.run(task=prompt)
for msg in result.messages:
    print(f"{msg.source}:\n{msg.content}\n")
```
Agents collaborate and review each other's work until the task is approved.
RoundRobinGroupChat in AutoGen
RoundRobinGroupChat is a way to simulate a team of AI agents working together. Instead of just one agent responding, you have multiple agents that can take turns interacting, giving feedback, or performing tasks collaboratively.
It’s like having a mini brainstorming session among agents.
Key Points
Agents take turns responding in a round-robin fashion.
Each agent can see previous messages in the conversation.
Useful for multi-agent workflows:
Primary agent generates ideas
Evaluator agent critiques and approves
You can chain more agents for extra checks, approvals, or modifications
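The turn-taking itself is just a rotation over the member list, which you can picture with `itertools.cycle`:

```python
from itertools import cycle

agents = ["coder", "reviewer"]
turn_order = cycle(agents)

# The first five turns of the conversation (ignoring termination for simplicity)
turns = [next(turn_order) for _ in range(5)]
print(turns)  # ['coder', 'reviewer', 'coder', 'reviewer', 'coder']
```

In the real team, a termination condition (like `TextMentionTermination("APPROVE")`) or `max_turns` breaks the cycle.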
E. Multi-Agent Interaction
In this example, we will simulate three AI agents interacting using AutoGen Core.
```
                 +------------------+
                 |   Task: Judge    |
                 | "Plan best dish" |
                 +------------------+
                          |
                          v
                 +------------------+
                 |   Judge Agent    |
                 |  Collects chef   |
                 |   suggestions    |
                 +------------------+
                  /       |        \
                 v        v         v
        +-----------+ +-----------+ +-----------+
        |   Chef1   | |   Chef2   | |   Chef3   |
        | Favorite  | | Favorite  | | Favorite  |
        | Dish:     | | Dish:     | | Dish:     |
        | Pasta     | | Sushi     | | Tacos     |
        +-----------+ +-----------+ +-----------+
                 \        |         /
                  v       v        v
                 +------------------+
                 |   Judge Agent    |
                 |  Picks the best  |
                 |      dish        |
                 +------------------+
                          |
                          v
                 +------------------+
                 |  Final Decision  |
                 | "Best Dish: X"   |
                 +------------------+
```
1. Setup
```python
import asyncio
from dataclasses import dataclass

from autogen_core import AgentId, MessageContext, RoutedAgent, message_handler
from autogen_core import SingleThreadedAgentRuntime
from autogen_agentchat.agents import AssistantAgent
from autogen_agentchat.messages import TextMessage
from autogen_ext.models.openai import OpenAIChatCompletionClient
from dotenv import load_dotenv

load_dotenv(override=True)
```
`SingleThreadedAgentRuntime`: The local runtime where all agents live and communicate.
`AgentId`: Unique identifier for each agent instance.
`MessageContext`: Context passed to an agent when it handles a message.
`RoutedAgent`: A base agent class that routes incoming messages to its handler methods.
`load_dotenv`: Loads your environment variables (API keys, etc.).
`asyncio`: Runs the asynchronous code (all message passing is async).
2. Define a simple Message
```python
@dataclass
class Message:
    content: str
```
Each message has a single field, `content`. This is what agents send to each other.
3. Define Chef Agents
Each chef responds with a fixed favorite dish (the LLM comes in later, when the judge uses an `AssistantAgent` to pick a winner). In AutoGen Core, an agent is a `RoutedAgent` subclass whose `@message_handler` methods define how it responds:
```python
class ChefAgent(RoutedAgent):
    def __init__(self, chef_name: str, dish: str) -> None:
        super().__init__(f"{chef_name}, a chef with a favorite dish")
        self._chef_name = chef_name
        self._dish = dish

    @message_handler
    async def on_message(self, message: Message, ctx: MessageContext) -> Message:
        return Message(content=f"{self._chef_name} suggests: {self._dish}")
```
Each chef receives a message and responds with their favorite dish.
Example:
"Chef1 suggests: Pasta"
4. Register Chefs
```python
async def register_chefs(runtime: SingleThreadedAgentRuntime) -> None:
    await ChefAgent.register(runtime, "chef1", lambda: ChefAgent("Chef1", "Pasta"))
    await ChefAgent.register(runtime, "chef2", lambda: ChefAgent("Chef2", "Sushi"))
    await ChefAgent.register(runtime, "chef3", lambda: ChefAgent("Chef3", "Tacos"))
```
Registers the three chefs in the runtime.
RoutedAgent.register
What it is: A class method used to register an agent type with a runtime.
Why it's needed:
AutoGen separates agent logic from message delivery: the runtime doesn't know how your agent works until you register it. `RoutedAgent.register(runtime, agent_type, factory_function)` tells the runtime:
"Here's a type of agent (`agent_type`) that handles messages."
"Use this factory function to create an instance when one is needed."
`runtime`: Where this agent will live.
`"chef1"`: The unique type name for this agent.
`lambda: ...`: A factory that returns an instance of the agent, a `RoutedAgent` subclass whose `@message_handler` methods process incoming messages.
Handlers
In AutoGen Core, a `RoutedAgent` doesn't automatically know how to respond to messages. A handler is a method you mark with the `@message_handler` decorator, so the agent knows what to do when a message of a given type arrives.
Think of it like this:
An agent is like a robot.
A message coming in is like someone pressing a button.
The handler is the "function" that decides what the robot should do when that button is pressed.
Without a handler, the agent would receive messages but not know how to respond.
The message type (here, `Message`) is how the runtime knows which handler to call: when a message arrives, the runtime looks up the handler whose type annotation matches and runs it.
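That lookup-and-run behaviour can be pictured as a small dispatch table. This is a toy sketch of the idea, not AutoGen's internals:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Message:
    content: str

handlers: dict = {}

def message_handler(message_type):
    # Toy decorator: route messages of this type to the decorated function
    def wrap(fn):
        handlers[message_type] = fn
        return fn
    return wrap

@message_handler(Message)
async def on_message(message: Message) -> str:
    return f"handled: {message.content}"

async def dispatch(message) -> str:
    # The "runtime" looks up the handler registered for this message type and runs it
    return await handlers[type(message)](message)

result = asyncio.run(dispatch(Message("What's your favorite dish?")))
print(result)  # handled: What's your favorite dish?
```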
Judge Agent using LLM
The judge asks all chefs for their suggestions.
It collects the responses into `chef_suggestions`.
It creates an `AssistantAgent` backed by GPT-4o-mini to analyze the suggestions and pick the best.
It sends the chef suggestions to the LLM and asks for a judgement.
It combines the chef suggestions and the judge's decision into a final message.
```python
class JudgeAgent(RoutedAgent):
    def __init__(self) -> None:
        super().__init__("A judge who picks the best dish")

    @message_handler
    async def on_message(self, message: Message, ctx: MessageContext) -> Message:
        # AgentIds for the chefs
        chef_ids = [AgentId("chef1", "default"), AgentId("chef2", "default"), AgentId("chef3", "default")]

        # Ask each chef for a suggestion
        responses = [
            await self.send_message(Message("What's your favorite dish?"), chef_id)
            for chef_id in chef_ids
        ]

        # Combine the chef suggestions into a single string
        chef_suggestions = "\n".join(r.content for r in responses)

        # Use an LLM to pick the best dish
        model_client = OpenAIChatCompletionClient(model="gpt-4o-mini", temperature=0.7)
        llm_agent = AssistantAgent(
            name="judge_llm",
            model_client=model_client,
            system_message="You are a judge. Choose the best dish among the following chefs."
        )
        prompt_message = TextMessage(
            content=f"The chefs suggested the following dishes:\n{chef_suggestions}\nDecide which dish is the best and explain why briefly.",
            source="user"
        )
        llm_response = await llm_agent.on_messages([prompt_message], ctx.cancellation_token)

        final_decision = f"{chef_suggestions}\n\nJudge's Decision:\n{llm_response.chat_message.content}"
        return Message(content=final_decision)

async def register_judge(runtime: SingleThreadedAgentRuntime) -> None:
    await JudgeAgent.register(runtime, "judge", lambda: JudgeAgent())
```
AgentId
What it is: A unique identifier for an agent instance.
Why it’s needed:
In AutoGen, the runtime can host multiple agents, e.g., `chef1` and `chef2`, and even multiple instances of the same type (distinguished by a key such as "default"). `AgentId` lets the runtime know which agent instance to send the message to.
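Conceptually, an `AgentId` is just a `(type, key)` pair used to look up one specific instance. A toy sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ToyAgentId:
    type: str  # which kind of agent ("chef1", "judge", ...)
    key: str   # which instance of that kind ("default", "session-42", ...)

# The runtime keeps a mapping from id to live agent instance
instances = {
    ToyAgentId("chef1", "default"): "Pasta chef",
    ToyAgentId("chef2", "default"): "Sushi chef",
}
print(instances[ToyAgentId("chef1", "default")])  # Pasta chef
```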
Run the Simulation
```python
async def main():
    runtime = SingleThreadedAgentRuntime()

    # Register chefs and judge
    await register_chefs(runtime)
    await register_judge(runtime)
    runtime.start()

    # Send a message to the Judge
    judge_id = AgentId("judge", "default")
    message = Message(content="Plan the best dish for the dinner party")
    response = await runtime.send_message(message, judge_id)
    print(response.content)

    await runtime.stop()
    await runtime.close()

# Run the async main
asyncio.run(main())
```
Workflow:
Create runtime.
Register all chefs + judge agent.
Start runtime.
Send a message to Judge:
Judge queries all chefs.
Collects responses.
Uses LLM to pick the best dish.
Print final output (all chef suggestions + LLM decision).
Stop and close runtime.
This tutorial has walked you through the essentials of AutoGen, from understanding the core architecture and AgentChat to exploring multi-agent collaboration, structured outputs, tools integration, and real-world examples. By now, you should have a clear picture of how agents are created, communicate, and interact with both LLMs and external tools, as well as how to manage workflows, memory, and multi-modal inputs.
In conclusion, AutoGen equips you with everything needed to build sophisticated, collaborative AI applications. Whether your goal is to create chatbots, research assistants, or custom automation workflows, AutoGen’s flexible framework and rich set of tools empower you to design scalable, intelligent, and responsive agents—unlocking the full potential of AI in your projects.

