Alright, folks, Alex Petrov here, back on agntai.net. Today, I want to talk about something that’s been nagging at me, something I’ve seen a lot of folks trip over when they start building serious AI agents. It’s not about the latest LLM or some fancy new retrieval method. It’s about something far more fundamental, and honestly, a bit more mundane: agent state management in multi-step workflows.
Yeah, I know. Not the sexiest topic. But hear me out. We’re all excited about agents that can do complex tasks, right? Plan a trip, summarize a research paper and then draft an email based on it, or even debug code. The moment you move beyond a single prompt-response cycle, you’re dealing with an agent that needs to remember things, track progress, and adapt. And that’s where state management becomes crucial. If you mess this up, your agent will feel dumb, repetitive, or just plain broken.
I learned this the hard way, like most things in engineering. I was working on a prototype for a client last year – a research assistant agent. The idea was simple: give it a topic, it finds papers, summarizes them, then synthesizes a short report. My initial approach was… let’s just say optimistic. I had a main orchestrator loop calling different tools (search, summarizer, report writer) and passing data around as arguments. It worked for simple cases. But then the client wanted to add a step: “If you find conflicting information, ask for clarification.” Or “If the report is too long, try to condense it.”
Suddenly, my beautiful, simple argument passing became a tangled mess. My orchestrator was trying to keep track of what had been searched, what had been summarized, whether clarification was needed, what the user said in response, and so on. It was like trying to juggle five balls while riding a unicycle. My agent kept forgetting previous steps, repeating searches, or getting stuck in loops. It was embarrassing. That’s when I realized: this isn’t just about passing data; it’s about maintaining a coherent understanding of the agent’s journey through a task.
The Problem: Why Simple Argument Passing Fails
When you start, it’s natural to think of an agent as a series of function calls. You call `search_tool(query)`, get results back, then call `summarize_tool(results)`, get a summary, and so on. The state, in this naive view, is just whatever variables you’re passing around. This works great for linear, predictable workflows.
But real-world agent tasks are rarely linear. They involve:
- Conditional Logic: “If X happens, do Y; otherwise, do Z.”
- Iteration/Loops: “Keep trying to do X until condition Y is met.”
- User Interaction: “Ask the user for input, then adapt based on their response.”
- Tool Failures/Retries: “If tool A fails, try tool B or retry A.”
- Partial Progress: An agent might complete part of a task, save its work, and resume later.
In these scenarios, simply passing a dict of arguments from one function to the next quickly becomes unmanageable. You end up with deeply nested function calls, or a sprawling orchestrator function with hundreds of lines of `if/else` statements trying to manage every permutation of data. It’s a fast track to spaghetti code and bugs that are incredibly hard to trace.
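To make that concrete, here’s roughly what my first orchestrator looked like; a hypothetical sketch with stubbed-out tools (the tool names and behavior are invented for illustration):

```python
from typing import List

# Hypothetical stub tools, named for illustration only.
def search_tool(query: str) -> List[str]:
    return [f"Doc about {query}"]

def summarize_tool(docs: List[str]) -> List[str]:
    return [f"Summary of {doc}" for doc in docs]

def report_writer_tool(summaries: List[str]) -> str:
    return "\n".join(summaries)

def run_research(topic: str) -> str:
    # All "state" lives in local variables handed from call to call.
    # Bolting on "ask for clarification" or "condense if too long" means
    # threading extra flags and partial results through every call, with
    # no single place that knows where the task currently stands.
    results = search_tool(topic)
    summaries = summarize_tool(results)
    return report_writer_tool(summaries)

print(run_research("agent state management"))
```

It works, right up until the workflow stops being a straight line.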
The Solution: Explicit State Machines or Task Graphs
The core idea is to externalize the agent’s “memory” of its current situation, its past actions, and its future intentions. Instead of implicitly managing state through function arguments, we make it an explicit, inspectable part of the agent’s architecture.
There are two main patterns I’ve found particularly effective, depending on the complexity of your workflow:
- Finite State Machines (FSMs): Great for agents with well-defined, discrete states and transitions.
- Task Graphs/DAGs (Directed Acyclic Graphs): Better for more complex, dynamic workflows where sub-tasks might run in parallel or have more intricate dependencies.
Let’s dive into FSMs first, as they’re often a good starting point and surprisingly powerful.
Pattern 1: Finite State Machines (FSMs) for Agent Orchestration
An FSM defines a set of possible states an agent can be in (e.g., `INITIAL`, `SEARCHING`, `SUMMARIZING`, `AWAITING_CLARIFICATION`, `REPORT_DRAFTED`, `FINISHED`). It also defines the transitions between these states, triggered by events or conditions.
The beauty of an FSM is that at any given moment, your agent is in *one* specific state, and that state dictates what actions it can take and what transitions are valid. This immediately brings order to chaos.
When I refactored that research assistant agent, I ended up defining states like:
- `IDLE`: Waiting for a new research request.
- `PLANNING_RESEARCH`: Deciding on search queries.
- `EXECUTING_SEARCH`: Running search queries using a tool.
- `PROCESSING_SEARCH_RESULTS`: Filtering, ranking, or identifying key documents.
- `SUMMARIZING_DOCUMENTS`: Generating summaries for each document.
- `SYNTHESIZING_REPORT`: Combining summaries into a draft report.
- `AWAITING_CLARIFICATION`: If conflicting info is found, or report is too long, waiting for user input.
- `REFINING_REPORT`: Based on user clarification or internal checks.
- `PRESENTING_REPORT`: Final report ready.
- `ERROR`: Something went wrong.
Each state had a clear job. And crucially, I had a central `AgentState` object that held all the data relevant to the current task: the initial query, search results, individual summaries, the draft report, any clarification needed, and so on.
Practical Example: A Simplified Research Agent FSM
Let’s sketch out a barebones FSM using Python. We don’t need a heavy library for simple cases, but for more complex ones, `transitions` is excellent.
```python
from enum import Enum
import time

class ResearchState(Enum):
    IDLE = "idle"
    PLANNING_RESEARCH = "planning_research"
    EXECUTING_SEARCH = "executing_search"
    PROCESSING_RESULTS = "processing_results"
    SUMMARIZING = "summarizing"
    SYNTHESIZING_REPORT = "synthesizing_report"
    AWAITING_CLARIFICATION = "awaiting_clarification"
    REFINING_REPORT = "refining_report"
    FINISHED = "finished"
    ERROR = "error"

class ResearchAgent:
    def __init__(self):
        self.state = ResearchState.IDLE
        self.current_query = None
        self.search_results = []
        self.summaries = {}
        self.draft_report = None
        self.clarification_needed = False
        self.user_input = None

    def transition(self, new_state: ResearchState):
        print(f"Transitioning from {self.state.value} to {new_state.value}")
        self.state = new_state

    def start_research(self, query: str):
        if self.state != ResearchState.IDLE:
            print("Agent is busy. Cannot start new research.")
            return
        self.current_query = query
        self.transition(ResearchState.PLANNING_RESEARCH)
        self._run_workflow()

    def provide_clarification(self, user_input: str):
        if self.state != ResearchState.AWAITING_CLARIFICATION:
            print("Not awaiting clarification.")
            return
        self.user_input = user_input
        self.clarification_needed = False  # Clear the flag
        self.transition(ResearchState.REFINING_REPORT)
        self._run_workflow()  # Resume workflow

    def _run_workflow(self):
        # This is the core loop that drives state transitions based on current state
        while self.state not in (ResearchState.FINISHED, ResearchState.ERROR):
            try:
                if self.state == ResearchState.PLANNING_RESEARCH:
                    print(f"Agent planning research for: {self.current_query}")
                    # In a real agent, this would involve LLM calls or tool usage
                    time.sleep(1)
                    self.transition(ResearchState.EXECUTING_SEARCH)
                elif self.state == ResearchState.EXECUTING_SEARCH:
                    print(f"Executing search for '{self.current_query}'...")
                    # Simulate search tool
                    self.search_results = [f"Doc A about {self.current_query}", f"Doc B about {self.current_query}"]
                    time.sleep(2)
                    self.transition(ResearchState.PROCESSING_RESULTS)
                elif self.state == ResearchState.PROCESSING_RESULTS:
                    print("Processing search results...")
                    # Simulate processing
                    time.sleep(1)
                    self.transition(ResearchState.SUMMARIZING)
                elif self.state == ResearchState.SUMMARIZING:
                    print("Summarizing documents...")
                    for doc in self.search_results:
                        self.summaries[doc] = f"Summary of {doc}"
                    time.sleep(2)
                    self.transition(ResearchState.SYNTHESIZING_REPORT)
                elif self.state == ResearchState.SYNTHESIZING_REPORT:
                    print("Synthesizing draft report...")
                    # Simulate report synthesis
                    self.draft_report = f"Report on {self.current_query}:\n" + "\n".join(self.summaries.values())
                    time.sleep(3)
                    # Introduce conditional logic: does the report need clarification?
                    if "complex" in self.current_query.lower() and not self.user_input:
                        self.clarification_needed = True
                        self.transition(ResearchState.AWAITING_CLARIFICATION)
                    else:
                        self.transition(ResearchState.FINISHED)
                elif self.state == ResearchState.AWAITING_CLARIFICATION:
                    print("Awaiting user clarification. Please call provide_clarification().")
                    return  # Exit loop, waiting for external input
                elif self.state == ResearchState.REFINING_REPORT:
                    print(f"Refining report based on user input: '{self.user_input}'")
                    self.draft_report += f"\n(Refined based on: {self.user_input})"
                    time.sleep(1)
                    self.transition(ResearchState.FINISHED)
            except Exception as e:
                print(f"An error occurred: {e}")
                self.transition(ResearchState.ERROR)
                break
        if self.state == ResearchState.FINISHED:
            print("\nResearch complete!")
            print(f"Final Report:\n{self.draft_report}")
        elif self.state == ResearchState.ERROR:
            print("\nResearch failed.")

# --- Usage ---
print("--- Agent Run 1: Simple ---")
agent = ResearchAgent()
agent.start_research("quantum computing")

print("\n--- Agent Run 2: With Clarification ---")
agent_clarify = ResearchAgent()
agent_clarify.start_research("complex implications of AI ethics")
# At this point, the agent is in AWAITING_CLARIFICATION state
# We need to simulate user input
if agent_clarify.state == ResearchState.AWAITING_CLARIFICATION:
    agent_clarify.provide_clarification("Focus on societal impact, less on technical details.")
```
Notice how the `_run_workflow` method is essentially the FSM’s “engine.” It looks at the current state and decides what to do next. The `AgentState` object (represented by `self` in `ResearchAgent`) explicitly holds all the context. If the agent needs to wait for user input, it simply stops processing and returns, then resumes when `provide_clarification` is called.
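One more note before we move on: if you’d rather not hand-roll that engine, the `transitions` library I mentioned handles the bookkeeping for you, and rejects invalid transitions out of the box. Here’s a minimal sketch; the state and trigger names are abbreviated versions of mine, chosen for illustration:

```python
from transitions import Machine, MachineError

class Agent:
    pass  # `transitions` attaches the `state` attribute and trigger methods to the model

states = ["idle", "planning", "searching", "awaiting_clarification", "finished"]
table = [
    {"trigger": "start", "source": "idle", "dest": "planning"},
    {"trigger": "search", "source": "planning", "dest": "searching"},
    {"trigger": "ask_user", "source": "searching", "dest": "awaiting_clarification"},
    {"trigger": "finish", "source": ["searching", "awaiting_clarification"], "dest": "finished"},
]

agent = Agent()
machine = Machine(model=agent, states=states, transitions=table, initial="idle")

agent.start()        # idle -> planning
print(agent.state)   # "planning"
try:
    agent.finish()   # not allowed from "planning"
except MachineError as e:
    print(f"Rejected invalid transition: {e}")
```

Getting that validation for free is exactly the “make transitions explicit” discipline we’ll come back to in the takeaways.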
Pattern 2: Task Graphs for Complex, Dynamic Workflows
While FSMs are great for sequential workflows with clear states, they can become unwieldy if your agent needs to dynamically generate sub-tasks, run things in parallel, or if the dependencies between steps are more fluid. Imagine an agent that needs to:
- Research multiple sub-topics simultaneously.
- Generate code, then run tests, then debug if tests fail (potentially multiple times).
- Coordinate with other agents.
For these scenarios, a task graph (often a Directed Acyclic Graph or DAG) is a better fit. Here, each node in the graph represents a specific task or sub-task. Edges represent dependencies (Task B cannot start until Task A is complete).
Key components of a task graph approach:
- Task Definition: Each task has an ID, a status (pending, running, complete, failed), input requirements, and output.
- Task Scheduler/Orchestrator: Continuously checks for tasks that are ready to run (all dependencies met, not yet started).
- Global Context/Knowledge Base: A central store where tasks can read inputs from and write outputs to. This replaces the single `AgentState` object in FSMs, allowing for more granular data sharing.
My debug agent uses this pattern. When it encounters a bug, it doesn’t just go into a “debugging” state. It generates a series of tasks: “analyze error logs,” “propose fix,” “generate test case,” “apply fix,” “run test.” If a test fails, it might add new tasks like “re-analyze code with new test results” or “search for similar errors online.” The graph grows and shrinks dynamically.
Practical Example: Simplified Task Graph Structure
This is more about structure than executable code, as a full task graph implementation is quite involved.
```python
import time
import uuid
from enum import Enum
from typing import Dict, Any, List, Optional

class TaskStatus(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"

class AgentTask:
    def __init__(self, name: str, description: str, dependencies: Optional[List[str]] = None):
        self.id = str(uuid.uuid4())
        self.name = name
        self.description = description
        self.status = TaskStatus.PENDING
        self.dependencies = dependencies if dependencies is not None else []
        self.inputs: Dict[str, Any] = {}   # Data needed to start the task
        self.outputs: Dict[str, Any] = {}  # Data produced by the task
        self.error_message: Optional[str] = None

    def __repr__(self):
        return f"Task(id={self.id[:4]}..., name='{self.name}', status={self.status.value})"

class AgentWorkflowManager:
    def __init__(self):
        self.tasks: Dict[str, AgentTask] = {}
        self.global_context: Dict[str, Any] = {}  # Central data store

    def add_task(self, task: AgentTask):
        if task.id in self.tasks:
            raise ValueError(f"Task with ID {task.id} already exists.")
        self.tasks[task.id] = task

    def get_ready_tasks(self) -> List[AgentTask]:
        # A task is ready when it is pending and all its dependencies completed
        ready_tasks = []
        for task in self.tasks.values():
            if task.status == TaskStatus.PENDING:
                all_deps_met = all(
                    dep_id in self.tasks and self.tasks[dep_id].status == TaskStatus.COMPLETED
                    for dep_id in task.dependencies
                )
                if all_deps_met:
                    ready_tasks.append(task)
        return ready_tasks

    def execute_task(self, task_id: str):
        task = self.tasks.get(task_id)
        if not task:
            raise ValueError(f"Task {task_id} not found.")
        if task.status != TaskStatus.PENDING:
            print(f"Task {task.name} is not pending. Current status: {task.status.value}")
            return
        task.status = TaskStatus.RUNNING
        print(f"Executing task: {task.name}...")
        try:
            # Simulate actual task execution (e.g., calling an LLM, tool, or function).
            # A task would read inputs from global_context or its own `inputs` dict
            # and write results to its `outputs` dict.
            time.sleep(1)
            task.outputs['result'] = f"Output for {task.name}"
            # Publish the result to the shared context for downstream tasks
            self.global_context[task.name.replace(' ', '_').lower() + '_output'] = task.outputs['result']
            task.status = TaskStatus.COMPLETED
            print(f"Task {task.name} completed.")
        except Exception as e:
            task.status = TaskStatus.FAILED
            task.error_message = str(e)
            print(f"Task {task.name} failed: {e}")

    def run_workflow(self):
        while True:
            ready_tasks = self.get_ready_tasks()
            if not ready_tasks:
                if all(t.status in (TaskStatus.COMPLETED, TaskStatus.FAILED) for t in self.tasks.values()):
                    print("All tasks processed.")
                else:
                    # Remaining tasks are blocked (e.g., behind a failed dependency).
                    # In this synchronous example nothing can unblock them, so bail
                    # out rather than busy-wait forever; an async scheduler would
                    # park here and wait for external input instead.
                    print("No runnable tasks remain; some are blocked by failed dependencies.")
                break
            for task in ready_tasks:
                self.execute_task(task.id)

# --- Usage ---
manager = AgentWorkflowManager()

# Define some tasks
task1 = AgentTask("Search Papers", "Find relevant research papers.")
task2 = AgentTask("Summarize Paper A", "Summarize paper A.", dependencies=[task1.id])
task3 = AgentTask("Summarize Paper B", "Summarize paper B.", dependencies=[task1.id])  # Can run in parallel with task2
task4 = AgentTask("Synthesize Report", "Combine summaries into a report.", dependencies=[task2.id, task3.id])

manager.add_task(task1)
manager.add_task(task2)
manager.add_task(task3)
manager.add_task(task4)

print("Starting workflow...")
manager.run_workflow()

print("\nFinal Global Context:")
print(manager.global_context)
```
In this simplified manager, the `run_workflow` method acts as the scheduler, pulling tasks that are ready. A real implementation would involve more sophisticated dependency management, possibly task priority, and a more structured `global_context` or even a database for persistent state. The key is that tasks are explicit, their dependencies are clear, and their progress is tracked.
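To give a flavor of the dynamic growth I described for the debug agent, here’s a sketch of extending the graph mid-run, reusing the `AgentWorkflowManager` and `AgentTask` classes above (the debug-flavored task names are invented for illustration):

```python
# Sketch: grow the graph in response to a failure, reusing `manager` from above.
run_tests = AgentTask("Run Tests", "Execute the generated test suite.")
manager.add_task(run_tests)
manager.run_workflow()

if run_tests.status == TaskStatus.FAILED:
    # Generate follow-up tasks on the fly; the scheduler picks them up
    # on the next run_workflow() pass, exactly like the original tasks.
    reanalyze = AgentTask("Re-analyze Code", "Re-analyze code with new test results.")
    rerun = AgentTask("Re-run Tests", "Re-run the suite after the fix.",
                      dependencies=[reanalyze.id])
    manager.add_task(reanalyze)
    manager.add_task(rerun)
    manager.run_workflow()
```

Nothing about the scheduler changes; the graph is just data, so the agent can append to it whenever new work is discovered.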
Actionable Takeaways for Your Next Agent Project
- Don’t skip state management: It’s not an afterthought. Design how your agent will track its progress, data, and decisions from the very beginning.
- Choose the right pattern:
  - For linear or mostly sequential workflows with clear branching, start with a Finite State Machine. It’s often simpler to implement and reason about.
  - For dynamic, parallel, or highly conditional workflows where tasks can be generated on the fly, consider a Task Graph/DAG. It offers more flexibility but comes with more complexity.
- Externalize your state: Don’t rely solely on local variables. Create a dedicated `AgentState` object (for FSMs) or a `GlobalContext` (for task graphs) that holds all relevant information. This makes debugging easier and allows for pausing/resuming agents.
- Make transitions explicit: For FSMs, ensure state transitions are clearly defined and triggered by specific events or conditions. Avoid implicit state changes.
- Handle failures gracefully: Integrate error states and retry mechanisms into your state management. If a tool call fails, how does your agent react? Does it retry, switch to another tool, or transition to an `ERROR` state?
- Keep it inspectable: Being able to look at your agent’s current state and its history of actions is invaluable for debugging and understanding its behavior. Log state changes!
- Consider persistence: For long-running or critical agents, think about how you’ll save and load the agent’s state (e.g., to a database or file system). This allows your agent to survive restarts or be picked up by another instance. A minimal checkpointing sketch follows this list.
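Here’s that checkpointing sketch, assuming the `ResearchAgent` and `ResearchState` classes from the FSM example earlier (the file path is illustrative):

```python
import json

def save_checkpoint(agent: ResearchAgent, path: str = "agent_state.json"):
    # Snapshot every field of the agent's state, including which FSM state it is in
    snapshot = {
        "state": agent.state.value,
        "current_query": agent.current_query,
        "search_results": agent.search_results,
        "summaries": agent.summaries,
        "draft_report": agent.draft_report,
        "clarification_needed": agent.clarification_needed,
        "user_input": agent.user_input,
    }
    with open(path, "w") as f:
        json.dump(snapshot, f)

def load_checkpoint(path: str = "agent_state.json") -> ResearchAgent:
    with open(path) as f:
        snapshot = json.load(f)
    agent = ResearchAgent()
    agent.state = ResearchState(snapshot["state"])  # Enum round-trips via its value
    agent.current_query = snapshot["current_query"]
    agent.search_results = snapshot["search_results"]
    agent.summaries = snapshot["summaries"]
    agent.draft_report = snapshot["draft_report"]
    agent.clarification_needed = snapshot["clarification_needed"]
    agent.user_input = snapshot["user_input"]
    return agent
```

Because the FSM state is just another field, a restarted process can load the checkpoint and call `_run_workflow()` (or wait for `provide_clarification()`) to pick up exactly where the previous instance left off.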
My hope is that by talking about this, you can avoid some of the headaches I’ve had. Building truly capable agents isn’t just about crafting clever prompts; it’s about building robust engineering foundations. And explicit state management is a huge part of that foundation. Go build something cool!