Building Human-in-the-Loop Agents with LangGraph
Learn how to build AI agents with human approval steps using LangGraph. This tutorial covers graph state, interrupt mechanics, checkpointing, approval workflows, and patterns for safely deploying agents that handle high-risk actions.
- LangGraph's interrupt_before and interrupt_after primitives let you pause agent execution at any node, collect human feedback, and resume exactly where execution stopped — without losing state or replaying earlier steps.
- Checkpointing with SqliteSaver or PostgresSaver serializes the full graph state including message history, tool call results, and custom annotations, enabling durable pause-resume cycles across process restarts.
- Human-in-the-loop is essential for high-risk actions like financial transactions, customer-facing messages, database mutations, and API calls with side effects — anywhere the cost of an error exceeds the cost of a brief human review.
- Production HITL architectures typically separate the agent runtime from the approval UI using a message queue or webhook pattern, allowing approvals from Slack, email, or a custom dashboard without blocking the agent process.
- You can combine HITL with LangGraph's branching and subgraph features to create tiered approval workflows where low-risk actions execute automatically while high-risk actions route to the appropriate human reviewer.
Why Human-in-the-Loop Matters for AI Agents
Autonomous AI agents are powerful, but autonomy without oversight is a liability. Every production agent eventually encounters a situation where it should not act without human confirmation. A customer service agent drafting a refund for $10,000. A coding agent about to run a database migration. A sales agent sending a pricing proposal to an enterprise client. These are moments where the cost of a mistake far exceeds the cost of pausing for thirty seconds of human review.
Human-in-the-loop (HITL) is not a sign of weakness in your agent architecture — it is a sign of maturity. The most effective agent systems are not fully autonomous or fully manual. They are selectively autonomous: the agent handles routine decisions independently and escalates high-stakes decisions to humans. This mirrors how human organizations actually work. A junior employee handles standard tasks but checks with their manager before committing to anything expensive or irreversible.
LangGraph is currently the best framework for building HITL agents because it treats interruption as a first-class concept rather than an afterthought. Unlike simple chain-based architectures where pausing mid-execution means losing all context, LangGraph's graph-based execution model with persistent checkpointing lets you freeze execution at any node, serialize the complete state, wait for human input (minutes, hours, or days), and resume exactly where you left off.
The key insight is that HITL is not just about adding an approval step. It is about designing your agent's state graph so that human intervention points are natural, well-defined, and minimally disruptive to the overall workflow. You want the agent to do as much autonomous work as possible — gathering information, reasoning through options, preparing a recommended action — and only pause at the moment of commitment. This keeps human reviewers focused on high-value decisions rather than micromanaging every step.
In this tutorial, we will build a complete HITL agent architecture using LangGraph. We will cover the core primitives (interrupt_before, interrupt_after, checkpointers), walk through real approval workflow patterns, and discuss production deployment considerations. By the end, you will have a clear mental model for where HITL fits in your agent design and the technical knowledge to implement it. If you want a deeper dive with hands-on projects, check out our LangGraph Practical Agents Course which covers HITL extensively.
Before diving into implementation, it helps to understand when HITL is worth the added complexity. A good heuristic: if the action is reversible and low-cost, let the agent act autonomously. If the action is irreversible or high-cost, require human approval. Financial transactions above a threshold, external communications to customers or partners, infrastructure changes, data deletions — these all warrant a human checkpoint. Conversely, reading data, generating internal summaries, or performing search queries can safely run without approval.
LangGraph Fundamentals: State, Nodes, and Edges
Before building HITL workflows, you need a solid understanding of how LangGraph structures agent execution. LangGraph models your agent as a directed graph where nodes are functions that transform state and edges define the flow between nodes. This is fundamentally different from LangChain's sequential chain model — graphs can branch, loop, and (critically for HITL) pause at any node.
The foundation of every LangGraph application is the state schema. State is usually defined as a TypedDict (dataclasses and Pydantic models also work) that carries all information the agent needs across nodes. For a HITL agent, your state typically includes: the conversation message history, any tool call results, the agent's proposed action, a human approval flag, and any metadata needed for routing decisions.
Here is a conceptual state schema for a HITL agent that handles financial operations:
- messages: The full message history using LangGraph's add_messages annotation, which automatically handles message deduplication and ordering when state is updated
- proposed_action: A structured representation of what the agent wants to do (e.g., "transfer $5,000 from account A to account B"); this is what the human reviewer sees
- approval_status: An enum of pending, approved, or rejected, set by the human review step
- human_feedback: Optional text feedback from the reviewer explaining why they approved or rejected, which the agent can use to adjust its approach
- risk_score: A computed value that determines whether the action needs human review at all — actions below the threshold skip the approval node entirely
Nodes in LangGraph are plain Python functions that take the current state and return a partial state update. LangGraph merges the returned dictionary into the existing state using the annotations you defined. For example, a node that calls the LLM would return {"messages": [ai_message]}, and the add_messages annotation ensures this appends to (rather than replaces) the existing message list.
Edges define transitions between nodes. LangGraph supports normal edges (always transition from A to B) and conditional edges (a function examines the state and returns the name of the next node), plus the special START and END nodes that mark where execution begins and finishes. For HITL, conditional edges are essential. After the human review node, a conditional edge checks approval_status: if approved, route to the execution node; if rejected, route back to the planning node or to END.
The graph compilation step (graph.compile()) is where you attach the checkpointer and specify interrupt points. The compiled graph is an executable that you invoke with an initial state and a config carrying a thread ID ({"configurable": {"thread_id": ...}}). The thread ID is critical for HITL: it is the key that lets you resume a specific paused execution. Each thread maintains its own independent state history, so you can have thousands of paused agent executions waiting for different human approvals simultaneously. For a comparison of how this architecture differs from other frameworks, see our LangGraph vs CrewAI comparison.
One architectural pattern worth highlighting: separate your reasoning nodes from your action nodes. Have the agent reason about what to do in one node, propose the action in another node (this is where you interrupt), and execute the approved action in a third node. This separation makes your graph cleaner, your interrupts more precise, and your approval UI simpler — the reviewer sees a clean proposed action rather than a stream of reasoning tokens.
LangGraph also supports subgraphs, which are compiled graphs used as nodes within a parent graph. This is powerful for HITL because you can encapsulate complex approval workflows (multi-level approval chains, parallel approvals from multiple reviewers) as subgraphs and compose them into your main agent graph. The parent graph does not need to know the internal structure of the approval subgraph — it just sees a node that takes pending state and returns approved/rejected state.
Interrupt Mechanics: interrupt_before and interrupt_after
LangGraph provides two primary interrupt primitives: interrupt_before and interrupt_after. Both are specified at graph compilation time and accept a list of node names. When execution reaches an interrupted node, the graph saves its complete state to the checkpointer and returns control to the calling code. LangGraph uses a GraphInterrupt exception internally to stop the run, but graph.invoke returns normally rather than raising it; you can tell the graph has paused because graph.get_state(config).next still lists the pending node.
interrupt_before pauses execution before the specified node runs. The state at the interrupt point reflects everything that happened up to but not including that node. This is the most common HITL pattern: the agent has done its reasoning, proposed an action, and the graph pauses before the execution node so a human can review the proposed action. If the human approves, you resume and the execution node runs. If the human rejects, you can update the state (e.g., set approval_status to rejected) and resume, routing to a different branch.
interrupt_after pauses execution after the specified node completes. The state includes the output of that node. This is useful when you want the agent to perform an action and then wait for human verification before proceeding. For example, an agent that generates a customer email could run the generation node, interrupt after it, let a human review and optionally edit the generated email in the state, and then resume to send the approved version.
The syntax for specifying interrupts at compilation time is straightforward. You pass them as arguments to graph.compile(): graph.compile(checkpointer=checkpointer, interrupt_before=["execute_action"]). You can specify multiple nodes: interrupt_before=["execute_action", "send_email", "modify_database"]. Every node in the list becomes an interrupt point.
When a graph is interrupted, you resume it by calling graph.invoke(None, config) with the same thread configuration. Passing None resumes with the existing state unchanged, which is what you want when the human approves without modifications. To change state first (setting an approval or rejection flag, or applying the human's edits to the proposed action), call graph.update_state(config, {...}) and then graph.invoke(None, config). Passing a new input dictionary to invoke instead of None does not resume the paused run; it discards the pending step and starts the thread from the beginning with that input.
There is a newer, more flexible API: the interrupt() function (from langgraph.types), called directly within node code. Instead of specifying interrupt points at compile time, you call interrupt(value) inside any node function. This pauses execution at that call and passes value to the calling code as context for the human reviewer. You resume by invoking the graph with Command(resume=human_response) on the same thread, and interrupt() then returns that response. Note that on resume LangGraph re-runs the node from its beginning rather than from the exact line, so any code before the interrupt() call should be safe to execute twice. Because the interrupt is an ordinary function call, you can conditionally interrupt based on runtime state:
- Check the risk score: if it exceeds a threshold, call interrupt({"proposed_action": action, "risk_score": score})
- The human reviewer sees the proposed action and risk score in their approval UI
- They respond with {"approved": True, "notes": "Looks good"} or {"approved": False, "reason": "Amount too high"}
- The interrupt() call returns their response, and the node continues executing with that information
This inline interrupt pattern is particularly powerful for dynamic HITL — situations where you cannot know at compile time which actions will need approval. An agent that processes a batch of invoices might need approval for invoices over $10,000 but can auto-approve smaller ones. With compile-time interrupts, you would need separate nodes for different invoice sizes. With inline interrupt(), a single processing node handles both cases elegantly.
One important caveat: interrupt behavior interacts with graph cycles. If your agent has a loop (e.g., an agent loop that repeatedly calls tools), and one of the tool-calling nodes has an interrupt_before, the graph will pause on every iteration of the loop that reaches that node. This is usually what you want (approve each tool call individually), but if you want to approve only the first iteration, you need to manage that with conditional logic or a state flag. Understanding these nuances is essential for designing effective agent workflows.
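For the approve-only-once case, one option is to route through a dedicated review node only while a state flag is unset, and compile with interrupt_before on that node. The routing function below is a hypothetical sketch; the node names human_review and run_tool and the tools_approved flag are assumptions.

```python
from typing import TypedDict


class LoopState(TypedDict):
    tools_approved: bool  # set to True by human_review once the first call is approved


def route_tool_call(state: LoopState) -> str:
    """Send only the first tool call in the loop through human review."""
    if state["tools_approved"]:
        return "run_tool"  # later iterations skip review
    return "human_review"  # compiled with interrupt_before=["human_review"]
```

Registered with builder.add_conditional_edges("agent", route_tool_call), the loop pauses once and then runs freely.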
Checkpointing: Durable State for Pause-Resume Workflows
Checkpointing is what makes LangGraph's HITL actually production-ready. Without checkpointing, a paused graph would lose all state the moment your process restarts. With checkpointing, the complete graph state is serialized to persistent storage after every node execution, enabling pause-resume cycles that span minutes, hours, or even days.
LangGraph provides several checkpointer implementations. MemorySaver stores state in memory: useful for development and testing, but not suitable for production since state is lost on process restart. SqliteSaver (shipped in the separate langgraph-checkpoint-sqlite package) stores state in a SQLite database, which is good for single-server deployments and local development with persistence. PostgresSaver (in langgraph-checkpoint-postgres) stores state in PostgreSQL and is the recommended choice for production deployments because it supports multiple agent processes reading and writing state concurrently.
Every checkpoint can be loaded as a complete snapshot of the state at that step, not just a delta (backends such as PostgresSaver may avoid re-storing unchanged values, but that is a storage detail). This means you can inspect, replay, or branch from any point in the agent's execution history. For HITL, this has powerful implications: if a human reviewer rejects an action and the agent replans, you still have the full history of the original plan. You can audit exactly what the agent proposed, why it was rejected, and what it did differently on the second attempt.
Each checkpoint is identified by a combination of thread_id and checkpoint_id. The thread_id groups all checkpoints for a single agent execution (one conversation, one task, one workflow run). The checkpoint_id identifies a specific point in that execution. When you resume an interrupted graph, LangGraph loads the latest checkpoint for that thread and continues from there.
For production HITL systems, you should consider several checkpointing patterns:
- TTL (Time-to-Live) on pending approvals: If a human does not approve within a configurable window (e.g., 24 hours), automatically reject the action or escalate to a different reviewer. Implement this with a background job that queries for old pending checkpoints.
- Checkpoint metadata: Attach custom metadata to checkpoints using the metadata field in the config. Store the reviewer's identity, the timestamp of the interrupt, the approval channel (Slack, email, dashboard), and any audit trail information.
- State size management: Message histories grow over long agent sessions. Use LangGraph's message trimming utilities or implement a summarization node that condenses older messages before checkpointing. Large states increase serialization time and storage costs.
- Concurrent access: PostgresSaver handles concurrent reads safely, but be aware that two processes should not try to resume the same thread simultaneously. Use a locking mechanism (database advisory locks or distributed locks) if your architecture allows multiple workers to pick up approval events.
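The TTL job can be as simple as comparing interrupt timestamps with a deadline. This sketch assumes you record each pause (thread ID and time) in your own table or in checkpoint metadata; PendingApproval and find_expired are hypothetical names, and fetching the rows is left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class PendingApproval:
    thread_id: str
    interrupted_at: datetime  # recorded by your orchestrator when the graph paused


def find_expired(pending: list[PendingApproval], ttl: timedelta, now: datetime) -> list[str]:
    """Return thread IDs whose approval window has elapsed."""
    return [p.thread_id for p in pending if now - p.interrupted_at > ttl]


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
pending = [
    PendingApproval("exp-1", now - timedelta(hours=30)),
    PendingApproval("exp-2", now - timedelta(hours=2)),
]
expired = find_expired(pending, ttl=timedelta(hours=24), now=now)
print(expired)  # ['exp-1']
```

For each expired thread, the job would then write a rejected status and resume, or hand the thread to escalation.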
The checkpoint data model also supports branching, which LangGraph calls time travel. If you resume or call update_state with a config that points at an earlier checkpoint_id, LangGraph forks a new branch of history from that point instead of overwriting what came after it; for a completely isolated experiment you can also copy a checkpoint's values into a new thread. This enables "what-if" scenarios: a reviewer could branch the agent's state, modify the proposed action, and run the branch forward to see what would happen without affecting the original execution. While this is an advanced pattern, it is invaluable for building trust in agent systems where stakeholders want to understand the consequences of approval decisions before committing.
When designing your persistence layer, think about data retention and compliance. Checkpoints contain the full state of your agent, which may include sensitive information: customer data, financial details, internal business logic. Ensure your checkpoint storage meets your organization's data retention policies and compliance requirements (GDPR right to deletion, SOC 2 audit trails, etc.). PostgresSaver makes this manageable since you can apply standard database backup, encryption, and retention policies to the checkpoint tables.
One practical tip: during development, use SqliteSaver with a file-based database rather than MemorySaver. This lets you interrupt a graph, stop your development server, modify code, restart, and resume the interrupted execution. It dramatically speeds up the development cycle for HITL workflows because you do not have to replay the entire conversation from scratch every time you make a code change.
Production Approval Workflow Patterns
Now let us look at concrete architectural patterns for HITL approval workflows. These patterns move beyond the basic "pause and resume" mechanic into production-ready designs that handle real-world requirements like multiple reviewers, escalation chains, and asynchronous approval channels.
Pattern 1: Synchronous In-Process Approval. The simplest pattern. The agent runs in a web server, hits an interrupt, and the server returns the proposed action to the frontend. The user reviews it in the UI and clicks approve/reject. The frontend sends the decision back to the server, which resumes the graph. This works well for interactive applications where a human is actively engaged with the agent — chatbots, copilots, internal tools. The limitation is that the human must be present and responsive. If they close the browser tab, the agent waits indefinitely (unless you implement a TTL).
Pattern 2: Asynchronous Webhook-Based Approval. The agent runs as a background worker. When it hits an interrupt, it publishes an approval request to a message queue (Redis, RabbitMQ, SQS) or directly triggers a webhook. The webhook notifies the reviewer via Slack, email, SMS, or a custom dashboard. The reviewer approves through their preferred channel, which publishes an approval event back to the queue. A worker picks up the event, loads the checkpoint, and resumes the graph. This pattern decouples the agent runtime from the approval interface, making it suitable for workflows where approvals happen asynchronously — sometimes minutes or hours after the request.
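The worker side of this pattern reduces to: pull an approval event, check it, and resume the matching thread. The sketch below uses Python's in-process queue.Queue as a stand-in for Redis, RabbitMQ, or SQS; resume_thread is a hypothetical hook where your LangGraph resume call would go, and the reviewer check is an illustrative validation step.

```python
import queue
from dataclasses import dataclass
from typing import Callable


@dataclass
class ApprovalEvent:
    thread_id: str
    reviewer: str
    approved: bool
    reason: str = ""


def drain_approval_events(
    events: "queue.Queue[ApprovalEvent]",
    assigned_reviewer: dict[str, str],
    resume_thread: Callable[[str, dict], None],
) -> None:
    """Process queued approval events until the queue is empty.

    resume_thread is a hypothetical hook; in a LangGraph deployment it might call
    graph.invoke(Command(resume=decision), {"configurable": {"thread_id": thread_id}}).
    A production worker would block on the queue instead of returning when it is empty.
    """
    while True:
        try:
            event = events.get_nowait()
        except queue.Empty:
            return
        # Drop events from anyone other than the reviewer the request was sent to
        if assigned_reviewer.get(event.thread_id) != event.reviewer:
            continue
        resume_thread(event.thread_id, {"approved": event.approved, "reason": event.reason})


# Usage: one valid approval, one from the wrong person
resumed: list[tuple[str, dict]] = []
q: "queue.Queue[ApprovalEvent]" = queue.Queue()
q.put(ApprovalEvent("exp-1", "alice", True))
q.put(ApprovalEvent("exp-2", "mallory", True))
drain_approval_events(q, {"exp-1": "alice", "exp-2": "bob"},
                      lambda tid, decision: resumed.append((tid, decision)))
print(resumed)  # [('exp-1', {'approved': True, 'reason': ''})]
```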
Pattern 3: Tiered Approval with Risk Scoring. Not all actions need the same level of scrutiny. This pattern uses a risk scoring node that evaluates the proposed action against configurable rules: transaction amount thresholds, customer tier, action type, time of day, and historical patterns. Low-risk actions (score below threshold) bypass the approval node entirely via a conditional edge. Medium-risk actions route to a single reviewer. High-risk actions route to a multi-approver subgraph that requires sign-off from two or more reviewers before proceeding. The risk scoring logic can itself be an LLM call that classifies the risk, or it can be deterministic rules — the latter is often preferred because it is predictable and auditable.
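Deterministic risk rules are easy to express and unit-test. The weights, thresholds, and tier node names below are illustrative assumptions rather than recommended values; approval_tier returns a node name so it can serve directly as a conditional-edge function.

```python
from dataclasses import dataclass

IRREVERSIBLE_TYPES = {"wire_transfer", "delete_records"}  # illustrative


@dataclass
class ProposedAction:
    action_type: str
    amount: float
    customer_tier: str


def risk_score(action: ProposedAction) -> float:
    """Additive rule-based score, capped at 1.0."""
    score = 0.0
    if action.amount > 10_000:
        score += 0.5
    elif action.amount > 1_000:
        score += 0.2
    if action.action_type in IRREVERSIBLE_TYPES:
        score += 0.3
    if action.customer_tier == "enterprise":
        score += 0.2
    return min(score, 1.0)


def approval_tier(score: float) -> str:
    """Next node: skip review, one reviewer, or the multi-approver subgraph."""
    if score < 0.3:
        return "execute_action"
    if score < 0.7:
        return "single_review"
    return "multi_approver_review"


for action in (
    ProposedAction("refund", 50, "standard"),
    ProposedAction("refund", 5_000, "enterprise"),
    ProposedAction("wire_transfer", 50_000, "standard"),
):
    print(action.action_type, action.amount, approval_tier(risk_score(action)))
```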
Pattern 4: Edit-Before-Execute. Instead of a binary approve/reject, the reviewer can modify the agent's proposed action. The agent generates a customer email — the reviewer edits the wording, adjusts the tone, fixes a factual error — and then approves the edited version. Implement this by making the proposed action a mutable field in the state. When resuming after the interrupt, pass the reviewer's edited version as a state update. The execution node uses the (potentially modified) action from state, not the original LLM output. This pattern significantly increases the practical value of HITL because reviewers can fix small issues without triggering a complete replan.
Pattern 5: Escalation Chains. If the primary reviewer does not respond within a time window, escalate to a secondary reviewer. If the secondary does not respond, escalate to a manager or auto-reject with a notification. Implement this outside the graph itself — use a scheduled job that queries for stale pending checkpoints and triggers escalation webhooks. The graph does not need to model the escalation logic; it just stays paused until someone resumes it. This separation of concerns keeps your graph clean and your escalation logic independently testable.
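The escalation policy itself can live in a small pure function that the scheduled job calls with how long an approval has been waiting. The chain and window lengths below are hypothetical.

```python
from datetime import timedelta
from typing import Optional

# Hypothetical chain: (reviewer, how long they hold the request before it escalates)
ESCALATION_CHAIN = [
    ("primary_reviewer", timedelta(hours=4)),
    ("secondary_reviewer", timedelta(hours=4)),
    ("finance_manager", timedelta(hours=16)),
]


def current_assignee(waiting: timedelta) -> Optional[str]:
    """Who should hold the approval after `waiting`; None means auto-reject and notify."""
    deadline = timedelta(0)
    for reviewer, window in ESCALATION_CHAIN:
        deadline += window  # cumulative deadlines: 4h, 8h, 24h
        if waiting < deadline:
            return reviewer
    return None
```

The job compares the result with the current assignee and sends an escalation webhook only when it changes, so the graph stays untouched until someone resumes it.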
When choosing a pattern, consider the approval latency budget. If approvals must come back within seconds while the user is actively engaged (real-time chat), use Pattern 1. If approvals can take minutes (Slack-based workflows), use Pattern 2. If approvals can take hours (email-based enterprise workflows), use Pattern 2 with robust TTLs and escalation (Pattern 5). The beauty of LangGraph's checkpointing is that the graph itself does not care about the latency: between interrupt and resume it exists only as a checkpoint, with no process holding it in memory. All the latency management happens in the orchestration layer around the graph.
For a comprehensive look at how these patterns compare across different agent frameworks, our LangGraph vs CrewAI comparison covers the HITL capabilities of each. And for designing the broader workflow around your approval patterns, see our guide on AI agent workflow design.
Architecture Walkthrough: Building a HITL Financial Agent
Let us walk through the architecture of a realistic HITL agent: a financial operations agent that processes expense reports, approves reimbursements, and executes payments. This example ties together all the concepts we have covered and illustrates how they work in a production context.
The agent's graph has seven nodes arranged in a clear pipeline with conditional branches:
- intake_node: Receives the expense report (OCR'd receipt images, amounts, categories, employee info). Uses an LLM to extract structured data and validate it against company policy. Outputs a structured ExpenseReport object in state.
- policy_check_node: Evaluates the expense against company policies (per-diem limits, approved categories, required receipts). This is deterministic, so no LLM is needed. Flags any policy violations and computes a compliance score.
- risk_assessment_node: Combines the compliance score with contextual signals (employee history, amount relative to average, unusual patterns) to produce a risk score. Risk below 0.3 routes to auto-approve. Risk between 0.3 and 0.7 routes to manager approval. Risk above 0.7 routes to finance team approval.
- approval_node: This is the HITL interrupt point, specified as interrupt_before=["approval_node"]. When the graph pauses here, the orchestration layer sends an approval request to the appropriate channel (Slack DM to the manager or a finance team queue). The state contains the full expense report, policy check results, and risk assessment: everything the reviewer needs.
- approval_routing_node: Runs after the reviewer's decision has been written to state. If the report was approved with modifications (reduced amount, different category), this node applies them to the expense report in state. A conditional edge leaving this node then reads approval_status: approved reports route to execution, and rejected ones route to notification (to tell the employee).
- execute_payment_node: Calls the payment API to initiate the reimbursement. Records the transaction ID in state. This node is idempotent: if the graph crashes and resumes, it checks whether the payment was already initiated before sending a duplicate.
- notification_node: Sends confirmation to the employee (approved, rejected, or approved with modifications) and logs the complete workflow for audit purposes.
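The checkpointer is PostgresSaver, configured against the same database that stores employee records and transaction history. Sharing one database simplifies backups, encryption, and access control, but it does not by itself make checkpoint writes and business writes atomic: PostgresSaver writes checkpoints in its own transactions, separate from anything your nodes write. That is why execute_payment_node must be idempotent; if the process crashes after the payment is recorded but before the checkpoint is saved, the node runs again on resume and has to detect the existing transaction.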
The orchestration layer runs as a separate service with three components: an event consumer that listens for approval events from Slack/email/dashboard and resumes the appropriate graph thread; a TTL monitor that checks for stale pending approvals every 5 minutes and triggers escalations; and a health check that monitors the backlog of pending approvals and alerts if it grows beyond expected thresholds.
For the Slack integration specifically, the flow works as follows: When the graph interrupts, the orchestrator posts a message to the reviewer's Slack DM with the expense details and two buttons (Approve / Reject). The buttons trigger a Slack interaction webhook back to the orchestrator. The orchestrator validates the interaction (correct reviewer, not expired), updates the state with the decision, and resumes the graph. The entire round-trip typically takes 2-15 minutes depending on how quickly the reviewer responds.
This architecture handles several edge cases that naive HITL implementations miss: reviewer unavailability (TTL + escalation), duplicate approvals (the graph ignores resume attempts if it is already past the approval node), process restarts (PostgresSaver persists state across restarts), and concurrent expense reports (each report runs in its own thread with an independent state). The graph itself remains clean and focused on business logic while the orchestration layer handles all the operational complexity.
If you want to build a similar system hands-on, our LangGraph Practical Agents Course includes a complete project that walks through implementing a HITL agent with Slack-based approvals from scratch, including the orchestration layer and deployment configuration.
Best Practices and Common Pitfalls
After building dozens of HITL agent systems, several patterns emerge as consistent best practices and common failure modes. Here is what we have learned about making HITL agents reliable in production.
Keep the approval surface small and clear. The most common mistake is presenting reviewers with too much information. If your approval request shows 50 lines of agent reasoning, message history, and intermediate tool results, reviewers will either rubber-stamp everything (defeating the purpose) or spend so long reviewing that the approval latency kills your workflow's value. Instead, present a concise summary: what the agent wants to do, why, the key data points, and the risk assessment. Store the full context in the checkpoint for audit purposes, but do not dump it on the reviewer.
Make rejection actionable. When a reviewer rejects an action, the agent needs enough information to do something useful. A bare "rejected" flag forces the agent to guess what went wrong. Instead, require reviewers to provide a reason (even if it is selected from a dropdown: "Amount too high", "Wrong category", "Need more documentation", "Policy violation"). Feed this reason back into the agent's state so it can adjust its approach. The best HITL systems create a feedback loop where the agent learns from rejections — not through fine-tuning, but through in-context examples of past rejections that inform future proposals.
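One way to enforce a structured reason is to make the dropdown values an enum and convert the reviewer's choice into a state update. The enum and helper below are hypothetical; the field names match the state schema described earlier in this tutorial.

```python
from enum import Enum


class RejectionReason(str, Enum):
    AMOUNT_TOO_HIGH = "Amount too high"
    WRONG_CATEGORY = "Wrong category"
    NEED_DOCUMENTATION = "Need more documentation"
    POLICY_VIOLATION = "Policy violation"


def rejection_update(reason: RejectionReason, note: str = "") -> dict:
    """Build the state update written when a reviewer rejects an action."""
    feedback = reason.value if not note else f"{reason.value}: {note}"
    return {"approval_status": "rejected", "human_feedback": feedback}


update = rejection_update(RejectionReason.AMOUNT_TOO_HIGH, "cap is $2,000 without a VP sign-off")
print(update)
```

The orchestrator would write this with graph.update_state(config, update) before resuming, and the planning node would include human_feedback in its next prompt.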
Implement idempotent execution nodes. In distributed systems, messages can be delivered more than once. A reviewer might click "Approve" twice, or a network retry might send duplicate resume events. Your execution nodes must be idempotent — executing them twice with the same state should produce the same result as executing them once. For payment nodes, this means checking for existing transactions before initiating new ones. For email nodes, this means deduplication by message ID. LangGraph's checkpointing helps here because you can check the checkpoint history to see if the execution node has already run for this thread.
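An idempotent payment node typically combines two guards: skip if the thread already recorded a transaction, and derive a stable idempotency key so that retries reaching the provider map to the same payment. FakePaymentAPI below is an in-memory stand-in for a provider that deduplicates by key; whether your real provider supports idempotency keys is something to verify.

```python
import uuid


class FakePaymentAPI:
    """In-memory stand-in for a provider that deduplicates requests by idempotency key."""

    def __init__(self) -> None:
        self.payments: dict[str, str] = {}

    def pay(self, idempotency_key: str, amount: float) -> str:
        # A repeated key returns the original transaction instead of paying again
        if idempotency_key not in self.payments:
            self.payments[idempotency_key] = f"txn-{uuid.uuid4().hex[:8]}"
        return self.payments[idempotency_key]


def execute_payment(state: dict, api: FakePaymentAPI) -> dict:
    # Guard 1: this thread already recorded a payment (e.g. a duplicate resume)
    if state.get("transaction_id"):
        return {}
    # Guard 2: a stable key, so a crash between paying and checkpointing cannot double-pay
    key = f"{state['thread_id']}:{state['report_id']}"
    return {"transaction_id": api.pay(key, state["amount"])}


api = FakePaymentAPI()
state = {"thread_id": "exp-2", "report_id": "r-2", "amount": 4800.0}
first = execute_payment(state, api)
retry = execute_payment(state, api)  # simulates re-running the node after a crash
print(first == retry, len(api.payments))  # True 1
```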
Test HITL workflows with automated approval. HITL introduces a human dependency that makes automated testing harder. Solve this by abstracting the approval channel behind an interface. In tests, use a mock approver that automatically approves, rejects, or modifies actions based on configurable rules. This lets you test all branches of your approval workflow (approve, reject, edit, timeout, escalation) without actual human involvement. In production, swap in the real Slack/email/dashboard approver.
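A minimal version of that abstraction is a single decide method that both the real and mock approvers implement. The Approver protocol and the rule below are hypothetical; in a test, the returned decision would be passed to graph.invoke(Command(resume=decision), config) or written with update_state.

```python
from dataclasses import dataclass
from typing import Callable, Protocol


class Approver(Protocol):
    def decide(self, request: dict) -> dict: ...


@dataclass
class RuleBasedMockApprover:
    """Test double that answers approval requests with a rule instead of a human."""

    rule: Callable[[dict], dict]

    def decide(self, request: dict) -> dict:
        return self.rule(request)


def reject_large_amounts(request: dict) -> dict:
    if request["amount"] > 5_000:
        return {"approved": False, "reason": "Amount too high"}
    return {"approved": True}


approver: Approver = RuleBasedMockApprover(reject_large_amounts)
print(approver.decide({"amount": 9_000}))  # {'approved': False, 'reason': 'Amount too high'}
print(approver.decide({"amount": 200}))    # {'approved': True}
```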
Monitor approval metrics. Track approval rate, rejection rate, average approval latency, escalation frequency, and reviewer load. A sudden spike in rejections might indicate the agent's prompts need adjustment. Increasing approval latency might indicate reviewer fatigue or staffing issues. Low rejection rates might indicate that your HITL threshold is too conservative — you are wasting human time approving actions that should be auto-approved.
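Most of these metrics fall out of a simple log of decisions. The Decision record below is a hypothetical shape; in practice the latency would come from the interrupt and resume timestamps you already store for TTLs.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Decision:
    approved: bool
    latency_s: float        # seconds between interrupt and resume
    escalated: bool = False


def approval_metrics(decisions: list[Decision]) -> dict:
    """Aggregate rates and average latency over a batch of reviewer decisions."""
    if not decisions:
        return {}
    n = len(decisions)
    approvals = sum(d.approved for d in decisions)
    return {
        "approval_rate": approvals / n,
        "rejection_rate": (n - approvals) / n,
        "escalation_rate": sum(d.escalated for d in decisions) / n,
        "avg_latency_s": mean(d.latency_s for d in decisions),
    }


log = [Decision(True, 60), Decision(True, 120), Decision(False, 300), Decision(True, 3600, escalated=True)]
print(approval_metrics(log))
```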
Common pitfalls to avoid:
- Forgetting to handle the timeout case. If no one approves, the graph stays paused forever. Always implement TTLs with clear escalation or auto-rejection policies.
- Serialization issues with complex state. LangGraph checkpointers serialize state with a built-in serializer that handles common data (primitives, collections, LangChain messages, Pydantic models, dataclasses). If your state contains non-serializable objects (database connections, file handles, running tasks), checkpointing will fail. Keep your state purely data, with no live objects.
- Over-interrupting. Adding HITL to every tool call turns your agent into a glorified CLI that requires human confirmation for each step. Be strategic about where you interrupt. The goal is to maximize autonomous useful work between interrupts.
- Ignoring the UX of the approval interface. The reviewer's experience matters enormously. If approvals come via email and require logging into a dashboard to approve, latency will be high and reviewer satisfaction will be low. Meet reviewers where they are — Slack buttons, mobile push notifications, or inline email actions reduce friction dramatically.
- Not versioning your graph. When you deploy a new version of your agent graph, existing paused executions are checkpointed against the old graph structure. If the new graph has different nodes or state schema, resuming old checkpoints may fail. Implement graph versioning and a migration strategy for in-flight executions.
For a broader perspective on evaluating different agent frameworks and their HITL capabilities, see our guide on AI agent workflow design. And if you are deciding between LangGraph and other frameworks for your next project, our detailed LangGraph vs CrewAI comparison covers the practical trade-offs.
FAQ
What is the difference between interrupt_before and interrupt_after in LangGraph?
interrupt_before pauses execution before a node runs, so the state does not include that node's output. interrupt_after pauses after a node completes, so the state includes its output. Use interrupt_before when you want to approve an action before it executes (e.g., approve a payment before it is sent). Use interrupt_after when you want to review a result before the agent continues (e.g., review a generated email before it is sent in the next node).
Can LangGraph HITL agents handle approvals that take hours or days?
Yes. LangGraph's checkpointing persists the complete graph state to durable storage (SQLite or PostgreSQL). The graph can stay paused indefinitely — minutes, hours, or days — and resume exactly where it stopped when the approval comes in. The key is using a persistent checkpointer (not MemorySaver) and implementing TTL/escalation logic for cases where approvals are delayed.
How do I connect LangGraph HITL to Slack for approvals?
The graph itself does not integrate with Slack directly. Instead, build an orchestration layer: when the graph interrupts, your service posts an approval request to Slack using the Slack API with interactive buttons. When a reviewer clicks approve or reject, Slack sends an interaction webhook to your service, which resumes the LangGraph thread with the updated approval state. This separation keeps your graph logic clean and your Slack integration independently testable.
When should I use human-in-the-loop vs fully autonomous agents?
Use HITL for irreversible or high-cost actions: financial transactions, customer-facing communications, database mutations, infrastructure changes, and any action where the cost of an error exceeds the cost of a brief human review. Use fully autonomous execution for reversible, low-cost actions: data retrieval, internal summaries, search queries, and routine operations. Many production systems use tiered approaches where risk scoring determines which path each action takes.
Does human-in-the-loop add significant latency to agent workflows?
It depends on your approval channel and reviewer availability. Synchronous approvals in a chat UI add seconds. Slack-based approvals typically add 2-15 minutes. Email-based approvals can add hours. The key is to design your workflow so that HITL only triggers for actions that genuinely need human review — let the agent handle routine decisions autonomously. Well-designed tiered approval systems minimize total latency by only interrupting when the stakes justify it.