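Getting Started with the OpenAI Agents SDK: Build Your First AI Agent from Scratch

I. What Is an AI Agent?

Before getting into the Agents SDK, let's answer the most basic question first.

Plain AI Chat vs. AI Agent

When you chat with ChatGPT or Claude, the flow looks like this:

You → AI → text answer

The AI can only talk; it cannot act.

An AI Agent works like this:

You → AI → reasoning → tool calls (query a database, send email, read and write files...) → result

An Agent can perceive its environment, use tools, and make decisions on its own. It is no longer a chatbot that can only "talk" but an assistant that can actually "get things done".

An Analogy

Role	Analogy	Capability
Plain AI chat	Phone support agent	Can only answer questions verbally
AI Agent	On-site repair technician	Arrives with a toolbox and fixes the problem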
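II. What Is the OpenAI Agents SDK?

The OpenAI Agents SDK is OpenAI's official lightweight framework for building Agent applications quickly.

Why Do You Need It?

Without a framework, you have to handle all of this yourself:

How do you let the AI call tools?
After a tool runs, how do you feed its result back to the AI?
When the AI says "I want to call tool A", how do you parse that intent?
How do multiple Agents cooperate?

The Agents SDK wraps all of this up, so you only need to focus on your business logic.

Which Languages Does It Support?

TypeScript (@openai/agents)
Python (openai-agents)

This article uses TypeScript.

Installation

npm install @openai/agents zod

zod is installed alongside the SDK because the examples below use it to describe tool parameters and structured output.

Set your API key:

export OPENAI_API_KEY="sk-your-key-here"

NOTE
You need an OpenAI API key, which you can get at platform.openai.com.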
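III. Core Concepts at a Glance

Start with the big picture; each piece is covered in detail later.

┌──────────────────────────────────────────────────────┐
│                   Your application                   │
│                                                      │
│   ┌─────────┐     ┌─────────┐                        │
│   │ Agent A │────▶│ Agent B │   Handoff              │
│   └────┬────┘     └─────────┘   (transfers control)  │
│        │                                             │
│        │ calls tools                                 │
│        ▼                                             │
│   ┌──────────┐  ┌──────────┐  ┌──────────┐           │
│   │  Search  │  │ Database │  │   MCP    │           │
│   └──────────┘  └──────────┘  └──────────┘           │
│                                                      │
│   Runner ───── execution engine that drives the loop │
│   Guardrails ─ safety checks on input and output     │
│   Context ──── shared data passed to tools           │
└──────────────────────────────────────────────────────┘

Concept	One-line explanation
Agent	A configured AI with a name, instructions, and tools
Runner	The execution engine; runs an Agent until it produces a final result
Tools	What the Agent can use (search, databases, custom functions, and so on)
Handoffs	Transfers between Agents; hands control of the conversation to another Agent
Guardrails	Safety checks that intercept inappropriate content at the input or output stage
Context	Shared data that every tool and guardrail can read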

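IV. Agent: Your First Agent

Minimal Example

import { Agent, Runner } from '@openai/agents';

// Create an Agent
const agent = new Agent({
  name: 'Assistant',
  instructions: 'You are a friendly assistant. Answer in English.',
});

// Run it
const runner = new Runner();
const result = await runner.run(agent, 'What is machine learning?');

console.log(result.finalOutput);

That is a complete Agent program (the top-level await requires an ES module). It takes three steps:

Create an Agent: give it a name and tell it how to behave
Create a Runner: set up the execution engine
Run: pass in the user input and read the result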

Agent Configuration

const agent = new Agent({
  // Required
  name: 'Assistant name',          // used in logs and traces

  // Recommended
  instructions: 'System prompt',   // tells the Agent how to behave

  // Optional
  model: 'gpt-4.1',                // model to use; the SDK's default model applies when omitted
  tools: [],                       // tools the Agent may call
  handoffs: [],                    // Agents it may hand off to
  outputType: undefined,           // schema for structured output
  inputGuardrails: [],             // input guardrails
  outputGuardrails: [],            // output guardrails
});

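Dynamic Instructions

Sometimes the prompt has to change with the context. instructions can be a function that receives the run context; the object you pass as context when running the Agent is available as runContext.context:

import { Agent, RunContext } from '@openai/agents';

interface AppContext {
  userInfo: { name: string; level: string };
}

const agent = new Agent<AppContext>({
  name: 'Support assistant',
  instructions: (runContext: RunContext<AppContext>) => {
    const user = runContext.context.userInfo;
    return `You are a support assistant. Current user: ${user.name}, membership tier: ${user.level}.
Tailor your service to the user's tier.`;
  },
});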

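V. Runner: Getting the Agent Running

What Does the Runner Do?

The Runner's job is a loop:

Turn 1: send the user input to the AI
         ↓
     The AI replies: "I need to call the search tool"
         ↓
     The Runner executes the search tool and feeds the result back to the AI
         ↓
Turn 2: the AI continues, now with the search result
         ↓
     The AI says: "Based on the search results, the answer is..."
         ↓
     No more tool calls → stop and return the final result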
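The loop above can be sketched in a few lines of plain TypeScript. This is a minimal illustration, not the SDK's implementation: `fakeModel`, `runLoop`, and the string-based transcript are all made up for the example, and the loop is synchronous for brevity, whereas the real Runner is asynchronous and talks to an actual model.

```typescript
// A model reply is either a request to call a tool or a final answer.
type ModelReply =
  | { kind: 'tool_call'; tool: string; args: string }
  | { kind: 'final'; text: string };

type ToolFn = (args: string) => string;
type Model = (transcript: string[]) => ModelReply;

// Hypothetical stand-in for the LLM: asks for one search, then answers.
const fakeModel: Model = (transcript) => {
  const toolResult = transcript.find((line) => line.startsWith('tool:'));
  if (toolResult === undefined) {
    return { kind: 'tool_call', tool: 'search', args: 'machine learning' };
  }
  return { kind: 'final', text: `Based on the search: ${toolResult.slice(5)}` };
};

function runLoop(
  model: Model,
  tools: Record<string, ToolFn>,
  userInput: string,
  maxTurns = 10,
): string {
  const transcript: string[] = [`user:${userInput}`];
  for (let turn = 1; turn <= maxTurns; turn++) {
    const reply = model(transcript);
    if (reply.kind === 'final') {
      return reply.text; // no more tool calls: the loop ends
    }
    const toolFn = tools[reply.tool];
    if (toolFn === undefined) {
      throw new Error(`Unknown tool: ${reply.tool}`);
    }
    // Execute the tool and feed its result back to the model.
    transcript.push(`tool:${toolFn(reply.args)}`);
  }
  // The model kept asking for tools; give up instead of looping forever.
  throw new Error(`Exceeded maxTurns (${maxTurns})`);
}

const demoTools: Record<string, ToolFn> = {
  search: (query) => `top result for "${query}"`,
};
const loopAnswer = runLoop(fakeModel, demoTools, 'What is machine learning?');
console.log(loopAnswer);
```

The turn limit is what keeps a model that never stops requesting tools from running indefinitely, which is the same role a turn cap plays in a real agent run.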

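Ways to Run

The same run method covers both modes; streaming is switched on with an option:

import { Agent, Runner } from '@openai/agents';

const runner = new Runner();

// 1. Non-streaming run: await the complete result
const result = await runner.run(agent, 'Hello');
console.log(result.finalOutput);

// 2. Streaming run: receive output as it is produced (good for a typewriter effect)
const stream = await runner.run(agent, 'Write a poem', { stream: true });
for await (const event of stream) {
  // Raw model events carry the incremental text
  if (event.type === 'raw_model_stream_event' && event.data.type === 'output_text_delta') {
    process.stdout.write(event.data.delta);
  }
}
await stream.completed; // resolves once the run has fully finished

To limit how far the loop may go rather than letting it run freely, cap the number of turns with maxTurns, described next.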

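Run Options

const result = await runner.run(agent, userInput, {
  context: { userId: '123' },     // shared data made available to tools
  maxTurns: 10,                    // at most 10 turns of the loop (guards against infinite loops)
});

TIP
Setting maxTurns is a good habit. If the Agent gets stuck in an endless cycle of tool calls, the run is aborted with an error once the limit is reached.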

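VI. Tools: Equipping the Agent

An Agent without tools can only theorize. Give it tools and it can do real work.

The SDK offers four broad kinds of tools:

1. Hosted Tools (Provided by OpenAI)

The simplest kind, enabled with one line of code:

import { Agent, webSearchTool } from '@openai/agents';

const agent = new Agent({
  name: 'Search assistant',
  tools: [webSearchTool()],   // enable web search
});

Available hosted tools:

Tool	Purpose
`webSearchTool()`	Search the internet
`fileSearchTool(['vector_store_id'])`	Search uploaded files (RAG)
`codeInterpreterTool()`	Execute Python code
`imageGenerationTool()`	Generate images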
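2. Custom Function Tools (Most Common)

These are functions you write yourself that the Agent can call:

import { Agent, tool } from '@openai/agents';
import { z } from 'zod';

// Define a weather lookup tool
const weatherTool = tool({
  name: 'get_weather',
  description: 'Get the current weather for a given city',
  parameters: z.object({
    city: z.string().describe('City name, e.g. "Beijing"'),
  }),
  execute: async ({ city }) => {
    // Call a real weather API here; this URL is a placeholder
    const res = await fetch(`https://api.weather.com?city=${encodeURIComponent(city)}`);
    if (!res.ok) {
      return `Weather lookup failed with HTTP ${res.status}`;
    }
    const data = await res.json();
    return `Current temperature in ${city}: ${data.temp}°C, ${data.condition}`;
  },
});

const agent = new Agent({
  name: 'Weather assistant',
  instructions: 'Help the user check the weather. Keep answers short.',
  tools: [weatherTool],
});

Key points:

name and description help the AI decide when to call the tool
parameters uses Zod to define the argument types (with TypeScript type safety)
execute is the function that actually runs

What is Zod?
Zod is a TypeScript-first data validation library. It lets you declare a data structure concisely and get type inference for free. Every z.object() and z.string() in this article comes from Zod.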

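3. Agent as a Tool

Use one Agent as a tool of another Agent:

const translateAgent = new Agent({
  name: 'Translation expert',
  instructions: 'Translate any content into English. Output only the translation.',
});

const mainAgent = new Agent({
  name: 'Main assistant',
  instructions: 'You are a general-purpose assistant. Use the translation tool when translation is needed.',
  // Expose the translation Agent as a tool
  tools: [
    translateAgent.asTool({
      toolName: 'translate_to_english',
      toolDescription: 'Translate the given text into English',
    }),
  ],
});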

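4. MCP Server Tools

MCP (Model Context Protocol) is a standard protocol that lets your Agent connect to external tool services. An MCP server is attached through the Agent's mcpServers option and must be connected before the Agent runs:

import { Agent, MCPServerStdio } from '@openai/agents';

const fileServer = new MCPServerStdio({
  name: 'File service',
  command: 'npx',
  args: ['-y', '@modelcontextprotocol/server-filesystem', './data'],
});
await fileServer.connect();   // start the server process

const agent = new Agent({
  name: 'File assistant',
  mcpServers: [fileServer],   // the Agent can now read and write under ./data
});

// ... run the Agent, then shut the server down
await fileServer.close();

About MCP
MCP is an open protocol introduced by Anthropic that gives AI applications a standardized way to reach external tools and data sources. Think of it as "USB for AI": any tool that speaks MCP can be used by the Agent directly.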
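To make "standardized" a little more concrete: MCP messages are JSON-RPC 2.0, and a client invokes a tool by sending a `tools/call` request. The sketch below only builds such a request object; it does not talk to a server. The helper `buildToolCall`, the tool name `read_file`, and its `path` argument are illustrative assumptions; a real client learns the available tool names from the server's reply to `tools/list`.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP.
interface JsonRpcRequest {
  jsonrpc: '2.0';
  id: number;
  method: string;
  params?: Record<string, unknown>;
}

let nextRequestId = 1;

// Build the request a client sends to invoke one tool on an MCP server.
function buildToolCall(toolName: string, args: Record<string, unknown>): JsonRpcRequest {
  return {
    jsonrpc: '2.0',
    id: nextRequestId++,
    method: 'tools/call',
    params: { name: toolName, arguments: args },
  };
}

// 'read_file' and 'path' are hypothetical; real names come from 'tools/list'.
const toolCallRequest = buildToolCall('read_file', { path: './data/notes.txt' });
console.log(JSON.stringify(toolCallRequest));
```

Because every server answers the same request shape, the Agent side needs no tool-specific glue code, which is the point of the USB comparison.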

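VII. Handoffs: Passing Work Between Agents

Why Hand Off?

Picture a customer support system:

User: "I want to return an item"
  → the routing Agent sees an after-sales issue → hands off to the after-sales Agent

User: "How do I install it?"
  → the routing Agent sees a technical issue → hands off to the tech support Agent

Implementation

const afterSalesAgent = new Agent({
  name: 'After-sales support',
  instructions: 'You handle after-sales issues such as returns, exchanges, and refunds.',
});

const techAgent = new Agent({
  name: 'Tech support',
  instructions: 'You handle technical issues such as product usage and installation.',
});

const triageAgent = new Agent({
  name: 'Support router',
  instructions: `You are the support router. Pick the right agent for the user's question:
- Returns / exchanges / refunds → hand off to after-sales support
- Installation / usage / faults → hand off to tech support
- Anything else → answer it yourself`,
  handoffs: [afterSalesAgent, techAgent],  // possible handoff targets
});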

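Handoff vs. Agent-as-Tool

These two patterns are easy to confuse. The core difference is who holds control:

Handoff:
  Agent A ──gives up control──▶ Agent B
  B handles the rest of the conversation

Manager (Agent-as-Tool):
  Agent A ──calls──▶ Agent B ──returns result──▶ Agent A
  A stays in charge throughout; B is just a helper that A calls

	Handoff	Manager pattern (asTool)
Control	Transferred to the target Agent	Stays with the main Agent
Rest of the conversation	Handled entirely by the target Agent	Main Agent decides the next step
Best for	Clearly separated domains	Coordinating several results

Which one should you use?

Use Handoff when each Agent owns a completely different domain and the user does not need to return to the main Agent
Use Manager when the main Agent has to combine results from several Agents to make a decision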
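The difference in control flow can be shown with toy agents in plain TypeScript. This is only a sketch of the two patterns, not SDK code: `ToyAgent`, `runWithHandoffs`, and the agents below are invented for the illustration, and their "reasoning" is a simple string check.

```typescript
// A toy agent: answers directly, or names another agent to hand off to.
interface ToyAgent {
  name: string;
  respond: (input: string) => { handoffTo?: ToyAgent; text?: string };
}

const refundAgent: ToyAgent = {
  name: 'after-sales',
  respond: (input) => ({ text: `Refund started for: ${input}` }),
};

// Handoff style: the triage agent gives up control to the specialist.
const toyTriageAgent: ToyAgent = {
  name: 'triage',
  respond: (input) =>
    input.includes('refund') ? { handoffTo: refundAgent } : { text: 'Hello!' },
};

// Manager style: the manager calls the specialist like a function and keeps control.
const toyManagerAgent: ToyAgent = {
  name: 'manager',
  respond: (input) => {
    const specialist = refundAgent.respond(input);
    return { text: `Summary from manager: ${specialist.text ?? ''}` };
  },
};

// Follow handoffs until some agent produces text, and report who answered.
function runWithHandoffs(start: ToyAgent, input: string): { answeredBy: string; text: string } {
  let current = start;
  for (let hop = 0; hop < 5; hop++) {
    const reply = current.respond(input);
    if (reply.handoffTo !== undefined) {
      current = reply.handoffTo; // control moves to the target agent
      continue;
    }
    return { answeredBy: current.name, text: reply.text ?? '' };
  }
  throw new Error('too many handoffs');
}

const viaHandoff = runWithHandoffs(toyTriageAgent, 'I want a refund');
const viaManager = runWithHandoffs(toyManagerAgent, 'I want a refund');
console.log(viaHandoff.answeredBy, '|', viaManager.answeredBy);
```

In the handoff run the final answer belongs to the specialist; in the manager run the specialist's text is only an ingredient, and the manager produces the final answer.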

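VIII. Guardrails: Safety Checks

With tools an Agent can do real work, but is that work safe? Guardrails are the safety inspectors.

Three Places to Check

User input ──▶ [input guardrail] ──▶ Agent ──▶ [output guardrail] ──▶ result
                                       │
                               [tool-level check] (around tool calls)

Input Guardrail: Checking User Input

An input guardrail is an object with a name and an execute function that reports whether the tripwire was triggered:

import { Agent, InputGuardrail } from '@openai/agents';

// checkContent is your own moderation function (not part of the SDK)
const noHarmfulContent: InputGuardrail = {
  name: 'Harmful content check',
  execute: async ({ input }) => {
    const isHarmful = await checkContent(input);
    return {
      tripwireTriggered: isHarmful,
      outputInfo: isHarmful ? 'Input contains inappropriate content' : undefined,
    };
  },
};

const agent = new Agent({
  name: 'Safe assistant',
  inputGuardrails: [noHarmfulContent],
});

When a guardrail trips (tripwireTriggered: true), the Runner throws an InputGuardrailTripwireTriggered error and the run stops.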
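The tripwire mechanism itself is small enough to sketch without the SDK. The names below (`Guard`, `TripwireError`, `runGuarded`, `lengthGuard`) are hypothetical and chosen for the illustration; the sketch only shows the idea that every guard runs before the agent and the first triggered tripwire aborts the run by throwing.

```typescript
interface GuardResult {
  tripwireTriggered: boolean;
  reason?: string;
}
type Guard = { name: string; check: (input: string) => GuardResult };

// Hypothetical error type, named after the behaviour rather than any SDK class.
class TripwireError extends Error {
  guardName: string;
  constructor(guardName: string, reason: string) {
    super(`${guardName}: ${reason}`);
    this.name = 'TripwireError';
    this.guardName = guardName;
  }
}

// Run every guard before the agent; the first tripped wire aborts the run.
function runGuarded(guards: Guard[], input: string, agent: (input: string) => string): string {
  for (const guard of guards) {
    const outcome = guard.check(input);
    if (outcome.tripwireTriggered) {
      throw new TripwireError(guard.name, outcome.reason ?? 'blocked');
    }
  }
  return agent(input);
}

const lengthGuard: Guard = {
  name: 'length-check',
  check: (input) =>
    input.length > 20
      ? { tripwireTriggered: true, reason: 'input too long' }
      : { tripwireTriggered: false },
};

const echoAgent = (input: string): string => `You said: ${input}`;
console.log(runGuarded([lengthGuard], 'hi', echoAgent));
```

Calling code would wrap the run in try/catch and turn the error into a polite refusal for the user.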

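Output Guardrail: Checking the Agent's Reply

An output guardrail has the same shape, but its execute function receives the Agent's output:

import { Agent, OutputGuardrail } from '@openai/agents';

// containsSecret is your own detection function (not part of the SDK)
const noSecretLeak: OutputGuardrail = {
  name: 'Sensitive data check',
  execute: async ({ agentOutput }) => {
    const leaked = containsSecret(agentOutput);
    return {
      tripwireTriggered: leaked,
      outputInfo: leaked ? 'Output contains sensitive information' : undefined,
    };
  },
};

const agent = new Agent({
  name: 'Safe assistant',
  outputGuardrails: [noSecretLeak],
});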

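Tool-Level Check: Checking Tool Calls

Put the validation directly in the tool definition, at the top of execute, so it runs before the real work:

import { tool } from '@openai/agents';
import { z } from 'zod';

// executeSQL is your own database access function (not part of the SDK)
const dbTool = tool({
  name: 'query_database',
  description: 'Query the database',
  parameters: z.object({ sql: z.string() }),
  execute: async ({ sql }) => {
    // Tool-level check: inspect the SQL before running it.
    // Compare in lower case so "WHERE" and "where" are treated alike.
    const normalized = sql.toLowerCase();
    if (normalized.includes('drop')) {
      throw new Error('DROP statements are not allowed');
    }
    if (normalized.includes('delete') && !normalized.includes('where')) {
      throw new Error('DELETE must include a WHERE clause');
    }
    return executeSQL(sql);
  },
});

WARNING
Guardrails are not a cure-all. They are a last line of defense and must not replace other safety measures such as database permissions and API authentication.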
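Plain substring checks on SQL are blunt: a harmless query that reads a column named `dropped_at` contains the letters "drop". One slightly more careful variant, sketched below under the assumption that a keyword heuristic is still acceptable for your use case, matches whole words only. The function name `checkSql` is made up for this example, and the check remains a heuristic rather than a SQL parser, so the warning above still applies.

```typescript
// Heuristic pre-check for model-generated SQL. It is not a SQL parser:
// treat it as one layer on top of database permissions.
function checkSql(sql: string): string | null {
  const normalized = sql.toLowerCase();
  // \b matches whole words, so a column such as "dropped_at" is not flagged.
  if (/\bdrop\b/.test(normalized)) {
    return 'DROP statements are not allowed';
  }
  if (/\bdelete\b/.test(normalized) && !/\bwhere\b/.test(normalized)) {
    return 'DELETE must include a WHERE clause';
  }
  return null; // null means no rule was violated
}

console.log(checkSql('DROP TABLE users'));
console.log(checkSql('SELECT dropped_at FROM users'));
console.log(checkSql('DELETE FROM users WHERE id = 1'));
```

Returning the reason as a value instead of throwing lets the caller decide whether to raise an error or hand the message back to the model.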

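IX. Context: Sharing Data Across Tools

The Problem

Your Agent has several tools, and all of them need to know who the current user is:

Tool A (look up orders) needs userId
Tool B (check balance) needs userId
Tool C (send notification) needs userId

You can hardly ask the user to supply it again for every tool.

The Solution: RunContext

import { tool, RunContext } from '@openai/agents';
import { z } from 'zod';

// 1. Define the context type
interface AppContext {
  userId: string;
  userName: string;
  permissions: string[];
}

// 2. Read the context inside a tool
// queryOrders is your own data access function (not part of the SDK)
const orderTool = tool({
  name: 'query_orders',
  description: "Look up the current user's orders",
  // nullable rather than optional: strict tool schemas expect every field to be present
  parameters: z.object({ keyword: z.string().nullable() }),
  execute: async ({ keyword }, runContext?: RunContext<AppContext>) => {
    // The object passed to run() is available as runContext.context
    const ctx = runContext?.context;
    if (!ctx) throw new Error('Missing run context');
    console.log(`Looking up orders for user ${ctx.userId}`);

    if (!ctx.permissions.includes('order_read')) {
      throw new Error('No permission to view orders');
    }

    return queryOrders(ctx.userId, keyword);
  },
});

// 3. Pass the context in at run time
const result = await runner.run(agent, 'Show my orders', {
  context: {
    userId: 'user_123',
    userName: 'Xiao Ming',
    permissions: ['order_read', 'balance_read'],
  },
});

Context is essentially dependency injection: shared data is injected at run time, and every tool and guardrail can read it. The context stays on your side; it is not sent to the model.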
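Stripped of the SDK, the dependency-injection idea looks like the sketch below. Everything in it (`ContextTool`, `toolbox`, `callTool`) is invented for the illustration; the point is only that the caller supplies one context object and the dispatcher passes it to whichever tool runs.

```typescript
interface AppContext {
  userId: string;
  permissions: string[];
}

// Each tool receives its own arguments plus the shared context.
type ContextTool = (args: Record<string, string>, context: AppContext) => string;

const toolbox: Record<string, ContextTool> = {
  query_orders: (_args, context) => {
    if (!context.permissions.includes('order_read')) {
      throw new Error('no permission to read orders');
    }
    return `orders for ${context.userId}`;
  },
  query_balance: (_args, context) => `balance for ${context.userId}`,
};

// The dispatcher injects the same context object into every tool call,
// so no tool has to ask the user (or the model) for the user id.
function callTool(toolName: string, args: Record<string, string>, context: AppContext): string {
  const toolFn = toolbox[toolName];
  if (toolFn === undefined) {
    throw new Error(`unknown tool: ${toolName}`);
  }
  return toolFn(args, context);
}

const appContext: AppContext = { userId: 'user_123', permissions: ['order_read'] };
console.log(callTool('query_orders', {}, appContext));
console.log(callTool('query_balance', {}, appContext));
```

Because the user id never appears in the tool arguments, the model cannot get it wrong or be talked into supplying someone else's.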

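X. Structured Output

Sometimes you do not want a block of text but a structured data object.

Using outputType

import { Agent, Runner } from '@openai/agents';
import { z } from 'zod';

const analysisAgent = new Agent({
  name: 'Sentiment analysis',
  instructions: "Analyze the sentiment of the user's text.",
  outputType: z.object({
    sentiment: z.enum(['positive', 'negative', 'neutral']),
    confidence: z.number().min(0).max(1),
    keywords: z.array(z.string()),
    summary: z.string(),
  }),
});

const runner = new Runner();
const result = await runner.run(analysisAgent, 'This product is great, I recommend buying it!');

// result.finalOutput is typed according to the schema, for example:
// { sentiment: 'positive', confidence: 0.95, keywords: ['great', 'recommend'], summary: '...' }
console.log(result.finalOutput?.sentiment);  // "positive"

TIP
With outputType set, the SDK parses the model's JSON and validates it against the schema internally, so finalOutput arrives already typed. A reply that does not match the schema surfaces as an error rather than as malformed data.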
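As a rough picture of what "parse and validate" involves, here is a hand-written version for a reduced form of the sentiment shape (the `summary` field is left out for brevity). It is a sketch of the idea, not how the SDK or Zod implements it; `parseAnalysis`, `ParseResult`, and the error messages are made up for the example.

```typescript
type Sentiment = 'positive' | 'negative' | 'neutral';
interface Analysis {
  sentiment: Sentiment;
  confidence: number;
  keywords: string[];
}
type ParseResult = { ok: true; value: Analysis } | { ok: false; error: string };

const SENTIMENTS: string[] = ['positive', 'negative', 'neutral'];

// Parse the model's raw text, then check it field by field.
function parseAnalysis(rawText: string): ParseResult {
  let data: unknown;
  try {
    data = JSON.parse(rawText);
  } catch {
    return { ok: false, error: 'not valid JSON' };
  }
  if (typeof data !== 'object' || data === null) {
    return { ok: false, error: 'expected an object' };
  }
  const { sentiment, confidence, keywords } = data as Record<string, unknown>;
  if (typeof sentiment !== 'string' || !SENTIMENTS.includes(sentiment)) {
    return { ok: false, error: 'sentiment must be positive, negative, or neutral' };
  }
  if (typeof confidence !== 'number' || confidence < 0 || confidence > 1) {
    return { ok: false, error: 'confidence must be between 0 and 1' };
  }
  if (!Array.isArray(keywords) || !keywords.every((k) => typeof k === 'string')) {
    return { ok: false, error: 'keywords must be an array of strings' };
  }
  return {
    ok: true,
    value: { sentiment: sentiment as Sentiment, confidence, keywords: keywords as string[] },
  };
}

const goodReply = parseAnalysis('{"sentiment":"positive","confidence":0.95,"keywords":["great"]}');
const badReply = parseAnalysis('{"sentiment":"positive","confidence":1.5,"keywords":[]}');
console.log(goodReply.ok, badReply.ok);
```

A schema library generates checks like these from a single declaration, which is why the Agent definition only needs the schema.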

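XI. State Management: Letting the Agent Remember

By default every runner.run() call is independent. Real applications need multi-turn conversations.

Method 1: Pass the History Yourself (Most Flexible)

const result1 = await runner.run(agent, 'My name is Xiao Ming');
// Agent replies: "Hello, Xiao Ming!"

// result1.history holds the conversation so far; append the next user message to it
const result2 = await runner.run(agent, [
  ...result1.history,
  { role: 'user', content: 'What is my name?' },
]);
// Agent replies: "Your name is Xiao Ming."

Method 2: Use a Session (Simplest)

Recent SDK versions accept a session option on run. The example uses an in-memory session; check the class name against the version you have installed.

import { MemorySession } from '@openai/agents';

const session = new MemorySession();

// The session stores and replays the conversation history for you
await runner.run(agent, 'My name is Xiao Ming', { session });
await runner.run(agent, 'What is my name?', { session });  // it remembers the name

Which One Should You Choose?

Scenario	Recommendation
Quick prototypes, simple apps	Session
Precise control over what is in the history	Pass `history` yourself
Persisting to a database	Pass it yourself and handle storage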
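What "passing the history yourself" amounts to can be shown without the SDK: the conversation is a plain array that grows by one user message and one reply per turn, and a reply can only draw on what is in that array. The names below (`ChatMessage`, `fakeReply`, `takeTurn`) are hypothetical, and `fakeReply` is a canned stand-in for a model call.

```typescript
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

const INTRO = 'My name is ';

// Hypothetical stand-in for a model call: it only "remembers" what is in the thread.
function fakeReply(thread: ChatMessage[]): string {
  const intro = thread.find((m) => m.role === 'user' && m.content.startsWith(INTRO));
  const last = thread[thread.length - 1];
  if (last !== undefined && last.content === 'What is my name?') {
    return intro ? `Your name is ${intro.content.slice(INTRO.length)}.` : 'I do not know.';
  }
  return 'Nice to meet you!';
}

// One turn: append the user message, get a reply, return the extended thread.
function takeTurn(thread: ChatMessage[], userText: string): ChatMessage[] {
  const withUser: ChatMessage[] = [...thread, { role: 'user', content: userText }];
  return [...withUser, { role: 'assistant', content: fakeReply(withUser) }];
}

let chatThread: ChatMessage[] = [];
chatThread = takeTurn(chatThread, 'My name is Xiao Ming');

// The thread is plain data, so it can be stored as JSON and restored later.
const stored = JSON.stringify(chatThread);
chatThread = takeTurn(JSON.parse(stored) as ChatMessage[], 'What is my name?');
const rememberedAnswer = chatThread.slice(-1)[0]?.content;

// Starting from an empty thread, the same question has nothing to draw on.
const forgottenAnswer = takeTurn([], 'What is my name?').slice(-1)[0]?.content;
console.log(rememberedAnswer, '|', forgottenAnswer);
```

The JSON round trip in the middle is the hook for the database case in the table: save the serialized thread after each turn and load it before the next.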

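XII. Hands-On: Building a Multi-Agent Support System

Let's tie everything together in one complete example.

Requirements

Build a smart support system that can:

Route automatically to different specialist Agents
Look up orders and weather
Check user permissions
Return structured replies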

Complete Code

import { Agent, Runner, tool, InputGuardrail, RunContext } from '@openai/agents';
import { z } from 'zod';

// ==========================================
// 1. Context type and shared reply schema
// ==========================================
interface AppContext {
  userId: string;
  permissions: string[];
}

// After a handoff the final output comes from the agent that finishes the run,
// so every agent uses the same schema to keep the reply structured.
const replySchema = z.object({ answer: z.string() });

// ==========================================
// 2. Tools
// ==========================================
const queryOrders = tool({
  name: 'query_orders',
  description: "Look up the current user's orders",
  parameters: z.object({
    // nullable rather than optional: strict tool schemas expect every field to be present
    status: z.enum(['all', 'pending', 'shipped', 'completed']).nullable(),
  }),
  execute: async ({ status }, runContext?: RunContext<AppContext>) => {
    const ctx = runContext?.context;
    if (!ctx || !ctx.permissions.includes('order_read')) {
      return 'Error: no permission to view orders';
    }
    // Simulated lookup
    return `Orders (${status ?? 'all'}) for user ${ctx.userId}: order #001 (shipped), order #002 (pending)`;
  },
});

const getWeather = tool({
  name: 'get_weather',
  description: 'Look up the weather for a city',
  parameters: z.object({
    city: z.string().describe('City name'),
  }),
  execute: async ({ city }) => {
    // Simulated result
    return `${city}: sunny today, 25°C, a good day to go out`;
  },
});

// ==========================================
// 3. Guardrail
// ==========================================
const inputGuard: InputGuardrail = {
  name: 'Input check',
  execute: async ({ input }) => {
    // input is a string here; it is a list of items when a history is passed in
    const tooLong = typeof input === 'string' && input.length > 500;
    return {
      tripwireTriggered: tooLong,
      outputInfo: tooLong ? 'Input too long; keep it under 500 characters' : undefined,
    };
  },
};

// ==========================================
// 4. Specialist Agents
// ==========================================
const orderAgent = new Agent<AppContext, typeof replySchema>({
  name: 'Order support',
  instructions: 'You handle order questions such as order lookup and shipment tracking. Be concise and friendly.',
  tools: [queryOrders],
  outputType: replySchema,
});

const infoAgent = new Agent<AppContext, typeof replySchema>({
  name: 'Info assistant',
  instructions: 'You handle everyday information questions such as weather. Be concise.',
  tools: [getWeather],
  outputType: replySchema,
});

// ==========================================
// 5. Router Agent (main entry point)
// ==========================================
const routerAgent = new Agent<AppContext, typeof replySchema>({
  name: 'Smart support',
  instructions: `You are the routing hub of a support system. Pick the right agent for the user's question:
- Orders / shipping / returns → hand off to Order support
- Weather / everyday information → hand off to Info assistant
- Greetings / small talk → answer it yourself
Decide quickly.`,
  handoffs: [orderAgent, infoAgent],
  inputGuardrails: [inputGuard],
  outputType: replySchema,
});

// ==========================================
// 6. Run
// ==========================================
async function main() {
  const runner = new Runner();
  const context: AppContext = { userId: 'user_001', permissions: ['order_read'] };

  // Scenario 1: order lookup, scenario 2: weather, scenario 3: greeting
  const questions = ['Where is my order?', "What's the weather in Beijing today?", 'Hello!'];

  for (const [i, question] of questions.entries()) {
    const result = await runner.run(routerAgent, question, { context });
    // lastAgent tells you which agent produced the final output
    console.log(`Scenario ${i + 1}:`, result.finalOutput, 'answered by', result.lastAgent?.name);
    // e.g. Scenario 1: { answer: '...' } answered by Order support
  }
}

main().catch(console.error);

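Walking Through a Run

Taking scenario 1, "Where is my order?", as the example:

1. The Runner runs the router's input guardrail → passes
2. The router Agent receives the message → decides it is an order question
3. The router Agent triggers a handoff → control goes to order support
4. Order support receives the conversation → decides to call the query_orders tool
5. The Runner executes query_orders → looks up the orders → returns the result
6. Order support writes a reply based on the result
7. Order support requests no further tool calls, so the loop ends
8. The Runner returns the final reply, which was produced by order support (the last Agent to run)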

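XIII. Summary and Learning Path

Core Concepts in One Picture

Agent
  ├── instructions  ── the soul: tells it how to act
  ├── tools         ── the hands: let it do work
  ├── handoffs      ── the social side: let it pass work on
  ├── guardrails    ── the limits: keep it in line
  ├── outputType    ── the mold: gives the output a structure
  └── context       ── the shared memory: lets tools share data

Runner (execution engine)
  ├── run()                      ── run to completion
  └── run(..., { stream: true }) ── streaming run

Tools (four kinds)
  ├── Hosted tools      ── provided by OpenAI
  ├── Function tools    ── written by you
  ├── Agent-as-Tool     ── an Agent used as a tool
  └── MCP tools         ── connect to external services

State management
  ├── history       ── managed by hand
  └── session       ── managed automatically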

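Where to Go Next

Once you have the basics, you can explore further:

Streaming: real-time streamed output for typewriter-style UIs
Tracing: visualize an Agent's call chain for debugging and monitoring
Voice Agent: real-time voice interaction with RealtimeAgent
Sandbox Agent: execute code safely in an isolated environment
MCP ecosystem: connect to a growing number of MCP tool servers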
