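Abstract: While everyone watches model capability, the real differentiation is moving down into the harness layer. Starting from open-source projects such as OpenHarness, this article walks through the core architecture, key components, and engineering practices of an agent harness, to help you understand what "model decides what, harness handles how safely" really means.
In 2026, the AI development ecosystem shows an interesting pattern: model capabilities are converging, yet agent experiences differ widely. The same Qwen3.5-Plus or Kimi K2.5 can behave like two different models depending on the agent system it runs in.
The root cause: the model supplies only intelligence, while the harness supplies the hands, eyes, memory, and safety boundaries.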
Agent = Model (intelligence) + Harness (infrastructure)
The core value of Harness Engineering:
All agent harnesses share the same core pattern:
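    while True:
        # 1. Call the model and stream the response
        response = await api.stream(messages, tools)
        # 2. Check whether the model is done
        if response.stop_reason != "tool_use":
            break  # the model has finished the task
        # 3. Execute the tool calls and collect their results
        tool_results = []
        for tool_call in response.tool_uses:
            # Permission check → Hook → Execute → Hook → Result
            result = await harness.execute_tool(tool_call)
            tool_results.append(result)
        messages.append(tool_results)
        # 4. Loop again: the model sees the results and decides the next step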
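The comment "Permission check → Hook → Execute → Hook → Result" in the loop above can be sketched as a small wrapper. This is a minimal illustration, not OpenHarness code: `ToolCall`, `ToolPipeline`, the `is_allowed` callback, and the `echo` tool are all hypothetical names introduced here.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Awaitable, Callable


@dataclass
class ToolCall:
    """Hypothetical shape of a tool call requested by the model."""
    name: str
    args: dict[str, Any] = field(default_factory=dict)


class ToolPipeline:
    """Hypothetical wrapper: permission check -> pre-hooks -> execute -> post-hooks."""

    def __init__(self, tools: dict[str, Callable[..., Awaitable[str]]],
                 is_allowed: Callable[[ToolCall], bool]):
        self.tools = tools
        self.is_allowed = is_allowed
        self.pre_hooks: list[Callable[[ToolCall], None]] = []
        self.post_hooks: list[Callable[[ToolCall, str], None]] = []

    async def execute_tool(self, call: ToolCall) -> str:
        # A denied call is returned as a result string so the model can adapt.
        if not self.is_allowed(call):
            return f"denied: {call.name}"
        for hook in self.pre_hooks:
            hook(call)
        result = await self.tools[call.name](**call.args)
        for hook in self.post_hooks:
            hook(call, result)
        return result


async def echo(text: str) -> str:
    """Toy tool used only for the demo."""
    return text.upper()


async def demo() -> tuple[str, str, list[str]]:
    trace: list[str] = []
    pipeline = ToolPipeline({"echo": echo}, is_allowed=lambda c: c.name != "bash")
    pipeline.pre_hooks.append(lambda c: trace.append(f"pre:{c.name}"))
    pipeline.post_hooks.append(lambda c, r: trace.append(f"post:{c.name}"))
    ok = await pipeline.execute_tool(ToolCall("echo", {"text": "hi"}))
    blocked = await pipeline.execute_tool(ToolCall("bash", {"command": "ls"}))
    return ok, blocked, trace
```

Returning a denial as an ordinary result, rather than raising, is one possible design choice: it keeps the loop alive and lets the model pick another route.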
Key design points:
Tools are the harness's "hands". A typical breakdown:
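| Category | Example tools | Key capabilities |
|---|---|---|
| File I/O | Read, Write, Edit, Glob, Grep | Permission checks, path sandboxing |
| Shell | Bash | Command allowlist, timeout control |
| Search | WebSearch, WebFetch | Rate limiting, result caching |
| Agent | Subagent, SendMessage | Isolated execution, result aggregation |
| Task | TaskCreate, TaskStop | Background lifecycle management |
| MCP | MCPTool, ListResources | Protocol adaptation, resource discovery |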
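Two of the capabilities in the table, path sandboxing and timeout control, are easy to get subtly wrong, so a rough sketch may help. The function names `resolve_in_sandbox` and `run_with_timeout` are made up for this example, and the approach (resolve, then check containment; kill the process on timeout) is one common option rather than what any particular harness does.

```python
import asyncio
import sys
from pathlib import Path


def resolve_in_sandbox(root: str, user_path: str) -> Path:
    """Resolve user_path under root and reject anything that escapes it."""
    base = Path(root).resolve()
    target = (base / user_path).resolve()  # collapses ".." segments
    if not target.is_relative_to(base):  # requires Python 3.9+
        raise PermissionError(f"path escapes sandbox: {user_path}")
    return target


async def run_with_timeout(command: list[str], timeout_s: float) -> str:
    """Run a command without a shell and kill it if it exceeds timeout_s."""
    proc = await asyncio.create_subprocess_exec(
        *command,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.STDOUT,
    )
    try:
        out, _ = await asyncio.wait_for(proc.communicate(), timeout=timeout_s)
    except asyncio.TimeoutError:
        proc.kill()
        await proc.wait()  # reap the process so it does not linger
        return "error: timed out"
    return out.decode()


async def demo() -> tuple[str, str]:
    # Uses the current Python interpreter as a portable stand-in for a shell command.
    fast = await run_with_timeout([sys.executable, "-c", "print('ok')"], 10.0)
    slow = await run_with_timeout(
        [sys.executable, "-c", "import time; time.sleep(5)"], 0.2)
    return fast.strip(), slow
```

Passing the command as an argument list instead of a shell string avoids shell injection, at the cost of not supporting pipes or globbing.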
Engineering notes:
A production-grade harness needs multi-level permission control:
    {
      "permission": {
        "mode": "default",
        "path_rules": [{"pattern": "/etc/*", "allow": false}],
        "denied_commands": ["rm -rf /", "DROP TABLE *"]
      }
    }
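Three permission modes:
| Mode | Behavior | Use case |
|---|---|---|
| Default | Ask before writes and command execution | Day-to-day development |
| Auto | Allow everything | Sandboxes, CI/CD |
| Plan Mode | Block all writes | Large refactors, review first |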
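The three modes in the table map naturally onto a three-valued decision (allow, ask, deny) rather than a boolean. The following is a minimal sketch under that assumption. The `Decision` enum, the `decide` function, and the set of read-only tool names are illustrative choices, not part of any specific harness.

```python
from enum import Enum

# Assumed names for tools that never modify state.
READ_ONLY_TOOLS = {"read", "glob", "grep"}


class Decision(Enum):
    ALLOW = "allow"
    ASK = "ask"    # pause and ask the user for confirmation
    DENY = "deny"


def decide(mode: str, tool_name: str) -> Decision:
    """Map a permission mode and a tool name to a decision."""
    if tool_name in READ_ONLY_TOOLS:
        return Decision.ALLOW  # reads are allowed in every mode
    if mode == "auto":
        return Decision.ALLOW  # sandbox / CI: allow everything
    if mode == "plan":
        return Decision.DENY   # plan mode: block all writes
    return Decision.ASK        # default mode: ask before write/execute
```

For example, `decide("default", "bash")` yields `Decision.ASK`, while the same call in `"plan"` mode yields `Decision.DENY`.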
The hooks mechanism:
The harness memory tiers:
┌─────────────────────────────────────┐
│ Active Context (RAM) │ ← working set for the current task
├─────────────────────────────────────┤
│ Retrieval Layer (Index) │ ← retrieved on demand
├─────────────────────────────────────┤
│ Durable Memory (Disk) │ ← cross-session facts and events
└─────────────────────────────────────┘
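To make the three tiers concrete, here is a toy sketch. `TieredMemory` and its methods are hypothetical, the durable tier is a plain JSONL file, and the retrieval layer is naive keyword overlap standing in for a real index.

```python
import json
import tempfile
from pathlib import Path


class TieredMemory:
    """Toy model of the tiers: durable JSONL file, keyword retrieval, bounded active set."""

    def __init__(self, store_path: str, active_limit: int = 3):
        self.store = Path(store_path)     # Durable Memory (disk)
        self.active: list[str] = []       # Active Context (in memory)
        self.active_limit = active_limit

    def remember(self, fact: str) -> None:
        """Append a fact to the durable store."""
        with self.store.open("a", encoding="utf-8") as f:
            f.write(json.dumps({"fact": fact}) + "\n")

    def retrieve(self, query: str) -> list[str]:
        """Retrieval layer: return stored facts sharing a word with the query."""
        if not self.store.exists():
            return []
        words = set(query.lower().split())
        hits = []
        with self.store.open(encoding="utf-8") as f:
            for line in f:
                fact = json.loads(line)["fact"]
                if words & set(fact.lower().split()):
                    hits.append(fact)
        return hits

    def load_into_context(self, query: str) -> list[str]:
        """Pull matching facts into the active set, evicting the oldest beyond the limit."""
        for fact in self.retrieve(query):
            if fact not in self.active:
                self.active.append(fact)
        self.active = self.active[-self.active_limit:]
        return self.active


def demo() -> list[str]:
    with tempfile.TemporaryDirectory() as tmp:
        memory = TieredMemory(str(Path(tmp) / "memory.jsonl"), active_limit=2)
        memory.remember("auth tokens expire after 15 minutes")
        memory.remember("the deploy script lives in scripts/deploy.sh")
        memory.remember("auth bug reproduced on session refresh")
        return memory.load_into_context("auth")
```

The point of the sketch is the direction of flow: facts live on disk, and only what the current task asks for is promoted into the bounded active context.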
Key principles:
A skill is a reusable unit that packages instructions together with a workflow:
---
name: code-review
description: Systematic code review for bugs and quality
---
# Code Review Skill
## When to use
Use when the user asks for code review.
## Workflow
1. ASSESS: Check file size, language, diff scope
2. ANALYZE: Read code, identify patterns
3. PLAN: Prioritize issues (critical → minor)
4. EXECUTE: Generate review comments
5. VALIDATE: Ensure actionable feedback
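A harness has to split a skill file like the one above into metadata and body before it can decide what to show the model. The sketch below assumes the frontmatter holds only flat `key: value` pairs, as in the example. `parse_skill` is a hypothetical helper, and a real loader would more likely use a YAML parser.

```python
def parse_skill(text: str) -> tuple[dict[str, str], str]:
    """Split a skill file into (frontmatter metadata, markdown body)."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("skill file must start with '---'")
    try:
        # Index of the closing '---' line.
        end = next(i for i, line in enumerate(lines[1:], start=1)
                   if line.strip() == "---")
    except StopIteration:
        raise ValueError("unterminated frontmatter") from None
    meta: dict[str, str] = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            meta[key.strip()] = value.strip()
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


SKILL = """---
name: code-review
description: Systematic code review for bugs and quality
---
# Code Review Skill
## When to use
Use when the user asks for code review.
"""

meta, body = parse_skill(SKILL)
```

One possible use of the split: keep only `name` and `description` in the prompt by default, and load the body when the skill is actually invoked.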
Design notes:
The harness supports spawning subagents and team collaboration:
    # The main agent spawns a subagent
    subagent = await harness.spawn_subagent(
        role="security-auditor",
        tools=["read", "grep", "glob"],  # least privilege
        memory="isolated",  # separate memory space
    )
    result = await subagent.execute(task)
Key patterns:
LLMs are non-deterministic, so the harness must supply the deterministic guarantees:
# Anti-pattern: rely entirely on the LLM
security_analysis = await llm.analyze(code)
# Better: deterministic rules first, with the LLM as augmentation
yara_results = await yara.scan(code)  # deterministic
llm_results = await llm.augment(yara_results)  # supplementary
final_report = merge(yara_results, llm_results)
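The pseudocode above can be made runnable with a few substitutions. In this sketch, plain regular expressions stand in for YARA rules and a callback stands in for the LLM. The rule set, `scan`, and `build_report` are all assumptions made for illustration.

```python
import re
from typing import Callable

# Regex rules standing in for YARA signatures (an assumption of this sketch).
RULES = {
    "hardcoded-secret": re.compile(
        r"(?i)(password|api_key)\s*=\s*['\"][^'\"]+['\"]"),
    "eval-call": re.compile(r"\beval\("),
}


def scan(code: str) -> list[dict]:
    """Deterministic pass: the same input always yields the same findings."""
    findings = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for rule_id, pattern in RULES.items():
            if pattern.search(line):
                findings.append({"rule": rule_id, "line": lineno})
    return findings


def build_report(code: str, augment: Callable[[dict], str]) -> list[dict]:
    """The augmenter may annotate findings but can never add or remove them."""
    return [{**finding, "note": augment(finding)} for finding in scan(code)]


SAMPLE = 'password = "hunter2"\nx = eval(user_input)\n'
report = build_report(SAMPLE, augment=lambda f: f"explain {f['rule']}")
```

The key property is structural: the set of findings is fixed by the deterministic pass, so a hallucinating or inconsistent model can only affect the `note` field.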
Practical notes:
Once an agentic system reaches the stage where it takes actions, logging only prompts and outputs is no longer enough:
    {
      "decision_log": {
        "goal": "Fix the authentication bug",
        "retrieved_context": ["auth.py:45-67", "session.md"],
        "tool_calls": [{"name": "read", "args": {...}}],
        "reasoning_path": "Step 1 → Step 2 → Step 3",
        "chosen_action": "Edit auth.py line 52",
        "outcome": "Success"
      }
    }
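A record with the fields shown above could be modeled as a small dataclass that serializes to one JSON line per decision. This is a sketch. The `DecisionLog` class and its `to_json_line` method are hypothetical, and the `"pending"` default for `outcome` is an added assumption.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class DecisionLog:
    """One auditable decision, mirroring the fields in the JSON example."""
    goal: str
    retrieved_context: list[str] = field(default_factory=list)
    tool_calls: list[dict] = field(default_factory=list)
    reasoning_path: str = ""
    chosen_action: str = ""
    outcome: str = "pending"  # assumed default until the action completes

    def to_json_line(self) -> str:
        """Serialize to a single line suitable for an append-only log file."""
        return json.dumps({"decision_log": asdict(self)}, ensure_ascii=False)


log = DecisionLog(
    goal="Fix the authentication bug",
    retrieved_context=["auth.py:45-67", "session.md"],
    chosen_action="Edit auth.py line 52",
)
log.outcome = "Success"
line = log.to_json_line()
```

An append-only, line-oriented format keeps each decision independently parseable, which helps when replaying or diffing agent runs.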
Why it matters:
The risk of a "loop and hope" agent is that the model holds execution authority directly:
┌──────────────┐ propose ┌──────────────┐
│ LLM │ ────────────────> │ Deterministic │
│ (Intelligence)│ │ DAG/Contract│
└──────────────┘ └──────────────┘
│
▼ execute
┌──────────────┐
│ External │
│ Systems │
└──────────────┘
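The propose/execute split in the diagram can be illustrated with a tiny state machine: the model only proposes, and a deterministic contract decides whether the proposal runs. Everything here is hypothetical, including the `Proposal` and `Contract` names and the allowed-transition graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposal:
    """What the model proposes; it carries no execution authority."""
    action: str
    target: str


# Hypothetical workflow graph: which action may follow which state.
ALLOWED_NEXT = {
    "start": {"read"},
    "read": {"read", "edit"},
    "edit": {"test"},
    "test": {"edit", "done"},
}


class Contract:
    """Deterministic gate between proposals and external systems."""

    def __init__(self) -> None:
        self.state = "start"
        self.executed: list[Proposal] = []

    def submit(self, proposal: Proposal) -> bool:
        """Execute the proposal only if the workflow graph permits it."""
        if proposal.action not in ALLOWED_NEXT.get(self.state, set()):
            return False  # rejected: nothing reaches external systems
        self.executed.append(proposal)  # stands in for the real side effect
        self.state = proposal.action
        return True
```

With this graph, an edit cannot happen before a read, and the run cannot be declared done without a test step, regardless of what the model proposes.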
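A safer architecture:
An LLM is better suited to being the "DJ" than the "singer": it decides which tools to call and how to interpret the results, but it should not be forced to act as the transport layer for every tool output: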
┌─────────┐ tool_call ┌─────────┐
│ LLM │ ────────────> │ Tool │
└─────────┘ └─────────┘
▲ │
│ semantic_result │ event_stream
│ (compact) │ (direct to client)
│ ▼
│ ┌─────────┐
└────────────────────│ Client │
└─────────┘
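The routing in the diagram above can be sketched in a few lines: bulky output is pushed to the client as it is produced, and only a compact summary returns to the model. `ToolOutput`, `run_tool`, and the `emit` callback are names invented for this example.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class ToolOutput:
    """What flows back into the model's context: a compact summary only."""
    semantic_result: str


def run_tool(chunks: Iterable[str], emit: Callable[[str], None]) -> ToolOutput:
    """Stream every chunk to the client and return a short summary for the model."""
    count = 0
    total_chars = 0
    for chunk in chunks:
        emit(chunk)  # event_stream: goes directly to the client
        count += 1
        total_chars += len(chunk)
    return ToolOutput(
        semantic_result=f"streamed {count} chunks, {total_chars} chars")


received: list[str] = []
output = run_tool(iter(["abc", "de"]), emit=received.append)
```

The model's context grows by one short line per tool call, no matter how large the streamed payload was.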
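A more sensible tool contract:
semantic_result - compact state that flows back to the model
event_stream - long text, audio, images, and progress updates that flow directly to the client
With the same model, changing only the harness can lift results from 52.8% to 66.5% (+13.7 percentage points):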
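| Knob | Description | Impact |
|---|---|---|
| System Prompt | Role definition, behavioral constraints | High |
| Tools | Tool selection, description quality | High |
| Middleware | Hooks, retry policy, caching | Medium |
| Context Management | Indexing, compaction, retrieval strategy | High |
| Permission Mode | Default permissions, path rules | Medium |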
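If the knobs in the table are captured in one immutable config object, it becomes straightforward to vary a single knob while holding the model fixed. The sketch below assumes that setup. `HarnessConfig`, its field values, and `changed_knobs` are illustrative and do not come from any particular harness.

```python
from dataclasses import dataclass, fields, replace


@dataclass(frozen=True)
class HarnessConfig:
    """One field per knob in the table (field names are assumptions)."""
    system_prompt: str
    tools: tuple[str, ...]
    middleware: tuple[str, ...] = ()
    context_strategy: str = "full-history"
    permission_mode: str = "default"


def changed_knobs(a: HarnessConfig, b: HarnessConfig) -> list[str]:
    """List the knobs that differ between two configurations."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


baseline = HarnessConfig(
    system_prompt="You are a coding agent.",
    tools=("read", "edit", "bash"),
)
# Vary exactly one knob so any score difference can be attributed to it.
variant = replace(baseline, context_strategy="compact-and-retrieve")
```

Here `changed_knobs(baseline, variant)` reports only `context_strategy`, which is the kind of check that keeps a harness comparison honest.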
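    class AgentHarness:
        """Minimal agent loop: call the model, run permitted tools, repeat."""

        def __init__(self, api, tools, permissions):
            self.api = api
            self.tools = tools
            self.permissions = permissions
            self.messages = []

        async def run(self, prompt, max_turns=10):
            self.messages.append({"role": "user", "content": prompt})
            for turn in range(max_turns):
                response = await self.api.stream(self.messages, self.tools)
                # Record the assistant turn so the final return value is the model's answer.
                # Assumption: the response exposes its text as `.content`.
                self.messages.append({"role": "assistant", "content": response.content})
                if response.stop_reason != "tool_use":
                    break
                for tool_call in response.tool_uses:
                    # Refuse any call the permission engine rejects
                    if not self.permissions.check(tool_call):
                        raise PermissionError(f"Blocked: {tool_call.name}")
                    result = await self.tools.execute(tool_call)
                    self.messages.append({"role": "tool", "content": result})
            return self.messages[-1]["content"]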
    from itertools import islice

    from pydantic import BaseModel, Field

    # BaseTool and ToolResult are assumed to be provided by the harness framework.

    class ReadFileInput(BaseModel):
        path: str = Field(description="File path to read")
        limit: int = Field(default=2000, description="Max lines to read")

    class ReadFileTool(BaseTool):
        name = "read_file"
        description = "Read contents of a file"
        input_model = ReadFileInput

        async def execute(self, args: ReadFileInput) -> ToolResult:
            # Permission check
            if not self.permissions.allow_path(args.path):
                return ToolResult(error=f"Path not allowed: {args.path}")
            # Read at most `limit` lines
            with open(args.path) as f:
                lines = list(islice(f, args.limit))
            return ToolResult(output="".join(lines))
    class HookRegistry:
        def __init__(self):
            self.pre_tool_use = []
            self.post_tool_use = []

        def register_pre(self, hook):
            self.pre_tool_use.append(hook)
            return hook  # return the hook so this works as a decorator

        def register_post(self, hook):
            self.post_tool_use.append(hook)
            return hook  # return the hook so this works as a decorator

    # Usage example
    hooks = HookRegistry()

    @hooks.register_pre
    async def log_tool_call(tool_call):
        print(f"Calling: {tool_call.name}({tool_call.args})")

    @hooks.register_post
    async def audit_result(tool_call, result):
        await audit_log.write(tool_call, result)
    from fnmatch import fnmatch

    class PermissionEngine:
        def __init__(self, config):
            self.mode = config["mode"]
            self.path_rules = config["path_rules"]
            self.denied_commands = config["denied_commands"]

        def check(self, tool_call):
            if self.mode == "auto":
                return True
            if self.mode == "plan":
                # Plan mode: read-only tools only
                return tool_call.name in ["read", "glob", "grep"]
            # Default mode: check rules
            if tool_call.name == "bash":
                for denied in self.denied_commands:
                    if denied in tool_call.args.get("command", ""):
                        return False
            return True

        def allow_path(self, path):
            # The first matching rule wins
            for rule in self.path_rules:
                if fnmatch(path, rule["pattern"]):
                    return rule["allow"]
            return True  # Default allow
┌─────────────────────────────────────┐
│ Application Layer (Workflows) │ ← business logic
├─────────────────────────────────────┤
│ Orchestration Layer (Coordinator) │ ← multi-agent coordination
├─────────────────────────────────────┤
│ Harness Layer (Tools/Memory) │ ← infrastructure
├─────────────────────────────────────┤
│ Model Layer (LLM APIs) │ ← intelligence
└─────────────────────────────────────┘
Future harnesses will look more like "LEGO bricks":
The core insights of Harness Engineering:
While everyone competes on models, the real moat is forming at the harness layer.
Author: ken-kit | Published: 2026-04-03