
Common Workflow Patterns in AI Agent Systems (2): The Evaluator-Optimizer Pattern

Uploaded anonymously

Published: 2026-03-11 13:59:01

The Evaluator-Optimizer Pattern in Agents

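When building large language model (LLM) agents, a core challenge is making sure the generated output is high quality and meets the requirements. Drawing on the Anthropic team's research experience, this article walks through an implementation of the evaluator_optimizer pattern and how to apply it, as a practical guide for developers building effective agents.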

What is the Evaluator-Optimizer pattern

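The Evaluator-Optimizer pattern is a workflow design that improves the quality of LLM output through a repeated "generate, evaluate, optimize" loop: one LLM call produces a response, a second call judges it and returns feedback, and the generator tries again until the evaluator passes the result or an attempt limit is reached. By Anthropic's definition, this pattern belongs to the workflow category of agentic systems, in which LLMs and tools are orchestrated through predefined code paths.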

Core implementation

Let's look at the code in the evaluator_optimizer directory:

1. Core workflow

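def loop(
    task: str, evaluator_prompt: str, generator_prompt: str, max_attempts: int = 10
) -> tuple[str, list[dict[str, str]]]:
    """
    Run the evaluate-optimize loop.
    :param task: task description
    :param evaluator_prompt: prompt for the evaluator
    :param generator_prompt: prompt for the generator
    :param max_attempts: maximum number of attempts
    :return: the final result and the thoughts/result record of every attempt
    """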
    memory = []
    chain_of_thought = []

    thoughts, result = generate(generator_prompt, task)
    memory.append(result)
    chain_of_thought.append(
        {
            "thoughts": thoughts,
            "result": result,
        }
    )
    attempt = 0
    while True:
        evaluation, feedback = evaluate(evaluator_prompt, result, task)
        if evaluation == "PASS":
            return result, chain_of_thought
        attempt += 1
        if attempt >= max_attempts:
            return result, chain_of_thought
        # The context holds earlier results and the latest feedback only;
        # the generator's thoughts are not carried over.
        context = "\n".join(
            [
                "Previous attempts:",
                *[f"- {m}" for m in memory],
                f"\nFeedback: {feedback}",
            ]
        )
        thoughts, result = generate(generator_prompt, task, context)
        memory.append(result)
        chain_of_thought.append(
            {
                "thoughts": thoughts,
                "result": result,
            }
        )
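To make the shape of that context concrete, here is a minimal sketch of the string the generator receives on a retry. The helper name build_context and the sample attempts are hypothetical, and the sketch assumes the separators are newlines.

```python
def build_context(memory: list[str], feedback: str) -> str:
    """Hypothetical helper: assembles the retry context the same way the loop does."""
    return "\n".join(
        [
            "Previous attempts:",
            *[f"- {m}" for m in memory],  # every earlier result, oldest first
            f"\nFeedback: {feedback}",    # only the most recent feedback
        ]
    )


context = build_context(
    ["def add(a, b): return a - b", "def add(a, b): return a + b"],
    "Add type hints.",
)
print(context)
# Previous attempts:
# - def add(a, b): return a - b
# - def add(a, b): return a + b
#
# Feedback: Add type hints.
```

Note the asymmetry: all earlier results accumulate, while feedback from earlier rounds is dropped and only the latest is passed on.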

2. Generation and evaluation functions

def generate(prompt: str, task: str, context: str = "") -> tuple[str, str]:
    """
    Generate a response.
    :param prompt: generator prompt
    :param task: task description
    :param context: previous attempts and feedback, if any
    :return: the model's thoughts and its result
    """
    full_prompt = (
        f"{prompt}\n{context}\nTask: {task}" if context else f"{prompt}\nTask: {task}"
    )
    response = llm_call(full_prompt)
    # The model is expected to wrap its output in <thoughts> and <result> tags.
    thoughts = extract_xml(response, "thoughts")
    result = extract_xml(response, "result")
    print("\nStart of generated response\n")
    print(f"Prompt:\n {full_prompt} \n")
    print(f"Thoughts:\n {thoughts}\n")
    print(f"Result:\n {result}\n")
    print("\nEnd of generated response\n")
    return thoughts, result


def evaluate(prompt: str, content: str, task: str) -> tuple[str, str]:
    """
    Evaluate a response.
    :param prompt: evaluator prompt
    :param content: the response to evaluate
    :param task: task description
    :return: the evaluation verdict and the feedback
    """
    full_prompt = f"{prompt}\nOriginal task: {task}\nContent to evaluate: {content}"
    response = llm_call(full_prompt)
    # The model is expected to wrap its output in <evaluation> and <feedback> tags.
    evaluation = extract_xml(response, "evaluation")
    feedback = extract_xml(response, "feedback")
    print("\nStart of evaluation\n")
    print(f"Prompt:\n {full_prompt} \n")
    print(f"Evaluation:\n {evaluation}\n")
    print(f"Feedback:\n {feedback}\n")
    print("\nEnd of evaluation\n")
    return evaluation, feedback
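Both functions depend on the model answering in a tagged format, but the prompts themselves are not shown here. The pair below is a hypothetical sketch of what they might look like: only the tag names and the "PASS" verdict come from the code, while the wording, the NEEDS_IMPROVEMENT label, and the missing_tags helper are assumptions for illustration.

```python
# Hypothetical prompts: the wording is an assumption; only the tag names
# and the "PASS" verdict are taken from the article's code.
GENERATOR_PROMPT = """\
Complete the task below. If previous attempts and feedback are provided,
address the feedback in your new attempt.

Respond in exactly this format:
<thoughts>
Your understanding of the task and how you will improve on earlier attempts.
</thoughts>
<result>
Your answer.
</result>"""

# The verdict stays on one line because the loop compares the extracted
# text to "PASS" exactly, without stripping whitespace.
EVALUATOR_PROMPT = """\
Evaluate the content below against the original task.
Reply PASS only if every requirement is met; otherwise reply NEEDS_IMPROVEMENT.

Respond in exactly this format:
<evaluation>PASS or NEEDS_IMPROVEMENT</evaluation>
<feedback>What to fix and why.</feedback>"""


def missing_tags(prompt: str, tags: tuple[str, ...]) -> list[str]:
    """Return the tags whose opening or closing form is absent from the prompt."""
    return [t for t in tags if f"<{t}>" not in prompt or f"</{t}>" not in prompt]


# Sanity check that each prompt mentions the tags the parser will look for.
print(missing_tags(GENERATOR_PROMPT, ("thoughts", "result")))
print(missing_tags(EVALUATOR_PROMPT, ("evaluation", "feedback")))
```

A check like missing_tags is cheap insurance: if a prompt is edited and a tag is dropped, the parser returns an empty string and the loop can never see a "PASS".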

3. Helper functions

import re

import dashscope


def llm_call(
    prompt: str, system_prompt: str = "", model: str = "qwen-plus", debug: bool = False
) -> str:
    """Send a single-turn request to the model and return the reply text."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]
    if debug:
        print(messages)
    response = dashscope.Generation.call(
        api_key="************",
        model=model,
        messages=messages,
        temperature=0.8,
        top_p=0.8,
        enable_thinking=False,
        stream=False,
        incremental_output=True,
        response_format={"type": "text"},
        result_format="message",  # or "text"
        enable_search=False,
        max_tokens=4096,
    )
    return response.output.choices[0].message.content


def extract_xml(text: str, tag: str) -> str:
    """
    Extracts the content of the specified XML tag from the given text. Used for parsing structured responses

    Args:
        text (str): The text containing the XML.
        tag (str): The XML tag to extract content from.

    Returns:
        str: The content of the specified XML tag, or an empty string if the tag is not found.
    """
    match = re.search(f"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1) if match else ""
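To see the whole cycle run without an API key, the sketch below replays canned responses through a condensed version of the loop. Everything except extract_xml is hypothetical: make_scripted_llm stands in for llm_call, run_loop is a shortened rewrite rather than the code above, and the NEEDS_IMPROVEMENT label is an assumed verdict name. Unlike the original, it strips whitespace before comparing the verdict, so a "PASS" placed on its own line inside the tag is still recognized.

```python
import re
from typing import Callable


def extract_xml(text: str, tag: str) -> str:
    """Same regex-based tag extraction as in the article."""
    match = re.search(f"<{tag}>(.*?)</{tag}>", text, re.DOTALL)
    return match.group(1) if match else ""


def make_scripted_llm(responses: list[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for llm_call: replays canned responses in order."""
    queue = iter(responses)
    return lambda prompt: next(queue)


def run_loop(
    llm: Callable[[str], str], task: str, max_attempts: int = 3
) -> tuple[str, int]:
    """Condensed generate-evaluate loop; returns the result and the generation count."""
    memory: list[str] = []
    context = ""
    for generation in range(1, max_attempts + 1):
        result = extract_xml(llm(f"{context}\nTask: {task}"), "result").strip()
        memory.append(result)
        verdict = llm(f"Original task: {task}\nContent to evaluate: {result}")
        # Strip whitespace so "<evaluation>\nPASS\n</evaluation>" still passes.
        if extract_xml(verdict, "evaluation").strip() == "PASS":
            return result, generation
        feedback = extract_xml(verdict, "feedback").strip()
        context = "\n".join(
            ["Previous attempts:", *[f"- {m}" for m in memory], f"\nFeedback: {feedback}"]
        )
    return memory[-1], max_attempts  # limit reached: return the last attempt


llm = make_scripted_llm([
    "<thoughts>first try</thoughts><result>def add(a, b): return a - b</result>",
    "<evaluation>NEEDS_IMPROVEMENT</evaluation><feedback>Uses subtraction.</feedback>",
    "<thoughts>fix operator</thoughts><result>def add(a, b): return a + b</result>",
    "<evaluation>\nPASS\n</evaluation><feedback>Correct.</feedback>",
])
final_result, generations = run_loop(llm, "Write add(a, b).")
print(generations, final_result)
# 2 def add(a, b): return a + b
```

Scripting the model this way also gives a cheap regression test for the control flow: the loop's exit conditions can be checked without spending tokens.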