
Building a Fully Automated Long-Running Claude Agent: The Ctrl+C Trap in Harness Agent Development and How to Fix It

Uploaded anonymously

Published: 2026-03-26 08:42:01

Background: Why Build This Plugin?

Claude Code is Anthropic's command-line AI coding assistant. It supports a plugin system that lets developers create custom automated workflows.

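I was helping a user build a Claude Code plugin called long-running-agent. Its core feature is:

A fully automated "multi-round iteration agent": Claude automatically runs repeated cycles of coding, testing, and fixing until every feature passes its tests.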

Plugin Architecture

long-running-agent/
├── harness.py          # Main program: cross-platform multi-session runner
├── commands/
│   ├── init.md         # Prompt for the initializer agent
│   └── code.md         # Prompt for the coding agent
└── README.md

Workflow

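Session 1 - Initialization: Claude analyzes the requirements and creates the feature list and project structure
Session 2+ - Coding loop: Claude implements the features one at a time, runs the tests, and fixes problems
Automatic termination: the run stops when every feature passes its tests or the maximum number of sessions is reached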
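The session loop above can be sketched as follows. This is a minimal illustration, not the plugin's actual harness.py. `AgentState`, `run_harness`, and the two session callables are hypothetical names. A real harness would launch the Claude CLI inside each session instead of calling plain Python functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class AgentState:
    # Hypothetical shared state: feature name -> "do its tests pass?"
    features: Dict[str, bool] = field(default_factory=dict)

    def all_passing(self) -> bool:
        # An empty feature list is treated as "not finished yet"
        return bool(self.features) and all(self.features.values())


def run_harness(init_session: Callable[[AgentState], None],
                coding_session: Callable[[AgentState], None],
                max_sessions: int) -> int:
    """Run Session 1, then coding sessions until done; return sessions used."""
    state = AgentState()
    init_session(state)          # Session 1: feature list + project structure
    sessions = 1
    while sessions < max_sessions and not state.all_passing():
        coding_session(state)    # Session 2+: implement, test, fix
        sessions += 1
    return sessions
```

Passing the sessions in as callables keeps the stopping rule testable without a real CLI. The rule is: stop when all features pass or when `max_sessions` is reached.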

Core Challenges

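harness.py has to manage the Claude CLI subprocess, which means handling:

Starting and terminating the subprocess
Parsing and displaying the output stream in real time
Timeout control
User interrupts (Ctrl+C)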

It was while handling user interrupts that we ran into this strange problem.

The Problem: Ctrl+C Does Nothing

The user reported that when harness.py ran on Windows, pressing Ctrl+C had no effect at all.

The Problematic Code

def run_claude(...):
    cmd = ['claude', '-p', prompt, '--output-format', 'stream-json']
    
    popen_kw = dict(
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
        cwd=str(project_dir),
        encoding='utf-8',
    )
    
    proc = subprocess.Popen(cmd, **popen_kw)
    
    # Main loop that reads the output...
    while True:
        # Process output...
        _stop_event.wait(timeout=0.1)

def _setup_signals():
    def _handler(signum, frame):
        print(f"Signal {signum} received!")
        # Clean up and exit...
    
    signal.signal(signal.SIGINT, _handler)

The signal handler is clearly registered, yet pressing Ctrl+C never triggers it!

Earlier Attempts (All Failed)

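The user had already tried:

Using SetConsoleCtrlHandler via ctypes
Adding a signal.SIGBREAK handler
Switching to os._exit(1)
Adding atexit.register()
Improving the process-tree termination logic

None of these solved the problem.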

Debugging Process

Step 1: Verify the basics

I started with the simplest possible test script:

import signal
import sys
import time

def handler(signum, frame):
    print(f"\nSignal {signum} received!")
    sys.exit(0)

signal.signal(signal.SIGINT, handler)
print("Press Ctrl+C to test...")
while True:
    time.sleep(0.1)

Result: Ctrl+C works normally

This shows that Python's signal handling works correctly on Windows.

Step 2: Add a subprocess

import subprocess

cmd = ['ping', '-t', '127.0.0.1']
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, encoding='utf-8')

while True:
    time.sleep(0.1)

Result: Ctrl+C works normally

Step 3: Add threading

import threading

stop_event = threading.Event()

def reader_thread():
    while not stop_event.is_set():
        raw = proc.stdout.readline()
        # ...

reader = threading.Thread(target=reader_thread, daemon=True)
reader.start()

Result: Ctrl+C works normally

Step 4: Use the real claude CLI

cmd = ['claude', '-p', 'say hello', '--output-format', 'stream-json']
proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, encoding='utf-8')

Result: Ctrl+C does not respond!

The problem finally reproduces, and it lies with the claude CLI.

Step 5: Dig deeper

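I noticed a key detail. While I pressed Ctrl+C frantically, the program did not react. As soon as the claude CLI finished on its own, the signal handler suddenly fired!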

This means:

The signal was indeed sent
but the claude CLI intercepted it,
and the parent process only saw it after claude exited

Step 6: Check the documentation

Running claude --help shows this description of the -p / --print flag:

-p, --print    Print response and exit (useful for pipes). 
               Note: The workspace trust dialog is skipped when Claude 
               is run with the -p mode.

-p is a non-interactive mode intended for pipes. In this mode, the claude CLI may capture and handle the Ctrl+C signal itself.

Step 7: The fix

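The root cause is that the claude CLI shares the same console (TTY) as the parent process. When the user presses Ctrl+C, the signal is delivered to both processes. The claude CLI catches it but does not propagate it correctly.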

The fix is to isolate the subprocess's stdin:

# Original code
popen_kw = dict(
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,
)

# Fixed code
popen_kw = dict(
    stdin=subprocess.DEVNULL,  # The key change!
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,
)

With stdin=subprocess.DEVNULL, the child's stdin is detached from the parent's console, so the claude CLI no longer intercepts Ctrl+C.

Result: Ctrl+C works normally!
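The sketch below shows what `stdin=subprocess.DEVNULL` changes from the child's side. It uses a throwaway Python child in place of claude. This is an assumption for illustration only, and it does not reproduce claude's own Ctrl+C handling. The child's stdin is the null device instead of the shared console, so a read hits end-of-file immediately. There is no console input left for the child to consume. `probe_child_stdin` is a hypothetical helper name.

```python
import subprocess
import sys

# Throwaway child standing in for claude: prints how many characters
# it could read from stdin before reaching end-of-file.
CHILD_SRC = "import sys; print(len(sys.stdin.read()))"


def probe_child_stdin() -> str:
    """Run the child with stdin=DEVNULL and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SRC],
        stdin=subprocess.DEVNULL,  # null device, not the shared console
        stdout=subprocess.PIPE,
        text=True,
        check=True,
    )
    return result.stdout.strip()


print(probe_child_stdin())  # prints 0: stdin is at EOF immediately
```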

Extra Improvement: Guarding Against Accidental Presses

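Because this plugin can run for a long time (possibly hours), an accidental Ctrl+C would be costly. I therefore added a press-twice-to-confirm mechanism: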

_ctrl_c_count: int = 0
_last_ctrl_c_time: float = 0.0
CTRL_C_TIMEOUT: float = 2.0  # Maximum interval between the two Ctrl+C presses

def _handler(signum, frame):
    global _ctrl_c_count, _last_ctrl_c_time
    current_time = time.time()
    
    if current_time - _last_ctrl_c_time > CTRL_C_TIMEOUT:
        _ctrl_c_count = 1
    else:
        _ctrl_c_count += 1
    
    _last_ctrl_c_time = current_time
    
    if _ctrl_c_count >= 2:
        _cleanup_and_exit(f"signal {signum} — stopping…")
    else:
        log("Press Ctrl+C again to confirm exit (or wait 2 seconds to keep running)")

I also added a background thread that tells the user the program is continuing once the timeout has passed:

def _timeout_checker():
    global _ctrl_c_count
    while True:
        time.sleep(0.5)
        if _ctrl_c_count == 1 and (time.time() - _last_ctrl_c_time > CTRL_C_TIMEOUT):
            _ctrl_c_count = 0
            log("Continuing...")

checker = threading.Thread(target=_timeout_checker, daemon=True)
checker.start()
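Here is one possible variant of the double-press logic above, sketched under a few assumptions. It uses `time.monotonic()` instead of `time.time()`, so a system clock change cannot shorten or stretch the window. It takes the clock as a parameter, so the timing can be tested without sleeping. Only `press()` mutates state, so the checker thread never writes a shared counter. `DoublePressGuard` is a hypothetical name and is not part of the original harness.py.

```python
import time
from typing import Callable


class DoublePressGuard:
    """Treat a second press within `window` seconds as a confirmed exit.

    Only press() changes state. It is meant to be called from the signal
    handler, which Python runs on the main thread. Other threads only call
    the read-only pending(), so no lock is needed. Taking a Lock inside a
    signal handler can deadlock if the main thread already holds it.
    """

    def __init__(self, window: float = 2.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.window = window
        self._clock = clock            # injectable so tests need no sleep()
        self._last = float("-inf")     # time of the unconfirmed first press

    def press(self) -> bool:
        """Record one Ctrl+C; return True if it confirms an earlier press."""
        now = self._clock()
        if now - self._last <= self.window:
            self._last = float("-inf")  # confirmed: reset the window
            return True
        self._last = now                # first press: open the window
        return False

    def pending(self) -> bool:
        """True while a first press is still waiting for confirmation."""
        return self._clock() - self._last <= self.window
```

In `_handler`, `guard.press()` returning True would lead to `_cleanup_and_exit(...)`; otherwise the handler logs the prompt. The checker thread can remember whether it last saw `pending()` return True and log "Continuing..." once that flips to False.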

Technical Summary

Root Cause

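When a child process shares a console with its parent, Ctrl+C is sent to every process attached to that console. If the child catches the signal and does not propagate it correctly, the parent's signal handler is never triggered.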

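This is especially common when building Harness Agent style applications, because:

You need to run the Claude CLI in -p mode (non-interactive)
You need to read the subprocess output in real time
You need to let the user interrupt at any moment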

Solutions

Scenario	Solution
Isolate the child from console signals	`stdin=subprocess.DEVNULL` or `stdin=subprocess.PIPE`
Process-group isolation on Windows	`creationflags=CREATE_NEW_PROCESS_GROUP`
Process-group isolation on Unix	`start_new_session=True`
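The three rows of the table can be combined into one helper. This is a sketch that assumes the same stdout, stderr, and encoding choices as harness.py. `isolated_popen_kwargs` is a hypothetical name. `subprocess.CREATE_NEW_PROCESS_GROUP` is only defined on Windows, so the code falls back to its numeric value (0x00000200) when the attribute is missing.

```python
import os
import subprocess
from typing import Any, Dict


def isolated_popen_kwargs(cwd: str,
                          is_windows: bool = os.name == "nt") -> Dict[str, Any]:
    """Popen keyword arguments combining the isolation options in the table."""
    kw: Dict[str, Any] = dict(
        stdin=subprocess.DEVNULL,    # child does not read the shared console
        stdout=subprocess.PIPE,      # parent parses the stream-json output
        stderr=subprocess.DEVNULL,
        cwd=cwd,
        encoding="utf-8",
        errors="replace",
    )
    if is_windows:
        # The constant only exists in the subprocess module on Windows
        kw["creationflags"] = getattr(subprocess, "CREATE_NEW_PROCESS_GROUP",
                                      0x00000200)
    else:
        kw["start_new_session"] = True  # own session and process group
    return kw
```

Usage would look like `proc = subprocess.Popen(cmd, **isolated_popen_kwargs(str(project_dir)))`. According to the Win32 CreateProcess documentation, CTRL+C is disabled for processes in a group created with CREATE_NEW_PROCESS_GROUP. With that flag, the harness is responsible for stopping claude itself.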

Lessons Learned

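Narrow it down step by step: start from the simplest code and add complexity gradually to pinpoint the problem
Understand your tools: the claude CLI in -p mode has its own signal-handling behavior
Isolation is key: isolating stdin/stdout/stderr in subprocesses avoids many signal-handling problems
User experience: long-running programs need protection against accidental interrupts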

Complete Code

The key parts of the fixed harness.py:

# Signal handling
_ctrl_c_count: int = 0
_last_ctrl_c_time: float = 0.0
CTRL_C_TIMEOUT: float = 2.0

def _setup_signals() -> None:
    global _ctrl_c_count, _last_ctrl_c_time
    
    def _timeout_checker():
        global _ctrl_c_count
        while True:
            time.sleep(0.5)
            if _ctrl_c_count == 1 and (time.time() - _last_ctrl_c_time > CTRL_C_TIMEOUT):
                _ctrl_c_count = 0
                log("Continuing...")
    
    def _handler(signum, frame):
        global _ctrl_c_count, _last_ctrl_c_time
        current_time = time.time()
        
        if current_time - _last_ctrl_c_time > CTRL_C_TIMEOUT:
            _ctrl_c_count = 1
        else:
            _ctrl_c_count += 1
        
        _last_ctrl_c_time = current_time
        
        if _ctrl_c_count >= 2:
            _cleanup_and_exit(f"signal {signum} — stopping…")
        else:
            log("Press Ctrl+C again to confirm exit (or wait 2 seconds to keep running)")

    signal.signal(signal.SIGINT, _handler)
    signal.signal(signal.SIGTERM, _handler)
    if IS_WINDOWS:
        signal.signal(signal.SIGBREAK, _handler)
    
    checker = threading.Thread(target=_timeout_checker, daemon=True)
    checker.start()

# Launch the subprocess
popen_kw: dict = dict(
    stdin=subprocess.DEVNULL,  # Isolate stdin so the child cannot capture Ctrl+C
    stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL,
    cwd=str(project_dir),
    encoding='utf-8',
    errors='replace',
)

if IS_WINDOWS:
    popen_kw['creationflags'] = subprocess.CREATE_NEW_PROCESS_GROUP  # 0x00000200
else:
    popen_kw['start_new_session'] = True

proc = subprocess.Popen(cmd, **popen_kw)

Tips for Harness Agent Developers

If you are building a similar automated agent tool, here are some suggestions:

1. Signal handling

# Always isolate stdin
stdin=subprocess.DEVNULL  # or subprocess.PIPE

# On Windows, also isolate the process group
if IS_WINDOWS:
    popen_kw['creationflags'] = 0x00000200  # CREATE_NEW_PROCESS_GROUP

2. Output stream handling

Read the output in a separate thread so the main thread is never blocked:

def reader_thread():
    while not stop_event.is_set():
        raw = proc.stdout.readline()
        if not raw:
            break
        output_queue.append(raw)

reader = threading.Thread(target=reader_thread, daemon=True)
reader.start()
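Here is a slightly fuller version of the reader-thread pattern. It uses `queue.Queue` in place of the `output_queue` list and a `None` sentinel to mark end of stream. The main thread waits on the queue with a 0.1 s timeout, like the `_stop_event.wait(timeout=0.1)` loop in the original code. That keeps it responsive to signal handlers and lets it enforce a per-session deadline. `stream_lines` and its parameters are hypothetical.

```python
import queue
import subprocess
import threading
import time
from typing import List, Optional


def stream_lines(cmd: List[str], timeout: float,
                 stop_event: Optional[threading.Event] = None) -> List[str]:
    """Collect a child's stdout lines without blocking the main thread.

    Stops at end of stream, after `timeout` seconds, or when `stop_event`
    is set; in the last two cases the child is killed.
    """
    if stop_event is None:
        stop_event = threading.Event()
    proc = subprocess.Popen(cmd, stdin=subprocess.DEVNULL,
                            stdout=subprocess.PIPE, text=True)
    lines: "queue.Queue[Optional[str]]" = queue.Queue()

    def reader() -> None:
        # Blocking reads happen here, off the main thread
        for raw in proc.stdout:
            lines.put(raw.rstrip("\n"))
        lines.put(None)                    # sentinel: stream closed

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    deadline = time.monotonic() + timeout
    out: List[str] = []
    eof = False
    while not stop_event.is_set() and time.monotonic() < deadline:
        try:
            item = lines.get(timeout=0.1)  # short wait keeps Ctrl+C responsive
        except queue.Empty:
            continue
        if item is None:
            eof = True
            break
        out.append(item)
    if not eof:
        proc.kill()                        # deadline hit or stop requested
    proc.wait()
    thread.join(timeout=1.0)
    if not thread.is_alive():
        proc.stdout.close()
    return out
```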

3. Process cleanup

Make sure the subprocess and all of its descendants are terminated:

def kill_process_tree(pid: int) -> None:
    if IS_WINDOWS:
        subprocess.run(['taskkill', '/T', '/F', '/PID', str(pid)])
    else:
        os.killpg(os.getpgid(pid), signal.SIGKILL)
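kill_process_tree goes straight to a forced kill. A gentler sketch first asks the process group to stop and only force-kills after a grace period. It assumes the child was started with `start_new_session=True` on POSIX, so its process group ID equals its PID. On Windows it assumes the child was started with `CREATE_NEW_PROCESS_GROUP`, which is the case where `Popen.send_signal(signal.CTRL_BREAK_EVENT)` is supported. `stop_process_tree`, `_force_kill`, and `grace` are hypothetical names.

```python
import os
import signal
import subprocess
import sys


def _force_kill(proc: subprocess.Popen) -> None:
    """Forcefully terminate the child and its descendants."""
    if sys.platform == "win32":
        # /T kills the whole tree, /F forces termination
        subprocess.run(["taskkill", "/T", "/F", "/PID", str(proc.pid)],
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    else:
        try:
            os.killpg(proc.pid, signal.SIGKILL)   # pgid == pid (new session)
        except ProcessLookupError:
            pass                                  # group already gone


def stop_process_tree(proc: subprocess.Popen, grace: float = 5.0) -> None:
    """Ask the child's process group to exit, then force-kill after `grace` s."""
    if proc.poll() is not None:
        return                                    # already exited
    try:
        if sys.platform == "win32":
            # Delivered to the group created by CREATE_NEW_PROCESS_GROUP
            proc.send_signal(signal.CTRL_BREAK_EVENT)
        else:
            os.killpg(proc.pid, signal.SIGTERM)   # graceful request
    except OSError:
        pass                                      # exited in the meantime
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        _force_kill(proc)
        proc.wait()
```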