
RAG Project in Practice, Part 1 (GraphRAG Optimization)

Uploaded anonymously

Published: 2026-02-04 14:15:01

I. How GraphRAG differs from traditional RAG

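Feature	Traditional RAG	GraphRAG
Retrieval core	Semantic similarity (vector distance)	Semantics + topological relations (nodes, edges, network)
Depth of understanding	Only finds passages that "look similar"	Can uncover implicit connections through multi-hop traversal
Information organization	Isolated document chunks	A structured network of entities and relations
Complex questions	Struggles with "why" and "what is the difference" questions	Well suited to complex queries that require logical reasoning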

II. Project structure

1. Data preparation layer

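Process: extract the recipe, ingredient, and step nodes from Neo4j, then use the relationships queried with Cypher to "stitch" them into structured Markdown documents, one per recipe.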

        # Standard way to talk to Neo4j from Python: open a session on the driver.
        # The with statement closes the session when the block exits, so it does not hold on to resources.
        with self.driver.session() as session:
            # Load all recipe nodes and read their categories from the BELONGS_TO_CATEGORY relationship.
            # MATCH finds nodes -> WHERE filters -> OPTIONAL MATCH keeps recipes without a category -> WITH aggregates.
            # Note: nodeId is compared as a string, so the filter is lexicographic.
            # '未知' ("unknown") is the fallback category.
            recipes_query = """
            MATCH (r:Recipe)
            WHERE r.nodeId >= '200000000'
            OPTIONAL MATCH (r)-[:BELONGS_TO_CATEGORY]->(c:Category)
            WITH r, collect(c.name) as categories
            RETURN r.nodeId as nodeId, labels(r) as labels, r.name as name,
                   properties(r) as originalProperties,
                   CASE WHEN size(categories) > 0
                        THEN categories[0]
                        ELSE COALESCE(r.category, '未知') END as mainCategory,
                   CASE WHEN size(categories) > 0
                        THEN categories
                        ELSE [COALESCE(r.category, '未知')] END as allCategories
            ORDER BY r.nodeId
            """

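Purpose: turn the "nodes" of the graph back into "text" that an LLM can easily understand, while keeping the node IDs so results can be traced back to the graph later.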
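
The stitching step itself is not shown, so here is a minimal sketch of what it might look like, assuming each recipe has already been fetched together with its ingredients and steps. The function name `recipe_to_markdown`, the dict shapes, and the `amount`, `order`, and `text` fields are hypothetical and are not taken from the project.

```python
from typing import Dict, List


def recipe_to_markdown(recipe: Dict, ingredients: List[Dict], steps: List[Dict]) -> str:
    """Render one recipe node and its neighbours as a Markdown document."""
    lines = [f"# {recipe['name']}", ""]
    # Keep the graph node ID in the text so a retrieved chunk can be traced back.
    lines.append(f"- nodeId: {recipe['nodeId']}")
    lines.append(f"- category: {recipe.get('mainCategory', 'unknown')}")

    lines += ["", "## Ingredients", ""]
    for ing in ingredients:
        amount = ing.get("amount")  # hypothetical optional field
        lines.append(f"- {ing['name']}" + (f" ({amount})" if amount else ""))

    lines += ["", "## Steps", ""]
    # Steps are sorted by a hypothetical `order` field before numbering.
    for i, step in enumerate(sorted(steps, key=lambda s: s["order"]), start=1):
        lines.append(f"{i}. {step['text']}")
    return "\n".join(lines)


doc = recipe_to_markdown(
    {"nodeId": "200000001", "name": "Mapo Tofu", "mainCategory": "Sichuan"},
    [{"name": "tofu", "amount": "400 g"}, {"name": "doubanjiang"}],
    [{"order": 2, "text": "Simmer the tofu."}, {"order": 1, "text": "Fry the paste."}],
)
print(doc)
```

Writing the node ID into the document body is one simple way to keep the link back to the graph; storing it as chunk metadata would work as well.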

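2. Index construction layer

Process:

Vector index: Milvus stores the text vectors and handles fuzzy semantic search.
Graph index (KV): the key (K) is an index key (a short word or phrase) and the value (V) is a detailed descriptive paragraph (including the relevant text fragments). Entities (recipes, ingredients, cooking steps) are stored in an in-memory dictionary, which handles exact keyword matching.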

    # Process recipe entities
    for recipe in recipes:
        entity_id = recipe.node_id
        # Fall back to an ID-based name when the recipe has no name ("菜谱" = "recipe")
        entity_name = recipe.name or f"菜谱_{entity_id}"

        # Build the detailed content; the Chinese labels are part of the indexed text
        content_parts = [f"菜品名称: {entity_name}"]  # "Dish name"

        # Pull the descriptive attributes out of the recipe's properties
        if hasattr(recipe, 'properties'):
            props = recipe.properties
            if props.get('description'):
                content_parts.append(f"描述: {props['description']}")  # "Description"
            if props.get('category'):
                content_parts.append(f"分类: {props['category']}")  # "Category"
            if props.get('cuisineType'):
                content_parts.append(f"菜系: {props['cuisineType']}")  # "Cuisine"
            if props.get('difficulty'):
                content_parts.append(f"难度: {props['difficulty']}")  # "Difficulty"
            if props.get('cookingTime'):
                content_parts.append(f"制作时间: {props['cookingTime']}")  # "Cooking time"

        # Create the key-value pair
        entity_kv = EntityKeyValue(
            entity_name=entity_name,
            index_keys=[entity_name],  # the name is the only index key
            value_content='\n'.join(content_parts),  # one attribute per line
            entity_type="Recipe",
            metadata={
                "node_id": entity_id,
                "properties": getattr(recipe, 'properties', {})
            }
        )

        # Main store, keyed by node ID
        self.entity_kv_store[entity_id] = entity_kv
        # Lookup directory, keyed by name (assumes key_to_entities is a defaultdict(list))
        self.key_to_entities[entity_name].append(entity_id)

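Purpose: ensure the system can both grasp what the user means beyond the literal wording (vectors) and recall specific entities, such as a recipe by its name (key-value pairs).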
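
The snippet above only covers writing to the index. Below is a minimal sketch of how the exact keyword lookup could work on the read side. The `EntityKeyValue` fields mirror the constructor call above, but the dataclass definition, the `EntityIndex` wrapper, and the substring-matching rule in `lookup` are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class EntityKeyValue:
    # Field names follow the constructor call above; the real class may differ.
    entity_name: str
    index_keys: List[str]
    value_content: str
    entity_type: str
    metadata: Dict = field(default_factory=dict)


class EntityIndex:
    def __init__(self) -> None:
        self.entity_kv_store: Dict[str, EntityKeyValue] = {}
        # defaultdict(list) lets .append() work on a key seen for the first time.
        self.key_to_entities: Dict[str, List[str]] = defaultdict(list)

    def add(self, entity_id: str, kv: EntityKeyValue) -> None:
        self.entity_kv_store[entity_id] = kv
        for key in kv.index_keys:
            self.key_to_entities[key].append(entity_id)

    def lookup(self, query: str) -> List[EntityKeyValue]:
        """Return every entity whose index key appears verbatim in the query."""
        hits: List[EntityKeyValue] = []
        for key, ids in self.key_to_entities.items():
            if key in query:
                hits.extend(self.entity_kv_store[i] for i in ids)
        return hits


index = EntityIndex()
index.add(
    "200000001",
    EntityKeyValue(
        entity_name="Mapo Tofu",
        index_keys=["Mapo Tofu"],
        value_content="Dish name: Mapo Tofu\nCuisine: Sichuan",
        entity_type="Recipe",
        metadata={"node_id": "200000001"},
    ),
)
hits = index.lookup("How long does Mapo Tofu take?")
print([kv.entity_name for kv in hits])
```

A linear scan over the keys is enough for a small in-memory index; a larger one would probably want a trie or an inverted index instead.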

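3. Intelligent routing layer

Process: an LLM analyzes the user's question up front.
- Simple questions $\rightarrow$ traditional hybrid retrieval.
- Complex questions (containing "why", "how", or "relation") $\rightarrow$ graph RAG retrieval.
Purpose: lower cost and higher efficiency. Simple questions do not trigger an expensive search over the whole graph, and complex questions do not miss relevant knowledge.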
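
The routing decision can be sketched as follows. The post uses an LLM for the analysis; this sketch swaps in a plain keyword check as a stand-in so that it runs without a model, and lets the caller pass any classifier (for example an LLM call) instead. The names `route`, `keyword_complexity`, and the route labels are hypothetical.

```python
import re
from typing import Callable

# Cue words from the routing rule above; a stand-in for the LLM analysis.
COMPLEX_CUES = {"why", "how", "relation", "related", "relationship"}


def keyword_complexity(query: str) -> str:
    """Label a query 'complex' if it contains one of the cue words as a whole word."""
    words = set(re.findall(r"[a-z]+", query.lower()))
    return "complex" if words & COMPLEX_CUES else "simple"


def route(query: str, classify: Callable[[str], str] = keyword_complexity) -> str:
    """Return the name of the retrieval path to use for this query."""
    return "graph_rag" if classify(query) == "complex" else "hybrid"


print(route("What is in mapo tofu?"))
print(route("Why does tofu pair well with doubanjiang?"))
```

Matching whole words rather than substrings keeps "show me a recipe" from being treated as a "how" question. A real LLM classifier would plug in through the `classify` parameter.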

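4. Core retrieval layer

Process:
- Dual-level retrieval: search at both the entity level (specific recipes) and the topic level (cuisine/style).
- Multi-hop traversal: the core advantage of graph RAG. Following the graph's edges for 2-3 hops uncovers implicit links between pieces of knowledge.
- Merge strategy: round-robin interleaving combines the graph results and the vector results fairly.
Purpose: this is the essence of GraphRAG. What it brings back is not a single passage but a network of logical relations.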
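
Multi-hop traversal and round-robin merging can be illustrated on a toy in-memory graph. This is a minimal sketch and not the project's code: the adjacency-dict graph, the function names, and the duplicate-skipping in the merge are all assumptions, and a real system would run the traversal in Neo4j.

```python
from collections import deque
from itertools import zip_longest
from typing import Dict, List


def multi_hop(graph: Dict[str, List[str]], start: str, max_hops: int = 2) -> Dict[str, int]:
    """Breadth-first walk along outgoing edges; returns node -> hop distance."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if dist[node] == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in graph.get(node, []):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    del dist[start]  # the start node is not a result
    return dist


def round_robin_merge(*ranked_lists: List[str], limit: int = 5) -> List[str]:
    """Take one item from each list in turn, skipping duplicates."""
    merged: List[str] = []
    seen = set()
    for tier in zip_longest(*ranked_lists):
        for item in tier:
            if item is not None and item not in seen:
                seen.add(item)
                merged.append(item)
    return merged[:limit]


graph = {
    "Mapo Tofu": ["tofu", "Sichuan"],
    "tofu": ["soybean"],
    "soybean": ["soy sauce"],
    "Sichuan": ["Kung Pao Chicken"],
}
reached = multi_hop(graph, "Mapo Tofu", max_hops=2)
graph_hits = list(reached)                 # ordered by hop distance
vector_hits = ["doc_a", "Sichuan", "doc_b"]  # made-up vector search results
merged = round_robin_merge(graph_hits, vector_hits, limit=4)
print(reached)
print(merged)
```

With a two-hop limit, "Kung Pao Chicken" is reached through the shared "Sichuan" node even though it has no direct edge to "Mapo Tofu", while "soy sauce" at three hops is left out. Interleaving keeps either source from crowding out the other when the result list is truncated.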