Tobacco grading question answering system integrating knowledge graph and semantic information

Fund Project: National Natural Science Foundation of China (61761024)

    Abstract:

    Knowledge in the tobacco leaf grading domain is redundant, and no specialized platform exists for academic retrieval. To address this, multi-source tobacco grading data were collected and a tobacco grading knowledge graph was constructed with a top-down method; an intelligent question answering system was then developed on this basis. The core techniques are as follows. 1) Tobacco grading data were collected, triples were extracted through named entity recognition (NER) and relation extraction (RE), and the triples were imported into Neo4j for storage. 2) For semantic parsing of questions, a BERT-BiGRU-MHSA-CRF model fused with graph data was used to improve entity recognition in questions, and a self-attention mechanism was integrated into the BERT-TextCNN model to identify the user's grading intent. Cypher query statements were then constructed automatically by matching a template and filling its slots, and the most precise answer was retrieved from the Neo4j knowledge base and returned. The results showed that the constructed knowledge graph contains 6 620 entities and more than 14 000 relations. The F1 score (the harmonic mean of precision and recall) of the question entity recognition model BERT-BiGRU-MHSA-CRF was 94.12%, and that of the grading intent recognition model BERT-TextCNN-Attention was 98.77%. In summary, the system quickly retrieves and accurately answers multiple types of questions related to tobacco grading and can assist graders.
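
    The template-matching and slot-filling step mentioned in the abstract can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the paper: the intent labels, the node labels (Grade, Feature, LeafPart), the relationship types, and the build_cypher helper are all hypothetical. The sketch also passes the slot value as a Cypher parameter ($entity) rather than substituting it into the query string, which is a design choice of this example, not necessarily the authors' approach.

```python
# Hypothetical intent -> Cypher template table. The intent names, node labels,
# and relationship types are illustrative assumptions, not the paper's schema.
CYPHER_TEMPLATES = {
    "grade_feature": (
        "MATCH (g:Grade {name: $entity})-[:HAS_FEATURE]->(f:Feature) "
        "RETURN f.name AS answer"
    ),
    "part_grades": (
        "MATCH (p:LeafPart {name: $entity})-[:HAS_GRADE]->(g:Grade) "
        "RETURN g.name AS answer"
    ),
}


def build_cypher(intent, entities):
    """Select the template for the predicted intent and fill its single slot.

    `intent` stands in for the output of an intent classifier and `entities`
    for the output of a question NER model; both are assumed inputs here.
    Returns the query text and the parameter dict to send with it.
    """
    if intent not in CYPHER_TEMPLATES:
        raise KeyError(f"no template for intent {intent!r}")
    if not entities:
        raise ValueError("no entity recognized in the question")
    # The slot value travels as a query parameter, so it is never
    # concatenated into the Cypher text itself.
    return CYPHER_TEMPLATES[intent], {"entity": entities[0]}


# Example: a question about the features of a grade code such as "B2F".
query, params = build_cypher("grade_feature", ["B2F"])
```

    With the official Neo4j Python driver, a pair like this would typically be executed as session.run(query, params); that call is left out so the sketch runs without a database.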

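Cite this article

Chen Ting, Zhu Changqun. Tobacco grading question answering system integrating knowledge graph and semantic information[J]. Journal of Hunan Agricultural University (Natural Sciences), 2025, 51(3): 97-109.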

History
  • Published online: 2025-07-15