Conference: Third International Symposium on Information Processing

Auto-labeling Terms Based on Multi-scanning Strategy

Abstract

To construct a term corpus of physics teaching materials for elementary education, the characteristics of physics terms were studied, prediction templates for unknown terms were built, rules for identifying terms were extracted, the labeling errors of the maximum matching algorithm were analyzed, and finally an auto-labeling system was developed. The algorithm first scans the text and labels terms that match the rule templates. Second, it takes the terms in the base glossary as anchor points and locates every anchor point with the maximum matching algorithm. Finally, it scans the context of each anchor point to judge whether the candidate string is a term. Combined with the predictive and restrictive functions of the rules, this method makes full use of the term information in the base glossary and achieves higher precision and recall. The F-measure reaches about 84% in the open test.
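The three scanning passes described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' system: the abstract does not give the template syntax, the glossary, or the context rule, so the regular-expression templates, the `modifiers` set, and all function names here are hypothetical stand-ins. In particular, the context pass is assumed to do nothing more than extend an anchor leftward when a known modifier directly precedes it.

```python
import re


def rule_scan(text, templates):
    """Pass 1: label every span that matches a rule template.

    Templates are modeled as regular expressions (an assumption).
    """
    spans = []
    for pattern in templates:
        for m in re.finditer(pattern, text):
            spans.append((m.start(), m.end()))
    return spans


def find_anchors(text, glossary):
    """Pass 2: forward maximum matching against the base glossary.

    At each position the longest glossary entry starting there wins.
    """
    max_len = max((len(term) for term in glossary), default=0)
    anchors = []
    i = 0
    while i < len(text):
        match_end = None
        for length in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + length] in glossary:
                match_end = i + length
                break
        if match_end is None:
            i += 1
        else:
            anchors.append((i, match_end))
            i = match_end
    return anchors


def expand_with_context(text, anchor, modifiers):
    """Pass 3: inspect the left context of an anchor.

    Hypothetical rule: if a known modifier directly precedes the anchor,
    the candidate term is the modifier plus the anchor.
    """
    start, end = anchor
    for mod in sorted(modifiers, key=len, reverse=True):
        if text[:start].endswith(mod):
            return (start - len(mod), end)
    return anchor


def label_terms(text, templates, glossary, modifiers):
    """Run the three passes and return sorted (start, end) term spans."""
    spans = set(rule_scan(text, templates))
    for anchor in find_anchors(text, glossary):
        spans.add(expand_with_context(text, anchor, modifiers))
    return sorted(spans)


# Toy inputs, invented for illustration only.
text = "Newton's law explains the friction force"
spans = label_terms(
    text,
    templates=[r"[A-Z][a-z]+'s law"],
    glossary={"force", "magnetic force"},
    modifiers={"friction "},
)
labeled = [text[s:e] for s, e in spans]
```

With these toy inputs, the template pass labels "Newton's law", the glossary pass anchors on "force", and the context pass extends that anchor to "friction force". A real system would also need the restrictive rules mentioned in the abstract to reject false candidates, which this sketch omits.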
