Transliteration Pair Extraction from Classical Chinese Buddhist Literature Using Phonetic Similarity Measurement


Abstract

Transliteration pair extraction, the identification of transliterations of foreign loanwords in literature, is a challenging task in research fields such as historical linguistics and digital humanities. In this paper, we focus on one important type of historical literature: classical Chinese Buddhist texts. We propose an approach that automatically identifies transliteration pairs in classical Chinese texts. Our approach comprises two stages: transliteration extraction and transliteration pair identification. To extract as many candidate transliterations as possible without introducing too many false positives, we adopt a hybrid method consisting of a suffix-array-based extraction step and a language-model-based filtering step. Using the ALINE algorithm, we then compare the extracted transliteration candidates for phonetic similarity based on their pronunciations in the Middle Chinese rime book Guangyun (廣韻). Pairs with similarity above a certain threshold are considered transliteration pairs. To evaluate our method, we constructed an evaluation set from several Buddhist texts, such as the Samyuktagama and the Mahavibhasa, which were translated into Chinese in different eras. We use precision and recall to measure the effectiveness of our method.
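The second stage described above (score candidate pairs, keep those above a threshold, then evaluate with precision and recall) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `similarity` function is a generic string-similarity stand-in from Python's standard library, not ALINE, and the names `find_pairs` and `precision_recall`, the threshold value, and the placeholder pronunciation strings are all hypothetical. The strings are not real Guangyun readings.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Stand-in score in [0, 1] between two pronunciation strings.

    Assumption: this is plain sequence matching, NOT the ALINE
    phonetic alignment algorithm named in the abstract.
    """
    return SequenceMatcher(None, a, b).ratio()


def find_pairs(pronunciations: dict[str, str], threshold: float) -> set[frozenset[str]]:
    """Return every unordered candidate pair scoring above the threshold."""
    pairs = set()
    for (c1, p1), (c2, p2) in combinations(pronunciations.items(), 2):
        if similarity(p1, p2) > threshold:
            pairs.add(frozenset((c1, c2)))
    return pairs


def precision_recall(predicted: set, gold: set) -> tuple[float, float]:
    """Precision and recall of predicted pairs against a gold set."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Placeholder candidates mapped to made-up pronunciation strings
# (hypothetical data for illustration only).
candidates = {"cand_A": "abcde", "cand_B": "abcdf", "cand_C": "xyz"}
predicted = find_pairs(candidates, threshold=0.7)

# Hypothetical gold-standard pairs for the toy data.
gold = {frozenset(("cand_A", "cand_B")), frozenset(("cand_A", "cand_C"))}
print(predicted, precision_recall(predicted, gold))
```

Comparing all pairs is quadratic in the number of candidates, which is one reason a filtering step before this stage matters. In practice the threshold would be tuned on held-out data rather than fixed in advance.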
Keywords: Transliteration Pair Extraction; Phonetic Similarity; Classical Chinese Processing

About this article

Journal: New Generation Computing, Volume 31, Issue 4, pp. 265–283
Cover date: 2013-10
DOI: 10.1007/s00354-013-0402-1
Print ISSN: 0288-3635
Online ISSN: 1882-7055
Publisher: Springer Japan
Topics: Artificial Intelligence (incl. Robotics); Computer Hardware; Computer Systems Organization and Communication Networks; Software Engineering/Programming and Operating Systems; Computing Methodologies
Industry sectors: IT & Software; Telecommunications

Authors: Yu-Chun Wang (1, 2), Chun-Kai Wu (3), Richard Tzong-Han Tsai (4), Jieh Hsiang (1)

Author affiliations:
1. Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan
2. Telecommunication Laboratories, Chunghwa Telecom, Taipei, Taiwan
3. Department of Computer Science and Engineering, Yuan Ze University, Zhongli, Taiwan
4. Department of Computer Science and Information Engineering, National Central University, Zhongli, Taiwan
