Journal of Zhejiang University Science: An international applied physics & engineering journal

Preserving the literary past, looking to the future: the first Hong Kong Literature Database

Abstract

In the last two decades of the 20th century, interest in and emphasis on the study of Hong Kong literature increased among both academics and the general public in Hong Kong. Recognizing the emerging need for resources on Hong Kong literature, the University Library System of the Chinese University of Hong Kong set up the Hong Kong Literature Database (the "Database") in 2000 as the first Chinese literature database on the Internet. This paper examines how the Database is constructed using XML technology and a metadata schema. The Database uses Unicode UTF-8 as its internal encoding. A mapping table for traditional and simplified Chinese characters, created from Unihan, works behind the scenes so that a user can enter either traditional or simplified Chinese characters and retrieve results in both. Currently, 65 percent of the journals have been processed with OCR technology so that full-text searching is possible, and the Chinese OCR technology is examined in greater detail. Special features of the Database are described, such as page-by-page browse mode, position highlighting for full-page newspapers, and linking of tables of contents and book jackets from the Library catalogue. The paper also raises the problem of massive downloading and compares state-of-the-art technologies and their shortcomings. It shows how the Hong Kong Literature Database facilitates future collaboration and data exchange by using open standards, a shareable structure and the latest technology.
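
The traditional/simplified retrieval described in the abstract can be illustrated with a minimal sketch. It is not the Database's actual implementation: the table `TRAD_TO_SIMP` is a tiny hand-written sample rather than data derived from Unihan, and the names `fold` and `search` are hypothetical. The sketch assumes one possible approach, folding both the query and the indexed text to a single script before comparing, so that either input form matches records in both forms.

```python
# Hypothetical sample of a traditional -> simplified table. A real table
# would be derived from Unihan variant data, which is not reproduced here.
TRAD_TO_SIMP = {
    "學": "学",
    "書": "书",
    "館": "馆",
    "圖": "图",
    "發": "发",
    "髮": "发",  # several traditional characters can share one simplified form
}


def fold(text: str) -> str:
    """Map each traditional character to its simplified form; leave others as-is."""
    return "".join(TRAD_TO_SIMP.get(ch, ch) for ch in text)


def search(query: str, titles: list[str]) -> list[str]:
    """Return the titles that contain the query, ignoring script differences."""
    key = fold(query)
    return [title for title in titles if key in fold(title)]


titles = ["香港文學", "香港文学", "圖書館", "图书馆"]

# A simplified query and a traditional query return the same records.
print(search("文学", titles))
print(search("文學", titles))
```

Folding toward simplified characters is chosen here because the traditional-to-simplified direction is many-to-one, so a single lookup per character suffices; expanding a simplified query into every traditional variant would need a one-to-many table instead.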