Automatic extraction of catalog data from digital images of historical manuscripts

Roni Shweka; Yaacov Choueka; Lior Wolf; Nachum Dershowitz

Abstract

The Cairo Genizah, discovered in the late 19th century, is a collection of handwritten historical documents containing approximately 350,000 fragments of mainly Jewish texts. The fragments are today spread out in more than seventy libraries and private collections worldwide, and there is an ongoing effort to document and catalog all extant fragments. We explore three levels of extraction of catalog data from digital images of the fragments. First, images should be captured in a way that permits standardized automatic processing. Second, the images can be processed to detect elements such as image foreground, regions of written text, and lines of the text, thereby allowing for the automatic assignment of conventional catalog measurements. Third, modern computer-vision tools and statistical inference techniques may be used to identify fragments that might originate from the same original codex. Such matched fragments, commonly referred to as 'joins', were heretofore identified manually by experts, and presumably only a small fraction of existing joins have been discovered to date. Overall, we present what might be the first effort to address all three levels successfully within a large-scale project, detailing the various design choices and describing the techniques and algorithms used for the Cairo Genizah digitization project.
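
The second level described above, detecting text lines in order to derive catalog measurements, can be illustrated with a minimal sketch. This is not the authors' method: it uses a plain horizontal projection profile, a common baseline technique, and assumes the image has already been binarized (1 = ink, 0 = background) and deskewed. The function names, the `min_ink` threshold, and the `dpi` parameter are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def find_text_lines(binary: List[List[int]], min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges of text lines; bottom is exclusive."""
    # Horizontal projection profile: number of ink pixels in each row.
    profile = [sum(row) for row in binary]
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(profile):
        if ink >= min_ink and start is None:
            start = y                      # a run of inked rows begins
        elif ink < min_ink and start is not None:
            lines.append((start, y))       # the run ended on the previous row
            start = None
    if start is not None:                  # run reaches the bottom edge
        lines.append((start, len(profile)))
    return lines


def catalog_measurements(binary: List[List[int]], dpi: float) -> Dict[str, float]:
    """Derive two simple catalog values: line count and text-block height in mm."""
    lines = find_text_lines(binary)
    if not lines:
        return {"line_count": 0, "text_height_mm": 0.0}
    height_px = lines[-1][1] - lines[0][0]
    # Convert pixels to millimetres (25.4 mm per inch) using the scan resolution.
    return {"line_count": len(lines), "text_height_mm": height_px / dpi * 25.4}


# Toy 6-row "page": two inked rows, a gap, then one more inked row.
toy_page = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
    [0, 0, 0, 0],
]
result = catalog_measurements(toy_page, dpi=25.4)
```

A physical measurement such as the height in millimetres is only meaningful if the scan resolution is known and consistent, which is one reason the first level (standardized image capture) matters. Real fragments are torn, stained, and skewed, so a production system would need considerably more robust segmentation than this baseline.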


Bibliographic details

Source: Literary & Linguistic Computing, 2013, No. 2, pp. 315-330 (16 pages)
Authors: Roni Shweka; Yaacov Choueka; Lior Wolf; Nachum Dershowitz
Author affiliations:

The Friedberg Genizah Project, Jerusalem, Israel;

The Friedberg Genizah Project, Jerusalem, Israel;

The Blavatnik School of Computer Science, Tel Aviv University, Ramat Aviv, Israel;

The Blavatnik School of Computer Science, Tel Aviv University, Ramat Aviv, Israel;

Original format: PDF
Language: English

