
Visual-based web page analysis.


Abstract

This research investigates how to identify the different content areas appearing on a webpage by comparing the visual features and relative characteristics of each content area, called a visual block in this study. The process uses image segmentation to extract and parse a webpage's visual features, then analyzes them to identify the functionality of each content area based on its layout and position.

To accomplish this, the study reviews several techniques that have been used in related fields and discusses their strengths and weaknesses. The main weakness of past techniques is that they rely heavily on HTML; in other words, they are language-dependent. This paper proposes a visual-based technique that uses visual features rather than HTML and is therefore more language-independent. To determine the functionality of each visual block, the technique uses an algorithm that parses a webpage into a tree structure and applies a rule modeled on how humans determine the relationship between two objects on a 2D monitor.

The goal of this research is to design an automated visual-based algorithm that examines each visual block shown on a webpage and applies human cognitive processes to decide the role of each block. For example, one might wish to identify the main content, the sub content, the navigation menu, and the advertisement.

Chapter 1 describes the motivation, the issue, and a possible solution to the problem. Chapter 2 reviews several technologies that can be used to solve the problem and elucidates possible future research. Chapter 3 explains how to prepare the test environment and the techniques that were used. Chapter 4 describes the results: what was accomplished, what was missing, and what further research is necessary. Chapter 5 concludes with the possibilities of this research and how future research might help accomplish its final goal.
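The abstract gives no implementation details, so the following is only a minimal sketch of the general idea: visual blocks arranged in a tree, with each leaf block assigned a role from its position and size alone. It assumes bounding boxes have already been produced by some segmentation step, which is not shown. The `VisualBlock` class, the `classify` and `label_tree` functions, and every threshold are hypothetical illustrations, not the rules used in the thesis.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VisualBlock:
    """A rectangular region of a rendered page, in pixels (hypothetical type)."""
    x: int
    y: int
    width: int
    height: int
    children: List["VisualBlock"] = field(default_factory=list)

    @property
    def area(self) -> int:
        return self.width * self.height


def classify(block: VisualBlock, page_w: int, page_h: int) -> str:
    """Guess a block's role from layout and position only.

    All thresholds below are illustrative assumptions.
    """
    area_ratio = block.area / (page_w * page_h)
    center_x = (block.x + block.width / 2) / page_w

    # A wide, short strip near the top of the page: treat as navigation.
    if (block.y < 0.15 * page_h
            and block.width > 0.8 * page_w
            and block.height < 0.15 * page_h):
        return "navigation"
    # A block covering a large share of the page: treat as main content.
    if area_ratio >= 0.4:
        return "main content"
    # A narrow column near the right edge: treat as advertisement.
    if center_x > 0.8 and block.width < 0.25 * page_w:
        return "advertisement"
    return "sub content"


def label_tree(root: VisualBlock, page_w: int, page_h: int) -> List[Tuple[VisualBlock, str]]:
    """Walk the block tree depth-first and label every leaf block."""
    labels = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node.children:
            # Reverse so children are visited in their original order.
            stack.extend(reversed(node.children))
        else:
            labels.append((node, classify(node, page_w, page_h)))
    return labels


# Example page of 1000 x 2000 pixels with four leaf blocks.
page = VisualBlock(0, 0, 1000, 2000, children=[
    VisualBlock(0, 0, 1000, 100),      # strip across the top
    VisualBlock(0, 100, 750, 1500),    # large central region
    VisualBlock(800, 100, 200, 600),   # narrow right-hand column
    VisualBlock(0, 1600, 750, 300),    # smaller region below
])
roles = [role for _, role in label_tree(page, 1000, 2000)]
print(roles)
```

Because the rules look only at geometry, the sketch never inspects HTML, which mirrors the language-independence argument made in the abstract. A fuller treatment would also need the pairwise spatial relationships between blocks that the abstract mentions.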

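Bibliographic record

  • Author

    Lee, Kuang-Yao

  • Author affiliation

    San Diego State University

  • Degree-granting institution San Diego State University
  • Subject Computer Science
  • Degree M.S.
  • Year 2014
  • Pages 46 p.
  • Total pages 46
  • Original format PDF
  • Language of text English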
