
Method for Automated Acquisition of Chinese Literature Data Based on Web Crawler Technology


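Abstract

The invention discloses a method for automated acquisition of Chinese literature data based on web crawler technology. Taking the structure of the target web pages into account, it calls Python's Selenium library and other related modules to build a framework for automatically acquiring Chinese literature data. Starting from an analysis of the web page structure, the invention analyzes the XPath path expressions in the page to obtain parameterized expressions for the required text and, after extensive experimental debugging, achieves efficient and accurate automated crawling of the data. The invention is significant for building Chinese scientific literature databases and for advancing the science of science.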
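The idea of a parameterized XPath expression can be illustrated with a minimal sketch. The abstract does not give the target site, its page layout, or the actual path expressions, so the template `CELL_XPATH`, the column layout `COLUMNS`, the sample markup, and all function names below are assumptions made for illustration only. The extraction loop is written against a generic `get_text` callable so that it can run offline on an in-memory tree; the `selenium_get_text` adapter shows how the same loop could be driven by a Selenium WebDriver.

```python
from xml.etree import ElementTree as ET

# Hypothetical template: cell (row, col) of a results table.
# The real site's XPath expressions are not given in the source.
CELL_XPATH = ".//table[@id='results']/tbody/tr[{row}]/td[{col}]"

# Hypothetical column layout of one result row.
COLUMNS = {"title": 1, "authors": 2, "year": 3}


def cell_xpath(row, col):
    """Fill the template for one cell (XPath positions are 1-based)."""
    return CELL_XPATH.format(row=row, col=col)


def extract_records(get_text, n_rows):
    """Collect one dict per row; get_text maps an XPath to its text or None."""
    records = []
    for row in range(1, n_rows + 1):
        record = {name: get_text(cell_xpath(row, col))
                  for name, col in COLUMNS.items()}
        if all(value is None for value in record.values()):
            break  # no cell matched, so we ran past the last row
        records.append(record)
    return records


def selenium_get_text(driver):
    """Adapter for a Selenium WebDriver (needs the selenium package)."""
    from selenium.common.exceptions import NoSuchElementException
    from selenium.webdriver.common.by import By

    def get_text(xpath):
        try:
            return driver.find_element(By.XPATH, xpath).text
        except NoSuchElementException:
            return None
    return get_text


def etree_get_text(root):
    """Adapter for an in-memory XHTML tree, used for the offline demo."""
    def get_text(xpath):
        node = root.find(xpath)
        return node.text if node is not None else None
    return get_text


# Made-up sample page standing in for a search-result listing.
SAMPLE = """<html><body><table id="results"><tbody>
<tr><td>Title A</td><td>Zhang</td><td>2018</td></tr>
<tr><td>Title B</td><td>Li</td><td>2019</td></tr>
</tbody></table></body></html>"""

records = extract_records(etree_get_text(ET.fromstring(SAMPLE)), n_rows=10)
for record in records:
    print(record)
```

With a live browser session, the same loop would be called as `extract_records(selenium_get_text(driver), n_rows=...)` after `driver.get(...)` has loaded a result page. Keeping the row and column indices as template parameters means that only the template needs to change when the page layout does.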

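Bibliographic Data

Publication No.: CN111368167A
Patent type: Invention patent
Publication date: 2020-07-03
Original format: PDF
Applicant/Patentee: Beijing Normal University (北京师范大学)
Application No.: CN202010151141.9
Inventors: Zhao Ziming (赵子鸣); Li Benji (李本继); Chen Qinghua (陈清华); Li Xiaomeng (李小萌)
Filing date: 2020-03-06
Address: No. 19 Xinjiekouwai Street (新街口外大街19号), Haidian District, Beijing 100875
Database entry time: 2023-12-17 10:08:05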

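Legal Information

Announcement date | Legal status information | Legal status
2020-07-28 | Entry into force of substantive examination; IPC (main classification): G06F16/951; filing date: 2020-03-06 | Entry into force of substantive examination
2020-07-03 | Publication | Publication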
