Journal: 《计算机工程与应用》 (Computer Engineering and Applications)

COX: A High-Compression-Ratio Compression Technique for Chinese XML Documents

Abstract

Current XML compression algorithms do not distinguish between Chinese characters and English words and so ignore the characteristics of Chinese text. To overcome this shortcoming, this paper presents COX, a Chinese-oriented XML compressor with a high compression ratio. The input documents are preprocessed with Chinese word segmentation, a sorted dictionary is obtained by counting word frequencies, and the high-frequency and long words are then encoded using Huffman coding. The XML documents are parsed and their elements are classified, and elements of the same type are placed in the same container. COX also applies special handling to numerical data. The experimental results show that, compared with general-purpose compression software, COX achieves a higher compression ratio when the XML documents contain more Chinese words, at the cost of longer compression and decompression times.
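
The two core ideas in the abstract, Huffman coding over a word-frequency dictionary and grouping elements into per-tag containers, can be illustrated with a minimal Python sketch. This is not the COX implementation: the function names `build_huffman_codes` and `group_by_tag` are hypothetical, the tokens are hand-segmented rather than produced by a real Chinese word segmenter, every token is coded (COX reportedly codes only high-frequency and long words), and the special handling of numerical data is omitted.

```python
import heapq
from collections import Counter, defaultdict
from itertools import count
import xml.etree.ElementTree as ET


def build_huffman_codes(tokens):
    """Return a {token: bitstring} prefix code built from token frequencies."""
    freq = Counter(tokens)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct token still needs a one-bit code.
        return {next(iter(freq)): "0"}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(n, next(tie), {tok: ""}) for tok, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tie), merged))
    return heap[0][2]


def group_by_tag(xml_text):
    """Collect the text of every element into a container keyed by its tag."""
    containers = defaultdict(list)
    for elem in ET.fromstring(xml_text).iter():
        text = (elem.text or "").strip()
        if text:
            containers[elem.tag].append(text)
    return dict(containers)


# Assumption: tokens as a segmenter might emit them (hand-segmented here).
tokens = ["压缩", "算法", "压缩", "文档", "压缩", "算法"]
codes = build_huffman_codes(tokens)
encoded = "".join(codes[t] for t in tokens)

# Assumption: a made-up sample document, not data from the paper.
doc = (
    "<lib>"
    "<book><title>压缩算法</title><year>2010</year></book>"
    "<book><title>中文分词</title><year>2011</year></book>"
    "</lib>"
)
containers = group_by_tag(doc)
```

In this sketch the most frequent token receives the shortest code, and values that share a tag (all `title` texts, all `year` texts) end up side by side, which is what lets a back-end compressor exploit their similarity.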
