Feature Hashing for Language and Dialect Identification

Abstract

We evaluate feature hashing for language identification (LID), a method not previously used for this task. Using a standard dataset, we first show that while feature performance is high, LID data is highly dimensional and mostly sparse (>99.5%) as it includes large vocabularies for many languages; memory requirements grow as languages are added. Next we apply hashing using various hash sizes, demonstrating that there is no performance loss with dimensionality reductions of up to 86%. We also show that using an ensemble of low-dimension hash-based classifiers further boosts performance. Feature hashing is highly useful for LID and holds great promise for future work in this area.
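The hashing step described in the abstract can be illustrated with a minimal sketch. It assumes character n-gram features, a CRC32 hash, and a signed variant of the hashing trick; none of these choices are stated in the abstract, and the function names (`char_ngrams`, `hash_features`, `reduction_percent`) and the bucket counts are hypothetical, picked only for illustration.

```python
import zlib
from collections import Counter


def char_ngrams(text, n_min=1, n_max=3):
    """Yield character n-grams of lengths n_min..n_max (an assumed feature type)."""
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def hash_features(text, n_buckets=2 ** 10):
    """Map a text to a sparse {bucket index: value} vector.

    No vocabulary is stored: each n-gram is hashed straight to one of
    n_buckets positions, so memory stays fixed as languages are added.
    """
    vec = Counter()
    for gram in char_ngrams(text):
        h = zlib.crc32(gram.encode("utf-8"))  # deterministic unsigned 32-bit hash
        index = h % n_buckets                 # bucket the feature lands in
        sign = 1 if (h >> 31) & 1 == 0 else -1  # top bit picks the sign, so collisions tend to cancel
        vec[index] += sign
    # Drop buckets whose signed counts cancelled to zero.
    return {i: v for i, v in vec.items() if v != 0}


def reduction_percent(original_dims, hashed_dims):
    """Dimensionality reduction achieved by hashing, as a percentage."""
    return (1 - hashed_dims / original_dims) * 100


if __name__ == "__main__":
    vec = hash_features("Feature hashing for language identification")
    print(len(vec), "non-zero buckets out of", 2 ** 10)
    # Hypothetical sizes, not figures from the paper: a 1,000,000-feature
    # space hashed into 140,000 buckets corresponds to an 86% reduction.
    print(round(reduction_percent(1_000_000, 140_000)), "% reduction")
```

The sparse dictionary returned by `hash_features` can be fed to any linear classifier that accepts a fixed-width input; a smaller `n_buckets` saves memory at the cost of more collisions.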

Bibliographic details

Source
Annual Meeting of the Association for Computational Linguistics | 2017 | pp. 399-403 | 5 pages
Authors
Shervin Malmasi; Mark Dras
Original format
PDF

