Validity of the best practice in splitting data for hold-out validation strategy as performed on the ink strokes in the context of forensic science

Lee Loong Chuen; Liong Choong-Yeun; Jemain Abdul Aziz


Abstract

External testing (ET), also known as hold-out validation, is currently considered one of the most reliable ways to estimate the predictive ability of a statistical model. One safeguard against impermissible peeking in ET is to ensure that all replicates of a particular sample are included in either the test set or the training set, but not both. Suppose a sample X1 consists of two replicates (i.e. X1a and X1b). The model is said to benefit from impermissible peeking if X1a and X1b are split between the training and the test sets, respectively. As a result, the prediction model is expected to predict the test set easily and to present an over-optimistic performance. In forensic document examination, an individual pen (IP) can be used to produce multiple ink strokes. In real-world practice, pens are manufactured in bulk, such that one big tank of ink is used to produce a large number of IPs. In other words, ink strokes produced by different IPs of the same pen model in fact originate from one single source (i.e. the same tank of ink). With respect to the aforementioned safeguard, then, how should one treat the ink strokes? Are they replicates or independent samples? In this context, the aim of the work is to investigate the validity of the safeguard in splitting a dataset for the hold-out validation strategy (i.e. ET) in the domain of forensic pen ink analysis. Infrared (IR) spectra of blue gel pen inks were used to demonstrate the practical aspect. The IR spectral data were collected from 1361 ink strokes that originated from 273 IPs of 23 pen models and 10 pen brands. Iterative stratified random sampling was employed to prepare 1000 pairs of training and test sets, split at a ratio of 7:3 using two different principles: (a) set IP - selection was conducted at the IP level to ensure that all the ink strokes originating from a particular IP were included in either the training or the test set only; and (b) set NIP - ink strokes of a particular IP were a
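The two splitting principles named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It makes three assumptions: that stratification is done by pen model, that the cut-off description of set NIP refers to a split at the level of individual strokes, and that each stroke is a hypothetical `(stroke_id, ip_id, pen_model)` tuple. The function names and the toy data are invented for illustration.

```python
import random
from collections import defaultdict


def split_by_ip(strokes, test_fraction=0.3, seed=0):
    """IP-level split: every stroke of a given pen lands in one set only.

    Stratified by pen model (an assumption, not stated in the abstract).
    """
    rng = random.Random(seed)
    ips_by_model = defaultdict(set)
    for _, ip_id, model in strokes:
        ips_by_model[model].add(ip_id)
    test_ips = set()
    for model in sorted(ips_by_model):
        ips = sorted(ips_by_model[model])
        rng.shuffle(ips)
        n_test = round(len(ips) * test_fraction)
        test_ips.update(ips[:n_test])
    train = [s for s in strokes if s[1] not in test_ips]
    test = [s for s in strokes if s[1] in test_ips]
    return train, test


def split_by_stroke(strokes, test_fraction=0.3, seed=0):
    """Stroke-level split: strokes of one pen may appear in both sets."""
    rng = random.Random(seed)
    strokes_by_model = defaultdict(list)
    for stroke in strokes:
        strokes_by_model[stroke[2]].append(stroke)
    train, test = [], []
    for model in sorted(strokes_by_model):
        group = list(strokes_by_model[model])
        rng.shuffle(group)
        n_test = round(len(group) * test_fraction)
        test.extend(group[:n_test])
        train.extend(group[n_test:])
    return train, test


def shared_ips(train, test):
    """Pens that contribute strokes to both the training and the test set."""
    return {s[1] for s in train} & {s[1] for s in test}


# Hypothetical toy data: 2 pen models, 10 pens each, 5 strokes per pen.
strokes = [
    (f"{model}-pen{p}-s{k}", f"{model}-pen{p}", model)
    for model in ("modelA", "modelB")
    for p in range(10)
    for k in range(5)
]

train_ip, test_ip = split_by_ip(strokes, seed=1)
train_nip, test_nip = split_by_stroke(strokes, seed=1)
print("pens shared under IP-level split:", len(shared_ips(train_ip, test_ip)))
print("pens shared under stroke-level split:", len(shared_ips(train_nip, test_nip)))
```

Repeating either function with different `seed` values would give repeated random splits in the spirit of the 1000 training/test pairs mentioned above. With scikit-learn available, `GroupShuffleSplit` offers a comparable group-aware split, although it does not stratify by class.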


Bibliographic details

Source
Microchemical Journal: Devoted to the Application of Microtechniques in all Branches of Science | 2018, issue 2018 | 9 pages
Authors
Lee Loong Chuen; Liong Choong-Yeun; Jemain Abdul Aziz
Author affiliations

Univ Kebangsaan Malaysia FSK Forens Sci Program Jalan Raja Muda Abdul Aziz Kuala Lumpur 50300 Malaysia;

Univ Kebangsaan Malaysia FST Sch Math Sci Bangi 43600 Selangor Malaysia;

Univ Kebangsaan Malaysia FST Sch Math Sci Bangi 43600 Selangor Malaysia;

Indexing information
Original format: PDF
Language of text: English
Chinese Library Classification: Analytical chemistry
Keywords
PLS-DA; Replicates; Data splitting; Model validation; IR spectrum; Forensic science;

