Conference: International Conference on Technical Debt

Prioritizing Technical Debt in Database Normalization Using Portfolio Theory and Data Quality Metrics



Abstract

Database normalization is one of the main principles for designing relational databases. The benefits of normalization can be observed in improved data quality and performance, among other qualities. We explore a new context in which technical debt manifests, one linked to ill-normalized databases. This debt can have a long-term impact, causing systematic degradation of database qualities. Such degradation can be likened to accumulated interest on a debt. We claim that debts are likely to materialize for tables below the fourth normal form. In practice, achieving fourth normal form for all the tables in a database is a costly and idealistic exercise. Therefore, we propose a pragmatic approach to prioritizing the tables that should be normalized to the fourth normal form, based on the metaphoric debt and interest of the ill-normalized tables as observed in data quality and performance. For data quality, tables are prioritized using the risk of data inconsistency metric. For performance, by contrast, no suitable metric is available to estimate the impact of weakly normalized or un-normalized tables. We estimate performance degradation and its costs using the Input/Output (I/O) cost of the operations performed on the tables, and we propose a model to estimate this cost for each table. We use Modern Portfolio Theory to prioritize the tables that should be normalized, based on the estimated I/O cost and the likely risk of cost accumulation in the future. To evaluate our methods, we use a case study from Microsoft, AdventureWorks. The results show that our methods can be effective in reducing normalization debt and improving the quality of the database.
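
The abstract does not spell out the cost model or the portfolio formulation, so the following is only a rough sketch of the general idea: rank tables by their expected I/O cost plus a penalty for how volatile that cost is, in the spirit of mean-variance analysis. It is a simplification that scores each table on its own and ignores covariance between tables, and it is not the authors' model. The table names, the cost figures, and the `risk_weight` parameter are all hypothetical.

```python
from statistics import mean, pstdev

# Hypothetical I/O cost observations per table (e.g. pages read per period).
# These numbers are invented for illustration only.
io_cost_history = {
    "table_a": [120, 130, 125, 135],
    "table_b": [300, 100, 500, 100],
    "table_c": [50, 52, 49, 51],
}


def rank_tables(history, risk_weight=1.0):
    """Return table names ordered from highest to lowest priority.

    The score is the mean observed cost plus risk_weight times the
    population standard deviation of the cost, used here as a risk proxy.
    """
    scores = {}
    for table, costs in history.items():
        expected_cost = mean(costs)   # average observed I/O cost
        risk = pstdev(costs)          # volatility of that cost
        scores[table] = expected_cost + risk_weight * risk
    # Highest score first: costly and volatile tables are normalized first.
    return sorted(scores, key=scores.get, reverse=True)


ranking = rank_tables(io_cost_history)
print(ranking)
```

With this toy data, the table whose cost is both high and erratic ranks first. Setting `risk_weight` to 0 reduces the score to a plain ordering by average cost, which shows how much of the ranking is driven by the risk term.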
