This study presents a method that lays the groundwork for systematically monitoring the food quality and healthfulness of consumers' point-of-sale grocery purchases. The method automates the identification of United States Department of Agriculture (USDA) Food Patterns Equivalents Database (FPED) components in grocery food items. The inputs to the process are compact, abbreviated food item descriptions similar to those printed on the point-of-sale receipts of most food retailers. FPED components are identified using natural language processing (NLP) techniques combined with a collection of food concept maps and relationships. These maps and relationships were built manually from four sources: the USDA Food and Nutrient Database for Dietary Studies, the USDA National Nutrient Database for Standard Reference, the What We Eat in America food categories, and the hierarchical organization of food items used by many grocery stores. We have established the construct validity of the method using data from the National Health and Nutrition Examination Survey. However, further evaluation of its validity and reliability will require a large-scale reference standard with known grocery food quality measures. Here we evaluate the method's utility in identifying the FPED components of grocery food items in a large sample of retail grocery sales data (~190 million transaction records).
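The following Python sketch roughly illustrates the kind of mapping described above. It expands receipt-style abbreviations and then matches the expanded text against a concept map that links food concepts to FPED components. Everything in it is an assumption:

- The abbreviation dictionary, the concept map, the component labels, and the functions `normalize` and `identify_fped` are hypothetical toy stand-ins.
- They are not the study's actual resources, which are built from the USDA databases, the WWEIA categories, and grocery store hierarchies.
- The component labels are placeholders loosely named after FPED food groups, not real FPED variable names.
- Plain phrase lookup stands in for the study's NLP techniques.

```python
from __future__ import annotations

import re

# Hypothetical abbreviation dictionary (illustrative only; not the study's data).
ABBREVIATIONS = {
    "org": "organic",
    "ww": "whole wheat",
    "brd": "bread",
    "2%": "reduced fat",
    "mlk": "milk",
    "bnls": "boneless",
    "sknls": "skinless",
    "chkn": "chicken",
    "brst": "breast",
}

# Hypothetical concept map: food concept -> placeholder FPED-style labels.
# These labels are NOT real FPED variable names.
CONCEPT_TO_FPED = {
    "whole wheat bread": {"whole grains"},
    "milk": {"dairy: milk"},
    "chicken breast": {"protein foods: poultry"},
}


def normalize(description: str) -> list[str]:
    """Lowercase, tokenize, and expand known abbreviations into full words."""
    tokens = re.findall(r"[a-z0-9%]+", description.lower())
    words: list[str] = []
    for token in tokens:
        # Unknown tokens are kept unchanged.
        words.extend(ABBREVIATIONS.get(token, token).split())
    return words


def identify_fped(description: str) -> set[str]:
    """Return the placeholder component labels of every concept found in the text."""
    # Pad with spaces so a concept only matches on whole-word boundaries.
    text = " " + " ".join(normalize(description)) + " "
    components: set[str] = set()
    for concept, labels in CONCEPT_TO_FPED.items():
        if " " + concept + " " in text:
            components |= labels
    return components


if __name__ == "__main__":
    for item in ["ORG WW BRD", "2% MLK GAL", "BNLS SKNLS CHKN BRST"]:
        print(item, "->", sorted(identify_fped(item)))
```

For example, `identify_fped("ORG WW BRD")` expands the description to "organic whole wheat bread" and returns `{"whole grains"}`. An unrecognized description such as `"XYZ"` returns an empty set. A production system would presumably need far larger, curated vocabularies and more robust matching than this exact-phrase lookup.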