Investigating Data Consistency in the ASHRAE Dataset Using Clustering and Label Matching

Citation

Tan, Hui Hui and Tan, Yi Fei and Tan, Wooi Haw and Ooi, Chee Pun (2025) Investigating Data Consistency in the ASHRAE Dataset Using Clustering and Label Matching. IEEE Access, 13. pp. 170307-170327. ISSN 2169-3536

Abstract

Data is a critical component in various fields, enabling researchers to perform analyses, improve decision-making, and support optimization and scientific research. However, poor data quality can lead to flawed decisions and inefficiencies. In evaluating data quality, four primary aspects are typically considered: accuracy, completeness, consistency, and timeliness. Features that record human preferences can introduce inconsistency into a dataset. One of the popular datasets in the thermal comfort area is the ASHRAE Comfort Database II, which contains human preference features that may cause data inconsistency and affect the performance of machine learning models. In this paper, Clustering and Label Matching is proposed to clean the data and achieve consistency. Outlier detection techniques are employed as reference methods for comparing how well different approaches improve dataset consistency, with performance assessed using cluster analysis and validated through prediction model evaluation. The results show that the proposed method is capable of performing deep cleaning to enhance dataset consistency. © 2013 IEEE.

Item Type: Article
Uncontrolled Keywords: ASHRAE comfort database II, clustering, data consistency, label matching, outlier
Subjects: Q Science > QA Mathematics
Divisions: Faculty of Artificial Intelligence & Engineering (FAIE)
Depositing User: Nurin Syazwani Azmi
Date Deposited: 04 Nov 2025 08:03
Last Modified: 04 Nov 2025 08:03
URI: http://shdl.mmu.edu.my/id/eprint/14689