Applying Data Mining on Personal Computer for Document Classification

Citation

Chai, Ian and Salleh, Ahmad Zarif (2025) Applying Data Mining on Personal Computer for Document Classification. JOIV : International Journal on Informatics Visualization, 9 (3). pp. 1252-1262. ISSN 2549-9610

3473-11778-1-PB.pdf - Published Version
Restricted to Repository staff only

Abstract

The typical user creates documents over many years of computer usage. As people move from computer to computer, they tend to copy their files to the new machine, because "you never know when you might need to refer to something from the past." Hence, the collection grows ever larger, expanding to hundreds or thousands of documents. It soon exceeds most people's ability to remember what each document is, even if they have kept them in some order in folders. Many people also fail to anticipate how the folders and subfolders should be arranged as time passes, and by the time they realize it, most find reclassifying everything manually too daunting a task. Therefore, we sought to solve this problem using a data mining-based solution, specifically multinomial naive Bayes. We developed a document classification program to automatically categorize all documents stored on a person's personal computer hard drive, eliminating the need for manual classification. The proposed algorithm achieved a score of 0.853 for accuracy, 0.833 for precision, 0.661 for recall, and 0.767 for the F1 metric. With further refinement and improvement, for example by balancing the dataset and increasing its size, it should be possible to apply this technique in practical applications that enable automatic document classification on most users' computers.
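
The abstract names multinomial naive Bayes as the classifier, but this record does not include the implementation. As an illustration only, the technique can be sketched in plain Python as below. The whitespace tokenizer, the add-one (Laplace) smoothing, the function names, and the toy documents and labels are all assumptions made for this sketch and are not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def train(docs):
    """Fit a multinomial naive Bayes model from (text, label) pairs."""
    class_counts = Counter()            # number of documents per class
    word_counts = defaultdict(Counter)  # word frequencies per class
    vocab = set()
    for text, label in docs:
        words = text.lower().split()    # assumed tokenizer: whitespace split
        class_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return {"classes": class_counts, "words": word_counts, "vocab": vocab}


def predict(model, text):
    """Return the class with the highest log-posterior for the text."""
    total_docs = sum(model["classes"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_docs in model["classes"].items():
        counts = model["words"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior of the class
        for word in text.lower().split():
            if word not in model["vocab"]:
                continue  # skip words never seen during training
            # add-one smoothed log likelihood of the word given the class
            score += math.log((counts[word] + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy corpus, for illustration only.
toy_docs = [
    ("invoice payment due amount", "finance"),
    ("bank statement payment balance", "finance"),
    ("lecture notes exam syllabus", "study"),
    ("assignment exam lecture slides", "study"),
]
model = train(toy_docs)
print(predict(model, "payment invoice"))
print(predict(model, "exam lecture"))
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, and the smoothing keeps a word that is absent from one class from forcing that class's probability to zero.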

Item Type: Article
Uncontrolled Keywords: Data mining, text classification, document classification, multinomial naive Bayes, natural language processing
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Suzilawati Abu Samah
Date Deposited: 27 Aug 2025 03:11
Last Modified: 03 Sep 2025 07:22
URI: http://shdl.mmu.edu.my/id/eprint/14419
