Citation
Yow, Li Wen (2025) Bacterial species classification using genome-derived motif sequences and machine learning. Masters thesis, Multimedia University. Full text not available from this repository.
Abstract
Microbial classification is a fundamental area of microbiology research, providing insights into bacterial diversity, evolutionary relationships, and disease tracking. Bacterial taxonomy faces challenges in distinguishing closely related strains due to the complexity of genomic architecture. Traditional methods often show inconsistent accuracy in resolving genetic variations among closely related bacteria, resulting in taxonomic ambiguity and reduced classification reliability. Although modern classification frameworks have improved species-level differentiation, achieving high precision among genetically similar populations remains challenging. This research introduces a computational approach to overcome limitations in bacterial taxonomic identification accuracy. By utilising conserved genetic motifs as evolutionary molecular markers, the study integrates motif-based feature engineering with machine learning algorithms to develop a framework for improving bacterial taxonomic identification. Salmonella enterica was used as a model species due to its well-documented genetic complexity and multi-level taxonomy structure, including species, subspecies, and serovar levels. The dataset comprised 240 whole genome sequences from six Salmonella subspecies and ten serovars. Genomic data were processed through genome annotation, orthology prediction, and motif identification. Identified motifs were encoded into machine-readable formats, and single nucleotide polymorphism (SNP) patterns within conserved motifs were extracted as classification features. Random Forest and Support Vector Machine (SVM) were implemented to train and evaluate the classification framework. The research achieved a 98% classification accuracy in differentiating Salmonella subspecies and serovars. 
Validation analyses, including Basic Local Alignment Search Tool (BLAST) verification, feature importance assessment, and taxonomic analysis of misclassified strains, confirmed the model’s taxonomic precision. The findings demonstrate that genetic motifs can serve as molecular signatures for bacterial classification and provide a feature engineering strategy for genomic classification modelling. This framework provides a potentially generalisable approach for strain classification across bacterial genera.
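As a rough illustration of the feature-encoding step described in the abstract, the sketch below turns aligned instances of one conserved motif into one-hot SNP features. It is a minimal example under stated assumptions, not the thesis's pipeline: this record does not specify the encoding scheme, and the function names, genome labels, and toy sequences are all hypothetical.

```python
from typing import Dict, List, Tuple

NUCLEOTIDES = "ACGT"


def variable_positions(aligned: Dict[str, str]) -> List[int]:
    """Return the positions at which the aligned motif instances differ.

    These positions are treated as candidate SNPs within the motif.
    """
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("motif instances must be aligned to equal length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def one_hot_snp_features(
    aligned: Dict[str, str],
) -> Tuple[List[str], Dict[str, List[int]]]:
    """One-hot encode the nucleotide at each variable position, per genome.

    Characters outside A/C/G/T (gaps, N) encode as all zeros.
    """
    positions = variable_positions(aligned)
    names = [f"pos{p}_{n}" for p in positions for n in NUCLEOTIDES]
    features = {
        genome: [int(seq[p] == n) for p in positions for n in NUCLEOTIDES]
        for genome, seq in aligned.items()
    }
    return names, features


# Hypothetical toy data: one conserved motif aligned across four genomes.
motif_instances = {
    "genome_A": "ATGCCGTA",
    "genome_B": "ATGCCGTA",
    "genome_C": "ATGTCGTA",
    "genome_D": "ATGTCGAA",
}

feature_names, feature_rows = one_hot_snp_features(motif_instances)
```

Each row in `feature_rows` is a fixed-length numeric vector, so rows of this kind could in principle be stacked into a matrix and passed to a Random Forest or SVM implementation, such as scikit-learn's `RandomForestClassifier` or `SVC`.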
| Item Type: | Thesis (Masters) |
|---|---|
| Additional Information: | Call No.: R859.7.M33 Y69 2025 |
| Uncontrolled Keywords: | Machine learning—Medical applications |
| Subjects: | R Medicine > R Medicine (General) > R858-859.7 Computer applications to medicine. Medical informatics |
| Divisions: | Faculty of Information Science and Technology (FIST) |
| Depositing User: | Ms Nurul Iqtiani Ahmad |
| Date Deposited: | 22 May 2026 08:55 |
| Last Modified: | 22 May 2026 08:55 |
| URI: | http://shdl.mmu.edu.my/id/eprint/15909 |
