Alignment-free distance measures for clustering Expressed Sequence Tags

Citation

Ngo, Keng Hoong (2013) Alignment-free distance measures for clustering Expressed Sequence Tags. PhD thesis, Multimedia University.

Full text not available from this repository.

Abstract

Clustering of expressed sequence tags (ESTs) is a vital step in the EST analysis pipeline. The main goal of clustering is to gather overlapping ESTs from the same transcript of a single gene into a distinct cluster. A simple way to cluster ESTs is to compare their similarity in a pairwise manner. Earlier EST clustering was implemented using alignment-based distance measures such as BLAST, FASTA and the Smith-Waterman algorithm. However, the main shortcoming of the alignment-based approach is the high computational cost of pairwise alignment, which makes it impractical for very large EST datasets. This has motivated the introduction of alignment-free distance measures for EST clustering. Established EST clustering methods such as d2_cluster, wcd and PEACE apply alignment-free distance measures. Performance-wise, they compute faster than the alignment-based methods while retaining acceptable clustering accuracy. In EST clustering, it is common to implement a windowing strategy in conjunction with the alignment-free distance measures. Some distance measures also use heuristics to speed up the comparisons. Consequently, the clustering results they produce can vary significantly from one dataset to another. Clustering performance is excellent when the distance measure is able to detect and quantify the features found in the dataset efficiently. On the other hand, the same measure can perform poorly on another dataset with different characteristics, where it fails to capture and quantify those features correctly.
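
As an illustration of what an alignment-free distance with a windowing strategy can look like, the following is a minimal Python sketch of a word-count (d2-style) distance: each sequence is summarised by its k-mer counts, and the distance is the sum of squared count differences, minimised over pairs of fixed-size windows. It is not the method developed in the thesis, nor the implementation used by d2_cluster, wcd or PEACE; the function names and the default values for `k`, `size` and `step` are assumptions chosen for the example.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2(a, b, k=3):
    """Sum of squared differences between the k-mer counts of a and b."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    # Counter returns 0 for words that are absent from one sequence.
    return sum((ca[w] - cb[w]) ** 2 for w in ca.keys() | cb.keys())


def windows(seq, size, step):
    """Fixed-size windows over seq; a short sequence is one window."""
    if len(seq) <= size:
        return [seq]
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, step)]


def windowed_d2(a, b, k=3, size=20, step=5):
    """Smallest d2 over all window pairs (illustrative parameters)."""
    return min(
        d2(wa, wb, k)
        for wa in windows(a, size, step)
        for wb in windows(b, size, step)
    )


# Two hypothetical reads that overlap on "ACGTACGTAC".
read_a = "TTTTTTTTTT" + "ACGTACGTAC"
read_b = "ACGTACGTAC" + "GGGGGGGGGG"
print(d2(read_a, read_b))                           # whole-sequence distance
print(windowed_d2(read_a, read_b, size=10, step=5)) # best window pair
```

The windowed variant lets two reads that share only a partial overlap score as close, even though their whole-sequence word counts differ. The result depends on the choice of `k`, window size and step, which is one way a measure's behaviour can change from one dataset to another.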

Item Type: Thesis (PhD)
Additional Information: Call Number: QA278 N45 2013
Subjects: Q Science > QA Mathematics > QA299.6-433 Analysis
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 26 Feb 2014 01:43
Last Modified: 26 Feb 2014 01:43
URI: http://shdl.mmu.edu.my/id/eprint/5237
