Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features

Citation

Mahmud, S. M. Hasan and Goh, Michael Kah Ong and Hosen, Md. Faruk and Nandi, Dip and Shoombuatong, Watshara (2024) Deep-WET: a deep learning-based approach for predicting DNA-binding proteins using word embedding techniques with weighted features. Scientific Reports, 14 (1). ISSN 2045-2322

12.pdf - Published Version
Restricted to Repository staff only

Abstract

DNA-binding proteins (DBPs) play a significant role in all phases of genetic processes, including DNA recombination, repair, and modification. They are often utilized in drug discovery as fundamental elements of steroids, antibiotics, and anticancer drugs. Predicting them poses the most challenging task in proteomics research. Conventional experimental methods for DBP identification are costly and sometimes biased toward prediction. Therefore, developing powerful computational methods that can accurately and rapidly identify DBPs from sequence information is an urgent need. In this study, we propose a novel deep learning-based method called Deep-WET to accurately identify DBPs from primary sequence information. In Deep-WET, we employed three powerful feature encoding schemes, namely Global Vectors, Word2Vec, and fastText, to encode the protein sequence. Subsequently, these three features were sequentially combined and weighted using the weights obtained from the elements learned through the differential evolution (DE) algorithm. To enhance the predictive performance of Deep-WET, we applied the SHapley Additive exPlanations approach to remove irrelevant features. Finally, the optimal feature subset was input into convolutional neural networks to construct the Deep-WET predictor. Both cross-validation and independent tests indicated that Deep-WET achieved superior predictive performance compared to conventional machine learning classifiers. In addition, in an extensive independent test, Deep-WET was effective and outperformed several state-of-the-art methods for DBP prediction, with an accuracy of 78.08%, an MCC of 0.559, and an AUC of 0.805. This superior performance shows that Deep-WET has a tremendous predictive capacity to predict DBPs. The web server of Deep-WET and the curated datasets in this study are available at https://deepwet-dna.monarcatechnical.com/. The proposed Deep-WET is anticipated to serve the community-wide effort for large-scale identification of potential DBPs.
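
The feature-weighting step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `weighted_concat`, the toy vectors, and the weight values are all hypothetical, and it assumes one scalar weight per encoding scheme, which the abstract does not specify. In Deep-WET the weights are learned by differential evolution; here they are fixed placeholders.

```python
from typing import List, Sequence


def weighted_concat(blocks: Sequence[Sequence[float]],
                    weights: Sequence[float]) -> List[float]:
    """Scale each feature block by its weight, then join the blocks in order."""
    if len(blocks) != len(weights):
        raise ValueError("need exactly one weight per feature block")
    combined: List[float] = []
    for block, weight in zip(blocks, weights):
        combined.extend(weight * value for value in block)
    return combined


# Toy stand-ins for the three sequence embeddings (hypothetical values).
glove_vec = [1.0, 2.0]
word2vec_vec = [3.0, 4.0]
fasttext_vec = [5.0, 6.0]

# Placeholder weights; an optimiser such as differential evolution
# would search for these instead of having them fixed by hand.
weights = [0.5, 0.25, 0.25]

features = weighted_concat([glove_vec, word2vec_vec, fasttext_vec], weights)
print(features)  # [0.5, 1.0, 0.75, 1.0, 1.25, 1.5]
```

In a pipeline of the kind the abstract outlines, a vector like `features` would then pass through feature selection before reaching the classifier.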

Item Type: Article
Uncontrolled Keywords: Machine-learning
Subjects: Q Science > Q Science (General) > Q300-390 Cybernetics
Divisions: Faculty of Information Science and Technology (FIST)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 04 Mar 2024 01:51
Last Modified: 04 Mar 2024 01:51
URI: http://shdl.mmu.edu.my/id/eprint/12148
