Forum Text Processing and Summarization

Citation

Mak, Yen Wei and Goh, Hui Ngo and Lim, Amy Hui Lan (2024) Forum Text Processing and Summarization. JOIV : International Journal on Informatics Visualization, 8 (1). p. 425. ISSN 2549-9610

Abstract

Frequently Asked Questions (FAQs) have been studied extensively in general domains such as the medical field, but comparable frameworks are lacking in domains such as software engineering and open-source communities. This research aims to bridge this gap by establishing the foundations of an automated FAQ Generation and Retrieval framework tailored to the software engineering domain. The framework applies analysis, ranking, sentiment analysis, and summarization techniques to open forums such as Stack Overflow and GitHub issues. A corpus of Stack Overflow post data is collected to evaluate the proposed framework and the selected models. Integrating state-of-the-art string-matching, sentiment analysis, and summarization models with the proprietary ranking formula proposed in this paper forms a robust Automatic FAQ Generation and Retrieval framework to facilitate developers' work. The string-matching, sentiment analysis, and summarization models are evaluated and achieve F1 scores of 71.31%, 74.90%, and 53.4%, respectively. Given the subjective nature of evaluation in this context, a human review is used to further validate the effectiveness of the overall framework, with assessments of relevancy, preferred ranking, and preferred summarization. Future work includes improving the summarization models by incorporating text classification and summarizing the classified text individually (Kou et al., 2023), as well as proposing feedback loop systems based on human reinforcement learning. Furthermore, efforts will be made to optimize the framework by utilizing knowledge graphs for dimension reduction, enabling it to handle larger corpora effectively.
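
The abstract reports F1 scores but does not state how they were computed. As a minimal sketch, assuming the standard definition of F1 as the harmonic mean of precision and recall, the metric can be derived from raw counts as follows. The function name and the count-based interface are illustrative assumptions, not the authors' evaluation code.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1 for one class, from raw prediction counts.

    tp: true positives, fp: false positives, fn: false negatives.
    Hypothetical helper for illustration; not taken from the paper.
    """
    if tp == 0:
        # With no true positives, precision and recall are both zero
        # (or undefined), so F1 is reported as 0.
        return 0.0
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that were found
    return 2 * precision * recall / (precision + recall)
```

For example, 3 true positives, 1 false positive, and 3 false negatives give a precision of 0.75 and a recall of 0.5, which combine to an F1 of 0.6.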

Item Type: Article
Uncontrolled Keywords: Deep Learning (DL); Forum Processing; Natural Language Processing (NLP); Summarization; Sentiment Analysis
Subjects: L Education > LB Theory and practice of education > LB1060 Learning
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 03 May 2024 02:43
Last Modified: 03 May 2024 02:43
URI: http://shdl.mmu.edu.my/id/eprint/12423
