Automated Semantic Analysis of Soccer Videos Using Audio and Visual Feature Extraction


Ahmed Elgamml, Mohamed Mosleh (2020) Automated Semantic Analysis of Soccer Videos Using Audio and Visual Feature Extraction. PhD thesis, Multimedia University.

Full text not available from this repository.


Semantic understanding and suitable description of video content have become an active research topic for many researchers worldwide, who have produced several algorithms for automatic semantic analysis, annotation, retrieval, and summarisation. The rapid growth in the availability of video information is increasing the demand for efficient methods of understanding video at the semantic level. The wide distribution of multimedia content of differing types and formats has created a need for robust and accurate approaches to data analysis. Within the broad area of video semantic analysis, this study concentrates on the semantic processing of soccer video. It proposes an automated semantic analysis framework, based on audio, low-level and high-level visual features, that automatically generates and annotates highlights from soccer videos. The framework is validated using actual soccer footage. Visual features and audio energy components characterise the individual soccer scenes; these scenes, together with detected events, are classified and annotated into classes such as replay, goal, yellow and red card, and save using a support vector machine (SVM) and deep learning (DL). Two classification approaches are proposed. The first approach divides each soccer video into labelled shots using the video features and then classifies the shots with an SVM. The second approach builds a network for video classification by converting the videos into sequences of feature vectors using a pre-trained convolutional neural network and then combining the pre-trained image classification model with a long short-term memory network. The study produced automatically labelled highlights for soccer video with respect to three factors (data reduction, automation and performance) by testing the two approaches on a manually annotated and tagged real dataset.
The classification results show that the SVM model achieved an average accuracy of 92.75%. The proposed DL model achieved an average accuracy of 88.4%, but it needed only 837 minutes to complete the extraction and classification process, compared with 1454 minutes for the SVM model.
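
The trade-off between the two models can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the thesis: it only uses the four figures quoted in the abstract, and the names `svm`, `dl` and `compare` are hypothetical, chosen here for illustration.

```python
# Figures quoted in the abstract; nothing else is taken from the thesis.
svm = {"accuracy_pct": 92.75, "minutes": 1454}
dl = {"accuracy_pct": 88.4, "minutes": 837}


def compare(slow, fast):
    """Return (accuracy gap in percentage points, time saved in %, speed-up factor).

    `slow` and `fast` are hypothetical dicts with the keys
    "accuracy_pct" and "minutes".
    """
    accuracy_gap = slow["accuracy_pct"] - fast["accuracy_pct"]
    time_saved_pct = 100.0 * (slow["minutes"] - fast["minutes"]) / slow["minutes"]
    speedup = slow["minutes"] / fast["minutes"]
    return accuracy_gap, time_saved_pct, speedup


gap, saved, speedup = compare(svm, dl)
print(f"accuracy gap: {gap:.2f} percentage points")
print(f"time saved:   {saved:.1f}%")
print(f"speed-up:     {speedup:.2f}x")
```

On the reported figures, the DL model gives up about 4.35 percentage points of average accuracy in exchange for roughly 42% less processing time (about 1.74 times faster).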

Item Type: Thesis (PhD)
Additional Information: Call No: QA76.5913 .M64 2020
Uncontrolled Keywords: Semantic computing
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science
Divisions: Faculty of Engineering and Technology (FET)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 19 May 2023 07:33
Last Modified: 19 May 2023 07:33

