Encoding laparoscopic image to words using vision transformer for distortion classification and ranking in laparoscopic videos

Citation

AlDahoul, Nouar and Abdul Karim, Hezerul and Momo, Mhd Adel and Tan, Myles Joshua Toledo and Fermin, Jamie Ledesma (2025) Encoding laparoscopic image to words using vision transformer for distortion classification and ranking in laparoscopic videos. Multimedia Tools and Applications, 84. pp. 7159-7181. ISSN 1573-7721

Text
Encoding laparoscopic image to words using vision transformer for distortion classification and ranking in laparoscopic videos.pdf - Published Version
Restricted to Repository staff only
Download (1MB)

Official URL: https://doi.org/10.1007/s11042-024-19089-9

Abstract

Laparoscopic videos are tools used by surgeons to insert narrow tubes into the abdomen and keep the skin without large incisions. The videos captured by a camera are prone to numerous distortions such as uneven illumination, motion blur, defocus blur, smoke, and noise which have impact on visual quality. Automatic detection and identification of distortions are significant to enhance the quality of laparoscopic videos to avoid errors during surgery. The video quality assessment includes two stages: classification of distortions affecting the video frames to identify their types and ranking of distortions to estimate the intensity levels. The dataset generated in ICIP2020 challenge including laparoscopic videos was utilized for training, validation, and testing the proposed solution. The difficulty of this dataset is caused by having five categories of distortions and four levels of severity. Additionally, the availability of multiple distortion categories in one video is considered the most challenging part of this dataset. The work presented in this paper contributes to solve the multi-label distortion classification and ranking problem. This paper aims to enhance the performance of distortion classification solutions. Vision transformer which is a deep learning model was used to extract informative features by transferring learning and representation from the general domain to the medical domain (laparoscopic videos). Additionally, six parallel multilayer perceptron (MLP) classifiers were added and attached to vision transformer for distortion classification and ranking. The experiment showed that the proposed solution outperforms existing distortion classification methods in terms of average accuracy (89.7%), average single distortion F1 score (94.18%), and average of both single and multiple distortions F1 score (96.86%). Moreover, it can also rank the distortions with an average accuracy of 79.22% and average F1 score of 78.44%.
Hence, the high performance of the method proposed in this paper opens the door to integrate our solution in the intelligent video enhancement system.
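
The arrangement the abstract describes, one shared feature vector feeding six parallel MLP classifiers, can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not say what each of the six heads predicts, so the split used here (one multi-label head for the five distortion types plus one four-level severity head per distortion) is an assumption made for illustration. The feature size, hidden width, random weights, and all function names are likewise hypothetical, and a plain list of numbers stands in for the vision transformer output.

```python
import math
import random

# Five distortion categories and four severity levels come from the abstract.
DISTORTIONS = ["uneven_illumination", "motion_blur", "defocus_blur", "smoke", "noise"]
NUM_LEVELS = 4
FEATURE_DIM = 16   # hypothetical; a real vision transformer embedding is far larger
HIDDEN_DIM = 8     # hypothetical hidden width


def make_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with random weights."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def relu(x):
    return [max(0.0, v) for v in x]


def sigmoid(x):
    return [1.0 / (1.0 + math.exp(-v)) for v in x]


def softmax(x):
    peak = max(x)
    exps = [math.exp(v - peak) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def make_head(n_out, rng):
    """One MLP head: a hidden layer followed by an output layer."""
    return make_layer(FEATURE_DIM, HIDDEN_DIM, rng), make_layer(HIDDEN_DIM, n_out, rng)


def run_head(features, head):
    hidden_layer, output_layer = head
    return dense(relu(dense(features, hidden_layer)), output_layer)


def build_heads(seed=0):
    """Assumed layout: 1 multi-label type head + 5 severity heads = 6 heads."""
    rng = random.Random(seed)
    heads = {"distortion_type": make_head(len(DISTORTIONS), rng)}
    for name in DISTORTIONS:
        heads["severity_" + name] = make_head(NUM_LEVELS, rng)
    return heads


def predict(features, heads):
    """Run every head on the same shared feature vector."""
    type_probs = sigmoid(run_head(features, heads["distortion_type"]))
    present = {n: p >= 0.5 for n, p in zip(DISTORTIONS, type_probs)}
    severity = {}
    for name in DISTORTIONS:
        level_probs = softmax(run_head(features, heads["severity_" + name]))
        severity[name] = level_probs.index(max(level_probs)) + 1  # levels 1..4
    return {"present": present, "severity": severity}


# Usage: a random vector stands in for the features of one video frame.
frame_features = [random.Random(42).uniform(-1, 1) for _ in range(FEATURE_DIM)]
print(predict(frame_features, build_heads()))
```

Because the weights are random, the printed predictions are meaningless. The sketch only shows how independent sigmoid outputs allow several distortions to be flagged in the same frame, while each softmax head picks exactly one severity level.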

Item Type:	Article
Uncontrolled Keywords:	Laparoscopic
Subjects:	R Medicine > RG Gynecology and obstetrics
Divisions:	Faculty of Engineering (FOE)
Depositing User:	Ms Nurul Iqtiani Ahmad
Date Deposited:	28 May 2024 09:16
Last Modified:	02 May 2025 07:41
URI:	http://shdl.mmu.edu.my/id/eprint/12465
