A Multilingual BERT Embeddings Approach in Identifying Factors Influencing Employability Among Pre-University Students

Citation

Law, Theng Jia and Ting, Choo Yee and Ng, Hu and Goh, Hui Ngo and Albert, Quek (2024) A Multilingual BERT Embeddings Approach in Identifying Factors Influencing Employability Among Pre-University Students. In: 2024 IEEE International Conference on Artificial Intelligence in Engineering and Technology (IICAIET), 26-28 August 2024, Kota Kinabalu, Malaysia.

Text
A Multilingual BERT Embeddings Approach in Identifying Factors Influencing Employability Among Pre-University Students.pdf - Published Version
Restricted to Repository staff only

Abstract

Identifying the important factors influencing employability among pre-university students is a non-trivial task because educational data is multifaceted. Researchers have used Machine Learning techniques to investigate these factors, but the task remains challenging because existing feature encoding methods increase data dimensionality and can misrepresent the data when features have a large number of nominal values. Therefore, this study aimed to (i) identify the important factors influencing employability among pre-university students, (ii) explore the effectiveness of feature embeddings in identifying employability compared to existing feature encoding methods, and (iii) evaluate how the important factors influence the ability to identify employability among pre-university students. The dataset, comprising 27 variables, was collected from 4007 graduates of a university from the years 2021 and 2022. To study the effectiveness of feature embeddings, multilingual Bidirectional Encoder Representations from Transformers (BERT) embeddings were generated after tokenizing the feature values. By applying Boruta and feature embeddings, the following variables were identified as important: home state, district, race, gender, disability, date of birth, program description, faculty domain and description, entry eligibility, and the grades of Malay Language, Mathematics, and the Malaysian University English Test. With these important variables, predictive models were constructed and evaluated. The findings revealed that the important variables identified improved model performance in identifying employability based on accuracy, precision, recall, F1-score, and Area Under the Curve. Moreover, Logistic Regression, Gaussian Naïve Bayes, and Random Forest achieved improvements in Geometric Mean, with 60.80%, 63.40%, and 64.00% respectively, showcasing the superiority of feature embeddings with Boruta compared to existing feature encoding methods.
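
The abstract reports Geometric Mean without defining it. A minimal sketch follows, assuming the common binary definition used for imbalanced classification: the square root of sensitivity times specificity. The function name `geometric_mean_score` and the toy labels are illustrative assumptions and are not taken from the paper.

```python
import math


def geometric_mean_score(y_true, y_pred, positive=1):
    """Binary G-mean, assumed here to be sqrt(sensitivity * specificity)."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1
            else:
                fn += 1
        else:
            if pred == positive:
                fp += 1
            else:
                tn += 1
    # Recall on the positive class and on the negative class, respectively.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy labels invented for illustration only (not from the study's dataset).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1, 1, 1]
score = geometric_mean_score(y_true, y_pred)
print(f"G-mean: {score:.4f}")
```

Under this definition, a model that predicts a single class for every record scores zero, so the measure rewards balanced performance across both classes in a way that accuracy alone does not.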

Item Type: Conference or Workshop Item (Paper)
Uncontrolled Keywords: Machine learning
Subjects: L Education > LB Theory and practice of education > LB2300 Higher Education
Q Science > Q Science (General) > Q300-390 Cybernetics
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Nurul Iqtiani Ahmad
Date Deposited: 03 Dec 2024 00:33
Last Modified: 03 Dec 2024 00:33
URI: http://shdl.mmu.edu.my/id/eprint/13143
