Domain-Independent true fact identification from knowledge graphs

Citation

Pillai, Sini Govinda (2024) Domain-Independent true fact identification from knowledge graphs. PhD thesis, Multimedia University.

Full text not available from this repository.

Official URL: http://erep.mmu.edu.my/

Abstract

Large knowledge graphs (KGs) are the primary source of structured data in many Semantic Web applications. However, the heuristic methods used to construct these large-scale KGs often introduce noise into the graph. For machines to process this data efficiently in data mining, entity linking and information retrieval tasks, it is particularly beneficial to have as many reliable facts from KGs as possible. KGs are therefore researched extensively, both individually and in comparison with one another, with respect to different targets. The trustworthiness of information in a KG is determined by the trustworthiness of information at the fact level. The main objective of this research is therefore to increase the trustworthiness of entity relations from KGs. No single KG contains all the entity relations of a specific entity from an arbitrary class. Our research therefore first integrates knowledge from multiple KGs and uses the combined information to obtain the true facts. Our first contribution is an approach for detecting true facts about entities across various KGs by integrating Linked Open Data (LOD), string similarity measures and semantic similarity measures. To evaluate true fact identification, we created a Gold Standard dataset rich in true facts from the popular domains of person, organization and location. Most existing error detection approaches were applied to specific KGs, and a large percentage of them work well on DBpedia in particular. Our next contribution is therefore an error detection and correction approach that uses RDF reification in the integrated environment, independent of any particular KG. This research proposes a framework for identifying true facts about entities that combines these two independent approaches to true fact identification.
The research was conducted on related and diverse knowledge graphs: DBpedia, YAGO and Wikidata. Our approach was evaluated on the Gold Standard data and on real-world entity information from DBpedia, YAGO and Wikidata. In addition, the effectiveness of RDF reification for identifying true facts was evaluated on selected Wikidata entities. Our experiment demonstrated that most entities belonging to a particular domain have similar patterns of errors, which can serve as the basis for grouping refinement approaches by domain. Our studies revealed that entity information in the location domain showed more discrepancies than in the other domains. On the Gold Standard data, the first approach, which employs multiple KGs, performed better, whereas error detection and correction employing RDF reification performed better on the dataset retrieved directly from the selected KGs with the SPARQL Protocol and RDF Query Language (SPARQL). Because our final proposed approach combines both, identifying true facts from multiple KGs and employing RDF reification to check provenance information, it demonstrated promising results on both datasets.

Item Type:	Thesis (PhD)
Additional Information:	Call No.: Q387.5 .P55 2024
Uncontrolled Keywords:	Semantic networks (Information theory)
Subjects:	Q Science > Q Science (General) > Q300-390 Cybernetics
Divisions:	Faculty of Computing and Informatics (FCI)
Depositing User:	Ms Nurul Iqtiani Ahmad
Date Deposited:	30 Sep 2025 01:46
Last Modified:	30 Sep 2025 01:46
URI:	http://shdl.mmu.edu.my/id/eprint/14521
