Learning to extract domain-specific relations from complex sentences

Citation

Tan, Saravadee Sae and Lim, Tek Yong and Soon, Lay Ki and Tang, Enya Kong (2016) Learning to extract domain-specific relations from complex sentences. Expert Systems with Applications, 60. pp. 107-117. ISSN 0957-4174

Abstract

Open Information Extraction (OIE) systems focus on identifying and extracting general relations from text. Most OIE systems rely on simple linguistic structures, such as part-of-speech or dependency features, to extract relations and arguments from a sentence. These approaches are simple and fast to implement, but suffer from two main drawbacks: (i) they are less effective at handling complex sentences with multiple relations and shared arguments, and (ii) they tend to extract overly specific relations. This paper proposes an approach to Information Extraction called SemIE, which addresses both drawbacks. SemIE identifies significant relations in domain-specific text by using a semantic structure that describes the domain of discourse. SemIE exploits the predicate-argument structure of a text, which allows it to handle complex sentences. The semantics of the arguments are explicitly specified by mapping them to relevant concepts in the semantic structure. SemIE uses a semi-supervised learning approach to bootstrap training examples that cover all relations expressed in the semantic structure. SemIE takes pairs of structured documents as input and uses a Greedy Mapping module to bootstrap a full set of training examples. The training examples are then used to learn the extraction and mapping rules. We evaluated the performance of SemIE by comparing it with OLLIE, a state-of-the-art OIE system. We tested SemIE and OLLIE on the task of extracting relations from text in the “movie” domain and found that, on average, SemIE outperforms OLLIE. We also examined how performance varies with sentence complexity and sentence length. The results demonstrate the effectiveness of SemIE in handling complex sentences.

Item Type: Article
Uncontrolled Keywords: Open information extraction, Structure mapping, Bootstrapping training examples, Greedy mapping
Subjects: Q Science > QA Mathematics > QA71-90 Instruments and machines > QA75.5-76.95 Electronic computers. Computer science > QA76.75-76.765 Computer software
Divisions: Faculty of Computing and Informatics (FCI)
Depositing User: Ms Suzilawati Abu Samah
Date Deposited: 12 Feb 2018 17:59
Last Modified: 12 Feb 2018 17:59
URI: http://shdl.mmu.edu.my/id/eprint/6685
