Ontology-driven information extraction

Adrian, Weronika Teresa; Leone, Nicola; Manna, Marco

dc.contributor.author	Adrian, Weronika Teresa
dc.contributor.author	Leone, Nicola
dc.contributor.author	Manna, Marco
dc.date.accessioned	2020-07-28T09:21:42Z
dc.date.available	2020-07-28T09:21:42Z
dc.date.issued	2017-07-20
dc.description	PhD program (Dottorato di Ricerca) in Mathematics and Computer Science. Cycle XXIX	en_US
dc.description.abstract	Information Extraction (IE) consists of obtaining structured information from unstructured and semi-structured sources. Existing solutions use advanced methods from the fields of Natural Language Processing and Artificial Intelligence, but they usually aim at solving sub-problems of IE, such as entity recognition, relation extraction or co-reference resolution. In practice, however, it is often necessary to build on the results of several tasks and arrange them in an intelligent way. Moreover, Information Extraction now faces new challenges posed by large-scale collections of documents in complex formats beyond plain text. An apparent limitation of existing work is the lack of a uniform representation of document analysis from multiple perspectives, such as semantic annotation of text, structural analysis of the document layout and processing of the integrated knowledge. Recent proposals for ontology-based Information Extraction do not fully exploit the possibilities of ontologies, using them only as a reference model for a single extraction method, such as semantic annotation, or for defining the target schema of the extraction process. In this thesis, we address the problem of Information Extraction from homogeneous collections of documents, i.e., sets of files that share some common properties with respect to content or layout. We observe that interleaving semantic and structural analysis can benefit the results of the IE process, and we propose an ontology-driven approach that integrates and extends existing solutions. The contributions of this thesis are both theoretical and practical. On the theoretical side, we propose a model and a process of Semantic Information Extraction that integrates techniques from semantic annotation of text, document layout analysis, object-oriented modeling and rule-based reasoning.
We adapt existing solutions to enable their integration under a common ontological view and advance the state of the art in the fields of semantic annotation and document layout analysis. In particular, we propose a novel method for automatic lexicon generation for semantic annotators, and an original approach to layout analysis based on common label identification and structure recognition. We design and implement a framework named KnowRex that realizes the proposed methodology and integrates the developed solutions.	en_US
dc.description.sponsorship	Università della Calabria	en_US
dc.identifier.uri	http://hdl.handle.net/10955/2092
dc.identifier.uri	https://doi.org/10.13126/unical.it/dottorati/2092
dc.language.iso	en	en_US
dc.relation.ispartofseries	INF/01;
dc.subject	Computer science	en_US
dc.subject	Information extraction	en_US
dc.subject	Ontologies	en_US
dc.title	Ontology-driven information extraction	en_US
dc.type	Thesis	en_US

Files

Original bundle

Name: PhD Thesis - Weronika Adrian.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format

Collections

Dipartimento di Matematica e Informatica - Tesi di Dottorato