Ontology-driven information extraction

dc.contributor.author: Adrian, Weronika Teresa
dc.contributor.author: Leone, Nicola
dc.contributor.author: Manna, Marco
dc.date.accessioned: 2020-07-28T09:21:42Z
dc.date.available: 2020-07-28T09:21:42Z
dc.date.issued: 2017-07-20
dc.description: PhD program in Mathematics and Computer Science (Dottorato di Ricerca in Matematica ed Informatica), Cycle XXIX [en_US]
dc.description.abstract: Information Extraction (IE) consists of obtaining structured information from unstructured and semi-structured sources. Existing solutions use advanced methods from the fields of Natural Language Processing and Artificial Intelligence, but they usually aim at solving sub-problems of IE, such as entity recognition, relation extraction or co-reference resolution. In practice, however, it is often necessary to build on the results of several tasks and arrange them in an intelligent way. Moreover, Information Extraction now faces new challenges related to large-scale collections of documents in complex formats beyond plain text. An apparent limitation of existing work is the lack of a uniform representation of the document analysis from multiple perspectives, such as semantic annotation of text, structural analysis of the document layout and processing of the integrated knowledge. Recent proposals for ontology-based Information Extraction do not fully exploit the possibilities of ontologies, using them only as a reference model for a single extraction method, such as semantic annotation, or for defining the target schema of the extraction process. In this thesis, we address the problem of Information Extraction from homogeneous collections of documents, i.e., sets of files that share some common properties with respect to content or layout. We observe that interleaving semantic and structural analysis can benefit the results of the IE process, and we propose an ontology-driven approach that integrates and extends existing solutions. The contributions of this thesis are both theoretical and practical. On the theoretical side, we propose a model and a process of Semantic Information Extraction that integrates techniques from semantic annotation of text, document layout analysis, object-oriented modeling and rule-based reasoning.
We adapt existing solutions to enable their integration under a common ontological view and advance the state of the art in semantic annotation and document layout analysis. In particular, we propose a novel method for automatic lexicon generation for semantic annotators, and an original approach to layout analysis based on common label identification and structure recognition. On the practical side, we design and implement a framework named KnowRex that realizes the proposed methodology and integrates the elaborated solutions. [en_US]
dc.description.sponsorship: Università della Calabria [en_US]
dc.identifier.uri: http://hdl.handle.net/10955/2092
dc.identifier.uri: https://doi.org/10.13126/unical.it/dottorati/2092
dc.language.iso: en [en_US]
dc.relation.ispartofseries: INF/01;
dc.subject: Computer science [en_US]
dc.subject: Information extraction [en_US]
dc.subject: Ontologies [en_US]
dc.title: Ontology-driven information extraction [en_US]
dc.type: Thesis [en_US]

Files

Original bundle

Name: PhD Thesis - Weronika Adrian.pdf
Size: 1.78 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission