Doctoral Theses (Tesi di Dottorato)
4 results
Search Results
Item Ontology-driven information extraction (2017-07-20)
Adrian, Weronika Teresa; Leone, Nicola; Manna, Marco
Information Extraction (IE) consists of obtaining structured information from unstructured and semi-structured sources. Existing solutions use advanced methods from Natural Language Processing and Artificial Intelligence, but they usually target sub-problems of IE, such as entity recognition, relation extraction or co-reference resolution. In practice, however, it is often necessary to build on the results of several tasks and arrange them in an intelligent way. Moreover, Information Extraction now faces new challenges related to large-scale collections of documents in complex formats beyond plain text. An apparent limitation of existing work is the lack of a uniform representation of document analysis from multiple perspectives, such as semantic annotation of text, structural analysis of the document layout and processing of the integrated knowledge. Recent proposals for ontology-based Information Extraction do not fully exploit the possibilities of ontologies: they use them only as a reference model for a single extraction method, such as semantic annotation, or to define the target schema for the extraction process. In this thesis, we address the problem of Information Extraction from homogeneous collections of documents, i.e., sets of files that share some common properties with respect to content or layout. We observe that interleaving semantic and structural analysis can benefit the results of the IE process, and we propose an ontology-driven approach that integrates and extends existing solutions. The contributions of this thesis are both theoretical and practical. On the theoretical side, we propose a model and a process of Semantic Information Extraction that integrates techniques from semantic annotation of text, document layout analysis, object-oriented modeling and rule-based reasoning.
We adapt existing solutions to enable their integration under a common ontological view and advance the state of the art in semantic annotation and document layout analysis. In particular, we propose a novel method for automatic lexicon generation for semantic annotators, and an original approach to layout analysis based on common label identification and structure recognition. We design and implement a framework named KnowRex that realizes the proposed methodology and integrates the elaborated solutions.

Item Pairings and symmetry notions. A new unifying perspective in mathematics and computer science (2018-01-19)
Infusino, Federico Giovanni; Leone, Nicola; Chiaselotti, Giampiero; Oliverio, Paolo A.; Polizzi, Francesco

Item Enhancing and Applying Answer Set Programming: Lazy Constraints, Partial Compilation and Question Answering (2019-01-17)
Cuteri, Bernardo; Leone, Nicola; Ricca, Francesco
This work focuses on Answer Set Programming (ASP), an expressive formalism for Knowledge Representation and Reasoning. Over time, ASP has been increasingly applied to real-world problems thanks to the availability of efficient systems. This thesis brings two main contributions in this context: (i) novel strategies for improving the evaluation of ASP programs, and (ii) a real-world application of ASP to Question Answering in Natural Language. Concerning the first contribution, we study some cases in which classical evaluation fails because of the so-called grounding bottleneck. In particular, we first focus on cases in which the standard evaluation strategy is ineffective due to the grounding of problematic constraints. We approach the problem using custom propagators and lazy instantiators, showing empirically when this solution is effective, an aspect that was never made clear in the existing literature.
Although developing propagators can be effective, it has two main disadvantages: it requires deep knowledge of the ASP systems, and the resulting solution is not declarative. We propose a technique for overcoming these issues, which we call program compilation. In our approach, the propagators for some of the logic rules of a program (not only for the constraints) are generated automatically by a compiler. We provide some sufficient conditions for identifying the rules that can be compiled in an approach that fits a propagator-based system architecture. An empirical analysis shows the performance benefits obtained by introducing (partial) compilation into the evaluation of ASP programs. To the best of our knowledge, this is the first work on compilation-based techniques for ASP. Concerning the second part of the thesis, we present the development of a Natural Language Question Answering System whose core is based on ASP. The proposed system gradually transforms input questions into SPARQL queries that are executed on an ontological knowledge base. The system integrates several state-of-the-art NLP models and tools, with a special focus on the Italian language and the Cultural Heritage domain. ASP is used to classify questions from a syntactic point of view. The resulting system is the core module of the PIUCULTURA project, funded by the Italian Ministry of Economic Development, which aims to devise a system for promoting and improving the enjoyment of Cultural Heritage.

Item Domain specific languages for parallel numerical modeling on structured grids (2019-01-17)
De Rango, Alessio; Leone, Nicola; D'Ambrosio, Donato; Spataro, William; Mudalige, Gihan
High performance computing (HPC) is undergoing a period of enormous change. Due to the difficulties in increasing clock frequency indefinitely (i.e., the breakdown of Dennard scaling and the power wall), the current direction is towards improving performance through increasing parallelism.
However, there is no clear consensus yet on the best architecture for HPC, and different solutions are currently employed. As a consequence, applications targeting a given architecture cannot be easily adapted to run on alternative solutions, since this would require great effort to deal with platform-specific details. Since it is not known a priori which HPC architecture will prevail, the Scientific Community is looking for a solution that could tackle this issue. A possible solution consists in adopting a high-level abstraction development strategy based on Domain Specific Languages (DSLs). Among them, OpenCAL (Open Computing Abstraction Layer) and OPS (Oxford Parallel Structured) have been proposed as domain specific C/C++ data parallel libraries for structured grids. The aim of these libraries is to provide an abstract computing model that hides any parallelization detail while targeting different current (and possibly future) parallel architectures. In this Thesis, I have contributed to the design and development of both the OpenCAL and OPS projects. In particular, my contribution to OpenCAL concerned the development of the single-GPU and multi-GPU/multi-node components, namely OpenCAL-CL and OpenCAL-CLM, while my contribution to OPS concerned the introduction of OpenMP 4.0/4.5 support, as an alternative to OpenCL, CUDA and OpenACC, for exploiting modern many-core computing systems. Both improved DSLs have been tested on different benchmarks, including a fractal set generator, a graphics filter routine, and three different fluid-flow applications, with more than satisfactory results.
In particular, OpenCAL was able to scale efficiently to larger computational domains than its original implementation, thanks to the new multi-GPU/multi-node capabilities, while OPS was able to reach near-optimal performance using the high-level OpenMP 4.0/4.5 specifications on many-core accelerators, relative to the alternative low-level CUDA-based version.
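
The idea of an abstract computing model for structured grids, as described in the abstract above, can be illustrated with a minimal serial sketch. It is not the OpenCAL or OPS API: the names apply_kernel, average_of_neighbours and make_grid are hypothetical, and Python is used only for brevity, whereas the libraries themselves are C/C++. The point shown is that the kernel author writes only the per-cell update, while the driver owns the traversal of the grid, which is the part a parallel backend would be free to replace.

```python
from typing import Callable, List

Grid = List[List[float]]


def apply_kernel(grid: Grid, kernel: Callable[[Grid, int, int], float]) -> Grid:
    """Apply `kernel` to every interior cell and return a new grid.

    The traversal order is hidden from the kernel author. This reference
    version is serial; a backend could distribute the cells differently
    without any change to the kernel. (Hypothetical helper, not a library call.)
    """
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]  # boundary cells are copied unchanged
    for i in range(1, rows - 1):
        for j in range(1, cols - 1):
            # Kernels read the old grid only, so cell updates are independent.
            new_grid[i][j] = kernel(grid, i, j)
    return new_grid


def average_of_neighbours(grid: Grid, i: int, j: int) -> float:
    """Four-point stencil: mean of the north, south, west and east cells."""
    return (grid[i - 1][j] + grid[i + 1][j] + grid[i][j - 1] + grid[i][j + 1]) / 4.0


def make_grid(rows: int, cols: int, top: float = 1.0) -> Grid:
    """Zero-filled grid whose top boundary row is held at `top`."""
    grid = [[0.0] * cols for _ in range(rows)]
    grid[0] = [top] * cols
    return grid


grid = make_grid(4, 4)
stepped = apply_kernel(grid, average_of_neighbours)
```

Because each cell is computed from the previous grid alone, the updates do not depend on one another; this independence is what makes the structured-grid pattern a natural fit for data-parallel execution.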