Browsing by Author "Talia, Domenico"
Now showing 1 - 17 of 17
Item: Declarative Semantics for Consistency Maintenance (2006). Caroprese, Luciano; Zumpano, Ester; Talia, Domenico

Item: Designing Cloud services for data processing and knowledge discovery (2012-10-24). Marozzo, Fabrizio; Palopoli, Luigi; Talia, Domenico; Trunfio, Paolo

Item: A DLP-Based System for Ontology Representation and Reasoning (2012-11-09). Gallucci, Lorenzo; Leone, Nicola; Talia, Domenico

In the last few years, the need for knowledge-based technologies has been emerging in several application areas. Industries are now looking for semantic instruments for knowledge representation and reasoning. In this context, ontologies (i.e., abstract models of a complex domain) have been recognized as a fundamental tool, and the World Wide Web Consortium (W3C) has recommended OWL [58] as a standard language for ontologies. Some semantic assumptions of OWL, like the Open World Assumption and the non-Unique Name Assumption, make sense for the Web, but they are unsuited for enterprise ontologies, i.e., specifications of business enterprise information, which often evolve from relational databases, where both the Closed World Assumption (CWA) and the Unique Name Assumption (UNA) are adopted. The subject of this thesis is OntoDLV, a system based on Disjunctive Logic Programming (DLP) for the specification of and reasoning on enterprise ontologies. OntoDLP, the language of the system, overcomes the above-mentioned limitations of OWL: it adopts both CWA and UNA, avoiding a "semantic clash" with enterprise databases. OntoDLP extends DLP with all the main ontology constructs, including classes, inheritance, relations and axioms. The language is strongly typed, and also includes complex type constructors, like lists and sets. Importantly, OntoDLV supports a powerful interoperability mechanism with OWL, allowing the user to retrieve information from OWL ontologies as well and to reason on top of it by exploiting OntoDLP's powerful deduction rules. The system is endowed with a powerful Application Programming Interface and is already used in a number of real-world applications, including agent-based systems and information extraction applications.

Item: Effective Histogram-based Techniques for Summarizing Multi-dimensional Data (2012-11-09). Mazzeo, Giuseppe Massimiliano; Saccà, Domenico; Talia, Domenico

This thesis is an effort towards the definition of effective summarization techniques for multi-dimensional data. The state-of-the-art techniques are examined and the issues related to their inability to effectively summarize multi-dimensional data are pointed out. The attention is particularly focused on histogram-based summarization techniques, which are the most flexible, the most studied and the most adopted in commercial systems. In particular, hierarchical binary partitions are studied as a basis for effective multi-dimensional histograms, focusing the attention on two aspects which turn out to be crucial for histogram accuracy: the representation model and the strategy adopted for partitioning data into buckets. As regards the former, a very specific space-efficient representation model is proposed, where bucket boundaries are represented implicitly by storing the partition tree. Histograms adopting this representation model (called Hierarchical Binary Histograms, HBH) can store a larger number of buckets within a given amount of memory than histograms using a "flat" explicit storage of bucket boundaries.
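To make the implicit, tree-based bucket representation concrete, here is a minimal, purely illustrative sketch (an assumption for illustration, not the thesis implementation): it builds a hierarchical binary partition over a multi-dimensional count array, storing only the split of each internal node and the summary of each leaf bucket. Splitting the longest side of a block merely stands in for the inhomogeneity-driven heuristics proposed in the thesis.

```python
# Toy hierarchical binary partition over a d-dimensional count array:
# bucket boundaries are never stored explicitly, they are implied by the
# (dimension, position) split kept in each internal node of the tree.
import numpy as np

class Node:
    def __init__(self):
        self.split = None      # (dimension, position) for internal nodes
        self.children = None   # (left, right) subtrees
        self.total = None      # stored summary for leaf buckets

def build(data, lo, hi, budget):
    """Recursively split the block [lo, hi) until the split budget runs out
    or the block is homogeneous; only splits and leaf totals are retained."""
    node = Node()
    block = data[tuple(slice(l, h) for l, h in zip(lo, hi))]
    if budget <= 0 or block.size <= 1 or block.max() == block.min():
        node.total = float(block.sum())
        return node, budget
    dim = int(np.argmax(np.subtract(hi, lo)))     # stand-in heuristic: longest side
    mid = (lo[dim] + hi[dim]) // 2
    node.split = (dim, mid)
    left_hi, right_lo = list(hi), list(lo)
    left_hi[dim], right_lo[dim] = mid, mid
    left, budget = build(data, lo, tuple(left_hi), budget - 1)
    right, budget = build(data, tuple(right_lo), hi, budget)
    node.children = (left, right)
    return node, budget

# Example: summarize an 8x8 count array with at most 10 splits
data = np.random.poisson(2.0, size=(8, 8))
root, _ = build(data, (0, 0), data.shape, budget=10)
```

Because only splits and leaf summaries are kept, a given memory budget accommodates more buckets than a flat list of explicit boundaries, which is the saving the abstract attributes to HBH.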
On top of that, the introduction of a constraint on the hierarchical partition scheme is studied, allowing each bucket to be partitioned only by splits lying on a regular grid defined on it: histograms adopting such a constrained partitioning paradigm are called Grid Hierarchical Binary Histograms (GHBH). The grid-constrained partitioning of GHBHs can be exploited to further enhance the physical representation efficiency of HBHs. As regards the construction of effective partitions, some new heuristics are introduced, guiding the data summarization by locating inhomogeneous regions of the data where a finer-grained partition is needed. The two physical representation schemes adopted by HBH and GHBH can be viewed as a form of lossless compression used on top of the summarization accomplished by histograms (which is a form of lossy compression). The combination of these forms of compression is shown to result in a relevant improvement of histogram effectiveness. On the one hand, the proposed compression-based representation models provide a mechanism for efficiently locating the buckets involved in query estimation, thus reducing the amount of time needed to estimate queries w.r.t. traditional flat representation models. On the other hand, applying lossless compression on top of summarization reduces the loss of information due to summarization, as it enables a larger amount of summary data to be stored within a given storage space bound: this turns out to yield lower error rates of query estimates. By means of experiments, a thorough analysis of different classes of histograms based on hierarchical partitions is provided: the accuracy obtained by combining different heuristics (both the new proposals and the "classical" heuristics of two well-known techniques, namely MHIST and Min-Skew) with either the traditional MBR-based representation model or the novel tree-based ones (both the unconstrained and the grid-constrained one) is studied. These results provide an insight into the value of compression in the context of histograms based on hierarchical partitions. Interestingly, it is shown that the impact of the HBH and GHBH representation models on the accuracy of query estimates is not simply orthogonal to the adopted heuristic. Thus, the best combination of these different features is identified, which turns out to be the grid-constrained hierarchical partitioning of GHBHs guided by one of the new heuristics. GHBH is compared with state-of-the-art techniques (MHIST, Min-Skew, GENHIST, as well as wavelet-based summarization approaches), showing that the new technique yields much lower error rates and a satisfactory degree of accuracy also in high-dimensionality scenarios. Another important contribution of this thesis is the proposal of a new approach for constructing effective histograms. The superiority of GHBH over the other histogram-based techniques has been found to depend primarily on the more accurate criterion adopted for guiding the data-domain partitioning. In fact, traditional techniques for constructing histograms often yield partitions where dense and sparse regions are put together in the same bucket, thus yielding poor accuracy in estimating queries on summarized data. Although GHBH adopts a criterion which in theory avoids this situation, there is an intrinsic limit in all top-down partitioning techniques.
That is, histograms obtained by iteratively splitting blocks, starting from the block coinciding with the whole data domain, may not be able to reach all the dense regions in order to isolate them. In fact, each split increases the number of buckets and, as the number of buckets is bounded, the number of splits that can be performed is bounded as well. Therefore, in large domains where data are particularly skewed, the number of available splits may not be large enough to reach all the dense regions in a top-down split sequence. Thus, GHBH may start partitioning the data domain in the right direction, towards isolating the dense regions, but at some point the number of available buckets, and thus of available splits, is exhausted. This problem can be avoided by adopting a bottom-up strategy, which first locates the dense regions of the data and then aggregates them into buckets according to some suitable strategy. The problem of searching for dense regions is very close to the data clustering problem, that is, the problem of grouping database objects into a set of meaningful classes. Histogram construction is therefore enhanced by exploiting the capability of clustering techniques to locate dense regions. A new technique, CHIST (Clustering-based Histograms), is proposed for constructing multi-dimensional histograms on the basis of a well-known density-based clustering algorithm, DBSCAN. The CHIST algorithm first invokes DBSCAN to partition the data into dense and sparse regions, and then further refines this partitioning by adopting a grid-based paradigm (a rough sketch of this two-phase construction is given below). CHIST is compared to GHBH and is shown to provide lower error rates, especially in "critical" settings, that is, when query selectivity is particularly low or the compression ratio is very high. It is worth remarking that in these settings, the experiments comparing GHBH to the other techniques showed that GHBH still provides acceptable error rates, while those provided by the other techniques are completely unacceptable. CHIST is also extended to the case where the data to be summarized are dynamic. In this case, re-executing the clustering algorithm at each data update could be prohibitive, due to the high computational cost of this task. Thus, on the basis of the Incremental DBSCAN algorithm, a strategy for efficiently propagating data updates to the histogram is proposed.
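The sketch below illustrates the two-phase, bottom-up construction just described. It is only an approximation made for illustration: the DBSCAN parameters, the grid resolution and the bucket layout are assumptions, and scikit-learn's DBSCAN is used in place of the implementation developed in the thesis.

```python
# Rough clustering-based histogram construction: DBSCAN isolates dense
# regions, each dense region is then summarized on a small regular grid,
# while the noise points form a single coarse "sparse" bucket.
import numpy as np
from sklearn.cluster import DBSCAN

def clustering_based_histogram(points, eps=0.05, min_samples=10, grid=4):
    """points: (n, d) array; returns a list of (bin_edges, counts) buckets."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(points)
    buckets = []
    for label in np.unique(labels):
        region = points[labels == label]
        lo, hi = region.min(axis=0), region.max(axis=0)
        hi = np.where(hi > lo, hi, lo + 1e-9)      # avoid zero-width ranges
        bins = 1 if label == -1 else grid          # label -1 = DBSCAN noise (sparse region)
        counts, edges = np.histogramdd(region, bins=bins, range=list(zip(lo, hi)))
        buckets.append((edges, counts))
    return buckets

# Example: two dense blobs plus uniform background noise in 2D
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0.2, 0.02, (500, 2)),
                 rng.normal(0.7, 0.02, (500, 2)),
                 rng.uniform(0.0, 1.0, (200, 2))])
buckets = clustering_based_histogram(pts)
```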
By means of experiments it is shown that, for small updates (i.e., update bulks 100 times smaller than the overall data size), the incremental approach can be computed 100 to 200 times faster than the from-scratch re-computation of the histogram, while the accuracy remains almost unaffected.

Item: Entity resolution: effective schema and data reconciliation (2004). Folino, Francesco; Talia, Domenico; Saccà, Domenico; Manco, Giuseppe

Item: Knowledge Management and Extraction in XML Data (2012-11-09). Costa, Giovanni; Talia, Domenico; Saccà, Domenico; Manco, Giuseppe

Item: Mobile Computing: energy-aware techniques and location-based methodologies (2014-12-01). Falcone, Deborah; Talia, Domenico; Greco, Sergio

Item: Modelling complex data mining applications in a formal framework (2008). Locane, Antonio; Saccà, Domenico; Manco, Giuseppe; Talia, Domenico

Item: Ontology-Driven Modelling and analyzing of Business Process (2014-03-10). Gualtieri, Andrea; Saccà, Domenico; Talia, Domenico

Item: Ontology-Driven Modelling and analyzing of Business Process (2014-03-10). Gualtieri, Andrea; Saccà, Domenico; Talia, Domenico

Item: Protocol Architectures for wireless networks: issues, perspectives and enhancements (2012-11-09). Loscrì, Valeria; Talia, Domenico; Marano, Salvatore

In this work we have analyzed different types of wireless networks and investigated the behavior of different MAC and routing protocols and different protocol architectures. Specifically, we have analyzed a TDMA MAC protocol for Mobile Ad hoc NETworks (MANETs) called Evolutionary-TDMA (E-TDMA). Based on this protocol, we have developed a cross-layer approach that uses multiple link-disjoint paths for a source-destination pair, supporting soft QoS. To this end, we have developed a Multipath Forward Algorithm to manage and assign the slots. We have developed two different schemes, QoS scheme 1 and QoS scheme 2, and compared them in terms of throughput, delay and overhead with a well-known multipath routing protocol, AOMDV, running over the 802.11 MAC protocol and over E-TDMA. We have obtained good performance by exploiting the advantages of a TDMA MAC protocol and of a multipath routing protocol. Simulation results have shown that a cross-layer approach achieves good performance in terms of throughput, end-to-end data packet delay and overhead. In fact, multipath routing protocols for wireless ad hoc networks have been investigated because the use of alternate paths provides greater fault tolerance. As far as multipath routing is concerned, we have developed and analyzed a multipath routing protocol based on the geographic positions of the nodes in the network, the Geographic Multipath Protocol (GMP). Results, in terms of throughput and delay, show that multiple paths with minimum interference perform better. We have also investigated the cross-layer approach for Wireless Sensor Networks (WSNs). Although this kind of network could be considered an extension of ad hoc networks, the characteristics of WSNs are so different that protocols explicitly developed for MANETs cannot be directly applied to sensor networks. After analyzing some protocols, we have proposed solutions to increase the lifetime of sensor networks while considering other important parameters such as throughput and latency. Finally, our analysis of wireless networks concludes with Wireless Mesh Networks. We have investigated multi-hop wireless networks based on the IEEE 802.16 technology.
Specifically, we have analyzed the Coordinated Distributed Scheme (CDS) of the IEEE Std. 802.16 and we have developed a MAC module supporting the CDS in ns2. Moreover, we have developed two different, totally distributed schemes that do not require any change to the structure of the hardware used in 802.16. The two proposed schemes manage the assignment of the control slots (Transmit Opportunities, XmtOP) in a different fashion. We have implemented the two schemes in ns2 in order to compare them with the CDS of 802.16. Our approaches achieve good performance in terms of throughput and delay, and they are independent of network parameters such as density or topology. On the other hand, the CDS is not robust across different network conditions, because the behavior of the scheme depends on the setting of particular parameters.

Item: Querying Inconsistent Data: Repairs and Consistent Answers (2012-11-09). Parisi, Francesco; Flesca, Sergio; Talia, Domenico

This dissertation makes three main contributions. 1) We provide an extensive survey of the techniques for repairing and querying inconsistent relational databases. We distinguish four parameters for classifying and comparing the existing techniques. First, we discern two repairing paradigms, namely the tuple-based and the attribute-based repairing paradigm: according to the former, a repair for a database is obtained by inserting and/or deleting tuples, whereas according to the latter a repair is obtained by (also) modifying attribute values within tuples. Second, we distinguish several repair semantics, which entail different orders among the set of consistent database instances that can be obtained for an inconsistent database with respect to a given set of integrity constraints. Third, we classify the techniques on the basis of the classes of queries considered for computing consistent answers. Finally, we compare the different approaches in the literature on the basis of the classes of integrity constraints which are assumed to be defined on the database. 2) We investigate the problem of repairing and extracting reliable information from data violating a given set of aggregate constraints. These constraints consist of linear inequalities on aggregate-sum queries issued on measure values stored in the database. This syntactic form enables meaningful constraints to be expressed. Indeed, aggregate constraints frequently occur in many real-life scenarios where guaranteeing the consistency of numerical data is mandatory. We consider database repairs consisting of sets of value-update operations aiming at reconstructing the correct measure values of inconsistent data. We adopt two different criteria for determining whether a set of update operations repairing the data can be considered "reasonable" or not: set-minimal semantics and card-minimal semantics. Both semantics aim at preserving the information represented in the source data as much as possible, and they correspond to different repairing strategies which turn out to be well suited for different application scenarios. We provide the complexity characterization of three fundamental problems: (i) repairability: is there at least one (possibly non-minimal) repair for the given database with respect to the specified constraints? (ii) repair checking: given a set of update operations, is it a minimal repair? (iii) consistent query answering: is a given query true in every minimal repair?
3) We provide a method for computing card-minimal repairs for a database in the presence of steady aggregate constraints, a restricted but expressive class of aggregate constraints. Under steady aggregate constraints, an instance of the problem of computing a card-minimal repair can be transformed into an instance of a Mixed-Integer Linear Programming (MILP) problem. Thus, standard techniques and optimizations addressing MILP problems can be re-used for computing repairs. On the basis of this data-repairing framework, we propose an architecture providing robust data acquisition facilities for input documents containing tabular data. We exploit integrity constraints defined on the input data to support the detection and the repair of inconsistencies arising from errors occurring in the acquisition phase.

Item: Resource reservation protocol and predictive algorithms for QoS support in wireless environments (2008-01). Fazio, Peppino; Talia, Domenico; Marano, Salvatore

Item: Scalable data analysis: methods, tools and applications (2017-07-26). Belcastro, Loris; Crupi, Felice; Talia, Domenico

Item: A spatial data infrastructure (2012-10-24). D'Amore, Francesco; Palopoli, Luigi; Talia, Domenico; Cinnirella, Sergio

Item: Swarm-Based Algorithms for Decentralized Clustering and Resource Discovery in Grids (2012-11-09). Forestiero, Agostino; Spezzano, Giandomenico; Talia, Domenico

In this thesis, some novel algorithms based on the swarm intelligence paradigm are proposed. In particular, swarm agents are exploited to tackle the following issues:
- P2P Clustering. A swarm-based algorithm is used to cluster distributed data in a peer-to-peer environment through a small-world topology. Moreover, to perform spatial clustering in every peer, two novel algorithms are proposed. They are based on the stochastic search of the flocking algorithm and on the main principles of two popular clustering algorithms, DBSCAN and SNN.
- Resource discovery in Grids. An approach based on ant systems is exploited to replicate and map Grid service information onto Grid hosts according to the semantic classification of such services. To exploit this mapping, a semi-informed resource discovery protocol which makes use of the ants' work has been devised. Asynchronous query messages (agents) issued by clients are driven towards "representative peers" which maintain information about a large number of resources having the required characteristics.

Item: Unified approach for measurement on modulated signal (2012-11-09). Carnì, Domenico Luca; Talia, Domenico; Grimaldi, Domenico

The functional block architecture implementing the method to measure the carrier frequency error of single-carrier digital modulations is presented. It is able to operate on the single-carrier digitally modulated signals M-ASK, M-QAM and M-PSK. The functional block architecture, derived from the Software Radio architecture, is based on the cascade of the Analog-to-Digital Converter (ADC), the Digital Down Converter and the baseband processing. The performance of this cascaded architecture is analyzed by considering three different ADC architectures, based on the pipeline, the single quantizer loop modulator, and the Multistage Noise Shaper (MASH) modulator. Numerical tests confirm the important role of the ADC in this architecture, and highlight that the Band-Pass MASH modulator-based ADC offers interesting performance. Consequently, it is a candidate for hardware implementation and for use in advanced measurement instruments.
The research results that led to the functional block implementing the method for the carrier frequency error measurement are presented in [97].
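As a purely illustrative companion to this last item, the sketch below estimates the residual carrier frequency error of an M-PSK burst after digital down-conversion, using a generic textbook estimator (modulation removal by raising the baseband signal to the M-th power, then an FFT peak search). The signal model, the parameters and the estimator itself are assumptions for illustration and do not reproduce the instrument architecture described in the thesis.

```python
# Generic carrier-frequency-error estimator for complex I/Q samples of an
# M-PSK signal: down-convert by the nominal carrier, strip the modulation by
# raising to the M-th power, then locate the residual tone with an FFT.
import numpy as np

def carrier_freq_error(samples, fs, f_nominal, M=4):
    n = np.arange(samples.size)
    baseband = samples * np.exp(-2j * np.pi * f_nominal / fs * n)   # digital down-conversion
    stripped = baseband ** M                                        # PSK phases collapse to a tone at M*df
    spectrum = np.abs(np.fft.fft(stripped * np.hanning(stripped.size)))
    freqs = np.fft.fftfreq(stripped.size, d=1.0 / fs)
    return freqs[np.argmax(spectrum)] / M                           # estimated carrier offset in Hz

# Example: QPSK burst with a 1 kHz carrier offset, sampled at 1 MHz
fs, f_nominal, offset = 1e6, 100e3, 1e3
t = np.arange(4096) / fs
symbols = np.exp(1j * (np.pi / 2) * np.random.randint(0, 4, t.size))
rx = symbols * np.exp(2j * np.pi * (f_nominal + offset) * t)
print(carrier_freq_error(rx, fs, f_nominal))   # close to 1000 Hz (within one FFT bin)
```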