Tesi di Dottorato

  • Item
    Data mining techniques for large and complex data
    (2017-11-13) Narvaez Vilema, Miryan Estela; Crupi, Felice; Angiulli, Fabrizio
    During these three years of research I studied and designed data mining techniques for large quantities of data. Particular attention was devoted to training set condensing techniques for the nearest-neighbor classification rule and to techniques for node anomaly detection in networks. The first part of this thesis focuses on the design and experimental evaluation of strategies to reduce the size of the subset extracted by condensing techniques. Training set condensing techniques aim to determine a subset of the original training set that allows all the training set examples to be classified correctly. The subset extracted by these techniques is also known as a consistent subset. The result of the research was the development of various subset selection strategies, designed to determine during the training phase the most promising subset based on different methods of estimating test accuracy. Among them, the PACOPT strategy uses the Pessimistic Error Estimate (PEE) to estimate generalization as a trade-off between training set accuracy and model complexity. The experimental phase used the FCNN condensing technique as its reference. Among the condensing methods based on the nearest neighbor decision rule (NN rule), FCNN (Fast Condensed NN) is one of the most advantageous techniques, particularly in terms of time performance. We showed that the designed selection strategies guarantee that the accuracy of a consistent subset is preserved. We also demonstrated that the proposed selection strategies guarantee a significant reduction in the size of the model. Comparison with notable training-set reduction techniques for the NN rule attests to the state-of-the-art performance of the strategies introduced here.
    The second part of the thesis is directed towards the design of analysis tools for network-structured data. Anomaly detection is an area that has received much attention in recent years. It has a wide variety of applications, including fraud detection and network intrusion detection. Techniques for anomaly detection in static graphs assume that the networks do not change and can represent only a single snapshot of the data. As real-world networks are constantly changing, the focus has shifted to dynamic graphs, which evolve over time. We present a technique for node anomaly detection in networks where arcs are annotated with their time of creation. The technique aims at singling out anomalies by simultaneously taking into account information about both the structure of the network and the order in which connections were established. The latter information is obtained from the timestamps associated with arcs. A set of temporal structures is induced by checking certain conditions on the order of arc appearance, denoting different kinds of user behavior. The distribution of these structures is computed for each node and used to detect anomalies. We point out that the approach investigated here is substantially different from techniques dealing with dynamic networks. Indeed, our aim is not to determine the points in time at which a certain portion of the network (typically a community or a subgraph) exhibited a significant change, as is usually done by dynamic-graph anomaly detection techniques. Rather, our primary aim is to analyze each single node by taking into account its temporal footprint.
  • Item
    Scalable data analysis: methods, tools and applications
    (2017-07-26) Belcastro, Loris; Crupi, Felice; Talia, Domenico
  • Item
    User behavioral problems in complex social networks
    (2019-06-20) Perna, Diego; Tagarelli, Andrea; Crupi, Felice
    Over the past two decades, we have witnessed the advent and rapid growth of numerous social networking platforms. Their pervasive diffusion has dramatically changed the way we communicate and socialize with each other. They introduce new paradigms and impose new constraints within their scope. On the other hand, online social networks (OSNs) provide scientists with an unprecedented opportunity to observe human behaviors in a controlled way. The goal of the research project described in this thesis is to design and develop tools, in the context of network science and machine learning, to analyze, characterize and ultimately describe user behaviors in OSNs. After a brief review of network-science centrality measures and ranking algorithms, we examine the role of trust in OSNs by proposing a new inference method for controversial situations. Afterward, we delve into social boundary spanning theory and define a ranking algorithm to rank, and consequently identify, users characterized by alternate behavior across OSNs. The second part of this thesis deals with machine-learning-based approaches to the problem of learning a ranking function to identify lurkers and bots in OSNs. In the last part of this thesis, we discuss methods and techniques for learning a new representational space of entities in a multilayer social network.
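
The first abstract in this listing (Narvaez Vilema) describes a consistent subset: a subset of the training set that, used with the NN rule, classifies every training example correctly. The sketch below illustrates that notion only. It uses a basic condensing loop in the style of the classic condensed nearest-neighbor procedure, not FCNN and not the PACOPT strategy from the thesis. The function names, the Euclidean distance, and the toy data are assumptions made for illustration, and the input is assumed to contain no identical points with conflicting labels.

```python
import math


def nearest_label(point, subset):
    """Return the label of the subset element closest to `point` (1-NN rule)."""
    closest = min(subset, key=lambda item: math.dist(point, item[0]))
    return closest[1]


def condense(training_set):
    """Grow a subset until it classifies every training example correctly.

    Illustrative condensing loop (not FCNN). Assumes no two identical
    points carry different labels; the membership check only guarantees
    termination if that assumption is violated.
    """
    subset = [training_set[0]]
    changed = True
    while changed:
        changed = False
        for example in training_set:
            point, label = example
            if example in subset:
                continue
            if nearest_label(point, subset) != label:
                subset.append(example)  # misclassified: add it to the subset
                changed = True
    return subset


def is_consistent(subset, training_set):
    """Check the defining property of a consistent subset."""
    return all(nearest_label(p, subset) == y for p, y in training_set)


# Hypothetical toy data: two well-separated classes in the plane.
train = [
    ((0.0, 0.0), "a"), ((0.1, 0.0), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((1.1, 0.9), "b"), ((0.9, 1.1), "b"),
]
subset = condense(train)
print(len(subset), is_consistent(subset, train))
```

On well-separated data such as this, the subset can be much smaller than the training set while remaining consistent, which is the size reduction that condensing techniques aim for.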