Tesi di Dottorato

Now showing 1 - 3 of 3
  • Item
    Data mining techniques for large and complex data
    (2017-11-13) Narvaez Vilema, Miryan Estela; Crupi, Felice; Angiulli, Fabrizio
    During these three years of research I dedicated myself to the study and design of data mining techniques for large quantities of data. Particular attention was devoted to training set condensing techniques for the nearest-neighbor classification rule and to techniques for node anomaly detection in networks. The first part of this thesis focuses on the design and experimental evaluation of strategies to reduce the size of the subset extracted by condensing techniques. Training set condensing techniques aim to determine a subset of the original training set that allows all the training set examples to be classified correctly. The subset extracted by these techniques is also known as a consistent subset. The result of the research was the development of various subset selection strategies, designed to determine during the training phase the most promising subset based on different methods of estimating test accuracy. Among them, the PACOPT strategy uses the Pessimistic Error Estimate (PEE) to estimate generalization as a trade-off between training set accuracy and model complexity. The experimental phase took the FCNN condensation technique as its reference. Among the condensation methods based on the nearest neighbor decision rule (NN rule), FCNN (Fast Condensed NN) is one of the most advantageous techniques, particularly in terms of time performance. We showed that the designed selection strategies guarantee that the accuracy of a consistent subset is preserved. We also demonstrated that the proposed selection strategies guarantee a significant reduction in the size of the model. Comparison with notable training-set reduction techniques for the NN rule attests to the state-of-the-art performance of the strategies introduced here. The second part of the thesis is directed towards the design of analysis tools for network-structured data. Anomaly detection is an area that has received much attention in recent years.
It has a wide variety of applications, including fraud detection and network intrusion detection. Techniques for anomaly detection in static graphs assume that networks do not change and can represent only a single snapshot of the data. As real-world networks are constantly changing, the focus has shifted to dynamic graphs, which evolve over time. We present a technique for node anomaly detection in networks where arcs are annotated with their time of creation. The technique aims at singling out anomalies by simultaneously taking into account information concerning both the structure of the network and the order in which connections have been established. The latter information is obtained from the timestamps associated with arcs. A set of temporal structures is induced by checking certain conditions on the order of arc appearance, denoting different kinds of user behavior. The distribution of these structures is computed for each node and used to detect anomalies. We point out that the approach investigated here is substantially different from techniques dealing with dynamic networks. Indeed, our aim is not to determine the points in time at which a certain portion of the network (typically a community or a subgraph) exhibited a significant change, as is usually done by dynamic-graph anomaly detection techniques. Rather, our primary aim is to analyze each single node by taking into account its temporal footprint.
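The notion of a consistent subset described in the first part of this abstract can be illustrated with a minimal sketch. The code below uses a classic greedy condensation in the style of Hart's condensed nearest neighbor rule; it is not FCNN and not the PACOPT strategy from the thesis. The function names (`nearest_label`, `condense`, `is_consistent`) and the toy data are assumptions made for illustration only.

```python
import math

def nearest_label(point, subset):
    """Return the label of the subset example closest to point (1-NN rule)."""
    closest = min(subset, key=lambda example: math.dist(point, example[0]))
    return closest[1]

def condense(training_set):
    """Greedy Hart-style condensation (illustrative, not FCNN).

    Grows a subset until every training example is classified correctly
    by the 1-NN rule applied to the subset alone. Assumes no two identical
    points carry different labels.
    """
    subset = [training_set[0]]
    changed = True
    while changed:
        changed = False
        for example in training_set:
            features, label = example
            misclassified = nearest_label(features, subset) != label
            if misclassified and example not in subset:
                subset.append(example)
                changed = True
    return subset

def is_consistent(subset, training_set):
    """Check the consistency property: the subset classifies all examples correctly."""
    return all(nearest_label(x, subset) == y for x, y in training_set)

# Hypothetical toy data: two well-separated classes in the plane.
training_set = [
    ((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
    ((1.0, 1.0), "b"), ((0.9, 1.1), "b"), ((1.1, 0.9), "b"),
]
subset = condense(training_set)
print(len(subset), is_consistent(subset, training_set))
```

On this toy data one example per class is enough to classify the whole training set, which is the kind of model-size reduction that condensing techniques aim for.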
  • Item
    Anomalies in cyber security: detection, prevention and simulation approaches
    (2018-07-03) Argento, Luciano; Crupi, Felice; Furfaro, Angelo; Angiulli, Fabrizio
    With the massive adoption of the Internet, both our private and working lives have drastically changed. The Internet has introduced new ways to communicate and complete everyday tasks. Organisations of any kind have taken their activities online to achieve many advantages, e.g. commercial organisations can reach more customers with proper marketing. However, the Internet has also brought various drawbacks, and one of these concerns cyber security issues. Whenever an entity (e.g. a person or company) connects to the Internet, it immediately becomes a potential target of cyber threats, i.e. malicious activities that take place in cyberspace. Examples of cyber threats are theft of intellectual property and denial of service attacks. Much effort has been spent to make the Internet perhaps the most revolutionary communication tool ever created, but unfortunately little has been done to design it in a secure fashion. Since the massive adoption of the Internet we have witnessed a huge number of threats, perpetrated by many different actors such as criminal organisations, disgruntled workers and even people with little expertise, thanks to the existence of attack toolkits. On top of that, cyber threats are constantly evolving and, as a consequence, they are getting more and more sophisticated. Nowadays, the cyber security landscape is in a critical condition. It is of utmost importance to keep up with the evolution of cyber threats in order to improve the state of cyber security. We need to adapt existing security solutions to the ever-changing security landscape and devise new ones when needed. The research activities presented in this thesis find their place in this complex scenario. We investigated significant cyber security problems, related to data analysis and anomaly detection, in the following areas of research: Hybrid Anomaly Detection Systems; Intrusion Detection Systems; Access Control Systems; and the Internet of Things.
Anomaly detection approaches are very relevant in the field of cyber security. Fraud and intrusion detection are well-known research areas where such approaches are very important. Many techniques have been devised, which can be categorised into anomaly-based and signature-based detection techniques. Researchers have also spent much effort on a third category of detection techniques, i.e. hybrid anomaly detection, which combines the two former approaches in order to obtain better detection performance. In this direction, we designed a generic framework, called HALF, whose goal is to accommodate multiple mining algorithms of a specific domain and provide a flexible and more effective detection capability. HALF can be easily employed in different application domains, such as intrusion detection and steganalysis, due to its generality and the support provided for the data analysis process. We analysed two case studies in order to show how HALF can be exploited in practice to implement a Network Intrusion Detection System and a Steganalysis tool. The concept of anomaly is a core element of the research activity conducted in the context of intrusion detection, where an intrusion can be seen as an anomalous activity that might represent a threat to a network or system. Intrusion detection systems (IDSs) constitute a very important class of security tools which have become an invaluable defence wall against cyber threats. In this thesis we present two research results that stem from issues related to IDSs that resort to the n-gram technique. The starting point of our first contribution is the threat posed by content-based attacks. Their goal is to deliver malicious content to a service in order to exploit its vulnerabilities. This type of attack has caused serious damage to both people and organisations over the years.
Some of these attacks may exploit web application vulnerabilities to achieve goals such as data theft and privilege escalation, which may lead to enormous financial loss for the victim. IDSs that exploit the n-gram technique have proven to be very effective against this category of cyber threats. However, n-grams may not be sufficient to build reliable models that describe normal and/or malicious traffic. In addition, the presence of an adversarial attacker is not properly addressed by the existing solutions. We devised a novel anomaly-based intrusion detection technique, called PCkAD, to detect content-based attacks threatening application-level protocols. PCkAD models legitimate traffic on the basis of the spatial distribution of the n-grams occurring in the relevant content of normal traffic and has been designed to be resistant to blending evasion techniques. Indeed, we demonstrate that evading PCkAD is an intrinsically difficult problem. The experiments conducted to evaluate PCkAD show that it achieves state-of-the-art performance in real attack scenarios and that it performs well against blending attacks. The second contribution concerning intrusion detection investigates issues that may arise from the use of the n-gram technique. Many approaches using n-grams have been proposed in the literature, which typically exploit high-order n-grams to achieve good performance. However, because the n-gram domain grows exponentially with respect to the n-gram size, significant issues may arise, from the generation of huge models to overfitting. We present an approach aimed at reducing the size of n-gram-based models, which is able to build models that contain only a fraction of the original n-grams with little impact on detection accuracy. The reported experiments, conducted on a real-world dataset, show promising results.
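The n-gram technique and the exponential growth of the n-gram domain mentioned above can be illustrated with a minimal sketch. The code below scores a payload by the fraction of its n-grams never seen in normal traffic. It is a deliberately simple baseline and not PCkAD, which models the spatial distribution of n-grams, nor the model-reduction approach of the thesis. All function names and sample payloads are assumptions made for illustration only.

```python
def ngrams(payload: bytes, n: int) -> set:
    """Return the set of distinct byte n-grams occurring in payload."""
    return {payload[i:i + n] for i in range(len(payload) - n + 1)}

def train(normal_payloads, n: int) -> set:
    """Build a model as the set of all n-grams seen in normal traffic."""
    model = set()
    for payload in normal_payloads:
        model |= ngrams(payload, n)
    return model

def anomaly_score(payload: bytes, model: set, n: int) -> float:
    """Fraction of the payload's distinct n-grams that are absent from the model."""
    grams = ngrams(payload, n)
    if not grams:
        return 0.0
    unseen = sum(1 for gram in grams if gram not in model)
    return unseen / len(grams)

def domain_size(n: int, alphabet: int = 256) -> int:
    """Number of possible byte n-grams: grows exponentially with n."""
    return alphabet ** n

# Hypothetical sample traffic, for illustration only.
normal = [b"GET /index.html HTTP/1.1", b"GET /about.html HTTP/1.1"]
attack = b"GET /index.html?id=1' OR '1'='1 HTTP/1.1"

model = train(normal, 3)
print(anomaly_score(normal[0], model, 3))  # payload seen in training
print(anomaly_score(attack, model, 3))     # payload with unseen n-grams
print([domain_size(n) for n in (1, 2, 3, 4)])
```

The last line makes the size problem concrete: over bytes, the number of possible n-grams is 256 to the power n, so high-order models can become very large unless they are pruned.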
The research concerning access control systems focused on anomalies that represent attempts to exceed or misuse access controls to negatively affect the confidentiality, integrity or availability of a target information system. Access control systems are nowadays the first line of defence of modern computing systems. However, their intrinsically static nature hinders autonomous refinement of access rules and adaptation to emerging needs. Advanced attribute-based systems still rely mainly on manual administration approaches and are not effective at preventing insider threats that exploit granted access rights. We introduce a machine learning approach to refine attribute-based access control policies based on behavioural patterns of users' access to resources. The designed system tailors a learning algorithm based on decision tree solutions. We analysed a case study and conducted an experiment to show the effectiveness of the system. The Internet of Things (IoT) is the last topic of interest in the present thesis. IoT is showing the potential to impact several domains, ranging from personal to enterprise environments. IoT applications are designed to improve most aspects of both business and citizens' lives; however, this emerging technology has become an attractive target for cybercriminals. A worrying security problem concerns the presence of many smart devices that have security holes. Researchers are investing their efforts in the evaluation of security properties. Following this direction, we show that it is possible to effectively assess cyber security scenarios involving IoT settings by combining novel virtual environments, agent-based simulation and real devices, thereby obtaining a means to help prevent anomalous actions from taking advantage of security holes for malicious purposes. We demonstrate the effectiveness of the approach through a case study regarding a typical smart home setting.
  • Item
    Discovering Exceptional Individuals and Properties in Data
    (2014-03-07) Fassetti, Fabio; Angiulli, Fabrizio; Palopoli, Luigi