Tesi di Dottorato (Doctoral Theses)
Item: Requirements engineering for complex systems (2017-07-26). Gallo, Teresa; Saccà, Domenico; Furfaro, Angelo; Garro, Alfredo; Crupi, Felice.

Requirements Engineering (RE) is a part of Software Engineering and, more generally, of Systems Engineering. RE aims to help build software that satisfies user needs, by eliciting, documenting, validating and maintaining the requirements that the software has to satisfy adequately. Over the thirty years of RE history, its importance has been perceived to varying degrees: from being the most important activity, well formalized and defined in large, complete documents that served as the bible of the software project, to the opposite extreme, where it has been reduced to an informal, volatile activity, never formalized and not maintained at all because it is ever changing. Managing requirements well is extremely important, mainly for complex systems that involve great investments of resources and/or cannot be easily replaced. A system can be complex because it is realized through the collaboration of a numerous and heterogeneous set of stakeholders, as for example in a large industrial research project, often co-funded with public resources, where many partners with different backgrounds and languages must cooperate to reach the project goals. Furthermore, a system can be complex because it constitutes the IT system of an enterprise, which has grown over time through the addition of many pieces of software integrated in many different ways; such an IT system is often distributed, interoperates ubiquitously across many computers, and behaves as a single large system, even though it was developed by many software providers, at different times, with different technologies and tools. The complexity of these systems is a major concern in several critical industrial domains where real-time and fault-tolerance features are vital, such as automotive, railway, avionics, satellite, health care and energy; in these domains a great variety of systems are designed and developed by organizing and integrating existing components that pool their resources and capabilities to create a new system able to offer more functionality and performance than the simple sum of its components. Typically, the design and management of such systems, best known as Systems of Systems (SoS), involve properties that cannot be immediately defined, derived or easily analyzed starting from the properties of their stand-alone parts. For these reasons, SoS require suitably engineered methods, tools and techniques for managing requirements and every other phase of the construction process, with the aim of minimizing any risk of failure. Moreover, every complex IT system, even one that does not belong to a critical domain but supports the core business of an enterprise, must be well governed to avoid the risk of rapidly becoming inadequate to its role. This risk becomes high when many uncontrolled IT developments, aimed at supporting requirements changes, accumulate. In fact, as complexity grows, the IT system may become too expensive to maintain and have to be retired and replaced after too short a time, often with large and underestimated difficulties. For these reasons, complex systems must be governed during their evolution, both from the point of view of 'which application is where and why' and from the point of view of the supported requirements, that is, 'which need is supported by each application and for whom'.
Such governance would facilitate the knowledge, management, essentiality and maintenance of complex systems, enabling efficient support and a long-lasting system, and consequently minimizing wasted costs and inadequate support for the core business of the enterprise. This work mainly addresses the problem of governing systems that are complex either because they result from the collaboration of many different stakeholders (e.g. large co-funded R&D projects) or because they are Enterprise Information Systems (EIS), such as the IT systems of medium and large enterprises. In this direction, a new goal-oriented requirements methodology, named GOReM, was defined, with specific features useful for the addressed issues. In addition, a new approach, ResDevOps, has been conceived, which makes it possible to refine the governance of the requirements of an EIS that is continuously improved and that grows and evolves over time. The thesis presents the state-of-the-art framework in which these activities are placed, together with a set of case studies developed within real projects, mainly large R&D projects involving the University of Calabria, but also some real industrial projects. The main results have been published in international conference proceedings, and a manuscript is in press in an international journal.

Item: Malevolent Activities Detection and Cyber Range Scenarios Orchestration (2018-06-08). Piccolo, Antonio; Saccà, Domenico; Pugliese, Andrea; Crupi, Felice.

The increasing availability of Internet-accessible services, driven by the diffusion of connected devices, brings a growing exposure to cyber-threats, which demands suitable methodologies, techniques and tools to adequately handle the issues arising in such a complex domain. Most Intrusion Detection Systems (IDS) are capable of detecting many attacks, but cannot give the analyst a clear picture because of the huge number of false alerts they generate. This weakness has led to the emergence of many methods for dealing with these alerts, minimizing them and highlighting the real attacks. Furthermore, experience shows that interpreting the alerts usually requires more than the single messages provided by the sensors, so techniques are needed that can analyse the alerts within the context in which they were generated. This may require the ability to correlate them with contextual information provided by other devices. Using synthetic data to design, implement and test these techniques is neither sound nor reliable, because of the variety and unpredictability of real-world data. On the other hand, retrieving this information from real-world networks is not easy (and sometimes impossible) due to privacy and confidentiality restrictions. Virtual environments, software-defined systems and software-defined networks will therefore play a critical role in many cyber-security aspects, such as the assessment of newly devised intrusion detection techniques, the generation of realistic logs, the evaluation of the skills of cyber-defence team members, and the evaluation of the disruptive effects caused by the diffusion of new malware.
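As a purely illustrative aside (not taken from the thesis), the context-aware alert correlation mentioned above could look like the following minimal sketch. The alert fields, the five-minute grouping window and the asset inventory used as contextual information are all assumptions made only for the example.

    # Illustrative sketch of context-aware IDS alert correlation (not from the thesis).
    # Alerts from the same source are grouped into episodes within a time window and
    # enriched with a hypothetical asset inventory acting as contextual information.
    from collections import defaultdict
    from dataclasses import dataclass

    @dataclass
    class Alert:
        ts: float          # timestamp (epoch seconds)
        src_ip: str        # source address reported by the sensor
        signature: str     # sensor rule or signature name

    # Hypothetical context provided by other devices (e.g., an asset inventory).
    ASSET_CONTEXT = {"10.0.0.5": {"role": "db-server", "criticality": "high"}}

    def correlate(alerts, window=300.0):
        """Group alerts by source IP into time-bounded episodes and attach context."""
        episodes = defaultdict(list)   # src_ip -> list of episodes (lists of alerts)
        for a in sorted(alerts, key=lambda a: a.ts):
            eps = episodes[a.src_ip]
            if eps and a.ts - eps[-1][-1].ts <= window:
                eps[-1].append(a)      # continue the current episode
            else:
                eps.append([a])        # start a new episode for this source
        incidents = []
        for ip, eps in episodes.items():
            for ep in eps:
                incidents.append({
                    "src_ip": ip,
                    "signatures": sorted({a.signature for a in ep}),
                    "count": len(ep),
                    "context": ASSET_CONTEXT.get(ip, {"role": "unknown", "criticality": "low"}),
                })
        return incidents

    if __name__ == "__main__":
        sample = [Alert(0.0, "10.0.0.5", "ssh-bruteforce"),
                  Alert(90.0, "10.0.0.5", "ssh-bruteforce"),
                  Alert(10_000.0, "10.0.0.5", "sql-injection")]
        for incident in correlate(sample):
            print(incident)

Grouping alone already reduces the volume of messages the analyst sees; the attached context is what allows an episode against a critical asset to be ranked above the rest.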
This thesis proposes, among other things, a novel domain-specific platform, named SmallWorld, aimed at easily designing, building and deploying realistic computer network scenarios obtained by immersing real systems in a software-defined virtual environment, enriched by Software Defined Agents in charge of reproducing user or bot behaviours. Additionally, to validate the proposed platform and evaluate its performance, a number of scenarios (including penetration-testing laboratories, IoT and domotics networks, and a reproduction of the most common Internet services, such as a DNS server, a mail server, a booking service and a payment gateway) have been developed inside SmallWorld. Over time the platform has been rewritten and radically improved, leading to the birth of Hacking Square. This new version is currently available online and freely accessible to anyone. The impact of this research prototype has been demonstrated, above all, in the course "Metodi e Strumenti per la Sicurezza Informatica" of the master's degree in Cyber Security at DIMES, University of Calabria. The platform has been employed to build the course laboratory as an in-cloud service for students (including all the material needed for exercises and assignments) and to organize a practical, Capture the Flag (CTF)-style final test. Finally, the platform has attracted the attention of the Consorzio Interuniversitario per l'Informatica (CINI), as it could be used to manage and deploy training content for the CyberChallenge 2018.

Item: Cyber defense of enterprise information systems: advanced issues and techniques (2014-11-28). Rullo, Antonino; Pugliese, Andrea; Saccà, Domenico; Greco, Sergio.

Item: The Generative Aspects of Count Constraints: Complexity, Languages and Algorithms (2011-11-23). Serra, Edoardo; Palopoli, Luigi; Saccà, Domenico.

Item: Ontology-Driven Modelling and Analyzing of Business Process (2014-03-10). Gualtieri, Andrea; Saccà, Domenico; Talia, Domenico.

Item: Data mining techniques for fraud detection (2014-03-07). Guarascio, Massimo; Saccà, Domenico; Manco, Giuseppe; Palopoli, Luigi.

Item: Modelling complex data mining applications in a formal framework (2008). Locane, Antonio; Saccà, Domenico; Manco, Giuseppe; Talia, Domenico.

Item: Entity resolution: effective schema and data reconciliation (2004). Folino, Francesco; Talia, Domenico; Saccà, Domenico; Manco, Giuseppe.

Item: Effective Histogram-based Techniques for Summarizing Multi-dimensional Data (2012-11-09). Mazzeo, Giuseppe Massimiliano; Saccà, Domenico; Talia, Domenico.

This thesis is an effort towards the definition of effective summarization techniques for multi-dimensional data. The state-of-the-art techniques are examined and the issues related to their inability to effectively summarize multi-dimensional data are pointed out. The attention is particularly focused on histogram-based summarization techniques, which are the most flexible, the most studied and the most widely adopted in commercial systems. In particular, hierarchical binary partitions are studied as a basis for effective multi-dimensional histograms, focusing on two aspects which turn out to be crucial for histogram accuracy: the representation model and the strategy adopted for partitioning data into buckets. As regards the former, a specific space-efficient representation model is proposed in which bucket boundaries are represented implicitly by storing the partition tree.
Histograms adopting this representation model (referred to as Hierarchical Binary Histograms, HBH) can store a larger number of buckets within a given amount of memory than histograms using a "flat" explicit storage of bucket boundaries. On top of that, the introduction of a constraint on the hierarchical partition scheme is studied, allowing each bucket to be partitioned only by splits lying on a regular grid defined on it: histograms adopting such a constrained partitioning paradigm are called Grid Hierarchical Binary Histograms (GHBH). The grid-constrained partitioning of GHBHs can be exploited to further enhance the physical representation efficiency of HBHs. As regards the construction of effective partitions, some new heuristics are introduced, which guide the data summarization by locating inhomogeneous regions of the data where a finer-grained partition is needed. The two physical representation schemes adopted by HBH and GHBH can be viewed as a form of lossless compression applied on top of the summarization accomplished by histograms (which is a form of lossy compression). The combination of these two forms of compression is shown to result in a relevant improvement of histogram effectiveness. On the one hand, the proposed compression-based representation models provide a mechanism for efficiently locating the buckets involved in query estimation, thus reducing the time needed to estimate queries with respect to traditional flat representation models. On the other hand, applying lossless compression on top of summarization reduces the loss of information due to summarization, as it enables a larger amount of summary data to be stored within a given storage space bound; this in turn yields lower error rates for query estimates. By means of experiments, a thorough analysis of different classes of histograms based on hierarchical partitions is provided: the accuracy obtained by combining different heuristics (both the new proposals and the "classical" heuristics of two well-known techniques, namely MHIST and Min-Skew) with either the traditional MBR-based representation model or the novel tree-based ones (both the unconstrained and the grid-constrained one) is studied. These results provide insight into the value of compression in the context of histograms based on hierarchical partitions. Interestingly, it is shown that the impact of the HBH and GHBH representation models on the accuracy of query estimates is not simply orthogonal to the adopted heuristic. Thus, the best combination of these features is identified, which turns out to be the grid-constrained hierarchical partitioning of GHBHs guided by one of the new heuristics. GHBH is compared with state-of-the-art techniques (MHIST, Min-Skew, GENHIST, as well as wavelet-based summarization approaches), showing that the new technique yields much lower error rates and a satisfactory degree of accuracy also in high-dimensionality scenarios. Another important contribution of this thesis is the proposal of a new approach for constructing effective histograms. The superiority of GHBH over the other histogram-based techniques has been found to depend primarily on its more accurate criterion for guiding the partitioning of the data domain.
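To make the idea of top-down hierarchical binary partitioning concrete, here is a minimal, self-contained sketch. It is not the HBH/GHBH construction algorithm of the thesis: the greedy split criterion (largest reduction of the sum of squared errors) and the toy data are assumptions chosen only to illustrate how a bounded budget of binary splits is spent on the most inhomogeneous buckets.

    # Illustrative sketch of greedy top-down hierarchical binary partitioning
    # (an assumption-based stand-in for the heuristics discussed in the thesis).
    import numpy as np

    def best_split(block):
        """Return (gain, axis, position) of the split that most reduces the SSE, or None."""
        total_sse = float(((block - block.mean()) ** 2).sum())
        best = None
        for axis in range(block.ndim):
            for pos in range(1, block.shape[axis]):
                left, right = np.split(block, [pos], axis=axis)
                sse = float(((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum())
                gain = total_sse - sse
                if best is None or gain > best[0]:
                    best = (gain, axis, pos)
        return best

    def greedy_binary_partition(data, max_buckets):
        """Split the most inhomogeneous bucket until the bucket budget is exhausted."""
        buckets = [((0,) * data.ndim, data)]          # (offset of the block, block values)
        while len(buckets) < max_buckets:
            candidates = [(s[0], i, s[1], s[2])
                          for i, (_, blk) in enumerate(buckets)
                          if (s := best_split(blk)) is not None]
            if not candidates:
                break
            _, i, axis, pos = max(candidates)          # bucket whose split gains the most
            offset, blk = buckets.pop(i)
            left, right = np.split(blk, [pos], axis=axis)
            right_offset = list(offset)
            right_offset[axis] += pos
            buckets.extend([(offset, left), (tuple(right_offset), right)])
        # The summary keeps only bucket bounds and aggregates (sum and size);
        # queries are then estimated assuming uniformity inside each bucket.
        return [(off, blk.shape, float(blk.sum()), int(blk.size)) for off, blk in buckets]

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        data = rng.poisson(1.0, size=(32, 32)).astype(float)
        data[4:8, 4:8] += 50                           # a dense region the splits should isolate
        for bucket in greedy_binary_partition(data, max_buckets=8):
            print(bucket)

The point of the sketch is only the shape of the computation: each iteration consumes one split from a bounded budget, so with skewed data the partitioning may run out of splits before every dense region is isolated, which is exactly the limitation discussed next.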
Traditional techniques for constructing histograms often yield partitions where dense and sparse regions are put together in the same bucket, resulting in poor accuracy when estimating queries on summarized data. Although GHBH adopts a criterion which in theory avoids this situation, all top-down partitioning techniques share an intrinsic limit: histograms obtained by iteratively splitting blocks, starting from the block coinciding with the whole data domain, may not actually be able to reach all the dense regions in order to isolate them. Each split increases the number of buckets and, since the number of buckets is bounded, the number of splits that can be performed is bounded as well. Therefore, in large domains where data are particularly skewed, the number of available splits may not be large enough to reach all the dense regions in a top-down split sequence. Thus, it can happen that GHBH starts partitioning the data domain in the right direction, towards isolating the dense regions, but at a certain point the number of available buckets, and thus of available splits, is exhausted. This problem can be avoided by adopting a bottom-up strategy, which first locates the dense regions of the data and then aggregates them into buckets according to some suitable strategy. The problem of searching for dense regions is very close to the data clustering problem, that is, the problem of grouping database objects into a set of meaningful classes. The enhancement of histogram construction is therefore pursued by exploiting the capability of clustering techniques to locate dense regions. A new technique, namely CHIST (Clustering-based Histograms), is proposed for constructing multi-dimensional histograms on the basis of a well-known density-based clustering algorithm, DBSCAN. The CHIST algorithm first invokes DBSCAN to partition the data into dense and sparse regions, and then further refines this partitioning by adopting a grid-based paradigm. CHIST is compared to GHBH and is shown to provide lower error rates, especially in "critical" settings, that is, when query selectivity is particularly low or the compression ratio is very high. It is worth remarking that in these settings the experiments comparing GHBH to the other techniques showed that GHBH still provides acceptable error rates, while those provided by the other techniques are completely unacceptable. CHIST is also extended to the case in which the data to be summarized are dynamic. In this case, re-executing the clustering algorithm at each data update could prove prohibitive, due to its high computational cost. Thus, on the basis of the Incremental DBSCAN algorithm, a strategy for efficiently propagating data updates to the histogram is proposed. Experiments show that, for small updates (i.e., bulks of updates 100 times smaller than the overall data size), the incremental approach can be computed 100 to 200 times faster than recomputing the histogram from scratch, while the accuracy remains almost unaffected.

Item: Knowledge Management and Extraction in XML Data (2012-11-09). Costa, Giovanni; Talia, Domenico; Saccà, Domenico; Manco, Giuseppe.