SECURITY GROUP Computer Science Department Rensselaer Polytechnic Institute Mark Goldberg, Mukkai Krishnamoorthy, Malik Magdon-Ismail Boleslaw Szymanski, Bulent Yener, Wei Zhao Researchers in the security group focus on security problems at the systems level including discovery hidden networks in social networks; network camouflaging; privacy protection in data mining systems; metrics and architectures for anonymous communications; detecting hidden groups in large virtual networks. AUTHOR IDENTIFICATION. (B. Szymanski, Claire and Roland Schmitt Distinguished Professor) In this work, the focus is on capturing patterns in electronic documents. The approach called Recursive Data Mining, involves discovering patterns at varying degrees of abstraction, in a hierarchical fashion. The discovered patterns capture the stylistic characteristics of the author and are used as features to build efficient classifiers. Experiments on the Enron dataset, which categorize members into organizational roles, were conducted to substantiate the methodology. The results show that a Naive Bayes classifier that use the dominant patterns discovered by Recursive Data Mining perform well in role detection. MACHINE LEARNING TECHNIQUES FOR IMBALANCE CLASSIFICATION (B. Szymanski, Claire and Roland Schmitt Distinguished Professor) This research proposes several methods designed to improve solutions for security classification problems. The security classification problem involves unbalanced, high-dimensional, binary classification problems that are prevalent today. The imbalance within this data involves a significant majority of the negative class and a minority positive class. Any system that needs protection from malicious activity, intruders, theft, or other types of breaches in security addresses this problem. Exploration of intelligent subspace modeling and the fusion of subspace models is investigated. The major innovations of this work include synergistic classifier fusion through the analysis of ROC curves and rankings, insight into the statistical behavior of the Gaussian kernel, and novel methods for applying machine learning techniques to defend against computer intrusion detection. The primary empirical vehicle for this research is computer intrusion detection data, and both host-based intrusion detection systems (HIDS) and network-based intrusion detection systems (NIDS) are addressed USE OF REPUTATION FOR COMPUTER SECURITY (B. Szymanski, Claire and Roland Schmitt Distinguished Professor) Studies of worm outbreaks have found that the speed of worm propagation makes manual intervention ineffective. Consequently, many automated containment mechanisms have been proposed to contain worm outbreaks before they grow out of control. These containment systems, however, only provide protection for hosts within networks that implement them. Such a containment strategy requires complete participation to protect all vulnerable hosts. Moreover, collaborative containment systems, where participants share alert data, face a tension between resilience to false alerts and quick reaction to worm outbreaks. Our work suggests an alternative approach where an autonomous system in an internetwork, such as the Internet, protects not only its local hosts, but also all hosts that route traffic through it, which we call internetwork-centric containment. Additionally, we develop a novel reputation-based alerting mechanism to provide fast dissemination of infection information while maintaining the fairness of the system. Through simulation studies, we show that the combination of internetwork-centric containment and reputation-based alerting is able to contain an extremely virulent worm with relatively little participation in the containment system. In comparison to other collaborative containment systems, ours provides superior protection against worm outbreaks and resilience to false alerts. KEY GENERATION AND AUTHENTICATION ALGORITHMS FOR WIRELESS AD-HOC NETWORKS (Professor Bulent Yener) In this project, we design and analyze deterministic and hybrid techniques to distribute shared keys to sensor nodes before the deployment. Our techniques provide better probability of key share between sensor nodes, improve resilience and have less processing and communication overhead. We use techniques from combinatorial design theory and graph theory to pre-distribute a list of keys, called key-chains, to each sensor node so that after the deployment either two neighboring nodes have a key in common in their key-chains or there is a path, called key-path, connecting these two nodes where each pair of neighboring nodes on this path have a key in common. The objective is to minimize the key-chain size while (i) maximizing pairwise key sharing probability and resilience, and (ii) minimizing average key-path length. METRICS AND ARCHITECTURES FOR ANONYMOUS COMMUNICATIONS (Professor Bulent Yener) In this project, we investigate how to couple the physical layer characteristics of wireless networks with key generation algorithms. We present a novel approach based on the wireless communication phenomenon known as the reciprocity principle which states that in the absence of interference both transmitter and receiver experience the same signal envelope. The main observation here is that the signal envelope information can provide to the two transceivers two correlated random sources that provide sufficient amounts of entropy which can be used to extract a cryptographic key. In contrast, it is virtually impossible for a third party, which is not located at one of the transceiver's position, to obtain or predict the exact envelope; thus retrieve the key. Since in the presence of interference strict reciprocity property can not be maintained; our methodology is based on detecting "deep fades" to extract correlated bit-strings. In particular, we show how a pair of transceivers can reconcile such bit-strings and finally flatten their distribution to reach key agreement. MODELING OF THE INTERACTIONS AS AN EVOLUTION GRAPH (Professor Krishnamoorthy) In Modeling of the evolution graphs, we study how threats evolve and how acts follow threats. By studying these, one can study old patterns to identify new patterns of attacks. VISUALIZATION OF EVOLUTION GRAPHS`a (Professor Krishnamoorthy) Visualization of evolution graphs will help to identify patterns. WHAT- IF- CASES OF EVOLUTION GRAPHS. (Professor Krishnamoorthy) What-if graphs evolution graphs help in trying out different scenarios and studying their actions. ALGORITHMS FOR ANALYZING THE INFORMATION FLOW IN VIRTUAL SOCIAL NETWORKS (Professors Mark Goldberg and Malik Magdon-Ismail) The approach to problem of analyzing and predicting the information flow in virtual social networks, such as the Blogosphere, relies on the fact that information diffuses through social coalitions; hence we develop tools to discover and analyze the coalition structure in virtual social networks. To accumulate a database useful for testing our programs and results, we use our screen-scraper program that extracts information from the Blogosphere. Our approach to the discovery of a coalition structure does not require linguistic processing of the posts, but is based on analysis of the communication graph associated with the network. We identify social groups by discovering clusters of actors that correspond to sets of nodes in the communication graph that are locally optimal with respect to the density of their communication. MODEL OF THE BLOGOSPHERE. (Professors Mark Goldberg and Malik Magdon-Ismail) A probabilistic model of the internal processes in the Blogosphere is needed for the development of a network generator to be used for testing hypotheses regarding the future activity of the Blogosphere. Using multiple simulations, one learns the expected level of the ``life'' of the Blogosphere, and then one is able to recognize unexpected deviations in the real-life network. The structure of large social networks, e.g. the Blogosphere, has been the focus of intense research during the last decade. One of the main foci of this research has been the development of dynamic models of network creation which incorporates two fundamental elements: network growth, with nodes arriving one at a time; and some form of preferential attachment in which an arriving node is more likely to attach itself to a high degree existing node than a low degree one (the rich get richer). However, this research did not adequately address an important practical question: once a network has been created, how does it evolve? To answer this question, we developed the first such model; it implements the idea of locality of communications under which non-uniform asymmetric messages emanating from a node of the network are mainly determined by a relatively small "area" associated with the node. We test our model on a network of the blogs hosted by LiveJournal, specifically, on its Russian section with approximately 300,000 bloggers. Our program identifies Russian blogs by the presence of Cyrillic characters in the posts. Technically this also captures the posts in other languages with Cyrillic alphabet, but we found that the vast majority of the posts are actually Russian. We will expand the testing of our model by collecting the data from other sections of LiveJournal and from other blog-providers. We continue our work on the evolutionary models by expanding the set of necessary parameters that must be included in the formulation of the model for a more complete description of the Blogosphere, or an arbitrary virtual social network. Among those are: clustering coefficients; the distribution of the clusters by their densities and sizes; and the evolution of the edge-history, representing the statistics of repeated calls in the network. Incorporating new parameters will further improve our model of evolution. The objectives of this part of the project include (a) the development of models of virtual social networks; (b) testing of diverse regions of the Blogosphere; (c) a database of statistics describing changes in different regions of the Blogosphere. The model and the statistics can be used for timely identification of deviations in the functioning of the Blogosphere. DETECTING INFECTIOUS IDEAS IN THE BLOGOSPHERE} (Professors Mark Goldberg and Malik Magdon-Ismail) One recognizes an idea as infectious if it is being discussed by many bloggers, and their number is increasing at a rapid pace. The objective of this project is to provide a good estimation of the the likelihood of the idea being infectious at the earliest possible stage of the process. Our approach to the problem involves (a) the development of a program which monitors the Blogosphere and determines groups of posts discussing the same topic; and (b) the development of an criterion (threshold) to identify topics growing at the alarming rate in the Blogosphere, and automatically reporting the information to the analyst. EFFICIENT DISSEMINATION OF INFORMATION (Professors Mark Goldberg and Malik Magdon-Ismail) Information is spreading in the Blogosphere in a non-uniform way: the clusters are the parts of the Blogospace where the communications are more intense or more isolated than the average. Our programs for clustering discover and update the clusters in an efficient way; we use clusters for a number of purposes. Clusters can also be used for the efficient dissemination of information as the following idea suggests: define a graph G whose nodes represent the clusters, and the intersections represent edges; the size of the intersection can be used as the weight of the edge. If the number K of places where the information will be planted is fixed, the problem is to select K clusters so that the maximum distance (i.e. weighted length) in G from any cluster to some of the selected ones is as small as possible. FINDING HIDDEN GROUPS IN A MESSAGE STREAM (Professors Mark Goldberg and Malik Magdon-Ismail) Our software system SIGHTS contains programs for discovering hidden groups of users in a social network. We developed and implemented in two communication models: the cycle model, in which a communication time cycle of the group is used to aggregate communications to generate a sequence of communication graphs Our algorithms for the stream model, in which the communications are streaming and a cycle assumption is not assumed simultaneously identify the hierarchical organization structure within the hidden groups, in addition to evolution based on moving windows. Another approach to the problem of discovering hidden groups, applicable to chatrooms, is based on discovering time-correlations of the activities (times of postings) of the members of a group. SECURITY GROUP Computer Science Department Rensselaer Polytechnic Institute Mark Goldberg, Mukkai Krishnamoorthy, Malik Magdon-Ismail Boleslaw Szymanski, Bulent Yener, Wei Zhao Researchers in the security group focus on security problems at the systems level including discovery hidden networks in social networks; network camouflaging; and privacy protection in data ming systems. Specific project include: (1) author identification; machine learning techniques for imbalance classification; and use of reputation for computer security, headed by Bolek Szymanski, Claire and Roland Schmitt Distinguished Professor; (2) key generation and authentication algorithms for wireless ad-hoc networks; and metrics and architectures for anonymous communications headed by Professor Bulent Yener; (3) modeling of the interactions as an evolution graph; headed by Professor Krishnamoorthy; (4) algorithms for analyzing the information flow in virtual social networks; model of the blogosphere; detecting infectious ideas in the blogosphere; efficient dissemination of information; finding hidden groups in a message stream, heqaded by Professors Mark Goldberg and Malik Magdon-Ismail.