SECURITY GROUP  
                 Computer Science Department 
                 Rensselaer Polytechnic Institute 
                   
        Mark Goldberg, Mukkai Krishnamoorthy, Malik Magdon-Ismail 
                Boleslaw Szymanski, Bulent Yener, Wei Zhao 
 
Researchers in the security group focus on security problems at the 
systems level including discovery hidden networks in social networks; 
network camouflaging; privacy protection in data mining systems; 
metrics  and architectures for anonymous communications; detecting  
hidden groups in large virtual networks. 
 
AUTHOR IDENTIFICATION.  
(B. Szymanski, Claire and Roland Schmitt Distinguished Professor) 
 
In this work, the focus is on capturing patterns in electronic documents. 
The approach called Recursive Data Mining, involves discovering patterns  
at varying degrees of abstraction, in a hierarchical fashion.  
The discovered patterns capture the stylistic characteristics of the author  
and are used as features to build efficient classifiers. 
Experiments on the Enron dataset, which categorize members 
into organizational roles, were conducted to substantiate the  
methodology. The results show that a Naive Bayes classifier  
that use the dominant patterns discovered by Recursive Data  
Mining perform well in role detection. 
 
MACHINE LEARNING TECHNIQUES FOR IMBALANCE CLASSIFICATION 
(B. Szymanski, Claire and Roland Schmitt Distinguished Professor) 
 
This research proposes several methods designed to improve solutions 
for security classification problems. The security classification 
problem involves unbalanced, high-dimensional, binary classification 
problems that are prevalent today. The imbalance within this data 
involves a significant majority of the negative class and a minority 
positive class. Any system that needs protection from malicious 
activity, intruders, theft, or other types of breaches in security 
addresses this problem.  
Exploration of intelligent subspace modeling and the fusion of  
subspace models is investigated.  
The major innovations of this work include synergistic classifier  
fusion through the analysis of ROC curves and rankings, insight into  
the statistical behavior of the Gaussian kernel, and novel methods  
for applying machine learning techniques to defend against computer  
intrusion detection. The primary 
empirical vehicle for this research is computer intrusion detection 
data, and both host-based intrusion detection systems (HIDS) and 
network-based intrusion detection systems (NIDS) are addressed 
 
USE OF REPUTATION FOR COMPUTER SECURITY 
(B. Szymanski, Claire and Roland Schmitt Distinguished Professor) 
 
Studies of worm outbreaks have found that the speed of worm 
propagation makes manual intervention ineffective. Consequently, many 
automated containment mechanisms have been proposed to contain worm 
outbreaks before they grow out of control. These containment systems, 
however, only provide protection for hosts within networks that 
implement them. Such a containment strategy requires complete 
participation to protect all vulnerable hosts. Moreover, collaborative 
containment systems, where participants share alert data, face a 
tension between resilience to false alerts and quick reaction to worm 
outbreaks. 
Our work suggests an alternative approach where an autonomous system 
in an internetwork, such as the Internet, protects not only its local 
hosts, but also all hosts that route traffic through it, which we call 
internetwork-centric containment. Additionally, we develop a novel 
reputation-based alerting mechanism to provide fast dissemination of 
infection information while maintaining the fairness of the system. 
Through simulation studies, we show that the combination of 
internetwork-centric containment and reputation-based alerting is able 
to contain an extremely virulent worm with relatively little 
participation in the containment system. In comparison to other 
collaborative containment systems, ours provides superior protection 
against worm outbreaks and resilience to false alerts. 
 
KEY GENERATION AND AUTHENTICATION ALGORITHMS FOR WIRELESS AD-HOC NETWORKS 
(Professor Bulent Yener) 
 
In this project, we design and analyze deterministic and  
hybrid techniques to distribute shared keys to sensor nodes  
before the deployment. Our techniques provide better probability  
of key share between sensor nodes, improve resilience and have  
less processing and communication overhead.  
We use techniques from combinatorial design theory and graph  
theory to pre-distribute a list of keys, called key-chains, to  
each sensor node so that after the deployment either two  
neighboring nodes have a key in common in their key-chains or  
there is a path, called key-path, connecting these two nodes  
where each pair of neighboring nodes on this path have a key  
in common. The objective is to minimize the key-chain size while 
(i) maximizing pairwise key sharing probability and resilience, and 
(ii) minimizing average key-path length. 
 
 
METRICS  AND ARCHITECTURES FOR ANONYMOUS COMMUNICATIONS 
(Professor Bulent Yener) 
 
In this project, we investigate how to couple the physical layer 
characteristics of wireless networks with key generation algorithms. 
We present a novel approach  based on the wireless communication  
phenomenon known as the reciprocity principle which states that in  
the absence of interference both transmitter and receiver experience  
the same signal envelope. 
The main observation here is that the signal envelope information can 
provide to the two transceivers two correlated random sources that 
provide sufficient amounts of entropy which  can be used to extract  
a cryptographic key. 
 
In contrast, it is virtually impossible for a third party, which is 
not located at one of the transceiver's position, to obtain or 
predict the exact envelope; thus retrieve the key. 
 
Since in the presence of interference  strict reciprocity property 
can not be maintained;  our methodology  is based on  detecting 
"deep fades" to extract correlated bit-strings. In particular, 
we show how a pair of transceivers can reconcile such bit-strings and 
finally flatten their distribution to reach  key agreement. 
 
 
MODELING OF THE INTERACTIONS AS AN EVOLUTION GRAPH  
(Professor Krishnamoorthy) 
 
In Modeling of the evolution graphs, we study how threats evolve and how 
acts follow threats. By studying these, one can study old patterns to 
identify new patterns of attacks.  
 
VISUALIZATION OF EVOLUTION GRAPHS`a 
(Professor Krishnamoorthy) 
 
Visualization of evolution graphs will help to identify patterns.  
 
WHAT- IF- CASES OF EVOLUTION GRAPHS.  
(Professor Krishnamoorthy) 
 
What-if graphs evolution graphs help 
in trying out different scenarios and studying their actions. 
 
ALGORITHMS FOR ANALYZING THE INFORMATION FLOW IN VIRTUAL SOCIAL NETWORKS 
(Professors Mark Goldberg and Malik Magdon-Ismail) 
 
The approach to problem of analyzing and predicting the information flow 
in virtual social networks, such as the Blogosphere, relies on the fact  
that information diffuses through social coalitions; hence we develop  
tools to discover and analyze the coalition structure in virtual social  
networks.  
To accumulate a database useful for testing our programs and results,  
we use our screen-scraper program that extracts information from the  
Blogosphere. 
Our approach to the discovery of a coalition structure does not require  
linguistic processing of the posts, but is based on analysis of 
the communication graph associated with the network. We identify social  
groups by discovering clusters of actors that correspond to sets of nodes  
in the communication graph that are locally optimal with respect to the 
density of their communication. 
 
MODEL OF THE BLOGOSPHERE. 
(Professors Mark Goldberg and Malik Magdon-Ismail) 
 
A probabilistic model of the internal processes in the Blogosphere is needed 
for the development of a network generator to be used for testing 
hypotheses regarding the future activity of the Blogosphere.    
Using multiple simulations, one learns the expected level of the  
``life'' of the Blogosphere, and then one is able to recognize  
unexpected deviations in the real-life network.  
 
The structure of large social networks, e.g. the Blogosphere, has been  
the focus of intense research during the last decade.  
One of the main foci of this research has been the development of 
dynamic models of network creation which incorporates two fundamental 
elements: network growth, with nodes arriving one at a time; and 
some form of preferential attachment in which an arriving 
node is more likely to attach itself to a high degree existing node 
than a low degree one (the rich get richer). 
However, this research did not adequately  
address an important practical question: 
once a network has been created, how does it evolve? 
 
To answer this question, we developed the first such model; it 
implements the idea of locality of communications under which 
non-uniform asymmetric messages emanating from a node of the network  
are mainly determined by a relatively small "area" associated with the node.  
We test our model on a network of the blogs hosted by LiveJournal,  
specifically, on its Russian section with approximately 300,000 bloggers.  
Our program identifies Russian blogs by the presence of Cyrillic  
characters in the posts. 
Technically this also captures the posts in other languages 
with Cyrillic alphabet, but we found that the vast majority of the posts 
are actually Russian. We will expand the testing of our model by collecting the data from other sections of LiveJournal and from other blog-providers. 
We continue our work on the evolutionary models by expanding the set of  
necessary parameters that must be included in the formulation of the 
model for a  more complete description of the Blogosphere, or an arbitrary  
virtual social network. 
Among those are: clustering coefficients; the distribution of the clusters by 
their densities and sizes; and the evolution of the edge-history, representing  
the statistics of repeated calls in the network.  
Incorporating new parameters will further improve our model of evolution. 
The objectives of this part of the project include  
(a) the development of models of virtual social networks;   
(b) testing of diverse regions  of the Blogosphere; 
(c) a database of statistics describing changes in  
different regions of the Blogosphere.   
 
The  model and the statistics can be used for timely identification of 
deviations in the functioning of the Blogosphere. 
 
DETECTING INFECTIOUS IDEAS IN THE BLOGOSPHERE} 
(Professors Mark Goldberg and Malik Magdon-Ismail) 
 
One recognizes an idea as infectious if it is being discussed by 
many bloggers, and their number is increasing at a rapid pace. 
The objective of this project is to provide a good estimation of  
the the likelihood of the idea being infectious at the earliest  
possible stage of the process. Our approach to the problem involves  
(a) the development of a program  which monitors the Blogosphere  
and determines groups of posts discussing the same topic; and  
(b) the development of an criterion (threshold) to identify topics  
growing at the alarming rate in the Blogosphere, and automatically  
reporting the information to the analyst. 
 
EFFICIENT DISSEMINATION OF INFORMATION 
(Professors Mark Goldberg and Malik Magdon-Ismail) 
 
Information is spreading in the Blogosphere in a non-uniform way:  
the clusters are the parts of the Blogospace where the communications  
are more intense or more isolated than the average. Our programs for  
clustering discover and update the clusters in an efficient way; we  
use clusters for a number of purposes.  
Clusters can also be used for the efficient dissemination of information 
as the following idea suggests: define a graph G whose nodes represent  
the clusters, and the intersections represent edges; the size of the 
intersection can be used as the weight of the edge.  
If the number K of places where the information will be planted is fixed, 
the problem is to select K clusters so that the maximum distance (i.e.  
weighted length) in G from any cluster to some of the selected ones is  
as small as possible. 
 
FINDING HIDDEN GROUPS IN A MESSAGE STREAM 
(Professors Mark Goldberg and Malik Magdon-Ismail) 
 
Our software system SIGHTS contains programs for discovering hidden groups  
of users in a social network.  We developed and implemented in two  
communication models: the cycle model, in which a communication time cycle  
of the group is used to aggregate communications to generate a sequence  
of communication graphs 
Our algorithms for the stream model, in which the communications are  
streaming and a cycle assumption is not assumed simultaneously identify  
the hierarchical organization structure within the hidden groups, in  
addition to evolution based on moving windows. 
 
Another approach to the problem of discovering hidden groups, 
applicable to chatrooms, is based on discovering time-correlations 
of the activities (times of postings) of the members of a group. 
 
                        SECURITY GROUP  
                 Computer Science Department 
                 Rensselaer Polytechnic Institute 
                   
        Mark Goldberg, Mukkai Krishnamoorthy, Malik Magdon-Ismail 
                Boleslaw Szymanski, Bulent Yener, Wei Zhao 
 
Researchers in the security group focus on security problems at the 
systems level including discovery hidden networks in social networks; 
network camouflaging; and privacy protection in data ming systems. 
Specific project include: 
(1) author identification; machine learning techniques for  
imbalance classification; and use of reputation for computer security, 
headed by Bolek Szymanski, Claire and Roland Schmitt Distinguished Professor; 
 
(2) key generation and authentication algorithms for wireless ad-hoc networks; 
and metrics  and architectures for anonymous communications  
headed by Professor Bulent Yener; 
 
(3) modeling of the interactions as an evolution graph; 
headed by Professor Krishnamoorthy; 
 
(4) algorithms for analyzing the information flow in virtual social networks; 
model of the blogosphere; detecting infectious ideas in the blogosphere; 
efficient dissemination of information; finding hidden groups in a message  
stream, heqaded by Professors Mark Goldberg and Malik Magdon-Ismail.