SECURITY GROUP
Computer Science Department
Rensselaer Polytechnic Institute
Mark Goldberg, Mukkai Krishnamoorthy, Malik Magdon-Ismail
Boleslaw Szymanski, Bulent Yener, Wei Zhao
Researchers in the security group focus on security problems at the
systems level including discovery hidden networks in social networks;
network camouflaging; privacy protection in data mining systems;
metrics and architectures for anonymous communications; detecting
hidden groups in large virtual networks.
AUTHOR IDENTIFICATION.
(B. Szymanski, Claire and Roland Schmitt Distinguished Professor)
In this work, the focus is on capturing patterns in electronic documents.
The approach called Recursive Data Mining, involves discovering patterns
at varying degrees of abstraction, in a hierarchical fashion.
The discovered patterns capture the stylistic characteristics of the author
and are used as features to build efficient classifiers.
Experiments on the Enron dataset, which categorize members
into organizational roles, were conducted to substantiate the
methodology. The results show that a Naive Bayes classifier
that use the dominant patterns discovered by Recursive Data
Mining perform well in role detection.
MACHINE LEARNING TECHNIQUES FOR IMBALANCE CLASSIFICATION
(B. Szymanski, Claire and Roland Schmitt Distinguished Professor)
This research proposes several methods designed to improve solutions
for security classification problems. The security classification
problem involves unbalanced, high-dimensional, binary classification
problems that are prevalent today. The imbalance within this data
involves a significant majority of the negative class and a minority
positive class. Any system that needs protection from malicious
activity, intruders, theft, or other types of breaches in security
addresses this problem.
Exploration of intelligent subspace modeling and the fusion of
subspace models is investigated.
The major innovations of this work include synergistic classifier
fusion through the analysis of ROC curves and rankings, insight into
the statistical behavior of the Gaussian kernel, and novel methods
for applying machine learning techniques to defend against computer
intrusion detection. The primary
empirical vehicle for this research is computer intrusion detection
data, and both host-based intrusion detection systems (HIDS) and
network-based intrusion detection systems (NIDS) are addressed
USE OF REPUTATION FOR COMPUTER SECURITY
(B. Szymanski, Claire and Roland Schmitt Distinguished Professor)
Studies of worm outbreaks have found that the speed of worm
propagation makes manual intervention ineffective. Consequently, many
automated containment mechanisms have been proposed to contain worm
outbreaks before they grow out of control. These containment systems,
however, only provide protection for hosts within networks that
implement them. Such a containment strategy requires complete
participation to protect all vulnerable hosts. Moreover, collaborative
containment systems, where participants share alert data, face a
tension between resilience to false alerts and quick reaction to worm
outbreaks.
Our work suggests an alternative approach where an autonomous system
in an internetwork, such as the Internet, protects not only its local
hosts, but also all hosts that route traffic through it, which we call
internetwork-centric containment. Additionally, we develop a novel
reputation-based alerting mechanism to provide fast dissemination of
infection information while maintaining the fairness of the system.
Through simulation studies, we show that the combination of
internetwork-centric containment and reputation-based alerting is able
to contain an extremely virulent worm with relatively little
participation in the containment system. In comparison to other
collaborative containment systems, ours provides superior protection
against worm outbreaks and resilience to false alerts.
KEY GENERATION AND AUTHENTICATION ALGORITHMS FOR WIRELESS AD-HOC NETWORKS
(Professor Bulent Yener)
In this project, we design and analyze deterministic and
hybrid techniques to distribute shared keys to sensor nodes
before the deployment. Our techniques provide better probability
of key share between sensor nodes, improve resilience and have
less processing and communication overhead.
We use techniques from combinatorial design theory and graph
theory to pre-distribute a list of keys, called key-chains, to
each sensor node so that after the deployment either two
neighboring nodes have a key in common in their key-chains or
there is a path, called key-path, connecting these two nodes
where each pair of neighboring nodes on this path have a key
in common. The objective is to minimize the key-chain size while
(i) maximizing pairwise key sharing probability and resilience, and
(ii) minimizing average key-path length.
METRICS AND ARCHITECTURES FOR ANONYMOUS COMMUNICATIONS
(Professor Bulent Yener)
In this project, we investigate how to couple the physical layer
characteristics of wireless networks with key generation algorithms.
We present a novel approach based on the wireless communication
phenomenon known as the reciprocity principle which states that in
the absence of interference both transmitter and receiver experience
the same signal envelope.
The main observation here is that the signal envelope information can
provide to the two transceivers two correlated random sources that
provide sufficient amounts of entropy which can be used to extract
a cryptographic key.
In contrast, it is virtually impossible for a third party, which is
not located at one of the transceiver's position, to obtain or
predict the exact envelope; thus retrieve the key.
Since in the presence of interference strict reciprocity property
can not be maintained; our methodology is based on detecting
"deep fades" to extract correlated bit-strings. In particular,
we show how a pair of transceivers can reconcile such bit-strings and
finally flatten their distribution to reach key agreement.
MODELING OF THE INTERACTIONS AS AN EVOLUTION GRAPH
(Professor Krishnamoorthy)
In Modeling of the evolution graphs, we study how threats evolve and how
acts follow threats. By studying these, one can study old patterns to
identify new patterns of attacks.
VISUALIZATION OF EVOLUTION GRAPHS`a
(Professor Krishnamoorthy)
Visualization of evolution graphs will help to identify patterns.
WHAT- IF- CASES OF EVOLUTION GRAPHS.
(Professor Krishnamoorthy)
What-if graphs evolution graphs help
in trying out different scenarios and studying their actions.
ALGORITHMS FOR ANALYZING THE INFORMATION FLOW IN VIRTUAL SOCIAL NETWORKS
(Professors Mark Goldberg and Malik Magdon-Ismail)
The approach to problem of analyzing and predicting the information flow
in virtual social networks, such as the Blogosphere, relies on the fact
that information diffuses through social coalitions; hence we develop
tools to discover and analyze the coalition structure in virtual social
networks.
To accumulate a database useful for testing our programs and results,
we use our screen-scraper program that extracts information from the
Blogosphere.
Our approach to the discovery of a coalition structure does not require
linguistic processing of the posts, but is based on analysis of
the communication graph associated with the network. We identify social
groups by discovering clusters of actors that correspond to sets of nodes
in the communication graph that are locally optimal with respect to the
density of their communication.
MODEL OF THE BLOGOSPHERE.
(Professors Mark Goldberg and Malik Magdon-Ismail)
A probabilistic model of the internal processes in the Blogosphere is needed
for the development of a network generator to be used for testing
hypotheses regarding the future activity of the Blogosphere.
Using multiple simulations, one learns the expected level of the
``life'' of the Blogosphere, and then one is able to recognize
unexpected deviations in the real-life network.
The structure of large social networks, e.g. the Blogosphere, has been
the focus of intense research during the last decade.
One of the main foci of this research has been the development of
dynamic models of network creation which incorporates two fundamental
elements: network growth, with nodes arriving one at a time; and
some form of preferential attachment in which an arriving
node is more likely to attach itself to a high degree existing node
than a low degree one (the rich get richer).
However, this research did not adequately
address an important practical question:
once a network has been created, how does it evolve?
To answer this question, we developed the first such model; it
implements the idea of locality of communications under which
non-uniform asymmetric messages emanating from a node of the network
are mainly determined by a relatively small "area" associated with the node.
We test our model on a network of the blogs hosted by LiveJournal,
specifically, on its Russian section with approximately 300,000 bloggers.
Our program identifies Russian blogs by the presence of Cyrillic
characters in the posts.
Technically this also captures the posts in other languages
with Cyrillic alphabet, but we found that the vast majority of the posts
are actually Russian. We will expand the testing of our model by collecting the data from other sections of LiveJournal and from other blog-providers.
We continue our work on the evolutionary models by expanding the set of
necessary parameters that must be included in the formulation of the
model for a more complete description of the Blogosphere, or an arbitrary
virtual social network.
Among those are: clustering coefficients; the distribution of the clusters by
their densities and sizes; and the evolution of the edge-history, representing
the statistics of repeated calls in the network.
Incorporating new parameters will further improve our model of evolution.
The objectives of this part of the project include
(a) the development of models of virtual social networks;
(b) testing of diverse regions of the Blogosphere;
(c) a database of statistics describing changes in
different regions of the Blogosphere.
The model and the statistics can be used for timely identification of
deviations in the functioning of the Blogosphere.
DETECTING INFECTIOUS IDEAS IN THE BLOGOSPHERE}
(Professors Mark Goldberg and Malik Magdon-Ismail)
One recognizes an idea as infectious if it is being discussed by
many bloggers, and their number is increasing at a rapid pace.
The objective of this project is to provide a good estimation of
the the likelihood of the idea being infectious at the earliest
possible stage of the process. Our approach to the problem involves
(a) the development of a program which monitors the Blogosphere
and determines groups of posts discussing the same topic; and
(b) the development of an criterion (threshold) to identify topics
growing at the alarming rate in the Blogosphere, and automatically
reporting the information to the analyst.
EFFICIENT DISSEMINATION OF INFORMATION
(Professors Mark Goldberg and Malik Magdon-Ismail)
Information is spreading in the Blogosphere in a non-uniform way:
the clusters are the parts of the Blogospace where the communications
are more intense or more isolated than the average. Our programs for
clustering discover and update the clusters in an efficient way; we
use clusters for a number of purposes.
Clusters can also be used for the efficient dissemination of information
as the following idea suggests: define a graph G whose nodes represent
the clusters, and the intersections represent edges; the size of the
intersection can be used as the weight of the edge.
If the number K of places where the information will be planted is fixed,
the problem is to select K clusters so that the maximum distance (i.e.
weighted length) in G from any cluster to some of the selected ones is
as small as possible.
FINDING HIDDEN GROUPS IN A MESSAGE STREAM
(Professors Mark Goldberg and Malik Magdon-Ismail)
Our software system SIGHTS contains programs for discovering hidden groups
of users in a social network. We developed and implemented in two
communication models: the cycle model, in which a communication time cycle
of the group is used to aggregate communications to generate a sequence
of communication graphs
Our algorithms for the stream model, in which the communications are
streaming and a cycle assumption is not assumed simultaneously identify
the hierarchical organization structure within the hidden groups, in
addition to evolution based on moving windows.
Another approach to the problem of discovering hidden groups,
applicable to chatrooms, is based on discovering time-correlations
of the activities (times of postings) of the members of a group.
SECURITY GROUP
Computer Science Department
Rensselaer Polytechnic Institute
Mark Goldberg, Mukkai Krishnamoorthy, Malik Magdon-Ismail
Boleslaw Szymanski, Bulent Yener, Wei Zhao
Researchers in the security group focus on security problems at the
systems level including discovery hidden networks in social networks;
network camouflaging; and privacy protection in data ming systems.
Specific project include:
(1) author identification; machine learning techniques for
imbalance classification; and use of reputation for computer security,
headed by Bolek Szymanski, Claire and Roland Schmitt Distinguished Professor;
(2) key generation and authentication algorithms for wireless ad-hoc networks;
and metrics and architectures for anonymous communications
headed by Professor Bulent Yener;
(3) modeling of the interactions as an evolution graph;
headed by Professor Krishnamoorthy;
(4) algorithms for analyzing the information flow in virtual social networks;
model of the blogosphere; detecting infectious ideas in the blogosphere;
efficient dissemination of information; finding hidden groups in a message
stream, heqaded by Professors Mark Goldberg and Malik Magdon-Ismail.