4.2 Mining Closed Frequent Itemsets
on Terrorism Research
For complexity reasons, it is not possible to extract
frequent itemsets whose extension has fewer than
three documents. However, we shall see that the
atom graph allows us to identify interesting closed
itemsets whose extension has only two documents.
Using the apriori algorithm in an R package, we found
1926 closed itemsets with a support of at least three
documents, of which 285 have more than three
elements (three items). The largest closed frequent
itemset without author names is: {new york city,
posttraumatic stress disorder, potential terrorist
attack, same traumatic event, world trade center}.
The largest overall has 12 items: {Parker_G,
Perl_TM, Russell_PK, biological terrorism,
biological warfare, consensus-based
recommendation, emergency management
institution, MEDLINE database, nation civilian
population, potential biological weapon, working
group, world health}. Despite differences in length,
these two itemsets both have the same support: their
extension has three documents.
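The notions of extension, support and closure used above can be illustrated with a minimal sketch. It is written in plain Python rather than with the R package used in this study, so it is not the implementation behind the reported figures. The function names (`extension`, `closed_frequent_itemsets`) and the toy documents are hypothetical and chosen only for illustration. The sketch enumerates frequent itemsets level by level, in the spirit of apriori, and keeps the closure of each one, that is, the set of all items shared by every document in its extension.

```python
from itertools import combinations


def extension(itemset, docs):
    """Indices of the documents that contain every item of `itemset`."""
    return frozenset(i for i, doc in enumerate(docs) if itemset <= doc)


def closed_frequent_itemsets(docs, min_support):
    """Map each closed frequent itemset to its support.

    Assumes min_support >= 1. Written for readability, not for speed.
    """
    docs = [frozenset(doc) for doc in docs]
    items = sorted(set().union(*docs))
    closed = {}

    # Level 1: single items whose extension is large enough.
    frequent = [
        frozenset([item])
        for item in items
        if len(extension(frozenset([item]), docs)) >= min_support
    ]

    while frequent:
        for itemset in frequent:
            ext = extension(itemset, docs)
            # Closure: the items common to all documents of the extension.
            closure = frozenset.intersection(*(docs[i] for i in ext))
            closed[closure] = len(ext)

        # Next level: join pairs of frequent itemsets that differ by one item.
        candidates = {
            a | b
            for a, b in combinations(frequent, 2)
            if len(a | b) == len(a) + 1
        }
        frequent = [
            c for c in candidates if len(extension(c, docs)) >= min_support
        ]

    return closed


# Toy corpus (invented): each document is the set of terms indexing it.
toy_docs = [
    {"anthrax", "bioterrorism", "vaccine"},
    {"anthrax", "bioterrorism", "vaccine"},
    {"anthrax", "bioterrorism", "vaccine", "ptsd"},
    {"anthrax", "ptsd"},
    {"bioterrorism", "ptsd"},
]

for itemset, support in closed_frequent_itemsets(toy_docs, 3).items():
    print(sorted(itemset), support)
```

In this toy corpus, "vaccine" alone is frequent but not closed, because every document containing it also contains "anthrax" and "bioterrorism"; only the three-item set is retained. This is the same effect that lets a 12-item closed itemset and a 5-item one share a support of three documents.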
5 CONCLUSIONS
We have presented a platform for mapping the
dynamics of research in specialty fields. The
distinctive features of this methodology reside in its
clustering algorithm, which is based primarily on
linguistic (symbolic) relations, and in its graph
decomposition algorithm, which renders complex
terminological graphs comprehensible for domain
analysts. The method has been able to identify the
most salient topics in two different research domains
and uncover the sub-structures formed by persistent
and evolving research threads. More importantly, we
have shown that it is possible, with limited linguistic
resources, to perform a surface analysis of texts and
use linguistic relations for clustering. To the best of
our knowledge, this represents a unique and
innovative approach to text clustering.
The graph decomposition algorithm offers a way
of visualizing complex terminological graphs and
revealing particular sub-structures contained therein.
Mining frequent itemsets, in combination with
evaluation by human experts, offers joint and strong
evidence of the significance of the maps produced
for the domain.
ACKNOWLEDGEMENTS
This work was supported in part by the French
National Research Agency CAAS project (ANR
2010 CORD 001 02).
REFERENCES
Agrawal R., Imielinski T., Swami A., Mining association
rules between sets of items in large databases. In ACM
SIGMOD Conf. Management of Data, May 1993.
Bar-Ilan J., Informetrics at the beginning of the 21st
century – A review, Journal of Informetrics, 2008, 2, 1-52.
Berry A., Krueger R., Simonet G., Ultimate Generalizations
of LexBFS and LEX M. WG 2005: 199-213.
Berry M. W. (ed.), Survey of Text Mining: Clustering,
Classification and Retrieval, Springer, 2004, 244p.
Callon M., Courtial J-P., Turner W., Bauin S., From
translation to network: The co-word analysis.
Scientometrics, 1983, 5(1).
Castellanos M., HotMiner: Discovering hot topics from
dirty texts, in Berry M. W. (dir.), Survey of Text
Mining Systems, Springer Verlag, NY, 2004, 123-157.
Chalmers M., Using a landscape metaphor to represent a
corpus of documents. In Spatial Information theory,
Frank A., Caspari I. (eds.), Springer Verlag LNCS
716, 1993, 377-390.
Chen C., CiteSpace II: Detecting and visualizing emerging
trends and transient patterns in scientific literature.
Journal of the American Society for Information
Science, 2006, 57(3), 359-377.
Chen C., Ibekwe-SanJuan F., SanJuan E., Weaver C.,
Visual Analysis of Conflicting Opinions, 1st
International IEEE Symposium on Visual Analytics
Science and Technology (VAST 2006), Baltimore -
Maryland, USA, 31 Oct.-2 Nov. 2006, 59-66.
Chen H., Wingyan C., Qin J., Reid E., Sageman M.,
Uncovering the dark web: A case study of jihad on the
web. Journal of the American Society for Information
Science, 2008, 59(8), 1347-1359.
Church K. W., Hanks P., Word association norms, mutual
information and lexicography, Computational
Linguistics, 16, n° 1, 1990, 22-29.
Cutting D., Pedersen J. O., Karger D., Tukey J. W.,
Scatter/Gather: A cluster based approach to browsing
large document collections. In Proceedings of the 15th
Annual ACM/SIGIR Conference, Copenhagen,
Denmark, 1992, 318-329.
Freeman L. C., A set of measures of centrality based on
betweenness, Sociometry, 1977, 40(1), 35–41.
Mane K. K., Börner K., Mapping topics and topic bursts,
Proceedings of the National Academy of Sciences,
USA (PNAS), 2004, 101 (suppl. 1), 5287-5290
Morris S. A., Martens B., Modeling and Mapping of
Research Specialties, Annual Review of Information
Science and Technology, 42, 2008, 52p.
MAPPING KNOWLEDGE DOMAINS - Combining Symbolic Relations with Graph Theory