1.27k likes | 1.68k Vues
Intelligence and Security Informatics for International Security: Framework and Case Studies Hsinchun Chen, Ph.D. McClelland Professor of MIS Director, Artificial Intelligence Lab NSF COPLINK Center Management Information Systems Department
E N D
Intelligence and Security Informatics for International Security: Framework and Case Studies Hsinchun Chen, Ph.D. McClelland Professor of MIS Director, Artificial Intelligence Lab NSF COPLINK Center Management Information Systems Department Eller College of Management, University of Arizona
Intelligence and Security Informatics (ISI) development of advanced information technologies, systems, algorithms, and databases for national security related applications, through an integrated technological, organizational, and policy-based approach” (Chen et al., 2003a) Building a New Discipline
Conferences and Workshops: NSF/DOJ/CIA, ISI 2003, Tucson, AZ NSF/CIA/DHS, ISI 2004, Tucson, AZ IEEE NSF/CIA/DHS, IEEE ISI 2005, Atlanta, Georgia PAKDD ISI Workshop 2006, Singapore IEEE NSF/CIA/DHS, IEEE ISI 2006, San Diego, CA IEEE ISI 2007 (NJ); IEEE ISI 2008 (Taiwan) Building a New Discipline
Professional Societies: IEEE Intelligent Transportation Systems Society (ITSS): hosting IEEE ISI President: Wang; VP: Zeng; BOG: Chen IEEE ITSS Technical Committee on Homeland Security (TCHS) IEEE Systems, Man, and Cybernetics Society (SMCS) Technical Committee on Homeland Security (TCHS) Building a New Discipline
Journal Special Issues (Appeared and In Press) “Intelligence and Security Informatics,” Journal of the American Society for Information Science and Technology, special issue on Intelligence and Security Informatics, Volume 56, Number 3, Pages 217-220, 2005. “Artificial Intelligence for Homeland Security,” IEEE Intelligent Systems, special issue on AI for Homeland Security, Volume 20, Number 5, Pages 12-16, 2005. “Intelligence and Security Informatics for Homeland Security: Information, Communication, and Transportation,” IEEE Transactions on Intelligent Transportation Systems, special section, 2006 (in press) “Intelligence and Security Informatics: Information Systems Perspective,” Decision Support Systems, special issue on Intelligence and Security Informatics, 2006 (in press). Building a New Discipline
Books: H. Chen, “Intelligence and Security Informatics for International Security: Information Sharing and Data Mining,” Springer, forthcoming, 2005. H. Chen, T. S. Raghu, R. Ramesh, A. Vinze, and D. Zeng, “Handbooks in Information Systems -- National Security,” Elsevier Scientific, forthcoming, 2006. H. Chen, E. Reid, and J. Sinai, “Terrorism Informatics,” Springer, forthcoming, 2006. Building a New Discipline
Call for Participation: PAKDD ISI Workshop 2006, Singapore, April 9-10, 2006 (Springer LNCS) IEEE NSF/CIA/DHS, IEEE ISI 2006, San Diego, CA, April 22-24, 2006 (Springer LNCS) IEEE Transactions on Knowledge and Data Engineering (TKDE), special issue IEEE Transactions on Intelligent Transportation Systems (TITS), special issue IEEE Transactions on Systems, Man, and Cybernetics (TSMC), special issue Building a New Discipline
Call for Participation: IEEE ITSS and SMC Technical Committee (TC) involvement IEEE annual ITSS and SMC conference special sessions (IEEE SMC October 2006, Taipei, Taiwan) IEEE ISI 2008 in Taipei, Taiwan (pending approval) Building a New Discipline
Intelligence and Security Informatics for International Security: Information Sharing and Data Mining
Intelligence and Security Informatics (ISI): Challenges and Opportunities An Information Sharing and Data Mining Research Framework ISI Research: Literature Review National Security Critical Mission Areas and Case Studies Intelligence and Warning Border and Transportation Security Domestic Counter-terrorism Protecting Critical Infrastructure and Key Assets Defending Against Catastrophic Terrorism Emergency Preparedness and Responses The Partnership and Collaboration Framework Outline
Federal authorities are actively implementing comprehensive strategies and measures in order to achieve the three objectives Preventing future terrorist attacks Reducing the nation’s vulnerability Minimizing the damage and recovering from attacks that occur Science and technology have been identified in the “National Strategy for Homeland Security” report as the keys to win the new counter-terrorism war. Introduction
Six critical mission areas Intelligence and Warning Border and Transportation Security Domestic Counter-terrorism Protecting Critical Infrastructure and Key Assets Defending Against Catastrophic Terrorism Emergency Preparedness and Response Information Technology and NationalSecurity
Facing the critical missions of national security and various data and technical challenges we believe there is a pressing need to develop the science of “Intelligence and Security Informatics”(ISI) Problems and Challenges
The abundant research and funding opportunities in ISI. National Science Foundation (NSF), Information Technology Research (ITR) Program Department of Homeland Security (DHS) National Institutes of Health (NIH), National Library of Medicine (NLM), Informatics for Disaster Management Program Center for Disease Control and Prevention (CDC), National Center for Infectious Diseases (NCID), Bioterrorism Extramural Research Grant Program Department of Defense (DOD), Advanced Research & Development Activity (ARDA) Program Department of Justice (DOJ), National Institute of Justice (NIJ) Federal Initiatives and Funding Opportunities in ISI
Crime Types Crime types and security concerns
KDD techniques can play a central role in improving counter-terrorism and crime-fighting capabilities of intelligence, security, and law enforcement agencies by reducing the cognitive and information overload. Many of these KDD technologies could be applied in ISI studies (Chen et al., 2003a; Chen et al., 2004b). With the special characteristics of crimes, criminals, and crime-related data we categorize existing ISI technologies into six classes: information sharing and collaboration crime association mining crime classification and clustering intelligence text mining spatial and temporal crime mining criminal network mining An ISI Research Framework
A knowledge discovery research framework for ISI A knowledge discovery research framework for ISI
The potential negative effects of intelligence gathering and analysis on the privacy and civil liberties of the public have been well publicized (Cook & Cook, 2003). There exist many laws, regulations, and agreements governing data collection, confidentiality, and reporting, which could directly impact the development and application of ISI technologies. Caveats for Data Mining
Framed in the context of domestic security surveillance, surveillance is considered as an important intelligence tool that has the potential to contribute significantly to national security but also to infringe civil liberties. (Strickland 2005) Data mining using public or private sector databases for national security purposes must proceed in caution: The search for general information must ensure anonymity. The acquisition of specific identity, if required, must by court authorized under appropriate standards or warrants. The peril of the “security-industrial complex” – marriage of private data and technology companies and government anti-terror initiatives. (R. O’Harrow, “No Place to Hide”) Domestic Security, Civil Liberties, and Knowledge Discovery
Information Sharing and Collaboration Crime Association Mining Crime Classification and Clustering Intelligence Text Mining Crime Spatial and Temporal Mining Criminal Network Analysis ISI Research: Literature Review
Information sharing across jurisdictional boundaries of intelligence and security agencies has been identified as one of the key foundations for securing national security (Office of Homeland Security, 2002). There are some difficulties ofinformation sharing: Legal and cultural issues regarding information sharing Integrate and combine data that are organized in different schemas stored in different database systems running on different hardware platforms and operating systems (Hasselbring, 2000). Information Sharing and Collaboration
Three approaches to data integration have been proposed: (Garcia-Molina et al., 2002) Federation: maintains data in their original, independent sources but provides a uniformed data access mechanism (Buccella et al., 2003; Haas, 2002). Warehousing: an integrated system in which copies of data from different data sources are migrated and stored to provide uniform access Mediation: relies on “wrappers” to translate and pass queries from multiple data sources. These techniques are not mutually exclusive. All these techniques are dependent, to a great extent, on the matching between different databases Approaches to data integration
The task of database matching can be broadly divided into schema-level and instance-level matching (Lim et al., 1996; Rahm & Bernstein, 2001). Schema-level matching is preformed by aligning semantically corresponding columns between two sources. Instance-level or entity-level matching is to connect records describing a particular object in one database to records describing the same object in another database. Instance-level matching is frequently performed after schema-level matching is completed. Information integration approaches have been used in law enforcement and intelligence agencies for investigation support. Information sharing has also been undertaken in intelligence and security agencies through cross-jurisdictional collaborative systems. E.g. COPLINK (Chen et al., 2003b) Database And Application
One of most widely studied approaches is association rule mining, a process of discovering frequently occurring item sets in a database. An association is expressed as a rule X Y, indicating that item set X and item set Y occur together in the same transaction (Agrawal et al., 1993). Each rule is evaluated using two probability measures, support and confidence, where support is defined as prob(XY) and confidence as prob(XY) / prob(X). E.g., “diaper milk with 60% support and 90% confidence” means that 60% of customers buy both diaper and milk in the same transaction and that 90% of the customers who buy diaper tend to also buy milk. Crime Association Mining
Crime association mining techniques can include incident association mining and entity association mining (Lin & Brown, 2003). Two approaches, similarity-based and outlier-based, have been developed for incident association mining Similarity-based method detects associations between crime incidents by comparing crimes’ features (O'Hara & O'Hara, 1980) Outlier-based method focuses only on the distinctive features of a crime (Lin & Brown, 2003) The task of finding and charting associations between crime entities such as persons, weapons, and organizations often is referred to as entity association mining (Lin & Brown, 2003) or link analysis. Crime Association Mining Techniques
Three types of link analysis approaches have been suggested: heuristic-based, statistical-based, and template-based. Heuristic-based approaches rely on decision rules used by domain experts to determine whether two entities in question are related. Statistical-based approach E.g. Concept Space (Chen & Lynch, 1992). This approach measures the weighted co-occurrence associations between records of entities (persons, organizations, vehicles, and locations) stored in crime databases. Template-based approach has been primarily used to identify associations between entities extracted from textual documents such as police report narratives. Link Analysis Approaches
Classification is the process of mapping data items into one of several predefined categories based on attribute values of the items (Hand, 1981; Weiss & Kulikowski, 1991). Widely used classification techniques: Discriminant analysis (Eisenbeis & Avery, 1972) Bayesian models (Duda & Hart, 1973; Heckerman, 1995) Decision trees (Quinlan, 1986, 1993) Artificial neural networks (Rumelhart et al., 1986) Support vector machines (SVM) (Vapnik, 1995) Several of these techniques have been applied in the intelligence and security domain to detect financial fraud and computer network intrusion. Crime Classification and Clustering
Clustering groups similar data items into clusters without knowing their class membership. The basic principle is to maximize intra-cluster similarity while minimizing inter-cluster similarity (Jain et al., 1999) Various clustering methods have been developed, including hierarchical approaches such as complete-link algorithms (Defays, 1977), partitional approaches such as k-means (Anderberg, 1973; Kohonen, 1995), and Self-Organizing Maps (SOM) (Kohonen, 1995). The use of clustering methods in the law enforcement and security domains can be categorized into two types: crime incident clustering and criminal clustering. Crime Classification and Clustering
Text mining has attracted increasing attention in recent years as the natural language processing capabilities advance (Chen, 2001). An important task of text mining is information extraction, a process of identifying and extracting from free text select types of information such as entities, relationships, and events (Grishman, 2003). The most widely studied information extraction subfield is named entity extraction. Four major named-entity extraction approaches have been proposed: Lexical-lookup Rule-based Statistical model Machine learning Intelligence text mining aims to identify people, organizations, locations, properties, and relationships of interest. Intelligence Text Mining
Most crimes, including terrorism, have significant spatial and temporal characteristics (Brantingham & Brantingham, 1981). Aims to gather intelligence about environmental factors that prevent or encourage crimes (Brantingham & Brantingham, 1981), identify geographic areas of high crime concentration (Levine, 2000), and detect trend of crimes (Schumacher & Leitner, 1999). Two major approaches for crime temporal pattern mining Visualization Present individual or aggregated temporal features of crimes using periodic view or timeline view Statistical approach Build statistical models from observations to capture the temporal patterns of events. Crime Spatial and Temporal Mining
Three approaches for crime spatial pattern mining : (Murray et al., 2001). Visual approach (crime mapping): Presents a city or region map annotated with various crime related information. Clustering approaches Has been used in hot spot analysis, a process of automatically identifying areas with high crime concentration. Partitional clustering algorithms such as the k-means methods are often used for finding hot spots of crimes. They usually require the user to predefine the number of clusters to be found Statistical approaches To conduct hot spot analysis or to test the significance of hot spots (Craglia et al., 2000) To predict crime Crime Spatial and Temporal Mining
Criminals seldom operate alone but instead interact with one another to carry out various illegal activities. Relationships between individual offenders form the basis for organized crime and are essential for the effective operation of a criminal enterprise. Criminal enterprises can be viewed as a network consisting of nodes (individual offenders) and links (relationships). Structural network patterns in terms of subgroups, between-group interactions, and individual roles thus are important to understanding the organization, structure, and operation of criminal enterprises. Criminal Network Analysis
Social Network Analysis (SNA) provides a set of measures and approaches for structural network analysis (Wasserman & Faust, 1994). SNA is capable of Subgroup detection Central member identification Discovery of patterns of interaction SNA also includes visualization methods that present networks graphically. The Smallest Space Analysis (SSA) approach (Wasserman & Faust, 1994) is used extensively in SNA to produce two-dimensional representations of social networks. Network Topological Analysis aims to identify topological characteristics of complex networks (e.g., random, small world, and scale-free networks) and their dynamics and guiding properties. Criminal Network Analysis
The above-reviewed six classes of KDD techniques constitute the key components of our proposed ISI research framework. Our focus on the KDD methodology, however, does NOT exclude other approaches. Researchers from different disciplines can contribute to ISI. DB, AI, data mining, algorithms, networking, and grid computing researchers can contribute to core information infrastructure, integration, and analysis research of relevance to ISI IS and management science researchers could help develop the quantitative, system, and information theory based methodologies needed for the systematic study of national security. Cognitive science, behavioral research, and management and policy are critical to the understanding of the individual, group, organizational, and societal impacts and effective national security policies. Conclusion and Future Direction
Intelligence and Warning Border and Transportation Security Domestic Counter-terrorism Protecting Critical Infrastructure and Key Assets Defending Against Catastrophic Terrorism Emergency Preparedness and Responses National Security Critical Mission Areas and Case Studies
Intelligence and Warning • By analyzing the communication and activity patterns among terrorists and their contacts detecting deceptive identities, or employing other surveillance and monitoring techniques, intelligence and warning systems may issue timely, critical alerts to prevent attacks or crimes from occurring. Four case studies of relevance to intelligence and warning
Border and Transportation Security • The capabilities of counter-terrorism and crime-fighting can be greatly improved by creating a “smart border,” where information from multiple sources is integrated and analyzed to help locate wanted terrorists or criminals. Technologies such as information sharing and integration, collaboration and communication, and biometrics and speech recognition will be greatly needed in such smart borders. Two case studies of relevance to Border and Transportation Security
Domestic Counter-terrorism • As terrorists, both international and domestic, may be involved in local crimes. Information technologies that help find cooperative relationships between criminals and their interactive patterns would also be helpful for analyzing domestic terrorism. Four case studies of relevance to Domestic Counter-terrorism Security in Chapter 7
Protecting Critical Infrastructure and Key Assets • Criminals and terrorists are increasingly using the cyberspace to conduct illegal activities, share ideology, solicit funding, and recruit. One aspect of protecting cyber infrastructure is to determine the source and identity of unwanted threats or intrusions. Three case studies of relevance to Protecting Critical Infrastructure and Key Assets
Defending Against Catastrophic Terrorism • Biological attacks may cause contamination, infectious disease outbreaks, and significant loss of life. Information systems that can efficiently and effectively collect, access, analyze, and report data about catastrophe-leading events can help prevent, detect, respond to, and manage these attacks. Two case studies of relevance to Defending Against Catastrophic Terrorism
Emergency Preparedness and Responses • Information technologies that help optimize response plans, identify experts, train response professionals, and manage consequences are beneficial to defend against catastrophes in the long run. Moreover, information systems that provide social and psychological support to the victims of terrorist attacks can also help the society recover from disasters. Two case studies of relevance to Emergency Preparedness and Responses
Dark Web Collection Building Dark Web Content Analysis Dark Web Forum Authorship Analysis and Visualization ISI Dark Web Case Studies
Exploring the Dark Side of the Web: Collection and Analysis of Extremist Online Forums
Terrorists’ Communication on the Internet • Internet enables diverse forms of communication. • The complexity of communication can range from text only messages to the use of multimedia. • Below is a comparison between communication mediums on the Internet.
Multiple pages of the board The title of the board Multiple pages of one thread The title of the thread # of replies of the thread # of views of the thread Forum as Communication Tool for Terrorists and Their Supporters A Typical Forum
The body of the message Post time The title of the thread User ID Other information about the author The virtual rank of the author in the forum Reply of the main thread Forum as Communication Tool for Terrorists and Their Supporters A Typical Forum
A Web Mining Approach Proposed Approach