ISEBOX (Integrated Socio-Cultural Environment for Behavior Observation eXploitation) methodology adds context. In this image, the boxes and spikes represent areas with a high number of violent events around religious facilities.
Fusing data layers for better insight: The red boxes show areas with high population density and high violent event rates, fusing two different datasets to see areas of concern quickly.
By Abe Usher, CTO, and Altaf Bahora, Vice President, The HumanGeo Group, New York, N.Y., and Herndon, Va.
In today’s complex environment, planning for future military operations requires allocating and moving resources in the geospatial domain. Military planners and intelligence professionals are challenged to make sense out of disparate socio-cultural data, collected for different purposes by a multitude of systems, and at different levels of specificity.
To interpret such data, the military often turns to Geographic Information Systems (GIS) as well as statistical analysis and data mining techniques. However, these solutions have drawbacks. GIS solutions are not built around a user’s workflow and typically require that users have a significant understanding of the technology’s foundational concepts, including familiarity with the fields of geography, statistics, cartography, and database architecture. Intelligence professionals and military planners also need the technical know-how to phrase questions correctly for each tool, because search query formats and capabilities differ from one tool to the next.
Statistical analysis and data mining are frequently used to predict future activities and events based on indicators (key information about the people and environment). Data mining also helps identify patterns for planners and analysts. There is a significant difference between knowing exactly what you want to look for and allowing patterns to emerge from the data to then analyze.
Such a non-parametric approach of letting the data “be your guide” is particularly powerful when analysts have simple tools that allow them to explore data based on their own expert hypotheses. Such models are not exact; they hold enough precision and accuracy to be useful in planning scenarios, but they do not provide a complete solution.
New Technology and Methods Are Needed
Despite all of these technology advances, the Department of Defense (DOD) is still constrained by the variety, quantity, and structure of the data it collects. A large amount of data is being collected, but extracting the relevant data is time-consuming. Moreover, the usefulness of the data depends on the users’ knowledge, the data they have to work with, and the context from which they view the world.
Jeff Jonas, Chief Scientist of IBM Entity Analytics, notes, “If an organization cannot evaluate how new data points relate to its historical data holding in real time, the organization will miss opportunities for action. However, when the ‘data can find the data,’ there exists an opportunity for the insight to find the user.” To address these data challenges, it is necessary to find ways to collect and fuse non-standard data sources in order to rapidly gain understanding of other regions of the world. When dealing with highly structured, well-formatted data, it is relatively easy to “connect the dots” and understand the associations and relationships between various data elements.
Unfortunately, things are much less clear when examining socio-cultural and behavioral data. Some types of questions lend themselves to macro-analyses (e.g., comparing GDP of nations in the world). Other types of analyses require very precise data (e.g., understanding the exact location of a facility that might be a safe house for a terror group).
Analysts are often tempted to interpret the data in terms of man-made boundaries such as provinces, states, or districts, even though such boundaries often do not offer the level of geospatial detail the analysis requires. This problem is well documented as the Modifiable Areal Unit Problem (MAUP), which in essence warns that the unit of analysis selected for a geospatial analysis has an overwhelming influence on the outcome and accuracy of that analysis (and on whether it can be used again in the future).
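The effect that MAUP describes can be seen in a small sketch. The Python below bins the same ten invented event locations into square grid cells of two different sizes. The coordinates, the cell sizes, and the function names (cell_counts, hottest_cell) are assumptions made for illustration; none of them comes from ISEBOX.

```python
import math
from collections import Counter

# Invented event coordinates in arbitrary map units (not real data).
EVENTS = [
    # a tight cluster of four events
    (0.5, 0.5), (1.0, 1.2), (1.5, 0.8), (1.2, 1.6),
    # six events spread over a wider area
    (4.5, 0.5), (5.5, 1.5), (6.5, 0.5),
    (4.5, 2.5), (6.5, 2.5), (7.5, 3.5),
]


def cell_counts(events, cell_size):
    """Count events per square grid cell of the given side length."""
    counts = Counter()
    for x, y in events:
        cell = (math.floor(x / cell_size), math.floor(y / cell_size))
        counts[cell] += 1
    return counts


def hottest_cell(events, cell_size):
    """Return (cell, count) for the cell holding the most events."""
    return cell_counts(events, cell_size).most_common(1)[0]


if __name__ == "__main__":
    for size in (2.0, 4.0):
        cell, count = hottest_cell(EVENTS, size)
        print(f"cell size {size}: hottest cell {cell} with {count} events")
```

With 2-unit cells the tight cluster is the hottest cell (four events); with 4-unit cells the spread-out group becomes the hottest cell (six events), although the underlying events are identical. Only the unit of analysis changed.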
To enable the military to fuse together the data at its disposal and make better decisions, faster, the HumanGeo Group developed ISEBOX (Integrated Socio-Cultural Environment for Behavior Observation Exploitation), a geospatial threat-forecasting application that allows data with different spatial resolutions to be intermixed while preserving the original data. See Figure 1. ISEBOX identifies friendly forces, trends, geo-political activity, and threat indicators to provide operations planners with critical access to data required to perform Intelligence Preparation of the Battlefield (IPB). ISEBOX uses variable precision data encodings of location to facilitate non-obvious pattern detection and predictive analysis in the geospatial domain. See Figures 2 and 3.
Multi-Precision Data Fusion
To accomplish this, ISEBOX employs geospatial hashing algorithms to encode data. These mathematical procedures assign unique, compact, and structured indices to any location or time to enable the utilization of data of vastly different geospatial and temporal resolutions. This encoding method enables the combination of datasets to identify regions of threatening characteristics.
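The article does not name the hashing scheme ISEBOX uses. As a hedged illustration of the general idea, the sketch below implements the publicly documented geohash encoding as a stand-in: it interleaves longitude and latitude bits and writes them as base-32 characters, so a longer string names a smaller cell and every prefix names an enclosing, coarser cell. The function name geohash_encode and the sample coordinates are illustrative assumptions, and the sketch covers location only, not time.

```python
# Geohash base-32 alphabet (digits plus lowercase letters, omitting a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, precision=9):
    """Encode a latitude/longitude pair as a geohash string.

    Stand-in for "geospatial hashing"; not ISEBOX's actual algorithm.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value = 0          # bits collected for the current character
    bit_count = 0
    use_lon = True     # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:          # five bits make one base-32 character
            chars.append(BASE32[value])
            value = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    fine = geohash_encode(57.64911, 10.40744, precision=9)
    coarse = geohash_encode(57.64911, 10.40744, precision=5)
    print(fine, coarse, fine.startswith(coarse))
```

Because the coarse code is a prefix of the fine one, a precisely located record can be compared with a coarsely located one by truncating the longer index, while the full-precision index stays stored with the original record.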
Regions are rapidly refined into grid patterns to provide planners, operators, and analysts with geo-rectified collection and analytic start points to address emergent operational analytic requirements. In this way, ISEBOX’s innovative capabilities allow decision makers to assess the adequacy of composite datasets to meet operational needs and to conduct effective risk analyses.
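A minimal sketch of fusing two layers on a common grid follows, assuming each record already carries a hierarchical cell identifier whose prefixes name coarser enclosing cells (geohash-style strings are used here as an assumption). The datasets, thresholds, and the function name fuse_layers are invented for illustration and do not describe ISEBOX's actual implementation.

```python
from collections import defaultdict

# Invented coarse layer: population density (people per square km) by 4-character cell.
POPULATION_DENSITY = {"u4pr": 5200, "u4ps": 300}

# Invented fine layer: one record per violent event, located to a 6-character cell.
VIOLENT_EVENTS = [
    {"cell": "u4pruy", "kind": "attack"},
    {"cell": "u4prud", "kind": "attack"},
    {"cell": "u4pr2k", "kind": "riot"},
    {"cell": "u4ps01", "kind": "attack"},
    {"cell": "u4pxzz", "kind": "riot"},
    {"cell": "u4pxzy", "kind": "riot"},
]


def fuse_layers(density_by_cell, events, density_min, events_min):
    """Flag cells where both density and event count meet their thresholds.

    The fine layer is rolled up to the coarse layer's precision by
    truncating cell identifiers; the input records are not modified.
    """
    precision = min(len(cell) for cell in density_by_cell)
    event_counts = defaultdict(int)
    for event in events:
        event_counts[event["cell"][:precision]] += 1

    flagged = {}
    for cell, count in event_counts.items():
        density = density_by_cell.get(cell)
        if density is not None and density >= density_min and count >= events_min:
            flagged[cell] = {"density": density, "events": count}
    return flagged


if __name__ == "__main__":
    print(fuse_layers(POPULATION_DENSITY, VIOLENT_EVENTS,
                      density_min=1000, events_min=3))
```

In this sketch the roll-up happens only inside the function, so each event keeps its original 6-character location and can be re-aggregated later at a different grid size.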
ISEBOX’s methods of fusing data-based synthetic variables (grids) are a significant departure from legacy geospatial science. “Past strategies for fusing open source data into spatially enabled mission framework have been insufficient. The new innovative methods used for encoding and fusing data in ISEBOX are the future of GEOINT,” asserted Chris Tucker, USGIF Board Member.
ISEBOX ingests the widest range of open sources of geospatial data (such as social media, civilian government sources, NGO data, and community-driven data collections) and provides a means of combining the sources to enable analysts to detect non-obvious patterns in the data in order to “tip and cue” planners, collectors, and analysts to points on the ground defined by geography, time, function, and analytic discipline. Many commercial geospatial analysis tools purport to allow geospatial analysts to combine layers of information, assuming those layers of information are accurately registered, and in a common format. However, the typical DOD analyst faces the burden of inferring and conveying the resultant precision and uncertainty of the combined datasets.
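One way a variable-precision encoding can make this precision explicit is sketched below, assuming a geohash-style index (an assumption; the article does not specify ISEBOX's scheme or how it reports uncertainty). The length of a cell identifier fixes the size of the cell, so a fused product can be labeled with the cell size of its coarsest input. The function names are hypothetical.

```python
def geohash_cell_size_degrees(precision):
    """Return (width, height) in degrees of a geohash cell with `precision` characters."""
    total_bits = 5 * precision          # each base-32 character carries 5 bits
    lon_bits = (total_bits + 1) // 2    # bits alternate, starting with longitude
    lat_bits = total_bits // 2
    return 360.0 / 2 ** lon_bits, 180.0 / 2 ** lat_bits


def fused_cell_size_degrees(precisions):
    """A fused layer is treated as only as precise as its coarsest input."""
    return geohash_cell_size_degrees(min(precisions))


if __name__ == "__main__":
    for p in (2, 4, 6):
        width, height = geohash_cell_size_degrees(p)
        print(f"{p} characters: {width:.6f} deg wide, {height:.6f} deg tall")
    print("fused 4- and 6-character layers:", fused_cell_size_degrees([4, 6]))
```

Reporting the coarsest cell size alongside a fused result is a simple guard against the false sense of precision that arises when a coarse layer is drawn on top of a fine one.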
During an informal discussion about data fusion, Jeff Jonas from IBM noted, “An organization can only be as smart as the sum of its perceptions. To improve, organizations need more data. Capabilities like ISEBOX that increase observation space by introducing orthogonal data elements will lead to leap-ahead improvements.”
Evolution of ISEBOX
In designing and implementing ISEBOX, there were both technical and practical challenges. The team, under the direction of Abe Usher, faced the challenge of applying and adapting hashing algorithms (used in cryptography) to a wide variety of datasets. In addition, the sheer variety of sources, formats, and encodings, and the availability of socio-cultural data presented challenges in finding and ingesting relevant data at the desired levels of granularity.
Operationally, the ISEBOX developers had to overcome perceptions created by the introduction of so much software for the military over the past ten years. Hundreds of software designs have asserted a unique ability to synthesize data into meaningful products and reports, yet have failed to understand the requirements of the end consumers of their information.
Developers have repeatedly claimed the capability of integrating and fusing geospatial data, with most falling far short of their promise to “make sense” of the data for the warfighter. This has led to many analysts and planners pushing back on the introduction of new software tools due to the time and effort required for training and integration of the applications into their often overburdened workflow.
To overcome these preconceptions and prove the value of ISEBOX quickly, the developers applied ISEBOX to a vignette on threat actor activity to demonstrate that the technology could reveal new insights (even when using only open source data). Leveraging this real-world example and ISEBOX, the developers replicated and distilled the same analysis of the area of interest in hours rather than weeks, with a fraction of the resources needed with existing tools and datasets, and discovered non-obvious information from open source socio-cultural data. The use of a concrete real-world example helped to prove to analysts, planners, and decision makers that ISEBOX could save both time and money while also exposing “weak signals” from a combination of socio-cultural data to identify specific regions of interest or concern.
The Future of Open Source Socio-Cultural Information
Initially developed by Mr. Usher specifically for use by the DOD, ISEBOX introduces a new and unique capability to the nation’s geospatial tradecraft by allowing historical and general data for geospatial regions to be integrated with more precise data to address operational requirements while minimizing the potential for over-generalizing or creating a false sense of precision. This capability opens the door to fuse geospatial data with a much broader range of data types in supporting analysis – particularly open source datasets which may have varying temporal and/or geospatial granularity. It enables global scale analyses applying a breadth of sources including community-driven data collections that could not be otherwise integrated. ISEBOX enables the use of open source datasets to identify unique threat patterns that can provide purpose and direction to classified collections. ISEBOX also challenges current classified paradigms and the manner in which the classified sources collect and verify their analysis.
The ISEBOX methodology can enhance the ability of planners and leaders in making choices about how and where to allocate resources for the future. It provides operational impact by enabling the incorporation of “near real-time” analytic overlays into plans and forecasting tools along with delivering an inherent ability to assess the uncertainty of the resultant analyses. It also holds promise as a way to combine disparate, weak signals from different sources of socio-cultural data and synthesize them to create a detailed operational picture.
ISEBOX has been nominated by the Office of the Secretary of Defense, the Office of Naval Research, and a DOD Combatant Command for the USGIF Industry Award for the most innovative geospatial software to be developed in 2011 for DOD. In the award nomination, the government said that the mathematical concept and the team’s “understanding of the necessity to harvest open source data for illustrative demonstrations of real-world problems was the best we (they) had seen in the previous decade of attempts.”
ISEBOX is currently being deployed by elements of a Combatant Command within the DOD, and it will also be on display at the HumanGeo booth in the New Member Forum (Booth 203) at the GEOINT 2011 Symposium in San Antonio in October. Mr. Usher will also speak at the GEOINT 2011 Symposium on the Enabling Socio-Cultural Technologies Panel hosted by Jeff Jonas from IBM.