Data Paradox

Information Sharing Incongruities in the Intelligence Community

Figure 1

This chart shows the relationship between the amount of information available (Items of Information), the accuracy of the handicappers’ predictions of the first-place winners (Accuracy: Correct First Place Selections), and the handicappers’ confidence in those predictions (Confidence). More information leads to overconfidence, yet it does not improve predictive accuracy and may close analysts off to falsification.

Figure 2

The Paradox of Choice describes the relationship between the number of choices (Number of Choices) and user satisfaction (Satisfaction). The curve reflects the decisive point at which an increase in the number of choices no longer has the expected effect; the stationary point is where the gradient of the curve is zero. Notice that as choice increases, satisfaction, and in many cases decision making, flattens and ultimately declines.

Figure 3

Haiti image from USAID. The event often outpaced authoritative data and analysis, and at times even search and discovery. The best labors at careful estimates could not be consumed in time. As Winston Churchill recalled of the hectic days of spring 1940, “... sat almost every day to discuss the reports... and conclusions or divergences were explained or re-explained; and by the time this process was completed, the whole scene had often changed.” (Source: Churchill, The Gathering Storm, 1948.)

Figure 4

The Strength of Weak Ties and the dangers of Structural Holes within our social networks were grossly underestimated in the 9/11 Commission Report and continue to be underestimated by designers of government enterprise systems. Humanizing connectivity in ways that support decision-making and sense-making is required.

Researcher, ITT Corporation
Liaison Officer & Special Projects Lead, NJOIC Pentagon
@rheimann (Twitter)

The ultimate value of spatial data is in its use, facilitated by sharing. In other words, a piece of data used once has value to the analyst or decision maker who took advantage of its accessibility. In a sense, the data has satisfied its purpose. Further to the point, if ten people were to use that same piece of data, its utility would effectively increase by a factor of ten.

It is logical to assume that this trend continues; the greater the number of people who use the data, the greater the utility of the data. Data, however, are not shared without the mediation of people, if only through policy. Therefore, data sharing is both a hard interoperability challenge with technical considerations to facilitate storage and transfer, and a soft (social) interoperability challenge with considerations to the organization of data and people.

To a great extent, the reforms within the Department of Defense (DoD) and the Intelligence Community (IC) have revolved around the 9/11 Commission Report’s recommendation to share information. The breakdown of the national security apparatus, the report explains, was due to failures to share information in a “quick, imaginative, and agile” manner. The creation of the position of Director of National Intelligence and of the Information Sharing Environment (Intelligence Reform and Terrorism Prevention Act of 2004) are clear steps toward standardizing information sharing throughout the Intelligence Community and institutionalizing a culture of sharing.

However, in the midst of all the enthusiasm, few seem concerned with the somewhat darker implications of such measures for analysts and decision makers alike. Information sharing as an end, instead of a beginning, overlooks key elements: how analysts and decision makers consume data, and the cognitive processes involved in acquiring situational awareness; the social processes of data sharing; the psychological and behavioral obstacles that arise when the number of choices reaches a critical mass; and how problems of induction limit our ability to predict unexpected events of large magnitude.

The 9/11 Commission’s emphasis on information sharing, whether spatial or aspatial, consigns users to passive consumption, which can have catastrophic results. The passive consumer is systematically decoupled from data production. It is no coincidence that the CIA, like many other intelligence organizations, often houses data production alongside data consumption. It is in this paradigm that context is securely and reliably transferred from producer to consumer; in other words, it is transferred from person to person. This intimate connection is representative of the organic nature of information sharing.

The dilemma posed by systematic decoupling is the decontextualization of information. American scientist Warren Weaver, a pioneer in machine translation, studied the statistical structure of language, namely the influence of context. Weaver confessed to “…the vague feeling that information and meaning may prove to be something like a pair of canonically conjugate variables in quantum theory, they being subject to some joint restriction that condemns a person to the sacrifice of the one as he insists on having much of the other.”

The excessive increase in the number of information choices soon becomes untenable and intractable. The increase in choice becomes a data paradox; contrary to conventional wisdom, more data choices do not always lend themselves to better decision making or more accurate predictions. The more data choices one has, the slower one performs, or at least the harder one must work to keep pace and to prevent overload.

Lewis Carroll’s Red Queen, from Through the Looking-Glass, proclaims, “Now, here, you see, it takes all the running you can do, to keep in the same place. If you want to get somewhere else, you must run at least twice as fast as that!” So it is in managing data, though this seems true only when information sharing is the goal. When context is transferred and the selection of information sources is facilitated, good analysts tend to want more data to complement, and even falsify, the data that already exist within an analytical framework. When an analyst operates in the new information-sharing environment, the data paradox is real. Without context, users of such a system have difficulty reducing data to a manageable dimensionality and are quickly overrun with choice.

Probabilistically, it is reasonable to assume that a decision maker could choose the correct resources or make the correct inferences when there are few choices. If five variables are being considered, there is a 20% chance, if only by luck, that a correct choice will be made. However, as the number of choices increases, the probability of making the correct choice decreases. This does not account for the several interpretations that intelligence data often admit. In many cases, users naturally know the key variables, but when asked to make predictions with more, albeit unfamiliar, data and to ignore the implicit rules of parsimony, users do not perform as well.
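
The baseline arithmetic is easy to make concrete. A minimal sketch in Python, assuming the options are equally likely and exactly one is correct:

    # Chance of selecting the correct resource purely by luck,
    # assuming n equally likely options and exactly one correct answer.
    def chance_by_luck(n: int) -> float:
        return 1.0 / n

    for n in (5, 10, 20, 40):
        print(f"{n:>2} choices: {chance_by_luck(n):.1%} chance by luck")
    #  5 choices: 20.0% chance by luck ... 40 choices: 2.5% chance by luck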

One such experiment serves to illustrate the point. Dr. Paul Slovic (1973) demonstrated this phenomenon in Behavioral Problems of Adhering to a Decision Policy with experienced horse race handicappers. With only five variables available to predict horse performance, the handicappers’ confidence was reasonably well calibrated with their accuracy, but they became overconfident as additional information (10, 20, and 40 variables) was used to make their predictions. In fact, some of the eight experienced handicappers performed worse when offered more variables. All, however, grew increasingly confident in their judgments as more variables were incorporated, likely resulting in the exclusion of diverse viewpoints. See Figure 1.
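
The pattern is straightforward to reproduce in a toy setting. The sketch below is illustrative only, not Slovic’s data: an outcome is driven by five informative variables, progressively more irrelevant ones are added, and the in-sample fit (a rough proxy for confidence) keeps climbing while out-of-sample accuracy does not.

    import numpy as np

    rng = np.random.default_rng(0)
    n_train, n_test, n_informative = 200, 200, 5
    true_w = rng.normal(size=n_informative)  # only five variables matter

    def make_data(n, n_features):
        X = rng.normal(size=(n, n_features))
        y = X[:, :n_informative] @ true_w + rng.normal(size=n)
        return X, y

    def r_squared(X, w, y):
        return 1 - np.var(y - X @ w) / np.var(y)

    for n_features in (5, 10, 20, 40):
        X_tr, y_tr = make_data(n_train, n_features)
        X_te, y_te = make_data(n_test, n_features)
        w, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)
        print(f"{n_features:>2} variables: "
              f"in-sample R^2 {r_squared(X_tr, w, y_tr):.2f}, "
              f"out-of-sample R^2 {r_squared(X_te, w, y_te):.2f}")

Because the extra variables carry no signal, the model’s apparent fit to the data it has seen improves with every addition, while its predictive power on new data stays flat, the same divergence between confidence and accuracy seen in Figure 1.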

Analysts cannot accurately understand the environment when overrun with choice, let alone make accurate predictions. The 9/11 Commission accurately summarized this as the inability to connect the dots. The Paradox of Choice (Schwartz, 2004) and the stripping away of context both become formidable obstacles and eventually result in fewer decisions, perhaps even no decisions at all. See Figure 2. It is clear that information sharing increases choice, but does it help with decisions?

The 9/11 Commission’s own report suggests that the failure to grasp the significance of information was more important than any lack of information sharing. Therefore, data consumption, not data production, appears to be the greatest challenge facing the DoD and IC, and the challenge seems to be growing. According to the International Data Corporation (IDC), the “Digital Universe” will expand to 1.8 zettabytes (ZB) by 2011, or almost two billion terabytes. The IC publishes over 50,000 intelligence reports each year, and the nearly 900,000 personnel with top-secret security clearances produce more and more data every day.

Moreover, the U.S. intelligence budget was publicly announced last year as $75 billion, two and a half times the size it was on Sept. 10, 2001. This expansion has funded new sensors, data centers, collection methods, and more personnel to create even more data. Willmoore Kendall, author of the incisive essay “The Function of Intelligence,” describes the practical effect of such extreme growth for analysts as “…a matter of somehow keeping one’s head above water in a tidal wave of documents, whose factual content must be processed.” Reviewing Sherman Kent’s Strategic Intelligence, Kendall cautions that there is limited “ability of our science to supply the sort of knowledge which Mr. Kent and his clients needed.” The job of analysts quickly becomes that of passive consumers of large stockpiles of data. Sharing decontextualized data, however, produces a negative network externality.

In other words, sharing data without context eventually imposes a negative side effect on others in the network; as more and more data are shared, more data must be processed and reprocessed, over and over, by every user. The problem for one analyst quickly becomes the problem of many. The IDC reports that by 2020 the “Digital Universe” will be an estimated 35 ZB. That is growth by a factor of 44. Will the DoD or the IC see a 44-fold increase in the number of analysts or decision makers? It is unlikely. The logic of information sharing requires some serious review and skepticism.
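
A back-of-the-envelope sketch of that externality, using only the figures cited above and assuming, as argued, that the analyst population stays roughly flat:

    # Data load per cleared person, using the article's figures.
    # Assumption: headcount (roughly 900,000 top-secret clearance
    # holders) does not grow with the data.
    ZB_TO_TB = 1_000_000_000  # one zettabyte is a billion terabytes

    cleared_personnel = 900_000
    for year, zettabytes in ((2011, 1.8), (2020, 35.0)):
        tb_per_person = zettabytes * ZB_TO_TB / cleared_personnel
        print(f"{year}: {tb_per_person:,.0f} TB per cleared person")
    # 2011: 2,000 TB per person; 2020: 38,889 TB per person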

To be clear, this article does not argue against information sharing; analysts and decision makers sometimes need access to large volumes of information and should have access to data wherever they reside, whenever they are needed. The synthesis of these data in rapidly developing environments requires a community effort.

Challenges of Haiti

During the Haiti earthquake, the paradox of choice and the need for building a conceptual framework for the data paradox became apparent. The dynamic nature of the Haiti earthquake and similar events poses particular obstacles, and highlights the larger deficiencies of information sharing. The event often outpaced analysis and sometimes even search and discovery.

The construction of an accurate common operational picture (map) proved increasingly difficult. Access to geospatial resources was relegated to a number of poorly developed and poorly implemented portals and search instruments that often stored duplicative data and/or returned fewer resources than users would have discovered through their traditional means of social networking. See Figure 3.

The issues with many of the existing portals are twofold: first, these systems deliver only one message; second, they are inadequate at stimulating conversation. These platforms fall short of their promise to improve communication because they neglect the self-organization of people and data. The uniform message is not appropriate for all users, and the recipients of the message do not contribute to its creation and cannot provide feedback, despite their knowledge.

What is required is the ability for users to self-organize around data, and for the data to be reduced. Social media have effectively reduced data to manageable dimensions, whether photos, news feeds, or geospatial data. A remarkable benefit of these forms of media is their capacity to exploit weak ties. Weak ties reach portions of the intelligence community that are not accessible via our strong ties and may, conceptually, be the interagency solution. The inability to bridge structural holes within and among networks can contribute to some of the shortcomings in “connecting the dots.” When no nodes are able to bring two different groups together, the community is left with isolated groups, unhinged from the rest of the network.
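
The structural point can be sketched with a toy graph, here using the networkx library (the graph is invented for illustration, not drawn from any real network): two tightly knit clusters joined by a single weak tie, which edge betweenness immediately identifies as the bridge spanning the structural hole.

    import networkx as nx

    # Two tightly knit groups (strong ties) joined by one weak tie.
    G = nx.Graph()
    G.add_edges_from([("a1", "a2"), ("a1", "a3"), ("a2", "a3")])  # cluster A
    G.add_edges_from([("b1", "b2"), ("b1", "b3"), ("b2", "b3")])  # cluster B
    G.add_edge("a1", "b1")  # the lone bridging tie

    # Edge betweenness: share of shortest paths crossing each edge.
    betweenness = nx.edge_betweenness_centrality(G)
    bridge, score = max(betweenness.items(), key=lambda kv: kv[1])
    print(f"Bridging tie: {bridge}, betweenness {score:.2f}")

    # Remove the weak tie and the network splits into isolated groups.
    G.remove_edge(*bridge)
    print("Connected components:", nx.number_connected_components(G))  # 2

Every path between the two clusters runs through that single tie; remove it and the community is left with exactly the isolated, unhinged groups described above.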

The Strength of Weak Ties

Mark Granovetter’s (1973, 1983) seminal works, The Strength of Weak Ties and the later The Strength of Weak Ties: A Network Theory Revisited, demonstrate that weak ties complement our knowledge rather than replicate it. Our strong ties tend to possess the same interests and qualities that we already possess; they carry expected benefits but fail to deliver in critical ways. It is imperative to understand the creation of these networks and to allow users the ability to organize without constraints. Humanizing connectivity in ways that support decision-making and sense-making is required. See Figure 4.

The ability to engineer a system that reduces the complexities of both data sharing and data interpretation is needed. Furthermore, analysts should be assisted in sense-making, which has been largely ignored in the rush to institute a culture of sharing. To accomplish these tasks, a greater emphasis should be placed on the natural ways users share and analyze data. Geographers should play a prominent role in the construction of such a system.

Geographers are already acutely aware of the special nature of spatial data (Anselin, 1989) in both presentation and analysis. Furthermore, geography is firmly placed within the realm of the social sciences, which should inform all aspects of the design, particularly social structure, the mental processing of information, and organizational culture. The problems that face the community are too large and broad for a single discipline.

The design and implementation of large enterprise systems that exclude a social layer are, to a large extent, a demonstration of technological determinism. The argument that technological development alone will change the social structure and organizational culture may be without foundation. Instead, the social layer should be built into the infrastructure to improve information sharing, with special considerations to the psychological, behavioral, and cognitive processes of information consumption and analysis.
