Imaging and Data Linked by Geospatial Intelligence
MetaCarta Maps GeoInt Text
|Figure 1 Example values in a GDM entry of Paris, France.|
|Figure 2 Example of screen shot from GTS 3.5.|
Vice President/General Manager
MetaCarta Public Sector
Government agencies routinely use satellite imagery for purposes of intelligence, planning and research. Even the best imagery provides an incomplete picture, however. Agencies also must be able to access the reams of information stored in unstructured documents: web pages, blogs or any digital files stored in their networks.
But why must imagery be combined with unstructured data? Try these reasons on for size:
- Of all digital writing, 80 percent includes at least one geospatial reference.
- Unstructured documents, both within the enterprise and on the Internet, are growing at an exponential rate of at least 50 to 60 percent per year.
- Every knowledge worker and executive is a producer of geospatial information in a text form.
- Time-sensitive information is usually first spoken or written before being converted in any form to imagery or put into any GIS system.
- To anticipate actions by looking at a digital image or map is not possible, but combining text information and GIS makes anticipation not only possible, but routine.
Until two years ago, the idea of directly fusing an author’s original text with GIS was technically impossible. Capability was limited to producing metadata that said, in effect, “this document is about Paris.” This metadata in summary form typically would reflect less than five percent of the geospatial information contained in the text.
Practical experience in one large government enterprise found that authors would include this type of metadata only about 25 percent of the time, even when instructed to do so. Additionally, agencies often compile data from a vastly dispersed human network; there is no overseer who can control the production of all text documents in all enterprises and on the Internet. Geospatial metadata produced by humans for text documents is simply not effective.
In response to this problem, MetaCarta developed the Geographic Text Search (GTS) Appliance. Widely used by the U.S. Intelligence Community and in federal agencies as varied as the Department of Homeland Security and the Environmental Protection Agency, GTS geospatially enables unstructured text, bridging the gap between GIS and text search. The results appear on a map as clickable icons. The appliance is able to connect to GIS applications such as ESRI’s ArcMap, Object FX’s Spatial FX, CRF’s Terrain Analysis System (TAS), Intergraph’s GeoMedia, and Google Earth. GTS also connects to content management systems such as Documentum and Open Text. It is a scalable geographic text search solution, fully accredited in the federal government and supporting thousands of users searching many millions of documents across multiple enterprises every day.
The GeoParser, contained in both GTS and its sister product GeoTagger, uses a worldwide Geographic Data Module (GDM) with more than 9 million locations, the largest commercially available database of its kind in the world. The GDM pairs each location name with latitude and longitude values, along with an a priori term frequency value. Figure 1 illustrates a single entry in the base worldwide GDM: the entry for Paris, France.
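To make the shape of such an entry concrete, here is a minimal Python sketch of a gazetteer record holding a name, coordinates and an a priori value. The class name, field names and the prior of 0.9 are illustrative assumptions, not MetaCarta's actual schema or the values shown in Figure 1; only the approximate coordinates of Paris are factual.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GdmEntry:
    """Hypothetical gazetteer record; field names are assumptions."""
    name: str
    latitude: float              # decimal degrees, north positive
    longitude: float             # decimal degrees, east positive
    term_frequency_prior: float  # illustrative a priori value in [0, 1]

    def __post_init__(self):
        # Reject values that cannot describe a point on the globe.
        if not -90.0 <= self.latitude <= 90.0:
            raise ValueError("latitude must be within [-90, 90]")
        if not -180.0 <= self.longitude <= 180.0:
            raise ValueError("longitude must be within [-180, 180]")
        if not 0.0 <= self.term_frequency_prior <= 1.0:
            raise ValueError("prior must be within [0, 1]")


# The prior below is made up for illustration.
paris = GdmEntry("Paris, France", 48.8566, 2.3522, term_frequency_prior=0.9)
```

A real gazetteer would hold many entries per name, one for each place that shares it, which is what makes the a priori value useful as a starting guess.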
The a priori term frequency values are derived from an extensive process of geographic term collection and testing, using nearly a billion documents along with automated tuning tools created by world-renowned computational linguists in MetaCarta’s laboratory near MIT in Cambridge, Mass. The GDM also contains an extensive set of rules created by MetaCarta’s natural language scientists and used within GeoParser processing, resulting in the most powerful geographic term analysis available in the world.
GTS indexes all text in a document into a CartaTrees index, along with special confidence and relevance values for geographic terms. This fast indexing system enables instant results even across document repositories of hundreds of millions of documents.
The recently released GTS 3.5 offers important new capabilities such as saved queries, notification by e-mail or pager, regional search and a 2D histogram feature that can show the worldwide distribution of documents as shown in Figure 2.
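A 2D histogram of this kind can be approximated by counting geotagged documents per latitude/longitude grid cell. The Python sketch below is one simple way to do that; the function name, the 10-degree cell size and the sample points are assumptions for illustration and say nothing about how GTS 3.5 computes its display.

```python
import math
from collections import Counter


def histogram_2d(points, cell_degrees=10.0):
    """Count (lat, lon) points per grid cell; returns {(row, col): count}."""
    rows = math.ceil(180.0 / cell_degrees)
    cols = math.ceil(360.0 / cell_degrees)
    counts = Counter()
    for lat, lon in points:
        # Shift to non-negative ranges, then clamp so that lat=90 or
        # lon=180 falls into the last cell instead of a cell off the grid.
        row = min(math.floor((lat + 90.0) / cell_degrees), rows - 1)
        col = min(math.floor((lon + 180.0) / cell_degrees), cols - 1)
        counts[(row, col)] += 1
    return counts


# Two documents located near Paris and one near Washington, D.C.
sample_points = [(48.86, 2.35), (48.0, 3.0), (38.9, -77.0)]
sample_counts = histogram_2d(sample_points)
```

Each cell count could then be mapped to a color or bar height on a world map to show where documents concentrate.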
One of the key technical challenges that MetaCarta had to overcome was to disambiguate geographic references from non-geographic references, as humans do, in unstructured text.
GTS and GeoTagger contain the MetaCarta Geographic Reference Engine (GRE) module, which uses Natural Language Processing (NLP) to identify candidate geographic references in documents, determine the meaning of each reference, and assign it latitude/longitude values along with a variety of confidence and relevance values. Software programs called ‘entity extractors’ identify and tag words or phrases referring to people, organizations, events or place names. Many products perform entity extraction, but very few also perform meaning resolution. That is, very few provide the actual meaning intended by the author, which is very important when a term does not uniquely identify its concept.
To accomplish this task, MetaCarta products must go beyond entity extraction and supply entity meaning. For example, a person’s name appears in a biographic article:
George W. Bush settles with his family at Bush Prairie near Tumwater in November, 1845. Stated otherwise: In November, 1845, George W. and Isabella James Bush and their five sons settle near Tumwater on a fertile plain that comes to be known as Bush Prairie.
Most entity extraction tools would correctly identify George W. Bush as a person, but could not determine that this reference is to George Washington Bush (1790-1863), an experienced frontiersman and successful farmer, and not to George Walker Bush, the current President of the United States. An entity meaning tool uses additional analysis functions to take advantage of other clues, such as birth and death dates, wife’s name and number of sons, to determine which of the many possible George W. Bushes the article actually refers to.
MetaCarta GTS and MetaCarta GeoTagger are the only commercially available tools in the world that provide geographic meaning resolution. MetaCarta reaches the 80 percent of data that other programs cannot.
GTS functions as a human does by identifying candidate place names and examining surrounding clues that determine to which specific location the author is referring. If an initial sentence says, “I drove through Rockville yesterday,” there are few clues, so we might assume the reference is to Rockville, Maryland, because it is statistically the most probable. However, if the sentence reads “I drove through Rockville and Hartford on my way to New London to see the submarine base,” the probability that Rockville in fact refers to Rockville, Connecticut, rises significantly and the probability that it is Rockville, Maryland drops, but not to zero. While other geographic entity tools can identify place names such as Rockville, Hartford and New London in text, they cannot determine to which of the over 55 places named Rockville, the 45 named Hartford and the 30 named New London the document refers. MetaCarta GTS can make that determination.
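The Rockville example can be mimicked with a toy scoring scheme: start from a prior for each candidate, discount it by distance to the other places mentioned, and normalize. The Python sketch below is an illustration only and is not MetaCarta's algorithm. The priors, the 100 km decay scale and the restriction to two Rockville candidates are assumptions, and the context places are treated as already resolved, which a real system could not take for granted.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def resolve(candidates, context_points, scale_km=100.0):
    """candidates: {label: (lat, lon, prior)}; returns {label: probability}."""
    scores = {}
    for label, (lat, lon, prior) in candidates.items():
        score = prior
        # Each nearby context place supports the candidate; far ones penalize it.
        for ctx_lat, ctx_lon in context_points:
            score *= math.exp(-haversine_km(lat, lon, ctx_lat, ctx_lon) / scale_km)
        scores[label] = score
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Approximate coordinates; the priors are made up for illustration.
ROCKVILLE = {
    "Rockville, Maryland": (39.084, -77.153, 0.8),
    "Rockville, Connecticut": (41.867, -72.450, 0.2),
}
# Hartford and New London, Connecticut, assumed already resolved.
CONTEXT = [(41.766, -72.673), (41.356, -72.100)]

no_clues = resolve(ROCKVILLE, [])
with_clues = resolve(ROCKVILLE, CONTEXT)
```

With no context the result simply reflects the priors, so Maryland wins. With Hartford and New London as clues, Connecticut becomes far more probable while Maryland keeps a small, non-zero probability.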
Entity meaning resolution is critical for search and for most discovery applications. Once the GRE within GTS identifies to which specific place the author has referred, it labels it with a unique identifier, such as latitude and longitude values, along with a probability value for each candidate location. Through extensive research, MetaCarta research and development has invented GeoParsing algorithms that perform geographic meaning resolution at very high speeds.
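Once each reference carries coordinates and a probability, searching by geography reduces to filtering on those values. The Python sketch below shows a bounding-box query over hypothetical tag records. The record layout, function name, threshold and sample data are all assumptions made for illustration, not the GTS interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeoTag:
    """Hypothetical resolved reference attached to a document."""
    doc_id: str
    place: str
    latitude: float
    longitude: float
    probability: float


def search_region(tags, south, west, north, east, min_probability=0.5):
    """Return doc ids with a confident tag in the box, best match first.

    Simplification: boxes that cross the 180-degree meridian are not handled.
    """
    best = {}
    for tag in tags:
        inside = (south <= tag.latitude <= north
                  and west <= tag.longitude <= east)
        if inside and tag.probability >= min_probability:
            best[tag.doc_id] = max(best.get(tag.doc_id, 0.0), tag.probability)
    return sorted(best, key=lambda doc_id: best[doc_id], reverse=True)


# Made-up documents and probabilities.
TAGS = [
    GeoTag("doc-1", "Rockville, Connecticut", 41.867, -72.450, 0.95),
    GeoTag("doc-1", "Rockville, Maryland", 39.084, -77.153, 0.05),
    GeoTag("doc-2", "Rockville, Maryland", 39.084, -77.153, 0.80),
    GeoTag("doc-3", "Hartford, Connecticut", 41.766, -72.673, 0.99),
]
# Rough bounding box around Connecticut.
connecticut_hits = search_region(TAGS, south=40.9, west=-73.8, north=42.1, east=-71.7)
```

Keeping the low-probability candidates in the index rather than discarding them lets a user lower the threshold and still find the less likely interpretations.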
Enabling Senior Decision Makers to Fuse Geography with Data
The National Geospatial-Intelligence Agency (NGA) is developing an application for senior decision makers that would enable them and their staffs to track news within their areas of interest quickly and easily, in real time. The key requirements are that it must be simple and intuitive to use, requiring no training, and that it must be web-based. NGA selected the Parsons Institute for Information Mapping (PIIM) of The New School in New York City as the prime contractor for this important task. PIIM combines information logic and scoring, information visualization, and engineering to develop interactive tools for rapidly understanding, analyzing, and responding to large amounts of complex data.
|Figure 3 Results page from PIIM’s GMT.|
The resulting application, the Geospace & Media Tool (GMT), is an integrated suite of commercial off-the-shelf products and custom-built software that will interweave real-time data. The goal of GMT is to present this data with a clarity that is normally not possible when attempting to analyze simultaneously disparate data sources such as news feeds, geospatial data, statistical information, biographical data, and organizational information on people and other entities.
|Figure 4 The GMT interface is user-friendly and easy to navigate.|
The GMT interface (Figures 3 and 4) will let users navigate easily through this data, and will offer scoring and compilation of news and information into organized files. It will give the user the ability to filter this information by location or topic, and will allow for visualization of the data through various means most easily recognizable by the user.
After extensive research, evaluation and testing, PIIM selected MetaCarta’s GeoTagger as the best solution available to execute the key tasks of identifying location place names within unstructured data, performing the necessary disambiguation among possible candidates, and assigning the corresponding latitude/longitude information.
By geographically locating the source of the topic of news articles or information, the MetaCarta GeoTagger can provide GMT with a quick and easy way to sort and study data of several different types without having to sift through mountains of other non-applicable information.
Additionally, by offering the geospatial aspect of the data to the visualization piece of the GMT, the MetaCarta GeoTagger allows data to be represented geographically for the user.
The MetaCarta GeoTagger component of the PIIM GMT is an essential piece of the overall value being offered to the highest level of decision makers and extends that value by offering the ability to see the world in a way to which we are all accustomed – based on geography.
MetaCarta technology is currently being used by highly diverse federal agencies and consistently delivers significant value in helping knowledge workers find the information they need in the mounds of data they must sift through. The same GeoTagger technology used by the Geospace & Media Tool is currently being used by the Defense Intelligence Agency (DIA) in its Direct-access User Knowledge Environment (DUKE) Information Management System. The integration of MetaCarta’s GeoTagger technology allows the DUKE system to exploit unstructured text messages, producing valuable geospatial content that is incorporated with other data for display and interpretation by the intelligence analyst.
The Environmental Protection Agency (EPA) has integrated MetaCarta technology into its Window to My Environment (WME), a web-based tool that provides the public with a wide range of federal, state and local environmental information on a particular geographic area of interest, through EPA’s website. The integration of MetaCarta’s unique technology will enable WME users to search and retrieve documents, reports, and other resources within a specific geographic region easily, reducing the time needed to find and analyze the most relevant results.
On the state and local level, the Arizona Counter-Terrorism Information Center (ACTIC), a unit combining 15 federal, state and local law enforcement agencies, deployed MetaCarta technology to assist with its intelligence-gathering efforts. The unit’s officers found MetaCarta GTS capabilities highly effective in searching open sources of information such as local newspaper websites on either side of the U.S.-Mexican border to gather intelligence about a specific location in a matter of seconds.
The technology proved itself especially useful during the Minuteman Project – an exercise organized by citizens to monitor illegal immigration across the U.S.-Mexican border – by enabling ACTIC’s officers to observe events on both sides of the border without having to send officers to Mexico, where tensions were already heightened because of the exercise. In a border security situation, MetaCarta technology allows law enforcement officers to see patterns of information and hotspots of activity, streamlining their process to arrive more rapidly at an appropriate actionable decision.
New applications of MetaCarta technology are nearly unlimited. The same patterns of information that are so critical in a border security context can also be applied to disaster recovery. After a disaster, federal, state and local agencies work together to restore services, rescue citizens and return order. With many entities generating reports and other documents, MetaCarta technology would quickly become indispensable.
Local law enforcement departments also encounter large amounts of unstructured data with geospatial references. Using MetaCarta technology, hotspots of activity would be quickly visible, giving the departments the ability to deploy their resources most effectively.
In order to keep government agencies and their analysts informed, technologists must provide the big picture. Using only structured data misses a large part of the story. The ability to search unstructured data, combined with the ability to identify geographic location, ensures that decision makers, watchstanders and first responders have the intelligence necessary to make timely, appropriate decisions.