Usage of GIS data has exploded in the past decade, especially for field use.
An aerial photo of the Seattle waterfront, as captured by MapMart.
Landsat 7 was launched in 1999 with 15-meter resolution for panchromatic imagery.
LiDAR data can be acquired aerially, terrestrially, or even by remotely controlled vehicles, shown here.
A set of LiDAR points representing a portion of the Grand Teton mountains, viewed as a MrSID file in LizardTech’s standalone GeoViewer. Data courtesy of Sanborn Map Company.
By Jon Skiffington, Director of Product Management, and Matt Fleagle, Senior Technical Writer
LizardTech, Seattle, Wash.
No one is more familiar with the headaches caused by the explosion of geospatial data in recent years than LizardTech. Though some in the industry may not know us by our company name, most geospatial professionals working with raster data have at least a nodding acquaintance with the MrSID image format, which LizardTech commercialized and developed starting in the mid-1990s. We now make software products and solutions that enable organizations to manage and distribute massive, high-resolution geospatial data such as aerial and satellite imagery and LiDAR data.
So we’re in a good position to address basic questions about handling “all that data.” Why is there so much of it these days? What kind of problems does the increasing volume of geospatial data in use and in distribution pose for industry professionals? Also, since we’re regarded as the experts in geospatial imaging technology, how does LizardTech help the industry get its work done, not only from a product standpoint but also technologically?
In addressing these questions, we’ll look at two issues that underlie them and that cause the headaches that users are experiencing today: the quantity of data they must work with – not merely the volume of data, but the size of files as well – and the distribution of that data. Then we’ll look at the kinds of things that can be done to address both issues.
Whence So Much Data?
Why does it seem that there is so much more geospatial data today and that files are larger? Well, because there is and they are. Even before global counterterrorism efforts and the Indian Ocean tsunami and Hurricane Katrina disasters brought the critical role of geospatial data into sharp relief, use of high-resolution imagery was on the rise for reasons as different as crop identification and civil litigation, and both technology and demand continue to leapfrog each other today. See Figure 1.
In the case of aerial orthophotography in the United States, more areas of the country are being flown all the time, and flown more often. Here in Washington State, for example, all 39 counties were flown in both 2006 and 2009 as part of the USDA’s National Agriculture Imagery Program (NAIP). See Figure 2. Three years earlier, in 2003, fewer than half of them were flown. How much of a state gets flown for NAIP – and how often – is determined by the availability of federal funds and the health of state budgets, but the trend suggests that the ideal would be “total coverage every year.”
Satellite and aerial imagery is also of increasingly high resolution. When Landsat 7 was launched in 1999 with state-of-the-art image capture technology, it carried a panchromatic (black and white) sensor with a resolution of 15 meters and a multispectral (including RGB) sensor with a resolution of 30 meters. See Figure 3. That same year, IKONOS was launched by then-Space Imaging (now GeoEye), with 1-meter panchromatic and 4-meter multispectral resolution. By comparison, GeoEye’s GeoEye-1, launched in 2008, offers 0.41-meter panchromatic and 1.65-meter multispectral resolution, and DigitalGlobe’s WorldView-2, launched in 2009, boasts 50-cm panchromatic and 2-meter multispectral resolution. Quick arithmetic shows that WorldView-2’s pixels are 15 to 30 times finer on a side than Landsat 7’s, so an image of the same region contains roughly 225 to 900 times as many pixels.
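The arithmetic behind that comparison is easy to check. Because pixels tile an area, the pixel count for a fixed region grows with the square of the linear resolution ratio. The sketch below uses the ground sample distances quoted above; the 10 km × 10 km scene is an arbitrary assumption, and since it cancels out, any area gives the same ratios.

```python
def pixels_for_area(area_m2: float, gsd_m: float) -> float:
    """Pixels needed to cover area_m2 at a ground sample distance of gsd_m meters."""
    return area_m2 / (gsd_m * gsd_m)

# Ground sample distances in meters, as quoted in the text.
LANDSAT7 = {"pan": 15.0, "ms": 30.0}
WORLDVIEW2 = {"pan": 0.5, "ms": 2.0}

AREA = 10_000.0 * 10_000.0  # hypothetical 10 km x 10 km scene, in square meters

for band in ("pan", "ms"):
    linear = LANDSAT7[band] / WORLDVIEW2[band]
    pixel_ratio = pixels_for_area(AREA, WORLDVIEW2[band]) / pixels_for_area(AREA, LANDSAT7[band])
    print(f"{band}: {linear:.0f}x finer per side, {pixel_ratio:.0f}x as many pixels")
```

Run as written, it reports 30x and 900x for panchromatic, and 15x and 225x for multispectral.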
Speaking of multispectral imagery, the increasing number of bands (or spectra) being imaged adds to the girth of image datasets. Aerial images of the mid-twentieth century were composed of a single panchromatic band. Landsat 7 was designed to sense light in seven bands; WorldView-2 is able to image in eight. These multispectral arrays still consist of discrete spectral bands, but hyperspectral imagery is also in increasing use; it samples many narrow, contiguous bands that together approximate a continuous spectrum, enabling much narrower analytical distinctions in geospatial applications.
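Uncompressed raster size grows linearly with the number of bands, as this minimal sketch shows. It ignores file headers, metadata and padding, and the 10,000 × 10,000-pixel scene at 8 bits per sample is a hypothetical example; many satellite sensors store deeper samples, which the `bits_per_sample` parameter accounts for.

```python
def raw_size_bytes(width: int, height: int, bands: int, bits_per_sample: int = 8) -> int:
    """Uncompressed raster size, ignoring headers, metadata and padding."""
    return width * height * bands * bits_per_sample // 8

# Hypothetical 10,000 x 10,000-pixel scene at 8 bits per sample.
for bands in (1, 3, 4, 8):
    size_mb = raw_size_bytes(10_000, 10_000, bands) / 1e6
    print(f"{bands} band(s): {size_mb:,.0f} MB")
```

Going from a single panchromatic band to eight multispectral bands multiplies the raw volume of the same scene by eight.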
We are not yet at the end of the increase. LiDAR data, which represents a three-dimensional view of the world through collected laser pulse returns, is enjoying increasing use as well. Only last year, the alley behind LizardTech’s building was scanned using terrestrial LiDAR equipment. The exact location of every brick in every wall has been recorded in anticipation of a potential deep-bore tunnel project under a nearby street, so that the state’s Department of Transportation will have data to consult if property owners later claim that the tunnel construction caused their buildings to move.
Like its raster cousin, LiDAR data can be sensed aerially from above, laterally via terrestrial imaging, and even from vehicles using mobile technologies. See Figure 4. And like imaging sensors, LiDAR sensors are enjoying swift advances that make it easier to collect more and more data cheaply and quickly. In the case of our back alley, measurements must be taken in both summer and winter and at high and low tides, because temperature and tide levels both cause buildings to shift slightly. That’s going to be a very large repository of data stored against the mere possibility of litigation.
Because higher resolution means that more files of the same size are needed to cover a given region, users mosaic those files together to keep the data usable, at the cost of working with significantly larger files. Thus file sizes increase along with the quantities of data. And even if resolution were not getting finer, users would still want to create mosaics as large as they possibly could. It is an instance of the old maxim that we always consume whatever resources become available: as technology enables us to handle file sizes that once bedeviled us, we want to pack yet more data into our files.
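The growth in file size from mosaicking follows directly from geometry. A mosaic must cover the union of its tiles’ footprints, so its pixel dimensions come from the combined extent. The sketch below assumes north-up tiles that share one pixel size. The `Tile` class and the 2 × 2 grid of 5,000-pixel tiles are hypothetical illustrations, not any particular product’s data model.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tile:
    """A north-up tile: upper-left corner in map units, size in pixels."""
    min_x: float
    max_y: float
    width_px: int
    height_px: int

def mosaic_shape(tiles: list[Tile], pixel_size: float) -> tuple[int, int]:
    """Return (width_px, height_px) of a mosaic covering all tiles at a common pixel size."""
    min_x = min(t.min_x for t in tiles)
    max_x = max(t.min_x + t.width_px * pixel_size for t in tiles)
    max_y = max(t.max_y for t in tiles)
    min_y = min(t.max_y - t.height_px * pixel_size for t in tiles)
    return round((max_x - min_x) / pixel_size), round((max_y - min_y) / pixel_size)

# Hypothetical 2 x 2 grid of 5,000 x 5,000-pixel tiles at 1 m resolution.
tiles = [Tile(x * 5000.0, y * 5000.0, 5000, 5000) for x in range(2) for y in range(1, 3)]
print(mosaic_shape(tiles, 1.0))  # (10000, 10000)
```

Four tiles become one file with four times the pixels of any single tile, and the effect compounds as the mosaic grows.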
Imagery is now expected in more environments than ever. A simple phenomenon that anyone can observe is the addition of new kinds of imagery to tools we use online every day. Ten years ago, only those in the geospatial industry used this kind of imagery on a daily basis. Even five years ago, you didn’t expect raster imagery to become available in navigational tools such as MapQuest until you had zoomed in a great deal.
Today, when you open MapQuest, or any number of newer websites such as Zillow’s real estate locator tool, you can immediately view satellite imagery at the overview level. In Google Earth, when you’ve zoomed down as far as overhead imagery will allow, you can switch to Google Street View to look at street fronts laterally as from a car window. Microsoft’s Bing Maps offers oblique “bird’s eye” views in many urban and suburban areas. All this imagery loads quickly, and it’s fair to imagine that users of online applications are becoming accustomed to navigating maps primarily by recognizing features in raster data rather than by vector representations.
It takes a lot of server power to make all this happen. We just noted that geospatial data is contained in increasingly large files. Sizes vary a lot, but a typical digital orthophoto quarter quad (DOQQ) might measure 50 megabytes (MB) for a grayscale image and 150 MB for RGB plus infrared (RGB-IR), and mosaics are routinely made up of thousands of such images. All this data is hard to transfer from provider to user, or pull into a viewing application across a network.
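A rough calculation shows why moving mosaics of this size is hard. The sketch takes the 150 MB RGB-IR DOQQ figure from the text. The 2,000-tile count and the link speeds are hypothetical assumptions, and protocol overhead and network contention are ignored, so real transfers would be slower.

```python
def transfer_hours(size_bytes: float, link_mbit_per_s: float) -> float:
    """Time to move size_bytes over a link, ignoring protocol overhead and contention."""
    return size_bytes * 8 / (link_mbit_per_s * 1e6) / 3600

TILE_MB = 150       # RGB-IR DOQQ size quoted in the text
TILE_COUNT = 2_000  # hypothetical mosaic of "thousands" of tiles

total_bytes = TILE_MB * 1e6 * TILE_COUNT
print(f"mosaic: {total_bytes / 1e9:,.0f} GB")
for mbps in (10, 100, 1000):
    print(f"{mbps:>5} Mbit/s: {transfer_hours(total_bytes, mbps):,.1f} h")
```

At these assumptions the mosaic totals 300 GB. That works out to nearly seven hours at 100 Mbit/s and almost three days at 10 Mbit/s.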
Both the size and distribution of geospatial data have been addressed in a number of ways. Several file formats have been adopted by the industry for compressing imagery, such as the ECW format developed by ER Mapper, the ISO standard JPEG 2000, and LizardTech’s MrSID format. Compressing imagery not only makes it easier to store and view but is also the first and simplest means of facilitating distribution across networks.
Hailed initially as the new geospatial standard, JPEG 2000 has been slow to achieve traction in the industry because of the complexity of the options exposed to the creators of images, but it remains the most flexible format for encoding compressed imagery for specific workflows. One problem LizardTech solved for creators of JPEG 2000 imagery was compliance with government-mandated “encoding profiles” such as “NITF Preferred JPEG2000 Encoding” (NPJE) and “Exploitation Preferred JPEG2000 Encoding” (EPJE). These profiles were meant to ensure that the right JPEG 2000 parameters were used to encode images bound for given workflows, such as rapid decoding at full resolution or fast panning and zooming around the image. They were difficult for image providers to implement until LizardTech introduced predefined groups of settings for these and other profiles in its GeoExpress compression and image manipulation software.
LizardTech’s proprietary format, MrSID, goes even further, making optimizing decisions under the hood so that good results are achieved regardless of the workflow. In this model, the creators of the imagery have a little less control over very advanced parameters, but are not overwhelmed by choices that can come back to bite them later. And MrSID has proved successful. Over the years LizardTech has released three versions of the MrSID format – MrSID Generation 2 (MG2) in the mid-1990s, MG3 in 2002 and, most recently, MG4, which supports both raster imagery and LiDAR data. MG3, MG4 and JPEG 2000 support lossless compression. With lossless compression, file sizes can typically be reduced by about half while retaining literally every “bit” of the original image, and the raw original can be recovered exactly from the lossless MrSID or JPEG 2000 file.
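The lossless modes of MrSID and JPEG 2000 are wavelet-based, and neither is reproduced here. To illustrate the defining property of lossless compression (the decompressed bytes are identical to the original), this sketch substitutes Python’s standard `zlib` module (DEFLATE), a general-purpose lossless codec. The synthetic gradient “image” is an assumption made for illustration. Real imagery compresses less predictably, so no particular ratio should be read into the output.

```python
import zlib

# Synthetic 8-bit grayscale "image": a smooth gradient with a small repeating texture.
# (Stand-in data; DEFLATE stands in for the wavelet codecs discussed in the text.)
WIDTH, HEIGHT = 512, 512
pixels = bytes((x // 4 + y // 4 + (x * y) % 3) % 256 for y in range(HEIGHT) for x in range(WIDTH))

compressed = zlib.compress(pixels, level=9)
restored = zlib.decompress(compressed)

assert restored == pixels  # lossless: every byte of the original survives the round trip
print(f"original {len(pixels):,} B, compressed {len(compressed):,} B, "
      f"ratio {len(pixels) / len(compressed):.1f}:1")
```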
When the Maine Department of Transportation (MaineDOT) needed to make aerial imagery available digitally to internal agencies and the public at large, it used LizardTech’s GeoExpress software to compress 960 GB of raw imagery to 80 GB in MrSID format. Its workflow included not only spring- and fall-flown aerial imagery acquired through contracted flights, but also a backlog of 40,000 older, paper-based aerial images, which were scanned to TIFF and then compressed. MaineDOT achieved both storage and distribution efficiency while maintaining image quality at compression ratios from 12:1 up to 20:1.
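The MaineDOT figures can be related with two one-line helpers. The overall 960 GB to 80 GB reduction works out to 12:1, the low end of the quoted range. At the high end, 20:1, the same raw imagery would occupy 48 GB. The helper names are illustrative only.

```python
def compression_ratio(original: float, compressed: float) -> float:
    """Ratio of original to compressed size (both in the same units)."""
    return original / compressed

def compressed_size(original: float, ratio: float) -> float:
    """Size after compressing `original` at a given ratio (e.g. 12 for 12:1)."""
    return original / ratio

print(f"MaineDOT overall: {compression_ratio(960, 80):.0f}:1")  # 960 GB -> 80 GB
for r in (12, 20):
    print(f"960 GB at {r}:1 -> {compressed_size(960, r):.0f} GB")
```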
LiDAR data, which can be thought of as a “cloud” of points that reveal topographies and topographic features in three-dimensional space, has presented fresh challenges for everyone. The result of a contracted flyover, a raw point cloud is represented by thousands of LAS files of a certain point density – the greater the density, the greater the size and number of the files. Accordingly, one of the problems users of LiDAR data experience is that they can’t easily use all the raw LiDAR data they’ve paid for. Trying to navigate regions divided up among so many separate files is debilitating.
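Raw LiDAR volume scales linearly with both survey area and point density. The sketch uses the per-point record sizes of two LAS point data record formats: format 0 is 20 bytes, and format 1 adds an 8-byte GPS time for 28 bytes. The 100 km² survey and the densities are hypothetical, and file headers and variable-length records are ignored.

```python
# Bytes per point for two LAS point data record formats:
# format 0 holds X, Y, Z, intensity, return/classification fields and a few others;
# format 1 adds an 8-byte GPS time.
POINT_RECORD_BYTES = {0: 20, 1: 28}

def las_point_bytes(area_km2: float, density_pts_per_m2: float, record_format: int = 1) -> float:
    """Approximate raw point-record bytes for a survey, ignoring headers and VLRs."""
    points = area_km2 * 1e6 * density_pts_per_m2
    return points * POINT_RECORD_BYTES[record_format]

# Hypothetical 100 km^2 survey at several point densities.
for density in (1, 4, 8):
    print(f"{density} pts/m^2: {las_point_bytes(100, density) / 1e9:.1f} GB")
```

Doubling the density doubles the raw volume. At 8 points per square meter, this modest survey already reaches about 22 GB before it is split into tiles.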
LiDAR is unlike raster data in ways that make it very difficult to compress. While raster images are composed of pixels all lying next to each other in a regular grid, LiDAR points are irregularly spaced, with no underlying grid. Most efficiencies of proximity and probability that aid the raster compression process are useless in LiDAR compression.
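One concrete consequence of the missing grid is that a raster never stores where its pixels are. A pixel’s position follows from its row and column plus a single shared transform. Each LiDAR point, by contrast, must carry its own coordinates, and LAS stores X, Y and Z as 4-byte scaled integers. The sketch below contrasts the two. The function names and origin coordinates are hypothetical, and the byte count covers location only, not elevation or intensity values.

```python
def pixel_center(row: int, col: int, origin_x: float, origin_y: float,
                 pixel_size: float) -> tuple[float, float]:
    """Map coordinates of a north-up raster pixel's center; nothing per-pixel is stored."""
    return (origin_x + (col + 0.5) * pixel_size, origin_y - (row + 0.5) * pixel_size)

def raw_coordinate_bytes(n: int) -> dict[str, int]:
    """Bytes spent on *location* alone for n samples."""
    return {
        "raster": 0,            # position implied by (row, col) plus one shared transform
        "point cloud": n * 12,  # X, Y, Z per point as 4-byte scaled integers, as in LAS
    }

print(pixel_center(0, 0, 500_000.0, 5_300_000.0, 1.0))  # (500000.5, 5299999.5)
print(raw_coordinate_bytes(1_000_000))
```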
LizardTech had to come up with brand new algorithms for compressing LiDAR data, but the effort paid off. LizardTech’s LiDAR Compressor software mosaics those large and numerous LAS files together and compresses them to a single LiDAR file in MrSID format in the same way that GeoExpress mosaics image tiles into one big MrSID raster image. This enables users to put more of their data to work. See Figure 5.
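LizardTech’s LiDAR compression algorithms are proprietary and are not shown here. The sketch illustrates only the mosaicking idea: points from many tiles are gathered into one set, and the single output file needs one overall extent instead of one per tile. `PointTile` and `merge_tiles` are hypothetical names, and plain tuples stand in for real LAS records.

```python
from dataclasses import dataclass

Point = tuple[float, float, float]  # (x, y, z) in map units

@dataclass
class PointTile:
    """Hypothetical in-memory stand-in for the points of one LAS file."""
    points: list[Point]

def merge_tiles(tiles: list[PointTile]) -> tuple[list[Point], tuple[float, ...]]:
    """Concatenate the points of all tiles and compute the merged extent
    (min_x, min_y, min_z, max_x, max_y, max_z) that a single output file must describe."""
    merged = [p for tile in tiles for p in tile.points]
    if not merged:
        raise ValueError("no points to merge")
    xs, ys, zs = zip(*merged)
    return merged, (min(xs), min(ys), min(zs), max(xs), max(ys), max(zs))

a = PointTile([(0.0, 0.0, 10.0), (5.0, 2.0, 12.0)])
b = PointTile([(10.0, 1.0, 9.0), (12.0, 8.0, 15.0)])
points, bounds = merge_tiles([a, b])
print(len(points), bounds)  # 4 (0.0, 0.0, 9.0, 12.0, 8.0, 15.0)
```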
What Servers Can Do
Compressed image formats help with the distribution issue merely by making files smaller so that they take up less bandwidth in transfer, but even more can be done with image servers. Numerous server applications are available, from proprietary solutions such as ERDAS Image Web Server, ESRI ArcGIS Server, and LizardTech Express Server, to open source technologies such as MapServer and MapGuide. Many servers support open standards, such as the Open Geospatial Consortium’s Web Map Service (WMS).
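With WMS, a client requests a rendered image of a given area and size through a plain HTTP query string. The sketch builds a WMS 1.1.1 GetMap request. The endpoint URL, layer name and bounding box are hypothetical placeholders, and a real server advertises its actual layers and formats in its GetCapabilities response.

```python
from urllib.parse import urlencode

def wms_getmap_url(endpoint: str, layer: str, bbox: tuple[float, float, float, float],
                   width: int, height: int, srs: str = "EPSG:4326",
                   fmt: str = "image/jpeg") -> str:
    """Build an OGC WMS 1.1.1 GetMap request; bbox is (minx, miny, maxx, maxy) in `srs` units."""
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.1.1",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty means the server's default style
        "SRS": srs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": fmt,
    }
    return f"{endpoint}?{urlencode(params)}"

# Hypothetical server, layer and bounding box (longitude/latitude).
print(wms_getmap_url("https://example.com/wms", "orthoimagery",
                     (-122.36, 47.59, -122.32, 47.62), 800, 600))
```

Version 1.1.1 is used here to avoid the axis-order rule in WMS 1.3.0, which uses a `CRS` parameter and latitude/longitude order for EPSG:4326.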
LizardTech’s Express Server image-serving software leverages the strengths of an existing image server, speeding up image delivery and enabling user-defined extractions based on scene, resolution, or quality. Express Server achieves this by taking advantage of the fact that multiple resolutions of an image are contained within a MrSID or JPEG 2000 file – like an image pyramid, but without the storage overhead. This way, if a user zooms in to a small region of an image on their viewing client, the server can extract all the image “quality” – the full-resolution image detail – for that portion of the image and send it to the client immediately, rather than having to decode the entire image.
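The multiresolution idea can be sketched in two parts. Wavelet-based files such as MrSID and JPEG 2000 embed successively halved resolutions. A server can therefore decode only the coarsest level that still satisfies the client’s display scale, and only for the region requested. The level-selection rule and function names below are hypothetical illustrations, not Express Server’s actual logic. The second function shows the storage a conventional, separately stored pyramid would add: each level has a quarter of the pixels of the one below, for a total of about one third of the full image.

```python
import math

def choose_level(full_res_m: float, needed_res_m: float, max_level: int) -> int:
    """Pick the coarsest level (0 = full resolution, each level halving resolution)
    whose ground pixel size is no coarser than what the display needs."""
    if needed_res_m <= full_res_m:
        return 0
    level = int(math.floor(math.log2(needed_res_m / full_res_m)))
    return min(level, max_level)

def pyramid_overhead(levels: int) -> float:
    """Extra storage of a conventional pyramid kept alongside the full image,
    as a fraction of the full image (each level has 1/4 the pixels of the one below)."""
    return sum(0.25 ** k for k in range(1, levels + 1))

# Hypothetical 0.5 m image viewed at about 4 m per screen pixel: level 3 (0.5 * 2**3 = 4 m).
print(choose_level(0.5, 4.0, max_level=8))  # 3
print(f"{pyramid_overhead(8):.3f}")         # 0.333
```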
These were the efficiencies sought by the State of New Jersey’s Office of Geographic Information Systems (OGIS). OGIS had an archive of over 900 gigabytes of losslessly compressed JPEG 2000 files stored on a SAN (storage area network). It needed an extensible solution that would provide easy access to this imagery, as well as all future datasets, for state residents, the GIS community, and the online public. The solution also needed to enable WMS access to imagery from file-based storage for easy image management. OGIS deployed two load-balanced installations of Express Server for performance and reliability, using them to support two public applications (Information Warehouse and Business Map; see http://njwebmap.state.nj.us) and to provide WMS services to other users who needed imagery.
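The article does not say how OGIS balanced its two installations. The following is a generic sketch of one common approach: round-robin distribution with a health check, so that requests keep flowing if one server goes down. The host names and the health-check callback are hypothetical. In practice this job is usually handled by a dedicated load balancer or reverse proxy rather than application code.

```python
from itertools import cycle
from typing import Callable

def make_balancer(backends: list[str], is_healthy: Callable[[str], bool]) -> Callable[[], str]:
    """Return a function that hands out backends in round-robin order,
    skipping any that fail the health check."""
    ring = cycle(backends)

    def next_backend() -> str:
        for _ in range(len(backends)):
            candidate = next(ring)
            if is_healthy(candidate):
                return candidate
        raise RuntimeError("no healthy backends")

    return next_backend

# Hypothetical host names for two image-server installations.
down: set[str] = set()
pick = make_balancer(["img1.example.internal", "img2.example.internal"], lambda h: h not in down)
print([pick() for _ in range(4)])  # alternates between the two hosts
down.add("img1.example.internal")
print([pick() for _ in range(2)])  # only img2 while img1 is down
```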
The challenges continue. As new technologies for analyzing the earth are developed and become commonplace in the geospatial environment, LizardTech will be figuring out ways to make raster, LiDAR and other data easy to use in applications and to transfer over networks.