Results

Year One

During the first year of the project (Aug. 2009 to Aug. 2010), we focused on 1) developing improved image representations, and 2) implementing a pipeline for automatically compiling a dataset of sample object images using the USGS GNIS and National Map.

We developed bag-of-visual-words (BOVW) methods for performing land-use image retrieval and classification in high-resolution overhead imagery. With regard to image retrieval, we completed a thorough investigation of order-less (non-spatial) BOVW approaches on an extensive 21-class land-use dataset manually extracted from 1-foot resolution aerial imagery publicly available from the USGS National Map. We examined: the effect of the size of the visual vocabulary; whether it is better to use the Mahalanobis or Euclidean distance to perform the k-means clustering that computes the quantization codebook; how many sample points should be used to perform the clustering; and which distance measure is best for comparing the resulting histograms. We determined: that BOVW approaches outperform other image descriptors such as color and texture; that performance continues to increase with larger vocabularies although it does eventually plateau; that the Euclidean distance is better than the Mahalanobis distance for creating the quantization codebook; that clustering larger numbers of points to create the codebook is better; and that the L1 distance outperforms other histogram comparison methods. The results of this work are being prepared for journal publication.
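
To make the pipeline concrete, the following Python sketch shows the core of a non-spatial BOVW retrieval system under the settings our experiments favored (Euclidean k-means for the codebook, L1 distance for histogram comparison). It is a minimal sketch rather than our experimental code; the extraction of local descriptors is assumed to happen elsewhere.

    import numpy as np
    from scipy.cluster.vq import kmeans2, vq

    def build_codebook(descriptors, vocab_size=1000):
        # Standard (Euclidean) k-means; our experiments found this
        # preferable to the Mahalanobis distance for codebook
        # construction, and that clustering more sample points
        # yields a better codebook.
        codebook, _ = kmeans2(descriptors.astype(np.float64),
                              vocab_size, minit='++')
        return codebook

    def bovw_histogram(descriptors, codebook):
        # Quantize each local descriptor (e.g., SIFT) to its nearest
        # visual word and accumulate the counts into a histogram.
        words, _ = vq(descriptors.astype(np.float64), codebook)
        hist = np.bincount(words, minlength=len(codebook)).astype(np.float64)
        return hist / max(hist.sum(), 1.0)  # L1-normalize

    def retrieve(query_hist, database_hists):
        # The L1 distance outperformed the other histogram
        # comparison methods we evaluated.
        dists = np.abs(database_hists - query_hist).sum(axis=1)
        return np.argsort(dists)  # database indices, most similar first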

We also investigated BOVW methods for land-use classification. Using the same 21-class dataset, we investigated non-spatial BOVW representations as well as two spatial extensions in a support vector machine classification framework: the spatial pyramid match kernel (SPMK) [Lazebnik et al., CVPR 2006], which considers the absolute spatial arrangement of the image features, and a novel method which we term the spatial co-occurrence kernel (SCK), which considers the relative arrangement. These extensions are motivated by the importance of spatial structure in geographic data. We compared the BOVW features to standard features such as color and texture. We showed that while the BOVW-based approaches do not perform better overall than the best standard approach, color histograms extracted in the HLS colorspace, they represent a robust alternative that is more effective for certain classes. We also showed that our SCK extension consistently improves upon a non-spatial BOVW baseline as well as the SPMK. For more details, please see our 2010 ACM SIGSPATIAL GIS paper Bag-Of-Visual-Words and Spatial Extensions for Land-Use Classification. We anticipate that the BOVW representations will play a significant role in the object appearance models we will develop in the subsequent stages of this project since the objects are typically composed of one or more of the land-use classes.
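
For readers unfamiliar with the SPMK, the sketch below illustrates how a spatial-pyramid kernel can be assembled from a per-pixel map of visual word indices. The level weighting follows Lazebnik et al.'s formulation, while the input representation and grid handling are simplifying assumptions; our SCK instead counts pairwise co-occurrences of words, as sketched in the Year Two discussion below.

    import numpy as np

    def spm_histogram(word_map, levels, vocab_size):
        # word_map: 2-D array of visual word indices, one per feature site.
        hists = []
        for l in range(levels + 1):
            cells = 2 ** l
            # Standard SPMK level weights: coarser levels count less.
            weight = (1.0 / 2 ** levels if l == 0
                      else 1.0 / 2 ** (levels - l + 1))
            row_blocks = np.array_split(np.arange(word_map.shape[0]), cells)
            col_blocks = np.array_split(np.arange(word_map.shape[1]), cells)
            for rows in row_blocks:
                for cols in col_blocks:
                    block = word_map[np.ix_(rows, cols)].ravel()
                    hists.append(weight * np.bincount(block,
                                                      minlength=vocab_size))
        return np.concatenate(hists)

    def spm_kernel(h1, h2):
        # Histogram intersection of the weighted, concatenated
        # histograms; a Gram matrix of these values can be passed to
        # an SVM with a precomputed kernel, e.g.
        # sklearn.svm.SVC(kernel='precomputed').
        return np.minimum(h1, h2).sum()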

During the first year, we also implemented a processing pipeline for compiling sample object images. Jesus Pulido, an undergraduate researcher supported by the grant, designed a system which first queries the USGS GNIS gazetteer for object instances. It then uses the point locations of the returned records to download high-resolution aerial imagery from the USGS National Map. We have so far downloaded hundreds of sample images for a number of object classes including schools, parks, reservoirs, and stadiums. Jesus also extended our content-based geographic image retrieval (CBGIR) demo so that users could use the sample object images as query examples. This demo is described in our 2010 ACM SIGSPATIAL GIS demo paper and will be available online soon.
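
The pipeline's two stages can be sketched as follows. The endpoints, query parameters, and record fields below are hypothetical placeholders standing in for the actual GNIS and National Map services, which have their own interfaces; the sketch only conveys the query-then-download structure.

    import requests

    GNIS_ENDPOINT = "https://gnis.example.org/query"         # placeholder URL
    IMAGERY_ENDPOINT = "https://imagery.example.org/export"  # placeholder URL

    def fetch_object_samples(feature_class, out_dir, half_width_deg=0.005):
        # Stage 1: query the gazetteer for instances of an object
        # class, e.g. "School"; field names are illustrative only.
        records = requests.get(GNIS_ENDPOINT,
                               params={"featureClass": feature_class},
                               timeout=60).json()["features"]
        for rec in records:
            lat, lon = rec["latitude"], rec["longitude"]
            # Stage 2: download a high-resolution image chip centered
            # on the gazetteer's point location.
            bbox = (lon - half_width_deg, lat - half_width_deg,
                    lon + half_width_deg, lat + half_width_deg)
            chip = requests.get(IMAGERY_ENDPOINT,
                                params={"bbox": ",".join(map(str, bbox))},
                                timeout=60)
            with open(f"{out_dir}/{rec['id']}.tif", "wb") as f:
                f.write(chip.content)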

Year Two

During the second year of the project (Aug. 2010 to Aug. 2011), we 1) completed our work on spatial extensions to bag-of-visual-words (BOVW) for land-use/land-cover classification; 2) performed preliminary work on geospatial object models; and 3) developed the PedSeg application.

We completed our work on spatial extensions to bag-of-visual-words (BOVW) techniques for land-use/land-cover (LULC) classification. We developed a novel spatial pyramid co-occurrence kernel (SPCK) for use in SVMs or other kernel-based analyses. In this approach, images are characterized by the co-occurrence of visual words with respect to spatial predicates such as proximity or relative angular position, much like Haralick's grey-level co-occurrence matrices characterize image texture. The co-occurrences are computed at multiple scales in a pyramid configuration. The kernel is then computed as the histogram intersection between the hierarchical co-occurrence matrices. An early single-level variation of this technique, the spatial co-occurrence kernel (SCK), was published as a full paper titled "Bag-of-visual-words and spatial extensions for land-use classification" at the 2010 ACM SIGSPATIAL International Symposium on Advances in Geographic Information Systems. The more recent SPCK work has been accepted for publication in a paper titled "Spatial pyramid co-occurrence for image classification" at the 2011 IEEE International Conference on Computer Vision.
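
A minimal sketch of the single-level co-occurrence computation may help fix ideas: given feature positions and their visual word labels, we count word pairs that satisfy a spatial predicate, here simple proximity with an assumed radius. The published SPCK additionally computes these matrices over a pyramid of image subdivisions and supports other predicates such as relative angular position.

    import numpy as np
    from scipy.spatial import cKDTree

    def cooccurrence_matrix(positions, words, vocab_size, radius=32.0):
        # positions: (N, 2) feature coordinates; words: (N,) visual
        # word indices for the same features.
        C = np.zeros((vocab_size, vocab_size))
        tree = cKDTree(positions)
        # All pairs satisfying the proximity predicate.
        for i, j in tree.query_pairs(radius):
            C[words[i], words[j]] += 1
            C[words[j], words[i]] += 1
        return C / max(C.sum(), 1.0)  # normalize so images are comparable

    def sck(C1, C2):
        # Histogram intersection between co-occurrence matrices.
        return np.minimum(C1, C2).sum()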

More recently, we have investigated hierarchical models for characterizing complex geospatial objects such as high schools and golf courses. This modeling is key to the project's goal of using overhead images to estimate the spatial extent of geospatial objects to update gazetteers (gazetteers are currently deficient in that they specify the spatial extent using a single latitude/longitude point). This work leverages our previous work on LULC classification. Our preliminary work on this problem has resulted in a model with three levels. At the lowest level, an image is characterized using quantized local invariant features. These are then aggregated into a BOVW representation at an intermediate level and classified into a number of LULC classes using soft assignment. The top level characterizes the object as a distribution over the LULC classes. We feel the intermediate, latent LULC representation is key to our approach as it allows us to bridge the gap between the low-level image features and the high-level objects. We demonstrated this work on a manually created ground truth dataset with four object types: high schools, golf courses, mobile home parks, and Costco stores.
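
The three levels can be summarized in a short sketch. The LULC classifier below is a hypothetical stand-in for any model exposing per-class probabilities (for example, one trained on our 21-class dataset), and the tiling of the image into regions is assumed to happen upstream.

    import numpy as np

    def object_signature(tile_bovw_hists, lulc_classifier):
        # tile_bovw_hists: (T, V) BOVW histograms, one per image tile
        # (the bottom and intermediate levels of the hierarchy).
        # Soft-assign each tile to the LULC classes...
        tile_probs = lulc_classifier.predict_proba(tile_bovw_hists)  # (T, C)
        # ...then characterize the object (top level) as a distribution
        # over LULC classes, e.g. a high school as a mix of buildings,
        # parking lots, and sports fields.
        return tile_probs.mean(axis=0)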

Finally, we developed PedSeg, a system for computing the boundaries of visually distinct geospatial objects by applying advanced segmentation techniques such as active contours to high resolution overhead imagery. The novel aspect of this work is that the image segmentation is seeded with a GPS track acquired by simply walking around or otherwise traversing the approximate boundary of the target object (thus the prefix Ped) with a low-cost GPS logger. The system is completely automated once the GPS track is uploaded to the system. High resolution imagery is downloaded from the USGS National Map and the GPS track is geo-registered and overlaid on the image. The GPS track then forms the initial contour of the segmentation. We successfully applied PedSeg to compute the spatial extents of a range of objects including small lakes and ponds, a quad on the UC Merced campus, a sports track in Santa Barbara, and a fountain in San Francisco. The work has been submitted as a demo paper titled "PedSeg: GPS tracks as priors for overhead image segmentation" to the 2011 ACM SIGSPATIAL GIS conference. This work contributes to the project in that the active contour segmentation techniques refined in the PedSeg system can later be applied using initial contours derived from the point spatial extents in gazetteers. It also represents a novel method for computing the spatial extents of visually distinctive geospatial objects.
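
The segmentation step at the heart of PedSeg can be sketched with an off-the-shelf active contour implementation. The sketch assumes the GPS track has already been geo-registered to pixel (row, col) coordinates, and the parameter values are illustrative rather than those used in the deployed system.

    from skimage.filters import gaussian
    from skimage.segmentation import active_contour

    def refine_boundary(image_gray, track_rc):
        # image_gray: 2-D overhead image; track_rc: (N, 2) GPS track
        # already geo-registered to pixel (row, col) coordinates.
        smoothed = gaussian(image_gray, sigma=3, preserve_range=True)
        # The uploaded track seeds the snake; the contour then locks
        # onto image edges near the walked boundary.
        return active_contour(smoothed, track_rc,
                              alpha=0.015, beta=10.0, gamma=0.001)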

Year Three

During the third year of the project (Aug. 2011 to Aug. 2012), we 1) continued development of our hierarchical object appearance models; and 2) started to investigate region-based object models.

We focused on supervised and semi-supervised learning frameworks for the hierarchical models we developed to characterize complex geospatial objects such as high schools and golf courses. Again, this modeling is key to the funded project's goal of using overhead images to estimate the spatial extent of geospatial objects to update gazetteers (gazetteers are currently deficient in that they specify the spatial extent using a single latitude/longitude point). We first investigated a fully supervised learning approach in which the model is learned from a set of manually delineated ground-truth objects (for example, the boundaries of golf courses in images). This work was published in the paper Estimating the Spatial Extents of Geospatial Objects Using Hierarchical Models at the 2012 IEEE Workshop on Applications of Computer Vision. More recently, we have investigated a semi-supervised learning approach in which the model is learned from images believed to contain the object of interest but which lack the actual boundary (this is the weakly labeled training data made available by integrating gazetteers and overhead imagery).

So far, our hierarchical object model has been based on a partitioning of an image on a regular grid. The boundary of the object is therefore constrained to the grid. Inspired by some of the papers surveyed in a special topics graduate course I taught this year on overhead image analysis, we are considering an alternate approach in which the model is based on a segmentation of the image into homogeneous regions. The benefit of this approach is that the object boundary is no longer constrained to a predefined grid but can follow the actual boundaries present in the image. The challenge is that segmentation is a difficult problem.
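
As an illustration of the region-based alternative, the sketch below replaces the regular grid with an off-the-shelf superpixel oversegmentation; SLIC is one plausible choice for this step, not necessarily the method we will ultimately adopt.

    from skimage.segmentation import slic

    def region_partition(image_rgb, n_regions=200):
        # Oversegment the image into homogeneous regions; an object
        # model built on these regions can follow the actual boundaries
        # in the image rather than a predefined grid.
        return slic(image_rgb, n_segments=n_regions, compactness=10.0)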

Year Four

During the fourth year of the project (Aug. 2012 to Jul. 2013), we investigated semi-supervised learning frameworks for incorporating the weakly labeled data provided by the gazetteers.

We developed a method to learn our previously developed hierarchical object models from a combination of strongly and weakly labeled data. The weakly labeled data are images centered on the point locations of known object instances as provided by the gazetteers. However, the gazetteers only provide this point and so the assumption is made that the image regions surrounding this point are likely to belong to the object of interest. We incorporated this weakly labeled data using a relevancy framework in which a relevance function is used to rank tiled regions in the weakly labeled images. The highly ranked regions are then used to learn the object model along with strongly labeled image regions provided by a manual labeling of training images.
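
The relevancy step can be sketched as follows. The relevance function here is a hypothetical probability score from a current binary object model, and the kept fraction is an assumed parameter.

    import numpy as np

    def mine_weak_regions(tile_features, model, keep_fraction=0.25):
        # tile_features: (T, D) feature vectors for the tiled regions
        # of an image centered on a gazetteer point; regions near the
        # point are likely, but not guaranteed, to belong to the object.
        scores = model.predict_proba(tile_features)[:, 1]  # relevance
        order = np.argsort(scores)[::-1]                   # best first
        keep = order[:max(1, int(keep_fraction * len(order)))]
        # The highly ranked regions join the strongly labeled training
        # regions when the object model is (re)learned.
        return tile_features[keep]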

The proposed semi-supervised learning framework, which incorporates the weakly labeled data, was shown to result in more accurate models than a fully supervised framework in which only strongly labeled training data is used. The models were evaluated by observing how well they classify a set of manually created ground-truth data.