Abstract: |
A new representation of 3-d object appearance from video sequences has been developed over the past several years (Pollard and Mundy, 2007; Pollard, 2008; Crispell, 2010), combining the ideas of background modeling and volumetric multi-view reconstruction. In this representation, Gaussian mixture models for intensity or color are stored in volumetric units. This 3-d probabilistic volume model (PVM) is learned from a video sequence by an on-line Bayesian updating algorithm. To date, the PVM representation has been applied to video image registration (Crispell et al., 2008), change detection (Pollard and Mundy, 2007) and classification of changes as vehicles in 2-d only (Mundy and Ozcanli, 2009; Özcanli and Mundy, 2010). In this paper, the PVM is used to develop novel viewpoint-independent features of object appearance directly in 3-d. The resulting description is then used in a bag-of-features classification algorithm to recognize buildings, houses, parked cars, parked aircraft and parking lots in aerial scenes collected over Providence, Rhode Island, USA. Two approaches to feature description are compared: 1) features derived from a principal component analysis (PCA) of model neighborhoods; and 2) features derived from the coefficients of a 3-d Taylor series expansion within each neighborhood. It is shown that both feature types explain the data with similar accuracy. Finally, the effectiveness of both feature types for recognition is compared across the different categories. Encouraging experimental results demonstrate the descriptive power of the PVM representation for object recognition tasks and suggest that it can be extended to more complex recognition systems. |