The spectral classification approach has been criticized by many researchers when fine resolution imagery is used for urban feature extraction. Texture plays an important role in the human visual system, and there have been some attempts to improve the spectral classification of remotely sensed data using texture analysis. One serious difficulty of texture analysis in the past was the lack of adequate tools to represent textures at different scales using multiple sub-bands. Recent developments in the mathematical theory of wavelet transforms have received considerable attention from image analysts as a way to overcome this difficulty. This study introduces a new approach to classifying urban landscapes that uses texture at multiple scales and combines multiple channels. Wavelet transforms based on different mother wavelets were also compared and examined. Seven types of urban land cover features were evaluated using this approach.
This research examines the utility of wavelet transforms as an innovative
classification approach for tracking urban features from high-resolution
multi-spectral image data. Advanced Thermal and Land Applications Sensor
(ATLAS) image data at 10 m spatial resolution, acquired in 15 channels
(0.45 micrometer - 12.2 micrometer), were used for this study. The data were
collected by a NASA Lear Jet flying at 16,500 feet over Baton Rouge, Louisiana,
on May 11, 1998. Figure 1 shows a false color composite of the Baton Rouge
area by displaying channel 2 (0.52 – 0.60 micrometer) in green, channel
6 (0.76 – 0.90 micrometer) in red, and channel 13 (9.60 – 10.2 micrometer)
in blue. Traditional image classification methods, such as maximum likelihood
classifiers, use spectral information (pixel values) as a basis to analyze
and classify remote sensing images. These methods often require additional
intensive human interpretation based on higher-resolution reference data
for accurate land cover classification. They become less efficient when
complex urban features are to be analyzed. Urban landscapes are composed
of diverse materials (concrete, asphalt, metal, plastic, glass, shingles,
water, grass, shrubs, trees, and soil) arranged by humans in complex ways
to build housing, transportation systems, utilities, commercial buildings,
and recreational landscapes (Swerdlow, 1998). Trained image analysts utilize
the tone, color, texture, shape, size, orientation, pattern, shadow silhouette,
site, and situation of objects in the urban landscape to identify and judge
their significance (Jensen, 1996). To date, however, traditional digital
techniques have proven inadequate for classifying urban land cover features
in high-resolution image data, mainly because of the complex nature of these
features. To capture the heterogeneity of urban features in high-resolution
images, we need the texture information contained in a group of pixels
rather than individual spectral values.
Figure 1. A false color composite of Baton Rouge area (a subset) by
displaying channel 2 (0.52 – 0.60 micrometer) in green,
channel 6 (0.76 – 0.90 micrometer) in red, and channel 13 (9.60 – 10.2
micrometer) in blue.
Urban or built-up land is composed of areas of intensive use with much
of the land covered by structures. These structures include cities;
towns; villages; strip developments along highways; transportation, power,
and communication facilities; and areas occupied by mills, shopping centers,
industrial and commercial complexes, and institutions that may even be
isolated from the urban areas (Lillesand and Kiefer, 1994). The spectral
classification approach has been criticized when fine resolution imagery
is used, especially for urban features (Latty & Hoffer, 1981; Markham
& Townshend, 1981; Woodcock & Strahler, 1987; Cushnie, 1987). Texture
plays an important role in the human visual system for pattern recognition
and interpretation. In image interpretation, pattern is defined as the
overall spatial form of related features, and the repetition of certain
forms is a characteristic pattern found in many cultural objects and some
natural features. Texture is the visual impression of coarseness or smoothness
caused by the variability or uniformity of image tone or color (Avery and
Berlin, 1992). Local variability in remotely sensed data can be characterized
by computing statistics of a group of pixels, e.g., the coefficient of
variation or autocovariance, or by analysis of fractal relationships. There
have been some attempts to improve the spectral analysis of remotely sensed
data by using texture transforms in which some measure of variability in
digital number (DN) values is estimated within local windows, e.g., contrast
between neighboring pixels (Edwards et al., 1988), the standard deviation
(Arai, 1993), or local variance (Woodcock and Harward, 1992). The coefficient
of variation gives a measure of the total relative variation of pixel values
in an area and can be computed easily, but it gives no information about
spatial patterns (De Jong and Burrough, 1995). Snow and Mayer (1992),
Klinkenberg (1992), and Burrough (1993) criticized many other neighborhood
operations such as diversity or variation filters: their absolute outcomes
were easy to compare, but they did not reveal any information on spatial
irregularities. Lam (1990) and Lam and Quattrochi (1992) demonstrated that
the fractal dimension of remote sensing data could yield quantitative insight
into the spatial complexity and information content of these data. Quattrochi et al.
(1997) used a software package known as the Image Characterization and
Modelling Systems (ICAMS) to explore how fractal dimension is related to
surface texture. They also investigated how spatial resolution affects
the computed fractal dimension of ideal fractal sets by using the isarithm
method (Lam and De Cola, 1993), the variogram (Mark and Aronson, 1984),
and the triangular prism method (Clarke, 1986). De Jong and Burrough (1995)
analyzed variograms of remotely sensed measurements to quantitatively describe
the spatial patterns. Variogram interpretation of satellite data was also
carried out by Woodcock et al. (1988a), Woodcock et al. (1988b), and
Webster et al. (1989). Emerson et al. (1999) analyzed the fractal dimension
using the isarithm method and the spatial autocorrelation of satellite imagery
using Moran’s I and Geary’s C to observe the differing spatial structures
of the smooth and rough surfaces in remotely sensed images.
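The limitation noted above for the coefficient of variation can be illustrated with a small sketch. The two 4 x 4 windows below are hypothetical and hold exactly the same pixel values in different spatial arrangements; the function name is illustrative only.

```python
import statistics

def coefficient_of_variation(window):
    """Standard deviation divided by the mean of all pixel values in a 2-D window."""
    values = [v for row in window for v in row]
    return statistics.pstdev(values) / statistics.mean(values)

# Hypothetical windows with identical pixel values but different spatial patterns.
striped = [[10, 10, 10, 10],
           [50, 50, 50, 50],
           [10, 10, 10, 10],
           [50, 50, 50, 50]]
checker = [[10, 50, 10, 50],
           [50, 10, 50, 10],
           [10, 50, 10, 50],
           [50, 10, 50, 10]]

cv_striped = coefficient_of_variation(striped)
cv_checker = coefficient_of_variation(checker)
print(cv_striped, cv_checker)
```

Both windows give the same value because the statistic depends only on the set of pixel values, not on where they sit in the window.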
In general, the above-mentioned methods have been successful to a certain extent. Most of them focus primarily on the coupling between image pixels at a single scale and within a single band. These methods alone might not provide satisfactory accuracy when applied to fine resolution remotely sensed images with relatively small local windows to differentiate between very closely related features or similar clusters, especially when they are applied to the original, untransformed images. Recent developments in the mathematical theory of wavelet transforms, based on multi-channel or multi-resolution analysis, have received considerable attention. There have been a number of developments in the spatial frequency analysis of mathematical transforms that provide multi-resolution analysis. Among these transforms, wavelets play the most prominent part in texture analysis of remotely sensed images. In this study, as explained in the next section, wavelet transforms are applied to extract textural features of multi-spectral urban images. It may be possible to improve the classification accuracy in the future by combining the above-mentioned methods with wavelet transforms.
In the past, one difficulty of texture analysis was the lack of adequate tools to characterize different scales of texture effectively. Recent developments in multi-resolution analysis such as Gabor and wavelet transforms have helped to overcome this difficulty (Zhu and Yang, 1998). A key idea for wavelets is the concept of “scale”. Sums and differences are at the finest scale. We can move to a larger picture by taking sums and differences again. This is recursion – the same transform at a new scale. It leads to a multi-resolution representation of the original signal (Strang and Nguyen, 1997). The discrete wavelet transform proposed by Mallat (1989) initially decomposes an image into one “approximation” image and three “detail” images. It filters the original image with complementary low-pass and high-pass filters in each dimension. The filtered images are downsampled by keeping every other pixel, producing four images of half the resolution of the original (Tonsmann and Tyler, 1999).
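The recursion of sums and differences can be sketched for a one-dimensional signal. This is a minimal illustration that assumes the orthonormal Haar filters and an even-length input at every level; the function names and the sample signal are hypothetical.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform on an even-length list."""
    s = 1.0 / math.sqrt(2.0)
    # Scaled pairwise sums (low-pass) and differences (high-pass), one output per pair.
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def haar_multilevel(signal, levels):
    """Apply haar_step repeatedly to the approximation; collect details per level."""
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details

signal = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # hypothetical row of pixels
approx, details = haar_multilevel(signal, 3)
print(approx, [len(d) for d in details])
```

Each level halves the length of the approximation, so an 8-sample signal yields detail vectors of length 4, 2, and 1 and a single coarse approximation value.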
A discrete wavelet transform can be used to compute the multi-resolution representation of signals (rows and columns of images in this study). The approximation of a signal, cAj+1, also known as the trend, is obtained by convolving the input signal cAj with the low-pass filter Lo_D and downsampling, that is, keeping one sample out of two (one column or one row of pixels in this case). All the approximations cAj+1, 0 <= j < J, are obtained by repeating the process, where J is the maximum scale. cDj+1 denotes the difference signal, also called the details, computed from scale j: cAj is convolved with the high-pass filter Hi_D and every other sample of the output is kept. The wavelet transform of a discrete signal cA0 is thus obtained by successively decomposing cAj into cAj+1 and cDj+1 for 0 <= j < J. This representation provides information about the approximation and details of signals at different scales. This multi-resolution analysis can be easily extended to two dimensions by transforming in the row and column directions separately. A wavelet transform of an image consists of four sub-images, each with a quarter of the original area. The sub-image composed of the low frequency parts for both rows and columns is called the approximation image. The remaining three images, containing high frequency components, are termed detail images. This kind of two-dimensional wavelet transform decomposes the approximation coefficients at level j into four components: the approximation coefficients and the details in three orientations (horizontal, vertical, and diagonal) at level j+1. Figure 2 shows the basic decomposition steps for images.
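The row-then-column scheme can be sketched for a single decomposition level. The sketch assumes Haar filters and an image with even dimensions (odd-sized windows would need padding, which is not shown); the helper names and the naming of the sub-bands follow a common software convention and are assumptions rather than details taken from this study.

```python
import math

S = 1.0 / math.sqrt(2.0)

def lowpass(vec):
    """Haar low-pass filter followed by keeping one sample out of two."""
    return [S * (vec[i] + vec[i + 1]) for i in range(0, len(vec) - 1, 2)]

def highpass(vec):
    """Haar high-pass filter followed by keeping one sample out of two."""
    return [S * (vec[i] - vec[i + 1]) for i in range(0, len(vec) - 1, 2)]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def filter_rows(image, f):
    return [f(row) for row in image]

def filter_cols(image, f):
    return transpose([f(col) for col in transpose(image)])

def haar_dwt2(image):
    """Return approximation and horizontal, vertical, diagonal detail sub-images."""
    lo = filter_rows(image, lowpass)
    hi = filter_rows(image, highpass)
    cA = filter_cols(lo, lowpass)    # low-pass in both directions
    cH = filter_cols(lo, highpass)   # responds to horizontal edges
    cV = filter_cols(hi, lowpass)    # responds to vertical edges
    cD = filter_cols(hi, highpass)   # diagonal detail
    return cA, cH, cV, cD

# Hypothetical 4 x 4 window with horizontal stripes.
image = [[1.0, 1.0, 1.0, 1.0],
         [3.0, 3.0, 3.0, 3.0],
         [1.0, 1.0, 1.0, 1.0],
         [3.0, 3.0, 3.0, 3.0]]
cA, cH, cV, cD = haar_dwt2(image)
print(cA, cH, cV, cD)
```

For this striped window only the approximation and the horizontal detail are non-zero, and each sub-image is 2 x 2, a quarter of the original area.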
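The dilation equation related to the low pass filter is:

$$\phi(t) = \sqrt{2} \sum_{k} c(k)\, \phi(2t - k)$$

The wavelet equation related to the high pass filter is:

$$w(t) = \sqrt{2} \sum_{k} d(k)\, \phi(2t - k)$$

where c(k) and d(k) are the low-pass and high-pass filter coefficients, written here with the orthonormal normalization in which the c(k) sum to $\sqrt{2}$.
The Haar wavelet transform has coefficients:

$$c(0) = c(1) = \frac{1}{\sqrt{2}}, \qquad d(0) = \frac{1}{\sqrt{2}}, \quad d(1) = -\frac{1}{\sqrt{2}}$$

Its dilation equation can be expressed as:

$$\phi(t) = \phi(2t) + \phi(2t - 1)$$

and its wavelet equation as:

$$w(t) = \phi(2t) - \phi(2t - 1)$$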
The Haar wavelet forms the simplest orthonormal wavelet basis. More details
can be found in Strang and Nguyen (1997). Initially, Haar wavelets were
used to build our prototypes. The usefulness of other wavelets such as
Daubechies, Mallat, and Symlet is currently being investigated. In the
standard wavelet decomposition, further decomposition is applied to the
low frequency channels. However, the most important information for texture
appears in the high frequency channels (the detail sub-bands). Thus, in the
second approach, upsampling is performed using the three first-level detail
sub-images, and the approximation image is discarded. Figure 3 shows the
basic standard reconstruction steps for these images; in this study, however,
reconstruction was done without the approximation image, since the detail
images carry more valuable textural information.
The detail sub-images were reconstructed by inserting a row of zeros between
each pair of rows, convolving the columns with a one-dimensional filter,
inserting a column of zeros between each pair of columns of the resulting
image, and convolving the rows with another one-dimensional filter. Figure 4
shows a diagram of the standard orthonormal wavelet decomposition with 2
levels, together with the multi-resolution wavelet representation of a sample
image produced by this decomposition. In the third approach, further
decomposition was carried out on the horizontal detail sub-image, since more
textural information can be obtained in the middle frequency channels. Zhu
and Yang (1998) demonstrated that the decomposition of horizontal images was
more efficient than the standard decomposition technique in their study.
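Reconstruction from the detail sub-images alone can be sketched as follows. For Haar filters, inserting zeros and convolving with the synthesis filters reduces to the pairwise formula used in `inverse_pairs`; the sketch assumes Haar filters, and the sub-band values are hypothetical (they correspond to a 4 x 4 window of horizontal stripes with pixel values 1 and 3).

```python
import math

S = 1.0 / math.sqrt(2.0)

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def inverse_pairs(approx, detail):
    """Undo one Haar step: each (approx, detail) pair yields two samples."""
    out = []
    for a, d in zip(approx, detail):
        out.extend([S * (a + d), S * (a - d)])
    return out

def haar_idwt2(cA, cH, cV, cD):
    """Single-level inverse 2-D Haar transform (columns first, then rows)."""
    lo = transpose([inverse_pairs(a, h) for a, h in zip(transpose(cA), transpose(cH))])
    hi = transpose([inverse_pairs(v, d) for v, d in zip(transpose(cV), transpose(cD))])
    return [inverse_pairs(l, h) for l, h in zip(lo, hi)]

def details_only(cH, cV, cD):
    """Reconstruct with the approximation replaced by zeros."""
    zeros = [[0.0] * len(row) for row in cH]
    return haar_idwt2(zeros, cH, cV, cD)

# Hypothetical first-level sub-images of a striped 4 x 4 window.
cA = [[4.0, 4.0], [4.0, 4.0]]
cH = [[-2.0, -2.0], [-2.0, -2.0]]
cV = [[0.0, 0.0], [0.0, 0.0]]
cD = [[0.0, 0.0], [0.0, 0.0]]

full = haar_idwt2(cA, cH, cV, cD)      # recovers the original window
texture = details_only(cH, cV, cD)     # local variation around the mean
print(full[0], full[1], texture[0], texture[1])
```

Dropping the approximation removes the local mean brightness and keeps only the variation around it, which is the part of the signal that carries texture.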
In this study, three approaches were employed: (1) the standard decomposition; (2) decomposition with the reconstruction of the three first-level detail images; and (3) decomposition of the horizontal detail sub-band. Up to four levels of decomposition can be used with the local window size of 45 x 45. The multi-resolution approach of wavelet transforms provides textural information of images at different scales from coarse to fine, which is a distinctive property of wavelet transforms. Each level and each sub-image yielded additional frequency and spatial properties. Different wavelet decomposition models, multiple channels, and different mother wavelets were examined for the same features in this study. The channels selected for this study were channel 2 (0.52 - 0.60 micrometer: visible), channel 6 (0.76 - 0.90 micrometer: reflected infrared), and channel 13 (9.60 - 10.2 micrometer: thermal infrared). Decomposition was carried out on each of the three channels, and the textural features were analyzed separately for the seven classes.
The performance of wavelet transforms was evaluated for the classification of seven different land cover types derived from the ATLAS data. These classes are residential-1 (single family homes with < 30% tree canopy), residential-2 (single family homes with 30 - 60% tree canopy), residential-3 (single family homes with > 60% tree canopy), dense vegetation, commercial and offices, water bodies, and agricultural land. Samples of the classes are shown in Figure 5.
Lark (1996) discussed a working definition of texture based on the variance
of DN values, the characteristic scale (or scales) of variability, directional
dependence (anisotropy), and spatial periodicity of a particular texture.
In general, the samples of water bodies and dense vegetation do not have
directionality, whereas the three residential classes, commercial and offices,
and some agricultural land have directionality and spatial periodicity.
It was difficult to make a decision on the size of the local window for
the texture analysis. However, we know that the window size depends on:
(i) scale (resolution) of the image,
(ii) directionality of objects in a texture feature,
(iii) spatial periodicity or size of objects which form a particular
texture, and
(iv) spatial variation of that texture.
The accuracy should increase with a larger window size since it would
contain more information (variability of digital value) and provide more
complete coverage of the directionality and spatial periodicity of a particular
texture than a smaller window size. Texture of a residential area consists
of single family homes, lawns, backyards, trees, streets, concrete footpaths,
and/or lakes. If we were to use 2 m resolution data, it would be impossible
for us to analyze the above residential texture features with the use of
a local window size less than 40 x 40 since the window needs to cover at
least 1 single family home with all above mentioned objects. Pesaresi (2000)
experimented with 47 different square window sizes, ranging from 5 x 5
to 99 x 99 and showed the increase of histogram separation index with the
increase of window size. In this study, different window sizes were not
tested and compared, since the immediate objective was to examine the
utility of the classification algorithm based on wavelet transforms. A
local window size of 45 x 45, which was expected to be moderate, was used
throughout. Ten training and ten testing samples were used for each class
in the analysis.
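The relation between window size, ground coverage, and usable decomposition levels can be checked with simple arithmetic. The sketch below assumes that each level halves the sub-image side and rounds up for odd sizes; the exact sizes depend on how borders are handled, which is not specified here, and the function name is illustrative.

```python
def subband_sizes(window, levels):
    """Side length of the sub-images at each level, assuming halving with round-up."""
    sizes = []
    n = window
    for _ in range(levels):
        n = (n + 1) // 2
        sizes.append(n)
    return sizes

sizes = subband_sizes(45, 4)          # 45 x 45 window, four levels
footprint_atlas_m = 45 * 10           # 45 pixels at 10 m resolution
footprint_2m_m = 40 * 2               # 40 pixels at 2 m resolution
print(sizes, footprint_atlas_m, footprint_2m_m)
```

Under these assumptions a 45 x 45 window shrinks to 23, 12, 6, and 3 pixels per side over four levels, so a fifth level would leave almost nothing to measure. The window covers 450 m on the ground at 10 m resolution, compared with 80 m for a 40 x 40 window at 2 m resolution.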
Four levels of decomposition for the first and second approaches, and three levels for the third approach, were performed for each training and testing sample. Sixteen sub-images were obtained for every sub-sample with both the standard and the reconstructed-detail approaches, and twelve sub-images were obtained for the horizontal detail decomposition. Each level and each sub-band provided its own spatial properties and characteristic frequency. Pesaresi (2000) tested contrast, angular second moment, inverse difference moment, and entropy with the use of raw original image data for urban pattern recognition. Albuz et al. (1999) used the sum of squares of the wavelet coefficients of each sub-band for their image retrieval system. Sheikholeslami et al. (1999) calculated the mean and variance of wavelet coefficients to represent the contrast of the image and counted the number of edge pixels in the horizontal, vertical, and diagonal directions to approximate the directionality of the image. In this study, Shannon’s entropy measure was computed for each sub-image and its value was used as a distinct feature value in a vector. Entropy, in general, is the amount of information provided by an observation. Each sample therefore yields sixteen real values for the first two approaches and twelve real values for the third. The mean of the 16-element (or 12-element) feature vectors of the 10 training samples of a class was used as the total feature vector of that class. The total feature vectors of the training samples were treated as the representations of the classes and used for the classification. The total feature vectors of the testing samples were also computed and used to test the accuracy.
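The class representation described above, the element-wise mean of the training feature vectors, can be sketched as follows. The three short vectors are hypothetical stand-ins for the ten 16-element (or 12-element) entropy vectors of one class.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length feature vectors."""
    count = len(vectors)
    length = len(vectors[0])
    return [sum(v[i] for v in vectors) / count for i in range(length)]

# Hypothetical feature vectors of three training samples from one class.
training_vectors = [
    [1.0, 2.0, 3.0],
    [3.0, 2.0, 1.0],
    [2.0, 2.0, 2.0],
]
prototype = mean_vector(training_vectors)
print(prototype)
```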
Shannon’s entropy measure is defined as:
where N is the number of gray levels present in the image and c(i,j) is the value of the image at pixel (i,j).
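As an illustration only, the sketch below computes one common form of Shannon entropy for a sub-image, the non-normalized form used in some wavelet software, in which each squared coefficient contributes -c^2 log(c^2) and zero coefficients contribute nothing. Whether this matches the exact definition used in this study is an assumption; the function name and the sample sub-images are hypothetical.

```python
import math

def shannon_entropy(subimage):
    """Non-normalized Shannon entropy: -sum(c^2 * log(c^2)), with 0 * log(0) taken as 0."""
    total = 0.0
    for row in subimage:
        for c in row:
            energy = c * c
            if energy > 0.0:
                total -= energy * math.log(energy)
    return total

# Hypothetical sub-images: a flat (all-zero) detail band and a uniform non-zero one.
flat_band = [[0.0, 0.0], [0.0, 0.0]]
edge_band = [[-2.0, -2.0], [-2.0, -2.0]]

feature_vector = [shannon_entropy(band) for band in (flat_band, edge_band)]
print(feature_vector)
```

One entropy value per sub-image gives a feature vector with as many elements as there are sub-images, sixteen or twelve in this study.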
The performance of a minimum distance classifier was evaluated for texture
classification. Each pattern class Ck is represented by a prototype pattern
Pk. In this study, the Pk (k = 1, 2, ..., 7) were the total feature vectors
of the training samples. The minimum distance classifier assigns an unknown
pattern Q to the class Ck if the distance Rk between Q and Pk is the minimum
among all class prototypes. The Euclidean distance is defined as

$$R_k = \sqrt{\sum_{i=1}^{n} \left(Q_i - P_{k,i}\right)^2}$$

where n is the number of feature values in each vector.
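A minimal sketch of a minimum distance classifier with Euclidean distance follows. The prototypes and the unknown pattern are hypothetical two-element vectors chosen for readability; in the study the vectors hold the entropy features of the sub-images.

```python
import math

def euclidean(q, p):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, p)))

def classify(q, prototypes):
    """Return the class whose prototype is closest to the unknown pattern q."""
    return min(prototypes, key=lambda name: euclidean(q, prototypes[name]))

# Hypothetical class prototypes (mean feature vectors of training samples).
prototypes = {
    "water bodies": [0.1, 0.2],
    "residential-1": [3.0, 2.5],
    "dense vegetation": [1.5, 4.0],
}
unknown = [2.8, 2.0]
label = classify(unknown, prototypes)
print(label)
```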
Using the above approach, classification of the 7 texture images shown in Figure 5 was carried out. There are four levels for the first and second approaches, and three levels for the third approach. This study used 70 (10 x 7) sub-samples each for training and testing. The results of the classification are shown in Tables 1, 2, and 3 for channels 13, 6, and 2, respectively.
Using the wavelet transform features of individual channel data, the accuracy was found to be relatively low for all three approaches. The highest accuracy, 77.1%, was obtained using channel 6 data alone with the first approach (standard decomposition) and levels 1 – 4 combined. The three decomposition models gave different results, and the first approach (standard decomposition) was found to be the most efficient. Feature vectors combining different levels proved to be better than any single-level decomposition. In general, the higher levels had lower accuracy for all approaches. Channel 6 alone was found to be more efficient than the other channels for the texture analysis.
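Assuming that accuracy here means overall accuracy, the share of the 70 test sub-samples assigned to their correct class (the text does not define it explicitly), a reported value of 77.1% is consistent with 54 correct assignments out of 70. The labels below are hypothetical and only reproduce that count.

```python
def overall_accuracy(predicted, actual):
    """Percentage of test samples whose predicted class equals the true class."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * correct / len(actual)

# Hypothetical labels: 7 classes x 10 test samples, with 16 samples misassigned.
actual = [k for k in range(7) for _ in range(10)]
predicted = list(actual)
for i in range(16):
    predicted[i] = (actual[i] + 1) % 7   # force a wrong label

accuracy = overall_accuracy(predicted, actual)
print(round(accuracy, 1))
```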
Most image analysts would agree that, when extracting urban/suburban information from remotely sensed data, it is more important to have high spatial resolution (often ≤ 5 by 5 m) than high spectral resolution (i.e., a large number of spectral bands) (Jensen and Cowen, 1999). The textures of the three residential classes, and the textures of dense vegetation and residential-3 (single family homes with > 60% tree canopy), were similar to each other. This was one of the major problems that made the classification more difficult than other land use and land cover texture analyses. It was observed that textures of the same area in different bands (e.g., visible, reflected infrared, and thermal infrared) differed in contrast, smoothness/coarseness, and spatial variation (Figure 6). By visual observation, texture in channel 2 appeared weaker than in the other two channels. To take advantage of the different textural information that the same windows of the same class carry in different channels, multi-spectral texture analysis was introduced in this study to improve the accuracy of the classification. Using this approach, the total feature vector of a sample was examined for different channel combinations using the minimum distance classifier. The channel combinations were: (1) channels 13, 6, and 2; (2) channels 13 and 6; (3) channels 13 and 2; and (4) channels 6 and 2. The purpose of multi-channel texture analysis was to obtain complementary textural information from different channels, which was expected to improve the mapping accuracy. The results of the multispectral texture classification are shown in Tables 4, 5, 6, and 7 for the above listed channel combinations.
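One simple way to build a multi-channel feature vector is to concatenate the per-channel vectors in a fixed channel order. The text does not state how the channel vectors were combined, so concatenation is an assumption here, and the feature values are placeholders.

```python
def combine_channels(features_by_channel, channels):
    """Concatenate per-channel feature vectors in the given channel order."""
    combined = []
    for channel in channels:
        combined.extend(features_by_channel[channel])
    return combined

# Placeholder 16-element feature vectors for the three channels of one sample.
features_by_channel = {
    13: [0.13] * 16,
    6: [0.06] * 16,
    2: [0.02] * 16,
}

all_three = combine_channels(features_by_channel, [13, 6, 2])
thermal_and_nir = combine_channels(features_by_channel, [13, 6])
print(len(all_three), len(thermal_and_nir))
```

With this scheme the minimum distance classifier is applied unchanged; only the length of the vectors grows (48 values for three channels, 32 for two, when each channel contributes 16).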
With the use of wavelet transforms, seven types of urban image features were successfully classified. As expected, the multi-channel approach significantly improved the classification accuracy: in general, the accuracy obtained from multiband combinations was higher than that of the single band approach, reaching 92.9% when using all 3 channels or the combination of channels 13 and 6. It is still difficult to say which band combination was most efficient. In the future, it is recommended that more samples and feature classes be tested to yield better comparisons. The combinations of channels 13 and 2, and of channels 6 and 2, gave low accuracy. This was mainly due to the weakness of texture features in channel 2, as observed earlier by visually checking the texture appearance in different bands. Future work should investigate the optimum local window size for satisfactory accuracy, and whether another measure could provide better accuracy than Shannon’s entropy. The preliminary results of this research indicate that the accuracy of texture analysis in classifying fine resolution image data can be significantly improved with the wavelet transform approach.
The author would like to thank Nina Lam, Geography and Regional Science Program, NSF and John Tyler, Department of Computer Science, LSU for their suggestions and guidance during the analytical phases of this research. Thanks are also extended to Nan Walker, Coastal Studies Institute, Department of Oceanography, LSU for her assistance in preparing this paper.
Albuz, Elif, E. Kocalar, A.A. Khokhar, 1999. Vector-wavelet based scalable indexing and retrieval system for large color image archives, Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Phoenix, Arizona, 4:3021-3024.
Arai, K., 1993. A classification method with a spatial-spectral variability. International Journal of Remote Sensing, 14:699-709.
Avery, T.E., and G.L. Berlin, 1992. Fundamentals of Remote Sensing and Airphoto Interpretation, Macmillan Publishing Co., New York, 472.
Burrough, P.A., 1993. Soil Variability: A Late 20th Century View, Soils and Fertilizers, pp. 529-562.
Clarke, K.C., 1986. Computation of the Fractal Dimension of Topographic Surfaces Using the Triangular Prism Surface Area Method, Computers and Geosciences, 12(5):713-722.
Cushnie, J.L., 1987. The interactive effect of spatial resolution and degree of internal variability within land-cover types on classification accuracies, International Journal of Remote Sensing, 8:12-29.
De Jong, S.M., and P.A. Burrough, 1995. A Fractal Approach to the classification of Mediterranean Vegetation Types in Remotely Sensed Images, Photogrammetric Engineering and Remote Sensing, 61:1041-1053.
Edwards, G., R. Landary, and K.P.B. Thomson, 1988. Texture analysis of forest regeneration sites in high-resolution SAR imagery, Proceedings of the International Geosciences and Remote Sensing Symposium (IGARSS 88), ESA SP-284 (Paris: European Space Agency), pp. 1355-1360.
Emerson, C.W., N.S.N. Lam, and D.A. Quattrochi, 1999. Multi-Scale Fractal Analysis of Image Texture and Pattern, Photogrammetric Engineering and Remote Sensing, 65(1): 51-61.
Jensen, J.R., and D.C. Cowen, 1999. Remote Sensing of Urban/Suburban Infrastructure and Socio-Economic Attributes, Photogrammetric Engineering and Remote Sensing, 65(5): 611-622.
Jensen, J.R., 1996. Introductory Digital Image Processing: A Remote Sensing Perspective, Prentice-Hall, Saddle River, New Jersey, 318 p.
Klinkenberg, B., 1992. Fractals and Morphometric Measures: Is there a Relationship? Geomorphology, 5:5-20.
Lam, N.S.N., 1990. Description and Measurement of Landsat TM Images Using Fractals, Photogrammetric Engineering and Remote Sensing, 56(2): 187-195.
Lam, N.S.N., and D.A. Quattrochi, 1992. On the Issues of Scale, Resolution, and Fractal Analysis in the Mapping Sciences, Professional Geographer, 44(1):88-97.
Lam, N.S.N., and L. De Cola, 1993. Fractal Simulation and Interpolation, Fractals in Geography (N.S.N. Lam and L. De Cola, editors), Prentice Hall, Englewood Cliffs, New Jersey, pp. 56-74.
Lark, R.M., 1996. Geostatistical description of texture on an aerial photograph for discriminating classes of land cover, International Journal of Remote Sensing, 17: 2115-2133.
Latty, R.S., and R.M. Hoffer, 1981. Computer based classification accuracy due to the spatial resolution using per point vs. per field classification techniques, Proc. of Symp. on Machine Process. of Remotely Sensed Data, West Lafayette, IN, pp. 384-392.
Lillesand, T.M., and R.W. Kiefer, 1994. Remote Sensing and Image Interpretation, John Wiley & Sons, Inc., third edition, 1994, New York, 750 p.
Markham, B.L., and J.R.G. Townshend, 1981. Land cover classification accuracy as a function of sensor spatial resolution, Proc. of 15th Int. Symp. on Remote Sensing of the Environment, West Lafayette, IN, pp. 384-392.
Mark, D.M., and P.B. Aronson, 1984. Scale Dependent Fractal Dimensions of Topographic Surfaces: An Empirical Investigation with Applications in Geomorphology and Computer Mapping, Mathematical Geology, 16:671-683.
Mallat, S.G., 1989. A Theory for multi-resolution signal decomposition: the wavelet representation, IEEE Transactions on Pattern Analysis and Machine Intelligence, 11:674-693.
Pesaresi, M., 2000. Texture Analysis for Urban Pattern Recognition Using Fine-resolution Panchromatic Satellite Imagery, Geographical and Environmental Modelling, 4(1):43-63.
Quattrochi, D.A., N.S.N. Lam, H. Qiu, and Wei Zhao, 1997. Image Characterization and Modeling System (ICAMS): A Geographic Information System for the Characterization and Modeling of Multi-scale Remote Sensing Data, Scale in Remote Sensing and GIS (D.A. Quattrochi and M.F. Goodchild, editors), CRC Press, Boca Raton, Florida, pp. 295-308.
Sheikholeslami, G., A. Zhang, and L. Bian, 1999. A Multi-Resolution Content-Based Retrieval Approach for Geographic Images, Geoinformatica, 3(2): 109-139.
Snow, R.S., and L. Mayer, 1992. Fractals in Geomorphology, Geomorphology, 5(1/2):194.
Strang, G., and T. Nguyen, 1997. Wavelets and Filter Banks, revised edition, Wellesley-Cambridge Press, Wellesley, MA, USA, 520 p.
Swerdlow, J.L., 1998. Making Sense of the Millennium, National Geographic, 193:2-33.
Tonsmann, Guillermo, and J.M. Tyler, 1999. Estimation of oceanic surface velocity fields using wavelets, Proceedings of SPIE Conference on Wavelet Applications VI, Orlando, Florida, 3723:122-129.
Webster, R., and M.A. Oliver, 1992. Sample Adequately to Estimate Variograms of Soil Properties, J. Soil Science, 43:177-192.
Woodcock, C.E., and A.H. Strahler, 1987. The factor of scale in remote sensing, Remote Sensing of Environment, 21:311-332.
Woodcock, C.E., A.H. Strahler, and D.L.B. Jupp, 1988a. The use of Variograms in Remote Sensing: I. Scene Models and Simulated Images, Remote Sensing of Environment, 25:323-348.
------------, 1988b. The use of Variograms in Remote Sensing: II. Real Digital Images, Remote Sensing of Environment, 25:323-348.
Woodcock, C., and V.J. Harward, 1992. Nested-hierarchical scene models and image segmentation, International Journal of Remote Sensing, 13:3167-3187.
Zhu, C., and X. Yang, 1998. Study of remote sensing image texture analysis
and classification using wavelet, International Journal of Remote Sensing,
13:3167-3187.