Online Book Reader

Home Category

Data Mining - Mehmed Kantardzic [203]

By Root 867 0
of a house, while (b) emphasizes a significant trend that can be observed for the city of Munich, where the average rent decreases quite regularly when moving away from the city. One of the main reasons for developing a large number of spatial data-mining applications is the enormous amount of special data that are collected recently at a relatively low price. High spatial and spectral resolution remote-sensing systems and other environmental monitoring devices gather vast amounts of geo-referenced digital imagery, video, and sound. The complexity of spatial data and intrinsic spatial relationships limits the usefulness of conventional data-mining techniques for extracting spatial patterns.

Figure 12.25. Illustrative examples of spatial data-mining results. (a) Example of collocation spatial data mining

(Shekhar and Chawla, 2003);

(b) average rent for the communities of Bavaria

(Ester et al., 1997).

Figure 12.26. Main differences between traditional data mining and spatial data mining.

One of the fundamental assumptions of data-mining analysis is that the data samples are independently generated. However, in the analysis of spatial data, the assumption about the independence of samples is generally false. In fact, spatial data tends to be highly self-correlated. Extracting interesting and useful patterns from spatial data sets is more difficult than extracting corresponding patterns from traditional numeric and categorical data due to the complexity of spatial data types, spatial relationships, and spatial autocorrelation. The spatial attributes of a spatial object most often include information related to spatial locations, for example, longitude, latitude and elevation, as well as shape. Relationships among nonspatial objects are explicit in data inputs, for example, arithmetic relation, ordering, is an instance of, subclass of, and membership of. In contrast, relationships among spatial objects are often implicit, such as overlap, intersect, close, and behind. Proximity can be defined in highly general terms, including distance, direction and/or topology. Also, spatial heterogeneity or the nonstationarity of the observed variables with respect to location is often evident since many space processes are local. Omitting the fact that nearby items tend to be more similar than items situated apart causes inconsistent results in the spatial data analysis. In summary, specific features of spatial data that preclude the use of general-purpose data-mining algorithms are: (1) rich data types (e.g., extended spatial objects), (2) implicit spatial relationships among the variables, (3) observations that are not independent, and (4) spatial autocorrelation among the features (Fig. 12.26).

One possible way to deal with implicit spatial relationships is to materialize the relationships into traditional data input columns and then apply classical data-mining techniques. However, this approach can result in loss of information. Another way to capture implicit spatial relationships is to develop models or techniques to incorporate spatial information into the spatial data-mining process. A concept within statistics devoted to the analysis of spatial relations is called spatial autocorrelation. Knowledge-discovery techniques, which ignore spatial autocorrelation, typically perform poorly in the presence of spatial data.

The spatial relationship among locations in a spatial framework is often modeled via a contiguity matrix. A simple contiguity matrix may represent a neighborhood relationship defined using adjacency. Figure 12.27a shows a gridded spatial framework with four locations, A, B, C, and D. A binary matrix representation of a four-neighborhood relationship is shown in Figure 12.27b. The row-normalized representation of this matrix is called a contiguity matrix, as shown in Figure 12.27c. The essential idea is to specify the pairs of locations that influence each other along with the relative intensity of interaction.

Figure 12.27. Spatial framework and its four-neighborhood contiguity matrix.

SDM consists of extracting

Return Main Page Previous Page Next Page

®Online Book Reader