Online Book Reader

Home Category

Data Mining - Mehmed Kantardzic [196]

By Root 719 0
α values from 3 to 10.

Figure 12.17. SAX look-up table. (a) Break points in the Gaussian curve (b = 0,c2 = 0.2) when α = 3; (b) SAX look-up table.

Once the break points are defined along with the corresponding coding symbols for each interval, the sequence is discretized as follows:

1. Obtain the PAA averages of the time series.

2. All PAA averages in the given interval (, ) are coded with a specific symbol for this interval. For example, if α = 3, all PAA averages less than the smallest break point (−0.43) are mapped to “a.” All PAA coefficients less than the second break point but greater than first breakpoint (−0.43, 0.43) are mapped to “b,” and all average values larger than the second break point (0.43) are mapped to “c.” This process is illustrated in Figure 12.18.

Figure 12.18. Transformation of PAA to SAX.

It is assumed that the symbols “a,” “b,” and “c” are approximately equiprobable symbols in the representation of a time series. The original sequence is then represented as a concatenation of these symbols, which is known as a “word.” For example, the mapping from PAA (C′) to a word C″ is represented as C″ = (bcccccbaaaaabbccccbb). The main advantage of the SAX method is that 100 different discrete numerical values in an initial discrete time series C is first reduced to 20 different (average) values using PAA, and then they are transformed into only three different categorical values using SAX.

The proposed approach is intuitive and simple, yet a powerful methodology in a simplified representation of a large number of different values in time series. The method is fast to compute and supports different distance measures. Therefore, it is applicable as a data-reduction step for different data-mining techniques.

Transformation-Based Representations

The main idea behind transformation-based techniques is to map the original data into a point of a more manageable domain. One of the widely used methods is the Discrete Fourier Transformation (DFT). It transforms a sequence in the time domain to a point in the frequency domain. This is done by selecting the top-K frequencies and representing each sequence as a point in the K-dimensional space. One important property that is worth noting is that Fourier coefficients do not change under the shift operation. One problem with DFT is that it misses the important feature of time localization. To avoid this problem Piecewise Fourier Transform was proposed, but the size of the pieces introduced new problems. Large pieces reduce the power of multi-resolution, while modeling of low frequencies with small pieces does not always give expected representations. The Discrete Wavelet Transformation (DWT) has been introduced to overcome the difficulties in DFT. The DWT transformation technique, analogously to fast Fourier transformation, turns a discrete vector of function values with the length N into a vector of N wavelet coefficients. Wavelet transformation is a linear operation and it is usually implemented as a recursion. The advantage of using DWT is its ability for multi-resolution representation of signals. It has the time-frequency localization property. Therefore, signals represented by wavelet transformations bear more information than that of DFT.

In some applications we need to retrieve objects from a database of certain shapes. Trends can often be reflected by specifying shapes of interest such as steep peaks, or upward and downward changes. For example, in a stock-market database we may want to retrieve stocks whose closing price contains a head-and-shoulders pattern, and we should be able to represent and recognize this shape. Pattern discovery can be driven by a template-based mining language in which the analyst specifies the shape that should be looked for. Shape Definition Language (SDL) was proposed to translate the initial sequence with real-valued elements occurring in historical data into a sequence of symbols from a given alphabet. SDL is capable of describing a variety of queries about the shapes found in the database. It allows the user to create

Return Main Page Previous Page Next Page

®Online Book Reader