Online Book Reader

Home Category

Data Mining_ Concepts and Techniques - Jiawei Han [353]

By Root 1657 0
curves such as the dashed curve indicated in Figure 13.2.

2. Cyclic movements: These are the long-term oscillations about a trend line or curve.

3. Seasonal variations: These are nearly identical patterns that a time series appears to follow during corresponding seasons of successive years such as holiday shopping seasons. For effective trend analysis, the data often need to be “deseasonalized” based on a seasonal index computed by autocorrelation.

4. Random movements: These characterize sporadic changes due to chance events such as labor disputes or announced personnel changes within companies.

Figure 13.2 Time-series data for the stock price of AllElectronics over time. The trend is shown with a dashed curve, calculated by a moving average.

Trend analysis can also be used for time-series forecasting, that is, finding a mathematical function that will approximately generate the historic patterns in a time series, and using it to make long-term or short-term predictions of future values. ARIMA (auto-regressive integrated moving average), long-memory time-series modeling, and autoregression are popular methods for such analysis.

Sequential Pattern Mining in Symbolic Sequences

A symbolic sequence consists of an ordered set of elements or events, recorded with or without a concrete notion of time. There are many applications involving data of symbolic sequences such as customer shopping sequences, web click streams, program execution sequences, biological sequences, and sequences of events in science and engineering and in natural and social developments. Because biological sequences carry very complicated semantic meaning and pose many challenging research issues, most investigations are conducted in the field of bioinformatics.

Sequential pattern mining has focused extensively on mining symbolic sequences. A sequential pattern is a frequent subsequence existing in a single sequence or a set of sequences. A sequence is a subsequence of another sequence if there exist integers such that , . For example, if and , where a, b, c, d, and e are items, then is a subsequence of . Mining of sequential patterns consists of mining the set of subsequences that are frequent in one sequence or a set of sequences. Many scalable algorithms have been developed as a result of extensive studies in this area. Alternatively, we can mine only the set of closed sequential patterns, where a sequential pattern s is closed if there exists no sequential pattern , where s is a proper subsequence of , and has the same (frequency) support as s. Similar to its frequent pattern mining counterpart, there are also studies on efficient mining of multidimensional, multilevel sequential patterns.

As with constraint-based frequent pattern mining, user-specified constraints can be used to reduce the search space in sequential pattern mining and derive only the patterns that are of interest to the user. This is referred to as constraint-based sequential pattern mining. Moreover, we may relax constraints or enforce additional constraints on the problem of sequential pattern mining to derive different kinds of patterns from sequence data. For example, we can enforce gap constraints so that the patterns derived contain only consecutive subsequences or subsequences with very small gaps. Alternatively, we may derive periodic sequential patterns by folding events into proper-size windows and finding recurring subsequences in these windows. Another approach derives partial order patterns by relaxing the requirement of strict sequential ordering in the mining of subsequence patterns. Besides mining partial order patterns, sequential pattern mining methodology can also be extended to mining trees, lattices, episodes, and some other ordered patterns.

Sequence Classification

Most classification methods perform model construction based on feature vectors. However, sequences do not have explicit features. Even with sophisticated feature selection techniques, the dimensionality of potential features can still be very high and the sequential nature of features

Return Main Page Previous Page Next Page

®Online Book Reader