Online Book Reader


Data Mining - Mehmed Kantardzic [30]

validate these choices. An MA weights all time points equally in the average. Typical examples are MAs in the stock market, such as the 200-day MA for the DOW or NASDAQ. The objective is to smooth neighboring time points by an MA to reduce the random variation and noise components.
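As a minimal sketch of the equally weighted average described above, the following Python function computes MA(i, m) as the plain mean of the m most recent values ending at position i. The function name, the 0-based indexing, and the sample series are assumptions made for illustration, not taken from the text.

```python
def moving_average(t, i, m):
    """Return MA(i, m): the equally weighted mean of the m values t[i-m+1] .. t[i].

    Assumes 0-based indexing and a window that ends at (and includes) position i.
    """
    if m < 1 or i - m + 1 < 0 or i >= len(t):
        raise ValueError("window does not fit inside the series")
    window = t[i - m + 1 : i + 1]
    return sum(window) / m


# Hypothetical series, used only to illustrate the smoothing effect.
series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]

# Smooth every position for which a full window of 3 values exists.
smoothed = [moving_average(series, i, 3) for i in range(2, len(series))]
print(smoothed)  # [11.0, 12.0, 13.0, 14.0]
```

The smoothed values rise steadily even though the raw series zigzags, which is the noise reduction the MA is meant to provide.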

Another type of average is an exponential moving average (EMA) that gives more weight to the most recent time periods. It is described recursively as

EMA(i, m) = p × t(i) + (1 − p) × EMA(i − 1, m − 1), with EMA(i, 1) = t(i),

where p is a value between 0 and 1. For example, if p = 0.5, the most recent value t(i) is equally weighted with the computation for all previous values in the window, where the computation begins with averaging the first two values in the series. The computation starts with the following two equations:

EMA(i, 2) = 0.5 t(i) + 0.5 t(i − 1)

EMA(i, 3) = 0.5 t(i) + 0.5 [0.5 t(i − 1) + 0.5 t(i − 2)]

As usual, application knowledge or empirical validation determines the value of p. The exponential MA has performed very well for many business-related applications, usually producing results superior to the MA.
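The recursion can be sketched in Python as follows. This assumes the usual recursive form EMA(i, m) = p × t(i) + (1 − p) × EMA(i − 1, m − 1) with EMA(i, 1) = t(i); the function name, the 0-based indexing, and the sample values are illustrative assumptions. The loop unrolls the recursion from the oldest value in the window to the newest.

```python
def ema(t, i, m, p):
    """Return EMA(i, m) for the series t with weight p on the most recent value.

    Assumed recursion: EMA(i, m) = p*t[i] + (1 - p)*EMA(i-1, m-1), EMA(i, 1) = t[i].
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    if m < 1 or i - m + 1 < 0 or i >= len(t):
        raise ValueError("window does not fit inside the series")
    # Base case: the oldest value in the window is EMA(i-m+1, 1).
    value = t[i - m + 1]
    # Fold in each newer value, ending with t[i].
    for j in range(i - m + 2, i + 1):
        value = p * t[j] + (1.0 - p) * value
    return value


# Hypothetical series, used only for illustration.
series = [4.0, 8.0, 6.0]
print(ema(series, 2, 2, 0.5))  # 0.5*6 + 0.5*8 = 7.0
print(ema(series, 2, 3, 0.5))  # 0.5*6 + 0.5*(0.5*8 + 0.5*4) = 6.0
```

With p = 0.5 and a window of two, the result is simply the average of the last two values, matching the starting point described in the text. Larger values of p make the average react faster to recent changes.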

An MA summarizes the recent past, but spotting a change in the trend of the data may additionally improve forecasting performance. Characteristics of a trend can be measured by composing features that compare recent measurements with those of the more distant past. Three simple comparative features are

1. t(i) − MA(i, m), the difference between the current value and an MA,

2. MA(i, m) − MA(i − k, m), the difference between two MAs, usually of the same window size, and

3. t(i)/MA(i, m), the ratio between the current value and an MA, which may be preferable for some applications.
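The three comparative features listed above can be computed together, as in the following sketch. It assumes MA(i, m) is the plain mean of the m values ending at position i (0-based); the function names, dictionary keys, and sample series are hypothetical.

```python
def moving_average(t, i, m):
    """MA(i, m): mean of the m values t[i-m+1] .. t[i] (0-based, inclusive)."""
    if m < 1 or i - m + 1 < 0 or i >= len(t):
        raise ValueError("window does not fit inside the series")
    return sum(t[i - m + 1 : i + 1]) / m


def trend_features(t, i, m, k):
    """Return the three comparative trend features for position i."""
    ma_now = moving_average(t, i, m)        # MA(i, m)
    ma_past = moving_average(t, i - k, m)   # MA(i - k, m), same window size
    return {
        "diff_current_ma": t[i] - ma_now,    # 1. t(i) - MA(i, m)
        "diff_ma_ma": ma_now - ma_past,      # 2. MA(i, m) - MA(i - k, m)
        "ratio_current_ma": t[i] / ma_now,   # 3. t(i) / MA(i, m); undefined if MA is 0
    }


# Hypothetical series, used only for illustration.
series = [10.0, 12.0, 11.0, 13.0, 15.0, 14.0]
print(trend_features(series, i=5, m=3, k=2))
# {'diff_current_ma': 0.0, 'diff_ma_ma': 2.0, 'ratio_current_ma': 1.0}
```

Here the current value sits exactly on its MA, while the positive difference between the two MAs signals an upward trend over the last k steps.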

In general, the main components of the summarizing features for a time series are

1. current values,

2. smoothed values using MA, and

3. derived trends, differences, and ratios.

The immediate extension of a univariate time series is to a multivariate one. Instead of having a single measured value at time i, t(i), multiple measurements t[a(i), b(j)] are taken at the same time. There are no extra steps in data preparation for the multivariate time series. Each series can be transformed into features, and the values of the features at each distinct time A(i) are merged into a sample i. The resultant transformations yield a standard tabular form of data such as the table given in Figure 2.5.

Figure 2.5. Tabulation of time-dependent features. (a) Initial time-dependent data; (b) samples prepared for data mining with time window = 3.
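One way to sketch this tabulation is to slide a window of fixed length over every series and place the windowed values side by side in one row per time point. The function name, the column-naming scheme, and the sample values below are assumptions for illustration; they do not reproduce the data in Figure 2.5.

```python
def tabulate_windows(series_by_name, window):
    """Turn several equally long time series into tabular samples.

    Each row holds, for every series, the `window` consecutive values
    ending at the same time point.
    """
    names = list(series_by_name)
    lengths = {len(values) for values in series_by_name.values()}
    if len(lengths) != 1:
        raise ValueError("expected at least one series, all of the same length")
    n = lengths.pop()
    if window < 1 or window > n:
        raise ValueError("window must be between 1 and the series length")

    # Column labels such as a(n-2), a(n-1), a(n), b(n-2), ...
    header = [
        f"{name}(n-{lag})" if lag else f"{name}(n)"
        for name in names
        for lag in range(window - 1, -1, -1)
    ]
    rows = []
    for end in range(window - 1, n):
        row = []
        for name in names:
            row.extend(series_by_name[name][end - window + 1 : end + 1])
        rows.append(row)
    return header, rows


# Hypothetical two-variable series measured at the same time points.
data = {"a": [1, 2, 3, 4], "b": [10, 20, 30, 40]}
header, rows = tabulate_windows(data, window=3)
print(header)  # ['a(n-2)', 'a(n-1)', 'a(n)', 'b(n-2)', 'b(n-1)', 'b(n)']
print(rows)    # [[1, 2, 3, 10, 20, 30], [2, 3, 4, 20, 30, 40]]
```

Each row is then an ordinary sample, so smoothed values or trend features can be appended as further columns before standard data-mining techniques are applied.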

While some data-mining problems are characterized by a single time series, hybrid applications are more frequent in real-world problems, having both time series and features that are not dependent on time. In these cases, standard procedures for time-dependent transformation and summarization of attributes are performed. High dimensions of data generated during these transformations can be reduced through the next phase of a data-mining process: data reduction.

Some data sets do not include a time component explicitly, but the entire analysis is performed in the time domain (typically based on several dates that are attributes of described entities). One very important class of data belonging to this type is survival data. Survival data are data concerning how long it takes for a particular event to happen. In many medical applications, the event is the death of a patient, and therefore we analyze the patient’s survival time. In industrial applications, the event is often the failure of a component in a machine. Thus, the output in these sorts of problems is the survival time. The inputs are the patient’s records in medical applications and characteristics of the machine component in industrial applications. There are two main characteristics of survival data that make them different from the data in other data-mining problems. The first characteristic is called censoring. In many studies, the event has not happened by the end of the study period. So, for some patients in a medical trial, we may know that the patient was still alive after 5 years, but do not know when the patient died. This sort of observation would be called a censored observation. If the output is censored,

