Data Mining - Mehmed Kantardzic [199]
Figure 12.20. Finite-state machine. (a) State-transition table; (b) state-transition diagram.
Markov Model (MM) extends the basic idea behind FSM. Both FSM and MM are directed graphs. As with FSM, MM always has a current state. Start and end nodes are drawn for illustrative purposes and need not be present. Unlike in FSM, transitions are not associated with specific input values. Arcs carry a probability value for transition from one state to another. For example, the probability that transition from state “Start” to “S1” is 0.4 and the probability of staying in the “Start” state is 0.6. The sum of the probability values coming out of each node should be 1. MM shows only transitions with probability greater than 0. If a transition is not shown, it is assumed to have a probability of 0. The probabilities are combined to determine the final probability of the pattern produced by the MM. For example, with the MM shown in Figure 12.21, the probability that the MM takes the horizontal path from starting node to S2 is 0.4 × 0.7 = 0.28.
Figure 12.21. A simple Markov Model.
MM is derived based on the memoryless assumption. It states that given the current state of the system, the future evolution of the system is independent of its history. MMs have been used widely in speech recognition and natural language processing.
Hidden Markov Model (HMM) is an extension to MM. Similar to MM, HMM consists of a set of states and transition probabilities. In a regular MM, the states are visible to the observer, and the state-transition probabilities are the only parameters. In HMM, each state is associated with a state-probability distribution. For example, assume that we were given a sequence of events in a coin toss: O = (HTTHTHH), where H = Head and T = Tail. But additional information is necessary. What is not given is the sequence generated with one or two coins. According to the above definitions, Figure 12.22 shows two possible models. Figure 12.22a assumes that only one coin was tossed. We can model this system as an MM with a two-state model, where Head and Tail are the two states with the same initial probabilities. The probability of the sequence O is P(O) = 0.5 × 0.7 × 0.3 × 0.7 × 0.3 × 0.7 × 0.7 = 0.0108.
Figure 12.22. Markov Model versus hidden Markov Model. (a) One-coin model; (b) two-coin model.
Another possibility for explaining the observed sequence is shown in Figure 12.22b. There are again two states in this model, and each state corresponds to a separate biased coin being tossed. Each state has its own probability distribution of Heads and Tails, and therefore the model is represented as an HMM. Obviously, in this model we have several “paths” to determine the probability of the sequence. In other words, we can start with tossing one or another coin, and continue with this selection. In all these cases, composite probability will be different. In this situation, we may search for the maximum probability of the sequence O in the HMM. HMM may be formalized as a directed graph with V vertices and A arcs. Set V = {v1, v2, … , vn} represents states, and matrix A = {aij} represents transition-probability distribution, where aij is the transitional probability from state i to state j. Given a set of possible observations O = {o1, o2, … , om} for each state vi, the probability of seeing each observation in the sequence is given by Bi = {oi1, oi2, … , oim}. The initial-state distribution is represented as σ, which determines the starting state at time t = 0.
12.2.4 Mining Sequences
Temporal data-mining tasks include prediction, classification, clustering,