Data Mining - Mehmed Kantardzic [212]
Any joint distribution can be represented by a corresponding graphical model. It is the absence of links in the graph that conveys interesting information about the properties of the class of distributions that the graph represents. We can interpret such models as expressing the processes by which the observed data arose, and in many situations we may draw conclusions about new samples from a given probability distribution. The directed graphs that we are considering are subject to an important restriction, that is, that there must be no directed cycles. In other words, there are no closed paths within the graph such that we can move from node to node along links following the direction of the arrows and end up back at the starting node. Such graphs are also called Directed Acyclic Graphs (DAGs).
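The no-directed-cycles restriction can be checked mechanically. The following is a minimal sketch, assuming the graph is given as a list of node names and a list of (parent, child) pairs. The function name is_dag and the two example graphs are illustrative and are not taken from the text. It uses Kahn's algorithm: repeatedly remove nodes that have no incoming links, and if every node can be removed, the graph has no directed cycle.

```python
from collections import deque


def is_dag(nodes, edges):
    """Return True if the directed graph contains no directed cycle."""
    indegree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for parent, child in edges:
        children[parent].append(child)
        indegree[child] += 1

    # Start from nodes without incoming links.
    queue = deque(n for n in nodes if indegree[n] == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for child in children[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    # Nodes on a directed cycle never reach indegree 0, so they are never removed.
    return removed == len(nodes)


# Hypothetical graphs: a -> c <- b is acyclic; a -> b -> c -> a is not.
acyclic = is_dag(["a", "b", "c"], [("a", "c"), ("b", "c")])
cyclic = is_dag(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
print(acyclic, cyclic)  # True False
```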
An important concept for probability distributions over multiple variables is that of conditional independence. Consider three variables a, b, and c, and suppose that the conditional distribution of a, given b and c, is such that it does not depend on the value of b, so that
p(a | b, c) = p(a | c)
We say that a is conditionally independent of b given c. This can be expressed in a slightly different way if we consider the joint distribution of a and b conditioned on c, which we can write in the form
p(a, b | c) = p(a | b, c) p(b | c) = p(a | c) p(b | c)
The joint distribution of a and b, conditioned on c, may be factorized into the product of the marginal distribution of a and the marginal distribution of b (again both conditioned on c). This says that the variables a and b are statistically independent, given c. This independence is presented in graphical form in Figure 12.34a. Other typical joint distributions may also be interpreted graphically. The distribution for Figure 12.34b represents the case
Figure 12.34. Joint probability distributions show different dependencies between variables a, b, and c.
while for Figure 12.34c the probability p(c | a, b) is specified under the assumption that variables a and b are independent, that is, p(a, b) = p(a) p(b).
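The conditional independence statement above can be verified numerically on a small example. The following is a minimal sketch, assuming binary variables and a structure in which c is a common parent of a and b, so that p(a, b, c) = p(c) p(a | c) p(b | c). This structure and all probability values are hypothetical choices for illustration and are not claimed to match any panel of Figure 12.34. The check confirms that p(a, b | c) = p(a | c) p(b | c) for every assignment, and also shows that a and b need not be independent once c is summed out.

```python
from itertools import product

# Hypothetical probability tables for binary variables (values 0 and 1).
p_c = {0: 0.6, 1: 0.4}
p_a_given_c = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.3, 1: 0.7}}  # indexed [c][a]
p_b_given_c = {0: {0: 0.8, 1: 0.2}, 1: {0: 0.5, 1: 0.5}}  # indexed [c][b]

# Joint distribution built from the assumed factorization p(c) p(a|c) p(b|c).
joint = {
    (a, b, c): p_c[c] * p_a_given_c[c][a] * p_b_given_c[c][b]
    for a, b, c in product((0, 1), repeat=3)
}


def marginal(keep):
    """Sum the joint over every variable whose index (a=0, b=1, c=2) is not in keep."""
    out = {}
    for assignment, prob in joint.items():
        key = tuple(assignment[i] for i in keep)
        out[key] = out.get(key, 0.0) + prob
    return out


m_ac, m_bc, m_c = marginal((0, 2)), marginal((1, 2)), marginal((2,))
m_ab, m_a, m_b = marginal((0, 1)), marginal((0,)), marginal((1,))

# Conditional independence: p(a, b | c) == p(a | c) * p(b | c) for all values.
cond_indep = all(
    abs(joint[(a, b, c)] / m_c[(c,)]
        - (m_ac[(a, c)] / m_c[(c,)]) * (m_bc[(b, c)] / m_c[(c,)])) < 1e-12
    for a, b, c in product((0, 1), repeat=3)
)

# Marginal independence: p(a, b) == p(a) * p(b) for all values.
marg_indep = all(
    abs(m_ab[(a, b)] - m_a[(a,)] * m_b[(b,)]) < 1e-12
    for a, b in product((0, 1), repeat=2)
)

print(cond_indep, marg_indep)  # True False
```

With these particular numbers, p(a=0, b=0) = 0.492 while p(a=0) p(b=0) = 0.66 × 0.68 = 0.4488, so a and b are dependent marginally even though they are independent given c.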
In general, graphical models may capture the causal processes by which the observed data were generated. For this reason, such models are often called generative models. We could make the previous models in Figure 12.33 generative by introducing a suitable prior distribution p(x) for all input variables (that is, variables represented by nodes without input links). For the case in Figure 12.33a this is the variable a, and for the case in Figure 12.33b these are the variables x1, x2, and x3. In practice, producing synthetic observations from a generative model can prove informative in understanding the form of the probability distribution represented by that model.
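Producing synthetic observations from a generative model can be sketched as follows. This is a minimal illustration, assuming a hypothetical three-node structure a -> c <- b with binary variables. The structure, the priors on the input nodes a and b, and the conditional table for c are all invented for the example and are not taken from Figure 12.33. Each sample is drawn by visiting nodes in an order where parents come before children (often called ancestral sampling).

```python
import random

# Hypothetical priors for the input nodes (nodes without incoming links).
P_A_TRUE = 0.3
P_B_TRUE = 0.6

# Hypothetical conditional table p(c = True | a, b).
P_C_TRUE = {
    (False, False): 0.05,
    (False, True): 0.50,
    (True, False): 0.70,
    (True, True): 0.95,
}


def sample_once(rng):
    """Draw one (a, b, c) observation, sampling parents before their child."""
    a = rng.random() < P_A_TRUE
    b = rng.random() < P_B_TRUE
    c = rng.random() < P_C_TRUE[(a, b)]
    return a, b, c


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [sample_once(rng) for _ in range(10_000)]

# Empirical frequencies should approach the model probabilities as the sample grows.
freq_a = sum(a for a, _, _ in samples) / len(samples)
freq_b = sum(b for _, b, _ in samples) / len(samples)
print(f"estimated p(a=True) = {freq_a:.3f}, estimated p(b=True) = {freq_b:.3f}")
```

Comparing such empirical frequencies with the specified tables is one simple way to inspect the distribution a model represents.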
This preliminary analysis of joint probability distributions brings us to the concept of Bayesian networks (BNs). BNs are also called belief networks or probabilistic networks in the literature. The nodes in a BN represent variables of interest (e.g., the temperature of a device, the gender of a patient, the price of a product, the occurrence of an event), and the links represent dependencies among the variables. Each node has states, that is, a set of possible values for the corresponding variable. For example, the weather could be cloudy or sunny, an enemy battalion could be near or far, symptoms of a disease are present or not present, and the garbage disposal is working or not working. Nodes are connected by arrows that show causality and indicate the direction of influence. These arrows are called edges. The dependencies are quantified by conditional probabilities for each node given its parents in the network. Figure 12.35 presents some BN architectures, initially without probability distributions. In general, we can formally describe a BN as a graph in which the following holds:
1. A set of random variables makes up the nodes of the network.
2. A set of directed links connects pairs of nodes. The intuitive meaning of an arrow from node X to node Y is that