Data Mining - Mehmed Kantardzic [58]
The process of inductive learning and estimating the model may be described, formalized, and implemented using different learning methods. A learning method is an algorithm (usually implemented in software) that estimates an unknown mapping (dependency) between a system’s inputs and outputs from the available data set, namely, from known samples. Once such a dependency has been accurately estimated, it can be used to predict the future outputs of the system from the known input values. Learning from data has been traditionally explored in such diverse fields as statistics, engineering, and computer science. Formalization of the learning process and a precise, mathematically correct description of different inductive-learning methods were the primary tasks of disciplines such as SLT and artificial intelligence. In this chapter, we will introduce the basics of these theoretical fundamentals for inductive learning.
4.1 LEARNING MACHINE
Machine learning, as a combination of artificial intelligence and statistics, has proven to be a fruitful area of research, spawning a number of different problems and algorithms for their solution. These algorithms vary in their goals, in the available training data sets, and in the learning strategies and representation of data. All of these algorithms, however, learn by searching through an n-dimensional space of a given data set to find an acceptable generalization. One of the most fundamental machine-learning tasks is inductive machine learning where a generalization is obtained from a set of samples, and it is formalized using different techniques and models.
We can define inductive learning as the process of estimating an unknown input–output dependency or structure of a system, using limited number of observations or measurements of inputs and outputs of the system. In the theory of inductive learning, all data in a learning process are organized, and each instance of input–output pairs we use a simple term known as a sample. The general learning scenario involves three components, represented in Figure 4.2.
1. a generator of random input vectors X,
2. a system that returns an output Y for a given input vector X, and
3. a learning machine that estimates an unknown (input X, output Y″) mapping of the system from the observed (input X, output Y) samples.
Figure 4.2. A learning machine uses observations of the system to form an approximation of its output.
This formulation is very general and describes many practical, inductive-learning problems such as interpolation, regression, classification, clustering, and density estimation. The generator produces a random vector X, which is drawn independently from any distribution. In statistical terminology, this situation is called an observational setting. It differs from the designed-experiment setting, which involves creating a deterministic sampling scheme, optimal for a specific analysis according to experimental design theory. The learning machine has no control over which input values were supplied to the system, and therefore, we are talking about an observational approach in inductive machine-learning systems.
The second component of the inductive-learning model is the system that produces an output value Y for every input vector