Data Mining - Mehmed Kantardzic [253]
The Andrews curves technique plots each n-dimensional sample as a curved line. The approach is similar to a Fourier transformation of a data point: a function f(t) in the time domain t transforms the n-dimensional point X = (x1, x2, x3, … , xn) into a continuous plot. The function is usually plotted in the interval −π ≤ t ≤ π. An example of the transforming function f(t) is
f(t) = x1/√2 + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + …
One advantage of this visualization is that it can represent many dimensions; the disadvantage, however, is the computational time required to display each n-dimensional point for large data sets.
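As an illustration, the following minimal sketch evaluates an Andrews curve for one sample. It assumes the commonly used form f(t) = x1/√2 + x2 sin(t) + x3 cos(t) + x4 sin(2t) + x5 cos(2t) + …; the function names andrews_curve and sample_curve are hypothetical and chosen only for this example.

```python
import math


def andrews_curve(x, t):
    """Evaluate the assumed Andrews function f(t) for one n-dimensional sample x."""
    value = x[0] / math.sqrt(2)            # constant term x1 / sqrt(2)
    for i, xi in enumerate(x[1:], start=1):
        k = (i + 1) // 2                   # harmonic index: 1, 1, 2, 2, 3, 3, ...
        if i % 2 == 1:
            value += xi * math.sin(k * t)  # x2, x4, ... multiply sine terms
        else:
            value += xi * math.cos(k * t)  # x3, x5, ... multiply cosine terms
    return value


def sample_curve(x, num_points=101):
    """Sample f(t) at evenly spaced values of t in the interval [-pi, pi]."""
    step = 2 * math.pi / (num_points - 1)
    ts = [-math.pi + j * step for j in range(num_points)]
    return ts, [andrews_curve(x, t) for t in ts]
```

Calling sample_curve once per sample and drawing the resulting (t, f(t)) pairs as a line gives one curve per data point. Each curve costs num_points evaluations of an n-term sum, which is where the computational cost for large data sets comes from.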
The class of geometric-projection techniques also includes techniques of exploratory statistics such as principal component analysis (PCA), factor analysis, and multidimensional scaling. The parallel-coordinate visualization technique and the radial visualization technique belong to this category as well, and they are explained in the next sections.
Another class of techniques for visual data mining is the icon-based techniques, or iconic-display techniques. The idea is to map each multidimensional data item to an icon. An example is the stick-figure technique. It maps two dimensions to the display dimensions, and the remaining dimensions are mapped to the angles and/or limb lengths of the stick-figure icon. This technique therefore limits the number of dimensions that can be visualized. A variety of special symbols have been invented to convey simultaneously the variations on several dimensions for the same sample. In 2-D displays, these include Chernoff’s faces, glyphs, stars, and color mapping.
Glyphs represent samples as complex symbols whose features are functions of the data. We think of glyphs as location-independent representations of samples. For a successful use of glyphs, however, some sort of suggestive layout is often essential, because this type of rendering relies primarily on the comparison of glyph shapes. If glyphs are used to enhance a scatter plot, the scatter plot takes over the layout functions.
Figure 15.2 shows how another icon-based technique, called a star display, is applied to quality-of-life measures for various states. Seven dimensions are represented by seven equally spaced radii of a circle, with one circle for each sample. Every dimension is normalized to the interval [0, 1], where the value 0 is at the center of the circle and the value 1 is at the end of the corresponding radius. This representation is convenient for a relatively large number of dimensions but only for a very small number of samples. It is usually used for comparative analyses of samples, and it may be included as a part of more complex visualizations.
Figure 15.2. A star display for data on seven quality-of-life measures for three states.
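The geometry of a star display can be sketched as follows. This is a minimal illustration rather than the procedure used to draw Figure 15.2: it assumes min-max normalization with known lower and upper bounds per dimension (each upper bound strictly greater than its lower bound), and the helper names normalize and star_vertices are hypothetical.

```python
import math


def normalize(values, lows, highs):
    """Min-max scale each dimension to [0, 1]; assumes highs[i] > lows[i]."""
    return [(v - lo) / (hi - lo) for v, lo, hi in zip(values, lows, highs)]


def star_vertices(normalized, center=(0.0, 0.0), radius=1.0):
    """Return one (x, y) vertex per dimension for a single sample.

    Dimension k is placed on the k-th of n equally spaced radii.
    A value of 0 lands on the center; a value of 1 lands on the circle.
    """
    n = len(normalized)
    cx, cy = center
    points = []
    for k, v in enumerate(normalized):
        angle = 2 * math.pi * k / n        # equally spaced radii
        points.append((cx + radius * v * math.cos(angle),
                       cy + radius * v * math.sin(angle)))
    return points
```

Connecting the returned vertices in order and closing the polygon gives the star for one sample; repeating this with a different center for each sample produces the side-by-side layout used for comparison.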
Another icon-based approach is the shape-coding technique, which visualizes an arbitrary number of dimensions. The icon used in this approach maps each dimension to a small array of pixels and arranges the pixel arrays of each data item into a square or a rectangle. The pixels corresponding to each dimension are mapped to a gray scale or color according to the dimension’s data value. The small squares or rectangles corresponding to the data items or samples are then arranged successively, line by line.
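The layout described above can be sketched in a few lines. This is a simplified illustration under stated assumptions: each dimension is reduced to a single cell instead of a small pixel array, values are already normalized to [0, 1], gray levels run from 0 to 255, and unused cells are filled with 0. The names item_block and arrange_blocks are hypothetical.

```python
import math


def item_block(normalized, fill=0):
    """Pack one data item's dimensions into a square block of gray levels."""
    d = len(normalized)
    side = math.isqrt(d - 1) + 1           # smallest side with side * side >= d
    cells = [int(255 * v) for v in normalized]
    cells += [fill] * (side * side - d)    # pad unused cells of the square
    return [cells[r * side:(r + 1) * side] for r in range(side)]


def arrange_blocks(items, items_per_row, fill=0):
    """Place the blocks of successive items line by line into one image."""
    blocks = [item_block(item, fill) for item in items]
    side = len(blocks[0])                  # all items share the same dimensionality
    image = []
    for start in range(0, len(blocks), items_per_row):
        row_blocks = blocks[start:start + items_per_row]
        for r in range(side):
            line = []
            for block in row_blocks:
                line.extend(block[r])
            # pad the last, possibly incomplete, row of blocks
            line.extend([fill] * (side * (items_per_row - len(row_blocks))))
            image.append(line)
    return image
```

The result is a list of pixel rows that any image library could render; replacing the gray-level mapping in item_block with a color map would give the colored variant.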
The third class of visualization techniques for multidimensional data, the pixel-oriented techniques, maps each data value to a colored pixel and presents the data values belonging to each attribute in separate windows. Since the pixel-oriented techniques use only one pixel per data value, they allow a visualization of the largest amount of data possible on current displays (up to about 1,000,000 data values). If one pixel represents one data value, the main question is how to arrange the pixels on the screen. These techniques