Data Mining: Concepts and Techniques - Jiawei Han [183]
■ Knowledge type constraints: These specify the type of knowledge to be mined, such as association, correlation, classification, or clustering.
■ Data constraints: These specify the set of task-relevant data.
■ Dimension/level constraints: These specify the desired dimensions (or attributes) of the data, the abstraction levels, or the level of the concept hierarchies to be used in mining.
■ Interestingness constraints: These specify thresholds on statistical measures of rule interestingness such as support, confidence, and correlation.
■ Rule constraints: These specify the form of, or conditions on, the rules to be mined. Such constraints may be expressed as metarules (rule templates), as the maximum or minimum number of predicates that can occur in the rule antecedent or consequent, or as relationships among attributes, attribute values, and/or aggregates.
These constraints can be specified using a high-level declarative data mining query language and user interface.
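The five constraint types above can be pictured as fields of one declarative specification. The following is only a minimal sketch in Python, not the query language the text refers to: the class name MiningConstraints, its field names, and every value in the usage lines are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class MiningConstraints:
    """Hypothetical container for the five constraint types listed above."""
    knowledge_type: str                  # knowledge type constraint
    data_filter: Dict[str, str]          # data constraint: task-relevant data
    dimensions: List[str]                # dimension/level constraint
    min_support: float                   # interestingness constraint
    min_confidence: float                # interestingness constraint
    max_antecedent_predicates: Optional[int] = None  # rule constraint
    required_consequent: Optional[str] = None        # rule constraint

    def accepts(self, antecedent: List[str], consequent: str,
                support: float, confidence: float) -> bool:
        """Check one candidate rule against the interestingness and rule constraints."""
        if support < self.min_support or confidence < self.min_confidence:
            return False
        if (self.max_antecedent_predicates is not None
                and len(antecedent) > self.max_antecedent_predicates):
            return False
        if (self.required_consequent is not None
                and consequent != self.required_consequent):
            return False
        return True


# Illustrative values only; none of them come from the text.
constraints = MiningConstraints(
    knowledge_type="association",
    data_filter={"region": "Chicago"},
    dimensions=["age", "income", "buys"],
    min_support=0.02,
    min_confidence=0.6,
    max_antecedent_predicates=2,
    required_consequent="buys",
)
ok = constraints.accepts(["age", "income"], "buys", support=0.05, confidence=0.7)
too_long = constraints.accepts(["age", "income", "occupation"], "buys",
                               support=0.05, confidence=0.7)
```

Keeping the specification separate from the mining algorithm is what would let a query optimizer decide where each constraint is best applied; here the check is simply run on finished candidate rules.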
The first four constraint types have already been addressed in earlier sections of this book and this chapter. In this section, we discuss the use of rule constraints to focus the mining task. This form of constraint-based mining allows users to describe the rules that they would like to uncover, thereby making the data mining process more effective. In addition, a sophisticated mining query optimizer can be used to exploit the constraints specified by the user, thereby making the mining process more efficient.
Constraint-based mining encourages interactive exploratory mining and analysis. In Section 7.3.1, you will study metarule-guided mining, where syntactic rule constraints are specified in the form of rule templates. Section 7.3.2 discusses the use of pattern space pruning (which prunes patterns being mined) and data space pruning (which prunes pieces of the data space for which further exploration cannot contribute to the discovery of patterns satisfying the constraints).
For pattern space pruning, we introduce three classes of properties that facilitate constraint-based search space pruning: antimonotonicity, monotonicity, and succinctness. We also discuss a special class of constraints, called convertible constraints, which, given a proper ordering of the data, can be pushed deep into the iterative mining process and have the same pruning power as monotonic or antimonotonic constraints. For data space pruning, we introduce two classes of properties, data succinctness and data antimonotonicity, and study how they can be integrated within a data mining process.
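To give a rough feel for pattern space pruning, consider antimonotonicity: a constraint is antimonotonic if, whenever an itemset violates it, every superset of that itemset violates it as well. The Python sketch below assumes a hypothetical constraint, "the total price of the itemset is at most max_total", over non-negative prices. The function name, item names, and prices are invented for illustration, and support counting is omitted to keep the example short.

```python
from typing import Dict, FrozenSet, List


def satisfying_itemsets(prices: Dict[str, float],
                        max_total: float) -> List[FrozenSet[str]]:
    """Enumerate itemsets whose total price is <= max_total, level by level.

    Assumes all prices are non-negative, which makes the constraint
    antimonotonic: once an itemset exceeds max_total, no superset can
    satisfy the constraint, so violating itemsets are never extended.
    """
    items = sorted(prices)
    level = [frozenset([i]) for i in items if prices[i] <= max_total]
    result: List[FrozenSet[str]] = []
    while level:
        result.extend(level)
        next_level = set()
        for itemset in level:
            for item in items:
                if item in itemset:
                    continue
                candidate = itemset | {item}
                # Prune: keep only candidates that still satisfy the constraint.
                if sum(prices[i] for i in candidate) <= max_total:
                    next_level.add(candidate)
        level = sorted(next_level, key=sorted)
    return result


# Hypothetical prices, for illustration only.
prices = {"cable": 10, "mouse": 25, "monitor": 150}
found = satisfying_itemsets(prices, max_total=40)
```

With these prices, any itemset containing "monitor" is discarded at the first level, so none of its supersets is ever generated or tested.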
For ease of discussion, we assume that the user is searching for association rules. The procedures presented can be easily extended to the mining of correlation rules by adding a correlation measure of interestingness to the support-confidence framework.
7.3.1. Metarule-Guided Mining of Association Rules
“How are metarules useful?” Metarules allow users to specify the syntactic form of rules that they are interested in mining. The rule forms can be used as constraints to help improve the efficiency of the mining process. Metarules may be based on the analyst's experience, expectations, or intuition regarding the data or may be automatically generated based on the database schema.
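A metarule can be thought of as a template that candidate rules either fit or do not fit. The sketch below is one possible way to express that idea in Python, under simplifying assumptions: a template fixes only the number of antecedent predicates and the attribute (and optionally the value) of a single consequent predicate. The class names Predicate, Rule, and Metarule, and all attribute names and values, are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Predicate:
    attribute: str  # e.g., "age" or "buys"
    value: str      # e.g., "30..39" or "office software"


@dataclass(frozen=True)
class Rule:
    antecedent: Tuple[Predicate, ...]
    consequent: Tuple[Predicate, ...]


@dataclass(frozen=True)
class Metarule:
    """Syntactic template: fixed antecedent size, constrained consequent."""
    antecedent_size: int
    consequent_attribute: str
    consequent_value: Optional[str] = None  # None means any value is allowed

    def matches(self, rule: Rule) -> bool:
        if len(rule.antecedent) != self.antecedent_size:
            return False
        if len(rule.consequent) != 1:
            return False
        head = rule.consequent[0]
        if head.attribute != self.consequent_attribute:
            return False
        return self.consequent_value is None or head.value == self.consequent_value


# Hypothetical template: two customer traits imply a purchase of office software.
template = Metarule(antecedent_size=2, consequent_attribute="buys",
                    consequent_value="office software")

candidates = [
    Rule((Predicate("age", "30..39"), Predicate("income", "41K..60K")),
         (Predicate("buys", "office software"),)),
    Rule((Predicate("age", "30..39"),),
         (Predicate("buys", "office software"),)),
    Rule((Predicate("age", "20..29"), Predicate("credit_rating", "fair")),
         (Predicate("buys", "printer"),)),
]
matching = [rule for rule in candidates if template.matches(rule)]
```

Here the template is applied as a filter on finished rules. A mining system could instead use the same template earlier, to limit which predicate combinations are generated at all, which is where the efficiency gain mentioned above would come from.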
Metarule-guided mining
Suppose that as a market analyst for AllElectronics you have access to the data describing customers (e.g., customer age, address, and credit rating) as well as the list of customer transactions. You are interested in finding associations between customer traits and the items that customers buy. However, rather than