Data Mining_ Concepts and Techniques - Jiawei Han [420]
five-number summary 49
distributive measures 145
Divisive Analysis (DIANA) 459, 460
divisive hierarchical method 459
  agglomerative hierarchical clustering versus 459–460
  DIANA 459, 460
DNA chips 512
document classification 430
documents
  language model 26
  topic model 26–27
drill-across operation 148
drill-down operation 11, 146–147
drill-through operation 148
dynamic itemset counting 256
E
eager learners 423, 437
Eclat (Equivalence Class Transformation) algorithm 260, 272
e-commerce 609
editing method 425
efficiency
  Apriori algorithm 255–256
  backpropagation 404
  data mining algorithms 31
elbow method 486
email spam filtering 435
engineering applications 613
ensemble methods 378–379, 386
  bagging 379–380
  boosting 380–382
  for class imbalance problem 385
  random forests 382–383
  types of 378, 386
enterprise warehouses 132
entity identification problem 94
entity-relationship (ER) data model 9, 139
epoch updating 404
equal-frequency histograms 107, 116
equal-width histograms 107, 116
equivalence classes 427
error rates 367
error-correcting codes 431–432
Euclidean distance 72
  mathematical properties 72–73
  weighted 74
  see also distance measures
evaluation metrics 364–370
evolution, of database system technology 3–5
evolutionary searches 579
exception-based, discovery-driven exploration 231–234, 235
exceptions 231
exhaustive rules 358
expectation-maximization (EM) algorithm 505–508, 538
  expectation step (E-step) 505
  fuzzy clustering with 505–507
  maximization step (M-step) 505
  for mixture models 507–508
  for probabilistic model-based clustering 507–508
  steps 505
  see also probabilistic model-based clustering
expected values 97
  cell 234
exploratory data mining. see multidimensional data mining
extraction
  data 134
  rule, from decision tree 357–359
extraction/transformation/loading (ETL) tools 93
extractors 151
F
fact constellation 141
  example 141–142
  illustrated 142
fact tables 136
  summary 165
factor analysis 600
facts 136
false negatives 365
false positives 365
farthest-neighbor clustering algorithm 462
field overloading 92
financial data analysis 607–609
  credit policy analysis 608–609
  crimes detection 609
  data warehouses 608
  loan payment prediction 608–609
  targeted marketing 609
FindCBLOF algorithm 569–570
five-number summary 49
fixed-width clustering 570
FOIL 359, 363, 418
Forest-RC 383
forward algorithm 591
FP-growth 257–259, 272
  algorithm illustration 260
  example 257–258
  performance 259
FP-trees 257
  condition pattern base 258
  construction 257–258
  main memory-based 259
  mining 258, 259
Frag-Shells 212, 213
fraudulent analysis 610–611
frequency patterns
  approximate 281, 307–312
  compressed 281, 307–312
  constraint-based 281
  near-match 281
  redundancy-aware top-k 281
  top-k 281
frequent itemset mining 18, 272, 282
  Apriori algorithm 248–253
  closed patterns 262–264
  market basket analysis 244–246
  max patterns 262–264
  methods 248–264
  pattern-growth approach 257–259
  with vertical data format 259–262, 272
frequent itemsets 243, 246, 272
  association rule generation from 253, 254
  closed 247, 248, 262–264, 308
  finding 247
  finding by confined candidate generation 248–253
  maximal 247, 248, 262–264, 308
  subsets 309
frequent pattern mining 279
  advanced forms of patterns 320
  application domain-specific semantics 282
  applications 317–319, 321
  approximate patterns 307–312
  classification criteria 280–283
  colossal patterns 301–307
  compressed patterns 307–312
  constraint-based 294–301, 320
  data analysis usages 282
  for data cleaning 318
  direct discriminative 422
  high-dimensional data 301–307
  in high-dimensional space 320
  in image data analysis 319
  for indexing structures 319
  kinds of data and features 282
  multidimensional associations 287–289
  in multilevel, multidimensional space 283–294
  multilevel associations 283–294
  in multimedia data analysis 319
  negative patterns 291–294
  for noise filtering 318
  Pattern-Fusion 302–307
  quantitative association rules 289–291
  rare patterns 291