Data Mining_ Concepts and Techniques - Jiawei Han [414]
    challenges, 426
categorical attributes, 41
CBA. see Classification Based on Associations
CBLOF. see cluster-based local outlier factor
CELL method, 562, 563
cells, 10–11
    aggregate, 189
    ancestor, 189
    base, 189
    descendant, 189
    dimensional, 189
    exceptions, 231
    residual value, 234
central tendency measures, 39, 44, 45–47
    mean, 45–46
    median, 46–47
    midrange, 47
    for missing values, 88
    models, 47
centroid distance, 108
CF-trees, 462–463, 464
    nodes, 465
    parameters, 464
    structure illustration, 464
CHAID, 343
Chameleon, 459, 466–467
    clustering illustration, 466
    relative closeness, 467
    relative interconnectivity, 466–467
    see also hierarchical methods
Chernoff faces, 60
    asymmetrical, 61
    illustrated, 62
ChiMerge, 117
chi-square test, 95
chunking, 195
chunks, 195
    2-D, 197
    3-D, 197
    computation of, 198
    scanning order, 197
CLARA. see Clustering Large Applications
CLARANS. see Clustering Large Applications based upon Randomized Search
class comparisons, 166, 175, 180
    attribute-oriented induction for, 175–178
    mining, 176
    presentation of, 175–176
    procedure, 175–176
class conditional independence, 350
class imbalance problem, 384–385, 386
    ensemble methods for, 385
    on multiclass tasks, 385
    oversampling, 384–385, 386
    threshold-moving approach, 385
    undersampling, 384–385, 386
class label attributes, 328
class-based ordering, 357
class/concept descriptions, 15
classes, 15, 166
    contrasting, 15
    equivalence, 427
    target, 15
classification, 18, 327–328, 385
    accuracy, 330
    accuracy improvement techniques, 377–385
    active learning, 433–434
    advanced methods, 393–442
    applications, 327
    associative, 415, 416–419, 437
    automatic, 445
    backpropagation, 393, 398–408, 437
    bagging, 379–380
    basic concepts, 327–330
    Bayes methods, 350–355
    Bayesian belief networks, 393–397, 436
    boosting, 380–382
    case-based reasoning, 425–426
    of class-imbalanced data, 383–385
    confusion matrix, 365–366, 386
    costs and benefits, 373–374
    decision tree induction, 330–350
    discriminative frequent pattern-based, 437
    document, 430
    ensemble methods, 378–379
    evaluation metrics, 364–370
    example, 19
    frequent pattern-based, 393, 415–422, 437
    fuzzy set approaches, 428–429, 437
    general approach to, 328
    genetic algorithms, 426–427, 437
    heterogeneous networks, 593
    homogeneous networks, 593
    IF-THEN rules for, 355–357
    interpretability, 369
    k-nearest-neighbor, 423–425
    lazy learners, 393, 422–426
    learning step, 328
    model representation, 18
    model selection, 364, 370–377
    multiclass, 430–432, 437
    in multimedia data mining, 596
    neural networks for, 19, 398–408
    pattern-based, 282, 318
    perception-based, 348–350
    precision measure, 368–369
    as prediction problem, 328
    process, 328
    process illustration, 329
    random forests, 382–383
    recall measure, 368–369
    robustness, 369
    rough set approach, 427–428, 437
    rule-based, 355–363, 386
    scalability, 369
    semi-supervised, 432–433, 437
    sentiment, 434
    spatial, 595
    speed, 369
    support vector machines (SVMs), 393, 408–415, 437
    transfer learning, 434–436
    tree pruning, 344–347, 385
    web-document, 435
Classification Based on Associations (CBA), 417
Classification based on Multiple Association Rules (CMAR), 417–418
Classification based on Predictive Association Rules (CPAR), 418–419
classification-based outlier detection, 571–573, 582
    one-class model, 571–572
    semi-supervised learning, 572
    see also outlier detection
classifiers, 328
    accuracy, 330, 366
    bagged, 379–380
    Bayesian, 350, 353
    case-based reasoning, 425–426
    comparing with ROC curves, 373–377
    comparison aspects, 369
    decision tree, 331
    error rate, 367
    k-nearest-neighbor, 423–425
    Naive Bayesian, 351–352
    overfitting data, 330
    performance evaluation metrics, 364–370
    recognition rate, 366–367
    rule-based, 355
Clementine, 603, 606
CLIQUE, 481–483
    clustering steps, 481–482
    effectiveness, 483
    strategy, 481
    see also cluster analysis; grid-based methods
closed data cubes, 192
closed frequent itemsets, 247, 308
    example, 248
    mining, 262–264
    shortcomings for compression, 308–309
closed graphs, 591
closed patterns, 280
    top-k most frequent, 307