Data Mining_ Concepts and Techniques - Jiawei Han [426]
    holdout method, 370, 386
    random sampling, 370, 386
    recursive, 335
    tuples, 334
Partitioning Around Medoids (PAM) algorithm, 455–457
partitioning methods, 448, 451–457, 491
    centroid-based, 451–454
    global optimality, 449
    iterative relocation techniques, 448
    k-means, 451–454
    k-medoids, 454–457
    k-modes, 454
    object-based, 454–457
    see also cluster analysis
path-based similarity, 594
pattern analysis, in recommender systems, 282
pattern clustering, 308–310
pattern constraints, 297–300
pattern discovery, 601
pattern evaluation, 8
pattern evaluation measures, 267–271
    all_confidence, 268
    comparison, 269–270
    cosine, 268
    Kulczynski, 268
    max_confidence, 268
    null-invariant, 270–271
    see also measures
pattern space pruning, 295
pattern-based classification, 282, 318
pattern-based clustering, 282, 516
Pattern-Fusion, 302–307
    characteristics, 304
    core pattern, 304–305
    initial pool, 306
    iterative, 306
    merging subpatterns, 306
    shortcuts identification, 304
    see also colossal patterns
pattern-guided mining, 30
patterns
    actionable, 22
    co-location, 319
    colossal, 301–307, 320
    combined significance, 312
    constraint-based generation, 296–301
    context modeling of, 314–315
    core, 304–305
    distance, 309
    evaluation methods, 264–271
    expected, 22
    expressed, 309
    frequent, 17
    hidden meaning of, 314
    interesting, 21–23, 33
    metric space, 306–307
    negative, 280, 291–294, 320
    negatively correlated, 292, 293
    rare, 280, 291–294, 320
    redundancy between, 312
    relative significance, 312
    representative, 309
    search space, 303
    strongly negatively correlated, 292
    structural, 282
    type specification, 15–23
    unexpected, 22
    see also frequent patterns
pattern-trees, 264
Pearson's correlation coefficient, 222
percentiles, 48
perception-based classification (PBC), 348
    illustrated, 349
    as interactive visual approach, 607
    pixel-oriented approach, 348–349
    split screen, 349
    tree comparison, 350
phylogenetic trees, 590
pivot (rotate) operation, 148
pixel-oriented visualization, 57
planning and analysis tools, 153
point queries, 216, 217, 220
pool-based approach, 433
positive correlation, 55, 56
positive tuples, 364
positively skewed data, 47
possibility theory, 428
posterior probability, 351
postpruning, 344–345, 346
power law distribution, 592
precision measure, 368–369
predicate sets
    frequent, 288–289
    k, 289
predicates
    repeated, 288
    variables, 295
prediction, 19
    classification, 328
    link, 593–594
    loan payment, 608–609
    with naive Bayesian classification, 353–355
    numeric, 328, 385
prediction cubes, 227–230, 235
    example, 228–229
    Probability-Based Ensemble, 229–230
predictive analysis, 18–19
predictive mining tasks, 15
predictive statistics, 24
predictors, 328
prepruning, 344, 346
prime relations
    contrasting classes, 175, 177
    deriving, 174
    target classes, 175, 177
principal components analysis (PCA), 100, 102–103
    application of, 103
    correlation-based clustering with, 511
    illustrated, 103
    in lower-dimensional space extraction, 578
    procedure, 102–103
prior probability, 351
privacy-preserving data mining, 33, 621, 626
    distributed, 622
    k-anonymity method, 621–622
    l-diversity method, 622
    as mining trend, 624–625
    randomization methods, 621
    results effectiveness, downgrading, 622
probabilistic clusters, 502–503
probabilistic hierarchical clustering, 467–470
    agglomerative clustering framework, 467, 469
    algorithm, 470
    drawbacks of using, 469–470
    generative model, 467–469
    interpretability, 469
    understanding, 469
    see also hierarchical methods
probabilistic model-based clustering, 497–508, 538
    expectation-maximization algorithm, 505–508
    fuzzy clusters and, 499–501
    product reviews example, 498
    user search intent example, 498
    see also cluster analysis
probability
    estimation techniques, 355
    posterior, 351
    prior, 351
probability and statistical theory, 601
Probability-Based Ensemble (PBE), 229–230
PROCLUS, 511
profiles, 614
proximity measures, 67
    for binary attributes, 70–72
    for nominal attributes, 68–70
    for ordinal attributes, 74–75
proximity-based methods, 552, 560–567, 581
    density-based, 564–567
    distance-based, 561–562
    effectiveness, 552
    example, 552