Data Mining: Concepts and Techniques - Jiawei Han [423]
item skipping263
items13
itemsets246
candidate251, 252
dependent266
dynamic counting256
imbalance ratio (IR)270, 271
negatively correlated292
occurrence independence266
strongly negatively correlated292 see also frequent itemsets
iterative Pattern-Fusion306
iterative relocation techniques448
J
Jaccard coefficient71
join indexing161–163, 179
K
k-anonymity method621–622
Karush-Kuhn-Tucker (KKT) conditions412
k-distance neighborhoods565
kernel density estimation477–478
kernel function415
k-fold cross-validation370–371
k-means451–454
algorithm452
application of454
CLARANS457
within-cluster variation451, 452
clustering by453
drawback of454–455
functioning of452
scalability454
time complexity453
variants453–454
k-means clustering536
k-medoids454–457
absolute-error criterion455
cost function for456
PAM455–457
k-nearest-neighbor classification423
closeness423
distance-based comparisons425
editing method425
missing values and424
number of neighbors424–425
partial distance method425
speed425
knowledge
background30–31
mining29
presentation8
representation33
transfer434
knowledge bases5, 8
knowledge discovery
data mining in7
process8
knowledge discovery from data (KDD)6
knowledge extraction. see data mining
knowledge mining. see data mining
knowledge type constraints294
k-predicate sets289
Kulczynski measure268, 272
negatively correlated pattern based on293–294
L
language model26
Laplacian correction355
lattice of cuboids139, 156, 179, 188–189, 234
lazy learners393, 422–426, 437
case-based reasoning classifiers425–426
k-nearest-neighbor classifiers423–425
l-diversity method622
learning
active430, 433–434, 437
backpropagation400
as classification step328
connectionist398
by examples445
by observation445
rate397
semi-supervised572
supervised330
transfer430, 434–436, 438
unsupervised330, 445, 490
learning rates403–404
leave-one-out371
lift266, 272
correlation analysis with266–267
likelihood ratio statistic363
linear regression90, 105
multiple106
linearly412–413
linearly inseparable data413–415
link mining594
link prediction594
load, in back-end tools/utilities134
loan payment prediction608–609
local outlier factor566–567
local proximity-based outliers564–565
logistic function402
log-linear models106
lossless compression100
lossy compression100
lower approximation427
M
machine learning24–26
active25
data mining similarities26
semi-supervised25
supervised24
unsupervised25
Mahalanobis distance556
majority voting335
Manhattan distance72–73
MaPle519
margin410
market basket analysis244–246, 271–272
example244
illustrated244
Markov chains591
materialization
full159, 179, 234
iceberg cubes319
no159
partial159–160, 192, 234
semi-offline226
max patterns280
max_confidence measure268, 272
maximal frequent itemsets247, 308
example248
mining262–264
shortcomings for compression308–309
maximum marginal hyperplane (MMH)409
SVM finding412
maximum normed residual test555
mean39, 45
bin, smoothing by89
example45
for missing values88
trimmed46
weighted arithmetic45
measures145
accuracy-based369
algebraic145
all_confidence272
antimonotonic194
attribute selection331
categories of145
of central tendency39, 44, 45–47
correlation266
data cube145
dispersion48–51
distance72–74, 461–462
distributive145
holistic145
Kulczynski272
max_confidence272
of multidimensional databases146
null-invariant272
pattern evaluation267–271
precision368–369
proximity67, 68–72
recall368–369
sensitivity367
significance312
similarity/dissimilarity65–78
specificity367
median39, 46
bin, smoothing by89
example46
formula46–47
for missing values88
metadata92, 134, 178
business135
importance135
operational135
repositories134–135
metarule-guided mining
of association rules295–296
example295–296
metrics73
classification