Data Mining_ Concepts and Techniques - Jiawei Han [425]
nonvolatile data warehouses, 127
normalization, 112, 120
data transformation by, 113–115
by decimal scaling, 115
min-max, 114
z-score, 114–115
null rules, 92
null-invariant measures, 270–271, 272
null-transactions, 270, 272
number of, 270
problem, 292–293
numeric attributes, 43–44, 79
covariance analysis, 98
interval-scaled, 43, 79
ratio-scaled, 43–44, 79
numeric data, dissimilarity on, 72–74
numeric prediction, 328, 385
classification, 328
support vector machines (SVMs) for, 408
numerosity reduction, 86, 100, 120
techniques, 100
O
object matching, 94
objective interestingness measures, 21–22
one-class model, 571–572
one-pass cube computation, 198
one-versus-all (OVA), 430
online analytical mining (OLAM), 155, 227
online analytical processing (OLAP), 4, 33, 128, 179
access patterns, 129
data contents, 128
database design, 129
dice operation, 148
drill-across operation, 148
drill-down operation, 11, 135–136, 146
drill-through operation, 148
example operations, 147
functionalities of, 154
hybrid OLAP, 164–165, 179
indexing, 125, 160–163
in information networks, 594
in knowledge discovery process, 125
market orientation, 128
multidimensional (MOLAP), 132, 164, 179
OLTP versus, 128–129, 130
operation integration, 125
operations, 146–148
pivot (rotate) operation, 148
queries, 129, 130, 163–164
query processing, 125, 163–164
relational OLAP, 132, 164, 165, 179
roll-up operation, 11, 135–136, 146
sample data effectiveness, 219
server architectures, 164–165
servers, 132
slice operation, 148
spatial, 595
statistical databases versus, 148–149
user-control versus automation, 167
view, 129
online transaction processing (OLTP), 128
access patterns, 129
customer orientation, 128
data contents, 128
database design, 129
OLAP versus, 128–129, 130
view, 129
operational metadata, 135
OPTICS, 473–476
cluster ordering, 474–475, 477
core-distance, 475
density estimation, 477
reachability-distance, 475
structure, 476
terminology, 476 see also cluster analysis; density-based methods
ordered attributes, 103
ordering
class-based, 358
dimensions, 210
rule, 357
ordinal attributes, 42, 79
dissimilarity between, 75
example, 42
proximity measures, 74–75
outlier analysis, 20–21
clustering-based techniques, 66
example, 21
in noisy data, 90
spatial, 595
outlier detection, 543–584
angle-based (ABOD), 580
application-specific, 548–549
categories of, 581
CELL method, 562–563
challenges, 548–549
clustering analysis and, 543
clustering for, 445
clustering-based methods, 552–553, 560–567
collective, 548, 575–576
contextual, 546–547, 573–575
distance-based, 561–562
extending, 577–578
global, 545
handling noise in, 549
in high-dimensional data, 576–580, 582
with histograms, 558–560
intrusion detection, 569–570
methods, 549–553
mixture of parametric distributions, 556–558
multivariate, 556
novelty detection relationship, 545
proximity-based methods, 552, 560–567, 581
semi-supervised methods, 551
statistical methods, 552, 553–560, 581
supervised methods, 549–550
understandability, 549
univariate, 554
unsupervised methods, 550
outlier subgraphs, 576
outliers
angle-based, 20, 543, 544, 580
collective, 547–548, 581
contextual, 545–547, 573, 581
density-based, 564
distance-based, 561
example, 544
global, 545, 581
high-dimensional, modeling, 579–580
identifying, 49
interpretation of, 577
local proximity-based, 564–565
modeling, 548
in small clusters, 571
types of, 545–548, 581
visualization with boxplot, 555
oversampling, 384, 386
example, 384–385
P
pairwise alignment, 590
pairwise comparison, 372
PAM. see Partitioning Around Medoids algorithm
parallel and distributed data-intensive mining algorithms, 31
parallel coordinates, 59, 62
parametric data reduction, 105–106
parametric statistical methods, 553–558
Pareto distribution, 592
partial distance method, 425
partial materialization, 159–160, 179, 234
strategies, 192
partition matrix, 538
partitioning
algorithms, 451–457
in Apriori efficiency, 255–256
bootstrapping, 371, 386
criteria, 447
cross-validation, 370–371, 386
Gini