Data Mining_ Concepts and Techniques - Jiawei Han [429]
quality control 600
regression 599
survival analysis 600
statistical databases (SDBs) 148
OLAP systems versus 148–149
statistical descriptions 24, 79
graphic displays 44–45, 51–56
measuring the dispersion 48–51
statistical hypothesis test 24
statistical models 23–24
of networks 592–594
statistical outlier detection methods 552, 553–560, 581
computational cost of 560
for data analysis 625
effectiveness 552
example 552
nonparametric 553, 558–560
parametric 553–558 see also outlier detection
statistical theory, in exceptional behavior disclosure 291
statistics 23
inferential 24
predictive 24
StatSoft 602, 603
stepwise backward elimination 105
stepwise forward selection 105
stick figure visualization 61–63
STING 479–481
advantages 480–481
as density-based clustering method 480
hierarchical structure 479, 480
multiresolution approach 481 see also cluster analysis; grid-based methods
stratified cross-validation 371
stratified samples 109–110
stream data 598, 624
strong association rules 272
interestingness and 264–265
misleading 265
Structural Clustering Algorithm for Networks (SCAN) 531–532
structural context-based similarity 526
structural data analysis 319
structural patterns 282
structure similarity search 592
structures
as contexts 575
discovery of 318
indexing 319
substructures 243
Student's t-test 372
subcube queries 216, 217–218
sub-itemset pruning 263
subjective interestingness measures 22
subject-oriented data warehouses 126
subsequence 589
matching 587
subset checking 263–264
subset testing 250
subspace clustering 448
frequent patterns for 318–319
subspace clustering methods 509, 510–511, 538
biclustering 511
correlation-based 511
examples 538
subspace search methods 510–511
subspaces
bottom-up search 510–511
cube space 228–229
outliers in 578–579
top-down search 511
substitution matrices 590
substructures 243
sum of the squared error (SSE) 501
summary fact tables 165
superset checking 263
supervised learning 24, 330
supervised outlier detection 549–550
challenges 550
support 21
association rule 21
group-based 286
reduced 285, 286
uniform 285–286
support, rule 245, 246
support vector machines (SVMs) 393, 408–415, 437
interest in 408
maximum marginal hyperplane 409, 412
nonlinear 413–415
for numeric prediction 408
with sigmoid kernel 415
support vectors 411
for test tuples 412–413
training/testing speed improvement 415
support vectors 411, 437
illustrated 411
SVM finding 412
supremum distance 73–74
surface web 597
survival analysis 600
SVMs. see support vector machines
symbolic sequences 586, 588
applications 589
sequential pattern mining in 588–589
symmetric binary dissimilarity 70
synchronous generalization 175
T
tables 9
attributes 9
contingency 95
dimension 136
fact 165
tuples 9
tag clouds 64, 66
Tanimoto coefficient 78
target classes 15, 180
initial working relations 177
prime relation 175, 177
targeted marketing 609
taxonomy formation 20
technologies 23–27, 33, 34
telecommunications industry 611
temporal data 14
term-frequency vectors 77
cosine similarity between 78
sparse 77
table 77
terminating conditions 404
test sets 330
test tuples 330
text data 14
text mining 596–597, 624
theoretical foundations 600–601, 625
three-layer neural networks 399
threshold-moving approach 385
tilted time windows 598
timeliness, data 85
time-series data 586, 587
cyclic movements 588
discretization and 590
illustrated 588
random movements 588
regression analysis 587–588
seasonal variations 588
shapelets method 590
subsequence matching 587
transformation into aggregate approximations 587
trend analysis 588
trend or long-term movements 588
time-series data analysis 319
time-series forecasting 588
time-variant data warehouses 127
top-down design approach 133, 151
top-down subspace search 511
top-down view 151
topic model 26–27
top-k patterns/rules 281
top-k queries 225
example 225–226
ranking cubes to answer 226–227
results 225
user-specified preference components 225
top-k