Finish last time’s slides
Kolmogorov-Smirnov Test (not on midterm!)
Kolmogorov-Smirnov Test
Population: some population distribution with c.d.f. $F$
Sample: $Y_1, \dots, Y_n$ i.i.d. from the population, $Y_i \sim F$
Parameter: the whole CDF, $F$
Null hypothesis: $H_0: F = F_0$, versus $H_A: F \neq F_0$
Kolmogorov-Smirnov Test
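Test statistic $D = \sup_x \left| \hat{F}(x) - F_0(x) \right|$
where $\hat{F}$ is the empirical cumulative distribution function (ECDF): $\hat{F}(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{Y_i \le x\}$,
and $F_0$ is the cumulative distribution function for the null hypothesized distribution.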
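A minimal R sketch of how $D$ can be computed from the definition. The helper `ks_D()` is hypothetical (not part of any package). It relies on the fact that, for a continuous $F_0$, the supremum is reached at, or just below, one of the order statistics. The simulated data are only for the check against `ks.test()`.

```r
# ks_D() is a hypothetical helper written for this sketch.
ks_D <- function(y, F0, ...) {
  n <- length(y)
  u <- F0(sort(y), ...)        # F0 at the order statistics y_(1) <= ... <= y_(n)
  i <- seq_len(n)
  # Between order statistics Fhat is flat and F0 is non-decreasing, so
  # Fhat - F0 peaks at y_(i) (value i/n - F0(y_(i))) and F0 - Fhat peaks
  # just below y_(i) (value F0(y_(i)) - (i - 1)/n).
  max(i / n - u, u - (i - 1) / n)
}

set.seed(1)
y <- runif(15)
ks_D(y, punif)
unname(ks.test(y, punif)$statistic)   # should match ks_D(y, punif)
```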
ECDF: Example
Sample values: 1.8, 2.2, 2.7, 5.7, 6.9, 7.4, 8.1, 8.7, 9 and 9.5
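The ECDF of this sample can be checked in R. `Fhat_def` is a hypothetical, literal translation of the indicator-sum definition. `ecdf()` is base R's built-in step function, which also counts observations $\le x$.

```r
y <- c(1.8, 2.2, 2.7, 5.7, 6.9, 7.4, 8.1, 8.7, 9, 9.5)

# Literal translation of Fhat(x) = (1/n) * sum(1{Y_i <= x}) (hypothetical helper)
Fhat_def <- function(x) sapply(x, function(t) mean(y <= t))

Fhat <- ecdf(y)            # base R's right-continuous step-function ECDF
x <- c(2, 6.8, 6.9, 10)
Fhat(x)                    # 0.1 0.4 0.5 1.0
Fhat_def(x)                # same values
```

Note the jump at 6.9: $\hat{F}$ is 0.4 just below 6.9 and 0.5 at 6.9.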

KS test statistic: Uniform(0, 10)
Say, $H_0: Y_i \sim \text{Uniform}(0, 10)$,
i.e. $F_0(y) = y/10$ for $0 \le y \le 10$.

KS test statistic: Uniform(0, 10) cont.
$D = |\hat{F}(6.9^-) - F_0(6.9)| = |0.4 - 0.69| = 0.29$ using the rounded sample values (occurs at just less than 6.9)
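A short R sketch that locates the largest gap for the Uniform(0, 10) null, using the rounded sample values as printed. The variable names are illustrative. With these values $D = 0.29$. The `ks.test()` output later reports $D = 0.28632$, which presumably comes from the unrounded data (a fifth value of about 6.863 rather than 6.9).

```r
y <- sort(c(1.8, 2.2, 2.7, 5.7, 6.9, 7.4, 8.1, 8.7, 9, 9.5))
n <- length(y)
F0 <- punif(y, min = 0, max = 10)    # F0(y) = y / 10

below <- F0 - (seq_len(n) - 1) / n   # gap F0 - Fhat just below each y_(i)
at    <- seq_len(n) / n - F0         # gap Fhat - F0 at each y_(i)

D <- max(below, at)
D                      # 0.29
y[which.max(below)]    # 6.9: the sup is approached from just below 6.9
```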

KS test statistic: Normal(5, 6.25)

KS test statistic: Example cont.
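$D = |\hat{F}(6.9^-) - F_0(6.9)| = |0.4 - \Phi(0.76)| \approx 0.38$ using the rounded sample values, where $F_0(y) = \Phi\big((y - 5)/2.5\big)$ taking 6.25 as the variance (occurs at just less than 6.9)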
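The same computation for the Normal null, as a sketch. It assumes Normal(5, 6.25) means mean 5 and variance 6.25, so the standard deviation passed to `pnorm()` is 2.5. It uses the rounded sample values as printed.

```r
y <- sort(c(1.8, 2.2, 2.7, 5.7, 6.9, 7.4, 8.1, 8.7, 9, 9.5))
n <- length(y)
# Assumption: 6.25 is the variance, so sd = 2.5 (pnorm takes the sd)
F0 <- pnorm(y, mean = 5, sd = 2.5)

below <- F0 - (seq_len(n) - 1) / n   # gap F0 - Fhat just below each y_(i)
at    <- seq_len(n) / n - F0         # gap Fhat - F0 at each y_(i)

D <- max(below, at)
round(D, 3)            # 0.376
y[which.max(below)]    # 6.9
```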

Reference Distribution?
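Under $H_0$ (with $F_0$ continuous), $\sqrt{n}\, D \xrightarrow{d} K$ as $n \to \infty$, where $K$ has the Kolmogorov distribution.
Reject $H_0$ for large values of $D$.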
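A sketch of the large-sample reference distribution. It uses the standard series for the Kolmogorov c.d.f., $P(K \le x) = 1 - 2\sum_{k \ge 1} (-1)^{k-1} e^{-2k^2x^2}$. The helper `pkolm()` is hypothetical and truncates the series at `kmax` terms. For the example below ($n = 10$), the asymptotic p-value comes out noticeably larger than the 0.3209 reported by `ks.test()`, which computes an exact p-value for small samples without ties. The approximation is rough at this sample size.

```r
# pkolm() is a hypothetical helper: Kolmogorov CDF via its alternating series,
#   P(K <= x) = 1 - 2 * sum_{k >= 1} (-1)^(k - 1) * exp(-2 * k^2 * x^2),  x > 0,
# truncated at kmax terms (accurate unless x is very close to 0).
pkolm <- function(x, kmax = 100) {
  sapply(x, function(t) {
    if (t <= 0) return(0)
    k <- seq_len(kmax)
    1 - 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * t^2))
  })
}

pkolm(1.358)            # about 0.95: sqrt(n) * D > 1.358 rejects at roughly 5%

# Large-sample p-value for the Uniform(0, 10) example (D and n from the R output)
D <- 0.28632
n <- 10
1 - pkolm(sqrt(n) * D)
```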
In R
ks.test(x = y, y = punif, min = 0, max = 10)
##
## One-sample Kolmogorov-Smirnov test
##
## data: y
## D = 0.28632, p-value = 0.3209
## alternative hypothesis: two-sided
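The p-value above can be approximated by simulation, as a sketch. For a continuous $F_0$, the null distribution of $D$ does not depend on $F_0$. We can therefore draw many samples of size 10 from Uniform(0, 10), compute $D$ for each, and count how often it is at least the observed 0.28632. The seed and the number of replicates are arbitrary choices.

```r
set.seed(2024)
n <- 10
D_obs <- 0.28632      # observed statistic from the ks.test output above

# D for 10000 samples drawn under H0: Uniform(0, 10)
D_null <- replicate(10000, {
  u <- sort(runif(n, 0, 10)) / 10     # F0(y) = y / 10 at the order statistics
  i <- seq_len(n)
  max(i / n - u, u - (i - 1) / n)
})

mean(D_null >= D_obs)   # Monte Carlo p-value; compare with 0.3209
```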
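One-sided tests
Lesser alternative: $H_A: F(x) \le F_0(x)$ for all $x$, with strict inequality for some $x$
Test statistic $D^- = \sup_x \left( F_0(x) - \hat{F}(x) \right)$
Greater alternative: $H_A: F(x) \ge F_0(x)$ for all $x$, with strict inequality for some $x$
Test statistic $D^+ = \sup_x \left( \hat{F}(x) - F_0(x) \right)$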
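A sketch computing $D^+$ and $D^-$ from the order statistics and checking them against `ks.test()` with `alternative = "greater"` and `"less"`. The setup mirrors the simulated example that follows. Data come from Normal(0, 1), and the null is Normal with mean 0 and sd 10. The seed is an arbitrary choice, so the numbers will not match the slide's output.

```r
set.seed(3)
n <- 20
y <- rnorm(n, 0, 1)

u <- pnorm(sort(y), 0, 10)       # F0 = Normal(mean 0, sd 10) at the order statistics
i <- seq_len(n)

D_plus  <- max(i / n - u)        # sup_x (Fhat(x) - F0(x))
D_minus <- max(u - (i - 1) / n)  # sup_x (F0(x) - Fhat(x))

c(D_plus, D_minus)
unname(ks.test(y, pnorm, 0, 10, alternative = "greater")$statistic)  # D^+
unname(ks.test(y, pnorm, 0, 10, alternative = "less")$statistic)     # D^-
```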
One-sided tests are hard to interpret
Example based on simulated data.
n <- 20
y <- rnorm(n, 0, 1)
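For the greater alternative, $H_A: F(x) \ge F_0(x)$, where $F_0$ is the c.d.f. of the Normal with mean 0 and standard deviation 10 (the values passed to pnorm).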
ks.test(y, pnorm, 0, 10, alternative = "greater")
##
## One-sample Kolmogorov-Smirnov test
##
## data: y
## D^+ = 0.42016, p-value = 0.000513
## alternative hypothesis: the CDF of x lies above the null hypothesis
One-sided tests are hard to interpret
For the lesser alternative, $H_A: F(x) \le F_0(x)$:
ks.test(y, pnorm, 0, 10, alternative = "less")
##
## One-sample Kolmogorov-Smirnov test
##
## data: y
## D^- = 0.44858, p-value = 0.0001717
## alternative hypothesis: the CDF of x lies below the null hypothesis
One-sided tests
Combined, the two one-sided alternatives do not cover all the possibilities under which the null hypothesis is false: the true $F$ can lie above $F_0$ for some $x$ and below it for others.
This makes one-sided KS tests very hard to interpret, i.e. don’t do a one-sided test.
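In the simulated example, the true c.d.f. (Normal(0, 1)) crosses the null c.d.f. (Normal with mean 0, sd 10) at 0. Neither one-sided alternative holds, yet both one-sided tests gave small p-values. A quick R check of the crossing, with evaluation points chosen for illustration:

```r
x <- c(-2, -1, 1, 2)
F_true <- pnorm(x, 0, 1)    # c.d.f. the data were simulated from
F_null <- pnorm(x, 0, 10)   # F0 used in the one-sided tests
F_true > F_null             # FALSE FALSE TRUE TRUE: F is below F0 left of 0, above it right of 0
```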
Estimating parameters
The KS test should only be used if you can completely specify $F_0$, the population distribution under the null hypothesis.
You should not estimate parameters from the data and then do the test.
It is kind of like trying to test $H_0: \mu = \bar{y}$: you’ll rarely reject.
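A small simulation sketch of why estimating parameters first is a problem. Every sample here truly is Normal, so a valid 5% test should reject about 5% of the time. Plugging the sample mean and sd into `pnorm()` makes $F_0$ hug each sample, so the rejection rate falls far below 5%. The sample size, seed, true parameters, and number of replicates are arbitrary choices.

```r
set.seed(4)
n <- 30
reject <- replicate(2000, {
  y <- rnorm(n, mean = 5, sd = 2)              # "y is Normal" is true here
  # Estimated mean and sd are passed on to pnorm, so F0 is fitted to this sample
  p <- ks.test(y, pnorm, mean(y), sd(y))$p.value
  p < 0.05
})
mean(reject)   # rejection rate at nominal level 0.05; expected to be far below 0.05
```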
Next time…
After midterm: what if distribution is discrete?