Kolmogorov-Smirnov Test ST551 Lecture 16

Finish last time’s slides

Kolmogorov-Smirnov Test (not on midterm!)

Kolmogorov-Smirnov Test

Population: $Y \sim$ some population distribution with c.d.f. $F$

Sample: $n$ i.i.d. observations from the population, $Y_1, \ldots, Y_n$

Parameter: Whole CDF

Null hypothesis: $H_0: F = F_0$, versus $H_A: F \neq F_0$

Kolmogorov-Smirnov Test

Test statistic

$$D(F_0) = \sup_y \left| \hat{F}(y) - F_0(y) \right|$$

where $\hat{F}(y)$ is the empirical cumulative distribution function:

$$\hat{F}(y) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}\{Y_i \le y\}$$

and $F_0$ is the cumulative distribution function of the null hypothesized distribution.

ECDF: Example

Sample values: 1.8, 2.2, 2.7, 5.7, 6.9, 7.4, 8.1, 8.7, 9 and 9.5
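
The e.c.d.f. of these ten values can be computed directly in R with ecdf() (a minimal sketch; y is just the vector of sample values above):

y <- c(1.8, 2.2, 2.7, 5.7, 6.9, 7.4, 8.1, 8.7, 9, 9.5)
Fhat <- ecdf(y)   # step function that jumps by 1/10 at each sample value
Fhat(6.9)         # proportion of sample values <= 6.9, i.e. 0.5
plot(Fhat)        # staircase plot of the empirical c.d.f.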

KS test statistic: Uniform(0, 10)

Say, $H_0: F(y) = \begin{cases} 0, & y \le 0 \\ y/10, & 0 < y \le 10 \\ 1, & y > 10 \end{cases}$

I.e. $H_0: Y \sim \text{Uniform}(0, 10)$

KS test statistic: Uniform(0, 10) cont.

$D(F_0) = \sup_y |\hat{F}(y) - F_0(y)| \approx 0.29$ (occurs at $y$ just less than 6.9)
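
Because $\hat{F}$ only jumps at the data points while $F_0$ is continuous, the supremum is attained just above or just below one of the order statistics, so $D(F_0)$ can be computed by hand. A minimal sketch, using the rounded sample values listed above:

n  <- length(y)
F0 <- punif(sort(y), min = 0, max = 10)            # null c.d.f. evaluated at the order statistics
D  <- max(pmax((1:n)/n - F0, F0 - (0:(n - 1))/n))  # largest gap just after / just before each jump
D                                                  # 0.29, at y just below 6.9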

KS test statistic: Normal(5, 6.25)

$H_0: Y \sim \text{Normal}(5, 6.25)$

KS test statistic: Example cont.

$D(F_0) = \sup_y |\hat{F}(y) - F_0(y)| \approx 0.37$ (occurs at $y$ just less than 6.9)
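
In R this null can be checked the same way as the uniform case; note that 6.25 is the variance, so the sd argument passed through to pnorm is 2.5 (a sketch, output not shown):

ks.test(x = y, y = pnorm, mean = 5, sd = 2.5)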

Reference Distribution?

$\sqrt{n}\, D(F_0) \xrightarrow{d} K$, where $K$ is the Kolmogorov distribution.

Reject $H_0$ for large values of $\sqrt{n}\, D(F_0)$.
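
For reference, the Kolmogorov distribution has the series form $P(K \le x) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^2 x^2}$, so an asymptotic p-value can be sketched by truncating the series (assuming the observed statistic is stored in D and the sample size in n, as in the earlier sketch; ks.test itself computes the p-value more carefully, exactly for small samples):

k <- 1:100                                        # truncate the infinite series at 100 terms
p_asym <- 2 * sum((-1)^(k - 1) * exp(-2 * k^2 * (sqrt(n) * D)^2))
p_asym                                            # asymptotic approximation to the p-value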

In R

ks.test(x = y, y = punif, min = 0, max = 10)
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  y
## D = 0.28632, p-value = 0.3209
## alternative hypothesis: two-sided

One sided tests

Lesser alternative: $H_A: F < F_0$, i.e. $F(y) < F_0(y)$ for all $y$

Test statistic $D^-(H_0) = \sup_y \left( F_0(y) - \hat{F}(y) \right)$

Greater alternative: $H_A: F > F_0$, i.e. $F(y) > F_0(y)$ for all $y$

Test statistic $D^+(H_0) = \sup_y \left( \hat{F}(y) - F_0(y) \right)$
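
Both one-sided statistics can also be computed from the order statistics; a minimal sketch, reusing y, n and F0 from the Uniform(0, 10) sketch above:

D_plus  <- max((1:n)/n - F0)         # sup of Fhat(y) - F0(y), attained just after a jump
D_minus <- max(F0 - (0:(n - 1))/n)   # sup of F0(y) - Fhat(y), attained just before a jump
c(D_plus, D_minus)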

One sided tests are hard to interpret

Example based on simulated data. $H_0: Y \sim N(0, 100)$

n <- 20
y <- rnorm(n, 0, 1)  # data actually simulated from N(0, 1), so the null N(0, 100) is false

For the greater alternative: $H_A: F_Y(y) > \Phi(y; 0, 100)$, where $\Phi(y; \mu, \sigma^2)$ is the c.d.f. of the Normal$(\mu, \sigma^2)$ distribution.

ks.test(y, pnorm, 0, 10, alternative = "greater")
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  y
## D^+ = 0.42016, p-value = 0.000513
## alternative hypothesis: the CDF of x lies above the null hypothesis

One sided tests are hard to interpret

For the lesser alternative: $H_A: F_Y(y) < \Phi(y; 0, 100)$:

ks.test(y, pnorm, 0, 10, alternative = "less")
## 
##  One-sample Kolmogorov-Smirnov test
## 
## data:  y
## D^- = 0.44858, p-value = 0.0001717
## alternative hypothesis: the CDF of x lies below the null hypothesis

One sided tests

The combination of the two one-sided alternatives does not cover all the ways the null hypothesis can be false: a c.d.f. that crosses $F_0$ satisfies neither $F < F_0$ for all $y$ nor $F > F_0$ for all $y$.

This makes one-sided KS tests very hard to interpret; in short, don't do a one-sided test.

Estimating parameters

The KS test should only be used if you can completely specify F0, the population distribution under the null hypothesis.

You should not estimate parameters from the data then do the test.

Kind of like trying to test $H_0: \mu = \bar{Y}$; you'll rarely reject.
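
A quick simulation sketch illustrates why: data are drawn from a standard normal, the mean and s.d. are estimated from each sample, and the KS test is run against the fitted normal. Even though the null family is correct, the rejection rate comes out well below the nominal 5%.

set.seed(1)
reject <- replicate(1000, {
  z <- rnorm(20)                                    # data really are normal
  ks.test(z, pnorm, mean(z), sd(z))$p.value < 0.05  # but the null parameters are estimated from z
})
mean(reject)                                        # far below 0.05: the test almost never rejects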

Next time…

After midterm: what if distribution is discrete?