Duncan Golicher’s weblog

Research, scripts and life in Chiapas

Posts Tagged ‘statistics’

Kolmogorov-Smirnov tests of normality

with 3 comments

Click here for an explanation of the animation below

ks

Today I had an interesting exchange regarding tests of normality when teaching introductory statistics.

I have a dilemma. I do not place a great deal of stress on tests of normality when I teach Master’s students, although I mention that they are often used. Introductory statistics texts, however, give tests of normality much more attention, presumably to ensure that students are aware that normality is an important assumption for many statistical procedures. I’m all in favour of testing assumptions. But do students really know what assumptions they are testing?
I have to teach introductory statistics without confusing students or sending mixed messages, so this is quite a delicate matter that needs clarity.

In fact none of these statements are accurate (and it’s Smirnoff you are thinking of!). My own preference is to try to teach students to understand why any underlying population or sampling frame might not be normal. They should also intuitively understand how the procedure used for sampling from the population may influence the properties of the sample drawn from it.

These properties are then treated as expected before beginning any field work. All data transformation or use of non-parametric tests are pre-planned as part of the formal protocol designed for data collection and analysis.

I really do not like any post-hoc alterations to a planned work scheme after the data are collected. At best they waste time, at worst they lead students to think that the data themselves are somehow “invalid” and thus unpublishable.

I therefore quite strongly dislike including post hoc tests of normality within the workflow of the analysis as a knee-jerk procedure with a yes/no answer. This certainly does not mean that I tell students to assume that all the preplanned analyses are necessarily valid, or that inference on the mean can be conducted without checking assumptions.

The alternative to automated tests of normality is to make sure that students always visualise the distribution of their data fully in order to understand why any assumption of normality may be wrong. I also try to encourage students to understand how and why data transformations might work. Again this is usually most helpful before the data are collected, but it is also a way to deal with major surprises.

Here again is the link to the pdf document I wrote that suggests a possible answer to the poll.

ksdemo2

Click on the link above as it is easier to include PDFs in wordpress this way.

And here is a quick test of any interpretation of the results of a KS test of normality.

Just to summarise the well-known reason to avoid testing for normality: if you draw a very large sample from a slightly non-normal population, the test tends to give low p-values. You should then presumably reject the null hypothesis that the data could have come from a normal population and, according to a strict interpretation, you can’t use your planned analysis as it would be “invalid”!

However, if you draw small samples from very non-normal populations (as shown in the pdf) you will not reject the null hypothesis as often, even though the methodology will provide misleading inference.

ksdemo3
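
The two situations described above can be sketched as a small simulation. This is only an illustration under my own assumptions, not the content of the pdf: the helper name `ks_reject_rate`, the choice of a t distribution with 5 degrees of freedom as the "slightly non-normal" population, the exponential as the "very non-normal" one, and the sample sizes are all made up for the example. The mean and standard deviation are estimated from each sample, which is what students usually do and which makes the nominal p-value conservative.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Proportion of simulated samples for which a KS test of normality gives p < alpha.
# The normal parameters are estimated from each sample (an assumption of this sketch).
ks_reject_rate <- function(rdist, n, nsim = 200, alpha = 0.05) {
  p_values <- replicate(nsim, {
    x <- rdist(n)
    ks.test(x, "pnorm", mean = mean(x), sd = sd(x))$p.value
  })
  mean(p_values < alpha)
}

# Large samples from a mildly heavy-tailed population (t with 5 df, assumed example).
rate_large_mild <- ks_reject_rate(function(n) rt(n, df = 5), n = 5000)

# Small samples from a strongly skewed population (exponential, assumed example).
rate_small_skewed <- ks_reject_rate(function(n) rexp(n), n = 10)

print(c(large_mild = rate_large_mild, small_skewed = rate_small_skewed))
```

Under these assumptions the large, mildly non-normal samples should be rejected far more often than the small, strongly skewed ones, which is the wrong way round if the test is meant to protect the analysis.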

Rationality and the lottery

The BBC web site today contained what appears to me to be a misrepresentation of decision theory. The argument goes…

“Should you invest £2 a day or use it to buy lottery tickets?

Maths makes the decision obvious. Suppose you invest two quid every day at the reasonable rate of 10%. It will take you almost exactly 50 years to accumulate £1m. To earn this same £1m in the National Lottery, you would (on average) have to match five numbers and a bonus ball, at odds of 2,330,635-to-1.

If you spent two quid a day for 50 years you would total just over 36,500 tickets and would thus have only a 1-in-63 chance of making that million pounds. However, the available image of immediate wealth subverts this rationality.”

Is this right? Is it “obvious”, as the author claims? No, it is not. It is far from obvious.

The calculation of compound interest is correct, although banks do not normally compound interest on a daily basis and 10% is rather optimistic. You can check by simulating the arrangement day by day in an R function.

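# Deposit `value` every day and add one day's interest to the running balance.
f <- function(ndays = 365 * 50, interest = 0.1, value = 2) {
  a <- numeric(ndays)
  a[1] <- value
  for (i in 2:ndays) {
    a[i] <- a[i - 1] + value                # today's deposit
    a[i] <- a[i] + (interest / 365) * a[i]  # one day's interest
  }
  a
}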

par(bg = grey(0.92))
plot(f(value = 2), type = "l", lwd = 2, col = "red", xlab = "Number of days", ylab = "Accumulated value")
grid(col = 1)

fiig16.png
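
As a cross-check, the day-by-day loop can be compared with a closed-form expression for the same recurrence (a geometric series). This is a sketch: `savings_loop` restates the recurrence used above under a different name so that the block runs on its own, and `savings_closed` is a hypothetical helper that I have added.

```r
# Day-by-day accumulation: deposit `value`, then add one day's interest
# on the new balance (the same recurrence as the function in the post).
savings_loop <- function(ndays = 365 * 50, interest = 0.1, value = 2) {
  balance <- numeric(ndays)
  balance[1] <- value
  for (i in 2:ndays) {
    balance[i] <- (balance[i - 1] + value) * (1 + interest / 365)
  }
  balance
}

# Closed form of the final balance (hypothetical helper, geometric series):
# the first deposit grows for ndays - 1 days, each later deposit for fewer.
savings_closed <- function(ndays = 365 * 50, interest = 0.1, value = 2) {
  r <- interest / 365
  g <- 1 + r
  value * g^(ndays - 1) + value * g * (g^(ndays - 1) - 1) / r
}

final_loop <- tail(savings_loop(), 1)
final_closed <- savings_closed()
print(c(loop = final_loop, closed = final_closed))
```

Both figures should agree to within rounding error and come out a little above one million, consistent with the article's "almost exactly 50 years".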

The money in the bank grows healthily towards the one million target. So what is wrong with the argument? The author claims that the odds of winning a million on the lottery are 2,330,635:1. This is not a fair bet, but it is not such a bad one either. You have roughly one chance in 2.3 million of winning the one million on offer. The expected value of your one pound ticket is the chance of winning (admittedly very small indeed) multiplied by the sum that would be won (and of course this is very large).

1/2330636*1000000
[1] 0.4290674

So the expected value of your ticket is about 43p. You have superficially wasted 57p.

The story about all the interest you would get by investing the money is a misleading red herring. If you took the conclusion of a 1:63 ratio between saving and gambling seriously it would persuade you not to buy a single lottery ticket even if the odds of winning were to become far more favourable than one in two million, to the point where the expected winnings exceeded the price of the ticket. Decisions between retaining a small sum with certainty and risking it for a big one do always involve subjective judgement, but few would not consider the lottery worth a shot if the prize of 1 million could be won at odds of (say) 200,000:1. The author of the article would (on this erroneous logic) still be convinced that it is better to put the money in the bank.

The formula for compound interest can be written as an R function in terms of the principal (p), the number of periods in a year in which interest is paid (q), the interest rate (i) and the number of years (n):

f1<-function(p=1,i=0.1,q=365,n=1)p*(1+(i/q))^(n*q)

So using this function, let’s think this all through calmly. If you were to win the lottery tomorrow and do the same with the money as you would have done with the two pounds you spent on tickets, i.e. invest it at a compound interest rate of 10%, you would be colossally wealthy in fifty years’ time. Using the same interest rate calculation that the author assumed, you would have over £148,000,000.

f1(p=1000000,n=50)
[1] 148311560

On the other hand, if you were to win your million exactly fifty years from now you would just have your million at the end of the period. This would coincide with what you would have gained from saving.

So to reiterate, all wins before the final date are worth more than the saved money in the bank at the end of fifty years. The earlier you win the better. The only addition I have made to the author’s own argument is to assume (quite fairly) that lottery winnings also gain compound interest. The comparison the author makes between the frugal saver and the lottery player is quite unfair. It uses only the absolute minimum that a lottery win would be worth as the baseline for comparison. The expected lottery winnings at the end of fifty years are quite clearly worth very much more than one million. In fact under this model, it is easy to show that the expected amount is 0.4290674 times (the expected value of a one pound ticket) the money that would be in the bank if you had not played the lottery, providing comparable assumptions are made regarding the use of the money.

plot(f(value = 2), type = "l", lwd = 2, col = "red", xlab = "Number of days", ylab = "Accumulated value")
lines(f(value = 2 * 0.4290674), type = "l", lwd = 2, col = "blue")

fiig18.png

The relationship between the money paid in and its expected value (in purely monetary terms) doesn’t change: the ratio between the red line (saver) and the blue line (expected value from playing the lottery and investing the proceeds) stays the same. Lottery players have (on average) an expected value of around 43% that of the savers. They are worse off, but nowhere near as irrational as the article suggests.

But we can go a step further with the argument. As you think it through and apply common sense it gets better and better for the lottery player. You clearly wouldn’t ever dream of actually investing a million you won tomorrow in order to have megabucks in fifty years’ time. A small fortune is worth much more to you now than an unspendable fortune in the future. In fact, to you, the million now is almost certainly worth more than the 148 million it could become, given the positive, life-enhancing potential of a single million. After the first million the next 147 are increasingly irrelevant to your happiness. This could be written using a function that converts money into happiness. This is a curve that reaches some sort of asymptote. The absolute level of the asymptote varies between individuals, but the shape is fixed, even for Bill Gates.

fiig17.png
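
One way to sketch such a money-to-happiness curve in R is shown below. The exponential form and the values of `u_max` and `scale` are purely illustrative assumptions of mine; any curve that flattens towards an asymptote would make the same point.

```r
# Hypothetical saturating utility: rises quickly, then flattens towards `u_max`.
# Both `u_max` and `scale` are illustrative assumptions, not estimates.
utility <- function(money, u_max = 1, scale = 5e5) {
  u_max * (1 - exp(-money / scale))
}

u_million_now <- utility(1e6)
u_after_50_years <- utility(148311560)  # a million compounded at 10% for 50 years

# Utility gained from the first million versus the following 147 million.
print(c(first_million = u_million_now,
        next_147_million = u_after_50_years - u_million_now))
```

With these assumed parameters most of the attainable happiness arrives with the first million, and the remaining 147 million add comparatively little.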

At the same time the savings should be devalued by the probability of dying before they can be used, the bank suffering a fate worse than Northern Rock, a meteorite strike, or the consequences of catastrophic global warming, among a multitude of other scenarios. Depending on how you weigh all these trade-offs (which again is a rather subjective matter), your lottery ticket could easily turn out to be worth more to you than the pound you paid for it.

The original article states:

“If you spent two quid a day for 50 years you would total just over 36,500 tickets and would thus have only a 1-in-63 chance of making that million pounds.”

A 1-in-63 chance during a lifetime doesn’t sound so unlikely!
It can in fact be perfectly rational to buy a lottery ticket. Which is why so many rational people do.
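
The 1-in-63 figure quoted above can be checked with a couple of lines of R. This is a rough sketch that assumes every ticket is an independent draw with the per-ticket chance used earlier (1 in 2,330,636); the variable names are mine.

```r
p_win <- 1 / 2330636       # chance per ticket, from the odds quoted in the article
n_tickets <- 2 * 365 * 50  # two one-pound tickets a day for fifty years

# Simple multiplication, which is presumably how the article got its figure.
approx_chance <- n_tickets * p_win

# Probability of at least one win, 1 - (1 - p)^n, computed stably.
exact_chance <- -expm1(n_tickets * log1p(-p_win))

print(c(one_in_approx = 1 / approx_chance, one_in_exact = 1 / exact_chance))
```

Both versions come out at about 1 in 64, so the article's arithmetic is fine; it is the interpretation that is at fault.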

End line: Why should this interest a forest ecologist? Because it might explain why sustainable forestry is so difficult!

Written by Duncan Golicher

February 11, 2008 at 7:21 pm
