Today I had an interesting exchange regarding tests of normality when teaching introductory statistics.
I have a dilemma. I do not place a great deal of stress on tests of normality when I teach Master’s students, although I mention that they are often used. However, in introductory statistics texts, tests of normality are given a lot more attention, presumably in order to ensure that students are aware that normality is an important assumption for many statistical procedures. I’m all in favour of testing assumptions. But do students really know what assumptions they are testing?
I have to teach introductory statistics without confusing students or sending mixed messages. It is therefore quite a delicate matter that needs clarity.
In fact none of these statements are accurate (and it’s Smirnoff you are thinking of!). My own preference is to try to teach students to understand why any underlying population or sampling frame might not be normal. They should also intuitively understand how the procedure used for sampling from the population may influence the properties of the sample drawn from it.
These properties are then treated as expected before beginning any field work. Any data transformation or use of non-parametric tests is pre-planned as part of the formal protocol designed for data collection and analysis.
I really do not like any post-hoc alterations to a planned work scheme after the data are collected. At best they waste time, at worst they lead students to think that the data themselves are somehow “invalid” and thus unpublishable.
I therefore quite strongly dislike including post-hoc tests of normality within the workflow of the analysis as a knee-jerk procedure with a yes/no answer. This certainly does not suggest that I tell students to assume that all the pre-planned analyses are necessarily valid, nor to accept that inference on the mean can be conducted without checking assumptions.
The alternative to automated tests of normality is to make sure that students always visualise the distribution of their data fully in order to understand why any assumption of normality may be wrong. I also try to encourage students to understand how and why data transformations might work. Again this is usually most helpful before the data are collected, but it is also a way to deal with major surprises.
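As a rough illustration of why a transformation might work, here is a minimal Python sketch. The log-normal population, the sample size of 500 and the seed are my own assumptions chosen for the example, not data from any real study. It compares the skewness of the raw and log-transformed values and prints crude histogram counts as a stand-in for a proper plot.

```python
import numpy as np
from scipy import stats

# Hypothetical example data: a strongly right-skewed (log-normal) sample.
rng = np.random.default_rng(42)
raw = rng.lognormal(mean=0.0, sigma=1.0, size=500)

# Taking logs of log-normal data gives normally distributed values,
# which is why the transformation removes the skew here.
logged = np.log(raw)

skew_raw = stats.skew(raw)
skew_logged = stats.skew(logged)
print(f"skewness of raw data:    {skew_raw:.2f}")
print(f"skewness of logged data: {skew_logged:.2f}")

# Crude text summary of the two shapes (bin counts over 10 equal-width bins).
counts_raw, _ = np.histogram(raw, bins=10)
counts_logged, _ = np.histogram(logged, bins=10)
print("raw counts:   ", counts_raw)
print("logged counts:", counts_logged)
```

In practice a histogram or Q-Q plot would be far more informative than the bin counts, but the point is the same: look at the shape first, then ask what transformation the data-generating process suggests.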
Here again is the link to the pdf document I wrote that suggests a possible answer to the poll.
Click on the link above as it is easier to include PDFs in WordPress this way.
And here is a quick test of any interpretation of the results of a KS test of normality.
Just to summarise the well-known reason to avoid testing for normality: if you draw a very large sample from a slightly non-normal population, the test tends to provide low p-values. You should presumably reject the null hypothesis that the data could have come from a normal population, and according to a strict interpretation you then can’t use your planned analysis as it would be “invalid”!
However, if you draw small samples from very non-normal populations (as shown in the pdf), you will not reject the null hypothesis as often, even though the methodology will provide misleading inference.
[Figure: ksdemo3]
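The contrast can be sketched with a small simulation in Python. This is not the code behind the pdf or the figure; the populations (a mildly skewed gamma and a strongly skewed exponential), the sample sizes, the number of replicates and the helper name `rejection_rate` are all assumptions I have picked for illustration. Note also that estimating the mean and standard deviation from the sample makes the plain KS test conservative; a Lilliefors-type correction would be needed for accurate p-values.

```python
import numpy as np
from scipy import stats

def rejection_rate(draw, n, reps=200, alpha=0.05, seed=1):
    """Proportion of simulated samples for which the KS test rejects normality.

    draw(rng, n) is a hypothetical helper that returns one sample of size n.
    """
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        x = draw(rng, n)
        # Normal parameters are estimated from the sample itself, so the
        # plain KS p-value is conservative (it rejects too rarely).
        result = stats.kstest(x, "norm", args=(x.mean(), x.std(ddof=1)))
        rejections += result.pvalue < alpha
    return rejections / reps

def slightly_skewed(rng, n):
    # Gamma with shape 20: close to normal, skewness about 0.45.
    return rng.gamma(shape=20.0, scale=1.0, size=n)

def very_skewed(rng, n):
    # Exponential: strongly right-skewed, skewness 2.
    return rng.exponential(scale=1.0, size=n)

large_mild = rejection_rate(slightly_skewed, n=20000)
small_severe = rejection_rate(very_skewed, n=10)

print(f"large sample, slightly non-normal: rejected {large_mild:.0%} of the time")
print(f"small sample, very non-normal:     rejected {small_severe:.0%} of the time")
```

Under these assumptions the test should reject far more often for the large, nearly normal samples than for the small, badly skewed ones, which is exactly the wrong way round if the p-value is read as a verdict on whether the planned analysis is safe.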