Wednesday 13 April 2011

A short nerdy post about use of percentiles in statistical analyses

Results from psychological tests can be expressed in various ways. Percentiles are a popular format in clinical reports, because they can be explained to non-experts fairly easily, in terms of the percentage of the population that would be expected to get a score of this level or below. So if your score is at the 10th percentile, only 10% of the population would be expected to score this low or lower.
The other format that is commonly used in reporting test scores is the standard score or scaled score. This represents how many standard deviations a score is above or below the population mean. The simplest version is the z-score, obtained by the formula:
  (X-M)/S
where X is the obtained score, M is population mean, and S is the population standard deviation.
In clinical tests, z-scores are often transformed to a different scale, e.g. mean of 100 and SD 15 in the case of most IQ tests. This is done just by multiplying the z-score by the new SD and adding the new mean. So a z-score of -.33 becomes a scaled score of (-.33 x 15)+100 = 95.05, or 95 after rounding.
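For anyone who would rather see the arithmetic as code, here is a minimal Python sketch of the two conversions just described. The function names `z_score` and `scaled_score` are my own labels for illustration, not part of any test manual or library.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def scaled_score(z, new_mean=100.0, new_sd=15.0):
    """Re-express a z-score on a scale with the given mean and SD."""
    return z * new_sd + new_mean


# The worked example from the text: -0.33 * 15 + 100 = 95.05, which rounds to 95
print(round(scaled_score(-0.33)))

# Going the other way: a scaled score of 85 on the IQ-style scale is one SD below the mean
print(z_score(85, 100, 15))
```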
The important point to note is that all of these different methods of reporting scores are just transformations of one another. If you want to turn a z-score into a percentile, you can do so with the Excel function:
=100*NORMDIST(A1,0,1,1)
where A1 is the address of the value you want to convert.
The second value in this expression is the mean and the third is the SD (the final 1 asks for the cumulative distribution), so if you want to convert a scaled score with mean 100 and SD 15 into a percentile, the function is:
=100*NORMDIST(A1,100,15,1)
The NORMDIST function returns a cumulative proportion, so it’s multiplied by 100 to give a percentage.
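If you are not an Excel user, the same conversion can be sketched in Python with the standard library's `statistics.NormalDist`, whose `cdf` method returns the cumulative proportion in the same way. The helper name `percentile_from_score` is made up for this illustration.

```python
from statistics import NormalDist


def percentile_from_score(score, mean=0.0, sd=1.0):
    """Percentage of a normal population expected to score at or below `score`."""
    return 100 * NormalDist(mu=mean, sigma=sd).cdf(score)


# A z-score of -1 (defaults: mean 0, SD 1)
print(round(percentile_from_score(-1.0), 1))  # 15.9

# The same point on a scale with mean 100 and SD 15
print(round(percentile_from_score(85, 100, 15), 1))  # 15.9
```

Both calls give the same percentile, because 85 on the mean-100, SD-15 scale is just a z-score of -1 written differently.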
You can work the other way round using the NORMSINV function, which turns a proportion into a z-score. So if you have a percentile in cell A1, then you get a z-score with:
=NORMSINV(A1/100)
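A rough Python counterpart of this reverse conversion, again leaning on the standard library: `NormalDist().inv_cdf` does a similar job to NORMSINV. The helper name `z_from_percentile` is hypothetical, and note that `inv_cdf` only accepts proportions strictly between 0 and 1, so percentiles of exactly 0 or 100 will raise an error.

```python
from statistics import NormalDist


def z_from_percentile(percentile):
    """Convert a percentile (strictly between 0 and 100) into a z-score."""
    return NormalDist().inv_cdf(percentile / 100)


z = z_from_percentile(10)
print(round(z, 2))             # -1.28
print(round(z * 15 + 100))     # 81 on a scale with mean 100 and SD 15
```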

If all this Excel stuff gives you a headache, you can ignore it, so long as you get the message that z-scores, scaled scores and percentiles are all different ways of representing the same information.
They are NOT equivalent, however, in their distributions. Percentiles aren’t suitable as input to statistical procedures that assume normality, such as ANOVA and t-tests. They should always be converted to z-scores or other scaled scores.
This can be simply illustrated. If you are into Excel, you can generate your own data to make the point - otherwise you can just look at the output from the data I have generated.
Let’s simulate data from two groups, each of 50 participants. Assume the data are reading test scores, and that group 1 has reading difficulties and group 0 hasn’t.  For group 0 I will just generate a random normal distribution of scores with mean 0 and SD 1, by typing this function in each of 50 cells:
=NORMSINV(RAND())
For group 1, I use the same formula, but subtract 0.4 from each score:
=NORMSINV(RAND())-0.4
I pasted my simulated data into SPSS, as it makes it a bit easier to generate relevant statistical output. So for each of 100 simulated subjects, I have a column denoting their group (0 or 1), a column with their z-score, and a column with their percentile score.
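If you would rather not use Excel or SPSS, the simulation can be sketched in Python along these lines. This is not the code behind the figures in this post: the seed is arbitrary, `random.gauss` stands in for NORMSINV(RAND()), and the percentiles are assumed to be computed against population norms with mean 0 and SD 1.

```python
import random
from statistics import NormalDist, mean

random.seed(1)  # arbitrary seed, only so that a re-run gives the same numbers
std_normal = NormalDist()  # assumed population norms: mean 0, SD 1


def simulate_group(n, shift=0.0):
    """Draw n z-scores from a normal distribution with mean `shift` and SD 1."""
    return [random.gauss(0.0, 1.0) + shift for _ in range(n)]


group0 = simulate_group(50)              # no reading difficulties
group1 = simulate_group(50, shift=-0.4)  # reading difficulties: scores 0.4 SD lower

# Percentile column: each z-score expressed relative to the population norms
pct0 = [100 * std_normal.cdf(z) for z in group0]
pct1 = [100 * std_normal.cdf(z) for z in group1]

print("mean z-score:   ", round(mean(group0), 2), round(mean(group1), 2))
print("mean percentile:", round(mean(pct0), 1), round(mean(pct1), 1))
```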
Here’s what you get if you do a t-test (you’ll get different values if you generated your own data as the random process is different each time - but it should show the same pattern):
So why, if the numbers are just transforms of each other, are the results different?
The answer lies in the distribution of data. If you take percentiles, you transform a normal distribution into a rectangular one, as can be seen if you plot the histograms.
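To see the change of shape without a plotting package, here is a rough Python sketch that prints crude text histograms. It uses 10,000 simulated scores rather than 100 so that the shapes are obvious; that sample size, the seed and the helper `bin_counts` are all choices made for this illustration only.

```python
import random
from statistics import NormalDist

random.seed(2)  # arbitrary seed for repeatability
std_normal = NormalDist()

z_scores = [random.gauss(0.0, 1.0) for _ in range(10_000)]
percentiles = [100 * std_normal.cdf(z) for z in z_scores]


def bin_counts(values, low, high, n_bins):
    """Count values in n_bins equal-width bins over [low, high].

    Values outside the range are placed in the first or last bin.
    """
    width = (high - low) / n_bins
    counts = [0] * n_bins
    for v in values:
        index = int((v - low) / width)
        index = min(max(index, 0), n_bins - 1)
        counts[index] += 1
    return counts


z_counts = bin_counts(z_scores, -3.0, 3.0, 10)
pct_counts = bin_counts(percentiles, 0.0, 100.0, 10)

for label, counts in (("z-scores", z_counts), ("percentiles", pct_counts)):
    print(label)
    for count in counts:
        print("  " + "#" * (count // 50))  # one '#' per 50 scores
```

The z-score bars should bulge in the middle, while the percentile bars should all be of roughly equal length.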
[Figure: histograms of the z-scores and of the percentile scores]
(That hole in the middle of the percentile distribution is just a fluke in the particular dataset I generated.) Another way to think about it is to consider the size of the difference between two points in the distribution. In terms of z-scores, the difference between the 1st and 10th percentile (z = -2.32 and -1.28) is 2.32-1.28 = 1.04, whereas the difference between the 41st and 50th percentile (z = -.23 and 0) is only .23. But on the percentile scale, these differences are treated as equivalent. In effect, the percentile transformation stretches out the points in the middle of the scale and gives them more weight than they should have.
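Those two gaps can be checked with a few lines of Python, using the standard library's `NormalDist().inv_cdf` to turn each percentile back into a z-score. The variable names are just for this illustration.

```python
from statistics import NormalDist

inv = NormalDist().inv_cdf  # proportion -> z-score

# Both pairs are nine percentile points apart...
tail_gap = inv(0.10) - inv(0.01)    # 1st to 10th percentile
middle_gap = inv(0.50) - inv(0.41)  # 41st to 50th percentile

# ...but very different distances apart in z-score units
print(round(tail_gap, 2))    # 1.04
print(round(middle_gap, 2))  # 0.23
```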
So percentiles are a good way of communicating test scores of individuals, but a bad choice if you are doing statistical analyses of group data.
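A single simulated dataset could always be a fluke, so one way to check the point is to repeat the simulation many times and count how often each analysis detects the group difference. The sketch below does this in plain Python under my own assumptions: 1,000 repetitions, a hand-rolled pooled-variance t statistic, and an approximate two-tailed 5% critical value of 1.984 for 98 degrees of freedom. It is not the SPSS analysis reported above, and the exact proportions depend on the seed. One would expect the percentile analysis to come out a little less sensitive.

```python
import random
from statistics import NormalDist

T_CRIT = 1.984  # approximate two-tailed .05 critical value of t for df = 98


def t_statistic(a, b):
    """Independent-samples t statistic using the pooled variance."""
    na, nb = len(a), len(b)
    mean_a, mean_b = sum(a) / na, sum(b) / nb
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((x - mean_b) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (mean_a - mean_b) / (pooled_var * (1 / na + 1 / nb)) ** 0.5


def estimate_power(n_sims=1000, n=50, shift=-0.4, seed=3):
    """Proportion of simulated studies in which |t| exceeds T_CRIT,
    for z-scores and for the same data expressed as percentiles."""
    rng = random.Random(seed)
    cdf = NormalDist().cdf
    hits_z = hits_pct = 0
    for _ in range(n_sims):
        g0 = [rng.gauss(0.0, 1.0) for _ in range(n)]
        g1 = [rng.gauss(0.0, 1.0) + shift for _ in range(n)]
        if abs(t_statistic(g0, g1)) > T_CRIT:
            hits_z += 1
        p0 = [100 * cdf(z) for z in g0]
        p1 = [100 * cdf(z) for z in g1]
        if abs(t_statistic(p0, p1)) > T_CRIT:
            hits_pct += 1
    return hits_z / n_sims, hits_pct / n_sims


power_z, power_pct = estimate_power()
print("proportion significant, z-scores:   ", power_z)
print("proportion significant, percentiles:", power_pct)
```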




Comments:

  1. Thanks for this - as always you explain things very clearly! I'll be keeping a bookmark to send it to offending articles I'm reviewing, or confused project students.

  2. Overall one has to question the validity of statistics in the sense that they can only give a broad insight into the data. fMRI is a good example of how statistical analysis actually distances you from reality whilst giving the impression it does the opposite. The resolution statistical analysis gives is just not high enough for such endeavors but you have no real way of knowing. Silly headlines from fMRI studies attest to that.

  3. Just a quick comment about statistical procedures that assume normality, namely ANOVA - according to Monte Carlo tests (Johnson, 1993), ANOVA (and even ANCOVA) are robust to violations of most assumptions including extremely skewed distributions and/or heterogeneity of group variances, assuming a balanced design and adequate group sizes (N>30).

    Johnson, C. C. (1993). The Effects of Violation of Data Set Assumptions when Using the Oneway, Fixed Effects Analysis of Variance and the One Concomitant Analysis of Covariance Statistical Procedures. Paper presented at the Annual Meeting of the Mid-South Educational Research Association (New Orleans, LA, November 10-12, 1993). Retrieved from: http://www.eric.ed.gov/PDFS/ED365720.pdf on 27/4/2011

  4. Thanks for the comment Ben. I agree it makes sense not to worry too much about normality if you start out with a non-normal dataset, provided it isn't too severely non-normal. But what makes no sense is to take a normally distributed dataset and do stats on the percentile scores, as you are in effect deliberately modifying it to make it non-normal. And as the stats on the example above illustrate, that reduces power to detect genuine effects.
