Saturday, 30 June 2012

Schoolgirls' health put at risk by Catholic view on vaccination

Today's Time Newsfeed carries a remarkable story: parents of children attending Catholic schools in Calgary were sent a special letter to accompany details of a vaccination programme against human papillomavirus (HPV), which protects against cervical cancer. In it, local bishops wrote: "Although school-based immunization delivery systems generally result in high numbers of students completing immunization, a school-based approach to vaccination sends a message that early sexual intercourse is allowed." I find this amazing for several reasons:
  • There's a complete failure to understand what affects teenagers' behaviour. Do the bishops seriously think that teenaged girls who are thinking of having sex say to themselves "Oh, wait a minute. I might get HPV. Let's not do it"? Potential consequences of sex include a host of sexually transmitted diseases, as well as pregnancy. If these don't put girls off, then why should a risk of HPV? 
  • HPV is a sexually transmitted disease. You can get it if you are a virgin who marries someone with HPV. You can get it if you are raped (something which has been known to occur in Catholic schools). 
  • The recommendation seems theologically dubious. I'm an atheist, but my understanding of Catholicism is that whether or not something is a sin is largely to do with motivation rather than action. So if you are tempted to sex but desist because it would upset God, then that's good. If you are tempted to sex but desist only because of a fear of disease, that's still a sin. The church should be teaching girls to love God so much that they won't do things that offend him, not to conform to standards of sexual behaviour out of fear. No doubt religious readers will put me right if I've misunderstood this distinction. 
  • These girls are attending a Catholic school where I assume morality is drummed into them day and night. The bishops assume that their grasp of that morality is so weak that having an HPV vaccination will be sufficient to overturn everything they have been told about sexual ethics. Doesn't say much for the religious teaching in the schools, or for the intelligence of the pupils. 
  • One thing Jesus really understood is that humans aren't perfect and frequently fall short of the moral standards they try to adhere to. There's a huge emphasis on forgiveness of sin in his teachings. The bishops are in effect saying that God won't forgive you if you stray from the straight and narrow: he'll commit you to a life with an unpleasant disease, and increase your risk of dying from cancer. That's not the Christian God I was taught about.

Sunday, 24 June 2012

Causal models of developmental disorders: the perils of correlational data

Experimental psychology depends heavily on statistics, but psychologists don’t always agree about the best ways of analyzing data. Take the following problem:
I have two groups each of 30 children, dyslexics and controls. I give them a test of auditory discrimination and find a significant difference between the groups, with the dyslexic mean being lower. I want to see whether reading ability is related to the auditory task. I compute the correlation between the auditory measure and reading, and find it is .42, which in a sample of 60 cases is significant at the .001 level.
I write up the results, concluding that poor auditory skill is a risk factor for poor reading. But reviewers are critical. So what’s wrong with this?
I’ll deal quickly with two obvious points. First, there is the well-worn phrase that correlation does not equal causation. The correlation could reflect a causal link from auditory deficit to poor reading, but we need also to consider other causal routes, as I’ll illustrate further below. This is an issue about interpretation rather than data analysis.
A second point concerns the need to look at the data rather than just computing the correlation statistic. Correlations can be sensitive to distributional properties of the data and can be heavily influenced by outliers. There are statistical ways of checking for such effects, but a good first step is just plotting a scatterplot to see whether the data look orderly. A tip for students: if your supervisor asks to see your project data, don’t just turn up with numerical output from the analysis: be ready to show some plots.
Figure 1: Fictitious data showing spurious correlation between height and reading ability
A less familiar point concerns the pooling of data across the dyslexic and control groups. Some people have strong views about this, yet, as far as I’m aware, it hasn’t been discussed much in the context of developmental disorders. I therefore felt it would be good to give it an airing on my blog and see what others think.
Let’s start with a fictitious example that illustrates the dangers of pooling data from two groups. Figure 1 is a scatterplot showing the correlation between height and reading ability in groups of 6-year-olds and 10-year-olds. If I pool across groups, I’m likely to see a strong correlation between height and reading ability, whereas within any one age group the correlation is negligible. This is a clear case of spurious correlation, as illustrated in Figure 2. Here the case against pooling is unambiguous, and it's clear that if you look at the correlation within either age band, there is no relationship between reading ability and height.
Figure 2: Model showing how a spurious correlation between height and reading arises because both are affected by age
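The pooling problem is easy to reproduce with made-up numbers. Here is a minimal sketch in Python, assuming invented group means and standard deviations (none of these values come from Figure 1) and larger groups than you would plot, so that the estimates are stable. Height and reading are generated independently within each age group, so any pooled correlation arises purely from the age difference.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def simulate_age_group(rng, n, mean_height_cm, mean_reading):
    """Draw height and reading independently: no within-group relationship."""
    heights = [rng.gauss(mean_height_cm, 5.0) for _ in range(n)]
    reading = [rng.gauss(mean_reading, 5.0) for _ in range(n)]
    return heights, reading

rng = random.Random(1)  # fixed seed so the run is repeatable

# Hypothetical means: the older group is taller and reads better.
h6, r6 = simulate_age_group(rng, 500, 115.0, 20.0)
h10, r10 = simulate_age_group(rng, 500, 138.0, 50.0)

r_six = pearson_r(h6, r6)
r_ten = pearson_r(h10, r10)
r_pooled = pearson_r(h6 + h10, r6 + r10)

print(f"6-year-olds only:  r = {r_six:.2f}")
print(f"10-year-olds only: r = {r_ten:.2f}")
print(f"pooled:            r = {r_pooled:.2f}")
```

With these settings the within-group correlations hover around zero while the pooled correlation is large, which is the pattern described for Figure 1.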

Examples such as this have led some people to argue that you shouldn’t pool data in studies such as the dyslexic vs. control example. Or, to be more precise, the recommendation is usually that you should check the correlations within each group, and avoid pooling if they don’t look consistent with the pooled correlation. I’ve always been a bit uneasy about this logic and have been giving some thought as to why.
First, there is the simple issue of power. If you halve your sample size, then you increase the standard error of estimate for a correlation coefficient, making it more likely that it will be nonsignificant. Figure 3 shows the 95% confidence intervals around a correlation of .5 depending on sample size, and you can readily see that these are wider for small samples than for large ones. There's a nice website by Stan Brown that gives relevant formulae in Excel.
Figure 3: 95% confidence interval around estimated correlation of .5, with different sample sizes
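For anyone who would rather not use Excel, here is a minimal sketch of one standard way to get such intervals, the Fisher z transformation. I am assuming this approximation for illustration; it is not necessarily the method used to draw Figure 3, and the sample sizes in the loop are arbitrary.

```python
from math import atanh, sqrt, tanh

def fisher_ci(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation (Fisher z method).

    r      : observed correlation
    n      : sample size (must exceed 3)
    z_crit : normal critical value; 1.96 gives a 95% interval
    """
    if n <= 3:
        raise ValueError("sample size must be greater than 3")
    z = atanh(r)               # transform r to the z scale
    se = 1.0 / sqrt(n - 3)     # standard error on the z scale
    lower = tanh(z - z_crit * se)   # back-transform the limits to the r scale
    upper = tanh(z + z_crit * se)
    return lower, upper

# Illustrative sample sizes for an observed correlation of .5
for n in (15, 30, 60, 200):
    lower, upper = fisher_ci(0.5, n)
    print(f"n = {n:3d}: 95% CI from {lower:.2f} to {upper:.2f}")
```

The interval is not symmetric around r, because the back-transformation compresses values near 1.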

A less obvious point is that the data in Figure 1 look analogous to the dyslexic vs. control example, but there is an important difference. We know where we are with age: it is unambiguous to define and measure. But dyslexia is more tricky. Suppose we substitute dyslexia for age, and auditory processing for height, in the model of spurious correlation in Figure 2. We have a problem: there is no independent diagnostic test for dyslexia. It is actually defined in terms of one of our correlated variables, reading ability. Thus, the criterion used to allocate children to groups is not independent of the measures that are entered into the correlation. This creates distortions in within-group correlations, as follows.
If we define our groups in terms of their scores on one variable, we effectively restrict the range of values obtained by each group, and this lowers the correlation.  Furthermore, the restriction will be less for the controls than for the dyslexic group - who are typically selected as scoring below a low cutoff, such as one SD below the mean. Figure 4 shows simulated data for two groups selected from a population where the true correlation between variables A and B is .5. Thirty individuals (dyslexics) are selected as scoring more than 1 SD below average on variable A, and another 30 (controls) are selected as scoring above this level. 
Figure 4: Correlations obtained in samples of dyslexic (red) and controls (blue) for 20 runs of simulation with N = 30 per group.
The Figure shows correlations from twenty runs of this simulation. For both groups, the average correlation is less than the true value of .5, because of the restricted range of scores on variable A. However, because the range is more restricted for the dyslexic group, their average correlation is lower than that of the controls. A correlation of about .36 corresponds to the two-tailed .05 significance level for a sample of this size (N = 30 per group), and we can see that the controls are more likely to exceed this value than the dyslexic group. All these results are just artefacts of the way in which the groups were selected: both groups come from the same population where r = .5.
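The logic of this simulation can be sketched in a few lines of Python. This is my own minimal reconstruction under stated assumptions, not the code behind Figure 4: variables A and B are standard normal with a population correlation of .5, the cutoff is 1 SD below the mean on A, and I use 2000 runs rather than 20 so that the average correlations are stable.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def draw_pair(rng, rho):
    """One (A, B) pair from a standard bivariate normal with correlation rho."""
    a = rng.gauss(0.0, 1.0)
    b = rho * a + sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
    return a, b

def sample_group(rng, rho, n, keep):
    """Keep drawing from the population until n cases satisfy keep(a)."""
    group = []
    while len(group) < n:
        a, b = draw_pair(rng, rho)
        if keep(a):
            group.append((a, b))
    return group

def group_r(group):
    """Within-group correlation between A and B."""
    a_vals, b_vals = zip(*group)
    return pearson_r(a_vals, b_vals)

def run_simulation(runs, n=30, rho=0.5, cutoff=-1.0, seed=2012):
    """Return lists of within-group correlations for low and high scorers on A."""
    rng = random.Random(seed)
    low_rs, high_rs = [], []
    for _ in range(runs):
        low = sample_group(rng, rho, n, lambda a: a < cutoff)    # "dyslexic"
        high = sample_group(rng, rho, n, lambda a: a >= cutoff)  # "control"
        low_rs.append(group_r(low))
        high_rs.append(group_r(high))
    return low_rs, high_rs

low_rs, high_rs = run_simulation(runs=2000)
mean_low = sum(low_rs) / len(low_rs)
mean_high = sum(high_rs) / len(high_rs)
print(f"mean r, selected below cutoff: {mean_low:.2f}")
print(f"mean r, selected above cutoff: {mean_high:.2f}")
```

Both averages fall below the population value of .5, and the group selected below the cutoff has the lower average, because its scores on A span a narrower range.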
What can we conclude from all this? Well, the bottom line is that if we find non-significant within-group correlations this does not necessarily invalidate a causal model. The simulation shows that we may find that within-group correlations look quite different in dyslexic and control groups, even if they come from a common distribution.
So where does this leave us?! It would seem that in general, within-group data are unlikely to help us distinguish between causal and non-causal models: they may be compatible with both. So how should we proceed?
There’s no simple solution, but here are some suggestions:
1. If considering correlational data, always report the 95% confidence interval. Usually people (including me!) just report the correlation coefficient, degrees of freedom and p-value. It’s so uncommon to add confidence intervals that I suspect most psychologists don’t know how to compute them. Do not assume that, because one correlation is significant and another is not, they are meaningfully different. This website can be used to test for the significance of the difference between correlations. I would, however, advise against interpreting such a comparison if your data are affected by the kinds of restriction of range discussed above.
2. Study the relationship between key variables in a large unselected sample covering a wide range of scores. This is a more tractable solution, but is seldom done. Typically, people recruit an equivalent number of cases and controls, with a sample size that is inadequate for getting a precise estimate of a correlation in either group. If your underlying model predicts a linear relationship between, say, auditory processing and phonological awareness, then with a sample of 200 cases, a fairly precise estimate can be obtained. With this approach, one can also identify whether the relationship is linear.
3. More generally, it’s important to be explicit about what models you are testing. For instance, I’ve identified four underlying models of the relationship between auditory deficit and language impairment, as shown in Figure 5. In general, correlational data on these two skills won’t distinguish between these models, but specifying the alternatives may help you think of other data that could be informative. 
Figure 5: Models of causal relationships underlying observed correlation between auditory deficit and language impairment
For instance:
  • We found that, when studying heritable conditions, it is useful to include data on parents or siblings. Models differ in predictions about how measures of genetic risk - for instance, family history, or presence of specific genetic variants - relate to A (auditory deficit) and B (language impairment) in the child. This approach is illustrated in this paper. Interestingly, we found that the causal model that is often implicitly assumed, which we termed the Endophenotype model, did not fit the data, but nor did the spurious correlation model, which corresponds here to the Pleiotropy model.
  • There may be other groups that can be informative: for instance, if you think auditory deficits are key in causing language problems, it may be worth including children with hearing loss in a study - see this paper for an example of this approach using converging evidence.
  • Longitudinal data can help distinguish whether A causes B or B causes A.
  • Training studies are particularly powerful, in allowing one to manipulate A and see if it changes B.
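Going back to suggestion 1, the point that one significant and one nonsignificant correlation need not differ from each other can be checked directly. Here is a minimal sketch of a common test for two independent correlations, based on Fisher's z. I am assuming this is the kind of test meant; I have not checked which method the website uses, and the two correlations below are hypothetical values, not results from any study.

```python
from math import atanh, sqrt
from statistics import NormalDist

def compare_independent_correlations(r1, n1, r2, n2):
    """Two-tailed z test for the difference between two independent correlations.

    Each r is transformed to Fisher's z; the difference is divided by its
    standard error, sqrt(1/(n1 - 3) + 1/(n2 - 3)).
    """
    z_diff = (atanh(r1) - atanh(r2)) / sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z_diff)))
    return z_diff, p_value

# Hypothetical example with N = 30 per group: r = .45 would be significant
# at the two-tailed .05 level on its own, whereas r = .25 would not.
z_diff, p_value = compare_independent_correlations(0.45, 30, 0.25, 30)
print(f"z = {z_diff:.2f}, two-tailed p = {p_value:.2f}")
```

In this example the difference between the two correlations is nowhere near significant, even though only one of them would pass the conventional threshold by itself. The caveat about restriction of range still applies: the test assumes the two samples are not distorted by the way they were selected.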
So what’s the bottom line? In general, correlational data from small samples of clinical and control groups are inadequate for testing causal models. They can lead to type I errors, where pooling data leads to a spurious association between variables, but also to type II errors, where a genuine association is discounted because it isn’t evident within subject groups. For the field to move forward, we need to go beyond correlational data.

P.S. 9th July 2012
I've written a little tutorial on simulating data using R to illustrate some of these points. No prior knowledge of R required. see:

Bishop DV, Hardiman MJ, & Barry JG (2012). Auditory deficit as a consequence rather than endophenotype of specific language impairment: electrophysiological evidence. PLoS ONE, 7 (5) PMID: 22662112

If you liked this post, you may also be interested in my other posts on statistical topics:
Getting genetic effect sizes in perspective
The joys of inventing data
A short nerdy post about the use of percentiles
The difference between p < .05 and a screening test

Monday, 4 June 2012

The ‘autism epidemic’ and diagnostic substitution

Based on: King & Bearman (2011) American Sociological Review, 76(2), 320-346; 
Data from birth and diagnostic records for all children born in California 1992-2000
Everyone agrees there has been a remarkable increase in autism diagnosis across the world. There is, however, considerable debate about the reasons for this. Three very different kinds of explanation exist.
  • Explanation #1 maintains that something in our modern environment has come along to increase the risk of autism. There are numerous candidates, as indicated in this blogpost by Emily Willingham.
  • Explanation #2 sees the risks as largely biological or genetic, with changing patterns of reproduction altering prevalence rates, either because of assortative mating (not much evidence, in my view) or because of an increase in older parents (more plausible). 
  • Explanation #3 is very different: it says the increase is not a real increase - it’s just a change in what we count as autism. This has been termed ‘diagnostic substitution’ - the basic idea is that children who would previously have received another diagnosis or no diagnosis are now being identified with autism spectrum disorder (ASD). This could be in part because of new conceptualisations of autism, but may also be fuelled by strategic considerations: resources for children with ASD tend to be much better than those for children with other related conditions, such as language impairment or intellectual handicaps, so this diagnosis may be preferred.
In 2008, my research group published a study that documented one kind of diagnostic substitution. We contacted people who had taken part in our studies of children with specific language impairment years ago. We carried out a standard diagnostic observation procedure for autism with the young adults themselves and, where possible, interviewed their parents about their early history. We found a number of individuals who had been regarded as cases of specific language impairment ten or twenty years ago but who would nowadays be diagnosed with ASD. Although it’s possible that some people develop autistic symptomatology as they get older, in our cases the autistic symptoms appeared to have been present from early childhood - as indicated by the parental interviews. Around half of the sample had been identified as having ‘semantic-pragmatic disorder’ in childhood, but autism had been excluded because at that time, prior to publication of DSM-IV diagnostic guidelines, it was regarded as a very rare condition in which there were severe social and behavioural impairments. How many children would have qualified for ASD diagnoses had they been seen today? Well, it depends. I suspect few people appreciate just how flexible the diagnostic criteria are for autism, even when lengthy standardized diagnostic instruments are used. Although we used the gold standard diagnostic procedures (ADOS-G and ADI-R) we found they seldom gave the same answer. If we diagnosed ASD only when both diagnostic instruments agreed, 21% of cases met criteria. If we included anyone who met criteria for autism or PDDNOS on either ADI-R or ADOS, the rate shot up to 66%.
Last year, a fascinating study by Brugha and colleagues attacked the same question from a different angle. They did an epidemiological survey of a representative sample of adults from the English population, using the ADOS-G, and found that the rates of ASD were similar to those recently reported in children. Within the adult population, rates of ASD did not change with age. Thus, provided we stick to the same diagnostic criteria, the prevalence of autism is the same for those born several decades ago as it is for the current generation of children. Importantly, none of these adults with ASD had received a formal diagnosis.
Recently, we conducted a study with another group: children with an additional sex chromosome (i.e. trisomy). We had not intended to study diagnostic substitution: the goal was rather to understand more about the language difficulties that had previously been described in children with sex chromosome trisomies. The effect of an extra sex chromosome is relatively mild: most of these children attend mainstream schools and they do not have any obvious physical abnormalities. Indeed, they can be hard to study because many individuals with trisomies will be unaware of their condition. We gathered information by parental report, and did not do any direct evaluation of the child, but we did ask about whether the child had had any kind of diagnosis by a medical or psychological expert. We confirmed that there was a strong association with language problems in all three kinds of trisomy (girls with XXX, and boys with XYY or XXY), many of whom had had speech-language therapy. But we also found that 2/19 (11%) of boys with XXY and 11/58 (19%) of those with XYY had received an ASD diagnosis.
It is important to emphasise that most children with a sex chromosome trisomy did not have an ASD diagnosis, and many were not giving any cause for concern. Nevertheless, although they are only a minority of cases, the proportion with ASD is much higher than in the general population. We were really surprised at this because before publishing our study we had done a systematic review of the literature on children with sex chromosome trisomies, focusing on studies that avoided ascertainment bias. In these studies, not a single case of autism had been mentioned when discussing outcomes. So was our study a fluke? We are confident this is not the case, because this year two further studies from the USA have been reported (Ross et al and Lee et al, in press), both of which got results very similar to ours, though using different methods.
This research provides further evidence that diagnostic substitution has occurred, suggesting that children who in the past would have been diagnosed with language impairment are now being diagnosed with ASD. The only other way to explain the increased diagnosis rate in children with a known chromosomal abnormality would be if the trisomy acted as a risk factor, making children more sensitive to environmental factors that could cause autism. That’s a possibility, but it seems more likely that cases of ASD were missed in the past because more stringent diagnostic criteria were used, just as was found in our follow-up of children with SLI and in the epidemiological study of adults by Brugha and colleagues.
It is becoming clear that changing diagnostic criteria, increased awareness of ASD, and strategic use of diagnosis to gain access to services, have had a massive effect on the numbers of children with ASD. When I started studies in this area, I thought diagnostic substitution had happened but I did not think it would be sufficient to explain the increase in numbers of ASD diagnoses. But now, on the basis of studies reviewed here, I think it could be the full story.

PS: a slightly extended version of this blogpost was featured on PLOS Blogs on 8th June 2012.

Bishop, D., Jacobs, P., Lachlan, K., Wellesley, D., Barnicoat, A., Boyd, P., Fryer, A., Middlemiss, P., Smithson, S., Metcalfe, K., Shears, D., Leggett, V., Nation, K., & Scerif, G. (2010). Autism, language and communication in children with sex chromosome trisomies. Archives of Disease in Childhood, 96 (10), 954-959. DOI: 10.1136/adc.2009.179747

Bishop, D., Whitehouse, A., Watt, H., & Line, E. (2008). Autism and diagnostic substitution: evidence from a study of adults with a history of developmental language disorder. Developmental Medicine & Child Neurology, 50 (5), 341-345. DOI: 10.1111/j.1469-8749.2008.02057.x

Brugha, T. (2011). Epidemiology of Autism Spectrum Disorders in Adults in the Community in England. Archives of General Psychiatry, 68 (5). DOI: 10.1001/archgenpsychiatry.2011.38

Lee, N. R., Wallace, G. L., Adeyemi, E. I., Lopez, K. C., Blumenthal, J. D., Clasen, L. S., & Giedd, J. N. (2012, in press). Dosage effects of X and Y chromosomes on language and social functioning in children with supernumerary sex chromosome aneuploidies: Implications for idiopathic language impairment and autism spectrum disorders. Journal of Child Psychology and Psychiatry.

Ross, J. L., et al. (2012). Behavioral and social phenotypes in boys with 47, XYY syndrome or 47, XXY Klinefelter syndrome. Pediatrics, 129 (4), 769-778. DOI: 10.1542/peds.2011-0719