Monday, 3 September 2012

What Chomsky doesn't get about child language

 

Noam Chomsky is widely regarded as an intellectual giant, responsible for a revolution in how people think about language. In a recent book by Chomsky and James McGilvray, The Science of Language, the foreword states: “It is particularly important to understand Chomsky’s views … not only because he virtually created the modern science of language by himself … but because of what he and colleagues have discovered about language – particularly in recent years…”

As someone who works on child language disorders, I have tried many times to read Chomsky in order to appreciate the insights that he is so often credited with. I regret to say that, over the years, I have come to the conclusion that, far from enhancing our understanding of language acquisition, his ideas have led to stagnation, as linguists have gone through increasingly uncomfortable contortions to relate facts about children’s language to his theories. The problem is that the theories are derived from a consideration of adult language, and take no account of the process of development. A flawed premise about what is learned has led to years of confusion and sterile theorising.
Let us start with Chomsky’s famous sentence "Colourless green ideas sleep furiously". This was used to demonstrate the independence of syntax and semantics: we can judge that this sentence is syntactically well-formed even though it makes no sense. From this, it was a small step to conclude that language acquisition involves deriving abstract syntactic rules that determine well-formedness, without any reliance on meaning. The mistake here was to assume that an educated adult's ability to judge syntactic well-formedness in isolation has anything to do with how that ability was acquired in childhood. Already in the 1980s, those who actually studied language development found that children used a wide variety of cues, including syntactic, semantic, and prosodic information, to learn language structure (Bates & MacWhinney, 1989). Indeed, Dabrowska (2010) subsequently showed that agreement on well-formedness of complex sentences was far from universal in adults.
Because he assumed that children were learning abstract syntactic rules from the outset, Chomsky encountered a serious problem. Language, defined this way, was not learnable by any usual learning system: this could be shown by formal proof from mathematical learning theory. The logical problem is that such learning is too unconstrained: any grammatical string of elements is compatible with a wide range of underlying rule systems. The learning becomes a bit easier if children are given negative evidence (i.e., the learner is explicitly told which rules are not correct), but (a) this doesn’t really happen and (b) even if it did, arrival at the correct solution is not feasible without some prior knowledge of the kinds of rules that are allowable. In an oft-quoted sentence, Chomsky (1965) wrote: "A consideration of the character of the grammar that is acquired, the degenerate quality and narrowly limited extent of the available data, the striking uniformity of the resulting grammars, and their independence of intelligence, motivation, and emotional state, over wide ranges of variation, leave little hope that much of the structure of the language can be learned by an organism initially uninformed as to its general character." (p. 58) (my italics).
So we were led to the inevitable, if surprising, conclusion that if grammatical structure cannot be learned, it must be innate. But different languages have different grammars. So whatever is innate has to be highly abstract – a Universal Grammar.  And the problem is then to explain how children get from this abstract knowledge to the specific language they are learning. The field became encumbered by creative but highly implausible theories, most notably the parameter-setting account, which conceptualised language acquisition as a process of "setting a switch" for a number of innately-determined parameters (Hyams, 1986). Evidence, though, that children’s grammars actually changed in discrete steps, as each parameter became set, was lacking. Reality was much messier.
Viewed from a contemporary perspective, Chomsky’s concerns about the unlearnability of language seem at best rather dated and at worst misguided. There are two key features in current developmental psycholinguistics that were lacking from Chomsky’s account, both concerning the question of what is learned. First, there is the question of the units of acquisition: for Chomsky, grammar is based on abstract linguistic units such as nouns and verbs, and it was assumed that children operated with these categories. Over the past 15 years, direct evidence has emerged to indicate that children don't start out with awareness of underlying grammatical structure; early learning is word-based, and patterning in the input at the level of abstract elements is something children become aware of as their knowledge increases (Tomasello, 2000).  
Second, Chomsky viewed grammar as a rule-based system that determined allowable sequences of elements. But people’s linguistic knowledge is probabilistic, not deterministic. And there is now a large body of research showing how such probabilistic knowledge can be learned from sequential inputs, by a process of statistical learning. To take a very simple example, if repeatedly presented with a sequence such as ABCABADDCABDAB, a learner will start to become aware of dependencies in the input, i.e. that B usually follows A, even if there are some counter-examples. Other types of sequence such as AcB can be learned, where c is an element that can vary (see Hsu & Bishop, 2010, for a brief account). Regularly encountered sequences will then form higher-level units. At the time Chomsky was first writing, learning theories were more concerned with the formation of simple associations, either between paired stimuli, or between instrumental acts and outcomes. These theories were not able to account for learning of the complex structure of natural language. However, once language researchers started to think in terms of statistical learning, this led to a reconceptualisation of what was learned, and many of the conceptual challenges noted by Chomsky simply fell away.
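To make this concrete, here is a toy sketch (purely illustrative: the function name and the use of this particular sequence are my own, not taken from any of the studies cited) of how a simple tally of transitional probabilities picks out the dependency in that sequence:

```python
from collections import Counter, defaultdict

def transitional_probabilities(seq):
    """Estimate P(next element | current element) from bigram counts."""
    pair_counts = defaultdict(Counter)
    for cur, nxt in zip(seq, seq[1:]):
        pair_counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(nxts.values()) for nxt, n in nxts.items()}
            for cur, nxts in pair_counts.items()}

tp = transitional_probabilities("ABCABADDCABDAB")
print(tp["A"])  # B follows A on 4 of 5 occasions: {'B': 0.8, 'D': 0.2}
```

Real statistical-learning models are far richer than this, but the core computation, tracking which elements predict which, is no more mysterious.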
Current statistical learning accounts allow us to move ahead and to study the process of language learning. Instead of assuming that children start with knowledge of linguistic categories, categories are abstracted from statistical regularities in the input (see Special Issue 03, Journal of Child Language 2010, vol 37). The units of analysis thus change as the child develops expertise. And, consistent with the earlier writings of Bates and MacWhinney (1989), children's language is facilitated by the presence of correlated cues in the input, e.g., prosodic and phonological cues in combination with semantic context. In sharp contrast to the idea that syntax is learned by a separate modular system divorced from other information, recent research emphasises that the young language learner uses different sources of information together. Modularity emerges as development proceeds.
A statistical learning account does not, however, entail treating the child as a “blank slate”. Developmental psychology has for many years focused on constraints on learning: biases that lead the child to attend to particular features of the environment, or to process these in a particular way. Such constraints will affect how language input is processed, but they are a long way from the notion of a Universal Grammar. And such constraints are not specific to language: they influence, for instance, our ability to perceive human faces, or to group objects perceptually.

It would be rash to assume that all the problems of language acquisition can be solved by adopting a statistical learning approach. And there are still big questions, identified by Chomsky and others – Why don’t other species have syntax? How did language evolve? Is linguistic ability distinct from general intelligence? But we now have a theoretical perspective that makes sense in terms of what we know about cognitive development and neuropsychology, that has general applicability to many different aspects of language acquisition, that forges links between language acquisition and other types of learning, and that leads to testable predictions. The beauty of this approach is that it is amenable both to experimental test and to simulations of learning, so we can identify the kinds of cues children rely on, and the categories that they learn to operate with.

So how does Chomsky respond to this body of work? To find out, I decided to take a look at The Science of Language, which is based on transcripts of conversations between Chomsky and James McGilvray between 2004 and 2009. It was encouraging to see from the preface that the book is intended for a general audience and that “Professor Chomsky’s contributions to the interview can be understood by all”.

Well, as “one of the most influential thinkers of our time”, Chomsky fell far short of expectation. Statistical learning and connectionism were not given serious consideration, but were rapidly dismissed as versions of behaviourism that can’t possibly explain language acquisition. As noted by Pullum elsewhere, Chomsky derides Bayesian learning approaches as useless – and at one point claimed that statistical analysis of sequences of elements to find morpheme boundaries “just can’t work” (cf. Romberg & Saffran, 2010). He seemed stuck with his critique of Skinnerian learning and ignorant of how things had changed.
I became interested in not just what Chomsky said, but how he said it. I’m afraid that despite the reassurances in the foreword, I had enormous difficulty getting through this book. When I read a difficult text, I usually take notes to summarise the main points. When I tried that with The Science of Language, I got nowhere because there seemed to be no coherent structure. Occasionally an interesting gobbet of information bobbed up from the sea of verbiage, but it did not seem part of a consecutive argument. The style is so discursive that it’s impossible to précis. His rhetorical approach seemed the antithesis of a scientific argument. He made sweeping statements and relied heavily on anecdote.

A stylistic device commonly used by Chomsky is to set up a dichotomy between his position and an alternative, then represent the alternative in a way that makes it preposterous. For instance, his rationalist perspective on language acquisition, which presupposes innate grammar, is contrasted with an empiricist position in which “Language tends to be seen as a human invention, an institution to which the young are inducted by subjecting them to training procedures”.  Since we all know that children learn language without explicit instruction, this parody of the empiricist position has to be wrong.
Overall, this book was a disappointment: one came away with a sense that a lot of clever stuff had been talked about, and much had been confidently asserted, but there was no engagement with any opposing point of view – just disparagement.  And as Geoffrey Pullum concluded, in a review in the Times Higher Education, there was, alas, no science to be seen.


References
Bates, E., & MacWhinney, B. (1989). Functionalism and the competition model. In B. MacWhinney & E. Bates (Eds.), The crosslinguistic study of sentence processing (pp. 3-73). Cambridge: Cambridge University Press. Available from: http://psyling.psy.cmu.edu/papers/bib.html
Chomsky, N. (1965). Aspects of the theory of syntax. Cambridge, MA: MIT Press.
Chomsky, N., & McGilvray, J. (2012). The Science of Language: Interviews with James McGilvray. Cambridge: Cambridge University Press.
Dabrowska, E. (2010). Naive v. expert intuitions: An empirical study of acceptability judgements. The Linguistic Review, 27, 1-23.
Hsu, H. J., & Bishop, D. V. M. (2010). Grammatical difficulties in children with specific language impairment (SLI): is learning deficient? Human Development, 53, 264-277.
Hyams, N. (1986). Language acquisition and the theory of parameters. Dordrecht: Reidel.
Romberg, A. R., & Saffran, J. R. (2010). Statistical learning and language acquisition. Wiley Interdisciplinary Reviews: Cognitive Science, 1(6), 906-914. doi:10.1002/wcs.78
Tomasello, M. (2000). Acquiring syntax is not what you think. In D. V. M. Bishop & L. B. Leonard (Eds.), Speech and Language Impairments in Children: Causes, Characteristics, Intervention and Outcome (pp. 1-15). Hove, UK: Psychology Press.


Correction: 4/9/2012. I had originally cited the wrong reference to Dabrowska (Dabrowska, E. 1997. The LAD goes to school: a cautionary tale for nativists. Linguistics, 35, 735-766). The 1997 paper is concerned with variation in adults' ability to interpret syntactically complex sentences. The 2010 paper cited above focuses on grammaticality judgements.




A far-too-long response to (some) commentators
12th October 2012
One of the nice things about blogging is that it gives an opportunity to get feedback on one’s point of view. I’d like to thank all those who offered comments on what I’ve written here, particularly those who have suggested readings to support the arguments they make. The sheer diversity of views has been impressive, as is the generally polite and scholarly tone of the arguments. I’ve tried to look seriously at the points people have made and I’ve had a fascinating few weeks reading some of the broader literature recommended by commentators.
I quickly realised that I could easily spend several months responding to comments and reading around this area, so I have had to be selective. I’ll steer clear of commenting on Chomsky’s political arguments, which I see as quite a separate issue. Nor am I prepared to engage with those who suggest Chomsky is above criticism, either because he is so famous, or because he’s been around a long time.  Finally, I won’t say more about the views of those who have expressed agreement, or extensions of my arguments – other than to say thanks: this is a weird subject area where all too often people seem scared to speak out for fear of seeming foolish or ignorant. As Anon (4 Sept) says, it can quickly get vitriolic, which is bad for everyone.  But if we at least boldly say what we think, those with different views can either correct us, or develop better arguments. 
I’ll focus in this reply on the main issues that emerged from the discussion: how far is statistical learning compatible with a Chomskyan account, are there things that a non-Chomskyan account simply can’t deal with, and finally, are there points of agreement that could lead to more positive engagement in future between different disciplines?
How compatible is statistical learning with a Chomskyan account?
A central point made by Anon (3rd Sept/4th Sept) and Chloe Marshall (11th Sept) is that probabilistic learning is compatible with Chomsky's views.
This seems to be an absolutely crucial point. If there really is no mismatch between what Chomsky is saying and those who are advocating accounts of language acquisition in terms of statistical learning, then maybe the disagreement is just about terminology and we should try harder to integrate the different approaches. 
It’s clear we can differentiate between different levels of language processing.  For instance, here are just three examples of how statistical learning may be implicated in language learning:

  • The original work by Saffran et al (1996) focused on demonstrating that infants were sensitive to transitional probabilities in syllable strings. It was suggested that this could be a mechanism that was involved in segmenting words from speech input.
  • Redington et al (1998) proposed that information about lexical categories could be extracted from language input by considering sequential co-occurrences of words.
  • Edelman and Waterfall (2007) reviewed evidence that children attend to specific patterns of specific lexical items in their linguistic input, concluding that they first acquire the syntactic patterns of particular words and structures and later generalize information to entire word classes. They went on to describe heuristic methods for uncovering structure in input, using the example of the ADIOS (Automatic DIstillation Of Structure) algorithm. This uses distributional regularities in raw, unannotated corpus data to identify significant co-occurrences, which are used as the basis for distributional classes. Ultimately, ADIOS discovers recursive rule-like patterns that support generalization.

So what does Chomsky make of all of this? I am grateful to Chloe for pointing me to his 2005 paper “Three factors in language design”, which was particularly helpful in tracing the changes in Chomsky’s views over time.
Here’s what he says on word boundaries: 
“In Logical Structure of Linguistic Theory (LSLT; p. 165), I adopted Zellig Harris’s (1955) proposal, in a different framework, for identifying morphemes in terms of transitional probabilities, though morphemes do not have the required beads-on-a-string property. The basic problem, as noted in LSLT, is to show that such statistical methods of chunking can work with a realistic corpus. That hope turns out to be illusory, as has recently been shown by Thomas Gambell and Charles Yang (2003), who go on to point out that the methods do, however, give reasonable results if applied to material that is preanalyzed in terms of the apparently language-specific principle that each word has a single primary stress. If so, then the early steps of compiling linguistic experience might be accounted for in terms of general principles of data analysis applied to representations preanalyzed in terms of principles specific to the language faculty....”
Gambell and Yang don’t seem to have published in the peer-reviewed literature, but I was able to track down four papers by these authors (Gambell & Yang, 2003; Gambell & Yang, 2004; Gambell & Yang, 2005a; Gambell & Yang, 2005b), which all make essentially the same point. They note that a simple rule that treats a low-probability syllabic transition as a word boundary doesn’t work with a naturalistic corpus where a high proportion of words are monosyllabic. However, adding prosodic information – essentially treating each primary stress as belonging to a new word – achieves a much better level of accuracy.
The work by Gambell and Yang is exactly the kind of research I like: attempting to model a psychological process and evaluating results against empirical data. The insights gained from the modelling take us forward. The notion that prosody may provide key information in segmenting words seems entirely plausible. If generative grammarians wish to refer to such a cognitive bias as part of Universal Grammar, that’s fine with me. As noted in my original piece, I agree that there must be some constraints on learning; if UG is confined to this kind of biologically plausible bias, then I am happy with UG. My difficulties arise with more abstract and complex innate knowledge, such as is involved in parameter setting (of which, more below).
But, even at this level of word identification, there are still important differences between my position and the Chomskyan one. First of all, I’m not as ready as Chomsky to dismiss statistical learning on the basis of Gambell and Yang’s work. Their model assumed a sequence of syllables was a word unless it contained a low transitional probability. Its accuracy was so bad that I suspect it gave a lower level of success than a simpler strategy: “Assume each syllable is a word.”  But consider another potential strategy for word segmentation in English, which would be “Assume each syllable is a complete word unless there’s a very high transitional probability with the next syllable.” I’d like to see a model like that tested before assuming transitional probability is a useless cue.
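For what it's worth, the strategy I am proposing is simple to state computationally. The sketch below is a toy illustration only (the miniature syllable corpus and the 0.9 threshold are invented; a proper test would need a naturalistic corpus of the kind Gambell and Yang used): each syllable is treated as a word unless its transitional probability to the next syllable is very high, in which case the two are merged.

```python
from collections import Counter, defaultdict

# Invented mini-corpus: '.' separates syllables within a word, spaces separate words.
utterances = ["the ba.by runs", "the dog runs", "the ba.by sleeps", "see the dog"]

# Estimate syllable-to-syllable transitional probabilities across the corpus.
pairs, totals = defaultdict(Counter), Counter()
for u in utterances:
    syls = u.replace(" ", ".").split(".")
    for a, b in zip(syls, syls[1:]):
        pairs[a][b] += 1
        totals[a] += 1
tp = lambda a, b: pairs[a][b] / totals[a]

def segment(syllables, threshold=0.9):
    """Treat each syllable as a word, merging only on very high TP."""
    words, current = [], syllables[0]
    for a, b in zip(syllables, syllables[1:]):
        if tp(a, b) >= threshold:
            current += b           # highly predictable transition: same word
        else:
            words.append(current)  # otherwise posit a word boundary
            current = b
    words.append(current)
    return words

print(segment("the.ba.by.runs".split(".")))  # ['the', 'baby', 'runs']
```

The point is not that this toy succeeds, but that the proposal is concrete enough to be evaluated against the same corpora Gambell and Yang used.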
Second, Gambell and Yang stay within what I see as a Chomskyan style of thinking which restricts the range of information available to the language processor when solving a particular problem. This is parsimonious and makes modelling tractable, but it’s questionable just how realistic it is. It contrasts sharply with the view proposed by Seidenberg and MacDonald (1999), who argue that cues that individually may be poor at solving a categorisation problem, may be much more effective when used together. For instance, the young child doesn’t just hear words such as ‘cat’, ‘dog’, ‘lion’, ‘tiger’, ‘elephant’ or ‘crocodile’: she typically hears them in a meaningful context where relevant toys or pictures are present. Of course, contextual information is not always available and not always reliable. However, it seems odd to assume that this contextual information is ignored when populating the lexicon. This is one of the core difficulties I have with Chomsky: the sense that meaning is not integrated in language learning. 
Turning to lexical categories, the question is whether Chomsky would accept that these might be discovered by the child through a process of statistical learning, rather than being innate. My understanding is that he has rejected this idea; I have not found any statement by him to suggest otherwise, but others may be able to point to one. Franck Ramus (4th Sept) argues that children do represent some syntactic categories well before this is evident in their language, and that this is not explained by statistical relationships between words. I’m not convinced by the evidence he cites, which is based on different brain responses to grammatical and ungrammatical sentences in toddlers (Bernal et al, 2010). First, the authors state: “Infants could therefore not detect the ungrammaticality by noticing the co-occurrence of two words that normally never occur together”. But they don’t present any information on transitional probabilities in a naturalistic corpus for the word sequences used in their sentences. All that is needed for statistical learning is for the transitional probabilities to be lower in the ungrammatical than the grammatical sentences: they don't have to be zero. Second, the children in this study were two years old, and would have been exposed to a great deal of language from which syntactic categories could have been abstracted by mechanisms similar to those simulated by Redington et al.
Regarding syntax, I was pleased to be introduced to the work of Jeffrey Lidz, whose clarity of expression is a joy after struggling with Chomsky. He reiterates a great deal of what I regard as the ‘standard’ Chomskyan view, including the following:
“Speaking broadly, this research generally finds that children’s representations do not differ in kind from those of adults and that in cases where children behave differently from adults, it is rarely because they have the wrong representations. Instead, differences between children and adults are often attributed to task demands (Crain & Thornton, 1998), computational limitations (Bloom, 1990; Grodzinsky & Reinhart, 1993), and the problems of pragmatic integration (Thornton & Wexler, 1999) but only rarely to representational differences between children and adults (Radford, 1995; see also Goodluck, this volume).” (Lidz, 2008)
The studies cited by Lidz as showing that children’s representations are the same as adults’ – except for performance limitations – have intrigued me for many years. As someone who has long been interested in children’s ability to understand complex sentence structures, I long ago came to realise that the last thing children usually attend to is syntax: their performance is heavily influenced by context, pragmatics, particular lexical items, and memory load. But my response to this observation is very different from that of the generative linguists. Whereas they strive to devise tasks that are free of these influences, I came to the conclusion that these factors play a key part in language acquisition. Again, I find myself in agreement with Seidenberg and MacDonald (1999):
“The apparent complexity of language and its uniqueness vis a vis other aspects of cognition, which are taken as major discoveries of the standard approach, may derive in part from the fact that these ‘performance’ factors are not available to enter into explanations of linguistic structure. Partitioning language into competence and performance and then treating the latter as a separate issue for psycholinguists to figure out has the effect of excluding many aspects of language structure and use from the data on which the competence theory is developed.” (p. 572)
The main problem I have with Chomskyan theory, as I explained in the original blogpost, is the implausibility of parameter setting as a mechanism of child language acquisition. In The Science of Language, Chomsky (2012) is explicit about  parameter-setting as an attractive way out of the impasse created by the failure to find general UG principles that could account for all languages.  Specifically, he says:
“If you’re trying to get Universal Grammar to be articulated and restricted enough so that an evaluation will only have to look at a few examples, given data, because that’s all that’s permitted, then it’s going to be very specific to language, and there aren’t going to be general principles at work. It really wasn’t until the principles and parameters conception came along that you could really see a way in which this could be divorced. If there’s anything that’s right about that, then the format for grammar is completely divorced from acquisition; acquisition will only be a matter of parameter setting. That leaves lots of questions open about what the parameters are; but it means that whatever is left are the properties of language.”
I’m sure readers will point out if I’ve missed anything, but what I take away from this statement is an admission that UG is now seen as consisting of very general and abstract constraints on processing that are not necessarily domain-specific. The principal component of UG that interests Chomsky is 
“an operation that enables you to take mental objects [or concepts of some sort], already constructed, and make bigger mental objects out of them. That’s Merge. As soon as you have that, you have an infinite variety of hierarchically structured expressions [and thoughts] available to you.”
I have no difficulty in agreeing with the idea that recursion is a key component of language and that humans have a capacity for this kind of processing. But Chomsky makes another claim that I find much harder to swallow. He sees the separation of UG from parameter-setting as a solution to the problem of acquisition; I see it as just moving the problem elsewhere. For a start, as he himself notes, there are “lots of questions open” about what the parameters are. Also, children don’t behave as if parameters are set one way or another: their language output is more probabilistic. I was interested to read that modifications of Chomskyan theory have been proposed to handle this:
“Developing suggestions of Thomas Roeper’s, Yang proposes that UG provides the neonate with the full array of possible languages, with all parameters valued, and that incoming experience shifts the probability distribution over languages in accord with a learning function that could be quite general. At every stage, all languages are in principle accessible, but only for a few are probabilities high enough so that they can actually be used.” (Chomsky, 2005, p. 9)
So not only can the theory be adapted to handle probabilistic data; probability now assumes a key role, as it is the factor that decides which grammar will be adopted at any given point in development. But while I am pleased to see the probabilistic nature of children’s grammatical structures acknowledged, I still have problems with this account:
First, it is left unclear why a child opts for one version of the grammar at time 1 and another at time 2, then back to the first version at time 3. If we want an account that is explanatory rather than merely descriptive, then non-deterministic behaviour needs explaining. It could reflect the behaviour of a system that is rule-governed but affected by noise, or it could be a case of different options being selected according to other local constraints. What seems less plausible (though not impossible) is a system that flips from one state to another with a given probability. In a similar vein, if a grammar has an optional setting on a parameter, just what does that mean? Is there a random generator somewhere in the system that determines on a moment-by-moment basis what is produced, or are there local factors that constrain which version is preferred?
Second, this account ignores the fact that early usage of certain constructions is influenced by the lexical items involved (Tomasello, 2006), raising questions about just how abstract the syntax is.
Third, I see a clear distinction between saying that a child has the potential to learn any grammar, and saying that the child has available all grammars from the outset, “with all parameters valued”. I’m happy to agree with the former claim (which, indeed, has to be true, for any typically-developing child), but the latter seems to fly in the face of evidence that the infant brain is very different from the adult brain, in terms of number of neurons, proportion of grey and white matter, and connectivity.  It’s hard to imagine what the neural correlate of a “valued parameter” would be. If the “full array of languages” is already available in the neonate, then how is it that a young child can suffer damage to a large section of the left cerebral hemisphere without necessarily disturbing the ultimate level of language ability (Bishop, 1988)?
Are there things that only a Chomskyan account can explain?
Progress, of course, is most likely when people do disagree, and I suspect that some of the psychological work on language acquisition might not have happened if people hadn’t taken issue with being told that such-and-such a phenomenon proves that some aspect of language must be innate.  Let me take three such examples:
1. Optional infinitives.  I remember many years ago hearing Ken Wexler say that children produce utterances such as “him go there”, and arguing that this cannot have been learned from the input and so must be evidence of a grammar with an immature parameter-setting.  However, as Julian Pine pointed out at the same meeting, children do hear sequences such as this in sentences such as “I saw him go there”, and furthermore children’s optional infinitive errors tend to occur most on verbs that occur relatively frequently as infinitives in compound finite constructions (Freudenthal et al., 2010).
2. Fronted interrogative verb auxiliaries. This is a classic case of an aspect of syntax that Chomsky (1971) used as evidence for Poverty of the Stimulus – i.e., the inadequacy of language input to explain language knowledge. Perfors et al (2010) take this example and demonstrate that it is possible to model acquisition without assuming innate syntactic knowledge. I’m sure many readers would take issue with certain assumptions of the modelling, but the important point here is not the detail so much as the demonstration that some assumptions about impossibility of learning are not as watertight as often assumed: a great deal depends on how you conceptualise the learning process.
3. Anaphoric ‘one’. Lidz et al (2003) argued that toddlers aged around 18 months manage to work out the antecedent of the anaphoric pronoun “one” (e.g. “Here’s a yellow bottle. Can you see another one?”), even though there is insufficient evidence in their language input to disambiguate this. The key issue is whether “another one” is taken to mean the whole noun phrase, “yellow bottle”, or just its head, “bottle”. Lidz et al note that in the adult grammar the element “one” typically refers to the whole constituent “yellow bottle”. To study knowledge of this aspect of syntax in infants, they used preferential looking: infants were first introduced to a phrase such as “Look! A yellow bottle”. They were then presented with two objects: one described by the same adjective+noun combination (e.g. another yellow bottle), and one with the same noun and a different adjective (e.g. a blue bottle). Crucially, Lidz et al reported that 18-month-olds looked significantly more often at the yellow (rather than blue) bottle when asked “Do you see another one?”, i.e., treating “one” as referring to the whole noun phrase, just like adults. This was not due to any general response bias, because they showed the opposite bias (preference for the novel item) if asked a control question: “What do you see now?” In addition, Lidz et al analysed data from the CHILDES database and concluded that, although adults often used the phrase “another one” when talking to young children, this was seldom in contexts that disambiguated its reference.
This study stimulated a range of responses from researchers who suggested alternative explanations; I won’t go into these here, as they are clearly described by Lidz and Waxman (2004), who go carefully through each one presenting arguments against it. This is another example of the kind of work I like – it’s how science should proceed, with claim and counter-claim being tested until we arrive at a resolution. But is the answer clear?
My first reaction to the original study was simply that I’d like to see it replicated: eleven children per group is a small sample size for a preferential looking study, and does not seem a sufficiently firm foundation on which to base the strong conclusion that children know things about syntax that they could not have learned. But my second reaction is that, even if this replicates, I would not find the evidence for innate knowledge of grammar convincing. Again, things look different if you go beyond syntax. Suppose, for instance, the child interprets “another one” to mean “more”. There is reason to suspect this may occur, because in the same CHILDES corpora used by Lidz, there are examples of the child saying things like “another one book”.   
On this interpretation, the Lidz task would still pose a challenge, as the child has to decide whether to treat “another one” as referring to the specific object (“yellow bottle”) or the class of objects (“bottle”). If the former is correct, they should prefer the yellow bottle; if the latter, there’d be no preference; if uncertain, we’d expect a mixture of responses, somewhere between these options. So what was actually found? As noted above, children given the control sentence “What do you see now?” showed a slight bias to pick the new item, so the old item (yellow bottle) was looked at for only an average of 43% of the time (SD = 5.2%). Children asked the key question, “Do you see another one?”, looked at the old item (yellow bottle) on average 54% of the time (SD = 6.7%). The difference between the two instruction types is large in statistical terms (Cohen’s d = 1.94), but the bias away from chance is fairly modest in both cases. If I’m right and syntax is not the most crucial factor determining responses, then we might find that the specific test items affect performance: e.g., a complex noun phrase describing a stable entity (e.g. a yellow bottle) might be more likely to be selected for “another one” than one describing an object in a transient state (e.g. a happy boy). [N.B. My thanks to Jeffrey Lidz, who kindly provided the raw data on which the results presented above are based.]
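For readers who want to check the headline effect size against the summary statistics just quoted, here is a minimal sketch in Python. It uses the simple pooled-SD formula for Cohen’s d, assuming equal group sizes; this is an illustration of the calculation, not a reconstruction of Lidz et al’s own analysis.

```python
# Cohen's d for the two instruction conditions, computed from the
# summary statistics quoted above (mean looking proportions and SDs).
# Uses the simple pooled-SD formula, assuming equal group sizes.

def cohens_d(mean1, sd1, mean2, sd2):
    """Standardised mean difference using a pooled standard deviation."""
    pooled_sd = ((sd1 ** 2 + sd2 ** 2) / 2) ** 0.5
    return (mean1 - mean2) / pooled_sd

control = (0.43, 0.052)   # "What do you see now?"
anaphor = (0.54, 0.067)   # "Do you see another one?"

d = cohens_d(anaphor[0], anaphor[1], control[0], control[1])
print(round(d, 2))  # roughly 1.8 with this convention
```

With these rounded summary values the formula gives d of about 1.8 rather than the reported 1.94; the small discrepancy presumably reflects computation from the raw data rather than from rounded summaries, and it does not affect the point being made: a large standardised effect can coexist with looking preferences that are only modestly above or below 50%.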
Points of agreement – and disagreement – between generative linguists and others
The comments I have received give me hope that there may be more convergence of views between Chomskyans and those modelling language acquisition than I had originally thought. The debate between connectionist ‘bottom up’ and Bayesian ‘top down’ approaches to modelling language acquisition, highlighted by Jeff Bowers (4th Sept) and described by Perfors et al (2011), gets back to basic issues about how far we need a priori abstract symbolic structures, and how far these can be constructed from patterned input. I emphasise again that I would not advocate treating the child as a blank slate: of course, there need to be constraints on what is attended to and what computations are performed on the input. But I don’t see this as an either/or choice between bottom-up and top-down approaches. The key questions concern what the top-down constraints are, how domain-specific they need to be, and just how far one can go with quite minimal prior specification of structure.
I see these as empirical questions whose answers need to take into account (a) experimental studies of child language acquisition and (b) formal modelling of language acquisition using naturalistic corpora as well as (c) the phenomena described by generative linguists, including intuitive judgements about grammaticality etc. 
I appreciate the patience of David Adger (Sept 11th) in trying to argue for more of a dialogue between generative linguists and those adopting non-Chomskyan approaches to modelling child language. Anon (4th Sept) has also shown a willingness to engage that gives me hope that links may be forged between those working in the classic generative tradition and others who attempt to model language development. I was pleased to be nudged by Anon (4th Sept) into reading Becker et al (2011), and agree it is an example of the kind of work that is needed: looking systematically at known factors that might account for observed biases, and pushing to see just how much these could explain. It illustrates clearly that there are generative linguists whose work is relevant for statistical learning. I still think, though, that we need to be cautious in concluding that there are innate biases, especially when the data come from adults, whose biases could be learned. There are always possible factors that weren’t controlled – in this case, for example, I wondered about age-of-acquisition effects (cf. data from a very different kind of task by Garlock et al, 2001). But overall, work like this offers reassurance that not all generative linguists live in a Chomskyan silo – and if I implied that they did, I apologise.
When Chomsky first wrote on this topic, we had neither the corpora nor the computer technology to simulate naturalistic language learning. It remains a daunting task, but I am impressed at what has been achieved so far. I remain of the view that the task of understanding language acquisition has been made unduly difficult by a conceptualisation of what is learned that treats syntax as a formal system acquired in isolation from context and meaning. Like Edelman and Waterfall (2007), I also suspect that obstacles have been created by the desire for a ‘beautiful’ theory, i.e. one that is simple and elegant in accounting for linguistic phenomena. My own prediction is that any explanatorily adequate account of language acquisition will be an ugly construction, cobbled together from bits and pieces of cognition, and combining information from many different levels of processing. The ultimate test will be whether we can devise a model that predicts empirical data from child language acquisition. I probably won’t live long enough, though, to see it solved.
References
Becker, M., Ketrez, N., & Nevins, A. (2011). The surfeit of the stimulus: Analytic biases filter lexical statistics in Turkish laryngeal alternations. Language, 87(1), 84-125.
Bernal, S., Dehaene-Lambertz, G., Millotte, S., & Christophe, A. (2010). Two-year-olds compute syntactic structure on-line. Developmental Science, 13(1), 69-76. doi: 10.1111/j.1467-7687.2009.00865.x
Bishop, D. V. M. (1988). Language development after focal brain damage. In D. V. M. Bishop & K. Mogford (Eds.), Language development in exceptional circumstances (pp. 203-219). Edinburgh: Churchill Livingstone.
Chomsky, N. (2005). Three factors in language design. Linguistic Inquiry, 36(1), 1-22.
Edelman, S., & Waterfall, H. (2007). Behavioral and computational aspects of language and its acquisition. Physics of Life Reviews, 4, 253-277.
Freudenthal, D., Pine, J., & Gobet, F. (2010). Explaining quantitative variation in the rate of Optional Infinitive errors across languages: A comparison of MOSAIC and the Variational Learning Model. Journal of Child Language, 37(3), 643-669. doi: 10.1017/s0305000909990523
Garlock, V. M., Walley, A. C., & Metsala, J. L. (2001). Age-of-acquisition, word frequency and neighborhood density effects on spoken word recognition: Implications for the development of phoneme awareness and early reading ability. Journal of Memory and Language, 45, 468-492.
Lidz, J., Waxman, S., & Freedman, J. (2003). What infants know about syntax but couldn't have learned: experimental evidence for syntactic structure at 18 months. Cognition, 89(3), 295-303.
Lidz, J., & Waxman, S. (2004). Reaffirming the poverty of the stimulus argument: a reply to the replies. Cognition, 93, 157-165. 
Perfors, A., Tenenbaum, J. B., & Regier, T. (2011). The learnability of abstract syntactic principles. Cognition, 118(3), 306-338. doi: 10.1016/j.cognition.2010.11.001
Perfors, A., Tenenbaum, J. B., & Wonnacott, E. (2010). Variability, negative evidence, and the acquisition of verb argument constructions. Journal of Child Language, 37(3), 607-642. doi: 10.1017/S0305000910000012
Redington, M., Chater, N., & Finch, S. (1998). Distributional information: A powerful cue for acquiring syntactic categories. Cognitive Science, 22(4), 425-469.
Saffran, J. R., Aslin, R. N., & Newport, E. L. (1996). Statistical learning by 8-month-old infants. Science, 274(5294), 1926-1928.
Seidenberg, M. S., & MacDonald, M. C. (1999). A probabilistic constraints approach to language acquisition and processing. Cognitive Science, 23(4), 569-588.
Tomasello, M. (2006). Acquiring linguistic constructions. In R. Siegler & D. Kuhn (Eds.), Handbook of child psychology (pp. 1-48). Oxford University Press.
 
P.S. 15th October 2012
I have added some links to the response of 12th October. In addition, I have discovered this book, which gives an excellent account of generative vs. constructivist approaches to language acquisition:
Ambridge, B., & Lieven, E. V. M. (2011). Child Language Acquisition - Contrasting Theoretical Approaches: Cambridge University Press.