How We Hear Language: Representation and Direct Perception

  • The following is the first finished draft of the literature review for my Master’s thesis, “English Front Vowel Perception by Korean University and Elementary EFL Students: The Role of Syllable Codas and Foreign Living Experience.” The thesis changed in certain important ways after this draft, but I found the collection of information gathered here to be helpful.


And so science began – from SMBC

Knowledge and how we come to know have been preoccupations of philosophy and science for centuries. The field of epistemology is entirely dedicated to this pursuit and still examines and re-examines what it means to know and whether knowing is possible at all. A key source, and maybe the most often pursued, is the knowledge that is available from our senses.

Speech perception clearly falls into the category of sense knowledge, being a specific form of sound sense. And while sound has often been given second-class status by philosophers and scientists, who more often build theories on the foundation of vision, many have begun to examine the specifics of sound sense itself and specifically the sensing of speech sounds (e.g. Nudds & O’Callaghan, 2009). Speech occupies an important place in the science of perception, language and knowledge because it has seemed to be unique to humans. Research into whether humans uniquely perceive speech and, if so, how humans perceive speech differently has been approached from various perspectives.

More specifically still, how different humans perceive speech sounds is another area for which our theories of perception must account. I understand English when I hear it, but not Mandarin; though I do, crucially, identify Mandarin as speech in a way that I do not identify a dolphin’s call. However, do I, an English-dominant speaker, hear Mandarin the same way a Mandarin-dominant speaker does? And if not, why not? It is these types of questions that researchers in the field of second language speech perception (L2SP) have engaged.

While there are many divisions between the philosophical theories of sound and speech perception, the science of speech perception has been dedicated to the idea of realism, or that there really are objects in the world that are perceived by our senses. For science and the naturalistic worldview, realism is assumed, but how the ear and brain mediate the sounds we hear in the world is a major area of contention. In L2SP research, realism is approached from two sides: indirect realism and direct realism. The purpose of this review, then, is to examine the theories of indirect and direct speech perception and their empirical support. To orient and focus ourselves further, our specific question is: How do L2 language learners in foreign language learning (FLL) contexts perceive L2 speech sounds?

In order to situate our question, I will focus on two influential theories of L2SP: Flege’s (1995) indirect realist Speech Learning Model (SLM) and Best’s (1995) direct realist Perceptual Assimilation Model (PAM), along with its theoretical extension to L2SP, PAM-L2 (Best and Tyler, 2007). First we will compare the theoretical similarities and contrasts between them, focusing on categorization of sound and perceptual primitives. Then we will give a review of the experimental support for each. Finally, this review will examine each model in light of classroom-based L2 learning and the implications for future research.

Some Assumptions: Categories and attunement

A basic tenet of much of modern cognitive science is that the world outside of our minds is much richer than our senses can perceive. We only see light in the color spectrum, and a certain amount of light is required before our eyes even recognize colors (Hunt, 1952). Analogously, our ears hear sound as disturbances in the air (i.e. the wave theory of perception, O’Shaughnessy, 2009), but our ears cannot pick up every sound, and sounds must disturb a certain amount of air around our ears before we hear them. This basic biological observation is in part the reason representationalist theories of perception have described the sound input in speech as poor. This impoverished input must then be enriched cognitively in our minds through categorical mental representations.

A dinosaur will now explain the poverty of the stimulus

Flege’s (1995) SLM is built around the assumption that when a child is learning their first language (L1), invariant information is stored over time in long-term memory as “category representations called phonetic categories” (p. 239). Crucially, and in contrast with other representationalist accounts (e.g. Newport, 1990), SLM also argues that the ability of the brain to create and store categorical mental representations continues throughout a person’s life. Flege (2003) describes the development of perceptual categories as being language specific and well established by the time a child is five or six years old. An L2 user older than five or six will then filter L2 speech through the mental categorical representations of their L1. However, the mechanisms through which all speech is perceived are still intact for the L2 learner. Over time, then, while the L1 categories continue to filter and possibly block new sound category formation (Bosch et al., 2000), the L2 user can become perceptually aware of L2 sound categories not in their L1.

A key hypothesis of the SLM revolves around this idea of mental category formation. Difficulty in perceiving L2 speech is related to how similar those sounds are to previously established speech categories. From a SLM perspective then, L2 sounds that are very different from any L1 sound are easier to perceive and more likely to undergo mental category formation than L2 sounds for which an existing L1 sound category seems to fit, due to an “equivalence classification” mechanism (Flege, 1995, p. 239). As an example, Korean learners of English (KE) may perceive differences between English /b/ and /f/, even though /f/ does not exist in Korean, better than /b/ and /d/ because /f/ represents a new sound category involving phonetic details not used in their L1 (i.e. labiodental). /d/ however, contrasts with /b/ only in voicing, an aspect of Korean that is allophonic, not phonemic.

Finally, the assumption of categorical mental representation for the SLM has led to a further divergence from other representationalist accounts in that the SLM postulates that bilinguals (meaning people who use two languages, but not necessarily at equal levels) have a “shared phonological space” (Flege, 1995, p. 242) for their mental representations. This contrasts with modular representationalist accounts of language and speech perception, where L2 sounds are stored and represented separately from L1 sounds (e.g. Norris, McQueen & Cutler, 2000).

Direct Realism, PAM-L2 and Attunement

In contrast to the indirect assumption of mental categories, direct realist accounts like Best and Tyler’s (2007) PAM-L2 reject category formation as a property of perception. Following the ecological perspective of Gibson and Gibson (1955) and the perceptual learning model of Gibson (1963), PAM-L2 is centered on the assumption of perceptual attunement. In contrast to the indirect assumption that the input in the world is impoverished, PAM-L2 claims that the sensory information from the world is rich, reliable and importantly multimodal. The sound sense information is crucially also part of the visual sense information, if available to the perceiver.

Perceivers from this perspective come to perceive sense information through active exploration of their environment, which attunes their perceptual system (not just the sound or visual system). In this way, both PAM-L2 and SLM argue against maturational constraints for speech perception. PAM-L2 holds that the attunement of the perceptual system is ongoing throughout a person’s life. This active exploration is an important theoretical component of ecological direct realism. If people simply became perceptually attuned throughout their lives through exposure to the sensory information around them, then we would expect, for example, that L2 learners who live in the L2 community would have somewhat uniform perceptual development. However, this is not the case (Tsukada, Birdsong, Bialystok, Mack, Sung, & Flege, 2005). Active exploration means that the perceiver is attentionally focused by perceptual, linguistic or non-linguistic goals. Perceivers may at any moment attend to different levels of focus, and as such their perceptual attunement may develop differently or may stop developing if the purposes of perception have been achieved. Perception for PAM-L2, then, is not simply a passive process that reorganizes our brains, but is an active interaction between the goals of the listener and the sensory information available.

L2 perception from a direct realist perspective, then, is the relationship between the attunement of the listener (i.e. their past experience) and how the L2 sense information either matches or mismatches with that attunement. Additionally, how the listener attends to information and for what purposes will also influence their perception. In the case of our hypothetical KE example above, PAM-L2 would describe the accurate perception of /f/ as a result of its newness to the KE’s perceptual attunement. /d/, on the other hand, has been dealt with by the KE’s perceptual attunement in the past by attending to invariant information perhaps unrelated to voicing. As such, their poorer perception of /d/ compared with L1 English listeners would be a result of different attentional attunement.

For the L2 learner in an FLL context, then, both SLM and PAM-L2 would suggest that either developing new mental categories or perceptual attunement would be more difficult without some way to consistently interact with the L2. Additionally, the L2 sensory information available to such a learner will likely differ in some ways from that produced by so-called native speakers of the L2. In the case of English in the Korean context, we can expect the KE to interact with many L2 English speakers, each of whom produces English sense information differently. The perceptual attunement of such a learner is likely to reflect the sense information around them.

Perceptual primitives

To this point we have been using general terms to discuss the mental categorization and attunement processes of the SLM and PAM-L2, describing them simply as speech sound categorization and attunement. However, the two models make important distinctions about what exactly in the sense information present in the environment our perceptual systems attend to. For the SLM, the perceptual primitives are the phonetic details (i.e. formant and temporal cues) in speech. In contrast, PAM-L2 takes the articulatory gestures produced by a speaker’s vocal tract as primitive. Importantly, both models reject the phoneme as the object of perception, though the SLM would argue that the mental categories in the mind are phonemes that are activated through phonetic perception.

Get those crazy symbols outta my fundamental units.

Like the SLM, the majority of L2SP research either knowingly or unknowingly assumes acoustic phonetic cues as the basis for phonemic contrasts (e.g. bat / cat). The SLM does not provide motivations for assuming acoustic phonetic features as its perceptual primitive, but it seems likely that the SLM assumes the philosophical wave theory of sound perception (O’Shaughnessy, 2009) and a mechanical source-filter theory (Fant, 1960). The event of hearing a sound is the result of some source creating an energy disturbance in the air which moves until it interacts with a perceiver, such as the human ear. Wave theory, then, is a mechanical description of perception which leads researchers to determine the atoms of sound perception via reduction, or breaking up the sound wave into smaller components. Acoustically, speech is not distinct from other types of sound, in that it is made up of formant and temporal cues.

Using acoustic phonetic features as a perceptual primitive situates the SLM solidly within a much larger body of research and provides continuity between L2SP research and other pursuits such as general phonetics and acoustics. In this psychoacoustic view of perception, the formant and temporal cues are in themselves meaningless features of the wave or source. Over time, speakers come to associate specific cues (such as F1, the acoustic correlate of vowel height) with specific phonemes (or feature bundles). This association allows perceivers to ignore most of the information in the incoming sounds, but still reconstruct the phoneme by paying close attention to specific cues in the sounds. These phonemes with their associated cues are stored in long-term memory as traces. SLM builds off the work in acoustic phonetics (e.g. Klatt, 1979) that established theoretical and methodological tools for rendering the acoustics of speech as visual spectrograms, which display how much energy is present at each frequency (in hertz) over time. On a spectrogram, formants appear as bands of concentrated energy: the lowest band is the first formant or F1, the next is F2 and so on. For vowels, F1 generally lies below about 1000Hz, with F2 above F1. Correlations have been discovered between formants and the articulations that produced them (e.g. high vowels like /i/ or /u/ have low F1s). The higher the tongue is placed when producing a vowel, the lower the F1 will appear on the spectrogram.

Spectrogram for “Led, Red, Wed and Yell”. The numbers down the left side show frequency in Hz and the numbers along the bottom show time.
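The reduction of a sound wave into frequency components described above can be illustrated with a minimal sketch. It assumes a synthetic signal made of two pure tones (300Hz and 2300Hz, values chosen only for illustration) standing in for bands of concentrated energy, and it analyzes a single frame with a naive discrete Fourier transform. The function names are hypothetical, and real formant estimation from speech uses more robust methods than simple peak picking.

```python
import cmath
import math

def magnitude_spectrum(frame, sample_rate):
    """Return (frequency_hz, magnitude) pairs for one frame via a naive DFT."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2):
        total = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(frame))
        spectrum.append((k * sample_rate / n, abs(total)))
    return spectrum

def strongest_peaks(spectrum, count):
    """Frequencies of the `count` largest local maxima, in ascending order."""
    peaks = [
        (mag, freq)
        for (_, prev), (freq, mag), (_, nxt)
        in zip(spectrum, spectrum[1:], spectrum[2:])
        if mag > prev and mag > nxt
    ]
    return sorted(freq for _, freq in sorted(peaks, reverse=True)[:count])

# Assumed toy signal: two tones, not recorded speech.
SAMPLE_RATE = 8000
FRAME_LENGTH = 400  # gives 20Hz between neighbouring frequency bins
frame = [
    math.sin(2 * math.pi * 300 * i / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 2300 * i / SAMPLE_RATE)
    for i in range(FRAME_LENGTH)
]

print(strongest_peaks(magnitude_spectrum(frame, SAMPLE_RATE), 2))
```

A spectrogram is, in effect, many such frames laid side by side along the time axis, with magnitude shown as darkness. The peak-picking step is only a stand-in for how a researcher reads F1 and F2 off the image.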

Articulatory Gestures

PAM-L2 and direct realism reject the source-filter theory as it relates to acoustic features as the fundamental unit of perception. Instead, PAM-L2 asserts that distal articulatory gestures are the perceptual primitives. Articulatory gestures are the “phonemes” of the theory of Articulatory Phonology (AP) (Goldstein and Fowler, 2003). According to AP, articulatory gestures are the ways the vocal tract changes or stays the same in order to produce speech sounds. There are six distinct articulators: lips, tongue tip, tongue body, tongue root, velum and larynx. Importantly, these six articulators are independent of each other, even though they are all connected in some ways (e.g. tongue body and tip). They are independent in the sense that each can constrict the vocal tract without forcing another articulator to also constrict. In AP, articulators are coordinated in a dynamic system (Saltzman, 1995) in order to produce coarticulated speech sounds and not segmented, linear sequences of sounds. This can be visualized in what is called a gestural score, which shows the different articulators involved in a word and for how long they constrict the vocal tract.
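As a rough illustration of what a gestural score records, the sketch below treats each gesture as an articulator plus the interval during which it constricts the vocal tract, then prints one row per articulator. The class and function names are hypothetical, and the timings are invented for a made-up syllable rather than measured from any real word.

```python
from dataclasses import dataclass

# The six articulators named in the text.
ARTICULATORS = ("lips", "tongue tip", "tongue body",
                "tongue root", "velum", "larynx")

@dataclass(frozen=True)
class Gesture:
    articulator: str  # one of ARTICULATORS
    start: int        # onset of the constriction, in arbitrary time steps
    end: int          # release of the constriction (exclusive)

    def __post_init__(self):
        if self.articulator not in ARTICULATORS:
            raise ValueError(f"unknown articulator: {self.articulator}")
        if not 0 <= self.start < self.end:
            raise ValueError("a gesture needs 0 <= start < end")

def overlaps(a, b):
    """True when two gestures are active at the same time (coarticulation)."""
    return a.start < b.end and b.start < a.end

def render_score(gestures):
    """Draw one text row per articulator; '#' marks an active constriction."""
    width = max(g.end for g in gestures)
    rows = []
    for name in ARTICULATORS:
        cells = ["."] * width
        for g in gestures:
            if g.articulator == name:
                cells[g.start:g.end] = "#" * (g.end - g.start)
        rows.append(f"{name:<12}{''.join(cells)}")
    return "\n".join(rows)

# Invented timings for illustration only.
score = [
    Gesture("lips", 0, 3),
    Gesture("tongue body", 2, 8),
    Gesture("tongue tip", 7, 10),
    Gesture("velum", 6, 10),
]
print(render_score(score))
```

The overlapping intervals are the point of the representation: because the lips and tongue body gestures are active at the same time, the output cannot be cut into a clean linear sequence of segments.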


The use of articulatory gestures as the basic combinatorial system of speech production and perception allows researchers to overcome the need to posit both a public (phonetics) and private (phonology) system. Instead of phonemes as the fundamental combinatory units, which are unobservable and therefore necessarily constructs of the mind, AP provides a way to describe the private through the public (or to do phonology through phonetics). For PAM-L2 and the direct realist approach to L2SP, this is a necessary first step.

The next aspect, distal, refers to the source of the articulatory gestures, which is the vocal tract of the person who produced them. This contrasts with the SLM’s acoustic cues, which are proximal: they arise from the interaction of sound waves at the ear of the perceiver and the cognitive enhancement of the input in the mind of the perceiver. However, PAM-L2 does not argue against the reality of sound waves, only that they carry the distal information of the source’s vocal tract gestures.

However, most adult L1 speakers do not attend to the specific gestures of the vocal tract during perception. Ecological direct realism argues that perception is goal-oriented and that objects are perceived for their potential to achieve those goals. Young children then, when learning their L1 attend to the language around them “in terms of its linguistic affordances—communicative goals” (Best, 1995, p. 178). As a learner becomes more attuned to the speech signal, they attend to higher-order invariants in the gestural constellation (or all the multimodal features of the source).

L2 learners in an FLL context are likely to have wide experience practicing phonemic contrasts and may even have some metalinguistic knowledge of the L2 phonemic system. In this way, the principled way the SLM explains the difficulty L2 learners may have with specific phonemic contrasts in terms of their similarity or newness may be insightful and helpful. However, intense focus on phonemes as the fundamental unit of speech perception may not lead to the development of better perception. For its part, PAM-L2’s goal-oriented description of speech perception as the perception of linguistic affordances supports functionalist and communicative approaches to L2 learning. Additionally, the focus on multimodal perception and the importance of linguistic and non-linguistic context in aiding perception is helpful, but it also points to a possible difficulty for learners without easy access to more expert L2 speakers.

                          PAM-L2                         SLM
Philosophical grounding   Direct Realism                 Indirect Realism, Representationalism
Theoretical grounding     Articulatory Phonology         Source-Filter Theory
                          Perceptual attunement          Mental category representation
Fundamental unit          Distal articulatory gestures   Proximal acoustic cues

Adapted from Best (1995)

Predictions and Evidence

“New vs. similar” and assimilation types

In presenting the SLM, Flege (1995) discusses four postulates, which have been described in some fashion above, and seven hypotheses related to those postulates. Most of the hypotheses relate to how advanced L2 learners perceive individual phonemic segments. Hypotheses 1 and 7 (H1, H7) claim that categorical phonemic perception is built up from context-dependent allophonic perception of acoustic cues. The other hypotheses (H2-H6) make predictions related to how L2 perceivers will categorize the L2 sounds they hear.

As mentioned above, if the L2 sound is sufficiently different from any of the perceiver’s L1 sounds, SLM predicts that they will be able to perceive that sound well and that it is more likely to undergo category formation. These are described as “new” sounds (Flege, 1995, p. 240). L2 sounds that are similar to L1 sounds are likely instead to be associated with the pre-existing L1 phonemic category. If an L2 sound is categorized as similar to L1 sounds, then SLM predicts that categorical perception of minimally contrasting words (e.g. bit / beat) will be difficult. Additionally, this difficulty is context dependent, such that some sounds may be more perceivable in syllabic onsets, rimes or medially.

Evidence for speech perception being based on the phonetic and allophonic features of phonemes and not phonemes themselves comes from studies such as Strange (1992), where Japanese English language users (JE) more accurately perceived English liquids (/l/ and /r/) in word-final position than word-initially. Takagi (1993) examined JEs’ categorization of English nonsense words by having them transcribe auditory examples into Japanese Katakana symbols. Word-initial English liquids were uniformly transcribed by the participants, but word-final /l/ was transcribed as ‘ru’ whereas /r/ was transcribed as ‘a’. This suggests that the contrastive English liquids are not sufficiently distinct for JEs to categorically identify English /r/ in word-initial position, while in word-final position they are.

Flege, Bohn and Jang (1997) examined the ability of participants from four language groups (German, Spanish, Mandarin and Korean) to perceive and produce tense/lax English front vowels (/i/, /ɪ/, /ɛ/, /æ/). Per the SLM, German users had no difficulty perceiving /i/ and /ɪ/ since German has both phonemic categories. In contrast, the other language users had difficulties with this contrast, presumably because their mental representations treated both /i/ and /ɪ/ as instances of /i/. For /ɛ/ and /æ/, all participant groups had difficulty perceiving and producing the contrast. Korean users in particular exhibited a cross phenomenon, meaning they produced /ɛ/ words such as ‘bet’ more like ‘bat’ and /æ/ words like ‘bat’ more like ‘bet’.

An additional observation by Flege et al. (1997) is that the ability to perceive a phonemic category is related to the ability to also produce that phoneme. Flege, MacKay and Meador (1999) found a similar result with Italian speakers, and Tsukada et al. (2005) found that the ability of KEs in Canada to produce English vowels was related to their age of arrival in Canada and their ability to also perceive those same English vowels. Young KEs who had lived in Canada for about three years were found to be indistinguishable from native Canadians in both perception and production of English vowels. This suggests that mental category formation is a function of both experience and perception and that the ability to regularly produce those same categories depends on their formation through perception first.

Turning now to PAM-L2, much of what has been described by SLM is also described by PAM-L2. That is, in experimental research, participants are asked to categorize and discriminate L2 sounds, and their ability to do so places them in specific categories related to their perceptual attunement. PAM-L2 makes specific hypotheses about how L2 users will categorize specific L2 sounds based on their relationship to the attunement of the L2 user’s L1. L2 sounds will fall into one of three categories: assimilated to a native category, assimilated as an uncategorizable speech sound, or not assimilated to speech (nonspeech sound). Within these three categories, there are at least six different types of categorization possible. A two-category assimilation (TC) refers to two L2 sounds that minimally contrast (such as English /i/ and /ɪ/) and that are assimilated to two separate native categories (as for the German users in Flege et al., 1997). For TC, perception is expected to be native-like. Category-goodness difference (CG) perception refers to L2 sounds that are assimilated to one native category, but are not equally perceived as good examples of that category. An example of CG perception may be the Korean users’ perception of English /i/ and /ɪ/ in Flege et al. (1997), where the /ɪ/ of bit was more likely to be correctly perceived and produced as /ɪ/ than the /i/ of beat.

Single-category assimilation (SC) refers to two L2 sounds (such as /i/ and /ɪ/) being assimilated to a single native category. That is, participants judge both sounds as equally good or poor examples of a single category in their L1. Perception of these L2 sounds is expected to be the poorest. Best, Faber and Levitt (1996), for example, found that American participants judged Norwegian high front vowels, which contrast minimally in rounding, to be equally good examples of American English high front vowels and discriminated them poorly in a discrimination task.

Two types of categorization in PAM-L2 relate to L2 sounds that are uncategorizable, meaning they do not seem to L2 learners to be good examples of any of their L1 sounds; these are what SLM would describe as new sounds. If two contrasting L2 sounds are both found to be uncategorizable (UU), discrimination of those sounds is expected to range from poor to moderate as a function of their “proximity to each other and to native categories” (Best, 1995, p. 195). UU contrasts with uncategorized versus categorized (UC) types, in which only one of the sounds is perceived as outside the person’s L1-attuned perceptual system. In UC cases, perception is predicted to be good. Finally, nonassimilable (NA) perception occurs when both sounds are perceived as non-speech sounds, and perception is expected to be very good.
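The six assimilation types and their predicted outcomes can be summarized as a small decision procedure. The sketch below is only one reading of the descriptions above: the record fields, the 1 to 7 goodness rating and the threshold for a category-goodness difference are hypothetical choices made for illustration, and the CG prediction is stated only relative to SC because that is all the text here commits to.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Assimilation:
    """How a listener assimilated one L2 sound (hypothetical record)."""
    is_speech: bool                 # False: heard as a non-speech sound
    l1_category: Optional[str]      # None: speech, but uncategorizable
    goodness: Optional[int] = None  # assumed 1-7 goodness-of-fit rating

# Predicted discrimination for each type, as summarized in the text.
PREDICTED = {
    "TC": "native-like",
    "CG": "better than SC",
    "SC": "poorest",
    "UU": "poor to moderate",
    "UC": "good",
    "NA": "very good",
}

def assimilation_type(a, b, goodness_gap=2):
    """Classify a pair of contrasting L2 sounds into one of the six types."""
    if a.is_speech != b.is_speech:
        raise ValueError("mixed speech/non-speech pairs are not covered here")
    if not a.is_speech:
        return "NA"
    if a.l1_category is None and b.l1_category is None:
        return "UU"
    if a.l1_category is None or b.l1_category is None:
        return "UC"
    if a.l1_category != b.l1_category:
        return "TC"
    # Same L1 category: CG if one sound is a clearly better fit, else SC.
    gap = abs((a.goodness or 0) - (b.goodness or 0))
    return "CG" if gap >= goodness_gap else "SC"

# Hypothetical listener who hears both English /i/ and /ɪ/ as L1 /i/.
pair = (Assimilation(True, "i", 6), Assimilation(True, "i", 6))
kind = assimilation_type(*pair)
print(kind, "->", PREDICTED[kind])
```

In actual studies the type is inferred from categorization and goodness-rating tasks across many listeners, so the single threshold used here is a simplification.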

Some researchers reject NA as a reasonable hypothesis in speech discrimination tasks because, invariably, researchers are interested in presenting minimally contrasting speech sounds of some language to L2 learners of that language (Kingston, 2003). However, this category was proposed as the result of English-speaking participants’ categorization and perception of Zulu click consonants (Best, McRoberts & Sithole, 1988).

Experiments in the foreign language learning context

Initially, neither SLM nor PAM-L2 was designed to explain the categorization or attunement patterns of L2 language learners in FLL contexts. Both were developed for and tested on either naïve L2 perceivers or L2 users who lived in the L2 context (such as immigrants). Researchers more interested in classroom-oriented language learning, however, have attempted to use both models to explain the outcomes of their students.

Romanelli and Menegotto (2015) examined the perceptual abilities of American Spanish language learners after a three-week course focused on teaching Spanish phonemic contrasts. They found that after the course, participants could both perceive and produce the acoustic features of the Spanish /a, e, o/ vowels similarly to native Spanish speakers. However, stress perception (such as toma vs tomá) was found to be poor, even after the course. SLM does not make predictions based on suprasegmental features, but does suggest that they can have an effect as they relate to the allophonic variation of a language’s phonemic categories. PAM-L2, in contrast, would describe stress as a higher-order invariant of a user’s perceptual attunement, which would lead American L2 Spanish learners to categorize stressed vowels as SC-type assimilations.

Linebaugh and Roche (2015) also challenge the assumption that perception precedes production. Two groups of Arabic L2 English students were given two types of instruction. The first group was given articulatory instruction along with perceptual exposure as a guide to producing English vowels and some consonants. The second group was given only exposure through listening discrimination learning tasks. They found that the group with articulatory instruction had bigger gains over a pre-task test in perception of English speech sounds than did the group that only had exposure. The authors take this result as a rejection of the hypothesis of SLM and, by extension, some aspects of PAM-L2, that perception necessarily precedes production. However, it is not clear that Linebaugh and Roche (2015) have demonstrated that perception does not necessarily precede production, because both groups still received input. At best, we might be able to say that focusing learner attention with metalinguistic strategies helps accurate perception.

Chan (2013) examined the perceptual assimilation strategies of forty Cantonese-speaking students in Hong Kong. As perceptual goals, participants were asked to discriminate four long and short (or, as commonly understood, tense and lax) English vowel pairs. Most vowel pairs were found to follow SC-type assimilation patterns, where both English vowels appeared to the participants to be examples of a single Cantonese vowel, except in the case of /ɛ/ and /æ/, which were perceived to be instances of multiple Cantonese vowels. Chan (2013) describes this as a CG assimilation. PAM-L2 predicts that SC assimilations should be perceived more poorly than CG assimilations; therefore, Cantonese speakers should perceive /ɛ/ and /æ/ better than the other tense/lax vowels in this study. However, Chan (2013) did not find this result.

Experiments with FLL L2 learners, then, have challenged both the SLM and PAM-L2 in their ability to account for the experience of classroom-based language learning outside the L2 community. Classroom language learners may be able to produce L2 sounds before they can consistently perceive them, and their perception may not follow what has been identified as typical of L2 learners in L2 communities.


This theoretical review attempted to understand and describe the primary differences and similarities between two major models of L2SP, Flege’s (1995) SLM and Best and Tyler’s (2007) PAM-L2. We found that beyond the superficial similarities related to experimental predictions and methodologies, each model is rooted in very different philosophical and scientific assumptions about how people interact with sound objects in the world. However, the extension of both models to FLL contexts has been experimentally indeterminate until this point, and it appears that both models have difficulty explaining the perceptual abilities of L2 learners who primarily learn the language via classroom instruction.

For their part, Best and Tyler (2007) acknowledge the inherent difficulties in capturing both FLL and L2-immersion types of L2 perception and learning. However, if our models of L2SP cannot account for the real experiences of FLL learners and their eventual linguistic ability, then new models, or updated versions of current models, may be needed.

Some Future Questions

While current research has demonstrated some weaknesses of our current L2SP models, the extension of these models to FLL is a very recent experimental pursuit. Further research could focus on additional learner types besides university students (such as children or primary school-aged learners) and on methodological variations, such as varying the speakers in discrimination tasks or adding multimodal affordances like visual presentation of the speakers.

Finally, beyond classroom-based learners, returning L2-immersion learners such as students who participate in language-exchange programs in high school or who study in L2 university settings and then return to their L1 community represent an unexplored area of research. It is likely that participants of this type have actually been in some of the research to this point but their effect has been unexamined in either direct or indirect theories.


Acoustic Analysis of Consonants. (2016). Retrieved 5 May 2016, from chapter_3_consonants_new.htm

Best, C. T. (1995). A direct-realist view of cross-language speech perception. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (171-204). Timonium, MD: York Press.

Best, C. T., Faber, A., & Levitt, A. (1996). Assimilation of non‐native vowel contrasts to the American English vowel system. The Journal of the Acoustical Society of America, 99(4), 2602-2603.

Best, C. T., McRoberts, G. W., & Sithole, N. N. (1988). The phonological basis of perceptual loss for non-native contrasts: Maintenance of discrimination among Zulu clicks by English speaking adults and infants. Journal of Experimental Psychology: Human Perception and Performance, 14, 345-360.

Best, C. T., & Tyler, M. D. (2007). Nonnative and second-language speech perception: Commonalities and complementarities. In M.J. Munro & O.-S. Bohn (eds) Second language speech learning: The role of language experience in speech perception and production. Amsterdam: John Benjamins.

Bosch, L., Costa, A., & Sebastián-Gallés, N. (2000). First and second language vowel perception in early bilinguals. European Journal of Cognitive Psychology, 12(2), 189-221.

Chan, A. Y. (2013). The discrimination of English vowels by Cantonese ESL learners in Hong Kong: A test of the perceptual assimilation model. Open Journal of Modern Linguistics, 3(03), 182.

Klatt, D. H. (1979). Speech perception: A model of acoustic-phonetic analysis and lexical access. Journal of Phonetics, 7(3), 279-312.

Fant, G. (1960). Acoustic theory of speech production. The Hague: Mouton.

Flege, J. (2003). Assessing constraints on second-language segmental production and perception. In N. O Schiller & A. Meyer (Eds.) Phonetics and phonology in language comprehension and production: Differences and similarities (319-355). Berlin: De Gruyter.

—. (2008). Give input a chance! In T. Piske & M. Young-Scholten (Eds.) Input Matters in SLA. Bristol: Multilingual Matters, 175-190.

—. (1995). Second language speech learning: Theory, findings, and problems. In W. Strange (Ed.), Speech perception and linguistic experience: Issues in cross-language research (233-276). Timonium, MD: York Press.

Flege, J. E., Bohn, O. S., & Jang, S. (1997). Effects of experience on non-native speakers’ production and perception of English vowels. Journal of Phonetics, 25(4), 437-470.

Flege, J., MacKay, I., & Meador, D. (1999). Native Italian speakers’ production and perception of English vowels. Journal of the Acoustical Society of America, 106, 2973-2987.

Gibson, E. J. (1963). Perceptual learning. Annual review of psychology, 14(1), 29-56.

Goldstein, L. & Fowler, C. (2003). Articulatory Phonology: A phonology for public language use. In N. O Schiller & A. Meyer (Eds.) Phonetics and phonology in language comprehension and production: Differences and similarities. Berlin: De Gruyter.

Hubble, Edwin (May 1929). The exploration of space. Harper’s Magazine 158: 732.

Huckvale, M. (2016). Variation With Context. Retrieved 5 May 2016, from

Hunt, R. W. G. (1952). Light and Dark Adaptation and the Perception of Color. Journal of the Optical Society of America, 42(3), 190-199.

Kingston, J. (2003). Learning foreign vowels. Language and Speech, 46(2-3), 295-348.

Linebaugh, G., & Roche T. (2015). Evidence that L2 production training can enhance perception. Journal of Academic Language & Learning, 9(1), 1-17.

Newport, E. L. (1990). Maturational constraints on language learning. Cognitive Science, 14(1), 11-28.

Norris, D., McQueen, J. M., & Cutler, A. (2000). Merging information in speech recognition: Feedback is never necessary. Behavioral and Brain Sciences, 23(03), 299-325.

Nudds, M., & O’Callaghan, C. (Eds.). (2009). Sounds and perception: New philosophical essays. Oxford: Oxford University Press.

O’Shaughnessy, B. (2009). The location of a perceived sound. In M. Nudds & C. O’Callaghan (Eds.) Sounds and perception: New philosophical essays. Oxford: Oxford University Press.

Romanelli, S., & Menegotto, A. C. (2015). English speakers learning Spanish: Perception issues regarding vowels and stress. Journal of Language Teaching and Research, 6(1), 30-42.

Saltzman, E. (1995). Dynamics and coordinate systems in skilled sensorimotor activity. In R. Port & T. van Gelder (Eds.), Mind as motion: Explorations in the dynamics of cognition, 150-173. Cambridge, MA: MIT Press.

Strange, W. (1992). Learning non-native phoneme contrasts: Interactions among subject, stimulus, and task variables. In E. Tohkura, E. Vatikiotis-Bateson & Y. Sagisaka (Eds.) Speech Perception, Production, and Linguistic Structure. Tokyo: Ohmsha.

Takagi, N. (1993). Perception of American English /r/ and /l/ by adult Japanese learners of English: A unified view. Unpublished Ph.D dissertation, University of California-Irvine.

Tsukada, K., Birdsong, D., Bialystok, E., Mack, M., Sung, H., & Flege, J. (2005). A developmental study of English vowel production and perception by native Korean adults and children. Journal of Phonetics, 33(3), 263-290.

