Speaking

Language is the light of the mind. -- John Stuart Mill

In Paris they simply stared when I spoke to them in French; I never did succeed in making those idiots understand their own language. -- Mark Twain, The Innocents Abroad

Introduction

speech is the production of words in order to communicate
words consist of phonemes

the sounds associated with the letters of the alphabet - vowels & consonants
50 phonemes, but not every one is used in every language
e.g., 'l' and 'r' are phonemes used in English ('lip' & 'rip'), but in Mandarin Chinese no words are distinguished by 'l' & 'r'. 'l' occurs only at the beginning of a word & 'r' only at the end

'l' & 'r' are phones - sounds that may or may not convey meaning
'l' & 'n' likewise do not distinguish meaning

e.g., a Chinese speaker speaking English may say at night, "Turn out the night"

e.g., English 'pit' & 'spit'. There is an air burst for 'pit' but not for 'spit'. In English, 'p' is aspirated in word-initial position but not in other positions, but the difference in how the 'p' is produced does not affect meaning. In Hindi, aspirated & nonaspirated 'p' distinguish different meanings

but consider " uh'-uh " & " uh-huh' " or " hmmm "

the elements comprising phones are phonetic features
allophones are distinct phones that are not distinct phonemes

words vary in intonation and stress

In English, intonation & stress convey meanings about phrases or sentences but not words
In tone languages, e.g., Mandarin Chinese, intonation affects the meaning of words as well as of phrases

e.g., " maa " [even intonation] - mother

" see " [even] - thing

" se-e' " [rising] - shit

timing of speech

when we speak more quickly,

durations of vowels are reduced more than durations of consonants
omit all but most critical elements needed to ensure communication

'I do not know' vs. 'I dunno'

flatten intonation

maximum rate is 200 words/minute

@ 7 phonemes per English word, that is 1400 phonemes per minute
but it is estimated that it takes 0.1 sec to produce a single phoneme, so we should be able to produce no more than 10 phonemes/sec, i.e., 600 phonemes/minute
So how do we talk so fast?
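The arithmetic behind this puzzle can be checked in a few lines. This is only a sketch using the figures quoted in these notes (200 words/minute, 7 phonemes per word, 0.1 sec per phoneme); the variable names are made up for illustration.

```python
# Figures taken from the notes above; names are illustrative only.
WORDS_PER_MINUTE = 200       # maximum speaking rate
PHONEMES_PER_WORD = 7        # rough average for English words
SECONDS_PER_PHONEME = 0.1    # estimated time to produce a single phoneme

# Phonemes per minute actually required at the maximum speaking rate.
required_per_minute = WORDS_PER_MINUTE * PHONEMES_PER_WORD

# Phonemes per minute possible if phonemes were produced strictly one after another.
serial_limit_per_minute = round(60 / SECONDS_PER_PHONEME)

# How many times faster than the serial limit we actually speak.
speedup_needed = required_per_minute / serial_limit_per_minute

print(required_per_minute, serial_limit_per_minute, round(speedup_needed, 2))
```

Strictly serial production falls short by a factor of more than two, which is the gap that overlapping production has to close.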

phonemes are not produced in serial fashion. coarticulation improves speed of production

e.g., anticipatory lip rounding of 'construe' or 'tulip'
e.g., prenasalization, opening the velum in anticipation of nasal consonant. e.g., /n/ in freon

prenasalization happens in languages, like English, in which nasalization of vowels carries no meaning. In contrast, prenasalization does not occur in French, where nasalized vowels distinguish words.

requirements of a theory of speech motor control

how all aspects of production are managed
why certain sounds and combinations are used and others are not

click sound '!' used by the !Kung people in Africa
but 'snoring' sounds not used in any language
Probably based on biomechanical efficiency or deep grammatical rules

But speech can be produced even when vocal apparatus in different physical configuration

talking with something held between teeth, e.g., pipe or pencil

interaction of speech and hearing

how others' speech is discriminated given that phonetic features are obscure
how our own speech is regulated as we hear ourselves

experiment - delay auditory feedback

affects frequency & amplitude of vibrato in singers if delayed as little as 100 ms
disrupts speaking if delayed as little as 100 ms, can result in stuttering

learning to speak correctly requires hearing speech

deafness after learning to speak results in disrupted speech after a period

The vocal tract and articulatory dynamics

Resonance of air column in vocal tract produces compressions & rarefactions that are sound. Varying the shape of the vocal tract produces different sounds
3 subsystems: respiratory, laryngeal & articulatory

The respiratory system

lungs
diaphragm
intercostal muscles

external - inhalation
internal - exhalation

breathing controlled differently during speaking or singing

skilled singers can control amount & rate in inhalation & exhalation
precise regulation of intercostal muscles in relation to lung expansion & contraction

sustained activation components
transient activation components

not related to syllables as once thought

air pushed through larynx...

Laryngeal mechanisms

larynx - 4 main functions

regulates characteristic pitch of voice

da Vinci discovered in cadavers that the mass of the vocal cords affects vocal pitch; larger cords vibrate at lower frequency. This is why the voice deepens with enlargement of the vocal cords during male puberty
rapid pitch changes effected by changing the stiffness of the vocal cords.

stiffer cords vibrate @ higher frequency

modulates aspiration (e.g., 'p')
allows for whispering
creates voicing, subtle buzzing sound

the difference between 'f' & 'v'
vocal cords are 2 folds lying across the roof of larynx
narrow distance between vocal cords produces voicing, e.g., 'v', 'b'. wide distance eliminates voicing, e.g., 'f', 'p'.

shape of larynx accounts in part for better singing

ability to lower the larynx creates an enlarged cavity that produces an extra formant (additional concentration of energy in the auditory frequency range) so that the voice is heard better
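The pitch claims earlier in this section (heavier cords vibrate at lower frequency, stiffer cords at higher frequency) follow the same pattern as an ideal vibrating string. The sketch below assumes that idealization, which is not part of these notes; real vocal folds are far more complex, and the numbers are illustrative rather than physiological.

```python
import math

def string_frequency(length_m, tension_n, mass_per_length_kg_m):
    """Fundamental frequency (Hz) of an ideal string: f = sqrt(T / mu) / (2 L).

    An idealized stand-in for a vocal cord, used only to show the
    direction of the effects of mass and stiffness (tension).
    """
    return math.sqrt(tension_n / mass_per_length_kg_m) / (2 * length_m)

# Illustrative (not measured) values.
baseline = string_frequency(0.015, 1.0, 0.005)
heavier = string_frequency(0.015, 1.0, 0.010)   # double the mass per unit length
stiffer = string_frequency(0.015, 4.0, 0.005)   # four times the tension

print(heavier < baseline < stiffer)
```

In this idealization, doubling the mass lowers the frequency by a factor of the square root of 2, and quadrupling the tension doubles it.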

Articulatory mechanisms

rich variation in speech sounds accomplished by structures above the larynx: pharynx, mouth, jaws, lips, nasal tract and velum

The pharynx

pharynx shape produces different vowels
vowels produced because mature humans have long necks, low larynxes and large, mobile throats
human infants, pre-human hominids and apes have short necks, high larynx and small immobile throats

cannot produce vowels
but also cannot choke!

Vowels

all vowels voiced
variations in tongue position produce different vowels

lax vowels produced with root of tongue retracted

        front   back
high    "sin"   "book"
mid     "get"   "luck"
low     "ash"   "caught"

tense vowels have the root of tongue advanced, e.g., 'teen' or 'boot'

Consonants

tongue placement important
also positions and movements of lips, jaw & velum
categorize according to manner and place of articulation

manner of articulation - way the air stream is constricted by articulators

stops - pug [partial listing]
constrictives - forth
nasals - mine
liquids - reel
glides - wet
affricates - crutch

place of articulation - location where constriction occurs

bilabials - m, b [partial listing]
labiodentals - f, v
dentals - then
alveolars - d, s
palatals - rich
velars - kid
glottals - button

'standard theory' of phonology (Chomsky & Halle, 1968)

each phoneme is coded according to its distinct features
implies that distinct speech sounds produced by articulators achieving specific positions
the problem, though, is that articulators exhibit wide variation in position and movement while still producing intelligible speech

Variability and the motor theory of speech perception

one line of evidence for variability of speech comes from analysis of spectrograms, which are plots of sound frequency versus time.

but spectrograms of the same phoneme can vary, e.g., b in 'big' and b in 'big'
so how can we perceive the same phoneme from different heard stimuli?
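To make the idea of a spectrogram concrete, here is a minimal sketch that computes one for a synthetic signal using only the Python standard library. The sample rate, frame size and test tones are arbitrary choices for illustration; real speech analysis would use overlapping, windowed frames and a fast Fourier transform.

```python
import cmath
import math

SAMPLE_RATE = 8000   # samples per second (illustrative choice)
FRAME_SIZE = 64      # samples per frame, so frequency bins are 125 Hz apart

def spectrogram(samples, frame_size=FRAME_SIZE):
    """Cut the signal into frames and return DFT magnitudes for each frame."""
    frames = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        magnitudes = []
        for k in range(frame_size // 2 + 1):
            total = sum(x * cmath.exp(-2j * math.pi * k * n / frame_size)
                        for n, x in enumerate(frame))
            magnitudes.append(abs(total))
        frames.append(magnitudes)
    return frames

def peak_frequency(magnitudes):
    """Frequency (Hz) of the strongest bin in one frame."""
    strongest = max(range(len(magnitudes)), key=magnitudes.__getitem__)
    return strongest * SAMPLE_RATE / FRAME_SIZE

def tone(freq_hz, n_samples):
    """Pure sine tone, standing in for a speech sound."""
    return [math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE)
            for n in range(n_samples)]

# A 500 Hz tone followed by a 1000 Hz tone: frequency changes over time.
signal = tone(500, 256) + tone(1000, 256)
frames = spectrogram(signal)
peaks = [peak_frequency(m) for m in frames]
print(peaks)
```

Each frame is one time slice and each magnitude is the energy at one frequency, so the list of peaks traces how the dominant frequency moves over time.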

motor hypothesis of speech perception

speech sounds heard as such because we invoke the commands needed to produce them. acoustic invariance arises from articulatory invariance
overt muscle activation is not needed
studies investigated articulatory invariance, measuring EMG of speech musculature

invariance not found

so how does comprehensible speech arise out of such a variable system?

target hypothesis

MacNeilage, 1970
feedback from the articulators is used by a control system to bring the articulators to specific target positions. targets are defined as spatial relations of articulators

because it is based on feedback control, starting position is not relevant,
and the system can operate despite physical perturbation of the articulators
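The two properties just listed can be illustrated with a toy feedback loop. This is a sketch of generic proportional feedback control, not MacNeilage's actual model; the function name, gain, and units are hypothetical.

```python
def move_to_target(start, target, gain=0.5, steps=40, push_at=None, push=0.0):
    """Drive a one-dimensional articulator position toward a target.

    On every step the controller senses the remaining error and corrects
    a fixed fraction of it. An optional one-off push models a physical
    perturbation of the articulator.
    """
    position = start
    for step in range(steps):
        if step == push_at:
            position += push                      # external disturbance
        position += gain * (target - position)    # feedback correction
    return position

from_closed = move_to_target(start=0.0, target=1.0)
from_open = move_to_target(start=5.0, target=1.0)
perturbed = move_to_target(start=0.0, target=1.0, push_at=10, push=0.3)

print(round(from_closed, 6), round(from_open, 6), round(perturbed, 6))
```

All three runs end at the target, because the correction depends only on the sensed error and not on where the movement began or what disturbed it.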

evidence

a patient with normal hearing and motor control but no proprioception could not produce clear speech
experimental disruption of oral feedback impairs speech quality

[talking after visiting the dentist!]

compensation for disturbances of articulators

talking with clenched teeth requires altered tongue and lip movements
if particular articulator positions and motions were required for speech, then this should not be possible

stability of articulatory targets in spite of variability in articulatory starting positions

with repetitions of an utterance, there was less variation in final jaw positions than in initial jaw positions
velocity of lower lip approaching upper lip in producing /p/, /b/ or /m/ increased with initial lip opening following preceding vowel

counter-evidence

absolute muscle lengths needed to produce speech with clenched teeth are actually different from muscle lengths that are normally required. because the system modulates muscle length, it cannot alone produce effective compensation
experiment - with block in mouth, Ss produced consonants that required lip closure. Target hypothesis predicts increased muscle tension to counteract enforced separation. Observed decreased muscle tension!
quite different articulatory configurations can be used to produce vowels with similar acoustic characteristics. Target hypothesis does not predict such wide variation.

relative positions and acoustic targets

the realization that the purpose of speech is to generate sounds led to the idea that speech is designed to achieve acoustic rather than spatial targets
the central insight is that proper acoustic results are realized when the articulators achieve proper relative positions, and the absolute positions are less important

Experiment - measure upper & lower lip position when speakers say [apa]
control of movement to relative target positions happens rapidly and can compensate for perturbations
this finding reflects the motor system's tendency to achieve motor equivalence - the capacity to achieve the same result with different motor components

similar findings for finger and arm movements
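Motor equivalence for lip closure can be sketched as control of a relative quantity, the gap between the lips, rather than of either lip's absolute position. The model below is a hypothetical illustration, not the mechanism from the experiment; the equal sharing of the correction and the gain are assumptions.

```python
def close_lips(upper, lower, lower_blocked=False, gain=0.5, steps=40):
    """Close the gap between the lips (upper - lower) by feedback on the gap.

    Normally both lips share the correction equally. If the lower lip is
    blocked (perturbed), the upper lip makes the whole correction.
    """
    for _ in range(steps):
        aperture = upper - lower
        if lower_blocked:
            upper -= gain * aperture
        else:
            upper -= gain * aperture / 2
            lower += gain * aperture / 2
    return upper, lower

normal_upper, normal_lower = close_lips(upper=1.0, lower=-1.0)
comp_upper, comp_lower = close_lips(upper=1.0, lower=-1.0, lower_blocked=True)

print(round(normal_upper, 6), round(normal_lower, 6))
print(round(comp_upper, 6), round(comp_lower, 6))
```

Both runs reach closure, but with the lips meeting at different absolute positions, which is the sense in which relative positions matter more than absolute ones.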

A mechanism for relative positioning

A parallel distributed processing system for coarticulation

High-level control of speech

what control mechanisms regulate serial ordering of words, phrases & sentences?

Word games

Pig Latin

this ability reveals that speakers are sensitive to phoneme clusters, e.g., 'scram' becomes 'amscray'
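The Pig Latin rule treats the initial consonant cluster as a single unit. A minimal sketch, assuming letters stand in for phonemes and that vowel-initial words simply get 'ay' appended (conventions for that case vary):

```python
VOWELS = set("aeiou")

def pig_latin(word):
    """Move the whole initial consonant cluster to the end and add 'ay'."""
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[i:] + word[:i] + "ay"
    return word + "ay"   # no vowel found: leave the word intact

print(pig_latin("scram"))   # amscray
```

Note that the cluster 'scr' moves as a whole; a rule that moved only the first letter would give 'cramsay' instead.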

Backward talking

not like playing the record backwards; instead reverse the sounds within individual syllables

'I can talk backward'

'I nac kawt cabdraw'

produce syllables in forward direction

'Cabdraw kawt nac I'

produce syllables in backward direction

reversed phonemes rarely cross syllable boundaries

backward becomes draw-kcab and not drawk-ab
indicates psychological reality of syllables

syllabic stress remains in same temporal order even when syllables reversed

'con'-trast' becomes tsart'-noc and not tsart-noc'
Thus, stress patterns are autonomous
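The two regularities above (reversal respects syllable boundaries; stress keeps its temporal position) can be written as a small procedure. This is a sketch with letters standing in for phonemes and a hand-supplied syllable split; the function name is hypothetical.

```python
def talk_backward(syllables, stress):
    """Reverse a word syllable by syllable, keeping the stress pattern in place.

    syllables: list of strings, one per syllable, in spoken order
    stress:    list of booleans, True where the syllable position is stressed
    Returns a list of (syllable, stressed) pairs in the new spoken order.
    """
    # Reverse the order of syllables and the sounds inside each one.
    # Sounds never cross a syllable boundary.
    reversed_syllables = [s[::-1] for s in reversed(syllables)]
    # The stress pattern is NOT reversed: it stays in the same temporal order.
    return list(zip(reversed_syllables, stress))

print(talk_backward(["back", "ward"], [True, False]))
print(talk_backward(["con", "trast"], [True, False]))
```

The first call yields draw-kcab and the second tsart-noc with stress still on the first syllable, matching the examples in the notes.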

Laboratory studies of speaking speed

Investigate hierarchical programming of syllable sequences (Gordon & Meyer, 1987)

Task

Ss learned to associate different 4 syllable utterances with different signals
3 different sequences

primary sequence: bee bay bah boo
hierarchically congruent: bah boo bee bay
hierarchically incongruent: bah bay boo bee

on each trial 1 signal presented as warning and second signal presented as trigger signal. On most trials the 2nd signal matched the 1st
measure choice RT
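One way to read 'hierarchically congruent' is that the sequence keeps the same syllable pairs as the primary sequence, even if the pairs change order. The sketch below encodes that reading, which is an assumption about the study's definition rather than a quotation of it.

```python
def syllable_pairs(sequence):
    """Group a 4-syllable sequence into its two adjacent pairs."""
    return {tuple(sequence[i:i + 2]) for i in range(0, len(sequence), 2)}

def is_hierarchically_congruent(primary, other):
    """True if both sequences are built from the same syllable pairs."""
    return syllable_pairs(primary) == syllable_pairs(other)

primary = ["bee", "bay", "bah", "boo"]
congruent = ["bah", "boo", "bee", "bay"]
incongruent = ["bah", "bay", "boo", "bee"]

print(is_hierarchically_congruent(primary, congruent))     # True
print(is_hierarchically_congruent(primary, incongruent))   # False
```

Under this reading, switching to a congruent sequence only requires reordering two intact pairs, while an incongruent sequence requires rebuilding the pairs themselves.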

Results

RT shortest when 2nd signal matched 1st

Ss were preparing to utter specified sequence

when 2nd signal didn't match 1st, RT shorter when utterance had the same hierarchical syllable organization

Interpretation

the finding that hierarchically congruent sequences are produced faster than hierarchically incongruent sequences indicates that syllables and syllable pairs are functional units
makes it easier to modify words based on linguistic context, e.g., adding or deleting prefix or suffix
makes programming easier

Investigate programming of syllable sequences (Rosenbaum et al., 1987)

Task

Ss uttered 1 of 2 sequences depending on identity of auditory signal
one condition: single syllable, e.g., 'gee' & 'goo'
another condition: 2 syllables, e.g., 'geebee' & 'gooboo'
another condition: 3 syllables, e.g., 'geebeedee' & 'gooboodoo'

['doo-bee-doo-bee-doo'...]

the 2 auditory signals were each compatible with the vowel in one of the sequences: a high-pitched tone with 'ee' & a low-pitched tone with 'oo'

Results

RT increased with number of syllables in sequence
RT shorter when auditory signal was compatible with its distinguishing vowel

Interpretation

vowel characterizing the entire sequence could be specified in a single processing stage. consistent with hierarchical programming of sequence with vowel assignment made for all syllables once at the top of the hierarchy

Speech errors

"unintended, nonhabitual deviation from a speech plan"
errors reveal something about how speech programmed; they are derived from mistakes in otherwise normal workings of speech planning and production
early analysis by Sigmund Freud

Freudian slip
S supposed to say "bine foddy" when tested by a provocatively attired experimenter instead says "fine body"
subconscious urges not only source of errors, though

verbal slips contain fewer nonwords than words, so some sort of editing may take place

Baars et al. (1975) experiment

Ss read word list silently, then spoke word pair prior to 'respond' signal
when 'darn bore' became 'barn door', the phonological bias created by earlier words influenced production
however, Ss were less likely to turn 'dart board' into 'bart doard' because the phonological bias could not override the bias against nonwords
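The editing idea can be sketched as an onset exchange followed by a lexical check. Everything here is a toy: the mini-lexicon, the rough phonemic spellings (onset plus rime), and the function names are hypothetical and are not taken from Baars et al.

```python
# Hypothetical mini-lexicon keyed by (onset, rime) in a rough phonemic spelling,
# so that 'bore' and 'door' share the rime 'or'.
LEXICON = {
    ("d", "arn"): "darn", ("b", "or"): "bore",
    ("b", "arn"): "barn", ("d", "or"): "door",
    ("d", "art"): "dart", ("b", "ord"): "board",
}

def exchange_onsets(first, second):
    """Swap the initial consonants of two words (a phoneme exchange)."""
    return (second[0], first[1]), (first[0], second[1])

def slip_passes_editor(first, second):
    """True if the exchanged pair consists only of real words."""
    return all(word in LEXICON for word in exchange_onsets(first, second))

print(slip_passes_editor(("d", "arn"), ("b", "or")))    # 'barn door': real words
print(slip_passes_editor(("d", "art"), ("b", "ord")))   # 'bart doard': nonwords
```

A slip that yields real words passes the check and a slip that yields nonwords is blocked, mirroring the lexical bias described above.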

types of speech errors

different kinds of linguistic units are involved

words
morphemes - the meaning-bearing parts of words, such as 's' for plural
phonemes
consonant clusters
vowel-consonant pairs

different kinds of disruptions

misorderings

anticipations
perseverations
shifts
deletions

noncontextual errors

exact source of error hard to identify, e.g., blends

exchanges of same linguistic units

word exchanges
phoneme exchanges

models of speech production tested by accounting for regularities of speech errors

basic assumption that there are levels of linguistic representation leading to sentences

semantic level

linguistic meaning

syntactic

word order and grammatical relations between words

morphological

details of word formation, e.g., presence of prefix or suffix

phonological

phoneme representation

phonetic

phone representation

motor

physical articulation to produce sounds

some models have interactions flowing only from higher to lower levels. other models have interactions running in both directions.
some models rule-based; others operate according to activation of units in a network

Brain mechanisms underlying speech

Neuronal basis of language historically tied to the pendulum of the localization of function debate

Pro-localization, early 1800s - Gall, Bouillaud
Anti-localization, mid 1800s - Flourens
Pro-localization, late 1800s - Broca, Wernicke
Anti-localization, 1920-1960 - Hughlings-Jackson, Freud, Head
Pro-localization, 1960 to present - Geschwind, Sperry, Damasio

Brain regions specifically related to language

Cerebral cortex

Broca's area
Wernicke's area

Evidence for localization

Effects of electrical stimulation of left-hemisphere

Positive - Vowel burst, "Oooohh..."
Negative - Interruption, prevention or disruption of speech

Local cerebral blood flow (PET scans)
Ablation

Aphasia syndromes

Fluent

Wernicke's (sensory)

Inability to comprehend words or to arrange sounds into coherent speech.
Damage to Wernicke's area

Transcortical sensory (isolation syndrome)

Patients either can repeat and understand words and name objects but cannot speak spontaneously, or are also unable to comprehend words even though they can repeat them. Cannot read or write.
Damage to parieto-temporo-occipital junction

Conduction

Speech, naming objects, comprehension intact, but cannot repeat words.
Damage to arcuate fasciculus

Anomic ("amnesic aphasia")

Speech comprehension, production and repetition not impaired, but difficulty naming objects.
Damage to temporo-occipital border

Nonfluent

Broca's

Difficulty speaking. Comprehension preserved.
Damage to Broca's area

Transcortical motor

Deficit in spontaneous speech. Repetition, reading and naming objects not impaired. Writing impaired.
Damage anterior to Broca's area

Global

Loss of production & comprehension
Damage to both Broca's and Wernicke's areas

Aprosody

Deficit in production or comprehension of intonation, affective component of speech
Damage to right hemisphere

Pure

Alexia without agraphia

Normal speech production. Can write but cannot read

Agraphia

Normal speech production. Cannot write

Word deafness

Normal speech production. Poor comprehension or repetition of certain words.

Wernicke-Geschwind model of language

Components

Input: visual cortex, auditory cortex
Sensory Processing: angular gyrus, Wernicke's area
Transmission: arcuate fasciculus
Motor Processing: Broca's area
Output: Face area of primary motor cortex
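The components above can be chained into serial routes, and a lesion then breaks every task whose route passes through the damaged stage. The sketch below assumes the route orderings of the classical description of the model (the notes list the components only by role), and the function name is hypothetical.

```python
# Assumed stage order for two tasks in the classical serial model.
REPEAT_HEARD_WORD = [
    "auditory cortex", "Wernicke's area", "arcuate fasciculus",
    "Broca's area", "face area of primary motor cortex",
]
READ_ALOUD = [
    "visual cortex", "angular gyrus", "Wernicke's area", "arcuate fasciculus",
    "Broca's area", "face area of primary motor cortex",
]

def route_intact(route, lesions):
    """True if no stage on the route is damaged."""
    return not any(stage in lesions for stage in route)

# Damage to the arcuate fasciculus (conduction aphasia in the list above).
print(route_intact(REPEAT_HEARD_WORD, {"arcuate fasciculus"}))   # False
# Damage to the angular gyrus spares repetition but not reading aloud.
print(route_intact(REPEAT_HEARD_WORD, {"angular gyrus"}))        # True
print(route_intact(READ_ALOUD, {"angular gyrus"}))               # False
```

Treating each task as a single fixed chain is exactly the strongly localized, serial assumption of the model.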

But the model is too localized