Benjamin Parrell

research interests

My research interests lie primarily in phonetics and speech production, and in how the variable realization of speech units (such as phonemes or gestures) relates both to invariant, abstract levels of control and to dynamic, variable, uncontrolled aspects of production. I am particularly interested in the temporal aspects of speech gestures: how they are coordinated in time, how their duration is controlled, and how changes in coordination or duration can affect the physical and acoustic outcomes of those gestures. Control of these factors is fundamental: there is ample evidence of the need for duration specification in geminates, fricatives, approximants, and for voiced/voiceless distinctions in many languages, as well as for signaling prosodic emphasis and boundaries. Yet we know very little about how duration is specified in speech, including whether it is controlled explicitly or arises from the timing and coordination of multiple speech movements. Improving our understanding of the control of duration in speech has implications for theories of motor control and phonology, and for speech therapy.

I have a number of ongoing projects related to the topic of timing and durational control in speech:

The relationship between durational and spatial reduction
My current research examines the articulatory control of stops in both Spanish and English. These languages are interesting to compare because both show reduction of some stops, but with very different articulatory and acoustic outcomes. In Spanish, the full series of both voiced and voiceless stops exhibits lenition or weakening in many dialects. Both voiced and voiceless stops can be produced with incomplete oral closure (this is more common for voiced than voiceless stops); additionally, voiceless stops are often produced with partial or full voicing during the oral constriction. My previous work (Parrell, LabPhon 2011) has shown that the lenition of the voiced stops is the consequence of an invariant constriction goal that leads to productions of either a full or an undershot stop depending on the duration of the movement. I am currently working on a project extending this analysis to the widespread (but more variable) voicing of phonologically voiceless stops. I am working to develop a novel method using ultrasound to visualize the larynx and vocal folds during speech in order to quantify the relationship between duration and magnitude of glottal opening (see Ultrasonic imaging of the vocal folds during speech).
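
The way an invariant goal can yield either a full or an undershot stop depending on movement duration can be illustrated with a toy model. The sketch below assumes, as is common in task-dynamic models of speech gestures, that a constriction gesture behaves like a critically damped second-order system attracted to a fixed target for as long as the gesture is active. The function names, the natural frequency, the starting aperture, and the activation durations are all illustrative assumptions, not values from the studies described here.

```python
import math


def constriction_position(t, start, target, omega):
    """Position at time t (s) of a critically damped system that starts at
    rest at `start` and is attracted to `target`.

    omega is the natural frequency in rad/s, i.e. sqrt(stiffness / mass).
    """
    return target + (start - target) * (1.0 + omega * t) * math.exp(-omega * t)


def remaining_aperture(duration, start=10.0, target=0.0, omega=40.0):
    """Distance (mm) still separating the articulator from its target when
    the gesture's activation ends after `duration` seconds.

    All default values are hypothetical and chosen only for illustration.
    """
    return abs(constriction_position(duration, start, target, omega) - target)


if __name__ == "__main__":
    # Same target and same stiffness; only the activation duration changes.
    for duration_ms in (40, 80, 160):
        gap = remaining_aperture(duration_ms / 1000.0)
        print(f"{duration_ms:4d} ms activation: {gap:.2f} mm short of the target")
```

With the target and stiffness held constant, a shorter activation interval leaves the articulator farther from its goal, which is the sense in which a single constriction target is compatible with both complete and incomplete closures.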

In contrast to the situation in Spanish, English shows reduction of only coronal stops, though the process is the same for both the phonologically voiced and voiceless phonemes (/d/ and /t/). Both stops reduce to a flap, which is produced without lateral contact of the tongue. As in Spanish, it has been hypothesized that a reduction in the duration of the constriction may lead to the difference in articulation, as few reliable differences between stops and flaps have been found in examining the kinematics of the tongue tip (e.g. Turk, Working papers of the Cornell phonetics laboratory, 1992; De Jong, Journal of Phonetics 1998; Fukaya & Byrd, Journal of the IPA 2005). If this is indeed the case, why should shortening of coronal stops lead to spirantization in one language (Spanish) and to flapping in another (English)? I hypothesize that these different outcomes result from the different ways the tongue is used to attain oral closure in the two languages: coronal stops in Spanish are made with the blade of the tongue at the teeth, while in English they are made with the tongue tip placed at the alveolar ridge. Because of this difference, the tongue tip is oriented differently in the two languages: upward in English and downward in Spanish. These postural differences would lead to the different consequences of temporal reduction in the two languages. I am currently conducting a study to test this hypothesis using real-time MRI. The difference in the location of tongue tip contact is visible in the videos below. Additionally, you can see how the tongue body is actively involved in making the constriction for a /d/ but doesn't move much during the production of the flap (on the bottom). The same is true for Spanish: there is some tongue body movement to create full closure for the stop, but not for the spirantized production. All the videos have been slowed down considerably so that the tongue movements can be seen more clearly.

English [ɾ]

English [d]

Spanish [ð]

Spanish [d]

Coupling between speech and other motor systems
One of the most important ways that timing and duration are used in speech is in the prosodic structure of language. In addition to changes in the pitch of the voice, prosodic structure alters the timing, duration, and magnitude of speech gestures. Near a prosodic boundary, speech movements slow down and become larger in spatial magnitude. An outstanding question is how this slowing and enlarging is accomplished: by what mechanism do we so flexibly alter our speech rhythm, and what can this tell us about the control of duration and coordination in speech more generally? One possibility is that we recruit some more general, non-speech-specific aspect of the motor control system to implement prosodic structure. A large and growing body of evidence has found pervasive links between speech and other motor systems, with examples as diverse as the co-development of rhythmic limb/hand movements and babbling and speech in infants, the tight temporal coupling between manual gestures (such as pointing) and syntactic and prosodic structure in language (such as sentence- and word-level stress), and repetitive tapping and speaking tasks. This project uses this last paradigm: by having subjects co-produce speech and finger tapping with different rhythmic patterns, we can assess how the two motor systems interact and, perhaps, whether there is evidence for shared control of the two. For example, subjects in these tasks produce significantly larger finger taps or louder syllables when they place a stress in that same domain; they also show larger or louder productions of the co-produced movement in the other domain (Kelso, Tuller, & Harris, The Production of Speech, 1983).

We have found additional evidence for entrainment between the two domains. First, we replicated the cross-domain effects demonstrated previously using direct kinematic analysis of the speech articulators, rather than acoustic intensity of speech. Second, we showed that the same pattern of co-variation seen for the magnitude of movements under stress is present in the duration of the speech and finger gestures. We also demonstrated that this covariation in duration and magnitude exists independently of the presence of explicit or alternating stress. These results indicate that the functional task of prosody (grouping information and highlighting salient information) harnesses a broad set of body components, including those not normally considered part of the speech system.

With Dani Byrd, Sungbok Lee, & Louis Goldstein

Ultrasonic imaging of the vocal folds during speech
No method currently exists to measure the absolute magnitude of glottal abduction in speech. Most previous work has used either transillumination, fiberoptic endoscopy, or a combination of the two. Both provide relative rather than absolute measures, because the distance from the light source or camera to the larynx varies during speech, making quantification of the magnitude of glottal abduction difficult. Additionally, the magnitude of the signal is sensitive to the placement of the photoreceptor. To avoid these difficulties, I use high-speed B-mode ultrasound to measure vocal fold movement during speech while simultaneously recording oral articulation via electromagnetic articulography. This type of ultrasonic imaging is completely non-invasive, and has been shown to accurately capture glottal movements and to provide accurate, absolute measurements of static laryngeal anatomy. The novel technique I am developing extends this method to speech research for the first time (Parrell et al., JASA 2011). The video to the right shows the laryngeal movements during repeated utterances of the phrase "Say tapper again" produced by an adult male speaker of English. The arytenoid cartilages as well as the thyro-arytenoid muscles are clearly visible. The vocal folds themselves move in and out of the image. This happens because the folds are thin and the larynx itself moves vertically, so the folds pass in and out of the ultrasound beam.

This work is funded by a grant from the Diploma in Innovation at the University of Southern California. I am collaborating with Adam Lammert, from the USC Department of Computer Science.

Controlling duration and magnitude of speech gestures during imitation
Previous research has shown that speakers can imitate subphonemic detail in shadowing and repetition tasks. For example, many studies have shown unconscious imitation of voice onset time (VOT), a measure of the interval between the release of an oral closure and the adduction of the vocal folds to begin voicing. However, the relative influences of phonological category and phonetic constraints on imitation are not well understood. This study examines whether imitation extends to control of a single speech gesture and, if so, whether it differs between the spatial and durational characteristics of that gesture.

In Spanish, voiced stops are both shorter and less constricted than voiceless stops. This could be due to constraints on the relation between gestural duration and magnitude, in which case it is not clear whether the phonological target specifies a gestural duration, magnitude, or both. To test this, we use the TaDA articulatory speech synthesizer to create a series of stops that vary in both duration and constriction degree. We then use these stimuli in a delayed shadowing task to examine the extent to which speakers control these parameters independently. We find that speakers do imitate differences in gestural duration and magnitude, though imitation is stronger for duration than magnitude. This unequal imitation indicates that the duration and magnitude of gestures are at least partially separable in speech motor control. Interestingly, we also find a significant correlation between the duration of the speech gesture and its stiffness. This mirrors a similar relationship found at prosodic boundaries: speech gestures in these locations are longer and less stiff. This indicates that altering the duration of speech gestures may rely on small prosodic variation and raises the question of whether duration can be explicitly manipulated in speech.
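
As an illustration only, the sketch below shows one widely used kinematic proxy for gestural stiffness, the ratio of peak velocity to displacement, applied to synthetic movements. The measure actually used in this study is not stated here, so the choice of proxy is an assumption, as are the function names, the sampling rate, the 90% duration criterion, and the critically damped trajectories used as stand-in data.

```python
import math


def critically_damped_trajectory(amplitude, omega, duration, fs=1000):
    """Synthetic positions (sampled at fs Hz) of a critically damped movement
    of size `amplitude` that starts at rest at position 0."""
    n_samples = int(duration * fs) + 1
    return [
        amplitude * (1.0 - (1.0 + omega * i / fs) * math.exp(-omega * i / fs))
        for i in range(n_samples)
    ]


def kinematic_stiffness(positions, fs=1000):
    """Peak speed divided by total displacement (units: 1/s)."""
    speeds = [abs(b - a) * fs for a, b in zip(positions, positions[1:])]
    displacement = abs(positions[-1] - positions[0])
    return max(speeds) / displacement


def movement_duration(positions, fs=1000, fraction=0.9):
    """Time (s) taken to cover `fraction` of the total displacement."""
    start, end = positions[0], positions[-1]
    threshold = fraction * abs(end - start)
    for i, position in enumerate(positions):
        if abs(position - start) >= threshold:
            return i / fs
    return (len(positions) - 1) / fs


if __name__ == "__main__":
    # Hypothetical natural frequencies (rad/s); lower values mean less stiff.
    for omega in (20.0, 40.0, 80.0):
        trajectory = critically_damped_trajectory(8.0, omega, duration=1.0)
        print(
            f"omega={omega:5.1f}  "
            f"stiffness proxy={kinematic_stiffness(trajectory):6.2f} 1/s  "
            f"duration={movement_duration(trajectory) * 1000:6.1f} ms"
        )
```

In these synthetic movements the less stiff trajectories take longer to approach their target, which is the direction of the duration and stiffness relationship described above for gestures at prosodic boundaries.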

With Sam Tilsen

Some of my other projects have examined coordinative instability in speech and quantified the effects of prosodic structure on speech timing.