Research article
First published online October 4, 2021

Prosodic Disambiguation in First and Second Language Production: English and Korean

Abstract

This study investigates the use of prosodic cues for syntactic ambiguity resolution by first language (L1) and second language (L2) speakers. In a production experiment, sentences with relative clause attachment ambiguity were elicited in three language conditions: native English speakers’ L1 productions as well as Korean-English bilingual speakers’ L1 Korean and L2 English productions. The results show that English uses both boundary marking (pause) and relative word prominence (elevated pitch and intensity) for disambiguation, while Korean mainly relies on boundary marking (pre-boundary lengthening and pause). The bilingual speakers have learned to use the English phonological categories such as pitch accents for disambiguation, but their use of phonetic cues to realize these categories still differed from that of native English speakers. In addition, they did not show a significant use of boundary cues. These results are discussed in relation to the typological differences between the prosody of English and of Korean.

1 Introduction

One source of ambiguity in language is structural ambiguity, in which the same sequence of words may be associated with more than one syntactic structure (e.g., the boss of the clerk who was dishonest, in which the relative clause may modify either the boss or the clerk). To avoid potential miscommunication resulting from structural ambiguity, speakers and listeners often rely on differences in prosodic realization. The strategies used to resolve ambiguity may vary across languages, reflecting differences in their prosodic systems. Cues that assist in resolving structural ambiguities include, but are not limited to, fundamental frequency (F0) movement associated with boundary tones in Mandarin (Shen, 1993), English (Speer et al., 1996), Italian (Hirschberg & Avesani, 1997), Seoul Korean (Kang & Speer, 2003), European Portuguese (Vigário, 2003), and German (O’Brien et al., 2014); different placement of nuclear pitch accents in Italian (Hirschberg & Avesani, 1997), English (Snedeker & Trueswell, 2003), and European Portuguese (Vigário, 2003); and the distinction between downstep and upstep in Tokyo Japanese (Venditti, 1994). Despite the clear cross-linguistic variation in the use of prosodic cues for disambiguation, there is a lack of systematic comparisons of the prosody of disambiguation in typologically different languages.
Although English and Korean have different syntactic properties, both display a similar type of syntactic ambiguity, called relative clause (RC) attachment ambiguity. For instance, in the English sentence in (1), the RC can modify the closest noun phrase (NP), the clerk (NP2), to mean ‘the boss of the dishonest clerk’, as in (1a). Alternatively, the RC can modify the preceding NP, the boss (NP1), to mean ‘the dishonest boss of the clerk’, as in (1b). These differences are represented in terms of where the RC attaches in the syntactic tree: to the lower NP contained within an upper NP (the low attachment reading) or to the upper NP itself (the high attachment reading).
(1) English RC attachment ambiguity
Jennifer blackmailed [the boss]NP1 of [the clerk]NP2 [who was dishonest]RC.
a. Low attachment reading: ‘Jennifer blackmailed the boss of the dishonest clerk.’
b. High attachment reading: ‘Jennifer blackmailed the dishonest boss of the clerk.’
While native English speakers show a default preference for low attachment readings in the absence of prosodic information (Dussias, 2003; Fodor, 2002), their processing of RC attachment is influenced by prosodic cues signaling the relative strength of prosodic boundaries after NP1 and after NP2 (Clifton et al., 2002; Jun, 2003) or the relative prominence of the two nouns (Jun & Bishop, 2015b; E.-K. Lee & Watson, 2011; Schafer et al., 1996).
Similar RC attachment ambiguities are found in Korean (2), where the RC can modify either NP1, as in the low attachment reading (2a), or NP2, as in the high attachment reading (2b).
(2) Korean RC attachment ambiguity
[Pucilenha-n]RC [sonye-uy]NP1 [enni-ka]NP2 cemsim-ul mek-koiss-da.
Diligent-comp girl-poss older.sister-nom lunch-acc eat-prog-dec
a. Low attachment reading: ‘The older sister of the diligent girl is eating lunch.’
b. High attachment reading: ‘The diligent older sister of the girl is eating lunch.’
Jun (2003) reported that while native Korean speakers preferred the high attachment reading when there were no disambiguating prosodic cues, their interpretation of ambiguous RCs was affected by the presence of prosodic boundaries before NP1 or NP2. Similarly, in production, Korean speakers inserted a stronger prosodic boundary after the RC when a high attachment reading was elicited, and a strong prosodic boundary appeared after NP1 when a low attachment reading was elicited (Baek & Yun, 2018).
Although there has been much research on the use of prosody for syntactic ambiguity resolution in individual languages, less attention has been paid to comparisons across prosodically different languages and the impact of cross-linguistic differences on second language (L2) learning. By comparing prosodic disambiguation of a certain syntactic ambiguity structure across languages, we can compare how prosodic structure may reflect underlying syntactic structure in the studied languages, while minimizing the effects of cross-linguistic segmental and lexical differences. Even though prosody plays a crucial role in communication, considerably less attention has been paid to the prosodic aspects than to the segmental aspects of L2 speech. Despite the relative complexity of the RC attachment ambiguity structure and the relative infrequency of its occurrence in spontaneous speech, prosodic resolution of this ambiguity by L2 speakers allows us to investigate what prosodic aspects of the target language they use to communicate when prosody is the only cue for effective communication at hand. Therefore, the goals of this study are twofold: to compare the uses of prosody by English and Korean speakers for the resolution of the RC attachment ambiguity in their native languages, and to investigate prosodic disambiguation in the production of English as L2 by Korean-English bilingual speakers.

2 Background

2.1 Prosodic disambiguation in English

The prosodic system of English consists of two parameters, as summarized in (3). English marks word prominence with pitch accents, or well-defined pitch shapes, whose metrically strong (starred) tone is associated with words or syllables to be accented. For instance, an H* accent is typically associated with new information to be predicated in the discourse, while an L* accent in general marks information that is intended to be salient but not to be predicated by the speaker (for more information on the pitch accent inventory of English, see Pierrehumbert & Hirschberg, 1990). Pitch accents are phonetically realized as pitch movements and increased intensity. In addition, English marks boundaries of prosodic units, such as intermediate phrases (ip) or intonational phrases (IP), with phrase accents, boundary tones, final lengthening and pauses. An ip boundary is marked by a phrase accent, represented by H- or L-. An H- phrase accent indicates that the current phrase combines with the following phrase to form a larger phonological unit, and an L- phrase accent signals the separation of the current phrase from a subsequent phrase (Pierrehumbert & Hirschberg, 1990). IP boundary tones are labelled with the % symbol, as in H% and L%. An L% boundary tone generally marks a statement or a wh question, while an H% boundary tone typically marks yes-no questions (Pierrehumbert & Hirschberg, 1990).
(3) Parameters in English prosody
a. Word prominence
Phonological categories: pitch accents (H*, L*, H*+L, H+L*, L*+H, L+H*)
Phonetic cues: pitch movements, increased intensity
b. Prosodic boundary
Phonological categories: phrase accents (H-, L-) and boundary tones (H%, L%)
Phonetic cues: pitch movements, final lengthening, pauses
In the example sentence in (4), the words cookie and kill each carry an H* pitch accent, marking new information to be predicated. The H- phrase accent after cookie marks an ip boundary and indicates that the phrase continues into the larger phonological unit, the IP. The sentence-final L% boundary tone marks an IP boundary.
(4) Eat another cookie and I’ll kill you.
                H* H-           H*   L-L%  (Pierrehumbert & Hirschberg, 1990)
Word prominence and prosodic boundaries provide disambiguating cues when native English speakers comprehend spoken sentences with RC attachment ambiguity. Schafer et al. (1996) and E.-K. Lee and Watson (2011) showed that English speakers listening to sentences with RC attachment ambiguity were more likely to interpret a noun as the head of the RC when it carried a pitch accent (H* or L+H*) than when the other noun received the accent. Jun and Bishop (2015b) likewise found that listeners were biased toward the more prominent noun as the head of the RC, and that this effect of word prominence persisted even when the boundary cues favored the other syntactic phrasing. Clifton et al. (2002) created auditory stimuli varying in the relative strength of the two prosodic boundaries, as in I met the daughter [1] of the colonel [2] who was on the balcony, and listeners indicated their interpretations of the sentences in a forced-choice task. The participants chose the high attachment reading more often when the second boundary was stronger (no prosodic boundary at [1] and an ip boundary at [2]) than when the first boundary was stronger (an IP boundary at [1] and an ip boundary at [2]).
English speakers’ use of prosody for disambiguation has also been found in production studies. For example, Kraljic and Brennan (2005) investigated English speakers’ production of sentences with prepositional phrase (PP) attachment ambiguity, as in Put the dog [in the basket]PP on the star, in which the PP in the basket can specify either the dog (modifier interpretation) or the location at which the dog is to be placed (goal interpretation). The results of their production experiment indicated that for modifier interpretations, the duration of the basket and the subsequent pause was longer than the duration of the dog and its subsequent pause by 17.3% of the utterance length. The opposite pattern was found for goal interpretations; the duration of the first NP and its subsequent pause was longer than that of the second NP and its subsequent pause by 10.5% of the utterance length. These results suggest that speakers resolve attachment ambiguity by placing a prosodic juncture (i.e., lengthened phrase-final word and its subsequent pause) at the location of boundaries of bigger syntactic phrases. A similar use of prosodic boundaries was found in the resolution of complex NP ambiguity ([television] [or radio and newspapers] vs. [television or radio] [and newspapers]) and closure ambiguity (Because her grandmother knitted pullovers / Kathy kept warm in the wintertime vs. Because her grandmother knitted / pullovers kept Kathy warm in the wintertime) (Allbritton et al., 1996; Price et al., 1991; Speer et al., 1996). However, few studies have investigated the use of word prominence cues in the production of ambiguous sentences by native English speakers.

2.2 Prosodic disambiguation in Korean

In contrast to English, Korean does not have an accent system associated with lexical prominence (5). Instead, it marks boundaries of prosodic units, such as accentual phrases (AP) and IP. Similarly to those in English, these prosodic boundaries in Korean are marked by boundary tones, along with final lengthening and pauses. The final syllable of an AP typically carries a Ha boundary tone, and a La boundary tone occurs when it is flanked by two high tones at the end of a short AP (Jun, 2005a). The final syllable of an IP is marked by one of the nine boundary tones, L%, H%, LH%, HL%, LHL%, HLH%, LHLH%, HLHL%, LHLHL%, which have different semantic and pragmatic functions. Moreover, an IP-final syllable is about twice as long as the same syllable in the middle of an IP (Jun, 2005a).
(5) Parameters in Korean prosody (Jun, 2005b, 2014)
a. (No word prominence)
b. Prosodic boundary
Phonological categories: AP boundary tones (Ha, La) and IP boundary tones (e.g., L%, H%, LH%, HL%)
Phonetic cues: pitch movements, final lengthening, pauses
For example, in (6), each word forms an AP, as represented by the Ha boundary tones, and the sentence-final L% boundary tone marks an IP boundary.
(6) yengmani-ne-nun yenga-lul miwehay-yo
                Ha        Ha          L%
    Youngman-family-top Youngah-acc hate-dec
‘Youngman’s family hates Youngah.’ (Jun, 2005b)
Korean speakers use prosodic boundaries to resolve RC attachment ambiguity, both in processing and production. Jun (2003) invited speakers of typologically diverse languages to produce a sentence with RC attachment ambiguity in their native language equivalent to Someone shot the servant of the actress who was on the balcony (NP1+NP2+RC for head-initial languages or RC+NP1+NP2 for head-final languages) with a pause or a phrase break before or after either NP2. The speakers were then asked to answer the question Who was on the balcony? for each production. The Korean speakers, as well as speakers of other head-final languages, preferred low attachment readings when there was a prosodic boundary between NP1 and NP2 and high attachment readings when there was a boundary between the RC and the adjacent NP. Although this was not a traditional processing task, in the sense that the informants were making judgments on their own guided productions, the results suggest that their judgments were influenced by prosodic boundary cues.
Moreover, Korean speakers use prosodic boundaries as a cue to disambiguation even when they are not explicitly directed to do so. In Baek and Yun (2018), the participants were instructed to read aloud sentences such as (7) to deliver a given meaning. Acoustic analysis of their productions showed a stronger prosodic juncture, realized as boundary tones, lengthening, pause, or pitch reset (a domain-initial reset of pitch to a high value after a continuous declination in the previous domain), after the RC when a high attachment reading was elicited, and the same cues were produced after NP1 when a low attachment reading was elicited. The authors also found that the strength of a prosodic juncture correlated with the number of syntactic phrase boundaries occurring in the same position, which suggests that a mapping between syntactic phrase boundaries and prosodic phrase boundaries helps ambiguity resolution in the case of RC attachment ambiguity in Korean.
(7) [Pucilenha-n]RC [nam-haksayng-uy]NP1 [nwuna]NP2
diligent-comp male-student-poss older.sister
a. low attachment: ‘the older sister of the diligent male student’
b. high attachment: ‘the diligent older sister of the male student’

2.3 Prosodic disambiguation in L2 production

Cross-linguistic differences in prosodic disambiguation might reasonably be expected to influence L2 learning. Yang (2010) studied the use of prosody for disambiguation by Taiwanese-English L2 speakers at two proficiency levels—advanced and limited. In a read-aloud task, the speakers produced ambiguous English sentences with coordinate structures, such as The little (a) dogs (b) and cats chased a ball, where the prosodic boundaries at (a) and (b) were expected to help resolve the ambiguity. This study found that the advanced L2 speakers produced strong boundary cues (pre-boundary lengthening and pause) at (a) when the adjective modified both of the following nouns, but these cues appeared at (b) when the adjective modified only the immediately following noun. This was the same pattern that the control group of native English speakers showed. In contrast, speakers with limited L2 proficiency produced both boundaries with a similar duration regardless of the intended interpretations and thus did not exhibit significant use of prosody for disambiguation.
Jackson and O’Brien (2011) elicited German sentences from English-German L2 speakers to examine their use of prosodic cues for the resolution of temporary PP attachment ambiguity (e.g., The manager thanked the secretary [with two children]PP inducing noun attachment and . . . [with a bouquet of flowers]PP inducing verb attachment). Their results showed that the L2 speakers used different prosodic cues for the different attachments, such as pre-boundary lengthening, pauses, and a phrase-initial pitch accent. Moreover, although the L2 speakers varied in their L2 German proficiency test scores, there was no significant relationship between their German proficiency and the extent to which they made use of these prosodic cues for disambiguation.
While Yang (2010) and Jackson and O’Brien (2011) made important contributions to our understanding of prosodic disambiguation by L2 speakers, their studies shared theoretical and analytical limitations. First, neither study compared the prosodic systems of the participants’ L1 and L2, which is necessary to create a testable hypothesis on the effects of L1 on L2 speech production. Second, both studies focused only on prosodic phrasing phenomena in disambiguation. When it comes to word prominence, another significant parameter in prosodic typology (Jun, 2014; Wagner & Watson, 2010), there has been a dearth of studies on how L2 speakers utilize this cue in producing ambiguous sentences in the target language.

2.4 The present study

There has been much research on the use of a single type of prosodic cue (either word prominence or prosodic boundaries) in ambiguity resolution. However, languages differ in terms of what parameters they have available in their prosodic systems, which should then influence what prosodic parameters are used for disambiguation. The first goal of the present study was to conduct cross-linguistic research comparing prosodic disambiguation in prosodically different languages, namely, English and Korean. On the assumption that languages make use of all prosodic parameters that their prosodic system makes available, English is expected to use both word prominence and prosodic boundary cues for disambiguation. That is, native English speakers’ productions of RC attachment ambiguity structure, NP1+NP2+RC, would differ with regard to both (a) the relative prominence of NP1 and NP2 and (b) the relative strength of prosodic boundaries after NP1 and NP2, depending on what readings they intend to deliver. In contrast, Korean would only use prosodic boundaries since it lacks a prosodic parameter marking word prominence. Thus, when native Korean speakers produce RC+NP1+NP2 structures with different intended readings, their productions would differ in the relative strength of prosodic boundaries after RC and NP1 but not in the prominence of NP1 and NP2. To test these hypotheses, this study compared the prosodic properties of the two language groups’ productions of ambiguous RC structures.
Furthermore, this study also aimed to examine the impact of these cross-linguistic prosodic differences on L2 productions. The prosodic systems of English and Korean are relatively similar in the parameter of prosodic boundaries but differ with regard to the parameter of prosodic prominence. Thus, when Korean-English bilingual speakers produce sentences with RC attachment ambiguity in L2 English, they would be expected to use the relative strength of prosodic boundaries after NP1 and NP2 as native speakers do, but to fail to manipulate the prominence of NP1 and NP2 in a target-like manner. In order to test these hypotheses, Korean-English bilingual speakers were invited to produce RC attachment ambiguity structures in English, and their productions were compared to native English speakers’ productions.

3 Method

3.1 Materials

To elicit the production of ambiguous sentences in English and Korean, 12 English sentences and 12 Korean sentences with RC attachment ambiguity were used as stimuli. The English sentences were adopted from previous studies (Jun & Bishop, 2015a, 2015b), and the Korean sentences were created by the author. To ensure that the sentences had the desired two readings, three native speakers of each language indicated their preferences for each sentence on a 5-point scale (e.g., question: Who was dishonest?; choices: must be the boss/more likely the boss/equally likely/more likely the clerk/must be the clerk). None of the sentences received more than one response at either end of the scale, confirming that they could be interpreted as ambiguous. Taking into account the raters’ comments, slight lexical changes were made to ensure the sentences were less biased towards either of the readings as well as easier to pronounce. The final list of stimuli is given in Appendix A.

3.2 Procedure and participants

Each target sentence was presented on a computer screen along with two short sentences representing two different interpretations, as shown in Figure 1. The task was to read the target sentence aloud twice, once for each reading, as indicated by the highlighted sentences. The order of the two attachment readings was counterbalanced across participants and across items. This choice of methodology was motivated by the results of a previous study (Baek & Yun, 2018), which showed that prosodic cues for disambiguation were saliently produced only when participants were explicitly attempting to differentiate competing interpretations in the absence of a disambiguating context. Similarly, Straub (1996) reported that the salience of disambiguating prosodic cues diminished when alternate sources of disambiguation information, such as biasing contexts, were present for the listeners.
Figure 1. Example slides for speech elicitation.
Experimental sessions took place in a sound-treated recording room. All elicited utterances were recorded using a Zoom H6 digital recorder and an SM10A-CN dynamic head-mounted microphone at a sample rate of 44.1 kHz. At the end of the session, the participants completed a brief questionnaire on their demographic information and language background.
Eleven English speakers (8 females, 3 males) participated in the experiment. Their mean age was 20.2, ranging from 18 to 24, and they reported that they had grown up in the New York, Pennsylvania, or Delaware regions. Twelve Korean-English bilingual speakers (6 females, 6 males) also participated. All of them indicated Korean as their native language and English as their L2. Their age and language background information from the questionnaire is summarized in Table 1.
Table 1. Korean-English bilingual speakers’ language background. Self-evaluated English proficiency was rated on a 10-point scale from 0 (none) to 10 (perfect).
                                      Mean   Range
Age                                   25.8   19–38
Years of residence in the US          6.1    0.6–12
Age of starting learning English
  Speaking                            8.7    4–12
  Reading                             9.5    6–12
Self-evaluated English proficiency
  Speaking                            6.8    6–7
  Listening                           8.0    7–10
  Reading                             7.6    6–9
  Writing                             7.1    6–8
All participants were attending Stony Brook University. The native English speakers completed the task in English. The Korean-English bilingual speakers completed the task first in L1 Korean and then in L2 English.
During pre-processing of the recorded data, one female English speaker was excluded due to creaky voice quality, which interfered with accurate pitch measurement. Two female Korean speakers were excluded because they spoke the Kyeongsang dialect of Korean, which is prosodically different from Seoul Korean. The remaining data consisted of three language data sets (L1 English, Korean-English bilingual speakers’ L1 Korean and L2 English), each consisting of 240 utterances (12 sentences × 2 attachments × 10 speakers). Thirty-two utterances (8 L1 English tokens, 8 L1 Korean tokens, and 16 L2 English tokens) were excluded due to production errors, such as repetition of a word or production of an incorrect word.
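The token counts above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the design figures stated in the text; the variable names are illustrative and not taken from the study's materials.

```python
# Design counts stated in the text.
SENTENCES = 12
ATTACHMENTS = 2
SPEAKERS_PER_SET = 10  # speakers remaining in each data set after exclusions

# Utterances excluded for production errors, per data set.
excluded = {"L1 English": 8, "L1 Korean": 8, "L2 English": 16}

# Utterances per data set before production-error exclusions.
per_set = SENTENCES * ATTACHMENTS * SPEAKERS_PER_SET

# Utterances remaining per data set and overall.
remaining = {name: per_set - n for name, n in excluded.items()}
total_remaining = sum(remaining.values())

print(per_set)          # 240
print(remaining)        # {'L1 English': 232, 'L1 Korean': 232, 'L2 English': 224}
print(total_remaining)  # 688
```

The total of 688 is the number of utterances that enter the acoustic analysis.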

3.3 Measurements

The resulting 688 utterances were word-segmented by hand in Praat, version 6.1.07 (Boersma & Weenink, 2019). In each of the recorded English sentences, three intervals were segmented: NP1 (e.g., boss), NP2 (e.g., clerk), and RC (e.g., who was dishonest), as well as pauses between them, if present. Similarly, in each of the recorded Korean sentences, three intervals were segmented: the predicate in the RC, NP1, and NP2 (e.g., pucilenhan, sonyeuy, and ennika), as well as pauses between them. After segmentation, the following acoustic measurements were extracted: (a) the duration of the intervals (ms), (b) the duration of pauses (ms), (c) the maximum and mean pitch of the intervals (Hz), and (d) the maximum and mean intensity of the intervals (dB). To eliminate the possible influence of speech rate, the duration measures were used to calculate the duration proportions of each interval or pause to the entire sentence. Pitch values in Hz were converted into semitones to control for individual variation ($12 \log_2(F_0 / F_{\mathrm{ref}})$, reference level = 1 Hz). The use of mean intensity measurements was motivated by previous studies reporting a relationship between word mean intensity and word prominence, such as focus marking, across typologically diverse languages including English and Korean (A. Lee & Xu, 2012; Y. Lee et al., 2015; Y.-c. Lee & Xu, 2010; Wu & Xu, 2010).
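The two normalizations described above (duration as a proportion of the utterance, and Hz converted to semitones relative to 1 Hz) can be sketched as follows. This is an illustrative sketch, not the author's extraction script: the function names and the example values are assumptions, and the measurements themselves were taken in Praat.

```python
import math


def hz_to_semitones(f0_hz, ref_hz=1.0):
    """Convert a pitch value in Hz to semitones relative to ref_hz."""
    if f0_hz <= 0 or ref_hz <= 0:
        raise ValueError("pitch values must be positive")
    return 12.0 * math.log2(f0_hz / ref_hz)


def duration_proportion(interval_ms, sentence_ms):
    """Return an interval's duration as a percentage of the whole utterance."""
    if sentence_ms <= 0:
        raise ValueError("sentence duration must be positive")
    return 100.0 * interval_ms / sentence_ms


# Hypothetical values for illustration only (not taken from the recordings).
print(hz_to_semitones(2.0))                # one octave above 1 Hz: 12.0
print(duration_proportion(350.0, 3500.0))  # 10.0
```

Because the semitone scale is logarithmic, a doubling of F0 always corresponds to 12 semitones, which makes pitch excursions comparable across speakers with different pitch ranges.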

3.4 Statistical analyses

Differences in measurements across attachment conditions (Low and High) and across word positions (NP1, NP2, and RC) were statistically analyzed by mixed-effects regression models using the lmer() function from the lme4 package (Bates et al., 2015) in R, version 3.6.1 (R Core Team, 2019). First, the L1 English data and Korean-English bilinguals’ L2 English data were compared for each dependent variable (word duration, pause duration, maximum and mean pitch, and maximum and mean intensity) by fitting a model with language group (English, Korean), word position (NP1, NP2, RC), and attachment (low attachment, high attachment) as fixed effects and a by-participant random intercept. Because these models yielded significant three-way interactions (reported with the results for the Korean-English bilingual speakers), each dataset (L1 English, Korean-English bilingual speakers’ L1 Korean and L2 English) was then analyzed separately to examine prosodic cue use in each language group.
For each dataset, models were fit for each dependent variable with word position and attachment as fixed effects. The predictors were treatment-coded. The reference level for word position was the first interval for word and pause duration analyses (NP1 for English and RC for Korean) and the first NP for pitch and intensity analyses (NP1 for both English and Korean). The reference level for attachment was low attachment for all analyses. For the random effect structure, a full model was fit with by-participant and by-item random intercepts as well as by-participant and by-item random slopes for both fixed effects. When a model failed to converge, the random effect that captured the smallest variance was removed until the model fit reached convergence (Barr et al., 2013). The final models are reported in Appendix B. The p-values were obtained using the lmerTest package (Kuznetsova et al., 2017).

4 Results

Table 2 shows descriptive statistics for the acoustic measurements by language and attachment condition. For all three language datasets, the results of the maximum pitch and intensity measurements showed the same patterns as those of the mean measurements and thus are not reported.
Table 2. Descriptive statistics results.
All values are M (SD). Each Pause column refers to the pause following the interval to its left.

L1 English productions
Measure                 Attachment  NP1             Pause           NP2             Pause           RC
Word duration (ms)      Low         331.9 (92.0)                    425.8 (131.4)                   917.0 (267.4)
                        High        364.6 (109.7)                   453.7 (138.0)                   920.7 (259.0)
Word duration (%)       Low         10.0 (2.6)                      12.9 (3.9)                      27.7 (7.2)
                        High        10.5 (2.9)                      13.1 (3.7)                      26.6 (6.8)
Pause duration (ms)     Low                         34.0 (98.7)                     23.5 (80.9)
                        High                        8.3 (29.1)                      105.4 (124.5)
Pause duration (%)      Low                         1.0 (2.8)                       0.7 (2.2)
                        High                        0.2 (0.7)                       2.9 (3.4)
Mean pitch (semitone)   Low         88.7 (5.0)                      88.0 (5.0)                      86.2 (4.7)
                        High        89.1 (4.8)                      85.5 (5.5)                      87.1 (5.0)
Mean intensity (dB)     Low         59.5 (5.0)                      58.3 (4.1)                      54.9 (4.2)
                        High        60.4 (5.0)                      54.8 (4.6)                      54.7 (4.3)

Korean-English bilingual speakers’ L1 Korean productions
Measure                 Attachment  RC              Pause           NP1             Pause           NP2
Word duration (ms)      Low         463.9 (108.1)                   492.2 (119.0)                   446.2 (135.9)
                        High        491.8 (125.7)                   408.3 (97.8)                    501.4 (136.4)
Word duration (%)       Low         15.1 (3.6)                      16.2 (4.6)                      14.4 (4.0)
                        High        15.7 (4.1)                      13.1 (3.5)                      15.9 (3.9)
Pause duration (ms)     Low                         9.8 (40.7)                      71.6 (113.7)
                        High                        41.1 (88.4)                     13.5 (52.8)
Pause duration (%)      Low                         0.3 (1.1)                       2.2 (3.4)
                        High                        1.3 (2.7)                       0.4 (1.6)
Mean pitch (semitone)   Low         88.4 (5.6)                      87.2 (5.9)                      86.0 (5.5)
                        High        88.0 (5.8)                      87.0 (5.7)                      85.8 (5.6)
Mean intensity (dB)     Low         62.1 (4.1)                      60.4 (3.9)                      59.3 (4.5)
                        High        61.6 (4.5)                      59.9 (4.6)                      59.6 (4.0)

Korean-English bilingual speakers’ L2 English productions
Measure                 Attachment  NP1             Pause           NP2             Pause           RC
Word duration (ms)      Low         382.9 (111.3)                   526.9 (151.9)                   1135.7 (336)
                        High        427.6 (117.2)                   501.5 (143.5)                   1098.3 (308.5)
Word duration (%)       Low         9.3 (2.7)                       12.8 (3.6)                      27.3 (7.2)
                        High        10.3 (2.9)                      12.0 (3.2)                      26.4 (7.1)
Pause duration (ms)     Low                         24.0 (70.0)                     109.1 (171.9)
                        High                        51.2 (86.4)                     133.6 (187.8)
Pause duration (%)      Low                         0.6 (1.7)                       2.5 (3.8)
                        High                        1.2 (2.1)                       3.0 (4.2)
Mean pitch (semitone)   Low         86.9 (5.5)                      86.1 (5.2)                      84.8 (5.4)
                        High        87.2 (5.4)                      85.6 (5.4)                      84.9 (5.3)
Mean intensity (dB)     Low         58.7 (4.5)                      57.0 (4.2)                      54.3 (4.3)
                        High        59.4 (4.3)                      56.5 (4.4)                      54.3 (4.3)

4.1 L1 English results

This section reports the results of the four acoustic measurements—word duration, pause duration, mean pitch, and mean intensity—extracted from the L1 English data. Figure 2 shows the means and standard errors of the four acoustic measurements for each interval by attachment condition. Table 3 summarizes the results of the linear mixed-effects analyses for each acoustic measure.
Figure 2. Word duration, pause duration, mean pitch, mean intensity per interval for L1 English speakers, separated by attachment condition. Error bars represent the standard error.
Table 3. Estimated effects and coefficients for word duration proportion (%), pause duration proportion (%), mean pitch (semitone), and mean intensity (dB) in English L1 productions.
Predictor                                  β        SE(β)    t         p(>|t|)
Word duration proportion
  (Intercept)                              9.978    0.707    26.814
  Low vs. High                             0.590    0.586    1.006     0.315
  NP1 vs. NP2                              2.873    0.586    4.902     0.000***
  NP1 vs. RC                               17.635   0.586    30.090    0.000***
  Low vs. High × NP1 vs. NP2               –0.293   0.829    –0.354    0.724
  Low vs. High × NP1 vs. RC                –1.534   0.829    –1.851    0.065
Pause duration proportion
  (Intercept)                              0.976    0.403    2.423
  Low vs. High                             –0.775   0.296    –2.621    0.009**
  After NP1 vs. After NP2                  –0.305   0.296    –1.031    0.303
  Low vs. High × After NP1 vs. After NP2   2.998    0.418    7.170     0.000***
Mean pitch
  (Intercept)                              88.756   1.539    57.686
  Low vs. High                             0.373    0.289    1.292     0.197
  NP1 vs. NP2                              –0.672   0.564    –1.192    0.249
  NP1 vs. RC                               –2.532   0.476    –5.317    0.000***
  Low vs. High × NP1 vs. NP2               –2.985   0.409    –7.302    0.000***
  Low vs. High × NP1 vs. RC                0.556    0.409    1.359     0.175
Mean intensity
  (Intercept)                              59.447   1.505    39.501
  Low vs. High                             0.981    0.256    3.830     0.000***
  NP1 vs. NP2                              –1.216   0.663    –1.833    0.090
  NP1 vs. RC                               –4.628   0.996    –4.649    0.000***
  Low vs. High × NP1 vs. NP2               –4.355   0.362    –12.022   0.000***
  Low vs. High × NP1 vs. RC                –0.966   0.362    –2.667    0.008**

4.1.1 Word duration

In both low and high attachment conditions, the average duration of NP1 was shorter than the average duration of NP2, which in turn was shorter than the average duration of RC. The duration of RC was noticeably longer than that of either NP1 or NP2 because its segmented interval contained an entire clause, while the intervals for NP1 and NP2 each contained one lexical item. The difference across positions was consistent in both low and high attachment conditions, and the results of a mixed-effects linear regression analysis indicate that there was no significant interaction between position and attachment. In other words, the English speakers did not use word duration as a cue for attachment ambiguity resolution.

4.1.2 Pause duration

In the low attachment condition, the proportion of pause duration was greater after NP1 than after NP2, while in the high attachment condition, the proportion of pause duration was greater after NP2 than after NP1. There was a significant interaction between position and attachment, indicating that the English speakers made significant use of pause duration to resolve ambiguity. Longer pauses were inserted between words to mark a juncture between major prosodic phrases: between (NP1) and (NP2 RC) in the low attachment condition and between (NP1 NP2) and (RC) in the high attachment condition. The resulting prosodic phrase structure corresponds to the intended syntactic structure: [NP1][NP2 RC] in the low attachment and [NP1 NP2][RC] in the high attachment conditions.
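As a reading aid, the pause-duration coefficients in Table 3 can be converted back into predicted cell means. The base R sketch below assumes treatment (dummy) coding with low attachment and "After NP1" as the reference levels. This is R's default for unordered factors, but it is not stated explicitly for this model, so the result should be read as an illustration rather than as the author's computation. The names `b` and `cells` are hypothetical.

```r
# Fixed-effect estimates copied from Table 3 (pause duration proportion, %).
b <- c(intercept = 0.976, high = -0.775, after_np2 = -0.305,
       high_x_after_np2 = 2.998)

# Assumption: treatment coding, reference levels = low attachment, After NP1.
# Each predicted cell mean is the sum of the terms that apply to that cell.
cells <- expand.grid(high = c(0, 1), after_np2 = c(0, 1))
cells$predicted <- with(cells,
  b[["intercept"]] + b[["high"]] * high + b[["after_np2"]] * after_np2 +
    b[["high_x_after_np2"]] * high * after_np2)

# Human-readable labels for the two factors.
cells$attachment <- ifelse(cells$high == 1, "high", "low")
cells$position <- ifelse(cells$after_np2 == 1, "After NP2", "After NP1")
print(cells[, c("attachment", "position", "predicted")])
```

Under that assumption, the predicted proportions are 0.976 (low, after NP1), 0.671 (low, after NP2), 0.201 (high, after NP1), and 2.894 (high, after NP2), which reproduces the crossover pattern described above.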

4.1.3 Mean pitch

The low attachment pattern shows a consistent pitch decrease from NP1 to NP2 and then to RC. In the high attachment condition, in contrast, the mean pitch drops sharply from NP1 to NP2, after which it increases in RC. The interaction between position and attachment on the mean pitch values of NP1 and NP2 was statistically significant, suggesting that the English speakers manipulated pitch on the two nouns for disambiguation. They showed a continuous pitch downstep throughout the entire phrase in the low attachment condition, where NP2 was the head of the RC, but they produced NP2 with a noticeably lower pitch in the high attachment condition, where NP1 was the head of the RC.

4.1.4 Mean intensity

In the low attachment condition, the mean intensity showed a slight decrease from NP1 to NP2, followed by a sharp drop on RC. Although intensity also showed a continuous drop in the high attachment condition, the decrease from NP1 to NP2 was more noticeable, followed by a slight decrease in RC. The interaction effects between position and attachment were statistically significant, especially on the intensity of NP1 versus NP2. This indicates that the English speakers manipulated the relative intensity of the two nouns for disambiguation, producing NP2 with a clearly weaker intensity when NP1 was the head of the RC (high attachment) compared to when NP2 itself was the RC head (low attachment).
To summarize the L1 English production results, the interaction between position and attachment was found to be significant for pause duration, mean pitch, and mean intensity cues (p<0.001 for all three), but not for word duration. Longer pauses were found in word junctures that corresponded to a major syntactic phrase boundary of the intended attachment reading. In addition, while NP1 in general had higher pitch and intensity than NP2, these cues on NP2 decreased more dramatically when the other noun was the head of the RC compared to when NP2 itself was the RC head. These results suggest that English speakers resolve RC attachment ambiguity by marking prosodic phrase boundaries with a pause and also by manipulating the relative prominence of the two nouns.

4.2 Korean-English bilinguals’ L1 Korean results

This section reports the results of the acoustic measurements extracted from the Korean-English bilingual speakers’ L1 Korean data. The following subsections each report the results for each prosodic measure: word duration, pause duration, mean pitch, and mean intensity. Figure 3 shows the means and standard errors of the four acoustic measurements for each interval by attachment condition. Table 4 summarizes the results of the linear mixed-effects analyses for each acoustic measure.
Figure 3. Word duration, pause duration, mean pitch, mean intensity per interval for Korean-English bilingual speakers’ L1 Korean productions, separated by attachment condition. Error bars represent the standard error.
Table 4. Estimated effects and coefficients for word duration proportion (%), pause duration proportion (%), mean pitch (semitone), and mean intensity (dB) in the Korean-English bilingual speakers’ L1 Korean productions.
| Measure | Term | β | SE(β) | t | p |
| --- | --- | --- | --- | --- | --- |
| Word duration proportion | (Intercept) | 15.122 | 0.79 | 19.134 | |
| | Low vs. High | 0.629 | 0.411 | 1.531 | 0.126 |
| | RC vs. NP1 | 1.063 | 0.413 | 2.577 | 0.010* |
| | RC vs. NP2 | –0.694 | 0.413 | –1.681 | 0.093 |
| | Low vs. High × RC vs. NP1 | –3.635 | 0.581 | –6.257 | 0.000*** |
| | Low vs. High × RC vs. NP2 | 0.904 | 0.581 | 1.555 | 0.120 |
| Pause duration proportion | (Intercept) | 0.309 | 0.317 | 0.974 | |
| | Low vs. High | 0.967 | 0.289 | 3.349 | 0.001*** |
| | After RC vs. After NP1 | 1.974 | 0.343 | 5.763 | 0.000*** |
| | Low vs. High × After RC vs. After NP1 | –2.8 | 0.408 | –6.859 | 0.000*** |
| Mean pitch | (Intercept) | 87.036 | 1.834 | 47.45 | |
| | Low vs. High | –0.194 | 0.214 | –0.907 | 0.365 |
| | NP1 vs. RC | 1.114 | 0.215 | 5.19 | 0.000*** |
| | NP1 vs. NP2 | –1.215 | 0.215 | –5.66 | 0.000*** |
| | Low vs. High × NP1 vs. RC | –0.024 | 0.302 | –0.079 | 0.937 |
| | Low vs. High × NP1 vs. NP2 | 0.103 | 0.302 | 0.339 | 0.734 |
| Mean intensity | (Intercept) | 60.216 | 1.278 | 47.106 | |
| | Low vs. High | –0.405 | 0.239 | –1.698 | 0.09 |
| | NP1 vs. RC | 1.702 | 0.735 | 2.315 | 0.039* |
| | NP1 vs. NP2 | –1.053 | 0.341 | –3.085 | 0.006** |
| | Low vs. High × NP1 vs. RC | 0.032 | 0.337 | 0.094 | 0.925 |
| | Low vs. High × NP1 vs. NP2 | 0.727 | 0.337 | 2.157 | 0.031* |

4.2.1 Word duration

In the low attachment condition, NP1 was longer than both RC and NP2. In the high attachment condition, in contrast, RC and NP2 were both longer than NP1. The interaction between word position and attachment condition was statistically significant for the durations of RC and NP1, indicating that the Korean speakers used word duration for disambiguation. They lengthened NP1 in the low attachment condition, where it demarcated the prosodic phrases (RC NP1) and (NP2), while RC and NP2 were lengthened in the high attachment condition, where they demarcated the phrases (RC) and (NP1 NP2). The resulting prosodic phrase structure corresponds to the intended syntactic structure: [RC NP1][NP2] in the low attachment condition and [RC][NP1 NP2] in the high attachment condition.
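The word-duration pattern can be checked against the coefficients in Table 4. The following base R sketch assumes treatment coding with RC and low attachment as the reference levels (the appendix model relevels position to RC, while the coding scheme itself is an assumption here). The helper `predict_cell` and the other names are hypothetical and not taken from the original analysis script.

```r
# Fixed-effect estimates copied from Table 4 (word duration proportion, %).
b <- c(intercept = 15.122, high = 0.629, np1 = 1.063, np2 = -0.694,
       high_x_np1 = -3.635, high_x_np2 = 0.904)

# Assumption: treatment coding, reference levels = low attachment, RC.
# `high` is 0 for low attachment and 1 for high attachment.
predict_cell <- function(high, position) {
  np1 <- as.numeric(position == "NP1")
  np2 <- as.numeric(position == "NP2")
  b[["intercept"]] + b[["high"]] * high +
    b[["np1"]] * np1 + b[["np2"]] * np2 +
    b[["high_x_np1"]] * high * np1 + b[["high_x_np2"]] * high * np2
}

positions <- c("RC", "NP1", "NP2")
pred <- rbind(low  = sapply(positions, predict_cell, high = 0),
              high = sapply(positions, predict_cell, high = 1))
print(round(pred, 3))
```

Under these assumptions, the predicted proportions are 15.122 (RC), 16.185 (NP1), and 14.428 (NP2) for low attachment, and 15.751 (RC), 13.179 (NP1), and 15.961 (NP2) for high attachment. NP1 is therefore the longest interval only in the low attachment condition, in line with the description above.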

4.2.2 Pause duration

In the low attachment condition, the pause after NP1 was longer than the pause after RC, while in the high attachment condition, the pause after RC was longer than the pause after NP1. The interaction between position and attachment was significant, which indicates that the Korean speakers used pause duration to resolve ambiguity. Longer pauses were inserted between words to mark the same prosodic juncture as marked by word lengthening: between (RC NP1) and (NP2) in the low attachment, but between (RC) and (NP1 NP2) in the high attachment conditions.

4.2.3 Mean pitch

In both attachment readings, the mean pitch of RC was higher than the mean pitch of NP1, which in turn was higher than that of NP2. There was no significant interaction between position and attachment condition. In other words, the Korean speakers showed a continuous pitch downstep throughout the whole phrase and did not use pitch as a significant cue for disambiguation.

4.2.4 Mean intensity

In both attachment conditions, the mean intensity of RC was greater than the mean intensity of NP1, which in turn was greater than that of NP2. Although both attachment conditions show a continuous intensity decrease, the drop from NP1 to NP2 was greater in the low attachment condition than in the high attachment condition, yielding a significant interaction between position and attachment condition. In other words, the Korean speakers produced NP2 with a greater intensity in the high attachment condition, where it was the head of the RC.
To summarize the Korean-English bilingual speakers’ L1 Korean production results, the interaction between position and attachment was found to be significant for word duration, pause duration (p<0.001 for both), and mean intensity (p=0.031), but not significant for mean pitch. A word was lengthened and followed by a longer pause when it was at the right edge of a major syntactic constituent in the intended attachment reading. In addition, NP2 was produced with greater intensity when it was the head of the RC. These results suggest that the Korean speakers used prosodic boundary cues, such as pre-boundary word lengthening and pauses, and intensity to resolve RC attachment ambiguity.

4.3 Korean-English bilinguals’ L2 English results

First, the L1 English data and the Korean-English bilingual speakers’ L2 English data were compared by running a model with language group, position, and attachment as fixed effects (reference levels: English L1, NP1, low attachment) for each acoustic measurement. The results indicated significant three-way interaction effects for pause duration (t = 3.8, p < 0.001), mean pitch (t = 3.7, p < 0.001), and mean intensity (t = 4.2, p < 0.001). To better understand how these speakers used these prosodic cues, separate analyses were conducted on the Korean-English bilingual speakers’ L2 English data. This section reports the results for each prosodic measure: word duration, pause duration, mean pitch, and mean intensity. Figure 4 shows the means and standard errors of the four acoustic measurements for each interval by attachment condition. Table 5 summarizes the results of the linear mixed-effects analyses for each acoustic measure.
Figure 4. Word duration, pause duration, mean pitch, mean intensity per interval for Korean-English bilingual speakers’ L2 English productions, separated by attachment condition. Error bars represent the standard error.
Table 5. Estimated effects and coefficients for word duration proportion (%), pause duration proportion (%), mean pitch (semitone), and mean intensity (dB) in Korean-English bilingual speakers’ L2 English productions.
| Measure | Term | β | SE(β) | t | p |
| --- | --- | --- | --- | --- | --- |
| Word duration proportion | (Intercept) | 9.295 | 0.594 | 15.64 | |
| | Low vs. High | 1.001 | 0.352 | 2.842 | 0.005** |
| | NP1 vs. NP2 | 3.457 | 0.787 | 4.395 | 0.001*** |
| | NP1 vs. RC | 18.281 | 2.081 | 8.784 | 0.000*** |
| | Low vs. High × NP1 vs. NP2 | –1.903 | 0.498 | –3.823 | 0.000*** |
| | Low vs. High × NP1 vs. RC | –1.734 | 0.498 | –3.482 | 0.001*** |
| Pause duration proportion | (Intercept) | 0.553 | 0.415 | 1.333 | |
| | Low vs. High | 0.646 | 0.499 | 1.294 | 0.211 |
| | After NP1 vs. After NP2 | 1.936 | 0.388 | 4.988 | 0.000*** |
| | Low vs. High × After NP1 vs. After NP2 | –0.148 | 0.559 | –0.294 | 0.792 |
| Mean pitch | (Intercept) | 86.709 | 1.68 | 51.628 | |
| | Low vs. High | 0.421 | 0.25 | 1.683 | 0.093 |
| | NP1 vs. NP2 | –0.8 | 0.246 | –3.258 | 0.001** |
| | NP1 vs. RC | –2.061 | 0.246 | –8.39 | 0.000*** |
| | Low vs. High × NP1 vs. NP2 | –0.821 | 0.354 | –2.322 | 0.021* |
| | Low vs. High × NP1 vs. RC | –0.316 | 0.354 | –0.892 | 0.373 |
| Mean intensity | (Intercept) | 58.689 | 1.16 | 50.582 | |
| | Low vs. High | 0.785 | 0.341 | 2.304 | 0.022* |
| | NP1 vs. NP2 | –1.672 | 0.334 | –5.005 | 0.000*** |
| | NP1 vs. RC | –4.39 | 0.334 | –13.139 | 0.000*** |
| | Low vs. High × NP1 vs. NP2 | –1.232 | 0.481 | –2.561 | 0.011* |
| | Low vs. High × NP1 vs. RC | –0.697 | 0.481 | –1.448 | 0.148 |

4.3.1 Word duration

In both conditions, the duration proportion of NP1 was smaller than that of NP2, which in turn was smaller than the duration proportion of RC. Again, the duration of RC was prominently longer than those of the nouns, because it was a full clause while NP1 and NP2 consisted of a single lexical item. The interaction between position and attachment was significant, indicating that the Korean-English bilingual speakers used word duration as a cue for attachment ambiguity resolution. They produced a noun with longer duration when it was the head of the RC (NP2 in the low attachment condition, NP1 in the high attachment condition) compared to when the other noun was the RC head.

4.3.2 Pause duration

In both attachment conditions, pauses were longer after NP2 than after NP1. There was no significant interaction between position and attachment. This indicates that the Korean-English bilingual speakers always had a longer pause after NP2 regardless of attachment and did not use pause duration for disambiguation.
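The contrast with the L1 English pattern can be made concrete by computing the simple effect of position (pause after NP2 minus pause after NP1) within each attachment condition. The base R sketch below uses the coefficients reported in Tables 3 and 5 and assumes treatment coding with low attachment as the reference level. The function `simple_effect` is a hypothetical helper written for this illustration and is not part of the original analysis.

```r
# Simple effect of position (After NP2 minus After NP1, in % of utterance)
# within each attachment level.
# Assumption: treatment coding with low attachment as the reference level,
# so the high-attachment effect is the position term plus the interaction.
simple_effect <- function(b_position, b_interaction) {
  c(low = b_position, high = b_position + b_interaction)
}

l1_english <- simple_effect(b_position = -0.305, b_interaction = 2.998)  # Table 3
l2_english <- simple_effect(b_position = 1.936, b_interaction = -0.148)  # Table 5

print(rbind(l1_english, l2_english))
```

On these numbers, the position effect changes sign across attachment conditions for the L1 English speakers (–0.305 vs. 2.693), whereas it stays positive and nearly constant for the bilingual speakers’ L2 English (1.936 vs. 1.788).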

4.3.3 Mean pitch

Both low and high attachment conditions show a consistent pitch decrease across the phrase. The interaction between position and attachment on the mean pitch values of NP1 and NP2 was statistically significant. The mean pitch of a noun was higher when it was the head of the RC (NP2 in the low attachment condition, NP1 in the high attachment condition) compared to when the other noun was the RC head. In other words, the Korean-English bilingual speakers manipulated the relative pitch of the two nouns to resolve attachment ambiguity.

4.3.4 Mean intensity

In both attachment conditions, the mean intensity of NP1 was greater than the mean intensity of NP2, which in turn was greater than that of RC. The interaction effect between position and attachment on the mean intensity of NP1 and NP2 was statistically significant, indicating that the mean intensity of a noun was greater when it was the head of the RC (NP2 in the low attachment condition, NP1 in the high attachment condition) compared to when the other noun was the RC head. This finding indicates that the Korean-English bilingual speakers manipulated the relative intensity of the two nouns to resolve attachment ambiguity.
To summarize the Korean-English bilingual speakers’ L2 English production results, the interaction between position and attachment was found to be significant for word duration (p<0.001), mean pitch, and mean intensity (p<0.05 for both), but not for pause duration. While NP2 was generally longer than NP1, the difference was larger when NP2 was the head of the RC compared to when NP1 was the RC head. In addition, while NP1 in general had higher pitch and intensity than NP2, the difference was greater when NP1 was the head of the RC compared to when NP2 was the RC head. No significant use of pause duration was found. These results suggest that to resolve ambiguity, Korean-English bilingual speakers manipulated the relative prominence of the nouns, represented as duration, pitch, and intensity.

5 Discussion

5.1 English versus Korean as L1

The first goal of this study was to compare English speakers and Korean speakers in terms of their use of prosodic cues to resolve ambiguity in their native languages. First, the English speakers used a combination of boundary cues and word prominence cues to disambiguate RC attachment. They inserted pauses between words to mark prosodic boundaries at different positions, depending on the intended attachment. When the RC was attached to the noun that was lower in the syntactic structure (NP2), a pause was inserted before NP2, marking a stronger prosodic juncture between (NP1) and (NP2 RC). When the RC was attached to the higher noun (NP1), a pause was inserted before RC, indicating a stronger prosodic juncture between (NP1 NP2) and (RC). Previous studies have reported similar use of prosodic boundaries by English speakers in the resolution of other types of syntactic ambiguities (Allbritton et al., 1996; Kraljic & Brennan, 2005; Price et al., 1991; Schafer et al., 2005; Speer et al., 1996). The English speakers also manipulated relative prominence of words for RC attachment ambiguity resolution. Nouns were produced with greater prominence (higher pitch and greater intensity) when they were heads of and modified by the RC than when they were not. The role of word prominence in English listeners’ comprehension of ambiguous structure has been observed in earlier processing studies (Jun & Bishop, 2015b; E.-K. Lee & Watson, 2011; Schafer et al., 1996). Adding to such studies on the effects of word prominence in processing, the current study found that English speakers manipulate word prominence for disambiguation in their utterances, suggesting that the effect of word prominence is present in production as well as in processing.
However, the English speakers did not show significant use of word duration for disambiguation. This result is likely due to the fact that lengthening can be used for both prosodic boundary marking and head prominence (Wagner & Watson, 2010). In the low attachment reading, (NP1) (NP2 RC), NP1 could be lengthened as a cue for the following prosodic boundary, while NP2 could also be lengthened for prominence as the RC head. Similarly, in the high attachment reading, (NP1 NP2) (RC), NP1 could be lengthened for head prominence, while NP2 could also be lengthened to mark the following prosodic boundary. There has been disagreement on the scope of phrase-final lengthening in English, with the locus of lengthening variously identified as segment, rhyme, syllable, and word (e.g., Cho et al., 2013; Klatt, 1975; Price et al., 1991; Turk & Shattuck-Hufnagel, 2007; Wightman et al., 1992). It is thus possible that further analysis of the data might show an effect of pre-boundary lengthening in domains smaller than the word.
In contrast to English, Korean relies mainly on boundary phenomena, including pre-boundary word lengthening and pauses, for disambiguation. When the RC modified the lower noun (NP1), NP1 was lengthened and followed by a longer pause, marking a stronger prosodic juncture between the two nouns, as in (RC NP1)(NP2). In contrast, when the RC was attached to the higher noun (NP2), RC was lengthened and followed by a longer pause, indicating a strong prosodic juncture between (RC) and (NP1 NP2). This finding conforms to the previous studies by Jun (2005b), who found pauses at phrase boundaries optionally accompanied by pre-boundary lengthening, as well as by Baek and Yun (2018), who found an effect of the number of syntactic phrase boundaries on the strength of prosodic juncture, reflected in pre-boundary lengthening and pauses. In the same vein, prosodic boundaries were also reported to affect syntactic ambiguity interpretation by Korean listeners (Jun, 2003; Kang & Speer, 2003).
In addition to the boundary cues, the Korean speakers manipulated the intensity of NP2 according to the intended reading. The mean intensity of NP2 was greater in the high attachment productions than in the low attachment productions. As NP2 is the head of the RC in the high attachment reading, it is likely that the greater intensity was used to mark head prominence. If this is true, it would suggest that even edge-marking languages, such as Korean, may make use of prosodic prominence cues to distinguish meaning. The use of prosodic prominence in Korean has also been reported in the domain of lexical ambiguity resolution. In Korean, an interrogative sentence containing a wh-phrase (e.g., nwuka ‘who’ or ‘anyone’) is ambiguous, as it can be interpreted as a yes-no question, a wh-question, or an incredulity question. Jun and Oh (1996) found that native speakers’ productions with different intended interpretations differed not only in prosodic phrasing but also in the pitch range and intensity peak on the wh-phrase, although these prominence cues were not used as consistently across speakers as prosodic phrasing, and prosodic phrasing cues had a stronger effect on the perception of the sentences than prominence cues. Similarly, since the intensity effect in this study was relatively small compared to that in the L1 English speakers’ productions (t = 2.157, p = 0.031 for L1 Korean vs. t = 12.022, p < 0.001 for L1 English), further investigation would be needed before reaching any conclusion on whether Korean has a prosodic system of word prominence using intensity. Another possibility is that the effect reflects the population sampled: the Korean speakers in this study are Korean-English bilinguals immersed in an English-speaking environment. As they are exposed to English on a daily basis, it is possible that their use of intensity for disambiguation is an effect of English prosody on their L1 Korean production. If this is true, it remains to be answered why only intensity and not pitch was the subject of L2-to-L1 transfer. I leave this to future studies.

5.2 Korean-English bilingual speakers’ production of L2 English

The second goal of this study was to investigate the use of prosodic cues for disambiguation in English by Korean-English bilingual speakers. Given the similarities and differences between the prosodic systems of English and Korean, it was hypothesized that Korean-English bilingual speakers would use word duration and pause duration as boundary cues in the same way as native English speakers do and that their use of pitch and intensity as prominence cues would be different from that of native English speakers. However, the results of this study did not support these hypotheses.
The Korean-English bilingual speakers increased the duration, pitch, and intensity of a noun that was the head of the RC and decreased these measures for the other noun. This demonstrates that these speakers resolved ambiguity by manipulating relative word prominence (i.e., by placing pitch accents on the noun that heads the RC and by de-accenting the other noun), suggesting that they have successfully learned the inventory of the phonological categories of pitch accents in the English intonation system. However, the degree to which they scaled these cues, as reflected in the strength of the interaction effects (t values) in the statistical results, was considerably smaller than in the English L1 speakers’ production. In the Korean-English bilingual speakers’ productions, the interaction between word position (NP1 vs. NP2) and attachment condition (low vs. high) yielded t = 2.322 (p = 0.021) for mean pitch and t = 2.561 (p = 0.011) for mean intensity. In contrast, in the English L1 speakers’ productions, the corresponding values were t = 7.302 (p < 0.001) for mean pitch and t = 12.022 (p < 0.001) for mean intensity. In other words, although the Korean-English bilingual speakers have learned to use the phonological category of pitch accents, their scaling of acoustic cues to realize this category is still far from the native target. This finding is in line with the L2 Intonation Learning Theory (LILt), which postulates that different dimensions of L2 intonation pose different degrees of learning difficulty for L2 learners (Mennen, 2015). The LILt defines four dimensions of L2 intonation: the phonological dimension, the phonetic dimension, the semantic dimension, and the frequency dimension. These dimensions are expected to constitute different levels of difficulty in L2 learning depending on whether two languages are similar or different in each dimension. On the relative difficulty of learning the phonological and phonetic dimensions, L2 speakers in earlier studies showed more difficulty in the phonetic dimension than in the phonological dimension, the same pattern found in the current study (Backman, 1979; O’Brien & Gut, 2010; Trofimovich & Baker, 2006).
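Because a t value is a coefficient divided by its standard error, it reflects precision as well as magnitude. As a complementary and purely descriptive check (not part of the original analysis, which tested the group difference with the three-way interaction models), the base R sketch below recomputes t = β / SE for the NP1 vs. NP2 × attachment interaction from Tables 3 and 5 and compares the raw coefficients across groups. The data frame and column names are hypothetical.

```r
# Interaction term "Low vs. High x NP1 vs. NP2", copied from Tables 3 and 5.
interaction_terms <- data.frame(
  group      = c("L1 English", "L1 English", "L2 English", "L2 English"),
  measure    = c("pitch (semitone)", "intensity (dB)",
                 "pitch (semitone)", "intensity (dB)"),
  beta       = c(-2.985, -4.355, -0.821, -1.232),
  se         = c(0.409, 0.362, 0.354, 0.481),
  t_reported = c(-7.302, -12.022, -2.322, -2.561)
)

# Recompute the t statistic; small discrepancies from the reported values
# are expected because beta and SE are rounded to three decimals.
interaction_terms$t_recomputed <- with(interaction_terms, beta / se)

# Ratio of L2 to L1 coefficients for each measure (same units within a measure).
l1 <- interaction_terms[interaction_terms$group == "L1 English", ]
l2 <- interaction_terms[interaction_terms$group == "L2 English", ]
beta_ratio <- setNames(l2$beta / l1$beta, l1$measure)

print(interaction_terms)
print(round(beta_ratio, 3))
```

On these figures, the L2 interaction coefficients are just over a quarter of the L1 coefficients for both pitch and intensity, which points in the same direction as the comparison of t values above.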
Moreover, the Korean-English bilingual speakers did not show significant use of boundary marking cues for disambiguation, such as pre-boundary lengthening and pauses. Instead of using word duration as a cue for boundary marking as they do in their native language, they used duration as a cue for head prominence by lengthening a noun when it was the head of the RC (i.e., NP1 in the high attachment reading, NP2 in the low attachment reading). In addition, Korean-English bilingual speakers did not resolve ambiguity using pauses. Regardless of the attachment condition, they consistently inserted a longer pause after NP2, demarcating the prosodic phrases (NP1 NP2) and (RC). While it has been found that speakers in general tend to pause more often in their L2 than in their L1 (Riazantseva, 2001), it is still not clear from the literature at what juncture L2 speakers are more likely to pause. Moreover, this general tendency does not explain the current data, in which the grouping of words by syntactic junctures should differ depending on the intended syntactic structure (i.e., attachment of the RC). If L2 speakers generally tended to pause more often at syntactic junctures, their productions of the two attachment readings would be expected to differ in the relative pause durations at the NP1-NP2 and NP2-RC boundaries. On the contrary, the current study found that they paused longer at NP2-RC boundaries regardless of the intended syntactic structure. This finding shows that they do not produce prosodic junctures to represent syntactic junctures. Rather, the prosodic grouping produced by the Korean-English bilingual speakers in this study, (NP1 NP2) (RC), is consistent with the general tendency in English for a prosodic break to be located close to the middle of the sentence, yielding an approximately symmetrical, or balanced, structure (Cooper & Paccia-Cooper, 1980; Fodor, 1998; Gee & Grosjean, 1983). Thus, the bilingual speakers seem to have followed this general tendency for prosodic bisection regardless of intended reading, failing to produce prosodic boundaries to mark corresponding syntactic boundaries for ambiguity resolution.
In short, the Korean-English bilingual speakers made better use of the L2 prosodic aspects that differ from their L1 prosody (word prominence) than those that are shared by their L1 and L2 (boundary marking). Some empirical studies on L2 segment learning also suggest that a novel L2 sound is more learnable than an L2 sound that has a similar counterpart in the L1 inventory. For example, in Aoyama et al.’s (2004) one-year longitudinal study, Japanese children showed a greater improvement in the perception and production of English /ɹ/, a novel sound to Japanese speakers, than in that of English /l/, which is phonetically more similar to Japanese /r/. According to Flege’s (1995) Speech Learning Model, an L2 segment that is perceptually distinct from any L1 sound is more likely to be learned as a new independent phonetic category by L2 learners. In contrast, an L2 sound that is perceptually similar to an L1 sound will not be perceived as a separate category and will thus be learned less successfully. The current findings suggest that this hypothesis of the Speech Learning Model can be extended to L2 prosody learning. The pitch accent system in English is perceptually distinct from Korean prosody, which lacks a word prominence system. Prosodic boundaries in English, on the other hand, are phonetically similar to those in Korean. Consequently, the Korean-English bilingual speakers might have established new category representations for pitch accents, namely, relative pitch targets for tones and their alignment with lexical items, while having difficulty doing so for prosodic boundaries, such as target durations of a lengthened unit and its subsequent pause as well as the placement of prosodic boundaries in relation to syntactic boundaries. Therefore, a theory of L2 prosody learning must be able to account not only for learning difficulties predicted by typological differences across languages but also for those arising from perceptual distances between L1 and L2 prosodic categories.
This study used a laboratory speech task with a construction that is relatively infrequent in spontaneous speech, which may raise a potential concern regarding the generalizability of the results. The motivation behind this design was to pay closer attention to the use of prosody while minimizing possible interference of segmental, lexical, and even contextual variables. The read-aloud task with no interlocutor created an environment in which prosody was the only cue that the speakers could freely use to communicate their intention. The fact that the results showed clear differences between L1 English and Korean-English bilingual speakers’ L1 Korean productions as well as between L1 English and Korean-English bilingual speakers’ L2 English productions highlights that prosody is indeed employed in different ways across languages even in the same context and can be an independent source of L2 learning difficulties.

6 Conclusion

This study investigated the use of prosodic cues for syntactic ambiguity resolution by native speakers of English and Korean-English bilingual speakers. The results of the production experiment indicated that the English speakers used both boundary marking (pause) and relative word prominence (elevated pitch and intensity) for disambiguation, while in L1 Korean, the Korean-English bilingual speakers mainly relied on boundary marking (pre-boundary lengthening and pause). In L2 English, the Korean-English bilingual speakers used word prominence for disambiguation, although the details of their use of phonetic cues were still different from those of native English speakers. In addition, these speakers did not show a significant use of boundary marking, despite the similarities between English and Korean, suggesting that a similarity between L1 and L2 does not always lead to positive transfer and that L2 learners may face more difficulties than those predicted by cross-linguistic prosodic differences.

Acknowledgments

I would like to thank the audiences at the 177th Meeting of the Acoustical Society of America and the 94th Annual Meeting of the Linguistic Society of America. I owe special thanks to Professor Ellen Broselow and Professor Jiwon Yun for their support and helpful feedback. Finally, I thank the anonymous reviewers for their constructive criticism and suggestions. All remaining errors are mine.

Author’s note

Hyunah Baek is now affiliated with Division of Liberal Arts and Sciences, Gwangju Institute of Science and Technology, Gwangju, Republic of Korea.

Funding

The author disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the National Science Foundation under Grant IBSS-1519908.

Footnote

1. Following Y.-K. Kim (1997) and M.-J. Kim (2002), I analyze one-word prenominal modifiers in Korean (e.g., pucilenha-n) as one-word relative clauses consisting of a verbal/adjectival stem and a complementizer morpheme. Whether these one-word modifiers are adjectives or relative clauses is still controversial. However, different assumptions concerning the representation and internal structure of the modifiers do not affect their syntactic relationship with the modified phrases or the attachment ambiguity structure examined in this article.

Appendix A

List of Stimuli

List of English stimuli

(1) Jennifer blackmailed the boss of the clerk who was dishonest.
(2) Stacey wanted to invite the friend of the secretary who was Asian.
(3) Sara met the sister of the actress who was pregnant.
(4) The drunk man hit the brother of the neighbor who was yelling.
(5) George questioned the brother of the soldier who recently got divorced.
(6) Peter met the uncle of the guest who was a well-known boxer.
(7) The driver talked to the guides of the tourists who were waiting in line.
(8) The receptionist called in the clients of the lawyers who were arguing loudly.
(9) Susanna was dating the cousin of the famous artist who was a veteran.
(10) Rob talked to the coach of the champion swimmer who had a daughter.
(11) Linda helped to carry the child of the young woman who was upset.
(12) Wendy saw the teachers of the naughty students who were outside.

List of Korean stimuli

(1) Pucilenha-n sonye-uy enni-ka cemsim-ul mek-koiss-da.
Diligent-comp girl-poss older.sister-nom lunch-acc eat-prog-dec
(2) Yumyengha-n nam-haksayng-uy nwuna-ka hakkyo-ey wa-ss-da.
Famous-comp male-student-poss older.sister-nom school-lat come-past-dec
(3) Wus-koiss-nun aki-uy emeni-ka nampyen-ul macihay-ss-da.
Smile-prog-comp baby-poss mother-nom husband-acc greet-past-dec
(4) Nol-koiss-nun sonyen-uy yetongsayng-ul tolbwacu-ess-da.
Play-prog-comp boy-poss younger.sister-acc take.care-past-dec
(5) Cenhwa-lul pat-koiss-nun namca-uy ayin-i pigonhay-poy-ess-da.
Phone-acc take-prog-comp man-poss girl.friend-nom tired-look-past-dec
(6) Kyosil-ey tuleka-n haksayng-uy tongsayng-un pay-ka kopa-ss-da.
Classroom-lat enter-comp student-poss brother-top stomach-nom hungry-past-dec
(7) Nolay-lul coaha-nun aki-uy nwuna-ka cam-eysey kkay-ss-da.
Song-acc like-comp baby-poss older.sister-nom sleep-abl wake.up-past-dec
(8) Byengwen-ese ilha-nun kyoswu-uy puin-i cenhwa-lul kelewa-ss-da.
Hospital-loc work-comp professor-poss wife-nom phone-acc call-past-dec
(9) Byenhosa-in alumtawu-n yeca-uy namtongsayng-i mal-ul kkenay-ss-da.
Lawyer-comp beautiful-comp woman-poss younger.brother-nom talk-acc start-past-dec
(10) Seonggongha-n celm-un namca-uy nwuna-ka chinkwu-tul-ul manna-ss-da.
Succeed-comp young-comp man-poss older.sister-nom friend-PL-acc meet-past-dec
(11) Wus-koiss-nun wuswuha-n haksayng-uy citokyoswu-ka yiyaki-lul sicakhay-ss-da.
Smile-prog-comp excellent-comp student-poss advisor-nom talk-acc start-past-dec
(12) Ca-koiss-nun kwiyewu-n aki-uy apeci-lul mellise poa-ss-da.
Sleep-prog-comp cute-comp baby-poss father-acc from.distance see-past-dec

Appendix B

Mixed-effects regression models

L1 English productions

(1). Word duration
lmer(word_duration ~ attachment * position + (1|participant) + (1|sentence), data = engL1)
(2). Pause duration
lmer(pause_duration ~ attachment * position + (1|participant), data = engL1)
(3). Mean pitch
lmer(semitone ~ attachment * position + (1+position|participant) + (1+position|sentence), data = engL1)
(4). Mean intensity
lmer(intensity ~ attachment * position + (1+position|participant) + (1+position|sentence), data = engL1)
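
As a rough illustration of how a specification shaped like model (1) could be fitted, the sketch below runs lmer from the lme4 package (listed in the references) on simulated data. The data frame toy, the attachment levels "high" and "low", the position level "NP2", the sample sizes, and every numeric value are assumptions made for the example. None of them come from the study's data.

```r
# Minimal sketch: fit a model with the same shape as (1) on SIMULATED data.
# Column names follow the formulas above; factor levels, sample sizes, and
# all numeric values are hypothetical and not taken from the study.
library(lme4)

set.seed(1)
toy <- expand.grid(
  participant = factor(1:10),
  sentence    = factor(1:12),
  attachment  = factor(c("high", "low")),
  position    = factor(c("NP1", "NP2", "RC"))
)

# Simulated response: grand mean + by-participant and by-sentence
# random intercepts + residual noise (no true fixed effects).
by_participant <- rnorm(10, sd = 20)
by_sentence    <- rnorm(12, sd = 10)
toy$word_duration <- 300 +
  by_participant[as.integer(toy$participant)] +
  by_sentence[as.integer(toy$sentence)] +
  rnorm(nrow(toy), sd = 30)

# Fixed effects: attachment, position, and their interaction.
# Random effects: intercepts for participant and for sentence.
fit <- lmer(word_duration ~ attachment * position +
              (1 | participant) + (1 | sentence), data = toy)
summary(fit)
```

Loading lmerTest (also in the references) in place of lme4 would add p-values to the coefficient table printed by summary(); that substitution is a suggestion, not something the listing above specifies.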

Korean-English bilingual speakers’ L1 Korean productions

(1). Word duration
lmer(word_duration ~ attachment * relevel(position, ref = "RC") + (1|participant) + (1|sentence), data = korL1)
(2). Pause duration
lmer(pause_duration ~ attachment * relevel(position, ref = "After RC") + (1+position|participant) + (1|sentence), data = korL1)
(3). Mean pitch
lmer(semitone ~ attachment * relevel(position, ref = "NP1") + (1|participant) + (1|sentence), data = korL1an)
(4). Mean intensity
lmer(intensity ~ condition * relevel(position, ref = "NP1") + (1|participant) + (1+position|sentence), data = korL1)
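
The Korean models wrap position in relevel(), which changes only which factor level serves as the reference. The base-R sketch below shows the effect on a toy factor. The level names "NP1" and "RC" appear in the formulas above, while "NP2" and the vector itself are hypothetical.

```r
# Sketch of what relevel() changes: only the reference (first) level of a factor.
# "NP2" and the toy vector are hypothetical; "NP1" and "RC" appear in the models.
position <- factor(c("NP1", "NP2", "RC", "NP1", "RC"))
levels(position)                   # "NP1" "NP2" "RC"  (alphabetical default)

position_rc <- relevel(position, ref = "RC")
levels(position_rc)                # "RC"  "NP1" "NP2"

# Under R's default treatment contrasts, the contrast columns are dummy
# variables for the non-reference levels, so the model intercept and the
# other predictors' main effects are evaluated at the reference level.
colnames(contrasts(position))      # "NP2" "RC"
colnames(contrasts(position_rc))   # "NP1" "NP2"
```

With an attachment * position interaction, the attachment coefficient is therefore the attachment effect at the chosen reference position, which is presumably why each model sets the reference level explicitly.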

Comparisons between L1 English productions and Korean-English bilingual speakers’ L2 English productions

(1). Word duration
lmer(word_duration ~ attachment * position * group + (1|participant), data = eng_merged)
(2). Pause duration
lmer(pause_duration ~ attachment * position * group + (1|participant), data = eng_merged)
(3). Mean pitch
lmer(semitone ~ attachment * position * group + (1|participant), data = eng_merged)
(4). Mean intensity
lmer(intensity ~ attachment * position * group + (1|participant), data = eng_merged)

Korean-English bilingual speakers’ L2 English productions

(1). Word duration
lmer(word_duration ~ attachment * position + (1+position|sentence), data = engL2)
(2). Pause duration
lmer(pause_duration ~ attachment * position + (1+condition|participant) + (1|sentence), data = engL2)
(3). Mean pitch
lmer(semitone ~ attachment * position + (1|participant) + (1|sentence), data = engL2)
(4). Mean intensity
lmer(intensity ~ attachment * position + (1|participant) + (1|sentence), data = engL2)

References

Allbritton D. W., McKoon G., Ratcliff R. (1996). Reliability of prosodic cues for resolving syntactic ambiguity. Journal of Experimental Psychology: Learning, Memory, and Cognition, 22, 714–735.
Aoyama K., Flege J. E., Guion S. G., Akahane-Yamada R., Yamada T. (2004). Perceived phonetic dissimilarity and L2 speech learning: The case of Japanese /r/ and English /l/ and /r/. Journal of Phonetics, 32, 233–250.
Backman N. (1979). Intonation errors in second-language pronunciation of eight Spanish-speaking adults learning English. Interlanguage Studies Bulletin, 4(2), 239–265.
Baek H., Yun J. (2018). Prosodic disambiguation of syntactically ambiguous phrases in Korean. MIT Working Papers in Linguistics, 88, 89–100.
Barr D. J., Levy R., Scheepers C., Tily H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68, 255–278.
Bates D., Mächler M., Bolker B. M., Walker S. C. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48.
Beckman M. E., Pierrehumbert J. B. (1986). Intonational structure in Japanese and English. Phonology Yearbook, 3, 255–309.
Boersma P., Weenink D. (2019). Praat: Doing phonetics by computer [computer program] (version 6.1.07). http://www.praat.org
Cho T., Kim J., Kim S. (2013). Preboundary lengthening and preaccentual shortening across syllables in a trisyllabic word in English. Journal of the Acoustical Society of America, 133(5), EL384–EL390.
Clifton C., Carlson K., Frazier L. (2002). Informative prosodic boundaries. Language and Speech, 45(2), 87–114.
Cooper W. E., Paccia-Cooper J. (1980). Syntax and speech. Harvard University Press.
Dussias P. E. (2003). Syntactic ambiguity resolution in L2 learners: Some effects of bilinguality on L1 and L2 processing strategies. Studies in Second Language Acquisition, 25, 529–557.
Flege J. E. (1995). Second language speech learning: Theory, findings, and problems. In Strange W. (ed.), Speech perception and linguistic experience: Issues in cross-language research (pp. 233–277). York Press.
Fodor J. D. (1998). Learning to parse? Journal of Psycholinguistic Research, 27(2), 285–319.
Fodor J. D. (2002). Prosodic disambiguation in silent reading. In Hirotani M. (ed.), Proceedings of the Northeast Linguistic Society 32 (pp. 112–132).
Gee J. P., Grosjean F. (1983). Performance structures: A psycholinguistic and linguistic appraisal. Cognitive Psychology, 15(4), 411–458.
Hirschberg J., Avesani C. (1997). The role of prosody in disambiguating potentially ambiguous utterances in English and Italian. In Botinis A., Kouroupetroglou G., Carayannis G. (eds.), Intonation: Theory, models and applications (pp. 189–192). ESCA.
Jackson C. N., O’Brien M. G. (2011). The interaction between prosody and meaning in second language speech production. Unterrichtspraxis, 44(1), 1–11.
Jun S.-A. (2003). Prosodic phrasing and attachment preferences. Journal of Psycholinguistic Research, 32(2), 219–249.
Jun S.-A. (2005a). Intonational phonology of Seoul Korean revisited. UCLA Working Papers in Phonetics, 104, 14–25.
Jun S.-A. (2005b). Korean intonational phonology and prosodic transcription. In Jun S.-A. (ed.), Prosodic typology: The phonology of intonation and phrasing (pp. 9–54). Oxford University Press.
Jun S.-A. (2014). Prosodic typology: By prominence type, word prosody, and macro-rhythm. In Jun S.-A. (ed.), Prosodic typology II: The phonology of intonation and phrasing (pp. 520–539). Oxford University Press.
Jun S.-A., Bishop J. (2015a). Priming implicit prosody: Prosodic boundaries and individual differences. Language and Speech, 58(4), 459–473.
Jun S.-A., Bishop J. (2015b). Prominence in relative clause attachment: Evidence from prosodic priming. In Frazier L., Gibson E. (eds.), Explicit and implicit prosody in sentence processing: Studies in honor of Janet Dean Fodor (Studies in Theoretical Psycholinguistics, Vol. 46, pp. 217–240). Springer.
Jun S.-A., Oh M. (1996). A prosodic analysis of three types of wh-phrases in Korean. Language and Speech, 39(1), 37–61.
Kang S., Speer S. R. (2003). Prosodic disambiguation of syntactic clause boundaries in Korean. In Garding G., Tsujimura M. (eds.), Proceedings of the 22nd West Coast Conference on Formal Linguistics (pp. 259–272).
Kim M.-J. (2002). Does Korean have adjectives? MIT Working Papers in Linguistics, 43, 71–89.
Kim Y.-K. (1997). Agreement phrases in DP. UCL Working Papers in Linguistics, 9, 1–24.
Klatt D. H. (1975). Vowel lengthening is syntactically determined in a connected discourse. Journal of Phonetics, 3, 129–140.
Kraljic T., Brennan S. E. (2005). Prosodic disambiguation of syntactic structure: For the speaker or for the addressee? Cognitive Psychology, 50, 194–231.
Kuznetsova A., Brockhoff P. B., Christensen R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26.
Lee A., Xu Y. (2012). Revisiting focus prosody in Japanese. In Ma Q., Ding H., Hirst D. (eds.), Speech prosody 2012 (pp. 274–277).
Lee E.-K., Watson D. G. (2011). Effects of pitch accents in attachment ambiguity resolution. Language and Cognitive Processes, 26(2), 262–297.
Lee Y., Wang B., Chen S., Adda-Decker M., Amelot A., Nambu S., Liberman M. (2015). A crosslinguistic study of prosodic focus. 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 4754–4758.
Lee Y.-c., Xu Y. (2010). Phonetic realization of contrastive focus in Korean. Speech Prosody 2010, paper 033, 1–4.
Mennen I. (2015). Beyond segments: Towards a L2 intonation learning theory. In Herment S., Delais-Roussarie E., Avanzi M. (eds.), Prosody and language in contact: L2 acquisition, attrition and languages in multilingual situations (pp. 171–188). Springer.
O’Brien M., Gut U. (2010). Phonological and phonetic realisation of different types of focus in L2 speech. In Wrembel M., Kul M., Dziubalska-Kolaczyk K. (eds.), Achievements and perspectives in SLA of speech: New sounds 2010 (pp. 205–215). Peter Lang.
O’Brien M. G., Jackson C. N., Gardner C. E. (2014). Cross-linguistic differences in prosodic cues to syntactic disambiguation in German and English. Applied Psycholinguistics, 35(1), 27–70.
Pierrehumbert J. B. (1980). The phonology and phonetics of English intonation (Unpublished doctoral dissertation). Massachusetts Institute of Technology.
Pierrehumbert J. B., Hirschberg J. (1990). The meaning of intonational contours in the interpretation of discourse. In Cohen P. R., Morgan J., Pollack M. E. (eds.), Intentions in communication (pp. 271–311). MIT Press.
Price P., Ostendorf M., Shattuck-Hufnagel S., Fong C. (1991). The use of prosody in syntactic disambiguation. Journal of the Acoustical Society of America, 90(6), 2956–2970.
R Core Team. (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing. http://www.R-project.org/
Riazantseva A. (2001). Second language proficiency and pausing: A study of Russian speakers of English. Studies in Second Language Acquisition, 23(4), 497–526.
Schafer A. J., Carter J., Clifton C. Jr., Frazier L. (1996). Focus in relative clause construal. Language and Cognitive Processes, 11, 135–163.
Schafer A. J., Speer S. R., Warren P. (2005). Prosodic influences on the production and comprehension of syntactic ambiguity in a game-based conversation task. In Trueswell J. C., Tanenhaus M. K. (eds.), Approaches to studying world-situated language use: Psycholinguistic, linguistic, and computational perspectives on bridging the product and action tradition (pp. 209–225). MIT Press.
Shen X. S. (1993). The use of prosody in disambiguation in Mandarin. Phonetica, 50, 261–271.
Snedeker J., Trueswell J. (2003). Using prosody to avoid ambiguity: Effects of speaker awareness and referential context. Journal of Memory and Language, 48, 103–130.
Speer S. R., Kjelgaard M. M., Dobroth K. M. (1996). The influence of prosodic structure on the resolution of temporary syntactic closure ambiguities. Journal of Psycholinguistic Research, 25(2), 249–271.
Straub K. (1996). Prosodic cues in syntactically ambiguous strings: An interactive speech planning mechanism. In The 4th International Conference on Spoken Language Processing (pp. 1640–1643).
Trofimovich P., Baker W. (2006). Learning prosody and fluency characteristics of second language speech: The effect of experience on child learners’ acquisition of five suprasegmentals. Applied Psycholinguistics, 28, 251–276.
Turk A. E., Shattuck-Hufnagel S. (2007). Multiple targets of phrase-final lengthening in American English words. Journal of Phonetics, 35, 445–472.
Venditti J. J. (1994). The influence of syntax on prosodic structure in Japanese. OSU Working Papers in Linguistics, 44, 191–223.
Vigário M. (2003). Prosody and sentence disambiguation in European Portuguese. Catalan Journal of Linguistics, 2, 249–278.
Wagner M., Watson D. G. (2010). Experimental and theoretical advances in prosody: A review. Language and Cognitive Processes, 25(7–9), 905–945.
Wightman C. W., Shattuck-Hufnagel S., Ostendorf M., Price P. J. (1992). Segmental durations in the vicinity of prosodic phrase boundaries. Journal of the Acoustical Society of America, 91(3), 1707–1717.
Wu W. L., Xu Y. (2010). Prosodic focus in Hong Kong Cantonese without post-focus compression. Speech Prosody 2010, paper 040, 1–4.
Yang P.-L. (2010). English language proficiency and production of prosodic disambiguation: A preliminary study of Taiwanese English learners. English Language & Teaching, 34(3), 51–84.