Ranking Competencies of Oral Output: A Unit of Analysis for Low-Proficient L2 Speakers

The evaluation of spoken language requires a rigid structure to make the analysis reliable; however, it is challenging to set criteria that work for different types of speech samples. Spoken language may be evaluated from different perspectives, depending on what one looks at (e.g., intelligibility, or complexity, accuracy, and fluency). The former tends to involve a holistic evaluation of the spoken data, whereas the latter involves numeric measurements. Although both analyse oral data, they do not seem to coexist in the same research field. Moreover, the research tools in both fields do not cater for evaluating data produced by speakers of low proficiency: once a speech sample is labelled "low", there are no additional classifications for further analysis. This paper, therefore, attempts to create such further categories. Intelligibility and existing methods for analysing spoken data in Second Language Acquisition (SLA) are reviewed to see whether there are overarching themes in teaching, assessing, and analysing spoken language. Some of the issues from intelligibility and SLA research are then delineated for designing a unit of analysis. Finally, the paper proposes the hierarchical C-unit, which is designed to deal with oral data produced by low-proficient speakers.

the locations of syllables differently from those of the Singapore-English-speaking participants. Also, the Singaporean speakers showed sensitivity to their own English variety in differentiating stress, which was not seen in speakers from other regions. Tan maintains that Singapore English speakers should be regarded as native speakers of their own English variety, rather than being categorised as non-native speakers of the English used in the Inner Circle. Interestingly, a study of perception by Rivers (2011) reported different results.
In Rivers's study, 48 Japanese university students' attitudinal responses towards 10 speakers of English with different accents (1 Japanese speaker, 7 non-native speakers from different Asian countries, and 2 native speakers from the US and the UK) were examined. He found that more than half of the Japanese participants thought that the Japanese speech sample was spoken by an American speaker, the American sample by a British speaker, and the Chinese sample by a Japanese speaker. Unlike Tan's Singaporean study, the Japanese participants did not show any sensitivity towards Japanese-accented English. Rivers also revealed that the Japanese participants valued the varieties of English represented by the native English speakers highly; on the other hand, the varieties of English spoken with Asian accents were perceived negatively. From his study, Rivers draws implications for English language teaching in Japan, suggesting that Japanese English learners are trapped in an inferiority complex. This complex is driven by a failure to achieve the unrealistic standard of native-like English imposed by their government, which prevents them from being open to the true diversity of the English language. The negative view of Asian-accented English varieties among Japanese learners needs to change before they can embrace the realistic goal of English as an international language. These types of research can certainly provide great insight into how intelligible speakers of the Expanding Circle are to other non-native speakers or to native speakers. Only recently have studies like Tan's (2015) and Rivers's (2011) started to examine the standpoint of non-native speakers of English; however, they have not yet suggested methods for how non-native speakers should study in order to become more confident in their English as a Lingua Franca.
These studies represent the very situation that Berns (2008) describes in terms of ELF usage: non-native English speakers need to be confident in their own accents and not let them become an obstacle in communication. Since traditional studies in intelligibility inherently focus on the differences between native and non-native accents and are still prevalent, as in Anderson-Hsieh, Johnson and Koehler's (1992) study, it seems that the ELF-WE divide will not dissolve anytime soon. Therefore, it will take a while to set a standard for determining what is intelligible and what is not from the ELF perspective. Another problem with incorporating the view of ELF into the classroom is that there is no universal definition of how intelligibility can be measured (Pickering, 2006). Nevertheless, since intelligibility is regarded as a prerequisite for successful communication (Sewell, 2010), constructing the means to evaluate and provide training for English learners to improve their intelligibility seems a logical step to take in the field of English as a Second or Foreign Language (ESL/EFL) teaching.
www.scholink.org/ojs/index.php/selt Studies in English Language Teaching Vol. 5, No. 3, 2017, Published by SCHOLINK INC.

Intelligibility and Instruction
According to Jenkins (2004), more and more teaching materials include non-native speakers in order to expose learners to a variety of accents and to emphasise that English is an international language. However, the English spoken by non-native speakers in these materials is intentionally made intelligible. Jenkins states that this is more about raising teachers' awareness of English as an international language than about teaching students accented pronunciation. Despite this recent change in English instruction, many teachers and learners still prefer to target a native-like form of English, even though they also believe that learners should achieve international intelligibility and that accented English is acceptable. Munro and Derwing (1995), in fact, found that accented English spoken by Cantonese, Japanese, Mandarin, Polish and Spanish speakers did not influence the comprehensibility of speech. Similarly, in a later study (Munro & Derwing, 1999), 10 proficient Mandarin speakers and two native speakers of English were evaluated for intelligibility and comprehensibility. They again found that foreign accent did not interfere with comprehensibility. They further found that speakers who made grammatical mistakes tended to have pronunciation errors as well. Reductions in intelligibility, therefore, may stem from grammatical errors rather than from foreign accents. Orikasa (2016) conducted a study on how Japanese speakers perceived the intelligibility of speakers from different backgrounds (one female and one male speaker each from the US, China, Korea, and Vietnam). The researcher found that speech rate was the main contributor to unintelligibility. The speakers with the lowest intelligibility were the female US speaker and the male Vietnamese speaker. Orikasa concludes that language pedagogy needs to be in line with the current diversity of English users.
Finally, Thomson and Derwing (2015) reviewed papers on pronunciation training given in the classroom and in computer-assisted pronunciation teaching modes. While pronunciation training can be highly effective in the classroom setting, the time constraint was the biggest problem. The computer-assisted mode, on the other hand, has an advantage in this respect: learners can access pronunciation training as often as they want. According to Thomson and Derwing, the studies on pronunciation training lacked ecological validity because each study focused on only limited aspects of pronunciation (e.g., improvements on one or two phonemes); therefore, the results need to be interpreted with caution. Drawing on these papers on intelligibility, even the stakeholders in English education hold mixed opinions, and these conflicting views seem to be manifested in the direction of English teaching and research, which is still contemplating the target outcome of English education.

Assessment
Smith and Rafiqzad (cited in Berns, 2008, p. 333) suggest that "a true evaluation of speakers' comprehensibility should be judged by both native and non-native speakers", and this applies to the evaluation of native speakers as well.
Regarding assessment related to English as a Lingua Franca and World Englishes, Elder and Davies (2006) proposed two models for ELF. According to them, ELF is understood in several ways: 1) English is used where some of the participants are non-native speakers, 2) English is used where all the participants are non-native speakers and come from different language backgrounds, and 3) English is used where norms of intelligibility have yet to be established.

Analysing Spoken Data in Applied Linguistics
In the field of applied linguistics and SLA, oral output has been analysed from numerous aspects. As in the intelligibility research mentioned in the previous section, pronunciation is well researched. Fluency is another aspect of speech that has been widely looked into (e.g., Lennon, 1990; Derwing, Munro, Thomson, & Rossiter, 2009; Bosker, Pinget, Quené, Sanders, & de Jong, 2012). Lennon (1990) suggests two senses of fluency in an EFL context: a broad sense and a narrow sense.
The former refers to global language performance and is often the highest point of a language proficiency scale, whereas the latter points to a component of oral proficiency among other factors, such as grammar and vocabulary knowledge. Apart from analyses of pronunciation and fluency, one of the most prominent methods of analysing language data is the Complexity, Accuracy, and Fluency (CAF) framework, which looks at grammatical complexity, grammatical accuracy and fluency in speech and has been employed by many researchers (e.g., Yuan & Ellis, 2003; Iwashita, Brown, McNamara, & O'Hagan, 2008; Skehan & Foster, 2008). CAF may be employed in many language studies; however, it is not without issues. Norris and Ortega (2009) summarise how complexity was measured in different studies.
In their paper, they tabulated the units of analysis used in different studies (1979 to 2007) that employed CAF. They found that each study used different methods of measuring complexity; that is, different units of analysis were found, and some of them were not used in SLA. They then focused on 16 more recent studies, from 2001 to 2008, to find out how complexity was measured. Compared to the earlier studies, the newer studies employed relatively similar methods to one another, but even so, three main units of analysis (T-unit, C-unit, and AS-unit) were found in these papers.

Unit of Analysis for Spoken Data
In Foster, Tonkyn and Wigglesworth's (2000) study, units for measuring spoken language were investigated. They found that more than half of the articles (44 out of 87) that they reviewed in four major academic journals in the Applied Linguistics and SLA fields provided no definition of their unit of measurement. The authors also found that the definitions of the units provided in the other half of the articles varied greatly in how much detail was given. Three loosely categorised groups of units are presented in their paper: mainly semantic units, mainly intonational units, and mainly syntactic units. Under "mainly semantic units", for instance, three different studies are categorised, but they all use different names and definitions. The definitions in this category rarely stand alone and are usually accompanied by grammatical and intonational criteria, due to the difficulties of relying only on semantic properties. Regarding the "mainly intonational units" category, the authors mention that the studies listed under it used problematic methods; in particular, intonation and pauses may not function as unit boundaries when measuring samples produced by L2/FL or dysfluent speakers, who can have a wide variety of speech patterns. In the "mainly syntactic units" category, a unit is defined as a sentence, idea unit, T-unit, or C-unit. The first definition, the sentence, is problematic especially in spoken language, though the authors do not elaborate on this matter. The idea unit is defined as a clause constructed by pre- and post-verb elements. The most well-known unit, the T-unit, is used for both spoken and written data and is generally defined as an independent clause together with any dependent clauses attached to it. This is, however, unworkable when data from dysfluent speakers are analysed, because such data contain many incomplete clauses. The C-unit seems to be the solution to the problem found in using the T-unit.
For example, in Iwashita's (2001) definition of the C-unit, grammatical predications or a meaningful word that serves as an answer to a question, such as "yes" or "no", are regarded as C-units. Rimmer (2010) states that the core of the problem is deciding what is grammatical or not in measuring grammatical complexity. For instance, he gives an example with a phrase like "How awful he was!", which can be expressed as "How awful!" (p. 501) in an interaction; however, a problem arises when this phrase is analysed for grammatical correctness: should the shorter utterance be treated as grammatical or ungrammatical? If the T-unit is used for analysing this phrase, it would not be regarded as grammatical; the C-unit, on the other hand, would allow it to be counted as one unit. Finally, Foster, Tonkyn and Wigglesworth (2000) proposed the AS-unit, a syntactic unit for segmenting oral discourse, which is defined as a single speaker's unit consisting of an independent clause or sub-clausal unit, together with any subordinate clauses. They claim that, unlike the T-unit, the AS-unit allows C-unit-like utterances that may not include a verb.
These units are, however, often unsuitable for dealing with data collected from dysfluent or low-proficient L2 users, because once the data falls under the threshold of "low" or "extremely low", no further analysis is conducted. This paper concerns precisely this type of data: a new unit of analysis capable of analysing the speech samples of low-proficient speakers is needed. It is, therefore, important to analyse the C-unit further, because low-proficient speakers often produce a series of C-units in order to put their meaning across to their interlocutors. This would help us understand the characteristics of the many low-proficient and dysfluent speakers who populate language classrooms in the Expanding Circle (e.g., Japan).
In an attempt to create a new unit of analysis, it makes sense to revisit some of the issues discussed in the previous sections, especially those related to intelligibility. The following is a list of the main issues concerning intelligibility discussed earlier.
• Foreign accents are not the major contributors to unintelligible speech.
• Grammatical errors may lead to pronunciation errors.
• English as a second or foreign language speakers need to be confident in their pronunciation.
• Native speakers are not often the most intelligible speakers to non-native speakers, and non-native speakers may be more intelligible to other non-native speakers.
• Teachers and learners both prefer native-like oral output.
• Pronunciation training is effective but current studies lack ecological validity.
• Both teachers and learners should acknowledge a diverse variety in English.
From the list above, not all the components of CAF may need to be incorporated for very limited users of English. Complexity may be analysed by looking at C-units, rather than T-units or AS-units.
Accuracy may be determined by looking at grammatical features in speech samples and coherence in an utterance. As the C-unit is already defined as a smallest meaningful word or phrase, determining whether a C-unit is grammatical or ungrammatical could be a good way to grasp the levels of low-proficient speakers. Fluency may be determined by the number of words uttered in a certain time frame. However, low-proficient speakers may have trouble uttering a sentence or a C-unit at a speech rate fast enough for it to be captured as one chunk, because they often have long pauses between words or even within a word. Pronunciation may also be examined; however, it is challenging to assess the pronunciation of already dysfluent speakers, and what is acceptable at this level may primarily be a matter of raters' preference. Therefore, this component may not be as important as syntactic accuracy. Based on these assumptions, I propose the hierarchical C-unit for analysing spoken data collected from speakers of English with low proficiency.

Hierarchical C-unit
The current units for analysing speech data are not able to handle speech samples uttered by very low-proficient speakers. The unit I have adopted to create the hierarchical C-unit (HC-unit) is based on the C-unit (see Iwashita, 2001), which can also be defined as an independent clause with its modifiers (e.g., subordinate clauses) that cannot be further divided into smaller units without obscuring the basic message. Because a C-unit is usually considered the smallest meaningful word, phrase, or sentence and can be either grammatical or ungrammatical, erroneous utterances can be analysed as long as they make sense in the context in which they are spoken. However, it is important to analyse C-units beyond simply counting the number of units in an utterance of a certain length.
For teachers and researchers who usually work with L2 or FL speakers with very limited oral skills, it is beneficial to know the type and quality of learners' C-units. The HC-unit was designed for spoken data collected from English speakers in the Expanding Circle, because this is where a large number of speakers with low English proficiency are found. The speech samples I refer to in this paper are therefore those of speakers of English as a foreign language, rather than those of speakers of English as a second language.

The Structure of the Hierarchical C-unit
This section explains the structure of the HC-unit and demonstrates how it can be used to analyse actual data. The following figure (Figure 1) exhibits the structure of the HC-unit. The top tier indicates a C-unit, which can be broken down into two main categories: Grammar related (G) and Vocabulary related (V). If a C-unit does not contain any errors, the categorisation ends here. However, if the C-unit contains errors, the errors can be categorised at two different levels. If an error is not serious and does not confuse the message to be put across, it can be placed under G1 or V1. If the error is more serious and confuses the message to be conveyed, it can be sorted under G2 or V2-1/V2-2. It is common for language learners to choose incorrect words or expressions, or to exhibit difficulties in expressing themselves; V2-1 and V2-2 differentiate these instances. V2-1 may be used when a speaker chooses a wrong word or phrase for the context, whereas V2-2 covers situations in which a speaker replaces an L2 word with a word from his/her native language or produces only a partial word/phrase. In other methods of analysing oral data (e.g., CAF), analysing the complexity of this kind of speech by calculating mean length of utterance, T-units, or C-units is extremely challenging. The hierarchical C-unit's categories (G or V) and severity levels (1 or 2) can instead provide some basic information about accuracy and complexity.
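As a rough illustration of the hierarchy just described, the categories and their severity levels can be expressed as a small data structure that tallies the error tags attached to a C-unit. This is a hypothetical sketch for readers who wish to automate the bookkeeping; it is not part of the HC-unit framework itself, and the tag names simply follow the labels used in this paper.

```python
# Hypothetical sketch of the HC-unit hierarchy described above.
# The category names (G1, G2, V1, V2-1, V2-2) follow the paper;
# the code itself is an illustration, not the author's tooling.

from collections import Counter

# Severity level 1: the error does not obscure the message.
# Severity level 2: the error obscures the message.
HC_CATEGORIES = {
    "G1": ("grammar", 1),       # minor grammatical error
    "G2": ("grammar", 2),       # serious grammatical error
    "V1": ("vocabulary", 1),    # minor word-choice error
    "V2-1": ("vocabulary", 2),  # wrong word/phrase for the context
    "V2-2": ("vocabulary", 2),  # L1 substitution or partial word/phrase
}

def summarise(tags):
    """Count the error tags attached to one C-unit by (kind, level)."""
    counts = Counter()
    for tag in tags:
        kind, level = HC_CATEGORIES[tag]
        counts[(kind, level)] += 1
    return counts

# Example a) below carries the tags G1, V1, G1:
# two minor grammar errors and one minor vocabulary error.
print(summarise(["G1", "V1", "G1"]))
```

Such a tally mirrors what a human coder records when applying the HC-unit by hand; the classification of each error, of course, still requires the coder's judgement.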
The spoken data used in these samples were collected from Japanese university students with science and engineering backgrounds, who attended a first-year compulsory English course. The speakers were all regarded as low-proficient speakers of English, as their TOEIC scores ranged from 180 to 300, equivalent to CEFR A1 to A2. There are a few ways of collecting samples from FL speakers: recordings can be obtained through interviews, monologues, dialogues, or tasks.
The following samples were collected through a task set as part of a course assignment. The learners listened to interviews about how people in the UK feel about Christmas on BBC Learning English (2013). The BBC provides downloadable audio files and transcripts for ESL/EFL learners. The audio file was uploaded to the university's Learning Management System (LMS), together with the task instructions. The LMS allowed the administrator to set the maximum number of times each student could listen to the uploaded materials. In the task, the learners were asked to record their own opinions about the interviews with the three British speakers and upload the MP3 files to the LMS. This is an easy way to collect speech samples; however, it is hard to control the length of the speech, because some speakers may only manage to speak one sentence or phrase while others may be able to utter a few sentences. There is always a risk of students reading from prepared scripts; however, it is highly unlikely that speakers at TOEIC 180-300 could suddenly speak like higher-level speakers. The topic of the interview or task can also greatly influence the length of speech, and an interlocutor's FL proficiency may deeply affect the outcome of the samples, too. This is because speakers with low proficiency usually have very limited vocabulary knowledge, and depending on their history of learning the target language, the range of vocabulary they can recognise and use may vary. Similarly, learners at this level are often unaware of the mistakes their interlocutor makes, and when two low-proficient speakers engage in a dialogue, they often copy each other's expressions. Therefore, as mentioned in Elder and Davies's (2006) paper, it is important to choose an experienced interlocutor for assessment, and this applies to data collection as well.

In the following examples, a one-second pause is represented by (.), and numbers in brackets indicate the actual length of longer pauses in seconds. A vertical slash | indicates a C-unit boundary. Each new C-unit is shown on a new line, and parts that are not counted as a C-unit are attached to the end of the previous C-unit. The hierarchical categories are given in < >. Correct (or acceptable) C-units are not annotated.
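Given these conventions, an annotated transcript can also be segmented mechanically. The sketch below is hypothetical tooling, not part of the framework: assuming the notation above is applied consistently, it splits an utterance into C-units at the | boundaries and collects the < > category tags found inside each unit.

```python
import re

def parse_utterance(text):
    """Split an annotated transcript into C-units (delimited by |)
    and collect the <...> category tags found inside each unit."""
    units = []
    for chunk in text.split("|"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip empty fragments between boundaries
        tags = re.findall(r"<([^>]+)>", chunk)
        units.append({"text": chunk, "tags": tags})
    return units

sample = "|I agree with woman <G1> (.) because her opinion is interesting, very compact <V1> and easy to (.) image <G1> Christmas|"
for unit in parse_utterance(sample):
    print(unit["tags"])
```

A script like this can only do the mechanical splitting; deciding where C-unit boundaries fall and which tag each error deserves remains the coder's task.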
1. Utterance with no serious mistakes: G1 and V1

a) 1 C-unit, 3 mistakes

|I agree with woman <G1> (.) because her opinion is interesting, very compact <V1> and easy to (.) image <G1> Christmas|

Example a) contains one C-unit. The first half of this C-unit can be analysed as G1 because the missing article "the" does not affect the meaning. The subordinate clause contains V1 [very compact] and G1 [image].
b) 1 C-unit, 3 mistakes

|I agree with the second man because as for <G1> the time to spend with a family <G1>, it is important for anyone <V1>|

This sample was a little more fluent than the other samples, but it was short. The expression "because as for…" is awkward and may be labelled <G1>; however, the error is not serious and the meaning is still clear. The V1 label indicates that "anyone" is a wrong word choice, which, again, is not serious.

2. Utterance with a serious grammatical mistake: G2

c) 5 C-units, 4 mistakes

|I think that I agree with their opinion <G1> because I (2) alway [always] happy <G2>| (6) I (2) |I think that I like Christmas| (2) |Christmas is events <G1>| |I like events| (2) |Events is very (3) good <G1>|

The first C-unit contains an example of G2. The speaker probably wanted to say that he or she is always happy when Christmas day comes; however, the coder or evaluator needs to deduce the meaning from the context. Example d) has two long sentences. In the first sentence, the speaker says he agrees with "a woman" and then goes on to elaborate the reason by saying "three of us" <V2-1>. The coder can only guess whether the speaker wanted to say "three of them" (the British interviewees), "those three opinions", or something different. Also, "are appropriate" makes it more complicated, because it is unclear what it is referring to.
In the following section, analyses of different samples are demonstrated. These data were obtained from a similar type of student: Science and Engineering undergraduates. The task was not a part of their assessment; the students provided their speech samples as research participants in a different study. This was a timed speech task in which students were asked to describe pictures in as much detail as they could. The examples show three different students describing the same picture. The amount of information they put into their 30-second utterances differs. In e), "she is a beauty" may not be a mistake; however, considering the level of their target language, it is unlikely that the sentence was uttered in the same way a more advanced speaker would utter it. None of the C-units contained serious mistakes; however, this method of data collection can limit speakers to short sentences. The three examples h) to j) show some typical utterances recorded from speakers with higher proficiency than the other students. Note that these students were still in the basic-level class, with TOEIC scores under 300. In h) and i) the speaker mixed up gender usage; however, because the task was picture description, this helped the coder understand whom the speaker was referring to. If the speaker had made the same mistake in another task, it would have been confusing and the meaning could have been obscured. In i), the fourth C-unit, "Comedy", is a valid example of a fragmented phrase that can be regarded as a C-unit. Similarly to i), the last C-unit in j) is fragmented; sometimes, when speakers get excited while speaking, they talk to the pictures, as in this example. These utterances were produced within 30 seconds; the speakers also spoke each word more slowly than in the previous examples and had longer pauses and more self-corrections, which resulted in fewer C-units.

In this data set, because the time for each descriptive task was fixed, the number of C-units within the time limit is a good indication of how fluent the speakers were. However, it is difficult to rank speakers with very different speaking styles, as in k) and m). In k), for example, the speaker uttered 5 C-units with multiple errors, while m) produced only 2 C-units with many long pauses. In terms of complexity, k) used only the present tense, with many mistakes, whereas m), despite uttering only two C-units, produced them without obvious mistakes and in the past tense, which is considered slightly more advanced than the simple present. When comparing utterances like these, researchers need to be clear about what they are looking at. If fluency or the amount of information per utterance is more important, k) may be the better speaker; however, if syntax is more important, as discussed earlier in the intelligibility section, then m) may score higher.
In this paper, I have introduced the hierarchical C-unit with two different types of data, both collected from learners with TOEIC scores of 180 to 300. As of 2016, the mean TOEIC score in Japan was 516, which ranked 41st out of 49 countries worldwide (ETS, 2016). The country with the lowest mean score was Indonesia, with 397. The data dealt with in this paper therefore came from speakers scoring even lower than the lowest national mean, which supports the choice of this data for presenting the HC-unit. The two sample data sets showed that different methods of collecting data bring out totally different types of utterances, even though both sets were collected from EFL learners in the same faculty of the same university, with very similar TOEIC scores.
There are a few issues to consider when evaluating and analysing the spoken data and intelligibility of low-proficient speakers. For instance, speaking with correct syntax may be important, given that foreign accents do not affect intelligibility. However, what is the difference between a foreign accent and mispronunciation? Example d) was spoken with a very heavy accent. The words "think" and "possible" may acceptably be pronounced as "shink" and "posshibulu"; but what about "aplipliate" and "laber"? As reported in Tan's (2015) paper, Singapore English speakers are sensitive towards their own pronunciation, whereas in Rivers's (2011) study, the Japanese students perceived the speech sample spoken by a Japanese speaker as American English. This indicates that the Singapore accent is established and that Singapore English speakers speak in more or less the same way.
English speakers in Japan, on the other hand, are probably not used to using English in everyday situations, and therefore they do not have an identity as speakers of Japanese English. EFL speakers in Japan may show certain tendencies in their English pronunciation; however, unlike Singapore English speakers, they do not share the same norms and identity in their English, and so their pronunciation may differ greatly between speakers. The finding in Munro and Derwing's (1999) study suggested that speakers who made more syntactic errors tended to mispronounce words more frequently. I believe this is true, because the accent in Singapore English is not due to syntactic errors and its speakers know exactly what they mean. Considering that the speaker in example d) is a basic-level EFL user who made multiple syntactic errors, "aplipliate" is more likely a mispronunciation than an accent, because a language identity can only be achieved when users share the same accent and feel that it is their language. In this sense, acquiring grammatical knowledge is more important than fixing pronunciation, because learners first need to know exactly what they mean.

Conclusion
This paper has attempted to incorporate some main ideas and arguments from two research areas that are closely related but not in close collaboration into the design of a new unit for analysing spoken data produced by very limited users of English as a foreign language. The definition given for the hierarchical C-unit may not be applicable to all types of data, and researchers may need to make adjustments for the data type they have in hand. Because C-units can take so many different forms, counting the number of C-units alone is not enough when analysing lower-level learners; the quality and complexity of a C-unit are what researchers and educators should look into more closely. Transcribing and identifying C-units in a large quantity of spoken data are already time-consuming; nevertheless, systematically labelling the types of C-units, searching for errors, and ranking the severity of those errors may provide a good starting point for understanding the kind of data researchers are dealing with and for deciding how to analyse it from that point. In this respect, the two data sets have shown how differences in data format can influence their analysis, and how researchers need to adjust the way they analyse, or even their method of data collection, to ensure that they obtain the information they wish to work on. Whether in the field of intelligibility research in World Englishes and English as a Lingua Franca, or in Second Language Acquisition, and whatever the method, helping language learners perfect their grammatical skills seems to be the pivotal factor in improving their overall intelligibility.