A Preliminary Review of Eye Tracking Research in Interpreting Studies: Retrospect and Prospects

The field of Interpreting Studies (IS) has witnessed a rapid increase in the development of new data-gathering techniques aimed at investigating the cognitive and psychological processes underlying interpreting. The present article provides a preliminary review of research studies that have applied eye tracking technology in IS over the past few decades. It also explores the theoretical basis for different applications of eye tracking equipment to the investigation of the cognitive processes underlying interpreting, by analyzing empirical studies related to cognitive aspects of translation. The sampled studies are analyzed in terms of their contribution to the development of eye tracking research in IS, the methodology used, and the way data are processed and presented. The article concludes with a discussion of future research, focusing on possible developments and applications of eye tracking in authentic interpreting contexts, and outlines new challenges and opportunities for unexplored applications of eye tracking in IS. It is argued that interdisciplinary approaches can reveal the full range of possibilities of eye tracking research in IS.


Introduction
The fact that eye movements may be a source of information on human cognitive processes and psychological aspects is not a discovery of modern science. In ancient China the philosopher Mencius once stated: "Look at the pupil of a man's eye. How can he conceal his character?" (Note 1), thus perceiving the pupil as the door to an inner psychological world.
The literature abounds with papers reviewing and summarizing research and empirical studies using eye tracking equipment (Carpenter, 1988; Duchowski, 2002; Jacob & Karn, 2003; Rayner, 1978, 1998, amongst others). Over the past few decades, eye tracking research has been conducted in numerous disciplines, including cognitive science, psychology, Human-Computer Interaction (HCI), marketing research, medical research (neurological diagnosis), and psycholinguistics, most notably within the visual world paradigm initiated by Tanenhaus et al. (1995). As Tanenhaus emphasized in a paper published over a decade later, "the visual world paradigm is now sometimes referred to as the action-based version of the visual world paradigm. Taking advantage of the advent of accurate lightweight head-mounted eye-trackers, [it is possible to] examine eye movements as participants follow instructions to perform simple tasks with objects in a workspace" (2007, p. 447).
Specific applications of eye tracking research include language reading, music reading, human activity recognition (Shinar, 2008) and the perception of advertising.
However, the extension of eye tracking techniques to research in Interpreting Studies (IS), perceived as an autonomous and relatively independent research field within the broader realm of Translation Studies (TS), remains largely unexplored. Most existing studies are limited to different aspects of sight translation.
The pioneers of this new direction of research were McDonald & Carpenter (1981). The two authors posited a model of interpretation, parsing, and error recovery in Simultaneous Translation (ST), which is now more accurately referred to as Simultaneous Interpreting (SI), by analyzing two expert and two amateur interpreters' eye fixations in relation to their rendition of idiomatic expressions. The results showed that the eye fixation patterns differ according to whether the idioms are interpreted literally or idiomatically. The combination of cognitive studies with Translation and Interpreting (T&I) research can lay the basis for T&I software design or T&I training (Chang, 2011).
Over the past few decades, scholars have been using eye tracking as a methodology for research on the translation process (Chang, 2011; Hyönä et al., 1995; Pavlović & Jensen, 2009; Sharmin et al., 2008).

Literature Review
Eye tracking is a way to observe and scientifically measure the point of gaze or the motion of an eye relative to the head. In other words, eye trackers enable researchers to observe eye positions and eye movements, through which they can understand the physical reactions and cognitive activities of a person performing cognitive tasks (Richardson et al., 2009; Rayner, 1998, 2009), because eye movements seem to indicate the mental processes that take place during a given cognitive task (Just & Carpenter, 1980; Rayner, 1998).
Eye movement records consist of fixations (or gazes), i.e., periods during which the eye remains relatively still on a given area or region. The Gaze Duration (GD) is defined as "the sum of all fixation duration beginning with the first fixation in the region until the eye leaves the region, regardless of direction, either to the right or left of the region boundary" (Huang, 2011, p. 49). The neuro-cognitive and psychological patterns underlying eye movements are analyzed in numerous studies focusing on aspects such as attentional focus, point-of-gaze issues, and gaze locations in eye-tracking graphs (Goldberg & Helfman, 2010; Hyrskykari, 2006; Lawrence & Eizenman, 2004; Sharmin et al., 2008).
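Huang's definition of gaze duration can be made concrete with a short computation. The Python sketch below assumes a hypothetical fixation record holding a region index (e.g. the word or phrase fixated), an onset time, and a duration in milliseconds; eye tracking software exports its own formats, and the `Fixation` type and `gaze_duration` function are names introduced here for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    region: int       # hypothetical interest-area index (e.g. a word or phrase)
    onset_ms: int     # time at which the fixation starts
    duration_ms: int  # how long the eye remained on that location


def gaze_duration(fixations: List[Fixation], region: int) -> Optional[int]:
    """Sum of fixation durations from the first fixation on `region` until the
    eye leaves the region, in either direction. Returns None if the region
    is never fixated (i.e. it was skipped)."""
    total = 0
    entered = False
    for fix in fixations:
        if fix.region == region:
            entered = True
            total += fix.duration_ms
        elif entered:
            break  # the eye has left the region: the gaze is over
    return total if entered else None


# Usage: region 1 is fixated twice in a row, left, and revisited later.
trial = [
    Fixation(0, 0, 200),
    Fixation(1, 200, 250),
    Fixation(1, 450, 180),
    Fixation(2, 630, 220),
    Fixation(1, 850, 300),
]
print(gaze_duration(trial, 1))  # 430: the later revisit (300 ms) is not counted
```

Because the revisit to region 1 is excluded, gaze duration differs from total viewing time, which would sum every fixation on the region (730 ms in this example).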
Over the past few decades, eye tracking has become one of the major research methods in cognitive psychology, insofar as it yields important information on how human beings cognitively process information.
Eye tracking research is traditionally dated to the 1950s, when Yarbus carried out major eye tracking research projects (Yarbus, 1967). Throughout the 1970s, the published papers were mainly systematic studies of reading and picture viewing.
Roughly, the history of eye tracking research can be divided into four phases (Rayner, 1998). The first phase can be considered a pre-phase, in which researchers began to discover the physiological properties of eye movements. In the second phase, eye movement research was shaped by behaviorism. The third phase saw technological advances and a consequent improvement of eye tracking techniques. In the 1970s, eye tracking research expanded rapidly, particularly studies on reading; a thorough overview of the research in this period is given by Rayner (1978). Later, Just and Carpenter (1980) formulated the influential strong eye-mind hypothesis, according to which "there is no appreciable lag between what is fixated and what is processed" (Just & Carpenter, 1980, p. 331). In other words, when a subject looks at a word or object, s/he also thinks about it (processes it cognitively). During the 1980s, however, the eye-mind hypothesis was often questioned in light of covert attention, which may be defined as attention to something that one is not looking at (Posner, 1980; Wright & Ward, 2008). This is significant in IS: in sight translation, for instance, the issue of reading ahead concerns where the interpreter's attentional focus lies in the text, which is not necessarily what he or she is looking at. An interpreter might be interpreting one segment of a sentence while attending, out of the corner of the eye, to the following fragment, even though his or her gaze is still fixed on the previous one. If the "reading-ahead" hypothesis (Agrifoglio, 2004; Sampaio, 2007; Weber, 1990) is valid, the oral production of a segment would overlap with the reading of the following segment during sight translation.
In Huang's words: "when the first fixation on a region occurs, the interpreter is either orally producing in the target language the content of a prior region, orally producing the content of the region he or she is currently fixating or not producing any content" (Huang, 2011, p. 48). The first situation would be regarded as reading ahead, while the second and third would not. In the case of covert attention, the resulting scan path and fixation patterns would show only what the eye has been looking at, not necessarily where attention has been; eye tracking data would therefore not directly indicate cognitive processing, which is an inherent limitation of this method. Rather, eye tracking provides indications of variations in processing load.
The fourth and current phase (1998-present) is characterized by a spectrum of interactional and interdisciplinary applications of eye tracking, broadening its use to fields such as music reading, driving, drawing, and the perception of advertising.
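Huang's three-way distinction between the possible states of oral production at the first fixation on a region lends itself to a simple coding rule. The Python sketch below assumes a hypothetical alignment in which each stretch of the interpreter's speech has been time-stamped and linked to the source region whose content it renders; the data format and function names are illustrative assumptions, not Huang's (2011) actual procedure.

```python
from typing import List, Optional, Tuple

# Hypothetical speech alignment: (source_region, start_ms, end_ms) means that
# between start_ms and end_ms the interpreter is voicing the content of source_region.
SpeechSegment = Tuple[int, int, int]


def region_being_spoken(speech: List[SpeechSegment], t_ms: int) -> Optional[int]:
    """Return the source region rendered orally at time t_ms, or None if silent."""
    for region, start, end in speech:
        if start <= t_ms < end:
            return region
    return None


def classify_first_fixation(region: int, onset_ms: int,
                            speech: List[SpeechSegment]) -> str:
    """Code a first fixation on `region` according to the oral output at its onset."""
    spoken = region_being_spoken(speech, onset_ms)
    if spoken is None:
        return "no output"                 # not producing any content
    if spoken < region:
        return "reading ahead"             # producing the content of a prior region
    if spoken == region:
        return "producing current region"
    return "other"                         # e.g. voicing a later region after a regression


speech = [(0, 300, 1500), (1, 1600, 2800)]
print(classify_first_fixation(1, 900, speech))   # reading ahead
print(classify_first_fixation(2, 1550, speech))  # no output
```

The "other" label is an addition of this sketch for cases that Huang's three situations do not cover. Note also that, because of covert attention, a "reading ahead" code only shows where the eye was, not necessarily where attention was.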

Eye Tracking Research and Interpreting Studies (IS)
The articles applying eye tracking to IS reviewed in the present section are the most representative studies, namely McDonald and Carpenter (1981), Tommola and Niemi (1986), Hyönä et al. (1995), and Huang (2011). More recent studies are also discussed, such as Chen (2013) and Kokanova et al. (2018). Other significant studies applying eye tracking in the field of TS, rather than specifically in IS, which is the focus of the present paper, include O'Brien (2006), among others. Over the past decade, Huang (2011) and Chen (2013) have applied eye tracking in an attempt to gain a more insightful understanding of the processes underlying comprehension in interpreting, with the specific aim "to address issues of the horizontal and vertical perspectives […] and also to explore whether [the so-called] reading ahead exists in sight translation" (Huang, 2011, p. 37), a phenomenon also referred to as "read ahead".
The studies reviewed in the present article mainly used pupil size, number of fixations, and fixation duration as eye movement measures. Other possible measures, which could also be applied to gain further insight into the process of comprehension in sight translation, include first fixation duration, gaze duration, fixation probability, refixation probability, go-past time, regress-out rate, rereading time/rate, and total viewing time. Possible future research directions are discussed in the final section.
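Several of the measures listed above can be derived from the same ordered fixation sequence. The Python sketch below computes first fixation duration, total viewing time, fixation probability, and regress-out rate; it assumes a hypothetical simplified record of (region, duration) pairs per trial, and its regress-out criterion is simplified, so it illustrates the definitions rather than any specific study's scoring procedure.

```python
from typing import List, Optional, Tuple

# Hypothetical simplified record: (region index, fixation duration in ms),
# listed in the order in which the fixations occurred within one trial.
Fix = Tuple[int, int]


def first_fixation_duration(trial: List[Fix], region: int) -> Optional[int]:
    """Duration of the first fixation on the region, or None if it was skipped."""
    for r, d in trial:
        if r == region:
            return d
    return None


def total_viewing_time(trial: List[Fix], region: int) -> int:
    """Sum of all fixations on the region, including later revisits."""
    return sum(d for r, d in trial if r == region)


def regresses_out(trial: List[Fix], region: int) -> Optional[bool]:
    """True if, after the first gaze on the region, the eye moves to an earlier region.
    Simplification: the first gaze counts even if the region was entered from the right."""
    i = next((k for k, (r, _) in enumerate(trial) if r == region), None)
    if i is None:
        return None               # region never fixated
    while i < len(trial) and trial[i][0] == region:
        i += 1                    # skip the consecutive fixations of the first gaze
    if i == len(trial):
        return False              # trial ended while the eye was still on the region
    return trial[i][0] < region


def fixation_probability(trials: List[List[Fix]], region: int) -> float:
    """Proportion of trials in which the region received at least one fixation."""
    return sum(any(r == region for r, _ in t) for t in trials) / len(trials)


def regress_out_rate(trials: List[List[Fix]], region: int) -> Optional[float]:
    """Share of trials (where the region was fixated) whose first gaze ends in a regression."""
    flags = [f for f in (regresses_out(t, region) for t in trials) if f is not None]
    return sum(flags) / len(flags) if flags else None


trials = [
    [(0, 200), (1, 250), (1, 180), (2, 220), (1, 300)],
    [(0, 210), (2, 240), (1, 190), (0, 150)],
]
print(first_fixation_duration(trials[0], 1), total_viewing_time(trials[0], 1))  # 250 730
print(fixation_probability(trials, 1), regress_out_rate(trials, 1))            # 1.0 0.5
```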
In the following sections, eye tracking studies in the field of IS are analyzed and reviewed in terms of their methodological approach, quantitative data collection methods, and results. The studies are presented in chronological order and examined critically. For reasons of scope, the present article focuses solely on studies directly related to IS.

McDonald & Carpenter (1981)
The research carried out by McDonald & Carpenter (1981) was ground-breaking because it was the first application of eye tracking in the field of IS. The authors presented a model of interpretation, parsing, and error recovery in SI. The study examined two expert interpreters and two amateur German-English bilinguals sight translating 44 texts from English into German while their eye fixations and oral translations were recorded. The texts contained idiomatic phrases such as "hit the nail on the head" or "break the ice", which could be comprehended and interpreted either literally or idiomatically. The study aimed at investigating the cognitive processes underlying the sight translation of ambiguous phrases and their parsing during comprehension and interpretation. The authors achieved this by analyzing the chunking of ambiguous phrases through manifested eye movement indices.
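Analyzing chunking through eye movement indices can be illustrated with a small sketch. Assuming, hypothetically, that each fixation has been mapped to the index of the word it lands on, the Python code below splits the fixations on an idiomatic phrase into successive passes (a new pass starts when the eye enters the phrase from outside or regresses within it) and checks whether the first pass reaches the end of the phrase without a regression. This is a simplified operationalization for illustration, not McDonald and Carpenter's (1981) coding scheme, which also relied on syntactic boundaries to identify literal chunking.

```python
from typing import List


def scan_passes(word_seq: List[int], phrase: range) -> List[List[int]]:
    """Split the fixated word indices falling on `phrase` into passes.
    A new pass starts when the eye enters the phrase from outside it
    or moves backwards (regresses) within it."""
    passes: List[List[int]] = []
    prev = None  # index of the previously fixated word, anywhere in the text
    for w in word_seq:
        if w in phrase:
            if prev is None or prev not in phrase or w < prev:
                passes.append([w])     # start of a new scan of the phrase
            else:
                passes[-1].append(w)   # forward movement or refixation within the scan
        prev = w
    return passes


def first_pass_reading(word_seq: List[int], phrase: range) -> str:
    """Label the first pass: one chunk if it reaches the end of the phrase without regression."""
    passes = scan_passes(word_seq, phrase)
    if not passes:
        return "skipped"
    return "one chunk" if passes[0][-1] == phrase[-1] else "chunked"


idiom = range(4, 7)  # e.g. "break the ice" occupying words 4-6 of a sentence
print(scan_passes([1, 2, 3, 4, 5, 6, 7, 8, 5, 6, 9], idiom))  # [[4, 5, 6], [5, 6]]
print(first_pass_reading([3, 4, 5, 4, 5, 6, 7, 4], idiom))     # chunked
```

Under this reading, the second and later passes over the phrase would correspond to the re-reading that accompanies oral translation and to possible error recovery.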
Comprehension of an idiomatic phrase was classified as idiomatic if the entire phrase was read as one chunk without any regressions in first-pass reading. Comprehension was instead classified as literal if the phrase was chunked at the syntactic boundaries within the sentence. The interpreters' scan paths not only showed how idiomatic phrases were translated but also reflected the sub-processes of interpreting, because each phrase received at least two scans. The initial comprehension of the phrase was marked by the initial sequence of forward fixations, i.e., the first-pass reading. The second process was manifested by regressions, which constituted the second scan or re-reading of the phrase, during which the oral translation occurred. If participants detected an error in their comprehension or interpretation after the second scan, the eye regressed again to the preceding segment containing the ambiguous idiomatic phrase, and the previous oral translation was corrected. In other words, "the eye fixations [allowed] the simultaneous translation task to be divided up into finer sub-processes. The model suggests that translation builds on normal reading processes, including parsing and error detection. Moreover, individual differences in expertise are reflected in the quality rather than the duration of the translation process" (McDonald & Carpenter, 1981, p. 246). Although this study was not aimed at investigating the process of sight translation per se, it paved the way for future studies exploring the cognitive processes underlying sight translation, insofar as the results revealed that the initial process of comprehension in sight translation consists of the same comprehension processes used in normal English reading.
This was evidenced by the fact that the rate for the initial reading of a phrase in sight translation was similar to that for normal reading (between 200 and 300 words/minute), thus supporting the vertical perspective in translation, as readers need to comprehend before being able to proceed to translating (McDonald & Carpenter, 1981; Huang, 2011).

Tommola and Niemi (1986) and Hyönä et al. (1995)
The second study to apply eye tracking in the field of IS was Tommola and Niemi (1986), who used pupil diameter as a measure. The hypothesis that the pupillary response can be applied to study variation in processing load during SI was also explored by Hyönä et al. (1995). Both studies lend "good support to the use of the pupillary response as an indicator of processing load" (Hyönä et al., 1995, p. 598).
The study conducted by Tommola and Niemi (1986) aimed at measuring cognitive load during SI. Only one participant took part, performing SI of five Finnish texts into English. Larger pupils are generally interpreted as indicating higher cognitive load relative to a baseline (Rayner, 1998). The results showed that the participant's pupil diameter peaked during interpreting when the English output sentence had to be restructured because of the syntactic differences between the source language (Finnish) and the target language (English) (Tommola & Niemi, 1986).
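Because pupil size is interpreted relative to a baseline, pupillometric analyses commonly subtract a baseline value from the task recording. The Python sketch below shows one simple form of subtractive baseline correction; the sample values, the choice of a pre-task baseline window, and the function name are hypothetical, and it does not reproduce Tommola and Niemi's (1986) procedure.

```python
from statistics import mean
from typing import List, Tuple


def dilation_summary(task_mm: List[float], baseline_mm: List[float]) -> Tuple[float, float]:
    """Return (mean, peak) pupil dilation in mm relative to the baseline mean.
    Positive values indicate a larger pupil than at baseline."""
    base = mean(baseline_mm)                  # e.g. samples recorded before the task
    corrected = [s - base for s in task_mm]   # subtractive baseline correction
    return mean(corrected), max(corrected)


baseline = [3.0, 3.2, 3.1]    # hypothetical pupil diameters (mm) at rest
task = [3.1, 3.4, 3.6, 3.3]   # hypothetical samples during interpreting
avg, peak = dilation_summary(task, baseline)
print(round(avg, 2), round(peak, 2))  # 0.25 0.5
```

The peak value corresponds to the kind of momentary maximum reported for sentence restructuring, while the mean summarizes dilation over a whole task segment.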
In a later study, Hyönä et al. (1995) "investigated the sensitivity of the pupillary response to reflect variations in processing load during language processing by comparing three different tasks that are self-evidently different in complexity" (Hyönä et al., 1995, p. 600), i.e., passive listening to a text passage, shadowing, and SI from English into Finnish (the participants' native language). The hypothesis was that SI "should be associated with the highest [pupillary] dilation levels" (Hyönä et al., 1995, p. 600). In this case, one can speak of a non-directional or two-tailed hypothesis (Brown, 1988) "because there is a systematic [either positive or negative] relationship between the dependent and independent variable" (Brown, 1988, p. 110). The dependent variable is pupil dilation, which depends on the cognitive complexity of the given tasks (the independent variable), namely passive listening, shadowing, and SI. The participants were all students in the Department of Translation Studies at the University of Turku, Finland, and had all received one year of SI instruction.
It should be noted that this study cannot be called an experiment in the strict sense of the word.
The sampling was not random, because all participants were students of the translation department with only one year of interpreter training. This is referred to as convenience sampling (Brown, 1988), or naturally selected sampling, which usually occurs when the researcher is a teacher and uses participants from a given class or department. "A sample of convenience contains elements or persons selected because of their accessibility. [It is also called] a nonprobability sample" (Johnson, 1992; author's emphasis). Moreover, in any experiment, extraneous variables should be controlled, starting with environmental factors. In Hyönä et al. (1995), such naturally occurring variables are indeed controlled: the experiment was carried out in a "room […] illuminated with fluorescent lamps; no daylight was allowed to enter the room […] to prevent momentary variations in the amount of light reflected to the pupil" (Hyönä et al., 1995, p. 601). […] mentioned in the limitations of the study, to preserve, enhance or simply control the internal and external validity of the research. Admittedly, pupillary dilation is always independent of the will of the participants, hence such variables do not necessarily need to be included. The discussion does, however, mention two factors that may challenge the validity of the whole study. First, "one cannot draw any conclusions as to whether the pupillary response reflects momentary fluctuations in processing load [because] the task effect may merely reflect a long-term increase in the level of general arousal [and] the carry over effect of task difficulty after the completion of the task proper supports such a conclusion" (Hyönä et al., 1995, p. 601). Second, "the fact that listening induced the lowest levels of pupil dilatation may also be attributed to the lack of any output requirements" (Hyönä et al., 1995, p. 601).
Both these issues are addressed in the second experiment of the same study, which is reviewed in the following paragraph. In terms of design, the order of the three presentations varied systematically and the nine subjects were randomly assigned to one of the three versions; however, task order was always the same for all participants: listening, shadowing, and SI. Although the assignment of the tasks was random, the sample itself was not chosen randomly, and we are not informed whether the nine participants were self-selected. It would be interesting to see whether more experienced interpreters would show the same pupil dilation or a smaller one, reflecting a lower cognitive load. This aspect is addressed in a later study by Chmiel and Mazur (2013), which found no group effect in total task time or in the processing of lexical items. As Chmiel and Mazur (2013) point out, this result indicates that one year of training might not be enough for differences in the skills required for sight interpreting to emerge.
In the second experiment conducted by Hyönä et al. (1995), the researchers state that all subjects had Finnish as their native language and were fluent in English. However, the construct of fluency is not operationalized, leaving the reader to wonder how fluent the participants actually were or what the researchers meant by fluent. The three tasks involved interpreting, shadowing, or listening to single words, so the SI task was effectively transformed into a lexical translation exercise. As the authors themselves admit, however, SI also "involves other cognitive operations" (Hyönä et al., 1995, p. 601), which means that SI is cognitively more complex than a mere lexical translation task.
Bearing these limitations in mind, the results confirmed the findings of the first experiment: the pupil showed a significantly larger dilation during single-word interpretation than during shadowing, and a larger dilation during shadowing than during passive listening. Pupil size also reflected variations in within-task demands, because "words that were determined to be more difficult to translate induced higher levels of pupil dilation than did easily translatable words. Repeating words in English [the participants' non-native language] was accompanied by increased pupil dilations" (Hyönä et al., 1995). Words without an obvious one-word equivalent in the target language were defined as difficult to translate. Future researchers could build a model of the total processing load associated with SI on the basis of such experiments, using eye tracking and methods such as pupillometry.

Huang (2011)
Up until Huang (2011), no study had explored the process of comprehension in interpreting with the specific aim of addressing the horizontal and vertical translation perspectives. Nor had any study empirically tested whether "read ahead" is just a myth or actually occurs in sight translation. In Huang's thesis, eye movements of interpreters were tracked during three tasks: Silent Reading (SR), Reading Aloud (RA), and Sight Translation (ST). "Both the eye movement data as well as the oral production data of participants were recorded so that the two modes of information could be matched up to understand the relationship between comprehension (Note 3) and production" (Huang, 2011, p. 39). Huang's hypothesis was in line with the results of McDonald and Carpenter (1981), i.e., that eye movement patterns in ST support a vertical approach (meaning-based translation as opposed to literal translation) and reflect a comprehension process similar to that of normal SR. As previously mentioned, the study also aimed at establishing whether the "reading-ahead" claim (Agrifoglio, 2004; Sampaio, 2007; Weber, 1990) is valid. Eighteen Taiwanese graduate students majoring in interpreting participated in the experiment and were paid for their participation. Since they were recruited from this specific population, the participants were not chosen randomly.
Huang's choice can be justified because, in Taiwan, participants who could perform such a task with an interpreting training background are few and far between. Huang's study was also the first to analyze the dynamics of applying eye tracking equipment to the analysis of ST with participants having Mandarin Chinese as their first language. As specified by Huang herself, "the experiment was a one-factor, three level (silent reading, reading aloud, and sight translation) design" (Huang, 2011, p. 41).
As should be done in any experiment, the test material was controlled. In Huang's words, "the experiment sought to eliminate major discrepancies in the difficulty level of test materials. Also, passages of general rather than specific topics were selected to avoid participants' familiarity of a topic resulting in different eye movement patterns" (Huang, 2011, p. 43). The level of difficulty of the chosen texts was evaluated by two graduate students from the Graduate Institute of Translation and Interpretation (GITI), National Taiwan Normal University (NTNU). To ensure that the evaluation was consistent, a second assessment was carried out by one professor and six students from GITI.
However, no statistical inter-rater reliability test was carried out to determine the degree of agreement amongst the raters.
In Huang (2011), reading ahead is not necessarily perceived as an inter-sentence phenomenon; it can simply mean that an interpreter is looking one or two words ahead while still translating the previous items. To analyze this "reading-ahead" phenomenon, Huang (2011) investigated "participants' first pass fixation onset times […] Fixation onset times were then matched up with both eye tracking and oral output data to examine what the participant was fixating at each first fixation onset" (Huang, 2011, p. 53) and what s/he was orally producing at the same time. The results of Huang's study provide new empirical support for McDonald and Carpenter (1981): "comprehension in sight translation does not consume more cognitive efforts than normal silent reading when interpreting from Chinese into English" (Huang, 2011, p. 75).
In other words, problems do not lie in the comprehension phase, but rather in the reformulation and production phase. Hence, interpreter training should focus more on developing translation strategies than on enhancing trainee interpreters' language skills. Huang's initial research questions were whether ST and SR are similar in their initial reception and whether reading ahead can be empirically observed using eye tracking. For the Chinese-English language combination, the answer to the first question is that comprehension in ST is no more cognitively demanding than normal SR.
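The inter-rater reliability test missing from Huang's (2011) difficulty assessment could, for two raters assigning categorical difficulty labels, take the form of Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The Python sketch below implements the standard formula; the ratings are invented for illustration, and with more than two raters (as in the second assessment) a multi-rater statistic such as Fleiss' kappa would be more appropriate.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters labelling the same items: (p_o - p_e) / (1 - p_e)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same, non-empty set of items")
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n  # observed agreement
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # chance agreement: probability that both raters independently pick the same label
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Hypothetical difficulty ratings of six candidate passages
a = ["easy", "easy", "hard", "medium", "hard", "easy"]
b = ["easy", "medium", "hard", "medium", "hard", "easy"]
print(round(cohen_kappa(a, b), 2))  # 0.75
```

Here the raters agree on five of six passages (83%), but since about a third of that agreement would be expected by chance, kappa is lower than the raw percentage.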

Discussion and Conclusion
The findings examined in the present article, along with the empirical support for the read-ahead hypothesis (Huang, 2011), […] movements and time synchronization in order to analyze "what the interpreter [is] producing at each fixation in each passage rather than only the first fixation of each sentence" (Huang, 2011, p. 77).
Secondly, comparative studies could investigate the differences between novice and expert interpreters, or between untrained bilinguals and trained interpreters, in terms of their "reading ahead" skills or related issues. Some efforts have been made in this direction; a notable example is the M.A. thesis by Chen (2013). Chen suggests that, during first-pass reading in sight translation, experienced interpreters engage in other efforts besides comprehension, whereas novices seem able to handle only comprehension. She concludes by hypothesizing that the reformulation stage of sight translation can be divided into two levels: novices seem to complete only the first (basic) level of reformulation, whereas experienced interpreters complete both and therefore deliver a better performance. Another direction for future research, as Huang acknowledges, is the inclusion of "audio input […] on tracking eye movements of interpreters during sight translation (or simultaneous interpreting with text), as this would be a relatively more authentic context for the interpreter" (Huang, 2011, p. 77).
Other possible research directions include analysis of the eye-voice span (as opposed to the more traditional ear-voice span), comparative studies of language directionality, text chunking, the substitutability of audio and video input, and real practice simulation.
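By analogy with the ear-voice span, an eye-voice span could be measured as the lag between the moment a source word is first fixated and the moment its rendition begins in the interpreter's output. The Python sketch below assumes, hypothetically, that both onsets have already been obtained for each source word (for instance from time-aligned eye tracking and audio data of the kind used by Huang, 2011); it illustrates the measure and is not an established protocol.

```python
from statistics import mean
from typing import Dict


def eye_voice_spans(first_fixation_ms: Dict[int, int],
                    speech_onset_ms: Dict[int, int]) -> Dict[int, int]:
    """Lag (ms) between the first fixation on each source word and the onset of its
    oral rendition; words that were never rendered are left out."""
    return {w: speech_onset_ms[w] - t
            for w, t in first_fixation_ms.items() if w in speech_onset_ms}


fixations = {0: 100, 1: 350, 2: 600, 3: 820}  # hypothetical first-fixation onsets
speech = {0: 900, 1: 1300, 2: 1500}           # word 3 omitted from the rendition
spans = eye_voice_spans(fixations, speech)
print(spans)                           # {0: 800, 1: 950, 2: 900}
print(round(mean(spans.values()), 1))  # 883.3
```

Comparing such spans across novice and expert interpreters, or across language directions, would connect this measure to the comparative directions outlined above.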
Finally, I would like to add some concluding remarks on real practice simulation. Shinar (2008) carried out an interesting study which at first sight seems irrelevant to IS, since it concerns drivers' attention. The results show that there are systematic relationships between the sources of information required for safe driving and the spatial distribution of the driver's visual fixations.
As drivers gain skills and experience, their fixation patterns change systematically. If the same research design and methodology were applied to IS, they could inform experiments investigating differences between expert and novice interpreters in conference or court interpreting, for example. Using mobile, head-mounted eye trackers, researchers could examine the spatial distribution of interpreters' visual fixations, for instance in the booth or in the courtroom, and determine whether there are differences between groups of participants. This is only one of the possible practical applications of eye tracking research that could reinforce the interdisciplinary nature of IS and deepen our understanding of the cognitive mechanisms underlying the complex task of interpreting in all its modes and settings, including simultaneous, consecutive, whispered, relay, and liaison interpreting.