The multimodal representation of emotion in film: Integrating cognitive and semiotic approaches




DOI 10.1515/sem-2013-0082 Semiotica 2013; 197: 79 – 100

Dezheng Feng and Kay L. O’Halloran



Abstract: This study provides a semiotic theorization of how emotion is represented in film to complement the cognitive approach, which focuses on how film elicits emotion from viewers. Drawing upon social semiotic theories and cognitive theories of emotion, we develop a multimodal framework in which filmic representation of emotion is seen as combinations of semiotic choices derived from cognitive components of emotion. The semiotic model is employed to investigate how emotive meaning is realized through verbal and nonverbal resources. At the discursive level of film, the choices available in the shot organization of eliciting condition and expression are examined. The paper demonstrates how the social semiotic approach, combined with cognitive theory of emotion structure, is able to provide a comprehensive theoretical account of how various film techniques represent emotion. It is also significant for the study of viewer emotion, which to a large degree stems from character emotion.

Keywords: film; emotion; multimodal representation; social semiotics; cognitive appraisal theory


Dezheng Feng: The Hong Kong Polytechnic University.

Kay L. O’Halloran: Curtin University, Australia.

1 Introduction

While a number of cognitive studies focus on how film devices elicit emotion from the viewer (e.g., Carroll 2003; Smith 2003; Tan 1996), few theorists provide a systematic account of how emotions are represented in film. Complementary to cognitive theories that attribute the understanding of film to the cognitive capacity of viewers, semioticians argue that films are constructed in ways that guide interpretation prior to handing over the task of understanding to the viewer’s cognitive capacity (Bateman and Schmidt 2011: 1, emphasis added). In light of this, the present study aims to develop a semiotic approach for understanding the filmic construction of emotion, continuing the efforts of Bateman and Schmidt (2011) and Tseng (2009). Meanwhile, we accept the cognitive position that films systematically exploit the “folk psychology” of emotions (Newman 2005: 119). Integrating the methods and findings from both cognitive appraisal theory (e.g., Frijda 1986; Lazarus 1991) and social semiotic theory (e.g., Halliday 1994), the present study provides a comprehensive multimodal account of how emotion is represented in film.

According to cognitive appraisal theory, the appraisal of emotion antecedents drives responses in the form of physiological reactions, motor expression, and action preparation (Frijda 1986; Lazarus 1991; Scherer and Ellgring 2007). For example, anger may be produced by an act of another person, which is appraised as an obstruction to reaching a goal, and is expressed with physiological changes (e.g., raised heart rate) and aggressive actions. These components thus form a scenario or schema consisting of the appraisal of eliciting condition, subjective feeling and reaction/expression.

Cognitive linguists have investigated linguistic expressions in relation to the cognitive components of emotion (e.g., Kövecses 2000). Kövecses’ (2000) major insights are that descriptive expressions of emotion are mostly metaphorical, and that these metaphorical expressions can be systematized according to a “folk model” of emotion consisting of five stages: Cause, Emotion, Control, Loss of Control, and Behavioral Expression. Thousands of seemingly unrelated linguistic metaphors (e.g., “I am going to explode”) are instances of conceptual metaphors (e.g., anger is heat), which are in turn instances of higher-level conceptual metaphors from different stages of the model (e.g., Loss of Control). We argue that literal expressions and nonverbal resources (e.g., facial expression, movement) also fall into the cognitive structure of emotion, giving rise to a multimodal approach to emotion representation.

The second theoretical basis is social semiotic theory, more specifically Michael Halliday’s (1994) Systemic Functional (SF) model of language. From the SF perspective, language consists of three strata: phonology/graphology, lexicogrammar, and discourse semantics, which are related through the concept of realization. Following the principle of stratification, we assume that such semiotic strata exist in film (Bateman and Schmidt 2011; Tseng 2009), as displayed in Figure 1 (the realizational relation is represented by a slanted arrow). As such, emotive meaning and the discursive organization in shots and syntagmas are realized by the linguistic and nonverbal resources, which are rendered in audio and visual tracks. This stratified semiotic model allows us to investigate how emotive meaning is constructed across strata.

Fig. 1: Semiotic strata in language and film

Following social semiotic theory, texts consist of choices made at different strata (Halliday 1994). In the case of film, the causes and the character’s linguistic and nonverbal expressions of emotion are not spontaneous as in real life, but are semiotic discursive constructs designed by filmmakers. The semiotic approach enables us to move beyond cognitive psychological studies to examine how emotions are “designed” in film. The phenomenology of the causes and expressions of emotion in real life provides resources for the semiotic choices and the psychological theories of emotion provide us with tools to categorize those resources (cf. Newman 2005).

Combining the social semiotic and cognitive approaches, we develop frameworks for investigating the multimodal construction of emotion in film. The filmmaker’s semiotic choices are examined in relation to the cognitive structure of emotion. In Section 2, a brief account of the cognitive components of emotion is provided. Following this, the representation of eliciting conditions and expressions of emotion is examined in Section 3, and the configuration of eliciting condition and expression through film editing is explored in Section 4. We conclude with a description of how the social semiotic approach, combined with cognitive theories of emotion, is able to provide a theoretical account of emotion representation in film in Section 5.

2 Resources of representation: The cognitive components of emotion

2.1 The cognitive components of emotion

The main theoretical basis for investigating emotion representation is cognitive appraisal theory (e.g., Frijda 1986; Scherer and Ellgring 2007), which argues that emotion antecedents drive response patterning in terms of physiological reactions and behavioral expressions. Despite slight differences, cognitive theorists agree that emotions involve antecedents, the interpretation and evaluation of antecedents, subjective feelings, physiological changes, and behavioral reactions. These components thus form a scenario or schema consisting of antecedents, appraisal, subjective feeling, and reaction/expression. We will work with a three-stage model of emotion representation involving the eliciting condition (EC), the feeling state (FS), and expression (Ex).

In film, the eliciting condition can be represented as narrative events that are distinct from the expression of emotion. When there is a reaction to the eliciting condition, which can be verbal or nonverbal, the expression stage is reached. In this sense, the internal feeling state can only be inferred based on the eliciting condition or the expression of emotion. Language, however, is able to encode the feeling state symbolically through lexical items (e.g., happy, angry). For example, we can report the feeling state of others as in “he is angry.” In the expression of one’s own emotion, however, which is our main concern, linguistic expressions belong to the expression stage, regardless of whether language is used to recount the eliciting condition (e.g., “I got the job”) or the feeling state (e.g., “I feel happy”). In this paper, we examine the multimodal construction of eliciting condition and the character’s subsequent expression of emotion, while recognizing that such stages of emotion representation are interactive and potentially recursive in nature (i.e., the expression of emotion may become the eliciting condition for subsequent expression of emotion).

The emotion scenario or schema describes how our knowledge of emotion is stored in memory (Bartlett 1932). This schematic representation significantly facilitates our recognition of emotion because one or several of the components are able to activate our knowledge of a specific emotion. For example, a smiling face alone can be recognized as happiness because it activates our “happiness schema.” Therefore, partial representation of the emotion scenario is also able to communicate emotion. But more often, the eliciting condition and the expression are represented consecutively in film to fully communicate the emotion and engage the viewers.

2.2 The appraisal of eliciting conditions

It seems safe to assume that basic emotions and their eliciting condition and expression in films can be understood by most audiences. As Ortony et al. (1988: 3) note, “it is apparent that writers can reliably produce in readers an awareness of a character’s affective states by characterizing a situation whose construal is assumed to give rise to them.” The reason is that appraisal of an eliciting condition is generally shared amongst members and groups of a society (Bless et al. 2004). Experiments have also shown that both children and adults can report and agree on typical antecedents of several common emotions (e.g., Smith and Ellsworth 1985).

As a result, filmmakers are able to predict (correctly most of the time) viewers’ emotional reactions based on cultural knowledge. It is thus possible for filmmakers to “design” film emotions to optimize engagement with viewers. The filmic and discursive strategies for designing emotion are elaborated in Sections 3 and 4.

2.3 The multimodal resources of emotion expression


Modern studies of emotion have been modality specific; that is, they focus on language (e.g., Kövecses 2000; Martin and White 2005; Wierzbicka 1990), the face (e.g., Ekman and Friesen 1975, 1978), the voice (e.g., Banse and Scherer 1996; Scherer 2003) or the body (e.g., Wallbott 1998).

In terms of facial expression, it is generally accepted that certain configurations of facial muscle groups are universally judged to be associated with particular emotions (Ekman and Friesen 1975). Accordingly, psychologists have developed portraits of facial patterns to account for basic emotions of happiness, surprise, fear, anger, disgust, and sadness (e.g., Ekman and Friesen 1975, 1978; Izard 1971). However, Carroll and Russell (1997: 165) argue that patterns of facial expressions arise only secondarily, through the coincidental co-occurrence of two or more different components. In Hollywood films, although professional actors’ happiness is represented by smiles in 97% of cases, surprise, anger, disgust or sadness rarely show the predicted pattern of facial expression (found in 0 to 31% of cases; Carroll and Russell 1997). Their study challenges the position that facial expressions are hardwired in the emotion experience and suggests the need for a framework that accommodates comprehensive multimodal analysis of the representation of emotion.

The evidence for emotion-specific patterns in vocal features is not as strong as that for facial expression (Wallbott 1998: 880). These parameters are generally considered in relation to the arousal level of emotion. The emotive meanings of body movements, gestures and actions are even less clear, in that differential patterns of bodily activity do not fall into clusters characteristic of discrete emotions (Planalp 1998: 34). Therefore, it is reasonable to consider these resources as continuous expressions of underlying dimensions of emotion, such as arousal and valence.

Multimodal accounts of emotion are rare, despite acknowledgement by affective scientists that emotions are almost always expressed by multimodal signs in face, voice, gestures, and so forth (Scherer and Ellgring 2007: 158). Scherer and Ellgring (2007) investigate how professional actors use prototypical multimodal configurations of expressive actions to portray different emotions. The resources they consider are the modality specific parameters of facial expressions (action units), vocal variables (frequency, amplitude, etc.) and bodily actions (gestures).

In terms of recognition rate, they find that with only ten multimodal variables, the accuracy rate of cross-validated prediction is much higher than monomodal discrimination.

The finding that combinations of facial, vocal, and bodily cues can better predict emotions than the single modalities confirms the need for multimodal analysis. However, Scherer and Ellgring (2007) show that the coders’ recognition rate of the multimodal expression of professional actors is only slightly more than 50%. There are two main reasons for this result. First, as they acknowledge, the portrayal segments consist of brief standardized utterances, with often only a single facial expression and a single gesture per segment. The more idiosyncratic and predictable material actions, such as slamming the door in a real-life anger scenario, are not considered. Second, the eliciting condition is not provided to the coders. According to cognitive appraisal theory, emotion is distinguished by the cognitive appraisal of antecedent events. If these events are excluded, the recognition rate decreases. To fully understand the communication of emotion, we need to take into consideration all variables, such as the situational context and the multimodal expression of emotion. This may be impractical for psychological experiments, but systematic multimodal discourse analysis is able to shed light on this complex issue.

3 Multimodal construction of eliciting condition and expression


In what follows, frameworks for the multimodal construction of the eliciting condition (EC) and expression (Ex) of emotion are developed and illustrated with examples from well-known films and television series. The basic organizing assumption is that meaning is constructed across the semiotic strata in which the cognitive components of emotion, organized by shots and syntagmas, are realized by the audio-visual resources rendered as audio and visual tracks in film (cf. Figure 1).


3.1 The multimodal construction of eliciting conditions


Film theorists mostly focus on (facial) expression for studying character emotion (e.g., Coplan 2006; Plantinga 1999), and some studies include the eliciting condition as the context or criteria of emotion (e.g., Carroll 1996; Newman 2005). In contrast, regarding the eliciting condition as part of the emotion scenario puts us in a stronger position to theorize its structure and investigate its relation to emotion expression, as we shall demonstrate in this section and Section 4.

There have been many attempts to categorize the complex eliciting conditions of emotion, for example, Ortony et al.’s (1988) three categories of events, persons, and objects. In theorizing the eliciting conditions, we do not attempt to categorize the material world “out there,” but to categorize the ways in which the outer world affects the character’s subjectivity. The system is shown in Figure 2, where five eliciting condition effects (EC1–EC5) are identified.

Fig. 2: The representation of eliciting condition

The primary distinction is between eliciting conditions whose relations to the emoter are represented and those which are unrepresented. If the relation is represented, the cause of emotion may be what the emoter does or says (EC1). For example, a character may feel proud for accomplishing something or feel guilty for saying something. The eliciting condition can also be what the character sees/hears/feels through visual (EC2), auditory (EC3) or somatic (EC4) senses. For example, a person may be terrified by what he/she sees, saddened by what he/she hears, or delighted by physical sensations he/she receives. If the relation is unrepresented (EC5), the eliciting condition is presented to the viewer as a narrative event, but viewers do not know how the emoter accesses that event. For example, in Ridley Scott’s Gladiator (2000), the event that the old emperor is dead (eliciting condition) is presented to the viewers, and then in a shot the tearful face of his daughter is featured, but how she accesses the eliciting condition is not shown.
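The choices just described can be read as a small decision procedure. The following Python sketch is offered only as an illustration of that reading and is not part of the framework itself; the names `Access` and `classify_ec` are hypothetical, and representing the unrepresented relation as `None` is an assumption made for brevity.

```python
from enum import Enum
from typing import Optional


class Access(Enum):
    """How the emoter is shown to relate to the eliciting condition."""
    OWN_ACT = "does or says"       # EC1
    VISUAL = "sees"                # EC2
    AUDITORY = "hears"             # EC3
    SOMATIC = "feels physically"   # EC4


# Mapping from a represented relation to the EC labels used in the text.
_EC_LABEL = {
    Access.OWN_ACT: "EC1",
    Access.VISUAL: "EC2",
    Access.AUDITORY: "EC3",
    Access.SOMATIC: "EC4",
}


def classify_ec(access: Optional[Access]) -> str:
    """Return the EC label; None (assumed encoding) means the relation
    between emoter and eliciting condition is unrepresented (EC5)."""
    if access is None:
        return "EC5"
    return _EC_LABEL[access]
```

Under this encoding, the Gladiator example, in which the film does not show how the daughter learns of the emperor's death, corresponds to `classify_ec(None)`, whereas a character terrified by something seen corresponds to `classify_ec(Access.VISUAL)`.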

The eliciting condition in film is represented using audio-visual resources (i.e., language, facial expressions, gesture, etc.), where the shot is the basic unit. The five types of eliciting conditions result in different syntagmatic organizations of eliciting condition and expression, and these syntagmas are essential for the understanding of filmic representation of emotion, as will be discussed in Section 4. The eliciting condition and the expression are mostly sequentially arranged: some emotion-inducing event happens first and the character’s emotional reactions follow. The emotion-inducing event is significant in filmic communication of emotion. First, it not only enables us to infer the character’s emotion but also leads us to anticipate the character’s emotional reaction.

Second, it may provoke the viewer’s feeling. As is noted in Section 2.2, the appraisal of many events is culturally shared. Therefore, we are not only able to infer the character’s emotional reaction, but also feel the emotion to some extent based on our identification with the character.

3.2 The multimodal representation of emotion expressions


The film character’s linguistic and nonverbal expressions of emotion are not spontaneous as in real life, but are semiotic discursive constructs designed by filmmakers. Therefore, the first dimension in our multimodal framework involves the resources of verbal and nonverbal expression. The framework also includes discursive choices, which include the quantity of expression (simple/complex) as well as the context of expression (individual/interactive). The dimensions with their respective systems are displayed in Figure 3. The dimensions of verbal and nonverbal expressions and discursive choices are elaborated in Section 3.2.1. The stylistic choices, including camera positioning, music, and so forth, are not discussed separately, but are pointed out where relevant in the ensuing discussion.

Fig. 3: The representation of emotion expression

3.2.1 The multimodal resources of emotion expression


The expression of emotion is mainly studied in two disciplines: linguistics, which focuses on the verbal expression, and psychology, which focuses on nonverbal expression. We propose the multimodal framework of verbal resources (Ex1) and nonverbal behavior (Ex2; see Figure 3).

In terms of the Peircean trichotomy of iconic, indexical and symbolic signs, language is symbolic, making it the most abstract (and complex) resource for emotion expression. The multimodal framework for analyzing linguistic expressions of emotion integrates the social semiotic Appraisal Theory (which should be distinguished from cognitive appraisal theory; Martin and White 2005) and the cognitive components of emotion.

The first distinction for linguistic expressions of emotion is between signal and denotation (Bednarek 2008). Kövecses (2000) makes a similar distinction, with the categories of “expressive” and “descriptive” representations of emotion. Signals typically include expletives such as “wow,” “yuk,” “oh, my god,” and so forth. They express the emotion in a more reflective way and do not “describe” the emotion. Denotations describe some elements of the emotional experience. There are two choices for the denotation of emotion: direct and indirect. Direct denotation is simpler and includes the literal emotion terms which “inscribe” the “feeling state” of the emotion scenario directly. The second option is indirect expression.

Martin and White (2005) provide descriptions of several linguistic strategies such as lexical metaphor, intensification, and so on that realize the indirect expression of emotion. However, such strategies are not clearly structured. Based on the cognitive components of emotion, two types of indirect expressions can be distinguished: those describing the eliciting condition, and those describing the resultant expression or action in the emotion scenario. In the utterance “I am so angry, my boss just fired me for no reason, I smashed the door heavily,” the three clauses describe the feeling state, the eliciting condition and the expression stage respectively.

Nonverbal behavior signifies emotion in a different way. In Peirce’s trichotomy, nonverbal behavior indexes emotion (Forceville 2005). However, we argue that the nonverbal expressions in film are different from those in real life because they are not spontaneous. That is to say, films “design” the facial expression, gesture and so forth based on real-life expressions. Therefore, we need to add an iconic stage in the process of signification and consider the visually represented behavior as icons of indexes, rather than indexes themselves. Thus, in Figure 4 in Section 3.2.2, the character mimics the real expression of happiness that indexes the emotion. Meanwhile, this study does not aim to work out a “grammar” of nonverbal behavior (see Feng and O’Halloran forthcoming; Martinec 2001 for attempts of this kind); rather, as the nonverbal expressions of emotion can normally be unambiguously recognized in Hollywood movies, we shall merely interpret the meanings of facial expressions or vocal features based on the studies reviewed in Section 2.3.

3.2.2 Discursive choices of representation


Emotion expression may be as simple as a single facial expression, or as complex as a sequence unfolding across several scenes. Simple representation depicts the synchronized expression which involves maximally one unit from one or more modalities, for example, the expression of one clause, accompanied by one facial expression and/or one gesture. Complex representation includes consecutive expressions from one or more modalities. For example, the film can first represent the facial expression, followed by linguistic expressions and a series of emotional actions. We also distinguish between interactive and individual expressions. The former is expressed to interactant/s, which are subsequently analyzed in relation to the structure of the interaction, and the latter is not expressed to others. Interactive and individual expressions employ the same verbal and nonverbal expressions and can be simple or complex.

Simple expression, whether interactive or individual, is represented by the reaction shot, although reaction shots are able to depict complex expressions as well. The most prominent element in the reaction shot is facial expression, which is the exclusive focus of many film analysts (e.g., Carroll 1996; Coplan 2006; Plantinga 1999). Facial expressions may occur alone in the reaction shot and are featured at a close distance. More often, facial expressions are accompanied by other verbal or nonverbal expressions. When gestural or bodily cues are represented, a medium shot is used to depict the gesture or torso. The shot from Episode 12, Season 4 of David Crane and Marta Kauffman’s Friends (1998) illustrated in Figure 4 is a relevant case in point. The character Rachel is depicted in a medium shot which shows her smiling face, upward posture and lifted arms. In the soundtrack, Rachel utters the words “I am an assistant buyer” in a high-pitched voice, which indicates excitement. The reaction shot may stand alone, but it usually works together with the eliciting condition shot to form syntagmas, such as the point of view (POV) structure and reverse shots, as discussed in Section 4.

Fig. 4: Illustration of simple individual expression

In what follows, we shall focus on interactive expressions, which are essential in the representation of emotion. By studying emotion expression in the structure of interaction, we make a significant move from treating emotion as a personal phenomenon to treating it as an interpersonal one. In interaction, simple expressions are those expressed in one move while complex expressions are those expressed in several moves.

We situate the expression move in the basic unit of interaction, namely, the exchange (Martin and Rose 2007). At the level of exchange, two types of interactive expression can be distinguished: those that are reactions motivated by the previous move and those that express a pre-existing emotion, as displayed in Figure 5. The upper part of Figure 5 shows the structure of interaction-motivated expression, in which one move is the eliciting condition and the other move is the reaction. This kind of expression is discussed in relation to the eliciting condition-expression configuration in Section 4.4. The lower part of Figure 5 shows the expression of the pre-existing emotion. The expression move may be preceded by the initiation of the secondary knower (K2), who asks about the primary knower’s (K1) emotion, or otherwise the expression is the first move. For the expression to be an interactive move, linguistic expression needs to be present, which may represent the eliciting condition, the feeling state or the expression, and it is normally accompanied by nonverbal behavior. Following the expression, there is typically a response in the second move.

Fig. 5: The communication of emotion in multimodal interaction

Interaction is typically represented by reverse shots, in which the two speakers are depicted in two alternating shots as they speak. The two shots from Episode 12, Season 4 of Friends in Figure 6 illustrate how the choices in reverse shots are made. In the first shot (Move 1), Monica (K1) expresses her emotion to Rachel by recounting the eliciting condition that she is offered the job of head chef in a high-pitched, loud voice, accompanied by a smile. Rachel (K2) responds to Monica in the second shot (Move 2) with surprise.

Fig. 6: Emotion communication in reverse shots

However, as with all realizational relations, there is no one-to-one correspondence between the interaction structure and the shot structure: two or more turns/moves may be represented by one shot and one turn/move may be represented by several shots.

The combination of single units of modalities such as facial expression and gesture is considered by psychologists studying multimodal expression of emotion (e.g., Scherer and Ellgring 2007). As noted in Section 2.3, the limitation of this approach is that the small set of variables is unable to account for the complexity of emotion expressions, which include idiosyncratic actions and emotions which take place over time. These complex expressions are significant in the representation of emotion. Very often, the immediate reaction is followed by several shots or scenes of individual expression, or the emotion is expressed in multiple-turn interaction (interactive expression). Many complex expressions involve both individual and interactive expressions and may extend across several shots or even several scenes. In a scene from Tom Shadyac’s Patch Adams (1998), Patch’s emotion is expressed with several resources in several shots after being kissed by Corinne, the girl whom he admires. Patch first makes the “wow” sound, which shows his enjoyment, then he laughs happily, and dances as he walks away. The expression includes facial expression, linguistic expression, and material action and communicates the intensity of Patch’s happiness.

As a discursive choice, the expression depicted is determined by many factors, in particular the intensity of the emotion and the genre of the film. Complex expressions of emotion over long durations of time tend to appear in female-oriented genres like melodrama and romance, while in male-oriented genres like action movies, emotions are often expressed over shorter time periods. In the melodrama Patch Adams, for example, Patch’s grief after his girlfriend Corinne was murdered is expressed over approximately nine minutes. The expressions include his immediate facial reaction after learning about the news, crying at Corinne’s coffin, leaving the medical school, conversing with his two classmates, attempting to jump off the cliff, and his speech, which blames God for the murder.

Such full-fledged expression is undoubtedly motivated by his intense grief and despair, but the filmmaker’s choice to allocate nine minutes to Patch’s display of emotion is certainly a discursive choice. The discursive choice is quite different in Ridley Scott’s (2000) Gladiator, which is a Roman epic and a male-oriented action film. When Maximus sees that his wife and son have been murdered, he cries with much anguish at the sight of their corpses. However, this is the only expression of grief and the film gives it several seconds before moving on to another stage of the narrative. Maximus’ emotion may be no less intense than Patch’s, but the filmmaker chooses a more compact way to depict the emotion.

4 Filmic organization of eliciting condition and expression


In Section 3, we discussed filmic choices/resources for representing eliciting condition and expression, which are normally both represented to guarantee the accurate depiction of emotion. A further issue to address, which is also a key aspect of filmic representation of emotion, is how they are co-deployed, or organized.

Previous studies only explain the working mechanism of one or two filmic resources, for example, Carroll’s (1996) theorization of the POV structure. In this section, we provide a comprehensive account of the shot-connecting devices and examine how causal relations between the eliciting condition and expression are represented by formally connected shots. The framework counts as a step towards explaining how the textual logic of film enables interpersonal meaning (cf. Bateman and Schmidt 2011).

In our model, the eliciting condition-expression configuration is systematically organized by shots and syntagmas. However, as with previous models, there is no one-to-one correspondence between the choices from the eliciting condition-expression configuration and the choices of their filmic organization. For example, two interaction turns can be realized by reverse shots or a single shot. Nevertheless, patterns can be found between the semantic layer and the expression layer.

To account for the eliciting condition-expression configuration, we draw upon the “grande syntagmatique” (the syntagmatic categories for narrative film) proposed by Metz (1974; see also Bateman 2007). The options for syntagma are significantly fewer than Metz’s grande syntagmatique because the causal-temporal relations between the eliciting condition and expression mean that only narrative syntagmas are relevant for emotion representation. Other syntagmas, such as parallel syntagma, which depicts conceptual relations (e.g., classification) between events, are not relevant. The shots and syntagmas available for representing eliciting condition-expression configuration are shown in Figure 7.

Fig. 7: Shots and syntagmas of eliciting condition-expression configuration

We are concerned with the shot relation which connects the eliciting condition and the immediate linguistic or kinetic response within the basic unit of syntagma.

There are cases where the eliciting condition and the expression are not organized in one syntagma. First, the eliciting condition is presented to the viewer as a narrative event and somehow the emoter knows it but we do not know how he/she accesses it (the case of EC5 in Figure 2). Second, the filmmaker creates a separate scene for the character to express his/her emotion. In an episode of Friends (Crane and Kauffman 1998), Rachel is given the job of assistant buyer during her conversation with her supervisor Joanna. One would naturally expect an emotional reaction immediately after she learns the news, but the film cuts to another scene and Rachel only expresses her emotion in the scene after that. Third, as pointed out in Section 3.2.2, complex expressions may extend across several scenes and hence extend beyond the autonomous syntagma.


4.1 The single shot representation

The representing capacity of one shot is indefinite. It can be as simple as a single facial expression or as complex as a whole film. The eliciting condition and expression can be represented within one shot in many ways. One special type is when the eliciting condition is represented by linguistic recount as part of the expression (cf. Figure 3). Although the eliciting condition we have discussed is parallel to expression, linguistic recount is undoubtedly a way of representing the eliciting condition. In this case, the eliciting condition is related to the expression as part of it and they are typically represented by a reaction shot. Normally, the eliciting condition is verbally recounted, accompanied by nonverbal expression (with or without verbal recount of expression). A good case in point is the example of Figure 4, in which Rachel’s speech recounts the eliciting condition that she is an assistant buyer and the expression is simultaneously constructed by the voice, the facial expression and the gesture.

Other types of shot will not be specified as there are so many things a shot can depict. In terms of eliciting condition and expression, a single shot can depict the character and the object he/she is looking at, the multiple turns of interaction, or the action and reaction portrayed by one tracking shot. However, such configurations are more typically represented by narrative syntagmas, which are discussed below.

4.2 Projection and the POV structure

Projection depicts the character and what he/she sees and thinks. We shall focus on the former, which is represented by the POV structure. POV structure typically portrays what the character sees and how he/she reacts to it, constituting the EC2^Expression type (cf. Figure 2). Carroll (1996) develops a cogent theory of POV representation of emotion. According to Carroll, the point/glance shot sets out a global range of emotions that broadly characterize the affective states the character could be in. The point/object shot, then, delivers the object or cause of the emotion, thereby enabling us to focus on the particular emotion. In the approach developed in this paper, it is only one simple device, albeit powerful, for representing emotions in film. The most celebrated example is perhaps the two shots at the beginning of Steven Spielberg’s Raiders of the Lost Ark (1981), shown in Figure 8. The point/object shot shows the close-up of a skeleton, followed by the point/glance shot which shows the terrified face of a character. Admittedly, this structure is the most convenient and easy-to-understand technique to represent emotion.

Fig. 8: Illustration of the POV structure

Two points need to be stressed, based on Carroll’s (1996) classic theory. First, POV structure is only one of the many mechanisms of emotion representation, as pointed out by Plantinga (1999) and suggested in our system. Second, there are variations to the POV structure. One obvious variation is the order of the object and reaction. Naming the point/object and point/glance shot A and shot B respectively, we can get A^B and B^A structures. It then seems that Carroll’s (1996) treatment of reaction as “range finder” and object as “focuser” only applies to the B^A structure. Another variation is that the object and the reaction can be represented in one shot, either within the same frame or by a panning/tilting camera. A further variation is that the POV shots may be elaborated by subsequent shots. That is, the object or the reaction may be portrayed by more than one shot, as they often are. Taking the A^B structure as an example, it is often reiterated by another pair of object-glance configuration (A^B + A′^B′), showing the object from a different angle and the character’s reaction with slight variation, as in shot 3 and shot 4 in Figure 8, which follow the first two shots immediately. Variations of this reiteration include showing the object again without showing the character (A^B + A′ + A′′ + . . .) and showing the character’s reaction in several shots (A^B + B′ + B′′ + . . .). The multiple reaction shots are commonly used to highlight the character’s emotion, together with the long duration and close distance of the shot. This technique is used not only to guarantee our recognition of the emotion portrayed, but also to invoke our empathy (Plantinga 1999).
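The variations of the POV structure described above can be made explicit in a small sketch. It assumes that a sequence has already been annotated by hand with the labels A (point/object) and B (point/glance); the function name `describe_pov` and the elaboration labels are hypothetical names introduced for illustration, not part of Carroll's theory or of our system networks.

```python
from typing import List, Tuple


def describe_pov(shots: List[str]) -> Tuple[str, str]:
    """Describe a POV sequence annotated as 'A' (point/object) and 'B' (point/glance).

    Returns (base order, elaboration). The elaboration labels are
    hypothetical names chosen for this sketch.
    """
    if any(s not in ("A", "B") for s in shots):
        raise ValueError("shots must be labelled 'A' or 'B'")
    if len(shots) < 2 or set(shots[:2]) != {"A", "B"}:
        raise ValueError("a POV structure starts with one A shot and one B shot")

    base_pair = shots[:2]          # A^B or B^A
    base = "^".join(base_pair)
    rest = shots[2:]               # elaborating shots, if any

    if not rest:
        return base, "none"

    # A^B + A'^B' (+ ...): the whole pair is repeated in the same order.
    is_paired = len(rest) % 2 == 0 and all(
        rest[i:i + 2] == base_pair for i in range(0, len(rest), 2)
    )
    if is_paired:
        return base, "paired reiteration"
    if set(rest) == {"A"}:         # A^B + A' + A'' + ...
        return base, "object reiterated"
    if set(rest) == {"B"}:         # A^B + B' + B'' + ...
        return base, "reaction reiterated"
    return base, "mixed"


# The four shots of Figure 8, as described in the text: A^B + A'^B'
print(describe_pov(["A", "B", "A", "B"]))
```

The sketch covers only the ordering and reiteration variations; the case in which object and reaction share one shot would need a different annotation scheme.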

4.3 Alternating syntagma and reverse shots

Alternating syntagma portrays two or more series of events or interacting partners by turns. The most common example is the shot-and-reverse shot structure which depicts two interacting partners. The eliciting condition in the reverse shot structure is typically verbal (i.e., EC3 in Figure 2), although it can also be the nonverbal EC4. In an example from Patch Adams (Shadyac 1998), the first shot shows Corinne kissing Patch and the reverse shot shows Patch’s expression of excitement and they form the EC4^Expression configuration.

In interactions where the eliciting condition includes verbal information, the reverse shots are normally consistent with the speaker turns, and examining their structure can show the speaker roles and exactly how the emotion is communicated.

The general framework is illustrated in Figure 9, complementing the interaction framework in Figure 5. The interaction structures are based on the systemic functional studies of interaction (e.g., Martin and Rose 2007; O’Donnell 1990). As with Figure 5, the unit of analysis is exchange, and the focus of investigation is the options of move, such as initiation and response.

The structure of information-oriented exchange is typically K1^K2 (Martin and Rose 2007; O’Donnell 1990), which represents the eliciting condition and expression respectively. One character says something in the first shot, which is followed by another character’s reaction in the reverse shot. A piece of information can be reacted to in various ways, for example, with surprise if it is unexpected, or with indignation if it violates moral standards.

Fig. 9: Interaction structure and eliciting condition-expression configuration

The example of Figure 6 in Section 3.3.2 is a good case in point. In the first shot, Monica’s expression of emotion that she is offered the job as head chef is also the eliciting condition of Rachel’s emotion, which is expressed in the second shot. The reaction is unambiguously surprise, with the verbal signal “oh, my god” in a high-pitched voice and the open mouth. The eliciting condition and expression correspond to the speaker turns, organized by reverse shots, realized by facial expression, language and vocal features, and finally rendered as audio-visual tracks. The relation between different layers of semiosis is illustrated in Figure 10.

Fig. 10: Semiotic strata in reverse shots

In action-oriented interaction, D2 (secondary doer) reacts to D1’s (primary doer) speech act, as in the K1^K2 structure. The D1^D2^D1f structure seems more common, in which D1 reacts to D2’s acceptance/undertaking or rejection/refusal. The expected responses normally cause positive emotions and the unexpected responses cause negative emotions. The example in Figure 11 from Gladiator illustrates emotional reaction to the goal-incongruent response. Commodus orders Maximus’ death and Maximus asks Quintus to look after his family (D1) in the first shot. This request is denied (D2) in the second shot. In the third shot, Maximus screams loudly and rushes forward to attack Quintus (D1f). This is a typical D1^D2^D1f structure, organized as reverse shots and realized with verbal and nonverbal resources in medium close shots. Maximus’ anger toward Quintus is unambiguously represented with the eliciting condition (Quintus’ refusal) and his aggressive behavior.

Fig. 11: D1^D2^D1f structure in reverse shots
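The way exchange moves map onto eliciting condition and expression can be sketched as a simple data structure. The sketch assumes, as a simplification, that each move is realized in its own reverse shot and that every move after the first may be read as a reaction to the move before it; the class `Move` and the function `pair_condition_expression` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Move:
    role: str      # exchange role, e.g. "K1", "K2", "D1", "D2", "D1f"
    speaker: str   # the character who makes the move
    content: str   # a short gloss of what is said or done


def pair_condition_expression(exchange: List[Move]) -> List[Tuple[Move, Move]]:
    """Pair each move with the following move as (eliciting condition, expression).

    Assumption of this sketch: a move reacts to the move immediately before it.
    """
    return list(zip(exchange, exchange[1:]))


# The Gladiator exchange of Figure 11, glossed from the description in the text.
gladiator = [
    Move("D1", "Maximus", "asks Quintus to look after his family"),
    Move("D2", "Quintus", "denies the request"),
    Move("D1f", "Maximus", "screams and rushes forward to attack"),
]

for condition, expression in pair_condition_expression(gladiator):
    print(f"{condition.role} ({condition.speaker}) -> "
          f"{expression.role} ({expression.speaker}): {expression.content}")
```

For a K1^K2 exchange the same function yields a single pair, with K1 as the eliciting condition and K2 as the expression.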

4.4 Linear narrative syntagma and successive action shots

Linear narrative syntagma captures the eliciting condition and expression as two successive actions, namely, the action and the reaction. The actions may be continuous or discontinuous in form, but two shots depict them as succeeding actions from one participant. The shots feature what the character does or says (EC1) as eliciting condition and how he/she responds to his/her action/speech. However, such configuration of eliciting condition and expression is less common because reaction usually does not immediately follow action. The emoter often responds to the effect of his/her action, instead of the action, so there is typically a shot of the result of the action before the reaction shot. For example, in Episode 12, Season 1 of Friends, there is a scene in which Monica is playing table football with others. The first shot shows her action of playing the ball, the second shot shows the goal she scores and the third shot shows her excitement. The successive action is interrupted by the second shot, which forms a POV structure with the third shot.

To summarize, this section examines the discursive resources for organizing the eliciting condition and expression. It shows that different configurations of eliciting condition and expression are organized in different syntagmas, as shown in Figure 12.

Fig. 12: Eliciting condition-expression configuration and syntagmas
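The correspondences discussed in this section can be summarized as a lookup table. The sketch below is reconstructed from the prose of Sections 4.1 to 4.4 rather than from Figure 12 itself, and it records only the typical organization, since there is no one-to-one correspondence between configuration and filmic organization. The key "recount" and the names `TYPICAL_ORGANIZATION` and `typical_organization` are hypothetical labels introduced for illustration.

```python
# Typical filmic organization of each eliciting condition type,
# as described in the prose of this section (not a strict mapping).
TYPICAL_ORGANIZATION = {
    "recount": "single shot (reaction shot)",                  # Section 4.1
    "EC1": "linear narrative syntagma (successive action shots)",  # Section 4.4
    "EC2": "projection (POV structure)",                       # Section 4.2
    "EC3": "alternating syntagma (reverse shots)",             # Section 4.3, verbal
    "EC4": "alternating syntagma (reverse shots)",             # Section 4.3, nonverbal
    "EC5": "not organized within one syntagma",                # emoter's access not shown
}


def typical_organization(condition: str) -> str:
    """Return the typical organization for an eliciting condition label."""
    try:
        return TYPICAL_ORGANIZATION[condition]
    except KeyError:
        raise KeyError(f"unknown eliciting condition label: {condition!r}") from None


print(typical_organization("EC2"))
```

A fuller model would allow several organizations per configuration, for instance reverse shots or a single shot for two interaction turns.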


5 Conclusion

This study provides a semiotic theorization of how emotion is represented in film, complementing cognitive approaches which focus on how film elicits emotion from the viewers. We develop a semiotic framework in which the filmic representation of emotion is seen as semiotic discursive choices and we apply the stratified semiotic model to film discourse to investigate how emotive meaning is realized through the choices of verbal/nonverbal resources and filmic devices. Meanwhile, the framework also draws upon the cognitive components of emotion which provide structure to the representational choices at the semantic level. Then choice systems for the representation of the two main components of eliciting condition and expression are developed. At the discursive level, the choices available in the shot organization of the eliciting condition and expression are examined.

The paper concludes that the social semiotic approach, combined with the cognitive account of emotion structure, is able to explain how emotion is constructed in film, although not all resources are fully discussed (e.g., the use of music, color, etc.). Such semiotic discussion complements current studies which focus on the film viewer’s emotional response. It not only explains how various film techniques work to represent emotion, but is also significant for the study of viewer emotion, since character emotion is the most important source that elicits viewer emotion.

Acknowledgments: The research for this article was supported by the Interactive Digital Media Program Office (IDMPO) under the National Research Foundation (NRF) in Singapore (Grant Number: NRF2007IDM-IDM002-066).

References



Banse, Rainer & Klaus Scherer. 1996. Acoustic profiles in vocal emotion expression. Journal of Personality and Social Psychology 70(3). 614–636.

Bartlett, Frederic C. 1932. Remembering. Cambridge: Cambridge University Press.

Bateman, John. 2007. Towards a grande paradigmatique of film: Christian Metz reloaded. Semiotica 167(1/4). 13–64.

Bateman, John & Karl-Heinrich Schmidt. 2011. Multimodal film analysis: How films mean. London: Routledge.

Bednarek, Monica. 2008. Emotion talk across corpora. Basingstoke: Palgrave.

Bless, Herbert, Klaus Fiedler & Fritz Strack. 2004. Social cognition: How individuals construct social reality. Hove: Psychology Press.

Carroll, Noël. 1996. Theorizing the moving image. Cambridge: Cambridge University Press.

Carroll, Noël. 2003. Engaging the moving image. New Haven: Yale University Press.

Carroll, James M. & James A. Russell. 1997. Facial expressions in Hollywood’s portrayal of emotion. Journal of Personality and Social Psychology 72. 164–176.

Coplan, Amy. 2006. Catching characters’ emotions: emotional contagion responses to narrative fiction film. Film Studies 8. 26–38.

Ekman, Paul & Wallace V. Friesen. 1975. Unmasking the face. Englewood Cliffs, NJ: Prentice Hall.

Ekman, Paul & Wallace V. Friesen. 1978. The facial action coding system. Palo Alto, CA: Consulting Psychologists Press.

Feng, Dezheng & Kay L. O’Halloran. (forthcoming). Representing emotions in visual images: A social semiotic approach. Journal of Pragmatics.

Forceville, Charles. 2005. Visual representations of the idealized cognitive model of anger in the Asterix album La Zizanie. Journal of Pragmatics 37. 69–88.

Frijda, Nico. 1986. The emotions. Cambridge: Cambridge University Press.

Halliday, Michael. 1994. An introduction to functional grammar. London: Arnold.

Izard, Carroll E. 1971. The face of emotion. New York: Appleton Century Crofts.

Kövecses, Zoltán. 2000. Metaphor and emotion: Language, culture, and body in human feeling. Cambridge: Cambridge University Press.

Lazarus, Richard. 1991. Emotion and adaptation. New York: Oxford University Press.

Martin, James R. & Peter White. 2005. The language of evaluation. New York: Palgrave Macmillan.

Martin, James R. & David Rose. 2007. Working with discourse. London: Continuum.

Martinec, R. 2001. Interpersonal resources in action. Semiotica 135(1/4). 117–145.

Metz, Christian. 1974. Film language: A semiotics of the cinema. Oxford: Oxford University Press.

Newman, Michael. 2005. Characterization in American independent film. Madison: University of Wisconsin-Madison dissertation.

O’Donnell, Michael. 1990. A dynamic model of exchange. Word 41. 293–327.

Ortony, Andrew, Gerald Clore & Allen Collins. 1988. The cognitive structure of emotions. New York: Cambridge University Press.

Planalp, Sally. 1998. Communicating emotion in everyday life. In Peter Andersen & Laura Guerrero (eds.), Handbook of communication and emotion, 30–48. San Diego: Academic Press.

Plantinga, Carl. 1999. The scene of empathy and the human face on film. In Carl Plantinga & Greg Smith (eds.), Passionate views: Film, cognition, and emotion, 239–255. Baltimore: Johns Hopkins University Press.

Scherer, Klaus R. 2003. Vocal communication of emotion: a review of research paradigms. Speech Communication 40. 227–256.

Scherer, Klaus R. & Heiner Ellgring. 2007. Multimodal expression of emotion: Affect programs or componential appraisal patterns. Emotion 7. 158–171.

Smith, Craig & Phoebe Ellsworth. 1985. Patterns of cognitive appraisal in emotion. Journal of Personality and Social Psychology 48. 813–838.

Smith, Greg M. 2003. Film structure and the emotion system. Cambridge: Cambridge University Press.

Tan, Ed S. 1996. Emotion and the structure of narrative film: Film as an emotion machine. Mahwah, NJ: Lawrence Erlbaum.

Tseng, C. 2009. Cohesion in film and the construction of filmic thematic configuration: A functional perspective. Bremen: University of Bremen dissertation.

Wallbott, Herald G. 1998. Bodily expression of emotion. European Journal of Social Psychology 879–896.

Wierzbicka, Anna. 1990. The semantics of emotions: fear and its relatives in English. Australian Journal of Linguistics 10(2). 359–375.



Dezheng Feng (b. 1983) is a Research Assistant Professor at the Hong Kong Polytechnic University. His research interests include social semiotics, multimodal discourse analysis, and cognitive linguistics. His publications include “Visual space and ideology: A critical cognitive analysis of spatial orientations in advertising” (2011); “Intertextual voices and engagement in TV advertisements” (with P. Wignell, 2011); “The visual representation of metaphor: A social semiotic perspective” (with K. O’Halloran, 2013); and “Multimodal engagement in television advertising discourse” (2013).

Kay L. O’Halloran (b. 1958) is an associate professor at Curtin University, Australia. Her research interests include multimodal analysis, social semiotics, and mathematics discourse. Her publications include “Inter-semiotic expansion of experiential meaning: Hierarchical scales and metaphor in mathematics discourse” (2008); “Multimodal analysis within an interactive software environment: Critical discourse perspectives” (with S. Tan, B. A. Smith & A. Podlasov, 2011); “The semantic hyperspace: Accumulating mathematical knowledge across semiotic resources and modes” (2011); and “Multimodal discourse analysis” (2011).
