A report on the Multimodal Analysis Lab, Interactive & Digital Media Institute (IDMI), National University of Singapore
Challenges to Multimodal Digital Semiotics
We reported in April 2010 (SemiotiX Bulletin, Issue 1) on projects underway in the Multimodal Analysis Lab at the Interactive & Digital Media Institute (IDMI) at the National University of Singapore, which aim to develop software resources and associated techniques for multimodal semiotic research and teaching. We discussed the major issues being addressed through the development of such software, in particular: the diversity and complexity of multimodal signs and discourse within contemporary, especially digital, societies; the relations of the material and the abstract in signs with respect to computational processing of semiosis; and the relating of individual signs to the ‘semiosphere’ which they ‘inhabit’ and which “makes the specific signatory act real” (Lotman, 1984/2005: 208). We gave a brief overview of some of the affordances of contemporary software which enable semioticians to access, analyse and theorise about signs and semiosis, as well as some of its limitations, making the point that if we are to integrate the knowledge and techniques of both semiotics and the computational sciences, digital analysis must always occur in tandem with more traditional ways of working, with the semiotician in control of the technologies and techniques available.
One of the major challenges in the development of software for multimodal analysis is the variety of media types and data which such software must handle. Software development, like the human sciences, has tended towards compartmentalization during what Halliday (1991: 39) referred to as ‘the age of disciplines’; thus most relevant software applications operate on specific types of data or provide specific tools for particular tasks. An intrinsic problem for multimodal digital software, then, is the computational integration of resources for the analysis of written text, image, sound, video, hypermedia and potentially any other medium of communication (cf. Schmidt et al., 2009, on software interoperability). The development of an integrated database and interface structure capable of processing and then relating these disparate types of media and their associated analyses has proved a major challenge for the Multimodal Analysis Lab team. While working on the development of a database structure suitable to the task, we have also been working through a series of prototype interface designs (e.g. Figs. 1.1-1.2) by which we hope not merely to analyse but to picture the semiosphere at work, as the outcome of a large range of manually and algorithmically generated analyses.
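One way to picture the integration problem is as a single time-stamped annotation record shared by all media types, so that analyses of text, sound and video can be related on a common timeline. The following is a minimal sketch of such a record in Python; the field names, tier labels and `overlapping` helper are hypothetical illustrations, not the lab's actual database schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Annotation:
    media: str      # "text", "audio", "video", ...
    tier: str       # the annotation system (e.g. a speech or gesture tier)
    choice: str     # the selected category within that system
    start: float    # onset in seconds on the shared timeline
    end: float      # offset in seconds

def overlapping(a: Annotation, b: Annotation) -> bool:
    """Two annotations -- possibly from different media -- can be
    related on the common timeline if their intervals overlap."""
    return a.start < b.end and b.start < a.end

# Annotations from two different media, aligned on one timeline
speech = Annotation("audio", "speech-function", "statement", 0.0, 2.5)
gaze = Annotation("video", "gaze", "at-camera", 1.0, 3.0)
```

Because every record carries the same timeline fields regardless of its medium, queries such as "which gaze choices co-occur with statements?" reduce to interval comparisons across media.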
Fig. 1.1 GUI for Time-Stamped Annotation (Systems on Right)
Fig. 1.2 GUI for (Overlapping) Speech Analysis in Video (Text on Right)
The Digital Semiosphere
By bringing together a range of data capabilities within one computational environment, we add value to existing software tools. For example, software applications such as Praat offer powerful algorithms for analyzing sonic phenomena (waveform and spectrogram displays, fundamental frequency and formant analysis), together with tools for manipulating data (for example, slowing down a sound file, or resynthesising speech as a sequence of tones) and for navigating and annotating it (via GUI resources such as zoom, loop playback, and the insertion of boundaries and text), significantly assisting the analysis of speech and other sonic semiotic resources. Similarly powerful resources (built on platforms like Matlab) are available for image processing and video analytics, but few if any of these specialised applications afford the integration of the analyses generated – whether by human manual input or by computational processing – in such a way as to relate the analyses of these different media to one another, an essential precondition for the multimodal semiotician.
We also aim for the software to enable and encourage the integration of techniques from mathematical and other sciences. One of the novel techniques we propose for the visualization of multimodal annotation is based on state diagram representation. State diagrams, or state-transition diagrams, are widely used in computer science to describe the behavior of systems. Analogously, one can treat an unfolding multimodal phenomenon as a dynamical system, described by means of the categorical annotation systems used by the analyst (see Fig. 1.1). At any moment in time, the ‘state’ of the multimodal phenomenon is represented by a finite combination of currently active system choices. A change in state (i.e. a different combination of choices becoming active) defines a ‘transition’. In this way, we can represent conventional tier-based multimodal analysis as a directed graph, where the vertices of the graph represent states and its edges represent transitions. The direction of an edge indicates the direction of the transition, from one state to another. The graph (or network) view provides the analyst with an alternative perspective on the analysis, revealing patterns (in terms of both repetitive and marked changes) which are difficult to see in conventional representations. In what follows, we demonstrate the usefulness of state-transition diagrams for the analysis of multimodal news discourse.
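The mapping from a tier-based annotation timeline to a state-transition graph can be made concrete with a short sketch. In this minimal Python illustration, each state is a tuple of the currently active choice in each tier; the tier names and categories are hypothetical examples, not taken from the lab's actual annotation schemes:

```python
from collections import Counter

def build_state_graph(timeline):
    """Given a timeline of annotation 'states' (each a tuple of the
    currently active choice in each tier), return the vertices (states)
    and a Counter of directed edges (transitions), where repeated
    transitions accumulate weight."""
    states = set(timeline)
    transitions = Counter()
    for prev, curr in zip(timeline, timeline[1:]):
        if curr != prev:                    # a change of state = a transition
            transitions[(prev, curr)] += 1
    return states, transitions

# Hypothetical two-tier annotation: (speaker, discourse move) per time step
timeline = [
    ("anchor", "statement"),
    ("anchor", "statement"),   # same state: no transition recorded
    ("expert", "response"),
    ("anchor", "question"),
    ("expert", "response"),
]
states, transitions = build_state_graph(timeline)
```

Edge weights then capture the repetitive patterns mentioned above: a frequently traversed edge signals a recurrent exchange, while a weight-one edge marks a one-off change.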
Graphic Tools for Visualizing Patterns in Multimodal News Discourse
In addition to opening up new opportunities to collect, transcribe, analyze and critically engage with multimodal data (e.g. O’Halloran et al., 2010, 2011 in press; Smith et al., accepted for publication), advances in digital technology have also altered the ways in which news networks present information to their audiences (e.g., see Allan, 2006). The confluence of the internet and cable network television has led to new, variegated forms of news discourse, much of which tends to be mediated “live” through ‘talk-as-interaction’.
The following example of such news discourse is part of a larger project (Tan, forthcoming; Tan et al., submitted for publication) that examines how social identities and relationships are constructed and negotiated multimodally in semiotic space by popular business news networks such as Bloomberg, CNBC, FOXBusiness, and Reuters. Adopting a multidisciplinary perspective that combines social semiotic theory, conversation and discourse analysis, and other interdisciplinary approaches and methodologies, the study is interested in the multiple patterns, strategies, and resources that are drawn upon, often simultaneously, for representing social actors and events through the dynamic interplay of self- or delegated naming, displays of on-screen characters, video captions, titles and subtitles, graphics, logos, still images, sound effects, studio props, etc., and in how these semiotic systems interact.
In attempting to bring to light the complexities of how identities and social relations are constructed and represented in multimodal discourse, we find that charting the exchange structure of mediated interactions in terms of dialogic frames or phases (e.g., see Clayman, 1991; Clayman & Heritage, 2002; Lauerbach, 2007; Martin & Rose, 2007; Montgomery, 2007) through manual annotation in the digital interface affords the analyst a two-dimensional view of how the discourse unfolds along both the vertical (paradigmatic) and horizontal (syntagmatic) axes (see Fig. 2.1).
Accordingly, one can observe, for example, that in dyadic news reports the information concerning the event under discussion is presented largely in the form of statements addressed to the viewer (Figure 2.1, marked A), whereas the dialogue between the discourse participants is restricted to the maintenance of social (i.e. interpersonal) functions (Figure 2.1, marked B), such as greetings, calls, and minimal response tokens (e.g., see Martin & Rose, 2007; Gardner, 2001).
Figure 2.1: Discourse Structure – Representation in the Digital Annotation Interface
By visualizing the same data in the form of network graphs in circular layout format (e.g., see Donath et al., 1999, and Freeman, 2000, for a discussion of the design features of graphical interfaces in social network analysis) – and by comparing them with state data obtained from other annotated news videos in the corpus – it becomes apparent that dyadic news reports and presentations exhibit a very linear, sequential discourse structure (Figure 2.2).
Fig. 2.2: Linear Discourse Structures – Dyadic News Reports and Presentations
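The circular layout underlying these graphs is itself simple: each state is placed at an evenly spaced angle around a circle, so that a linear, sequential structure traces the rim while recursive back-and-forth exchanges show up as chords crossing the interior. A minimal sketch in Python (the state labels are hypothetical, and the positions would feed a plotting library rather than being an end in themselves):

```python
import math

def circular_layout(states):
    """Place each state at an evenly spaced point on the unit circle,
    as in circular network-graph layouts; returns {state: (x, y)}."""
    n = len(states)
    return {s: (math.cos(2 * math.pi * i / n),
                math.sin(2 * math.pi * i / n))
            for i, s in enumerate(states)}

# Hypothetical discourse states from an annotated news exchange
positions = circular_layout(["greeting", "statement", "question", "response"])
```

Drawing the weighted transition edges between these fixed positions then makes the contrast between Figures 2.2 and 2.3 visible at a glance: rim-hugging arcs for linear reports, dense criss-crossing chords for interactive panels.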
Live two-way interviews, commentaries or debates, or adversarial interviews, in contrast, are characterized not only by short and frequent turn-exchanges per speaker, but also by very complex, interactive, and recursive, non-linear exchange structures, which are also the distinctive feature of multi-party panels and round table discussions (Figure 2.3).
Fig. 2.3: Non-linear Discourse Structures – Affiliated Commentaries, Panels and Round Table Discussions
The complex nature of these often highly-charged, interactive exchanges is also reflected in the amount of on-screen time that is accorded to varying degrees to representations of social actors in dynamic video footage, where certified experts have to negotiate and compete for visual (as well as dialogic) space with anchors/presenters, panelists, and embedded actualities, with limited or no exclusive on-screen time (see Figure 2.4; representations of certified experts are framed in cyan).
Fig. 2.4: On-Screen Presence of Social Actors in Affiliated Round Table Discussions
By comparison, dyadic interviews with certified experts exhibit less complicated discourse structures (Figure 2.5), which are similarly reflected in on-screen representations, where certified experts together with the anchor/presenter form the central core of a visual cluster through which the rest of the story or event is mediated (Figure 2.6).
Fig. 2.5: Discourse Structure of Dyadic Interviews with Certified Experts
Fig. 2.6: On-Screen Presence of Social Actors in Dyadic Expert Interviews
The above example thus highlights how mid-level visualization tools, such as time-based network graphs, can yield valuable insights into the idiosyncratic representational choices that these networks adopt for different discourse types.
The Digital Semiologist
One of the advantages of our interdisciplinary team is not just that we have computer scientists with expertise in sound and vision, as well as mathematicians, but that the team is dedicated to the task of developing resources and techniques for digital semiotics (rather than being a computational project per se, as is customary). This has allowed us, from the outset and throughout, to prioritise the study of semiotics as the primary goal, directing all computational and other associated scientific research and development within the lab towards that goal: to provide technologies and techniques to furnish, in digital form, a “science that studies the life of signs within society” (Saussure, 1916/1974: 17).
As mentioned at the beginning, the semiotician must always be at the heart of the design of any dedicated software for semiotics research: computational resources are best thought of as tools in the hand of the trained analyst, rather than as autonomous agents, algorithms constructed in disciplinary isolation from the projected user. This is a crucial principle, one we have affectionately dubbed the ‘vacuum cleaner analogy’: we aim not to build a vacuum that can clean the house automatically, but to put into the hands of skilled cleaners a powerful machine with which to accomplish the necessary task. This brings interdisciplinarity to the heart of the project: software resources are fast and accurate but unable to think and deal in abstractions; humans are intelligent but slow and prone to mistakes, particularly in analysis over large data sets; but computational tools placed firmly in the hands of trained human analysts constitute an immense potential for semiological research, analogous to inventions such as the printing press, telescope and microscope.
Report authored by Kay O’Halloran (Lab Director), Bradley Smith, Sabine Tan, Alexey Podlasov, Stefano Fasciani and Alvin Chua, on behalf of the Multimodal Analysis Lab team.
Allan, Stuart (2006). Online News: Journalism and the Internet. Maidenhead: Open University Press.
Clayman, Steven E. (1991). News Interview Openings: Aspects of Sequential Organization. In Paddy Scannell (ed), Broadcast Talk, pp. 48-75. London; Newbury Park: Sage Publications.
Clayman, Steven, and Heritage, John (2002). The News Interview: Journalists and Public Figures on the Air. New York: Cambridge University Press.
Donath, Judith, Karahalios, Karrie, and Viegas, Fernanda (1999). Visualizing Conversation. Journal of Computer-Mediated Communication. Vol. 4(4) (http://jcmc.indiana.edu/vol4/issue4/donath.html)
Freeman, Linton (2000). Visualizing Social Networks. Journal of Social Structure, Vol. 1(1) (http://www.cmu.edu/joss/content/articles/volume1/Freeman.html)
Gardner, Rod (2001). When Listeners Talk: Response Tokens and Listener Stance. Amsterdam; Philadelphia: John Benjamins.
Halliday, Michael A. K. (1991). Towards probabilistic interpretations. In Eija Ventola (ed.), Trends in Linguistics Studies and Monographs 55: Functional and Systemic Linguistics Approaches and Uses, pp. 39-61. Berlin: Mouton de Gruyter.
Lauerbach, Gerda E. (2007). Presenting Television Election Nights in Britain, the United States and Germany: Cross-Cultural Analyses. In Anita Fetzer and Gerda E. Lauerbach (eds.), Political Discourse in the Media: Cross-Cultural Perspectives, pp. 315-375. Amsterdam; Philadelphia: John Benjamins.
Lotman, Yuri (1984/2005). On the Semiosphere. Sign Systems Studies, 33(1), 205-229.
Martin, James R., and Rose, David (2007). Working with Discourse: Meaning Beyond the Clause (2nd edition). London; New York: Continuum.
Montgomery, Martin (2007). The Discourse of Broadcast News: A Linguistic Approach. New York: Routledge.
O’Halloran, Kay L., Tan, Sabine, Smith, Bradley A., and Podlasov, Alexey (2010). Challenges in Designing Digital Interfaces for the Study of Multimodal Phenomena. Information Design Journal, 18(1), 2-12.
O’Halloran, Kay L., Tan, Sabine, Smith, Bradley A., and Podlasov, Alexey (2011, in press). Multimodal Discourse: Critical Analysis within an Interactive Software Environment. Critical Discourse Studies, 8(2).
Rohlfing, K., Loehr, D., Duncan, S., Brown, A., Franklin, A., Kimbara, I., et al. (2006). Comparison of Multimodal Annotation Tools – Workshop Report. Gesprächsforschung – Online-Zeitschrift zur verbalen Interaktion, 7, 99-123.
Saussure, Ferdinand de (1916/1974). Course in General Linguistics (trans. Wade Baskin). London: Fontana/Collins.
Schmidt, Thomas, Duncan, Susan, Ehmer, Oliver, Hoyt, Jeffrey, Kipp, Michael, Loehr, Dan, Magnusson, Magnus, Rose, Travis, and Sloetjes, Han (2009). An Exchange Format for Multimodal Annotations. In Michael Kipp, Jean-Claude Martin, Patrizia Paggio and Dirk Heylen (eds.), Multimodal Corpora: From Models of Natural Interaction to Systems and Applications. Berlin: Springer.
Smith, Bradley A., Tan, Sabine, Podlasov, Alexey, and O’Halloran, Kay L. (accepted for publication). Analyzing Multimodality in an Interactive Digital Environment: Software as Metasemiotic Tool. Social Semiotics.
Tan, Sabine (forthcoming). Multimodal Approaches to Business News Discourse Mediated on the Internet and Television. Unpublished PhD thesis, National University of Singapore.
Tan, Sabine, Podlasov, Alexey, and O’Halloran, Kay L. (submitted for publication). Re-Mediated Reality and Multimodality: Graphic Tools for Visualizing Patterns in Representations of On-line Business News.