Easy and efficient access to large amounts of data has become an essential aspect of our everyday life. In this paper we investigate possibilities of supporting information representation through the combined use of multiple modalities of perceptions such as sight, touch and kinesthetics. We present a theoretical framework to analyze these approaches and exemplify our findings with case studies of three emergent projects. The results are a contribution to a larger discussion of multimodal information representation at the intersection of theory and practice.
Contemporary culture is built upon massive heaps of data. We live in an increasingly complex world where every individual has to deal with data and make data-driven decisions. Given this development, it is necessary to make data accessible and readable for large, non-specialist audiences.
Information representation plays a more vital role than ever as it becomes an interpretive layer to the knowledge inherent in raw data. The common approach is to focus on visualization in order to alleviate the burden of combing through endless amounts of data. We argue, however, that we must also look at modalities beyond vision — tactile and kinesthetic in particular — in order to make use of a fuller range of human cognitive abilities.
As proponents of grounded cognition argue, all aspects of our senses play a role in cognitive processing (Barsalou, 2009) and can thus be leveraged to support our understanding and handling of data. In this paper we look at several examples of information representation that employ multiple sensory modalities.
In the first part of this article we present an initial framework for classifying approaches that enhance the cognitive efficacy of information representation. With this framework we turn to three emergent data visualization projects which help to illuminate the conceptual and to concretize the abstract. This framework allows us to compare the projects’ methods and chart their interventions as part of a larger effort toward enhanced information representation. We analyze how these representations reflect on such qualities as the social and the affective.
While the projects vary in size, scope and stage of development, they exemplify possibilities for emergent forms of information representation meant to enhance more traditional forms. Our method is a type of “critical making” characterized by a cycle of rapid prototyping, user feedback, and critical analysis. Reflecting upon these projects and subjecting them to outside inspection lends rigor and improves their further iterations.
Each of the three authors is involved in one or more of these projects, and all projects bring forward aspects of multimodal information representation. In this sense our focus is not predominantly on the projects but rather on developing a high-level view of their applied strategies.
Since there is no one-size-fits-all solution to data management and analysis, we need a framework that allows us to specify epistemic enhancement strategies independently of a specific dataset and its method of perceptual representation. We need one with enough structure for analytical and explanatory purposes, yet enough flexibility to respond to changing circumstances. The concepts developed by Paul Humphreys (2004) resonate with the requirements of such a framework.
Humphreys argues that humans extend their natural observational skills with instruments (e.g., with telescopes and microscopes) and their analytical abilities with computational models (e.g., with algorithms and simulations). These extensions are of three basic types: extrapolation, conversion and augmentation, which we briefly outline as follows. Extrapolation extends an existing modality such as vision in order to see more or further. Conversion takes phenomena that are accessible to one sensory modality and converts them to others, such as the visual display of sonic information. Finally, augmentation allows the detection of phenomena that cannot be accessed by human sensory organs at all, such as magnetism. Since such phenomena are accessible to human perception only through technical means, augmentation is a qualitatively different type of extension.
The table below provides a summary of Humphreys’ framework:

Type of extension | Description | Example
Extrapolation | Extends an existing modality to perceive more or further | Telescopes extending vision
Conversion | Translates phenomena from one sensory modality to another | Visual display of sonic information
Augmentation | Detects phenomena inaccessible to any human sense | Magnetism
Below we introduce the three case studies. After a brief overview, we analyze and compare their various strategies for cognitive enhancement.
In the last decade, a large amount of public and governmental data has been made available through Web sites such as www.data.gov (U.S.) and data.norge.no (Norway). Available data include diverse types, ranging from social to geospatial data, from climate to survey data. However, even though the data is publicly available, it is often difficult for non-specialists to access or understand.
The VizBox is a platform for visualizing geographical and spatial data to make information available and engaging for non-specialists, and thereby reach a wider audience. Currently, the VizBox is an experimental prototype that allows graphics to be presented on a topographic model of the southern part of Norway (see www.vizbox.no).
The platform consists of a physical box with a three-dimensional model placed on top of it. Graphics are projected onto the surface from within the box (Figure 1). Users can interact with the model using gestures, and additional information is presented on an adjacent touch screen. One of the benefits of this setup is that it is relatively simple and cheap to build and place in various contexts.
Figure 1: The basic VizBox setup. Graphics are projected onto a physical model from within the box via a mirror. Gestures are captured using a Leap Motion sensor. Here, a model of the School of Cinematic Arts complex of USC is placed on the box.
The main advantage of the VizBox is to allow the combination of tangible representations (physical and graspable) with intangible representations (e.g., sound, image, video) (Ishii, 2008). As a Tangible User Interface (TUI), it makes use of people’s sophisticated skills for sensing and manipulating their physical environment (Ullmer and Ishii, 2000).
The VizBox platform accommodates a range of diverse 3D surfaces, forms of information representation, and modes of interaction. In this paper we focus on one specific application: a topographic model of Norway that is used to visualize and interact with statistical data (Figure 2).
Figure 2: 3D printed model of the southern part of Norway.
The model was produced using 3D printing technology. The thickness of the model is less than 1 mm, so that graphics can be projected onto the model from within the box and be visible from the outside. As a result, the graphics do not disappear when users move their hands above the model; if the graphics were projected from above, the hands would cast shadows.
The topographic model of Norway is primarily used for presenting statistical data related to geographical areas in Norway. This includes data on two different topics. The first data set is from a survey on overall life satisfaction carried out by the Agency for Public Management and eGovernment (Difi). The survey was done in 2013 and polled 30,000 Norwegians. The second set contains data from Statistics Norway on crime and wealth in different municipalities.
Geographical variation and regional differences are among several interesting parameters to investigate and visualize in these datasets. Therefore, the average scores for satisfaction, crime and wealth have been calculated for each municipality, and are represented by colors ranging from the lowest average score to the highest. The result, which is commonly referred to as a choropleth map, is then projected onto the topographic model of Norway (Figure 3).
Figure 3: Graphics projected onto the model of Norway, with additional information presented on an adjacent screen as the user points to different areas of the model. Left: a choropleth map represents average life satisfaction for each municipality, on a scale from 1–100. All municipalities fall within the range of 67–94, and are colored accordingly from brown to yellow. Right: a choropleth represents the number of legal violations in each municipality in 2012, ranging from 0 to 205, colored from blue to yellow.
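The score-to-color mapping behind such a choropleth can be sketched in a few lines. The municipality names and scores below are hypothetical placeholders, and the simple linear interpolation between a brown and a yellow endpoint is an illustrative assumption rather than the VizBox’s actual implementation:

```python
def lerp_color(c1, c2, t):
    """Linearly interpolate between two RGB colors; t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c1, c2))

def choropleth_colors(scores, low=(139, 69, 19), high=(255, 255, 0)):
    """Map each municipality's average score onto a brown-to-yellow ramp."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1  # avoid division by zero if all scores are equal
    return {name: lerp_color(low, high, (s - lo) / span)
            for name, s in scores.items()}

# Hypothetical average life-satisfaction scores (scale 1-100)
scores = {"Oslo": 78, "Bergen": 82, "Tromsø": 94, "Kristiansand": 67}
colors = choropleth_colors(scores)
```

The lowest-scoring municipality receives the brown endpoint, the highest the yellow endpoint, and all others proportional mixtures of the two.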
As users point a finger at a location on the map, supporting information is revealed on an adjacent screen: the name of the municipality and its score in relation to the overall distribution of all municipalities. Consequently, the combination of gestures and the adjacent screen allows users to get ‘details-on-demand’ (Shneiderman, 1996).
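The details-on-demand step can be sketched as a simple lookup from a pointed-at coordinate to a municipality and its rank among all scores. The rectangular region bounds and scores here are hypothetical placeholders; the prototype presumably works with actual municipality boundaries:

```python
def lookup_details(x, y, regions, scores):
    """Return (name, score, rank) for the region containing (x, y), or None.

    regions: name -> (x0, y0, x1, y1) bounding box
    scores:  name -> average score
    """
    for name, (x0, y0, x1, y1) in regions.items():
        if x0 <= x <= x1 and y0 <= y <= y1:
            ordered = sorted(scores.values(), reverse=True)
            rank = ordered.index(scores[name]) + 1  # 1 = highest score
            return name, scores[name], rank
    return None  # pointed outside every region

regions = {"Oslo": (0, 0, 10, 10), "Bergen": (10, 0, 20, 10)}
scores = {"Oslo": 78, "Bergen": 82}
details = lookup_details(5, 5, regions, scores)  # ('Oslo', 78, 2)
```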
The Large Scale Video Analytics project (LSVA) endeavors to make vast filmic archives accessible for the purposes of research and scholarship. In an era in which it is nearly as easy to author with video as it is to write with words, the ability to efficiently analyze filmic sources is vital. The LSVA joins a growing trend of cultural analytics with image-based media (cf., Hochman and Manovich, 2013).
Indeed, digital video — whether natively digital or analogue film that has been digitized — is a widespread form of cultural production. But even as image-based media are exploding as a means of communication and expression, the resultant archives are massive, disconnected datasets. Thus, our ability to research this crucial aspect of contemporary culture is severely stymied by limitations in semantic image retrieval. We have to cope with incomplete metadata, and the lack of a precise understanding of the actual content of any given archive. Indexing and analyzing video data is by no means a technologically trivial problem (Ranguelova and Huiskes, 2007) and, as such, it is best tackled using multiple approaches and diverse methodologies.
A collaboration between supercomputing scientists, cinema scholars, visualization experts and digital humanists, the LSVA facilitates discovery by providing new visual and linguistic strategies for searching and analyzing these large video collections (Kuhn, et al., 2012). The project uses emergent supercomputing architecture to integrate multiple algorithms for image recognition, novel data visualization methods, and crowdsourced tagging.
The LSVA’s back-end is highly complex, taking advantage of high-performance supercomputing architecture. This makes the system powerful enough to process graphics files that are large and require a substantial amount of processing, much of which cannot be done in advance of a given user inquiry. But the complexity of the computing architecture is problematic for non-computer scientists. As such, the system requires a user interface that conceals much of this structural complexity in order to make it amenable to humanities researchers. The Medici interface is user-friendly and resembles the layout of video hosting sites, with a player window and a staging area of clips (Figure 5).
Figure 4: Cuts from a visualization that highlights form more than content.
For our current purposes, the most interesting aspect of the LSVA is its array of visualization tools. These enhance research in several ways: novel visualizations employ spatial and temporal simultaneity, revealing unique aspects of a single film sequence; comparative visualizations represent relationships among multiple films within an archive; and, finally, the integration of visualization imagery becomes an input tag and a front end process that feeds the Medici content management system and enhances word-based labels, helping to close the semiotic gap that occurs when words are applied to images.
Figure 5: The LSVA’s Medici interface in query mode.
The third case study develops a new search tool and interface for libraries, turning the search process into a reflective and engaging experience. The role of the library in the digital age is shifting. Due to projects like Google Books, Wikipedia and the like, the library, for many, has lost its role as the primary place to go for research (Shuler, 2007). The LibViz project intervenes in this situation, and our underlying questions are: How can the library remain attractive, and how can it carry its distinctive opportunities, such as serendipitous search, the availability of physical knowledge objects, and the integration of the search process into the creative and reflective process of research, into the digital realm?
The focus of the LibViz project is to rethink and re-design the search process. Our main approach rests on making the search interface more easily accessible and on implementing digital equivalents of some of the important opportunities of the physical library. Current information management infrastructures in the library sector are well developed and powerful. But as studies indicate, a lot remains to be done on the level of usability of search tools and the representation of search data; for an overview, see Wallace (2008).
The current prototype of the LibViz project works with select collections of the libraries of the University of Southern California. The library holdings comprise a large collection of books, rare books, manuscripts, images, audio and video sources, and various other classes of objects. The heterogeneous holdings present the interesting challenge of accommodating numerous object-types in one search system.
The LibViz project extends the communication modes employed in information representation beyond the textual and visual by integrating touch and proprioception. Searches on the library holdings are carried out and represented on a large format touch screen, which combines graphical and textual representation of search hits and related items.
The interaction flow of this interface uses both gestural interaction and touch, effectively extending the modalities involved in processing the search results into the domain of haptics. In the current iteration the LibViz interface is displayed on a 55” touch screen equipped with a depth camera, which senses users in front of the screen. Search results are displayed in a visual representation of relevant items, wherever possible as a rendering of the object qualities of the actual item.
The search hits are arranged in a spiral shape resembling a ‘whirlwind’, which can be navigated and spun by the user through gesture interaction. This allows the user to dynamically arrange the results, opening or narrowing the search by using different speeds of spinning and selection. A fast spin opens the search and adds a larger amount of new, more loosely related items to the cluster in the ‘whirlwind’. Slower spinning keeps the search narrow and adds fewer and more closely related items. Objects can be ‘pinpointed’ in order to use them as weighted search criteria.
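The spin-speed logic described above can be sketched as follows. The thresholds, item limits, and relatedness scores are illustrative assumptions; the actual tuning of the prototype is not specified here:

```python
def items_to_add(spin_speed, candidates, fast_threshold=2.0):
    """Select new items for the 'whirlwind' based on spin speed.

    candidates: list of (item, relatedness) pairs, relatedness in [0, 1].
    A fast spin opens the search (many, loosely related items); a slow
    spin keeps it narrow (few, closely related items).
    """
    if spin_speed >= fast_threshold:
        min_relatedness, max_items = 0.2, 10   # open the search
    else:
        min_relatedness, max_items = 0.7, 3    # keep it narrow
    ranked = sorted(candidates, key=lambda c: -c[1])  # most related first
    return [item for item, r in ranked if r >= min_relatedness][:max_items]

candidates = [("Artist book A", 0.9), ("Exhibition catalogue", 0.6),
              ("Loosely related zine", 0.3)]
broad = items_to_add(3.0, candidates)   # fast spin: all three items
narrow = items_to_add(0.5, candidates)  # slow spin: only the closest match
```

Pinpointed objects could then feed back into the relatedness scores as weighted search criteria.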
Figure 6: Functional prototype of the ‘whirlwind’ containing representations of ‘flying’ books (left). Image of the current prototype on a large format touch screen (right).
Alongside the focus on the physical component of the information representation, we are particularly interested in the object quality of the books and objects in the library holdings. Books and objects are photographed or 3D-scanned to invoke their presence as material objects. Representing object features makes items easier to evaluate and to distinguish from one another.
The current prototype of the LibViz project makes an extensive collection of artist books available through the search interface. As sculptural art objects these books have strongly distinguishing features. In order to experience them in the appropriate way, they can be explored as individual objects, as three-dimensional representations or as animated sequences demonstrating characteristic dynamics such as their unfolding or opening.
These representations are designed such that they provide sufficient information about the core qualities of an item, but do not replace it. After having seen the digital representation, we hope the user will want to see the actual item for the physical experience. The representation functions as a teaser of sorts; it is meant to be provocative rather than comprehensive.
Figure 7: Two examples of objects from the artist-book collection. The objects are represented as high-quality 3D-scans and as animations showing the dynamics of the book objects. The representations can be navigated and explored interactively.
In order to address the problem of interface unfamiliarity, we are using the technique of “narrative embedding” for the operations of the LibViz system. The metaphor of a whirlwind of books has been seen in various stories and depictions (Gaiman and McKean, 2005; Joyce, 2012). An image search for “books” in conjunction with “wings” or “birds” reveals that the visual likening of the pages of an opened book and wings is a common image. We are using this image in the representation of books in the system. The notion that a user might be able to stir up a whirlwind containing various books flying along does not seem too far-fetched given this narrative embedding.
In the following section we discuss the main interventions made by the three projects and present a mapping of how they relate to our framework of epistemic enhancement. The table below provides an overview of the key techniques.
The focus on sensory modalities is closest to Humphreys’ framework. While his main approach is directed outward, to perceive and understand the world around us through various epistemic enhancements, our interest is directed inward, asking how we can support the meaning-making process of data representations through epistemic enhancements. To put information to meaningful use we need to understand it and be able to extract from it what is relevant.
Numerous efforts have been made to support this process and a lot of these efforts have focused on technological solutions to extend the cognitive capacities of human beings. We can trace a history going from early inscription methods to Vannevar Bush’s MEMEX, and finally to the ever increasing processing speeds of modern computers deployed to sort through the heaps of data.
The other avenue aimed to support humans in their cognitive access to information focuses on their perceptual means. The most prominent example is information visualization. It uses the capacity of the visual sense to easily parse complex sets of data and find patterns among them. The sense of vision has significant advantages over the processing of abstract textual or tabular data (Few, 2009). Therefore it is a suitable approach to alleviate the burden of understanding large amounts of data.
The task of information visualization centers on finding forms of representation that leverage the visual sense and its accuracy and power to solve the tasks at hand. By distributing the cognitive load to other sensory modalities beyond the sense of vision, it is possible to further support the access to information. According to the findings of grounded cognition researchers, at any given time people perceive the situation around them and formulate a conceptualization of this situation by integrating all sensory modalities activated in this situation. This means that visual and auditory information, the motor aspects of bodily action, and even smell and taste are all involved in cognitive processing and contribute to the formation of concepts and memory (Barsalou, et al., 2003; Barsalou, 2009).
The idea of extending our epistemic access to data representations to additional sensory modalities is in keeping with the category of conversion. Humphreys formulated conversion as the process of converting information normally available to only one sensory modality to another modality. In this way the information becomes accessible in a different, enhanced way. The enhancement is achieved through the specific capacities of the newly added modality.
The motivation behind the three case studies discussed in this article is to focus on strategies to support cognitive efficiency rather than technical efficiency. We are aiming in particular to improve access to information through easier parsing and sorting, and through memory clues that help us to read and remember.
By contrast, technical approaches have seen more vigorous development than the strategies aimed at cognitive enhancement. While development in visualization techniques has been ongoing since the nineteenth century, more sustained research in visualization has existed only since the late 1980s. The National Science Foundation-sponsored “Workshop on Visualization in Scientific Computing” held in 1987 is seen as a major turning point in this direction. The report issued as a result of this workshop describes the field as emergent (McCormick, et al., 1987).
These case studies exemplify different ways of involving additional modalities beyond vision in the cognitive processing of information, which we analyze in the following sections.
The VizBox project uses visualization techniques and extends them into the realm of haptic perception, using touch and gesture. Visual representations of data are projected onto three-dimensional objects, such as a geographic profile of Norway. Users can interact with them through gesture and through touching the surface of the projection-object. The ‘extruded map’ helps users orient themselves on the map, and may enhance users’ ability to form mental representations of space (Chen and Kratky, 2013), the data, and their spatial relationships.
Abstract data are rendered more accessible through a physically embodied, spatial representation. This implementation of direct object interaction makes any supplementary control devices superfluous. Arguably, the three-dimensionality engages more fully the human sensorial apparatus than flat graphics presented on screens or paper.
It also relates to users’ familiarity with the geographic profile of the country. Data can easily be located by pointing to certain places and exploring them with immediate visual feedback. This form of tactile exploration allows users to rely on their kinesthetic and haptic abilities as well as their visual perception to engage with the information representation.
The VizBox makes use of the principle of conversion across multiple sensory modalities in order to extend the range of sensory and epistemic access to the represented data.
While the VizBox relies on tangible physical embodiment as data representation and the resulting haptic perception as a way of facilitating access to the data, the LSVA project relies on virtual embodiment. The familiar medium of film is translated into representations employing three dimensions as a way of displaying multiple variables of the films.
The LSVA has created forms such as movie cubes or movie cylinders to show cross sections of data across the entire movie. This technique can highlight movement ratios and other parameters allowing the viewer to quickly get a summary understanding of the characteristics of a film. The implied three-dimensionality of these forms lets movies appear as buildings or spatial constructs. Therefore viewers can bring their knowledge of spatial structures to bear to understand complex temporal structures with one glimpse.
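The cross-section idea behind such forms can be illustrated with a small sketch that treats a film as a three-dimensional volume of frames and extracts a single pixel row and a single pixel column across time. The synthetic clip below is a placeholder; the LSVA’s actual pipeline operates on real footage at supercomputing scale:

```python
def movie_cube_slices(frames):
    """frames: a video as a list of 2D grayscale frames (T frames, each H x W).

    Returns two temporal cross-sections through the implied 'movie cube':
    the central pixel row and the central pixel column, tracked over time.
    """
    h = len(frames[0])      # rows per frame
    w = len(frames[0][0])   # pixels per row
    return {
        "horizontal_cut": [frame[h // 2] for frame in frames],                 # T x W
        "vertical_cut": [[row[w // 2] for row in frame] for frame in frames],  # T x H
    }

# A tiny synthetic clip: 24 frames of 4x6 pixels, brightening over time
clip = [[[t / 24.0] * 6 for _ in range(4)] for t in range(24)]
cuts = movie_cube_slices(clip)
```

Stacked as images, such cuts summarize movement and brightness across an entire sequence at a glance.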
Again, a form of conversion is used to translate a temporal medium into spatialized representations in order to activate them in new ways. The approach builds on the viewer’s innate grasp of simple object relationships. By applying these unusual forms to the analysis of films, the LSVA is able to make subtexts perceivable in a visible and tangible sense.
Figure 8: Left: Movie-Cube visualization of a brief sequence of footage. Right: Cylinder view of same sequence.
In the effort to improve access to information in the context of search, the LibViz project follows several strategies. It integrates the visual and haptic sensory modalities to support cognitive processing of information obtained during the search process. Visual perception has been explored in the design of search interfaces in many ways through textual as well as pictorial forms of representation.
The LibViz project adds haptics as both touch as well as proprioception to support the formulation of a mental image of relationships between the various search results. The project joins tangibly embodied haptic sensation and implied haptic sensation. The latter is activated through the interaction with virtual three-dimensional representations of the objects users are dealing with on the screen.
The search hits are displayed on a large touch screen. Through touch interaction individual items can be examined and precise rearrangements can be made. The action of configuring the display physically implicates the user in the resulting visual constellation.
This approach alleviates the burden of visual and cognitive processing, as users are not presented with a ready-made graphic that they have to decipher and evaluate. Instead they are themselves responsible for the changes and translations altering the display. From the beginning the user has continuous control over the emergence of the visual structure.
Cognitive enhancement resting on the haptic modality has been studied in other contexts and seen as a promising enabler. The extension of involved modalities uses the benefits attributed to multimodal interaction (Bernsen, 2008). The physically controlled, dynamic display system avoids the cognitive processing load associated with what has been called “the black-box-effect of algorithms”. This effect is a problem arising when visual constellations are calculated without a transparent origination (Turkay, et al., 2011).
A similar form of cognitive load is associated with the use of traditional text-only interfaces for library search. The text needs to be read and processed on an abstract level to form a mental representation of the relationships between the different search results. The described dynamic visual representation alleviates this burden.
Presenting object features as implied haptics in the search representation helps users to identify and recognize objects. Object features support our memory of what we have seen and accessed previously. In this way the data of search results are better rooted in cognitive representations and the evaluation of search hits can be more efficient. Moreover, a preview of the item being sought makes it easier to locate on the shelf in the book stacks.
Both the tangible aspects and the implied haptic aspects of the search interface support the conversion principle of adding sensory modalities. Even though the implied haptics do not deliver a real physical experience, they pertain to the haptic sensory modality. Studies have shown that affordances perceived in object representations play an equally important role in cognitive conceptualization (Borghi, 2004; Pecher, et al., 2013).
Familiarization and defamiliarization
Familiarization and defamiliarization are two opposing processes of the accumulation or devalorization of knowledge of a certain domain or situation. They are terms known in literary theory as well as in psychology. Familiarization works through two different mechanisms. The first is the act of gaining knowledge through exposure to a specific, previously unknown, phenomenon. The second is the act of gaining knowledge through likening an unknown concept to something that is known from past experience.
Defamiliarization, in turn, is the act of making a familiar phenomenon unfamiliar. This enables the viewer to perceive it anew, outside of the habits of the familiar (‘seeing it with new eyes’). The notion of defamiliarization was first raised in the circle of Russian formalism by Viktor Shklovsky (1990). It also played an important role in the epic theater of Bertolt Brecht, who used it as a way of breaking through the patterns of what we think we know and therefore no longer question.
In its effect on cognitive efficiency the approach of using familiarization or defamiliarization is an aspect of extrapolation. Familiarization enables the user to perceive more and with more acuity, building on existing experience in order to solve problems that are unfamiliar. Defamiliarizing a supposedly known context enables a renewed analytical access.
In their use of familiarization and defamiliarization strategies the three case studies follow different approaches. The VizBox and LibViz projects use familiarization in order to communicate the unfamiliar datasets in a seamless and effective way. The LSVA project follows the opposite approach. The films analyzed in the project are abstracted from their narrative content and familiar characters and scenes in order to reveal their subtexts. This approach aims to thwart the impression that the films and their messages are familiar and easily understood.
The VizBox makes use of familiarization in order to bring otherwise abstract and unfamiliar data into a familiar domain. Public data often comes in the shape of tables and numbers, and is not easily interpreted and assessed by non-experts. The geographic representations employed in the VizBox project are familiar representations that the inhabitants can easily relate to. The map gives users a pathway into the data by providing a familiar context from which to understand and explore the data.
By treating video as data, the LSVA project helps us see it afresh and examine narrative elements in a more macroscopic way, highlighting the convergences and divergences across genres. This can help us see the limits of generic conventions; it also allows us to explore the cumulative impact of common story elements becoming naturalized and unquestioned.
The brain is often unable to distinguish between fact and fiction, particularly with image-based media that seem to be relaying some corporeal reality (Kuhn, 2013). Since all filmic texts are mediated, the boundaries between fact and fiction are, in many ways, highly problematic. This is a particularly vital function of the LSVA as video “evidence” is used to document and prosecute crimes: for instance, traffic citations are routinely issued based on infractions recorded by surveillance cameras, and video footage is now commonly used in court cases.
For more than 20 years visual interfaces have been studied in the field of search interface design. Nevertheless, as van Hoek and Mayr (2013) state, “until now only minor contributions have been adapted to today’s DLs [digital libraries].” As the main reason for this lack of adoption the authors identified the fact that those interfaces do not produce a noticeable increase in access efficiency. Even though in questionnaires the users show a preference for visual interfaces, this preference does not result in measurable increase in efficiency.
Most study authors identify a lack of familiarity with novel interface strategies as the explanation for low scores of otherwise promising interface concepts. This unfamiliarity is not a problem for information representations or applications that a user will be utilizing on an ongoing basis. In this case a training time to become familiar with the particular ways information is handled and presented is acceptable.
In a library search interface, though, the necessity for training is a problem. Ideally, users should be dealing with the evaluation of the content of their searches rather than with the operation and deciphering of visual display systems.
The LibViz project uses a narrative embedding to increase familiarity with the search operations. This kind of metaphoric embedding has been already successfully used in user interface design, for example, in the desktop metaphor of the graphical user interface. The desktop metaphor provided a familiar setting to the computer environment and allowed users to more easily and efficiently operate their machines (Vaananen and Schmidt, 1994).
Constructing a familiar metaphor for the library search process, though, is an odd task. Most potential library patrons know the action of search per se. Adopting a metaphor that encompasses the search process by leveraging another domain of knowledge or action seems counterintuitive. Instead we aim to provide a narrative embedding for all elements of the process which draws on pre-existing narrative competencies and evokes stories or traditions that users are likely to be familiar with.
Affect versus rationality
The emotional aspect of data visualization can be powerful in countering the purely rational aspect of data. While static visuals can lend a certain amount of emotion, they are limited in their ability to keep users actively engaged with complex datasets. Brain research suggests the vital need for affect in learning; rationality alone is simply not enough (Immordino-Yang and Damasio, 2007). These projects, then, are all intentionally tuned into the delicate balance between the logical and the emotional, though in markedly different ways.
The LibViz project and the VizBox both seek to bring affect to otherwise rational and ‘dry’ data by embedding narrative, gestural interaction and symbolically rich representations. The LSVA, in contrast, attempts to remove narrative and lessen affect. The goal is to allow users to examine the form and subtexts of the film itself. The narrative content and the emotional impact most films are intended to produce are rather a distraction in this analysis.
The LibViz project as well as the VizBox make use of gestures for controlling and navigating large amounts of information, so to speak ‘with our fingertips’. Such gestural interfaces often harken back to science fiction imaginations of interfaces to come. A well-known example appears in the film Minority Report (Spielberg, 2002), which made the gestural interface, placing the operator in virtuoso-like control akin to a conductor before an orchestra, a popular icon.
The use of body language as a form of narrative interface can reinforce the story aspect of the narrative embedding. It also provides an efficient and pleasurable form of interaction (Álvarez and Peinado, 2012).
In the LibViz project, the affect-quality of the search process is an important concern. The goal is to make the search process pleasurable and give it playful aspects to encourage users to spend time searching comprehensively rather than just trying to get to the item they were looking for. The importance of positive emotional states for the search process has been analyzed and documented by several researchers, such as Fulton (2009).
The narrative embedding and its effect on the ease and emotional connotation of the task of searching can be understood as a form of extrapolation. It extrapolates the cognitive and emotional attitudes users have towards search into the domain of a ‘storied’ experience. This experience is related to, but generally not activated in, the context of search. The narrative embedding not only makes the operation of the information representation easier, it also provides an emotional aspect.
In a similar way, the VizBox seeks to bring an affective dimension to rational, statistical data. It is sometimes argued that statistical data in general, and survey data in particular, is a form of information that is ‘dehumanized’: subjective human experience has been brutally removed and replaced with numerical data points.
This becomes especially apparent when the data itself concerns levels of satisfaction — affect disguised as rationality. By bringing data into a familiar, material, visual and social context, it might be argued that the information is yet again brought to life, and ‘re-humanized’. The material representation of the map of Norway may also serve as a powerful symbol for national identity. In this way feelings of national, regional or local pride can be activated and used to produce engagement.
The belief that knowledge is socially constructed has enjoyed a rich history in education, linguistics, philosophy and the arts. Although the approach is nuanced differently in different disciplines, social constructivist approaches share certain tenets: a rejection of the completely autonomous individual and complete objectivity. While data seems to be objective in its descriptive features and its measurement of artifacts, there is increasing attention to the constructed nature of data itself (Drucker, 2011; Markham, 2013) as well as its various types of representation. Paired with the emergence of distributed social cognition, social networking, and collective intelligence, the social aspects of data representation are an increasingly important consideration.
We can distinguish two forms of the social component in these projects: direct and deferred. We might also refer to these social interactions as synchronous and asynchronous as users either work in real time to use a data representation, or they leave traces for other users who come after. Each of these projects contains some aspect of the social and we link this aspect to Humphreys’ extrapolation since several brains extend the capabilities of the individual brain.
The VizBox offers opportunities for direct (and synchronous) social interaction. The physical presence of the VizBox itself enables and encourages social interaction and collaboration. This is a common advantage of tangible user interfaces, as pointed out by Hornecker and Buur (2006). Such interfaces lend themselves to face-to-face interaction, as several people can easily look at the same data at the same time, and still be able to interact with each other.
Because of its materiality and physical size, the VizBox can be placed in a context that allows it to work as a social object; as a result, it might reach new audiences and allow for serendipitous encounters.
The LSVA offers opportunities for deferred or asynchronous social interaction in the form of crowdsourced tagging. This is a crucial aspect of the project which allows the system to become more valuable the more it is used. Using words to describe images is problematic. Not only are there varying words for similar concepts (for instance, the words cease, desist, stop, end, terminate all mean roughly the same thing) — there are also multiple natural languages that should be considered. Thus, the more tags, the better.
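The vocabulary problem described above can be sketched in code. The following is a minimal illustration, not the LSVA’s actual implementation: it assumes a hand-built synonym map (a real system might draw on lexical resources or multilingual dictionaries) and shows how independently contributed tags can be normalized so that synonymous terms reinforce one concept rather than fragmenting it.

```python
from collections import Counter

# Hypothetical synonym groups; a production system might use a lexical
# database or multilingual resources instead of a hand-built map.
SYNONYMS = {
    "cease": "stop", "desist": "stop", "end": "stop", "terminate": "stop",
}

def normalize(tag):
    """Map a tag to a canonical form so synonymous tags reinforce each other."""
    return SYNONYMS.get(tag.lower(), tag.lower())

def aggregate_tags(raw_tags):
    """Count tags contributed independently by many users."""
    return Counter(normalize(t) for t in raw_tags)

counts = aggregate_tags(["stop", "Cease", "end", "running", "terminate"])
# The four synonyms all collapse onto 'stop', strengthening that concept.
```

With more contributors, the counts become a rough measure of consensus, which is why, as noted above, more tags make the system more valuable.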
Moreover, distributed social cognition, a form of collective intelligence that considers interactions between and among humans but also between humans and machines, suggests that the social aspect of the LSVA could lead to augmentation. That is to say that the combination of computer vision and natural language tagging may eventually make the whole greater than the sum of its parts. Many users make tagging decisions independently of each other, after which the interface compiles the tags and, combined with machine querying, creates rich search opportunities. The computer can be seen as another actor in this social interaction: the LSVA’s algorithms, which are based on Euclidean geometry for feature extraction and pattern recognition, can parse images at the level of the individual pixel, a granularity invisible to the human eye.
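The paper does not detail the LSVA’s feature-extraction pipeline, but the Euclidean principle it mentions can be sketched. In this toy example (all names and the feature choice are illustrative assumptions), each image is reduced to a feature vector and similarity is measured as Euclidean distance between vectors; real systems use far richer descriptors such as color histograms or edge statistics.

```python
import math

def features(image):
    """Toy feature vector: mean brightness per row of a grayscale image,
    given as a list of rows of pixel values. Real pipelines extract far
    richer descriptors (color histograms, edges, motion, etc.)."""
    return [sum(row) / len(row) for row in image]

def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(query, catalog):
    """Return the key of the catalog image whose features are closest
    to those of the query image."""
    q = features(query)
    return min(catalog, key=lambda k: euclidean(q, features(catalog[k])))
```

Pixel-level comparisons of this kind operate below the threshold of human perception, which is what allows the machine to complement, rather than duplicate, the human taggers.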
The interaction techniques of the LibViz project effectively realize the possibility for several people to collaborate in a search. Both gesture and touch input on a big screen allow multiple users to view and interact with the search. The interface hands general control of the ‘whirlwind’ to the person who was first tracked in front of the screen. At higher levels of detail, though, multiple users can operate together and explore clusters or individual search hits. In this way several people can collaboratively parse the search results and find shared strategies to make their search more efficient.
Further, the co-presence in front of the screen enables a discussion between the users. This allows members of research groups to share and discuss their search results in real-time. It thus eliminates the need for a second step of sharing results among the group members after the individuals have completed their searches.
This form of real-time search collaboration is in line with our focus on the physical and immediate material aspect of the real object collection. It offers more efficient collaboration than a time-deferred online search. According to Marti Hearst, “much online interaction on social sites is for the social experience of the interaction, rather than for problem-centric information seeking.” Only a small percentage of questions on social network sites are focused on factual information search (Hearst, 2011).
Real-time collaboration in search is seen as a promising strategy and has been explored in several cases (Pickens, et al., 2008). Nevertheless, besides the focus on real-time collaboration, the LibViz project also implements a form of time-deferred collaboration. Previous searches are analyzed and inform the constellation of items in the ‘whirlwind’ in future searches. In this way the expertise that other users have manifested in their searches accrues in the system and can be used for intelligent associations of items and search hits.
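The time-deferred mechanism just described can be illustrated with a small sketch. This is an assumption-laden simplification, not the LibViz implementation: it records which items were explored together in past search sessions and exposes a co-occurrence weight that later sessions could use to place associated items near each other in the ‘whirlwind’.

```python
from collections import defaultdict
from itertools import combinations

class SearchMemory:
    """Accrues item co-occurrence from past search sessions; later
    sessions can use the weights to associate related items."""

    def __init__(self):
        self.cooccur = defaultdict(int)

    def record_session(self, items_viewed):
        """Register that these items were explored in one search session."""
        for a, b in combinations(sorted(set(items_viewed)), 2):
            self.cooccur[(a, b)] += 1

    def association(self, a, b):
        """Strength of association between two items across all sessions."""
        key = tuple(sorted((a, b)))
        return self.cooccur.get(key, 0)

mem = SearchMemory()
mem.record_session(["atlas", "globe", "map"])
mem.record_session(["globe", "map"])
```

Each recorded session strengthens the links among the items it touched, so the expertise of earlier users accrues in the system, as described above.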
This paper examines several strategies to enhance epistemic access to data representations. Focusing on the cognitive side rather than the technological, we have analyzed three emergent data representation projects. Our goal is to empirically develop a categorization of different methods of epistemic enhancement. Along with those case studies we present a theoretical framework to classify epistemic enhancement.
The result is four main areas of epistemic enhancement: the integration of multiple sensory modalities (at present, visual and haptic perception), strategies of familiarization and defamiliarization, affective qualities, and social interaction.
Natural next steps would be to conduct longer-term user studies to determine measurable effects of multimodal information representation strategies. These studies should be designed such that they avoid skewing the results through the effect of different levels of familiarity and training.
In the future, more systematic observational studies will be conducted. Once revisions based on these studies are integrated, we aim to test other sensory modalities such as sound. In the case of the LibViz project, sound could enhance orientation in the interface and support the evaluation of search results. For the LSVA project it will be important to extend the film analysis to sound as a way to access undercurrents of meaning coded in that aural register.
All findings stemming from these three projects will have to be examined in other data contexts in order to test the generality of the discussed strategies. In the case of the VizBox, first steps have already been undertaken to explore other three-dimensional surfaces and contexts for using the platform. As a first step, a model of the complex of the School of Cinematic Arts in Los Angeles has been tested in order to demonstrate the wider potential of the VizBox as a platform for visualization and interaction (Figure 9).
Figure 9: Graphics projected onto the model of the School of Cinematic Arts at USC. Left: By pointing to individual buildings, relevant information and media content appears on an adjacent screen. Right: By selecting an event on the adjacent screen, the building in which the event is taking place is highlighted, and a line of moving dots shows where you should go to get to the right building.
Our approach balances theoretical reflection with the empirical insight of practical project realization. Even though most of the projects are still ongoing, we feel it is meaningful to share their current state with the larger community of researchers in the field. We hope that by refusing to confine ourselves exclusively to high-level theoretical reflection on finished projects, we can better discover the limits of both the theory and its various applications.
We also believe this approach signals our desire to contribute to an ongoing conversation, rather than shutting down inquiry or attempting to have the last word on any of the topics raised. This work is but the tip of a massive iceberg and there is a need for further research across many fields and mobilizing multiple methods.
About the authors
Andreas Kratky is a media artist and assistant professor in the Division of Interactive Media and Games and the Division of Media Arts + Practice in the School of Cinematic Arts at the University of Southern California. His work is broadly interdisciplinary and comprises research in human computer interaction and digital humanities as well as numerous award-winning media art projects, which have been shown internationally in institutions such as the ICA in London, ICC in Tokyo, HKW in Berlin, Centre Georges Pompidou in Paris, and REDCAT in Los Angeles.
E-mail: kratky [at] gmail [dot] com
Virginia Kuhn is an associate professor in the Division of Media Arts + Practice in the School of Cinematic Arts at the University of Southern California. Her work centers on digital rhetoric and the hybrid texts it engenders, those that blur the lines between the factual and the fictive, between word and image, between art and argument. Her work can be found in a variety of peer-reviewed print and digital journals and she serves on the editorial boards of several periodicals. She directs a graduate certificate in Digital Media and Culture and teaches a variety of classes in new media, all of which marry theory and practice.
E-mail: virginiakuhn [at] gmail [dot] com
Jon Olav Eikenes is an interaction designer from Norway, currently working with data visualization and interface design in Oslo. He earned his Ph.D. on kinetic interface design from the Oslo School of Architecture and Design in 2010, in which he coined the term ‘navimation’ to describe and analyze visual movement in screen-based interfaces. Eikenes visited the School of Cinematic Arts in Los Angeles as a Fulbright researcher in 2013-2014, where he studied and explored the potentials of data visualization.
E-mail: jonolav [dot] eikenes [at] gmail [dot] com
N. Álvarez and F. Peinado, 2012. “Exploring body language as narrative interface,” In: D. Oyarzun, F. Peinado, R.M. Young, A. Elizalde, and G. Méndez (editors). Interactive storytelling: Fifth International Conference, ICIDS 2012, San Sebastián, Spain, November 12-15, 2012. Proceedings. Lecture Notes in Computer Science, volume 7648. Berlin: Springer, pp. 196–201.
doi: http://dx.doi.org/10.1007/978-3-642-34851-8_19, accessed 19 May 2015.
A.M. Borghi, 2004. “Object concepts and action: Extracting affordances from objects parts,” Acta Psychologica, volume 115, number 1, pp. 69–96.
doi: http://dx.doi.org/10.1016/j.actpsy.2003.11.004, accessed 19 May 2015.
L.W. Barsalou, 2009. “Simulation, situated conceptualization, and prediction,” Philosophical Transactions of the Royal Society of London B, volume 364, number 1521, pp. 1,281–1,289, and at http://rstb.royalsocietypublishing.org/content/364/1521/1281, accessed 19 May 2015.
doi: http://dx.doi.org/10.1098/rstb.2008.0319, accessed 19 May 2015.
L.W. Barsalou, W.K. Simmons, A.K. Barbey, and C.D. Wilson, 2003. “Grounding conceptual knowledge in modality-specific systems,” Trends in Cognitive Sciences, volume 7, number 2, pp. 84–91.
doi: http://dx.doi.org/10.1016/S1364-6613(02)00029-3, accessed 19 May 2015.
N.O. Bernsen, 2008. “Multimodality theory,” In: D. Tzovaras (editor). Multimodal user interfaces: Signals and communication technologies. Berlin: Springer, pp. 5–29.
doi: http://dx.doi.org/10.1007/978-3-540-78345-9_2, accessed 19 May 2015.
T. Chen and A. Kratky, 2013. “Touching buildings — A tangible interface for architecture visualization,” In: C. Stephanidis and M. Antona (editors). Universal access in human-computer interaction: Design methods, tools, and interaction techniques for eInclusion. Lecture Notes in Computer Science, volume 8009. Berlin: Springer, pp. 313–322.
doi: http://dx.doi.org/10.1007/978-3-642-39188-0_34, accessed 19 May 2015.
J. Drucker, 2011. “Humanities approaches to graphical display,” Digital Humanities Quarterly, volume 5, number 1, at http://www.digitalhumanities.org/dhq/vol/5/1/000091/000091.html, accessed 4 April 2014.
S. Few, 2009. Now you see it: Simple visualization techniques for quantitative analysis. Oakland, Calif.: Analytics Press.
C. Fulton, 2009. “The pleasure principle: The power of positive affect in information seeking,” Aslib Proceedings, volume 61, number 3, pp. 245–261.
doi: http://dx.doi.org/10.1108/00012530910959808, accessed 19 May 2015.
N. Gaiman and D. McKean, 2005. MirrorMask. New York: William Morrow.
M.A. Hearst, 2011. “‘Natural’ search user interfaces,” Communications of the ACM, volume 54, number 11, pp. 60–67.
doi: http://dx.doi.org/10.1145/2018396.2018414, accessed 19 May 2015.
W. van Hoek and P. Mayr, 2013. “Assessing visualization techniques for the search process in digital libraries,” In: S.A. Keller, R. Schneider, and B. Volk (editors). Wissensorganisation und –repräsentation mit digitalen Technologien. Berlin: K.G. Saur, pp. 63–85.
N. Hochman and L. Manovich, 2013. “Zooming into an Instagram city: Reading the local through social media,” First Monday, volume 18, number 7, at http://firstmonday.org/article/view/4711/3698, accessed 18 March 2014.
doi: http://dx.doi.org/10.5210/fm.v18i7.4711, accessed 19 May 2015.
P. Humphreys, 2004. Extending ourselves: Computational science, empiricism, and scientific method. Oxford: Oxford University Press.
M.H. Immordino-Yang and A.R. Damasio, 2007. “We feel, therefore we learn: The relevance of affective and social neuroscience to education,” Mind, Brain, and Education, volume 1, number 1, pp. 3–10.
doi: http://dx.doi.org/10.1111/j.1751-228X.2007.00004.x, accessed 19 May 2015.
H. Ishii, 2008. “Tangible bits: Beyond pixels,” TEI ’08: Proceedings of the Second International Conference on Tangible and Embedded Interaction, pp. xv–xxv.
doi: http://dx.doi.org/10.1145/1347390.1347392, accessed 19 May 2015.
W. Joyce, 2012. The fantastic flying books of Mr. Morris Lessmore. New York: Atheneum Books for Young Readers.
V. Kuhn, 2013. “Web three point oh: The virtual is the real,” In: C. Haynes and J.R. Holmevik (editors). High Wired Redux: CyberText Yearbook. Jyväskylä, Finland: University of Jyväskylä Press, at http://cybertext.hum.jyu.fi/articles/155.pdf, accessed 18 May 2014.
V. Kuhn, M. Simeone, D. Bock, K. Franklin, A. Craig, L. Marini, and R. Arora, 2012. “Large-scale video analytics: On-demand, interactive inquiry for moving image research,” E-SCIENCE ’12: Proceedings of the 2012 IEEE Eighth International Conference on E-Science, pp. 1–5.
doi: http://dx.doi.org/10.1109/eScience.2012.6404446, accessed 19 May 2015.
A.N. Markham, 2013. “Undermining ‘data’: A critical examination of a core term in scientific inquiry,” First Monday, volume 18, number 10, at http://firstmonday.org/article/view/4868/3749, accessed 18 May 2014.
doi: http://dx.doi.org/10.5210/fm.v18i10.4868, accessed 19 May 2015.
B.H. McCormick, T.A. DeFanti, and M.D. Brown (editors), 1987. “Visualization in scientific computing,” Computer Graphics, volume 21, number 6, at http://www.sci.utah.edu/vrc2005/McCormick-1987-VSC.pdf, accessed 19 May 2015.
D. Pecher, R.M. de Klerk, L. Klever, S. Post, J.G. van Reenen, and M. Vonk, 2013. “The role of affordances for working memory for objects,” Journal of Cognitive Psychology, volume 25, number 1, pp. 107–118.
doi: http://dx.doi.org/10.1080/20445911.2012.750324, accessed 19 May 2015.
J. Pickens, G. Golovchinsky, C. Shah, P. Qvarfordt, and M. Back, 2008. “Algorithmic mediation for collaborative exploratory search,” SIGIR ’08: Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 315–322.
doi: http://dx.doi.org/10.1145/1390334.1390389, accessed 19 May 2015.
E. Ranguelova and M. Huiskes, 2007. “Pattern recognition for multimedia content analysis,” In: H.M. Blanken, H.E. Blok, L. Feng, and A.P. de Vries (editors). Multimedia retrieval. Berlin: Springer, pp. 53–95.
doi: http://dx.doi.org/10.1007/978-3-540-72895-5_3, accessed 19 May 2015.
B. Shneiderman, 1996. “The eyes have it: A task by data type taxonomy for information visualizations,” VL ’96: Proceedings of the 1996 IEEE Symposium on Visual Languages, pp. 336–343.
doi: http://dx.doi.org/10.1109/VL.1996.545307, accessed 19 May 2015.
V. Shklovsky, 1990. Theory of prose. Translated by B. Sher. London: Dalkey Archive Press.
J. Shuler, 2007. “Academic libraries and the global information society,” Journal of Academic Librarianship, volume 33, number 6, pp. 710–713.
doi: http://dx.doi.org/10.1016/j.acalib.2007.09.018, accessed 19 May 2015.
C. Turkay, J. Parulek, N. Reuter, and H. Hauser, 2011. “Integrating cluster formation and cluster evaluation in interactive visual analysis,” SCCG ’11: Proceedings of the 27th Spring Conference on Computer Graphics, pp. 77–86.
doi: http://dx.doi.org/10.1145/2461217.2461234, accessed 19 May 2015.
B. Ullmer and H. Ishii, 2000. “Emerging frameworks for tangible user interfaces,” IBM Systems Journal, volume 39, numbers 3–4, pp. 915–931.
doi: http://dx.doi.org/10.1147/sj.393.0915, accessed 19 May 2015.
K. Vaananen and J. Schmidt, 1994. “User interfaces for hypermedia: How to find good metaphors?” CHI ’94: Conference Companion on Human Factors in Computing Systems, pp. 263–264.
doi: http://dx.doi.org/10.1145/259963.260478, accessed 19 May 2015.
W. Wallace, 2008. “Academia’s big guns fight ‘Google effect’,” Guardian (22 April), at http://www.theguardian.com/education/, accessed 2 February 2014.
Received 22 August 2014; accepted 20 May 2015.
“Coping with the big data dump: Towards a framework for enhanced information representation” by Andreas Kratky, Virginia Kuhn, and Jon Olav Eikenes is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Coping with the big data dump: Towards a framework for enhanced information representation
by Andreas Kratky, Virginia Kuhn, and Jon Olav Eikenes.
First Monday, Volume 20, Number 6 - 1 June 2015
A Great Cities Initiative of the University of Illinois at Chicago University Library.
© First Monday, 1995-2017. ISSN 1396-0466.