Data are widely understood as minimal units of information about the world, waiting to be found and collected by scholars and other analysts. With the recent prominence of ‘big data’ (Mayer–Schönberger and Cukier, 2013), the assumption that data are simply available and plentiful has become more pronounced in research as well as public debate. Challenging and reflecting on this assumption, the present special issue considers how data are made. The contributors take big data and other characteristic features of the digital media environment as an opportunity to revisit classic issues concerning data — big and small, fast and slow, experimental and naturalistic, quantitative and qualitative, found and made.
Data are made in a process involving multiple social agents — communicators, service providers, communication researchers, commercial stakeholders, government authorities, international regulators, and more. Data are made for a variety of scholarly and applied purposes, oriented by knowledge interests (Habermas, 1971). And data are processed and employed in a whole range of everyday and institutional contexts with political, economic, and cultural implications. Unfortunately, the process of generating the materials that come to function as data often remains opaque, and it is certainly under–documented in the published research.
The following eight articles seek to open up some of the black boxes from which data can be seen to emerge. While diverse in their theoretical and topical focus, the articles generally approach the making of data as a process that is extended in time and across spatial and institutional settings. To borrow the familiar culinary metaphor, data are cooked, repeatedly processed, rather than raw. Another shared point of attention is meta–data — the type of data that bear witness to when, where, and how other data such as Web searches, e–mail messages, and phone conversations are exchanged, and which have taken on a new, strategic importance in digital media. Last but not least, several of the articles underline the extent to which the making of data as well as meta–data is conditioned — facilitated and constrained — by technological and institutional structures that are inherent in the very domain of analysis. Researchers increasingly depend on the practices and procedures of commercial entities such as Google and Facebook for their research materials, as illustrated by the pivotal role of application programming interfaces (APIs). Research on the Internet and other digital media also requires specialized tools of data management and analysis, calling, once again, for interdisciplinary competences and for dialogue about ‘what the data show.’
The first three articles begin to address the very idea of data, the ways in which data serve to frame particular conceptions of reality, and how meanings are ascribed to data in social contexts. In the opening contribution, Tom Boellstorff traces some of the historical and conceptual origins of data and meta–data, illustrating how current research on big data may benefit, among other things, from anthropological categories of thin versus thick description (Geertz, 1973) as well as of the raw and the cooked (Lévi–Strauss, 1969). Annette Markham explores how any given conception of data serves to frame and prefigure the domain of study, and outlines alternatives to predominant positivist framings of data and inquiry. Klaus Bruhn Jensen, in his essay, presents a model for conceptualizing and studying meta–data, simultaneously as products and processes of communication, with reference to the meta–communication that always accompanies communication in the everyday sense.
The middle group of articles elaborates on some of the methodological and analytical issues associated with big data and digital media, each article drawing on empirical work about a particular social domain. Grounding her discussion of social media metrics in the history of mass media audience measurement, Nancy Baym draws on her studies of music and musicians to identify some persistent ambiguities and limitations of metric and big–data analysis when it comes to assessing artists’ and users’ social and personal values. Rasmus Helles returns to the topic of the long tail of Internet use (Anderson, 2004), and shows how qualitative genre studies can be combined with big–data procedures in order to better account for the generative mechanisms behind the long–tail distribution of Web site use. Alex Halavais explores the potential of big data for promoting a collaborative, participatory variety of science that involves non–professional or peer scientists, as exemplified by concrete interactions on the popular discussion site Reddit (http://www.reddit.com).
The third set of articles addresses the state of the art and future of research, which are being affected in fundamental ways by new conditions of access to data as well as new conditions of disseminating findings and insights to other researchers, collaborators, and stakeholders. With special reference to Twitter, Farida Vis discusses some of the analytical resources available to social media research, while noting the relative neglect, also from a big–data perspective, of the visual — still and moving images — as key elements of what is being communicated and, hence, must be studied online. In the final article, Axel Bruns reviews some of the requirements of academic research and publishing that rely on big data, and considers ways of reconciling new methodological opportunities with classic standards of data quality and scholarly quality as such.
This special issue is intended as an invitation to more scholarly and social dialogues about the data that we all make as communicators, citizens, and consumers, and that some of us, professionally or non–professionally, take a special interest in processing and remaking for different purposes. Big data represent one more resource for understanding how people communicate and interact socially; like other data and resources, big data present methodological, political, and ethical issues and choices. Whatever we do with the data that become available and accessible to us, we are making them.
Several of the articles in this special issue were first presented as papers to a roundtable session entitled ‘Digital data — lost, found, and made’ at the Internet Research 13.0 Conference, University of Salford, U.K., 18–21 October 2012. The guest editors and presenters are grateful to the organizers of the conference and to the participants in the roundtable, who offered constructive questions and comments about the presentations. The same papers were also presented at a pre-conference seminar with the same title on 16 October 2012 at the Centre for Communication and Computing, University of Copenhagen, Denmark (http://ccc.ku.dk/). The guest editors and presenters would like to thank the Centre for Communication and Computing for organizing and financing the seminar, and the participants for their contributions to the discussion.
About the authors
Rasmus Helles is an assistant professor in the Department of Media, Cognition and Communication at the University of Copenhagen. His research addresses media sociology with a special focus on the integration of digital media in everyday life.
E–mail: rashel [at] hum [dot] ku [dot] dk
Klaus Bruhn Jensen is Professor in the Department of Media, Cognition and Communication, University of Copenhagen, Denmark, and Vice Head of its Centre for Communication and Computing. He is Life Member for Service of the Association of Internet Researchers and a Fellow of the International Communicology Institute. Recent publications include contributions to the International encyclopedia of communication (12 volumes, Blackwell, 2008–, http://www.communicationencyclopedia.com/), for which he serves as Area Editor of Communication theory and philosophy; Media convergence: The three degrees of network, mass, and interpersonal communication (Routledge, 2010); and A handbook of media and communication research: Qualitative and quantitative methodologies (Second edition, Routledge, 2012).
E–mail: kbj [at] hum [dot] ku [dot] dk
References
Chris Anderson, 2004. “The long tail,” Wired, volume 12, number 10, at http://www.wired.com/wired/archive/12.10/tail.html, accessed 1 September 2013.
Clifford Geertz, 1973. “Thick description: Toward an interpretive theory of culture,” In: Clifford Geertz. The interpretation of cultures: Selected essays. New York: Basic Books, pp. 3–30.
Jürgen Habermas, 1971. Knowledge and human interests. Translated by Jeremy J. Shapiro. Boston: Beacon Press.
Claude Lévi–Strauss, 1969. The raw and the cooked. Translated by John and Doreen Weightman. New York: Harper & Row.
Viktor Mayer–Schönberger and Kenneth Cukier, 2013. Big data: A revolution that will transform how we live, work, and think. Boston: Houghton Mifflin Harcourt.
Received 16 September 2013; accepted 17 September 2013.
“Introduction to the special issue ‘Making data — Big data and beyond’” by Rasmus Helles and Klaus Bruhn Jensen is licensed under a Creative Commons Attribution–NonCommercial–NoDerivs 3.0 Unported License.
Introduction to the special issue ‘Making data — Big data and beyond’
by Rasmus Helles and Klaus Bruhn Jensen.
First Monday, Volume 18, Number 10 - 7 October 2013