Home made big data? Challenges and opportunities for participatory social research
First Monday

by Alexander Halavais

Why should big data be the purview of only national governments, corporations, and (sometimes) academics? Many Web communications and other electronic traces are made by people as part of their everyday lives, and many of these same producers of big data have questions they want to answer about their own lives. This paper suggests some ways of thinking about big data in the context of questions people are trying to answer for themselves and their communities. It then takes an initial glance at conversations on Reddit to see how users draw on evidence, and concludes by arguing that there is space for home–made big data.


1. Introduction
2. Data and observation
3. Big, open data
4. Doing science in public
5. Reddit and making truth
6. Citation needed
7. Gathering data
8. Reddit science and science science
9. Supporting robust participatory research



1. Introduction

Data are not found, they are made, and though sometimes forgotten temporarily, this recognition remains at the root of the practice of empirical research. Today, just as many in the social sciences are considering the potential of “big data,” there are new ways in which non–academics are presenting and evaluating claims about social and human behavior. This includes discussion in new conversational forums — the coffee houses of the twenty–first century — that make claims about how we behave and how that behavior might be changed. We also see echoes of new questions about evidence in the emergence of the “quantified self” movement (Wolf, 2010), which provides tools for self–monitoring and for easily sharing the resulting information with others. At present, however, there is a gap between the collection of these data and their use. While academic researchers are increasingly interested in the ways in which large data collections can tell us more about mass society, those collecting their own data are often using them primarily for self–discovery. Bridging these two groups would yield benefits for both.

Removing some of the impediments around collecting, sharing, and — to a lesser extent — analyzing sources of data opens up new structures of collaborative, participatory research. Non–academics have long done social and behavioral research, just as non–journalists created news long before and after the professionalization of journalism. In both cases, networked communication makes it easier for non–professionals to engage on their own time and in their own ways, finding communities of like–minded individuals. The resources available to today’s amateur are different not just in scale but in type from those available in previous decades. As these interested publics gather, they form their own invisible colleges, with their own standards for exchange, existing on the fringes of academia, and often ignored.

Research consists of more than data and their analysis. Training scholars usually requires a thorough grounding in the history of a field and in the epistemological and interactional frameworks acquired by working with other researchers. As we have seen especially in the natural sciences, poor education has led to serious public misunderstanding of scientific conclusions and how they are reached, and this has made it difficult to address serious issues of public concern, from climate change to public health. This extends beyond the natural sciences to critical approaches to using evidence and making claims about human behavior and social interaction.

While “lifehacking” and the quantified self may seem like a cultural movement without reference to research in the academic world, efforts to make claims about self–improvement share a common heritage with the development of science, and represent an interesting form of participatory, open research. After suggesting a typology of public research and the place of data in it, we turn to three exemplar forums (or “subreddits”) on the popular discussion site Reddit to see how those interested in lifehacking and self–improvement make their own arguments when discussing these issues with peers. By far the most frequent appeals to evidence (when they are made at all) are exclusively to anecdotal personal experience, but there are examples of early attempts to aggregate these experiences more formally. Finally, we explore some of the ways in which lifehacking and the quantified self may lead to new forms of citizen social and behavioral science.



2. Data and observation

We could describe the process of research without ever using the word “data.” One view of the scientific model for research suggests that scientists make observations and use the records of these observations to reject hypotheses about how the world works, collectively arriving at new hypotheses to test. Data, a collection of these recorded observations, represent a relatively visible part of the work that researchers do as well as a building block of scientific argumentation. They are the evidence, external (eventually) to the researcher, that allows for claims of objectivity.

For many, data are axiomatic; something that comes before and helps make up information, knowledge, and wisdom [1]. Sherlock Holmes suggests the constructive nature of data in The Adventure of the Copper Beeches: “‘Data! data! data!’ he cried impatiently. ‘I can’t make bricks without clay’” [2]. “Datum,” as used in Latin, and applied in analytical fields like mathematics before ever being applied to observations, suggests a “given”: a starting point from which argument or proof may follow. The use by researchers (and likely by Holmes, despite the above quotation) instead begins with empirical observation and the work done to discover evidence. Data are not assumed, but created in the work of structured observation.

This view of data as somehow pre–existing seems to be resurgent in work around “big data” and “data mining.” Talking about “mining” or “gathering” data, rather than creating them, may be little more than an artful turn of phrase; something akin to Michelangelo’s (likely apocryphal) suggestion that he merely set David free of his slab of marble. Likewise, journalists do not like the idea that they “make news” or “manufacture news,” as the titles of two books examining the work of journalism suggest (Fishman, 1980; Tuchman, 1978), since this would imply fabrication. Rather, they “gather facts” [3]. Eliding a history that suggests that conditions, contexts, methods, and instruments profoundly affect the data collected, social scientists in particular seem to want to assume that gathered data are somehow purer than data created through more active processes.

Those who promote the use of unobtrusive measures in the social sciences do not generally defend this idea of purity. In his classic on the subject, Webb (1966) treats unobtrusive observations like any other measure and suggests that their power lies mainly in providing multiple perspectives of a particular phenomenon and reducing the reactive effect on subjects. The more widely used instruments of the social and behavioral scientists — the experiment, survey, and interview — tend to have an active effect on the subject being studied. In contrast, unobtrusive measures, from worn floor tiles to instrumented beds, take on the patina of found knowledge. But Webb recognizes that these “found” data nonetheless represent observations made by the researchers.

This carries over to the use and abuse of “big data,” again, particularly in the social sciences. The definition of big data differs according to context, though the version most used outside of academia refers — perhaps in part to encourage sales — to collections large enough that they outstrip the capacity of existing hardware and software systems to store, transmit, and analyze them. Within the sciences, the shift has more to do with a change in the ways in which many think about the process of discovery. The hypothetico–deductive model has been challenged as often as it has been articulated, but big data seems to represent a more sustained challenge to the ways in which scientists make inferences about the world. Rather than testing hypotheses against a set of observations, it is possible to draw inferences, detect patterns, and make predictions without any a priori conceptions of outcomes. While this may lead to accurate predictions, it takes further work to find meaning in the patterns or to make the inductive leap toward explanatory theory, and more data affords the opportunity to embrace new kinds of complexity in our analyses. Big data have the potential to partially fulfill Francis Bacon’s hope for the social sciences as something more than just theory extrapolated from interactions at the individual level. Unfortunately, the shift to the use of big data in the social sciences also opens up substantial potential for abuse (boyd and Crawford, 2011).

In the natural sciences, the recent move to big data has not necessarily meant leaving behind traditional questions of reliability and validity. No one expects the massive amounts of data collected by the Large Hadron Collider or by various shared astronomical instruments to be free from the influence of the process by which they were collected. Indeed, when it comes to mining data, both the removal from the initial context and the scale can serve to magnify issues of validity and reliability [4]. Just as the (sometimes controversial; Shapin, 2009) “sociological turn” has come to the natural sciences and there is growing recognition of the role of scientists in constructing their interpretations of observable phenomena (Fine, 1996), some social scientists have moved in the other direction, glossing over questions of validity, particularly when the data are of large scale and represent online traces: logs of activity or connections between Web users, for example.

But this alone should not suggest that using “big data” means using it badly. Depending on how central one considers Durkheim’s investigation of suicide — which drew entirely on published statistics and sought to find meaningful relationships within those data — to the development of sociology, detecting social structure through the trace data of the Internet may represent the latest point in a longer history of using large scale data to build theory. And as Latour suggests, more than a century ago Gabriel Tarde had his own vision of big data in the social sciences, drawing on large–scale statistics and making them available to the public, a vision that is “inescapably connected to the digital world to which we now have access” [5]. New technologies have vastly increased our capacity to gather and manipulate social data, but as new as this technological capability may be, it is in some ways deeply rooted in the relatively short history of sociology as a field.



3. Big, open data

The relationship of data to the process of research is neither simple nor obvious, but the idea that evidence should be open to external scrutiny is essential to what makes scientific knowledge different from authority–based and esoteric forms of explanation. Because no one’s argument may be free from scrutiny, it is important that the evidence used be verifiable. For this reason, openness of both results and process is essential to the practice of science (Popper, 1945).

In some ways, the existence of these large, relatively available databases represents an offering left at the shrine of Web 2.0, which tends to rely on a set of common values: open is better, and opening things makes them available for re–use. One of the three virtues of the Perl programmer, laziness (Christiansen, et al., 2012), has made its way into the lifehacking world, and providing data so that others do not have to collect it themselves is an essentially virtuous act.

The difficulty with public databases is that the peculiarities of the data collection are often difficult to figure out and require a significant amount of contextual understanding to be shared effectively (Ribes and Jackson, 2013). Nonetheless, the methods of organizing both data and people are often shared between scientists working in institutional settings and those who are participating in building a public, commons–based culture. These shared tools and shared methods should facilitate more exchanges between the scientific establishment and various publics.

Research done in public institutions like universities and research institutes is not, of course, the only source of such large collections of data. Since such collections can relate to just about anything, a great deal of big data is collected by private corporations about their own operations, indicating, for instance, the global logistics surrounding delivering and selling products at “big box” retailers or chain restaurants. But these sometimes also include data recorded and analyzed to better understand the behavior of customers: not just the customers of shops like these, but the potential audiences of advertisers. These include the databases of the large social media platforms — Google and Facebook, for example — and the records of continuous monitoring projects by signals intelligence organizations like the U.S. National Security Agency (Lesk, 2013). Such collections generally remain inaccessible to the “targets” (or subjects) of the data collection — and often nearly as inaccessible to researchers in universities, as Axel Bruns notes in this issue.

These data records have become richer as more people share more of their lives, and with lifelogging and the growth of the quantified–self movement, there are greater opportunities for comparison and aggregation. In some ways, the quantification of our lives is a natural outgrowth of other kinds of public self–expression. The step from cataloging each place we visit on Foursquare or documenting and sharing our micro–insights on Twitter to recording the number of steps we take on Fitbit or sharing our latest document updates on GitHub is a fairly small one. For now, the impetus for such lifelogging might be characterized as “collaborative introspection,” but a by–product is a massive increase in the amount of big data tracing our everyday lives. When such collections become available not only to corporate and government surveillance operations, but are shared and aggregated among those who produce the traces, new kinds of experimentation and research by amateurs become possible.



4. Doing science in public

The idea of “public science” or “citizen science” is vague enough to contain a number of different structural relationships. At its core is perhaps the idea of a public understanding of science. Although there have been calls for technocracy as an appropriate way of managing public policy, most notably by Veblen (1921) and others during the middle part of the twentieth century, until recently the question of the public’s role in scientific endeavors has largely revolved around the issue of scientific literacy as a way of supporting and legitimizing policy decisions based on science. In addition, public knowledge of science has helped to promote the practical application of research to technological innovation, from the industrial revolution to today (Stewart, 1992). Dewey suggested that education should help to encourage a “scientific habit of mind” among wider publics (Dewey, 1910; Steinkuehler and Duncan, 2008). Often this is translated into broader efforts at educating the public, via schools, museums, and other public institutions, in basic scientific literacy (Rutherford and Ahlgren, 1990). At the more active level, a literate public can take on “undone science” as a cause for activism, presenting a public case for better support of institutionalized science (Frickel, et al., 2009). This has been particularly true in the case of environmental issues.

The traditional view of scientific literacy positions institutionalized science as inherently knowledgeable and forward thinking, and the public as possessing little more than a rudimentary understanding of what scientists do and the world they live in. It often begins with the assumption that the public is prone to ignorance, and that institutionalized science is the best and only way to enlighten society (Irwin and Michael, 2003). More recent work has suggested that the relationships between traditional scientists and wider publics are far more varied and complicated, and invites participation of scientists in broader public debates (Bauer, et al., 2007). It also suggests that “citizen scientists” can be effective partners in engaging in scientific investigation within their own communities, sometimes making discoveries that institutional scientists might ignore [6].

Over the last few years the issues both of big data and of crowdsourcing have provided for new ways of collaboration between institutional scientists and various publics. Although the work is not neatly divided into these categories, we might think about a two–by–two matrix of institutional scientists and interested publics producing or analyzing data. The most traditional quadrant, of course, is institutional scientists producing data and analyzing it. It may be the same researchers collecting and analyzing the data, or they may be sharing data with their colleagues. In the natural sciences, this work is increasingly done in teams, and across teams, and in the social sciences and humanities new forms of data and data infrastructure (“e–Social Science,” “cyberinfrastructure,” and “digital humanities”) mean that team research is becoming more common, though it is still unusual.

The second pattern of work recruits communities of interested (or paid) assistants to collect data that can then be used by institutional researchers to analyze and build theory. Much of the work in this area happens in environmental science, where members of a community can collaborate in collecting samples, or making observations. This could be coordinated before the Internet and other networks provided for connection (as, for example, with the National Audubon Society’s century–long Christmas Bird Count in the United States; P.A. Stewart, 1954), but is accelerated by the capacity of networked communication to coordinate data collected by a large number of contributors. Part of this is simply a matter of distributing effort, but since the effect of this research often has policy implications, it also represents public science as a social movement (Kinchy and Perry, 2012). As more members of the public carry smart phones, it becomes possible to create ad hoc networks for “participatory sensing” (Burke, et al., 2006). And researchers are not just turning the public into sensors, but into filters and processors, by leveraging their pattern recognition abilities to find solutions for protein folding or code other kinds of content (Khatib, et al., 2011; Heer and Bostock, 2010).

This is one area where the quantified–self movement has already made inroads. Researchers have noticed that sites aggregating user–collected data about health and fitness represent a rich source of data for analysis (Swan, 2013). Although much of these data are collected by individuals to monitor their own health, other behaviors are also monitored — location, mood, and attention, among others — and the number of sensors and the amount of unobtrusive instrumentation continue to increase. It is not just monitoring of the self that is happening, of course, though this may be the most widely shared. The rise of analytics in a wide range of institutions and organizations allows for the tracking of workers, of students, of citizens, and of customers in new ways. Though not all these data are as openly available, they are increasingly co–opted by researchers seeking to understand social behavior.

The mirror image of this arrangement, with the public as data collectors, is citizen scientists making use of data produced by those working within the traditional institutions of science and of government — the world of “open data.” Open government initiatives have made available large collections of data that are used, for example, for managing cities, ranging from business licenses to demographics to geographically keyed sources of data. Both policy mandates for government–funded research and public pressure have made scholarly communication more available to the public, as well as the tools and sources of data that are described by scientific publications. The promise of this openness is that any citizen can make use of the data to enable their own process of discovery. Although releasing data publicly makes such use possible in principle, in practice there are other barriers to effective use (Gurstein, 2011). Certainly tools for visualizing public data (like Many Eyes, or Google’s Public Data Explorer) and a number of open source analysis tools make these sources of data more tractable, but they remain difficult to approach, even for those with expertise in the area. Nonetheless, conversations between journalists and programmers yield new understandings of how open data might be interpreted and used (Parasie and Dagiral, 2012), and it may be that as these skills and tools are more widely disseminated, this becomes less of a bottleneck.

This bottleneck extends to the fourth and rarest combination: data being produced or collected by publics, and then used by the same or other publics to draw new insights. The ideal citizen science project involves those from the public in each stage of the research, and results not just in interesting conclusions, but also in providing the research team with new knowledge, skills, and insights (Bonney, et al., 2009). There are several examples of collaborative commons for more general knowledge, including Wikipedia and OpenStreetMap. Encyclopedias and maps were generally thought to be produced by experts for public use, but in these two cases, the public is producing the information and curating it. But these are not directly related to social research — indeed, Wikipedia forbids original research. Finding examples of citizens engaged in science is made more difficult because they are not privileged as experts in the institutionalized peer–reviewed journals, and the knowledge gathered by non–professional scientists is therefore less likely to appear in scientific literature or affect public policy (Suryanarayanan and Kleinman, 2013).

A more realizable goal might be self–experimentation. Single–subject experimentation counts among its exemplars the original proponent of the quantified self, the sixteenth–century investigator Santorio Santorio, who measured the food he consumed, his temperature, and his weight on a continuous basis for 30 years (Altman, 1988). Santorio has been followed by a long string of biologists, chemists, psychologists, and others who saw themselves as the most convenient subjects of their experiments. Well–designed studies with an “N of 1” that apply and withdraw treatments, for example, can represent a compelling way to try out an idea and discover whether it has merit (Neuringer, 1981), and such designs are often the only avenue open to a moonlighting researcher with limited means.
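The logic of an apply–and–withdraw (A–B–A–B) single–subject design can be sketched in a few lines: group daily observations by phase and compare the phase means. This is only an illustrative sketch; the phase labels and diary values below are invented, and a serious N-of-1 analysis would also examine trends and variability within phases, not just means.

```python
# A minimal sketch of analyzing a single-subject (N of 1) withdrawal design:
# alternating baseline (A) and treatment (B) phases, comparing phase means.
# The diary values below are invented for illustration only.
from statistics import mean

def phase_means(observations):
    """Group (phase, value) observations by phase and return each phase's mean."""
    by_phase = {}
    for phase, value in observations:
        by_phase.setdefault(phase, []).append(value)
    return {phase: mean(values) for phase, values in by_phase.items()}

# Hypothetical daily self-ratings across an A-B-A-B schedule.
diary = [
    ("A", 4), ("A", 5), ("A", 4),   # baseline
    ("B", 7), ("B", 8), ("B", 7),   # treatment applied
    ("A", 5), ("A", 4),             # treatment withdrawn
    ("B", 8), ("B", 7),             # treatment reapplied
]

means = phase_means(diary)
effect = means["B"] - means["A"]   # crude estimate of the treatment effect
```

If the outcome tracks the treatment up and down across both reversals, as in this toy diary, the self-experimenter has somewhat stronger grounds for attributing the change to the treatment than a single before-and-after comparison would give.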

While many have rejected the idea of the public as merely an audience, or at most a cheerleader, for “real” scientists, few have embraced the opposite quadrant: publics doing their own research outside of the scientific establishment, at least in the natural sciences. The idea of “garage biotech” raises alarms for some people, including the police (Ledford, 2010). The mistakes are easy to find, especially in the United States, a country that has of late celebrated pseudoscience — from anti–immunization efforts, to the teaching of “intelligent design,” to obfuscation of climate change evidence. It is likely that the Internet, by both opening access and making it easier to spread poorly thought–out ideas, has contributed to this.

The same leveling process opens up tools that can be employed in new models of public research and scientific inquiry, and if employed these may give rise to citizens who not only appreciate research but also find new ways to apply it to their own lives. Understanding how we interact socially has been taken up by those in the business management world seeking new efficiencies. This makes up a significant part of the lifehacking world, with books like Getting things done (Allen, 2002) leading many to apply approaches of worker productivity to their private lives. Likewise, recent work in positive psychology (Diener, 2000) both attracts public attention and invites self–oriented investigation. Given a long history of experimentation surrounding self–improvement, it seems natural that we should find the vanguard of new practices of social experimentation and research among those who are trying to improve their own performance in various ways.



5. Reddit and making truth

New discoveries are often shared by scientists via new forms of online social media. Scientists, professional or not, are coming to use the same communication tools that have become popular in other spheres of public life to present their ideas at various stages of their work. The means of accomplishing this include publication on blogs (Kjellberg, 2010; Bonetta, 2007; Halavais, 2006; Sauer, et al., 2005), tweets (Letierce, et al., 2010), and the use of various social networking sites (Procter, et al., 2010). These are natural venues for researchers who are not part of institutional scholarship to also share their work.

Reddit is a collaborative filtering and news discussion site and follows in a long tradition of community curation sites like Digg, Fark, MetaFilter, and Slashdot. It allows users to submit links or publish short texts (“self–posts”) that can then be voted up or down by the community. Like other collaborative filtering sites, it also allows for comments on these articles, and for those comments to be voted up or down by the users. The collaborative filter provides a way of finding comments judged as high–quality by the community, but also represents a mechanism for disciplining users and enforcing tacit community standards (Halavais, 2009). The site as a whole is divided up into topically–oriented “subreddits.” Any user can start a subreddit, and a selection of the most popular subreddits make up the front page of the site.
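The collaborative filtering just described can be reduced to a toy model: comments accumulate up and down votes, and the community's ordering is recovered by sorting on net score. This is deliberately simplified (Reddit's actual ranking algorithms also weigh recency and statistical confidence), and the example comments are invented.

```python
# Toy model of collaborative filtering by voting. Not Reddit's real
# algorithm: real rankings also account for recency and confidence.

def net_score(comment):
    """Net score of a comment: upvotes minus downvotes."""
    return comment["ups"] - comment["downs"]

def rank_comments(comments):
    """Return comments ordered from highest to lowest net score."""
    return sorted(comments, key=net_score, reverse=True)

# Invented comments on a hypothetical post.
comments = [
    {"author": "a", "ups": 12, "downs": 3},   # net 9
    {"author": "b", "ups": 40, "downs": 5},   # net 35
    {"author": "c", "ups": 2,  "downs": 8},   # net -6, buried by the filter
]

ranked = rank_comments(comments)
```

The same mechanism that surfaces comment "b" also buries comment "c", which is how voting doubles as the disciplining mechanism noted above: heavily downvoted contributions effectively disappear from view.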

The site receives tens of millions of visits monthly, and 15 percent of 18–29 year old men in the United States are users (Duggan and Smith, 2013). Though it has been available for eight years, a number of recent events, including a presidential AMA (Ask Me Anything) in 2012, catapulted it further into the public eye. Another episode that received wide attention, and eventual condemnation, was an attempt to crowdsource the identity of the Boston Marathon bombers (Kaufman, 2013). While not really a scientific puzzle, it represented a challenge of another sort, an attempt to make sense of abundant but largely incomplete data in the form of a heterogeneous collection of photographs, videos, and eyewitness accounts. Redditors (as users of the site refer to themselves) eagerly pursued information and ultimately contributed to misidentifying the bombers and fueling a witch hunt for the wrong suspects (Pickert and Sorensen, 2013). This represented a substantial blow to the feeling among many redditors that the site was particularly good at identifying new trends and information quickly and accurately, and prompted a bit of introspection.

Many of the subreddits (individual forums) on Reddit attempt to address questions of discovery and truth–finding, both about current events and about larger natural and social questions. Among the top subreddits ranked by subscribers are /r/todayilearned (5th), /r/science (7th), and /r/askscience (22nd). (The /r/ prefix indicates a subreddit, so that the “Today I Learned” subreddit may be found at http://reddit.com/r/todayilearned.) Many of these relate to questions focused more directly on institutional science, and while they would help us understand some of the more popular forms of public engagement in research, they are less likely to provide examples of discussion around self–experimentation and collective public research. A number of subreddits approach issues of self–improvement or lifehacking in one form or another, including /r/lifehacks, /r/productivity, /r/gtd (“Getting Things Done”), and /r/QuantifiedSelf/, among others. Three of the most popular such subreddits were selected for analysis; their subscriber ranks and numbers of subscribers as of July 2013 may be found in Table 1. The /r/fitness subreddit is a community supporting those interested in weightlifting, /r/keto provides discussion around ketogenic dieting, and /r/nootropics provides space for discussing the use of drugs and other forms of cognitive enhancement. These three may differ from the popular default subreddits both because their users may be more experienced redditors, and because subreddits tend to create and reproduce their own community characteristics (Leavitt and Clark, 2013).


Table 1: The three subreddits examined, the number of subscribers in each as of July 2013, the rank of their popularity, and the average number of comments per article.

Subreddit	Subscribers	Rank	Comments per article
/r/fitness	366,880	29	12.2
/r/keto	74,594	217	13.2
/r/nootropics	22,364	735	12.9


How do people in these subreddits interact around questions that have evidence–based answers? How do they make claims about efficacy, safety, suitability, and the like? How do they convince others that their perspective is correct, or how do they arrive at such a determination as a group?

There are a number of ways this could be discovered, including approaching the users directly. But the content itself provides a first view of how truth–claims are established, at least within the three subreddits examined here. A traditional content analysis could determine whether the writing provides an indication of scientific habits of mind. Steinkuehler and Duncan (2008), for example, examined a World of Warcraft discussion board and coded content for indications of scientific and systems thinking.

Perhaps because these three subreddits focus on self–improvement, it seemed clear early on that such direct coding would find little in the way of useful differences in approaches to discussion. Instead, the content was open–coded, without an a priori set of expectations about how redditors might present their ideas. Starting in May of 2013, the most recent 600 articles in each of the subreddits in Table 1 were accessed. The first hundred of these (the most recent) were set aside, to allow enough time for commenting to take place on the remainder. The remaining 500 articles and their comments were read carefully, in an attempt to understand how redditors made use of claims and provided arguments and evidence for those claims.
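The sampling procedure above can be sketched in code. The paper does not specify its collection tooling, so this is only a plausible reconstruction: it assumes Reddit's public JSON listing endpoints (which accept `limit` and `after` parameters for pagination), and the network calls themselves are omitted, leaving only the URL construction and filtering logic.

```python
# Sketch of the sampling described above: page through a subreddit's "new"
# listing via Reddit's public JSON endpoints, keep the most recent 600 posts,
# then set aside the newest 100 so retained posts have accumulated comments.
# A plausible reconstruction, not the paper's actual tooling.

def listing_url(subreddit, after=None, limit=100):
    """Build one page of the /new listing; `after` is the fullname of the
    last post on the previous page, used for pagination."""
    url = "https://www.reddit.com/r/%s/new.json?limit=%d" % (subreddit, limit)
    if after:
        url += "&after=" + after
    return url

def sample_posts(posts, total=600, skip_recent=100):
    """From newest-first posts, keep `total`, then drop the newest `skip_recent`."""
    return posts[:total][skip_recent:]
```

In use, a collector would fetch six pages of 100 posts per subreddit, threading each page's final fullname into the next request's `after` parameter, and then apply `sample_posts` to arrive at the 500 articles actually coded.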


6. Citation needed

The most common form of response in all three of the subreddits consisted of an offering of advice, usually without much in the way of explanation. Sometimes this is the result of a self–post that specifically asks for such advice, e.g., “Should I take my protein shake before or after the workout for best results, and should I take it within half an hour of my workout ending, and also; water or milk with the protein shake??? Thanks, any advice will be greatly appreciated.” The responses to such a direct question almost always offered a bit of advice that could be taken as personal opinion, sometimes along with explicit evidence or a demonstration of expertise.

One typical post on /r/fitness would include a video of someone doing a squat, and comments from a number of people about how the original poster’s hands were positioned or how their weight was balanced. That each comment was the redditor’s own opinion was implicit, as was the idea that there was one best way to execute the exercise. Among these, there were a handful of indications that redditors were providing advice because it had been effective for them personally, and as one redditor noted, others should “do whatever the hell works for you.” This kind of disclaimer of generalizability was common across all three subreddits, but the number of explicit appeals to personal experience was lower in /r/fitness (135 comments) than in /r/nootropics (525 comments) or /r/keto (669 comments, or about 10 percent of all comments in that subreddit). In most cases, advice was provided without any indication of its source or other markers of its credibility.

Redditors in /r/nootropics would often indicate what combination (“stack”) of pharmaceuticals worked for them, or what process they took in arriving at their current supplements. While /r/fitness commenters rarely discounted their own experience, many in /r/nootropics made clear the limits of their own experience by providing a “disclaimer” indicating it was “just” their personal experience, or ending with “YMMV” (“Your Mileage May Vary”). In many other cases, the indication that the posting was their personal experience was offered as a bit of a hedge; no universalizing claim was being made even if the post was detailed and made clear arguments. The redditor was interested mainly in improving their own performance or knowledge, and offered their own case merely as a useful indication among others. Perhaps the majority of these comments could be summarized as “this is what works for me.”

There seems to be a consensus on /r/nootropics that there are significant differences among individual reactions to various forms of cognitive enhancers, so that, for example, one form of racetam might work better, or differently, for any given user. There is evidence that some people are genetically predisposed to have little noticeable effect from taking the drug modafinil, for example, which is intended to promote wakefulness and is used within the nootropic community for mild cognitive enhancement. So while participants are happy to share their personal recipes of nootropic stacks, there is an understanding that these are personalized, and apply to the biology, desires, and tolerance of risk of each individual. The relatively high number of indications that an opinion is based on personal experience is likely not just a difference in discourse style, but a reflection of the underlying expectations concerning generalizability.

At least within these three groups, the emphasis is heavily on application: there is an interest in getting it right because it has a direct effect on what redditors want to achieve in their own lives. Arriving at a generalized, repeatable result is important, but not essential. It is a space in which differentiation is desired, and so coming up with a solution that “works for me” may be preferable to one that “works for everyone.”



7. Citation needed

A small number of the comments made reference to “the literature” in some form, though the body of acceptable secondary sources is much broader than what might be found in academic circles. The most common reference to the literature was oblique in the extreme, indicating that “studies have shown” that something was true, without naming the studies. Others claimed “there hasn’t been much research on this,” which is much more difficult to challenge.

Not that such statements were often challenged. A handful of comments requested that the poster back up their claims with citations (19 such requests in /r/nootropics, and six and three in /r/keto and /r/fitness, respectively). In most cases, this represented a direct request for citations, or a more off–handed note that the discussion lacked reference to the scientific literature. Some of the ambivalence in the relationship to the scientific literature is suggested in one redditor’s hedging of a request:

As far as I know (see my sibling comment), all of the evidence for modafinil tolerance is also anecdotal.

(If I’m wrong and there is actual scientific evidence for modafinil tolerance, do please share. I’d be very eager to read it. (Damn it, there’s no way to make that sound non–sarcastic, but seriously, I like reading studies, and that would be interesting data. (Now I just sound weird. “I like reading studies”? Who says that?)))

When challenged, most posters provided some form of evidence, sometimes from the scientific literature, after having been — often politely — called on to produce it. The participants in these discussions, however, generally seem to consider research presented in traditional scientific journals to be just one of a range of sources of evidence.

Redditors in these subreddits supported their claims with citations to outside sources only very rarely. When they did, the most frequently linked source was Wikipedia, across all three subreddits. These citations were often simply to define a term or idea. There were a number of references to scholarly, peer–reviewed studies, almost always linking to an openly accessible version of the work (most often in PubMed). In the cases where studies were referenced they were sometimes assailed for having small sample sizes or on other methodological grounds. In several cases where the sample size was questioned, there was a brief and simple discussion of statistical power.
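The point at issue in those sample-size objections can be made concrete with a quick power calculation. The sketch below uses the normal approximation to a two-sided, two-sample t test with alpha fixed at 0.05; the function and the numbers are illustrative, not drawn from any study discussed in these subreddits.

```python
import math

def normal_cdf(x):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def two_sample_power(n_per_group, effect_size):
    """Approximate power of a two-sided, two-sample t test at alpha = 0.05,
    using the normal approximation (adequate for illustration only)."""
    z_crit = 1.96  # critical value for alpha = 0.05, two-sided
    shift = effect_size * math.sqrt(n_per_group / 2.0)
    return normal_cdf(shift - z_crit) + normal_cdf(-shift - z_crit)

# A "medium" effect (d = 0.5) with 10 subjects per group is badly
# underpowered; roughly 64 per group are needed for 80 percent power.
low_power = two_sample_power(10, 0.5)   # roughly 0.2
adequate = two_sample_power(64, 0.5)    # roughly 0.8
```

A study with ten participants per group would miss a medium-sized effect four times out of five, which is the substance of the "small sample" objection redditors raised.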

The other linked evidence came from a broad set of sources on the Web. Links from /r/fitness were more likely to reach fitness blogs and similar sites, as well as videos from acknowledged expert trainers. Links from all three pointed to news stories and documentaries, as well as to other articles on Reddit itself. In all, these diverse sources of evidence were more frequently linked than traditional scholarly sources. While there are subreddits that specifically indicate that reference must be made to the scientific literature, in the three presented here, the literature remained firmly at the periphery.



8. Gathering data

Particularly important to our discussion of the potential of public research is how redditors collect and make use of data. A survey of subreddits beyond these three suggests that few, including those that take a more traditionally scientific approach, make use of primary evidence. The degree to which redditors fail to collect, interrogate, and rely on data representing valid or reliable measurements may, at first, seem disappointing. These samples suggest that the systematic collection or use of data on Reddit is relatively rare. However, there are reasons to see interesting potential.

The most common form of evidence presented in these groups is redditors’ own experiences. Usually these are collected by redditors with the intention of shaping their own future behavior. There are exceptions, of course. A contributor who accidentally overdosed on a new nootropic compound, taking many times the maximum recommended amount, reported the results of this unintentional experiment explicitly (if somewhat ironically) “for science,” indicating that his experience might serve others more directly. But even when self–experiments are shared anecdotally, they help to form tacit knowledge within the community. There is a consensus among those who participate on /r/nootropics, for example, that a combination of caffeine and L–theanine is a relatively safe and effective way of increasing mood and alertness. This is not particularly news in the scientific community (e.g., Haskell, et al., 2008; Kelly, et al., 2008), but those on /r/nootropics seem to be adopting it based largely on the subjective self–reports of other members of the community. They additionally make practical decisions regarding the availability and quality of the compounds, the appropriate doses, and the best ways of administering them, apparently based largely on these self–reports.

There are several calls for “surveys” of such self–reports, either indicating experience with a particular substance or some combination of substances. One such request garnered a dozen responses to a questionnaire. Some of these were discussed, but an explicit attempt to summarize the responses was not apparent. Less explicit kinds of collections of opinions and self–reports happened in all three subreddits, often around problem–solving or articles seeking advice. The most common tacit knowledge becomes explicitly acknowledged in the FAQ for each subreddit (including the caffeine/L–theanine combination noted above), but there remains a significant amount of common knowledge that can be gained only through ongoing participation (including lurking) in the subreddits.

Particularly on /r/keto, this sharing takes another step forward. Many who are successful at losing weight on a ketogenic diet provide a link to their open profiles on MyFitnessPal, a site used for tracking meals and activity, along with before and after photos. The photos provide an opportunity for the community to celebrate success, and serve as encouragement for those seeking to lose weight. The food and exercise diaries represent a parallel to the personalized anecdotal reports within the subreddits, but with much more specificity. No one would follow these diets exactly, of course, but they provide the viewer with more specific examples of generalized principles that seem to work. MyFitnessPal represents an important practical tool for the denizens of /r/keto — so much so that redditors built a browser tool that allows them to more effectively use the site to track fiber against carbohydrate, a function not available on the site itself.
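The arithmetic behind that browser tool can be sketched briefly: ketogenic dieters track "net carbs," total carbohydrate minus fiber, a figure MyFitnessPal did not report on its own. The function and food values below are illustrative, not the tool's actual code.

```python
# Hypothetical sketch of the net-carbohydrate calculation.
def net_carbs(total_carbs_g, fiber_g):
    """Net carbohydrate grams; fiber cannot exceed total carbohydrate."""
    return max(total_carbs_g - fiber_g, 0.0)

# A day's food log as (name, carbs_g, fiber_g) tuples:
log = [("broccoli", 6.0, 2.4), ("almonds", 6.1, 3.5), ("cheese", 1.3, 0.0)]
daily_net = sum(net_carbs(c, f) for _, c, f in log)  # roughly 7.5 g
```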



9. Reddit science and science science

It is difficult to make generalizations about Reddit as a whole on the basis of a small number of subreddits, and even more difficult to generalize to the broader public forms of scientific discourse. Each of these subreddits linked out to other kinds of forums related to the subreddits’ topics, often with their own norms of discourse and ways of arriving at truth. But as a whole, the site encourages certain kinds of interactions, and these patterns of interaction produce particular “epistemic cultures,” which shape the ways in which participants know what they know. Understanding one epistemic culture can help us to understand how people come to know in other social contexts as well [7]. Although it represents a modest first glance at truth–claims on Reddit, we can begin to see some of the contours of that space.

This variability among subreddits extends to the ways in which truth–claims are evaluated. Often, this is related to the culture surrounding the topic of the subreddit. The /r/ThreeKings subreddit, for example, indicates readers should “not attempt to establish what is ‘true’ or ‘false’, nor to judge anyone’s beliefs, but simply respect and support the people who choose to share their paranormal journeys with us.” On the other hand, /r/AskScience and related subreddits rely exclusively on answers that reference the scholarly literature, or indicate the scholarly expertise of redditors using visible labels. The three subreddits discussed here occupy the interesting space between these extremes, where determining truth is important but existing academic structures are not adopted wholesale.

The /r/nootropics subreddit makes clear that it is interested in science, at least in the trappings of the site. The FAQ for the subreddit begins “This is an area of science practised with the objective of changing your brain’s neurochemistry.” The FAQ provides references for doing reliable “N of 1” research. Hovering over the upvote indicator for any given comment presents a help text reading “Solid science!” And while there is significant evidence of scientific literacy and interest in the subreddit — as well as in the other two — much of the discussion and exploration follows a pattern that is difficult to associate directly with traditional scholarly practices.

The /r/fitness subreddit has several references to “broscience” (Mazzetti, 2012), the kinds of advice and myths frequently shared in the gym that have no scientific basis. This is often presented dismissively (“is that science–science or just broscience?”, with similar railing against “pop psych” in the /r/nootropics subreddit), but there seems to be something that rests between science and broscience. While redditors may dismiss various forms of broscience, or subject common advice (like the need to remain hydrated) to skepticism, they also rely on a set of experts, expertise, and ways of problem solving that are orthogonal to traditional institutionalized science and medicine. As in other contexts (Liverpool, et al., 2004), bridging these epistemic communities can benefit both.

The construction of expertise in each of these subreddits is complex. As noted, many stake their claims not on who they are, but on the experiences they have had, or draw in outside experts either from the scholarly literature or from those respected by the community. In very rare cases they draw on specific evidence; more often, they rely on their writing and seeming fluency on a topic to convey capability. There are a number of other statements intended to demonstrate expertise, often idiosyncratic to the topic being discussed or to the person presenting ideas. One redditor, for example, provided a footnote to observations about the effectiveness of some nootropics, marking them with an asterisk and noting that “These I know from being a Russian,” since the drugs in question were more rarely discussed on the subreddit and would have been less familiar to English–speaking communities. Throughout, the tendency to suppress personal experience and provide a generalized truth claim, while it does appear, is less common than in more scholarly settings.

There is a somewhat uneasy relationship to institutionalized medicine, as well. This is demonstrated particularly in references to doctors, which appeared in the sampled topics from all three subreddits. Several made reference to conversations with their own doctors as a way of supporting their claims. One redditor, for example, responded to a request for information on how best to deal with stretch marks:

... laser treatments are the only thing that look like they’re actually effective at reducing their appearance. I asked my doctor about vitamin e oil/coco butter and she said I could try it but that it’s not likely to do much if anything. They will fade with time though. Mine were pretty red/purple when I first got them and they’re much lighter in color now (~5 years later), though still very noticeable.

On one hand, suggestions that professional scientists and medical professionals had absolute knowledge in the area were often rapidly refuted by responses from other redditors. On the other, outright dismissal of those working in institutionalized science and medicine was often met with rebuke. No doubt, a number of people participating in these publics are also part of various scholarly communities (cf., Hess, 2011). Again, rather than representing an anti–science stance or an alternative to scientific practice, the discussions seemed to suggest complementarity.



10. Supporting robust participatory research

Reddit’s design did not emerge as a laboratory for participatory research, any more than the coffee houses of eighteenth century London were built to support the practical development of a new natural science (Stewart, 1999). In their own ways, each draws together people into a community of interaction, and provides a space for supporting shared exploration. Open and fair skepticism is a vital part of what makes up inquiry, and the willingness to accept local knowledge on Reddit might be seen as heretical to the existing ways of establishing truth in the scientific community. But it is easy to forget that collaborative discovery also requires trust [8]. Research in publics outside of institutionalized contexts thrives when there is a strong feeling of good will, shared purpose, and trust. Strengthening the connections among research communities not just within established scholarly institutions, but across public contexts, is the best way to ensure not just a population of literate readers of scholarship, but a culture of people who make new discoveries relevant to their own needs and the needs of their communities.

The subreddits above represent well–trafficked social spaces on the Web where these new kinds of discussions and experiments are taking place. There are many others, each with their own patterns of interaction, ways of establishing trust and credibility, and expectations for establishing claims of truth. Just as early conversations around objectivity and the ability to compare across cases marked a transition from natural philosophy to a new science of nature, we are seeing hints in a similar direction within these discussions. A future history might see these as hints of what is to come, though there are good reasons to doubt it.

Although the future may be now, and unevenly distributed, that does not mean that any particular interesting bits we find today portend a new future. There is promise here, for new kinds of bottom–up big data, big data with a soul and a purpose, “thick data” that retains its social context (Boellstorff, this issue). But realizing that promise will not be easy. We see small shards all around the Web of these kinds of local uses of big data for interesting purposes by those who are creating the traces being collected. We also see some larger shards — projects like Ushahidi (http://www.ushahidi.com/), which aggregates data during crises and reflects these data back to the community of contributors and others in useful ways (Okolloh, 2009). None of these efforts represents anything like a challenge to the global expansion of surveillance by governments and corporations. They could, however, and such an effort might provide a counterpoint to that surveillance, much as in David Brin’s (1999) suggested “transparent society”; but if this occurs at all, it will only be with the hard work needed to build new, flexible forms of research infrastructure. An ecosystem of platforms that draw in comparable data around observation, but also allow for context and community to inform those data, would move us much closer to new kinds of public learning communities.

Ideally, those learning communities will not become isolated and balkanized, but will share their insights with one another, and build bridges to scholarly institutions. Those bridges are not difficult to build, and real collaboration between practically minded self–experimenters on the Internet and social and behavioral scientists working within scholarly institutions will benefit both groups. Already we are seeing some institutions look for ways to build data collections that will serve researchers and the public (Comstock, 2013). But to be successful, those efforts need to be joined with an appreciation for local forms of knowledge–building in networked communities, and efforts to foster new spaces of discovery.


About the author

Alexander Halavais is an associate professor in the School of Social and Behavioral Sciences at Arizona State University, where he researches ways in which social media change the nature of scholarship and learning, and allow for new forms of collaboration and self–government. He serves as the president of the Association of Internet Researchers, and is affiliated with the Digital Media and Learning Hub at the University of California and the Learning Sciences Institute at ASU. His most recent book was Search engine society (Polity, 2009), and he is working on a book tentatively titled Participatory surveillance.
E–mail: halavais [at] asu [dot] edu



1. Stoll, 2000, pp. 185–186.

2. Doyle, 1892, p. 289.

3. Tuchman, 1978, p. 52.

4. Hand, et al., 2001, p. 45.

5. Latour, 2010, p. 159.

6. Roth and Barton, 2004, pp. 158–159.

7. Knorr–Cetina, 1999, pp. 241–242.

8. Gregory and Miller, 1998, pp. 101–103.



David Allen, 2002. Getting things done: The art of stress–free productivity. New York: Penguin Books.

Lawrence K. Altman, 1988. Who goes first? The story of self–experimentation in medicine. Berkeley: University of California Press.

Martin W. Bauer, Nick Allum, and Steve Miller, 2007. “What can we learn from 25 years of PUS survey research? Liberating and expanding the agenda,” Public Understanding of Science, volume 16, number 1, pp. 79–95.
doi: http://dx.doi.org/10.1177/0963662506071287, accessed 24 September 2013.

Laura Bonetta, 2007. “Scientists enter the blogosphere,” Cell, volume 129, number 3, pp. 443–445.
doi: http://dx.doi.org/10.1016/j.cell.2007.04.032, accessed 24 September 2013.

Rick Bonney, Caren B. Cooper, Janis Dickinson, Steve Kelling, Tina Phillips, Kenneth V. Rosenberg, and Jennifer Shirk, 2009. “Citizen science: A developing tool for expanding science knowledge and scientific literacy,” BioScience, volume 59, number 11, pp. 977–984.
doi: http://dx.doi.org/10.1525/bio.2009.59.11.9, accessed 24 September 2013.

danah boyd and Kate Crawford, 2011. “Six provocations for big data,” paper presented at the Oxford Internet Institute’s “A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society” (21 September), at http://softwarestudies.com/cultural_analytics/Six_Provocations_for_Big_Data.pdf, accessed 24 September 2013; also at http://dx.doi.org/10.2139/ssrn.1926431, accessed 24 September 2013.

David Brin, 1999. The transparent society: Will technology force us to choose between privacy and freedom? New York: Basic Books.

Jeffrey A. Burke, Deborah Estrin, Mark Hansen, Andrew Parker, Nithya Ramanathan, Sasank Reddy, and Mani B. Srivastava, 2006. “Participatory sensing,” Workshop on World–Sensor–Web (WSW ’06): Mobile Device Centric Sensor Networks and Applications, at www.escholarship.org/uc/item/19h777qd.pdf, accessed 24 September 2013.

Tom Christiansen, brian d. foy, and Larry Wall with Jon Orwant, 2012. Programming Perl. Fourth edition. Sebastopol, Calif.: O’Reilly.

Jonah Comstock, 2013. “Calit2 forms quantified self data sharing initiative,” MobiHealthNews (29 May), at http://mobihealthnews.com/22666/calit2-forms-quantified-self-data-sharing-initiative/, accessed 24 September 2013.

John Dewey, 1910. “Science as subject–matter and as method,” Science, new series, volume 31, number 787 (28 January), pp. 121–127, and at http://www.jstor.org/stable/10.2307/1634781, accessed 24 September 2013.

Ed Diener, 2000. “Subjective well–being: The science of happiness and a proposal for a national index,” American Psychologist, volume 55, number 1, pp. 34–43.

Arthur Conan Doyle, 1892. Adventures of Sherlock Holmes. New York: Harper & Bros.

Maeve Duggan and Aaron Smith, 2013. “6% of online adults are reddit users,” Pew Internet & American Life Project (3 July), at http://www.pewinternet.org/Reports/2013/reddit.aspx, accessed 24 September 2013.

Arthur Fine, 1996. “Science made up: Constructivist sociology of scientific knowledge,” In: Peter Galison and David J. Stump (editors). The disunity of science: Boundaries, contexts, and power. Stanford, Calif.: Stanford University Press, pp. 231–254.

Mark Fishman, 1980. Manufacturing the news. Austin: University of Texas Press.

Scott Frickel, Sahra Gibbon, Jeff Howard, Joanna Kempner, Gwen Ottinger, and David J. Hess, 2009. “Undone science: Charting social movement and civil society challenges to research agenda setting,” Science, Technology, & Human Values, volume 35, number 4, pp. 444–473.
doi: http://dx.doi.org/10.1177/0162243909345836, accessed 24 September 2013.

Jane Gregory and Steve Miller, 1998. Science in public: Communication, culture, and credibility. New York: Plenum Trade.

Michael Gurstein, 2011. “Open data: Empowering the empowered or effective data use for everyone?” First Monday, volume 16, number 2, at http://firstmonday.org/article/view/3316/2764, accessed 24 September 2013.

Alexander Halavais, 2009. “Do Dugg diggers Digg diligently? Feedback as motivation in collaborative moderation systems,” Information, Communication & Society, volume 12, number 3, pp. 444–459.
doi: http://dx.doi.org/10.1080/13691180802660636, accessed 24 September 2013.

Alexander Halavais, 2006. “Scholarly blogging: Moving towards the visible college,” In: Axel Bruns and Joanne Jacobs (editors). Uses of blogs. New York: Peter Lang, pp. 117–126.

David Hand, Heikki Mannila, and Padhraic Smyth, 2001. Principles of data mining. Cambridge, Mass.: MIT Press.

Crystal F. Haskell, David O. Kennedy, Anthea L. Milne, Keith A. Wesnes, and Andrew B. Scholey, 2008. “The effects of L–theanine, caffeine and their combination on cognition and mood,” Biological Psychology, volume 77, number 2, pp. 113–122.
doi: http://dx.doi.org/10.1016/j.biopsycho.2007.09.008, accessed 24 September 2013.

Jeffrey Heer and Michael Bostock, 2010. “Crowdsourcing graphical perception: Using mechanical turk to assess visualization design,” CHI ’10: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 203–212.
doi: http://dx.doi.org/10.1145/1753326.1753357, accessed 24 September 2013.

David J. Hess, 2011. “To tell the truth: On scientific counterpublics,” Public Understanding of Science, volume 20, number 5, pp. 627–641.
doi: http://dx.doi.org/10.1177/0963662509359988, accessed 24 September 2013.

Alan Irwin and Mike Michael, 2003. Science, social theory and public knowledge. Philadelphia: Open University Press.

Leslie Kaufman, 2013. “Bombings trip up Reddit in its turn in the spotlight,” New York Times (28 April), at http://www.nytimes.com/2013/04/29/business/media/bombings-trip-up-reddit-in-its-turn-in-spotlight.html, accessed 24 September 2013.

Simon P. Kelly, Manuel Gomez–Ramirez, Jennifer L. Montesi, and John J. Foxe, 2008. “L–theanine and caffeine in combination affect human cognition as evidenced by oscillatory alpha–band activity and attention task performance,” Journal of Nutrition, volume 138, number 8, pp. 1,572S–1,577S.

Firas Khatib, Seth Cooper, Michael D. Tyka, Kefan Xu, Ilya Makedon, Zoran Popović, David Baker, and Foldit players, 2011. “Algorithm discovery by protein folding game players,” Proceedings of the National Academy of Sciences, volume 108, number 47, pp. 18,949–18,953.
doi: http://dx.doi.org/10.1073/pnas.1115898108, accessed 24 September 2013.

Abby J. Kinchy and Simona L. Perry, 2012. “Can volunteers pick up the slack? Efforts to remedy knowledge gaps about the watershed impacts of Marcellus Shale gas development,” Duke Environmental Law & Policy Forum, volume 22, number 2, pp. 303–385, and at http://scholarship.law.duke.edu/delpf/vol22/iss2/3/, accessed 24 September 2013.

Sara Kjellberg, 2010. “I am a blogging researcher: Motivations for blogging in a scholarly context,” First Monday, volume 15, number 8, at http://firstmonday.org/article/view/2962/2580, accessed 24 September 2013.

Karin Knorr–Cetina, 1999. Epistemic cultures: How the sciences make knowledge. Cambridge, Mass.: Harvard University Press.

Bruno Latour, 2010. “Tarde’s idea of quantification,” In: Matei Candea (editor). The social after Gabriel Tarde: Debates and assessments. London: Routledge, pp. 145–162.

Alex Leavitt and Josh Clark, 2013. “Upvoting Hurricane Sandy: Event-based news production on a social news site, Reddit,” at http://alexleavitt.com/papers/2013_DRAFT_reddit_sandy_newsproduction.pdf, accessed 24 September 2013.

Heidi Ledford, 2010. “Garage biotech: Life hackers,” Nature, volume 467, number 7316 (7 October), pp. 650–652, at http://www.nature.com/news/2010/101006/full/467650a.html, accessed 24 September 2013.
doi: http://dx.doi.org/10.1038/467650a, accessed 24 September 2013.

Michael Lesk, 2013. “Big data, big brother, big money,” IEEE Security & Privacy, volume 11, number 4, pp. 85–89.
doi: http://doi.ieeecomputersociety.org/10.1109/MSP.2013.81, accessed 24 September 2013.

Julie Letierce, Alexandre Passant, John Breslin, and Stefan Decker, 2010. “Understanding how Twitter is used to spread scientific messages,” Proceedings of the WebSci10: Extending the Frontiers of Society On–Line, at http://journal.webscience.org/314/, accessed 24 September 2013.

Joan Liverpool, Randell Alexander, Melba Johnson, Ebba K. Ebba, Shelly Francis, and Charles Liverpool, 2004. “Western medicine and traditional healers: Partners in the fight against HIV/AIDS,” Journal of the National Medical Association, volume 96, number 6, pp. 822–825.

Dom Mazzetti, 2012. “What is Bro Science?” YouTube, at http://www.youtube.com/watch?v=OXO2azb3_PE&t=2m14s, accessed 24 September 2013.

Allen Neuringer, 1981. “Self–experimentation: A call for change,” Behaviorism, volume 9, number 1, pp. 79–94.

Ory Okolloh, 2009. “Ushahidi, or ‘testimony’: Web 2.0 tools for crowdsourcing crisis information,” Participatory Learning and Action, volume 59, number 1, pp. 65–70, and at http://pubs.iied.org/pdfs/G02842.pdf, accessed 24 September 2013.

Sylvain Parasie and Eric Dagiral, 2012. “Data–driven journalism and the public good: ‘Computer–assisted–reporters’ and ‘programmer–journalists’ in Chicago,” New Media & Society, volume 15, number 6, pp. 853–
doi: http://dx.doi.org/10.1177/1461444812463345, accessed 24 September 2013.

Kate Pickert and Adam Sorensen, 2013. “Inside Reddit’s hunt for the Boston bombers,” Time (23 April), at http://nation.time.com/2013/04/23/inside-reddits-hunt-for-the-boston-bombers/, accessed 24 September 2013.

Karl R. Popper, 1945. The open society and its enemies. London: Routledge & Kegan Paul.

Rob Procter, Robin Williams, James Stewart, Meik Poschen, Helene Snee, Alex Voss, and Marzieh Asgari–Targhi, 2010. “Adoption and use of Web 2.0 in scholarly communications,” Philosophical Transactions of the Royal Society A, volume 368, number 1926 (13 September), pp. 4,039–4,056.
doi: http://dx.doi.org/10.1098/rsta.2010.0155, accessed 24 September 2013.

David Ribes and Steven J. Jackson, 2013. “Data bite man: The work of sustaining a long–term study,” In: Lisa Gitelman (editor). ‘Raw data’ is an oxymoron. Cambridge, Mass.: MIT Press, pp. 147–166.

Wolff–Michael Roth and Angela Calabrese Barton, 2004. Rethinking scientific literacy. London: RoutledgeFalmer.

F. James Rutherford and Andrew Ahlgren, 1990. Science for all Americans. New York: Oxford University Press.

Igor M. Sauer, Dominik Bialek, Ekaterina Efimova, Ruth Schwartlander, Gesine Pless, and Peter Neuhause, 2005. “‘Blogs’ and ‘wikis’ are valuable software tools for communication within research groups,” Artificial Organs, volume 29, number 1, pp. 82–83.

Steven Shapin, 1995. “Here and everywhere: Sociology of scientific knowledge,” Annual Review of Sociology, volume 21, pp. 289–321.

Constance Steinkuehler and Sean Duncan, 2008. “Scientific habits of mind in virtual worlds,” Journal of Science Education and Technology, volume 17, number 6, pp. 530–543.
doi: http://dx.doi.org/10.1007/s10956-008-9120-8, accessed 24 September 2013.

Larry Stewart, 1999. “Other centres of calculation, or, where the Royal Society didn’t count: Commerce, coffee–houses and natural history in early modern London,” British Journal for the History of Science, volume 32, number 2, pp. 133–154.

Larry Stewart, 1992. The rise of public science: Rhetoric, technology, and natural philosophy in Newtonian Britain, 1660–1750. Cambridge: Cambridge University Press.

Paul A. Stewart, 1954. “The value of the Christmas bird counts,” Wilson Bulletin, volume 66, number 3, pp. 184–195.

Clifford Stoll, 2000. High–tech heretic: Reflections of a computer contrarian. New York: Anchor Books.

Sainath Suryanarayanan and Daniel Lee Kleinman, 2013. “Be(e)coming experts: The controversy over insecticides in the honey bee colony collapse disorder,” Social Studies of Science, volume 43, number 2, pp. 215–240.
doi: http://dx.doi.org/10.1177/0306312712466186, accessed 24 September 2013.

Melanie Swan, 2013. “The quantified self: Fundamental disruption in big data science and biological discovery,” Big Data, volume 1, number 2, pp. 85–99.
doi: http://dx.doi.org/10.1089/big.2012.0002, accessed 24 September 2013.

Gaye Tuchman, 1978. Making news: A study in the construction of reality. New York: Free Press.

Thorstein Veblen, 1921. The engineers and the price system. New York: B. W. Huebsch, Inc.

Eugene J. Webb, 1966. Unobtrusive measures: Nonreactive research in the social sciences. Chicago: Rand McNally.

Gary Wolf, 2010. “The data–driven life,” New York Times (28 April), at http://www.nytimes.com/2010/05/02/magazine/02self-measurement-t.html, accessed 24 September 2013.


Editorial history

Received 16 September 2013; accepted 17 September 2013.

Creative Commons License
This paper is licensed under a Creative Commons Attribution 3.0 United States License.

Home made big data? Challenges and opportunities for participatory social research
by Alexander Halavais.
First Monday, Volume 18, Number 10 - 7 October 2013
