Search engines: From social science objects to academic inquiry tools
First Monday

by Filippo Trevisan

This paper discusses the challenges and opportunities involved in incorporating publicly available search engine data in scholarly research. In recent years, an increasing number of researchers have started to include tools such as Google Trends in their work. However, a central ‘search engine’ field of inquiry has yet to emerge. Rather, the use of search engine data to address social research questions is spread across many disciplines, which makes search valuable across fields but not critical to any one particular area. In an effort to stimulate a comprehensive debate on these issues, this paper reviews the work of pioneering scholars who devised inventive, if experimental, ways of interpreting data generated through search engine accessory applications and makes the point that search engines should be regarded not only as central objects of research, but also as fundamental tools for broader social inquiry. Specific concerns linked to this methodological shift are identified and discussed, including: the relationship with other, more established social research methods; doubts over the representativeness of search engine data; the need to contextualize publicly available search engine data with other types of evidence; and the limited granularity afforded to researchers by tools such as Google Trends. The paper concludes by reflecting on the combination of search engine data with other forms of inquiry as an example of arguably inelegant yet innovative and effective ‘kludgy’ design (Karpf, 2012).


The story so far: Search engines, power, and society
The methodological turn: Search engines as social science inquiry tools
Going forward: Google Trends as a research tool




Despite the rise of social media as gateways to information in recent years, search engines remain the main channel through which Internet users retrieve and access online content in countries such as Britain (Dutton and Blank, 2013) and the United States (Purcell, et al., 2012a). Similarly, search, albeit government-censored, constitutes by far the most popular online activity in non-democratic contexts such as China (China Internet Network Information Center (CINNC), 2011), where Internet penetration is still lower than in the West but growing rapidly. Thus, given its centrality to twenty-first century informational practices, Internet search is not only poised to have a deep impact on contemporary politics and society, but also likely to generate digital ‘traces’ that could enable researchers to gain a better understanding of emerging socio-political trends. Nevertheless, scholars have so far largely refrained from this type of research, which reaches beyond the investigation of the direct consequences of search engine use and calls instead for a multi-disciplinary debate on the value of search engine data as empirical evidence in the age of ‘big data’ social science. Although recent work has acknowledged that publicly available search engine information “can indeed represent a phenomenal tool for the researcher to bring to light the social unconscious” [1], an organic debate on this process has yet to take place. This in turn fuels a vicious circle in which those who may wish to engage in this kind of research lack a solid basis to support experimentation.

Until now, the vast majority of the existing work on search engines has framed them as ‘research objects’ to be studied in and for themselves (for a useful example, see Fuchs, 2013) as opposed to sources of methodological innovation. Indeed, this could be seen as a consequence of the fact that, while “on-going collaboration is needed between the commercial Web search companies and academic researchers to continue to identify and track trends in Web search” [2], so far the search industry has been anything but forthcoming in supporting the establishment of such partnerships. Having said that, one is left to wonder whether researchers have truly explored all avenues available to them or whether something more could have been achieved despite the lack of extensive crossovers between the search industry and academia. In particular, readily available accessory search tools such as Google Analytics, Trends, Correlate, and News provide new types of information that could be of great relevance in social science research. Therefore, academics would benefit from asking how this data relates to existing methods and, more importantly, what questions could be addressed with it. Undoubtedly, such a quest for methodological innovation is going to be a somewhat messy process of trial and error, but one that holds great promise for the future of social science. While new communication technologies and the ways in which people use them have long eluded ‘traditional’ research strategies, publicly available search engine data could provide part of the solution to this problem, generating methodological renewal not only in Internet studies, but across a wide range of social science disciplines too. If scholars take the first step in breaking the current stalemate in this area, technology developers may be more likely to follow, making collaborative research innovation a more solid reality.

Following a brief review of the main trends in search engine scholarship, this paper argues for the integration of search engines and their publicly available accessory applications in research that tries to capture and interpret emerging socio-political trends. Key opportunities and challenges involved in this process are identified and discussed, together with an overview of recent social science work that has experimented with data collected through search engine tools such as Google Trends, Google News and Google Zeitgeist. In particular, Google Trends is brought into the spotlight as an especially powerful tool that could enable academics to explore new questions, although it should be approached with some caution. This application’s advantages vis-à-vis traditional methods for the study of public opinion, the possibility to employ it for analyzing crisis situations, and its ability to facilitate both comparative investigations and research on the relationship between ‘old’ and ‘new’ media are discussed. In doing so, it is hoped that this paper will inform the development of new social science methodologies while also alerting technology developers to some of the challenges faced by academics. At the same time, this paper identifies areas in which there may be scope for improvement. These include ‘country effects’ (i.e., issues with Google’s share of the search market in different countries), the limited flexibility currently afforded to researchers by Google Trends, and doubts over the representativeness of its data.



The story so far: Search engines, power, and society

Ever since the Internet became commercially available in the 1990s, search engines have played a crucial role in orienting online traffic, distributing content, and constructing knowledge (Van Couvering, 2008). However, in the past two decades the practice of searching itself has undergone a radical transformation as user habits shifted from browsing lists that resembled traditional telephone directories in the Internet’s early days to trusting commercial search engines to select the content that best fits their needs automatically in recent years. This change accelerated with the advent of Google in 1998, whose “models of a good search engine, a good search result, and good algorithmic logic have become normalized as the industry standard” [3]. Arguably, the influence of this process has been so pervasive that it reached beyond the relationship between individual users and online media, generating a ‘Googlization’ trend capable of affecting multiple economic, social and political aspects of life (Vaidhyanathan, 2011; Lovink, 2009; Rogers, 2009a). As Vaidhyanathan (2011) stated, “we are folding the interface and structures of Google into our very perceptions” [4], which in turn begs the question of whether “anything (or anyone) matter[s] if it (or she) does not show up on the first page of Google results” [5]. Broadly speaking, these views are backed up by studies of user preferences. Despite a slight decrease in the use of search engines in very recent years, which resulted from the growth of social networking platforms such as Facebook and Twitter as avenues for finding information online, users continue to show great levels of trust in the ability of search engines to provide them with the right answers to their questions (Sanz and Stančík, 2014). This is especially the case among young people who have no experience of the Internet prior to Google (Gunter, et al., 2009; Purcell, et al., 2012b), although that does not necessarily imply that they approach search results uncritically (Thornton, 2010).

In the wake of these developments, a large body of literature has flourished around the socio-technical aspects of search engines, which scholars have cast primarily as research objects (Rogers, 2010). That is, scholars have been more interested in talking about the role played by search engines in shaping society rather than exploring ways in which these tools can augment our knowledge of social, political, and economic trends. This work has focused especially on the mechanisms that regulate Internet search — where the logic underpinning Google’s PageRank and, more recently, Hummingbird algorithms has constituted a blueprint for other search providers to emulate — as well as their impact on social, cognitive and informational practices. To describe this work, Zimmer (2010) coined the expression “Web search studies,” which he envisaged as having four main strands, including: 1) search engine bias; 2) the role of search engines as gatekeepers of online information; 3) the ethics of search; and, 4) legal as well as policy implications of search. Although in practice these lines of inquiry overlap a great deal, the first two — search engine bias and the gatekeeping role of these platforms — received the greatest amount of attention in social science scholarship. While this is not the place for an exhaustive review of this literature, it is useful to reflect briefly on the central arguments it introduced. In particular, understanding how users acquire information through search engines and what they expect of keyword searches can provide valuable context to the incorporation of search engine data in social science methods. This can help to ensure that researchers interested in this type of methodological innovation ask the right questions and challenge established conventions should that be necessary.

The implications of search engines for online traffic and content popularity have been at the center of Web search studies for over a decade (Introna and Nissenbaum, 2000). Although most scholars accept the fact that the Web “is anything but a level, unvariegated network” [6] as a ‘necessary evil’ that makes the Internet navigable, many have criticized the way in which “search engines both contribute to the selection of more prominent sites, and in turn are more influenced by them” [7]. Some have gone as far as to question the very purpose of search, stating that directing users towards specific Web sites eliminates “the randomness and non-directional qualities of browsing that produce serendipitous encounters with information [that] are essential to innovation” [8]. In particular, Google’s practice of adopting the number of in-links and traffic levels registered for a given Web site as proxies for its relevance has split academia between those who claim that search engines contribute to the consolidation of dominant ideas and strengthen already powerful actors on one side (see for example: Diaz, 2008; Finkelstein, 2008; Reilly, 2008; Rogers, 2004), and those who regard this mechanism as merely fulfilling user demands by optimizing results according to personal preferences and previous search history on the other (Goldman, 2008; 2011). In this framework, the skeptical perspective was usefully conceptualized by Hindman (2009) and Hindman, et al. (2003), who introduced the idea of ‘Googlearchy’ to indicate the propensity of search engines in their contemporary form to perpetuate the status quo by favoring established information sources (e.g., traditional news outlets, major political parties, multinational corporations, academic institutions, etc.) while at the same time marginalizing minority voices and alternative narratives. It has been argued that experienced and motivated searchers can mitigate the effects of search engine bias by approaching results with a critical eye (Halavais, 2009). However, the vast majority of users remain consistently exposed to it, since research has shown that most people look at increasingly fewer search results, often focusing exclusively on the very first search engine output page (Jansen and Spink, 2005; 2006).

Although these trends have supported the growth of a fairly compelling pessimist narrative on the ‘politics’ of search, users have not shown great awareness of these dynamics, let alone concern for their potential effect on information pluralism. Instead, they have developed high levels of trust and attachment to major search providers, especially Google, to the point that, when faced with irrelevant results, they tend to question their own proficiency in searching rather than the ability of search engines to direct them to the right information (Hillis, et al., 2013). Recent survey data has shown that, while users worry about the privacy implications of search giants such as Google and Yahoo! collecting personal information to support customized searches and advertising, this does not detract from their ‘faith’ in the trustworthiness of search results (Purcell, et al., 2012a). For example, the confidence in Google’s algorithm has reached beyond the domain of ‘everyday’ life, influencing medical practice in ways that prompted reflections on the viability of a ‘Google Medicine’ engine (Giustini, 2005). More broadly, the success of search engines has played a fundamental role in the development of other major Internet service providers (ISPs). Most notably, this appears to be the case of Facebook, which at the time of writing (February 2014) had made a beta version of its “Graph” semantic search service available to all English language users in the U.S. with a view to making it available in other countries in the future. Despite the lack of a detailed plan for a global rollout, Facebook’s determination to develop a search functionality that emphasizes content ‘relevance’ marked an important extension of ‘Google-like logic’ to a social networking space where so far the norm had been for users to simply ‘stumble upon’ information as it appeared in their newsfeeds.

In light of these considerations, the continued popularity of search engines invites social scientists to consider them not only as research objects. It is now time to focus on search engines as tools for social inquiry capable of providing researchers with a wealth of information, including entirely new types of data. In particular, search details are bound to expose the interests, attitudes, and opinions of users in new and powerful ways. As Scheitle (2011) wrote, “if people are concerned or interested in a particular issue, they will be more likely to search for resources, news, Web sites, discussion boards, and other types of information related to that issue” [9]. Admittedly, empirical work on search behavior has shown that “search engine use does not necessarily equate with a search for information” [10]. On average, only about half of search queries are carried out to fill an actual knowledge gap [11]. However, this does not detract from the fact that all queries are invariably underpinned by user interest in specific topics, people, or services. Being able to measure variations in the levels of such interest would fundamentally contribute to the study of contemporary social, political, and economic trends. In addition, an effective integration of search data into social science research would constitute a way to mitigate the claims of those who have questioned the ethics of corporations such as Google for supposedly ‘exploiting’ Internet users when collecting personal information from them with a view to turning it into profit rather than contributing to social progress (Gauntlett, 2011). Thus, it is crucial that academics, while remaining alert to the dynamics and implications of search discussed above, capitalize on the centrality of search engines to contemporary informational patterns and social interaction. What kinds of data are available that could enable researchers to pursue this path? What types of questions could be explored? And, finally, how does this compare with both traditional and ‘virtual’ research methods, and should there be a broad distinction between the two? The rest of this paper focuses on these issues.



The methodological turn: Search engines as social science inquiry tools

In their seminal study of search trends and search engine user behavior, Spink and Jansen (2004) called for the development of electronic transaction log analysis as a valuable alternative to surveys “to gain a clearer understanding of the interactions among searcher, content and Web search engine” [12]. While this practice gathered some popularity among researchers focusing on ‘first generation’ search engines such as AltaVista (Halavais, 2009), as new providers took hold of the market, difficulties in obtaining this type of data severely restricted the scope and significance of this type of inquiry. Typically, search companies such as Google and Yahoo! have been extremely protective of user activity logs, both for commercial reasons and due to the ethical implications of releasing such data in the public domain. However, one has to ask also whether the analysis of raw search records is truly essential to unlock the potential of search engines for social science inquiry or there could be other, less onerous ways for scholars to gather and analyze search information that respect both the business model of search providers and the privacy of individual users. Although this issue has yet to be tackled organically in methodological literature, several recent studies have taken a pragmatic approach to the incorporation of data drawn from search engine accessory tools. This work challenged the presumed indispensability of activity log data, clarifying instead how the excessive emphasis placed on the need for raw search records has created a self-perpetuating impasse that has hindered methodological innovation.

Pioneering authors in this area have experimented with free online tools such as Google Trends (Scharkow and Vogelsgang, 2011), Google News (Weaver and Bimber, 2008) and Google Zeitgeist (Jeong and Mahmood, 2011). These have been incorporated into research designs either as ‘methods’ in their own right or in combination with more established techniques such as surveys and content analysis. In particular, Google Trends has become the focus of growing interest among social scientists from the moment Google released a report that examined the possibility of using this tool to generate estimates for key economic indicators more rapidly and accurately than would be the case through traditional government and industry data (Varian and Choi, 2009). In its most recent iteration (which since September 2012 includes a series of additional functionalities previously assigned to a separate application named Google Insights for Search), Google Trends shows fluctuations in the search volume for any keyword(s) within any timeframe from 2004 onwards, provided that users carried out a minimum number of relevant searches. Its output consists of normalized data that is measured against the total number of searches registered during the period under scrutiny. Individual Google Trends scores are obtained by assigning a value of 100 to the day or week (depending on the length of the period under scrutiny) for which the highest number of relevant searches was registered and calculating all other scores as a function of their distance from the top result on a scale from zero to 99. In addition to its longitudinal dimension, Google Trends also includes a series of geographical filters that enable users to organize the data by country, state/region, or city. Furthermore, users can download Google Trends output as CSV documents and use these to build additional visualizations or carry out further statistical analysis.
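
The scoring logic described above can be illustrated with a minimal Python sketch. It is not Google's actual procedure, which is not public in full detail. It assumes that each period's keyword searches are first expressed as a share of all searches in that period and then rescaled so that the peak period receives a score of 100. The function name and all figures are hypothetical and serve illustration only.

```python
def trends_style_scores(shares):
    """Rescale per-period search shares so that the peak period scores 100.

    Assumption for illustration: every other period is scored in
    proportion to the peak and rounded to a whole number.
    """
    peak = max(shares)
    if peak <= 0:
        raise ValueError("at least one period needs a positive share")
    return [round(100 * share / peak) for share in shares]


# Hypothetical weekly figures: searches for one keyword and all searches.
keyword_searches = [120, 300, 450, 900, 600]
total_searches = [10_000, 12_000, 15_000, 18_000, 20_000]

# Step 1: express keyword searches as a share of all searches per week.
shares = [k / t for k, t in zip(keyword_searches, total_searches)]

# Step 2: rescale the shares against the peak week.
scores = trends_style_scores(shares)
print(scores)  # [24, 50, 60, 100, 60]
```

Note that under these assumptions the fifth week records more keyword searches than the third (600 against 450) yet receives the same score, because overall search activity was also higher that week. This is why scores of this kind indicate relative interest rather than absolute search volume.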

In addition to the research cited above, a number of studies have sought to apply Google Trends data to a broad array of social science disciplines. These include:

  • Politics and public opinion studies (Scheitle, 2011; Scharkow and Vogelsgang, 2011; Manzano and Ura, 2013; Trevisan, 2013);
  • Economics and business (Hand and Judge, 2011; Kaesbauer, et al., 2012; Schmidt and Vosen, 2012; Smith, 2012);
  • Public health and epidemiology (Breyer, et al., 2011; Carneiro and Mylonakis, 2009; Carr and Dunsiger, 2012; Connolly, et al., 2009; Frijters, et al., 2013; Glynn, et al., 2011; Gunn and Lester, 2012; Metcalfe, et al., 2010; Walcott, et al., 2011; Zhou, et al., 2011);
  • Research on the environment and responses to natural disasters (Chay and Sasaki, 2011; Sherman-Morris, et al., 2011; Van der Velde, 2012; Wilde and Pope, 2013).

Undoubtedly, this type of research is still in its infancy. Thus, as the incorporation of Google Trends data in social science scholarship is poised to spread to other research areas, it is going to require additional refinement too. In this context, it is crucial for researchers to engage in a wide-ranging discussion about the opportunities and the challenges involved in turning search engines from objects of research into inquiry tools. While this is not the place for an in-depth examination of the various statistical techniques employed in the studies cited above, it is particularly useful to examine the main trans-disciplinary themes that emerged from the innovative literature published to date instead.



Going forward: Google Trends as a research tool

Overall, the potential benefits of Google Trends for social science scholarship are concentrated in two main areas:

  1. Advantages over traditional methods concerned with the identification of broad socio-political trends (e.g., public opinion surveys); and,
  2. Augmented opportunities to study the relationship between off-line events and online behavior, especially in conjunction with crises.

Conversely, scholars should also be aware of the most apparent limitations associated with this emerging methodology, namely: doubts over data representativeness when generalizing from search engine users to entire populations; language differences and ‘country effects’ in relation to search; as well as limited data granularity in the current version of Google Trends. It is helpful to discuss each of these aspects in detail, assessing to what extent they have been addressed in existing literature and identifying priority areas for future exploration.

Advantages over traditional social science methods

Using Google Trends to gather information about user interests, concerns, and behavior could have several advantages over traditional methods, especially surveys. While many may be unconvinced by Google’s own claim that this type of data could enable researchers to “predict the present” (Varian and Choi, 2009), its ability to reveal emerging socio-political trends quickly and reliably has been tested in the area of public opinion and public mood research. In particular, scholars such as Scheitle (2011) and Scharkow and Vogelsgang (2011) obtained encouraging results when they compared Google Trends data with traditional survey results in an effort to construct reliable ‘salience barometers’ to monitor public opinion on sensitive issues. In addition, Manzano and Ura (2013) have highlighted the possibility that aggregate search engine data could enable researchers to uncover information that otherwise would remain ‘hidden’ or could only be captured at much greater cost through survey questionnaires. In their recent study of public opinion reaction to the appointment of U.S. Supreme Court Judge Sonia Sotomayor, these researchers focused on the geographical distribution of relevant keyword searches as shown in Google Trends to test for possible links between people’s interest in this issue and their ethnic background. This made for a useful example of how this tool could be used not only to explore population-wide trends, but also to focus on issues of particular interest to specific social groups, including minorities.

More broadly, these and several other studies among those listed above emphasized the fact that Google Trends data, which can be obtained at virtually no cost, mitigates research bias and the incidence of incomplete or false responses as it relies on the processing of actual search logs rather than asking users to fill in questionnaires. Crucially, this means that researchers can investigate what Internet users actually do with search engines as opposed to discussing what they say they do. In addition, this constitutes a method that is both unobtrusive and respectful of the privacy of users since Google removes all personal information from the data before processing it and releasing Trends scores. Having said that, doubts have been raised with regard to the representativeness of this data, which will be discussed in detail below.
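
The kind of comparison between search scores and survey results mentioned in this section can be sketched in Python as a simple correlation check. This is only an illustration and not the procedure followed in the studies cited above, which is not reproduced here. The monthly figures are invented, and the choice of Pearson's r is an assumption made for the sake of the example.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariance / sqrt(var_x * var_y)


# Hypothetical monthly data for one issue (not taken from any study):
# Google Trends scores, and the percentage of survey respondents who
# named the issue as the most important problem.
trends_scores = [20, 35, 100, 60, 40, 25]
survey_salience = [5.0, 8.0, 21.0, 14.0, 9.0, 6.0]

r = pearson_r(trends_scores, survey_salience)
print(f"Pearson r = {r:.2f}")
```

A high coefficient in a check of this kind would only suggest that the two measures move together over the period examined. It would not by itself establish that search data can replace survey data, particularly given the concerns about representativeness raised in this paper.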

Crisis research: Search trends vs. ‘real life’ events

A second research area to which Google Trends could make an especially relevant contribution is that which seeks to capture and explain the evolution of crises. In this context, the ability of search engine accessory tools to relay information about user interests almost in real time could be of crucial importance to those studying public reactions to emergency situations such as natural disasters with a view to enabling more effective institutional responses (Chay and Sasaki, 2011; Sherman-Morris, et al., 2011). More broadly, search engine aggregate data could be useful to analyze the relationship between online behavior and offline events over longer periods of time too. Online interest charts generated with Google Trends can be compared with a timeline of relevant offline events to verify the existence of links between online traffic for a given topic and certain ‘real world’ events. This approach could be especially useful for the study of the informational dynamics that surround election campaigns and high-profile political events more generally. For example, a simple query for the search popularity of the two main candidates in the 2012 U.S. Presidential elections (Barack Obama and Mitt Romney) highlighted how a majority of search peaks could be associated fairly straightforwardly with different types of offline events in addition to polling day itself (see Figure 1). These included mass media events such as the televised Presidential and Vice-Presidential debates (3, 16, 22 and 11 October respectively); ‘staged’ campaign events such as the Democratic National Convention (4–6 September); external crises such as the attack against the American consulate in Benghazi (11 September) and Hurricane Sandy (29–30 October); and, perhaps most interestingly, candidates’ ‘gaffes’ such as Romney’s “forty-seven-percent” and “binders full of women” remarks (17 September and 16 October respectively).


Figure 1: Search popularity of U.S. Presidential candidates, September-November 2012.
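
The comparison between search peaks and a timeline of offline events can be sketched in Python as follows. This is a minimal illustration under stated assumptions and not the procedure used to produce Figure 1. The daily scores are invented, and the peak threshold and the one-day matching window are arbitrary choices. Only the event dates are taken from the text above.

```python
from datetime import date, timedelta


def find_peaks(series, threshold):
    """Return dates scoring at least `threshold` and above both neighbours."""
    days = sorted(series)
    peaks = []
    for prev, cur, nxt in zip(days, days[1:], days[2:]):
        if (series[cur] >= threshold
                and series[cur] > series[prev]
                and series[cur] > series[nxt]):
            peaks.append(cur)
    return peaks


def match_events(peaks, events, window_days=1):
    """Map each peak to the events that fall within `window_days` of it."""
    return {
        peak: [name for day, name in events
               if abs((peak - day).days) <= window_days]
        for peak in peaks
    }


# Hypothetical daily scores for one candidate, 1-18 October 2012.
start = date(2012, 10, 1)
values = [25, 30, 70, 85, 40, 30, 28, 27, 26,
          28, 55, 60, 35, 30, 33, 90, 100, 45]
scores = {start + timedelta(days=i): v for i, v in enumerate(values)}

# Offline events and their dates as reported in the text.
events = [
    (date(2012, 10, 3), "First Presidential debate"),
    (date(2012, 10, 11), "Vice-Presidential debate"),
    (date(2012, 10, 16), "Second Presidential debate"),
    (date(2012, 10, 16), "'Binders full of women' remark"),
]

peaks = find_peaks(scores, threshold=50)
matches = match_events(peaks, events)
for peak, names in matches.items():
    print(peak.isoformat(), names or "no matching event")
```

Peaks that match no listed event are arguably the most interesting output of a check like this, as they point to moments that may call for closer qualitative investigation. The width of the matching window is a judgment call that ought to be reported alongside any results.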


This preliminary information can then be used as the basis for framing more specific research questions and carrying out further analysis through Google Trends as well as other methods. Most notably, the relationship between search trends and traditional media coverage of these events could be examined through content analysis to determine whether citizens and journalists responded to them in a similar fashion. In addition, one could ask also if surges in user interest at specific moments were merely event-related and ephemeral or instead triggered broader ‘thematic’ searches capable of influencing a campaign in the long term.

Crucially, this use of Google Trends data could facilitate research about the interaction of online, print, and broadcast media, with specific focus on the effects of search on the broader “new media ecology” (Hoskins, 2013). As such, this approach was adopted to generate important contextual information in recent work looking at online reactions to press coverage of celebrity issues (Metcalfe, et al., 2010) as well as controversial policy plans in a contentious area such as that of welfare provision (Trevisan, 2013). Although these were just early attempts to develop a new methodology, they showed that Google Trends data can be much more than a cheap and sleek alternative to traditional surveys. Instead, it makes otherwise unobtainable information visible, not only complementing but possibly de-bunking conclusions based on traditional methods too. In this framework, the relevance of highlighting fluctuations in keyword searches associated with specific topics or ideas extends well beyond academic scholarship, providing useful insights to both public decision-makers and campaigners.

Thus, from a point of view of applied research, this approach could provide crucial opportunities for verifying the pervasiveness of public information campaigns and for studying both the genesis and evolution of dominant mass media frames in the age of the Internet. Although further empirical and conceptual work will be needed to better understand the potential of Google Trends in this emerging area, some pioneering scholars have reached intriguing conclusions when experimenting with ways to compare search engine records linked to selected topics to, for example, content analysis of news coverage (Metcalfe, et al., 2010) as well as public health advocacy campaigns (Glynn, et al., 2011). Overall, this could make for a fundamental line of social inquiry at a time in which both news production and consumption occur in a highly hybridized media environment (Chadwick, 2013), where traditional notions of ‘audience’ as information recipients and journalists as news providers have become blurred. To what extent are dominant media frames or, for example, election campaign messages reflected in online search patterns? Moreover, what is the relationship between key search trends and the news media agenda? While the potential for methodological innovation in this area is extensive, this process will undoubtedly involve challenges and limitations that researchers should consider very carefully, as discussed below.

‘Country effects’ and limited granularity

The global dimension of Google Trends and the geographical filters that can be applied to its output can facilitate international research by providing comparable data at virtually no cost, substantially expanding the scope for online research. This is especially relevant for the study of online politics, which so far has focused almost exclusively on election campaigns despite calls from scholars for a broader comparative outlook in this area (see for example: Gibson and Cantijoch, 2011; Lilleker and Jackson, 2011; Jensen, 2009). However, when attempting to draw comparative observations from search engine data, researchers should keep in mind a problem that may require them to integrate their analyses with additional information from other sources or, in some cases, restrict the scope of international work. In particular, investigations relying on Google Trends as the principal tool of inquiry should pay close attention to what could be described as ‘country effects,’ i.e., the consequences of Google’s relative position in the search market of the country or countries under scrutiny. With regard to this issue, three typical situations can be identified. First, there are countries in which Google enjoys an unrivalled dominance in the search market such as in the United States, United Kingdom, and other Western democratic nations. Second, at the other extreme, there are countries where Google is banned from operating, for example Iran. Third, there are several countries with very large populations where Google occupies a secondary position in the search market, which instead is controlled by ‘home-grown’ providers. Most notably, this is the case of China and Russia, where most searches are carried out through local engines Baidu (Halavais, 2009) and Yandex (Oates, 2011) respectively, which do not make search records publicly available. While this is effectively outside the control of researchers, they must be aware of its implications for data representativeness. In addition, local Internet censorship and surveillance practices should be taken into account too, as they are bound to affect the habits of search engine users and alter their perspectives on search results (Tang, et al., 2012).

A further issue that may hinder the type of research discussed above is the limited granularity with which Google Trends matches selected keywords to search records. This is likely to create specific difficulties when the keywords and expressions under investigation have multiple meanings or belong simultaneously to different semantic areas. Such ambiguity opens up the analysis of search engine data to the influence of irrelevant searches, which can be mitigated only by applying fixed ‘category’ filters to Google Trends queries (e.g., ‘law & government,’ ‘news,’ ‘arts & entertainment,’ ‘health,’ etc.) or by explicitly excluding certain keyword combinations from the data used to elaborate final scores. Because these exclusion processes are arbitrary, relying on the ability of the researcher to identify irrelevant keyword combinations a priori, Google Trends data is bound to include a certain amount of error. Although this does not constitute a problem in and of itself, the impossibility of quantifying such error, due to the lack of access to complete search records, is a potentially serious pitfall for the incorporation of Google Trends scores in broader statistical work. This calls not only for great care in the selection of keywords, but also for complete transparency on the part of researchers, who ought to acknowledge these limits in the spirit of clarity and methodological innovation. In addition, attention should be paid to the language selected for investigation, as searching for the same term(s) in different languages is likely to yield different results, while Google itself generally tends to privilege English language sources (Al-Eroud, et al., 2011).
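The exclusion of keyword combinations and the call for transparency discussed above can be illustrated with a minimal Python sketch. It is not drawn from any of the studies reviewed here. It assumes that exclusions are expressed by prefixing a term with a minus sign in the search expression, which should be checked against the current Google Trends documentation, and the function names, the example keyword, and the category label are hypothetical. The second function simply records the choices made, so that they can be reported alongside the results.

```python
def build_query(keyword, excluded_terms):
    """Combine a keyword with '-term' exclusions into one search expression.

    Assumption: the target tool treats a leading minus sign as "exclude
    searches containing this term". Verify this before relying on it.
    """
    parts = [keyword.strip()]
    for term in excluded_terms:
        term = term.strip()
        if term:  # skip empty entries rather than emitting a bare '-'
            parts.append("-" + term)
    return " ".join(parts)


def describe_choices(keyword, excluded_terms, category=None):
    """Return a plain record of the filtering decisions, for reporting."""
    return {
        "keyword": keyword.strip(),
        "excluded_terms": [t.strip() for t in excluded_terms if t.strip()],
        "category_filter": category,  # hypothetical label, e.g. 'health'
        "query": build_query(keyword, excluded_terms),
    }


if __name__ == "__main__":
    # Hypothetical example: an ambiguous keyword with one a priori exclusion.
    record = describe_choices("jaguar", ["car"], category=None)
    for field, value in record.items():
        print(f"{field}: {value}")
```

Publishing such a record does not remove the error introduced by a priori exclusions, which remains unquantifiable without access to complete search records, but it allows other researchers to see and replicate the choices that were made.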

Representativeness of Google Trends data

As anticipated above, another key concern is the representativeness of Google data. While this issue affects countries where Google is not the leading search provider more directly, it is also relevant on a global scale. Indeed, Google Trends results are drawn from a comprehensive search log database. Yet the question remains of whether it would be appropriate for scholars to generalize conclusions obtained through this method in order to make claims about entire populations. Even in places where Internet penetration is close to 100 percent, not everyone uses search engines in the same way, let alone prefers Google over other search providers. Thus, as with other types of online data, it would be misleading to assume that search engine records could simply stand in for general population indicators. Instead, in the age of ‘big data’ it is important to ask who is unlikely to be represented in a given dataset and look for other ways of including their perspective (Boellstorff, 2013).

So far, pioneering scholars in this area have postponed the debate on these issues, prioritizing methodological innovation over the question of data validity. Thus, they limited themselves to assuming that, “unless the underlying mechanisms for seeking information about salient issues are fundamentally different for onliners than for offliners, the validity of online search queries for measuring the public agenda is not at risk” [13]. Other caveats derive from the fact that, although Google Trends tells us when users were more likely to search for information on certain topics, it cannot reveal the motives that underpinned those searches. Consequently, researchers can only make reasonable assumptions as to why users were interested in a given issue or set of issues at a specific moment in time. This issue has been raised in particular in public health studies based on Google Trends data. For example, Carneiro and Mylonakis (2009) noted in their work on tracking disease outbreaks using Google Trends that “naturally, all the people searching for influenza-related topics are not ill, but trends emerge when all influenza-related searches are added together” [14]. Although the comparison they carried out between Google Trends data and official medical records seemed to confirm this assumption, the issue of data representativeness remains a central one that ought to be examined exhaustively in further work if search engine accessory applications are to make broader inroads into academic inquiry.
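A comparison between search scores and official records of the kind mentioned above can be sketched, under stated assumptions, as a simple correlation between two series. The Python example below is illustrative only: it does not reproduce the procedure used by Carneiro and Mylonakis or by any other study cited here, and the two short series are hypothetical placeholder values rather than real data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("series must have the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical placeholder values, NOT real observations.
    weekly_search_scores = [10, 20, 30, 40]  # relative search interest
    reported_cases = [1, 2, 3, 4]            # official records, same weeks
    print(pearson_r(weekly_search_scores, reported_cases))
```

Even a strong correlation of this kind only shows that the two series move together. It says nothing about who was searching or why, which is precisely the representativeness question raised in this section.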

While this is not the place for an in-depth discussion of digital divide issues and Internet usage patterns, it is important to point out that the issue of data representativeness is not exclusive to research that focuses on search engine results. Instead, it also applies to work that relies on data drawn from virtually any online platform, especially social media. In this context, it is useful to note that, despite on-going growth in the number of subscribers to social networking sites, search engine users remain both a substantially larger group and one that better represents the general population. This is apparent when comparing the latest search engine usage estimates with, for example, Twitter penetration rates in a country such as the United States, where Twitter is relatively popular. In 2012, over 90 percent of American Internet users in all age groups up to 65 relied on search engines to retrieve information online (Purcell, et al., 2012a). In contrast, only 16 percent of them used Twitter, which in turn was even less popular among those aged over 30 (Duggan and Brenner, 2012). In light of these observations, search engine data such as that provided by Google Trends makes for a substantially less skewed source of information on public mood and issue salience than Twitter records, which in recent years have nonetheless been employed to study people’s reaction to and involvement in a broad array of events, from election campaigns (see for example: Tumasjan, et al., 2010) to controversial TV debates (Anstead and O’Loughlin, 2011), and from the Arab Spring uprisings (Aday, et al., 2012) to the English riots of August 2011 (Procter, et al., 2013). Undoubtedly, a much broader debate will be needed to dispel all doubts over the representativeness of publicly available search engine data. However, these preliminary considerations are encouraging, suggesting that further reflection on the value of such data for social science research would not be in vain.




Provided that researchers are transparent about its limitations and identify clearly areas where improvements are needed, the incorporation of publicly available search engine data in social science scholarship can support the development of new cross-disciplinary research designs. In this framework, search engine accessory data could be inscribed among those methods that Karpf (2012) has defined as ‘kludgy,’ for which inelegant and experimental yet effective research solutions should be preferred over the prospect of becoming muddled in working with traditional methods that are unsuited to the analysis of a fast-moving technological and socio-political context. As innovative ways of employing Google Trends and other accessory search engine tools in social research continue to emerge, it is therefore crucial for scholars to tackle both opportunities and challenges in a comprehensive fashion through empirical as well as conceptual inter-disciplinary work.

Undoubtedly, the issues highlighted in this paper cannot represent an exhaustive agenda for the study of search engines as social science inquiry tools. Rather, this paper sought to provide some useful reference points by drawing together key themes that emerged from pioneering research and reflecting on some of the main issues across disciplines. It is hoped that, while researchers strive to be more creative in their approach to search engine data for social science research, technology developers will simultaneously benefit from this debate and be able to address some of the issues highlighted by academics. As Dutton (2013) noted in a recent paper, the social shaping of digital research “is not simply an academic endeavor. [...] Collaboration among sectors, and not simply academic disciplines, will be a major challenge, but another way to enable the social sciences to have a greater impact on digital technology and digital research, in particular” [15]. Given that all which happens online is ultimately ‘real’ (Rogers, 2009b), online data are bound to be essential to our ability to fully understand twenty-first century society. In this context, an organic debate on the need to adopt search engines not only as research objects but also as academic inquiry tools constitutes a priority to ensure that fundamental opportunities to expand the scope of research and strengthen its impact are not unduly overlooked.


About the author

Filippo Trevisan is a post-doctoral researcher at the University of Glasgow, where he earned his Ph.D. in political communication and public policy in 2013. Currently, he is working on a monograph about disability rights advocacy and the Internet in the U.K. and the U.S., which will be published by Routledge in 2015. His research blog can be found at
E-mail: filippo [dot] trevisan [at] glasgow [dot] ac [dot] uk



The author would like to thank Sarah Oates and Andrew Hoskins for their useful feedback on previous drafts of this paper. This work was supported by the Economic and Social Research Council (ESRC) grants: ES/I030166/1 — Civic Consumers or Commercial Citizens? Social Scientists Working with Google UK to Better Understand Online Search Behavior and ES/K007890/1 — Google: The Role of Internet Search in Elections in Established and Challenged Democracies.



1. Sanz and Stančík, 2014, p. 16.

2. Spink and Jansen, 2004, p. 189.

3. Hillis, et al., 2013, p. 53.

4. Vaidhyanathan, 2011, p. 7.

5. Ibid.

6. Halavais, 2009, p. 59.

7. Ibid.

8. Hillis, et al., 2013, p. 60.

9. Scheitle, 2011, p. 287.

10. Waller, 2011, p. 774.

11. Ibid.

12. Spink and Jansen, 2004, p. 36.

13. Scharkow and Vogelgesang, 2011, p. 107.

14. Carneiro and Mylonakis, 2009, p. 1,557.

15. Dutton, 2013, p. 191.



Sean Aday, Henry Farrell, Marc Lynch, John Sides, and Deen Freelon, 2012. Blogs and bullets II: New media and conflict after the Arab Spring. Washington, D.C.: U.S. Institute of Peace, and at, accessed 22 October 2014.

Ahmed F. Al-Eroud, Mohammad A. Al-Ramahi, Mohammad N. Al-Kabi, Izzat M. Alsmadi, and Emad M. Al-Shawakfa, 2011. “Evaluating Google queries based on language preferences,” Journal of Information Science, volume 37, number 4, pp. 282–292.
doi:, accessed 22 October 2014.

Nick Anstead and Ben O’Loughlin, 2011. “The viewertariat and BBC Question Time: Television debate and real-time commenting online,” International Journal of Press/Politics, volume 16, number 4, pp. 440–462.
doi:, accessed 22 October 2014.

Tom Boellstorff, 2013. “Making big data, in theory,” First Monday, volume 18, number 10, at, accessed 15 November 2013.
doi:, accessed 22 October 2014.

Benjamin N. Breyer, Saunak Sen, David S. Aaronson, Marshall L. Stoller, Bradley A. Erickson, and Michael L. Eisenberg, 2011. “Use of Google Insights for search to track seasonal and geographic kidney stone incidence in the United States,” Urology, volume 78, number 2, pp. 267–271.
doi:, accessed 22 October 2014.

Antonio Carneiro and Eleftherios Mylonakis, 2009. “Google Trends: A Web-based tool for real-time surveillance of disease outbreaks,” Clinical Infectious Diseases, volume 49, number 10, pp. 1,557–1,564.
doi:, accessed 22 October 2014.

Lucas J. Carr and Shira I. Dunsiger, 2012. “Search query data to monitor interest in behavior change: Application for public health,” PLoS ONE, volume 7, number 10, e48158.
doi:, accessed 22 October 2014.

Andrew Chadwick, 2013. The hybrid media system: Politics and power. New York: Oxford University Press.

Sengtha Chay and Nophea Sasaki, 2011. “Using online tools to assess public responses to climate change mitigation policies in Japan,” Future Internet, volume 3, number 2, pp. 117–129.
doi:, accessed 22 October 2014.

China Internet Network Information Center (CINNC), 2011. “Statistical report on Internet development in China,” at, accessed 23 January 2014.

Mark P. Connolly, Maarten Postma, and Sherman J. Silber, 2009. “What’s on the mind of IVF consumers?” Reproductive BioMedicine Online, volume 19, number 6, pp. 767–769.
doi:, accessed 22 October 2014.

Alejandro Diaz, 2008. “Through the Google goggles: Sociopolitical bias in search engine design,” In: Amanda Spink and Michael Zimmer (editors). Web search: Multidisciplinary perspectives. Information Science and Knowledge Management, volume 14. Berlin: Springer, pp. 11–34.
doi:, accessed 22 October 2014.

Maeve Duggan and Joanna Brenner, 2012. “The demographics of social media users — 2012,” Washington, D.C.: Pew Research Internet Project, at, accessed 14 April 2013.

William H. Dutton, 2013. “The social shaping of digital research,” International Journal of Social Research Methodology, volume 16, number 3, pp. 177–195.
doi:, accessed 22 October 2014.

William H. Dutton and Grant Blank, 2013. “Cultures of the Internet: The Internet in Britain,” Oxford: Oxford Internet Institute, at, accessed 25 January 2014.

Seth Finkelstein, 2008. “Google, links, and popularity versus authority.” In: Joseph Turow and Lokman Tsui (editors). The hyperlinked society: Questioning connections in the digital age. Ann Arbor: University of Michigan Press, pp. 104–124.

Paul Frijters, David W. Johnston, Grace Lordan, and Michael A. Shields, 2013. “Exploring the relationship between macroeconomic conditions and problem drinking as captured by Google searches in the US,” Social Science & Medicine, volume 84, pp. 61–68.
doi:, accessed 22 October 2014.

Christian Fuchs, 2013. Social media: A critical introduction. London: Sage.

David Gauntlett, 2011. Making is connecting: The social meaning of creativity, from DIY and knitting to YouTube and Web 2.0. Cambridge: Polity Press.

Rachel Gibson and Marta Cantijoch, 2011. “Comparing online elections in Australia and the UK: Did 2010 finally produce ‘the’ Internet election?,” Communications, Politics & Culture, volume 44, number 2, pp. 4–17.

Dean Giustini, 2005. “How Google is changing medicine: A medical portal is the logical next step,” British Medical Journal, volume 331, number 7531, pp. 1,487–1,488.
doi:, accessed 22 October 2014.

Ronan W. Glynn, John C. Kelly, Norma Coffey, Karl J. Sweeney, and Michael J. Kerin, 2011. “The effect of Breast Cancer Awareness Month on Internet search activity — A comparison with awareness campaigns for lung and prostate cancer,” BMC Cancer, volume 11, at, accessed 22 October 2014.
doi:, accessed 22 October 2014.

Eric Goldman, 2011. “Revisiting search engine bias,” William Mitchell Law Review, volume 38, number 1, at, accessed 22 October 2014.

Eric Goldman, 2008. “Search engine bias and the demise of search engine utopianism,” In Amanda Spink and Michael Zimmer (editors). Web search: Multidisciplinary perspectives. Information Science and Knowledge Management, volume 14. Berlin: Springer, pp. 121–133.
doi:, accessed 22 October 2014.

John F. Gunn III and David Lester, 2012. “Using Google searches on the Internet to monitor suicidal behavior,” Journal of Affective Disorders, volume 148, numbers 2–3, pp. 411–412.
doi:, accessed 22 October 2014.

Barrie Gunter, Ian Rowlands, and David Nicholas, 2009. The Google generation: Are ICT innovations changing information-seeking behaviour? Oxford: Chandos.

Alexander Halavais, 2009. Search engine society. Cambridge: Polity.

Chris Hand and Guy Judge, 2012. “Searching for the picture: Forecasting UK cinema admissions using Google Trends data,” Applied Economics Letters, volume 19, number 11, pp. 1,051–1,055.
doi:, accessed 22 October 2014.

Ken Hillis, Michael Petit, and Kylie Jarrett, 2013. Google and the culture of search. New York: Routledge.

Matthew Hindman, 2009. The myth of digital democracy. Princeton, N.J.: Princeton University Press.

Matthew Hindman, Kostas Tsioutsiouliklis, and Judy A. Johnson, 2003. “‘Googlearchy’: How a few heavily-linked sites dominate politics on the Web,” paper presented at the annual meeting of the Midwest Political Science Association, Chicago; version at, accessed 22 October 2014.

Andrew Hoskins, 2013. “Death of a single medium,” Media, War & Conflict, volume 6, number 1, pp. 3–6.
doi:, accessed 22 October 2014.

Lucas D. Introna and Helen Nissenbaum, 2000. “Shaping the Web: Why the politics of search engines matters,” Information Society, volume 16, number 3, pp. 169–185.
doi:, accessed 22 October 2014.

Bernard J. Jansen and Amanda Spink, 2006. “How are we searching the World Wide Web? A comparison of nine search engine transaction logs,” Information Processing & Management, volume 42, number 1, pp. 248–263.
doi:, accessed 22 October 2014.

Bernard J. Jansen and Amanda Spink, 2005. “An analysis of Web searching by European users,” Information Processing & Management, volume 41, number 2, pp. 361–381.
doi:, accessed 22 October 2014.

Michael Jensen, 2009. “Political participation, alienation, and the Internet in Spain and the United States,” paper presented at the European Consortium for Political Research Workshop, Lisbon (27–30 April).

Yongick Jeong and Reaz Mahmood, 2011. “Reading the world’s mind: Political, socioeconomical and cultural approaches to understanding worldwide Internet search queries,” International Communication Gazette, volume 73, number 3, pp. 233–251.
doi:, accessed 22 October 2014.

Manuel Kaesbauer, Ralf Hohenstatt, and Richard Reed, 2012. “Direct versus search engine traffic: An innovative approach to demand analysis in the property market,” International Journal of Housing Markets and Analysis, volume 5, number 4, pp. 392–413.

David Karpf, 2012. “Social science research methods in Internet time,” Information, Communication & Society, volume 15, number 5, pp. 639–661.
doi:, accessed 22 October 2014.

Darren G. Lilleker and Nigel A. Jackson, 2011. Political campaigning, elections and the Internet: Comparing the US, UK, France and Germany. London: Routledge.

Geert Lovink, 2009. “Society of the query: The Googlization of our lives,” In: Konrad Becker and Felix Stalder (editors). Deep search: The politics of search beyond Google. Innsbruck: Studien Verlag, pp. 45–53.

Sylvia Manzano and Joseph D. Ura, 2013. “Desperately seeking Sonia? Latino heterogeneity and geographic variation in Web searches for Judge Sonia Sotomayor,” Political Communication, volume 30, number 1, pp. 81–99.
doi:, accessed 22 October 2014.

David Metcalfe, Charlotte L. Price, and John Powell, 2010. “Media coverage and public reaction to a celebrity cancer diagnosis,” Journal of Public Health, volume 33, number 1, pp. 80–85.
doi:, accessed 22 October 2014.

Sarah Oates, 2011. “Going native: The value in reconceptualizing international Internet service providers as domestic media outlets,” Philosophy & Technology, volume 24, number 4, pp. 391–409.
doi:, accessed 22 October 2014.

Rob Procter, Farida Vis, and Alex Voss, 2013. “Reading the riots on Twitter: Methodological innovation for the analysis of big data,” International Journal of Social Research Methodology, volume 16, number 3, pp. 197–214.
doi:, accessed 22 October 2014.

Kristen Purcell, Joanna Brenner, and Lee Rainie, 2012a. “Search engine use 2012,” Washington, D.C.: Pew Research Internet Project, at, accessed 15 March 2013.

Kristen Purcell, Lee Rainie, Alan Heaps, Judy Buchanan, Linda Friedrich, Amanda Jacklin, Clara Chen, and Kathryn Zickuhr, 2012b. “How do teens research in the digital world,” Washington, D.C.: Pew Research Internet Project, at, accessed 15 March 2013.

Paul Reilly, 2008. “‘Googling’ terrorists: Are Northern Irish terrorists visible on Internet search engines?” In: Amanda Spink and Michael Zimmer (editors). Web search: Multidisciplinary perspectives. Information Science and Knowledge Management, volume 14. Berlin: Springer, pp. 151–175.
doi:, accessed 22 October 2014.

Richard Rogers, 2010. “Internet research: The question of method — A keynote address from the YouTube and the 2008 election cycle in the United States Conference,” Journal of Information Technology & Politics, volume 7, numbers 2–3, pp. 241–260.
doi:, accessed 22 October 2014.

Richard Rogers, 2009a. “The Googlization question: Towards the inculpable search engine?” In Konrad Becker and Felix Stalder (editors). Deep search: The politics of search beyond Google. Innsbruck: Studien Verlag, pp. 173–184.

Richard Rogers, 2009b. The end of the virtual: Digital methods. Amsterdam: Vossiuspers UvA.

Richard Rogers, 2004. Information politics on the Web. Cambridge, Mass.: MIT Press.

Esteve Sanz and Juraj Stančík, 2014. “Your search — ‘Ontological Security’ — Matched 111,000 documents: An empirical substantiation of the cultural dimension of online search,” New Media & Society, volume 16, number 2, pp. 252–270.
doi:, accessed 22 October 2014.

Michael Scharkow and Jens Vogelgesang, 2011. “Measuring the public agenda using search engine queries,” International Journal of Public Opinion Research, volume 23, number 1, pp. 104–113.
doi:, accessed 22 October 2014.

Christopher P. Scheitle, 2011. “Google’s insights for search: A note evaluating the use of search engine data in social research,” Social Science Quarterly, volume 92, number 1, pp. 285–295.
doi:, accessed 22 October 2014.

Torsten Schmidt and Simeon Vosen, 2012. “Using Internet data to account for special events in economic forecasting,” Ruhr Economic Papers, number 382, at, accessed 12 April 2013.

Kathleen Sherman-Morris, Jason Senkbeil, and Robert Carver, 2011. “Who’s Googling what? What Internet searches reveal about hurricane information seeking,” Bulletin of the American Meteorological Society, volume 92, number 8, pp. 975–985.
doi:, accessed 22 October 2014.

Amanda Spink and Bernard J. Jansen, 2004. Web search: Public searching on the Web. Boston: Kluwer Academic.

Min Tang, Laia Jorba, and Michael J. Jensen, 2012. “Digital media and political attitudes in China,” In: Eva Anduiza, Michael J. Jensen, and Laia Jorba (editors). Digital media and political engagement worldwide: A comparative study. Cambridge: Cambridge University Press, pp. 221–239.

Stephen Thornton, 2010. “From ‘scuba diving’ to ‘jet skiing’? Information behavior, political science, and the Google generation,” Journal of Political Science Education, volume 6, number, pp. 353–368.
doi:, accessed 22 October 2014.

Filippo Trevisan, 2013. “Disabled people, digital campaigns, and contentious politics: Upload successful or connection failed?” In: Richard Scullion, Roman Gerodimos, Daniel Jackson, and Darren Lilleker (editors). The media, political participation, and empowerment. London: Routledge, pp. 175–191.

Andranik Tumasjan, Timm O. Sprenger, Philipp G. Sandner, and Isabell M. Welpe, 2010. “Predicting elections with Twitter: What 140 characters reveal about political sentiment,” paper presented at the Fourth International AAAI Conference on weblogs and social media, at, accessed 15 April 2013.

Siva Vaidhyanathan, 2011. The Googlization of everything (and why we should worry). Berkeley: University of California Press.

Elizabeth Van Couvering, 2008. “The history of the Internet search engine: Navigational media and the traffic commodity,” In: Amanda Spink and Michael Zimmer (editors). Web search: Multidisciplinary perspectives. Information Science and Knowledge Management, volume 14. Berlin: Springer, pp. 177–206.
doi:, accessed 22 October 2014.

Marijn van der Velde, Linda See, Steffen Fritz, Frank Verheijen, Nikolay Khabarov, and Michael Obersteiner, 2012. “Generating crop calendars with Web search data,” Environmental Research Letters, volume 7, number 2.
doi:, accessed 22 October 2014.

Hal Varian and Hyunyoung Choi, 2009. “Predicting the present with Google Trends,” Google Research Blog (2 April), at, accessed 14 April 2013.

Brian P. Walcott, Brian V. Nahed, Kristopher T. Kahle, Navid Redjal, and Jean-Valerie Coumans, 2011. “Determination of geographic variance in stroke prevalence using Internet search engine analytics,” Neurosurgical Focus, volume 30, number 6, p. E19.
doi:, accessed 22 October 2014.

David A. Weaver and Bruce Bimber, 2008. “Finding news stories: A comparison of searches using LexisNexis and Google News,” Journalism & Mass Communication Quarterly, volume 85, number 3, pp. 515–530.
doi:, accessed 22 October 2014.

Gene R. Wilde and Kevin L. Pope, 2013. “Worldwide trends in fishing interest indicated by Internet search volume,” Fisheries Management and Ecology, volume 20, numbers 2–3, pp. 211–222.
doi:, accessed 22 October 2014.

Xichuan Zhou, Jieping Ye, and Yujie Feng, 2011. “Tuberculosis surveillance by analyzing Google trends,” IEEE Transactions on Biomedical Engineering, volume 58, number 8, pp. 2,247–2,254.
doi:, accessed 22 October 2014.

Michael Zimmer, 2010. “Web search studies: Multidisciplinary perspectives on Web search engines,” In: Jeremy Hunsinger, Lisbeth Klastrup, and Matthew Allen (editors). International handbook of Internet research. Dordrecht: Springer, pp. 507–521.


Editorial history

Received 14 February 2014; accepted 7 October 2014.

Creative Commons License
“Search engines: From social science objects to academic inquiry tools” by Filippo Trevisan is licensed under a Creative Commons Attribution 4.0 International License.

Search engines: From social science objects to academic inquiry tools
by Filippo Trevisan.
First Monday, Volume 19, Number 11 - 3 November 2014
