In this paper, we outline a study of the Twitter microblogging platform through a sample of French users. We discuss sampling methodology and compare three “issues” taken from the collected set of tweets. Based on the empirical findings we make a case for extending the notion of “information diffusion” to take into account questions of meaning, values, and ideology. We propose the concept of “refraction” to take a step toward this end.
From diffusion to refraction
Since its creation in 2006, the Twitter microblogging service has emerged as a leading platform for short message communication and social networking. According to a recent study (comScore, 2011), Twitter reached one in ten Internet users at the end of 2011, after a year of strong growth (+59 percent). Perhaps even more significantly, Twitter has captured the public imagination due to its (strongly debated) role in political events such as the Iranian elections of 2009 (Morozov, 2009; Shirky, 2009) and what has come to be known as the “Arab spring” (Poell and Darmoni, 2012). These and other entanglements with “serious” matters have gone far in transforming Twitter’s image from a system essentially used to share “pointless babble” (Pear Analytics, 2009) to a platform that allows for communication and coordination in significant social movements. For this reason, but also due to its relative openness in terms of data collection, Twitter has quickly become a favored research object for scholars from various fields.
While most studies currently focus on English language activity, the empirical research this paper is based on is related to a larger project, Internet: Pluralité et Redondance de l’Information (IPRI), that studies the question of how the Internet contributes to information plurality in the French language Internet (Marty, et al., 2011). While digital networks have certainly been agents of globalization and transnationalization on different levels, the political news and debate sphere (our main object of concern) is, despite important developments over the last 20 years, still strongly organized around national actors, issues, and channels, even inside of the European Union (Wessler, et al., 2008). Certain findings reported in this paper may well be generalizable beyond our empirical focus, but we estimate that certain national particularities do indeed come into play. As this article uses the empirical terrain first and foremost as a resource for conceptual work, we will not stress these particularities more than necessary. While Twitter’s success in France is considerable, the microblogging service is not among the top five social networks in the country (comScore, 2011) and a study by AT Internet (2011) observed that in March 2011 only 0.2 percent of the visitors for the 12 top news sites in France were relayed by the microblogging platform. These figures indicate, in line with similar data for the U.S. (Pew Research Center, 2011), that despite the high number of created accounts, the actual use of Twitter may well be lower than the high levels of media attention would indicate. At the same time, very large volumes of tweets are created and especially media professionals seem to have embraced the platform with open arms.
This paper aims at making three contributions to research on Twitter that are in large part related to theory and method: first, we present a research methodology that is based on user sampling rather than subject sampling, which allows us to study the platform as a sphere (suggesting both demarcation from the outside, and coherence on the inside) as well as a series of conversations; second, we analyze and compare three case studies in order to highlight specific characteristics of the particular national sphere we are focusing on; and third, based on our empirical findings, we argue that the “dominant paradigm” in Twitter research, information diffusion, needs to be extended to better account for dimensions of (shared) meaning, values, and norms. This, in short, is what the concept of “refraction” seeks to address.
This paper follows the spirit of Joëlle Le Marec’s contemplations on the nature of empirical research, which aim at resolving the supposed contradiction between theory and field work from a science studies angle and somewhat related to a grounded theory perspective. Here, “[the field] is not the reservoir of facts and social reality as it is spontaneously perceptible in its complexity and richness [...] but a set of operations, unprecedented situations, singular confrontation that occupy the researcher on a daily basis” (Le Marec, 2002). For Le Marec, the question is not whether “the theory” or “the field” takes precedence over the other but rather “what the field does to the concept” [our italics] by putting the researcher into a specific and complex epistemological situation that is characterized by a constant production of “surplus”. This means that for example in the case of a large–scale communication platform such as Twitter, there is always “more”, always an element of surprise, something that pushes beyond the concept and thereby pushes the concept. This is not a “reality check” where the facts correct an erroneous theory but rather a composite of methodology, data, and conceptual work caught up in the situated dynamics of research practice. In the context of this study, analytical methodology and conceptual apprehension had to be revised at several moments and we would like to make this “adjustment work” at least somewhat visible.
Studying Twitter through digital methods based approaches (Rogers, 2009) presents us with both opportunities and challenges. On the one side, different APIs (Application Programming Interfaces) provide relatively comprehensive access to user data and activity. While users can make accounts private, this is relatively rare — according to our tests about one in 10 accounts is protected — and most researchers are interested in the public face of Twitter communication in any case. On the other side, Twitter users now produce many millions of messages every day and such masses of data challenge the capabilities of even the most well funded projects. Every research project is therefore forced to decide, from the outset, on a method for creating a subset of data that will actually be analyzed. In the context of social media, the question of sampling is still far from being completely understood and we will therefore address it in some detail.
Sampling and data collection
In the empirical study of social and cultural phenomena, the question of how to create a corpus of “objects” to analyze is continuously present. In over 150 years of experimentation and debate in and around statistics, a set of standard sampling methods has stabilized and social scientists today have a good sense of the possibilities and limits of each approach. When studying large populations of people, quantitative approaches dominate and samples are most often stratified based on demographic information (usually census data), where individuals are selected in relation to the grid of categories (and their distributions inside of the population) these general surveys establish. When it comes to systems like Twitter, no such grid, e.g., of socio–economic parameters, is available and this introduces the difficult question of how to negotiate between the practical logistical limits most research projects are subject to and the hope to be able to infer from the sample to a larger population. We argue that there are at least five factors that will weigh on decisions concerning sampling methods:
The “epistemological outlook” of a project is decisive in the sense that different research questions but also different theoretical and methodological paradigms (quantitative, qualitative, representative, etc.) will lead to very different requirements and decisions.
Many projects may simply not be interested in studying a platform on the whole but focus on geographical, linguistic, temporal or topical subgroups and questions.
The technological capabilities of a project, which concerns both funding and members’ experience of working in necessarily interdisciplinary teams, will limit possibilities in a very practical sense.
Technical and legal limitations for data access (API restrictions, etc.) may have the effect that sample data cannot be compiled in the desired way.
Ethical considerations and the increasing requirement to “green light” empirical research by “ethics committees” may interfere with researchers’ plans. These elements can vary strongly between individual institutions and national cultures.
With these contextual forces in mind, we can start looking at the practical options for sample construction. We can distinguish at least six methods:
A full sample is a possibility, at least in theory: after having been white–listed by Twitter in 2009, Cha, et al. (2010) accessed 55M active user accounts with the help of 58 servers and retrieved 1.7B tweets. But the quickly growing data volumes on Twitter introduce massive logistical demands and, even for the cited example, one could argue that the need to define an observation period makes this a partial sample as well. Working with very large amounts of data also introduces the problem that heterogeneous and skewed distributions may make procedures relying overly on averages simply meaningless and therefore require sophisticated analytical tools.
A random sample can retain claims to representativeness and attenuate logistical requirements. Because it is, to our knowledge, simply not feasible to connect Twitter accounts to a census category grid, random selection is the only way to establish a representative sample. Twitter already provides different “statistically correct” data streams via its Streaming API and the incremental numbering scheme for account ids allows for direct sampling as well. The effects of non-normal distributions can produce problems here as well and because of the holes in the data, analysis on the micro level (e.g., following a particular conversation) is no longer feasible.
A topic sample is usually constructed by querying Twitter’s Search API for certain keywords or hashtags. This is the most common method used by humanists and social scientists. It is generally much less demanding on the levels of logistics, but it is obviously very difficult to make any strong claims about the platform’s uses beyond the studied subject.
Marker–based samples can be compiled with the help of geographical, linguistic or technical pointers provided by the platform. While Twitter does not produce segmentations based on nationality, tweets can be searched on the basis of language and/or geographical location. But language detection is less than satisfactory and only a very low number of tweets (less than two percent according to our testing) are geotagged. Sampling based on technical markers such as the software used to post a message is more promising and may, for example, allow one to study mobile users only.
Graph–based sampling usually proceeds by examining the friend/follower relationships and makes selections based on that data. Different methods from graph theory can be used to select certain dense zones in the network or only the most connected users. According to the selection method used, different biases weigh on the possibilities for interpretation.
Manual sampling is interesting for smaller projects and localized populations. One could, for example, collect the accounts for a country’s MPs or select particularly prominent individuals from a certain sector of society. This method is quite common and goes well with a more qualitative research outlook.
Every method implies a particular “epistemological spin” and will influence the kind of understanding that can actually be derived from the analysis of the gathered data. The fit between a research project and a sampling method will largely be negotiated around the five factors mentioned above.
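The random-sample option can be illustrated with a minimal sketch, assuming the incremental account id scheme mentioned above. The function names and the `exists` callable are hypothetical stand-ins for an account lookup; they are not part of any Twitter API and do not describe the procedure used in this study.

```python
import random


def draw_candidate_ids(max_id, k, seed=None):
    """Draw k distinct candidate account ids uniformly from 1..max_id."""
    rng = random.Random(seed)
    return rng.sample(range(1, max_id + 1), k)


def sample_accounts(max_id, k, exists, seed=None):
    """Keep only the drawn ids for which `exists(account_id)` is true.

    `exists` is a hypothetical callable standing in for an account lookup;
    ids that were never assigned or belong to deleted accounts are dropped.
    """
    return [i for i in draw_candidate_ids(max_id, k, seed) if exists(i)]
```

Because unassigned or deleted ids are dropped, the number of accounts returned will usually be smaller than `k`, so a project would have to oversample accordingly.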
In the context of our research project, we wanted to achieve four goals: first, to create a sample based on user accounts rather than subjects in order to be able to examine and compare a large variety of topics; second, to compile a “national” sample containing (mostly) French and French–language users; third, to focus on users interested in political subjects and current events; fourth, to have a sample that would be sufficiently large to claim at least a partial overview of the uses of Twitter in France. The goal to capture the most visible, the most public debates was privileged over representativeness, however. This decision is in line with the aspiration to emphasize the mass media aspects of Twitter, its particular brand of “publicness”, rather than its uses for interpersonal communication.
Our methodology started out with a manually compiled “core” of 496 accounts selected by a group of researchers. This core consisted mainly of politicians from all major parties, as well as activists, bloggers, and media professionals that had achieved a certain visibility on the platform. In a second step, we “snowballed” from the initial list by acquiring all users’ friends and followers through the REST API, which led us to a pool of 326,532 accounts. To keep numbers manageable, we kept only those accounts that were connected (in either direction) to at least 10 users in our core set. Of the 24,351 resulting accounts, 22,322 were unprotected and 17,361 actually posted at least one message during the observation period (15 February 2011 to 15 April 2011). All of the analyses were performed on this latter set: using the REST API, probably the most reliable data access to Twitter, we stored all tweets posted by these active users over the observation period, 5,883,657 in total.
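The filtering step of the snowball procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name, the edge-list representation, and the toy account names are hypothetical; only the rule of keeping accounts tied in either direction to at least 10 core users comes from the text.

```python
from collections import defaultdict


def filter_by_core_ties(core, edges, min_ties=10):
    """Keep accounts tied (in either direction) to at least `min_ties` core accounts.

    core  -- iterable of core account names (hypothetical identifiers)
    edges -- iterable of (follower, followed) pairs gathered while snowballing
    """
    core = set(core)
    ties = defaultdict(set)  # account -> distinct core accounts it is tied to
    for follower, followed in edges:
        if follower == followed:
            continue  # ignore self-loops
        if followed in core:
            ties[follower].add(followed)
        if follower in core:
            ties[followed].add(follower)
    return {account for account, linked in ties.items() if len(linked) >= min_ties}


# Toy example with a lowered threshold; all names are invented.
toy_core = {"c1", "c2", "c3"}
toy_edges = [("u1", "c1"), ("c2", "u1"), ("u2", "c1"), ("u1", "c1")]
kept = filter_by_core_ties(toy_core, toy_edges, min_ties=2)
```

Storing ties as sets means that a mutual follow relationship with the same core account counts only once, which is one plausible reading of “connected (in either direction)”.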
The collection of tweets also allowed us to confirm the validity of our sample retrospectively by observing a strong coherence between our user sample and the usernames mentioned in the messages. While the final number of users is significantly smaller than the number of accounts created in France, a study by OpinionWay (Journal du Net, 2010) estimated that there were only 225K users in France at the end of 2010 and Spintank (2011) indicated an even lower 30K–80K regular users. We are therefore confident that our sample allows for at least some generalizations about the French Twitter territory at the time of observation. While our method captured a certain number of “celebrity” and spam accounts, an analysis of user profiles and tweet language confirms a very strong French dominance, centered around Paris: 5,828 users, roughly one third of our active population, explicitly named “Paris” in their location field. Our sample is also very much concentrated on users working in media or politics related professions and users interested in these topics: 1,549 account descriptions (8.9 percent) contained the word “journalist”, which is quite a significant percentage and confirms the often–made observation that media professionals have adopted Twitter with particular verve (Hermida, 2010).
Analytical methods and case studies
Our analytical toolkit included a wide variety of statistical, graph–theoretical, and content oriented methods (for a full empirical investigation see Rieder and Smyrnaios, 2012), but in this paper we focus on the third set and follow an approach that combines quantitative elements with a close reading of actual tweets. While we have investigated a larger number of subjects, in order to be able to discuss certain details we will focus on three issues that were tweeted about in significant volumes during our observation period.
The first case concerns the underwater earthquake that occurred on 11 March 2011 off the Japanese coast, which caused a tsunami that left nearly 16K dead and then led to a nuclear accident at a power plant in the Fukushima prefecture. The second case can similarly be classified as an “event”, but of a much smaller, mostly national scale: on 24 February 2011, Dior’s chief designer John Galliano was arrested by police after an anti–Semitic rant in a Paris bar and subsequently first suspended and then fired by his employer. Both of these events were followed from the “breaking” to the point where tweet volumes dropped off significantly, 11 days in both cases. The third subject was analyzed over the full observation period of two months and centers around France’s famous “three–strikes” anti–Internet piracy law known by the name of the institution charged with enforcing it: HADOPI (Haute Autorité pour la diffusion des œuvres et la protection des droits sur internet). There is no major “event” connected to this subject over the two–month period and it therefore provides a certain contrast to the other two.
We distinguished the subjects by means of a search query, which was relatively straightforward in all three cases, with “galliano” and “hadopi” as rather unambiguous issue identifiers and “japon” as the hashtag quickly established by French users to reference the events in Japan.
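A minimal sketch of such a query-based selection is given below. The exact matching rule is not specified in the text, so the case-insensitive substring match, the record layout (`date` and `text` keys), and the toy tweets are assumptions made for illustration only.

```python
from datetime import date


def select_issue(tweets, query, start, end):
    """Return tweets posted between start and end (inclusive) whose text contains the query."""
    needle = query.lower()
    return [
        t for t in tweets
        if start <= t["date"] <= end and needle in t["text"].lower()
    ]


# Toy records; the texts are invented for illustration only.
tweets = [
    {"date": date(2011, 3, 11), "text": "Séisme au #Japon"},
    {"date": date(2011, 3, 25), "text": "Retour sur le #japon"},
    {"date": date(2011, 3, 12), "text": "Rien à voir"},
]
japan = select_issue(tweets, "japon", date(2011, 3, 10), date(2011, 3, 20))
```

A substring match of this kind also catches derived forms such as “japonais”; a stricter token or hashtag match would be an equally plausible choice and would yield somewhat smaller counts.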
While this selection is by no means representative of the large variety of subjects that appeared in our sample — we counted a staggering 207K unique hashtags — they go far in showing variability and allow us to illustrate the more conceptual argument we will make further down.
When looking at our three case studies, we immediately see that they are of quite different scale and differ in the number and intensity of the responses they provoke. Table 1 gives an overview of a number of basic indicators.
Table 1: Quantitative overview of the three case studies.

| Objects | 2011 Japanese Earthquake | John Galliano | HADOPI |
| --- | --- | --- | --- |
| Period analyzed | 10–20 March 2011 | 24 February–6 March 2011 | 15 February–15 April 2011 |
| Query | “japon” | “galliano” | “hadopi” |
| Number of tweets | 44,803 | 4,965 | 5,850 |
| Number of users | 6,657 (38.3%) | 1,907 (11%) | 1,548 (9%) |
| Average tweets per user | 6.7 | 2.6 | 3.8 |
| Tweets with URLs | 56.6% | 53.4% | 68.2% |
| Number of hosts linked | 2,399 | 398 | 349 |
| Percentage of links to top five hosts | 13% | 21% | 52% |
| Percentage of links to top five hosts | 65% | 46% | 65% |
As we may expect, the Japanese tsunami disaster provoked a much greater volume of tweets, from a significantly larger percentage (38.3 percent) of accounts, and a much higher number of tweets per user (6.7). Although the Galliano case is much “closer to home”, the magnitude of the events in Japan does not leave the French Twitter users indifferent. We can also see that the number and variety of domain names from linked URLs is higher than in the other two cases and a closer examination shows many more non–French sources appearing. This is a global event after all. But even the less spectacular topics are far from insignificant, provoking messages from about 10 percent of our users in both cases. The much higher percentage of URLs in the HADOPI case can be interpreted as a first indicator for a more toned–down quality that can be ascribed to the absence of major variations in temporal intensity compared to the “burstiness” of the two breaking events: much of the content posted around the anti–piracy institution constitutes information and documentation rather than expressions of outrage, sadness, or shock.
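Indicators of the kind reported in Table 1 can be derived from a list of tweets with a few lines of code. The sketch below is illustrative only: the function name and the record layout (`user` and `urls` keys) are hypothetical, and it assumes that shortened URLs have already been expanded so that the host name reflects the actual source.

```python
from collections import Counter
from urllib.parse import urlparse


def issue_indicators(tweets, sample_size):
    """Compute Table 1 style indicators for one issue.

    tweets      -- list of dicts with hypothetical keys "user" and "urls"
                   (a list of already expanded URLs, possibly empty)
    sample_size -- number of active users in the whole sample
    """
    users = {t["user"] for t in tweets}
    hosts = Counter(urlparse(u).netloc.lower() for t in tweets for u in t["urls"])
    total_links = sum(hosts.values())
    top_five = sum(n for _, n in hosts.most_common(5))
    return {
        "tweets": len(tweets),
        "users": len(users),
        "user_share": len(users) / sample_size,
        "tweets_per_user": len(tweets) / len(users) if users else 0.0,
        "url_share": sum(1 for t in tweets if t["urls"]) / len(tweets) if tweets else 0.0,
        "hosts": len(hosts),
        "top_five_share": top_five / total_links if total_links else 0.0,
    }
```

As a check against the published figures for the Japan case, 44,803 tweets from 6,657 users give roughly 6.7 tweets per user, and 6,657 of the 17,361 active accounts correspond to about 38.3 percent.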
During our research, we found that while quantitative indicators did indeed provide an interesting first impression, a closer examination of message content was necessary to further understand differences and commonalities between the cases.
When analyzing the most popular retweets, a good starting point for characterizing the understandings surrounding a topic, one cannot help but notice the differences in tone and in particular the varying presence of humor, irony, and sarcasm. It may not be surprising that 13 of the 20 most retweeted messages in the Galliano case are jokes or at least strongly ironic; the scandal does after all involve a video portraying the heavily intoxicated fashion designer declaring his love for Adolf Hitler. But even for the highly destructive tsunami, five out of 20 popular messages are humorous, such as the second most retweeted message:
@Nain_Portekoi: Le pape attristé par le tremblement de terre au #Japon...Depuis quand il est autorisé à critiquer le boulot de son boss?
[@Nain_Portekoi: The pope saddened by the earthquake in #Japan... Since when is he authorized to criticize his boss’s work?]
11 March 2011 — 22:34, https://twitter.com/Nain_Portekoi/status/46323031282421760
Surprisingly, the same analysis for the HADOPI subject yields only a single humorous message, a false takedown announcement for a popular blog on 1 April. The freedom of the Internet seems to be an issue that cannot be taken lightly, and this observation is further corroborated by the high number of URLs in the 20 most popular tweets (14), which is much lower for Japan (9) and Galliano (5). In the absence of scandal/catastrophe related excitement and the general gravity attributed to the subject, the HADOPI stream works like a highly attentive information network that is, on the one hand, strongly dominated by a small number of specialized sources (the top two sources account for 46.8 percent of all links sent, a pattern of concentration that we have not observed anywhere else) and very active contributors but, on the other hand, still receives attention from a relatively large number of users that write about or retweet it (1,548 users). When looking deeper into the contents of the stream, one finds that it provides meticulous information on the day–to–day developments of the subject matter. Users closely follow the subject on the levels of jurisdiction and lawmaking and the high percentage of URLs in tweets is a direct effect of the systematic referencing not only of news items and critical commentary (the author has not found a single positive appreciation of HADOPI) but also of legal materials and technical documents. New propositions or amendments proposed by MPs are duly reported and regularly retweeted.
Popular contents in the Japan earthquake stream are much more disparate: despite the fact that our sample is focused on French nationals living in France (and mostly Paris), an important type of message concerns the “coordination” functions often observed in the context of disasters (Bruns, 2011) and consists of important phone numbers, embassy contacts, calls for assistance (donations) and missing person inquiries, which mostly concern French expatriates or tourists, and their relatives in France, such as in this tweet, the third most retweeted:
@francediplo: Numéro du Centre de crise pour les familles ayant des proches au #Japon : 01 43 17 56 46 #séisme
[@francediplo: Number of the crisis center for families having close ones in #Japan: 01 43 17 56 46 #earthquake]
11 March 2011 — 16:42, https://twitter.com/francediplo/status/46234662711988224
A second type of message belongs to a “general information” category, which does however not consist so much of “general” news reporting — there is no lack of that on other channels after all — but rather of very specific developments and accounts (often linking to pictures or videos) as well as estimations of effect or “spectacular” facts, such as the displacement of the Earth’s axis due to the earthquake. A third category is made up by what we could call “repatriation” tweets, which comment on the event from an explicitly French perspective.
Here, we find the jokes and ironic remarks mentioned above, but also critiques of the Sarkozy government that use the catastrophe as a “hook”, France–related micro–scandals (e.g., the price Air France charges for flights out of Japan), comments on comments by French public figures and messages containing “what if this would happen in France?” speculations. This tweet by one of the most popular Twitter personalities in our sample is quite emblematic of the rather irreverent tone:
@Maitre_Eolas: Les japonais déclarent : “le séisme ok. Le tsunami passe encore. La fuite radioactive on assume. Mais là NON.” htt ...
[@Maitre_Eolas: The Japanese declare: “the earthquake OK. The tsunami, still manageable. We can deal with the radioactive leak. But this, NO.” htt ...]
17 March 2011 — 12:58, https://twitter.com/Maitre_Eolas/status/48352496166518785
The URL then links to an article about a possible visit of Nicolas Sarkozy to Japan. In general, it is difficult to overstate the place that critique and ridicule of the (now former) right–wing government in general and the President in particular occupy. We will have to come back to this phenomenon later in this paper.
In the Galliano case, there is, as noted before, a strong dominance of humor and irony (e.g., links to the obligatory subtitled versions of scenes from Oliver Hirschbiegel’s film “Der Untergang”, which documents Hitler’s last days) but these messages are often very politicized in that they connect Galliano to the government and/or to another highly debated case of “public racism”, the multi–stage scandal around the journalist Eric Zemmour. This tweet, sent from a fake account for Nicolas Sarkozy’s wife, Carla Bruni–Sarkozy, became the fifth most retweeted message and captures the atmosphere quite well:
@_Carla_Bruni: Bichon tu pourrais proposer à John Galliano un poste de conseiller politique à l’Elysée non ?
[@_Carla_Bruni: Darling you could propose a job to John Galliano as a political advisor at the president’s office no?]
2 March 2011 — 9:21, https://twitter.com/_Carla_Bruni/status/42862193405988864
Interestingly, the rather quick dismissal of Galliano by his employer, Dior, also became an occasion for what we propose to call “subject drift”, i.e., the connection of one subject to another one in order to make a particular statement. In our case, users started to ask why Galliano’s dismissal could go through so swiftly when most other cases of public racism went unpunished, again, most notably that of Eric Zemmour. The Galliano case became the exception that allowed these users to voice their outrage over what they perceive as the norm: little or no accountability for many public figures when it comes to racist statements. Connecting a smaller event to larger threads of political debate (Galliano => racism, every subject imaginable => Nicolas Sarkozy) is certainly the most common form of subject drift. Finally, there are also a number of purely informational messages linking to news accounts of the incident that become quite popular but these are in a clear minority compared to the comment/ridicule category just described.
This relative lack of purely factual news accounts in our subjects and the broad political and cultural consensus that characterizes popular contents was indeed a surprising revelation that led us to a quite different interpretation of the inner workings of our sample than initially anticipated.
From diffusion to refraction
In the case of this particular empirical research, what “the field did to the concept” (Le Marec, 2002) was not only a drift in methodology toward a more qualitative, content–focused approach, but also a reevaluation of what has become a dominant paradigm in Internet research, in particular in the context of computational methods and “big data”, namely the notion of “information diffusion”. While studying a system like Twitter with a user sample certainly has its drawbacks in terms of representativeness and completeness, it allows us to examine the platform not only through a thematic slice or quantitative abstractions, but also in a fashion that is more sensitive to subject relations, commonalities, and trajectories of stabilization. We will therefore first discuss the limits of approaches relying solely on the information diffusion paradigm and then propose an extension that, in our view, allows for an interpretation that goes further in making sense of our findings.
The spread of online platforms that are built on network architectures and that automatically produce analyzable data has been integral to the (re)emergence of “diffusionist” approaches to communicative phenomena. Classic models or theories, such as the “two–step flow of communication” (Katz and Lazarsfeld, 1955) or the work on the “diffusion of innovations” (Rogers, 1962), which conceive of social relations as a medium through which ideas can spread, have experienced a second spring; the former theory was even found to be valid for Twitter (Wu, et al., 2011). Even Tarde’s (2001) idiosyncratic work on imitation as a cultural conduit has found a new generation of readers and commentators. Next to smaller fields such as “memetics” (Blackmore, 1999), a “new science of networks” (Watts, 2005) has emerged as the dominant way to study and model distributed communication in online platforms quantitatively. In combination, these developments exert a certain gravitational pull on both the conceptual and methodological levels towards a specific understanding of communication as diffusion of information in a network.
While there are considerable differences, diffusionist approaches share a conception of communication that makes a strong separation between an infrastructure and the (informational) “units” that circulate in it. These units can take different forms, from contents to behavior to ideas and opinions, and so can the infrastructure: transport networks, social networks, communication networks, all can be studied using the same conceptual and methodological toolkit. The vocabulary of diffusionist thinking has been widely adopted and terms like “spreading”, “cascade”, “percolation”, “contagion”, and so forth, are now commonly encountered in Internet research. The question of power is most often theorized either as influence, which goes back to early communication studies (Katz and Lazarsfeld, 1955), or as access to resources (often meaning access to information), which is associated with social exchange theory (Blau, 1964; Burt, 1992). Both elements are closely related to the question of network topologies and “powerful” actors — hubs, gatekeepers, influencers — that are thought to be located at structurally significant positions in the network. “Structure” here means something very different from the term’s use in the context of structuralist thinking: while the latter conceives of structure as a set of rules and mechanisms that shape both meaning and practice before concrete actors even come into play, the diffusionist approach uses it to denote actor constellations and thereby as external to the actors themselves.
Such a conceptual outlook is highly compatible with computational and especially graph theoretical methods of analysis and papers on “information diffusion” abound with power law distributions of connectivity measures and network visualizations. These methods seem perfectly suited for platforms like Twitter that are equipped with technical features favoring diffusion (“retweeting”) but they also resonate well with a contemporary understanding of political journalism organized around scandals, scoops, and information leaks, where the gesture of “unveiling” formerly unknown elements is indeed an act of diffusion. Notions like “real–time” play a significant role in that context and diffusion speed generally receives more attention than long–term dynamics.
While diffusionist approaches are very well suited for describing “bursty” forms of communication in distributed settings, in particular when the production of information is spread out over a territory, e.g., in the context of popular protests, natural catastrophes, and so forth, they are less well equipped to account for more pervasive aspects of politics as long–term processes that are not limited to the question of what information is available, but rather organized around the production of shared understandings, values, and issue hierarchies. Because a network’s structure is the prime source of explanatory capacity, little attention is paid to things like content, interpretation, habitus, and other elements that are related to the question of meaning, such as ideology, cultural hegemony, normalization, trivialization, and so on. Diffusionist approaches are therefore vulnerable to certain elements of Gitlin’s (1978) classic critique of what he perceived as the “dominant paradigm” of the time in the communication field, namely Lazarsfeld’s functionalist and empiricist outlook.
While a more in–depth discussion about the strengths and weaknesses of diffusionist approaches would be particularly important at this point in the development of Internet research, it is beyond the scope of this paper. On a very naïve level, we simply have to observe that in the context of our empirical research “information spreading” in the sense of “sharing of previously unknown facts” is probably not the most common and certainly not the most significant practice of the Twitter users in our sample. We would therefore like to extend the notion of diffusion with that of “refraction” to be able to better take into account questions referring to meaning, rhetoric, and ideology.
In a somewhat different context from this research, Donna Haraway (1997) introduced the notion of “diffraction” as an “optical metaphor for the effort to make a difference in the world”, which “is about heterogeneous history, not about originals”. In a similar spirit, we would like to propose the metaphor of “refraction” as a way to further think about the space between identical reproduction and total heterogeneity. We do not use Haraway’s term because it suggests a level of heterogeneity and diversity that is simply not observable in our user sample; on the contrary, as we have started to see, political attitudes and moral coordinate systems are largely shared. Rather than the spreading out of waves in all directions that is suggested by diffraction, refraction refers to a singular change in direction for a wave passing through a surface, e.g., transferring from air into water. When looking into a pond, we can still see the fish but our perception of both their size and position is skewed.
Especially when looking at the most popular tweets in our case studies, we find that neutral “reporting”, in the form of merely relaying or linking factual accounts without commentary, is the exception rather than the norm. The most successful tweets are most often those that add a “twist” to the topic and “spin” it in a certain way, i.e., that “refract” it. Let us consider the following messages from the Galliano case, which were both in the top 10 of the most retweeted messages over the 11–day observation period:
@Le_Figaro: Alerte : Christian Dior suspend le couturier John Galliano de ses fonctions de directeur artistique http://tinyurl.com/6k ...
[@Le_Figaro: Alert: Christian Dior suspends the fashion designer John Galliano from his functions as artistic director http://tinyurl.com/6k ...]
25 February 2011 — 15:06, https://twitter.com/Le_Figaro/status/41136918691454976
@isaway: En fait #Galliano est complètement #hasbeen la mode est à haïr les musulmans enfin ! Pas les juifs !
[@isaway: Actually #Galliano is completely #hasbeen come on, it’s fashionable to hate the Muslims! Not the Jews!]
2 March 2011 — 19:33, https://twitter.com/isaway/status/43016135066652672
The first message was posted on 25 February, the day the news about the fashion designer’s insults “broke”, and represents the kind of factual reporting that fits well into the diffusionist paradigm. One can easily locate the starting point(s) of the “previously unknown” information and then study how users “become informed” and, by spreading the news, “inform” others. The second was written five days later, on 2 March, when the affair was in full swing. While we can certainly think about it in terms of diffusion as well (the message indeed spreads by being retweeted), the striking element is the subject drift to the question of Islam, which is as hotly debated in France as elsewhere. This is what we mean by refraction: the “issue” is commented upon, connected to a different issue, or a specific detail is underscored.
Figure 1: Co–word analysis, visualized with gephi, based on hashtags appearing at least 10 times in the “Galliano” dataset. Two hashtags are connected if they appear in the same tweet. Node size shows frequency, color (blue => yellow => red) shows betweenness centrality.
But this refraction is only possible in the length of a tweet because messages not only circulate in a network infrastructure, but also in a cultural sphere, a space of meaning that the tweet can mobilize to make its argument: if we follow Geertz (1973) in taking culture as “webs of significance”, we can see how even a very short message can convey complex meaning by drawing on a reservoir of shared ideas, debates, stereotypes, facts, trivia, and so on, which can often be evoked with a single word. The second message plays with these webs on the level of medium specific conventions (“#hasbeen”), on the level of understandings about the fashion world where trends play a significant role, and on the level of the larger debates about racism. The tweet brings all of these elements together, drawing on them for its message and tying them together in the process. Shared meaning is both the condition and the outcome of the work the user does here. As Figure 1 shows, the joining element need not be profound in any way: the thematic “connector” between the Galliano incident and Muammar Gaddafi (here in French spelling) is a shared preference in headwear.
The category of messages we called “repatriation” tweets fits well into this line of interpretation if we consider the term to mean an anchoring not necessarily only into a national context but, more generally, into a familiar set of meanings and values. It is somewhat banal to underscore that users comment on issues from the perspective of their own immediate concerns; interestingly though, these concerns seem to be largely shared, even beyond subject matters. As indicated above, the dominant frame (Lakoff, 2009) in our sample is organized around criticism of the right–wing government in power at the time of our data collection, and in particular the person of President Sarkozy. This critique is virtually omnipresent and it is in this sense that refraction can be understood not necessarily as a multiplication of perspectives and opinions, but as the interpretation of many different events in terms of a limited set of largely shared concerns and ideas. If we follow boyd, et al. (2010) in that “retweeting can be understood both as a form of information diffusion and as a means of participating in a diffuse conversation”, this conversation may be diffuse on the level of the dynamics of individual messages, but allows, at least in our sample, for the emergence of focus points (issues, norms, references) that give it structure in terms of meaning rather than topology.
As Marwick and boyd (2011) indicate, Twitter has become a strategic medium, in particular for certain professions, that is used to achieve “micro–celebrity”, and the immediate feedback users can get from their precarious “network audience” (Marwick and boyd, 2011) may lead to mainstreaming in terms of the diversity of opinions represented. When looking at our sample, one could also make the observation that it represents what Bourdieu (1996) called the “journalistic field”, the ensemble of media professionals that spend their time talking to and observing each other, progressively aligning their perspectives by imitating strategies and attitudes that “work” in terms of retweets, clicks, and other metrics. The “multi–faceted and fragmented news experience” Hermida (2010) speaks of may actually appear a lot less fragmented when we start looking at the commonalities that form behind the microbursts that appear on our screen.
From a methodological standpoint, the question remains whether the concept of refraction can only be illustrated on a microscopic scale, by reading individual tweets, or if a more macroscopic approach is feasible. The following section proposes using co–word analysis as a means for the latter.
If we take Twitter hashtags to be a good conduit to study message content in a more condensed form, the topical diversity in our sample is staggering at first glance: we identified 207,059 unique elements in a pool of 2,217,937 hashtags posted. As we have argued, this diversity does not exclude concentration, however. The top five terms make up 7.6 percent and the top fifty 21.9 percent of all hashtags posted over the two month period. Focusing on a one–week period (28 February 2011 — 6 March 2011), over which 40,687 unique hashtags were used, an analysis of the 1,000 most used hashtags — accounting for 59.9 percent of all occurrences — allows us to make further observations.
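The concentration figures above are shares of all hashtag occurrences covered by the k most frequent hashtags. A minimal sketch of that calculation follows, assuming that hashtags are compared case-insensitively (the paper does not state how case was handled). The function name `top_k_share` and the toy list are hypothetical and are not the study's data.

```python
from collections import Counter


def top_k_share(hashtags, k):
    """Fraction of all hashtag occurrences covered by the k most frequent tags."""
    # Assumption: "ff" and "FF" count as the same hashtag.
    counts = Counter(tag.lower() for tag in hashtags)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for _, n in counts.most_common(k)) / total


# Invented toy data: one entry per hashtag occurrence.
sample = ["ff", "ff", "ff", "FF", "galliano", "galliano",
          "libye", "tunisie", "egypt", "sarkozy"]

unique_tags = len({tag.lower() for tag in sample})  # number of distinct hashtags
share_top1 = top_k_share(sample, 1)                 # share held by the top hashtag
share_top2 = top_k_share(sample, 2)                 # share held by the top two
```

Applied to the full list of posted hashtags with k = 5 and k = 50, a function of this kind yields the type of percentages reported in the text.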
While we are hesitant to statistically quantify the phenomenon we have labeled as refraction, mostly because we consider the concept to be interpretative (Geertz, 1973) rather than formal, co–word analysis (Callon, et al., 1983) is uniquely suited to study complex textual material in a more rigorous manner. If we take hashtags to be equivalents of “macro–terms” that “crystallize and synthesize” discourse, the analysis of the relationships between these terms allows us to map attempts at drawing issues together.
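To make the construction concrete, the sketch below builds a small co–word network from tweet texts: each hashtag becomes a node, and two hashtags are linked whenever they appear in the same tweet, with the link weight counting how many tweets contain both. This is an illustration under stated assumptions rather than the pipeline used in the study: the regular expression, the lower-casing, the `min_count` threshold parameter, and the sample tweets are all hypothetical. Node frequency here counts the tweets that contain a hashtag, so a hashtag repeated within one tweet is counted once.

```python
import re
from collections import Counter
from itertools import combinations

# Assumption: a hashtag is "#" followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")


def extract_hashtags(text):
    """Return the set of distinct, lower-cased hashtags in one tweet."""
    return {tag.lower() for tag in HASHTAG_RE.findall(text)}


def coword_network(tweets, min_count=1):
    """Return (nodes, edges) for the hashtag co-occurrence network.

    nodes maps hashtag -> number of tweets containing it;
    edges maps a sorted hashtag pair -> number of tweets containing both.
    """
    freq = Counter()
    tags_per_tweet = []
    for text in tweets:
        tags = extract_hashtags(text)
        freq.update(tags)
        tags_per_tweet.append(tags)

    # Keep only hashtags that reach the frequency threshold.
    keep = {tag for tag, n in freq.items() if n >= min_count}

    edges = Counter()
    for tags in tags_per_tweet:
        for pair in combinations(sorted(tags & keep), 2):
            edges[pair] += 1

    nodes = {tag: freq[tag] for tag in keep}
    return nodes, edges


# Invented example tweets, not taken from the dataset.
tweets = [
    "#Galliano est #hasbeen",
    "#Galliano suspendu par #Dior",
    "#Dior et #galliano",
    "aucun hashtag ici",
]
nodes, edges = coword_network(tweets)
nodes_2, edges_2 = coword_network(tweets, min_count=2)
```

The resulting node and edge tables can be exported to a tool such as gephi, where node size would map to the frequency values and link width to the edge weights.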
Figure 2: Co–word analysis, visualized with gephi, based on the 961 of the top 1,000 hashtags that form a giant component. Node size shows frequency and link width expresses the frequency of two hashtags co–occurring in the same tweet. Colors are provided by gephi’s community detection algorithm. For a higher resolution image, see http://bit.ly/YHP1cW.
Of the 1,000 most frequent hashtags, 961 appear in a connected component when we create a network of hashtags that are connected by co–occurring in a tweet. This giant component forms a small world, with a diameter of 6 and an average path length of only 2.66. Hashtags are very well connected, due to a high average degree (number of connections) of 19.95. This basically means that hashtags have a tendency to co–occur with a large variety of others, even if we consider that degree values are distributed quite unevenly: some hashtags are simply much more connected than others. If we take a closer look at the visual representation of the network, its structural organization becomes clearer. There obviously are areas of thematic concentration: political debates on the right side, technology related topics on the left, and political upheavals in Africa and Asia at the top. These clusters are relatively well captured by the community detection algorithm provided by the gephi network visualization toolkit. If we see refraction merely as users making connections between different issues (and we would like the term to have a broader meaning that includes the other elements discussed above), the co–occurrence map shows two distinct levels: first, the high density in larger topic clusters indicates, unsurprisingly, that connections to “closer” issues are more frequent; this is particularly interesting in the upper center of the map, where events in Iran, Egypt, Libya, Tunisia, and Ivory Coast (“civ2010”) are frequently brought together. Second, even if one discounts hub nodes such as “ff” (“follow Friday”) that signal platform conventions rather than issues, the full network holds well together, which means that connections between the different topic clusters are far from uncommon. Geertz’s notion of culture as webs of significance is, in a sense, made visible here.
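The structural measures mentioned above (giant component, diameter, average path length, average degree) can be computed with breadth-first search on the unweighted co-occurrence graph. The following sketch assumes that paths are counted in hops and that link weights are ignored; whether this matches the exact conventions of the software used for the reported values is an assumption. All function names and the toy edge list are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Turn an iterable of (a, b) pairs into an undirected adjacency dict."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def bfs_distances(adj, source):
    """Hop distances from source to every node reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def giant_component(adj):
    """Return the node set of the largest connected component."""
    seen, best = set(), set()
    for node in adj:
        if node not in seen:
            component = set(bfs_distances(adj, node))
            seen |= component
            if len(component) > len(best):
                best = component
    return best


def component_stats(adj, component):
    """Diameter, average path length, and average degree of one component."""
    diameter, total, pairs = 0, 0, 0
    for node in component:
        for other, d in bfs_distances(adj, node).items():
            if other != node:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return {
        "nodes": len(component),
        "diameter": diameter,
        "avg_path_length": total / pairs if pairs else 0.0,
        "avg_degree": sum(len(adj[v]) for v in component) / len(component),
    }


# Invented toy network: a four-node chain plus a separate pair.
toy_edges = [("galliano", "dior"), ("galliano", "hasbeen"),
             ("dior", "mode"), ("libye", "tunisie")]
adj = build_adjacency(toy_edges)
giant = giant_component(adj)
stats = component_stats(adj, giant)
```

Hashtags that never co-occur with another hashtag do not appear in the edge list and are therefore absent from this graph, which is consistent with restricting the analysis to the connected component. Running one search per node is adequate for a network of about a thousand nodes.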
To conclude, we would like to underline the caveat Hermida (2010) adds to his account of the “fragmented news experience” provided by Twitter: “The value does not lie in each individual fragment of news and information, but rather in the mental portrait created by a number of messages over a period of time.” What emerges from a content oriented examination of an admittedly small number of news subjects as they were discussed by a set of 17K French Twitter users is a “mental portrait” that takes the form of a sphere at the same time as that of a network: beyond the leveling layer of infrastructure, there is a production of insides and outsides, of borders (linguistic, cultural, political, etc.) and shared spaces. While a diffusion–oriented approach indeed shows a chaotic staccato of messages, bursts of attention, and the classic power–law distributions when it comes to connectivity and retweet frequency, a content–oriented perspective paints a much less heterogeneous picture. While our selected subjects do not share a common scale or temporality, shared values and concerns clearly shine through the most popular contents and lead us to diagnose the kind of commonality the image of the sphere evokes. Co–word analysis is certainly a means to bridge the gulf between the two concepts and methodological approaches: by modeling the relationships between hashtags as a network, we can begin to map the webs of significance manifesting in media spaces like Twitter and make claims about content in the face of very large amounts of data.
Whether the absence of political polarization, albeit often observed on Twitter when focusing on the U.S. (Conover, et al., 2011), is an artifact of our sampling method or simply the reflection of the dominance of center–left positions in the media–savvy population active on Twitter in France cannot be fully decided — sampling on Twitter remains a deeply problematic exercise. However, a recent study (Harris Interactive, 2012) indicates a strong left leaning by French journalists active on Twitter and confirms our suspicion that the right is simply underrepresented in the media circles that dominate the platform. While we can, in line with An, et al. (2011), confirm the presence of a wide variety of sources, the refraction of these sources to a limited number of shared reference points suggests a lot less diversity on the level of opinions and values than initially anticipated — at least on the level of mainstreaming that we have focused on.
So why do we call Twitter a “refraction chamber” rather than simply follow Sunstein (2001) and speak of an “echo chamber”? Because we want to put the emphasis on something that the latter metaphor captures only imperfectly: instead of merely being exposed to like–mindedness, we consider that the users are the driving force behind the production of shared values and understandings. More than just following homophilic “urges” that result in biased source selection (i.e., who to follow), refraction suggests that commonality is the result of labor on different levels and a product rather than an effet pervers, an unintended consequence. This difference may appear insignificant but it opens the door for interpretations that go further into the direction of ideological analysis.
While this research needs to be extended on virtually all levels, we hope to have shown that methodological and theoretical issues are supremely important when it comes to studying a complex communication system such as Twitter. We would also hope that a more intensive debate on methodology and theory in Twitter research will take place in the future, a debate that centers on the question of how the impressive results from diffusionist approaches can be brought together with a perspective that goes further in accounting for matters of meaning and ideology.
About the author
Bernhard Rieder is an Assistant Professor at Amsterdam University’s Media Studies department and a researcher and developer at the Digital Methods Initiative.
E–mail: rieder [at] uva [dot] nl
The empirical research presented here has profited from major contributions by Nikos Smyrnaios and Raphaël Velt. The author would also like to thank the participants of the Digital Methods Initiative’s 2012 winter school for their valuable comments, in particular Anne Helmond, Catalina Iorga, Richard Rogers, and Michael Stevenson.
1. The initial sample was constructed by three researchers — all long–term Twitter users — through field research at the end of 2010 and aimed at constructing an anchor point for capturing users interested in political subjects. The first step consisted of attempting to compile an exhaustive list of accounts by French politicians from all major parties with the help of the platform’s search and navigation functions. This initial list of 380 users was complemented with 106 accounts maintained by activists, bloggers, and media professionals having achieved notoriety on political subjects on the platform. While we cannot guarantee representativeness — a near impossible task — particular attention was paid to include the full political spectrum in the initial set.
2. For an explanation of Twitter’s different APIs, see https://dev.twitter.com/docs/history-rest-search-api.
3. In contrast to the often–used search API, the REST API provides access on a per user basis and is subject to fewer limitations: the last 3,200 tweets per user can be collected. Due to rate limiting, only a limited number of users can be accessed per day. Using a rotating set of access tokens, we were able to access all user accounts roughly twice per day.
4. To provide some context, the three other main issues emerged around the events in Libya (140K messages over two months), Tunisia (40K messages over two months) and the cantonal elections in France (39K messages over two months).
5. For this analysis, all shortened URLs sent in tweets were translated to their long form.
6. Callon, et al., 1983, p. 199.
Jisun An, Meeyoung Cha, Krishna Gummadi, and Jon Crowcroft, 2011. “Media landscape in Twitter: A world of new conventions and political diversity,” Proceedings of the Fifth International Conference on Weblogs and Social Media, and at http://www.cl.cam.ac.uk/~jac22/out/twitter-diverse.pdf, accessed 2 November 2012.
AT Internet, 2011. “Sites médias: Une visite sur 6 issues de sites affluents vient de Facebook,” at http://www.atinternet.com/Ressources/Etudes/Enjeux-web-marketing/Reseaux-sociaux-Mars-2011/index-1-1-4-233.aspx, accessed 13 March 2012.
Susan Blackmore, 1999. The meme machine. Oxford: Oxford University Press.
Peter M. Blau, 1964. Exchange and power in social life. New York: Wiley.
Pierre Bourdieu, 1996. Sur la télévision: Suivi de L’emprise du journalisme. Paris: Liber éditions.
danah boyd, Scott Golder, and Gilad Lotan, 2010. “Tweet, tweet, retweet: Conversational aspects of retweeting on Twitter,” HICSS ’10: Proceedings of the 2010 43rd Hawaii International Conference on System Sciences, pp. 1–10.
Axel Bruns, 2011. “Towards distributed citizen participation: Lessons from WikiLeaks and the Queensland floods,” CeDEM11: Proceedings of the International Conference for E–Democracy and Open Government, pp. 35–52.
Ronald S. Burt, 1992. Structural holes: The social structure of competition. Cambridge, Mass.: Harvard University Press.
Michel Callon, Jean–Pierre Courtial, William A. Turner and Serge Bauin, 1983. “From translations to problematic networks: An introduction to co–word analysis,” Social Science Information, volume 22, number 2, pp. 191–235. http://dx.doi.org/10.1177/053901883022002003
Meeyoung Cha, Hamed Haddadi, Fabrício Benevenuto, and Krishna P. Gummadi, 2010. “Measuring user influence in Twitter: The million follower fallacy,” ICWSM ’10: Proceedings of the International AAAI Conference on Weblogs and Social Media.
comScore, 2011. “It’s a social world: Top 10 need–to–knows about social networking and where it’s headed” (21 December), at http://www.comscore.com/it_is_a_social_world, accessed 13 March 2012.
Michael D. Conover, Jacob Ratkiewicz, Matthew Francisco, Bruno Gonçalves, Alessandro Flammini, and Filippo Menczer, 2011. “Political polarization on Twitter,” ICWSM ’11: Proceedings of the Fifth International Conference on Weblogs and Social Media.
Clifford Geertz, 1973. The interpretation of cultures: Selected essays. New York: Basic Books.
Todd Gitlin, 1978. “Media sociology: The dominant paradigm,” Theory and Society, volume 6, number 2, pp. 205–253. http://dx.doi.org/10.1007/BF01681751
Donna J. Haraway, 1997. Modest_Witness@Second_Millennium.FemaleMan_Meets_OncoMouse. New York: Routledge.
Harris Interactive, 2012. “Les journalistes présents sur Twitter et la campagne présidentielle de 2012,” at http://www.harrisinteractive.fr/news/2012/CP_HIFR_Medias_14062012.pdf, accessed 22 August 2012.
Alfred Hermida, 2010. “Twittering the news: The emergence of ambient journalism,” Journalism Practice, volume 4, number 3, pp. 297–308. http://dx.doi.org/10.1080/17512781003640703
Journal du Net, 2010. “Twitter rassemblerait 225 000 utilisateurs en France” (12 November), at http://www.journaldunet.com/ebusiness/le-net/membres-twitter-en-france-1110.shtml, accessed 13 March 2012.
Elihu Katz and Paul Lazarsfeld, 1955. Personal influence: The part played by people in the flow of mass communications. Glencoe, Ill.: Free Press.
George Lakoff, 2009. The political mind: Why you can’t understand 21st–century politics with an 18th–century brain. New York: Viking.
Joëlle Le Marec, 2002. Ce que le “terrain” fait aux concepts: Vers une théorie des composites. Paris: Université Paris Diderot — Paris 7, at http://sciences-medias.ens-lyon.fr/scs/IMG/pdf/HDR_Le_Marec.pdf, accessed 13 March 2012.
Emmanuel Marty, Nikos Smyrnaios, and Franck Rebillard, 2011. “A multifaceted study of online news diversity: Issues and methods,” Proceedings of the ECREA Journalism Studies Section and 26th International Conference of Communication, pp. 228–242.
Alice E. Marwick and danah boyd, 2011. “I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience,” New Media & Society, volume 13, number 1, pp. 114–133. http://dx.doi.org/10.1177/1461444810365313
Evgeny Morozov, 2009. “How dictators watch us on the Web,” Prospect, number 165, at http://www.prospectmagazine.co.uk/magazine/how-dictators-watch-us-on-the-web/, accessed 13 March 2012.
Pear Analytics, 2009. “Twitter study — 2009,” at http://www.pearanalytics.com/blog/wp-content/uploads/2010/05/Twitter-Study-August-2009.pdf, accessed 13 March 2012.
Pew Research Center, 2011. “Navigating news online” (9 May), at http://www.journalism.org/analysis_report/navigating_news_online, accessed 13 March 2012.
Thomas Poell and Kaouthar Darmoni, 2012. “Twitter as a multilingual space: The articulation of the Tunisian revolution through #sidibouzid,” NECSUS_European Journal of Media Studies, volume 1, number 1, at http://www.necsus-ejms.org/twitter-as-a-multilingual-space-the-articulation-of-the-tunisian-revolution-through-sidibouzid-by-thomas-poell-and-kaouthar-darmoni/, accessed 22 August 2012.
Bernhard Rieder and Nikos Smyrnaios, 2012. “Pluralisme et infomédiation sociale de l’actualité: Le cas de Twitter,” Réseaux, forthcoming.
Everett M. Rogers, 1962. Diffusion of innovations. New York: Free Press of Glencoe.
Richard Rogers, 2009. The end of the virtual: Digital methods. Amsterdam: Amsterdam University Press.
Clay Shirky, 2009. “The net advantage,” Prospect, number 165, at http://www.prospectmagazine.co.uk/magazine/the-net-advantage/, accessed 13 March 2012.
Spintank, 2011. “Twitter en France, au–delà de l’écume” (3 January), at http://www.spintank.fr/twitter-en-france-etat-des-lieux-chiffres-2011/, accessed 13 March 2012.
Cass Sunstein, 2001. Republic.com. Princeton, N.J.: Princeton University Press.
Gabriel Tarde, 2001. Les lois de l’imitation. Paris: Seuil.
Duncan J. Watts, 2005. “The ‘new’ science of networks,” Annual Review of Sociology, volume 30, pp. 243–270. http://dx.doi.org/10.1146/annurev.soc.30.020404.104342
Hartmut Wessler, Bernhard Peters, and Michael Brüggemann, 2008. Transnationalization of public spheres. New York: Palgrave Macmillan.
Shaomei Wu, Winter A. Mason, Jake M. Hofman, and Duncan J. Watts, 2011. “Who says what to whom on Twitter,” WWW ’11: Proceedings of the 20th International Conference on World Wide Web, at http://www.www2011india.com/proceeding/proceedings/p705.pdf, accessed 2 November 2012.
Received 9 August 2012; accepted 15 October 2012.
This paper is licensed under a Creative Commons Attribution–NonCommercial–NoDerivs 3.0 Unported License.
The refraction chamber: Twitter as sphere and network
by Bernhard Rieder
First Monday, Volume 17, Number 11 - 5 November 2012