First Monday

Data not seen: The uses and shortcomings of social media metrics by Nancy K. Baym

This paper looks at the use of online data, especially social media metrics, to assess media audiences, with particular focus on musicians. It shows how audiences are defined by different sets of people, and grounds the use of social media to understand audiences in the history of mass media audience measurement. The second half of the paper focuses on visible social media metrics — likes, followers, and other such counts — outlining their appeal as measures and highlighting their fallibility and ambiguity. Throughout, the paper argues that different people construct information systems to collect, store, analyze, and interpret data, and that these are shaped by value systems. Metric and big data analysis generally serves economic values, while other approaches to data may be more appropriate for assessing social and personal values.


1. Desperately seeking the audience
2. Defining “audience”
3. Measuring audience size
4. Analyzing audience engagement
5. Social media metrics
6. Value systems
7. Conclusion



1. Desperately seeking the audience

It has been more than twenty years since Ien Ang’s (1991) insightful critique of television audience measurement, Desperately seeking the audience. Since then, audiences have appropriated the Internet to share, comment upon, create new materials with and influence media in ways that transform media industries. Correspondingly, industries have appropriated the digital traces audiences leave as they traverse media in order to better understand and manage audiences. This paper considers social media data in the context of audience measurement. Across Internet platforms, audience members instantaneously and continuously look at Web sites, click like and play buttons, follow, “friend,” tweet and more, generating data on massive scales in forms that can be counted and mined. The use of these data to understand audiences fits into a tradition of measurement nearly as old as mass media.

Different people in media industries have different goals when they conceptualize and make use of digital audience data. The inspiration to seek data, data collected, interpretations built, and choices made from data are framed by and constitute value systems. In the context of industries, the value of an audience is defined in terms of economic capital. In Web economies, economic capital is tied to social capital such as reputation and legitimacy (Arvidsson, 2011; Bermejo, 2007) in poorly understood ways.

Getting from digital traces of audience practices to analysis requires “information systems” (Napoli, 2011). These shape what data are sought, how data are collected and stored, and how they can be interpreted and applied. There are many commercial audience information systems and platforms available, from historical stalwart Nielsen to widely used free platforms like Google Analytics. Some of these information sources and systems are now available to creators as well as business people in media industries. We know nearly nothing about how people approach and build ad hoc information systems to understand their own audiences.

I focus on musicians and the music industry, drawing on interviews I conducted between 2010 and 2012 with 37 musicians and three managers. Most were independent and had audiences prior to the Web. They were from six countries and performed in approximately a dozen genres. Music offers an exemplary leading–edge case of how audiences have changed and how post–digital media economies are transforming in the wake of those changes. Music audiences engage in every form of audience practice, and have been the pioneers of file sharing, both authorized and unauthorized. The advent of the MP3 and ensuing protocols for online sharing radically altered music distribution, taking it out of the hands of producers and putting it in the hands of anyone willing and able to share files. In recent years, digital sales and authorized streaming sites have offered alternative (if controversial) new revenue streams. All of these practices, from sharing MP3s to purchasing a song online to writing fan fiction about a band, leave digital traces ready to become data.

I begin by showing how “audience” is defined by different sets of people. I then turn to two standard industry strategies for approaching data about audiences — measuring their size and assessing their engagement — and show the challenges of assessing either with online data. The second half of the paper focuses on visible social media metrics — likes, followers, and other such counts — outlining their appeal as measures and highlighting their fallibility and ambiguity. I close by returning to the question of values and argue that the metric approach to understanding audiences through online behavior is generally motivated by thinking of audiences as providers of economic capital, but that economic capital is embedded in a system that includes social, relational, and personal values. Throughout, I consider how different people construct information systems and the skills it takes to make those systems work.



2. Defining “audience”

If you are going to look for an audience, you have to define what it is you are seeking. One reason the search remains desperate is that, as Raymond Williams famously put it, there are no masses, only ways of seeing people as masses. “Audience” is a discursive construct, articulated in different ways to serve differing purposes. It is, as Ang [1] put it, “a fictional construct that will always refuse definitive representation.” The people taken to form audiences differ in important systematic ways (Goffman, 1989), they speak with multiple voices (Livingstone, 2005), and the act of labeling them “audience” divorces them from the lived social world of actual individuals (Ang, 1991).

Jensen and Rosengren (1990) identified five academic approaches to audiences. Social scientific approaches seek to identify media effects or understand individuals’ media uses and gratifications. Humanistic approaches include literary criticism, cultural studies, and reception analysis. Similarly, Anderson (1996) articulated a lengthy list of ways audiences have been constructed. Within academia, Anderson identified the “formal analytic audience” constructed by scholars’ analytic perspectives, “situated empirical audiences” of historic individuals performing media attendance as part of the social action of everyday life, and “strategic empirical audiences” made up of interpretive communities in which the collective is seen as the active agent. Since then, we have seen a shift toward studying audiences in terms of fandom (e.g., Baym, 2000; Jenkins, 1992; Jenkins, 2006), which might be considered a variant of the strategic empirical audience. Theoretical and methodological differences aside, these modes of constructing audience are built on shared academic values of basic knowledge and understanding.

In contrast, analyses of the “institutional audience” constructed by industries are done to render audiences more predictable and controllable and are based on economic values. Industry use of audience data is part of the post–industrial age “control revolution” (Beniger, 1986) in which information is harvested to better control others. “Ideally,” writes Napoli [2], “audience data can reduce uncertainty, facilitate more effective predicting of audience behavior and consequently enable more effective strategic decision–making.” Industrial audience analysis efforts began in the early twentieth century when Hollywood film studios began saving and sorting fan mail into demographic categories in order to inform decision–making about future productions (Napoli, 2011). In Anderson’s (1996) terms, industrial audiences include the “formal encoded audience” (those for whom the industries create cultural materials) and the “aggregate empirical audience” (market segments such as working women aged 18–49 that can be sold to advertisers). The audiences constructed by industry are quantifiable economic assets.

In short, there is a gulf between audiences as they are constructed in academia and in industry. Professional creators, around whom audiences actually gather, form a third category. They may seem to belong to industry, but they may frame their audiences not only as economic assets but also in more relational and ephemeral terms involving emotions and aesthetics. These constructions of audience cannot be assessed with the same economically motivated data or tools. Indeed, from the beginning of audience analytics, creative staff including copywriters and film–makers have argued that a reliance on audience metrics conflicted with their sense of their work as art (Napoli, 2011).



3. Measuring audience size

3.1. Size counts in media industries

The most central form of industrial analysis has been counting audience size, historically with production numbers, sales or exposure (Ang, 1991). Press and periodical runs were the first form of quantified audience measurement (Bermejo, 2007). Since then, the pressure to create better measurement instruments has continued (Ang, 1991), a phenomenon enhanced by the “fetish of immediate numerical gratification” [3] and the more recent promise of “big data.” Although firms such as Nielsen had been in place for some time, Napoli (2011) cites the 1970s as the era in which audience measurement began moving from intuitive approaches to more systematic analyses. Measurement has many appeals, all of which stem from the needs of commercialism, rather than scientific interests (Bermejo, 2007).

Measurement turns audiences “into suitable objects in and for industry practices” [4]. In media like television, magazines and newspapers, and now Web sites, measurement creates commodifiable products that can be sold to advertisers. In media like books, film, and music that are not ad dependent, measurement offers the possibility of managing demand and consumption to avoid waste and scarcity and to maximize profit (Bermejo, 2007). Given the importance of audience measurement to industry, audience research itself became a commodity, with Nielsen the dominant measurement provider (Buzzard, 2012).

With the advent of the World Wide Web in the mid–1990s, much analytic attention shifted to trying to measure online audiences (Bermejo, 2007). The “hit” became central to the early Web economy (Gerlitz and Helmond, 2013), although it was not clear how to price the value of a hit. Displaying the number of hits a site received “became a valuable self–advertising tool” in this “new environment in which brands, prestige, and quality stamps were still undefined” [5]. However, despite the traces left by visits to Web sites, those data are complex and hard to interpret (Bermejo, 2007; Buzzard, 2012). Site–oriented analytics offer the ability to look at page visits and unique users, but reveal little of audience demographics (Bermejo, 2007) except location. They also miss multiple visits by one person at different locations (Buzzard, 2012).
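The gap between page visits, unique users, and actual people can be illustrated with a minimal sketch. The log records and identifiers below are hypothetical and do not come from any real analytics product; the sketch assumes only that a site identifies visitors by something tied to a browser or device rather than to a person.

```python
from collections import Counter

# Hypothetical log records: (visitor_id, location, page).
# The visitor_id stands in for a browser- or device-level identifier.
visits = [
    ("cookie-a", "home", "/tour"),
    ("cookie-a", "home", "/music"),
    ("cookie-b", "office", "/tour"),  # assume: same person as cookie-a, different machine
    ("cookie-c", "cafe", "/tour"),
]


def summarize(records):
    """Return total page visits, apparent unique users, and visits per location."""
    page_visits = len(records)
    unique_users = len({visitor for visitor, _, _ in records})
    by_location = Counter(location for _, location, _ in records)
    return page_visits, unique_users, by_location


page_visits, unique_users, by_location = summarize(visits)
print(page_visits, unique_users, dict(by_location))
```

Under these assumptions the site would report three unique users although only two people visited, and it would know where the visits came from but nothing else about who made them.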

In music industries, audiences have traditionally been measured through sales and to some extent exposure. Until the 1990s, the industry standard was Billboard Magazine’s charts, based on a sampling of record store personnel rather than direct measures of sales. Chart position and exposure, mostly in the form of radio airplay, shaped the opportunities available to artists. Big data came to the industry in 1991, when Shalett and Fine developed Soundscan (now part of Nielsen) to measure music audiences by tracking over the counter sales through bar codes (Johnson, 2011). Much to the surprise of many in the music industry, Soundscan showed that genres thought to be marginal — specifically hip–hop and country — were in fact extremely popular. This, in turn, shaped corporate strategy in signings and promotion (Johnson, 2011), one example of how measurement shapes industry investment in cultural production (Napoli, 2011).

3.2. Creators’ perspectives on size counts

Creators are well aware that their numbers affect their opportunities. “At one point,” American singer–songwriter Erin McKeown told me, “Soundscan numbers were the most important numbers that someone in the music business looked at. If you were going to hire someone for a gig or if you were thinking about signing someone to a record label, you’d want to know what their Soundscans were.” Soundscan may not have been entirely trustworthy, but, as McKeown explained, there were some shared understandings of its limits and how to compensate for them:

Soundscan is a notoriously corrupt and unreliable system, but it was still used [...] you look at the numbers and you knew they were handicapped somehow but at least they’re numbers. Everyone kind of knew the way they were fucked up, so you could kind of paw your way through that and find some information about somebody. As an artist who has always sold more from the stage, as someone who has had some mainstream success but much under the radar success, I always felt like my Soundscan numbers were not fair [laughs] in terms of giving someone the real picture of the size of my fan base.

But even if they had viewed such counts as adequate benchmarks of their success, musicians have rarely had direct access to their own sales figures. Before social media provided explicit numbers visible on their sites, if they quantified audiences at all, they likely did it in terms of what venues they could fill, fan club memberships and the number of people who signed up for their (snail mail) mailing lists.

Mailing lists were the most ubiquitous early database that the musicians I spoke with kept. Most of those who kept early mailing lists relied on them not as measures of audience size, but as a means of communication. The British band Marillion, like most bands who had mailing lists, collected names and addresses at gigs. They later created a fan club with a paid subscription, giving them modest additional revenue and, more importantly, a way to reach their most ardent supporters (something they later mobilized to fan fund their recordings through pre–sales). “I had no idea how many people were opening it, reading it, anything,” said McKeown, “but [in] the early 2000s for me my e–mailing list was the most successful way that I could let people know what I was doing.” The size metrics these databases offered were by–products of another process — communication. The trails that became data were generated as “metacommunication” (see Jensen, this issue).

However, as Marillion’s keyboard player Mark Kelly told me, it is one thing to know the importance of collecting names and addresses and another to be good at data management. “Unfortunately, when people left we would just destroy their details, we never kept it all. So there’s probably thousands and thousands of people went through the fan club, we have no idea who they were because we weren’t so clued up about the sort of things that you should be doing.” Several of the musicians I spoke with had lost their audience data at least once, usually while changing from one data management system to another. This often happened with moving from one mailing list scheme to another, but could also happen when, for example, a musician left or deleted friends on MySpace. This demonstrates an important problem that persists and is amplified by the massive amounts of digital data now available — the people who create audiences through their work may be ill equipped to access, move, manage or interpret the data it takes to track and measure those audiences.

3.3. The Internet complicates counting even more

The Internet has seriously complicated the very basic problem of finding audiences to measure. Audiences have become more visible, but they have fragmented into more niches (Napoli, 2011). They have also become more autonomous, gaining new control over where, when, and how they consume media (Napoli, 2011). Rather than watching The Daily Show on television, for instance, people may encounter a clip from the show embedded in entirely different contexts such as a blog devoted to the slow food movement (Baym and Shah, 2011). Where, what and how do you count?

In music, fragmentation and autonomy can be seen in the proliferation of sites for purchasing (e.g., iTunes, Amazon) and for authorized and unauthorized listening, including streaming services (e.g., Spotify, Pandora, SoundCloud), video sites (e.g., YouTube), and torrent sites (e.g., The Pirate Bay). Data access and management are challenged by the range of sites through which people may engage media and whether those sites allow data to be exported and, if so, in what forms. McKeown reflected on fragmentation, autonomy, and their consequences:

You add in piracy, and Soundscan numbers become even less useful because there’s all these uncounted copies of your music. For me, my experience has been in the last five years I get stopped enough on public transportation somewhere or in an airport, disproportionate I think to the number of records I’ve quote unquote “sold.” Like there’s a bigger group of people who know about my music than are being reflected in those numbers.

Her complaint about piracy was not so much that it might be costing her sales, but that it was increasing the gap between her experience of her audience size and the official metrics that others used to assess her value and thus to invest in promoting and supporting her. In the mid–2000s, analytics companies such as Big Champagne began responding to this by including measures of unauthorized downloads as well as purchases in their music audience data. Authorized streaming sites provide additional data on individual listens and some demographic information about listeners but, as I will discuss in terms of fallibility, capture only a fraction of all listens and none of the purchases.



4. Analyzing audience engagement

4.1. Measuring effect

Audiences have always been more engaged than measures of size could capture, but on the Internet that engagement is visible, traceable, and takes new forms. Online, engaged fans connect across time and space, build and sustain group discussion, create repositories of information and derivative works, and build new distribution networks amongst themselves, all of which become visible not only to one another, but also to industry and creators (Baym, 1993; Baym, 2007). Industries can no longer get away with conceptualizing audiences as passive, and now measure engagement (Napoli, 2011) and look for particularly engaged, influential, or target–market audience members. The meaning of “engagement” is highly contested within industry. Napoli (2011) cites a 2006 white paper from the Advertising Research Foundation that listed 25 different definitions of the term in use. When it is so unclear what “engagement” means, efforts to measure it are inherently fraught.

Analysts may seek to identify influencers using strategies like network analysis. They may also use strategies like sentiment analysis to distill feelings in efforts to understand what audiences most value (Arvidsson, 2011). Mining chatter about television shows from “the second screen” of Twitter is ever–hotter. Through strategies like this, songs or artists low on traditional exposure metrics or television shows with few viewers may be revealed to rate high on appearance in online conversations and time spent with them (Napoli, 2011), a shift of focus to depth rather than breadth of engagement. Some audience measurement firms may measure emotional response through techniques like facial recognition, heart rate monitoring, skin conductance analysis, surveys, and brainwave measurement (Napoli, 2011).

These strategies can be seen as means of objectifying audience affect (Arvidsson, 2011). Rather than counting bodies, they count feelings so that industry professionals can invest in products that will generate more and the right kind of feelings. This objectification of affect makes it possible to assign exchange value in an economy where ephemera like branding and innovation are more important than productivity (Arvidsson, 2011).

4.2. Information systems

However magnificent it may seem to have so much data available and to be able to mobilize that material in different ways, the promises of big data are a mixture of real potential with uncritical faith in numbers and hype about what those numbers can explain (boyd and Crawford, 2012; Bruns, this issue). To even begin to make sense of the data, people need expertise and skill as well as software and human resources. Many in the media industries have turned to (often commercial) “information systems” (Napoli, 2011) designed to aggregate and make sense of multiple sources of data for them. Others create ad hoc personal information systems which they may or may not combine with commercial systems. Comparing the commercial and DIY systems reveals the kinds of skills that people must have to build and use information systems in ways that accomplish their goals.

4.2.1. Commercial information systems

There are several commercial analytic companies in music, many also offering direct–to–fan platform services. Music analytic companies such as Musicmetric use sophisticated computer science to aggregate, mine, analyze and package the vast amounts of discourse and activities about individual artists and bands available online. Musicmetric describes itself like this:

We provide insights and understanding into consumer behaviour online globally for the entertainment industry. We aggregate and analyse all music related information available on the Web. From Web sites mentioning an artist or release, the social networks frequented by music fans, peer to peer networks used to trade music and anywhere music fans leave a comment, we aim to be there and to be tracking the activity. (Musicmetric, 2013)

The Echo Nest’s focus is using machine learning to understand everything about music. But it also offers tools for “audience understanding” to help companies “drive ad and subscription revenue by capturing dynamic listening analytics from every music fan, and using that data to segment audiences, increase conversion rates, and improve ad targeting” (The Echo Nest, 2013a). Recently, The Echo Nest has begun experimenting with using musical taste to predict taste in movies and politics and ultimately to “better understand, predict, and target high–value listeners” (The Echo Nest, 2013b).

4.2.2. DIY information systems

The Echo Nest and other similar businesses do extraordinary things with data, in large part because they use teams of expert computer scientists to work their information systems. Musicians and their representatives may use services like these, but usually when they do, it is as part of cobbled–together ad hoc systems to understand audiences and tailor messages to specific audience segments. These actors placed in data–analysis roles are engaged in practices that parallel the quantified–self devotees Halavais discusses (this issue). DIY information systems are built to analyze data in order to accomplish goals based on values. They entail access to data, storage, databases and software, expertise in analytic methods, and interpretive skills that include knowing when and how to focus on or discount data.

The Puerto Rican music manager Ariel Rivas described his job as now including data management and mining “every day.” He used a wide range of tools (“everything we can”) to attend to audience data, including Tweetdeck (“like having three or four million reporters connected and sending data to you. It’s incredible”), Facebook demographics, Google Analytics, and Top Spin (a commercial platform for direct–to–fan engagement with analytic tools built in). As a manager, his purpose is to identify geographic regions where they should focus their publicity, to assess demand for concerts and price tickets, and to identify and build relationships with the most influential fans:

For example, one of the artists we represent, Rubén Blades, [...] we’re monitoring what happens on all the social networks and on Twitter, and we know that we have a huge amount of fans in Venezuela. We know that, because we have a lot of hits from Venezuela. Or we have a lot of fans in Colombia, and we can do more publicity or we can sell our shows at a higher price in that country [...] So, now we have a lot of tools to know what exactly is happening in each market. In the old days, they didn’t have that ability to know what happened in each market. You just had to call the radio stations.

Rivas also uses his observations of data flows around his artists to identify influential audience members and enlist them into larger promotional efforts. For instance, he saw that a Rubén Blades fan in Chile had created a Facebook fan page with 100,000 followers. He contacted the fan, thanked him for “the wonderful work” and asked him to “be part of us.” They sent him “albums, tickets, and flowers,” and kept him updated on news and events in hopes he would share that information with the fans he had organized. This is indicative of how affective value is often created by unpaid people (Arvidsson, 2011).

D.A. Wallach is half of the American band Chester French, the first band on Facebook (he and his bandmate were conveniently enrolled at Harvard when the site first launched there). He is extremely strategic in using data to understand and address their audiences. He monitors communication about the band using a variety of tools including Google Alerts, RSS feeds from Twitter, IceRocket (“kind of like a social search engine”), and links people send them in e–mail messages. “I try to cast my net as wide as possible.” They manage the band’s e–mail with “sales force CRM software” on which they built their own fan management platform, something no one else they knew of had done, and which they later helped others do. This platform allows them to generate reports on people and segment them how they want:

We can do it geographically. We can do it based on behavior. We can do it based on demographics, whatever information we have. We can sort by any data points that we have on them. So when I write those notes, for instance, it puts their name in it. So it’s “Hello, Nancy.”
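The kind of segmentation and personalization described here can be sketched in a few lines. This is an illustration only, not a description of the band's actual platform: the `Fan` record, its fields, and the `segment` and `greet` helpers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fan:
    # Hypothetical fields; a real fan database would hold whatever
    # information the band has managed to collect.
    name: str
    country: str
    opened_last_email: bool


fans = [
    Fan("Nancy", "US", True),
    Fan("Ana", "VE", False),
    Fan("Luis", "VE", True),
]


def segment(fans, **criteria):
    """Keep only the fans whose fields match every given criterion."""
    return [
        fan for fan in fans
        if all(getattr(fan, field) == value for field, value in criteria.items())
    ]


def greet(fan):
    """Open a note with the recipient's own name."""
    return f"Hello, {fan.name}."


# Segment geographically and by behavior, then personalize each note.
engaged_in_venezuela = segment(fans, country="VE", opened_last_email=True)
notes = [greet(fan) for fan in engaged_in_venezuela]
print(notes)
```

The sketch only works to the extent that the relevant data points have been gathered and kept in one place, which is precisely the skill and labor that many musicians lack.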

Returning to the earlier point that for musicians, metrics were by–products of communication, Wallach’s stance shows his orientation toward using data to design more appealing communication.

Zoë Keating, who makes ambient cello music that defies genre categorization, worked in San Francisco in the tech industry before becoming a full time musician. She too is motivated to analyze audience data in order to communicate with them more effectively. Coming out of the tech industry, she is considerably more at ease with computers and their interfaces than most and is easily able to mobilize skills including gathering her audience information from different platforms and capturing it in a form that allows her to sort and compare lists across those platforms. As she explained to me, she investigated her fragmented online audiences using “Google analytics and what have you.” She concluded that they’re all different:

The people who I’m connecting with on my Facebook fan page are different than the people on my Facebook profile page. I only share 500 people between those two groups. And then the people who are on MySpace are totally different. The people who are on my e–mail list are also for the most part different. I think they’re the people who aren’t necessarily on social media, some of them are, some of them aren’t. My mailing list is different. And then I also know that the people who are just listening to me on Twitter, they might be more casual or something and they’re not getting my mailing list message, for example. So they’re all different audiences and I have to talk to them differently.
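Comparing lists across platforms in this way amounts to computing overlaps between sets. The following is a minimal sketch with invented data; it assumes, optimistically, that each platform's audience can be exported and that the same person can be recognized across platforms by a shared identifier such as an e–mail address.

```python
from itertools import combinations

# Hypothetical exports: platform name -> set of identifiers.
audiences = {
    "facebook_page": {"ana@example.com", "bo@example.com", "cy@example.com"},
    "facebook_profile": {"bo@example.com", "dee@example.com"},
    "mailing_list": {"cy@example.com", "eli@example.com"},
}


def pairwise_overlap(audiences):
    """Map each pair of platform names to the number of shared identifiers."""
    return {
        (a, b): len(audiences[a] & audiences[b])
        for a, b in combinations(sorted(audiences), 2)
    }


overlap = pairwise_overlap(audiences)
# Distinct people reached across all platforms combined.
total_reach = len(set().union(*audiences.values()))

for pair, shared in overlap.items():
    print(pair, shared)
print("total reach:", total_reach)
```

In practice the matching assumption is the weak point: usernames differ from one platform to the next, and many platforms do not allow follower lists to be exported at all.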

An agency like Rivas’s or a band like Marillion may have staff to do that, and people like Keating or Wallach may have the skills. However, many musicians are liable to feel overwhelmed rather than empowered by the relentless flow of data. American jazz singer/guitarist Kate Schutt finds that the challenges of managing audience data are not worth the effort. At the time we spoke, she had two mailing lists that were not coordinated, “because I would have to do it manually and since January, I let my assistant go. So I’m not going to spend my time doing that.” Mike Timmins from Canadian band Cowboy Junkies expressed a similar sense of not being able to handle the data available to him:

I mean, all those metrics — there’s so many. And I’m sure there’s a way of figuring them out and looking at that stuff. But, God, who knows how to do that? And who has — and that’s more time. That’s more time involved. I think if you had a team of guys working on your Web site, which I’m sure some of the bigger bands do and their labels should but they probably don’t, yeah, all that stuff could be figured out. I mean, it’s always there. All those metrics and all those tools are there to do that stuff, but, God, I can’t even be... It’s hard enough to just do a blog, never mind do all that stuff.

Even people in industry positions that you would think require data analysis are not necessarily up to speed on even the simplest Web data skills. As Timmins snidely hints, labels “should but probably don’t” figure out metrics. In fact, when Cory Ondrejka (2009) left Second Life and became head of digital at EMI, he learned that six months before Katy Perry’s song “I Kissed A Girl” took off, hits to her Web site spiked. No one at EMI was tracking Web site hits. They didn’t notice. There is thus a significant distinction to be made between the availability of digital data and being willing or able to construct information systems to use the data.



5. Social media metrics

5.1. The appeals of social media metrics

5.1.1. Counting size and engagement

Social media platforms such as MySpace, Facebook and Twitter seem to offer a solution to some measurement problems. They provide metrics that require no expertise to find (I will return to how much expertise they require to interpret). Metrics are often made visible in the interfaces themselves, where they can serve as proxies for both audience size and engagement, as they stem from active audience choices to click, to follow, to like, to retweet, and so on. That social media platforms usually show visible metrics of a page or user’s popularity is no accident. It is part of the politics of these platforms (Gillespie, 2010) to set up counts in such a way that users become more engaged with the site while trying to increase their numbers. Banet–Weiser (2012), for instance, has described how young people are driven to receive “likes” on the content they create. All this commodification of affect through likes, follows and so on accrues to the platforms themselves, making platform designers powerful actors behind the kinds of data available online and the kinds of practices that motivate the creation of those data in the first place.

The behind–the–scenes work of platforms in shaping available data indicates the difficulties of using these numbers as measures of anything other than themselves. However, since these metrics are so visible, accessible, and seemingly such transparent markers of popularity and engagement, higher numbers are widely taken to imply more legitimacy, popularity, visibility, and influence (De Micheli and Stroppa, 2013) and thus more economic potential. All social media users can be ranked in this regard, not just media figures. Companies like Klout have commodified single numerical “influence” metrics of every publicly accessible social media account holder.

In music, promoters may use those numbers to decide whether to book an act, labels may use them to consider whom to sign, or television scouts may use them in deciding what music to use in soundtracks. McKeown explained how these new metrics supplanted or complemented Soundscan:

All of the sudden here’s this new number that can be used. So for a while it was MySpace views or number of friends on MySpace. And then it turned into Facebook fans and Twitter followers and I have heard in the music industry ‘this is someone good to tour with because they have X number of followers’ or ‘we’re interested in signing you because you’ve got X number of Facebook fans.’ And in some ways it’s replaced Soundscan.

Metrics may also be used to measure the impact of particular messages or who sends them. The manager Emily White compared responses to posts from the management team and from the musician (in this case the American singer–songwriter Brendan Benson):

We can see the impressions on the Facebook page, so when it’s Team BB posting tour dates, things like that, people are psyched and they know we’re available. But when it’s Brendan, the impressions just skyrocket and so do the comments and we just see the traffic really go through the roof when it’s directly from the artists. So that’s who people want to hear from and I think that’s a hard thing for some artists to wrap their head around.

Her final remark evokes a gap between industry and artist constructions of audiences as people to be addressed through social media or through art.

5.1.2. Building size and engagement

Higher numbers are taken to reflect bigger audiences and more engagement. They also stimulate more affective investment and engagement from others, what Gerlitz and Helmond (2013) call “the Like economy.” Focusing on Facebook, they argue that like buttons transform user affect into a metric that then intensifies that affect. “Within the Like economy,” they write, “data and numbers have performative and productive capacities” [6]. To mobilize the productive capacities of data, users need skills to understand how sites and their algorithms work well enough to manipulate them, even if this is not consciously articulated or acknowledged as a skill. Sindre Solem, who fronts two Norwegian thrash death metal bands (Obliteration and Nekromantheon), spoke of the importance of staying active on Facebook to build an audience:

Every time I post something, you can see that you're getting more likes every time, because it reaches out to more and more people, because people who didn’t like you before can see if their friends have pressed Like or commented on what you posted. And so it’s a slow but safe way of building your reputation and showing that you’re still alive. And perhaps it gets people to check out your music.

One musician, who asked to remain anonymous on this point, seeks out and appropriates audience affect to build buzz. “I also make it a point to follow anyone who is discussing us on Twitter,” he told me, “because not only I think does it make them feel special, but it generally will generate additional tweets from them, like ‘Oh my God, he followed me,’ which is great, because the more volume of discussion you have about you on there, it kind of engages that medium the best.”

5.2. The fallibility of social media metrics

Visible social media metrics have thus become critical emblems of status and generators of future status, and are used by industry and artist actors to guide investment and to build audience size and buzz. Yet, such metrics may not measure what they seem. In this section, I discuss social media metrics’ algorithmic and affective skew, their partiality, the deceptive practices that may generate them, and the inherent ambiguity that arises from decontextualizing a moment of clicking from a stream of activity and turning it into a stand–alone data point.

5.2.1. Skew

Social media metrics are skewed by algorithms that foreground some messages and users over others through recommendations or automated feed editing. Wang, et al. (2013), for instance, showed that Facebook posts that receive more views due to algorithmic ranking receive more comments. This pattern is exacerbated by the fact that the more a post or user receives comments, likes or follows, the more likely algorithms are to make them visible to other users. To some extent, counts reflect opaque algorithmic decision making as much as they reflect expressions of interest.

Another skew in social media counts is that they are designed to reflect only positive affect, another element of the politics of platforms. The Like economy is all about approval. Although message content can always be negative, and some sites like Reddit and YouTube do allow for downvotes, usually there are few infrastructural ways to display negative affect (Gerlitz and Helmond, 2013). This makes the sites more appealing to advertisers and brands (Gerlitz and Helmond, 2013) and perhaps artists, but it means that counts cannot be countered by unfollows, dislikes, and so on. The picture is either of approval or obscurity.

5.2.2. Non–representativeness

Social media metrics — and not just the counts, but also the messages mined for sentiment and reaction — are also partial, non–representative samples of audiences and their engagement (Napoli, 2011). Who does and does not contribute to social media conversations depends on critical factors such as wealth, location, age, and education. Even Facebook, the most widely used social media site in the world, omits the vast majority of the global population, including significant segments of Western populations who nonetheless discuss and listen to music, television, books, and otherwise engage in media audience practices. Twitter may boast more than a billion accounts but it is used by a small sliver of the population. Even engaged audiences who do use these platforms may not pay much attention to them, and when they do, they may choose not to comment, click, or follow. Although people often assume that what audiences do online converts to what audiences do off–line, this may not be true, and if it is, the conversion mechanisms and rates are unknown.

Solem, for instance, told me that many more people show up for their shows in Bergen or Trondheim than accept invitations for those events on Facebook. “There’s still people that are not on Facebook, or at least not using it that actively,” he noted, “I go to shows I’ve never RSVPed.” McKeown also noted the disconnect between social media audience activity and live audiences: “I know people who have really lively online fan bases, many Facebook responses, lots of Twitter followers, who draw the same amount of people that I draw in my rooms.” She has come to think of her social media and live performance careers as separate: “You kind of have your online career where it’s like ‘how do you communicate with those fans and what do you do for them and how do you cultivate that interaction?’ And then there’s also like ‘do you give a good live show and when are you coming to this city?’”

5.2.3. Deception

Social media metrics are also distorted by deceptive practices including bots and purchased engagement. De Micheli and Stroppa (2013) have revealed a lucrative market for selling Twitter followers and Facebook likes. More than two dozen services sell fake Twitter accounts. De Micheli and Stroppa (2013) estimate that four percent of Twitter’s user base is fake. Musicians are among those who seem to buy followers. For example, Diddy’s account gained nearly 200,000 followers in a single day, and shortly thereafter lost nearly 400,000 in one day (Perlroth, 2013). Another researcher working along similar lines argues that as many as 45.99 percent of some brands’ Twitter followers are bots, although this varies tremendously from brand to brand (Calzolari, 2012).

Wired writer Mat Honan describes his experiment in buying page views for a story:

Need to boost pageviews? You could rent a botnet for $2 an hour and point many thousands of visitors to your story. But counting pageviews is old–school — “engagement” is where it’s at today. So I went to Fiverr, a service that lets you pay people $5 for all sorts of tasks. First, I paid someone to get 6,000 people to spend at least 30 seconds viewing my story. To juice social media I paid $5 for 2,000 shares on Facebook. I also put down $5 for 500 people to tweet my story and another $5 for 500 retweets of my own tweet. Money can’t buy me love? Nonsense. But what really counts is influence. I need a hashtag with Klout. So I went to BuySellAds to pay someone influential to tweet it for me. I couldn’t afford the $4,600 Paris Hilton would charge, but I could swing $29 for Tiago Castro to tweet to his 100,000 followers. (Honan, 2013)

5.2.4. Assessing the credibility of social media metrics

Social media counts are important — important enough to buy — but only when they are credible. Assessing credibility requires continually updating expertise as people figure out new ways to game metric systems. I asked Roger O’Donnell, a solo artist who spent a decade playing keyboards for The Cure, whether he uses MySpace anymore. “I think MySpace is a bit of a wasteland now,” he replied, “it began to be irrelevant when you could fake your fan numbers.” He went on to say, “the good thing about Facebook artist pages is you can’t do it, and if you can, I don’t know about it.” In the case of Soundscan, as McKeown described it, everyone knew the numbers were wrong and everyone shared a sense of how they might be wrong. In contrast, there are no such shared understandings of the fallibility of social media metrics. De Micheli and Stroppa (2013) conclude that given how frequently social media metrics are gamed, “there is no way to effectively measure a company’s presence on social networks.”

5.3. Ambiguous meaning

Even if we set aside the possibility of social media metrics being skewed by algorithms that foreground some content over other content, incomplete emphasis on the positive, non–representative partiality, and deception, they are still problematic in that their meanings — seemingly so obvious — are inherently ambiguous. The anthropologists Goodwin and Duranti (1992) have written on the problem that once plucked from the flow of action in order to become data, events seem like self–contained units rather than embedded in local situated contexts of production (see also Markham and Boellstorff, this issue). As Ang [7] would argue, the gap between the “decontextualized, measurable” audience that social media metrics instantiate and actual people behaving in contexts that do not lend themselves to quantification may enable a sense of prediction and control, but does not foster understanding. Dean [8] argues that the logic of social media is to circulate messages in ways that serve capitalism before meaning. “Uncoupled from contexts of action and application,” she writes, “the message is simply part of a circulating data stream. Its particular content is irrelevant. Who sent it is irrelevant. Who receives it is irrelevant. That it needs be responded to is irrelevant. The only thing that is relevant is circulation, the addition to the pool.”

What then are some of the contexts in which these acts of sharing, clicking like, following, signing up for an e–mail list and so on might have been situated before being decontextualized and recontextualized as an addition to the pool? Likes signify a range of affect such as excitement, agreement, compassion, understanding, irony and parody (Gerlitz and Helmond, 2013). Likes can also be offered in exchange for content. Cohen (2013) reports on a survey by Syncapse which found that people ‘liked’ Facebook brand pages for many reasons, including liking the brands and wanting to support them, but also getting a discount, participating in contests, seeing that friends liked them, researching brands and services, and following a recommendation. These “fans” may not last: “they may start following a brand for a specific contest or sweepstakes, then drop off when the campaign ends” (Perlroth, 2013).

Zoë Keating’s Twitter following offers a perfect example of how erasing the context of following can lead to metrics that do not measure what they seem. Though her music is niche, she has more than a million followers. This is not because that many people listen to or would pay for her music. It is because for some time she was among the accounts Twitter recommended to new users. After signing up for an account, people could click once to follow all such recommendations, seeding their following list with pre–selected people based on criteria even those on the list, including Keating, did not know. “I know that all of those people are not really following me,” Keating said, demonstrating her expertise in when and how to discount data, “I’m certainly under no illusions that they are all my fans.”

It is common for musicians, management and labels to take advantage of this willingness to exchange likes or personal information like an e–mail address for content or participation. Indeed it is widely considered a best practice. White gave the example of remix contests for their artists on the Indaba platform. “The fans love it,” she said, and “it’s just a total blast for us, and it’s also data collection for us. We’re collecting e–mail addresses and people are opting in to be on the e–mail lists, so that is just a huge win for everybody involved.” Like many of these practices, exchanging access for personal information predates social media. Kelly described how Marillion increased the number of people in their database from 6,000 to 23,000 by incorporating a message into album artwork “which basically just said, ‘If you want a free bonus disc to go with this album with some extra tracks, just write to us and we’ll send it to you.’” Later, after their music became available online, they used a similar strategy:

We made the whole album available through the Internet for free prior to release using a company called Music Glue. And the reason we did that is because we knew it was going to be available anyway and we were trying to collect more data. And we managed to pick up another 8,000 e–mail addresses doing that.

Ultimately, it is not clear what visible social media metrics might mean or, more accurately, what range of meanings they collapse. Individuals, such as Keating, may have developed a sense of what data to attend to and what data to discount, but generally there are no shared sensibilities around how to interpret data. Schutt, for instance, said with some exasperation, “I don’t even know if I have an audience. [...] I can tell on Constant Contact and all that how many people open my e–mails, but what does that really mean, you know?” “The raw numbers of it are always, always moot,” the English solo bass player Steve Lawson told me:

I’ve had people with millions of followers on Twitter recommend me, and it result in 70 clicks through to my Web site. And I’ve had people with a couple of thousand followers recommend me on Twitter and it result in 300 clicks. [...] I could stand on a couple of buildings shouting at the entire town and if they ignore me it doesn’t matter, but I can, you know, sit on a park bench and talk to three people and if they’re listening then it can change things. Attention is the currency, not the number of people you’re throwing the information towards.



6. Value systems

To summarize, audience metrics are as important as ever in social media. They serve as proxies for size, engagement and affect, but can distort what they seem to measure. Constructions of audiences and approaches to data are guided by goals shaped by value systems. Industry actors, including musicians and those who represent them, are motivated to understand Internet audience data for economic reasons. They want and need to predict and create taste and demand so they have guideposts for allotting their limited resources. As markers of profitability have become less tied to production and more tied to reputation, it has also become important to measure audiences in order to assess social values such as legitimacy, credibility, likeability and other kinds of status that can presumably (if mysteriously) be converted into economic capital.

I have also suggested that for these musicians, measurement may come as a by–product of or an antecedent to efforts to communicate better with audiences. This speaks to the relational and personal values that influence creators’ information systems. In this final section, I turn to how meaningful moments from the communicative flow of data may be selected instead of or alongside quantified data. The values guiding these approaches to data are interpersonal connection and the affirmation and validation of knowing that your work affects others.

Music writing can be a solitary and lonesome endeavor far removed from potential audiences. Several musicians mentioned how encouraging and inspiring it was to receive likes when they were working on new music. It is not just that they received likes that mattered, but when they received them. The Norwegian rock star Sivert Høyem said:

I like knowing that there’s a lot of people out there who are interested and seeing what their reactions are whenever I’m posting information about a new gig or a new tour or new music. I really like that.

White told me of how a member of Urge Overkill, whom she represents, responded to such responses:

Eddie would say, “Yeah, you know, I’ll be at home writing songs and does anybody care.” And then he’s like, “But then I post on Twitter and Facebook and all these people respond immediately. And I’m like, ‘Wow, people really care.’” You know, so I think it can be really wonderful instant gratification especially for a songwriter who is at home. It’s great for us as business people and as marketers; it’s so cool to post something about the artist and literally see instantly people talking about it and spreading it, but I can only imagine what that feels like for the person who’s creating the art.

From this perspective, metrics shape people’s sense of self–worth and professional value (Arvidsson, 2011). Their meaning is not directed outwards to stimulate more engagement or to demonstrate that there are enough fans to merit a tour or a recording contract, but inwards to affirm a more social and personal worth. The meaning and value of these metrics are premised not just on quantity, but what triggers the responses and when. Five social media mentions after a sold out show may be disappointing, while five likes within five minutes of posting that you are working on a new song may be affirming.

The data that matter most for assessing social value may not be measurable at all. In talking with musicians, it is clear that the most significant assessments of their worth — even those that come through social media — will never lend themselves to counting. They are the stories that come in posts, e–mail messages, and private messages which, from a metadata perspective, look interchangeable with all the other messages in the pool. It is the communication that matters in these cases, not the metacommunication. Timmins explains that most musicians get into music “because as fans they’d been deeply touched by music in some way or another, and usually by a handful of bands or musicians, and they have their own stories as fans.” When someone tells you

their story and how your music and what you’ve written or sung or played has deeply affected — it’s often extremely private and personal sections of their lives. It’s really amazing. It does validate the whole thing for you. You know, you go through periods where you think “What the hell am I doing this for, and who’s listening?” and then you only need one or two of those, and you go “Okay, well, right there that makes it — that’s worth it right there.”

I asked if some stories were more affirming than others.

Yeah. I mean, you know, there are those who — “This song was our first song at our wedding,” which is very fantastic and beautiful, and then there are those who, you know, “This is the album I listened to with my sister at her death bed,” that sort of thing.

The American rock musician David Lowery of Camper Van Beethoven and Cracker, who tends to provoke political arguments with his Facebook fans, told me this story:

I just remember this one guy who used to always argue and then I just noticed he sends me a message directly and it’s about his mom is actually basically dying and her final request is this one Camper Van Beethoven song, Take Me Down to the Infirmary. [...] I was kind of stunned and flattered that somebody would — basically the song that she wanted to hear on her deathbed and it was just wow, I — it never really occurred to me that our music could penetrate that far into someone’s emotional life or something like that.

These outstanding moments have deep social value but are invisible from a big data or metric perspective. These musicians build information systems that take them into account by treating some kinds of data as more important than other kinds of data. A million followers may in some ways be less valuable than a single post. These forms of value are not accounted for in economically motivated data analyses, and efforts to encapsulate them within the language of economics miss much of what reaching an audience means to a creator. Lawson described an argument he had with a music industry pundit after he received a message from someone who told him his father had just died and all he could do was listen to Lawson’s music:

[He] was trying to tell me that that was economics and I went “no it’s not.” At that point you’re just trying to come up with economics is everything, at which point the term means nothing. When it comes down to a relationship between two people and facilitating that, you know that it’s bigger than that and your attempts to squeeze it into the premises of your book, you know, are deeply unhelpful. So we had a bit of a shouting match which was quite fun. But you know it was kind of beyond anything metric or measurable, it was just an experience, it was like, okay there’s a validation in what I do as pure art and the human experience of that art, and someone sits down to consider the tragedy, you know the happiness that’s life and the sadness, and says ‘okay, what is the soundtrack to this?’

In some ways it is trite to point out that metrics cannot capture the emotional value of art, or that the emotions art evokes are beyond commodification. Yet at the same time, it is art’s power to give voice to such affect that motivates creators to create and audiences to spend money on those creations. The economics of art cannot be understood without grappling with affect. Grappling with affect entails learning how to weigh some moments — moments that may not be visible from big data or metric perspectives — more heavily than the stream as a whole.



7. Conclusion

There are more traces of audience activity out there to be collected and interpreted than ever before, and whatever future business models may work in media industries are likely to incorporate these data in some ways. The information systems built to handle and make sense of these data will be guided by constructions of audiences that are based on value systems which vary across actors. Some will build information systems in order to predict, control and make money. Others will build them to understand whether and how they are moving others. Many will seek both, using different data strategies for each. Whatever the goals, making good use of these data entails skills at many levels, from constructing information systems in the first place to continuously updating the expertise required to know what data to count, discount, and how.

In a time when data appear to be so self–evident and big data seem to hold such promise of truth, it has never been more essential to remind ourselves what data are not seen, and what cannot be measured. Quantitative analyses of audience, including those based on sentiment mining (with all the ambiguities lost in those techniques), necessarily omit much of what provides real value to both audiences and artists. The challenge is not just academic. These quantitative data analyses affect what receives investment and publicity and what algorithms bring to social media users’ attention. “A key question,” Napoli [9] argues, “is whether the types of media products that succeed under the new success criteria being established in this post-exposure media environment will be the same or different from those that performed well under the old criteria.”

As metrics, especially visible metrics, rise as vectors for assessing worth, we need to remain keenly aware of the inherent multiplicity of meanings they collapse, the contexts in which they are embedded, and, perhaps most importantly, the depth of what they do not reveal. Claims based on analyses of social media data must be closely scrutinized with an eye toward what they omit, how they may be skewed, and how far they over–reach. These data can be extremely revealing when what they represent is fully understood and clever people with sophisticated information systems are in place to collect and make sense of them. The point is not to dismiss carefully done analyses of social media data, or as Ang [10] wrote, “to reject their value out of hand, but more positively to examine exactly what is known through such totalizing inquiries into the [...] audience, to query the discursive horizon they construct, as well as what vanishes beyond that horizon.” We are more than our Klout scores. Now, more than ever, we need qualitative sensibilities and methods to help us see what numbers cannot.


About the author

Nancy K. Baym is a Principal Researcher at Microsoft Research and a Visiting Professor in Comparative Media Studies/Writing at the Massachusetts Institute of Technology. Her recent publications include the books Internet inquiry: Conversations about method (co–edited with Annette Markham, 2010, Sage) and Personal connections in the digital age (2010, Polity).
E–mail: baym [at] microsoft [dot] com



Notes

1. Ang, 1991, p. 52.

2. Napoli, 2011, p. 51.

3. Gitlin, 1983, p. 53.

4. Ang, 1991, p. 86.

5. Bermejo, 2007, p. 97.

6. Gerlitz and Helmond, 2013, p. 13.

7. Ang, 1991, p. 165.

8. Dean, 2005, p. 58.

9. Napoli, 2011, p. 18.

10. Ang, 1991, p. 156.



References

J.A. Anderson, 1996. “The pragmatics of audience in research and theory.” In: James Hay, Lawrence Grossberg, and Ellen Wartella (editors). The audience and its landscape. Boulder, Colo.: Westview Press, pp. 75–96.

Ien Ang, 1991. Desperately seeking the audience. London: Routledge.

Adam Arvidsson, 2011. “General sentiment — How value and affect converge in the information economy,” SSRN (19 April), accessed 22 July 2013.

Sarah Banet–Weiser, 2012. Authentic™: Politics and ambivalence in a brand culture. New York: New York University Press.

Geoffrey Baym and Chirag Shah, 2011. “Circulating struggle: The on–line flow of environmental advocacy clips from The Daily Show and The Colbert Report,” Information, Communication & Society, volume 14, number 7, pp. 1,017–1,038, accessed 20 September 2013.

Nancy K. Baym, 2007. “The new shape of online community: The example of Swedish independent music fandom,” First Monday, volume 12, number 8, accessed 22 July 2013.

Nancy K. Baym, 2000. Tune in, log on: Soaps, fandom, and online community. Thousand Oaks, Calif.: Sage.

Nancy K. Baym, 1993. “Interpreting soap operas and creating community: Inside a computer–mediated fan culture,” Journal of Folklore Research, volume 30, numbers 2–3, pp. 143–176.

James R. Beniger, 1986. The control revolution: Technological and economic origins of the information society. Cambridge, Mass.: Harvard University Press.

Fernando Bermejo, 2007. The Internet audience: Constitution & measurement. New York: Peter Lang.

danah boyd and Kate Crawford, 2012. “Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon,” Information, Communication & Society, volume 15, number 5, pp. 662–679, accessed 18 September 2013.

Karen Buzzard, 2012. Tracking the audience: The ratings industry from analog to digital. New York: Routledge.

Marco Camisani Calzolari, 2012. “Analysis of Twitter followers of leading international companies” (8 June), accessed 22 July 2013.

David Cohen, 2013. “Study: Facebook users like brands’ pages because they actually like the brands,” AllFacebook (26 June), accessed 3 July 2013.

Carlo De Micheli and Andrea Stroppa, 2013. “Twitter and the underground market,” 11th Nexa Lunch Seminar (Turin, 22 May), accessed 22 July 2013.

Jodi Dean, 2005. “Communicative capitalism: Circulation and the foreclosure of politics,” Cultural Politics, volume 1, number 1, pp. 51–74, accessed 18 September 2013.

The Echo Nest, 2013a. “Power of our platform,” accessed 10 September 2013.

The Echo Nest, 2013b. “White paper: How music services can acquire, engage, and monetize high–value listeners” (12 August), accessed 10 September 2013.

Carolin Gerlitz and Anne Helmond, 2013. “The Like economy: Social buttons and the data-intensive Web,” New Media & Society (4 February), accessed 24 September 2013.

Tarleton Gillespie, 2010. “The politics of ‘platforms,’” New Media & Society, volume 12, number 3, pp. 347–364, accessed 24 September 2013.

Todd Gitlin, 1983. Inside prime time. New York: Pantheon.

Erving Goffman, 1981. Forms of talk. Philadelphia: University of Pennsylvania Press.

Charles Goodwin and Alessandro Duranti, 1992. “Rethinking context: An introduction,” In: Charles Goodwin and Alessandro Duranti (editors). Rethinking context: Language as an interactive phenomenon. Cambridge: Cambridge University Press, pp. 1–42.

Mat Honan, 2013. “How to use social media to juice your story’s popularity,” Wired (16 July), accessed 17 July 2013.

Henry Jenkins, 2006. Convergence culture: Where old and new media collide. New York: New York University Press.

Henry Jenkins, 1992. Textual poachers: Television fans & participatory cultures. London: Routledge.

Klaus Bruhn Jensen and Karl Erik Rosengren, 1990. “Five traditions in search of the audience,” European Journal of Communication, volume 5, number 2, pp. 207–238, accessed 24 September 2013.

Maurice Johnson, 2011. “A historical analysis: The evolution of commercial rap music,” unpublished master’s thesis, Florida State University, accessed 22 July 2013.

Sonia Livingstone, 2005. “On the relation between audiences and publics,” In: Sonia Livingstone (editor). Audiences and publics: When cultural engagement matters for the public sphere. Bristol, Eng.: Intellect Books, pp. 17–41.

Musicmetric, 2013. “Musicmetric — The world’s largest music trend data asset,” accessed 10 September 2013.

Philip M. Napoli, 2011. Audience evolution: New technologies and the transformation of media audiences. New York: Columbia University Press.

Cory Ondrejka, 2009. “Interview,” MIDEM, Cannes, France.

Nicole Perlroth, 2013. “Researchers call out Twitter celebrities with suspicious followings,” New York Times (25 April), accessed 22 July 2013.

Yi–Chia Wang, Moira Burke, and Robert E. Kraut, 2013. “Gender, topic, and audience response: An analysis of user–generated content on Facebook,” CHI ’13: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pp. 31–34.


Editorial history

Received 16 September 2013; accepted 17 September 2013.

Creative Commons License
“Data not seen: The uses and shortcomings of social media metrics” by Nancy K. Baym is licensed under a Creative Commons Attribution–NonCommercial–ShareAlike 3.0 Unported License.

Data not seen: The uses and shortcomings of social media metrics
by Nancy K. Baym.
First Monday, Volume 18, Number 10 - 7 October 2013