This paper examines what would remain on Facebook if news content were removed, as the company temporarily did in Australia in February 2021. Using a corpus of 3.3 million Facebook posts published in French in 2020 in four countries (Belgium, Canada, France and Switzerland), it compares media content to non-media content by submitting the text of the posts to three computational analyses: basic n-gram comparison, χ2 residuals and topic modeling. Two distinct spheres are defined within Facebook content: a “public interest” sphere, made up of media pages, and a “public’s interest” sphere, made up of non-media pages. Religious content and “inspirational” “feel good memes” were found to be most characteristic of a newsless Facebook.
Contents
Introduction: The Australian standoff
Methods: Using four French-language countries as an example
Findings
Discussion and conclusion
Introduction: The Australian standoff
It was probably one of the biggest gambits in Internet history. On 17 February 2021, Facebook removed all Australian news content from its platform. Facebook users all over the world could no longer access articles from Australian media organizations, nor could they share URLs from those publications on their personal profiles, or on groups or pages they participate in. Even The Conversation, an international Web site used by academics to publish research findings, was barred from the platform because it is headquartered in Melbourne. The ban also worked the other way around as Facebook’s Australian users could no longer see or share any news, wherever it originated from in the world.
Facebook was reacting to Canberra’s News Media and Digital Platforms Bargaining Code. The legislation forces Google and Facebook to negotiate deals with Australian media companies to compensate them for news content appearing in search results or users’ posts. While it aims to help finance Australian journalism, the Code had been denounced by both Web giants as “unworkable” [1].
In January 2021, Google warned that, “if the Code were to become law in its current form”, it would “have no real choice but to stop making Google Search available in Australia” (Silva, 2021). But it refrained from a total blackout after negotiating changes in the Code with Australian Treasurer Josh Frydenberg (Leaver, 2021).
Facebook, on the other hand, surprised everyone by nuking the news in Australia. The ban, denounced by Prime Minister Scott Morrison and others, was, however, short-lived. On 22 February, the Menlo Park company announced it too had convinced Australian lawmakers to amend their Code before enacting it. Basically, both platforms agreed to sign deals with Australian media companies before the Code became law.
In the end, everybody claimed to have won. Government and news businesses were happy that Google and Facebook diverted some of their revenue to support journalism. Both Web behemoths were happy to escape the more constraining aspects of the Code.
While the conflict had been resolved in Australia, other countries contemplating similar legislation wondered if they would face the same sort of retaliation from platforms. Facebook, in particular, has said it would not hesitate to remove news content in Canada, even if only as a last resort (Reynolds, 2021; Roy, 2021a).
Which leads one to wonder what Facebook would look like if news content were permanently removed. What would remain on Facebook users’... “News Feed” if it were devoid of news? This is the research question that this article tries to answer.
Methods: Using four French-language countries as an example
As a journalism professor working in Montreal, I found French-language Canada to be the most natural “terrain” for this research project. But to establish whether or not my findings are restricted to my home area, I needed to examine what a news ban would look like in more than one country. France, Belgium and Switzerland, nations with sizeable French-language mediascapes, were selected. In all four countries, between 38 and 55 percent of the population mentions social media as a source for news. Everywhere, Facebook is the most used social media platform [2].
Sampling
I used CrowdTangle to download a sample of Facebook posts from all four countries. CrowdTangle is a Facebook-owned tool to discover public content on social media platforms. It is made available to academics through a partnership between Facebook and Social Science One.
One of the ways to access CrowdTangle data is by using a Web-based dashboard. It enables country-specific [3] searches, but for content published on Facebook pages only. For any given search, CrowdTangle can return, as of early 2021, as many as 300,000 posts.
I thus used a CrowdTangle dashboard to download, for each of the four countries covered by this research and for each month of 2020, the 300,000 posts made on public pages that generated the most interactions. After removing duplicates, this gave me an initial sample of 13.4 million posts. I then filtered it three times (see Table 1).
First, I only kept posts in French using langId (Lui, 2016), langDetect (Danilak, 2020) and polyglot (Al-Rfou, 2016), three Python language-detection libraries. When two or three agreed on a given language, the post was classified in that language, otherwise it was classified as “unknown”. This reduced my sample to 5.3 million posts, which is not surprising given that French is not spoken by the majority in Canada or Switzerland, and given that even in France many pages publish content in English to appeal to an international audience.
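The voting rule described above can be sketched as follows. This is only an illustration: the function name and the ISO-style labels are assumptions, and the calls to the three detection libraries are deliberately left out, so the function simply receives one label per detector.

```python
from collections import Counter

def majority_language(labels, min_votes=2):
    """Return the language that at least `min_votes` detectors agree on.

    `labels` holds one language code per detector (hypothetical values
    such as "fr" or "en"). Without sufficient agreement, the post is
    classified as "unknown".
    """
    if not labels:
        return "unknown"
    language, votes = Counter(labels).most_common(1)[0]
    return language if votes >= min_votes else "unknown"

# Two of three detectors agree on French.
print(majority_language(["fr", "fr", "en"]))
# All three detectors disagree.
print(majority_language(["fr", "en", "de"]))
```

Requiring agreement between at least two detectors trades a few “unknown” posts for fewer misclassified ones, which matters for short or mixed-language posts.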
Second, because I wanted to analyse the text of those posts (see Findings), I only kept those which had 100 characters or more. This includes memes, as CrowdTangle was able to extract text from images using OCR. After running this filter, my sample was further decreased to 4.2 million posts.
Third, I wanted to reduce the number of pages in my sample because I had to manually classify media vs. non-media pages. This work was tedious, so I only kept pages which had 100 posts or more in my initial sample. This left me with a final sample of approximately 3.3 million posts.
Table 1: Sample filtering.

| Country | Posts in initial sample | French-language posts | Posts with 100 characters of text or more | Posts in pages with 100 posts or more (final sample) |
| --- | --- | --- | --- | --- |
| Belgium | 3,597,336 | 1,346,988 | 1,126,524 | 901,931 |
| Canada | 3,599,699 | 621,509 | 528,750 | 398,071 |
| France | 3,599,310 | 2,779,406 | 2,031,998 | 1,550,768 |
| Switzerland | 2,648,595 | 529,032 | 488,653 | 420,671 |
| All four | 13,444,940 | 5,276,935 | 4,175,925 | 3,271,441 |

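The three filters summarized in Table 1 could be expressed along these lines. It is a minimal sketch, not the code used for the study: the dictionary keys `page`, `lang` and `text` are hypothetical field names, and page sizes are counted on the unfiltered input, following the statement that the threshold applied to the initial sample.

```python
from collections import Counter

def filter_sample(posts, min_chars=100, min_page_posts=100):
    """Keep French posts that are long enough and come from active pages.

    `posts` is a list of dicts with hypothetical keys "page", "lang"
    and "text". Page activity is measured on the unfiltered input.
    """
    page_counts = Counter(post["page"] for post in posts)
    return [
        post
        for post in posts
        if post["lang"] == "fr"
        and len(post["text"]) >= min_chars
        and page_counts[post["page"]] >= min_page_posts
    ]

# Toy data with small thresholds, for illustration only.
toy_posts = (
    [{"page": "a", "lang": "fr", "text": "x" * 120} for _ in range(3)]
    + [{"page": "b", "lang": "fr", "text": "x" * 120}]
    + [{"page": "a", "lang": "en", "text": "y" * 120}]
    + [{"page": "a", "lang": "fr", "text": "court"}]
)
kept = filter_sample(toy_posts, min_chars=100, min_page_posts=3)
print(len(kept))
```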
Classification
The next step was to classify media and non-media pages. I certainly could not trust Facebook’s definition. “Consider [...] the absurdly wide scope of the ‘Media/News’ category on Facebook”, writes Columbia’s Emily Bell (2021). She quotes Gordon Crovitz, former Wall Street Journal publisher and cofounder of NewsGuard, a system that rates the credibility of news sources. According to him, platforms have demonstrated their “fundamental failure to understand the core concepts of journalism”. So I needed to craft my own criteria. I used the following. Media pages were:
- General news dailies and their regional or thematic pages (science, opinion, etc.), if any;
- Regional or local general weeklies;
- Newsmagazines;
- Talk radio programs, stations or networks, along with their regional or thematic pages, if any;
- Television news programs, stations or networks, along with their regional or thematic pages, if any;
- National, local or hyperlocal general news Web sites;
- Media or programs specializing in politics, business/economy, culture, science or sports.
I excluded, and thus classified in the non-media group, pages for:
- Individual journalists;
- Media or programs specializing in only one topic (soccer as opposed to all sports, literature as opposed to all culture or mining as opposed to all business);
- Media or programs specializing in celebrities, lifestyle, cooking or fashion;
- Viral content aggregators or pages;
- Corporate headquarters of media;
- Non-news television programs (sitcoms, series, comedy);
- Music radio programs, stations or networks;
- Pseudomedia used by corporations or special interests;
- News parody;
- Reinformation (sites repurposing authentic news with a specific political agenda, as defined by Blanc [2016]);
- Misinformation, disinformation or fake news;
- All other non-news pages.
Table 2: Media and non-media subcorpora, by country.

| Country | Non-media posts | Non-media pages | Media posts | Media posts (percentage of total posts) | Media pages | Media pages (percentage of total pages) |
| --- | --- | --- | --- | --- | --- | --- |
| Belgium | 609,869 | 1,654 | 292,062 | 32.4% | 107 | 6.1% |
| Canada | 274,195 | 705 | 123,876 | 31.1% | 108 | 13.3% |
| France | 1,178,885 | 3,089 | 371,883 | 24.0% | 318 | 9.5% |
| Switzerland | 270,411 | 598 | 150,260 | 35.7% | 56 | 8.6% |
| All four | 2,333,360 | 6,046 | 938,081 | 28.7% | 589 | 8.9% |

The result of this classification process is presented in Table 2. The percentage of media pages in Canada was greater perhaps because I was more familiar with the Canadian news ecosystem. In other countries, when I was unable to establish the news character of a page, I left it out, probably excluding smaller or local media. However, main news organizations were included. The full list of pages, both media and non-media, can be found in the Technical appendix.
The classification produced eight subcorpora: one subset of media posts and one subset of non-media posts for each country. Overall, posts from media pages represent 28.7 percent of the 3.3 million posts in my final sample.
To answer the research question at the heart of this work, I needed to compare, within each country, media and non-media corpora. To do so, I performed three different textual analyses.
Analysis 1: n-grams
Using spaCy (Honnibal and Montani, 2017, version 3.0.6), a natural language processing Python library, and its “fr_core_news_md” model for French, I first ran a preprocessing of the textual elements of the posts (stopword removal, lemmatization, lowercasing, etc.). It was followed by basic n-gram extractions for single lemmas, bigrams and trigrams to see which were most frequent in each subcorpus. The total number of lemmas in all preprocessed subcorpora was more than 106 million.
This first analysis was based on the “bag of words” approach which has been used for decades in text mining. It assumes that “words appear independently [in a document] and their order is immaterial” [4]. The weight of each word is its frequency, “which means terms that appear more frequently are more important and descriptive for the document” [5]. The same applies to bigrams and trigrams.
The approach has, however, been refined by different weighting techniques to achieve better performance, according to the task at hand [6]. A corpus made of Facebook posts gives us a novel way of weighting n-grams: the total number of interactions generated by the posts they appear in.
This method also better reflects how Facebook users reacted to a given piece of content. Measuring only frequencies could make certain terms misleadingly salient. For example, an election campaign team or a marketing firm could publish hundreds of posts on Facebook, making occurrences of a given politician or a given product much more common. In the context of Facebook, a “space that celebrates the primacy of the emotional, the impulsive, over argumentation” (Cepernich, 2016), a market of emotions where the engagement of users is what gives meaning to content (and value to Facebook), it appears methodologically justified to use interactions rather than simple frequencies.
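To make the interaction weighting concrete, here is a small sketch of how n-grams might be extracted from lemmatized posts and credited with the interactions of the posts they appear in. The function names and the toy data are hypothetical. Crediting an n-gram only once per post, even if it is repeated within that post, is an assumption made here, not a detail stated in the text.

```python
from collections import Counter

def ngrams(lemmas, n):
    """Return the n-grams of a list of lemmas as space-joined strings."""
    return [" ".join(lemmas[i:i + n]) for i in range(len(lemmas) - n + 1)]

def weighted_ngram_totals(posts, n):
    """Sum, for each n-gram, the interactions of the posts containing it.

    `posts` is an iterable of (lemmas, interactions) pairs. Each n-gram
    is credited once per post (assumption), hence the use of set().
    """
    totals = Counter()
    for lemmas, interactions in posts:
        for gram in set(ngrams(lemmas, n)):
            totals[gram] += interactions
    return totals

# Toy posts: (already preprocessed lemmas, number of interactions).
toy_posts = [
    (["bon", "journée", "ami"], 100),
    (["bon", "journée"], 50),
    (["pandémie", "covid"], 10),
]
bigram_totals = weighted_ngram_totals(toy_posts, 2)
print(bigram_totals.most_common(3))
```

With simple frequencies, “bon journée” would count twice; weighted by interactions, it scores 150, which reflects how users reacted rather than how often pages posted.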
Analysis 2: χ2 residuals
There are other ways of comparing two corpora that go beyond simply counting n-grams. For the last 20 years, computational linguists have been tackling the issue. In this quest, Adam Kilgarriff’s work stands out as he has considered many statistical methods. He concluded that “χ2 [chi-squared] is [...] a suitable measure for comparing corpora, and is shown to be the best measure of those tested” [7]. The χ2 test tells us whether the observed frequency of a given n-gram in one corpus is greater than its expected frequency.
Oakes and Farrow pushed χ2 further by calculating the standardized residual [8], using this formula:
$$R = \frac{O - E}{\sqrt{E}}$$
where O is the observed frequency of an n-gram in a corpus and E is its expected frequency. Dividing the difference between observed and expected frequencies by the square root of the expected frequency, this formula produces a positive or a negative value for each n-gram, by corpus. The greater the positive value, the more characteristic this n-gram is of that corpus. A large negative value means the opposite and a value close to zero means the word is equally present in both corpora.
Table 3 gives an example of how to calculate standardized residuals for three words taken from the media and non-media Facebook posts in Canada: travail (work), pandémie (pandemic) and recette (recipe).
Table 3: Standardized residuals calculations for three words from the Canadian subcorpus.

| Word | Observed: media | Observed: non-media | Observed: sum | Expected: media | Expected: non-media | Residual: media | Residual: non-media |
| --- | --- | --- | --- | --- | --- | --- | --- |
| travail | 4,414 | 12,037 | 16,451 | 4,473.40 | 11,977.60 | -0.89 | 0.54 |
| pandémie | 14,476 | 9,887 | 24,363 | 6,624.85 | 17,738.15 | 96.46 | -58.95 |
| recette | 1,373 | 20,995 | 22,368 | 6,082.36 | 16,285.64 | -60.38 | 36.90 |
| ... | ... | ... | ... | | | | |
| Sum of all words | 3,904,931 | 10,455,525 | 14,360,456 | | | | |

Table 3 illustrates how the expected frequency was computed: the word pandémie, for example, appears a total of 24,363 times across both subcorpora. Given that 27.2 percent of all words are in the media subcorpus (3.9 million/14.4 million), the expected frequency of pandémie in this subcorpus is 27.2 percent of 24,363, or 6,624.85.
Table 3 also shows that the standardized residuals indicated that the word pandémie was more distinctive of the media corpus, that the word recette rather defined the non-media corpus and that the word travail, with values very close to zero in both subcorpora, was characteristic of neither.
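The calculation in Table 3 can be reproduced with a few lines of code. The sketch below uses the counts reported in the table; the function and variable names are illustrative only.

```python
from math import sqrt

def standardized_residuals(observed, corpus_totals):
    """Compute (observed - expected) / sqrt(expected) for each corpus.

    `observed` maps each corpus to the count of one word in it;
    `corpus_totals` maps each corpus to its total number of words.
    """
    word_total = sum(observed.values())
    grand_total = sum(corpus_totals.values())
    residuals = {}
    for corpus, count in observed.items():
        # Expected count if the word were spread proportionally to corpus size.
        expected = word_total * corpus_totals[corpus] / grand_total
        residuals[corpus] = (count - expected) / sqrt(expected)
    return residuals

# Totals and counts taken from Table 3 (Canadian subcorpora).
totals = {"media": 3_904_931, "non_media": 10_455_525}
pandemie = standardized_residuals({"media": 14_476, "non_media": 9_887}, totals)
travail = standardized_residuals({"media": 4_414, "non_media": 12_037}, totals)
print(pandemie)
print(travail)
```

For pandémie, this yields roughly 96.46 for the media subcorpus and -58.95 for the non-media subcorpus, in line with Table 3, while both values for travail stay close to zero.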
Standardized residuals are a “statistical tool for exploratory research that allows for the identification of words that deserve deeper analysis, and not [...] an instrument of confirmatory analysis”, warned Bestgen (2014). Poudat and Landragin also pointed out that the χ2 test is based on the normal distribution. Insofar as most words are infrequent in most corpora, the χ2 test, and its residuals, are thus an “unreasonable” way to analyse text “unless you have an important volume of data” [9]. Given that a corpus containing more than one million words can be considered a “large corpus” [10], and that the smallest of my subcorpora contains 3.6 million terms (trigrams from Canadian media pages), I consider it reasonable to use χ2 residuals to compare media and non-media corpora as part of the second textual analysis in the Findings section, keeping in mind that they are only an “indicator of the potential interest of each of the numerous vocabulary differences between the corpora” (Bestgen, 2014).
I also chose to weight the χ2 residuals by the number of interactions. Instead of calculating them using the frequency of each term (lemma, bigram or trigram), I used the sum of the interactions of the posts in which they were found. As discussed earlier, this gives a measure that is better adapted to the corpus used (Facebook posts).
Analysis 3: Topic modeling
Finally, in order to further explore the differences between both subcorpora, topic modeling was the third textual analysis (Lin, 1995; Ramage, et al., 2009). This technique clusters words which often appear together in documents, therefore revealing that they are probably somewhat semantically related. The models used also score each word to weight a given topic relative to others. In a way, topic modeling lets the corpus speak for itself, even though the model doesn’t label topics. It is our job, as researchers, to name them.
I used the BERTopic algorithm (Grootendorst, 2021), based on the recently released Bidirectional Encoder Representations from Transformers (BERT) model (Devlin, et al., 2019). BERTopic provides two sets of embedding models: one for the English language and the other for multilingual documents. Neither is specifically adapted to French-language documents.
It was however possible to ask BERTopic to use other models for word embeddings. I used three: the aforementioned “fr_core_news_md” model by spaCy, which included a French regional newspaper (L’Est républicain) in its training corpus; the “flaubert_large_cased” model from the FlauBERT project (Le, et al., 2020), which includes some Belgian newspapers in its training corpus (Tiedemann, 2012); and the “camembert-base” model from the CamemBERT project (Martin, et al., 2020), trained on the French subcorpus of the OSCAR corpus, which did not seem to include any news content.
I ran BERTopic four times on each of the eight subcorpora with different parameters. The first run used spaCy’s model and asked BERTopic to provide 20 topics with 20 bigrams each. The remaining three runs (one with each model) asked BERTopic to provide 12 topics with eight single words or bigrams. The use of three different models (spaCy, FlauBERT and CamemBERT) led me to believe the resulting analysis would be all the more robust.
This analysis was performed on the “raw” text of Facebook posts without lemmatization or removal of stopwords. Also, I did not weight the terms used by the algorithm by interactions. However, instead of running the algorithm on single words, which is a limitation in most topic modeling approaches [11], I used it with bigrams on my first run with the spaCy model. In the following three runs, I let BERTopic use unigrams or bigrams. Joining names (such as “emmanuel macron”) or entities (like “états unis” or “real madrid”) often provided richer topics, facilitating their interpretation.
Topic modeling is extremely memory intensive for computers. With subcorpora as large as 37 million bigrams, in the case of non-media pages in France, I had to divide each by month, otherwise my code crashed. This strategy, though it meant each run took several hours, enriched the analysis, in the end, because it enabled me to explore one month at a time and tell the story of how topics in both media and non-media “spheres” within Facebook evolved over the year 2020.
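The month-by-month strategy might look like the following sketch. The batching helper relies only on the standard library; the `fit_month` function is a hedged illustration of one possible BERTopic call, with parameter values echoing the later runs described above. The input format (ISO date, text) and both function names are assumptions, and the embedding model is left to the caller. If none is passed, BERTopic falls back to its default, which is not a French-specific model.

```python
from collections import defaultdict

def split_by_month(posts):
    """Group post texts by month ("YYYY-MM").

    `posts` is an iterable of (iso_date, text) pairs, where iso_date is
    a string such as "2020-03-15" (assumed input format).
    """
    batches = defaultdict(list)
    for iso_date, text in posts:
        batches[iso_date[:7]].append(text)
    return dict(sorted(batches.items()))

def fit_month(texts, embedding_model=None, nr_topics=12, top_n_words=8):
    """Fit one topic model on a single month of posts (illustrative)."""
    # Imported lazily so the batching helper works without BERTopic installed.
    from bertopic import BERTopic

    model = BERTopic(
        embedding_model=embedding_model,
        nr_topics=nr_topics,
        top_n_words=top_n_words,
        n_gram_range=(1, 2),  # unigrams or bigrams
    )
    model.fit_transform(texts)
    return model

toy_posts = [
    ("2020-01-03", "Bonne année à tous"),
    ("2020-03-17", "Le confinement commence"),
    ("2020-03-20", "Restez chez vous"),
]
monthly = split_by_month(toy_posts)
print({month: len(texts) for month, texts in monthly.items()})
```

Fitting one model per month keeps each run within memory limits and, as noted above, makes it possible to follow topics over time.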
Findings
The media loves Facebook
The first surprise is the sheer amount of content Francophone news media published on Facebook. Table 4 shows the top 50 pages that published the most posts in 2020 in the four countries covered by this study; 42 of them were media pages!
Table 4: Top-50 French-language pages by number of posts in Belgium, Canada, France and Switzerland (2020).

| Page | Facebook ID | Country | Number of posts | Type |
| --- | --- | --- | --- | --- |
| RTL Info | RTLInfo | Belgium | 20,318 | media |
| Le Soir | lesoirbe | Belgium | 17,656 | media |
| BFMTV | BFMTV | France | 16,384 | media |
| Le Parisien | leparisien | France | 16,290 | media |
| DH.be | dhnet | Belgium | 15,049 | media |
| La Libre | lalibre.be | Belgium | 14,673 | media |
| Le Nouvelliste | nouvelliste | Switzerland | 14,466 | media |
| ArcInfo | arcinfofanpage | Switzerland | 13,554 | media |
| La Parole Vivante | laparolevivante | Canada | 13,512 | religion |
| RTBF Info | rtbfinfo | Belgium | 13,338 | media |
| Le Vif | Leviflexpress | Belgium | 12,834 | media |
| lavenir.net | lavenir.net | Belgium | 12,292 | media |
| Sudinfo.be | sudpresse | Belgium | 11,718 | media |
| Le Journal de Montréal | jdemontreal | Canada | 11,660 | media |
| LN24 | LN24LesNews24 | Belgium | 11,163 | media |
| TVA Nouvelles | TVAnouvelles | Canada | 11,041 | media |
| 7sur7.be | 7sur7.be | Belgium | 10,083 | media |
| La Côte | LaCoteJournal | Switzerland | 9,998 | media |
| La Provence | laprovence | France | 9,726 | media |
| 24heures | 24heures.ch | Switzerland | 9,615 | media |
| Le Figaro | lefigaro | France | 9,582 | media |
| Foot Mercato | footmercato | France | 9,464 | football |
| 20 Minutes | 20minutes | France | 9,160 | media |
| Franceinfo | franceinfo | France | 8,762 | media |
| Radio-Canada Information | radiocanada.info | Canada | 8,740 | media |
| Epoch Times Paris | EpochTimesParis | France | 8,537 | misinformation |
| LeMatin.ch | lematin.ch | Switzerland | 8,277 | media |
| Metro Belgique | metrobelgique | Belgium | 8,256 | media |
| La Presse | LaPresseFB | Canada | 8,252 | media |
| RTBF Sport | RTBFSport | Belgium | 8,167 | media |
| Tribune de Genève | tdg.ch | Switzerland | 8,145 | media |
| Ouest France | ouestfrance | France | 8,046 | media |
| Le Monde | lemonde.fr | France | 8,011 | media |
| Heidi.news | Heidi.news | Switzerland | 7,916 | media |
| beIN SPORTS France | beINSPORTSFrance | France | 7,789 | media |
| CNEWS | CNEWSofficiel | France | 7,788 | media |
| FRANCE 24 | FRANCE24 | France | 7,689 | media |
| Walfoot | Walfoot | Belgium | 7,583 | football |
| L’EQUIPE | lequipe.fr | France | 7,453 | media |
| RT France | RTFrance | France | 7,417 | misinformation |
| Le Journal de Québec | JdeQuebec | Canada | 7,338 | media |
| L’Express | LExpress | France | 7,245 | media |
| Nordpresse | nordpressed | Belgium | 6,690 | parody |
| RTSinfo | RTSinfo | Switzerland | 6,590 | media |
| TF1 Le JT | TF1leJT | France | 6,345 | media |
| RFI | RFI | France | 6,293 | media |
| France Bleu | reseau.francebleu | France | 6,139 | media |
| Lions de l’Atlas | lionsdelatlas.net | Belgium | 6,099 | football |
| Horoscope du jour | horoscopedujour.ch | Switzerland | 5,803 | astrology |
| 20 minutes online | 20minutesonline | Switzerland | 5,792 | media |

This might be explained by the fact media organizations have complete teams dedicated to repurposing content for Facebook and other platforms (Bell, et al., 2017). But so do record companies, viral content producers, government departments and other organizations keen on maintaining a regular presence on Facebook. It seems fair to say that media are the single most active type of organization in the French-speaking areas of Facebook.
Nechushtai [12] has argued journalism organizations have been “infrastructurally captured” by Facebook. Not only do they depend on the Menlo Park company to reach their readers, but they are driven to adapt “news production more broadly [...], to comply with the logics, norms, or business strategies of external platforms” [13]. In the United States, more recent studies report that media organizations have been distancing themselves from digital platforms. Tech giants’ increasing power, the greater scrutiny of their actions and the growing calls for regulation have translated into caution, even distrust, towards platforms. “Publishers are attempting to regain control over the future of their business” [14]. This movement was, however, not at all apparent in my subcorpora of French-language media posts. The capture of Francophone journalism by Facebook still seems strong and, dare I say, borders on rapture.
First glance at top non-media pages
Let’s take a look at what types of pages would remain if media pages were gone. For each country, I sorted non-media pages by number of posts, by number of interactions and by average interactions per post. All twelve tables would be too long to reproduce here (Tables 5 to 8, with the top-20 pages by average interactions, by country, are presented below). But what stands out from this exercise is that comedy and humor pages, artists and fan pages (for musicians most of the time) and what I’d call “feel good meme pages” are the most common.
That is not surprising, given that this content is the bread and butter of viral pages. The fact they appeared on top of the tables, sorted by interactions, was expected. More thorough textual analyses are needed to dig deeper into these corpora. Before doing that, though, I want to discuss three elements worthy of notice in top non-media pages.
Table 5: Top-20 non-media French-language pages in Belgium (with more than one post per day), by average interactions per post.

| Page | Facebook ID | Number of posts | Interactions | Interactions/post | Type |
| --- | --- | --- | --- | --- | --- |
| Merveille du monde | merveilledumondejaninevero | 1,565 | 21,344,119 | 13,638.4 | Feel good memes |
| Le Grand Cactus — RTBF | LeGrandCactus | 371 | 4,232,858 | 11,409.3 | TV show |
| The Voice Belgique — RTBF | thevoicebelgique | 370 | 4,179,007 | 11,294.6 | TV show |
| Brigade des nurses | lesinfirmieres | 579 | 3,630,483 | 6,270.3 | Comedy |
| Jérôme de Warzée | jeromedewarzee | 406 | 2,481,480 | 6,112.0 | Artist |
| Guillaume Corpard | GuillaumeCorpard.officiel | 514 | 2,556,520 | 4,973.8 | Artist |
| Gregory Lemarchal | gregorylemarchal112 | 519 | 2,461,591 | 4,742.9 | Fan page |
| Pairi Daiza | JardinDesMondes | 1,032 | 4,637,333 | 4,493.5 | Animals |
| Tarmac | TarmacRTBF | 836 | 3,624,764 | 4,335.8 | TV show |
| Bisoutendresse | bisoutendresse | 2,045 | 8,666,761 | 4,238.0 | Feel good memes |
| Météo-Mons | Meteo.mons | 720 | 2,939,261 | 4,082.3 | Weather |
| Samuel Movie | ActeurSamuelMovie | 1,128 | 4,527,459 | 4,013.7 | Artist |
| Pape François | PapeFrancoisVatican | 711 | 2,808,529 | 3,950.1 | Religion |
| David Antoine | DavidAntoine | 906 | 3,249,951 | 3,587.1 | Artist |
| Woman and material & Aimee Virgile Makougoum | Artisteaucameroun | 475 | 1,356,705 | 2,856.2 | Artist |
| Permavenir | Permavenir | 683 | 1,655,625 | 2,424.0 | NGO |
| PTB | ptbbelgique | 904 | 2,138,859 | 2,366.0 | Political party |
| Fuck Love | IHateYouAndOnlyU | 391 | 887,657 | 2,270.2 | Comedy |
| Djanii Alfa | Djanii.X | 447 | 963,807 | 2,156.2 | Artist |
| Cristiano Ronaldo — France | Ronaldo7France | 987 | 1,942,312 | 1,967.9 | Fan page |

Table 6: Top-20 non-media French-language pages in Canada (with more than one post per day), by average interactions per post.

| Page | Facebook ID | Number of posts | Interactions | Interactions/post | Type |
| --- | --- | --- | --- | --- | --- |
| Justin Trudeau | JustinPJTrudeau | 480 | 6,973,554 | 14,528.2 | Political figure |
| François Legault | FrancoisLegaultPremierMinistre | 647 | 5,366,866 | 8,295.0 | Political figure |
| African Heroes | africanheroesmagazine | 1,171 | 7,013,702 | 5,989.5 | Special interest |
| Indochine officiel | Indochineofficiel | 402 | 1,597,326 | 3,973.4 | Artist |
| FFL — Fédération Française de la Lose | FFLose | 916 | 3,083,841 | 3,366.6 | Comedy |
| Éric Duhaime | eduhaime | 385 | 1,263,369 | 3,281.5 | Political figure |
| La parfaite maman cinglante | laparfaitemamancinglante | 1,153 | 3,604,682 | 3,126.4 | Special interest |
| La solution est en vous | lasolutionestenvous | 970 | 2,935,008 | 3,025.8 | Feel good memes |
| Ministère Paul Mukendi | ministerepaulmukendi | 418 | 1,223,633 | 2,927.4 | Religion |
| L’Anti-Média | LAntiMedia | 1,017 | 2,639,922 | 2,595.8 | Misinformation |
| Occupation Double | occupationdouble | 621 | 1,088,219 | 1,752.4 | TV show |
| Gabriel Nadeau-Dubois | GNadeauDubois | 421 | 720,253 | 1,710.8 | Political figure |
| Queen Fumi | queenfumiofficiel | 419 | 712,755 | 1,701.1 | Artist |
| EMCI TV | emcitv | 1,316 | 2,207,163 | 1,677.2 | Religion |
| Québec Niaiseries | quebecniaiseries1 | 598 | 951,948 | 1,591.9 | Comedy |
| La Parole Vivante | laparolevivante | 13,512 | 21,379,356 | 1,582.2 | Religion |
| Ricardo Cuisine | ricardocuisine | 494 | 752,572 | 1,523.4 | Food |
| LES ETHNIES DE LA COTE D’IVOIRE ET D’AFRIQUE | ethniesdeCI | 1,101 | 1,643,412 | 1,492.7 | Special interest |
| District 31 | ICIDistrict31 | 868 | 1,252,896 | 1,443.4 | TV show |
| Le Revoir | JournalLeRevoir | 900 | 1,298,617 | 1,442.9 | Parody |

Table 7: Top-20 non-media French-language pages in France (with more than one post per day), by average interactions per post.

| Page | Facebook ID | Number of posts | Interactions | Interactions/post | Type |
| --- | --- | --- | --- | --- | --- |
| Le Meilleur du Football | LemeilleurduFootball92 | 3,691 | 146,312,371 | 39,640.3 | Football |
| Mathieu Rivrin • Photographe de Bretagne | Mathieu.Rivrin.photographies | 368 | 7,120,390 | 19,348.9 | Artist |
| Les marseillais W9 | LesMarseillaisW9 | 368 | 6,434,885 | 17,486.1 | TV show |
| Nostalgies 60’–70’–80’ | Nostalgies607080 | 692 | 12,095,023 | 17,478.4 | Nostalgia |
| 30 Millions d’Amis (Officiel) | 30millionsdamis | 663 | 10,400,103 | 15,686.4 | Animals |
| Marine Le Pen | MarineLePen | 604 | 8,944,603 | 14,808.9 | Political figure |
| Demotivateur | demotivateur | 5,711 | 80,736,401 | 14,137.0 | Comedy |
| Né pour brûler la gomme | Nepourbrulerlagomme | 516 | 7,230,470 | 14,012.5 | Special interest |
| Imineo | imineoTV | 505 | 6,939,097 | 13,740.8 | Special interest |
| One Voice | onevoiceanimal | 799 | 10,309,383 | 12,902.9 | Animals |
| Madame Connasse | Madameconnasse0987654321 | 2,879 | 36,341,190 | 12,622.9 | Comedy |
| La vraie démocratie | lavraiedemocratie | 1,071 | 13,300,634 | 12,418.9 | Reinformation |
| Corsica ile Magique | CorsicaileMagique | 669 | 8,296,473 | 12,401.3 | Tourism |
| L214 Ethique et Animaux | l214.animaux | 383 | 4,663,064 | 12,175.1 | Animals |
| Génération 80’s | mageneration80 | 842 | 10,143,087 | 12,046.4 | Nostalgia |
| M6 | M6 | 870 | 10,106,366 | 11,616.5 | TV channel |
| Nostalgie Du Football | NostalgieDuFootball | 524 | 6,074,626 | 11,592.8 | Football |
| Le Pépère | lepepere | 752 | 8,554,827 | 11,376.1 | Comedy |
| JEAN-MARIE BIGARD PAGE OFFICIELLE | levraijeanmariebigard | 1,197 | 13,259,818 | 11,077.5 | Artist |
| Ina.fr | Ina.fr | 1,408 | 15,450,584 | 10,973.4 | Institution |

Table 8: Top-20 non-media French-language pages in Switzerland (with more than one post per day), by average interactions per post.

| Page | Facebook ID | Number of posts | Interactions | Interactions/post | Type |
| --- | --- | --- | --- | --- | --- |
| Les temps sont durs pour les rêveurs | bureaudesrevesbrises | 410 | 2,682,193 | 6,541.9 | Comedy |
| Trust My Science | TrustMyScience | 2,579 | 5,641,255 | 2,187.4 | Special interest |
| La Bible | Bibles | 2,459 | 2,995,389 | 1,218.1 | Religion |
| Vanessa_beauti | vanessabeauti | 440 | 526,606 | 1,196.8 | Business |
| Darius Rochebin | dariusrochebin | 371 | 320,604 | 864.2 | Journalist |
| Forum Économique Mondial | weffrancais | 2,643 | 1,171,794 | 443.4 | International organization |
| Anthony Desonpere | ADeSonPere | 560 | 228,334 | 407.7 | Political figure |
| Hôpitaux Universitaires de Genève | hopitaux.universitaires.geneve | 456 | 174,366 | 382.4 | Institution |
| Te rappelles-tu | rappelletoi | 5,194 | 1,934,293 | 372.4 | Nostalgia |
| Les souvenirs de Dakar | lessouvenirsdusenegal | 520 | 189,883 | 365.2 | Special interest |
| Ambassade de Suisse en France | AmbassadeSuisseParis | 408 | 147,891 | 362.5 | Institution |
| Ville de Neuchâtel | neuchatelville | 724 | 259,582 | 358.5 | Institution |
| FCBarcelone French | FCBarceloneFre | 611 | 216,280 | 354.0 | Football |
| Ville de Genève — Officiel | villegeneve.ch | 397 | 122,243 | 307.9 | Institution |
| Animal mon Egal | animal.mon.egal | 947 | 260,594 | 275.2 | Animals |
| Objectif Fitness | ObjectifFitnessOfficiel | 473 | 124,453 | 263.1 | Feel good memes |
| Armée suisse | armee.ch.fr | 576 | 138,559 | 240.6 | Institution |
| Servette FC | servettefootballclub | 973 | 185,946 | 191.1 | Football |
| 228 & AfrikaExclu | 228exclu | 1,243 | 214,458 | 172.5 | Special interest |
| loisirs.ch | www.loisirs.ch | 1,174 | 180,057 | 153.4 | Tourism |

First, few misinformation or “reinformation” pages were found. The only exceptions were “Epoch Times Paris”, “RT France” and “La vraie démocratie” (in France), as well as “L’Anti-Média” (in Québec). Between August and November 2020, Facebook said it had removed “about 3,200 Pages [...] for violating our policy against militarized social movements [and] about 3,000 Pages [...] for violating our policy against QAnon” (Facebook, 2021). It said it had also removed pages for otherwise “coordinated inauthentic behavior”. Many such pages had been publishing content in French in the four countries covered by this study (Belga, 2020; Facebook, 2020a; Gleicher and Agranovitch, 2020; Loiseau, 2020).
Second, apart from clickbait and artists/celebrities pages, religious content appeared to be what generated the most engagement among Francophone Facebook users. For example, “La Parole Vivante”, publishing Biblical quotes and memes, was the page generating the most interactions in French Canada, media or non-media categories alike. Throughout 2020, Facebook users shared, reacted to or commented on its content more than 21 million times, or two interactions every three seconds. A page on Pope Francis, created in Belgium, attracted almost 4,000 interactions per post through the year. The fact that Pastor Paul Mukendi had been convicted of sexual assault in 2019, sentenced to eight years behind bars in 2020 and was awaiting a second trial, still for sexual assault (Bergeron, 2020), did not deter his followers, as his page was the ninth most engaging non-media page in French Canada. We’ll see later that religious content is not only common in, but characteristic of, the remainder of the non-media corpora.
Third, political figures are largely absent from the top of the non-media rankings. The main exception is Canada, where the official pages of Justin Trudeau and François Legault, respectively Prime Minister of Canada and Premier of Québec, occupied the first two spots in terms of interactions per post. This might be because both politicians held almost daily, publicly broadcast press conferences during the COVID-19 pandemic, something heads of state or government in the other three countries have seldom done.
No news is feel good news on Facebook
After extracting n-grams for all eight subcorpora, I sorted them by the total number of interactions generated by the posts that they were found in. I kept only the top-1,000 in each and then put them all back together in three lists, each made of 8,000 lemmas, bigrams or trigrams which have been associated with content garnering the most attention in French-language Facebook pages in 2020.
What is most relevant, here, are not the lists themselves. They tell us, for example, that “cristiano ronaldo” is, after “timeline photo” and “photo from”, the top bigram in the French non-media corpus, appearing in 7,370 posts having elicited close to 54 million interactions in 2020.
More interesting was to inquire what n-grams present in non-media were absent from media. Bear in mind that we had limited ourselves to the top-1,000, so “absent” only meant absent from that particular list, not from Facebook. But here is precisely why this exercise is relevant. We had kept only the most significant expressions according to Facebook’s own logic, where shares, likes and comments are used to elicit users’ emotions and keep them coming back. We can thus say that we were beginning to draw light on what would be characteristic of a newsless Facebook. Tables 9 and 10 provide examples.
Table 9: Top five lemmas from non-media corpora, absent from media corpora, and present in all four countries.
Lemma      Posts     Interactions
dieu       75,193    122,997,101
aime       57,190    111,981,655
heureux    35,653    61,630,964
bonheur    31,164    60,600,496
3          36,531    59,923,266
Table 10: Top five bigrams from the non-media corpora, absent from the media corpora, and present in all four countries.
Bigram            Posts     Interactions
bon journée       10,133    22,199,695
bel journée       8,518     20,319,814
bon dimanche      4,753     9,531,666
chaîne youtube    5,492     9,504,084
jésus christ      8,279     9,493,523
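The filter behind Tables 9 and 10 amounts to a set operation. The sketch below is a hedged illustration: the dictionary structure and the function name `characteristic_of_non_media` are hypothetical, and it assumes that “absent from media” means absent from every country’s media top list, which is one possible reading of the procedure.

```python
def characteristic_of_non_media(non_media_top, media_top):
    """Return n-grams in every country's non-media list and in no media list.

    Both arguments map a country name to a set of n-grams (a hypothetical
    structure standing in for the top-1,000 lists).
    """
    # Present in the non-media top list of all countries.
    in_all_non_media = set.intersection(*non_media_top.values())
    # Present in the media top list of at least one country (assumption).
    in_any_media = set().union(*media_top.values())
    return in_all_non_media - in_any_media


# Toy lists, invented for illustration only.
non_media_top = {
    "Belgium": {"dieu", "photo", "bonheur"},
    "Canada": {"dieu", "photo", "recette"},
}
media_top = {
    "Belgium": {"photo", "coronavirus"},
    "Canada": {"covid-19"},
}
newsless_terms = characteristic_of_non_media(non_media_top, media_top)
```

In this toy case only “dieu” survives: “photo” is shared by both non-media lists but also appears in a media list, while “bonheur” and “recette” are each missing from one country.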
In the case of trigrams, the top of the list was occupied by “bon week end” (have a good weekend). This, and the expressions “bon journée” and “bel journée” (have a nice day, with “bonne” and “belle” lemmatized), “bon dimanche” (wishing someone a nice day, but on a Sunday), “aime” (love, in the first or third person singular), “heureux” (happy), “bonheur” (happiness) and “3” (what remained of the heart emoticon “<3” once the less-than character was stripped in the preprocessing phase), reflect how much Facebook was built on feel good memes of people wishing their loved ones a good day with teddy bears, sparkling hearts and rainbows on pink backgrounds. Figure 1 gives a typical example.
Figure 1: Feel good meme post by the “Une pomme par jour” (An apple a day) page posted on 8 December 2020. The message reads, “Life is short. Spend it with those who make you happy. Have a nice day.”
Facebook calls these “inspirational posts”. In the Discussion, we’ll see why the Palo Alto company has publicly favored this type of post over news content since 2018.
Two other keywords that stood out in the top non-news content were “dieu” (god) and “jésus christ”. The prevalence of religious content was one of the main surprises of this study, given the secular tradition of France, which also permeates French-language societies in Canada, Belgium and, to a lesser extent, Switzerland (Willaime, 2017). We will discuss it further below, as the other textual analyses performed on our corpora will continue to reveal the contents of a newsless Francophone Facebook.
God is everywhere
Twelve computations of weighted χ2 residuals were executed, one for each of the n-gram types extracted (lemmas, bigrams and trigrams) in every country. The top-50 results for each were then joined in two tables, one for those most characteristic of media pages, the other for those most characteristic of non-media pages. If a result appeared in two countries or more, it was included in Figures 2 and 3.
Figure 2: Most characteristic terms in posts from French-language media pages appearing in two or more of the four countries (Belgium, Canada, France and Switzerland), by sum of χ2 residual scores weighted by interactions (2020). The number to the right of each bar is the number of countries in which the term was found.
Figure 3: Most characteristic terms in posts from French-language non-media pages appearing in two or more of the four countries (Belgium, Canada, France and Switzerland), by sum of χ2 residual scores weighted by interactions (2020). The number to the right of each bar is the number of countries in which the term was found.
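For readers unfamiliar with χ2 residuals, the sketch below shows one common form, the Pearson residual (O − E) / √E, computed on a table of terms by page type, followed by the cross-country aggregation used for the figures. It is a minimal illustration under assumptions: the weighting scheme (here, each term’s count is simply replaced by the interactions of the posts containing it), the function names and the toy figures are hypothetical and may differ from the computation actually performed.

```python
import math


def pearson_residuals(media_weights, non_media_weights):
    """Pearson residuals (O - E) / sqrt(E) for a terms x {media, non-media} table.

    Each argument maps a term to a non-negative weight, assumed here to be
    the interactions of the posts containing that term.
    Returns {term: (media_residual, non_media_residual)}.
    """
    media_total = sum(media_weights.values())
    non_media_total = sum(non_media_weights.values())
    if media_total == 0 or non_media_total == 0:
        raise ValueError("both columns need a positive total")
    grand_total = media_total + non_media_total
    residuals = {}
    for term in set(media_weights) | set(non_media_weights):
        observed_media = media_weights.get(term, 0)
        observed_non_media = non_media_weights.get(term, 0)
        row_total = observed_media + observed_non_media
        if row_total == 0:
            continue  # a term with no weight carries no information
        # Expected values under independence of term and page type.
        expected_media = row_total * media_total / grand_total
        expected_non_media = row_total * non_media_total / grand_total
        residuals[term] = (
            (observed_media - expected_media) / math.sqrt(expected_media),
            (observed_non_media - expected_non_media) / math.sqrt(expected_non_media),
        )
    return residuals


def terms_in_several_countries(top_by_country, min_countries=2):
    """Sum the scores of terms found in at least `min_countries` top lists."""
    summed, seen = {}, {}
    for scores in top_by_country.values():
        for term, score in scores.items():
            summed[term] = summed.get(term, 0.0) + score
            seen[term] = seen.get(term, 0) + 1
    return {t: (summed[t], seen[t]) for t in summed if seen[t] >= min_countries}


# Toy figures, invented for illustration only.
residuals = pearson_residuals(
    {"coronavirus": 90, "dieu": 10},
    {"coronavirus": 10, "dieu": 90},
)
shared = terms_in_several_countries(
    {"France": {"dieu": 5.0, "temps": 9.0}, "Belgium": {"dieu": 4.0}}
)
```

A positive residual in a column means the term is over-represented there relative to what independence would predict. In the toy data, “coronavirus” scores positively for media pages and “dieu” for non-media pages, and only “dieu” is kept by the two-country rule.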
First, it might look surprising that the events generating the most attention in French-language media, apart from the COVID-19 pandemic, occurred in the United States. Terms related to the George Floyd murder, and ensuing demonstrations, and to the presidential election were among the most characteristic of media pages in the Francophone areas of Facebook. But that is predictable, given that “global news flows are dominated by Anglo-Saxon media” [15]. The dominance of news from the United States had also been observed in a content analysis of Instagram posts by Francophone news organizations from 2011 to 2020 (Roy, 2021b).
Unsurprisingly, “coronavirus” and “covid-19” were by far the top two terms most characteristic of media pages. This was in line with the formidable amount of coverage devoted to the pandemic in 2020 by journalists worldwide. “Coronavirus” was the number one lemma characteristic of media pages in France, Belgium and Switzerland, while “covid-19” was number one in Canada.
This result also meant COVID-19 had not been characteristic of non-media pages. That was a little more unexpected, given the conventional wisdom according to which misinformation about the pandemic had been rampant on Facebook. Figure 3 shows no term related to any conspiracy theory having circulated in 2020 (vaccines, chloroquine, 5G technology, China, Bill Gates, etc.). Some related expressions were found in individual countries’ top-50 terms. The bigram “coronavirus confinement” was the 49th most characteristic of non-media pages in Belgium. In Canada, “presse covid-19” and “anti media” were found in the 38th and 39th positions. But these were the only traces of probable misinformation in content having generated the most interactions in French-speaking Facebook.
What was much more typical of non-media pages was content of little public interest (we will define this notion in the Discussion, below). After “photo”, the highest scoring term appearing in all four countries was “être” (to be). The dominance of that verb, the most common in the French language, was difficult to interpret. If it were as common in media pages, it would not show up in either Figure 2 or Figure 3. Many reasons may explain why it was more characteristic of non-media pages.
One hypothesis: this indicates that news content was more impersonal and related less often to well-being than what could be found in non-media Facebook pages. It may also stem from the fact that Facebook and other social networks answer a need to self-represent (Nadkarni and Hofmann, 2012). In a study with more than 15,000 participants, Bastard, et al. (2017) developed a taxonomy of Facebook users. Only 17 percent were what the authors call “non active users” who preferred to lurk instead of participating in conversations with others or publishing content themselves. “For these users, Facebook isn’t a tool for self-expression or self-representation” [16]. Most people used Facebook to project what they wished to be. Thus, a second hypothesis: non-media content caters to that need.
The words “recette” (recipe) and “concours” (contest) appearing rather high in Figure 3, along with the bigrams “bon chance” (good luck, with “bonne” being lemmatized) and “huile olive” (olive oil), gave another indication of the type of content characterizing non-media pages on Facebook. Recipes and contests were surely of value to the social network’s users, but of little public interest value.
The word “temps”, which can mean either time or weather, had the highest χ2 residual score even though it appeared in the top-50 list in only two countries (France and Belgium). In which sense was it used in non-media pages? Weather information is a staple of news content. Some weather-related terms should therefore appear in the n-gram lists specific to media pages. Yet there were none. This meant that weather-related terms were found in proportionate amounts in non-media pages and that they were not specific to non-news Facebook content. So it was in the sense of “time” that “temps” was characteristic of non-media pages.
Besides, “prendre temps” (take [your, the] time) was 34th in the list of non-media bigrams in Canada. This expression was another fixture of feel good memes, along with “joyeux anniversaire” (happy birthday), “prendre soin” (take care) and others mentioned earlier in this section.
Further terms of little value outside of Facebook that seemed to characterize non-media pages are “partager” (to share), “youtube”, “instagram”, and other calls to action. Also, terms that described a given post and that were untranslated, like “photo from” and “timeline photo”, appeared in my sample only because they were automatically generated when an administrator created a post on a Facebook page.
More significantly, “dieu” (God) is a lemma that appeared in all four countries’ lists of terms most specific to non-news content. It had been in posts generating close to 123 million interactions in 2020. Only “coronavirus” generated more in media pages during the year that the COVID-19 pandemic began! “Seigneur” (Lord) was also there, having made the top-50 list in two countries. Although it does not appear in Figure 3, “allah” was the 26th most specific lemma in non-media content in Belgium.
The word “dieu” is part of many common French expressions. “Mon dieu!” is as prevalent as “Oh my god” in English. But closer inspection revealed that it was indeed in faith-based pages that the word was used most often and where posts collected the most interactions. Many of those pages have over a quarter million subscribers: Pastor Yvan Castanou’s or “Un miracle chaque jour” (A miracle a day) in France, “Darifton & compagnie”, a Muslim community in Belgium, “La Bible”, a Swiss page carpet bombing the social network with biblical memes, or “EMCI-TV”, a Christian evangelical television network based in Waterloo, Québec, broadcasting in Europe and Africa.
The Internet is a “mission field” [17] for which Facebook is well tailored. “Religious traditions [...] encourage the use of Facebook for proselytizing” [18]. Facebook’s algorithm may thwart these efforts by enclosing its users in filter bubbles (Pariser, 2011), making it difficult to preach to those not already converted. Yet, Brubaker and Haigh found in a survey that they conducted in 2017 that “Facebook use for religious purposes is primarily motivated by the need to minister to others” [19]. More than simply sharing information about themselves, devout Facebook users are motivated to share information about their religion: “They reach out to and uplift others by engaging in faith-based conversations and uploading faith-based messages” [20].
Figure 4 provides one example. It was posted by TopChrétien, a page based in France with more than a million subscribers. Posted in April, this video was a half-hour sermon by Singapore preacher Joseph Prince dubbed in French. It had 60,000 views at the time of data collection.
Figure 4: Repost of an English-language evangelical sermon by a French-language Christian page.
Churches can also use Facebook to expand. Kgatle (2018) documented the use of the social network in the emergence of prophetic churches in southern Africa and found it played a “major role” [21]. Apart from sharing messages, it provides a platform to organize and advertise online events and services, as well as a way for the faithful to attend those services virtually live or later on Watch.
We will see in the next subsection that non-media corpora were more varied, but that religious content was still very much present.
Topic modeling: The complete diet of a newsless Facebook
Topic modeling provided a finer, more granular analysis of content posted in the four countries studied during the year. It generated approximately 72,000 terms, in more than 5,000 topics.
Figure 5: Selection of terms appearing in the clearest topics identified per country every month during 2020, in both media and non-media subcorpora; length of bar shows frequency, or number of posts included in a given topic.
Figure 5 synthesizes the main interpretable [22] topics by month and country in both media and non-media subcorpora. It paints a broad picture where content found in media pages generally relates to current events, mainly the COVID-19 outbreak, while what is published on non-media pages seems to court Facebook users with easy-to-produce and enticing content (recipes, astrology, self-help, among others).
However, the difference between media and non-media was not as clear cut as what was revealed by the χ2 residuals analysis. This may be the main takeaway from the topic modeling phase of this study. The models produced topics common to both subcorpora, most notably sports, which seemed almost as commonplace in media pages as they were in non-media pages.
Topic modeling also demonstrated that the pandemic was not confined to media pages. It was discussed in non-media pages throughout the year, most particularly in March. That being said, it seemed that the non-media pages dealing with the pandemic were mostly government agencies, as terms were related to rules, like “masque obligatoire” (mandatory mask) or “rassemblements” (gatherings), along with the name of an agency, such as “conseil fédéral” (Federal Council, the executive branch of government in Switzerland), or invitations to call a number to be informed on sanitary measures. Sometimes, COVID topics in non-media pages were more personal in tone, inviting Facebook users to “prendre soin” and “soin autres” (to take care of others), providing advice, such as “nettoyer légumes” (wash vegetables), or telling them how to “vivre normalement” (live normally) despite the confinement.
Occasionally, a few other current events topics appeared in the non-media subcorpora. Apart from sports, as mentioned earlier, extreme weather events appeared most frequently. Storms Ciara and Dennis, in northern Europe, along with the fires in Australia — probably because of all the cute koala pictures (“koala” being a term also present in those topics) — topped the list. Three other topics that generated much attention worldwide were included in a few topics in non-media subcorpora: the U.S. presidential election; the Black Lives Matter demonstrations following the murder of George Floyd; and the beheading of teacher Samuel Paty, in a northern suburb of Paris, for having shown cartoons of Muhammad in class.
Were the posts associated with those topics denouncing BLM demonstrations or supporting them? Were they condemning the assassination of Samuel Paty, or condoning it? The terms picked up by the models did not enable us to tell what were the sentiments associated with those posts. Only in Donald Trump’s case was it clearer. The terms “voler élection” (steal election), “doit vivre” (must live) and “monde parallèle” (parallel world) indicate the posts seemed to ridicule the former president’s allegations of voter fraud.
Celebrity news was another topic that proved a standard of both media and non-media pages, even though it couldn’t be classified as “current events”. French rock singer Johnny Hallyday was present in many topics (he died in 2017!). So were reality TV stars from shows such as “Occupation Double” in Canada or “Koh Lanta” in Europe. According to one Swiss non-media topic in August, the “petit maillot” (small bathing suit) of a participant of the latter program makes “craquer internautes” (break the Internet).
Most topics, though, were relatively unique to non-media pages. Recipes have been found in topics in all countries almost every month. Typical terms are “recette facile” (easy recipe) with ingredients, like “beurre” (butter) or “pommes terre” (potatoes), to make “repas noël” (Christmas meal) or “délicieux gâteau” (delicious cake). Some self-help topics, mostly to “perdre poids” (lose weight), have appeared once in a while, but less often.
In Switzerland, astrology was surprisingly dominant in non-media pages. The non-media page that posted the most in French in Switzerland was “Horoscope du jour” (Today’s horoscope). Terms such as “compatibles amour” (compatible zodiac signs for love) or “prédisent astres” (what are the stars predicting), for example, have defined one to five topics (out of 12) almost every month. In January alone, more than 1,950 Swiss non-media posts mentioned “horoscope”. Of course, not a single post predicted the COVID-19 pandemic.
But recipes and horoscopes sometimes made up topics in media pages, albeit rarely. Only two types of content were unique to the non-media francophone Facebook, thus confirming the findings of the first two textual analyses: faith-based and, to a lesser extent, “feel good” content.
To be more precise, it was Christian and Islamic content that appeared regularly in topics, mostly in Canadian and Belgian subcorpora. The expression “jésus christ”, for example, appeared in no fewer than 71 topics throughout the year in the non-media subcorpus, not once in the media subcorpus. Terms like “nom jésus” (name of Jesus), “seigneur jésus” (lord Jesus) or “prière” (prayer) also appeared in dozens of topics unique to non-media subcorpora. Those topics were sometimes the most salient of the month in some Canadian subcorpora. In some topics, the expression “lavictoiredelamour” was present. “La victoire de l’amour” is the name of a Sunday television program on TVA, Québec’s most watched private TV network. The popularity of the network itself does not explain why religious content had been highlighted so frequently by topic modeling. One more probable hypothesis could be that content posted by this program had been shared and reposted by many other pages on Facebook and therefore appeared more often in my final sample.
In Belgium, Muslim content was regularly featured in top topics throughout the year, with terms adapting to events in the Islamic calendar, like “tarawih” and “iftar” during Ramadan, for example. In some topics, terms appear in Arabic, such as “عليه وسلم” (peace be upon Him).
In the “feel good” category, terms such as “gros bisous” (hugs and kisses), “beaux rêves” (sweet dreams), along with expressions highlighted in χ2 residuals such as “bonne journée” (have a good day), turned up in many topics of all non-media subcorpora.
One last element that stood out was that only two, maybe three topics could be attributed to disinformation, which was very little. In June, in the Canadian non-media subcorpus, one topic included terms such as “donald trump”, [French president] “emmanuel macron” along with “oui avez”, “avez bien” and “bien lu” (yes, you’ve read correctly). The use of the second person, engaging the reader by talking directly to him or her, is a telltale sign of misinformation [23]. After swearing, it was the linguistic feature most often associated with fake news. It appeared almost seven times as much in fake news articles as in trusted sources. In November, in the non-media Switzerland subcorpus, another topic contained terms related to vaccines along with “expérience incroyable” (incredible experience or experiment). Rashkin and her colleagues also found that the use of superlatives could be a sign of misinformation, although not as clear as the use of the second person.
In March and November, the terms “chloroquine” and “raoult” appeared in French and Swiss non-media subcorpora. Didier Raoult, a Marseille-based doctor, had become a household name in much of the French-speaking world after publishing an article in the International Journal of Antimicrobial Agents. It claimed that “hydroxychloroquine treatment is significantly associated with viral load reduction/disappearance in COVID-19 patients and its effect is reinforced by azithromycin” (Gautret, et al., 2020). It stirred a debate in the French press, amplified by the subsequent endorsement of the treatment by then U.S. president Donald Trump. But it was difficult to attribute these topics to misinformation as Raoult’s name also appeared in three topics from the media subcorpus during the year. On closer analysis, it appears Mr. Raoult had been mentioned in almost 4,600 posts from the French subcorpus, 63 percent of which were posted in non-media pages. Among those pages, many were known for misinformation, such as “RT France”, “Epoch Times Paris”, “Gilets Jaunes Infos” or “L’Eveilleur Quantique”. One of the posts generating the most attention was a video (Figure 6) posted by “La vraie démocratie”, a page considered “one of the most influential francophone misinformation pages on Facebook” (Conspiracy Watch, 2020). The video has more than 178,000 total interactions and almost 3.5 million views as of July 2021. But not all pages mentioning Mr. Raoult could be labeled misinformation. Besides, the page having quoted him most often in 2020 (161 times) is the Marseille daily La Provence. His name also appeared in 34 posts by Le Monde. Raoult may certainly be synonymous with debate and controversy, but not necessarily with disinformation.
Figure 6: Post by reinformation page “La vraie démocratie” on 28 October 2020. It is a video excerpt of an interview of Dr. Didier Raoult aired on all-news channel LCI on 27 October. The all-caps text above the still frame reads: “You’ve all gone bonkers!” In the message text above, Raoult is quoted as saying: “In the end, will we all be locked up for the rest of our lives because there are viruses outside?”
This study should not be interpreted as demonstrating there was no disinformation on Facebook. It shows disinformation has not been a salient topic in Francophone areas, even when one subtracts media content from the social network. This does not mean that there is no fake news in French on Facebook; it only means it is not one of its main ingredients.
A great deal of the literature on fake news has focused on the United States. Perhaps it has skewed perceptions. Humprecht, et al. found that it was “the most vulnerable country regarding the spread of online disinformation” [24]. In their cross-national study of “resilience to online disinformation”, they clustered 18 countries in three groups. The U.S. was in a group all by itself, “characterized by high levels of populist communication, polarization, and low levels of trust in the news media” [25]. France was not included in this study, but Belgium, Canada and Switzerland, countries “with consensus political systems, strong welfare states, and pronounced democratic corporatism” [26], had been classified in the cluster most resilient to disinformation. The findings of this study support the conclusions of Humprecht and her colleagues.
All three textual analyses converged. The differences that they’ve helped pinpoint between media and non-media content on Facebook boiled down to a classical opposition between “public interest” and “the public’s interest”.
The concept of “public interest” could be synonymous with “common good”, “public service” or “general welfare”. It is something many professions vow to defend: public servants, politicians, lawyers, physicians, journalists, to name just a few.
In journalism, public service is a core ideal-typical value and a “powerful component of journalism’s ideology” [27]. Many journalists see themselves as working for the public first and foremost, the media organization employing them being second in the order of their loyalties. That their “primary commitment is to the public” is a “deeply felt tradition” (Kovach and Rosenstiel, 2014).
But it isn’t only for the public’s sake that journalists strive to serve it. The public service ideal goes hand in hand with the democratic ideal. “Journalists think of journalism as a service in the public interest, one that is shaped with an eye toward the needs of healthy citizenship” [28]. It can be argued that Western journalism sees “the role of the press as serving the public interest by providing the informational needs necessary for [a] self-governing republic” [29]. In doing so, “Journalists earn public trust and protection by loyally performing their public service duty to citizens” [30].
To fulfill this duty, normative as it may be, media organizations have historically found ways to reach the public that they aim to serve. They’ve done so by selling newspapers, by broadcasting radio and television news programs, by creating Web sites. In the last decade, social networks have become an increasingly important channel to reach the public. It is a channel media organizations don’t control. It is a channel where Facebook products (the original Facebook platform, Instagram, as well as messaging applications Messenger and WhatsApp) are by far the most used by citizens throughout the world to access news, according to the latest versions of the Reuters Institute Digital News Report (Newman, et al., 2021, 2020, 2019).
Facebook’s role goes beyond that of a neutral, passive platform. Beginning in 2013, it has actively courted news organizations, inviting them not only to share content on Facebook, but to create content specifically designed for it (Mattelart, 2020). Without ever explicitly recognizing it, Facebook has taken the public service duty to inform citizens upon itself too.
Facebook seemed to embrace that role in a 2017 manifesto following the 2016 U.S. presidential election. The social network’s founder pledged that his company’s “next focus will be developing the social infrastructure for community” (Zuckerberg, 2017). Among the five goals he described for the future of Facebook was that it should be “informing us”.
Although Zuckerberg wrote that “a strong news industry is [...] critical to building an informed community”, many decisions taken afterwards ran counter to the 2017 manifesto. Barely a year later, in a bid “to encourage meaningful social interactions with family and friends”, the CEO announced that his users will “see less public content, including news [...]. After this change, we expect news to make up roughly 4% of News Feed — down from roughly 5% today” (Zuckerberg, 2018).
In February 2021, while the storm was the fiercest in Australia, Facebook announced that it wished to reduce political content on its users’ News Feed. “We’ll temporarily reduce the distribution of political content in News Feed for a small percentage of people in Canada, Brazil and Indonesia this week, and the US in the coming weeks” (Gupta, 2021a). Content from news media could be labeled political, according to a New York Times story. “Under the new test, a machine-learning model will predict the likelihood that a post — whether it’s posted by a major news organization, a political pundit, or your friend or relative — is political” (Roose and Isaac, 2021).
Two months later, the Palo Alto company began surveying its users “to understand which posts they find inspirational [...] People have told us they want to see more inspiring and uplifting content in News Feed because it motivates them [...]. For example, a post featuring a quote about community can inspire someone to spend more time volunteering” (Gupta, 2021b).
Less news, more “inspirational” content. Facebook’s idea of an informed community is one where individual expression around personal interests and experiences is merely aggregated by algorithms. It is not the public sphere of deliberation, where professionals gather and report news in the public interest, that an informed citizenry would be right to expect (Carignan, 2021).
The kind of content that Facebook pushes to its users is a self-serving version of the public interest. Carol W. Lewis would call it “a fabricated version of the common good” [31]. Removing public interest content from Facebook helps us understand what kind of content the Palo Alto giant favors: sports, celebrities, “inspirational” “feel good memes”; fast food satisfying in the short term, but dangerous in the long run for the health of a democratic society, or what Mr. Zuckerberg calls a “civically-engaged community”.
“It is not true that news has value for Facebook”, Kevin Chan, head of public policy for Facebook, Inc., in Canada, told me (Roy, 2021a). I beg to differ. News content is extremely valuable for the company. It confers what Jane B. Singer calls “societal value” by conveying “information people can trust” [32]. That trust, she writes, “is best established and nurtured by those with an existential commitment to social responsibility”, i.e., journalists. Trust in news media is low in many countries, but trust in social media is much lower (Edelman, 2021).
Would Facebook users come back if they could no longer find trustworthy information on the platform? Maybe not, because as Singer points out: “the public needs some means of differentiating between what is valuable to society as a whole and what is less so; otherwise, the notion of a coherent ‘public’ falls apart as each individual seeks out whatever seems most personally appealing at the moment” [33].
In other words, a newsless Facebook would be a socially useless Facebook.
About the author
Jean-Hugues Roy teaches data and computational journalism at Université du Québec à Montréal (UQAM). A journalist for 23 years, he now studies information flows online and media business models using computational methods.
E-mail: roy [dot] jean-hugues [at] uqam [dot] ca
Notes
1. Facebook, 2020b, p. 6; Google, 2020, p. 23.
2. Newman, et al., 2021, pp. 67, 77, 107 and 119.
3. In fact, when searching for content in Canada, for example, CrowdTangle will return content published by pages whose administrators are mainly from Canada. This explains why certain pages whose content is related to a given country are classified in another country. British singer Dua Lipa’s official page, for example, has 208 administrators in 31 countries! Because the majority of them (77) are in Canada, the page is considered a Canadian page.
4. Huang, 2008, p. 50.
5. Ibid.
6. Li, et al., 2016, p. 1,592.
7. Kilgarriff, 2001, p. 258.
8. Oakes and Farrow, 2007, p. 89.
9. Poudat and Landragin, 2017, p. 169.
10. Poudat and Landragin, 2017, p. 11.
11. Boumedyen Billami, et al., 2020, p. 156.
12. Nechushtai, 2018, p. 1,052.
13. Ibid.
14. Rashidian, et al., 2020, p. 8.
15. Marthoz, 2018, p. 95.
16. Bastard, et al., 2017, p. 70.
17. Brubaker and Haigh, 2017, p. 2.
18. Brubaker and Haigh, 2017, p. 8.
19. Ibid.
20. Brubaker and Haigh, 2017, p. 8.
21. Kgatle, 2018, p. 4.
22. Four types of topics produced by the models were ignored because they were irrelevant or insignificant: those joining Facebook-specific (“timeline photo”) or link-specific (“http www”; “blaguesquebec com”) vocabulary; those made up of stopwords (“qu ils”; “2020 06”); those including calls to action by Facebook pages, media or not (“cliquez ici” [click here]; “envoyez nous” [send us]; “chance gagner” [enter to win]); or those made up of words not in French (mostly English, sometimes Dutch or Arabic).
23. Rashkin, et al., 2017, p. 2,933.
24. Humprecht, et al., 2020, p. 14.
25. Ibid.
26. Humprecht, et al., 2020, p. 14.
27. Deuze, 2005, p. 447.
28. Zelizer, 2005, p. 72.
29. Painter, 2019, p. 7.
30. Ibid.
31. Lewis, 2006, p. 698.
32. Singer, 2006, p. 24.
33. Ibid.
References
R. Al-Rfou, 2016. “Polyglot,” (16.7.4) [Python], at https://github.com/aboSamoor/polyglot, accessed 26 June 2022.
I. Bastard, D. Cardon, R. Charbey, J.-P. Cointet and C. Prieur, 2017. “Facebook, pour quoi faire? Configurations d’activités et structures relationnelles,” Sociologie, volume 8, pp. 57–82.
doi: https://doi.org/10.3917/socio.081.0057, accessed 26 June 2022.
Belga, 2020. “Facebook interdit les contenus négationnistes,” Lavenir.net (12 October), at https://www.lavenir.net/cnt/dmf20201012_01519459/facebook-interdit-les-contenus-negationnistes, accessed 26 June 2022.
E. Bell, 2021. “Off-label. How tech platforms decide what counts as journalism,” Columbia Journalism Review, at https://existential.cjr.org/who/tech-platforms-labels/, accessed 26 June 2022.
E.J. Bell, T. Owen, P.D. Brown, C. Hauka and N. Rashidian, 2017. “The platform press: How Silicon Valley reengineered journalism,” Tow Center for Digital Journalism, Columbia University.
doi: https://doi.org/10.7916/D8R216ZZ, accessed 26 June 2022.
Y. Bergeron, 2020. “Le pasteur Mukendi devra subir un autre procès,” Radio-Canada (29 July), at https://ici.radio-canada.ca/nouvelle/1723056/pasteur-paul-mukendi-nouveau-proces, accessed 26 June 2022.
Y. Bestgen, 2014. “Inadequacy of the chi-squared test to examine vocabulary differences between corpora,” Literary and Linguistic Computing, volume 29, number 2, pp. 164–170.
doi: https://doi.org/10.1093/llc/fqt020, accessed 26 June 2022.
C. Blanc, 2016. “Réseaux traditionalistes catholiques et ‘réinformation’ sur le web: Mobilisations contre le ‘Mariage pour tous’ et ‘pro-vie’,” tic & société, volume 9, numbers 1–2.
doi: https://doi.org/10.4000/ticetsociete.1919, accessed 26 June 2022.
M. Boumedyen Billami, C. Bortolaso and M. Derras, 2020. “Extraction de thèmes d’un corpus de demandes de support pour un logiciel de relation citoyen,” Actes de la 6e Conférence Conjointe Journées d’Études sur la Parole, pp. 155–163, and at https://aclanthology.org/2020.jeptalnrecital-taln.13/, accessed 26 June 2022.
P.J. Brubaker and M.M. Haigh, 2017. “The religious Facebook experience: Uses and gratifications of faith-based content,” Social Media + Society (25 April).
doi: https://doi.org/10.1177/2056305117703723, accessed 26 June 2022.
R.-Y. Carignan, 2021. “Le journalisme et la représentation des rapports socio-numériques,” Maîtrise en communication, Université du Québec à Montréal, at https://archipel.uqam.ca/14717/, accessed 26 June 2022.
C. Cepernich, 2016. “Emotion in politics,” In: G. Mazzoleni (editor). International encyclopedia of political communication. Malden, Mass.: Wiley.
doi: https://doi.org/10.1002/9781118541555.wbiepc238, accessed 26 June 2022.
Conspiracy Watch, 2020. “La vraie démocratie” (17 August), at https://www.conspiracywatch.info/la-vraie-democratie, accessed 26 June 2022.
M.M. Danilak, 2020. “langdetect,” (1.0.8) [Python; OS Independent], at https://github.com/Mimino666/langdetect, accessed 26 June 2022.
M. Deuze, 2005. “What is journalism? Professional identity and ideology of journalists reconsidered,” Journalism, volume 6, number 4, pp. 442–464.
doi: https://doi.org/10.1177/1464884905056815, accessed 26 June 2022.
J. Devlin, M.-W. Chang, K. Lee and K. Toutanova, 2019. “BERT: Pre-training of deep bidirectional transformers for language understanding,” Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4,176–4,286, and at https://aclanthology.org/N19-1423/, accessed 26 June 2022.
Edelman, 2021. “Country report. Trust in Canada. Edelman Trust Barometer 2021,” at https://www.edelman.ca/sites/g/files/aatuss376/files/trust-barometer/2021%20Canadian%20Edelman%20Trust%20Barometer_0.pdf, accessed 26 June 2022.
Facebook, 2021. “An update to how we address movements and organizations tied to violence” (9 November), at https://about.fb.com/news/2020/08/addressing-movements-and-organizations-tied-to-violence/, accessed 26 June 2022.
Facebook, 2020a. “July 2020 coordinated inauthentic behavior report” (6 August), at https://about.fb.com/news/2020/08/july-2020-cib-report/, accessed 26 June 2022.
Facebook, 2020b. “Facebook response to the Australian Treasury Laws Amendment (News Media and Digital Platforms Mandatory Bargaining Code) Bill 2020” (28 August), at https://www.accc.gov.au/system/files/Facebook_0.pdf, accessed 26 June 2022.
P. Gautret, J.-C. Lagier, P. Parola, V.T. Hoang, L. Meddeb, M. Mailhe, B. Doudier, J. Courjon, V. Giordanengo, V.E. Vieira, H. Tissot Dupont, S. Honoré, P. Colson, E. Chabrière, B. La Scola, J.-M. Rolain, P. Brouqui and D. Raoult, 2020. “Hydroxychloroquine and azithromycin as a treatment of COVID-19: Results of an open-label non-randomized clinical trial,” International Journal of Antimicrobial Agents, volume 56, number 1, 105949.
doi: https://doi.org/10.1016/j.ijantimicag.2020.105949, accessed 26 June 2022.
N. Gleicher and D. Agranovitch, 2020. “Removing coordinated inauthentic behavior from France and Russia” (15 December), at https://about.fb.com/news/2020/12/removing-coordinated-inauthentic-behavior-france-russia/, accessed 26 June 2022.
Google, 2020. “Draft News Media and Digital Platforms Mandatory Bargaining Code, Submissions in response” (28 August), at https://www.accc.gov.au/system/files/Google_0.pdf, accessed 26 June 2022.
M. Grootendorst, 2021. “BERTopic: Leveraging BERT and c-TF-IDF to create easily interpretable topics,” Zenodo (10 January).
doi: https://doi.org/10.5281/zenodo.4430182, accessed 26 June 2022.
A. Gupta, 2021a. “Reducing political content in News Feed” (10 February), at https://about.fb.com/news/2021/02/reducing-political-content-in-news-feed/, accessed 26 June 2022.
A. Gupta, 2021b. “Incorporating more feedback into News Feed ranking” (22 April), at https://about.fb.com/news/2021/04/incorporating-more-feedback-into-news-feed-ranking/, accessed 26 June 2022.
M. Honnibal and I. Montani, 2017. “spaCy 2,” at https://spacy.io/, accessed 26 June 2022.
A. Huang, 2008. “Similarity measures for text document clustering,” Proceedings of the New Zealand Computer Science Research Student Conference.
E. Humprecht, F. Esser and P. Van Aelst, 2020. “Resilience to online disinformation: A framework for cross-national comparative research,” International Journal of Press/Politics, volume 25, number 3, pp. 493–516.
doi: https://doi.org/10.1177/1940161219900126, accessed 26 June 2022.
M.S. Kgatle, 2018. “Social media and religion: Missiological perspective on the link between Facebook and the emergence of prophetic churches in southern Africa,” Verbum et Ecclesia, volume 39, number 1, a1848.
doi: https://doi.org/10.4102/ve.v39i1.1848, accessed 26 June 2022.
A. Kilgarriff, 2001. “Comparing corpora,” International Journal of Corpus Linguistics, volume 6, number 1, pp. 97–133.
doi: https://doi.org/10.1075/ijcl.6.1.05kil, accessed 26 June 2022.
B. Kovach and T. Rosenstiel, 2014. The elements of journalism: What newspeople should know and the public should expect. Revised and updated third edition. New York: Three Rivers Press.
H. Le, L. Vial, J. Frej, V. Segonne, M. Coavoux, B. Lecouteux, A. Allauzen, B. Crabbé, L. Besacier and D. Schwab, 2020. “FlauBERT: Des modèles de langue contextualisés pré-entraînés pour le français,” Actes de la 6e Conférence Conjointe Journées d’Études sur la Parole (JEP, 31e Édition), Traitement Automatique des Langues Naturelles (TALN, 27e Édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e Édition). Volume 2: Traitement Automatique des Langues Naturelles, pp. 268–278, and at https://aclanthology.org/2020.jeptalnrecital-taln.26/, accessed 26 June 2022.
T. Leaver, 2021. “Going dark: How Google and Facebook fought the Australian News Media and Digital Platforms Mandatory Bargaining Code,” M/C Journal, volume 24, number 2 (26 April).
doi: https://doi.org/10.5204/mcj.2774, accessed 26 June 2022.
C.W. Lewis, 2006. “In pursuit of the public interest,” Public Administration Review, volume 66, number 5, pp. 694–701.
doi: https://doi.org/10.1111/j.1540-6210.2006.00634.x, accessed 26 June 2022.
B. Li, Z. Zhao, T. Liu, P. Wang and X. Du, 2016. “Weighted neural bag-of-n-grams model: New baselines for text classification,” Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pp. 1,591–1,600, and at https://aclanthology.org/C16-1150/, accessed 26 June 2022.
C.-Y. Lin, 1995. “Knowledge-based automatic topic identification,” ACL ’95: Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics, pp. 308–310.
doi: https://doi.org/10.3115/981658.981705, accessed 26 June 2022.
C. Loiseau, 2020. “Les complotistes iront ailleurs,” Le Journal de Montréal (7 October), at https://www.journaldemontreal.com/2020/10/07/facebook-supprime-la-page-radio-quebec-de-sa-plateforme, accessed 26 June 2022.
M. Lui, 2016. “langid.py,” (1.1.6) [Python], at https://github.com/saffsd/langid.py, accessed 26 June 2022.
J.-P. Marthoz, 2018. Journalisme international. Louvain-la-Neuve, Belgium: De Boeck Supérieur.
L. Martin, B. Muller, P.J.Q. Suárez, Y. Dupont, L. Romary, B. Sagot and D. Seddah, 2020. “Les modèles de langue contextuels CAMEMBERT pour le français: Impact de la taille et de l’hétérogénéité des données d’entrainement,” Actes de la 6e Conférence Conjointe Journées d’Études sur la Parole (JEP, 31e Édition), Traitement Automatique Des Langues Naturelles (TALN, 27e Édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e Édition). Volume 2: Traitement Automatique des Langues Naturelles, pp. 54–65, and at https://aclanthology.org/2020.jeptalnrecital-taln.5/, accessed 26 June 2022.
T. Mattelart, 2020. “Comprendre la stratégie de Facebook à l’égard des médias d’information,” Sur le journalisme, About journalism, Sobre jornalismo, volume 9, number 1, pp. 24–43.
doi: https://doi.org/10.25200/SLJ.v9.n1.2020.416, accessed 26 June 2022.
A. Nadkarni and S.G. Hofmann, 2012. “Why do people use Facebook?” Personality and Individual Differences, volume 52, number 3, pp. 243–249.
doi: https://doi.org/10.1016/j.paid.2011.11.007, accessed 26 June 2022.
E. Nechushtai, 2018. “Could digital platforms capture the media through infrastructure?” Journalism, volume 19, number 8, pp. 1,043–1,058.
doi: https://doi.org/10.1177/1464884917725163, accessed 26 June 2022.
N. Newman with R. Fletcher, A. Schulz, S. Andi, C.T. Robertson and R.K. Nielsen, 2021. “Digital news report 2021,” Tenth edition. Reuters Institute for the Study of Journalism, University of Oxford, at https://reutersinstitute.politics.ox.ac.uk/digital-news-report/2021, accessed 26 June 2022.
N. Newman with R. Fletcher, A. Schulz, S. Andi and R.K. Nielsen, 2020. “Digital news report 2020,” Reuters Institute for the Study of Journalism, University of Oxford, at https://www.digitalnewsreport.org, accessed 26 June 2022.
N. Newman with R. Fletcher, A. Kalogeropoulos and R.K. Nielsen, 2019. “Digital news report 2019,” Reuters Institute for the Study of Journalism, University of Oxford, at https://www.digitalnewsreport.org, accessed 26 June 2022.
M.P. Oakes and M. Farrow, 2007. “Use of the chi-squared test to examine vocabulary differences in English language corpora representing seven different countries,” Literary and Linguistic Computing, volume 22, number 1, pp. 85–99.
doi: https://doi.org/10.1093/llc/fql044, accessed 26 June 2022.
C. Painter, 2019. “Public service role of journalism,” In: F. Hanusch and T.P. Vos (editors). International encyclopedia of journalism studies. Hoboken, N.J.: Wiley-Blackwell.
doi: https://doi.org/10.1002/9781118841570.iejs0093, accessed 26 June 2022.
E. Pariser, 2011. The filter bubble: What the Internet is hiding from you. New York: Penguin Press.
C. Poudat and F. Landragin, 2017. Explorer des données textuelles: Méthodes — pratiques — outils. Louvain-la-Neuve, Belgium: De Boeck Supérieur.
D. Ramage, E. Rosen, J. Chuang, C.D. Manning and D.A. McFarland, 2009. “Topic modeling for the social sciences,” Workshop on Applications for Topic Models, NIPS, at http://vis.stanford.edu/papers/topic-modeling-social-sciences, accessed 26 June 2022.
N. Rashidian, G. Tsiveriotis, P. Brown, E.J. Bell and A. Hartstone, 2020. “Platforms and publishers: The end of an era,” Tow Center for Digital Journalism, Columbia University.
doi: https://doi.org/10.7916/d8-sc1s-2j58, accessed 26 June 2022.
H. Rashkin, E. Choi, J.Y. Jang, S. Volkova and Y. Choi, 2017. “Truth of varying shades: Analyzing language in fake news and political fact-checking,” Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pp. 2,931–2,937.
doi: https://doi.org/10.18653/v1/D17-1317, accessed 26 June 2022.
C. Reynolds, 2021. “Facebook Canada head warns news posts could be blocked as last resort,” Globe and Mail (29 March), at https://www.theglobeandmail.com/canada/article-facebook-canada-head-warns-news-posts-could-be-blocked-as-last-resort/, accessed 26 June 2022.
K. Roose and M. Isaac, 2021. “Facebook dials down the politics for users,” New York Times (10 February), at https://www.nytimes.com/2021/02/10/technology/facebook-reduces-politics-feeds.html, accessed 26 June 2022.
J.-H. Roy, 2021a. “Facebook vs. Australia — Canadian media could be the next target for ban,” The Conversation (22 February), at http://theconversation.com/facebook-vs-australia-canadian-media-could-be-the-next-target-for-ban-155728, accessed 26 June 2022.
J.-H. Roy, 2021b. “Instagram: La une de l’ère mobile,” Les Cahiers du journalisme, Nouvelle série, number 6, pp. R69–R97, and at https://cahiersdujournalisme.org/V2N6/CaJ-2.6-R069.html, accessed 26 June 2022.
M. Silva, 2021. “Update on the News Media Bargaining Code in Australia,” About Google (6 January), at https://about.google/google-in-australia/jan-6-letter/, accessed 26 June 2022.
J.B. Singer, 2006. “The socially responsible existentialist: A normative emphasis for journalists in a new media environment,” Journalism Studies, volume 7, number 1, pp. 2–18.
doi: https://doi.org/10.1080/14616700500450277, accessed 26 June 2022.
J. Tiedemann, 2012. “Parallel data, tools and interfaces in OPUS,” Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pp. 2,214–2,218, and at https://aclanthology.org/L12-1246/, accessed 26 June 2022.
J.-P. Willaime, 2017. “Polarisation autour du religieux en francophonie? Les nouveaux défis de la laïcité,” Francophonies d’Amérique, numbers 44–45, pp. 137–162.
doi: https://doi.org/10.7202/1055908ar, accessed 26 June 2022.
B. Zelizer, 2005. “Definitions of journalism,” In: G. Overholser and K.H. Jamieson (editors). The institutions of American democracy: The press. New York: Oxford University Press, pp. 66–80.
M. Zuckerberg, 2018. “Continuing our focus for 2018 to make sure the time we all spend on Facebook is time well spent...” Facebook (19 January), at https://www.facebook.com/zuck/posts/10104445245963251, accessed 26 June 2022.
M. Zuckerberg, 2017. “Building global community,” Facebook (16 February), at https://www.facebook.com/notes/3707971095882612/, accessed 26 June 2022.
Technical appendix
The full code and most of the data (in accordance with CrowdTangle’s “Terms of Service”) are available in a repository on the author’s GitHub account: https://github.com/jhroy/facebook-franco.
Editorial history
Received 16 July 2021; accepted 10 April 2022.
This paper is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Kittens and Jesus: What would remain in a newsless Facebook?
by Jean-Hugues Roy.
First Monday, Volume 27, Number 7 - 4 July 2022
https://firstmonday.org/ojs/index.php/fm/article/download/11815/10683
doi: https://dx.doi.org/10.5210/fm.v27i7.11815