Kittens and Jesus: What would remain in a newsless Facebook? by Jean-Hugues Roy

This paper examines what would remain on Facebook if news content were removed, as the company temporarily did in Australia in February 2021. Using a corpus of 3.3 million Facebook posts published in French in 2020 in four countries (Belgium, Canada, France and Switzerland), it compares media content to non-media content by submitting the text of the posts to three computational analyses: basic n-gram comparison, χ2 residuals and topic modeling. Two distinct spheres are defined within Facebook content: a “public interest” sphere, made up of media pages, and a “public’s interest” sphere, made up of non-media pages. Religious content and “inspirational” “feel good memes” were found to be most characteristic of a newsless Facebook.


Introduction: The Australian standoff

It was probably one of the biggest gambits in Internet history. On 17 February 2021, Facebook removed all Australian news content from its platform. Facebook users all over the world could no longer access articles from Australian media organizations, nor could they share URLs from those publications on their personal profiles, or on groups or pages they participate in. Even The Conversation, an international Web site used by academics to publish research findings, was barred from the platform because it is headquartered in Melbourne. The ban also worked the other way around as Facebook’s Australian users could no longer see or share any news, wherever it originated from in the world.

Facebook was reacting to Canberra’s News Media and Digital Platforms Bargaining Code. The legislation forces Google and Facebook to negotiate deals with Australian media companies to compensate for news content in search results or users’ posts. While it aims to help finance Australian journalism, the Code had been denounced by both Web giants as “unworkable” [1].

In January 2021, Google threatened, “if the Code were to become law in its current form”, to “have no real choice but to stop making Google Search available in Australia” (Silva, 2021). But it refrained from a total blackout after negotiating changes in the Code with Australian Treasurer Josh Frydenberg (Leaver, 2021).

Facebook, on the other hand, surprised everyone by nuking the news in Australia. The ban, denounced by Prime Minister Scott Morrison and others, was however short-lived. On 22 February, the Menlo Park company announced it too had convinced Australian lawmakers to amend their Code before enacting it. Basically, both platforms agreed to sign deals with Australian media companies before the Code became law.

In the end, everybody claimed to have won. Government and news businesses were happy that Google and Facebook diverted some of their revenue to support journalism. Both Web behemoths were happy to escape the more constraining aspects of the Code.

While the conflict had been resolved in Australia, other countries contemplating similar legislation wondered if they would face the same sort of retaliation from platforms. Facebook, in particular, has said it would not hesitate to remove news content in Canada, even if it was as a last resort (Reynolds, 2021; Roy, 2021a).

Which leads one to wonder: what would Facebook look like if news content were permanently removed? What would remain on Facebook users’... “News Feed” if it were devoid of news? This is the research question that this article tries to answer.



Methods: Using four French-language countries as an example

As a journalism prof working in Montreal, I found French-language Canada to be the most natural “terrain” for this research project. But to establish whether my findings are restricted to my home area, I needed to examine what a news ban would look like in more than one country. France, Belgium and Switzerland, nations with sizeable French-language mediascapes, were selected. In all four countries, between 38 and 55 percent of the population mentions social media as a source for news. Everywhere, Facebook is the top social media used [2].


I used CrowdTangle to download a sample of Facebook posts from all four countries. CrowdTangle is a Facebook-owned tool to discover public content on social media platforms. It is made available to academics through a partnership between Facebook and Social Science One.

One of the ways to access CrowdTangle data is by using a Web-based dashboard. It enables country-specific [3] searches, but for content published on Facebook pages only. For any given search, CrowdTangle can return, as of early 2021, as many as 300,000 posts.

I thus used a CrowdTangle dashboard to download, for each of the four countries covered by this research and for each month of 2020, the 300,000 posts from public pages that generated the most interactions. After removing duplicates, this gave me an initial sample of 13.4 million posts. I then filtered it three times (see Table 1).

First, I only kept posts in French using langId (Lui, 2016), langDetect (Danilak, 2020) and polyglot (Al-Rfou, 2016), three Python language-detection libraries. When two or three agreed on a given language, the post was classified in that language, otherwise it was classified as “unknown”. This reduced my sample to 5.3 million posts, which is not surprising given that French is not spoken by the majority in Canada or Switzerland, and given that even in France many pages publish content in English to appeal to an international audience.
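The two-out-of-three agreement rule described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name is hypothetical, and the inputs are assumed to be the language codes already returned by the three detectors for one post.

```python
from collections import Counter


def majority_language(predictions):
    """Return the language code that at least two detectors agree on.

    `predictions` is a list of language codes, one per detector
    (hypothetical input format). Without agreement, return "unknown".
    """
    if not predictions:
        return "unknown"
    code, votes = Counter(predictions).most_common(1)[0]
    return code if votes >= 2 else "unknown"


# Hypothetical detector outputs for two posts.
print(majority_language(["fr", "fr", "en"]))  # fr
print(majority_language(["fr", "en", "de"]))  # unknown
```

Keeping the vote separate from the detectors makes the rule easy to check on its own, whichever libraries produce the codes.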

Second, because I wanted to analyse the text of those posts (see Findings), I only kept those which had 100 characters or more. This includes memes, as CrowdTangle was able to extract text from images using OCR. After running this filter, my sample was further decreased to 4.2 million posts.

Third, I wanted to reduce the number of pages in my sample because I had to manually classify media vs. non-media pages. This work was tedious, so I only kept pages which had 100 posts or more in my initial sample. This left me with a final sample of approximately 3.3 million posts.
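The three successive filters can be illustrated with a short sketch. The dictionary keys (`page`, `lang`, `text`) are hypothetical names chosen for this example, and the per-page post count is assumed to be taken on the initial sample, as the text suggests.

```python
from collections import Counter


def filter_sample(posts, min_chars=100, min_posts=100):
    """Apply the three filters in order to a list of post dictionaries.

    Each post is assumed to carry a 'page' identifier, a detected
    'lang' code and its 'text' (hypothetical field names).
    """
    # Posts per page, counted on the initial (unfiltered) sample.
    page_counts = Counter(post["page"] for post in posts)

    french = [p for p in posts if p["lang"] == "fr"]
    long_enough = [p for p in french if len(p["text"]) >= min_chars]
    return [p for p in long_enough if page_counts[p["page"]] >= min_posts]


# Toy data with lowered thresholds so the effect is visible.
toy = [
    {"page": "a", "lang": "fr", "text": "bonne journée"},
    {"page": "a", "lang": "en", "text": "have a nice day"},
    {"page": "b", "lang": "fr", "text": "bon dimanche"},
    {"page": "a", "lang": "fr", "text": "ok"},
]
print(len(filter_sample(toy, min_chars=5, min_posts=2)))  # 1
```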


Table 1: Sample filtering.
Country | Posts in initial sample | French-language posts | Posts with 100 characters of text or more | Posts in pages with 100 posts or more (final sample)
All four | 13,444,940 | 5,276,935 | 4,175,925 | 3,271,441



The next step was to classify media and non-media pages. I certainly could not trust Facebook’s definition. “Consider [...] the absurdly wide scope of the ‘Media/News’ category on Facebook”, writes Columbia’s Emily Bell (2021). She quotes Gordon Crovitz, former Wall Street Journal publisher and cofounder of NewsGuard, a system that rates the credibility of news sources. According to him, platforms have demonstrated their “fundamental failure to understand the core concepts of journalism”. So I needed to craft my own criteria. I used the following. Media pages were:

I excluded, and thus classified in the non-media group, pages for:


Table 2: Media and non-media subcorpora, by country.
Country | Non-media posts | Non-media pages | Media posts | Media posts as percentage of total posts | Media pages | Media pages as percentage of total pages
All four | 2,333,360 | 6,046 | 938,081 | 28.7% | 589 | 8.9%


The result of this classification process is presented in Table 2. The percentage of media pages in Canada was greater perhaps because I was more familiar with the Canadian news ecosystem. In other countries, when I was unable to establish the news character of a page, I left it out, probably excluding smaller or local media. However, main news organizations were included. The full list of pages, both media and non-media, can be found in the Technical appendix.

The classification produced eight subcorpora: a subset of media posts and a subset of non-media posts for each country. Overall, posts from media pages represent 28.7 percent of the total 3.3 million posts in my final sample.

To answer the research question at the heart of this work, I needed to compare, within each country, media and non-media corpora. To do so, I performed three different textual analyses.

Analysis 1: n-grams

Using spaCy (Honnibal and Montani, 2017, version 3.0.6), a natural language processing Python library, and its “fr_core_news_md” model for French, I first ran a preprocessing of the textual elements of the posts (stopword removal, lemmatization, lowercasing, etc.). It was followed by basic n-gram extractions for single lemmas, bigrams and trigrams to see which were most frequent in each subcorpus. The total number of lemmas in all preprocessed subcorpora was more than 106 million.

This first analysis was based on the “bag of words” approach which has been used for decades in text mining. It assumes that “words appear independently [in a document] and their order is immaterial” [4]. The weight of each word is its frequency, “which means terms that appear more frequently are more important and descriptive for the document” [5]. The same applies to bigrams and trigrams.

The approach has however been perfected by different weighting techniques to achieve better performance, according to the task at hand [6]. A corpus made of Facebook posts gives us a novel way of weighting n-grams: the total number of interactions generated by the posts they appear in.

This method also better reflects how Facebook users reacted to a given content. Measuring only frequencies could make certain terms misleadingly salient. For example, an election campaign team or a marketing firm could publish hundreds of posts on Facebook, making occurrences of a given politician or a given product much more common. In the context of Facebook, a “space that celebrates the primacy of the emotional, the impulsive, over argumentation” (Cepernich, 2016), a market of emotions where the engagement of users is what gives meaning to content (and value to Facebook), it appears methodologically justified to use interactions rather than simple frequencies.
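The difference between counting n-grams and weighting them by interactions can be shown on toy data. This sketch assumes posts have already been reduced to lists of lemmas, and that each n-gram is credited once per post with that post's interactions. The function names and the data are illustrative only.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def weighted_ngram_counts(posts, n):
    """Compute raw frequencies and interaction weights for n-grams.

    `posts` is an iterable of (lemmas, interactions) pairs.
    """
    frequency, weight = Counter(), Counter()
    for lemmas, interactions in posts:
        grams = ngrams(lemmas, n)
        frequency.update(grams)
        # Credit each distinct n-gram once with the post's interactions.
        for gram in set(grams):
            weight[gram] += interactions
    return frequency, weight


# A product pushed in many low-engagement posts vs. one popular post.
toy_posts = [
    (["bon", "journée", "ami"], 500),
    (["promo", "produit"], 2),
    (["promo", "produit"], 3),
    (["promo", "produit"], 1),
]
frequency, weight = weighted_ngram_counts(toy_posts, 2)
print(frequency["promo produit"], weight["promo produit"])  # 3 6
print(frequency["bon journée"], weight["bon journée"])      # 1 500
```

In this toy case the marketing bigram is the most frequent, yet the greeting carries far more weight once interactions are taken into account.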

Analysis 2: χ2 residuals

There are other ways of comparing two corpora that go beyond simply counting n-grams. For the last 20 years, computational linguists have been tackling the issue. In this quest, Adam Kilgarriff’s work stands out as he has considered many statistical methods. He concluded that “χ2 [chi-squared] is [...] a suitable measure for comparing corpora, and is shown to be the best measure of those tested” [7]. The χ2 test tells us whether the observed frequency of a given n-gram in one corpus is greater than its expected frequency.

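Oakes and Farrow pushed χ2 further by calculating the standardized residual [8], using this formula:

$$R = \frac{O - E}{\sqrt{E}}$$

where O is the observed frequency of an n-gram in a corpus and E is its expected frequency.
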
Dividing the difference between observed and expected frequencies by the square root of the expected frequency, this formula produces a positive or a negative value for each n-gram, by corpus. The greater the positive value, the more characteristic this n-gram is of that corpus. A large negative value means the opposite and a value close to zero means the word is equally present in both corpora.

Table 3 gives an example of how to calculate standardized residuals for three words taken from the media and non-media Facebook posts in Canada: travail (work), pandémie (pandemic) and recette (recipe).


Table 3: Standardized residuals calculations for three words from the Canadian subcorpus.
Column groups: Observed | Expected | Residuals
Sum of all words | 3,904,931 (media) | 10,455,525 (non-media) | 14,360,456 (total)


Table 3 illustrates how the expected frequency was computed: the word pandémie, for example, appears 24,363 times in the two subcorpora combined. Given that about 27.2 percent of all words are in the media subcorpus (3.9 million/14.4 million), the expected frequency of pandémie in this subcorpus is that share of 24,363, or 6,624.85.

Table 3 also shows that the standardized residuals indicated that the word pandémie was more distinctive of the media corpus, that the word recette rather defined the non-media corpus and that the word travail, with values very close to zero in both subcorpora, was characteristic of neither.
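The calculation can be retraced with a few lines of Python. The corpus sizes and the total for pandémie are the figures given above. The observed media count is a made-up placeholder used only to show the sign of the result, not a figure from the study.

```python
from math import sqrt


def expected_frequency(term_total, corpus_size, total_size):
    """Expected count of a term in one corpus, given its overall count."""
    return term_total * corpus_size / total_size


def standardized_residual(observed, expected):
    """(observed - expected) divided by the square root of expected."""
    return (observed - expected) / sqrt(expected)


MEDIA_WORDS = 3_904_931
NON_MEDIA_WORDS = 10_455_525
ALL_WORDS = MEDIA_WORDS + NON_MEDIA_WORDS

pandemie_total = 24_363
expected_media = expected_frequency(pandemie_total, MEDIA_WORDS, ALL_WORDS)
expected_non_media = expected_frequency(pandemie_total, NON_MEDIA_WORDS, ALL_WORDS)
print(round(expected_media, 2))  # 6624.85

# Placeholder observed count (assumption, for illustration only).
observed_media = 12_000
observed_non_media = pandemie_total - observed_media

residual_media = standardized_residual(observed_media, expected_media)
residual_non_media = standardized_residual(observed_non_media, expected_non_media)
```

With only two corpora, a term that is over-represented in one is necessarily under-represented in the other, so the two residuals always have opposite signs.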

Standardized residuals are a “statistical tool for exploratory research that allows for the identification of words that deserve deeper analysis, and not [...] an instrument of confirmatory analysis”, warned Bestgen (2014). Poudat and Landragin also pointed out that the χ2 test is based on the normal distribution. Insofar as most words are infrequent in most corpora, the χ2 test, and its residuals, are thus an “unreasonable” way to analyse text “unless you have an important volume of data” [9]. Given that a corpus containing more than one million words can be considered a “large corpus” [10], and that the smallest of my subcorpora contains 3.6 million terms (trigrams from Canadian media pages), I consider it reasonable to use χ2 residuals to compare media and non-media corpora as part of the second textual analysis in the Findings section, keeping in mind that they are only an “indicator of the potential interest of each of the numerous vocabulary differences between the corpora” (Bestgen, 2014).

I also chose to weight the χ2 residuals by the number of interactions. Instead of calculating them using the frequency of each term (lemma, bigram or trigram), I used the sum of the interactions of the posts in which they were found. As discussed earlier, this gives a measure that is better suited to the corpus used (Facebook posts).
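One possible reading of this weighting is to replace term frequencies with interaction sums everywhere in the residual calculation, including the corpus totals. The sketch below follows that assumption. The function name and the toy figures are hypothetical.

```python
from math import sqrt


def weighted_residuals(media_weights, non_media_weights):
    """Return {term: (media_residual, non_media_residual)}.

    Both arguments map a term to the summed interactions of the posts
    containing it. Both corpora are assumed to be non-empty.
    """
    media_total = sum(media_weights.values())
    non_media_total = sum(non_media_weights.values())
    grand_total = media_total + non_media_total

    residuals = {}
    for term in set(media_weights) | set(non_media_weights):
        obs_media = media_weights.get(term, 0)
        obs_non_media = non_media_weights.get(term, 0)
        term_total = obs_media + obs_non_media
        if term_total == 0:
            continue  # nothing to compare for this term
        exp_media = term_total * media_total / grand_total
        exp_non_media = term_total * non_media_total / grand_total
        residuals[term] = (
            (obs_media - exp_media) / sqrt(exp_media),
            (obs_non_media - exp_non_media) / sqrt(exp_non_media),
        )
    return residuals


# Toy interaction sums (not figures from the study).
media = {"pandémie": 900, "recette": 50, "travail": 300}
non_media = {"pandémie": 100, "recette": 950, "travail": 300}
scores = weighted_residuals(media, non_media)
for term, (r_media, r_non_media) in sorted(scores.items()):
    print(term, round(r_media, 1), round(r_non_media, 1))
```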

Analysis 3: Topic modeling

Finally, in order to further explore the differences between both subcorpora, topic modeling was the third textual analysis (Lin, 1995; Ramage, et al., 2009). This technique clusters words which often appear together in documents, therefore revealing that they are probably somewhat semantically related. The models used also score each word to weigh a given topic relative to others. In a way, topic modeling lets the corpus speak by itself, even though the model doesn’t label topics. It is our job, as researchers, to name them.

I used the BERTopic algorithm (Grootendorst, 2021), based on the recently released Bidirectional Encoder Representations from Transformers (BERT) model (Devlin, et al., 2019). BERTopic provides two sets of embedding models: one for the English language and the other for multilingual documents. Neither is specifically adapted to French-language documents.

It was however possible to ask BERTopic to use other models for word embeddings. I used three: the aforementioned “fr_core_news_md” model by spaCy, which included a French regional newspaper (L’Est républicain) in its training corpus; the “flaubert_large_cased” model from the FlauBERT project (Le, et al., 2020), which includes some Belgian newspapers in its training corpus (Tiedemann, 2012); and the “camembert-base” model from the CamemBERT project (Martin, et al., 2020), trained on the French subcorpus of the OSCAR corpus, which did not seem to include any news content.

I ran BERTopic four times on each of the eight subcorpora, with different parameters. The first run used spaCy’s model and asked BERTopic to provide 20 topics with 20 bigrams each. The remaining three runs (one with each model) asked BERTopic to provide 12 topics with eight single words or bigrams each. The use of three different models (spaCy, FlauBERT and CamemBERT) led me to believe the resulting analysis would be all the more robust.

This analysis was performed on the “raw” text of Facebook posts without lemmatization or removal of stopwords. Also, I did not weight the terms used by the algorithm by interactions. However, instead of running the algorithm on single words, which is a limitation of most topic modeling approaches [11], I used it with bigrams on my first run with the spaCy model. In the following three runs, I let BERTopic use unigrams or bigrams. Joining names (such as “emmanuel macron”) or entities (like “états unis” or “real madrid”) often provided richer topics, facilitating their interpretation.
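The four runs can be summarized as a small configuration table. The sketch below assumes that the number of topics, the number of terms per topic and the n-gram span map onto BERTopic's `nr_topics`, `top_n_words` and `n_gram_range` constructor arguments. How the study actually set them is not stated, and the `RUNS` list and `build_topic_model` helper are hypothetical. Loading each embedding model is left to the caller.

```python
# One entry per run, as described in the text.
RUNS = [
    {"embedding": "fr_core_news_md", "nr_topics": 20,
     "top_n_words": 20, "n_gram_range": (2, 2)},
    {"embedding": "fr_core_news_md", "nr_topics": 12,
     "top_n_words": 8, "n_gram_range": (1, 2)},
    {"embedding": "flaubert_large_cased", "nr_topics": 12,
     "top_n_words": 8, "n_gram_range": (1, 2)},
    {"embedding": "camembert-base", "nr_topics": 12,
     "top_n_words": 8, "n_gram_range": (1, 2)},
]


def build_topic_model(run, embedding_model):
    """Create a BERTopic instance for one run.

    `embedding_model` is an already loaded embedding object supplied by
    the caller. BERTopic is imported lazily so that the run table can be
    inspected without the library installed.
    """
    from bertopic import BERTopic

    return BERTopic(
        embedding_model=embedding_model,
        nr_topics=run["nr_topics"],
        top_n_words=run["top_n_words"],
        n_gram_range=run["n_gram_range"],
    )


for run in RUNS:
    print(run["embedding"], run["nr_topics"], run["n_gram_range"])
```

A model built this way would then be fitted on the raw text of one subcorpus with `fit_transform`.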

Topic modeling is extremely memory intensive for computers. With subcorpora as large as 37 million bigrams, in the case of non-media pages in France, I had to divide each by month, otherwise my code crashed. This strategy, though it meant each run took several hours, enriched the analysis, in the end, because it enabled me to explore one month at a time and tell the story of how topics in both media and non-media “spheres” within Facebook evolved over the year 2020.
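Splitting a subcorpus by month before modeling can be sketched as below. The input format, pairs of publication date and text, is an assumption made for this example.

```python
from collections import defaultdict
from datetime import date


def split_by_month(posts):
    """Group post texts under a 'YYYY-MM' key.

    `posts` is an iterable of (publication_date, text) pairs.
    """
    by_month = defaultdict(list)
    for published, text in posts:
        by_month[published.strftime("%Y-%m")].append(text)
    return dict(by_month)


toy_posts = [
    (date(2020, 1, 5), "bonne année"),
    (date(2020, 1, 20), "bon dimanche"),
    (date(2020, 2, 1), "bonne journée"),
]
monthly = split_by_month(toy_posts)
print(sorted(monthly))  # ['2020-01', '2020-02']
```

Each monthly list can then be modeled separately, which bounds memory use by the largest month rather than by the whole year.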




The media loves Facebook

The first surprise is the sheer amount of content Francophone news media published on Facebook. Table 4 shows the top 50 pages that published the most posts in 2020 in the four countries covered by this study; 42 of them were media pages!


Table 4: Top-50 French-language pages by number of posts in Belgium, Canada, France and Switzerland (2020); media pages are identified in the Type column.
Page | Facebook ID | Country | Number of posts | Type
RTL Info | RTLInfo | Belgium | 20,318 | media
Le Soir | lesoirbe | Belgium | 17,656 | media
Le Parisien | leparisien | France | 16,290 | media
La Libre | lalibre.be | Belgium | 14,673 | media
Le Nouvelliste | nouvelliste | Switzerland | 14,466 | media
La Parole Vivante | laparolevivante | Canada | 13,512 | religion
RTBF Info | rtbfinfo | Belgium | 13,338 | media
Le Vif | Leviflexpress | Belgium | 12,834 | media
Le Journal de Montréal | jdemontreal | Canada | 11,660 | media
TVA Nouvelles | TVAnouvelles | Canada | 11,041 | media
La Côte | LaCoteJournal | Switzerland | 9,998 | media
La Provence | laprovence | France | 9,726 | media
Le Figaro | lefigaro | France | 9,582 | media
Foot Mercato | footmercato | France | 9,464 | football
20 Minutes | 20minutes | France | 9,160 | media
Radio-Canada Information | radiocanada.info | Canada | 8,740 | media
Epoch Times Paris | EpochTimesParis | France | 8,537 | misinformation
Metro Belgique | metrobelgique | Belgium | 8,256 | media
La Presse | LaPresseFB | Canada | 8,252 | media
RTBF Sport | RTBFSport | Belgium | 8,167 | media
Tribune de Genève | tdg.ch | Switzerland | 8,145 | media
Ouest France | ouestfrance | France | 8,046 | media
Le Monde | lemonde.fr | France | 8,011 | media
beIN SPORTS France | beINSPORTSFrance | France | 7,789 | media
FRANCE 24 | FRANCE24 | France | 7,689 | media
RT France | RTFrance | France | 7,417 | misinformation
Le Journal de Québec | JdeQuebec | Canada | 7,338 | media
TF1 Le JT | TF1leJT | France | 6,345 | media
France Bleu | reseau.francebleu | France | 6,139 | media
Lions de l’Atlas | lionsdelatlas.net | Belgium | 6,099 | football
Horoscope du jour | horoscopedujour.ch | Switzerland | 5,803 | astrology
20 minutes online | 20minutesonline | Switzerland | 5,792 | media


This might be explained by the fact media organizations have complete teams dedicated to repurposing content for Facebook and other platforms (Bell, et al., 2017). But so do record companies, viral content producers, government departments and other organizations keen on maintaining a regular presence on Facebook. It seems fair to say that media are the single most active type of organization in the French-speaking areas of Facebook.

Nechushtai [12] has argued journalism organizations have been “infrastructurally captured” by Facebook. Not only do they depend on the Menlo Park company for reaching their readers, but they are driven to adapt “news production more broadly [...], to comply with the logics, norms, or business strategies of external platforms” [13]. In the United States, more recent studies report media organizations have been distancing themselves more from digital platforms. Tech giants’ increasing power, the greater scrutiny on their actions and the growing calls for regulation have translated into caution, even distrust, towards platforms. “Publishers are attempting to regain control over the future of their business” [14]. This movement was however not at all apparent in my subcorpora of French-language media posts. The capture of Francophone journalism by Facebook seems to be still strong and, dare I say, borders on rapture.

First glance at top non-media pages

Let’s take a look at what type of pages would remain if media pages were gone. For each country, I sorted non-media pages by number of posts, by number of interactions and by average interactions per post. All twelve tables would be too long to reproduce here (Tables 5 to 8, with the top-20 pages by average interactions per post, by country, are presented below). But what stands out from this exercise is that comedy and humor pages, artists and fan pages (for musicians most of the time) and what I’d call “feel good meme pages” are the most common.

That is not surprising, given that this content is the bread and butter of viral pages. The fact they appeared on top of the tables, sorted by interactions, was expected. More thorough textual analyses are needed to dig deeper into these corpora. Before doing that, though, I want to discuss three elements worthy of notice in top non-media pages.


Table 5: Top-20 non-media French-language pages in Belgium (with more than one post per day), by average interactions per post.
Page | Facebook ID | Number of posts | Interactions | Interactions/post | Type
Merveille du monde | merveilledumondejaninevero | 1,565 | 21,344,119 | 13,638.4 | Feel good memes
Le Grand Cactus - RTBF | LeGrandCactus | 371 | 4,232,858 | 11,409.3 | TV show
The Voice Belgique - RTBF | thevoicebelgique | 370 | 4,179,007 | 11,294.6 | TV show
Brigade des nurses | lesinfirmieres | 579 | 3,630,483 | 6,270.3 | Comedy
Jérôme de Warzée | jeromedewarzee | 406 | 2,481,480 | 6,112.0 | Artist
Guillaume Corpard | GuillaumeCorpard.officiel | 514 | 2,556,520 | 4,973.8 | Artist
Gregory Lemarchal | gregorylemarchal112 | 519 | 2,461,591 | 4,742.9 | Fan page
Pairi Daiza | JardinDesMondes | 1,032 | 4,637,333 | 4,493.5 | Animals
Tarmac | TarmacRTBF | 836 | 3,624,764 | 4,335.8 | TV show
Bisoutendresse | bisoutendresse | 2,045 | 8,666,761 | 4,238.0 | Feel good memes
Samuel Movie | ActeurSamuelMovie | 1,128 | 4,527,459 | 4,013.7 | Artist
Pape François | PapeFrancoisVatican | 711 | 2,808,529 | 3,950.1 | Religion
David Antoine | DavidAntoine | 906 | 3,249,951 | 3,587.1 | Artist
Woman and material & Aimee Virgile Makougoum | Artisteaucameroun | 475 | 1,356,705 | 2,856.2 | Artist
PTB | ptbbelgique | 904 | 2,138,859 | 2,366.0 | Political party
Fuck Love | IHateYouAndOnlyU | 391 | 887,657 | 2,270.2 | Comedy
Djanii Alfa | Djanii.X | 447 | 963,807 | 2,156.2 | Artist
Cristiano Ronaldo - France | Ronaldo7France | 987 | 1,942,312 | 1,967.9 | Fan page



Table 6: Top-20 non-media French-language pages in Canada (with more than one post per day), by average interactions per post.
Page | Facebook ID | Number of posts | Interactions | Interactions/post | Type
Justin Trudeau | JustinPJTrudeau | 480 | 6,973,554 | 14,528.2 | Political figure
François Legault | FrancoisLegaultPremierMinistre | 647 | 5,366,866 | 8,295.0 | Political figure
African Heroes | africanheroesmagazine | 1,171 | 7,013,702 | 5,989.5 | Special interest
Indochine officiel | Indochineofficiel | 402 | 1,597,326 | 3,973.4 | Artist
FFL - Fédération Française de la Lose | FFLose | 916 | 3,083,841 | 3,366.6 | Comedy
Éric Duhaime | eduhaime | 385 | 1,263,369 | 3,281.5 | Political figure
La parfaite maman cinglante | laparfaitemamancinglante | 1,153 | 3,604,682 | 3,126.4 | Special interest
La solution est en vous | lasolutionestenvous | 970 | 2,935,008 | 3,025.8 | Feel good memes
Ministère Paul Mukendi | ministerepaulmukendi | 418 | 1,223,633 | 2,927.4 | Religion
Occupation Double | occupationdouble | 621 | 1,088,219 | 1,752.4 | TV show
Gabriel Nadeau-Dubois | GNadeauDubois | 421 | 720,253 | 1,710.8 | Political figure
Queen Fumi | queenfumiofficiel | 419 | 712,755 | 1,701.1 | Artist
EMCI TV | emcitv | 1,316 | 2,207,163 | 1,677.2 | Religion
Québec Niaiseries | quebecniaiseries1 | 598 | 951,948 | 1,591.9 | Comedy
La Parole Vivante | laparolevivante | 13,512 | 21,379,356 | 1,582.2 | Religion
Ricardo Cuisine | ricardocuisine | 494 | 752,572 | 1,523.4 | Food
LES ETHNIES DE LA COTE D’IVOIRE ET D’AFRIQUE | ethniesdeCI | 1,101 | 1,643,412 | 1,492.7 | Special interest
District 31 | ICIDistrict31 | 868 | 1,252,896 | 1,443.4 | TV show
Le Revoir | JournalLeRevoir | 900 | 1,298,617 | 1,442.9 | Parody



Table 7: Top-20 non-media French-language pages in France (with more than one post per day), by average interactions per post.
Page | Facebook ID | Number of posts | Interactions | Interactions/post | Type
Le Meilleur du Football | LemeilleurduFootball92 | 3,691 | 146,312,371 | 39,640.3 | Football
Mathieu Rivrin • Photographe de Bretagne | Mathieu.Rivrin.photographies | 368 | 7,120,390 | 19,348.9 | Artist
Les marseillais W9 | LesMarseillaisW9 | 368 | 6,434,885 | 17,486.1 | TV show
Nostalgies 60’–70’–80’ | Nostalgies607080 | 692 | 12,095,023 | 17,478.4 | Nostalgia
30 Millions d’Amis (Officiel) | 30millionsdamis | 663 | 10,400,103 | 15,686.4 | Animals
Marine Le Pen | MarineLePen | 604 | 8,944,603 | 14,808.9 | Political figure
Né pour brûler la gomme | Nepourbrulerlagomme | 516 | 7,230,470 | 14,012.5 | Special interest
Imineo | imineoTV | 505 | 6,939,097 | 13,740.8 | Special interest
One Voice | onevoiceanimal | 799 | 10,309,383 | 12,902.9 | Animals
Madame Connasse | Madameconnasse0987654321 | 2,879 | 36,341,190 | 12,622.9 | Comedy
La vraie démocratie | lavraiedemocratie | 1,071 | 13,300,634 | 12,418.9 | Reinformation
Corsica ile Magique | CorsicaileMagique | 669 | 8,296,473 | 12,401.3 | Tourism
L214 Ethique et Animaux | l214.animaux | 383 | 4,663,064 | 12,175.1 | Animals
Génération 80’s | mageneration80 | 842 | 10,143,087 | 12,046.4 | Nostalgia
M6 | M6 | 870 | 10,106,366 | 11,616.5 | TV channel
Nostalgie Du Football | NostalgieDuFootball | 524 | 6,074,626 | 11,592.8 | Football
Le Pépère | lepepere | 752 | 8,554,827 | 11,376.1 | Comedy
JEAN-MARIE BIGARD PAGE OFFICIELLE | levraijeanmariebigard | 1,197 | 13,259,818 | 11,077.5 | Artist



Table 8: Top-20 non-media French-language pages in Switzerland (with more than one post per day), by average interactions per post.
Page | Facebook ID | Number of posts | Interactions | Interactions/post | Type
Les temps sont durs pour les rêveurs | bureaudesrevesbrises | 410 | 2,682,193 | 6,541.9 | Comedy
Trust My Science | TrustMyScience | 2,579 | 5,641,255 | 2,187.4 | Special interest
La Bible | Bibles | 2,459 | 2,995,389 | 1,218.1 | Religion
Darius Rochebin | dariusrochebin | 371 | 320,604 | 864.2 | Journalist
Forum Économique Mondial | weffrancais | 2,643 | 1,171,794 | 443.4 | International organization
Anthony Desonpere | ADeSonPere | 560 | 228,334 | 407.7 | Political figure
Hôpitaux Universitaires de Genève | hopitaux.universitaires.geneve | 456 | 174,366 | 382.4 | Institution
Te rappelles-tu | rappelletoi | 5,194 | 1,934,293 | 372.4 | Nostalgia
Les souvenirs de Dakar | lessouvenirsdusenegal | 520 | 189,883 | 365.2 | Special interest
Ambassade de Suisse en France | AmbassadeSuisseParis | 408 | 147,891 | 362.5 | Institution
Ville de Neuchâtel | neuchatelville | 724 | 259,582 | 358.5 | Institution
FCBarcelone French | FCBarceloneFre | 611 | 216,280 | 354.0 | Football
Ville de Genève - Officiel | villegeneve.ch | 397 | 122,243 | 307.9 | Institution
Animal mon Egal | animal.mon.egal | 947 | 260,594 | 275.2 | Animals
Objectif Fitness | ObjectifFitnessOfficiel | 473 | 124,453 | 263.1 | Feel good memes
Servette FC | servettefootballclub | 973 | 185,946 | 191.1 | Football
228 & AfrikaExclu | 228exclu | 1,243 | 214,458 | 172.5 | Special interest


First, few misinformation or “reinformation” pages were found. The only exceptions were “Epoch Times Paris”, “RT France” and “La vraie démocratie” (in France), as well as “L’Anti-Média” (in Québec). Between August and November 2020, Facebook said it had removed “about 3,200 Pages [...] for violating our policy against militarized social movements [and] about 3,000 Pages [...] for violating our policy against QAnon” (Facebook, 2021). It said it had also removed pages for otherwise “coordinated inauthentic behavior”. Many such pages had been publishing content in French in the four countries covered by this study (Belga, 2020; Facebook, 2020a; Gleicher and Agranovitch, 2020; Loiseau, 2020).

Second, apart from clickbait and artists/celebrities pages, religious content appeared to be what generated the most engagement among Francophone Facebook users. For example, “La Parole vivante”, publishing Biblical quotes and memes, was the page generating the most interactions in French Canada, media or non-media categories alike. Throughout 2020, Facebook users had shared, reacted or commented on its content more than 21 million times, or two interactions every three seconds. A page on Pope Francis, created in Belgium, attracted almost 4,000 interactions per post through the year. The fact that Pastor Paul Mukendi had been convicted of sexual assault in 2019, sentenced to eight years behind bars in 2020 and was awaiting a second trial, still for sexual assault (Bergeron, 2020), did not deter his followers, as his page was the ninth most engaging non-media page in French Canada. We’ll see later that religious content is not only common in, but characteristic of, the remainder of the non-media corpora.

Third, political figures are largely absent from the top non-media pages. The main exception is Canada, where the official pages of Justin Trudeau and François Legault, respectively Prime Minister of Canada and Premier of Québec, occupied the first two spots in terms of interactions per post. This might be because both politicians held almost daily publicly broadcast press conferences during the COVID-19 pandemic, something heads of state or government in the other three countries have seldom done.

No news is feel good news on Facebook

After extracting n-grams for all eight subcorpora, I sorted them by the total number of interactions generated by the posts that they were found in. I kept only the top-1,000 in each and then put them all back together in three lists, each made of 8,000 lemmas, bigrams or trigrams which have been associated with content garnering the most attention in French-language Facebook pages in 2020.

What is most relevant, here, are not the lists themselves. They tell us, for example, that “cristiano ronaldo” is, after “timeline photo” and “photo from”, the top bigram in the French non-media corpus, appearing in 7,370 posts having elicited close to 54 million interactions in 2020.

More interesting was to inquire which n-grams present in non-media were absent from media. Bear in mind that we had limited ourselves to the top-1,000, so “absent” only meant absent from that particular list, not from Facebook. But here is precisely why this exercise is relevant. We had kept only the most significant expressions according to Facebook’s own logic, where shares, likes and comments are used to elicit users’ emotions and keep them coming back. We can thus say that we were beginning to shed light on what would be characteristic of a newsless Facebook. Tables 9 and 10 provide examples.
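The comparison can be expressed with set operations. This sketch assumes that the media and non-media top lists are compared within each country, and that only n-grams surviving in every country are kept. The function names and the toy data are hypothetical.

```python
def top_k(weights, k):
    """Return the k terms with the highest interaction weight, as a set."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    return {term for term, _ in ranked[:k]}


def newsless_ngrams(by_country, k=1000):
    """Find n-grams in the non-media top-k but not the media top-k, in every country.

    `by_country` maps a country to {"media": {...}, "non_media": {...}},
    where each inner dictionary maps an n-gram to its interactions.
    """
    per_country = [
        top_k(corpora["non_media"], k) - top_k(corpora["media"], k)
        for corpora in by_country.values()
    ]
    return set.intersection(*per_country) if per_country else set()


# Toy data for two countries, with k lowered to 2.
toy = {
    "A": {
        "media": {"covid": 9, "trump": 8, "bon journée": 1},
        "non_media": {"bon journée": 9, "dieu": 8, "covid": 1},
    },
    "B": {
        "media": {"covid": 9, "élection": 5},
        "non_media": {"bon journée": 7, "covid": 6, "dieu": 2},
    },
}
print(newsless_ngrams(toy, k=2))  # {'bon journée'}
```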


Table 9: Top five lemmas from non-media corpora, absent from media corpora, and present in all four countries.



Table 10: Top five bigrams from the non-media corpora, absent from the media corpora, and present in all four countries.
Bigram | Number of posts | Interactions
bon journée | 10,133 | 22,199,695
bel journée | 8,518 | 20,319,814
bon dimanche | 4,753 | 9,531,666
chaîne youtube | 5,492 | 9,504,084
jésus christ | 8,279 | 9,493,523


In the case of trigrams, the top of the list was occupied by “bon week end” (have a good weekend). This, and the expressions “bon/bel (lemmatized “belle”) journée” (have a nice day), “bon dimanche” (wishing someone a nice day, but on a Sunday), “aime” (love in the first or third person singular), “heureux” (happy), “bonheur” (happiness) and “3” (emoji code for a heart, “<3”, stripped of the less-than character in the preprocessing phase), reflect how much Facebook was built on feel good memes of people wishing their loved ones a good day with teddy bears, sparkling hearts and rainbows on pink backgrounds. Figure 1 gives a typical example.


Figure 1: Feel good meme post by the “Une pomme par jour” (An apple a day) page posted on 8 December 2020. The message reads, “Life is short. Spend it with those who make you happy. Have a nice day.”


Facebook calls these “inspirational posts”. In the Discussion, we’ll see why the Menlo Park company has publicly favored this type of post over news content since 2018.

Two other keywords that stood out in the top non-news content were “dieu” (god) and “jésus christ”. The prevalence of religious content was one of the main surprises of this study, given the secular tradition of France, which also permeates French-language societies in Canada, Belgium and, to a lesser extent, Switzerland (Willaime, 2017). We will discuss it further below, as the other textual analyses performed on our corpora will continue to reveal the contents of a newsless Francophone Facebook.

God is everywhere

Twelve computations of weighted χ2 residuals were executed, one for each of the n-gram types extracted (lemmas, bigrams and trigrams) in every country. The top-50 results for each were then joined in two tables, one for those most characteristic of media pages, the other for those most characteristic of non-media pages. If a result appeared in two countries or more, it was included in Figures 2 and 3.
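The joining step can be sketched as an aggregation over per-country score tables. This minimal example assumes each country contributes a dictionary of its top terms and their residual scores, and reports, for terms found in at least two countries, the summed score and the number of countries. The names and figures are illustrative.

```python
from collections import defaultdict


def cross_country_terms(top_by_country, min_countries=2):
    """Aggregate per-country top terms into (term, summed score, countries) rows.

    `top_by_country` maps a country to {term: residual_score}, each
    dictionary being already limited to that country's top results.
    """
    totals = defaultdict(float)
    countries = defaultdict(int)
    for scores in top_by_country.values():
        for term, score in scores.items():
            totals[term] += score
            countries[term] += 1
    kept = [
        (term, totals[term], countries[term])
        for term in totals
        if countries[term] >= min_countries
    ]
    return sorted(kept, key=lambda row: row[1], reverse=True)


# Toy scores (not figures from the study).
toy = {
    "Belgium": {"coronavirus": 10.0, "recette": 2.0},
    "Canada": {"covid-19": 12.0, "coronavirus": 5.0},
    "France": {"coronavirus": 8.0, "covid-19": 3.0},
}
for row in cross_country_terms(toy):
    print(row)
```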


Figure 2: Most characteristic terms in posts from French-language media pages appearing in two or more of the four countries (Belgium, Canada, France and Switzerland), by sum of χ2 residual scores weighted by interactions (2020). The number to the right of each bar is the number of countries in which the term was found.



Figure 3: Most characteristic terms in posts from French-language non-media pages appearing in two or more of the four countries (Belgium, Canada, France and Switzerland), by sum of χ2 residual score weighted by interactions (2020). The number to the right of each bar is the number of countries in which the term was found.


First, it might look surprising that the events generating the most attention in French-language media, apart from the COVID-19 pandemic, occurred in the United States. Terms related to the George Floyd murder, and ensuing demonstrations, and to the presidential election were among the most characteristic of media pages in the Francophone areas of Facebook. But that is predictable, given that “global news flows are dominated by Anglo-Saxon media” [15]. The dominance of news from the United States had also been observed in a content analysis of Instagram posts by Francophone news organizations from 2011 to 2020 (Roy, 2021b).

Unsurprisingly, “coronavirus” and “covid-19” were by far the top two terms most characteristic of media pages. This was in line with the formidable amount of coverage devoted to the pandemic in 2020 by journalists worldwide. “Coronavirus” was the number one lemma characteristic of media pages in France, Belgium and Switzerland, while “covid-19” was number one in Canada.

This result also meant COVID-19 had not been characteristic of non-media pages. That was a little more unexpected, given the conventional wisdom according to which misinformation about the pandemic had been rampant on Facebook. Figure 3 shows no term related to any conspiracy theory having circulated in 2020 (vaccines, chloroquine, 5G technology, China, Bill Gates, etc.). Some related expressions were found in individual countries’ top-50 terms. The bigram “coronavirus confinement” was the 49th most characteristic of non-media pages in Belgium. In Canada, “presse covid-19” and “anti media” were found in the 38th and 39th positions. But these were the only traces of probable misinformation in content having generated the most interactions in French-speaking Facebook.

What was much more typical of non-media pages was content of little public interest (we will define this notion in the Discussion, below). After “photo”, the highest scoring term appearing in all four countries was “être” (to be). The dominance of that verb, the most common in the French language, was difficult to interpret. If it were as common in media pages, it would show up in neither Figure 2 nor Figure 3. Several reasons may explain why it was more characteristic of non-media pages.

One hypothesis: this indicates that news content was more impersonal and related less often to well-being than what could be found in non-media Facebook pages. It may also stem from the fact that Facebook and other social networks answer a need to self-represent (Nadkarni and Hofmann, 2012). In a study with more than 15,000 participants, Bastard, et al. (2017) developed a taxonomy of Facebook users. Only 17 percent were what the authors call “non active users” who preferred to lurk instead of participating in conversations with others or publishing content themselves. “For these users, Facebook isn’t a tool for self-expression or self-representation” [16]. Most people used Facebook to project what they wished to be. Thus, a second hypothesis: non-media content caters to that need.

The words “recette” (recipe) and “concours” (contest) appearing rather high in Figure 3, along with the bigrams “bon chance” (good luck, with “bonne” being lemmatized) and “huile olive” (olive oil), gave another indication of the type of content characterizing non-media pages on Facebook. Recipes and contests were surely of value to the social network’s users, but of little public interest value.

The word “temps”, which can mean either time or weather, had the highest χ2 residual score even though it appeared in the top-50 list in only two countries (France and Belgium). In which sense was it used in non-media pages? Weather information is a staple of news content. Some weather-related terms should therefore appear in the n-gram lists specific to media pages. Yet there were none. This meant that weather-related terms were found in proportionate amounts in non-media pages and that they were not specific to non-news Facebook content. So it was in the sense of “time” that “temps” was characteristic of non-media pages.

Besides, “prendre temps” (take [your, the] time) ranked 34th in the list of non-media bigrams in Canada. This expression was another fixture of feel good memes, along with “joyeux anniversaire” (happy birthday), “prendre soin” (take care) and others mentioned earlier in this section.

Further terms of little value outside of Facebook that seemed to characterize non-media pages are “partager” (to share), “youtube”, “instagram”, and other calls to action. Also, terms that described a given post and that were untranslated, like “photo from” and “timeline photo”, appeared in my sample only because they were automatically generated when an administrator created a post on a Facebook page.

More significantly, “God” is a lemma that appeared in all four countries’ lists of terms most specific to non-news content. It had been in posts generating close to 123 million interactions in 2020. Only “coronavirus” generated more in media pages during the year that the COVID-19 pandemic began! “Seigneur” (Lord) was also there, having made the top-50 list in two countries. Although it does not appear in Figure 3, “allah” was the 26th most specific lemma in non-media content in Belgium.

The word “dieu” is part of many common French expressions. “Mon dieu!” is as prevalent as “Oh my god” in English. But closer inspection revealed that it was indeed in faith-based pages that the word was used most often and where posts collected the most interactions. Many of those pages have over a quarter million subscribers: Pastor Yvan Castanou’s or “Un miracle chaque jour” (A miracle a day) in France, “Darifton & compagnie”, a Muslim community in Belgium, “La Bible”, a Swiss page carpet bombing the social network with biblical memes, or “EMCI-TV”, a Christian evangelical television network based in Waterloo, Québec, broadcasting in Europe and Africa.

The Internet is a “mission field” [17] for which Facebook is well tailored. “Religious traditions [...] encourage the use of Facebook for proselytizing” [18]. Facebook’s algorithm may thwart these efforts by enclosing its users in filter bubbles (Pariser, 2011), making it difficult to preach to those not already converted. Yet, Brubaker and Haigh found in a survey that they conducted in 2017 that “Facebook use for religious purposes is primarily motivated by the need to minister to others” [19]. More than simply sharing information about themselves, devout Facebook users are motivated to share information about their religion: “They reach out to and uplift others by engaging in faith-based conversations and uploading faith-based messages” [20].

Figure 4 provides one example. It was posted by TopChrétien, a page based in France with more than a million subscribers. Posted in April, this video was a half-hour sermon by Singapore preacher Joseph Prince dubbed in French. It had 60,000 views at the time of data collection.


Figure 4: Repost of an English-language evangelical sermon by a French-language Christian page.


Churches can also use Facebook to expand. Kgatle (2018) documented the use of the social network in the emergence of prophetic churches in southern Africa and found it played a “major role” [21]. Apart from sharing messages, it provides a platform to organize and advertise online events and services, as well as a way for the faithful to attend those services virtually live or later on Watch.

We will see in the next subsection that non-media corpora were more varied, but that religious content was still very much present.

Topic modeling: The complete diet of a newsless Facebook

Topic modeling provided a finer, more granular analysis of content posted in the four countries studied during the year. It generated approximately 72,000 terms in more than 5,000 topics.


Figure 5: Selection of terms appearing in the clearest topics identified per country every month during 2020, in both media and non-media subcorpora; the length of each bar shows frequency, or the number of posts included in a given topic.


Figure 5 synthesizes the main interpretable [22] topics by month and country in both media and non-media subcorpora. It paints a broad picture where content found in media pages generally relates to current events, mainly the COVID-19 outbreak, while what is published on non-media pages seems to court Facebook users with easy-to-produce and enticing content (recipes, astrology, self-help, among others).

However, the difference between media and non-media was not as clear cut as what was revealed by the χ2 residuals analysis. This may be the main takeaway from the topic modeling phase of this study. The models produced topics common to both subcorpora, most notably sports, which seemed almost as commonplace in media pages as they were in non-media pages.

Topic modeling also demonstrated that the pandemic was not confined to media pages. It was discussed in non-media pages throughout the year, most particularly in March. That being said, it seemed that the non-media pages dealing with the pandemic were mostly government agencies, as terms were related to rules, like “masque obligatoire” (mandatory mask) or “rassemblements” (gatherings), along with the name of an agency, such as “conseil fédéral” (Federal Council, the executive branch of government in Switzerland), or invitations to call a number to be informed on sanitary measures. Sometimes, COVID topics in non-media pages were more personal in tone, inviting Facebook users to “prendre soin” and “soin autres” (to take care of others), providing advice, such as “nettoyer légumes” (wash vegetables), or telling them how to “vivre normalement” (live normally) despite the confinement.

Occasionally, a few other current events topics appeared in the non-media subcorpora. Apart from sports, as mentioned earlier, extreme weather events appeared most frequently. Storms Ciara and Dennis, in northern Europe, along with the fires in Australia — probably because of all the cute koala pictures (“koala” being a term also present in those topics) — topped the list. Three other topics that generated much attention worldwide were included in a few topics in non-media subcorpora: the U.S. presidential election; the Black Lives Matter demonstrations following the murder of George Floyd; and the beheading of teacher Samuel Paty, in a northern suburb of Paris, for having shown cartoons of Muhammad in class.

Were the posts associated with those topics denouncing BLM demonstrations or supporting them? Were they condemning the assassination of Samuel Paty, or condoning it? The terms picked up by the models did not enable us to tell which sentiments were associated with those posts. Only in Donald Trump’s case was it clearer. The terms “voler élection” (steal election), “doit vivre” (must live) and “monde parallèle” (parallel world) indicate the posts seemed to ridicule the former president’s allegations of voter fraud.

Celebrity news was another topic that proved a standard of both media and non-media pages, even though it couldn’t be classified as “current events”. French rock singer Johnny Hallyday was present in many topics (he died in 2017!). So were reality TV stars from shows such as “Occupation Double” in Canada or “Koh Lanta” in Europe. According to one Swiss non-media topic in August, the “petit maillot” (small bathing suit) of a participant of the latter program makes “craquer internautes” (break the Internet).

Most topics, though, were relatively unique to non-media pages. Recipes have been found in topics in all countries almost every month. Typical terms are “recette facile” (easy recipe) with ingredients, like “beurre” (butter) or “pommes terre” (potatoes), to make “repas noël” (Christmas meal) or “délicieux gâteau” (delicious cake). Some self-help topics, mostly to “perdre poids” (lose weight), have appeared once in a while, but less often.

In Switzerland, astrology was surprisingly dominant in non-media pages. The non-media page that posted the most in French in Switzerland was “Horoscope du jour” (Today’s horoscope). Terms such as “compatibles amour” (compatible zodiac signs for love) or “prédisent astres” (what are the stars predicting), for example, have defined one to five topics (out of 12) almost every month. In January alone, more than 1,950 Swiss non-media posts mentioned “horoscope”. Of course, not a single post predicted the COVID-19 pandemic.

But recipes and horoscopes sometimes made up topics in media pages, albeit rarely. Only two types of content were unique to the non-media francophone Facebook, thus confirming the findings of the first two textual analyses: faith-based and, to a lesser extent, “feel good” content.

To be more precise, it was Christian and Islamic content that appeared regularly in topics, mostly in Canadian and Belgian subcorpora. The expression “jésus christ”, for example, appeared in no fewer than 71 topics throughout the year in the non-media subcorpus, and not once in the media subcorpus. Terms like “nom jésus” (name of Jesus), “seigneur jésus” (lord Jesus) or “prière” (prayer) also appeared in dozens of topics unique to non-media subcorpora. Those topics were sometimes the most salient of the month in some Canadian subcorpora. In some topics, the expression “lavictoiredelamour” was present. “La victoire de l’amour” is the name of a Sunday television program on TVA, Québec’s most watched private TV network. The popularity of the network itself does not explain why religious content had been highlighted so frequently by topic modeling. A more probable hypothesis could be that content posted by this program had been shared and reposted by many other pages on Facebook and therefore appeared more often in my final sample.
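
A tally of this kind, counting the topics in which a term appears in each subcorpus and isolating the terms that never surface on the other side, can be sketched as below. The record layout (`country`, `month`, `subcorpus`, `terms`), the function names and the three toy topics are assumptions made for illustration; they are not the study's data or code.

```python
from collections import Counter


def topic_counts(topics, term):
    """Count, per subcorpus, the topics whose term list contains `term`."""
    return Counter(t["subcorpus"] for t in topics if term in t["terms"])


def exclusive_terms(topics, subcorpus):
    """Terms found in topics of `subcorpus` and in no topic of any other subcorpus."""
    inside, outside = set(), set()
    for t in topics:
        (inside if t["subcorpus"] == subcorpus else outside).update(t["terms"])
    return inside - outside


# Toy records, invented for the example.
topics = [
    {"country": "Canada", "month": 4, "subcorpus": "non-media",
     "terms": ["jésus christ", "prière"]},
    {"country": "Belgium", "month": 4, "subcorpus": "non-media",
     "terms": ["recette facile", "beurre"]},
    {"country": "France", "month": 4, "subcorpus": "media",
     "terms": ["coronavirus", "recette facile"]},
]

jesus_by_subcorpus = topic_counts(topics, "jésus christ")
only_non_media = exclusive_terms(topics, "non-media")
```

In this toy input, “recette facile” is dropped from the exclusive set because it also appears in a media topic, mirroring the observation that recipes occasionally surfaced in media pages.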

In Belgium, Muslim content was regularly featured in top topics throughout the year, with terms adapting to events in the Islamic calendar, like “tarawih” and “iftar” during Ramadan, for example. In some topics, terms appear in Arabic, such as “عليه وسلم” (peace be upon Him).

In the “feel good” category, terms such as “gros bisous” (hugs and kisses), “beaux rêves” (sweet dreams), along with expressions highlighted in χ2 residuals such as “bonne journée” (have a good day), turned up in many topics of all non-media subcorpora.

One last element that stood out was that only two, maybe three topics could be attributed to disinformation, which was very little. In June, in the Canadian non-media subcorpus, one topic included terms such as “donald trump”, [French president] “emmanuel macron” along with “oui avez”, “avez bien” and “bien lu” (yes, you’ve read correctly). The use of the second person, engaging the reader by talking directly to him or her, is a telltale sign of misinformation [23]. After swearing, it was the linguistic feature most often associated with fake news. It appeared almost seven times as much in fake news articles as in trusted sources. In November, in the non-media Switzerland subcorpus, another topic contained terms related to vaccines along with “expérience incroyable” (incredible experience or experiment). Rashkin and her colleagues also found that the use of superlatives could be a sign of misinformation, although not as clear as the use of the second person.

In March and November, the terms “chloroquine” and “raoult” appeared in French and Swiss non-media subcorpora. Didier Raoult, a Marseille-based doctor, had become a household name in much of the French-speaking world after publishing an article in the International Journal of Antimicrobial Agents. It claimed that “hydroxychloroquine treatment is significantly associated with viral load reduction/disappearance in COVID-19 patients and its effect is reinforced by azithromycin” (Gautret, et al., 2020). It stirred a debate in the French press, amplified by the subsequent endorsement of the treatment by then U.S. president Donald Trump. But it was difficult to attribute these topics to misinformation as Raoult’s name also appeared in three topics from the media subcorpus during the year. On closer analysis, it appears Mr. Raoult had been mentioned in almost 4,600 posts from the French subcorpus, 63 percent of which were posted in non-media pages. Among those pages, many were known for misinformation, such as “RT France”, “Epoch Times Paris”, “Gilets Jaunes Infos” or “L’Eveilleur Quantique”. One of the posts generating the most attention was a video (Figure 6) posted by “La vraie démocratie”, a page considered “one of the most influential francophone misinformation pages on Facebook” (Conspiracy Watch, 2020). The video has more than 178,000 total interactions and almost 3.5 million views as of July 2021. But not all pages mentioning Mr. Raoult could be labeled misinformation. Besides, the page having quoted him most often in 2020 (161 times) is the Marseille daily La Provence. His name also appeared in 34 posts by Le Monde. Raoult may certainly be synonymous with debate and controversy, but not necessarily with disinformation.
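
The closer analysis described above amounts to a simple keyword tally. A minimal sketch follows, assuming each post is a record with hypothetical `page`, `is_media` and `text` fields; the sample posts and page names are invented and the figures it yields are unrelated to those reported in this study.

```python
from collections import Counter


def mention_stats(posts, keyword):
    """Return (posts mentioning keyword, share posted by non-media pages, most frequent page)."""
    hits = [p for p in posts if keyword in p["text"].lower()]
    if not hits:
        return 0, 0.0, None
    non_media = sum(1 for p in hits if not p["is_media"])
    top_page, _ = Counter(p["page"] for p in hits).most_common(1)[0]
    return len(hits), non_media / len(hits), top_page


# Invented posts for illustration only.
posts = [
    {"page": "Journal A", "is_media": True, "text": "Le professeur Raoult répond"},
    {"page": "Journal A", "is_media": True, "text": "Raoult et la chloroquine"},
    {"page": "Page B", "is_media": False, "text": "RAOULT a raison !"},
    {"page": "Page B", "is_media": False, "text": "Bonne journée à tous"},
]

n_posts, non_media_share, top_page = mention_stats(posts, "raoult")
```

A plain substring match is used here for brevity; it would also catch the keyword inside longer words, so a real tally would likely match on tokens instead.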


Figure 6: Post by reinformation page “La vraie démocratie” on 28 October 2020. It is a video excerpt of an interview of Dr. Didier Raoult aired on all-news channel LCI on 27 October. The all-caps text above the still frame reads: “You’ve all gone bonkers!” In the message text above, Raoult is quoted as saying: “In the end, will we all be locked up for the rest of our lives because there are viruses outside?”


This study should not be interpreted as demonstrating there was no disinformation on Facebook. It shows disinformation has not been a salient topic in Francophone areas, even when one subtracts media content from the social network. This does not mean that there is no fake news in French on Facebook; it only means it is not one of its main ingredients.

A great deal of the literature on fake news has focused on the United States. Perhaps it has skewed perceptions. Humprecht, et al. found that it was “the most vulnerable country regarding the spread of online disinformation” [24]. In their cross-national study of “resilience to online disinformation”, they clustered 18 countries in three groups. The U.S. was in a group all by itself, “characterized by high levels of populist communication, polarization, and low levels of trust in the news media” [25]. France was not included in this study, but Belgium, Canada and Switzerland, countries “with consensus political systems, strong welfare states, and pronounced democratic corporatism” [26], had been classified in the cluster most resilient to disinformation. The findings of this study support the conclusions of Humprecht and her colleagues.



Discussion and conclusion

All three textual analyses converged. The differences that they’ve helped pinpoint between media and non-media content on Facebook boiled down to a classical opposition between “public interest” and “the public’s interest”.

The concept of “public interest” could be synonymous with “common good”, “public service” or “general welfare”. It is something many professions vow to defend: public servants, politicians, lawyers, physicians, journalists, to name just a few.

In journalism, public service is a core ideal-typical value and a “powerful component of journalism’s ideology” [27]. Many journalists see themselves as working for the public first and foremost, the media organization employing them being second in the order of their loyalties. That their “primary commitment is to the public” is a “deeply felt tradition” (Kovach and Rosenstiel, 2014).

But it isn’t only for the public’s sake that journalists strive to serve it. The public service ideal goes hand in hand with the democratic ideal. “Journalists think of journalism as a service in the public interest, one that is shaped with an eye toward the needs of healthy citizenship” [28]. It can be argued that Western journalism sees “the role of the press as serving the public interest by providing the informational needs necessary for [a] self-governing republic” [29]. In doing so, “Journalists earn public trust and protection by loyally performing their public service duty to citizens” [30].

To fulfill this duty, normative as it may be, media organizations have historically found ways to reach the public that they aim to serve. They’ve done so by selling newspapers, by broadcasting radio and television news programs, by creating Web sites. In the last decade, social networks have become an increasingly important channel to reach the public. It is a channel media organizations don’t control. It is a channel where Facebook products (the original Facebook platform, Instagram, as well as messaging applications Messenger and WhatsApp) are by far the most used by citizens throughout the world to access news, according to the latest versions of the Reuters Institute Digital News Report (Newman, et al., 2021, 2020, 2019).

Facebook’s role goes beyond that of a neutral, passive platform. Beginning in 2013, it has actively courted news organizations, inviting them not only to share content on Facebook, but to create content specifically designed for it (Mattelart, 2020). Without ever explicitly recognizing it, Facebook has taken the public service duty to inform citizens upon itself too.

Facebook seemed to embrace that role in a 2017 manifesto following the 2016 U.S. presidential election. The social network’s founder pledged that his company’s “next focus will be developing the social infrastructure for community” (Zuckerberg, 2017). Among the five goals he described for the future of Facebook was that it should be “informing us”.

Although Zuckerberg wrote that “a strong news industry is [...] critical to building an informed community”, many decisions taken afterwards ran counter to the 2017 manifesto. Barely a year later, in a bid “to encourage meaningful social interactions with family and friends”, the CEO announced that his users will “see less public content, including news [...]. After this change, we expect news to make up roughly 4% of News Feed — down from roughly 5% today” (Zuckerberg, 2018).

In February 2021, while the storm was the fiercest in Australia, Facebook announced that it wished to reduce political content on its users’ News Feed. “We’ll temporarily reduce the distribution of political content in News Feed for a small percentage of people in Canada, Brazil and Indonesia this week, and the US in the coming weeks” (Gupta, 2021a). Content from news media could be labeled political, according to a New York Times story. “Under the new test, a machine-learning model will predict the likelihood that a post — whether it’s posted by a major news organization, a political pundit, or your friend or relative — is political” (Roose and Isaac, 2021).

Two months later, the Palo Alto company began surveying its users “to understand which posts they find inspirational [...] People have told us they want to see more inspiring and uplifting content in News Feed because it motivates them [...]. For example, a post featuring a quote about community can inspire someone to spend more time volunteering” (Gupta, 2021b).

Less news, more “inspirational” content. Facebook’s idea of an informed community is one where individual expression around personal interests and experiences is merely aggregated by algorithms. It is not the public sphere of deliberation, where professionals gather and report news in the public interest, that an informed citizenry would be right to expect (Carignan, 2021).

The kind of content that Facebook pushes to its users is a self-serving version of the public interest. Carol W. Lewis would call it “a fabricated version of the common good” [31]. Removing public interest content from Facebook helps us understand what kind of content the Palo Alto giant favors: sports, celebrities, “inspirational” “feel good memes”; fast food satisfying in the short term, but dangerous in the long run for the health of a democratic society, or what Mr. Zuckerberg calls a “civically-engaged community”.

“It is not true that news has value for Facebook”, Kevin Chan, head of public policy for Facebook, Inc., in Canada, told me (Roy, 2021a). I beg to differ. News content is extremely valuable for the company. It confers what Jane B. Singer calls “societal value” by conveying “information people can trust” [32]. That trust, she writes, “is best established and nurtured by those with an existential commitment to social responsibility”, i.e., journalists. Trust in news media is low in many countries, but trust in social media is much lower (Edelman, 2021).

Would Facebook users come back if they could no longer find trustworthy information on the platform? Maybe not, because as Singer points out: “the public needs some means of differentiating between what is valuable to society as a whole and what is less so; otherwise, the notion of a coherent ‘public’ falls apart as each individual seeks out whatever seems most personally appealing at the moment” [33].

In other words, a newsless Facebook would be a socially useless Facebook.


About the author

Jean-Hugues Roy teaches data and computational journalism at Université du Québec à Montréal (UQAM). A journalist for 23 years, he now studies information flows online and media business models using computational methods.
E-mail: roy [dot] jean-hugues [at] uqam [dot] ca



