First Monday

Hey, Google, is this what the Holocaust looked like? Auditing algorithmic curation of visual historical content on Web search engines
by Mykola Makhortykh, Aleksandra Urman, and Roberto Ulloa



Abstract
By filtering and ranking information, search engines shape how individuals perceive both present and past events. However, these information curation mechanisms are prone to malperformance that can misinform their users. In this article, we examine how search malperformance can influence the representation of the traumatic past by investigating the image search outputs of six search engines in relation to the Holocaust in English and Russian. Our findings indicate that besides two common themes — commemoration and liberation of camps — there is substantial variation in the visual representation of the Holocaust between search engines and languages. We also observed several instances of search malperformance, including content propagating antisemitism and Holocaust denial, misattributed images, and disproportionate visibility of specific Holocaust aspects that might result in distorted perception by the public.

Contents

Introduction
Case study: Visual representation of the Holocaust by human and algorithmic curators
Related work: Algorithmic auditing of search engine malperformance
Methodology
Findings: Visual representation of the Holocaust via algorithmic curation
Findings: Malperformance of algorithmic curation of Holocaust images
Discussion and conclusion

 


 

Introduction

Web search engines, such as Google or Bing, can shape social reality through algorithmic content curation. By filtering and ranking information, search engines influence how their users perceive phenomena ranging from gender (Kay, et al., 2015) and race (Noble, 2018) to artificial intelligence (Cave and Dihal, 2020) and elections (Unkel and Haim, 2021). Similarly, search engines affect how the public is informed about the past by prioritizing certain forms of historical content and their interpretations (Hitchcock, 2013).

Despite being trusted by users (Edelman, 2020) and presented by their owner companies as “unbiased and scientific” (Sweeney, 2013), search engines are prone to malperformance. One form of such malperformance is bias, which is a systematic skewness of search outputs (Friedman and Nissenbaum, 1996). Bias often reflects societal prejudices and can amplify discrimination against vulnerable race and gender groups (Noble, 2018).

Another form of malperformance is non-systematic errors that result in irrelevant or low-quality outputs for specific queries and can mislead users. Examples of such outputs range from the appearance of hyperpartisan media promoting unreliable information for socially important queries, such as “coronavirus” (Makhortykh, et al., 2020), to the inclusion of “pornified” content for innocuous queries describing race and gender groups (Noble, 2018).

Search malperformance affects the representation of general concepts (e.g., race; Noble, 2018) and recent events (e.g., COVID-19; Makhortykh, et al., 2020). However, misrepresentation of historical phenomena can also be dangerous. Distortion of historical facts can facilitate stigmatization by blaming certain groups for past injustices, as in the case of Rohingya ethnic cleansing (Ware and Laoutides, 2019). It can also mislead the public by appealing to the strong emotions associated with the past, a tactic that is actively used by populist actors around the world (Lammers and Baldwin, 2020; Polletta and Callahan, 2017).

A few studies (Sullivan, 2016; Zavadski and Toepfl, 2019) scrutinize historical content curation on search engines and its possible malperformance. However, all of these studies deal with textual, not visual, representations of historical phenomena. While both text and image searches are forms of algorithmic curation, the differences between them prompt our interest in the way image search algorithms treat historical content.

Unlike text search, which functions like a library catalog, image search highlights specific visual aspects of a phenomenon, similar to a museum exhibition (Hansen-Glucklich, 2014). Consequently, image searches are subject to additional ethical considerations, particularly when dealing with traumatic historical events (e.g., mass atrocities). While visualizing victims’ suffering and perpetrators’ cruelty, the outputs need to balance satisfying user information needs against the obligation to protect the dignity of victims and to avoid justifying the crimes (Lechtholz-Zey, 2012).

This task is complicated by the effectiveness of visual content for expressing but also instrumentalizing historical trauma. Images have strong affective potential (Makhortykh and Aguilar, 2020) and can facilitate the comprehension of complex phenomena that are hard to express verbally (Bleiker, 2018). These qualities reinforce images’ potential to create narratives “that can reinforce or challenge the accepted past” [1], but can lead both to the recognition of past suffering and its silencing.

The importance of images for communicating the past, together with their algorithmic curation in Web search, prompts the main question of this article: how do image search engines represent the Holocaust, and is this process prone to malperformance?

The choice of case study is motivated by the Holocaust’s unique status as a global memory event (Levy and Sznaider, 2006). As an epitomic case of mass atrocities, the Holocaust often serves as a point of reference for other crimes against humanity (Stevick and Gross, 2014). The Holocaust also features prominently in public debates on topics varying from the rise of populism to memory ethics (Subotić, 2020), which additionally stresses the importance of its representation by search engines.

To answer this question, we audit image search results for the “Holocaust” query in six search engines. Using research on the human curation of Holocaust images and Holocaust representation online, we identify possible forms of search malperformance and check for them in search outputs. To identify whether search malperformance varies depending on the language in which the search is conducted, we compare the results obtained for Russian and English language queries.

 

++++++++++

Case study: Visual representation of the Holocaust by human and algorithmic curators

Human curation and visual representation of the Holocaust

The Holocaust is one of the most well-documented cases of mass atrocities, which, to a large extent, is attributed to visual content produced during and after the Second World War (Kushner, 2006). Images such as photos from concentration camps served as crucial evidence during the postwar tribunals investigating Nazi crimes (Bathrick, 2008). Today, these images help to communicate the traumatic experience of the Holocaust to new generations and preserve memories of its victims (Hirsch, 2001).

Despite its importance, the use of visual content for remembering the Holocaust is also controversial. The large number of Holocaust images prompts the need for their selection by human curators (Kushner, 2006). However, such selection is a complicated task in which both educational (e.g., which image is more suitable for informing the audience) and ethical (e.g., whether the suffering of Auschwitz or of Majdanek victims deserves more of the spotlight) perspectives need to be considered (Holtschneider, 2007).

These ethical complexities prompt many scholars and survivors to argue against the idea of visually representing the Holocaust (Crane, 2008). The core of this argument is the claim that unprecedented violence during the Holocaust cannot be expressed adequately; hence, silence is a “more accurate or truthful or morally responsive” (Lang, 2000) way of treating it.

In practice, this resulted in several calls to avoid representing the Holocaust by visual means, or at least to limit access to images associated with it (Crane, 2008). Neither of these options, however, looks feasible today, considering the wide distribution of Holocaust-related content across online platforms (Gibson and Jones, 2012; Makhortykh, 2019, 2017; Menyhért, 2017).

The infeasibility of not visualizing the Holocaust indicates the need to design ways of doing it ethically, which is no trivial task. Even though regional discrepancies in Holocaust representation (Young, 1993) became less pronounced following its reimagination as a global event (Levy and Sznaider, 2006), multiple context-agnostic considerations for its human curation persist.

These considerations relate to different aspects of Holocaust representation. Many Holocaust visuals are hard to digest, as visitors are confronted with graphic images of dead bodies (Holtschneider, 2007). Instead of evoking compassion, such imagery can shock individuals and make them object to Holocaust representation (Carden-Coyne, 2011). To counter this, museums use prewar images of victims to build an empathic connection between them and visitors (Holtschneider, 2007), though the focus on such images can also obscure more horrendous aspects of the Holocaust.

The comprehensiveness of representation is another consideration that must be accounted for. While victims are usually situated at the center of representation, bystanders and perpetrators also played a role in the Holocaust. The exclusion of the latter two groups can decontextualize the event’s representation, but their inclusion can undermine the centrality of victims’ suffering (Carden-Coyne, 2011).

Similarly difficult is the process of deciding whose suffering to prioritize. The Holocaust took place at more than 44,000 incarceration sites (U.S. Holocaust Memorial Museum [USHMM], n.d.), but only a few of them, in particular Auschwitz, are better known due to being featured in popular culture (Mintz, 2001) and serving as tourist destinations (Biran, et al., 2011). This creates a dilemma for human curators: to focus on recognizable images, which might be more appealing to the public, or to highlight less-known killing sites.

These considerations are far from the only challenges of human curation of Holocaust images. The other challenges vary from mislabeled images (Zelizer, 1999), the use of which can undermine the authenticity of Holocaust representation, to the dubious origins of many visuals produced by perpetrators against the will of their victims (Hirsch, 2001), thus prompting ethical concerns about their use.

These ethical concerns are amplified by the digitization of Holocaust images and the use of online platforms for their dissemination (Makhortykh, 2019). The opinions on the effects of such change vary from offering new venues for dealing with Holocaust trauma (Gibson and Jones, 2012) to facilitating Holocaust denial and trivialization (Gray, 2014).

Importantly, the platformization of Holocaust representation also diminishes the role of heritage institutions as gatekeepers of historical content. Instead of human curators who determine how the Holocaust is represented by museum exhibitions, the selection and ranking of Holocaust images online are performed by algorithms. Algorithmic curation of Holocaust-related content also raises its own concerns, as we discuss below.

What can go wrong with the algorithmic curation of Holocaust information?

Due to the lack of research on the algorithmic curation of Holocaust images, we synthesized our own list of indicators of systematic and non-systematic search malperformance in relation to this. To do so, we combined insights from research on the challenges of human curation of Holocaust images and on the representation of the Holocaust online. The resulting list consists of five indicators of search malperformance in the context of the Holocaust: 1) misattribution; 2) overrepresentation; 3) trivialization; 4) revisionism; and 5) antisemitism.

Misattribution occurs when visual content is attributed to a historical phenomenon to which it is actually unrelated. In the case of the Holocaust, misattribution is a common concern of human curation due to the large amount of visual material that often lacks reliable attribution (Zelizer, 1999). The use of misattributed content can mislead the public and cause incorrect interpretations of the event to which the image is mistakenly connected.

Misattribution is even more common for online representation of the Holocaust because Web users often lack the professional training required to correctly identify the origins of a particular image. It also facilitates the use of misattributed images to instrumentalize Holocaust memory (e.g., to stigmatize political opponents; Makhortykh, 2019). Depending on the sources from which search engines retrieve images, misattribution can be more or less present, but we expect at least some misattributed images to appear in the search outputs.

Overrepresentation is related to the disproportionate visibility of certain aspects of a historical phenomenon, which can lead to its skewed perception. Similar to misattribution, it often occurs in the case of human curation of Holocaust-related content (Ebbrecht, 2010; Hansen-Glucklich, 2014). Some examples include the frequent focus on Holocaust perpetrators that leads to structuring the event’s representation around their “chronology and ideology” [2], as well as images showing the liberation of camps (Ebbrecht, 2010).

The systematic prevalence of certain aspects of the Holocaust in human curation can lead to their overrepresentation in search outputs. While no study has investigated this memory spillover effect for image searches, Zavadski and Toepfl (2019) found that history-related text search results tend to reproduce dominant narratives. Hence, there is a possibility that search outputs can systematically prioritize some aspects of the Holocaust while downgrading others.

Trivialization concerns the use of the Holocaust for amusement and public distraction purposes (Doneson, 1996). It often involves simplification of the Holocaust’s complex nature to make it more accessible or to downgrade its gruesome aspects and provide a more entertaining experience. Examples of such uses vary from Holocaust tourism (Cole, 1999) to Holocaust-themed exploitation movies (Kerner, 2011).

A common form of online trivialization is the use of Holocaust references for producing entertaining content, such as Internet memes (Makhortykh, 2015; Sanchez, 2020). While such content can be viewed as less offensive than revisionist and antisemitic claims, it diminishes the importance of the Holocaust by normalizing it and humanizing perpetrators (Rosenfeld, 2014). Considering the effort that search engines put into countering such “junk” content (Bradshaw, 2019), we do not expect trivializing images to appear in response to history-related queries, but whether this is the case must be verified.

Revisionism (also known as Holocaust denialism) rejects evidence of the Holocaust and challenges established views on the event. Revisionists often make claims that “carry a degree of absurdity” [3] in their interpretation of the past and attack the dignity of the Holocaust victims. Such claims vary from downgrading its scale to complete rejection of the fact that the Holocaust happened (Lang, 2010).

The rise of digital media facilitates the distribution of revisionist views due to the ease of promoting revisionist content online and the focus of anti-revisionist legislation on analog media (Whine, 2008). Search engines, particularly Google, used to prioritize revisionist Web sites for some Holocaust queries, but after these cases were exposed, the sites were demoted (Sullivan, 2016). Hence, while we expect that revisionist content should not be prioritized for a general “Holocaust” query, we cannot fully exclude the possibility of it appearing there.

Antisemitism, in the form of various expressions of hostility toward Jews, is another indicator of search malperformance. Because the Holocaust is an important component of Jewish identity, it is also targeted by antisemitic content. Unlike revisionist content, which focuses on denouncing the notion of the Holocaust, antisemitic content attacks Jews in the context of the Holocaust by justifying it and calling for its continuation.

Despite attempts to fight antisemitism online, digital media remains “a safe harbor” (Ozalp, et al., 2020) for antisemitic campaigns. While search engines try to avoid prioritizing offensive content, they sometimes promote antisemitic content (Noble, 2018). Such cases are usually explained by data voids, namely limited or non-existent data for specific search terms (Nguyen, 2020). Following this logic, antisemitic content should not appear for broad queries, such as “Holocaust,” but whether that is actually the case has to be checked.

 

++++++++++

Related work: Algorithmic auditing of search engine malperformance

Algorithmic auditing is a research methodology that scrutinizes “the functionality and impact of decision-making algorithms” (Mittelstadt, 2016). Functionality auditing examines how algorithms arrive at certain decisions, whereas impact auditing investigates their outputs. Of the two approaches, impact auditing is commonly used to detect malperformance in Web search results (Trevisan, et al., 2018).

There are three approaches used for impact auditing in the context of Web search. The first relies on querying search engines either manually (Noble, 2018; Sullivan, 2016) or via respective APIs (Kay, et al., 2015; Otterbacher, et al., 2017). This approach is effective for detecting search malperformance and does not require complex technical implementation, but its results might be affected by search personalization and randomization.

Noble (2018) used the querying approach to detect the systematic overrepresentation of derogatory content in relation to non-white and female groups. It was also used for studying curation of historical content in a text search, which identified high visibility of denialist content for Holocaust queries (Sullivan, 2016) and similarities between Google and Yandex in reliance on authoritative sources for history-related queries (Zavadski and Toepfl, 2019).

The second auditing approach uses search results collected from crowd-workers. Participants can be recruited using crowd-working platforms, such as MTurk, but scaling it to many workers can be costly. This approach is also not suitable for certain auditing problems (e.g., studying the effect of specific variables on search outputs; Hannak, et al., 2013) because results can be affected by search personalization, which is difficult to control for.

The crowd-worker approach is useful for auditing Web search bias and ideological segregation because it allows for the comparison of outputs from workers with different ideologies. It was used to disprove the polarizing effect of Google’s Web search following the 2016 U.S. elections (Robertson, et al., 2018). It also revealed discrepancies in the representation of German parties in Google searches around the 2017 German federal elections (Puschmann, 2019).

The third approach employs virtual agents, that is, software emulating user behavior to collect search outputs. It can involve modeling agent personas (Feuz, et al., 2011) or Web search accounts (Hannak, et al., 2013) or focus on non-personalized outputs for text (Makhortykh, et al., 2020) or video search (Urman, et al., 2021a). Agent-based approaches can be difficult to implement technically, but they allow controlling for both search personalization and randomization by auditing in a controlled environment.

Agent-based approaches are used for a broad range of auditing tasks. Feuz, et al. (2011) simulated the browsing behavior of different information-seeking personas to show that the effects of search personalization for text outputs increase over time. Another study found evidence of limited personalization of political search outputs on Google and the prevalence of mainstream information sources (e.g., news media and political parties) (Unkel and Haim, 2021).

Despite the growing use of auditing for detecting Web search malperformance, several aspects of it remain understudied. Only a few studies (Kay, et al., 2015; Otterbacher, et al., 2017) have conducted audits of image search outputs. The results of these studies indicate systematic search malperformance that leads to the reiteration of societal stereotypes (e.g., in relation to gender and race). This stresses the need to extend research on image search and its malperformance to other areas, such as historical content.

Another aspect of search malperformance that is currently underinvestigated is its comparative dimension. Most auditing studies focus on a single search engine, namely Google, which is the most popular search engine globally (Statcounter, 2020a). However, other search engines are still used by millions of users and play an important role in local markets, such as Baidu for China and Yandex for Russia (Statcounter, 2020b).

Some comparative studies (Jiang, 2014; Makhortykh, et al., 2020; Urman, et al., 2021b) found major discrepancies in how different search engines represented and misrepresented specific phenomena. This indicates the need for more comparative analyses that can determine whether some search engines are more prone to malperformance and detect how different curation models affect the representation of specific subjects, including historical information.

 

++++++++++

Methodology

Data collection

To collect data, we used software simulating user browsing behavior (e.g., scrolling pages and entering queries) and recording its outputs (for more information on method see Ulloa, et al., 2021). This virtual agent-based auditing approach allows controlling for personalization (Hannak, et al., 2013) and randomization (Makhortykh, et al., 2020; Urman, et al., 2021b) factors that can influence search outputs.

Unlike human agents, virtual agents can easily be synchronized to isolate the effects of the time at which the search is conducted. They can also be deployed in a controlled environment, such as a network of virtual machines in the same IP range using the same operating system and browsing software, to limit the effects of personalization that might lead to skewed outputs.

In addition to controlling for personalization, agent-based auditing makes it possible to address the randomization of Web search that is caused by search engines testing different ways of ranking results (Battelle, 2011). Search randomization can lead to different outputs for identical queries entered under the same conditions, thus making the observations non-robust. One way of addressing this is to deploy multiple virtual agents that simultaneously enter the same search query, measure the randomization-caused variation across their sets of outputs, and then merge those sets into a single, more complete set.
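As an illustration only (a sketch under our own naming assumptions, not the study’s actual implementation), merging simultaneously collected outputs amounts to a set union, and the share of URLs not returned by every agent gives a rough measure of randomization:

```python
from typing import Dict, Set


def merge_agent_outputs(outputs_by_agent: Dict[str, Set[str]]) -> Set[str]:
    """Union the result sets of all agents into one more complete set."""
    merged: Set[str] = set()
    for urls in outputs_by_agent.values():
        merged |= urls
    return merged


def randomization_variation(outputs_by_agent: Dict[str, Set[str]]) -> float:
    """Share of merged URLs NOT returned by every agent.

    0.0 means all agents saw identical outputs (no observable
    randomization); values close to 1.0 mean heavy variation.
    """
    merged = merge_agent_outputs(outputs_by_agent)
    if not merged:
        return 0.0
    common = set.intersection(*outputs_by_agent.values())
    return 1 - len(common) / len(merged)
```

Identical queries entered at the same moment by agents in the same environment make any remaining variation attributable to randomization rather than personalization.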

For the current study, we built a network of 100 CentOS virtual machines in the Frankfurt region of the Amazon Elastic Compute Cloud. On each machine, we deployed two virtual agents: one in the Chrome browser and one in Mozilla Firefox. Each agent consisted of two browser extensions: a tracker and a bot.

The tracker collected the HTML and metadata of all pages visited in the browser and sent them to a storage server. The bot emulated a sequence of browsing actions that consisted of (1) visiting an image search engine page; (2) entering the “Holocaust” query; (3) scrolling down the search result page to load at least 50 images; and (4) cleaning the data accessible by the browser (browsing history and cache) and by the search engine’s JavaScript (local storage, session storage, and cookies) to prevent earlier searches from affecting subsequent ones.

The length of the simulated browsing session was kept under three minutes for all search engines. The next browsing session always started seven minutes after the beginning of the previous session to guarantee that the agents would always be synchronized. Before starting the experiment, the browsers were cleaned to prevent the search history from affecting the search outputs (Ulloa, et al., 2021).
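A minimal sketch of this fixed-interval scheduling (a hypothetical helper, not the study’s code) shows why the agents stay synchronized: every session fits within a three-minute budget, which is strictly shorter than the seven-minute start interval, so a session always ends before the next one begins:

```python
from datetime import datetime, timedelta
from typing import List

SESSION_BUDGET = timedelta(minutes=3)    # upper bound on one browsing session
SESSION_INTERVAL = timedelta(minutes=7)  # gap between consecutive session starts


def session_schedule(start: datetime, n_sessions: int) -> List[datetime]:
    """Fixed-interval start times that keep all agents synchronized.

    Because SESSION_BUDGET < SESSION_INTERVAL, session k always
    finishes before session k+1 is scheduled to begin.
    """
    assert SESSION_BUDGET < SESSION_INTERVAL
    return [start + k * SESSION_INTERVAL for k in range(n_sessions)]
```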

The study was conducted on 27 February 2020. We distributed 200 agents among the six most popular search engines by market share: Google, Bing, Yahoo, Baidu, Yandex, and DuckDuckGo (Statcounter, 2020a). For all engines, the “.com” version of the engine was used.

The agents were equally distributed between the engines, but because of technical issues (e.g., bot detection mechanisms), some agents did not manage to complete their routine. The overall number of agents per engine that completed the full simulation routine and returned the search results was as follows: Baidu (31), Bing (31), DuckDuckGo (34), Google (33), Yahoo (31), and Yandex (15).

Data analysis

For our analysis, we extracted the URLs of image search results for each agent and identified the 50 most frequent search results per search engine. These images were manually examined by one of the authors, a trained historian with experience working with Holocaust-related archival and digital content. The purpose of this examination was to determine where and when each image was produced, so that it was possible to identify whether the image was related to the Holocaust (and not to another historical event) and which aspect of the Holocaust it showed.
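Selecting the most frequent results across agents can be sketched as follows (a generic illustration with hypothetical names, not the authors’ code); counting each URL at most once per agent approximates “most frequent search results per engine” over repeated, randomized result sets:

```python
from collections import Counter
from typing import Dict, Iterable, List


def top_results(urls_per_agent: Dict[str, Iterable[str]], k: int = 50) -> List[str]:
    """Return the k image URLs appearing in the most agents' result lists."""
    counts: Counter = Counter()
    for urls in urls_per_agent.values():
        counts.update(set(urls))  # deduplicate within one agent's list
    return [url for url, _ in counts.most_common(k)]
```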

During the examination, the historian retrieved all the images from their respective URLs and viewed them one by one in the browser. Many of the reviewed Holocaust-related images were rather well known, and thus were easy to connect to a specific location or event (e.g., the Warsaw Ghetto boy photo; Zelizer, 2015). However, in order to minimize the probability of an error, each image was searched for in the online collections of the U.S. Holocaust Memorial Museum (USHMM) to verify image attribution.

The search was facilitated by the historian’s existing knowledge of Holocaust materials, which allowed them to narrow the search down to specific locations/episodes and then verify the initial attribution. In cases when no direct match was found (or the image was obviously unrelated to the Holocaust, as with the exploitation movie posters), reverse image search in Google and Yandex was used to locate the source of the image.

As part of this initial examination, URLs that were no longer accessible (e.g., because of a change in address) were dropped. A few images for which it was not possible to reliably identify whether they were related to the Holocaust were also excluded. The result was the set of images described in Table 1, which was used for the rest of the analysis.

 

Table 1: The number of outputs per engine by language.
          Baidu   Bing   DuckDuckGo   Google   Yahoo   Yandex
English    45      43        50         41       44      49
Russian    44      36        43         42       47      49

 

Following the examination, the resulting images were examined by two coders to detect possible instances of search malperformance. Each image was classified according to the following categories: 1) Revisionism: does the image deny that the Holocaust happened or downplay its scale? (1-Yes; 0-No); 2) Trivialization: does the image make fun of the Holocaust or refer to it in an ironic or humorous manner? (1-Yes; 0-No); 3) Antisemitism: does the image attack Jews or make other antisemitic claims? (1-Yes; 0-No); 4) Misattribution: is the image unrelated to the Holocaust as a historical event or its postwar representation? (1-Yes; 0-No); and, 5) Holocaust theme: what aspect of the Holocaust is illustrated by the image? (multiple options; see Figures 3–4).

To measure intercoder reliability, 30 percent of the sample was coded by both coders. Based on this, we calculated Krippendorff’s alpha for each of the categories above, which showed a high level of reliability: 0.85 (revisionism), 0.85 (trivialization), 0.93 (antisemitism), 0.89 (misattribution), and 0.84 (Holocaust theme). Following the reliability assessment, the identified disagreements were resolved through consensus coding.
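For nominal categories, two coders, and no missing values, Krippendorff’s alpha reduces to a short computation, 1 − D_o/D_e, where D_o is the observed disagreement and D_e the disagreement expected by chance. The sketch below is a generic illustration of that formula, not the authors’ actual analysis code:

```python
from collections import Counter
from typing import Sequence


def krippendorff_alpha_nominal(coder1: Sequence, coder2: Sequence) -> float:
    """Krippendorff's alpha for two coders, nominal data, no missing values."""
    assert len(coder1) == len(coder2) and len(coder1) > 0
    n_units = len(coder1)
    n_values = 2 * n_units  # total pairable values
    # Observed disagreement: fraction of within-unit value pairs that disagree.
    disagreements = sum(a != b for a, b in zip(coder1, coder2))
    d_o = 2 * disagreements / n_values
    # Expected disagreement from pooled category frequencies.
    pooled = Counter(coder1) + Counter(coder2)
    d_e = sum(
        n_c * n_k
        for c, n_c in pooled.items()
        for k, n_k in pooled.items()
        if c != k
    ) / (n_values * (n_values - 1))
    if d_e == 0:  # every value identical: perfect agreement by definition
        return 1.0
    return 1 - d_o / d_e
```

Unlike simple percent agreement, alpha corrects for the agreement expected by chance given how often each category occurs, which is why it is the standard reliability measure for this kind of binary coding.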

 

++++++++++

Findings: Visual representation of the Holocaust via algorithmic curation

Figures 1 and 2 show that algorithmic curation of Holocaust content prioritizes images related to two Holocaust themes. These two themes are represented by images depicting the liberation of the camps by the Allies at the end of the war (liberation) and the postwar commemoration of the Holocaust (memory). The latter theme is represented primarily by the images of Holocaust museums and memorials, in particular the Auschwitz-Birkenau Museum (Oświęcim) and Memorial to the Murdered Jews of Europe (Berlin).

 

Figure 1: The proportion of outputs related to Holocaust themes (per engine; English query).

 

 

Figure 2: The proportion of outputs related to Holocaust themes (per engine; Russian query).

 

The visibility of several other themes varied depending on the query language. For English, search engines prioritized images showing Jews being arrested (arrest) and deported to concentration and death camps (deportation). For Russian, images of murdered camp inmates (post-murder) were more visible, particularly when compared with the outputs for the English queries, where their presence was marginal.

The language-based difference was particularly pronounced in the case of the Chinese search engine Baidu, where for the Russian query there were no outputs related to the Holocaust at all. Instead, there were only misattributed images, which could be explained by Baidu’s search algorithms being prone to malperformance for Russophone queries.

In addition to the query language, the choice of search engine also influenced Holocaust representation. For instance, images of Jews being tortured (torture) appeared only on smaller Western engines (Bing, DuckDuckGo, and Yahoo). The same engines, together with Google and Baidu, showcased images of actual murder (murder), whereas such content did not appear on Yandex.

Several themes appeared only on one or two engines, thus stressing cross-engine differences in historical content curation. Images of prewar Jewish life (prewar), Holocaust survivors seeking revenge (retribution), life during Nazi occupation outside ghettos (occupation), and Holocaust perpetrators (perpetrators) appeared only on Google. Similarly, only Baidu and Yandex showed images of Jewish children being rescued before the war (rescue).

In terms of geography, the majority of images came from Holocaust sites located in contemporary territories of Germany and Poland (Figures 3 and 4). Their prevalence across all engines and languages can be explained by major Holocaust camps-turned-museums being located there (Auschwitz, Treblinka, and Belzec for Poland and Dachau, Bergen-Belsen, and Buchenwald for Germany). The only exception was Baidu, where the second largest number of outputs was related to commemorative sites in the U.S. (e.g., U.S. Holocaust Memorial Museum).

 

Figure 3: The proportion of outputs related to Holocaust sites by country (per engine; English query).

 

 

Figure 4: The proportion of outputs related to Holocaust sites by country (per engine; Russian query).

 

The visibility of sites in other countries varied depending on the query language. For the English query, there were more outputs related to Austria (Mauthausen), Hungary (Budapest), and Israel (Yad Vashem). There were relatively few outputs related to post-Soviet countries, such as Ukraine, Belarus, or Russia, despite major acts of extermination happening there (in particular in Ukraine, where more than 1.5 million Jews were murdered as part of the so-called “Holocaust by bullets,” namely the acts of extermination carried out outside specialized killing sites; Desbois, 2008).

A reverse situation was observed for outputs in response to the Russian query, where images coming from post-Soviet countries gained prominence, and outputs associated with Central European countries (Austria and Hungary) were less visible. This discrepancy can be explained by memory spillover, with Russophone memory institutions focusing on Nazi crimes committed in Eastern Europe and providing more visual content for them.

In terms of specific Holocaust camps, we found that, independent of language, images related to Auschwitz were prioritized (Figures 5 and 6). A few other camps that frequently appeared in outputs were Bergen-Belsen (Germany), Buchenwald (Germany), Ebensee (Austria), and Dachau (Germany). The exact proportion of images related to the respective camps varied by engine (e.g., DuckDuckGo and Yahoo prioritized content from Bergen-Belsen, Google and Yandex from Buchenwald).

 

Figure 5: The proportion of outputs related to Holocaust camps per camp (per engine; English query).

 

 

Figure 6: The proportion of outputs related to Holocaust camps per camp (per engine; Russian query).

 

The visibility of camps other than Auschwitz was also affected by the language of the query. Images from Ravensbrück, Sachsenhausen, and Nordhausen (all from Germany) appeared only for Russian queries. However, with the exception of Auschwitz, images of concentration camps from Western Europe, where inmates were incarcerated, prevailed over images of extermination camps from Eastern Europe, where inmates were murdered.

Such a distribution can be explained by content availability (e.g., Western European camps being liberated by the Allies, who produced more visual content for Western Web sites) and the scarcity of graphic images coming from extermination camps, particularly considering that extermination camp inmates were usually murdered before their liberation. However, it also resulted in the omission of major centers of extermination in the east, which either appeared in search outputs only a few times (Majdanek, Treblinka) or never appeared at all (Sobibor, Chelmno).

 

++++++++++

Findings: Malperformance of algorithmic curation of Holocaust images

Revisionism, trivialization, and antisemitism

Our analysis shows that some search outputs promoted revisionism, trivialization, and antisemitism. These outputs were distributed unequally between the search engines, and their presence was higher for two Western engines: DuckDuckGo and Yahoo. In both cases, these forms of malperformance were more present in response to the Russian query. This indicates that algorithmic curation was more prone to errors in non-English content.

Figure 7 shows that antisemitic content appeared in nine percent and eight percent of the search outputs for DuckDuckGo and Yahoo, respectively. It usually consisted of images attacking Jews in the context of the Holocaust (e.g., by claiming that it was caused by the need for Christians to protect themselves against Jews) or reproducing antisemitic tropes (e.g., by showing caricatured images of Jewish puppeteers).

 

Figure 7: The proportion of outputs with antisemitic content (per engine).

 

The presence of such images aligns with recent observations of antisemitic content in image search outputs (Nguyen, 2020). However, unlike other cases, where it was explained by data voids, namely the absence of proper results for rare queries, the “Holocaust” query is not a niche one. This observation calls the data void argument into question and suggests that the retrieval of antisemitic content can be caused by search algorithms rather than the absence of relevant content.

A few outputs (Figure 8) also promoted Holocaust trivialization, usually in the form of Internet memes, such as the one showing a Russian-Jewish TV anchor, Igor Kvasha, with a caption that said “Burn me” (a reference to the TV program called “Wait for me” hosted by Kvasha). Another common example of trivialization dealt with images of people behaving improperly at Holocaust sites (e.g., sitting at the Memorial to the Murdered Jews of Europe in Berlin).

 

Figure 8: The proportion of outputs with trivialization content (per engine).

 

While generally less obtrusive than antisemitic images, trivialization content can still be treated as an attack on the dignity of Holocaust victims. It can also promote inappropriate behavior at Holocaust sites and, in some cases, like that of the Kvasha meme, comes close to antisemitism. The appearance of such content in the outputs for the “Holocaust” query can be attributed to the malperformance of filtering mechanisms.

Compared with antisemitism and trivialization, revisionism was represented by only a few outputs (Figure 9). Its low visibility can be attributed to both the difficulty of expressing revisionist arguments via visual means only and the solid performance of filtering mechanisms that prevent such outputs from appearing in response to Holocaust-related queries.

 

Figure 9: The proportion of outputs with revisionist content (per engine).

 

Revisionist content varied among the search engines. For Google, it was represented by an image referring to the Roger Hallam scandal, during which the Extinction Rebellion founder claimed that the Holocaust was “almost a normal event” (Connolly and Taylor, 2019). While the image does not propagate revisionism per se, its appearance in Holocaust-related search outputs can amplify the visibility of revisionist arguments.

In contrast, Russian queries for Yahoo and DuckDuckGo returned images explicitly promoting revisionist views. Usually, these were memes claiming that Jews benefit from the Holocaust and persecute people who try to tell the “truth” about the event. An example of such a meme is an image of Soviet war veterans being humiliated and forced to enter the Holocaust museum. Similar to antisemitic content, the prioritization of such images for the “Holocaust” query can hardly be attributed to data voids and can instead be explained by filtering malperformance.

Misattribution

Figure 10 shows that many search outputs were unrelated to the Holocaust as a historical event. The degree of misattribution varied across search engines and was higher for the images retrieved for Russian queries. The only exception was Yandex, which can be explained by its algorithms being more likely to be trained on Russophone data. However, considering that Yandex outputs contained the lowest number of misattributed images overall, the observed difference could also be related to the better design of its image retrieval algorithm.

 

Figure 10: The proportion of outputs prone to misattribution (per engine).

 

The importance of query language for this type of malperformance is reflected in the different forms it took between English and Russian queries. For English, misattributed content (except on Baidu) differed little from authentic Holocaust images and was often difficult to detect without historical training. One example is an image of starving children behind barbed wire, which looks similar to images from liberated Nazi camps but is actually a photo taken in a Finnish prisoner of war camp for the Russian population in Karelia.

A few other misattributed examples for the English query included black-and-white photos showing the suppression of the Mau Mau rebellion in the 1950s and the expulsion of the Rohingya from Myanmar in 2017. While these outputs also deal with mass atrocities, their retrieval for the “Holocaust” query can result in them being erroneously perceived as either authentic representations of the Holocaust or as events similar to the Holocaust in nature or scale, which can be misleading.

In contrast, misattributed images retrieved for the Russian query, particularly on DuckDuckGo, were easy to differentiate from authentic Holocaust content. Most of these outputs had little to do with historical images and showed content that was only remotely related to the Second World War (e.g., images of toy soldiers) or not related to history at all (e.g., images of dead pigs or caricatured images of Jews).

The Chinese search engine Baidu is a special case in terms of misattribution, with 36 percent of its outputs for the English query and 100 percent for the Russian query being irrelevant to the Holocaust. In the case of the English query, search outputs mostly consisted of entertainment content, including posters of exploitation movies and death metal groups with the word “Holocaust” in their names.

For the Russian query, all the search outputs constituted images dealing with the tourist industry (e.g., photos of guest houses and hotel rooms), but were not related to the Holocaust. The lack of relevant outputs is most likely due to Baidu’s poor performance for Russian-language queries, which made it impossible to retrieve any Holocaust-related content in Russian via this engine.

Overrepresentation

Compared with other forms of search malperformance, overrepresentation is harder to define. Unlike cases with a clear baseline against which the distribution of specific features in outputs can be compared (e.g., in the case of gender distribution for occupations in the U.S.; Kay, et al., 2015), for historical content, including the Holocaust, there are no clear guidelines determining the desired proportion of certain features in the outputs.

At the same time, the relative prevalence of outputs related to specific Holocaust themes and sites can be treated as an indicator of overrepresentation by itself. Such unequal retrievability (Traub, et al., 2016) of information associated with certain aspects of the Holocaust creates a skewed perception of the phenomenon, where some aspects are highlighted and others downgraded.

In the case of Holocaust themes, there was a profound imbalance between one or two top themes, which constituted 50–60 percent of all retrieved content per engine, and the other themes. This imbalance leads to a skewed representation of the Holocaust, where the focus is on its final stage (i.e., the liberation of camps) and the aftermath. It results in a situation where the user is exposed to a rather simplified narrative of Jews being deported, liberated, and then commemorated, while other aspects of the Holocaust are omitted.

While some of these other aspects (e.g., the consequences of mass murder for the Russian query on Bing) occasionally appeared, a number of Holocaust themes were consistently underrepresented. These themes included not only torture and murder, but also life in ghettos and Jewish resistance. Similarly downplayed were matters of prewar life, which are important for contextualizing the Holocaust (Holtschneider, 2007), as well as the postwar life of survivors.

It is hard to determine the ideal proportion of images showing different aspects of the Holocaust. However, there was a clear imbalance in the representation of the phenomenon when, for instance, on DuckDuckGo (Figure 1), 40 percent of outputs showed liberation of camps and only four percent showed ghettos. Such inequalities can be viewed as unfair, considering that both aspects are important for understanding and remembering the Holocaust.
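The imbalance described above can be quantified in a straightforward way by computing each theme's share of an engine's annotated outputs. The sketch below is a minimal illustration with hypothetical annotation counts (the labels and numbers are invented for the example, not taken from the study's data set).

```python
from collections import Counter

def theme_proportions(labels):
    """Return each theme's share of the annotated outputs."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {theme: count / total for theme, count in counts.items()}

# Hypothetical theme annotations for one engine's top outputs.
labels = (["liberation"] * 8 + ["memory"] * 5 + ["deportation"] * 4
          + ["murder"] * 2 + ["ghetto"] * 1)
shares = theme_proportions(labels)
dominant = max(shares, key=shares.get)
print(dominant, shares[dominant])  # → liberation 0.4
```

Comparing the resulting shares across engines and languages makes the dominance of one or two themes directly visible, without requiring an agreed-upon baseline distribution.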

Unequal retrievability was also observed for content from specific countries and sites. Images from Poland and Germany constituted more than 70 percent of Western engine outputs. Such distribution reflects the fact that many camps were located in what is now the territory of these two countries, but it also omits ghettos in the post-Soviet states and extermination sites used for the “Holocaust by bullets.” Similarly, transit camps used to move Jews from Western Europe (e.g., Westerbork) remained underrepresented.

Overrepresentation was even higher when comparing images related to individual Holocaust camps. The largest proportion of images (in some cases, up to 80 percent) was associated with a single camp: Auschwitz. Other camps, particularly the ones in Eastern Europe, remained mostly ignored. Some of them (e.g., Majdanek and Treblinka) appeared occasionally, whereas others (e.g., Sobibor and Chelmno) were absent in search outputs.

The focus on just a few Holocaust sites resulted in major episodes of the Holocaust being omitted by algorithmic curation. It created a situation where the visibility of victims’ suffering and, to a certain degree, its recognition by the public is very unequal. By giving priority to images of Auschwitz, search engines highlight the tragedy of more than a million of its victims but obscure the deaths of those who perished during the “Holocaust by bullets” (1.5 million victims) or, for instance, in Sobibor (more than 150,000 victims).

The rationale behind this overrepresentation is also concerning. Together with the high visibility of Holocaust museums and memorials (i.e., memory themes), the disproportionate retrievability of content from Auschwitz, which is a major tourist destination, could indicate that search engines prioritized images based on the commercial value of the respective sites. This commodification-based curation logic can be unethical, especially in the case of content dealing with crimes against humanity, and it prompts the need for further investigation of overrepresentation malperformance in relation to other historical events.

 

++++++++++

Discussion and conclusion

Our analysis indicated that the visual representation of the Holocaust varied depending on the search engine and the language of the query. Western search engines prioritized images showing deportation of Jews and liberation of the camps in English but shifted toward more graphic content in Russian. In contrast, non-Western engines focused on images of Holocaust memorials and, in the case of Baidu, content that was unrelated to the Holocaust.

The differences in algorithmic curation are not surprising per se, considering that search engines rely on different algorithms and databases that might result in different ontologies, namely, hierarchically structured sets of items used to characterize a specific phenomenon (Ramkumar and Poorna, 2014). However, fundamentally different interpretations of the Holocaust in different search engines call into question the status of the Holocaust as a global memory event (Levy and Sznaider, 2006). These differences also raise concerns about the ability of search engines to inform their users about historical events in a comprehensive manner, particularly considering that the logic behind their prioritization of specific aspects of the past is unclear.

These concerns are amplified by the instances of search malperformance that we observed. The most noticeable were search outputs promoting revisionism, trivialization, and antisemitism. While the number of such outputs was low, and they were mostly confined to Russian query results, their very presence is concerning. Not only does it misinform the public and attack the dignity of Holocaust victims, it also confirms these views by promoting them via Web services that are trusted by users (Pan, et al., 2007).

The occurrence of these forms of inappropriate content also calls into question the “data void” (Nguyen, 2020) argument used to justify erroneous search outputs. Unlike niche queries, for which the retrieval of irrelevant or offensive content can potentially be explained by the lack of appropriate outputs, this is not the case with the “Holocaust” query in either English or Russian. This observation suggests that, in the case of the Holocaust and likely other topics, malperformance can be attributed to the algorithm itself and not to the lack of data.

This suggestion finds support in earlier research criticizing the image search mechanisms used by search engines for relying not on semantic analysis of the image itself, but on the presence or absence of specific text terms in its vicinity (Cui, et al., 2008; Etzioni, et al., 2007). Such an implementation can be viewed as at odds with the universal nature of visual content (Etzioni, et al., 2007) and also limits the number of potential outputs in response to queries in languages for which less Web content is available, particularly as it is often difficult to translate ontologies into different languages (Embley, et al., 2011). The latter factor could also explain the higher degree of malperformance for the Russian query, which we observed in several search engines.
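The text-proximity approach criticized above can be illustrated with a toy retrieval sketch: an image is indexed under whichever terms happen to surround it on a page, so any page that mentions “Holocaust” near an image, relevant or not, makes that image retrievable for the query. The pages and file names below are hypothetical and serve only to show the failure mode.

```python
def build_index(pages):
    """Map each term to the images whose surrounding text contains it."""
    index = {}
    for image, surrounding_text in pages:
        for term in surrounding_text.lower().split():
            index.setdefault(term, []).append(image)
    return index

# Hypothetical pages: (image file, text found near the image).
pages = [
    ("memorial.jpg", "Holocaust memorial Berlin"),
    ("poster.jpg", "holocaust exploitation movie poster"),  # co-occurring term, irrelevant image
    ("liberation.jpg", "camp liberation 1945"),
]
index = build_index(pages)
print(index["holocaust"])  # → ['memorial.jpg', 'poster.jpg']
```

Because the index never looks at the image content itself, the irrelevant poster is retrieved alongside the authentic photograph, which mirrors the misattribution pattern observed in the outputs.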

Instances of malperformance associated with misattribution and overrepresentation were less obvious, but they had profound consequences for Holocaust representation. A number of misattributed images mixed evidence of the Holocaust with other episodes of the Second World War, as well as postwar atrocities. This undermines the authenticity of Holocaust representation and can distort the way the public perceives the event.

The overrepresentation of a few Holocaust themes and sites can have a similarly distortive effect by simplifying the complex nature of the event and reiterating its stereotypical representation. Such systematic malperformance, which can be viewed as a form of bias similar to the skewed representation of gender and race (Noble, 2018) by search engines, raises ethical concerns, particularly as it seems to be related to the commodification of Holocaust memory.

These observations raise the question of what can be done to counter these instances of malperformance. Some of them (in particular, revisionism, trivialization, and antisemitism) can be addressed by more robust filtering mechanisms that screen out both “junk” (Bradshaw, 2019) sources and inappropriate images such as memes. While memes can be legitimate search outputs for many queries, they are not the most fitting option for information requests dealing with mass atrocities, particularly when users do not explicitly search for atrocity-related entertainment content.

The implementation of such filtering mechanisms is not a trivial task, considering the constantly changing nature of inappropriate content. This problem is not unique to Holocaust-related content and, similar to the different forms of hate speech and misinformation, requires constant monitoring and updating of content curation systems. Because of its complexity, the implementation of the task might require the deployment of additional oversight-centered algorithmic systems (e.g., artificial intelligence [AI] guardians; Etzioni and Etzioni, 2017) to ensure that the performance of curation systems stays within a certain set of parameters, such as the omission of denialist content in the top search outputs.

Similar to the way search engines shifted toward prioritizing authoritative health information sources during COVID-19 (Makhortykh, et al., 2020), alternative media promoting Holocaust denialism (in particular, in Russian) can be demoted in favor of established heritage institutions, such as Yad Vashem or the U.S. Holocaust Memorial Museum. While this might increase the systematic overrepresentation of certain aspects of the event, which can be present in the content produced by these institutions, it would still counter the most obtrusive forms of malperformance in relation to the Holocaust.

In addition to prioritizing more authoritative sources of information, recent developments in the field of content-based image retrieval can facilitate the adoption of new forms of image search, relying on semantic principles instead of visual similarity or keyword matches in the text accompanying images (Barz and Denzler, 2019; Zhou, et al., 2017). Together with earlier work on multilingual ontology construction (Embley, et al., 2011; Etzioni, et al., 2007), these developments can improve the quality of the algorithmic curation of visual information in response to both English and non-English queries.

Beyond the prioritization of authoritative sources, the increasing availability of authentic historical content (e.g., through the digitization of museum collections) and advanced mechanisms of visual information retrieval can also address misattribution-related malperformance. However, they would not necessarily address overrepresentation, particularly considering the unequal distribution of authentic content itself, as well as the varying degrees of its popularization.

One potential solution could be to adapt to historical content the diversity metrics already used by search engines (Zheng, et al., 2017). To diversify its outputs, Google does not include more than one result from the same domain in its top search results (Schwartz, 2019). Similar logic could be used, for instance, to show no more than one image from a specific Holocaust site, or related to a certain Holocaust theme, in the top outputs.
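The proposed constraint, at most one image per site (or theme) among the top outputs, amounts to a simple greedy pass over the ranked list. The sketch below illustrates this under the assumption that each output already carries reliable site metadata; the file names and ranking are hypothetical.

```python
def diversify(ranked, key, k):
    """Greedily keep the top-k results, at most one per key value."""
    seen, top = set(), []
    for item in ranked:
        if key(item) not in seen:
            seen.add(key(item))
            top.append(item)
        if len(top) == k:
            break
    return top

# Hypothetical ranked outputs: (image, associated Holocaust site).
ranked = [("1.jpg", "Auschwitz"), ("2.jpg", "Auschwitz"),
          ("3.jpg", "Buchenwald"), ("4.jpg", "Sobibor")]
print(diversify(ranked, key=lambda r: r[1], k=3))
# → [('1.jpg', 'Auschwitz'), ('3.jpg', 'Buchenwald'), ('4.jpg', 'Sobibor')]
```

The same function can be applied with a theme-returning `key` instead of a site-returning one; the hard part in practice is the metadata it presupposes, as the following paragraph notes.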

Implementing such a solution is a challenging technical task because it requires reliable metadata that can be used for diversification. It also raises ethical questions related to the normative role of algorithmic curation of information, such as whether a more equal representation of suffering is more desirable than the current focus on a few sites and themes embedded in popular culture and the tourist industry.

The latter point also emphasizes the importance of considering the ethical aspects of algorithmic curation mechanisms that deal with historical information. Similar to other areas, where the deployment of algorithm-driven systems is subjected to increased scrutiny and calls for regulation (Etzioni and Etzioni, 2016; Etzioni, 2018; Helberger, et al., 2020), the possibility of search malperformance leading to unfair representation of the traumatic past has to be recognized and addressed by intensifying the dialogue between heritage institutions and industry. Such dialogue is essential for improving the ways algorithms curate historical content and can potentially encourage more memory-sensitive design of curation and oversight mechanisms.

It is also important to mention the limitations of this study. First, it relies on data collected during a single experiment, whereas a more longitudinal approach is required to validate the consistency of the findings. Second, because of the post hoc extraction of search results, some of them became unavailable. Third, in a few cases, it was not possible to reliably identify whether the image was related to the Holocaust, which decreased the sample used for analysis.

Finally, for this study, we relied on a single search term — that is, “Holocaust” — in two languages, whereas there are a number of synonyms (e.g., “Shoah”) and related terms (e.g., “ghetto” or “einsatzgruppen”) or personalities (e.g., “Anne Frank” or “Adolf Eichmann”) that are important in the context of algorithmic curation of Holocaust-related content. In future research, we aim to expand the selection of search queries to integrate these terms and compare whether search outputs returned in response to them differ from those retrieved for general queries such as “Holocaust.”

Despite these limitations, our study highlights the urgent need to expand the research on algorithmic curation to historical information. The fact that the visual representation of the Holocaust, the memory of which is protected by legislative mechanisms, is subject to malperformance is concerning. It stresses the importance of further research on the representation of the past by search engines, as well as the more active involvement of memory scholars and curators in the ongoing debate about algorithmic fairness and diversity. End of article

 

About the authors

Dr. Mykola Makhortykh is a postdoctoral researcher at the Institute of Communication and Media Studies at the University of Bern. His research interests include news recommender systems, search engines, and digital memory studies.
Direct comments to: makhortykhn [at] yahoo [dot] com

Dr. Aleksandra Urman is a postdoctoral researcher at the Institute of Communication and Media Studies of the University of Bern and the Social Computing Group, University of Zurich. Aleksandra’s research interests include online political communication, algorithmic biases, and computational research methods.
E-mail: urman [at] ifi [dot] uzh [dot] ch

Dr. Roberto Ulloa is a postdoctoral researcher at the Computational Social Science Department of GESIS Leibniz Institute for the Social Sciences. His research interests include the role of institutions in polarization and homogenization of public opinion.
E-mail: Roberto [dot] Ulloa [at] gesis [dot] org

 

Notes

1. Carden-Coyne, 2011, p. 169.

2. Holtschneider, 2007, p. 89.

3. Behrens, et al., 2017, p. 1.

 

References

B. Barz and J. Denzler, 2019. “Hierarchy-based image embeddings for semantic image retrieval,” Proceedings of 2019 IEEE Winter Conference on Applications of Computer Vision (WACV), pp. 638–647.
doi: https://doi.org/10.1109/WACV.2019.00073, accessed 16 September 2021.

D. Bathrick, 2008. “Seeing against the grain: Re-visualizing the Holocaust,” In: D. Bathrick, B. Prager, and M. Richardson (editors). Visualizing the Holocaust: Documents, aesthetics, memory. Rochester, N.Y.: Camden House, pp. 1–18.

J. Battelle, 2011. The search: How Google and its rivals rewrote the rules of business and transformed our culture. London: Hachette UK.

P. Behrens, N. Terry, and O. Jensen, 2017. “Introduction,” In: P. Behrens, N. Terry, and O. Jensen (editors). Holocaust and genocide denial: A contextual perspective. New York: Routledge, pp. 1–7.
doi: https://doi.org/10.4324/9781315562377, accessed 16 September 2021.

A. Biran, Y. Poria, and G. Oren, 2011. “Sought experiences at (dark) heritage sites,” Annals of Tourism Research, volume 38, number 3, pp. 820–841.
doi: https://doi.org/10.1016/j.annals.2010.12.001, accessed 16 September 2021.

R. Bleiker, 2018. “Mapping visual global politics,” In: R. Bleiker (editor). Visual global politics. New York: Routledge, pp. 1–30.
doi: https://doi.org/10.4324/9781315856506, accessed 16 September 2021.

S. Bradshaw, 2019. “Disinformation optimised: Gaming search engine algorithms to amplify junk news,” Internet Policy Review, volume 8, number 4.
doi: https://doi.org/10.14763/2019.4.1442, accessed 16 September 2021.

A. Carden-Coyne, 2011. “The ethics of representation in Holocaust museums,” In: J.-M. Dreyfus and D. Langton (editors). Writing the Holocaust. London: Bloomsbury Academic, pp. 167–184.

S. Cave and K. Dihal, 2020. “The whiteness of AI,” Philosophy & Technology, volume 33, number 4, pp. 685–703.
doi: https://doi.org/10.1007/s13347-020-00415-6, accessed 16 September 2021.

T. Cole, 1999. Selling the Holocaust: From Auschwitz to Schindler; How history is bought, packaged and sold. London: Routledge.
doi: https://doi.org/10.4324/9781315088129, accessed 16 September 2021.

K. Connolly and M. Taylor, 2019. “Extinction Rebellion founder’s Holocaust remarks spark fury,” Guardian (20 November), at https://www.theguardian.com/environment/2019/nov/20/extinction-rebellion-founders-holocaust-remarks-spark-fury, accessed 21 January 2021.

S.A. Crane, 2008. “Choosing not to look: Representation, repatriation, and Holocaust atrocity photography,” History & Theory, volume 47, number 3, pp. 309–330.
doi: https://doi.org/10.1111/j.1468-2303.2008.00457.x, accessed 16 September 2021.

J. Cui, F. Wen, and X. Tang, 2008. “Real time Google and live image search re-ranking,” MM ’08: Proceedings of the 16th ACM International Conference on Multimedia, pp. 729–732.
doi: https://doi.org/10.1145/1459359.1459471, accessed 16 September 2021.

F. Desbois, 2008. The Holocaust by bullets: A priest’s journey to uncover the truth behind the murder of 1.5 million Jews. New York: St. Martin's Press.

J.E. Doneson, 1996. “Holocaust revisited: A catalyst for memory or trivialization?” Annals of the American Academy of Political and Social Science, volume 548, number 1, pp. 70–77.
doi: https://doi.org/10.1177/0002716296548001005, accessed 16 September 2021.

T. Ebbrecht, 2010. “Migrating images: Iconic images of the Holocaust and the representation of war in popular film,” Shofar, volume 28, number 4, pp. 86–103.
doi: https://doi.org/10.1353/sho.2010.0023, accessed 16 September 2021.

Edelman, 2020. “Edelman Trust Barometer 2020,” at https://www.edelman.com/trust/2020-trust-barometer, accessed 16 September 2021.

D.W. Embley, S.W. Liddle, D.W. Lonsdale, and Y. Tijerino, 2011. “Multilingual ontologies for cross-language information extraction and semantic search,” In: M. Jeusfeld, L. Delcambre, and T.W. Ling (editors). In: Conceptual Modeling — ER 2011. Lecture Notes in Computer Science, volume 6998. Berlin: Springer, pp. 147–160.
doi: https://doi.org/10.1007/978-3-642-24606-7_12, accessed 16 September 2021.

A. Etzioni and O. Etzioni, 2017. “Should artificial intelligence be regulated?” Issues in Science and Technology, volume 33, number 4, pp. 32–36, and at https://issues.org/perspective-artificial-intelligence-regulated/, accessed 16 September 2021.

A. Etzioni and O. Etzioni, 2016. “AI assisted ethics,” Ethics and Information Technology, volume 18, number 2, pp. 149–156.
doi: https://doi.org/10.1007/s10676-016-9400-6, accessed 16 September 2021.

O. Etzioni, 2018. “Point: Should AI technology be regulated? Yes, and here’s how,” Communications of the ACM, volume 61, number 12, pp. 30–32.
doi: https://doi.org/10.1145/3197382, accessed 16 September 2021.

O. Etzioni, K. Reiter, S. Soderland, and M. Sammer, 2007. “Lexical translation with application to image search on the Web,” at http://turing.cs.washington.edu/papers/EtzioniMTSummit07.pdf, accessed 9 June 2021.

M. Feuz, M. Fuller, and F. Stalder, 2011. “Personal Web searching in the age of semantic capitalism: Diagnosing the mechanisms of personalisation,” First Monday, volume 16, number 2, at https://firstmonday.org/article/view/3344/2766, accessed 30 September 2021.
doi: https://doi.org/10.5210/fm.v16i2.3344, accessed 30 September 2021.

B. Friedman and H. Nissenbaum, 1996. “Bias in computer systems,” ACM Transactions on Information Systems, volume 14, number 3, pp. 330–347.
doi: https://doi.org/10.1145/230538.230561, accessed 16 September 2021.

M. Gray, 2014. Contemporary debates in Holocaust education. London: Palgrave Macmillan.
doi: https://doi.org/10.1057/9781137388575, accessed 16 September 2021.

P. Gibson and S. Jones, 2012. “Remediation and remembrance: ‘Dancing Auschwitz’ collective memory and new media,” ESSACHESS — Journal for Communication Studies, volume 5, number 2, at http://www.essachess.com/index.php/jcs/article/view/171, accessed 16 September 2021.

A. Hannak, P. Sapiezynski, A. Molavi Kakhki, B. Krishnamurthy, D. Lazer, A. Mislove, and C. Wilson, 2013. “Measuring personalization of Web search,” WWW ’13: Proceedings of the 22nd International Conference on World Wide Web, pp. 527–538.
doi: https://doi.org/10.1145/2488388.2488435, accessed 30 September 2021.

J. Hansen-Glucklich, 2014. Holocaust memory reframed: Museums and the challenges of representation. New Brunswick, N.J.: Rutgers University Press.

N. Helberger, M. Van Drunen, S. Eskens, M. Bastian, and J. Moeller, 2020. “A freedom of expression perspective on AI in the media — with a special focus on editorial decision making on social media platforms and in the news media,” European Journal of Law and Technology, volume 11, number 3, at https://ejlt.org/index.php/ejlt/article/view/752, accessed 16 September 2021.

M. Hirsch, 2001. “Surviving images: Holocaust photographs and the work of postmemory,” Yale Journal of Criticism, volume 14, number 1, pp. 5–37.
doi: https://doi.org/10.1353/yale.2001.0008, accessed 16 September 2021.

T. Hitchcock, 2013. “Confronting the digital: Or how academic history writing lost the plot,” Cultural and Social History, volume 10, number 1, pp. 9–23.
doi: https://doi.org/10.2752/147800413X13515292098070, accessed 16 September 2021.

K.H. Holtschneider, 2007. “Victims, perpetrators, bystanders? Witnessing, remembering and the ethics of representation in museums of the Holocaust,” Holocaust Studies, volume 13, number 1, pp. 82–102.
doi: https://doi.org/10.1080/17504902.2007.11087196, accessed 16 September 2021.

M. Jiang, 2014. “The business and politics of search engines: A comparative study of Baidu and Google’s search results of Internet events in China,” New Media & Society, volume 16, number 2, pp. 212–233.
doi: https://doi.org/10.1177/1461444813481196, accessed 16 September 2021.

M. Kay, C. Matuszek, and S.A. Munson, 2015. “Unequal representation and gender stereotypes in image search results for occupations,” CHI ’15: Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, pp. 3,819–3,828.
doi: https://doi.org/10.1145/2702123.2702520, accessed 16 September 2021.

A. Kerner, 2011. Film and the Holocaust: New perspectives on dramas, documentaries, and experimental films. London: Bloomsbury.

T. Kushner, 2006. “Holocaust testimony, ethics, and the problem of representation,” Poetics Today, volume 27, number 2, pp. 275–295.
doi: https://doi.org/10.1215/03335372-2005-004, accessed 16 September 2021.

J. Lammers and M. Baldwin, 2020. “Make America gracious again: Collective nostalgia can increase and decrease support for right-wing populist rhetoric,” European Journal of Social Psychology, volume 50, number 5, pp. 943–954.
doi: https://doi.org/10.1002/ejsp.2673, accessed 16 September 2021.

B. Lang, 2010. “Six questions on (or about) Holocaust denial,” History & Theory, volume 49, number 2, pp. 157–168.
doi: https://doi.org/10.1111/j.1468-2303.2010.00537.x, accessed 16 September 2021.

B. Lang, 2000. Holocaust representation: Art within the limits of history and ethics. Baltimore, Md.: Johns Hopkins University Press.

J. Lechtholz-Zey, 2012. “The laws banning Holocaust denial,” Genocide Prevention Now, number 9, at https://www.ihgjlm.com/wp-content/uploads/2016/01/Laws-Banning-Holocaust_Denial.pdf, accessed 16 September 2021.

D. Levy and N. Sznaider, 2006. The Holocaust and memory in the global age. Translated by A. Oksiloff. Philadelphia, Pa.: Temple University Press.

M. Makhortykh, 2019. “Nurturing the pain: Audiovisual tributes to the Holocaust on YouTube,” Holocaust Studies, volume 25, number 4, pp. 441–466.
doi: https://doi.org/10.1080/17504902.2018.1468667, accessed 16 September 2021.

M. Makhortykh, 2017. “Framing the Holocaust online: Memory of the Babi Yar massacres on Wikipedia,” Studies in Russian, Eurasian and Central European New Media, number 18, pp. 67–94, and at https://www.digitalicons.org/issue18/framing-the-holocaust-online-memory-of-the-babi-yar-massacres/, accessed 16 September 2021.

M. Makhortykh, 2015. “Everything for the Lulz: Historical memes and World War II memory on Lurkomor’e,” Studies in Russian, Eurasian and Central European New Media, number 13, pp. 63–90, and at https://www.digitalicons.org/issue13/historical-memes-and-world-war-2-memory-on-lurkomore/, accessed 16 September 2021.

M. Makhortykh and J.M. González Aguilar, 2020. “Memory, politics and emotions: Internet memes and protests in Venezuela and Ukraine,” Continuum, volume 34, number 3, pp. 342–362.
doi: https://doi.org/10.1080/10304312.2020.1764782, accessed 16 September 2021.

M. Makhortykh, A. Urman, and R. Ulloa, 2020. “How search engines disseminate information about COVID-19 and why they should do better,” Harvard Kennedy School Misinformation Review (11 May).
doi: https://doi.org/10.37016/mr-2020-017, accessed 16 September 2021.

A. Menyhért, 2017. “Digital trauma processing in social media groups: Transgenerational Holocaust trauma on Facebook,” Hungarian Historical Review, volume 6, number 2, pp. 355–376.

A. Mintz, 2001. Popular culture and the shaping of Holocaust memory in America. Seattle: University of Washington Press.

B. Mittelstadt, 2016. “Auditing for transparency in content personalization systems,” International Journal of Communication, volume 10, pp. 4,991–5,002, and at https://ijoc.org/index.php/ijoc/article/view/6267, accessed 16 September 2021.

G. Nguyen, 2020. “Antisemitic memes in image results highlight vulnerabilities in search” (30 September), at https://searchengineland.com/antisemitic-memes-in-image-results-highlight-vulnerabilities-in-search-341267, accessed 21 January 2021.

S.U. Noble, 2018. Algorithms of oppression: How search engines reinforce racism. New York: NYU Press.

J. Otterbacher, J. Bates, and P. Clough, 2017. “Competent men and warm women: Gender stereotypes and backlash in image search results,” CHI ’17: Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, pp. 6,620–6,631.
doi: https://doi.org/10.1145/3025453.3025727, accessed 16 September 2021.

S. Ozalp, M.L. Williams, P. Burnap, H. Liu, and M. Mostafa, 2020. “Antisemitism on Twitter: Collective efficacy and the role of community organisations in challenging online hate speech,” Social Media + Society (18 June).
doi: https://doi.org/10.1177/2056305120916850, accessed 16 September 2021.

B. Pan, H. Hembrooke, T. Joachims, L. Lorigo, G. Gay, and L. Granka, 2007. “In Google we trust: Users’ decisions on rank, position, and relevance,” Journal of Computer-Mediated Communication, volume 12, number 3, pp. 801–823.
doi: https://doi.org/10.1111/j.1083-6101.2007.00351.x, accessed 16 September 2021.

F. Polletta and J. Callahan, 2017. “Deep stories, nostalgia narratives, and fake news: Storytelling in the Trump era,” American Journal of Cultural Sociology, volume 5, pp. 392–408.
doi: https://doi.org/10.1057/s41290-017-0037-7, accessed 16 September 2021.

C. Puschmann, 2019. “Beyond the bubble: Assessing the diversity of political search results,” Digital Journalism, volume 7, number 6, pp. 824–843.
doi: https://doi.org/10.1080/21670811.2018.1539626, accessed 16 September 2021.

A. Ramkumar and B. Poorna, 2014. “Ontology based semantic search: An introduction and a survey of current approaches,” Proceedings of the 2014 International Conference on Intelligent Computing Applications, pp. 372–376.
doi: https://doi.org/10.1109/ICICA.2014.82, accessed 16 September 2021.

R.E. Robertson, S. Jiang, K. Joseph, L. Friedland, D. Lazer, and C. Wilson, 2018. “Auditing partisan audience bias within Google search,” Proceedings of the ACM on Human-Computer Interaction, volume 2, article number 148, pp. 1–22.
doi: https://doi.org/10.1145/3274417, accessed 16 September 2021.

G.D. Rosenfeld, 2014. Hi Hitler! How the Nazi past is being normalized in contemporary culture. Cambridge: Cambridge University Press.
doi: https://doi.org/10.1017/CBO9781139696449, accessed 16 September 2021.

B.C. Sanchez, 2020. “Internet memes and desensitization,” Pathways: A Journal of Humanistic and Social Inquiry, volume 1, number 2, article number 5, at https://repository.upenn.edu/pathways_journal/vol1/iss2/5/, accessed 16 September 2021.

B. Schwartz, 2019. “Google search update aims to show more diverse results from different domain names” (6 June), at https://searchengineland.com/google-search-update-aims-to-show-more-diverse-results-from-different-domain-names-317934, accessed 21 January 2021.

Statcounter, 2020a. “Search engine market share worldwide,” at https://gs.statcounter.com/search-engine-market-share, accessed 21 January 2021.

Statcounter, 2020b. “Search engine market share: Russian Federation,” at https://gs.statcounter.com/search-engine-market-share/all/russian-federation, accessed 21 January 2021.

D. Stevick and Z. Gross, 2014. “Research in Holocaust education: Emerging themes and directions,” In: K. Fracapane and M. Haß (editors). Holocaust education in a global context. Paris: UNESCO, pp. 59–77, and at https://unesdoc.unesco.org/ark:/48223/pf0000225973, accessed 16 September 2021.

J. Subotić, 2020. “The appropriation of Holocaust memory in post-communist Eastern Europe,” at https://www.modernlanguagesopen.org/articles/10.3828/mlo.v0i0.315/print/, accessed 21 January 2021.

D. Sullivan, 2016. “Google’s top results for ‘Did the Holocaust happen’ now expunged of denial sites” (24 December), at https://searchengineland.com/google-holocaust-denial-site-gone-266353, accessed 21 January 2021.

M. Sweeney, 2013. “Not just a pretty (inter)face: A critical analysis of Microsoft’s ‘Ms. Dewey’,” Ph.D. dissertation, University of Illinois at Urbana-Champaign, at http://hdl.handle.net/2142/46617, accessed 16 September 2021.

M.C. Traub, T. Samar, J. van Ossenbruggen, J. He, A. de Vries, and L. Hardman, 2016. “Querylog-based assessment of retrievability bias in a large newspaper corpus,” JCDL ’16: Proceedings of the 16th ACM/IEEE-CS on Joint Conference on Digital Libraries, pp. 7–16.
doi: https://doi.org/10.1145/2910896.2910907, accessed 16 September 2021.

F. Trevisan, A. Hoskins, S. Oates, and D. Mahlouly, 2018. “The Google voter: Search engines and elections in the new media ecology,” Information, Communication & Society, volume 21, number 1, pp. 111–128.
doi: https://doi.org/10.1080/1369118X.2016.1261171, accessed 16 September 2021.

R. Ulloa, M. Makhortykh, and A. Urman, 2021. “Algorithm auditing at a large-scale: Insights from search engine audits,” arXiv:2106.05831 (10 June), at https://arxiv.org/abs/2106.05831, accessed 16 September 2021.

J. Unkel and M. Haim, 2021. “Googling politics: Parties, sources, and issue ownerships on Google in the 2017 German federal election campaign,” Social Science Computer Review, volume 39, number 5, pp. 844–861.
doi: https://doi.org/10.1177/0894439319881634, accessed 16 September 2021.

A. Urman, M. Makhortykh, and R. Ulloa, 2021a. “Auditing source diversity bias in video search results using virtual agents,” WWW ’21: Companion Proceedings of the Web Conference, pp. 232–236.
doi: https://doi.org/10.1145/3442442.3452306, accessed 16 September 2021.

A. Urman, M. Makhortykh, and R. Ulloa, 2021b. “The matter of chance: Auditing web search results related to the 2020 US presidential primary elections across six search engines,” Social Science Computer Review (28 April).
doi: https://doi.org/10.1177/08944393211006863, accessed 16 September 2021.

U.S. Holocaust Memorial Museum (USHMM), n.d. “Nazi camps,” at https://encyclopedia.ushmm.org/content/en/article/nazi-camps, accessed 21 January 2021.

A. Ware and C. Laoutides, 2019. “Myanmar’s ‘Rohingya’ conflict: Misconceptions and complexity,” Asian Affairs, volume 50, number 1, pp. 60–79.
doi: https://doi.org/10.1080/03068374.2019.1567102, accessed 16 September 2021.

M. Whine, 2008. “Expanding Holocaust denial and legislation against it,” Jewish Political Studies Review, volume 20, numbers 1–2, pp. 57–77.

J.E. Young, 1993. The texture of memory: Holocaust memorials and meaning. New Haven, Conn.: Yale University Press.

A. Zavadski and F. Toepfl, 2019. “Querying the Internet as a mnemonic practice: How search engines mediate four types of past events in Russia,” Media, Culture & Society, volume 41, number 1, pp. 21–37.
doi: https://doi.org/10.1177/0163443718764565, accessed 16 September 2021.

B. Zelizer, 2015. “Child in Warsaw ghetto, 1943,” In: J.E. Hill and V.R. Schwartz (editors). Getting the picture: The visual culture of the news. London: Routledge, pp. 66–68.
doi: https://doi.org/10.4324/9781003103547, accessed 16 September 2021.

B. Zelizer, 1999. “From the image of record to the image of memory: Holocaust photography,” In: B. Brennen and H. Hardt (editors). Picturing the past: Media, history, and photography. Urbana: University of Illinois Press, pp. 98–121.

K. Zheng, H. Wang, Z. Qi, J. Li, and H. Gao, 2017. “A survey of query result diversification,” Knowledge and Information Systems, volume 51, number 1, pp. 1–36.
doi: https://doi.org/10.1007/s10115-016-0990-4, accessed 16 September 2021.

W. Zhou, H. Li, and Q. Tian, 2017. “Recent advance in content-based image retrieval: A literature survey,” arXiv:1706.06064v2 (2 September), at https://arxiv.org/abs/1706.06064, accessed 21 January 2021.

Editorial history

Received 25 January 2021; revised 14 June 2021; accepted 14 September 2021.


Creative Commons License
This paper is licensed under a Creative Commons Attribution 4.0 International License.

Hey, Google, is this what the Holocaust looked like? Auditing algorithmic curation of visual historical content on Web search engines
by Mykola Makhortykh, Aleksandra Urman, and Roberto Ulloa.
First Monday, Volume 26, Number 10 - 4 October 2021
https://firstmonday.org/ojs/index.php/fm/article/download/11562/10494
doi: https://dx.doi.org/10.5210/fm.v26i10.11562