First Monday

Have your cake and feed it forward too: YouTube, oral cravings and the persistent question of media addiction by Steffen Kruger

Focusing on the case of recommendations on the video streaming platform YouTube, this article revisits questions of media addiction and addictive media that continue to trouble research in the field. Based on a close reading of Google/YouTube’s engineering papers, this paper argues that the platform’s recommender system — the machine learning system responsible for the personalisation and customisation of what videos users are offered — has been designed to function as a feeding tube and a precarious holding environment, thus corroborating widespread critiques about this system’s addictive — oral — strategies of user retention. Subsequently, this article discusses how the platform’s more recent promise of “responsible recommendations” has so far been articulated in the engineering papers. Specifically, this promise has taken the form of a fetishist structure that endorses responsibility but not at the expense of the time users spend watching. This structure is best captured in the proverbial Have your cake and eat it too.


Part I: Oral cravings — Theories of orality and addiction
Part II: Reading YouTube’s engineering literature




“YouTube has created a restaurant that serves us increasingly sugary, fatty foods, loading up our plates as soon as we are finished with the last meal.” This critique, put forth by media scholar Zeynep Tufekci in a 2018 opinion piece in the New York Times, was one of several levelled against the video platform in recent years. Published in a period of political upheaval in the U.S., these critiques took on a political dimension from the first. “YouTube the great radicalizer,” Tufekci (2018) titled her piece. “How YouTube drives people to the Internet’s darkest corners,” Jack Nicas (2018) wrote in the Wall Street Journal. And Paul Lewis (2018) informed Guardian readers about “how YouTube’s algorithm distorts truth.”

These articles and others along similar lines effectively triggered academic research and guided it toward YouTube’s recommender system — a system, based on machine learning provided by Google Brain and its open-source TensorFlow project, which makes personalised recommendations to each user about what video to watch next (Zhao, et al., 2019). There is now a vast palette of studies sounding out the platform’s potential for radicalising its users (e.g., Ledwich, et al., 2022; Hosseinmardi, et al., 2022; Kaiser and Rauchfleisch, 2019; Schmitt, et al., 2018; Lewis, 2018) — many of them in the footsteps of ex-Google engineer Guillaume Chaslot and his computer-aided experiments into the logics of recommendations on the platform (Lewis and McCormick, 2018). Already, surveys of this literature are being compiled to maintain an overview of the swiftly expanding field (Yesilada and Lewandowsky, 2022; Snow, 2021).

Whereas I have no doubt about the importance of these advances, in the present article, I will suspend the predominant political concerns of the existing studies to unpack a related — and similarly relevant — dimension underlying these ‘platformed’ politics. Specifically, in following Tufekci’s culinary metaphor, I seek to address persistent questions of media addiction and addictive media. I do so from a psychoanalytically informed perspective, revisiting the time-worn notion of “the oral” to turn it into a normative theoretical tool with which to do a close reading of the engineering literature on the YouTube recommender system [1].

YouTube remains the second most used digital media platform globally, with 2.6 billion unique users, being the most visited internet site after Google Search [2]. This makes it a paramount cultural influence on people around the world who, regardless of the content they consume there, all interact with and in the forms that the platform affords them. Such forms, I argue, hold a powerful socialising sway. In this respect, the engineering literature, and particularly that on YouTube’s recommender system, is of central interest for researchers inquiring into the ways social media platforms unfold their subjectifying and socialising powers, in that it is here, in the statements on the conceptualisation, design features and programming of those interactions, that the specific modes of ordering and structuring the social and relational find articulation [3]. While Taina Bucher, in her study on Facebook (2021), rightly points to the caution that needs to be administered when consulting patents and engineering papers for inquiries into the digital programming of sociality, these documents hold great explanatory power for how people are shaped by the design of digital applications. When Matamoros-Fernández and colleagues (2021) emphasise the importance of recognizing “platform algorithms as complex socio‐technical systems” [4], this describes well the approach taken in this article.

Along these lines, then, I show firstly how YouTube is indeed designed as a feeding tube that promotes addictive behaviour, with its recommender system intended to establish holding patterns that aim for watching cycles of maximum length. However, while this finding seems almost unremarkable in light of the deluge of critiques against the platform, the theory of the oral offers an anthropological basis from which to unfold this critique in a more concrete and constructive way. Hence, from the perspective of the oral, what becomes perceivable as lacking in the programming and design of the system is a strategy of ‘weaning’ that might help users to step out of the dependency that the service affords them.

Now, equating YouTube with a caregiver and thus putting users in the position of children rightly provokes unease and merits resistance. However, not only does psychoanalysis rightly insist on the human propensity toward immature states of fearing and desiring as an anthropological truth; YouTube’s strategy of identifying user engagement patterns also amounts to a form of technological interpellation (Rouvroy and Stiegler, 2014) that already puts users in an infantile position. Furthermore, whereas the demand for an exit strategy of weaning might seem naïve in light of a commercial platform whose profit model is based on maximising user retention, more recently, YouTube itself has announced changes that point in this direction. Responding to both political and parental concerns, it has promised to make “responsible recommendations” (Goodrow, 2021) its new priority and turn its preoccupation with users’ “watchtime” into a concern for “valued watchtime” (Goodrow, 2021). However, also in this respect, a close reading of the engineering papers indicates that the platform’s promises have so far translated into markedly contradictory design arrangements displaying a fetishist logic of denial (e.g., Benvenuto, 2016). This logic presumes that both the quality of the users’ experience and their overall watch time can be increased over a “potentially infinite time horizon,” as one of the papers puts it [5].

Pacifying the active user

When recent critiques of YouTube suggest that the platform’s automated recommendations “drive” people to the Internet’s dark corners (Nicas, 2018), the evoked images receive much of their immediate legibility from their family relations with other well-established tropes. These tropes imagine media users as being directly ‘latched on’ — or ‘glued to’ — their screens [6]. Tufekci’s (2018) metaphor of the fast-food restaurant, “loading up our plates as soon as we are finished with the last meal,” is but one of the more poignant of these images, bringing to the fore the oral dimension involved in such fantasies of media attachment. More drastic depictions can be found in filmic variations of media horror, such as in Poltergeist (Hooper, 1982) or Videodrome (Cronenberg, 1983), where people are literally sucked into their devices. In all cases, an oral dimension is central in that media are held to literally flow into and fill up people who surrender all agency to them.

The accusation of YouTube driving processes of radicalisation through its automated recommendations seems to have been significantly successful in instigating responses, both public and academic, because it has at least partly tapped into this reservoir of media-dystopian fantasies. In this respect, the focus on online radicalisation in the research output has tended to cover over to a degree the underlying — media-archetypical — images of people being unthinkingly nursed on numbingly oral (sugary, fatty) media diets. These imaginaries, in turn, are closely connected with the theme of (media) addiction.

Ever since Sigmund Freud (1905), in his “Three essays on the theory of sexuality”, identified orality as not only a psychosexual phase in infant development but a zone that would maintain its erogenous potential throughout human life, the area of the mouth, including lips, tongue, teeth, oral cavity and pharynx, with its functions of taking in, kissing, licking, tasting, biting, chewing, swallowing etc., has been linked to notions of craving and dependency (Sabshin, 1995). Indeed, Freud, who, despite his wish to elevate psychoanalysis to a hard science, made a point of choosing terms that were close to people’s everyday experience, created the label of “the oral” with at least one eye on how the notion had already been in colloquial use. Thus, along parallel lines, ever since media have been under suspicion of bearing an addictive, dependency-creating potential, oral notions of media use have been circulating to characterise worrying forms of media consumption [7]. In this article, I want to follow these hunches and unpack the psychoanalytic category of the oral as a tool with which to assess the relationships and forms of attachments that YouTube as a specific case of digital media affords its users.

From within media studies, however, phenomena that point to the plausibility of media addictions have proven extremely difficult to bring into view without the ensuing discussions swiftly gravitating toward moral (media) panics (Marwick, 2008). For the case at hand, what is clearly present in the circulating images of YouTube ‘injecting extremism’ are traces of old beliefs in overwhelmingly strong media effects — stimulus-response, or hypodermic needle models — to which people tend to become vulnerable in the context of such panics, despite these models having been proven wrong repeatedly (Orben, 2020). Along similar lines, Mariek Vanden Abeele and Victoria Mohr (2021), in a discourse analysis of media addictions, rightly refer to a long line of studies that identify “fear mongering about pathological media use” as a “common ingredient” in media panics, with the addiction label tending to lead to an “overpathologization of everyday life” [8].

Countering such panics, research into YouTube indicates that its potential to radicalise is limited by users’ prior inclinations and political orientations (Faddoul, et al., 2020), with searches and channel subscriptions often being more decisive for the political content people watch (e.g., Papadamou, et al., 2022; Chen, Nyhan, et al., 2023; Kaushal, et al., 2016). In this light, what seems to become reproduced in the unfolding dynamic between current research on YouTube and the more popular critiques in news media is a ritual in which academic research takes over the function of reining in and delimiting the socialising powers that news media frequently attribute to new media. And whereas there is little doubt that the findings of YouTube’s limited political efficacy are plausible and solid, the ritual between news media and academia, in which this research has been taking part, has frequently shown tendencies of researchers overstating the rationality and non-impressionability of media users. Even media studies traditions with a strong humanities orientation, like that of the Birmingham School, have tended to over-emphasise such rationality when countering deep-seated, folkloristic beliefs in strong media effects. Indeed, ever since the resounding critiques of the Frankfurt School (Horkheimer and Adorno, 1994) by Stuart Hall (1980) and John Fiske (1989), any attempt at taking seriously the idea of a pacified and passive media user has been nearly impossible to undertake [9]. Against this grain, and in line with other recent attempts at theorising regressive modes of “audiencing” (e.g., Stanfill, 2020), this article aims to show how the idea of user pacification is being facilitated and realised by YouTube.

The notion of user pacification seems diametrically opposed to the established ideology of user participation and participatory online culture in which everybody is invited to take part in the production of cultural goods and for which the “You” in “YouTube” has served as a token. However, as Bernhard Rieder and colleagues (2020), in a quantified mapping of YouTube, indicate, this participatory culture might be far less established on the platform than expected. Rather, YouTube increasingly approximates the centrist logic of traditional television, with a few — mainly Western and American — channels accumulating viewer and follower numbers that far exceed standard Pareto effects. Furthermore, research by Zoe Glatt (2022), Sarah Banet-Weiser (Glatt and Banet-Weiser, 2021) and Sophie Bishop (2017) has shown how high the stakes have become for content creators on YouTube to become visible — watched, followed and monetised — so that there is an increasingly clear separation perceivable between production and consumption. While the present study has its focus firmly on the consumption side, it seems plausible to assume that, for the vast majority of YouTube users, this is so, too. In this respect, it is hardly surprising that YouTube has become television’s heir as the central representative of cultural fears of media addiction. This article seeks to determine, on the programming and design side, how and to what extent these fears might be justified.

In line with recent inquiries into addictive media (Bhargava and Velasquez, 2021), I thus hold that there is a growing need for media studies to reassess its knowledge of and stance toward dependency and addiction. Bringing together critical understandings of (discourses of) media addiction (Vanden Abeele and Mohr, 2021; Marwick, 2008) with arguments about the commercial cultivation of addictive behaviour (Courtwright, 2019; Alter, 2017), I hold that, despite the dangers of pathologizing media use, YouTube must be seen as an example of such cultivation, with its recommender system affording the compensation of widespread relational deficiencies. Hence, while my punning play on YouTube as a ‘feeding tube’ admittedly risks moving my argument into ‘moral panicky’ terrain again, the colloquialism is still justified in that it underlines my claim that the programming of the platform’s recommender system is indeed geared toward this feeding function. And whereas it would be naïve to take this to mean that all users automatically fall prey to this function — being human in the social world is far too complex for such a uniform response pattern — it is my claim that the system’s overall orientation toward feeding tacitly invites people with relational vulnerabilities to precariously prolonged spells of self-abandonment. Furthermore, in YouTube’s recent response to the critique levelled against it, the engineering feats that can be aligned with its promise of “responsible recommendations” show the platform’s continued struggle between private and public interests and between creating “value” for its users and for itself (cf., Burgess and Green, 2018).

In what follows, I take readers through a brief introduction to theories of the oral, synthesising these into a theoretical tool with which to assess contemporary digital media technologies and their specific strategies of user habituation (cf., Chun, 2016). From there, I cast spotlights on related theories of addictions — from psychoanalytic, via neuroscientific, to media-related ones — so as to set the discussion of media addiction onto a solid, relational foundation. It is upon this foundation that my critical hermeneutic reading of YouTube’s engineering papers is based, with the three main characteristics of orality — feeding, holding and stimulating — serving as reading guides. In this respect, the aspect of stimulating becomes decisive for the assessment of the recommender system design. Clinical research into infant development has repeatedly corroborated the central importance of modes of stimulation that point beyond the “holding environment” (Winnicott, 1971a; 1971b; 2016) that the caregiver provides for the child. Such stimulation continues to be important for subject formation throughout the human life-course. It is this importance, I hold, which re-emerges in YouTube’s recent struggles to operationalise “responsibility” in recommending videos. And while, in this respect, infant research has long since established the paramount importance that lies in the failure of creating a perfect holding environment, digital recommender systems have yet to integrate this insight into their programming and design.



Part I: Oral cravings — Theories of orality and addiction

In his seminal “Three essays on the theory of sexuality”, Freud (1905) defined “the oral” as a distinct phase in the pregenital, psychosexual development of human beings. It is through a process of “anaclisis,” i.e., of leaning onto life-maintaining, instinctual-somatic forms of functioning, that oral practices and the oral zone of the mouth start taking on their psychosexual function (Laplanche, 1999). Even infants at their mother’s breasts (or their caregiver’s bottle) develop a sense of pleasure from the interactions tied to the intake of nourishment. This pleasure goes well beyond the sheer nourishing function and is non-conscious and sexual in a non-mature way. Psychosexuality in this respect means a form of ‘proto-sexuality’ that is formative of the whole of psychic and social development of each human as a relational being.

In passing, Freud (1905) offers the mundane habit of “thumb-sucking” amongst small children to make this point. When he writes that “the sexual activity, detached from the nutritive activity, has substituted for the extraneous object one situated in the subject’s own body” [10], what one can find in these lines is a complex psychosocial and relational dynamic. In this dynamic, the yearning after another person — the primary caregiver, whose practice of care has laid the blueprint for all further attachments in the child — becomes compensated for and enacted by the child itself in an autoerotic mode: the child’s thumb compensates for the absence of mother’s breast. Children sucking on their thumbs sense that this activity is neither fully nourishing nor fulfilling, and yet, they are doing it anyway to calm and abate these longings and cravings. Psychosexual developmental phases do not ever vanish completely, Freud states, but, depending upon character development, remain salient in people’s lives.

Soon after Freud’s foundational statements, the oral became integrated into wider psychoanalytic theorising of character formation. Karl Abraham (1916) in particular refined the concept by subdividing it into more nuanced phases in which aggressive impulses (biting, chewing) are also reflected. The oral soon became relevant in early theorisations of pathological dependencies, as well. However, since the concept was used here to corroborate a presumed inherent hedonism in people with substance addictions, these early applications of the concept proved of little help because they tended to withhold empathy from patients suffering from dependency issues (Sabshin, 1995; Khantzian, 2003). In subsequent theoretical developments, too, the oral has decreased in importance. Already in 1994, the clinician Stephen Johnson observed that:

In recent psychoanalytic theorising, we emphasise more the interpersonal needs for relationship and mother-infant attunement. All the developmental research is consistent in emphasising these themes, making the ‘oral’ label a bit out-of-date as far as the regional specificity it implies, but it is still an appropriate label if it is understood metaphorically. [11]

In tune with this assessment, the aim of this subchapter is to expand the definition of the oral into scenes of interpersonal relationships to prepare the ground for the concept’s application in human-machine interactions. Hence, with “interpersonal needs,” Johnson gestured broadly to the post-Freudian paradigms of relational and intersubjective psychoanalysis prominently represented by Donald Winnicott (1971a) and Jessica Benjamin (e.g., 1988), while “mother-infant attunement” refers to Daniel Stern’s (e.g., 1985) advances in early human development, with all these approaches continuing to be vastly influential in the scientific understanding of human subject formation. When Winnicott (1971b), in a well-known response to Jacques Lacan’s theory of the “mirror stage” (2006) holds that it is during the nursing situation, and in the primary caregiver’s eyes, that the small child finds its first mirror, with the child being held by both the caregiver’s loving gaze and, concretely, in their arms, this expands the ‘scene’ of the oral in ways that were only implicitly present in Freud’s writings. Specifically, the oral becomes expanded toward the interpersonal qualities of feeding/being fed and holding/being held. Hence, beyond the concrete intake of nourishment, what has proven necessary for healthy mental growth — and what Winnicott sees as implied in the oral phase — is a capacity to hold the child and for the child to be held. This capacity equates to mental nourishment — literally an inspiration to live that the caregiver needs to instil in the child by holding it in concretely physical and paradoxically no less concrete metaphorical ways.

Digital holding environments and the question of stimulation

These considerations seem to have strayed far from concerns of digital media. However, consulting research on digital gambling, for example, a field in which addiction is an ever-present spectre, the importance of an expanded notion of the oral comes readily to the fore. Natasha Dow-Schüll’s (2012; 2005) ethnography of machine gambling in Las Vegas is a strong case in point. Dow-Schüll describes in detail how gambling machines are being designed in ways that aim for the creation of feeding and holding patterns for grown-ups, with the aim of instilling a self-abandoned drowse that is uncannily reminiscent of the (post-) feeding state of neonates. Furthermore, she shows how the gambling industry seeks to become increasingly indistinguishable from the (video) gaming industry. What is thus being instilled in gamblers (and increasingly in gamers as well) is the desire to be attached to a dependable object — intensely and for ever longer stretches. It is particularly with respect to the phenomenon of self-abandonment, so typical of relational constellations with a strong oral character, that a third aspect of the oral shows its importance, specifically, that of stimulating children to want to venture out of the dyadic holding environment (e.g., Benjamin, 1988).

This function of stimulation in the infant-caregiver setting has been theorised in various forms, with Heinz Kohut’s (2009) term “optimal frustration” and Donald Winnicott’s (2016) claim of a necessary process of “disillusionment” as two central statements. What both point to is the importance for the caregiver to add to the holding environment an opening, frequently in the form of imperfections and small failures, so that the child is made to orient itself elsewhere. This, I argue, is particularly relevant for concerns about media dependencies. As mentioned above, notions of value and responsibility, should they not remain empty phrases, need to address the question of how to point users/audiences beyond the flow (Williams) of watchtime and the holding patterns established by exacting calculations of personal preferences.

Hence, it is the kind of stimulation that points human beings beyond their ‘comfort zones’ and makes them want to venture out into the world that offers the decisive dimension in an updated, relational conception of the oral. This updated notion thus falls into the partly overlapping functions of feeding, holding and stimulating. Such a conception is not only in line with central paradigms in infant research (Hollway, 2015) and attachment theory (Bowlby, 2005) but also a main pillar in critical traditions of social philosophy (Honneth and Whitebook, 2019). In the negotiations between feeding/being fed, holding/being held and stimulating/being stimulated, I argue, can be found a productively relational field for understanding addiction and dependency. I now turn to this field.

The theory of addiction in light of the theory of orality

Whereas a narrow, literal understanding of the oral has proven unproductive for theory-making in the field of addiction research, already a cursory overview of this field shows how the expanded notion of the oral presented here ‘fits the bill.’ What emerges from this literature is a compensatory logic of addiction where substance dependencies, but also eating disorders and other compulsive phenomena are seen as solutions — albeit belated and precarious ones — to relational conflicts and deficiencies. Simply put, people attempt to create compensatory situations in which they temporarily escape from inner tensions (Savitt, 1963) and arrange to be fed and held in ways symptomatic of the lack of nourishment and care experienced at earlier points in their lives.

Particularly Edward Khantzian (2003) captured this trajectory of addiction in the label of “self-regulation disorder.” However, that his emphasis on the “self” implies a ‘relation-regulation disorder’ becomes clear when he states that: “Any theory or explanation of addiction that does not address what it is in the workings of the mind (i.e., the inner psychological terrain) and a person to predispose and cause them to repeatedly relapse to addictive drugs is incomplete” [12]. Furthermore, when Khantzian (2003), in coining the term “drug-of-choice” [13] observes that “there is a significant degree of psychopharmacologic specificity in the appeal of addictive drugs” [14], this refers to the peculiar yet specific ways in which relational yearnings are displaced onto objects, rituals and habits, where they find their precarious (re)solutions.

The view of addiction-as-solution has more recently received a fresh impetus by the object-relational psychoanalyst Alistair Sweet (2013) who, in drawing on Wilfred Bion (1962), suggests the term “anti-container” to capture the evacuating and encasing logic enacted in addictions. As opposed to what Bion described as a containing function, i.e., a function in which painful feelings are transferred to another person or object to become symbolised and worked through there, an anti-container merely stores, absorbs and numbs these feelings. YouTube, I argue, has fashioned itself as such an anti-container; its recommender system has been exclusively oriented toward the aspects of feeding and holding, storing and absorbing. In my reading of the engineering literature, I have found no plausible provisions being made for stimulating people in ways that would make it attractive for them to exit the service’s feeding and holding cycles.

Displaceable cravings

The psychiatrist Hedy Kober (2014) has corroborated the addiction-as-solution model from a neuroscientific perspective while also pointing to the pathways that have transferred psychoanalytic orientations into a more cognitive-behaviouristic paradigm. Neuroscientific research has also corroborated the general displaceability of affective charges from one object to another. For example, when Helen Fisher (2016) finds in functional magnetic resonance imaging experiments that people who miss an ex-lover show significant activity in exactly those brain centres that are also firing in cocaine addictions (when on withdrawal), this is a strong indication for how compulsive pining and craving can shift from object to object. The pain caused by missing a person, it shows, can indeed be calmed by taking drugs, although this amounts to a “hijack[ing of] the emotional brain” [15].

This paradigm of the displaceability of craving builds a bridge back to the digital platform economy. In line with Fisher’s (2016) findings, studies indicate that also media can become such stand-ins for people’s relational longings (Scala, et al., 2017). And even though it is important to keep in mind that such displacements are always personal and subjective, cultural critics have a strong point when they hold that there has been a long-term trend toward the commercial cultivation of addictive and habit-forming behaviours (Courtwright, 2019). From the design and marketing of alcoholic beverages and cigarettes to gambling and the food industry, there is heavy competition between products and services that seek to insert themselves not just into people’s lives but also their relational vulnerabilities. Digital media are only a further step in this cultivation process (cf., Chun, 2016).

From ‘the Tube’ to YouTube

As concerns relational vulnerabilities, a historical perspective suggests that YouTube inherits its main compensatory function from television. As McIlwraith and colleagues (1991) show in their survey of television addiction studies of the 1980s, people mostly attributed such a media addiction to distant others along the lines of a “third person effect” [16]. And yet, the 10–12 percent of study participants who in McIlwraith et al.’s overview admitted to addiction-like TV-habits themselves described significantly oral and anti-containing routines. These participants claimed to use television along the lines indicated by the established addiction research paradigm, specifically, as an escape from overstimulation, negative emotions and life circumstances as well as a precarious way of being fed and held. Hence, along the lines of Dow-Schüll’s (2012) observations with respect to gambling, McIlwraith and colleagues (1991) write that, “Not only does television relax people, it does so quickly. Within moments of sitting or lying down and pushing the power button, most viewers will feel more relaxed than they did before” [17]. Furthermore, in line with other addictions, heavy viewers reported decreasing degrees of relaxation and a need to prolong watching time to reach a desired point of relaxation again. “Viewing begets viewing,” the authors state, “One must keep watching in order to feel relaxed” [18].



Part II: Reading YouTube’s engineering literature

McIlwraith, et al.’s (1991) observation that it is never the specific television programmes that drive television addiction, but rather the overall form of the medium itself [19], takes on renewed importance in the formal analysis of YouTube. While television programmers had already devised fine-grained audience analysis models with which they tried to increase people’s time spent watching their channels, the increase in the quantity and quality of data available, together with automated modes of data analysis, has drastically expanded the possibilities of digital platforms to create personalised holding patterns. In what follows, I unpack the distinctly oral form of anti-containment that YouTube as a medium has taken.

The media anthropologist Nick Seaver (2019) locates the origins of recommender systems in the mid-1990s, when they were developed “as tools to help users manage increasingly large catalogs of information” [20]. The main metric that dominated the system development in the 1990s was the “Root Mean Square Error (RMSE),” which allowed predictions about how users would rate an item by continuously comparing actual ratings with the system’s prior predictions [21]. While this trial-and-error approach of reinforcement learning (Williams, 1992) has remained a central tool in AI for recommender systems, the importance of ratings has waned. Instead, tracking user activities on and across various platforms showed that there is a notable difference between what people rate highly (if they submit ratings at all), and what people actually spend time engaging with. Hence, what the analysis of “dwell time,” i.e., the time users spend engaging with items as a form of “implicit feedback,” has made possible is to determine what people actually do on a platform (Yi, et al., 2014). In line with the definition of addiction in this article, it is mainly features related to “dwell time” that make it possible for YouTube to offer diets to individual users that are customised to be soothing and comforting — sugary and fatty, as Tufekci (2018) put it — and thus compensatory and precarious. Yet, it is not first and foremost at the level of the individual user that engagement data becomes relevant for automated recommendations. Rather, at a first level of analysis, what the availability of this data makes possible are processes of “collaborative filtering” (Koren, et al., 2009), i.e., the establishment of similarities across larger groups of users, which facilitate the creation of neighbourhoods of likeminded people on a platform with reasonably similar tastes (cf., Chun, 2016).
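For readers less familiar with these engineering terms, the following is a minimal sketch, in Python, of two of the notions just introduced: the RMSE as a measure of how far predicted ratings lie from actual ones, and a simple similarity measure between users' dwell times of the kind that neighbourhood-based collaborative filtering may rely on. All names and numbers are hypothetical and chosen for illustration only; cosine similarity is an assumption made here, not a method confirmed by the engineering papers.

```python
import math


def rmse(predicted, actual):
    """Root mean square error between predicted and actual ratings."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need two non-empty sequences of equal length")
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def cosine_similarity(u, v):
    """Cosine of the angle between two users' engagement vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user without any engagement resembles nobody
    return dot / (norm_u * norm_v)


# Hypothetical ratings on a 1-5 scale (explicit feedback).
predicted_ratings = [4.0, 3.0, 5.0, 2.0]
actual_ratings = [5.0, 3.0, 4.0, 2.0]
error = rmse(predicted_ratings, actual_ratings)

# Hypothetical dwell times in seconds on the same three videos
# (implicit feedback): users a and b watch alike, user c does not.
dwell = {
    "user_a": [600.0, 0.0, 30.0],
    "user_b": [540.0, 10.0, 0.0],
    "user_c": [0.0, 480.0, 500.0],
}
sim_ab = cosine_similarity(dwell["user_a"], dwell["user_b"])
sim_ac = cosine_similarity(dwell["user_a"], dwell["user_c"])
```

In this toy setting, users a and b would fall into the same neighbourhood because their watching patterns point in nearly the same direction, whereas user c would not, regardless of any ratings they might have submitted.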

“Deep neural networks for YouTube recommendations”

In the engineering literature the main point of reference for all further engineering work on YouTube’s recommender system as well as for the critique levelled against YouTube has been Covington, et al.’s 2016 article “Deep neural networks for YouTube recommendations” which announced “[d]ramatic performance improvements brought by deep learning” at an “industrial” scale [22]. For the present article, the paper is additionally useful in that it outlines the overall structure and main building blocks of YouTube’s AI recommender system.

Based on “the classic two-stage information retrieval model” [23], which is the design standard for large-scale recommender systems [24], YouTube’s system is divided into the tasks of (a) candidate generation and (b) the ranking of these candidate videos into a feed-forward hierarchical list of recommendations personalised for each user. Both task performances are based on neural-networked artificial intelligence. Candidate generation consists of the system going through vast landscapes of YouTube’s ever-growing video catalogue, narrowing it down to a selection of hundreds of videos that it then feeds into the ranking network. This ranking task, in turn, serves to create a hierarchy of the generated items which is slimmed down to mere dozens of videos that become ordered according to a regressional probability of video watches [25]. On the side of candidate generation, write the authors, the network “only provides broad personalisation via collaborative filtering,” with the similarity of users being “expressed in terms of coarse features such as IDs of video watches, search query tokens and demographics” [26]. By contrast, on the ranking side, “presenting a few ‘best’ recommendations in a list requires a fine-level representation to distinguish relative importance among candidates with high recall” [27]. This is done, the authors write, “by assigning a score to each video according to a desired objective function using a rich set of features describing the video and user” [28]. Harking back to Tufekci’s (2018) restaurant metaphor, candidate generation can thus be imagined as a ‘canteen’ where what is prepared will be halfway palatable for a large number of people. At the level of candidate ranking, in turn, each user, at least in theory, gets their own 12-course (and beyond) meal, catered specifically to their personal tastes.
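The division of labour between the two stages can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not YouTube's implementation: the coarse stage here simply counts co-watched videos instead of using a neural network, and the per-video watch-time predictions, like all names and values, are hypothetical.

```python
from collections import Counter


def generate_candidates(user_history, all_histories, limit=3):
    """Coarse stage: collect videos watched by users who share a watched video."""
    seen = set(user_history)
    counts = Counter()
    for other in all_histories:
        if seen & set(other):  # the other user overlaps with this user
            for video in other:
                if video not in seen:
                    counts[video] += 1
    return [video for video, _ in counts.most_common(limit)]


def rank_candidates(candidates, predicted_watch_minutes):
    """Fine stage: order the few candidates by a per-user score."""
    return sorted(
        candidates,
        key=lambda video: predicted_watch_minutes.get(video, 0.0),
        reverse=True,
    )


# Hypothetical watch histories and per-user predictions (illustrative only).
histories = [["a", "b", "c"], ["a", "c", "d"], ["e"]]
candidates = generate_candidates(["a"], histories)
ranked = rank_candidates(candidates, {"b": 2.0, "c": 7.5, "d": 4.0})
print(ranked)
```

The point of the design is economy: the cheap first stage shrinks the catalogue so that the expensive, feature-rich scoring only has to run on a short list.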

But what are the “objective function[s]” and “rich set[s] of features” describing videos and users that Covington, et al. (2016) refer to? The “desired objective function” seems easy enough to guess, specifically, that the user indeed clicks on and watches the videos offered to them by the recommender. However, this function, too, is quickly complicated even by general considerations such as how to treat videos containing nudity or violence, against which YouTube’s engineers were quick to develop filtering mechanisms (Goodrow, 2021; YouTube, 2019a, 2019b). The “rich set of features,” in turn, is still harder to come by, but there are hints strewn across various texts connecting to Covington, et al. (Ma, et al., 2020; Zhao, et al., 2019; Beutel, et al., 2018; Chen, et al., 2023). Hence, even the “coarse” collaborative features of the ‘canteen phase’ entail significant refinements. Spatial information, for example, as to where users are in the world holds cues about what to offer. Likewise, information on whether users arrive at their YouTube starting page or click on a video embedded elsewhere, and temporal cues such as how long it has been since a user last searched for, clicked on, or indeed watched a video can offer relevant hints about what might be watched next. This information, paired with knowledge about collaborative watch histories, the kind of device with which a user is accessing YouTube, their logged-in state, plus simple demographic information like age and gender, already makes for a reasonably well-rounded impression.

In terms of temporality, a substantial gain in accuracy in recommendations is achieved through the system’s time sensitivity as concerns new content that continuously becomes available on the platform. While “users prefer fresh content, though not at the expense of relevance [...,] Machine learning systems often exhibit an implicit bias towards the past because they are trained to predict future behavior from historical examples,” wrote Covington, et al. [29]. Countering this bias, an “example age” feature in the candidate generator improves user clicks and video watches by forecasting “the future peak day of viral videos” (Jiang, et al., 2014).
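How an “example age” feature might be computed can be sketched as follows. The sketch assumes that the feature is the time between a logged interaction and the end of the training window, and that it is set to zero when recommendations are served, so that the model predicts as if standing at the very end of that window. Function names, dates and units are hypothetical.

```python
from datetime import datetime


def example_age_days(logged_at, training_window_end):
    """Age of a training example, in days, relative to the end of the window."""
    return (training_window_end - logged_at).total_seconds() / 86400.0


def serving_age_days():
    """At serving time the feature is fixed at zero (assumption, see lead-in)."""
    return 0.0


# Hypothetical training window and logged watches (illustrative only).
window_end = datetime(2016, 1, 10)
old_watch = example_age_days(datetime(2016, 1, 1), window_end)
recent_watch = example_age_days(datetime(2016, 1, 9, 12), window_end)
print(old_watch, recent_watch, serving_age_days())
```

Because every historical example carries its own age, the model can learn how interest in a video rises and falls over time instead of averaging its popularity across the whole training period.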

Turning to the ranking procedure, Covington, et al. (2016) boast that “we have access to many more features describing the video and the user’s relationship to the video” [30]. However, it is on this point of individual user relationships that the article becomes markedly vague. This is in line with Google’s overall approach. Time and again, Google/Alphabet has asserted that such features are extremely hard to describe to laypeople. Yet, already Steven Levy (2011), in his study on Google, remarks that secrecy about the programming of the PageRank algorithm has been at odds with Google’s academic orientation from the start. More recently, Rieder (2020) has similarly observed that Google’s patent applications are significantly more detailed than Larry Page and Sergey Brin’s scientific publications on the algorithm [31].

Beyond the protection of corporate secrets, however, determining how neural networks de facto work in the construction and analysis of features has indeed proven difficult. Accordingly, in a survey of AI-driven recommender systems, Shuai Zhang and colleagues (2019) emphasise the importance of transparent recommendations with deep learning “to make explainable predictions to users, allowing them to understand the factors behind the network’s recommendations” [32]. At the same time, however, when the authors bemoan that existing works “do not utilize [the] various forms of side information in a comprehensive manner to take the full advantages of the available data” [33], they counter their call for transparency with one for a radical expansion of data mining. This expansion refers not merely to “contextual information” and its pairing with “implicit feedback” from the users’ histories on the platform. Rather, they suggest further investigating “users’ footprints (e.g., tweets or Facebook posts) from social media and the physical world (e.g., Internet of things),” arguing that “the deep learning method is a desirable and powerful tool for integrating these additional pieces of information” [34]. In this respect, it does not seem overly far-fetched for the case of YouTube to expect data from other Google services, for example from Gmail and Drive (cf., Qin, et al., 2020), to be fed into YouTube recommendations, so that people’s state of employment, the field of their occupation as well as interests and hobbies etc., might also weigh in on personal recommendations. In any case, prior studies with a focus on the features guiding recommendations have shown that “historical features”, i.e., features that pertain to “long-time accumulated user behavior” [35], are by far the most important ones for predictive uses. Studying the predictability of users clicking on ads on Facebook, Xinran He and colleagues (2014) found that “The top-10 features ordered by importance are all historical features” [36].

Sequencing user actions: Building oral holding patterns

Expanding on this focus on user histories, Covington, et al. (2016) see the main challenge in feature engineering to lie “in representing a temporal sequence of user actions and how these actions relate to the video impression being scored” [37]. This statement might seem relatively unremarkable, and yet, what is formulated here points directly to the construction of those addiction-like holding patterns at the centre of this article. Specifically, what is suggested as the target outcome of an automated recommender system is not so much a series of snapshots of user actions and video impressions. Such snapshots would merely be in line with the common knowledge that, in dynamic environments, it is worth “retraining [the system] on a daily basis” [38]. Rather, the key to the engineering task as Covington, et al. (2016) define it is to identify those forms of interaction that are neither the properties of the users alone, nor those of the videos to be ranked, but of both users and the system together (cf., Zhang, et al., 2019). It is the routinised interplays between users and video items over time.

Clarifying the relationship of people to their environments, the psychosocial theorist Alfred Lorenzer (2022) defines specific “forms of interactions” as the relay stations between a socialised subject and this subject’s object world, explaining that: “The scenic unity of the sensorimotor experiential patterns can be illustrated through a banal analogy: Just as the sound of the mouse entails the cat’s turn of the head, so the stimulus always finds its reaction” [39]. It is this conception of mutual entailment — of routine patterns of interaction formed by the interplay of relevant actors — that is the overall aim in designing YouTube’s recommender system. Hence, when Covington, et al. (2016) observe that “continuous features describing past user actions on related items are particularly powerful because they generalize well across disparate items,” this indicates the platform’s interest in creating such relational entailments through what the authors call “responsive recommendation” [40].

Aiming to identify the interlocking forms of interaction between user and platform that can no longer be clearly located on the user’s or the videos’ side, what the recommender system aims at is a form of rhythmic attunement that already Dow-Schüll (2012) found in the design of gambling machines. On this basis, what Covington, et al. (2016) intend for their system to achieve is a level of responsiveness that makes it possible to “model expected watchtime” [41] for each of its users. This is a remarkable aim, because what is factored into it is the awareness that not all forms of interaction lead to the exploitation of the same potential for maximising results. Rather, each relationship — although, at the candidate generation level it is embedded in homophilic neighbourhoods — is being negotiated individually to arrive at that rhythmic pattern of interaction between user and platform which can be maximally exploited. The system, in other words, by aiming to identify the exact sequential pattern of movement and interaction for each user in relation to the platform, is engineered to find the optimal point of “respons-ability” with each user so that they can be kept in feeding and holding patterns for a maximum length of time.

It is at this point that YouTube can justifiably be equated with a feeding tube. For, if addiction is rooted in subjective forms of relational cravings that are compensated through turning to available and dependable objects (be they legitimate or not), then YouTube clearly fashions itself as such an object. Creating fine-grained models of interaction patterns between users and platform, it seeks to be optimally available and dependable. This does by no means imply that everybody who uses the service will turn into an addict; rather, it means that, while everybody is afforded to maximally depend on YouTube, those who are vulnerable to precarious attachments are afforded to indulge this vulnerability. In these cases, YouTube has the potential to function as an anti-container as it seeks to intimately envelope people to feed and hold them, without, however, making provisions for these people maturing beyond its service.

In the years after the publication of Covington, et al.’s (2016) article, the idea of “respons-ability” through sequencing user interactions has been pursued further and is now becoming a major paradigm. Shuai Zhang, et al. (2019), for example, highlight the advantage of a “sequence-aware recommendation model,” emphasising that modelling “item-item relationships within a user’s context history [... is] crucial to understand fine-grained relationships between individual item pairs” [42]. Yongfeng Zhang, et al. (2017) introduce a design feature that makes it possible “to obtain the joint representations for users and items” [43]. Ultimately, just how central considerations of interaction sequences are becoming can be seen from Zhen Qin and colleagues (2020). Here, the authors, in presenting their development of a novel “Multitask Mixture of Sequential Experts” model (MoSE), claim that: “Modelling user sequential behaviors as explicit sequential representations can empower the multi-task model to incorporate temporal dependencies, thus predicting future user behavior more accurately” [44]. Once individual use patterns have been identified and typified in their dynamic procedurality over time, the theory goes, it will be possible to tell from a certain unfolding of actions what a user will want to be afforded to do next. In this way, users can be kept ‘running to stand still’: they can be held and fed while engaged in their habitual interaction patterns.

Respons-ability vs responsibility: YouTube’s struggle with its parental role

As much as engineering efforts have flowed into the idea of capturing users’ characteristic ways of interacting with the platform, such “respons-ability,” it becomes clear, is decisively different from any notion of responsibility. With responsibility entailing a “moral obligation to behave correctly towards or in respect of a person or thing” (Oxford English Dictionary, 2022), one can argue that the “respons-ability” of the recommender literature is failing this moral obligation exactly because of its incessant focus on the oral dimensions of feeding and holding. This fantasy of optimal respons-ability might still be some way from functioning convincingly [46]. Yet, the aim and orientation of this vision, specifically, to create holding patterns for individual users and abate them for maximal amounts of time, is clear.

It was against the mounting critique of the platform that YouTube started to act, at least rhetorically, with Goodrow’s (2021) blogpost announcing a new overall direction by making “responsible recommendations” and “valued watchtime” the platform’s “top priority.” Returning to Kohut’s (2009) concept of “optimal frustration” and Winnicott’s (1963) notion of “disillusionment” [47], what both concepts point to is the formative importance of moments of weaning at the core of the oral complex of holding, feeding and stimulating. Indeed, what they claim is that it is moments of frustration and disillusionment that make the unfolding of valuable stimulation possible. For the case of YouTube, one can say that that which has the potential to turn watchtime into valued watchtime might indeed be the very failures in the acts of holding and feeding, predicting and recommending.

Along these lines, there are aspects in the engineering literature that suggest that less enveloping forms of stimulation and an interest in approximating human forms of imperfection might indeed become preoccupations and aims in YouTube’s engineering task. As far as I can see, this preoccupation is approximated under the labels of “fairness” (Beutel, et al., 2019), “selection bias” (Zhao, et al., 2019), “attention” (Tay, et al., 2018; Zhang, et al., 2019), “off-policy correction” (Ma, et al., 2020; Yi, et al., 2019), and “exploration” (Chen, et al., 2023; Schnabel, et al., 2018). What is perceivable in these current attempts, though, is a fetishist logic that points toward the development of these more responsible forms of stimulation while at the same time proving unable to move away from the task of user retention.

Have your cake ... — Magical thinking in responsible recommendations

The productivity, even necessity, of a ‘failure to fit’ already shows in a common problem of machine learning that Covington, et al. (2016) broach: “Somewhat counter-intuitively,” the authors write about the set of materials with which the system is trained, “great care must be taken to withhold information from the classifier in order to prevent the model from exploiting the structure of the site and overfitting the surrogate problem” [48]. Overfitting, i.e., the phenomenon of a self-learning system becoming so familiar with its training materials that it is no longer open for the analysis of new ones, shows how failure and learning go together — not merely for people, but for the recommender system, too. In the formula of philosopher Odo Marquard (1994), experience is the opposite of expectation; similarly, a deep-learning system in which everything is exactly as expected tends to lose its capacity for new ‘experience.’

With respect to Goodrow’s (2021) promise of “valued watch time,” the problem of overfitting already suggests that both people and machines need to be adequately stimulated to grow. In this respect, when Zhe Zhao and colleagues (2019) refer to the familiar problem that “a user might have clicked and watched a video simply because it was being ranked high, not because it was the one that the user liked the most” [49], this problematisation of “a feedback loop effect” [50] or “ranking selection bias” [51] points to the ambiguity that opens up between engineering tasks and notions of responsibility. After all, the “artificial idiocy” (Bostrom, 2003) of feeding users the same recommendations over and over again will neither improve the quality of these users’ experience nor their watch time metrics. To alleviate these problems, Zhao, et al. (2019) propose to substitute a more traditional ranking model (a Rectified Linear Unit [ReLU] model) with one better at taking into account multiple objectives (a Mixture of Experts [MoE] model; [52]). Testing the system with the new MoE model, write the authors, significantly alleviated the “misalignment between user implicit feedback and true user utility” [53].

However, when the authors then move to unpack how they define “true user utility” and the “significant improvements of our proposed system” [54], Goodrow’s (2021) promises of “responsibility” and “value” fly out of the window. “Our ranking system learns from two types of user feedback: 1) engagement behaviors, such as clicks and watches; 2) satisfaction behaviors, such as likes and dismissals,” the authors write [55] so as to then announce that the introduction of their new model “significantly improves both engagement and satisfaction metrics” [56].

Now, along the lines of the theory of the oral outlined in this article, it is just as implausible to increase both the length of people’s “engagement” and the mental quality of their “satisfaction” as it is to proverbially eat one’s cake and have it, too. This is not to say that something can only be good if it is scarce, finite and unscalable (cf., Seaver, 2021) as well as consumed in small quantities; yet, announcing improvements in “user utility” while at the same time celebrating the improvement of “performance on all objectives” [57] cannot be convincingly kept in line with notions of responsibility. Simply put, it is just not credible to keep people in increasingly longer feeding and holding patterns and still expect them to come out the qualitatively better stimulated for it.

This quasi-magical mode of ‘having one’s cake and feeding it forward, too,’ I argue, is a constant feature of the engineering literature and casts YouTube’s newfound cloak of responsibility into severe doubt. In this respect, Minmin Chen and colleagues’ (2021) study of “Top-k off-policy correction for a Reinforce recommender system” offers another revealing example. Specifically, it shows how YouTube tries to negotiate between its public responsibilities and commercial interests [58]. Chen, et al. (2021) ask to what extent users can and should be enabled to explore different video types through a sophisticated weighing and correcting of how recommendations are generated [59]. The article thus goes right to the heart of how a responsible parental role might be engineered into a recommender system. Unfortunately, though, it inevitably returns to the narrowly oral horizon of “user satisfaction metrics, e.g., indicated by clicks or watch time” [60].

What the article seeks to show is “the value of exploration” [61], with exploration here meaning “actions rarely taken by the existing system” [62]. Aiming to develop a policy that weighs the costs and rewards of such actions, Chen, et al. (2021) present an algorithmic structure [63] that balances recommendations so as to “get the benefit of exploratory data without negatively impacting user experience” [64]. Already Schnabel and colleagues (2018) had tested user responses to recommendations that were in breach of these users’ viewing habits and found that, up to a relatively high degree, feeding unexpected videos into users’ recommendations had no documentable effect on how helpful people reported finding these recommendations. Yet, while Schnabel, et al. (2018) ultimately turn their interest to hearing from the users how satisfied they were with their viewing experience, Chen, et al. (2021) again follow the quasi-magical logic that equates user satisfaction with the quantity of time spent watching. Comparing YouTube’s prior system with a version “more likely to observe the outcomes of some rarer state,” they observe “a statistically significant increase in ViewTime by 0.07% in the test population” [65]. This improvement is by no means large, the authors admit; yet the category of “exploration,” which accounts for how much people are stimulated to move out of their viewing habits, is thus made vitally dependent again on the quantity of their engagement.
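The general idea of Boltzmann exploration, on which the structure referred to in note 63 is based, can be illustrated with a short Python sketch. It shows the plain textbook form, in which a “temperature” parameter controls how often lower-scored items are recommended; it does not reproduce the corrected variant used by Chen, et al. The scores, item labels and temperatures are hypothetical.

```python
import math
import random


def boltzmann_probabilities(scores, temperature):
    """Turn item scores into selection probabilities (softmax with temperature)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    top = max(scores.values())  # subtract the maximum for numerical stability
    weights = {
        item: math.exp((score - top) / temperature)
        for item, score in scores.items()
    }
    total = sum(weights.values())
    return {item: weight / total for item, weight in weights.items()}


def sample_item(scores, temperature, rng):
    """Draw one item according to its Boltzmann probability."""
    probabilities = boltzmann_probabilities(scores, temperature)
    items = list(probabilities)
    return rng.choices(items, weights=[probabilities[i] for i in items], k=1)[0]


# Hypothetical scores for three kinds of video (illustrative only).
scores = {"habitual": 2.0, "adjacent": 1.0, "unfamiliar": 0.0}
cold = boltzmann_probabilities(scores, temperature=0.5)  # little exploration
warm = boltzmann_probabilities(scores, temperature=5.0)  # more exploration
picked = sample_item(scores, 5.0, random.Random(0))
print(cold, warm, picked)
```

Raising the temperature flattens the distribution, so that unfamiliar videos surface more often; how far it is raised is exactly the kind of cost-and-reward decision that the papers tie back to watch time.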

This finding is particularly striking against the significant number of independent studies that indicate an improvement in the diversity of videos YouTube has been offering users in its recommendations in recent years (e.g., Matamoros-Fernández, et al., 2021; Möller, et al., 2018). As much as such diversity is to be welcomed, what the engineering papers indicate is that it is only offered if and to the degree that it does not hurt, and ideally helps, the overall amount of time people spend on the platform. Again, it is the oral craving of the platform itself that gets in the way of the value and responsibility it promises its users.

As a final example, Jiaqi Ma and colleagues (2020) build on the works by Chen, et al. (2021) and Zhao, et al. (2019) to refine the “bias correction” task in recommendations further. While earlier approaches had merely applied such correction to the level of candidate generation, their advance rests on “explicitly tak[ing] into account the ranking model when training the candidate generation model” [66]. This, the authors claim, “helps improve the performance of the whole system” [67]. From the point of view of the fetishist thinking traced in this article, what is remarkable is how the logic of ‘more is better’ becomes normalised and rendered invisible here. “There is still a lot of ongoing effort to improve both the efficiency and the recommendation quality,” Ma and colleagues [68] write about their engineering approach. Yet, whereas they firmly position their efforts on the side of recommendation quality, when they write that their approach “indeed improves the performance of the whole two-stage recommender system” [69], this takes for granted again that efficiency and quality mean the same thing, go hand in hand and can, or indeed must, be improved together. Whereas, in prior publications, notions of “quality”, “user satisfaction,” and “user utility” would still become explicitly defined, the mere formulaic reference to the “optimization of the performance” in Ma, et al.’s (2020) contribution goes a long way to entrenching this paradoxical conflation of satisfaction and retention in the ideology of engineering for corporate digital platforms. In this way, the oral logic of addiction, too, becomes perpetuated further in that a relational craving is encased in the cycles of feeding and holding that the platform seeks to optimize for each user in a common-sense way.




As the cultural studies scholar Hartmut Böhme (2014) has shown, fetishist structures of thinking have become a mainstay in (commercial) Western cultures. In this respect, the simple equation of ‘more is better’ is still the most widespread of these structures in practically all industry branches, be they cultural or otherwise. Therefore, to find this fetishist structure at the heart of the design of YouTube’s recommender system is unremarkable but striking at the same time. While it is ‘old news,’ what is interesting nevertheless is how it has been programmed into the very form that YouTube takes as a service. It is in this respect that this article has not aimed at sounding the alarm — users have heard arguments about the potentially harming tendencies of ‘big tech’ many times. Rather, the aim has been to point out the concepts and ideas with which this fetishist structure is being integrated into the system and with which forms of interaction it is entrenching processes of feeding and holding. The recommender system’s telos of making the functions of feeding and holding permanently available and dependable and its invitation to users to depend on them is invariably prioritised to the detriment of the function of stimulating people so as to gain a more sovereign position toward their dependencies. Such a sovereignty, in turn, would not come in forms of total freedom and autonomy from relational dependencies. Rather, it needs to be found in fleeting experiences of satiation and satisfaction and the temporary states of contentment, replenishment and recreation that come with them and that instil people with a desire for engaging with the world again. Experiencing a degree of mastery in positively acting upon this desire is a centrally important step into maturity.

Now, I am painfully aware that these formulations, in amounting to a call for (self-)regulation and moderation, have an embarrassingly ‘bourgeois’ ring to them. In directing them at my readers as individuals, they also seem to approximate a neoliberal logic of personal responsibility. In this way, they uneasily connect to those dimensions of the psychoanalytic tradition which have at times had the image of being culturally conservative (e.g., Preciado, 2021). Especially the U.S. brand of psychoanalytic ego-psychology was long under suspicion of treating people so as to make them adapt to precarious societal conditions (e.g., Krüger, 2011). And yet, on a psychic plane, continuing to try and strike balances and find liveable compromises between dependencies and liberties is an inescapable human task that deserves appreciation.

On the side of YouTube as a digital platform and a “socialisation agency” (Prokop, 1976), taking seriously its own public relations rhetoric of “responsible recommendations” (Goodrow, 2021) would entail thoroughgoing efforts at turning the anti-containing logic of its recommender system, which precariously abates relational conflicts, into a containing one in the full sense of the meaning that Bion (1962) attributed to this term. It would mean to cautiously and strategically fail in supplying an optimal holding environment in ways that would help people to experience, digest and understand the very relational conflicts that make them willing to surrender to abandonment. YouTube’s decision, during the COVID-19 pandemic and in response to the deluge of conspiratorial videos posted to its site, to prioritise news features from authoritative sources in people’s feeds offers a simple first exemplary step for similar processes of weaning and ‘optimally frustrating’ users on the platform.

However, even short of such ideals, corporate digital platforms, in line with other cultural-industrial branches, have reached a point at which unfettered growth is no longer sustainable and cannot be upheld as an unblemished aim. Instead, after the era of growth, an era of maturity needs to be rung in (cf., Srinivasan and Ghosh, 2023). Such maturity, in turn, must go together with the clear resolution on the part of national and international governing bodies that ‘big tech’ needs to be reined in and regulated. And it must go together with the insight on the part of ‘big tech’ itself that their services have indeed taken on the significance of public utilities (Basu, et al., 2021). Growing into this role, platforms such as YouTube need to find ways of decoupling ideas of value, quality and user satisfaction from the ubiquitous drive toward increasing dwell time. Recently, Seaver (2021) has outlined what he calls a “decorrelative ethics,” an ethics that, by decorrelating notions of “care” and “scale,” might allow new forms of care to emerge which are ethical and non-human and can thus become available to vast parts of global populations. Such an ethics, however, only seems achievable if the industry manages to decorrelate two other central pillars of their belief systems first, specifically, those of the quantity of demand and the quality of experience. While decision-makers at Google and YouTube, in unguarded moments, will have encountered and entertained such thoughts, it remains anybody’s guess whether it will ever be possible for private corporations to genuinely act upon them without pressure from regulators, in a political economy based on ‘stilling’ people by keeping them entertained all the while extracting value from them.


About the author

Steffen Krüger is senior lecturer and head of the Screen Cultures program at the Department of Media and Communication at University of Oslo (Norway). His research interests are at the intersections between media and cultural studies, as well as psychoanalysis and critical theory. Together with Jacob Johanssen he recently published Media and psychoanalysis — A critical introduction (London, Karnac, 2022).
E-mail: steffen [dot] kruger [at] media [dot] uio [dot] no



1. While the core of the analysis is focused on a relatively small number of key engineering papers (n<10), their adequate understanding is warranted by their contextualisation through further, extensive readings in the field (n>50) as well as by consulting tutorials to, and introductory readings on, specific terms and concepts.

2., accessed 12 July 2023.

3. Camille Roth and Jérémie Poiroux (2022), in an interview study, have rightly warned that statements on ethics by programmers and engineers need to be read with caution, since such statements are usually lofty ideals that are not reflected in actual programming (paragraphs 49–51). However, in my reading of the engineering papers, I do not focus on these statements, but on the kind of ethics that arises from the social logic of what the papers say about the programming itself.

4. Matamoros-Fernández, et al., 2021, p. 235.

5. Ma, et al., 2020, p. 465.

6. Vanden Abeele and Mohr, 2021, p. 1,544.

7. For example, Gilroy-Ware (2017) centrally refers to social media as a fridge that has new things in it every time one opens the door.

8. Vanden Abeele and Mohr, 2021, p. 1,537.

9. However, see Peters (2003) for a “redemptive reading.”

10. Freud, 1905, p. 198.

11. Johnson, 1994, p. 28.

12. Khantzian, 2003, p. 8.

13. Khantzian, 2003, p. 9.

14. Khantzian, 2003, p. 10.

15. Khantzian, 2003, p. 8.

16. McIlwraith, et al., 1991, p. 111.

17. McIlwraith, et al., 1991, p. 117.

18. Ibid.

19. McIlwraith, et al., 1991, p. 104.

20. Seaver, 2019, p. 428.

21. Ibid.

22. Covington, et al., 2016, p. 191.

23. Ibid.

24. Ma, et al., 2020, p. 464.

25. Covington, et al., 2016, p. 192.

26. Ibid.

27. Covington, et al., 2016, p. 192.

28. Ibid.

29. Covington, et al., 2016, p. 193.

30. Covington, et al., 2016, p. 195.

31. Rieder, 2020, p. 267.

32. Zhang, et al., 2019, p. 27.

33. Zhang, et al., 2019, p. 26.

34. Ibid.

35. He, et al., 2014, p. 8.

36. He, et al., 2014, p. 7.

37. Covington, et al., 2016, p. 196; emphasis added.

38. He, et al., 2014, p. 4.

39. Lorenzer, 2022, p. 59.

40. Covington, et al., 2016, p. 197.

41. Ibid.

42. Zhang, et al., 2019, p. 1.

43. Zhang, et al., 2017, p. 1,449.

44. Qin, et al., 2020, p. 3,083.

45. For details, see Stanford Encyclopedia of Philosophy, 2019. “Moral responsibility” (16 October), at, accessed 26 July 2023.

46. Rieder, et al. (2020) have demonstrated that YouTube’s recommendations have a strong mainstreaming effect.

47. Abram, 2007, p. 242.

48. Covington, et al., 2016, p. 193.

49. Zhao, et al., 2019, p. 43.

50. Ibid.

51. Zhao, et al., 2019, p. 44.

52. Zhao, et al., 2019, p. 47.

53. Zhao, et al., 2019, p. 45.

54. Zhao, et al., 2019, p. 43.

55. Zhao, et al., 2019, p. 46.

56. Zhao, et al., 2019, p. 46; emphasis added.

57. Zhao, et al., 2019, p. 44.

58. And whereas these two need not be at odds in principle, in the case of recommendations they have proven to be so.

59. Chen, et al., 2021, p. 2.

60. Ibid.

61. Chen, et al., 2021, p. 2.

62. Chen, et al., 2021, p. 6.

63. Based on a corrected Boltzmann exploration strategy (Cesa-Bianchi, et al., 2017).

64. Chen, et al., 2021, p. 6.

65. Chen, et al., 2021, p. 7.

66. Ma, et al., 2020, p. 463.

67. Ibid.

68. Ma, et al., 2020, p. 464.

69. Ma, et al., 2020, p. 472.



Karl Abraham, 1916. “Untersuchungen über die früheste prägenitale Entwicklungsstufe der Libido,” Internationale Zeitschrift für Psychoanalyse, volume 4, number 2, pp. 71–97.

Jan Abram, 2007. The language of Winnicott: A dictionary of Winnicott’s use of words. Second edition. London: Karnac Books.

Adam Alter, 2017. Irresistible: The rise of addictive technology and the business of keeping us hooked. New York: Penguin Press.

Kaushik Basu, Aviv Caspi, and Robert Hockett, 2021. “Markets and regulation in the age of big tech,” Capitalism & Society, volume 15, number 1, pp. 1–20.

Jessica Benjamin, 1988. The bonds of love: Psychoanalysis, feminism, and the problem of domination. New York: Pantheon Books.

Sergio Benvenuto, 2016. What are perversions? Sexuality, ethics, psychoanalysis. London: Routledge.
doi:, accessed 26 July 2023.

Alex Beutel, Paul Covington, Sagar Jain, Can Xu, Jia Li, Vince Gatto, and Ed H. Chi, 2019. “Latent cross: Making use of context in recurrent recommender systems,” WSDM ’18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 46–54.
doi:, accessed 26 July 2023.

Vikram R. Bhargava and Manuel Velasquez, 2021. “Ethics of the attention economy: The problem of social media addiction,” Business Ethics Quarterly, volume 31, number 3 (July 2021), pp. 321–
doi:, accessed 26 July 2023.

Wilfred R. Bion, 1962. Learning from experience. New York: Basic Books.

Sophie Bishop, 2018. “Anxiety, panic and self-optimization: Inequalities and the YouTube algorithm,” Convergence, volume 24, number 1, pp. 69–84.
doi:, accessed 26 July 2023.

Nick Bostrom, 2003. “Ethical issues in advanced artificial intelligence,” In: Iva Smit and George E. Lasker (editors). Cognitive, emotive and ethical aspects of decision making in humans and in artificial intelligence. Windsor, Ontario: International Institute of Advanced Studies in Systems Research and Cybernetics, pp. 12–17.

John Bowlby, 2005. The making and breaking of affectional bonds. London: Routledge.
doi:, accessed 26 July 2023.

Taina Bucher, 2021. Facebook. Cambridge: Polity.

Jean Burgess and Joshua Green, 2018. YouTube: Online video and participatory culture. Second edition. Cambridge: Polity.

Nicolò Cesa-Bianchi, Claudio Gentile, Gábor Lugosi, and Gergely Neu, 2017. “Boltzmann exploration done right,” NIPS’17: Proceedings of the 31st International Conference on Neural Information Processing Systems, pp. 6,287–6,296.

Annie Y. Chen, Brendan Nyhan, Jason Reifler, Ronald E. Robertson, and Christo Wilson, 2023. “Subscriptions and external links help drive resentful users to alternative and extremist YouTube videos,” arXiv:2204.10921v2 (2 April).
doi:, accessed 26 July 2023.

Minmin Chen, Alex Beutel, Paul Covington, Sagar Jain, Francois Belletti, and Ed Chi, 2021. “Top-k off-policy correction for a REINFORCE recommender system,” arXiv:1812.02353v3 (15 December).
doi:, accessed 26 July 2023.

Wendy Hui Kyong Chun, 2016. Updating to remain the same: Habitual new media. Cambridge, Mass.: MIT Press.
doi:, accessed 26 July 2023.

David T. Courtwright, 2019. The age of addiction: How bad habits became big business. Cambridge, Mass.: Belknap Press of Harvard University Press.

Paul Covington, Jay Adams, and Emre Sargin, 2016. “Deep neural networks for YouTube recommendations,” RecSys ’16: Proceedings of the 10th ACM Conference on Recommender Systems, pp. 191–198.
doi:, accessed 26 July 2023.

David Cronenberg (director), 1983. Videodrome. Filmplan International.

Natasha Dow-Schüll, 2012. Addiction by design: Machine gambling in Las Vegas. Princeton, N.J.: Princeton University Press.

Natasha Dow-Schüll, 2005. “Digital gambling: The coincidence of desire and design,” Annals of the American Academy of Political and Social Science, volume 597, number 1, pp. 65–81.
doi:, accessed 26 July 2023.

Marc Faddoul, Guillaume Chaslot, and Hany Farid, 2020. “A longitudinal analysis of YouTube’s promotion of conspiracy videos,” arXiv:2003.03318v1 (6 March).
doi:, accessed 26 July 2023.

Helen E. Fisher, 2016. Anatomy of love: A natural history of mating, marriage, and why we stray. Completely revised and updated. New York: Norton.

John Fiske, 1989. Reading the popular. London: Routledge.
doi:, accessed 26 July 2023.

Sigmund Freud, 1905. “Three essays on the theory of sexuality,” at, accessed 26 July 2023.

Marcus Gilroy-Ware, 2017. Filling the void: Emotion, capitalism and social media. London: Repeater Books.

Zoe Glatt, 2022. “‘We’re all told not to put our eggs in one basket’: Uncertainty, precarity and cross-platform labor in the online video influencer industry,” International Journal of Communication, volume 16, at, accessed 26 July 2023.

Zoe Glatt and Sarah Banet-Weiser, 2021. “Productive ambivalence, economies of visibility and the political potential of feminist YouTubers,” In: Stuart Cunningham and David Craig (editors). Creator culture: An introduction to global social media entertainment. New York: New York University Press, pp. 39–56.
doi:, accessed 26 July 2023.

Cristos Goodrow, 2021. “On YouTube’s recommendation system,” YouTube official blog (15 September), at, accessed 3 January 2023.

Stuart Hall, 1980. “Introduction to media studies at the centre,” In: Stuart Hall, Dorothy Hobson, Andrew Lowe, and Paul Willis (editors). Culture, media, language: Working papers in cultural studies, 1972–79. London: Routledge, pp. 104–109.
doi:, accessed 26 July 2023.

Xinran He, Junfeng Pan, Ou Jin, Tianbing Xu, Bo Liu, Tao Xu, Yanxin Shi, Antoine Atallah, Ralf Herbrich, Stuart Bowers, and Joaquin Quiñonero Candela, 2014. “Practical lessons from predicting clicks on ads at Facebook,” ADKDD’14: Proceedings of the Eighth International Workshop on Data Mining for Online Advertising, pp. 1–9.
doi:, accessed 26 July 2023.

Wendy Hollway, 2015. Knowing mothers: Researching maternal identity change. London: Palgrave Macmillan.
doi:, accessed 26 July 2023.

Axel Honneth and Joel Whitebook, 2019. “Fusion or omnipotence: A dialogue,” In: Amy Allen and Brian O’Connor (editors). Transitional subjects: Critical theory and object relations. New York: Columbia University Press, pp. 23–45.

Tobe Hooper (director), 1982. Poltergeist. Metro-Goldwyn-Mayer (MGM).

Max Horkheimer and Theodor W. Adorno, 1994. Dialectic of enlightenment. Translated by John Cumming. New York: Continuum.

Homa Hosseinmardi, Amir Ghasemian, Aaron Clauset, Markus Mobius, David M. Rothschild, and Duncan J. Watts, 2022. “Examining the consumption of radical content on YouTube,” arXiv:2011.12843v2 (14 February).
doi:, accessed 26 July 2023.

Lu Jiang, Yajie Miao, Yi Yang, Zhenzhong Lan, and Alexander G. Hauptmann, 2014. “Viral video style: A closer look at viral videos on YouTube,” ICMR ’14: Proceedings of International Conference on Multimedia Retrieval, pp. 193–200.
doi:, accessed 26 July 2023.

Stephen M. Johnson, 1994. Character styles. New York: Norton.

Jonas Kaiser and Adrian Rauchfleisch, 2019. “The implications of venturing down the rabbit hole,” Internet Policy Review, volume 8, number 2 (27 June), at, accessed 26 July 2023.

Rishabh Kaushal, Srishty Saha, Payal Bajaj, and Ponnurangam Kumaraguru, 2016. “KidsTube: Detection, characterization and analysis of child unsafe content & promoters on YouTube,” arXiv:1608.05966v1 (21 August).
doi:, accessed 26 July 2023.

Edward Khantzian, 2003. “Understanding addictive vulnerability: An evolving psychodynamic perspective,” Neuro-Psychoanalysis, volume 5, number 1, pp. 5–21.
doi:, accessed 26 July 2023.

Hedy Kober, 2014. “Emotion regulation in substance use disorders,” In: James J. Gross (editor). Handbook of emotion regulation. Second edition. New York: Guilford Press, pp. 428–446.

Heinz Kohut, 2009. The analysis of the self: A systematic approach to the psychoanalytic treatment of narcissistic personality disorders. Chicago: University of Chicago Press.

Yehuda Koren, Robert Bell, and Chris Volinsky, 2009. “Matrix factorization techniques for recommender systems,” Computer, volume 42, number 8, pp. 30–37.
doi:, accessed 26 July 2023.

Steffen Krüger, 2011. Das Unbehagen in der Karikatur. Kunst, Propaganda und persuasive Kommunikation im Theoriewerk Ernst Kris’. Paderborn: Verlag Wilhelm Fink.

Jacques Lacan, 2006. “The mirror stage as formative of the I function,” In: Jacques Lacan. Ecrits: The first complete edition in English. Translated by Bruce Fink. New York: Norton, pp. 75–82.

Jean Laplanche, 1999. Essays on otherness. Edited by John Fletcher. London: Routledge.
doi:, accessed 26 July 2023.

Mark Ledwich, Anna Zaitsev, and Anton Laukemper, 2022. “Radical bubbles on YouTube? Revisiting algorithmic extremism with personalised recommendations,” First Monday, volume 27, number 12.
doi:, accessed 26 July 2023.

Steven Levy, 2011. In the plex: How Google thinks, works, and shapes our lives. New York: Simon & Schuster.

Paul Lewis, 2018. “‘Fiction is outperforming reality’: How YouTube’s algorithm distorts truth,” Guardian (2 February), at, accessed 26 July 2023.

Paul Lewis and Erin McCormick, 2018. “How an ex-YouTube insider investigated its secret algorithm,” Guardian (2 February), at, accessed 26 July 2023.

Alfred Lorenzer, 2022. “In-depth hermeneutical cultural analysis,” In: Katharina Rothe, Steffen Krüger and Daniel Rosengart (editors), Cultural analysis now! Alfred Lorenzer and the in-depth hermeneutics of culture and society. New York: Unconscious in Translation, pp. 21–122.

Jiaqi Ma, Zhe Zhao, Xinyang Yi, Ji Yang, Minmin Chen, Jiaxi Tang, Lichan Hong, and Ed H. Chi, 2020. “Off-policy learning in two-stage recommender systems,” WWW ’20: Proceedings of the Web Conference 2020, pp. 463–473.
doi:, accessed 26 July 2023.

Odo Marquard, 1994. Skepsis und Zustimmung: philosophische Studien. Stuttgart: Reclam.

Alice E. Marwick, 2008. “To catch a predator? The MySpace moral panic,” First Monday, volume 13, number 6.
doi:, accessed 26 July 2023.

Ariadna Matamoros‐Fernández, Joanne E. Gray, Louisa Bartolo, Jean Burgess, and Nicolas Suzor, 2021. “What’s ‘up next’? Investigating algorithmic recommendations on YouTube across issues and over time,” Media and Communication, volume 9, number 4, pp. 234–249.
doi:, accessed 26 July 2023.

Robert McIlwraith, Robin Smith Jacobvitz, Robert Kubey, and Alison Alexander, 1991. “Television addiction: Theory and data behind the ubiquitous metaphor,” American Behavioral Scientist, volume 35, number 2, pp. 104–121.
doi:, accessed 26 July 2023.

Judith Möller, Damian Trilling, Natali Helberger, and Bram van Es, 2018. “Do not blame it on the algorithm: An empirical assessment of multiple recommender systems and their impact on content diversity,” Information, Communication & Society, volume 21, number 7, pp. 959–977.
doi:, accessed 26 July 2023.

Jack Nicas, 2018. “How YouTube drives people to the Internet’s darkest corners,” Wall Street Journal (7 February), at, accessed 26 July 2023.

Amy Orben, 2020. “The Sisyphean cycle of technology panics,” Perspectives on Psychological Science, volume 15, number 5, pp. 1,143–1,157.
doi:, accessed 26 July 2023.

Oxford English Dictionary, 2022. “Responsibility,” at, accessed 26 July 2023.

Kostantinos Papadamou, Savvas Zannettou, Jeremy Blackburn, Emiliano De Cristofaro, Gianluca Stringhini, and Michael Sirivianos, 2022. “‘It is just a flu’: Assessing the effect of watch history on YouTube’s pseudoscientific video recommendations,” 16th International Conference on Web and Social Media, at, accessed 26 July 2023.

John Durham Peters, 2003. “The subtlety of Horkheimer and Adorno: Reading ‘The culture industry’,” In: Elihu Katz, John Durham Peters, Tamar Liebes, and Avril Orloff (editors). Canonic texts in media research: Are there any? Should there be? How about these? Cambridge: Polity, pp. 58–73.

Paul B. Preciado, 2021. Can the monster speak? Report to an academy of psychoanalysts. Translated by Frank Wynne. London: Fitzcarraldo Editions.

Ulrike Prokop, 1976. Weiblicher Lebenszusammenhang: von der Beschränktheit der Strategien und der Unangemessenheit der Wünsche. Frankfurt am Main: Suhrkamp.

Zhen Qin, Yicheng Cheng, Zhe Zhao, Zhe Chen, Donald Metzler, and Jingzheng Qin, 2020. “Multitask mixture of sequential experts for user activity streams,” KDD ’20: Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 3,083–3,091.
doi:, accessed 26 July 2023.

Bernhard Rieder, 2020. Engines of order: A mechanology of algorithmic techniques. Amsterdam: Amsterdam University Press.
doi:, accessed 26 July 2023.

Bernhard Rieder, Òscar Coromina, and Ariadna Matamoros-Fernández, 2020. “Mapping YouTube: A quantitative exploration of a platformed media system,” First Monday, volume 25, number 8.
doi:, accessed 26 July 2023.

Camille Roth and Jérémie Poiroux, 2022. “L’écriture guidée du code. Le cas des algorithmes de recommandation (Guiding code development. The case of recommender systems),” RESET — Recherches en sciences sociales sur Internet (Social science research on the Internet), volume 11.
doi:, accessed 26 July 2023.

Antoinette Rouvroy and Bernard Stiegler, 2016. “The digital regime of truth: From algorithmic governmentality to a new rule of law,” La Deleuziana — Online Journal of Philosophy, number 3 (Translated by Anaïs Nony and Benoît Dillet), at, accessed 26 July 2023.

Edith Sabshin, 1995. “Psychoanalytic studies of addictive behavior: A review,” In: Scott Dowling (editor). Psychology and treatment of addictive behavior. Madison, Conn.: International Universities Press, pp. 3–15.

Robert A. Savitt, 1963. “Psychoanalytic studies on addiction: Ego structure in narcotic addiction,” Psychoanalytic Quarterly, volume 32, number 1, pp. 43–57.

Loredana Scala, Maria Rosaria Anna Muscatello, Antonio Bruno, and Rocco Antonio Zoccali, 2017. “Neurobiological and psychopathological mechanisms underlying addiction-like behaviors: An overview and thematic synthesis,” Mediterranean Journal of Clinical Psychology, volume 5, number 2.
doi:, accessed 26 July 2023.

Josephine B. Schmitt, Diana Rieger, Olivia Rutkowski, and Julian Ernst, 2018. “Counter-messages as prevention or promotion of extremism?! The potential role of YouTube recommendation algorithms,” Journal of Communication, volume 68, number 4, pp. 780–808.
doi:, accessed 26 July 2023.

Tobias Schnabel, Paul N. Bennett, Susan T. Dumais, and Thorsten Joachims, 2018. “Short-term satisfaction and long-term coverage: Understanding how users tolerate algorithmic exploration,” WSDM ’18: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, pp. 513–521.
doi:, accessed 26 July 2023.

Nick Seaver, 2021. “Care and scale: Decorrelative ethics in algorithmic recommendation,” Cultural Anthropology, volume 36, number 3, at, accessed 26 July 2023.

Nick Seaver, 2019. “Captivating algorithms: Recommender systems as traps,” Journal of Material Culture, volume 24, number 4, pp. 421–436.
doi:, accessed 26 July 2023.

Sarah Snow, 2021. “The struggle over YouTube’s recommendation algorithm,” paper presented to the faculty of the School of Engineering and Applied Science, University of Virginia (22 November), at, accessed 3 January 2023.

Ramesh Srinivasan and Dipayan Ghosh, 2023. “A new social contract for technology,” Policy & Internet, volume 15, number 1, pp. 117–132.
doi:, accessed 26 July 2023.

Mel Stanfill, 2020. “Introduction: The reactionary in the fan and the fan in the reactionary,” Television & New Media, volume 21, number 2, pp. 123–134.
doi:, accessed 26 July 2023.

Daniel Stern, 1985. The interpersonal world of the infant: A view from psychoanalysis and developmental psychology. New York: Basic Books.

Alistair Sweet, 2013. “Thoughts without a thinker, mimetic fusing and the anti-container considered as primitive defensive mechanisms in the addictions,” Psychoanalytic Psychotherapy, volume 27, number 2, pp. 140–153.
doi:, accessed 26 July 2023.

Yi Tay, Anh Tuan Luu, and Siu Cheung Hui, 2018. “Multi-pointer co-attention networks for recommendation,” KDD ’18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 2,309–2,318.
doi:, accessed 26 July 2023.

Zeynep Tufekci, 2018. “YouTube, the great radicalizer,” New York Times (10 March), at, accessed 26 July 2023.

Mariek M.P. Vanden Abeele and Victoria Mohr, 2021. “Media addictions as Apparatgeist: What discourse on TV and smartphone addiction reveals about society,” Convergence, volume 27, number 6, pp. 1,536–1,557.
doi:, accessed 26 July 2023.

Ronald J. Williams, 1992. “Simple statistical gradient-following algorithms for connectionist reinforcement learning,” Machine Learning, volume 8, pp. 229–256.
doi:, accessed 26 July 2023.

Donald W. Winnicott, 2016. “Communicating and not communicating leading to a study of certain opposites,” In: Lesley Caldwell and Taylor Robinson (editors). The collected works of D. W. Winnicott. Volume 6, 1960–1963. Oxford: Oxford University Press, pp. 433–446.
doi:, accessed 26 July 2023.

Donald W. Winnicott, 1971a. Playing and reality. London: Tavistock Publications.

Donald W. Winnicott, 1971b. “Mirror-role of mother and family in child development,” In: Donald W. Winnicott. Playing and reality. London: Tavistock Publications, pp. 130–139.

Muhsin Yesilada and Stephan Lewandowsky, 2022. “Systematic review: YouTube recommendations and problematic content,” Internet Policy Review, volume 11, number 1.
doi:, accessed 26 July 2023.

Xing Yi, Liangjie Hong, Erheng Zhong, Nathan Nan Liu, and Suju Rajan, 2014. “Beyond clicks: Dwell time for personalization,” RecSys ’14: Proceedings of the Eighth ACM Conference on Recommender Systems, pp. 113–120.
doi:, accessed 26 July 2023.

Xinyang Yi, Ji Yang, Lichan Hong, Derek Zhiyuan Cheng, Lukasz Heldt, Aditee Kumthekar, Zhe Zhao, Li Wei, and Ed Chi, 2019. “Sampling-bias-corrected neural modeling for large corpus item recommendations,” RecSys ’19: Proceedings of the 13th ACM Conference on Recommender Systems, pp. 269–277.
doi:, accessed 26 July 2023.

YouTube Team, 2019a. “The four Rs of responsibility, Part 1: Removing harmful content,” YouTube Blog (3 September), at, accessed 26 July 2023.

YouTube Team, 2019b. “The four Rs of responsibility, Part 2: Raising authoritative voices and reducing borderline content and harmful misinformation,” YouTube Blog (3 December), at, accessed 26 July 2023.

Shuai Zhang, Lina Yao, Aixin Sun, and Yi Tay, 2019. “Deep learning based recommender system: A survey and new perspectives,” ACM Computing Surveys, volume 52, number 1, article number 5, pp. 1–38.
doi:, accessed 26 July 2023.

Yongfeng Zhang, Qingyao Ai, Xu Chen, and W. Bruce Croft, 2017. “Joint representation learning for top-N recommendation with heterogeneous information sources,” CIKM ’17: Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, pp. 1,449–1,458.
doi:, accessed 26 July 2023.

Zhe Zhao, Lichan Hong, Li Wei, Jilin Chen, Aniruddh Nath, Shawn Andrews, Aditee Kumthekar, Maheswaran Sathiamoorthy, Xinyang Yi, and Ed Chi, 2019. “Recommending what video to watch next: A multitask ranking system,” RecSys ’19: Proceedings of the 13th ACM Conference on Recommender Systems, pp. 43–51.
doi:, accessed 26 July 2023.


Editorial history

Received 20 February 2023; revised 18 July 2023; accepted 20 July 2023.

Creative Commons License
This paper is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Have your cake and feed it forward too: YouTube, oral cravings and the persistent question of media addiction
by Steffen Krüger.
First Monday, Volume 28, Number 11 - 6 November 2023