First Monday

Education runs quickly violence runs slowly: An analysis of closed captioning speed and reading level in children’s television franchises by Edward Schneider



Abstract
Closed captioning is a great example of how efforts in technological equity can benefit all, as closed captioning has become a common tool in childhood literacy and new language learning for adults. In comparison to traditional print reading forms, closed captioning is in its infancy. Popular methodologies for assessing the reading level of text were invented before closed captioning existed. This study investigates an aspect of closed captioning that has historically received less attention: the content and reading level of the closed captions.

While considerable research attention has been given to the role of speed in relation to reading comprehension of closed captioning, less attention has been given to specific analysis of closed captioning content. This study analyzed closed captioning text extracted from 337 episodes of popular children’s television shows through the lens of content and reading level analysis using both traditional literacy tools and tools for analysis of closed captions. The goal was to provide insight for language teachers, parents of deaf children, and other relevant practitioners to make more informed and equitable decisions about children’s programming. The 21 shows sampled displayed clear patterns based on genres and age. The data also provided insight into methodological assessment differences between closed captioning and traditional print forms.

Contents

Introduction
Literature review
Research questions
Methodology
Results
Discussion
Conclusion

 


 

Introduction

Video closed captioning is an established part of academic literature in new language acquisition. A quick search of published research studies involving new language acquisition and video closed captions reveals hundreds of studies focused on the efficacy of closed captions in learning. The only research study that produced specific quantitative information on closed captioning across a range of shows was a large study by Jensema, McCann, and Ramsey in 1996. They analyzed 183 individual shows, music videos, and movies. This study is similar in scope to Jensema, McCann, and Ramsey, with three key differences: it is specifically focused on children’s television, it uses season-length samples, and it uses tools that did not exist in 1996. Closed captioning became a part of literacy development for deaf children in the 1970s; by the 1980s, reading specialists and others noticed its potential in literacy education for hearing children (Koskinen, et al., 1986). In the decades since, many other studies in the United States, as well as in countries such as Finland (Vanderplank, 2016) and India (Randev, 2014), have come to similar conclusions: if a child is watching television, a parent should turn the closed captioning on. However, little help is available in prescriptive selection of shows based on reading level or closed captioning speed.

This research analyzed the closed captions from season-length samples of 21 children’s television shows across a range of genres, some from public television and some from commercial television. Closed captioning is of interest to many people for both personal and professional reasons, and there are hundreds of studies showing that closed captioning is an effective educational tool for many different audiences. However a new language teacher looking for a show with slower closed captions for a beginner class would have no way of knowing anything definitive about the speed or reading level of the closed captioning in a potential show without watching it first and making a personal estimate.

Streaming video presents parents, teachers, and practitioners with more options for children’s television programming than ever before. Teachers of deaf children and teachers in new language acquisition may want to know more about a program’s closed captions when guiding media selection. Research has long supported the idea that the rapid pace of television closed captions can interfere with a deaf child’s comprehension of what they are watching, and many countries have regulations or guidelines concerning speed. Shroyer and Birch (1980) established that the speed of closed captions could make text at a child’s reading level unreadable if it was too fast. In that same study they measured the reading speed of children of what they called primary and intermediate ages. They found that average primary age children (ages 8–11) could read at 116.3 words per minute and average intermediate age (17–20) children could read at 135.2 words per minute. This study provides initial measurements and analysis to help interested parties understand how fast closed captions run across a range of children’s television programs of different genres. The data from this research establish that closed captioning speed does not always match the advertised age level of a given children’s program in predictable ways.

 

++++++++++

Literature review

Closed captioning services began to take shape in limited markets during the early 1970s. By the late 1970s studies on closed captioning were beginning to appear in the fields of deaf communications (Sillman, 1978; Propp, 1978) and educational technology (Braverman and Cronin, 1978). The first national use of closed captioning in America in 1980 required a separate box to accompany a television set. At that time the available television offerings were limited. Included amongst the very first closed captioned programming was the children’s show Zoom (Austin, 1980). The wider adoption of closed captioning across American television was spurred on by the Television Decoder Circuitry Act of 1990, stating that in the United States all television sets with screens 13 inches or larger must have built-in closed caption functionality (Owens, 1990).

Carl Jensema led some of the earliest studies that influenced the creation and form of early closed captioning (Jensema, et al., 1975). The research that inspired this effort provided quantitative information on the speed and content of closed captions. Jensema, et al. (1996) asked “What is the caption presentation speed of programs currently being shown on television?” [1] They posed this question without citing research providing clues. A citation analysis of related research did not find any similar studies created since the original work. The technology of Jensema’s study was videotape based, noting that captions were ‘rolling’ on screen. Today captions digitally appear and disappear as needed. Jensema, et al. (1996) examined 183 different shows, movies, and music videos for a range of audiences. Included in the sample were 20 children’s shows, but only one episode of each. This study is more focused than Jensema, et al. (1996), as it studied season-length samples totaling 337 episodes of 21 different children’s shows.

Huffman (1986) was one of the first studies involving closed captions and English as a second language. In the decades since research on closed captions has evolved in different directions. There are hundreds if not thousands of published studies that deal with how closed captions affect viewer comprehension of videos using a new language and learner attitudes towards closed captions. The range of studies on the accuracy of closed captions and how well they match what is on screen are rooted in early debates about how much of a program should be included in closed captions (Fresno, 2021). Studies often found in computer science focus on methodologies for algorithmically assessing reading difficulty (Pantula and Kuppusamy, 2020) but none of these could be found being applied elsewhere.

A 2012 guide from the Broadcasting Authority of Ireland stated on its first page that “Subtitles for children should have regard to the reading age of the intended audience.” [2] Note that it referenced “reading age” not “reading speed,” indicative that reading level should be considered. Researchers and practitioners have long considered reading level as an important factor for success in areas such as public health (Davis, et al., 1994) and the design of testing (Phillips, 1994). However television shows only indicate that they are closed captioned; no other contextual information is provided. A parent of a deaf child would have to individually assess if the reading level of a program is appropriate for their child by watching it. While methodologies for measuring closed captioning reading level have been proposed, there is no current way of finding programs by reading level.

A common topic in literacy research in closed captioning is the speed of captions. Research connecting speed of captions to comprehension can be traced back to Shroyer and Birch’s (1980) study establishing that the speed of captions can affect comprehension for a majority of deaf children [3]. Investigations into closed captioning speed and comprehension are especially apt in the United States, as American closed captioning puts the complete text on screen, in comparison to some other countries that put shortened text to aid readers. The Federal Communications Commission (U.S. FCC, 2014) issued new standards for closed captioning that mandated the content of closed captioning mirror audio content. This move increased the amount of closed captioning content that was shown on screen, resulting in closed captions that appeared more rapidly (Fresno, 2018). Fresno’s study of animated cartoon channels showed that children’s programming often moved at a rate that could make it difficult for children to read, even children who are in the target demographic for a program.

Several studies since Jensema, et al.’s (1996) study presented detailed information on closed captioning of specific children’s television programming. Fresno (2018) contained analysis of entire blocks of shows from children’s cable channels. Her study focused on the intersection of speed and comprehension; it found differences between commercial and public television channels that inspired an analysis on that factor in this research’s program selection. Fresno found that commercial channels had significant captioning that exceeded the 17 characters per second threshold that literacy specialists note as potential trouble for readers [4]. This study highlights how different researchers approach the topic of measuring speed. Fresno’s research considers a space as a character in character per second analysis.

De Linde and Kay (2016) provided some information on individual children’s shows. Their research is primarily interested in international and translation issues. Most of the programming investigated is British. The quantitative analysis involved the accuracy of captions and how captions are handled in situations that could potentially confuse a deaf child. For example, closed captioning that stems from an unseen narrator did not match what was on screen as much as closed captioning that stemmed from a character speaking in front of the camera. Further citation analysis did not locate further research on closed captioning in children’s shows.

Careful consideration of reading level is important in media selection. There are different proposed methodologies for measuring reading level, but no widely adopted standards for analysis of closed captions. Even the basic unit of speed varies, with some studies investigating characters per second (Jensema, et al., 1996) and others investigating words per minute (De Linde and Kay, 2016). This research applied five commonly used measures from traditional literacy assessment. Reading closed captions is a unique reading environment that most likely could use its own measurement methodology, one that synthesizes the speed of text presentation and the reading level of that text. The wider intent of this research is partially to measure the reading level of specific shows, while also considering the aptitude of traditional reading level tools for the task of measuring closed captions. That is why other basic quantitative measurements of the text content of closed captions, outside of reading level, were also considered. The number of words per episode was calculated, as well as the number of unique words. Characters per second was tabulated in one select case.

In summary, closed captioning is commonly utilized and examined in education, but little information about individual shows is readily available.

 

++++++++++

Research questions

R1: Is there a statistical relationship between captioning speed and show genre or content?

R2: Do any of the shows in the sample exceed the thresholds considered too fast for closed captioning by relevant researchers or organizations?

R3: Are there relationships between measured reading level of closed captions and closed captioning speed?

R4: Are there patterns of reading level and genre or age?

R5: Is there a relationship between speed at which words are displayed in closed captions and complexity of text?

R6: Is there a relationship between production year and speed of closed captions?

 

++++++++++

Methodology

Five reading level assessment methods commonly used in research were selected for this study: the Coleman-Liau Index (Coleman and Liau, 1975), Flesch-Kincaid Grade Level Test (Kincaid, et al., 1975), Automated Readability Index (ARI) (Senter and Smith, 1967), Gunning-Fog Index (Gunning, 1952), and Simple Measure of Gobbledygook (SMOG) (McLaughlin, 1969). The Coleman-Liau Index and the ARI measure readability based on characters per word and words per sentence, in comparison to the Flesch-Kincaid model that uses words per sentence and syllables per word. The SMOG is designed specifically to look for workplace jargon and other complicated text through a word length analysis that focuses on longer words. Gunning-Fog takes both complex words and sentence length into consideration. These five were chosen because they were found to have been used recently in readability research across fields. They return results that can be directly translated into school grade level, which made it possible to test both whether reading levels were appropriate for children and inter-method reliability.
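The five measures can be sketched in a few lines of Python using their commonly published formulas. This is an illustrative sketch only, not the tools used in this study: the function names are hypothetical, and the syllable counter is a rough vowel-group heuristic (an assumption), so scores may differ from those returned by established readability software.

```python
import math
import re

def count_syllables(word):
    """Rough syllable estimate: count runs of vowels (heuristic, an assumption)."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def text_stats(text):
    """Basic counts that the five formulas share."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    letters = sum(1 for w in words for c in w if c.isalpha())
    syllables = sum(count_syllables(w) for w in words)
    complex_words = sum(1 for w in words if count_syllables(w) >= 3)
    return len(sentences), len(words), letters, syllables, complex_words

def readability(text):
    """Return grade-level estimates from the five commonly published formulas."""
    s, w, letters, syllables, complex_words = text_stats(text)
    if s == 0 or w == 0:
        raise ValueError("text must contain at least one word and one sentence")
    return {
        "coleman_liau": 0.0588 * (letters / w * 100) - 0.296 * (s / w * 100) - 15.8,
        "flesch_kincaid": 0.39 * (w / s) + 11.8 * (syllables / w) - 15.59,
        "ari": 4.71 * (letters / w) + 0.5 * (w / s) - 21.43,
        "gunning_fog": 0.4 * ((w / s) + 100 * (complex_words / w)),
        "smog": 1.0430 * math.sqrt(complex_words * 30 / s) + 3.1291,
    }

scores = readability("The cat sat. The dog ran.")
```

Very short, simple samples can return grade levels below zero; the formulas were calibrated on longer passages of print text.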

There were four guiding principles in selecting our sample of children’s shows. The first principle was an attempt to represent current and available streaming children’s television. Thus selections were limited to individual shows that were openly available on popular streaming services at the time of this writing. The second principle guiding show selection was a preference for widely recognizable and internationally famous franchises. This was done so that our audience was more likely to be familiar with those television shows selected for analysis and discussion. Third, the sample selected a balance of shows with male and female protagonists. Lastly, the sample included shows with a balance of public and private funding.

Once individual programs were selected, a single season of each show was subjected to analysis, except in four special cases. The syndicated children’s television shows She-Ra: Princess of Power (1985), Mighty Morphin’ Power Rangers (1993), Transformers (1985), and He-Man and the Masters of the Universe (1983) feature first seasons with dozens of episodes, in some cases over 100 episodes in the initial season. In these cases, 16 episodes of each series were included for analysis. In all other cases the first season was selected, except for PBS’s Arthur, where only the tenth season was available on Amazon Prime and was therefore selected.

In total the sample included full seasons of 17 shows and 16 episode samples of four other shows, for a total of 21 shows for analysis, altogether 337 episodes.

From oldest to newest, the shows selected were: He-Man and the Masters of the Universe (1983), Transformers (1985), She-Ra: Princess of Power (1985), Mighty Morphin’ Power Rangers (1993), The Magic School Bus (1994), Arthur (1996), Dora the Explorer (2000), Peppa Pig (2004), Scooby Doo! Mystery Incorporated (2010), My Little Pony: Friendship is Magic (2010), Wild Kratts (2010), Octonauts (2012), Barbie Life in the Dreamhouse (2012), Transformers: Rescue Bots (2012), Daniel Tiger’s Neighborhood (2012), The Magic School Bus Rides Again (2017), She-Ra and the Princesses of Power (2018), DC’s Super Hero Girls (2019), Transformers: War for Cybertron (2020), Masters of the Universe: Revelation (2021), and a relatively new show entitled He-Man and the Masters of the Universe (2021). The two Masters of the Universe shows released on Netflix in 2021 are not related in story or animation style. Masters of the Universe: Revelation is more serious and complex, while the 2021 He-Man and the Masters of the Universe is aimed at a younger audience.

When a television show is streamed, closed captioning information is sent as a text file formatted in XML, which the video device uses to time the placement of captions. Using the developer tools sidebar in a Web browser, closed captioning text files were downloaded as XML (eXtensible Markup Language) files with .XML file extensions. The Black Box caption analysis tool only reads a format known as SubRip Subtitle files, with the file extension SRT. The XML files were therefore converted to SRT for analysis with Black Box; the conversion made only minor changes to supporting metadata in the file header. The XML files were also cleaned of XML symbols and saved as text files for analysis with traditional text tools. The cleaned plain text versions of the captioning were then passed through each of the five literacy tools as well as a quantitative text analysis tool, AntConc (Anthony, 2010). Immediately after quantitative analysis, closed captioning files were deleted to comply with relevant copyright regulations.
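The conversion and cleaning steps can be illustrated with a minimal Python sketch. It assumes a TTML-style caption file in which each caption is a `p` element with `begin` and `end` attributes in `HH:MM:SS.mmm` clock time; actual streaming services may use other schemas or time formats, and the function names here are hypothetical rather than those of any tool used in the study.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip an XML namespace prefix such as '{http://www.w3.org/ns/ttml}p'."""
    return tag.rsplit("}", 1)[-1]

def caption_text(element):
    """Collect the text inside one caption, turning <br/> line breaks into spaces."""
    parts = []
    def walk(el):
        if el.text:
            parts.append(el.text)
        for child in el:
            if local_name(child.tag) == "br":
                parts.append(" ")
            walk(child)
            if child.tail:
                parts.append(child.tail)
    walk(element)
    return " ".join("".join(parts).split())

def parse_captions(xml_string):
    """Return a list of (begin, end, text) tuples, one per caption."""
    root = ET.fromstring(xml_string)
    return [
        (el.attrib["begin"], el.attrib["end"], caption_text(el))
        for el in root.iter()
        if local_name(el.tag) == "p" and "begin" in el.attrib and "end" in el.attrib
    ]

def to_srt_time(clock):
    """Convert 'HH:MM:SS.mmm' to the SRT form 'HH:MM:SS,mmm' (assumed input format)."""
    hms, _, fraction = clock.partition(".")
    return f"{hms},{(fraction + '000')[:3]}"

def to_srt(captions):
    """Number the captions and write SRT blocks separated by blank lines."""
    return "\n".join(
        f"{i}\n{to_srt_time(b)} --> {to_srt_time(e)}\n{text}\n"
        for i, (b, e, text) in enumerate(captions, start=1)
    )

def to_plain_text(captions):
    """Keep only the caption text, one caption per line."""
    return "\n".join(text for _, _, text in captions)

# Made-up sample for illustration only
sample = (
    '<tt xmlns="http://www.w3.org/ns/ttml"><body><div>'
    '<p begin="00:00:01.000" end="00:00:03.500">Hello<br/>world</p>'
    '<p begin="00:00:04.000" end="00:00:05.250">[<span>laughter</span>]</p>'
    '</div></body></tt>'
)
captions = parse_captions(sample)
```

Calling `to_srt(captions)` yields the timed file for a caption-specific tool, while `to_plain_text(captions)` yields the cleaned text for traditional readability tools.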

As Fresno (2018) noted: “There is no universal way to calculate the speed of subtitles.” This analysis was inspired by Jensema, et al. (1996), where words were amalgamated in a database and spaces were not counted. This effort mirrored the methods of Jensema, et al., with comparable results. Another aspect of this study involves testing how apt traditional text tools are in a closed captioning context. The data from Black Box count every character (including spaces) in an individual closed caption against the time the closed captioning provides to those characters. None of the previously mentioned reading level assessment tools counted spaces as characters. Black Box does not provide words per minute data, and many international speed benchmarks are given in a words per minute format, limiting Black Box’s utility in that context. Characters per second (albeit without spaces) can be calculated from Jensema, et al.’s method; words per minute cannot be calculated from Black Box data. Black Box data will be reported contextually, as appropriate. General readers would not normally be familiar with text counts that consider spaces. Thus this research uses two methods to measure the speed of closed captions but will generally opt for Jensema, et al.’s methodology, as it is more methodologically consistent with reading level tools and international assessment methods.

Just as there is no universal way to measure speed of closed captions, there is no universal way to report speed. Researchers in related studies generally use one of three different measurements: words per minute, characters per minute, and characters per second. This study opts for words per minute for potential reader familiarity and clarity of reporting in most assessments. In general, words per minute is a far more common measurement than characters per second, and has been used as a measure of expertise in Morse code. Words per minute is commonly used in typing assessment, and reader familiarity is the rationale behind Jensema, et al.’s (1996) use of words per minute. The results from characters per second are single digit numbers requiring readers to parse decimals to see differences between shows. Characters per second was only used sparingly, in comparison to some problematic benchmarks established in prior literature. Characters per minute returns numbers in the hundreds, requiring readers to contemplate how long 450 characters would be. The results from words per minute are expressed in numbers more akin to academic writing samples, as a paragraph is approximately 100 words.

All but one of the shows were designed for a half-hour time slot, with some room for sponsorship and station identification. These shows differed in length from 27-minute-long PBS shows to 22-minute action shows that accompanied toy lines. The eleven-minute episodes of Super Hero Girls were the only outlier. Using methods that mirror Jensema, et al. (1996), the baseline reading speed was determined by dividing the average number of words in an episode by the average number of minutes in an episode. Characters per second was also tabulated in limited instances using Jensema, et al.’s method, in which spaces were not counted, for comparison with Black Box’s methodology.
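The baseline calculation described above amounts to two divisions, sketched here in Python. The function names are hypothetical; the Arthur figures (3,459.3 average words over a 27-minute runtime) are the ones reported in the Results section.

```python
def words_per_minute(avg_words_per_episode, avg_minutes_per_episode):
    """Baseline reading speed: average words divided by average runtime in minutes."""
    return avg_words_per_episode / avg_minutes_per_episode

def chars_per_second_no_spaces(caption_text, runtime_seconds):
    """Characters per second with whitespace excluded from the character count."""
    chars = sum(1 for c in caption_text if not c.isspace())
    return chars / runtime_seconds

# Arthur: 3,459.3 average words per episode over a 27-minute runtime
arthur_wpm = words_per_minute(3459.3, 27)
print(round(arthur_wpm, 1))  # 128.1
```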

 

++++++++++

Results

R1: Is there a statistical relationship between captioning speed and show genre or content?

When the 21 sampled shows were arranged by the average number of words per minute, there was a clearly observable pattern at both the top and the bottom of the list. Four of the five shows that required the fastest reading were publicly funded educational shows. Both Jensema, et al. (1996) and Fresno (2018) similarly found educational shows to have the fastest captions. Jensema found the PBS show SciGirls to have a majority of its closed captions running at a high rate, and also found Bill Nye the Science Guy to be the fastest children’s show in his sample. Across the sample the shows presented 95.08 words in an average minute. Ten of 21 shows in the sample had average word-per-minute speeds that were higher. The shows in the faster half were either educational or had a female protagonist; all were produced after the year 2000.

The fastest show was PBS’s Arthur, with 3,459.3 average words per episode, running at 128.1 words per minute. All three of the shows with over 3,000 words per episode were publicly funded PBS shows, given their longer run time. PBS shows do not allot time for commercials in broadcast length, and therefore these shows have a 27-minute (Arthur and Wild Kratts) or 25-minute runtime (Daniel Tiger’s Neighborhood) in comparison to the 23-minute, 2-second average runtime in the sample. However, these same publicly funded shows were also three of the four fastest in terms of words per minute. The commercially funded My Little Pony: Friendship is Magic was the second fastest show, with a speed more commonly found in publicly funded PBS shows. My Little Pony was representative of a cluster of shows with female protagonists that followed the PBS shows in terms of speed. Magic School Bus Rides Again, Barbie Life in the Dreamhouse, and DC’s Super Hero Girls were among the next set of shows. Of the 11 fastest shows, Transformers: Rescue Bots was the sole commercially funded show with a male protagonist.

R2: Do any of the shows in the sample exceed the thresholds considered too fast for closed captioning by relevant researchers or organizations?

Differences in measurement methodologies are clearly important in determining if shows are too fast. The fastest of the fast shows, PBS’s Arthur, averaged 13.8 characters per second (cps) using Jensema, et al.’s (1996) method, which does not count spaces. However, Black Box, which counts spaces, found it to be running at 18.689 cps. Generally, 12 cps is regarded as the edge of problematic speed, just below the concern thresholds of most relevant guidelines. The slowest show, Transformers (1985), had closed captions running at 6.95 cps according to Jensema, et al.’s method, and Black Box found it to be even slower (3.67 cps). Black Box found the slower speed, even while counting spaces, because it considered each caption within its allocated time. Black Box could identify when two segments of captioning might be on screen at the same time for an extended period. Jensema, et al.’s method divides the total number of characters in words by total time.
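The difference between the two approaches can be illustrated with a small Python sketch. The aggregate function follows the description of Jensema, et al.’s method given here (total characters without spaces divided by total time). The per-caption function is only an assumption about how a caption-level tool might work, averaging each caption’s characters (spaces included) over its own display time; it is not Black Box’s documented algorithm. The captions are made up for illustration.

```python
def aggregate_cps_no_spaces(captions, runtime_seconds):
    """Total non-space characters divided by total runtime."""
    chars = sum(len(text.replace(" ", "")) for _, _, text in captions)
    return chars / runtime_seconds

def mean_per_caption_cps(captions):
    """Mean of each caption's characters (spaces included) over its own display time."""
    rates = [len(text) / (end - start) for start, end, text in captions]
    return sum(rates) / len(rates)

# Two overlapping captions, each on screen for 8 seconds of a 10-second clip
captions = [
    (0.0, 8.0, "Hello there"),
    (2.0, 10.0, "How are you today"),
]
aggregate = aggregate_cps_no_spaces(captions, runtime_seconds=10.0)
per_caption = mean_per_caption_cps(captions)
print(aggregate, per_caption)  # 2.4 1.75
```

Because the overlapping captions each remain on screen for a long time, the per-caption rate comes out lower than the aggregate rate even though it counts spaces, which mirrors the pattern reported for Transformers (1985).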

Some international speed guidelines frame 120 words per minute as the beginning of trouble for children (Canadian Radio-Television and Communications Commission, 2016; Mikul, 2014). The previously mentioned Irish broadcast standards provided the slowest benchmark, 70 to 90 words per minute, for shows that serve “pre-lingually deaf children” [5]. Only two shows were found to breach the 120-words-per-minute guideline. An average episode of PBS’s Arthur ran at 128.1 words per minute; an average episode of My Little Pony: Friendship is Magic ran at 122.8 words per minute. These were also above the 116.3 words per minute benchmark that Shroyer and Birch (1980) had previously determined as the average reading speed of 8–11-year-olds. As detailed in Figure 1, the rest of the shows were under 120 words per minute, with 10 of 21 shows running at less than 90 words per minute, within the benchmark for pre-lingually deaf children. Problematically, three shows that seem to be intended for children of that age were all faster than 90 words per minute. Dora the Explorer was found to run at 95.9 wpm, Peppa Pig returned 104.1 wpm, and, most problematically, Daniel Tiger’s Neighborhood was among the fastest shows at 118.95 wpm.

 

Figure 1: Patterns in closed captioning speed.

 

Although previous research demonstrated that educational shows had the potential to be fast, on the surface it seems counterintuitive that the shows in this sample with the most frenetic content tended to be the slowest in terms of closed captions. The three shows found to have the slowest closed captioning speeds were the original Transformers and two new versions of violent shows featuring a male protagonist. Masters of the Universe: Revelation and Transformers: War for Cybertron were serious in tone and story and were the second and third slowest shows. Both of these shows had large amounts of contextual and descriptive information in their closed captions.

The other clear pattern was a link between violent content and a lower number of words per minute. For the purposes of this research, a show was considered violent if characters regularly carried signature weapons. Of the 10 fastest shows, only two, DC’s Super Hero Girls and She-Ra and the Princesses of Power, had characters with signature weapons. One iteration of Transformers was in the top half of faster shows, but that program, Transformers: Rescue Bots, was aimed at young children. That show featured robots that mainly turn into rescue vehicles to help humans during natural disasters, without weapons; for the purposes of this research it was classified as non-violent. See Table 1, which lists the shows from slowest to fastest.

 

Table 1: Sampled shows with additional context.
Title                                  | Year | WPM   | Violent? | Male hero | Female hero | Mixed team
Transformers                           | 1985 | 67.4  | Y        | X         |             |
Masters of the Universe: Revelation    | 2021 | 69.7  | Y        | X         |             |
Transformers: War for Cybertron        | 2020 | 71.6  | Y        | X         |             |
Magic School Bus                       | 1994 | 72.2  |          |           | X           |
She-Ra: Princess of Power              | 1985 | 75.8  |          |           | X           |
Octonauts                              | 2012 | 80.3  |          |           |             | X
Mighty Morphin’ Power Rangers          | 1993 | 81.3  | Y        |           |             | X
He-Man and the Masters of the Universe | 2021 | 84.3  | Y        | X         |             |
He-Man and the Masters of the Universe | 1983 | 88.1  | Y        | X         |             |
Scooby Doo! Mystery Incorporated       | 2010 | 88.8  |          |           |             | X
Dora The Explorer                      | 2000 | 96.0  |          |           | X           |
She-Ra Princesses of Power             | 2018 | 99.6  | Y        |           | X           |
Peppa Pig                              | 2004 | 104.2 |          |           | X           |
Transformers Rescue Bots               | 2012 | 104.2 |          | X         |             |
DC’s Super Hero Girls                  | 2019 | 106.0 | Y        |           | X           |
Barbie Life in the Dreamhouse          | 2012 | 108.5 |          |           | X           |
Magic School Bus Rides Again!          | 2017 | 113.6 |          |           | X           |
Wild Kratts                            | 2010 | 115.0 |          | X         |             |
Daniel Tiger’s Neighborhood            | 2012 | 118.9 |          | X         |             |
My Little Pony: Friendship is Magic    | 2010 | 122.8 |          |           | X           |
Arthur                                 | 1996 | 128.1 |          | X         |             |

 

This sample of 21 shows indicated that the shows requiring the fastest reading were publicly funded and educational, followed by shows with female protagonists. Those that displayed words at the slowest pace tended to be older, and most were based on violent themes. An amalgamation of international closed captioning guidelines that use words per minute indicates that anything below 70 wpm is probably safe and anything above 120 wpm is potentially problematic. The speeds found in the sample ranged from 67.4 wpm in the 1985 version of Transformers to 128.1 wpm in Arthur. As a whole the shows were found to be mostly in a safe zone, although some shows bordered on problematic.
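The benchmark comparison can be reproduced from the words-per-minute column of Table 1 with a short Python sketch. The cutoffs of 90 and 120 wpm are the guideline figures discussed under R2; the function name and bucket labels are hypothetical.

```python
# (title, year, words per minute) as listed in Table 1
SHOWS = [
    ("Transformers", 1985, 67.4),
    ("Masters of the Universe: Revelation", 2021, 69.7),
    ("Transformers: War for Cybertron", 2020, 71.6),
    ("Magic School Bus", 1994, 72.2),
    ("She-Ra: Princess of Power", 1985, 75.8),
    ("Octonauts", 2012, 80.3),
    ("Mighty Morphin' Power Rangers", 1993, 81.3),
    ("He-Man and the Masters of the Universe", 2021, 84.3),
    ("He-Man and the Masters of the Universe", 1983, 88.1),
    ("Scooby Doo! Mystery Incorporated", 2010, 88.8),
    ("Dora The Explorer", 2000, 96.0),
    ("She-Ra Princesses of Power", 2018, 99.6),
    ("Peppa Pig", 2004, 104.2),
    ("Transformers Rescue Bots", 2012, 104.2),
    ("DC's Super Hero Girls", 2019, 106.0),
    ("Barbie Life in the Dreamhouse", 2012, 108.5),
    ("Magic School Bus Rides Again!", 2017, 113.6),
    ("Wild Kratts", 2010, 115.0),
    ("Daniel Tiger's Neighborhood", 2012, 118.9),
    ("My Little Pony: Friendship is Magic", 2010, 122.8),
    ("Arthur", 1996, 128.1),
]

def classify(wpm, low=90.0, high=120.0):
    """Bucket a speed against a lower and an upper words-per-minute benchmark."""
    if wpm < low:
        return "under low benchmark"
    if wpm > high:
        return "over high benchmark"
    return "between benchmarks"

counts = {}
for _title, _year, wpm in SHOWS:
    bucket = classify(wpm)
    counts[bucket] = counts.get(bucket, 0) + 1

print(counts)
# {'under low benchmark': 10, 'between benchmarks': 9, 'over high benchmark': 2}
```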

R3: Are there relationships between measured reading level of closed captions and closed captioning speed?

Closed captioning speed had essentially no correlation with the reading level of text as measured by the selected tools of this research. The correlation between speed of closed captions and the Gunning-Fog reading level was 0.00; similarly, the Coleman-Liau Index had a negligible correlation with speed (r = -.11). The correlation between speed and the Automated Readability Index was negligible (r = -.05), as was the correlation with the Flesch-Kincaid Index (r = -.09). The SMOG test had a negative relationship with speed, but that is potentially specific to this sample. The SMOG methodology examines complex words, and the Transformers franchise had the most due to complex names, while its shows also tended to be at the bottom in terms of speed.
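For readers unfamiliar with the statistic, a correlation coefficient of this kind can be computed as sketched below. This sketch assumes a Pearson correlation, which the text does not name explicitly. The speeds are the five slowest values from Table 1, while the grade levels are hypothetical placeholders, since per-show reading level scores are not listed here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)

speeds_wpm = [67.4, 69.7, 71.6, 72.2, 75.8]   # from Table 1
grade_levels = [5.1, 4.0, 5.6, 2.3, 2.9]      # hypothetical values, illustration only
r = pearson_r(speeds_wpm, grade_levels)
```

Values near zero indicate no linear relationship, while values near -1 or 1 indicate a strong negative or positive one.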

R4: Are there patterns of reading level and genre or age?

Although reading level generally did not correlate with captioning speed, the range of scores returned by four of the five assessments seemed appropriate. All five reading level assessments used in this research returned a score that was based on school grade level or could be converted to grade level. The measured reading level scores were appropriate for the sample for all but one of the measurement methods. The Coleman-Liau Index scores were consistently higher than the other four. Coleman-Liau rated nine out of 21 shows as being at the eighth grade level, and measured Peppa Pig as the second most difficult to read. All four other tools returned scores that would be considered age appropriate for the sample, generally between first and fifth grade reading levels. The only shows that exceeded the fifth grade level on any assessment were two shows from the Transformers franchise.

A lack of correlation between reading level and closed captioning speed was found, even though the shows with the slowest closed captioning were consistently ranked as the most difficult to read. This result suggests a negative correlation, but it was found to be anecdotal due to action shows with long character names. On all four reading level measures either the original 1985 Transformers or the 2020 Transformers: War for Cybertron were ranked as the most difficult to read. On all four reading level assessments the three He-Man and the Masters of the Universe shows and the two more violent Transformers shows were commonly around the top. However there was evidence that the form of closed captions affects how these tools assessed reading level.

Peppa Pig was found to be the second most difficult to read by the Coleman-Liau Index and tied for third in the Automated Readability Index, which seemed counterintuitive as Peppa Pig is aimed at pre-K children. Peppa Pig provided insight into the intersection of reading level assessment tools and closed captioning. The way that the Automated Readability Index tabulated syllables is based on counting vowels. An initial search determined that Peppa Pig had the highest frequency of character names. The names “Peppa” and “George” appeared 1,242 times in closed captioning for the first 11 episodes, the highest frequency in the sample. In addition, the names “Daddy,” “Mummy,” and “narrator” were also multi-vowel words, appearing collectively 1,251 times in the first season. Nine percent of all the words in the first season of Peppa Pig were multi-vowel names. Another three percent of the words were multi-syllable and represented sound, like “giggling” and “laughter”. The other potential factor could be formatting. Many of the closed captions in the sample presented approximately six words on screen at a time; Peppa Pig displayed full sentences with punctuation. Coleman-Liau uses the number of sentences per 100 words as part of its formula. In this situation, closed captions that place full sentences on screen at once will read as more difficult, because closed captioning without punctuation is read with each line as a sentence. In Peppa Pig it was not uncommon to see 10 or 11 words on screen at one time, with sentences consistently punctuated. This formatting was seen in a minority of other shows. Many other shows only punctuated in cases of questions or exclamations, with a sentence end determined by the end of the closed captioning line.
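The formatting effect can be seen by holding the words constant and changing only the sentence count in the commonly published Coleman-Liau formula. The Python sketch below uses a made-up 12-word caption, not text from any show, scored once as a single punctuated sentence and once as two unpunctuated caption lines that a tool treats as two sentences.

```python
def coleman_liau(letters, words, sentences):
    """Coleman-Liau Index from raw counts (commonly published form of the formula)."""
    letters_per_100 = letters / words * 100
    sentences_per_100 = sentences / words * 100
    return 0.0588 * letters_per_100 - 0.296 * sentences_per_100 - 15.8

# Made-up caption for illustration only
caption = "The little pig jumps up and down in the big muddy puddle"
words = len(caption.split())                   # 12 words
letters = sum(1 for c in caption if c.isalpha())  # 45 letters

full_sentence = coleman_liau(letters, words, sentences=1)  # shown as one punctuated sentence
split_lines = coleman_liau(letters, words, sentences=2)    # shown as two six-word lines

print(round(full_sentence, 2), round(split_lines, 2))  # 3.78 1.32
```

With identical words, the version displayed as one full sentence scores roughly two and a half grade levels higher than the version split into two lines.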

R5: Is there a relationship between speed at which words are displayed in closed captions and complexity of text?

This research evaluated the complexity of closed captioning text using two different measures. The first measure of complexity involved tabulating the percentage of complex words in closed captions. The second measure counted the number of different words used in an episode as an assessment of the size of vocabulary.

In most literacy measurements a complex word is defined as a polysyllabic word, meaning three or more syllables. The average percentage of complex words in an episode was found to negatively correlate with speed (-.37). Some of the slowest shows were found to have the highest percentages of complex words and some of the fastest shows were found to have the lowest percentages of complex words. The 2020 iteration of the Transformers franchise had the highest percentage of complex words (12.53 percent) and the third slowest words per minute (71.58 wpm). At the other extreme, Daniel Tiger’s Neighborhood had the lowest percentage of complex words in the sample (3.88 percent), but was the third fastest show in terms of words per minute (118.95 wpm).
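As a rough illustration of the two quantities compared here, the sketch below computes a complex-word percentage and a words-per-minute rate from caption text. The syllable estimate (counting runs of vowels) is a crude heuristic chosen for brevity, not the method of any particular tool in this study, and the function names and sample text are hypothetical.

```python
import re

# Hypothetical tokenizer: runs of letters, with an optional apostrophe part.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def estimate_syllables(word):
    """Heuristic only: treat each run of consecutive vowels as one syllable."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def complex_word_percentage(caption_text):
    """Share of words with three or more estimated syllables, in percent."""
    words = WORD_RE.findall(caption_text)
    if not words:
        return 0.0
    complex_count = sum(1 for word in words if estimate_syllables(word) >= 3)
    return 100 * complex_count / len(words)

def words_per_minute(caption_text, runtime_minutes):
    """Caption words divided by the episode runtime in minutes."""
    return len(WORD_RE.findall(caption_text)) / runtime_minutes

sample = "Megatron grunts Cody laughs"  # invented caption fragment
print(complex_word_percentage(sample), words_per_minute(sample, 2))
```

With this heuristic “Megatron” counts as complex while “Starscream” does not, so results for individual names depend on the syllable rule a given tool applies.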

The biggest commonality among shows with the highest percentage of complex words was commercial funding with pre-existing character names. The five shows with the highest percentages of complex words were all shows where the characters had names before the shows went into production, either because they are based on toy lines (Transformers, He-Man and The Masters of the Universe, and Barbie) or older intellectual property (DC’s Super Hero Girls).

For example, in Transformers: War for Cybertron the villain’s name is “Megatron” and it is the tenth most common word in the closed captions. The name appeared on average 20.6 times in a 24-minute episode. Other frequently appearing characters had complex names such as “Optimus” and “Starscream”. “Megatron” appeared in the closed captioning more frequently than words like “is” or “this” (fifteenth and sixteenth most common). The word appeared so frequently, at least theoretically, because of the 2014 closed captioning rule changes, which required more details on sounds and indicators of characters on screen. The name appeared in brackets as part of non-dialogue captions, such as “Megatron grunts”, on average 6.5 times per episode, and as part of dialogue an additional 14.1 times per episode. DC’s Super Hero Girls was found to have complex language due to characters such as “Supergirl” and “Bumblebee” and locations like “Metropolis”. These longer names and the rules on the functionality of closed captions increased complex word count.

The other measure of complexity counted the number of different words used in an average episode, assessing the total size of the vocabulary that made up the captions. There were significant differences between shows. An average episode of Dora the Explorer was made up of fewer than 400 unique words (363.98) while an average episode of Arthur was made up of over 850 different words (862.9). The total number of unique words did not statistically correlate with the percentage of complex words; the correlation was virtually zero (-.0032). The use of a large vocabulary did not necessarily translate to a higher frequency of complex words. There was a positive (.38) correlation between speed of closed captions and the number of unique words; conversely, the relation between speed and complex words was negative (-.37). Thus the number of unique words generally increased as speed increased, but the number of complex words generally decreased as words per minute increased.
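The vocabulary measure and the correlations reported here can be sketched as follows. This is a minimal illustration, assuming a simple case-insensitive word count and a standard Pearson coefficient; the source does not specify which software computed the reported values, and the sample text is invented.

```python
import re
from math import sqrt

# Hypothetical tokenizer: runs of letters, with an optional apostrophe part.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def unique_word_count(caption_text):
    """Number of distinct words in the captions, ignoring case."""
    return len({word.lower() for word in WORD_RE.findall(caption_text)})

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return covariance / (spread_x * spread_y)

print(unique_word_count("Dora says hola. Hola, Dora!"))  # invented caption text
```

In use, each show would contribute one pair of values (for example, mean words per minute and mean unique words per episode), and `pearson_r` would be applied across the 21 shows.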

R6: Is there a relationship between production year and speed of closed captions?

The 2014 rule changes were intended to affect the speed and complexity of closed captions, as they increased the amount of contextual material required. No part of the regulation was about removing content from closed captions. However, in this sample the increase was not clearly visible at exactly 2014. For example, five of the six fastest shows were made before the 2014 changes. As mentioned in the results of R1, two of the three slowest shows were a 2021 version of Masters of the Universe and a 2020 version of Transformers. However, the changes seem to have a relationship with an increase in closed captioning speed. Every show in the sample made before the year 2000 was in the bottom half of closed captioning speed. Newer shows in the sample were generally faster than older shows, with a +.31 correlation between a show’s production year and the words per minute rate of its closed captions.

The previous question determined that the high frequency of character names combined with the complexity of some of the names increased reading level. The 2014 changes increased the number of times character names appeared, because names were used repeatedly next to gerunds, words that stem from verbs but function as nouns. Gerunds were used in the closed captioning text to represent the cause of sounds, location of sounds, or tone of dialogue. For example, phrases like “Zatanna screaming” and “Megatron panting” appeared numerous times in a given episode.

One of the most notable language differences after the 2014 rule changes was how commonly the word “grunts” appeared in violent shows captioned under the new guidelines. “Grunts” is specifically mentioned here because no other similar word was found to be near its frequency, although the word “gasps” was less common but followed a statistically similar pattern in shows with female protagonists. The words “laughing” and “screaming” appeared somewhat frequently but were usually around the hundredth most common word in the captioning. In the more serious Masters of the Universe: Revelation (2021) “grunts” was the eighth most common word, appearing 19.4 times per episode, about once every minute and 16 seconds. In order, the eight most common words in this series’ closed captioning were “the,” “you,” “I,” “to,” “of,” “and,” “a,” and “grunts.” “Grunts” was the fourteenth most common word in Transformers: War for Cybertron; the word did not appear in the season-long samples of the original 1980s versions of either show. “Grunts” was the twenty-second most common word in the 2021, youth-oriented He-Man cartoon and the thirty-second most common word in DC’s Super Hero Girls. “Gasps” was somewhat similar in that it went from not appearing in captions for older shows to being the twenty-fourth most common word in DC’s Super Hero Girls, appearing about once every one minute and 40 seconds. “Gasps” was also the sixty-eighth most common word in Barbie: Life in the Dreamhouse, and in the newer version of Magic School Bus “gasps” was in the top 100. While “gasps” did not reach the frequency of “grunts”, it was statistically similar in that the word did not appear in older shows and was frequently used in newer shows.

Did the 2014 rule changes specifically cause an increase in speed? That question cannot be answered by this research for two reasons. First, many shows created before 2014 were among the fastest shows in the sample. Second, legal interactions between deaf organizations and streaming services have resulted in reports that closed captions in some shows were revised for accuracy. It is beyond the scope of this research to know when closed captions for certain shows in the sample were generated or last edited. However, the idea that newer shows in the sample were generally faster than older ones was supported by the mild positive correlation between production year and speed. The original captioning of the 1980s and 1990s was formatted in capital letters with little contextual information. This was done in part to balance legibility and overcrowding on low-resolution television sets of the era. All older shows were in the slower half of the sample, but they were not necessarily slower than newer shows.

Does the frequency of character names in captioning correspond with intended age level?

Three factors contributed to the analysis of character name frequency in closed captioning. First, the length and frequency of character names were found to affect the reading level of closed captions in some shows. Second, in the analysis of the impact of the 2014 rule changes, it was noted that names appeared more frequently as part of contextual information. Third, the question of commercial influence on name frequency also contributed to this analysis, as market forces theoretically would favor character names appearing more commonly to build product recognition in children.

The clearest pattern was the high frequency of character names in shows in the preschool demographic. Peppa Pig, Daniel Tiger’s Neighborhood, and Dora the Explorer were the only shows in the sample explicitly aimed at children two to four years of age. In order, the four shows with the highest frequency of name repetition were Peppa Pig, Barbie: Life in the Dreamhouse, Dora the Explorer, and Daniel Tiger’s Neighborhood.

Character names in preschool shows were so frequently displayed that in order of frequency the most common names in those shows were present more than all but a handful of words. In Dora the Explorer the name “Dora” was the eighth most common word, in Peppa Pig “George” was sixth and “Peppa” ninth, and in Daniel Tiger’s Neighborhood “Daniel” was the tenth most common word.

In closed captioning of preschool shows the most frequent names appeared more than 30 times an episode, and appeared on screen more than once per minute. In the case of Peppa Pig “George” appeared 641 times in closed captions from 11 episodes of the first season, averaging 58.1 appearances in a 25-minute episode. “Peppa” averaged 54.0 appearances per episode. One of the five most common names in Peppa Pig appeared on average every 6.7 seconds. Closed captioning of Daniel Tiger’s Neighborhood displayed “Daniel” the least frequently in this top group of four, on average every 48.2 seconds. None of the other shows in the sample came close to the character name frequency of these four shows.
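The three name statistics used in this section (rank among all words, appearances per episode, and average time between appearances) can be sketched as below. This is an illustrative reconstruction under stated assumptions: the function name, tokenizer, and sample episodes are hypothetical, and the interval is simply the episode runtime divided by the average count rather than a measurement from caption timestamps.

```python
import re
from collections import Counter
from typing import Sequence

# Hypothetical tokenizer: runs of letters, with an optional apostrophe part.
WORD_RE = re.compile(r"[A-Za-z]+(?:'[A-Za-z]+)?")

def name_stats(episodes: Sequence[str], name: str, episode_minutes: float) -> dict:
    """Rank, per-episode average, and mean seconds between appearances of a name."""
    counts = Counter(
        word.lower() for text in episodes for word in WORD_RE.findall(text)
    )
    total = counts[name.lower()]
    ranking = [word for word, _ in counts.most_common()]
    per_episode = total / len(episodes)
    return {
        "rank": ranking.index(name.lower()) + 1 if total else None,
        "per_episode": per_episode,
        # Assumes appearances are spread evenly across the runtime.
        "seconds_between": episode_minutes * 60 / per_episode if total else None,
    }

# Invented one-minute "episodes" for illustration only.
episodes = ["George laughs. George jumps. Peppa jumps.", "George giggles."]
stats = name_stats(episodes, "George", episode_minutes=1)
print(stats)
```

The same division explains figures of the form “every N seconds” in this section: a name averaging 39.5 appearances at one appearance every 36.4 seconds implies a runtime of roughly 24 minutes.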

As mentioned, Barbie: Life in the Dreamhouse was the only show in the group of four shows with the highest character name frequency that was not aimed at children ages 2–4. Format and setting were the apparent root causes, given the fast pace of the show and its reality show-like format, in which characters are asked to discuss the main character in a self-parody of the brand. “Barbie” was the eighth most common word, averaging 39.5 appearances in closed captions per episode. The name appeared every 36.4 seconds, on average.

Thus, preschool-oriented shows in the sample included names far more often than other shows, with Barbie being the only non-preschool show within this range. No other patterns or anomalies in how frequently names appeared were detected. While the length of names in some shows affected overall reading level, those same names appeared more often than average across the sample.

Are different television shows that stem from the same intellectual property similar to each other in terms of closed captioning content and complexity?

In the original sample of shows there were four different examples of long-running children’s franchises being used in different series. Three versions of Transformers were in the sample: the latest adult/teen-oriented version entitled Transformers: War for Cybertron, the eponymous 1985 animated series, and Transformers: Rescue Bots, aimed at younger children. There were also five shows in the sample from the He-Man franchise. Two were titled He-Man and the Masters of the Universe: the original 1983 series and the very latest version, released on Netflix in November of 2021. Masters of the Universe: Revelation, aimed at older children and adult fans, was released in July 2021 on Netflix. In 1985 Mattel debuted a girl-oriented version of He-Man named She-Ra; the original 1985 series and the 2018 Netflix revival were both included. Two versions of Magic School Bus were included.

Magic School Bus first appeared in 1994 and was revived by Netflix in 2017. The newer version of the show displayed words in closed captions about 57 percent faster than the original series: the original show averaged 72.2 wpm and the newer version averaged 113.7 wpm. Among shows in the sample, this was the largest gap between versions of shows based on the same intellectual property. The original series was rated as much easier to read across reading level assessments. The cause of the newer version’s higher reading level was the contextual information from new regulations. The words “laughing,” “whirring,” “cheering,” “screaming,” “speaking,” and “grunting” made a combined 198 appearances in the new version of Magic School Bus, averaging over 15 times per episode. These words were all eight or nine letters long and all were added based on the 2014 regulations. These words were only rarely found in the original series and not with the same conjugation. In the original series the word “laugh” appeared twice, “cheer” four times, and the word “scream” five times.

The two versions of Magic School Bus were statistically very different and provided evidence that a parent or practitioner cannot count on name alone to predict closed captioning speed or reading level. The two versions were both 25 minutes long; on average there were more than 1,000 additional words in the closed captioning of the 2017 version (2,841.46 words per episode) when compared to the 1994 version (1,804.92 words per episode). The new version required children to read significantly faster. The original 1994 version was also rated the easiest to read in the entire sample by three out of the four reading level assessment tools. The 2017 version was usually among the shows with higher reading levels, averaging eighth from the most difficult across measures. However, only two versions of Magic School Bus were included. In this case the newer version had more words and was more difficult to read. This was not always the case; in some cases old and new versions of the same intellectual property were found to be statistically very similar.
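The relationship between the per-episode word counts and the words-per-minute figures quoted for Magic School Bus can be checked with a short calculation. The sketch below uses only the numbers reported above (word counts and the 25-minute runtime); the function names are hypothetical. It also shows that a comparison between the two versions yields a different percentage depending on which version is taken as the baseline.

```python
def words_per_minute(words_per_episode, runtime_minutes):
    """Average caption speed for an episode of the given length."""
    return words_per_episode / runtime_minutes

def percent_faster(new_wpm, old_wpm):
    """Relative increase of the new speed over the old speed, in percent."""
    return 100 * (new_wpm - old_wpm) / old_wpm

original_wpm = words_per_minute(1804.92, 25)  # 1994 version
revival_wpm = words_per_minute(2841.46, 25)   # newer version

# Baseline = original series: how much faster the newer captions run.
increase = percent_faster(revival_wpm, original_wpm)
# Baseline = newer series: the original's speed as a share of the newer one.
share = 100 * original_wpm / revival_wpm

print(round(original_wpm, 1), round(revival_wpm, 1))  # 72.2 113.7
print(round(increase, 1), round(share, 1))            # 57.4 63.5
```

Stating which version serves as the baseline avoids ambiguity when reporting how much faster one set of captions is than another.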

Three shows in the franchise subgroup were at least somewhat intended for older children and adult fans. Masters of the Universe: Revelation was advertised as written by filmmaker Kevin Smith. Virtually every feature film that Kevin Smith directed was rated R and intended for adults. The 2018 She-Ra and the Princesses of Power won an Emmy Award and featured acclaimed independent comic book writer and artist ND Stevenson as showrunner. Transformers: War for Cybertron (2020) was the third example of franchise shows aimed at older audiences, with its bleak tone and complex storyline. In order from slowest to fastest, the shows with the fewest total words and the slowest pace overall were Transformers (1985), Masters of the Universe: Revelation (2021), Transformers: War for Cybertron (2020), Magic School Bus (1994), and the original 1985 She-Ra: Princess of Power. Among these slowest shows were a cluster of the oldest shows in the sample and different versions of toy-based intellectual property.

There was evidence that the 2014 rule changes did not always cause speed to increase significantly. Of the three He-Man shows, the original 1983 show had the fastest closed captions (still only about average speed across the entire sample), while the more complex 2021 Masters of the Universe: Revelation was the slowest (second slowest in the sample). The 2021 He-Man and the Masters of the Universe was a different entry in the franchise, with characters and story modified to appeal to younger audiences. Statistically it was very similar to the original 1983 He-Man and the Masters of the Universe: the original averaged 88.07 words per minute in the sample and the newer version averaged 84.37 words per minute. Both were slightly slower than the mean of 95.7 wpm across the wider sample. This was unexpected, as the newer closed captioning regulation would lead one to expect more words, yet the Kevin Smith version of the franchise had the slowest speed in the franchise at 69.77 wpm.

In terms of speed the She-Ra franchise was largely the opposite of the related He-Man franchise. In She-Ra the newer 2018 show had a faster pace (99.63 wpm) than the original 1985 version (75.8 wpm). The shows were very similar in how many times the main character’s name appeared in closed captions, but a notable spelling error was discovered. In the 2018 version the misspelled name “Shira” appeared 93 times in the sample, 7.15 times per episode, similar to the 7.0 average appearances of “She-Ra” in the original series. This was the most common spelling error found in the sample.

The Transformers show intended for the youngest audience, Transformers: Rescue Bots, had both the fastest pace (104.2 wpm) and the largest vocabulary of the three Transformers shows, but the lowest measured overall reading level of the three. The closed captions in Transformers: Rescue Bots appeared about 55 percent faster than those of the original series (67.44 wpm). The significant difference for reading level was that the most common name in Transformers: Rescue Bots was a human boy named “Cody” and the featured robot characters were more likely to have simpler names like “Blades” and “Chase.” In other Transformers shows “Megatron” was the most common name, and robots named “Starscream” and “Bumblebee” were featured.

 

++++++++++

Discussion

Considering the diversity of measurements found in the literature review and the unpredictable nature of speeds in this research’s sample, the results clearly support the notion of specialized and standardized reading measures for closed captioning. A standard measure that synthesizes both speed and reading level of captions would be valuable to many. The grade level scores having zero correlation with closed captioning speed could be interpreted as a potentially troubling indicator that the speed of closed captions had nothing to do with the intended age level of the audience, especially combined with the knowledge that none of the shows targeted towards pre-kindergarten were among the slowest shows. It also indicates there is more than one way to consider the general pacing of a television show. Conventional wisdom would lean towards the idea that a violent cartoon would be more fast paced than an educational kids show on PBS. However, in terms of closed captioning this is now the third research study that indicates the opposite is true. The complexity of the text and the speed of the text should be treated as independent factors.

Choosing to watch these videos on streaming services affected results by excluding commercial advertisements. The fastest of the sampled shows was just above what more conservative guidelines note could be too fast in terms of words per minute, and well over most guidelines if measured in characters per second with Black Box. Part of the problem is that the guidelines themselves generally do not detail measurement methods, which would be important to know in order to adhere to them. As seen in the case of PBS’s Arthur, counting spaces makes a significant difference in the characters per second count. Researchers commonly give overviews of captioning measurement systems as part of literature reviews. A separate study detailing the different guidelines and how they were assessed would be highly helpful for others measuring according to those assessments.

The other results will hopefully provide clarity to interested parties and are left for those parties to interpret. This study is not intended to make value judgements. For example, the incredible frequency of names in some of the preschool shows might bolster learning through repetition. Teachers of second languages might feel more comfortable showing action shows to students knowing the words will appear at a slower pace. The clearest casual takeaway from this research is that violent shows are likely to have fewer words than educational shows. This could help children struggling to read or it could be interpreted as harmful, since children should be exposed to more words to promote language development. Primarily, this research is intended as a reference for relevant professionals and a first step towards broader public analysis of closed captioning content.

 

++++++++++

Conclusion

Certain results of this research were revealing. It was determined that among shows in the sample, educational shows had far more words than the violent shows and ran at a faster pace. Action shows with female protagonists generally had faster closed captioning than action shows with male protagonists. Shows made for pre-K aged children repeated character names in closed captioning with much higher frequency than other shows. Shows based on the same intellectual property could have statistically very different words in their captioning. Shows aimed at the youngest audiences did not turn out to be especially slow. The version of Transformers aimed at the youngest audiences had the fastest closed captions of the three measured, and three other preschool shows were in the top half in terms of speed.

The closed captioning trend towards better contextual information has resulted in complex words being inserted into closed captioning of children’s shows, therefore increasing speed and reading level. Some of the contextual material was repeated with high frequency, especially the word “grunts” in action shows. While this may seem comical, it supports the idea that the creative writers behind these shows should consider closed captioning in their writing. Doing so would potentially make the material more appropriate in terms of reading level and less repetitive.

Reading level assessment results were generally inconclusive. In some contexts the scores returned by four of the five tools seemed reliable; most grade level scores returned were at least plausible, with the exception of the Flesch-Kincaid scores. However, there was no conclusive relationship between reading level and speed of closed captions. Complex names from commercial shows were clearly an influence on reading level assessment numbers. Commercial or public funding seemed to be a factor, but in an unpredictable way. Some PBS shows were found to have twice as many words per episode as commercial shows designed for the same time slot. The general unpredictability in speed and the lack of correlation between speed and reading level were matched by a range of differing guidelines about what is too fast, provided by a range of governmental and non-governmental sources. Closed captioning would clearly benefit from large scale work towards standardization of measurements, standardization of measurement methods, and reporting of those measurements. Any standardized measurement should consider speed of closed captions and reading level of closed captions as discrete variables.

 

About the author

Edward Schneider is an associate professor in the Department of Communication Media & Design in the Park School of Communications at Ithaca College in Ithaca, N.Y.
E-mail: eschneider [at] ithaca [dot] edu

 

Notes

1. Jensema, et al., 1996, p. 285.

2. Broadcasting Authority of Ireland, 2012, p. [2].

3. Shroyer and Birch, 1980, p. 921.

4. Fresno, 2018, p. 414.

5. Broadcasting Authority of Ireland, 2012, p. 10.

 

References

L. Anthony, 2010. “AntConc,” version 3.2.1, at http://www.antlab.sci.waseda.ac.jp/, accessed 30 April 2024.

B.A. Austin, 1980. “The deaf audience for television,” Journal of Communication, volume 30, number 2, pp. 25–30.
doi: https://doi.org/10.1111/j.1460-2466.1980.tb01962.x, accessed 30 April 2024.

B.B. Braverman and B.J. Cronin, 1978. “Television and the deaf,” Journal of Educational Technology Systems, volume 7, number 1, pp. 7–17.
doi: https://doi.org/10.2190/GFDF-4VG2-XLRJ-QJK5, accessed 30 April 2024.

Broadcasting Authority of Ireland, 2012. “Subtitling guide,” at https://www.bai.ie/en/download/128530/, accessed 30 April 2024.

Canadian Radio-television and Telecommunications Commission, 2016. “Broadcasting regulatory policy CRTC 2016-435” (2 November), at https://crtc.gc.ca/eng/archive/2016/2016-435.htm, accessed 30 April 2024.

M. Coleman and T. Liau, 1975. “A computer readability formula designed for machine scoring,” Journal of Applied Psychology, volume 60, number 2, pp. 283–284.
doi: https://doi.org/10.1037/h0076540, accessed 30 April 2024.

T.C. Davis, E.J. Mayeaux, D. Fredrickson, J.A. Bocchini, Jr., R.H. Jackson, and P.W. Murphy, 1994. “Reading ability of parents compared with reading level of pediatric patient education materials,” Pediatrics, volume 93, number 3, pp. 460–468.

Z. De Linde and N. Kay, 2016. The semiotics of subtitling. London: Routledge.
doi: https://doi.org/10.4324/9781315538686, accessed 30 April 2024.

N. Fresno, 2021. “Closed captioning quality in the information society: The case of the American newscasts reshown online,” Universal Access in the Information Society, volume 20, number 4, pp. 647–660.
doi: https://doi.org/10.1007/s10209-020-00738-3, accessed 30 April 2024.

N. Fresno, 2018. “Watching accessible cartoons: The speed of closed captions for young audiences in the United States,” Perspectives, volume 26, number 3, pp. 405–421.
doi: https://doi.org/10.1080/0907676X.2017.1377264, accessed 30 April 2024.

R. Gunning, 1952. The technique of clear writing. New York: McGraw-Hill.

D.T. Huffman, 1986. “Soap operas and captioning in the ESL class,” paper presented at the International Conference on Language Teaching and Learning of the Japanese Association of Language Teachers (Hamamatsu, Japan, 22–24 November), at https://eric.ed.gov/?id=ED281392, accessed 30 April 2024.

C. Jensema, R. McCann, and S. Ramsey, 1996. “Closed-captioned television presentation speed and vocabulary,” American Annals of the Deaf, volume 141, number 4, pp. 284–292.
doi: https://doi.org/10.1353/aad.2012.0377, accessed 30 April 2024.

C.J. Jensema, A.N. Schildroth, and S.W. O’Rourke, 1975. Score conversion tables and age-based percentile norms for standard achievement test, special edition for hearing impaired students. Washington, D.C.: Gallaudet College, Office of Demographic Studies.

J.P. Kincaid, R.P. Fishburne, Jr., R.L. Rogers, and B.S. Chissom, 1975. “Derivation of new readability formulas (automated readability index, fog count and Flesch reading ease formula) for Navy enlisted personnel,” at http://stars.library.ucf.edu/istlibrary/56, accessed 30 April 2024.

P.S. Koskinen, R.M. Wilson, L.B. Gambrell, and C.J. Jensema, 1986. “Using closed captioning to enhance reading skills of learning disabled students,” National Reading Conference Yearbook, volume 35, pp. 61–65.

G.H. McLaughlin, 1969. “SMOG grading — A new readability formula,” Journal of Reading, volume 12, number 8, pp. 639–646.

C. Mikul, 2014. “Subtitle quality: International approaches to standards and measurement,” Media Access Australia, at https://mediaaccess.org.au/sites/default/files/files/MAA_CaptionQuality-Whitepaper.pdf, accessed 30 April 2024.

M. Owens, 1990. “H.R.4267 — Television Decoder Circuitry Act of 1990,” at https://www.congress.gov/bill/101st-congress/house-bill/4267?s=1&r=29, accessed 30 April 2024.

M. Pantula and K.S. Kuppusamy, 2020. “A metric to assess the readability of video closed captions for the persons with low literacy skills,” Computer Journal, volume 63, number 7, pp. 1,063–1,075.
doi: https://doi.org/10.1093/comjnl/bxz074, accessed 30 April 2024.

S.E. Phillips, 1994. “High-stakes testing accommodations: Validity versus disabled rights,” Applied Measurement in Education, volume 7, number 2, pp. 93–120.
doi: https://doi.org/10.1207/s15324818ame0702_1, accessed 30 April 2024.

G. Propp, 1978. “An overview of progress in utilization of educational technology for educating the hearing impaired,” American Annals of the Deaf, volume 123, number 6, pp. 646–652.

D.J. Randev, 2014. “Reading all the way to functional literacy: Using same language subtitling in television programmes in India,” IOSR Journal Of Humanities And Social Science (IOSR-JHSS), volume 19, number 7, pp. 24–29, and at https://www.iosrjournals.org/iosr-jhss/papers/Vol19-issue7/Version-7/E019772429.pdf, accessed 30 April 2024.

R.J. Senter and E.A. Smith, 1967. “Automated readability index,” Aerospace Medical Research Laboratories, AMRL-TR-66-220, at https://apps.dtic.mil/sti/tr/pdf/AD0667273.pdf, accessed 30 April 2024.

E.H. Shroyer and J. Birch, 1980. “Captions and reading rates of hearing-impaired students,” American Annals of the Deaf, volume 125, number 7, pp. 916–922.
doi: https://doi.org/10.1353/aad.2012.1240, accessed 30 April 2024.

D. Sillman, 1978. “Line 21, closed captioning of television programs — A progress report,” American Annals of the Deaf, volume 123, number 6, pp. 726–729.

U.S. Federal Communications Commission, 2014. “FCC adopts closed captioning quality standards for TV programs,” at https://www.fcc.gov/fcc-adopts-closed-captioning-quality-standards-tv-programs, accessed 30 April 2024.

R. Vanderplank, 2016. “The value of closed captions and teletext subtitles for language learning,” In: R. Vanderplank. Captioned media in foreign language learning and teaching: Subtitles for the deaf and hard-of-hearing as tools for language learning. London: Palgrave Macmillan, pp. 43–73.
doi: https://doi.org/10.1057/978-1-137-50045-8_3, accessed 30 April 2024.

 


Editorial history

Received 230 November 2023; revised 19 February 2024; accepted 30 April 2024.


Creative Commons License
This paper is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Education runs quickly violence runs slowly: An analysis of closed captioning speed and reading level in children’s television franchises
by Edward Schneider.
First Monday, Volume 29, Number 5 - 6 May 2024
https://firstmonday.org/ojs/index.php/fm/article/download/13301/11614
doi: https://dx.doi.org/10.5210/fm.v29i5.13301