First Monday

Backgrounds and behaviors: Which students successfully identify online resources in the face of container collapse by Christopher Cyr, Tara Tobin Cataldo, Brittany Brannon, Amy G. Buhler, Ixchel M. Faniel, Lynn Silipigni Connaway, Joyce Kasman Valenza, Rachael Elrod, and Samuel R. Putnam

In a digital environment, students have difficulty determining whether an information resource comes from a book, magazine, journal, blog, or other container, and lose the contextual information that these containers provide. This study of students from primary through graduate school looks at their ability to identify the containers of information resources, and how this ability is affected by their demographic traits, the resource features they attended to, and their behaviors during a task-based simulation. The results indicate that correct container identification requires deep engagement with a resource. Those who attended to cues such as genre and source were better able to identify container, while those who paid attention to heuristics such as its visual appearance and URL were not. Demographic characteristics, including educational cohort and first-generation student status, also had an effect.


Introduction




Students have long used visual context and cues from print resources to help identify their origins and value. A physical newspaper, for example, looks and feels different from a physical book in both its style and format. In a Web browser, documents from a blog, book, journal, magazine, etc. appear similar. This phenomenon is known as container collapse (Valenza, 2016; Connaway, 2018).

The ability to determine a resource’s container plays a role in students’ ability to judge its quality. Container collapse makes it more difficult for today’s students to identify the resources they retrieve in online search engine result pages, and determine if they are credible and acceptable as citations. What differentiates students who are better at identifying containers from those who struggle? We examine this as part of the Researching Students’ Information Choices: Determining Identity and Judging Credibility in Digital Spaces (RSIC) study, a four-year U.S. Institute of Museum and Library Services (IMLS)-funded research project.

This project looked at 175 students’ point-of-selection behaviors (the moment that they determine a piece of information retrieved from a search meets their research need) as they were beginning work on a science research assignment (Buhler, et al., 2019). We look at the factors that impact each student’s ability to correctly identify the containers of digital resources, focusing on the impact of their educational background, the cues that they attended to when identifying containers, and their behavior during the research session tasks.

We find that correctly identifying resource containers requires deep engagement with the resource and cannot be done as effectively with heuristics. Educational background, both for students and their parents, also makes a difference. Based on our findings, we suggest approaches to helping our students develop their skills in navigating the online information world.



Literature review

In 2015 the Association of College & Research Libraries (ACRL) created the Framework for Information Literacy for Higher Education. This offers a rich, complex set of ideas that could guide educational reform within libraries. The frame Authority Is Constructed and Contextual states that learners should “recognize that authoritative content may be packaged formally or informally and may include sources of all media types” (Association of College & Research Libraries, 2016).

Formats, including containers, result from the work that was done to create information and transmit it to others. This includes how immediately (or not) the information was created, who created it and why, whether or not the information was vetted in any way, and what type of information to expect. Understanding this publication process gives clues about the reliability and credibility of the information, as well as where in the information lifecycle an information resource falls. All of these elements are important when determining whether and how to use the information.

Identifying containers when doing research online, however, is not an easy task. An e-book chapter and an e-journal article on the same publisher platform have few visual and textual cues to distinguish them from one another (Figures 1 and 2). While the different container types may look similar, they convey important context for the information in a resource. Information in a journal article, for example, has generally been peer reviewed, while information in a book has not.


Figure 1: A journal article on the SpringerLink platform.



Figure 2: A book chapter on the SpringerLink platform.


Some assume that today’s students don’t care how their online information is packaged. A few researchers have declared that these younger generations are “format agnostic” (OCLC, 2004; Williams and Rowlands, 2007). Francke and Sundin (2009) argued against this after studying high school students, finding that students use genre to assess the credibility of resources, though as a “rather blunt tool”.

Even if students are not indifferent to the containers of digital information, they do struggle to identify those containers on a computer screen. Buhler and Cataldo (2016) found that students struggled to identify e-books and e-journals when shown screenshots of online resources. Leeder (2016) found similar results: participants misidentified 60 percent of the resources.

Container misidentification has implications for students’ understanding of the publishing history of the information they are using, their evaluation of its credibility, and their ability to correctly reference resources in their schoolwork.

For example, when researching a social issue or event, news articles and social media posts may be useful for gathering immediate and first-hand accounts of what happened, but may not provide a comprehensive view. Journal articles are published later, but often contain more context, analysis, and research about the issue. For an illustration of this progression, see Lisa Campbell’s information timeline of the Women’s March (

These different types of resources also reflect very different knowledge communities, which have varying standards for evidence and argumentation and varying processes for constructing and vetting knowledge. Depending on the task at hand, students will need to understand which resource and container types are appropriate and how they can be used as evidence in a school setting. Students’ ability to identify containers is an important component of their ability to be critical users of information.

The current literature does not directly address what factors impact students’ success at identifying containers. However, some insights can be found by looking at factors that impact students’ overall information literacy (IL) skills. Previous research indicates that the development of students’ information literacy skills is impacted by first-generation status, confidence in their own skills, and their use of heuristics when evaluating information.

First-generation college students

There is a recent body of literature on first-generation college students (i.e., students with no parent or guardian holding a bachelor’s degree) and on how their introduction to and success in higher education differ from those of continuing-generation college students. The current literature takes a more positive tone about their success in higher education than past library and information science (LIS) studies did (Arch and Gilman, 2019; Folk, 2018; Ilett, 2019).

In the past, first-generation students were often viewed as a problem to be fixed because educators felt they were coming in with a knowledge deficit. The focus is now on the students’ perceptions of libraries and library services, and on the need for targeted information literacy training that builds on their strengths (Pickard and Logan, 2013; Arch and Gilman, 2019). But little research has examined how they differ from continuing-generation students in their evaluation and use of online information.

First-generation students describe their search behavior at a general level, for example saying they searched “online” rather than specifying sources (Logan and Pickard, 2012). If they struggle to describe their research process, the nuances behind it, like containers and genres, may also elude them. There are pieces of the evaluation process that they may not have been able to pick up through their parents or in their K-12 education. Most of the LIS literature on first-generation students focuses on library services and spaces geared towards them, but little is known about their behavior when using information resources.

Student confidence in their research skills

One factor that may influence students’ success in finding and evaluating online information is confidence in their ability to do so. Research into this area has found, at best, no relationship between students’ reported confidence and their skill. Jackson (2013) found no difference in self-rated confidence among doctoral students who scored better or worse on a test of research skills. Molteni and Chan (2015) similarly found no correlation between confidence and competence among undergraduate health sciences students.

In contrast, Gross and Latham found that community college students who score below-proficient on tests of information literacy skills tend to greatly overrate their own skills (Gross, 2005; Gross and Latham, 2012, 2009, 2007). They link this phenomenon to the Dunning-Kruger effect (Kruger and Dunning, 1999), which suggests that people with low skills in a certain domain are unable to recognize their own lack of skill or the expertise of others (Gross and Latham, 2012).

Familiarity with the Internet and a lack of IL training might complicate the relationship between confidence and skill. Watson (2001) found that high schoolers were confident using the Web for personal research because they had long-term experience with it, but were less confident when using it for school because they had less experience searching for academic content. Similarly, Gross and Latham (2009) found that students believed members of their generation were better at finding information than those of older generations due to greater familiarity with computers. Given the ubiquity of digital search, this poses a serious problem for students’ ability to accurately assess their own IL skills.

Paterson and Gamtso (2017) found that all but one student in a pre-course survey indicated confidence in their research abilities. However, in their post-course survey, students were able to articulate more specific skills as a basis for their confidence (Paterson and Gamtso, 2017). This suggests that students’ overconfidence can be addressed by IL instruction that helps them better understand the full range of research skills, enabling the students to better identify their own strengths and weaknesses.

Students’ use of heuristics in online searching

Heuristics (aka rules of thumb) may influence students’ success in finding and evaluating online information. These heuristics can differ at different points in the discovery and evaluation process. The literature suggests that students use superficial rules of thumb more frequently when selecting resources from result pages.

Two heuristics that previous research shows are triggered when navigating results pages are the prominence heuristic (exploring only the first few items of a results page) and similarity heuristic (the degree to which a search results snippet matches the seeker’s objective) (Sundar, 2008).

Spievak and Hayes-Bohanan (2016) found that students tend to select results towards the top of the page, suggesting the use of what the authors call the availability heuristic (results at the top of the page are easily available) and the consensus heuristic (“everyone” uses Wikipedia, therefore it is a good choice).

Haas and Unkel (2017) conducted a similar study, and noted the use of result rank as the dominant heuristic associated with a student’s selection of search results. They also note the influence of the reputation of the mediating source (i.e., the Web site that hosts the information) and the neutrality of the primary source (i.e., the originator of that information). However, those heuristics were secondary to the top-rank heuristic, especially when students were making their first selection of what to click on from the result page.

Students appear to use different heuristics when judging the credibility of a specific resource. Metzger and Flanagin (2013) used prior studies to develop a framework of six heuristics used when making credibility judgements about resources:

  1. Reputation – a person’s familiarity with or recognition of the source or author;
  2. Endorsement – using recommendations from others (either known or unknown);
  3. Consistency – cross-checking information across resources;
  4. Self-confirmation – trusting information that confirms their beliefs;
  5. Expectancy violation – deeming an online resource not credible if it fails to meet expectations (e.g., poor visual appearance, interactivity, navigation, etc.); and,
  6. Persuasive intent – distrusting resources perceived as biased (e.g., commercial Web sites, information that includes advertising, etc.).

Flanagin and Metzger (2008) also describe evidence of information seekers using “credibility transfer”, or what Sundar (2008) calls the old-media heuristic, which can occur both between different media (e.g., print to online) and within a medium (e.g., a credible Web site can extend that credibility to a story hosted on the site).

These studies are a small part of the literature that identifies and defines a wide range of heuristics used by information seekers. Despite this interest in heuristics, there is relatively little research on how well these heuristics serve those who use them, and we could not find any research describing how students use heuristics when determining information containers.




Data collection

The data for this paper were collected from 175 students in six educational cohorts: elementary school, middle school, high school, community college, undergraduate, and graduate. Each student completed an initial demographic survey, a facilitated task-based session using a simulated search engine environment, and interview questions.

Students conducted a search for information in a simulated Google search environment. This environment was designed to mimic the look and feel of an actual Google search; students were informed that it was a simulation only after they had completed the session.

Students were presented with a research query prompt appropriate to their educational cohort and instructed to perform a Google search for the topic. In addition to seeing the Google result pages, students could click into each online resource and explore it. Students then completed five tasks within the simulated result pages that their search returned, which asked them to determine the resources’ helpfulness, citability, credibility, and container identity. A short video demonstration of the simulation session featuring all the tasks can be viewed at:

The simulation had a few key differences from a real-life search environment. First, students received a controlled set of results that was the same for each cohort, regardless of the search term that they used. Second, the resource links were not live, but instead linked to a saved version of the resource from when the simulation was put together. Third, the simulation result pages contained buttons for students to select their choice in each task that made up part of the simulation (rating the resource’s helpfulness, citability, credibility, and selecting its container identity).

A think-aloud interview protocol was used to record the way that students described their search behavior and decision-making. A facilitator asked students to describe their inner thoughts as they explored the simulation and completed each task.

This study focuses on the container identity task (Figure 3) which displayed a controlled subset of resources from the result pages and asked students to identify which of eight possible containers best described each resource. For this task, students within each cohort evaluated the same set of resources.


Figure 3: The container task with drag-and-drop container identification.


Correct container identification

We were interested in the predictors of the percentage of containers correctly identified, with a focus on aspects of the student’s background, common cues that they mentioned during their think-aloud protocol, and their behaviors that were captured by the simulation software. Each student’s choices were compared with the container labels that an advisory panel of experts gave each resource to determine whether the student “correctly” labeled the resource. This panel included librarians and instructors who were STEM subject matter experts. We then added up the correct labels to measure the total number of resources that each student labeled correctly.

The number of resources that students had to label varied by cohort, with elementary schoolers labeling eight resources, middle schoolers labeling 15 resources, high schoolers labeling 20 resources, and higher education students labeling 21 resources. We did this because students in younger cohorts generally work with fewer resources when conducting research and we needed to ensure the cognitive load was appropriate for each cohort. To account for this difference in the number of resources, we calculated correct container labeling as a percentage of the total container labels attempted. For example, an elementary schooler who labeled six out of eight resources correctly would receive a score of 75 percent, while an undergraduate who correctly labeled 17 out of 21 resources would receive a score of 81 percent.
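The scoring rule above can be sketched in a few lines. This is a minimal illustration, assuming each student’s answers and the expert panel’s labels are available as parallel lists in resource order; the function name `percent_correct` and the sample labels are hypothetical, not the study’s code or data.

```python
def percent_correct(student_labels: list[str], expert_labels: list[str]) -> float:
    """Percentage of resources whose student label matches the expert label."""
    if len(student_labels) != len(expert_labels):
        raise ValueError("need one student label per expert-labeled resource")
    if not expert_labels:
        raise ValueError("no resources to score")
    # Count resources where the student's container matches the panel's label.
    correct = sum(s == e for s, e in zip(student_labels, expert_labels))
    return 100 * correct / len(expert_labels)


# The two worked examples from the text (the label strings are placeholders).
elementary = percent_correct(["book"] * 6 + ["blog"] * 2, ["book"] * 8)
undergrad = percent_correct(["journal"] * 17 + ["news"] * 4, ["journal"] * 21)
print(elementary, round(undergrad))  # 75.0 81
```

Dividing by the number of resources labeled is what lets an eight-resource elementary score and a 21-resource undergraduate score sit on the same 0 to 100 scale.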

The percentage of containers that students correctly identified is summarized in Figure 4, which shows the number of students that fell into each 10-percentage-point range. On average, students identified 51 percent of containers correctly. No student correctly identified every container; the highest score was 87 percent, achieved by a middle schooler. One elementary school student did not identify any containers correctly. The majority identified between 40 percent and 70 percent correctly.
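Grouping scores into 10-percentage-point ranges, as in Figure 4, can be sketched as below. The bin convention (lower bound inclusive, with a perfect score placed in the top 90-100 range) is an assumption, since the figure’s exact edges are not stated; the function name and sample scores are hypothetical.

```python
from collections import Counter


def bin_scores(scores: list[float], width: int = 10) -> dict[str, int]:
    """Count scores (0-100) per `width`-point range; 100 goes in the top bin."""
    counts = Counter()
    for s in scores:
        if not 0 <= s <= 100:
            raise ValueError(f"score out of range: {s}")
        # Lower edge of the bin; clamp 100 into the last bin instead of a new one.
        lower = min(int(s // width) * width, 100 - width)
        counts[lower] += 1
    return {f"{lo}-{lo + width}": counts[lo] for lo in range(0, 100, width)}


result = bin_scores([0, 75, 81, 86.67, 100])
print(result)
```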


Figure 4: Number of students within each range of percentage correct.


The average percentage correct, broken down by cohort, is shown in Figure 5. The results show an upward trend, with students in higher cohorts, on average, correctly identifying a higher percentage of containers. The only exception was community college: community college students, on average, scored around three percentage points lower than high school students did.


Figure 5: Average percentage correct by cohort.



The demographics of interest in this study were educational cohort, whether the student had a parent with a bachelor’s degree, and level of confidence searching for information online. These are important indicators of students’ educational backgrounds and attitudes towards searching. We also had strong reason, including the prior research on first-generation status and confidence reviewed above, to believe they would affect container labeling ability.

We measured educational cohort based on the student’s year of schooling: students in fourth and fifth grade were considered elementary school, those in sixth through eighth grade middle school, and those in ninth through twelfth grade high school. The other cohorts followed their labels; for example, graduate students constituted the graduate cohort. Each cohort ranged from 26 to 30 students.
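The grade-to-cohort rule can be written as a small lookup. This sketch also assigns each cohort an ordinal level from 1 to 6, which is an assumption about how cohort was coded for the regression models; the names `COHORTS`, `COHORT_LEVEL`, and `k12_cohort` are hypothetical.

```python
# Cohorts in order; coding them 1-6 as an ordinal predictor is an assumption.
COHORTS = (
    "elementary school",
    "middle school",
    "high school",
    "community college",
    "undergraduate",
    "graduate",
)
COHORT_LEVEL = {name: level for level, name in enumerate(COHORTS, start=1)}


def k12_cohort(grade: int) -> str:
    """Map a K-12 grade (4-12) to the study's cohort label."""
    if 4 <= grade <= 5:
        return "elementary school"
    if 6 <= grade <= 8:
        return "middle school"
    if 9 <= grade <= 12:
        return "high school"
    raise ValueError(f"grade {grade} is not in the study's K-12 range (4-12)")


print(k12_cohort(7), COHORT_LEVEL[k12_cohort(7)])  # middle school 2
```

Higher-education students need no grade rule, since their cohort is simply their institution type or degree level.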

Students were asked “How confident do you feel in selecting online information for research projects?”, and responses were coded on a five-point scale. For example, those who said that they were “extremely” confident were given a score of five, while those who said they were “reasonably” confident were given a score of three.

On average, students’ confidence was rated 3.9 out of five. The skew of the distribution is telling: only five of the 175 students were rated below three, and almost half were rated a four. This suggests that, overall, students were relatively confident in their ability to find information online.


Figure 6: Number of students at each confidence level.


Finally, we were interested in the impact that being a first-generation college student, or potential first-generation student (for K-12 students), had on container identification. We measured this by asking each student if they have a parent or guardian with a bachelor’s degree. Fifty-eight of the 175 students (33 percent) were either first-generation college students, or potential first-generation college students (for K-12 students). The other 117 students (67 percent) had a parent with a bachelor’s degree.


During the simulation, facilitators used a think-aloud protocol by asking students to say out loud what they were thinking as they completed the tasks. We then coded transcripts from these sessions, using a codebook designed to capture common cues that students mentioned, judgements that they made, and descriptions of their normal search behavior. We used NVivo software to analyze the data alongside the quantitative data captured by the screening survey and simulation software. The complete codebook can be found at:

We looked at the cues that students attended to during the container task (the elements or features of each resource that participants mentioned when evaluating it) to see how they affected students’ ability to identify containers. We were particularly interested in whether students mentioned the genre, source, aboutness, URL, visual appearance, and Google snippet of each resource.

Genre refers to one of several classes of similar documents, for example calling something an article, book chapter, or fact sheet. Source refers to the host or publisher of the information resource. Aboutness refers to the conceptual focus or meaning of the information contained within the resource. The URL refers to any mention of all or part of the URL. The visual appearance refers to any mention of how the resource looks or is formatted visually. The Google snippet refers to the overview on the Google results page.

For this study, we measured the cues that the student attended to during the container task dichotomously, meaning we looked at whether they did or did not mention the cue. For each cue, any student who did mention it received a score of one, while those who did not mention it received a score of zero.
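Dichotomous coding of this kind, and the cue frequencies derived from it, can be sketched as follows. This assumes each student’s coded container-task transcript has been reduced to a set of cue names; the coding itself was done in NVivo, and the function names and sample students here are hypothetical.

```python
# The six cues examined in the container task.
CUES = ("genre", "source", "aboutness", "URL", "visual appearance", "Google snippet")


def code_cues(mentioned: set[str]) -> dict[str, int]:
    """Dichotomous coding: 1 if the student mentioned the cue, else 0."""
    unknown = mentioned - set(CUES)
    if unknown:
        raise ValueError(f"unrecognized cue(s): {sorted(unknown)}")
    return {cue: int(cue in mentioned) for cue in CUES}


def percent_mentioning(coded: list[dict[str, int]], cue: str) -> float:
    """Share of students (in percent) whose indicator for `cue` is 1."""
    return 100 * sum(row[cue] for row in coded) / len(coded)


# Hypothetical students, not study data.
students = [code_cues({"genre", "URL"}), code_cues({"genre"}), code_cues(set())]
print(round(percent_mentioning(students, "genre"), 1))  # 66.7
```

Because each indicator is 0 or 1, its mean across students is exactly the share of students who mentioned the cue, which is the quantity plotted per cue.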

The frequencies for each cue are summarized in Figure 7. The genre cue was most common, with 84 percent of students mentioning it. The source cue was a close second, with 68 percent of students mentioning it. The aboutness (“this is a good resource because it is about ...”) and URL cues also were common, with more than half of the students mentioning each of these during the container task. The visual appearance (39 percent) and Google snippet (10 percent) cues were less common.


Figure 7: Number of students who attended to each cue during container task.



Behaviors refer to things that the students did during the simulation. The simulation software captured task choices, as well as when students clicked into resources. We calculated the amount of time that they spent on each task from the session recordings.

We looked at the time that the students spent on the total simulation. We chose this rather than the amount of time on the container task, because we think this offers a better indicator of the amount of time that they spent exploring the resources. Since the container task was the fifth task, students used their knowledge of the resources from previous tasks when making judgments for the container task. Students spent a mean of 68 minutes and median of 65 minutes on the entire simulation, with the distribution shown in Figure 8.


Figure 8: Number of students within each range of simulation duration.


For reference, the average amount of time that each cohort spent on the container task is provided in Figure 9. Students in higher cohorts, despite having to label more containers, on average spent less time on the container task. This could be a result of greater familiarity with some of the sources prior to the simulation or of having spent more time working with the resources in prior simulation tasks.


Figure 9: Average time on container task by cohort.


We looked at the number of different container types (out of eight possible) that participants used during the container task. Since each container label should have been used at least once, this offers a rough indication of their knowledge of a variety of container types. The choices were Web site, news, book, journal, magazine, blog, preprint article, and conference proceeding. Figure 10 summarizes how many different container labels students used during the container task. Students used a mean of 6.2 container labels and a median of six container labels.
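Counting the distinct labels a student used is a set-size calculation. A minimal sketch, assuming each student’s answers are available as a list drawn from the eight labels listed above; the function name and sample answers are hypothetical.

```python
# The eight container labels offered in the drag-and-drop task.
CONTAINER_LABELS = frozenset({
    "Web site", "news", "book", "journal", "magazine",
    "blog", "preprint article", "conference proceeding",
})


def distinct_containers_used(labels: list[str]) -> int:
    """Number of different container labels a student assigned (0-8)."""
    unknown = set(labels) - CONTAINER_LABELS
    if unknown:
        raise ValueError(f"not a container label: {sorted(unknown)}")
    return len(set(labels))


# Hypothetical elementary-school answers for eight resources.
answers = ["Web site", "Web site", "book", "news", "Web site", "magazine", "book", "blog"]
print(distinct_containers_used(answers))  # 5
```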


Figure 10: Number of container types that students used during task.


We looked at the number of resources that participants clicked on during the container task. This offers an indication of how closely students explored the resources, rather than making judgements based on heuristics gathered from the result pages. Figure 11 shows the number of students that fell into the different resource-click ranges. On average, students clicked on 10.42 resources. Since the number of resources differed for K-12 and higher education cohorts, it is not possible to draw general conclusions from this overall average.


Figure 11: Number of students within each range of resource clicks during container task.


The average number of resource clicks in each cohort is shown in Figure 12. There is an overall upward trend, with students in higher cohorts tending to click on more resources. The relationship is not perfectly linear, though. High school students, on average, clicked on more resources than any other cohort.


Figure 12: Average number of resource clicks by cohort.


Statistical model

To test the impact of each of our demographic, cue, and behavior variables on the percent of containers correctly identified, we used ordinary least squares (OLS) regression. This model calculates the impact that a one-unit change in each variable has on the predicted percent of containers that each participant is expected to identify, as well as the statistical significance of the relationship.

OLS allows for controlled comparisons that show the relationship each variable has with the outcome when the other variables included in the model are held constant.
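A minimal sketch of an OLS fit, assuming numpy is available and the predictors have already been coded numerically (cohort as an ordinal level, cues as 0/1 indicators, behaviors as counts or minutes). The function `ols` and the toy data are illustrative assumptions, not the study’s code or data.

```python
import numpy as np


def ols(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Fit y = b0 + X @ b by ordinary least squares.

    Returns (coefficients, standard_errors). Index 0 is the intercept; index
    j >= 1 is the predicted change in y for a one-unit increase in column
    j - 1 of X, holding the other columns constant.
    """
    n = X.shape[0]
    design = np.column_stack([np.ones(n), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    dof = n - design.shape[1]  # residual degrees of freedom
    sigma2 = residuals @ residuals / dof  # estimated error variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # classical OLS covariance
    return beta, np.sqrt(np.diag(cov))


# Toy illustration only (not study data): percent correct built from a
# hypothetical cohort level (1-6) and a 0/1 cue indicator, with no noise.
rng = np.random.default_rng(0)
cohort = rng.integers(1, 7, size=40).astype(float)
genre = rng.integers(0, 2, size=40).astype(float)
y = 20.0 + 5.0 * cohort + 6.0 * genre
beta, se = ols(np.column_stack([cohort, genre]), y)
print(np.round(beta, 6))  # approximately [20, 5, 6]
```

Running four models as described amounts to calling `ols` on different subsets of columns: demographics only, cues only, behaviors only, and then all of them together.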

When a statistical model includes multiple variables, the estimated relationships can be sensitive to the specific set of variables included, and this becomes more likely as more variables are added. Because of this, we ran four different statistical models.




The regression results are summarized in Table 1. For each variable, the table reports the coefficient, which represents the predicted impact of a one-unit increase in the variable on the percentage of containers correctly identified. In parentheses below each coefficient is the standard error, used to calculate statistical significance. The level of significance is denoted with asterisks, and we have used boldface to indicate relationships that are significant at the 90 percent confidence level or higher (p<0.10).

Model 1 shows just demographic variables, Model 2 shows just cue variables, Model 3 shows just behavior variables, and Model 4 shows all variables.
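Table 1’s note maps p-values to asterisks. A minimal sketch of that mapping, assuming a two-sided test of each coefficient against zero and using a normal approximation to the p-value (an exact OLS test would use a t distribution with the model’s residual degrees of freedom, which is very close to normal at this sample size); the function name and the coefficient/standard-error pairs are hypothetical.

```python
import math


def significance_stars(coef: float, se: float) -> str:
    """Stars per Table 1's note, from a two-sided normal-approximation p-value."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = coef / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value for a standard normal
    if p < 0.01:
        return "***"
    if p < 0.05:
        return "**"
    if p < 0.10:
        return "*"
    return ""


# Hypothetical coefficient/standard-error pairs.
for coef, se in [(1.0, 1.0), (1.7, 1.0), (2.0, 1.0), (3.0, 1.0)]:
    print(coef / se, repr(significance_stars(coef, se)))
```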


Table 1: Predictors of percent of containers correctly identified.
Note: *p<0.10 **p<0.05 ***p<0.01
Variable                     Model 1    Model 2    Model 3    Model 4
Parent w/degree              6.04***
Genre                                   14.51***
Source                                  2.27
Aboutness                               -4.96**
Visual appearance                       -1.46
URL                                     -0.10
Google result snippet                   -13.33***
Total simulation duration                          -0.10**
Different containers                               3.87***
Resource clicks                                    1.00***



Models 1 and 4 show the results for demographics. All three of the demographic variables included (cohort, confidence searching, and parent with bachelor’s degree) are significant, positive predictors of a student’s ability to identify containers in both models.

Holding all other variables constant, students are expected to correctly identify approximately five percent more resources with each step up in educational cohort, representing an effect size of more than 25 percent across the full range of students from elementary to graduate school. The predicted percentage of containers that one is expected to correctly identify at each cohort level, based on Model 1, is shown in Figure 13. In this case, the percentages are calculated with confidence searching and parent with a bachelor’s degree held constant at their means.
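The predicted values in Figures 13 and 14 hold the non-focal variables at their means. A sketch of that calculation follows. The intercept and coefficients are hypothetical rounded stand-ins (not Model 1’s estimates), while the means of 3.9 for confidence and 0.67 for parent with a bachelor’s degree come from the descriptive statistics reported above; the function name is hypothetical.

```python
def predict_at_means(
    intercept: float,
    coefs: dict[str, float],
    means: dict[str, float],
    focal: str,
    focal_value: float,
) -> float:
    """Predicted outcome with `focal` set to `focal_value` and every other
    predictor held at its mean."""
    prediction = intercept
    for name, b in coefs.items():
        x = focal_value if name == focal else means[name]
        prediction += b * x
    return prediction


# Hypothetical intercept and coefficients, for illustration only.
coefs = {"cohort": 5.0, "confidence": 3.0, "parent_degree": 6.0}
means = {"confidence": 3.9, "parent_degree": 0.67}
for level in range(1, 7):
    print(level, round(predict_at_means(10.0, coefs, means, "cohort", level), 2))
```

Because the model is linear, moving the focal variable up one unit always shifts the prediction by exactly its coefficient, whatever values the other variables are held at.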


Figure 13: Predicted percent correct by cohort (Model 1).


Those who feel more confident finding information on the Internet also are better able to identify containers. Each additional point on a scale from one to five increases the percent of containers that one is predicted to correctly identify by just over three percent, representing a 14 percent increase for the full range of the variable. The predicted percent correct for each confidence level is shown in Figure 14, based on Model 1 with the other variables held constant at their means.


Figure 14: Predicted percent correct by confidence level (Model 1).


In addition, those who have a parent with a bachelor’s degree are expected to correctly identify around five to six percent more containers, depending on the statistical model. Overall, these models show that individuals’ demographic backgrounds affect their digital behavior, and we found no substitute for additional years of education in improving individuals’ digital literacy.


Of the cues examined, genre and source proved the most useful for identifying container type. Those who paid attention to genre identified over 14 percent more containers correctly when all other cue variables are held constant, according to Model 2. That drops to around six percent when controls for demographics and behaviors are added in Model 4, but remains statistically significant. Source was not significant in Model 2, but was significant in Model 4. In Model 4, those who paid attention to source identified about five percent more containers correctly.

Attending to aboutness appears to decrease one’s ability to identify containers. This cue was significant in both Model 2 and Model 4; in both cases, those who attended to aboutness identified about five percent fewer containers correctly.

Visual appearance and URL did not significantly affect the percentage of containers that students correctly identified, despite being two common cues that students used to identify containers. The Google snippet cue was less common; its effect was negative and significant in Model 2 but not significant in Model 4.


The amount of time students spent on the entire simulation had a slightly negative impact on the percentage of containers that they correctly identified. Those who spent more time on the simulation did slightly worse in the container task. While the impact was statistically significant, the magnitude of the impact (around 0.1 percent fewer correct containers for each additional minute) was small enough that it is likely not substantively meaningful.

The number of different container labels that a student used during the container task was a statistically significant predictor of correct container identification in Model 3, but not in Model 4. Figure 15 illustrates the relationship in Model 3, showing the predicted percent of containers correctly identified across the range of the number of different container labels used, with the other variables held at their means.
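The “held at their means” procedure behind predicted-value plots such as Figure 15 can be sketched in a few lines. This is a minimal illustration only: it assumes an ordinary linear model, and the intercept, coefficients, variable names, and data below are hypothetical placeholders rather than the study’s estimates.

```python
from statistics import mean


def predicted_percent(intercept, coefs, data, focal, values):
    """Predicted outcome at each value of `focal`, with every other predictor
    held at its sample mean. Assumes a linear model: intercept + sum(b * x)."""
    means = {name: mean(column) for name, column in data.items()}
    predictions = []
    for value in values:
        row = dict(means)      # start from the mean profile
        row[focal] = value     # vary only the focal predictor
        predictions.append(intercept + sum(b * row[name] for name, b in coefs.items()))
    return predictions


# Hypothetical coefficients and observations, for illustration only.
coefs = {"labels_used": 3.0, "minutes": -0.1}
data = {"labels_used": [5, 6, 7, 8], "minutes": [40, 50, 60, 70]}

# Predicted percent correct for 1 through 8 labels, with minutes held at its mean (55).
preds = predicted_percent(20.0, coefs, data, "labels_used", range(1, 9))
```

Because every non-focal predictor is fixed at its mean, the predictions vary only along the focal variable, so in a linear model they trace a straight line whose slope equals that variable’s coefficient.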

This relationship was likely driven by students in higher cohorts using a larger variety of container labels. For example, an elementary schooler would be less likely than a graduate student to be familiar with the term “preprint”. Because cohort is related both to the number of labels used and to correct identification, the relationship appears statistically significant until cohort is controlled for.
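The explanation above describes a confounding pattern, in which a third variable (cohort) drives both the predictor and the outcome. The simulation below is a minimal sketch of how such a pattern can make a predictor look significant until the confounder is controlled for. It uses NumPy’s `np.linalg.lstsq` for ordinary least squares; the data, variable names, and effect sizes are invented for illustration and are not the study’s data or models.

```python
import numpy as np

# Simulated, hypothetical data: cohort raises both labels used and accuracy,
# but labels used has no direct effect on accuracy.
rng = np.random.default_rng(0)
n = 5000
cohort = rng.integers(0, 5, size=n)                     # cohort codes 0-4
labels = 4 + cohort + rng.normal(0, 1, size=n)          # more labels in higher cohorts
accuracy = 30 + 5 * cohort + rng.normal(0, 5, size=n)   # percent correct


def ols(y, *predictors):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    X = np.column_stack([np.ones_like(y), *predictors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


slope_without_cohort = ols(accuracy, labels)[1]          # absorbs cohort's effect
slope_with_cohort = ols(accuracy, labels, cohort)[1]     # close to zero
print(f"labels slope, no cohort control: {slope_without_cohort:.2f}")
print(f"labels slope, cohort controlled: {slope_with_cohort:.2f}")
```

With these settings the first slope is clearly positive (its expected value is about 3.3) while the second is near zero, which mirrors, in simplified form, the shift from Model 3 to Model 4 described above.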


Figure 15: Predicted percent correct by number of container labels used (Model 3).


Finally, the number of resource clicks in the container task had a statistically significant relationship with the percentage of containers correctly identified in both Model 3 and Model 4. Figure 16 shows the substantive impact based on Model 3, with the other variables held constant at their means.

Over the entire range of the variable, this represents an increase of 22 percent. By definition, this variable is correlated with cohort, because elementary schoolers saw only eight resources while graduate students saw 22. Nonetheless, the finding does not appear to be sensitive to controlling for cohort in Model 4.
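If the models are treated as linear (an assumption made for this sketch), the change across a variable’s full range is simply its slope multiplied by the width of that range. The function below illustrates that arithmetic; the slope and range used for clicks are hypothetical and are not the values behind the 22 percent figure.

```python
def change_over_range(slope, low, high):
    """Predicted change in the outcome as one predictor moves from `low` to `high`.
    Assumes a linear model, so the slope is the same everywhere in the range."""
    return slope * (high - low)


# Hypothetical slope and range (not the study's estimates): 0.4 percent per click,
# with clicks running from 5 to 40.
clicks_change = change_over_range(0.4, 5, 40)

# The roughly 0.1 percent-per-minute time effect reported above, applied to a
# hypothetical 30-minute difference in time spent on the simulation.
minutes_change = change_over_range(-0.1, 0, 30)
```

Here `clicks_change` is about 14 and `minutes_change` is about -3, which shows how a per-minute effect can be statistically significant yet small in practical terms.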


Figure 16: Predicted percent correct by number of resource clicks (Model 3).





We asked our participants if they thought it was important to know the container of online information resources. The answer was a resounding yes, with 86 percent (151 out of 175) of the students indicating it was important.

Like the students in the Francke and Sundin (2009) study, many higher education students see container as an aid in credibility assessments. One undergraduate participant stated, “Yes. That’ll help you decide how credible it is.” Others referred to containers’ importance in creating citations, as this undergraduate student did: “for the sake of citing, you have to know what you’re citing from”.

Interestingly, younger students were more likely to say that container is not important. The younger students who did not care about the container were more focused on their own conception of information quality.

This might indicate a shift among students who have always done research in a digital world. However, it might also indicate that students with more fully developed research and information literacy skills better understand the value of containers. Further research, particularly longitudinal studies with these younger students, is needed to determine whether this reflects shifting generational attitudes or sustained differences associated with educational level.

Overall, our results indicate that students have some understanding of containers. They realize that, even though all of these resources were found on the Internet, they were not just Web sites. However, the fact that students identified only 51 percent of containers correctly, on average, suggests that container collapse is a real and serious phenomenon. No one, even at the graduate level, identified every container correctly, and some students were incorrect on almost all of their container judgements.

Our results also suggest that students do not fully understand the entire range of containers that exist. Only 16 percent of students used all eight labels presented to them, but the majority used around six or seven different labels. In particular, many students were unfamiliar with preprint as a container, and often asked facilitators what it was during the container task.

All of our post-secondary students were STEM majors. Although preprints are much more common in STEM disciplines than in the humanities or social sciences, students still did not recognize and apply this label consistently. This is especially important during times such as the COVID-19 pandemic, when much scientific research is being published in preprint form so that it can be shared quickly (Kubota, 2020). While this has allowed researchers to adapt to a rapidly changing situation, it can also create problems if even students with training in STEM disciplines struggle to use preprints appropriately (Flynn, 2020).


In our study, students who were more confident generally identified more containers correctly. This contrasts with previous studies that found a discrepancy between how students rated themselves and how they actually performed (Gross, 2005; Gross and Latham, 2012, 2009, 2007). These studies have suggested that the Dunning-Kruger effect may partially explain why students with lower IL skills report higher levels of self-confidence (Gross and Latham, 2012). Our findings do not support that hypothesis. This could be due to differences in the skills being evaluated, in how data were collected, in how the researchers related to the participants, or something else entirely. However, it suggests that confidence in online searching ability is nuanced, and that its impact differs depending on which specific aspect of information literacy is being investigated. Further research is needed to better understand the relationship between students’ self-efficacy and their actual IL skills.

The findings on student demographics highlight one of the challenges that first-generation college students face. Even when they felt confident searching for information and had a similar level of educational experience as their peers, they did not identify as many containers correctly. This underscores the importance of information literacy instruction that takes into account what we have learned about first-generation college students. Many libraries have already begun implementing services for this demographic, and our findings provide additional empirical support for these efforts. These efforts could include reimagining traditional techniques in which physical items were used in the classroom to demonstrate the difference between journals and magazines, or explicitly highlighting containers in instruction on the information life cycle.

Cues and behaviors

Our findings on cues expand the literature on heuristics to include an understanding of how students use common heuristics to identify containers and how well these heuristics work in doing so. We found that, in general, students don’t benefit from superficial cues.

The use of cues such as URL and Google snippet, which appear on the result pages, may indicate that students are using the similarity heuristic (the degree to which a result matches the seeker’s objective) even though this type of relevance judgment has little to do with container judgments. This heuristic has been observed when students are selecting results from result pages (Sundar, 2008), so our results may indicate that students who use these cues to determine container are relying on the same heuristics throughout the information evaluation process instead of being able to utilize different heuristics for different purposes.

On the other hand, some of the heuristics that students use to make credibility decisions may prove useful for identifying containers. We found that paying attention to genre and source — which is related to use of the reputation heuristic observed during students’ credibility assessments (Metzger and Flanagin, 2013) — helped students identify containers.

Both cues help students think about where the information comes from and how it got to them. This provides leverage for both credibility and container judgments, which require students to move beyond what a resource is about.

Since the conceptual focus of a resource does not necessarily indicate its container, students who focus on aboutness when asked to identify the container might get bogged down in the information inside the resource rather than identifying features that might indicate the container. It might also mean that they incorrectly believe that certain types of information appear only in certain types of containers, which is not the case.

Instead, students must make decisions based on features of a resource that may not be directly related to their information topic, focusing instead on features of the resource that speak to how it was produced. Investigating these features requires students to engage more deeply with the resource.

This is further supported by our findings regarding behaviors. Of the behaviors in this study, resource clicks appear to be the most meaningful. Clicks most directly deal with resource engagement, supporting our finding that cues which require deeper engagement with a resource help students identify containers more than superficial ones.




To our knowledge, container and genre are not regularly included in information literacy instruction programs. Our findings suggest that they should be. Librarians can easily add these concepts into their existing instruction. Librarians who are already conducting instruction using a discovery system or specialized database can include container and genre identification skills. Perry (2014) discusses adjusting instruction sessions to use a discovery tool and its limiters to draw students’ attention to the containers of information.

Another ideal setting for incorporating container identification is instruction in the use of citation management tools. Most citation management programs attempt to identify the container for the user, and sometimes get it wrong. Students should not be discouraged that container identification is challenging when specialized programs designed by experts also make mistakes.

Our findings suggest the need for a scaffolded approach to container identification that begins in elementary grades and continues through graduate school. Such an approach would not only help younger students improve their ability to identify containers, it would likely also help reduce the gap between first-generation and non-first-generation students when they enter college. More consistent instruction in how information is produced and disseminated, how to distinguish different digital information containers, and why these issues are important would provide all students with the knowledge and skills that they need when matriculating to a college or university.

These skills are also a valuable element of lifelong information literacy for those who are not interested in pursuing post-secondary education. Information containers are closely linked to issues of credibility and information quality, which are important for everyone navigating the increasingly cluttered digital information environment.

To support the scaffolding of information literacy, instructors should also teach these skills in the open systems, like Google, that students more commonly frequent for searching. While many of the proprietary systems that college and university libraries use have some form of built-in container identification, even if it’s not always accurate, open search engines do not. Social media systems, where many students get their everyday life information, are potentially even more decontextualized. Students will have to rely on their ability to use the cues available to them to make informed decisions about the resources they encounter. Librarians can help students explore online information resources and understand the cues to help them correctly identify and interpret the significance of resource containers.

The genre cue, in particular, can be useful in this regard. Genre, like container, is an element of form that results from the publication process, and thus is closely linked with it. We know that novice learners “are inclined to seek out characteristics of information products that indicate the underlying creation process” (Association of College & Research Libraries, 2016). Since students who focus on genre generally identify more containers correctly, librarians can teach students to use this cue to successfully identify containers.

Beyond the use of a single cue, our findings suggest that deeper engagement with information resources better enables students to identify containers. However, deep engagement has behavioral and attitudinal components that go beyond skill-based instruction. It requires students not only to understand how to critically evaluate resources and meta-content, but also to be motivated enough to engage in this evaluation consistently. This underscores the importance of information literacy educators teaching students to click into resources and to use lateral reading techniques to evaluate sources (Wineburg and McGrew, 2019). One way to make this realistic for students is to explicitly help them separate and sequence their judgements about whether a source is helpful and whether it is credible.

Despite this added complication, it is perhaps the most important thing for students to learn. Deeply engaging with and evaluating information resources sets students up for success not only in identifying information containers, but also in navigating an information environment that obscures such containers in the first place.


About the authors

All of the authors on this paper are part of the Researching Students’ Information Choices project, a collaboration between the University of Florida, Rutgers, and OCLC. This four-year, grant-funded project looks at how STEM students identify and judge the credibility of online resources. Dissemination of this research is ongoing, with results published in outlets such as Issues in Science & Technology Librarianship, and IFLA and ACRL conference proceedings.

Direct comments to Christopher Cyr, OCLC Research, cyrc [at] oclc [dot] org


This project was made possible in part by the U.S. Institute of Museum and Library Services, grant number LG-81-15-0155.


References
Xan Arch and Isaac Gilman, 2019. “First principles: Designing services for first-generation students,” College & Research Libraries, volume 80, number 7, pp. 996–1,012.
doi:, accessed 16 June 2020.

Association of College & Research Libraries (ACRL), 2016. “Framework for information literacy for higher education” (11 January), at, accessed 15 May 2020.

Amy Buhler and Tara Cataldo, 2016. “Identifying e-resources: An exploratory study of university students,” Library Resources & Technical Services, volume 60, number 1, pp. 23–37.
doi:, accessed 16 June 2020.

Amy G. Buhler, Ixchel M. Faniel, Brittany Brannon, Christopher Cyr, Tara Tobin Cataldo, Lynn Silipigni Connaway, Joyce Kasman Valenza, Rachael Elrod, Randy A. Graff, Samuel R. Putnam, Erin M. Hood, and Kailey Langer, 2019. “Container collapse and the information remix: Students’ evaluations of scientific research recast in scholarly vs. popular sources,” In: Dawn M. Mueller (editor). Proceedings of the Association of College and Research Libraries, pp. 654–667, and at, accessed 3 June 2020.

Lynn Silipigni Connaway, 2018. “What is ‘container collapse’ and why should librarians and teachers care?” OCLC Next (20 June), at, accessed 14 April 2020.

Andrew J. Flanagin and Miriam J. Metzger, 2008. “Digital media and youth: Unparalleled opportunity and unprecedented responsibility,” In: Miriam J. Metzger and Andrew J. Flanagin (editors). Digital media, youth, and credibility. Cambridge, Mass.: MIT Press, pp. 5–27, and at, accessed 22 June 2016.

Kathleen Hutchinson Flynn, 2020. “Citation analysis of mathematics and statistics dissertations and theses from the University at Albany,” Science & Technology Libraries, volume 39, number 2, pp. 142–154.
doi:, accessed 7 August 2020.

Amanda L. Folk, 2018. “Drawing on students’ funds of knowledge,” Journal of Information Literacy, volume 12, number 2, pp. 44–59.
doi:, accessed 16 June 2020.

Helena Francke and Olof Sundin, 2009. “Format agnostics or format believers? How students in high school use genre to assess credibility,” Proceedings of the American Society for Information Science and Technology, volume 46, number 1, pp. 1–7.
doi:, accessed 16 June 2020.

Melissa Gross, 2005. “The impact of low-level skills on information-seeking behavior: Implications of competency theory for research and practice,” Reference & User Services Quarterly, volume 45, number 2, pp. 155–162.

Melissa Gross and Don Latham, 2012. “What’s skill got to do with it? Information literacy skills and self-views of ability among first-year college students,” Journal of the American Society for Information Science & Technology, volume 63, number 3, pp. 574–583.
doi:, accessed 16 June 2020.

Melissa Gross and Don Latham, 2009. “Undergraduate perceptions of information literacy: Defining, attaining, and self-assessing skills,” College & Research Libraries, volume 70, number 4, pp. 336–350.
doi:, accessed 16 June 2020.

Melissa Gross and Don Latham, 2007. “Attaining information literacy: An investigation of the relationship between skill level, self-estimates of skill, and library anxiety,” Library & Information Science Research, volume 29, number 3, pp. 332–353.
doi:, accessed 16 June 2020.

Alexander Haas and Julian Unkel, 2017. “Ranking versus reputation: Perception and effects of search result credibility,” Behaviour & Information Technology, volume 36, number 12, pp. 1,285–1,298.
doi:, accessed 16 June 2020.

Darren Ilett, 2019. “First-generation students’ information literacy in everyday contexts,” Journal of Information Literacy, volume 13, number 2, pp. 73–91.
doi:, accessed 16 June 2020.

Cathie Jackson, 2013. “Confidence as an indicator of research students’ abilities in information literacy: A mismatch,” Journal of Information Literacy, volume 7, number 2, pp. 149–152.
doi:, accessed 16 June 2020.

Justin Kruger and David Dunning, 1999. “Unskilled and unaware of it: How difficulties in recognizing one’s own incompetence lead to inflated self-assessments,” Journal of Personality and Social Psychology, volume 77, number 6, pp. 1,121–1,134.
doi:, accessed 16 June 2020.

Taylor Kubota, 2020. “Stanford researchers discuss the benefits — and perils — of science without peer review,” Stanford News (6 April), at, accessed 8 June 2020.

Chris Leeder, 2016. “Student misidentification of online genres,” Library & Information Science Research, volume 38, number 2, pp. 125–132.
doi:, accessed 16 June 2020.

Firouzeh Logan and Elisabeth Pickard, 2012. “First-generation college students: A sketch of their research process,” In: Lynda M. Duke and Andrew D. Asher (editors). College libraries and student culture: What we now know. Chicago: American Library Association, pp. 109–125.

Miriam J. Metzger and Andrew J. Flanagin, 2013. “Credibility and trust of information in online environments: The use of cognitive heuristics,” Journal of Pragmatics, volume 59, part B, pp. 210–220.
doi:, accessed 16 June 2020.

Valeria E. Molteni and Emily K. Chan, 2015. “Student confidence/overconfidence in the research process,” Journal of Academic Librarianship, volume 41, number 1, pp. 2–8.
doi:, accessed 16 June 2020.

OCLC, 2004. “2004 Information format trends: Content, not containers,” at, accessed 14 April 2020.

Susanne F. Paterson and Carolyn White Gamtso, 2017. “Information literacy instruction in an English capstone course: A study of student confidence, perception, and practice,” Journal of Academic Librarianship, volume 43, number 2, pp. 143–155.
doi:, accessed 16 June 2020.

Maureen A. Perry, 2014. “Revitalizing a lesson: Teaching with a discovery tool,” Online Searcher, volume 38, number 1, pp. 38–41.

Elizabeth Pickard and Firouzeh Logan, 2013. “The research process and the library: First-generation college seniors vs. freshmen,” College & Research Libraries, volume 74, number 4, pp. 399–415.
doi:, accessed 16 June 2020.

Elizabeth R. Spievak and Pamela Hayes-Bohanan, 2016. “Creating order: The role of heuristics in website selection,” Internet Reference Services Quarterly, volume 21, numbers 1–2, pp. 23–46.
doi:, accessed 16 June 2020.

S. Shyam Sundar, 2008. “The MAIN model: A heuristic approach to understanding technology effects on credibility,” In: Miriam J. Metzger and Andrew J. Flanagin (editors). Digital media, youth, and credibility. Cambridge, Mass.: MIT Press, pp. 73–100, and at, accessed 15 February 2021.

Joyce Valenza, 2016. “Truth, truthiness, triangulation: A news literacy toolkit for a ‘post-truth’ world,” Never Ending Search (26 November), at, accessed 2 June 2020.

Jinx Stapleton Watson, 2001. “Issues of confidence and competence: Students and the World Wide Web,” Teacher Librarian, volume 29, number 1, pp. 15–19.

Peter Williams and Ian Rowlands, 2007. “Information behaviour of the researcher of the future: The literature on young people and their information behaviour,” British Library/JISC (18 October), at, accessed 15 May 2020.

Sam Wineburg and Sarah McGrew, 2019. “Lateral reading and the nature of expertise: Reading less and learning more when evaluating digital information,” Teachers College Record, volume 121, number 11, pp. 1–40, and at, accessed 26 August 2020.


Editorial history

Received 2 July 2020; revised 9 September 2020; accepted 20 September 2020.

Creative Commons License
This paper is licensed under a Creative Commons Attribution 4.0 International License.

Backgrounds and behaviors: Which students successfully identify online resources in the face of container collapse
by Christopher Cyr, Tara Tobin Cataldo, Brittany Brannon, Amy G. Buhler, Ixchel M. Faniel, Lynn Silipigni Connaway, Joyce Kasman Valenza, Rachael Elrod, and Samuel R. Putnam.
First Monday, Volume 26, Number 3 - 1 March 2021