Disciplining Search/Searching Disciplines
First Monday

Disciplining Search/Searching Disciplines: Perspectives from Academic Communities on Metasearch Quality Indicators by Rohit Chopra and Aaron Krowne



Abstract
“Quality Metrics” is an IMLS–funded research project which aims to address longstanding deficits in the formal conceptual support for and development of scholarly digital libraries. Central to attaining these goals is collecting and analyzing feedback from stakeholders in the scholarly community about the efficacy and value of key aspects of search technologies, including search interfaces, modalities, and results displays. A team at Emory University conducted this foundational research by utilizing the qualitative methodology of focus groups. In addition to an initial set of exploratory focus groups, the team conducted a second round of focus group sessions with a prototype search system specially designed for scholarly digital libraries. This paper describes the concept, objectives, methodology, and findings of the focus group component of the Quality Metrics Project.

Contents

Introduction
Background and Project Objectives
Research Design: Structure of Set of Focus Groups and Individual Focus Groups
Demographics of Participants and Sample Composition
Analysis of Data and Findings, Fall 2005 Focus Groups
Further Research Development and Focus Groups, Spring 2006

 


 

Introduction

How do academic communities — such as faculty members and graduate students in the humanities, social sciences, and sciences — understand and use search technologies, whether commercial or those deployed in digital library systems? Do members of these communities bring implicit assumptions about quality to their evaluation of records of library and research objects that are displayed by various search technologies? How would members of particular academic communities — for example, graduate students in the humanities — effectively utilize these quality indicators or ‘metrics’ to make judgments about research objects culled through a ‘metasearch’ retrieval exercise from different search paradigms and research sources? Do these notions of quality vary across disciplines?

The “Study of User Quality Metrics for Metasearch Retrieval Ranking” (hereafter Quality Metrics Project) seeks to answer precisely these questions, among others. A joint undertaking of Emory University and Virginia Tech, the Quality Metrics Project is funded by the U.S. Institute of Museum and Library Services [1]. It is one of several projects of the MetaScholar Initiative, an ongoing program of digital library and scholarly communication research based at Emory University [2]. The Emory team conducted research for the Quality Metrics Project utilizing the qualitative methodology of focus groups, while the Virginia Tech team employed quantitative methodologies for their component of the Project. Between September and December 2005, the Emory team conducted nine exploratory focus groups with faculty and graduate students, across many disciplines, that yielded rich data pertinent to the research questions. In parallel, the Emory team also developed a working prototype metasearch system, titled “QMSearch.” Between March and May 2006, the Emory team conducted three additional focus groups centered on the efficacy of the QMSearch prototype system. This paper describes the qualitative focus group component of the Quality Metrics Project undertaken at Emory from September 2005 to May 2006, with a presentation of the findings from analysis of data collected through both rounds of focus groups [3].

The body of this paper is divided into five sections. The next section briefly provides background on the Quality Metrics Project, including the wider project objectives, working hypothesis, framework, and methodology that orient the focus groups. The third section explains the project design, including Institutional Review Board (IRB) procedures, the structure of the set of focus groups, and the design of individual focus groups conducted in Fall 2005. The fourth section details the demographics selected for participation and the composition of the sample for the first set of focus groups. The fifth section presents the approach adopted for analyzing the data collected through the first set of focus groups, as well as the findings from that analysis. The final section presents findings from the second set of focus groups, conducted in the Spring semester of 2006, and concludes the report.

 

++++++++++

Background and Project Objectives

The central idea underlying the QM Project, the literature survey, and the preliminary research have been detailed in the project proposal to IMLS, written by Martin Halbert and Aaron Krowne, and in the interim reports to IMLS authored by Krowne, et al. [4] Here we recap background from these reports, with key pieces quoted verbatim.

The Quality Metrics research exercise can be situated in terms of a proposed redefinition of the notion of metasearch as compared to “classical” definitions of the term. As in the wider MetaScholar initiative, in the Quality Metrics project we have approached metasearch as a harvest–based methodology as opposed to a federated query system.

Classically, metasearch originated within the library community as a distributed “federated” search, which worked by dispatching queries to many disparate hosts at run–time, and gradually gathering the results from the hosts as they arrived in independent fashion. This was implemented with asynchronous protocols, most prominently Z39.50, which separated query from response.

This asynchronous notion of metasearch had two key problems. The first, and perhaps most apparent to library developers, was that the protocols were not usually implemented consistently at the different hosts. Typically, as in Z39.50, there was very little common ground in search semantics, greatly diluting the search capability. Librarians had very little control over this situation due to the fragmentation of vendors and proprietary library systems.

Secondly, and more apparent to users, was the fact that the quality of service was low. This was a basic side effect of the asynchronous nature of the federated metasearch model. That is, as the queries were processed by the various hosts at different rates, the results would come back at unpredictable times. This meant the user would often have to wait a long time to see all of the results, and that there was no way to gradually reveal the “best results” first. It was quite possible that the best match could come in the last set of results, minutes (or worse) after the initial query was submitted. Worse, if a host was down entirely, the search might “never complete” fully.

The harvest–based framework remedies both of these problems. In harvesting, records (or sufficient versions of them) are copied to the central search system. Here they are either cached or directly indexed, into a single data structure which facilitates rapid search (the index). A query can then be served and a response delivered up without relying on external hosts at search time. Thus it is impossible for potentially unreliable remote hosts to slow down the search or keep it from returning all matching results [5].

We believe this model represents the “baseline” functionality in metasearch systems today, largely as a result of how Google and other Web search engines have raised expectations. Users now expect search results in seconds, or even fractions thereof, and want a comprehensive result set. This is now fairly easy for libraries and smaller digital library projects to attain, due to adequate and inexpensive computing capability at the entry or small–scale enterprise level. Also making this method possible has been the rise of the OAI–PMH protocol and free/open source OAI harvesting and provider tools, as well as the availability of very capable off–the–shelf F/OSS search engines such as Lucene (http://lucene.apache.org/java/docs/) or Swish–e (http://swish-e.org/).
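To make the harvest–index–serve cycle described above concrete, the following minimal sketch in Python mimics the workflow: records are copied from remote repositories ahead of time, folded into a single local index, and queried without contacting any remote host at search time. The repository lists, record fields, and toy inverted index are illustrative assumptions only; a production system would harvest over OAI–PMH and index with an engine such as Lucene or Swish–e rather than this hand–rolled structure.

from collections import defaultdict

def harvest(repositories):
    """Copy metadata records from remote repositories into local storage."""
    local_store = []
    for repo in repositories:
        # In practice this would page through OAI-PMH ListRecords responses;
        # here each "repository" is assumed to be a list of record dicts.
        local_store.extend(repo)
    return local_store

def build_index(records):
    """Fold all harvested records into one in-memory inverted index."""
    index = defaultdict(set)
    for position, record in enumerate(records):
        text = " ".join(str(value) for value in record.values()).lower()
        for token in text.split():
            index[token].add(position)
    return index

def search(index, records, query):
    """Serve the query locally; no remote host is contacted at search time."""
    matching = None
    for token in query.lower().split():
        postings = index.get(token, set())
        matching = postings if matching is None else matching & postings
    return [records[i] for i in sorted(matching or [])]

# Hypothetical usage with two tiny "repositories":
repo_a = [{"title": "Civil Rights Movement Archive", "type": "collection"}]
repo_b = [{"title": "Voices of the Civil Rights Era", "type": "item"}]
records = harvest([repo_a, repo_b])
index = build_index(records)
print(search(index, records, "civil rights"))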

The Quality Metrics Project’s basic objective is to gather foundational data concerning user quality metrics for metasearch retrieval; that is, the principles and criteria by which different academic user communities would organize results from different information domains within their particular frame of reference. For example, a professor of history would invoke certain criteria in ranking search results for the search query “1992 US presidential elections” from different search technologies, such as Yahoo, Google, JSTOR.org (at http://www.jstor.org/), and so on. Similarly, a graduate student in the sciences might invoke specific criteria in ranking search results for the term “global warming.” The Quality Metrics Project seeks to identify such implicit criteria by which different user groups judge and classify search technologies, search strategies, and result displays from different systems. The core proposition that the Project seeks to test is encapsulated in the experimental hypothesis articulated in the Project proposal: “end–users of digital library metasearch services think in terms of implicit quality metrics that can be respected in retrieval ranking, improving results over the current text–query relevance and even hub–authority based methods. If effective metasearch systems are to be informed by empirical studies of actual users, this hypothesis must be tested” [6].

The problem space of user quality metrics of metasearch systems encompasses questions of clarity, utility, and aesthetics, and can range from philosophical assumptions about the value of methodologies for academia, to preferences about the logic of classification by which search results are displayed. As we argued in the Project proposal, while there is a significant body of research concerning user preferences and searching behaviors in particular systems, such as library catalogs and Web search engines, virtually no real–world user studies exist for metasearch systems that bring jarringly different sorts of information together. Little is known about optimal interfaces or visualization methods. Moreover, virtually no digital library search engines provide the capability for explicitly tailoring the logic of retrieval results presentation in ways that reflect the desires of specific user communities relative to the varying quality and attributes of the underlying resources. This state of affairs may itself be understood as a reflection or symptom of the following factors.

Digital library search systems have not evolved to keep up with growing user expectations and the metadata–rich nature of digital library objects. These search systems, though a vast improvement from arcane, classical OPACs, behave clumsily in the face of heterogeneity and radically different values for key metadata attributes. The increasing prevalence of metasearch, a scenario whereby disparate and generally heterogeneous objects are searched together in one interface, has exacerbated the problem.

Perhaps one reason for this situation is that search engines have come out of the field of information retrieval (IR), which has recently been focused on solving the Web search problem. While digital libraries have benefited from these recent advances in information retrieval, the Web search and digital library search scenarios are quite different. Aside from scale differences and different amounts and kinds of item inter–linkage, the Web largely lacks metadata.

This is not a minor distinction. As an example of how this issue manifests itself, digital libraries have the opportunity to distinguish between item–level and collection–level records in determining how to present retrieval results. Search engines developed for Web–based materials do not address this issue of record type, which is also an issue of granularity. Some search engines adapted for digital libraries have attempted to differentiate results along such granularity boundaries, but their approaches have not been formally tested with users. A similar problem also manifests itself in treating results from various sub–collections (e.g., separate library content databases, metadata records harvested from other digital libraries) and items culled from the Web vs. “native” digital library records.
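As a small illustration of how such granularity could be respected at presentation time, the sketch below separates collection–level from item–level records before display. The "level" field and the sample records are hypothetical stand–ins for whatever the underlying metadata actually provides; this is not a tested QMSearch behavior.

def group_by_granularity(results):
    """Separate collection-level from item-level records for display."""
    groups = {"collection": [], "item": [], "other": []}
    for record in results:
        level = record.get("level")          # hypothetical metadata field
        groups[level if level in groups else "other"].append(record)
    return groups

# Hypothetical usage:
results = [
    {"title": "Civil Rights Photograph Collection", "level": "collection"},
    {"title": "March on Washington photograph, 1963", "level": "item"},
]
print(group_by_granularity(results))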

Further, these and other attributes often also bear on notions of quality, which deeply influences the organization of retrieval results. In the dominant Web search model utilizing “PageRank” or similar methods of scoring, hyperlink data is used to make inferences about quality which allows ranking to be vastly improved over purely content–based matching methods. In digital libraries, we could extend the gathering of such quality information to attributes pertaining to the “vettedness” of records, rating, view popularity, logged activation from search results lists, aggregation in path or lesson plan objects, and much more. Thus, there is both a great need and opportunity to intelligently make use of digital library metadata in retrieval results presentation. Accordingly, the goals of the QM Project are:

  1. to discover the best way to present digital library retrieval results by digging down to the user expectations level using focus groups,
  2. to apply these findings to a working prototype system, and then
  3. to evaluate the effectiveness of these systems relative to standard or typically available alternatives.
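Before turning to the prototype itself, here is a rough illustration of how quality signals of the kind mentioned above (vettedness, ratings, popularity, activations from results lists) might be folded into a single retrieval score alongside content–query similarity. The field names and weights are hypothetical placeholders for the purpose of the sketch, not the Project's actual scoring formula.

# Illustrative combination of quality signals into one retrieval score.
# Field names and weights are hypothetical, not the QM Project's formula.
DEFAULT_WEIGHTS = {
    "text_relevance": 0.5,   # content-query similarity
    "vettedness": 0.2,       # e.g., peer-reviewed or otherwise vetted source
    "rating": 0.2,           # normalized community rating, 0..1
    "popularity": 0.1,       # normalized views/activations, 0..1
}

def quality_score(record, weights=DEFAULT_WEIGHTS):
    """Weighted sum of whichever quality attributes the record carries."""
    return sum(weights[name] * record.get(name, 0.0) for name in weights)

def rank(records, weights=DEFAULT_WEIGHTS):
    """Order results by the combined quality score, best first."""
    return sorted(records, key=lambda r: quality_score(r, weights), reverse=True)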

The new search system that we are developing to integrate and expose quality metrics is referred to as “QMSearch.” The development of the QMSearch system is a long–term objective of the Project that orients and guides all aspects of the Project towards some functional impact. The objective of using the focus group methodology is to obtain information about how users employ search strategies, use different search interfaces, and understand displayed search results, so as to develop the QMSearch model. More specifically, the focus groups seek to test the hypothesis that a coherent model can be produced over a diverse base of scholarly users classified by meta–disciplinary orientation and research level. The population of focus group participants in our study covers four demographics: 1) graduate students in the humanities and social sciences; 2) faculty members in the humanities and social sciences; 3) graduate students in the sciences; and, 4) faculty members in the sciences. There is a rich body of literature on focus groups, which affirms the suitability of the methodology for the qualitative dimension of the Quality Metrics Project and the applicability of the method given the Project objectives. The project report in the documentation section of the MetaScholar Web site (see note 4) includes a detailed literature survey and evaluation of relevant perspectives on focus groups.

 

++++++++++

Research Design: Structure of Set of Focus Groups and Individual Focus Groups

The focus groups were conceptualized and structured as a separate sub–project within the larger Quality Metrics Project [7]. The Emory investigators decided on a figure of nine focus groups for the period of October 2005 to December 2005, with a target of three to six participants for each focus group session. This first set of focus groups was conceptualized as an exploratory exercise to obtain information on user expectations regarding quality metrics. An accompanying assumption was that the first set of focus groups would determine the scope and objectives of any follow–up groups. Our objective, to the extent possible, was not to mix participants by research level, keeping faculty focus groups separate from graduate student focus groups. We also decided to keep participants from the sciences separate from participants from the humanities and social sciences for this first set of focus groups [8].

By October 2005, we obtained clearance for conducting the focus groups from Emory University’s Institutional Review Board (IRB). All the project investigators also completed the CITI Human Subjects Research Education Program, an IRB requirement for researchers working on projects with human subjects. We selected a “smart classroom,” Room 217, in Emory’s Center for Interactive Technology (ECIT), in the Woodruff Library at Emory, as the location for the focus groups. All but two of the nine focus groups held in the exploratory phase were conducted in this room. This location offered access to a high–speed Internet connection and a computer connected to a large double screen. This meant that we could show participants two browser windows at any point in time, which was extremely helpful in enabling relevant comparative observations. For example, we could show the logical progression between two steps of a search or metasearch sequence. We could also compare two search systems/interfaces, for example, A9 and Google Scholar [9]. Two focus group sessions were conducted in another classroom, the ECIT Teaching Theater, which is smaller but nonetheless adequate as a research setting and is also a “smart classroom.” The second set of three focus groups was conducted in another ECIT smart classroom, the ECIT Teaching Theater, and Room 217, respectively.

All focus group sessions were video–recorded through two cameras placed at different ends of the room. One camera typically recorded the principal investigator’s introduction, subsequent comments by the investigator, and the screen, while the other recorded the participants’ responses. Immediately after each focus group, the content of each tape was transferred to, and recorded in, DVD format, using iMovie technology available at the ECIT in Emory’s Woodruff Library. Cumulatively, we have obtained 12 hours of recorded data in DVD format — nine hours for the first round of focus groups and three hours for the second — with two DVD recordings for each hour (approximately) of discussion.

Each focus group session was scheduled for an hour and fifteen minutes, including lunch. Each focus group was structured as follows. We began by communicating the broad objectives and goals as well as participant rights and responsibilities, as per IRB guidelines. Next, we described the topic of the investigation — the problematic of quality metrics — in more detail, specifically with reference to a hypothetical search scenario. We presented participants with a set of mock–up images (examples displayed below) that reflected the outcome of a metasearch exercise on terms or keywords such as “globalization” or “civil rights movement.” These mock–up images are indicative of the kind of search a member of an academic community might want to undertake. The mock–up images also included quality and “fitness” metrics and indicators that, we assumed, would be useful for academic communities and that we hoped to make available through our working search system. For instance, on the mock–ups, we showed academic articles from journals, scholarly books, and Web sites ranked by one to five stars, where the number of stars was supposed to indicate the academic merit of the arguments/content in the journal, book, or Web site respectively. “Fitness” indicators covered descriptive attributes of the item or the collection that the item was sourced from, for instance, whether the article or book was downloadable as full text, the word length of the article, and so on. “Fitness,” therefore, was understood as a form of quality contingent upon immediate and practical scholarly needs, which we conceptually distinguished from substantive evaluations of academic work that apply beyond any immediate contextual need.

We showed participants a sequence of three mock–up images. The first image reflected a basic or “default” outcome of the search exercise, with individual records listed by the principle of relevance, or degree of match between the keyword and the record. The following images reflected subsequent logical steps in the search exercise. The second image, for instance, showed the results organized by relevance and also in vertical columns by “domain,” or the broad type of source the items were drawn from, such as books, Web sites/Web pages, or archives. The third image, in keeping with the same logic, showed the results organized by relevance and domain as well as horizontally divided by format (text, audio, video, etc.) or number of times cited (for example, whether the item is cited more than 100 times, 50 to 100 times, or 1 to 50 times in all databases and resources searched). We referred to these three views of increasing levels of user options and complexity as the one–, two–, and three–dimensional views, respectively. Figures 1, 2, and 3 below are the corresponding mock–up images of the metasearch outcome for the search phrase “civil rights movement.”

 

Figure 1: Mock–up of one–dimensional screen for search phrase “civil rights movement”.

 

 

Figure 2: Mock–up of two–dimensional screen for search phrase “civil rights movement”.

 

 

Figure 3: Mock–up of three–dimensional screen for search phrase “civil rights movement”.
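To make the dimensional organization shown in Figures 1–3 concrete, the sketch below buckets an already relevance–ranked result list by “domain” (the columns of the two–dimensional view) and, optionally, by citation band (the horizontal divisions of the three–dimensional view). The field names and band boundaries simply mirror the mock–ups and are assumptions for illustration, not implemented system behavior.

def citation_band(count):
    """Bands follow the mock-ups: more than 100, 50-100, and 1-50 citations."""
    if count > 100:
        return "cited more than 100 times"
    if count >= 50:
        return "cited 50 to 100 times"
    return "cited 1 to 50 times"

def organize(results, by_domain=True, by_citations=False):
    """Results arrive in relevance order; bucket them into a display grid."""
    if not by_domain:
        return results                          # one-dimensional view
    grid = {}
    for record in results:
        column = record.get("domain", "other")  # e.g., books, Web sites, archives
        row = citation_band(record.get("citations", 0)) if by_citations else "all"
        grid.setdefault(row, {}).setdefault(column, []).append(record)
    return grid                                 # two- or three-dimensional view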

 

Participants were allowed to interrupt to ask questions or request clarifications at any point during the focus groups. Participants were also informed that they could demonstrate any search interface, digital resource, or Web site from the World Wide Web to the group by asking the investigators to display it on the screen. We followed a model of low to moderate intervention in structuring the conversation. As investigators, we did clarify questions and, over the course of each session, introduced new issues for further discussion based on participant responses. However, we did not seek, at any point in any session, to structure the conversation in very rigid or definite terms. Based on the composition of the groups, the particular issues of discussion or threads of conversation varied significantly, within the broad conceptual and topical parameters of the quality metrics problematic. This provided extremely rich, nuanced, and informative data, wide–ranging in its scope and range of issues covered.

Rohit Chopra was present at all focus groups for the entire duration of each session. For most focus groups, at least two of the Emory investigators were present for the session. For several focus groups, three to four of the investigators were present. This enabled the Emory team of investigators to get multiple perspectives, from different vantage points, on many of the focus group sessions. The Emory team of investigators met informally within a short span of time after several individual focus group sessions to assess findings from those particular discussions. Additionally, the team met periodically through the semester for discussions on the focus groups.

 

++++++++++

Demographics of Participants and Sample Composition

Our objective in inviting participants to focus groups was to cover the following four demographics, representing both meta–disciplinary orientation and research level.

 

Faculty in Sciences | Graduate Students in Sciences
Faculty in Humanities and Social Sciences | Graduate Students in Humanities and Social Sciences

 

For the purposes of convenience, the report will use the term “humanities” from here on to designate both the humanities and social sciences. For the first set of focus groups, we were successful in obtaining participants from three of these four categories, with the one missing demographic being graduate students from the sciences. As will emerge from the analysis of findings in the next section, this omission did not compromise the quality of data obtained, which varies more by disciplinary orientation than by research level. In other words, there is no one set of quality metrics that graduate students across the humanities seemed to unanimously agree on, which precludes the possibility that graduate students across the humanities and sciences would agree on a particular set of quality metrics. Rather, a graduate student in a particular discipline in the humanities, say, English, is more likely to agree with a faculty member in that discipline. However, we sought to rectify this imbalance by inviting several science graduate students and recent graduates to one of the focus groups conducted in Spring 2006. This section of the paper, as well as the next, concerns the first set of focus groups conducted in Fall 2005, while the details of the second set of focus groups conducted in Spring 2006 are presented in the final section.

The turnout for the first set of focus groups was high, at approximately 81.5 percent: of the 27 participants who confirmed participation, 22 showed up. As a result of the no–shows, two of the focus group sessions de facto turned into individual in–depth, semi–structured, open–ended interviews with the sole participant on the project topic. While the group dynamics that are the basis of the focus group structure were no doubt absent on these occasions, these sessions afforded an opportunity to get a single–user perspective on the quality metrics problematic. We believe the single–user perspective both complements and is contiguous with the plural group perspective obtained from the other focus group sessions. For the first focus group, we conducted an in–house discussion with library staff engaged with similar and overlapping questions on digital scholarship. One of the participants, in addition to working part–time in the library, is also a graduate student and, accordingly, we are treating this participant’s contributions in this session as data. The other participant is a full–time library staff member.

On the whole, the composition of the first set of focus groups was skewed towards the humanities. Within the humanities, there is a skew towards American Studies, but that is the result of a deliberate decision. The Woodruff Library’s Digital Programs and System Division, which is implementing this project, also has several projects pertaining to the area of American Studies, which have some research objectives in common with the Quality Metrics Project. Inviting faculty and graduate students with interests and expertise in American Studies thus provided an ideal comparative frame for assessing overlapping concerns that traverse different digital scholarship projects.

As mentioned earlier, we also wanted to keep each participant category separate in terms of the composition of particular focus groups. This was largely followed, barring one focus group session, where participants comprised two faculty members and one graduate student. In terms of range of disciplines and research areas, we were able to obtain a fairly diverse group of scholars at both the graduate student and faculty level. Listed below in Table 1 is the composition of each focus group in terms of number of participants, research level, disciplinary orientation, and more detailed area of research specialization (where specified). Table 2 also presents some aggregated information on these characteristics of focus group participants.

 

Table 1: Composition of participants in each focus group, Fall 2005
Focus group | Date | Number of participants | Research level | Meta–disciplinary orientation | Discipline and area of specialization
1 | 17 October 2005 | 2 | 1. Full–time library staff | Not applicable | Not applicable
  |  |  | 2. Graduate student | Humanities | Interdisciplinary Studies
2 | 25 October 2005 | 1 | 1. Graduate student | Humanities | Religious Studies, Hinduism and Judaism
3 | 1 November 2005 | 3 | 1. Graduate student | Humanities | Interdisciplinary Studies, American Studies
  |  |  | 2. Graduate student | Humanities | Interdisciplinary Studies, American Studies
  |  |  | 3. Graduate student | Humanities | Interdisciplinary Studies, American Studies
4 | 8 November 2005 | 1 | 1. Graduate student | Humanities | Theology
5 | 15 November 2005 | 4 | 1. Faculty | Humanities | Interdisciplinary Studies, American Studies, Southern Studies
  |  |  | 2. Faculty | Humanities | Interdisciplinary Studies, American Studies, Southern Studies
  |  |  | 3. Faculty | Humanities | English, Modernism
  |  |  | 4. Faculty | Humanities | Interdisciplinary Studies, Classics
6 | 22 November 2005 | 2 | 1. Faculty | Humanities | Hindi, South Asian Studies
  |  |  | 2. Faculty | Humanities | Philosophy
7 | 6 December 2005 | 3 | 1. Faculty | Science | Computer Science
  |  |  | 2. Faculty | Science | Biology
  |  |  | 3. Faculty | Science | Ophthalmology
8 | 13 December 2005 | 3 | 1. Faculty | Humanities | Theology
  |  |  | 2. Faculty | Humanities | Theology
  |  |  | 3. Graduate student | Humanities | Theology
9 | 13 December 2005 | 3 | 1. Faculty | Humanities | History
  |  |  | 2. Faculty | Humanities | English
  |  |  | 3. Faculty | Humanities | English

 

 

Table 2: Aggregated details of composition of focus groups
A — Variation of focus groups by meta–disciplinary orientation
Humanities only groups: 8 (including 2 single–participant groups and 1 group with library staff and a graduate student)
Science only groups: 1
Total number of groups: 9
B — Variation of focus groups by research level
Faculty only groups: 4
Graduate student only groups: 4 (including 2 single–participant groups and 1 group with library staff and a graduate student)
Mixed group: 1 (2 faculty and 1 graduate student)
Total number of groups: 9
C — Participant composition by meta–disciplinary orientation
Humanities faculty: 11
Humanities graduate students: 8
Total number of humanities participants: 19
Science faculty: 3
Science graduate students: 0
Total number of science participants: 3
D — Participant composition by research level
Humanities faculty: 11
Science faculty: 3
Total number of faculty participants: 14
Humanities graduate students: 8
Science graduate students: 0
Total number of graduate student participants: 8
All participants: 22

 

 

++++++++++

Analysis of Data and Findings, Fall 2005 Focus Groups

This section presents the framework for the analysis of data collected through the focus groups as well as the findings that emerged from the analysis. A key practical objective of the Quality Metrics Project is the creation of an effective and flexible search and discovery system, sensitive to the requirements of diverse academic communities and deployable across a wide range of library, academic, and research contexts. This imperative has been used as a frame and guiding principle in assessment of the data. The goal is to identify key elements of a narrative offered by academic communities about their assumptions and expectations regarding quality in search retrieval and display systems, such that these elements can translate into practical features and characteristics of a working prototype system.

Additionally, the research imperatives underlying the project were articulated in the form of explicit research hypotheses, against which findings could be assessed and compared. These hypotheses may be expressed as follows.

  1. The digital library setting is different from the general Web setting because it has richer metadata, more focused purpose (e.g., by discipline or topic), and carries more information about a particular scholarly community.

  2. However, metasearch is still useful (and highly demanded) as a paradigm in this context.

  3. Attributes of digital library information, either explicit (encoded in metadata) or implicit (needing to be extracted by analysis/data mining), convey information about the quality or fitness of resources with respect to each inquiry.

  4. This quality information can be used to better organize search results to make the search–centric inquiry process more efficient and fulfilling for users.

  5. Notions of quality will be subjective, varying from digital library to digital library, sub–community to sub–community, individual to individual, and even inquiry to inquiry by the same individual.

  6. Despite this apparent requirement for sophistication and complexity, a metasearch system could be built that affords the utility described above by exposing and allowing manipulation (either by digital librarians or end users) of quality indicators and their bearing on results organization (i.e., customization of quality metrics in metasearch).

This progressive chain of hypotheses, informed by our general experience, intuition and past studies under the MetaScholar initiative, also guided our investigation for this initial round of focus groups. Two other important points regarding the focus group analysis need to be clarified here.
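Hypotheses 5 and 6 together suggest a design in which the weighting of quality indicators is not fixed but can be overridden per community, or even per inquiry. A minimal sketch of that kind of customization, reusing the hypothetical weighting scheme sketched earlier in this paper, might look as follows; the profile names and values are invented for illustration and are not drawn from the focus group data.

# Sketch of the customization implied by hypotheses 5 and 6: communities
# (or single inquiries) override a default weighting of quality indicators.
# Profile names and values are invented for illustration.
DEFAULT = {"text_relevance": 0.5, "vettedness": 0.2, "rating": 0.2, "popularity": 0.1}

COMMUNITY_PROFILES = {
    "humanities_faculty": {**DEFAULT, "vettedness": 0.3, "popularity": 0.0},
    "science_graduate":   {**DEFAULT, "rating": 0.1, "popularity": 0.2},
}

def weights_for(community, per_query_overrides=None):
    """Start from a community profile and apply inquiry-level overrides."""
    weights = dict(COMMUNITY_PROFILES.get(community, DEFAULT))
    weights.update(per_query_overrides or {})
    return weights

# Hypothetical usage: a humanities faculty member who, for this one inquiry,
# cares mostly about vetted material.
print(weights_for("humanities_faculty", {"text_relevance": 0.3, "vettedness": 0.4}))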

First, while one of the principles of the analysis is to look roughly or approximately at “consensus” among academic communities on the issues under discussion, it has been considered equally important to pay attention to disagreement, diversity of opinion, and the multiplicity of opinions among focus group participants. The nature of the data in focus groups is such that “consensus” cannot quite be accurately measured in an empirical sense. For example, while five participants may appear to agree on a substantive point, one participant might clarify the rationale for his/her view with a detailed explanation while the other four might signal their agreement with merely a nod or a “yes.” The question that arises here is whether, in such a scenario, the perspective of each of the five participants should be given equal weight. Rather than seek to establish consensus in this sense on specific, point–by–point matters, we have found it more useful to identify “consensus” in terms of general concerns that appear to be broadly shared among several participants. In conjunction with this broad sharing of concern among participants, we may recognize the range of views expressed by participants as equally legitimate in their claims.

Secondly, it should be noted that the two central investigators, the authors of this paper, who have analyzed the focus group data in depth, also bring distinct perspectives to the reading and analysis of the data, reflecting their different academic, research, and professional backgrounds. This internal diversity of perspective, we believe, enhances the quality of the analysis as well as the richness of the findings.

The findings are divided into two broad categories, each represented in a subsection. The first category presents a thematic explication of key findings, which may be seen as the fundamental constituent elements of the narrative offered by academic communities about search. The second category presents the ramifications of the findings for system design and implementation.

A. Thematic explication of key findings

  1. Affirmation of Need for Metadata, Quality Indicators, and Metasearch as a Viable Method and Strategy

    The focus groups confirmed the validity and viability of the basic idea and most fundamental assumptions of the Project. Participants agreed with and emphasized the need for good metadata. There appears to be agreement on the necessity for metadata across different kinds of objects (corresponding to different kinds of records), whether books, journal articles, archives, general Web sites, or other cultural representations and items such as pamphlets, posters, and films. There also appears to be agreement on the need for metadata for records of objects across different formats, whether text, static images, audio, or video. Similarly, participants affirmed the need for contextual information and metadata for records of non–English content. Most scholars are actually using resources from all sorts of “realms,” including the Web, the library, professional organizations, and the mainstream press. Yet, scholarly metasearch portals rarely draw upon all of these realms.

    There is widespread use of commercial/general metasearch systems like Google, Google Scholar, Amazon A9, Amazon’s book search, and so forth. These are often used in preliminary and exploratory ways, such as to explore topics in which the scholar is not an expert, to look up general “encyclopedic” information, or to retrieve full text copies of items already located in academic search engines. The point to note, however, is that the use of these sources does not necessarily indicate satisfaction or agreement with the principles by which the search systems organize, rank, and display results.

    Participants also confirmed that there is an important need for sorting results of search retrieval by quality. It should be noted, as shortly described below, that participants’ understanding of quality was contingent on a complex constellation of factors and, consequently, varied significantly. For our purposes here, however, the following three points are relevant:

    a) The general project hypothesis about users bringing expectations of quality to the search exercise was confirmed;

    b) This general confirmation broadly held across graduate students and faculty members, as well as across the sciences, humanities, and social sciences; and,

    c) Participants confirmed an association between quality and metadata. In other words, participants’ responses implied that quality (or fitness) of records could be communicated through effective use of information available as metadata.

    Participants also expressed dissatisfaction with current library as well as commercial search systems, interfaces, and databases in meeting academic search needs. Particular objections depended upon the immediate task and nature of inquiry. Objections covered issues such as lack of comprehensiveness and/or efficacy, lack of transparency, non–discrimination between the quality of searches/results and of records, an unclear system logic such that participants were unsure of what to expect in terms of content and form, and so on. As one participant, a professor of philosophy, observed in describing Philosopher’s Index: “it’s kind of a scandal among philosophers that Philosopher’s Index does not find things that are out there.” A graduate student in religion and South Asian studies noted, “What I have found from library databases is that I have to put in a lot of time to get what I want because I don’t necessarily know what these databases are going to throw up. For example, there’s lots of stuff about South Asia that might or might not be classed under religion or anthropology.”

    Participants generally endorsed metasearch as a viable method and strategy to achieve these objectives. However, participants did not necessarily view metasearch as the only, natural, or inevitable solution to the problem of search retrieval and display. In other words, it is possible that participants can envision alternate solutions that are specific to their disciplinary orientation, area of specialization, and searching needs. However, no such alternative that would work across disciplines and research levels, and could thus substitute for metasearch as a systemic principle for structuring search and retrieval, emerged from the focus group discussions. Metasearch, accordingly, is seen as a useful, if not the most salient, paradigm for search and retrieval, but existing Web/metasearch solutions fall short for comprehensive scholarly inquiry. This manifested as lack of comprehensiveness, poor/opaque ranking, and fragmentation, among other complaints.

  2. Contingent, Varied, and Competing Definitions of Academic Quality

    The discussion on quality metrics and indicators as principles by which to retrieve and display search results was directly related to the broader question of academic quality. As such, participants expressed and shared their views on the nature, characteristics, and complexities of academic quality and the problems associated with evaluating it. While this discussion on academic quality often verged on the philosophical and abstract, it nonetheless has clear practical implications in that it points to the limits as well as the license of the search system we are developing. It also points to the crucial fact that “quality” as defined for academic search purposes may not necessarily concur with “quality” as defined for more general search purposes.

    One participant, a professor of ophthalmology, in fact, questioned the very idea of using quality as an evaluative principle or conceptual frame for a project such as this one, suggesting that it might be a misleading term. As his views reflect, quality is very much a shifting and unstable signifier. In his words, “I would say ‘quality’ is the wrong word. It is more of a reliability issue. Is this an academic source? For some things, even that is not relevant. But is this a source that puts an effort into this [i.e., into what is presented] or is this something that someone is writing off the top of their head. That’s a huge difference. If quality is utility, that’s worthless to me. Even if quality is a true measure of the worth of a work, even that is not useful to me. Sometimes bad work has nuggets in it or useful information in it. Like a method that someone did nothing useful with could be useful to me. So I don’t see quality as something that I really want to spend a lot of effort searching for. I want to search for things that are relevant and I want it to come from sites that I’m not going to laugh at once I go to [them].” The generalizations offered below, accordingly, are subject to the diversity of participant views on the topic.

    On the one hand, there appears to be a partial consensus across communities on the markers of academic quality. Academic quality is associated with the credibility and legitimacy of academic institutions, organizations, journals, presses, and the like. What may be termed the “academic brand equity factor” provides some indication of the value of an item retrieved through search. In this regard, we asked some participants in several groups if they would consider Web sites rated by professional associations, such as the American Historical Association, as credible academic sources on the order of journals or books. These participants did not necessarily confirm that they would cite such ‘credentialized’ Web sites as evidence (akin to citing an academic book or journal article). However, they did specify that they would trust such Web sites more than academically unaffiliated Web sites. One participant, a professor of computer science, mentioned that Web sites are useful as preliminary research resources that can, in turn, point to articles and books that could be cited in academic papers. The same professor also mentioned that Web sites (and the World Wide Web) are useful for locating information that may not be found in citable form in journals or books because such information belongs to the domain of long–established fact. ‘Credentialized’ Web sites could be a useful resource in this regard. Participants also mentioned that ‘credentialized’ Web sites could be valuable teaching tools, and that they would recommend such sources to their students.

    The partial consensus about markers of academic quality is, somewhat paradoxically, complemented by diverse understandings of academic quality as well as of the “fitness” of records. The diverse understandings reflect the highly specific information requirements of different participants, based on a combination of meta–disciplinary orientation, discipline, area of topical specialization, and the immediate task at hand. Quality as defined for the purpose of a preliminary exploratory search on a research topic might thus be constructed quite differently from “quality” for a search for, say, academic journal articles that address that topic in detail. Understandings of quality also reflected participants’ views about the strengths and limitations of processes of academic legitimacy. The range of varied — including competing — understandings of quality is expressed well in the following statements offered by two professors. A professor of American Studies pointed out that organizing the search retrieval and results display into categories such as “peer–reviewed journals” and “books” — as we had done on our mock–ups based on a search for the phrase “civil rights movement” — runs a certain risk. The professor observed, for example, that “a lot of interesting work will not show up in peer–reviewed journals,” and that “this may reinforce the very conservative base of peer review.” “Non–indexed works, films” as well as objects such as “pamphlets, leaflets, relevant for the civil rights movement” may not fit in the standard academic categories, although the latter might be found in archives. Also, structuring the search around standard categories of academic sources and well–known sources (whose “quality” is indicated, for instance, by number of citations) might implicitly devalue other kinds of knowledge and information. As the professor described it, “I’m always looking for the new directions in which things are going. It seems that one can run into a problem of circularity here — you wind up referring to the same limited or defined body of information that everyone is using. That’s not so much what I’m interested in as the new stuff, which is not cited.”

    On the other hand, a professor of history described the lack of peer credibility as a major problem in using Web sites as a research resource. The professor stated that “there’s really no counterpart to the peer review process in publishing” for Web sites. If something has been published, “at least you know it has been through some kind of vetting procedure.”

    One other important finding about quality is that the understanding of quality (as well as fitness of records) depends more on meta–disciplinary and disciplinary orientation (sciences, humanities) than on research level (faculty or graduate students). In other words, inasmuch as there might be consensus among communities about what counts as “quality” of a record, that consensus is likely to be formed on the basis of the disciplinary and meta–disciplinary affiliation of members of the communities and not on the basis of communities defined by research level. It is not clear that there is an understanding of quality that applies to graduate students across disciplines that is distinct from any understanding of quality that applies to faculty members at large across disciplines.

    The implications for a search system of these diverse understandings of quality among academic communities warrant some comment. The diversity of opinion, including competing views or notions of quality, confirms the necessity for an initiative such as the QM Project and an effective metasearch system. It points to the need for a system that can provide or enable transparency, user empowerment, user choice, and flexibility in search retrieval and display. As we will now describe, the focus group data revealed that participants defined quality in terms of these characteristics or attributes, in the context of the search exercise.

  3. Quality as Transparency

    A point repeatedly stressed by participants was that the search system should be self–explanatory and transparent with regard to all aspects of its functionality and interface. Participant feedback suggested that a system that clearly explains its limits and license, clarifies its reach and scope in terms of the resources assessed, and communicates the logic of retrieval and display provides users with invaluable information to contextualize search output and make judgments about the quality of those results. An emphasis on transparency can also help distinguish the QM system from existing search systems, such as Google or Google Scholar, which for proprietary, commercial, or other reasons do not offer users a full explanation of, or transparency about, various aspects of the search process. Transparency was perhaps the most emphasized point we heard, and its absence is a major failing of existing offerings across the board.

    The concerns about transparency were expressed directly as well as, often, implicitly or obliquely. In addition to the concerns that were categorically presented by focus group participants, a close reading and interpretation helped identify several implicit concerns about transparency in the search process. These concerns may broadly be categorized and expressed in the form of the following questions, along with hypothetical examples that illustrate them. Please note that these hypothetical examples were not necessarily provided by participants themselves. Particular views and comments offered by participants are clearly indicated where invoked.

    a) What was the universe of information being mapped? For example, if one used search engine X to search for a journal article on the topic of “postmodernism,” it was important to know how many databases had been covered by the search engine, how many journals had cumulatively been searched, what other sources had been examined, and so on. In sum, how comprehensive was the search for journal articles on “postmodernism,” given the theoretical existence of a finite, if vast, number of journal articles on the topic? With respect to their use of search engine X, could a scholar justifiably make the claim that they had searched all articles on the topic of postmodernism available on the World Wide Web? As one participant, a professor of American Studies, pointed out, a single record or item of evidence could completely change a research finding or conclusion, or profoundly alter an academic thesis. Moreover — and this was a serendipitous, unexpected finding — users wanted to know the connections between different components of the universe of information about a topic. Relational information is very much in demand: links to reviews, to categories, to and from citations, to different editions, and so forth. This kind of network information allows navigation that seems to be closely interspersed with search in the wider inquiry process.

    b) What proportion of the universe of all information on the relevant topic was represented by the universe of information mapped on the Web? For example, if one was searching journals in political science for the phrase “theory of the state,” and the search system conducted this search in the content of 25 journals in the field of political science, it was important to know that there are also, say, 50 journals in the field of political science that are not being searched by the system because they are not online. Such information is invaluable because it bears directly on claims that scholars would make, for example, about theoretical comprehensiveness or the exhaustiveness of a literature survey. As a graduate student in American Studies observed, “One thing I have to say is that with the rise of all these other search engines, there is just so much more information available. Things like Lexis–Nexis, which offer all these newspapers and articles online. For me the challenge is dealing with all the information and making sure I have all my bases covered. I’m interested in seeing if something I am working on, somebody else has published something recently on it.”

    c) On what basis were particular records floated to the top when the search was organized by any particular criterion or combination thereof? Participants expressed a need for clarity about terms, the logic of organization and retrieval, and the functionality of each feature. In this regard, the investigators asked participants their views of the logic of both Google and Google Scholar in retrieving and displaying results. Participants indicated that Google and Google Scholar sometimes worked satisfactorily, but by no means were these systems uniformly reliable or predictable with regard to retrieving/finding records on academic subjects and topics. As a professor of English commented, “I never quite know what I am looking at on Google Scholar.” Accordingly, a search system that foregrounds a commitment to being self–explanatory as a key attribute is likely to immediately resonate as credible with academic communities. In one focus group, the investigators explained how a text (for example, the Narrative of the Life of Frederick Douglass) that is relevant to — and shows up on a search for — the topic of “civil rights movement” may not contain even a single instance of the term. The response of a participant, a professor of theology, is worth viewing in detail here since it illuminates several questions and issues of concern to members of academic communities. The professor stated: “That is a good example. Suppose in the 1960s, the Narrative of the Life of Frederick Douglass was the touchstone text that everyone went back to for the civil rights movement, that everyone cited over and over again — then it would get a very high rating even if it does not say civil rights movement anywhere in the topic or even in the LoC [Library of Congress] subject categories.” The professor also stated: “I’m most concerned about [use of Web resources by] ... students because they really do use Google as their first line of research and a lot of the stuff they come up with is really bad. Because, the things that come up the highest [on Google] on the topic, they’re really out there.”

    For QMSearch, the implications of participant feedback about dissatisfaction with existing search systems are that each feature and its corresponding term should be explained clearly on the interface. For example, users should be able to find out what exactly is meant by “content–query similarity,” if retrieved records are being listed by that principle. Similarly, according to one schematic of organization by “domain,” records may be divided and displayed by the kind of source they are retrieved from, such as “peer–reviewed journals,” “archives,” “books,” and so on. In this situation, it would be very important for users to know on what basis the category of “domain” has been selected as a principle of retrieving and organizing results, and, concomitantly, what the rationale is for the constituent sub–categories. Also, it would be useful for users to know the rationale for the search intelligence in retrieving and displaying records for each sub–category. For example, would a search for a book on the topic of “Byzantine art” on QMSearch cover every book that had an ISBN? In such a case, the attribute of possessing an ISBN would mark the limit of the search and bound the universe of objects searched. As may be discerned, this point closely relates to a) and b) above. This, in fact, was one of the unexpected findings: there is a very strong desire on the part of participants to see “under the hood” of the search engine, knowing how ranking is done and what it means, even aside from manipulability. Even if a user can select between “popularity” and “rating,” they want to know what goes into popularity and how rating is calculated (and where the ratings are derived from). A sketch of how such an explanation might be surfaced appears at the end of this subsection.

    d) Trace or record of information about objects that do not exist anymore. One interesting suggestion offered by a participant, a graduate student in American Studies, was that it would be helpful for a search system to find and list any information about objects that do not necessarily “exist” anymore. This somewhat paradoxical–seeming necessity may be explained as follows, in the student’s own words:

    I have a comment bordering on the philosophical ... [I]s there any kind of space for records of things that don’t exist — things that aren’t available. For example, I’m doing research on films and they are all out of print. There are about 30 films that are all out of print. I have tried inter–library loan, eBay; these films have essentially been wiped off the face of the planet and you can’t find them. But should there not be some kind of record that they exist or are out there or once existed?

    The implication of this argument is that it is valuable to scholars to access metadata for objects, even if those objects are not available, not accessible, or even if they do not exist anymore. Correspondingly, it is worthwhile for search systems to provide access to this metadata for members of academic communities.

    While all these objectives would be difficult to achieve fully in any search system, they can nonetheless operate as important guiding principles for the conceptual and technical development of the QM system. One way to utilize these findings is to see them as long–term objectives, which can, in turn, structure a program of ongoing development. Over a period of time, as the world of digital scholarship itself evolves, search systems can, through iterative improvement, come close to realizing these objectives in as near a state of completeness as possible.
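One way to act on the desire to see “under the hood,” noted in point c) above, is to have the system return, alongside each hit, a breakdown of the signals that produced its position, which the interface can expose on demand. The sketch below is an assumption about how such an explanation record could be assembled; the signal names simply reuse the hypothetical weighting scheme from the earlier sketches and are not QMSearch's actual output.

def explain(record, weights):
    """Assemble a per-result explanation the interface could show on demand."""
    contributions = {name: round(weights[name] * record.get(name, 0.0), 3)
                     for name in weights}
    return {
        "title": record.get("title", ""),
        "score": round(sum(contributions.values()), 3),
        "contributions": contributions,             # what went into the ranking
        "source": record.get("source", "unknown"),  # where the record came from
    }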

  4. Quality as User Empowerment to Make Judgments about Quality

    Another key finding — logically consequential on, and related to, the above point about transparency — was that participants wanted information with which to make their own judgments about the quality of records, but did not want the search system to make this judgment for them categorically. User feedback was varied regarding the value of specific kinds of information or quality/fitness metrics as well as the variety of interfaces — this is in consonance with point 2 in this section, that expectations of quality are highly contingent upon discipline, area of sub–disciplinary specialization, and the immediate task at hand. A description of the kinds of information presented to focus group participants through the interfaces is in order here, before describing participant feedback on the same.

    The mock–up interfaces that were presented to participants offered hypothetical substantive evaluations of the quality of records. For example, the records rated books and journal articles in terms of stars, from five stars to one star, with five stars representing the highest quality and one star the lowest. As explained to participants, the ratings were hypothetically conceptualized as computed by numerically weighting various factors such as academic peer comments, non–academic comments, number of times cited, and the like. Amazon’s system of user reviews and ratings was cited as a hypothetical model and reference point for the substantive academic rating system, for the purpose of initiating discussion.
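    Purely for illustration, the hypothetical composite rating in the mock–ups might be computed along the following lines; the factor names and weights are assumptions made for the mock–ups, not a validated or endorsed metric (and, as discussed below, participants largely rejected this kind of indicator).

        # Hypothetical composite rating, as in the mock-ups: weighted,
        # normalized factors mapped onto a one-to-five star scale.
        HYPOTHETICAL_WEIGHTS = {
            "peer_comments": 0.4,      # academic peer comments, normalized to 0-1
            "citations": 0.4,          # citation count, normalized to 0-1
            "general_comments": 0.2,   # non-academic comments, normalized to 0-1
        }

        def star_rating(factors, weights=HYPOTHETICAL_WEIGHTS):
            """Map weighted, normalized factors (each 0-1) onto 1-5 stars."""
            score = sum(weights[name] * factors.get(name, 0.0) for name in weights)
            return 1 + round(score * 4)   # 0.0 -> 1 star, 1.0 -> 5 stars

        print(star_rating({"peer_comments": 0.9, "citations": 0.7, "general_comments": 0.3}))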

    Users were also asked about the value of quality indicators of academic peer/community use and evaluation, such as the number of times a book or article was cited in other books or articles accessible to the search system (along with access to all specific instances of such citations), and all occurrences of the search word or phrase in a document.

    The sets of mock–ups, for “postmodernism,” “globalization,” and “civil rights movement,” across their various iterations also included evidence of other quality and fitness indicators or metrics. These included the default organizing principle of relevance (or query similarity); openness, or the nature of access to the record (for example, downloadable in full text or not, only abstract available, and so on); format (text, static image, audio, video); domain, or type of knowledge–object (book, journal article from a peer–reviewed journal, archive, Web resource); and cost (whether free, trial subscription, single–time download, or institutional subscription required).

    a) Rejection of Quantified Indicators of Substantive Quality. There was a categorical rejection of the value — and, indeed, of the very possibility — of substantive quality indicators presented in the form of the ratings system, in particular as these applied to books and journals. One philosophical objection was to the notion of quantifying quality in such a reductive manner. Participants variously described it as akin to commercial ratings for hotels or movies and, consequently, singularly inadequate for academic objects. In this context, it emerged that participants did not necessarily view the Amazon model of reviewer ratings as indicative of the quality of the work. Participants mentioned that they did find Amazon.com useful as a preliminary source of some information about a work, that user reviews on Amazon helped provide a sense of what a work was about, and that they could get some sense of the value of a work from the substance of a review. However, participant feedback suggests that the user review system on Amazon cannot be taken as a definitive, credible system for evaluating the substantive merits of a work, such that it can or should be replicated or emulated with regard to academic work.

    At the same time, participants were open to the general idea of ratings (if not in starred format) for the domain of general Web sites. By general Web sites, we refer to online sources of information such as Wikipedia, as distinguished from ‘credentialized’ academic print books, archives, and journals accessible through the World Wide Web. As noted in point 2 in this section, participants suggested that they would use, say, a history or mathematics Web site as an initial or preliminary research source, or would recommend it to their students, if such a Web site were approved, vetted, or ranked by a credible organization in the field, such as, for instance, the American Historical Association. A professor of theology noted that ratings and rankings would be useful for his students since they used the Web, especially Google, as their first line of research. He pointed out that “if there was a way for them [students] to get quick, easy information that was rated, I think that would help them.” Another participant in the same focus group, also a professor of theology, was critical of the starred ratings but affirmed their pedagogical value for students. The professor also elaborated on the possible bases of such ratings, and the comments bear quoting at length:

    I would not want the ratings but for students there may be value to it. I find the ratings — stars — too much like movies, TV, it would put me off. ... To have an academic search engine that has 5 stars, 4 stars, 3 stars. My initial reaction to it is quite negative. But I see the point that it could be quite pedagogically useful to help students. But I would be worried about what the ratings are based on — if it’s simply reliability or usefulness. Reliability, okay, but usefulness — nobody else can actually make that judgment for me because I might be coming from a particular angle where something that seems unrelated is actually exactly what I am looking for as a supplemental point. But if you were rating the reliability, that I could be more interested in, as a rating, but I still would not want stars.

    Another key reason for the skepticism about substantive quality indicators pertained to the nature of academic research and output. Participants pointed out that there is no necessary consensus in relevant academic communities about the value of a work; that the value and significance of a work often become apparent only years or even decades later; that the journal a work of research is published in is not necessarily an indicator of its value, since even top–quality journals often publish articles whose value is in question; and that the same work may contain both elements later proven to be incorrect and elements that are “nuggets” of insight and brilliance. As a professor of ophthalmology noted, “Sometimes bad work has nuggets in it or useful information in it. Like a method that someone did nothing useful with could be useful to me.”

    The comments of a professor of biology on the difference between scholarly work and more general kinds of knowledge with regard to quality are pertinent here. The professor pointed out that “quality” of Web sites is a relevant issue in the context of a non–academic search, but does not bear on academic searching. As the professor described it, “[I]f someone is interested in some medical condition and they search, it will pop up all these medical sites. Some are good and well–researched while others [are not]. There it becomes much more of an issue — whether it is quality or not. But if I’m searching for some medical thing I’m interested in in terms of research, I’m not going to be looking at those Web sites anyway. I’m going to be looking at the scholarly literature.”

    The related question of who would rate academic works — and, by implication, the impracticality of such an undertaking — was an additional reason for skepticism about substantive quality indicators. We can unequivocally conclude that users reject a system that makes substantive and evaluative determinations of the quality of academic objects. Inasmuch as a search system proffers information on the value of an academic object, that value appears to rest in “fitness” indicators, such as whether an item is downloadable in full text or not. In terms of implications for the search system structure, this may be viewed as an understanding of quality as discipline– and topic–sensitive intelligence and of quality as user choice and flexibility. Each of these findings is now briefly described with relevant examples.

    b) Quality as Discipline– and Topic–sensitive Intelligence. Participant comments showed that the value of indicators varied across disciplinary and sub–disciplinary contexts, in addition to being contingent on the task at hand, as noted earlier. There is neither an overwhelming affirmation nor an overwhelming rejection of indicators such as number of times cited or information such as cost of access. Some examples of the kinds of metadata and quality/fitness indicators specific to disciplinary and sub–disciplinary domains may be noted here.

    A professor of American Studies noted that with regard to searching for a particular filmmaker, it would be helpful if the search system (or principle for organizing search results) could “distinguish between major works and things like newspaper articles which just mention a line or his name.” We can conceptualize this as the principle of sorting between trivial and meaningful occurrences.

    Two professors, one in computer science and the other in ophthalmology, both brought up the question of search terminology and the sensitivity of the search intelligence to disciplinary and sub–disciplinary vocabularies. For example, as the professor of computer science noted, he had to conduct several searches to ensure that the search system was indeed searching for the right kind of information. The professor of ophthalmology cited the example of searching on Google for “Usher’s Syndrome,” a medical condition. As he pointed out and demonstrated, this search, in addition to obtaining relevant results, also brings up numerous records on the musician Usher. The issue of system sensitivity to search phrases was also brought up by a graduate student in American Studies, whose research pertains to music. The inability of search engines to grasp the nuances of phrases that also carry other, more common meanings is a source of concern.

    Scholars — both graduate students and professors — who worked with non–English sources pointed to the urgent need for search systems to be more inclusive of, and sensitive to, the specificity of materials in various languages. Issues of concern included sensitivity to terminology, spelling, diacritics, as well as availability of sources. This point was affirmed by scholars working in a range of languages, including Hebrew, Greek, Sanskrit, Hindi, and German.

    For historians and literary scholars, rich metadata on archives would be invaluable. This metadata would cover both collection–level and item–level information.

    For scholars working in interdisciplinary areas such as American Studies and film studies, metadata and quality indicators on images and cultural objects such as pamphlets, posters, and film would be very helpful. A graduate student of American Studies described Google’s image search as especially inadequate, given its lack of information about image selection and its lack of options for organizing results. The student stated: “I teach a visual culture class at Emory and I use Google Image search a lot. I’m very frustrated with it. It is a lot less accurate than the regular Google Search. The only way you can break it down is into large, medium, and small resolution. I don’t know how to make it better.”

    For a graduate student in theology, the designation of records in terms of their secular or theological provenance respectively was important. Credibility of Web sites in theological matters was also stressed. Two professors of theology drew attention to the vast amount of dubious and unreliable material on the Web, which is often, however, presented in a very professional style to reinforce credibility.

    c) Quality as User Choice, Flexibility and Convenience. As the information above itself suggests, quality is constructed by users in terms of the system’s ability to provide them with choice, flexibility, and convenience. A professor of English pointed out, “When you are searching, you want the technology to disappear — I don’t want to think about the format or layout, just the results. I guess that’s the objective of all searching.” In consonance with the professor’s views, the efficacy of a system depends, in no small measure, upon the degree to which it is intuitive to use.

    The principle of choice in terms of range of indicators as well as design and layout needs to be respected in the development of QMSearch. This is affirmed by the fact that a wide variety of indicators are seen as useful but in different ways by different people. Similarly, different interfaces are liked and disliked by various individuals.

    One important finding that can be drawn from the data is that users want the ability to switch metrics or indicators midway through a search. For example, if users are searching based on dividing results by “format,” they may want to switch to organizing results by “source.” Or if they are retrieving results by content–query similarity, they may want to switch to retrieving results by popularity. Similarly, users want the ability to divide a set of results by a range of principles (whether full text or not, less than 3,000 words or more than 3,000 words, secular or theological in provenance, downloadable or not). The multiplicity of options represents the multiplicity of uses of the Web as well as the multiple ontologies that users perceive in the Web. Key among these is the fact that users treat the Web (and any digital tools and sources therein) as both a finding aid and a research resource. In other words, the Web is both the means to an end and the end itself. Users additionally switch between these modalities of usage. This plenitude of search and academic use offered by the Web in general needs to be incorporated as a principle of structure and design in the search system.

    Users wanted the assurance that any such action of switching metrics would not cause the previous search screen to be “permanently” lost. In other words, the search system should have the capability to preserve the “memory” of the user’s sequence of actions, and this property needs to be communicated clearly to users through the system. In this regard, we demonstrated the A9 system to users and also inquired whether text or icons would be more helpful in communicating this information (or communicating information in general). There is no clear consensus among users in terms of acceptance or rejection of either the A9 terminology or icons versus text. This is an issue that will have to be addressed through additional user studies, as well as in terms of the philosophy of design of any interface that is devised when the system is ready.
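    A minimal sketch of what such switching might look like internally is given below; the record fields, facet names, and the simple history stack are illustrative assumptions rather than the QMSearch implementation.

        from collections import defaultdict

        results = [
            {"title": "Narrative of the Life of Frederick Douglass",
             "format": "text", "domain": "book", "popularity": 87},
            {"title": "Civil rights photograph collection",
             "format": "static image", "domain": "archive", "popularity": 42},
            {"title": "Oral history interview, 1964",
             "format": "audio", "domain": "archive", "popularity": 13},
        ]

        history = []  # previously shown organizations, so switching is reversible

        def organize(records, by):
            """Group already-retrieved records into bins keyed by the chosen facet."""
            bins = defaultdict(list)
            for record in records:
                bins[record[by]].append(record["title"])
            return dict(bins)

        def switch_view(records, by):
            view = organize(records, by)
            history.append((by, view))  # preserve the "memory" of this view
            return view

        print(switch_view(results, "format"))
        print(switch_view(results, "domain"))
        print(len(history), "previous views retained")

    The essential point is that re–organization operates on the already–retrieved result set, and that each prior view is retained rather than discarded.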

    The search system should be dynamic and time–sensitive. Databases keep adding material, and objects are added every day to the worlds mapped by the search engine. The system should be timely and relevant.

    To conclude, the findings of this section may be distilled as follows. Metadata is seen as crucial for faceting the results display. This is a major “win” over the generic/Web metasearch scenario, in that scholars actually desire to make use of rich metadata. Metrics that involve some sort of value judgment in their constituent indicators are seen as useful, but must be balanced with alternatives and presented in a fully transparent way. Many of the findings boil down to the insight that scholars do not want to be told what is good, but rather want to be given transparent tools to apply their own nuanced and inquiry–specific notions of what “good” is. This capability goes far beyond the metasearch systems that are available today, and seems to counter the commercial–sphere wisdom that the search system should hold the user’s hand and “do all the thinking.” A wide variety of indicators are seen as useful, but in different ways by different people. Similarly, different interfaces are liked and disliked by various individuals. Users do desire the ability to manipulate how ranking and organization are done, seeing this in basically the same light as they currently see “advanced search.”

  5. Other Findings about Quality Metrics and Indicators

    Some other important findings may be noted here. In the opinion of scholars, the value of “popularity”–based ranking was limited. For example, one participant mentioned that he/she sometimes prefers AltaVista to Google because AltaVista is based purely upon text similarity.

    There was not much complaining about ads or spam. It is unknown whether this simply does not bother scholars or whether they are so used to the phenomenon that they do not think to mention it (it should also be noted, however, that we did not really bring up this issue).

    Scholars are very interested in the prospects of metasearch as applied to archival research. Many of them echoed a desire to integrate more archival information, even if it was not in the form of completely digitized records. As the views of one participant, a professor of English, suggest, it is in new areas of knowledge such as archives that digital initiatives can be very helpful. Terminology for metadata fields, quality indicators, and display facets can be very troublesome.

    Users want feedback about the availability of library items (e.g., full text or not, at a subscribing library or not, ability to print or not, etc.). Searchers want to see the simplest interfaces first, but still have the ability to move progressively to more advanced manipulations. Most scholars use both basic and advanced searches.

B. Ramifications of Findings for Design and Implementation

The above findings do seem to suggest that we are on the right track with regard to core design. That is, our model, in which quality indicators and metadata attributes are treated interchangeably and both are sourceable for ranking and presentation in the form of dimensions and bins, seems to be at least a major component of any successful solution (a brief sketch of this model, together with the “advanced edit” idea discussed in point 3 below, follows the list). However, the findings also suggest the following areas where modeling, design, and implementation need to be expanded:

  1. Relational information

    Relational information is inadequately integrated and interconnected in the current model. Links of “primary” records to collections, categories, ratings, and alternate versions can be established within the current framework, but only with a lot of a posteriori implementation legwork by the DL deployer. To make this process easier, infrastructural provisions would need to be made, likely including some modeling of link graphs in Lucene’s index layer and some exposure of this information to the query syntax.
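    As a rough illustration of the kind of relational information at issue (abstracting away from the index layer itself), typed links from primary records to related entities might be modeled as follows; the identifiers and link types are hypothetical.

        from collections import defaultdict

        link_graph = defaultdict(list)  # record id -> list of (link type, target id)

        def add_link(source_id, link_type, target_id):
            link_graph[source_id].append((link_type, target_id))

        add_link("record:douglass-narrative", "member_of", "collection:american-south")
        add_link("record:douglass-narrative", "alternate_version", "record:douglass-audio")
        add_link("record:douglass-narrative", "rated_by", "review:external-review-1")

        def related(record_id, link_type=None):
            """Return linked ids, optionally filtered by link type."""
            return [target for kind, target in link_graph[record_id]
                    if link_type is None or kind == link_type]

        print(related("record:douglass-narrative"))
        print(related("record:douglass-narrative", "alternate_version"))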

  2. Transparency

    Similarly, there is no methodical way in our framework to date to expose the inner workings of the ranking. Of course, the DL deployer can go as far as adding interface–level prose to do this, but once again this involves “extra” work on their part. We could begin to “mechanize” the transparency process with provisions such as “rendering” organization specifications based on some transformation template (much as the HTML output of search results is presently created). Thus, for any org spec the DL deployer can create, we can in theory provide an automated facility to “explain” it to users. However, the details of how to do this so that end users would actually understand the org specs would likely require further research with scholarly patrons.
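    A sketch of what such “mechanized” transparency could amount to is given below: an organization specification is rendered through a simple template into a prose explanation for the user. The spec fields, template wording, and indicator sources are illustrative assumptions.

        # Render an organization spec through a simple template so the system
        # can explain, in prose, how results are being organized.
        EXPLANATION_TEMPLATE = (
            "Results are grouped by {bin_by} and, within each group, ordered by "
            "{rank_by} ({direction}). The '{rank_by}' indicator is derived from: {source}."
        )

        def explain(org_spec, indicator_sources):
            return EXPLANATION_TEMPLATE.format(
                bin_by=org_spec["bin_by"],
                rank_by=org_spec["rank_by"],
                direction="highest first" if org_spec["descending"] else "lowest first",
                source=indicator_sources.get(org_spec["rank_by"], "unspecified"),
            )

        org_spec = {"bin_by": "domain", "rank_by": "citations", "descending": True}
        sources = {"citations": "citation counts harvested from partner repositories"}
        print(explain(org_spec, sources))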

  3. Manipulation

    Our initial design placed the emphasis for building organization specs on the DL deployer. End users would then be set up with one or more of these specs as the defaults for their “search profile,” with the ability to switch between them. However, it became clear in the course of the user studies that scholars wanted lower–level manipulation of results presentation. We now believe that something akin to “advanced search,” but which applies to all aspects of organization (not just filtering), is needed. The natural solution is therefore some sort of “advanced edit” for org specs, where the user can alter any aspect of the org spec or create an entirely new one from scratch. Along these lines, we are currently investigating the integration of a generic, schema–driven “metadata editor” into the editing of organization specs. Even if such an editor proves useful, however, a new problem is created of transforming the output of searches based on these ad hoc organization specs, as the presentation–layer stylesheets are currently static and bound to particular org specs.
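    The sketch below ties together the dimensions–and–bins model referred to at the start of this section with the “advanced edit” idea: an organization spec is treated as plain data, so a user–level edit amounts to altering the spec and re–applying it to the same result set. The spec shape and field names are illustrative assumptions, not the actual QMSearch format.

        from collections import defaultdict

        records = [
            {"title": "Article A", "domain": "journal", "citations": 12, "full_text": True},
            {"title": "Book B", "domain": "book", "citations": 40, "full_text": False},
            {"title": "Letter C", "domain": "archive", "citations": 0, "full_text": True},
        ]

        def apply_org_spec(recs, spec):
            """Bin records by one attribute and rank each bin by another."""
            bins = defaultdict(list)
            for r in recs:
                bins[r[spec["bin_by"]]].append(r)
            for contents in bins.values():
                contents.sort(key=lambda r: r[spec["rank_by"]], reverse=spec["descending"])
            return {b: [r["title"] for r in c] for b, c in bins.items()}

        # Deployer-supplied default spec ...
        org_spec = {"bin_by": "domain", "rank_by": "citations", "descending": True}
        print(apply_org_spec(records, org_spec))

        # ... and a user-level "advanced edit": change the binning dimension to a
        # fitness indicator and re-apply, with no new query needed.
        edited_spec = dict(org_spec, bin_by="full_text")
        print(apply_org_spec(records, edited_spec))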

Some of these issues have been explored through further development and focus groups conducted in Spring 2006, as explicated in the next section.

 

++++++++++

Further Research Development and Focus Groups, Spring 2006

This section presents information about the second round of the Quality Metrics focus groups, as well as research development undertaken from January through May 2006, corresponding to the academic Spring semester of 2006. As described above, discussions in the nine focus groups held in Fall 2005 were conducted with reference to a hypothetical quality metrics–based search engine. These focus groups were “focused” around mock–ups of such a prospective system, represented in Figures 1 to 3 above.

Based on our initial assumptions and revised designs after these focus groups, the QM technology team, specifically Aaron Krowne and Urvashi Gadi, constructed a working prototype of a quality metrics search engine, called QMSearch. To provide a basic check on the fitness of this system to task, as well as validation of our theoretical framework and understanding of the previous focus groups, we felt it prudent to conduct further focus groups utilizing the working prototype system.

Such a plan allowed us to capture a different kind of feedback, including direct and indirect interaction with the type of system we ultimately desire to provide. While the mock–ups of the initial focus groups were meant to explain our assumptions and spur discussion, sessions using the working system allow more concrete feedback and direct observation of usage. In user studies, this kind of observation almost always leads to more finely–tuned real–world systems, helping to avoid the situation of a “good idea” that undermines its own social impact through poor execution. One should never underestimate the ability of software engineers to fail to model their target user community’s needs and behaviors on an a priori basis.

To provide such a capstone analysis with our remaining project resources, we decided to hold three or four additional focus groups (ultimately settling on three due to constraints) in the March–May 2006 timeframe. Two of these sessions utilized a QMSearch instance built on our humanities collection (the American South digital library), and one was based on an instance utilizing computer science content (the CITIDEL collection, from Virginia Tech). The prototype systems can be viewed at the following URLs: for American South content, see http://metacluster.library.emory.edu/quality_metrics/ and for CITIDEL content, http://metacluster.library.emory.edu/quality_metrics/citidel/. We drew upon faculty, graduate students, and library staff for the focus groups. Additionally, all the library staff were recent or earlier graduate students from different departments at Emory. The profiles of the focus group attendees are indicated in Table 3 below.

 

Table 3: Composition of participants in each focus group, Spring 2006
Focus group 1 (23 March 2006; 3 participants):
    1. Full–time library staff (ex–graduate student); Humanities; English, Digital Scholarship
    2. Graduate student; Humanities; History
    3. Faculty; Humanities; English

Focus group 2 (24 March 2006; 2 participants):
    1. Graduate student; Humanities; Interdisciplinary Studies
    2. Full–time library staff (ex–graduate student); Humanities; Theology, Digital Scholarship

Focus group 3 (2 May 2006; 4 participants):
    1. Faculty; Sciences; Computer Science/Programming Languages
    2. Faculty; Sciences; Computer Science/Computational Neuroscience
    3. Library staff (recent graduate student); Sciences; Computer Science
    4. Faculty; Sciences; Computer Science

(Each participant entry lists research level; meta–disciplinary orientation; and discipline and area of specialization.)

 

The results were positive for our working prototype and for our model and approach in general, though, with the initial challenges somewhat met, prospective challenges were repeatedly noted. In general, as with the earlier focus groups, the fundamental idea and objective of the Project was strongly affirmed by participants. These findings may be itemized as follows:

  • Users in general seemed to consider the QMSearch system workable and useful, and many of them were even excited about it. Many users voiced their particular “favorite” display of our four demo views, e.g., the “M9” column–based display faceted on source repository, or the “Super–Google” column–based display giving alternative rankings of the same set of resources.

  • Users continued to be interested in having additional details of what the metrics and indicators meant, and how they were constituted. This goes as far as wanting to know the chronological coverage of auxiliary data streams (such as logs from which “views” indicators are derived). This once again emphasizes a strong point of distinction between the academic search niche and the general Web niche.

  • We continue to receive support for having many different types of indicators. Not only do indicators clearly vary from user category to user category, but also user to user, and perhaps surprisingly, from inquiry to inquiry for a given user. Many focus group participants explicitly recognized all of these potential “value profiles” and appreciated the provision of having a diverse pool of indicators to support them. There was almost no proposed or demonstrated indicator that someone didn’t like.

  • Users were excited about the ability to configure metrics, indicators, and elements of “facetization” of the display. We now even more strongly believe that QMSearch addresses a simmering frustration with the static nature of most search portals. A general release of our system could thus find brisk uptake because of this unmet need.

  • Filtering, including that based on credibility indicators, continues to be emphasized in our focus groups.

  • Users in both the humanities and sciences grasped the idea behind the prototype system fairly intuitively, although some specific features and options had to be clarified. The pragmatic value of the new conceptualization of metasearch underlying the project and embodied in the system — outlined earlier in the paper — was accordingly affirmed.

  • Users appeared to be able to correlate the different “value profiles” or demo views with specific kinds of research and academic exercises contingent upon discipline and subtopic of research. Some of the library staff, all of whom are also ex–graduate students, also correlated the significance of the profiles to specific research–/scholarship–oriented exercises in their library work. Flexibility thus appears to be a desirable quality metric that the system should exemplify.

Challenges and tempering considerations were as follows:

  • There was some doubt about whether such a system could be a “first stop” inquiry destination due to the universal scope of outlets like Google, Google Scholar, JSTOR, or CiteSeer.

  • “Advanced search” and full configurability of displays (including metrics), as desired by users, implies a novel editing and display–generating infrastructure, which could itself be a fully–fledged research topic to develop fruitfully.

  • Exposure to various features of the system will need to be carefully organized, so as not to present too steep a learning curve. A high desire for customizability must be balanced with the need not to confuse users, both in terms of available manipulations and in terms of information overload in results records.

  • Interface design and functionality must address how to “locate” or “position” the QMSearch system in relation to existing systems like Google, Google Scholar, JSTOR, etc. The system should be similar and familiar enough that users feel comfortable trying it out, yet its differences, specificities, and advantages should also be clearly marked so that users are encouraged to use it.

  • One question raised by user feedback is whether discipline–specific views or “profiles” would be helpful to specific academic communities. This feature could be provided at the deployer level, to customize the system in terms of discipline–relevant profiles. In other words, along with customizability of indicators and specific metrics, customizability of profiles and the ability to add new profiles or views could itself be part of the system offerings.

  • One additional issue to consider, although not discussed in the focus groups, concerns metrics and indicators for non–English academic content. This would necessitate building into the system the ability to display characters in a range of languages, for example, Hungarian, Arabic, Urdu, Bengali, Sanskrit, or Hebrew, as well as to obtain, “read,” and display metadata from these various sources. This could be a topic for separate research and investigation.

Other interesting findings:

  • A number of surprising contrarian or meta indicators came out of these sessions: for example, physical/digital availability, the presence or absence of a publication or author as a credibility metric, or inverse popularity as a means of revealing novel items.

  • QMSearch could add a lot of value to image search by not only bringing metadata into the image search market, but also by turning useful informal indicators like pixel density (DPI) and overall image size into formal indicators for use in ranking and filtering.
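    As a rough sketch of this last point (with an assumed normalization ceiling and hypothetical field names), raw pixel dimensions could be turned into a normalized indicator that a ranker or filter can consume:

        MEGAPIXEL_CEILING = 12.0  # assumed ceiling: at or above this counts as "full quality"

        def image_size_indicator(width_px, height_px):
            """Normalize raw pixel dimensions into a 0-1 indicator."""
            megapixels = (width_px * height_px) / 1_000_000
            return min(megapixels / MEGAPIXEL_CEILING, 1.0)

        images = [
            {"title": "Poster scan", "width": 4000, "height": 3000},
            {"title": "Thumbnail", "width": 320, "height": 240},
        ]

        for img in images:
            img["size_indicator"] = image_size_indicator(img["width"], img["height"])

        # Rank by the derived indicator, or filter out anything below a threshold.
        ranked = sorted(images, key=lambda i: i["size_indicator"], reverse=True)
        print([(i["title"], round(i["size_indicator"], 2)) for i in ranked])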

To conclude, the work we have done to date can be considered a success: in terms of discovering novel things about user needs in metasearch in general and in the niche of scholarly search in particular; in terms of building a useful working prototype that could be extended to production; and in terms of pointing the direction for future research to refine these findings and build upon them.

We have also gathered a significant amount of rich empirical data through the focus groups, which can be revisited to explore more specific and detailed issues pertaining to quality metrics, metadata, or even, more broadly, academic views on search technology. Our experience also confirms the viability of the focus groups methodology for a project of this nature, and suggests that the methodology can be successfully utilized for further projects undertaken within the digital library field. End of article

 

About the authors

Rohit Chopra is Visiting Assistant Professor at the Graduate Institute of the Liberal Arts at Emory University in Atlanta. His PhD dissertation research project examines the relationship between technology and nationalism in India from 1750 until the present, with a focus on expressions of nationalism among Indian communities on the Internet. Related research interests include the impact of new technologies such as the Internet on South Asian identities and socialities, the history of technology in colonial and postcolonial India, and nationalism and identity politics in contemporary South Asia. He has published articles in the journals New Media and Society and Cultural Studies as well as book chapters in edited collections. Rohit is also Web consultant for, and manages, several digital scholarship projects at the Emory Law School, including the Islam and Human Rights Fellowship Program (http://www.law.emory.edu/IHR) and the Future of Shari’a Project (http://www.law.emory.edu/fs). In 2005–2006, he was MetaScholar Fellow in the Digital Programs and Systems Division of the Woodruff Library at Emory University, in which capacity he worked as lead investigator of the Focus Groups component of the Quality Metrics Project.

Prior to joining Emory, Rohit worked as an information architect in the Indian Internet industry. As Web writer in the Web solutions division at rediff.com, India’s first and premier online portal, he was responsible for conceptualizing, developing content, and coordinating production for corporate Web sites and in–house content channels. He has also worked as a copy editor in an academic publishing house, Sage India, and as a freelance journalist for various print and online publications including Sunday and rediff.com.

Aaron Krowne holds an MS in Computer Science and a BS in Mathematics from Virginia Tech, where he was a research assistant at the digital library research lab (DLRL) working under Edward A. Fox on the CITIDEL (http://www.citidel.org/) digital library project. He currently is Head of Digital Library Research at Emory University’s Woodruff Library, in Atlanta, where he leads the research on grant projects in the MetaScholar initiative (http://www.metascholar.org/), including the Quality Metrics Project.

In 2001 Krowne co–founded the PlanetMath collaborative digital library and Web encyclopedia with a fellow student at Virginia Tech and an informal, worldwide group of mathematics students and enthusiasts. The site has since become one of the major educational digital library sites on the Web and continues as a non–profit organization to sustain and advance the PlanetMath community, free mathematical knowledge, and commons–based knowledge development itself.

 

Notes

1. Martin Halbert and Aaron Krowne, 2004. “Study of User Quality Metrics for Metasearch Retrieval Ranking,” IMLS NLG project proposal, at http://www.metascholar.org/quality_metrics/pdfs/Q-M-application.pdf, accessed 1 September 2005.

2. Metascholar: An Emory University Digital Library Research Initiative, at http://metascholar.org/, accessed 1 October 2005.

3. The investigators on the Emory team are Rohit Chopra, Aaron Krowne, Urvashi Gadi, and Katherine Skinner. The Emory team is also referred to in this paper as Emory investigators. Katherine Skinner and Rohit Chopra also presented information on the MetaScholar Initiative and findings from the first round of focus groups of the Quality Metrics Project at the pre–conference workshop, “Assessing the Use of Digital Resources,” on 15 February 2006 at the 2006 IMLS WebWise Conference, Inspiring Discovery: Unlocking Collections, held in Los Angeles. The presentation may be downloaded from http://www.getty.edu/about/institutional_research/downloads/QM_Webwise_06_final.pdf.

4. Halbert and Krowne, op. cit. See also the interim reports in the Metascholar Web site documentation section at http://www.metascholar.org/quality_metrics/documentation.html, accessed 31 May 2006.

5. The harvest-based metasearch model is discussed in more detail in Edward A. Fox, et al., 2003. “Harvesting: Broadening the Field of Distributed Information Retrieval,” Lecture Notes in Computer Science, volume 2924, Distributed Multimedia Information Retrieval: SIGIR 2003 Workshop on Distributed Information Retrieval (Toronto, Canada, 1 August 2003), Revised Selected and Invited Papers, pp. 1–20.

6. Halbert and Krowne, 2004, p. 4.

7. Officially designated as Focus Group Component, Study of User Quality Metrics for Metasearch Retrieval Ranking.

8. The exact composition of the focus groups and extent to which these objectives were achieved are described in the section of this paper entitled Demographics of Participants and Sample Composition.

9. Google at http://www.google.com, accessed September–December 2005; A9 at http://www.a9.com, accessed September–December 2005.

 

References

Edward A. Fox, Marcos A. Gonçalves, Ming Luo, Yuxin Chen, Aaron Krowne, Baoping Zhang, Kate McDevitt, Manuel Pérez–Quiñones, Ryan Richardson, and Lillian N. Cassel, 2003. “Harvesting: Broadening the Field of Distributed Information Retrieval,” Lecture Notes in Computer Science, volume 2924, Distributed Multimedia Information Retrieval: SIGIR 2003 Workshop on Distributed Information Retrieval (Toronto, Canada, 1 August 2003), Revised Selected and Invited Papers, pp. 1–20.

Martin Halbert and Aaron Krowne, 2004. “Study of User Quality Metrics for Metasearch Retrieval Ranking,” IMLS NLG project proposal, at http://www.metascholar.org/quality_metrics/pdfs/Q-M-application.pdf, accessed 1 September 2005.


Editorial history

Paper received 4 June 2006; accepted 20 July 2006.



Copyright ©2006, First Monday.

Copyright ©2006, Rohit Chopra and Aaron Krowne.

Disciplining Search/Searching Disciplines: Perspectives from Academic Communities on Metasearch Quality Indicators by Rohit Chopra and Aaron Krowne
First Monday, volume 11, number 8 (August 2006),
URL: http://firstmonday.org/issues/issue11_8/chopra/index.html




