Social question–answering services such as Yahoo! Answers (YA) are becoming highly prominent venues for online information seeking. While their immense popularity indicates their success, there is a need to measure their effectiveness and how satisfying the information they provide is to information seekers. To study these questions of effectiveness and user satisfaction, we collected a large amount of data from YA. To operationalize effectiveness, we considered the time elapsed between a question being asked and answered; to operationalize user satisfaction, we considered whether the asker chose an answer as satisfying. Using data mining, we show that the majority of the questions on YA get at least one answer within a few minutes; however, it takes longer to receive an answer that satisfies the asker. We also demonstrate that the sooner an answer appears for a question, the higher its chances of being selected as the best answer by the asker.
With emerging online information sources, information seeking behavior is changing. A variety of sources have emerged, including social or community question–answering (social Q&A) sites such as Yahoo! Answers (http://answers.yahoo.com/), AnswerBag (http://www.answerbag.com/), and WikiAnswers (http://wiki.answers.com/). A common and defining characteristic of these sites is that anyone can pose an information need on almost any topic as a question and receive answers from the community of users of that particular site (Harper, et al., 2008). In recent years, these services have become increasingly popular: Yahoo! Answers is reported to have more than 200 million users worldwide, with 15 million users visiting daily.
While social Q&A sites may be relatively new, asking questions of experts via an online form, known as digital referencing, has been around for a while. This area is considered an online or virtual version of traditional reference services, in which a patron receives information from reference librarians or other experts (Mon, 2000). Several works have compared such digital referencing or expert Q&A services with social Q&A in order to identify their respective pros and cons (e.g., Shah, et al., 2008).
Rather than comparing social Q&A with other forms of online Q&A services, in this paper we focus on understanding the effectiveness of, and user satisfaction with, a typical social Q&A service, since this is considered one of the core items on the research agenda in this area (Shah, et al., 2009). We chose Yahoo! Answers (YA) because of its scope and accessibility, and examine its effectiveness in terms of retrieving information and providing satisfactory answers. Hence, we address the following research questions:
- How quickly is a specific question answered?
- How quickly does a question receive an answer that satisfies the interrogator?
- What is the relationship between a satisfactory answer and its position (or rank) in a list of answers?
The remainder of the paper describes how we collected a large amount of data from YA and mined it to answer these research questions. We conclude with a few interpretations and implications of our findings.
To answer our research questions, we performed data mining on a large set of data retrieved from YA. This section describes our methodology for data collection, as well as a basic description of the data.
Collecting the data
We used YA’s Application Programming Interface (API, at http://developer.yahoo.com/answers/) to collect question and answer data. However, the API imposes a daily limit on requests, so collecting a large amount of data in a short period of time is not possible. We therefore ran our data collection processes for more than two years (between Fall 2007 and Fall 2009) to create a corpus of reasonable size. This resulted in a collection of over 3,000,000 questions with over 16,000,000 answers.
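Because the API capped the number of requests per day, collection had to be spread over many days. The sketch below illustrates the general pattern of draining a work queue under a fixed daily quota. It is an illustration under stated assumptions, not the collector used in this study: `fetch` is a hypothetical stand-in for a single API request, and the quota value is assumed, since the actual limit is not reported here.

```python
from collections import deque
from typing import Callable, Iterable, List


def collect_with_daily_quota(
    pending: Iterable[str],
    fetch: Callable[[str], dict],
    daily_limit: int,
) -> List[List[dict]]:
    """Fetch every pending item, grouping requests into days of at most daily_limit calls.

    `fetch` is a hypothetical placeholder for one API request; the real
    Yahoo! Answers API client and its quota are not reproduced here.
    """
    if daily_limit < 1:
        raise ValueError("daily_limit must be at least 1")
    queue = deque(pending)
    days: List[List[dict]] = []
    while queue:
        # One "day" of collection: stop as soon as the quota is spent.
        today: List[dict] = []
        while queue and len(today) < daily_limit:
            today.append(fetch(queue.popleft()))
        days.append(today)
        # A real collector would sleep here until the quota resets.
    return days


if __name__ == "__main__":
    batches = collect_with_daily_quota(
        [f"q{i}" for i in range(10)], lambda qid: {"id": qid}, daily_limit=4
    )
    print([len(day) for day in batches])  # [4, 4, 2]
```

A production version would also persist its queue between runs, so that an interrupted multi-year crawl can resume where it stopped.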
On YA, an interrogator can choose, from the set of answers received, the one that satisfies his or her information need. This selected answer is denoted the “best answer”, and once it is chosen, the question is considered “resolved”. We collected only resolved questions. In other words, the interrogators of these questions had already picked best answers from the answers they received, so no further answers could be posted to these questions. We chose this approach because we wanted to examine the selected answer for each question, and only resolved questions have one. This choice also creates a limitation in our approach to identifying effectiveness and user satisfaction: because we studied only resolved questions, we looked only at situations where at least one satisfactory answer was provided, which makes the service appear very effective. Given this methodology, situations in which no satisfactory answer was posted are beyond the scope of this work.
Description of data
Table 1 summarizes the data, giving the number of questions and answers in each of the 25 subject categories of YA (http://answers.yahoo.com/dir/index). We do not know the actual size of the full YA collection, which grows constantly. We do, however, believe that our data is a reasonable sample of it.
Table 1: Summary of Yahoo! Answers data.

| Category | Number of questions | Number of answers |
|---|---:|---:|
| Arts & Humanities | 155,034 | 748,097 |
| Beauty & Style | 155,625 | 922,294 |
| Business & Finance | 148,808 | 489,885 |
| Cars & Transportation | 152,045 | 584,804 |
| Computers & the Internet | 157,524 | 530,176 |
| Consumer Electronics | 151,241 | 476,891 |
| Dining Out | 98,599 | 597,079 |
| Education & Reference | 134,265 | 448,545 |
| Entertainment & Music | 130,064 | 1,207,108 |
| Environment | 90,235 | 556,815 |
| Family & Relationships | 136,315 | 887,881 |
| Food & Drink | 135,576 | 803,592 |
| Games & Recreation | 134,406 | 395,745 |
| Health | 115,402 | 423,612 |
| Home & Garden | 123,430 | 454,242 |
| Local Businesses | 97,664 | 250,407 |
| News & Events | 105,701 | 765,140 |
| Pets | 130,701 | 739,300 |
| Politics & Government | 133,020 | 1,026,122 |
| Pregnancy & Parenting | 128,634 | 969,644 |
| Science & Mathematics | 132,275 | 366,106 |
| Social Science | 114,450 | 500,394 |
| Society & Culture | 133,626 | 1,056,102 |
| Sports | 123,900 | 702,559 |
| Travel | 130,049 | 503,078 |
| Total | 3,248,589 | 16,405,618 |
We note that the category “Entertainment & Music” had the highest average number of answers per question (more than nine), followed by “Society & Culture” and “Politics & Government” (between seven and eight). Questions in these categories often seek opinions rather than a specific answer.
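The per-category averages quoted above are simple ratios of the two columns of Table 1. The short script below, with the counts copied from the table, recomputes them and checks the totals; the dictionary layout and function name are illustrative choices, not part of the original analysis.

```python
# Per-category (questions, answers) counts copied from Table 1.
TABLE_1 = {
    "Arts & Humanities": (155_034, 748_097),
    "Beauty & Style": (155_625, 922_294),
    "Business & Finance": (148_808, 489_885),
    "Cars & Transportation": (152_045, 584_804),
    "Computers & the Internet": (157_524, 530_176),
    "Consumer Electronics": (151_241, 476_891),
    "Dining Out": (98_599, 597_079),
    "Education & Reference": (134_265, 448_545),
    "Entertainment & Music": (130_064, 1_207_108),
    "Environment": (90_235, 556_815),
    "Family & Relationships": (136_315, 887_881),
    "Food & Drink": (135_576, 803_592),
    "Games & Recreation": (134_406, 395_745),
    "Health": (115_402, 423_612),
    "Home & Garden": (123_430, 454_242),
    "Local Businesses": (97_664, 250_407),
    "News & Events": (105_701, 765_140),
    "Pets": (130_701, 739_300),
    "Politics & Government": (133_020, 1_026_122),
    "Pregnancy & Parenting": (128_634, 969_644),
    "Science & Mathematics": (132_275, 366_106),
    "Social Science": (114_450, 500_394),
    "Society & Culture": (133_626, 1_056_102),
    "Sports": (123_900, 702_559),
    "Travel": (130_049, 503_078),
}


def answers_per_question(table):
    """Average number of answers per question in each category."""
    return {cat: answers / questions for cat, (questions, answers) in table.items()}


averages = answers_per_question(TABLE_1)
ranked = sorted(averages, key=averages.get, reverse=True)
total_q = sum(q for q, _ in TABLE_1.values())
total_a = sum(a for _, a in TABLE_1.values())

for cat in ranked[:3]:
    print(f"{cat}: {averages[cat]:.2f}")
print(f"Overall: {total_a / total_q:.2f} answers per question")
```

The three highest averages come out at about 9.3, 7.9 and 7.7 answers per question, and the overall average at about 5.05, in line with the figures given in the text.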
For each question, we collected its subject, content, category, the username of the interrogator, the number of answers, and the time when the question was posted. For each answer, we collected its content, the username of the individual providing it, its rating (if any), and the time when it was posted. Given that a rating is recorded only when the interrogator selects an answer, the presence of a rating also tells us which answer was chosen.
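The fields listed above map naturally onto two record types. The sketch below is a hypothetical in-memory representation (the class and method names are ours, not part of the YA API). It assumes, as stated in the text, that a rating of 0 means “no rating” and that only the answer selected by the interrogator carries a nonzero rating; best answers chosen by community vote may not follow this rule.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Answer:
    content: str
    username: str
    posted_at: datetime
    rating: int = 0  # 0 = no rating; assumed nonzero only for the selected answer


@dataclass
class Question:
    subject: str
    content: str
    category: str
    username: str
    posted_at: datetime
    answers: List[Answer] = field(default_factory=list)

    @property
    def num_answers(self) -> int:
        # Derived here from the stored answers rather than kept as a separate field.
        return len(self.answers)

    def best_answer(self) -> Optional[Answer]:
        """Return the answer the interrogator rated (i.e., selected), if any."""
        return next((a for a in self.answers if a.rating > 0), None)

    def is_resolved(self) -> bool:
        # Only questions with a selected answer were kept in the corpus.
        return self.best_answer() is not None
```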
This section presents our analysis of the data based on our three research questions.
1. How quickly is a specific question answered?
First, we looked at how many answers each question received. Figure 1 shows a scatter plot of the number of questions against the number of answers they received. As we can see, the majority of questions received fewer than 10 answers. In fact, based on our data (about 16,000,000 answers for about 3,000,000 questions), we can derive that, on average, each question earned five to six answers.
Figure 1: Scatter plot of the number of questions against the number of answers. Each data point (represented with a cross) indicates how many questions received a given number of answers.
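The data behind a plot like Figure 1 is a histogram: for each answer count k, the number of questions that received k answers. A minimal sketch, assuming the per-question answer counts are already available as a list of integers (the toy data below is invented for illustration and is not drawn from the YA corpus):

```python
from collections import Counter
from typing import Dict, Sequence


def answer_count_histogram(answer_counts: Sequence[int]) -> Dict[int, int]:
    """Map each answer count k to the number of questions that received k answers."""
    return dict(sorted(Counter(answer_counts).items()))


def mean_answers(answer_counts: Sequence[int]) -> float:
    """Average number of answers per question."""
    return sum(answer_counts) / len(answer_counts)


def share_below(answer_counts: Sequence[int], k: int = 10) -> float:
    """Fraction of questions that received fewer than k answers."""
    return sum(c < k for c in answer_counts) / len(answer_counts)


if __name__ == "__main__":
    counts = [1, 5, 5, 6, 3, 10]  # toy data, not from the YA corpus
    print(answer_count_histogram(counts))  # {1: 1, 3: 1, 5: 2, 6: 1, 10: 1}
    print(mean_answers(counts))  # 5.0
```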
Receiving five or six answers per question may seem encouraging, but a more important question is how soon one could receive at least one answer. This could, in a sense, indicate the effectiveness of YA. To answer this question, we computed the time lapse (in minutes) between the posting times of a question and its first answer. The results are summarized in Figure 2.
Figure 2: Time lapse between a given question and the first answer to be posted.
We can see that nearly 30 percent (965,867 out of 3,248,589) of the questions received their first answers in less than five minutes, and only about eight percent (259,120 out of 3,248,589) took longer than one hour to receive their first answers. In other words, more than 90 percent of the questions received at least one answer within an hour. Given that millions of questions are posted on YA, these statistics are indicative of the effectiveness of this system. Note that we did not collect unresolved questions from YA, so there may be a number of questions for which no satisfactory answer was ever received. We have observed that many such questions ask for opinions or advice rather than a solution or specific information; by their nature, such questions may never be resolved. This observation is consistent with Harper, et al. (2010), who report that only about 34 percent of questions on YA are factual.
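The time-lapse measurement and the five-minute and one-hour shares described above can be sketched as follows. This is an illustrative reconstruction, not the study's own code: it assumes posting times are available as `datetime` values, and the function names are hypothetical.

```python
from datetime import datetime
from typing import Dict, Iterable, Sequence


def lapse_minutes(asked: datetime, answered: datetime) -> float:
    """Minutes elapsed between posting a question and posting an answer."""
    return (answered - asked).total_seconds() / 60


def first_answer_lapse(asked: datetime, answer_times: Iterable[datetime]) -> float:
    """Lapse to the earliest answer; answers need not be stored in posting order."""
    return lapse_minutes(asked, min(answer_times))


def lapse_shares(lapses: Sequence[float], fast: float = 5, slow: float = 60) -> Dict[str, float]:
    """Share of questions first answered in under `fast` minutes, and after more than `slow`."""
    n = len(lapses)
    return {
        "under_fast": sum(m < fast for m in lapses) / n,
        "over_slow": sum(m > slow for m in lapses) / n,
    }


if __name__ == "__main__":
    asked = datetime(2008, 3, 1, 9, 0)
    answers = [datetime(2008, 3, 1, 9, 12), datetime(2008, 3, 1, 9, 3)]
    print(first_answer_lapse(asked, answers))  # 3.0
    print(lapse_shares([1, 3, 10, 45, 90]))  # {'under_fast': 0.4, 'over_slow': 0.2}
```

Applied to the counts reported above, 965,867 / 3,248,589 ≈ 0.297 and 259,120 / 3,248,589 ≈ 0.080.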
2. How quickly does a question receive an answer that satisfies the interrogator?
Interrogators select, from the set of answers they receive, the one that best fits their information needs. Liu, et al. (2008) regard this selection as an indication of user satisfaction. When selecting an answer, an interrogator can also rate it on a scale of 1 to 5. Figure 3 illustrates the rating distribution for answers in our dataset. A large portion of the answers have rating=0 (no rating) because they were not selected. Among the selected answers, most received a rating of 3 or higher, which Shah and Pomerantz (2010) regard as an indication of high quality answers. In other words, interrogators were not only satisfied with the selected answers; they also judged them to be of high quality.
Figure 3: Distribution of answer ratings.
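Counting answers per rating and splitting the rated answers at 3 (the two-category treatment described in note 4) can be sketched as below. The function names are hypothetical, and the sketch assumes, as in the text, that 0 stands for “no rating”; the toy ratings are invented.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def rating_distribution(ratings: Sequence[int]) -> Dict[int, int]:
    """Number of answers per rating value; 0 stands for 'no rating' (not selected)."""
    return dict(sorted(Counter(ratings).items()))


def split_by_quality(ratings: Sequence[int], threshold: int = 3) -> Tuple[int, int]:
    """Split rated answers into (high, low) counts, treating rating >= threshold as high."""
    rated = [r for r in ratings if r > 0]
    high = sum(r >= threshold for r in rated)
    return high, len(rated) - high


if __name__ == "__main__":
    toy = [0, 0, 0, 5, 4, 2, 3, 0, 1]  # toy ratings, not YA data
    print(rating_distribution(toy))  # {0: 4, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1}
    print(split_by_quality(toy))  # (3, 2)
```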
Let us consider these answers with respect to time. While receiving a quick response to a posted question may demonstrate the effectiveness of the system, it remains to be seen whether these answers satisfy the needs of a given interrogator. To analyze this, we looked at the time elapsed between the posting of a question and the posting of the answer that was eventually selected as the best answer. This is shown in Figure 4.
Figure 4: Time lapse between a given question and selected answer.
It is notable that about one–third of the best answers took more than an hour to appear. This contrasts with Figure 2: while first answers are often posted within a few minutes of a question being posted (Figure 2), the answers that satisfy an interrogator take longer to appear (Figure 4). The appearance of better answers after the first five minutes may be attributed to the fact that a later answerer can draw on earlier posted answers to produce a more refined or comprehensive answer. Thus, a delayed answer may have a higher likelihood of being selected as the best answer. The large number of best answers arriving after 60 minutes may also be related to cases in which an interrogator does not select an answer within a reasonable time and the community decides on the best answer by voting.
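The contrast between Figures 2 and 4 comes from measuring two lapses per question: one to the first answer and one to the answer eventually selected. A minimal sketch, assuming each answer is available as a hypothetical `(posted_at, is_best)` pair and that every question is resolved (so exactly one answer is marked best); the toy times in the test are invented.

```python
from datetime import datetime
from statistics import median
from typing import Dict, List, Sequence, Tuple

# One answer = (posted_at, is_best); each resolved question has exactly one best answer.
AnswerRow = Tuple[datetime, bool]


def lapse_minutes(asked: datetime, answered: datetime) -> float:
    """Minutes elapsed between posting a question and posting an answer."""
    return (answered - asked).total_seconds() / 60


def first_and_best_lapse(asked: datetime, answers: Sequence[AnswerRow]) -> Tuple[float, float]:
    """Return (minutes to first answer, minutes to the answer eventually selected as best)."""
    first = min(posted for posted, _ in answers)
    best = next(posted for posted, is_best in answers if is_best)
    return lapse_minutes(asked, first), lapse_minutes(asked, best)


def summarize(pairs: List[Tuple[float, float]], slow: float = 60) -> Dict[str, float]:
    """Median lapses and the share of best answers posted more than `slow` minutes in."""
    return {
        "median_first": median([f for f, _ in pairs]),
        "median_best": median([b for _, b in pairs]),
        "share_best_after_slow": sum(b > slow for _, b in pairs) / len(pairs),
    }
```

Reporting medians alongside the bucket shares is one reasonable choice here, since lapse distributions of this kind tend to have a long tail that would dominate a mean.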
3. What is the relationship between a satisfactory answer and its position (or rank) in a list of answers?
While answers to a given question may be posted by a variety of individuals, each answer can potentially be influenced by the answers already posted. Let us examine which answers, in a list of answers, were selected as the best.
Figure 5 illustrates the rank distribution for the best answers. We can see that the answers that appear at rank 1 (the first answer) for a given question had a greater chance of being selected as the best answer. We also see a tall peak at rank=6. On average, a question on YA receives five to six answers; hence, the last answer posted may often be the one chosen as the best answer. It should also be noted that once an interrogator selects an answer as the best, the question is considered resolved and closed. Since we collected only resolved questions and their answers, this also helps explain the peak at rank=6. We plan to explore this phenomenon in future research.
Figure 5: Rank distribution for best answers.
The finding that the higher an answer appears in the list, the better its chances of being selected as the best answer, was also noted by Shah and Pomerantz (2010).
Let us now examine the results displayed in Figures 2, 4, and 5 together. Figure 2 essentially tells us that the first answer to a question appears quickly; however, Figure 4 indicates that the eventually selected answer may not appear immediately. Figure 5 indicates that the best answers tend to appear higher in a list of answers. This can be explained by noting that there are many questions for which answers do not appear immediately (that is, not for more than an hour); when answers eventually appear, those at early ranks are selected as the best answers. Figure 6 illustrates the rank distribution for best answers that appeared more than 60 minutes after the question was posted. Relative to Figure 5, there is a gradual decrease in the number of best answers as we go down the ranks, without a spike at any particular rank. This may indicate that these questions are more difficult. On the other hand, there are many questions for which answers appear quickly (within 5–10 minutes), and at some point the interrogator selects one as the best (often at rank=6) and closes the question (Figure 5).
Figure 6: Rank distribution for best answers appearing after 60 minutes.
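Both rank distributions (Figures 5 and 6) can be produced by one routine: rank each question's answers by posting time, find the rank of the best answer, and optionally keep only best answers posted after a cutoff. A minimal sketch, assuming answers are hypothetical `(minutes_after_question, is_best)` pairs; setting `min_lapse=60` gives the Figure 6 view.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Sequence, Tuple

# One answer = (minutes after the question was posted, is_best).
AnswerRow = Tuple[float, bool]


def best_answer_position(answers: Sequence[AnswerRow]) -> Optional[Tuple[int, float]]:
    """Return (rank, lapse) of the best answer, ranking by posting time (rank 1 = first)."""
    ordered = sorted(answers, key=lambda row: row[0])
    for rank, (lapse, is_best) in enumerate(ordered, start=1):
        if is_best:
            return rank, lapse
    return None  # no best answer: question is unresolved


def rank_distribution(
    questions: Iterable[Sequence[AnswerRow]], min_lapse: Optional[float] = None
) -> Dict[int, int]:
    """Count best answers by rank; with min_lapse, keep only those posted after it."""
    counts: Counter = Counter()
    for answers in questions:
        found = best_answer_position(answers)
        if found is None:
            continue
        rank, lapse = found
        if min_lapse is None or lapse > min_lapse:
            counts[rank] += 1
    return dict(sorted(counts.items()))
```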
Combining these factors, one could even explain or evaluate question difficulty, and possibly improve content quality using the following guidelines:
- If a question has not been answered for over an hour, it could be difficult or poorly posed. Interrogators may be contacted in these cases to revise their questions.
- Interrogators could be informed about the average response time for questions akin to a given posted question. YA provides on–the–fly suggestions for similar questions already posted. This feature could be modified to inform interrogators about typical response times, encouraging patience and deliberation before they select the best answer.
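The two guidelines above could be prototyped with very little machinery. The sketch below is hypothetical throughout: the function names, the one-hour threshold parameter, and especially the use of the question’s category as a crude stand-in for “similar questions” are our assumptions, not YA features.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional, Sequence, Tuple


def needs_attention(
    posted_at: datetime,
    now: datetime,
    num_answers: int,
    threshold: timedelta = timedelta(hours=1),
) -> bool:
    """First guideline: flag a question still unanswered after `threshold`."""
    return num_answers == 0 and now - posted_at > threshold


def expected_first_answer_minutes(
    history: Sequence[Tuple[str, float]], category: str
) -> Optional[float]:
    """Second guideline, crudely: mean minutes to first answer for past questions
    in the same category (category stands in for 'similar questions' here)."""
    lapses = [minutes for cat, minutes in history if cat == category]
    return mean(lapses) if lapses else None
```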
Social or community Q&A services, such as Yahoo! Answers, are significant because a large and growing number of online information seekers use them. Individuals providing answers on these sites are, in a way, fulfilling the role of traditional reference librarians or field experts. It still remains to be seen how their answers compare in quality with those from traditional reference sources. However, there are ways to test the effectiveness of these services in terms of providing information seekers with satisfactory, if not high quality, information.
In this paper we addressed questions about effectiveness and user satisfaction by mining data collected from YA. Our analysis indicated that YA provides a very effective platform for posting a question and securing an answer quickly. However, a satisfactory answer may take longer, depending on the difficulty of the question. While we measured user satisfaction by examining the interaction of interrogators with YA, other approaches have been used. For instance, Kim, et al. (2007) and Kim and Oh (2009) examined comments by interrogators in order to understand perceived relevance and satisfaction. Finally, the methodologies used in this study provide some evidence for evaluating question difficulty and content quality.
About the author
Dr. Chirag Shah is an assistant professor in the School of Communication & Information (SC&I) at Rutgers University. He received his Ph.D. from the School of Information & Library Science (SILS) at the University of North Carolina at Chapel Hill. His research interests include various aspects of interactive information retrieval/seeking, especially in the context of online social networks and collaborations, contextual information mining, and applications of social media services for exploring critical socio–political issues.
E–mail: chirags [at] rutgers [dot] edu
1. http://yanswersblog.com/index.php/archives/2009/12/14/yahoo-answers-hits-200-million-visitors-worldwide/, accessed 4 February 2011.
2. http://yanswersblog.com/index.php/archives/2009/10/05/did-you-know/, accessed 4 February 2011.
3. There are many questions in these categories asking for opinions, generating thousands of “answers”. We did not include these opinion-seeking questions in our dataset since they were not “resolved”.
4. Dividing responses on a five–point rating scale into two categories (in this case, high quality and low quality) is often done in the literature on usability. See, for instance, White and Kelly (2006), and Liu and Belkin (2010). The rationale behind this approach is that a five–point scale may be appropriate for individuals to respond with, but too fine–grained for meaningful quantitative analysis.
F.M. Harper, J. Weinberg, J. Logie, and J.A. Konstan, 2010. “Question types in social Q&A sites,” First Monday, volume 15, number 7, at http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/view/2913/2571, accessed 4 February 2011.
F.M. Harper, D. Raban, S. Rafaeli, and J. Konstan, 2008. “Predictors of answer quality in online Q&A sites,” Proceedings of the 26th Annual SIGCHI Conference on Human Factors in Computing Systems, pp. 865–874.
S. Kim and S. Oh, 2009. “Users’ relevance criteria for evaluating answers in a social Q&A site,” Journal of the American Society for Information Science and Technology, volume 60, number 4, pp. 716–727. http://dx.doi.org/10.1002/asi.21026
S. Kim, J.S. Oh, and S. Oh, 2007. “Best–answer selection criteria in a social Q&A site from the user–oriented relevance perspective,” Proceedings of the American Society for Information Science and Technology, volume 44, number 1, pp. 1–15. http://dx.doi.org/10.1002/meet.1450440256
J. Liu and N.J. Belkin, 2010. “Personalizing information retrieval for multi–session tasks: The roles of task stage and task type,” SIGIR ’10: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 26–33.
Y. Liu, J. Bian, and E. Agichtein, 2008. “Predicting information seeker satisfaction in community question answering,” SIGIR ’08: Proceedings of the 31st International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 483–490.
L. Mon, 2000. “Digital reference service,” Government Information Quarterly, volume 17, number 3, pp. 309–318. http://dx.doi.org/10.1016/S0740-624X(00)00046-0
C. Shah and J. Pomerantz, 2010. “Evaluating and predicting answer quality in community QA,” SIGIR ’10: Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval.
C. Shah, S. Oh, and J.S. Oh, 2009. “Research agenda for social Q&A,” Library & Information Science Research, volume 31, number 4, pp. 205–209. http://dx.doi.org/10.1016/j.lisr.2009.07.006
C. Shah, J.S. Oh, and S. Oh, 2008. “Exploring characteristics and effects of user participation in online social Q&A sites,” First Monday, volume 13, number 9, at http://firstmonday.org/htbin/cgiwrap/bin/ojs/index.php/fm/article/viewArticle/2182/2028, accessed 4 February 2011.
R.W. White and D. Kelly, 2006. “A study on the effects of personalization and task information on implicit feedback performance,” CIKM ’06: Proceedings of the 15th ACM International Conference on Information and Knowledge Management.
Received 1 August 2010; revised 30 January 2011; accepted 2 February 2011.
“Measuring effectiveness and user satisfaction in Yahoo! Answers” by Chirag Shah is licensed under a Creative Commons Attribution–NonCommercial–NoDerivs 3.0 Unported License.
Effectiveness and user satisfaction in Yahoo! Answers
by Chirag Shah.
First Monday, Volume 16, Number 2 - 7 February 2011