People are seeking more meaningful and customized information than what is obtained by keyword-based queries and document retrieval through a search engine. In this paper, we look at a set of services that address this need, referred to as social Q&A sites. Using Google Answers and, primarily, Yahoo! Answers, we attempt to understand various characteristics of user participation and their possible effects on the design and success of a site. We discuss these social Q&A sites by comparing their designs based on user participation and point out the effects and defects of each. We show that active user participation is the core component of these sites. We further analyze rich data collected from more than 55,000 Yahoo! Answers user profiles to understand the nature of user participation and the quality of this participation. With our analysis we discover that the Yahoo! Answers model implicitly encourages users to make an active contribution. An important contribution of the work reported here is the framework with which various factors relating to user participation in social Q&A sites can be studied.
An increasing number of individuals are using various online sources for their information needs. Search engines often act as the gateways to this online information. While these have become some of the most essential tools for online information seeking, a number of other sources have emerged in recent years where people can seek information. Among these sources, question-answer sites are becoming increasingly popular, and their number of users is growing rapidly. These question-answer sites, also called answers sites or social Q&A sites, allow an individual to post questions that are typically answered by fellow users of a given site. Several studies have tried to understand search engine roles, behavior, and comparisons (Jansen and Spink, 2006; Vaughan and Thelwall, 2004), but there has not been a great deal of research about Q&A sites. A large portion of research in information science, and in particular, information retrieval, has focused on understanding relevance and retrieval in query-document models. However, there is very little work that attempts to understand social Q&A models. Consider, for instance, the search engine market share illustrated in Figure 1. Overall the results are not very surprising, with Google holding a large chunk of the market share, followed by Yahoo! and Microsoft. On the other hand, if we analyze the market share for Q&A sites, we see in Figure 2 that Yahoo! Answers has almost a monopoly with more than 96 percent of market share (Prescott, 2006).
Figure 1: U.S. core search engine shares as of September 2007.
Figure 2: Answers sites market share as of December 2006.
With social Q&A becoming more popular, it seems strange to see that Google, the leader in search engines, has almost no share at all in that domain. Indeed, Google had a service called Google Answers (see http://answers.google.com/answers/), but it failed and was discontinued. Several reasons have been proposed for this failure. In this paper we argue that one of the most important factors influencing the success of an answers site is user participation. More importantly, we recognize the need to explore the patterns of user participation in order to understand social Q&A sites. Just as a successful search engine needs wide coverage of the Web, good indexing and retrieval, and an effective ranking function, a successful social Q&A site requires highly active user participation. Keeping this in mind, here are some of the research questions that we are interested in investigating in this paper:
- How does user participation relate to the success of a social Q&A site?
- What constitutes participation, and how can it be measured?
- How can we measure user contribution to the community in a social Q&A site?
- How can we evaluate the quality of user participation and contribution?
We plan to address these questions by analyzing various user participation statistics related to Yahoo! Answers and Google Answers, the former one being a huge success, the latter a failure. We claim that user participation is the single most important factor behind the success or failure of a Q&A site and that any such service should be studied from the perspective of user participation. In turn, we provide a framework to facilitate such studies.
The rest of the paper is organized as follows. In the next section we present some background on social Q&A sites and their importance. The data collection process for our study is described in Section 3. These data, collected from Yahoo! Answers as well as Google Answers, are analyzed in Section 4 relative to the research questions noted earlier. Finally, we conclude the paper in Section 5 with some pointers to future work.
With the advent of the Internet, and the growing ease and convenience of seeking information online, the number of users exploring online sources for information has been growing rapidly. Going beyond specialized databases, Web pages, and digital libraries, these sources now also include relatively new media services such as Flickr (http://www.flickr.com) and YouTube (http://www.youtube.com), and formats such as blogs and wikis. The core strength of many of these sources is user participation, as users not only seek, but also create, compile, and share information. One such example is Wikipedia (http://www.wikipedia.com). In contrast to the traditional model of an encyclopedia, information in Wikipedia is created by collaboration among many users.
Figure 3: Information seeking using a search engine.
Figure 4: Information seeking using a social Q&A site.
In this paper we shall focus on another such recently emerged service, called social Q&A. Social Q&A is a Web-based service for information seeking by asking natural language questions to other users in a network. The search strategies that it employs are quite different from traditional online searching. In traditional online searching, people use search engines as a gateway to access the content of Web sites or Web pages (Figure 3). People send queries and receive the search results that respond to their queries from search engines. They often need to go through the results and select documents relevant to their information needs. In contrast, in the context of social Q&A, people obtain personalized answers responding to their individual questions. Individuals post questions to the system and receive answers offered by fellow users (Figure 4). Questions and answers are directly generated and dynamically updated by those users through their voluntary participation. Most social Q&A services are free and open to the public. The topics of the questions vary from personal problems to school projects and business problems. Since anyone is allowed to ask and answer questions, the levels of knowledge, expertise, and experience of the questioners and answerers vary.
The idea of asking questions to peers and sharing information in a social network over the Internet is not new. For years we have used online forums and Usenet groups to raise and answer questions or to have a discussion with our peers. Such forums have, however, stayed focused on specific domains without a user-centered Web service model. In contrast, social Q&A sites are broader in scope and are designed to provide a Web-based portal to their users.
The first official social Q&A service in the United States, AnswerBag, was launched in 2003. Since then, the number of social Q&A services open to the public has increased rapidly. A few studies have looked at the nature and use of social Q&A. To define different user roles in the context of social Q&A, Gazan (2006; 2007) introduced two user role models, one for questioners and one for answerers, based on studies of AnswerBag. He viewed social Q&A as an online community for filtering and developing collaborative knowledge. Gazan (2006) divided answerers into Specialists and Synthesists. Specialists are more like knowledge experts who provide answers without referencing other sources, while Synthesists are the ones who do not claim any expertise and provide answers with references. Gazan (2007) identified two roles of questioners, Seekers and Sloths, depending on whether they have continuous conversation/interaction with other members after posting questions. Seekers demonstrate active engagement with the community and pursue communication regarding their questions. Sloths do not pursue further interaction with community members after receiving answers to their questions. In these two studies, the roles of social Q&A users were defined based on the background of answerers and the communicative behaviors among answerers and questioners. We, on the other hand, identify two types of roles, contributors and consumers, based on their participation in posting questions and answers. The definitions of these two types and their behaviors in social Q&A are explained in Section 4.
In order to measure user participation behavior in social Q&A sites, we chose Yahoo! Answers as the primary case. Since its inception in December 2005, it has been growing incredibly fast. After one year, Yahoo! Answers announced that it had 60 million registered users worldwide, and 160 million responses in its answer collection (Business Wire, 2006). Currently, it is the most popular social Q&A service on the market (Prescott, 2006). Yahoo! Answers enables people to collaborate by sharing and distributing information among fellow users and making publicly available the entire process and products involved in asking and answering questions. Another feature of Yahoo! Answers is that it allows people to search accumulated questions and answers. It encourages users to participate in various activities not only by questioning and answering, but also by commenting on questions and answers, rating the quality of the answers, and voting for the best answers (Kim, et al., 2007).
As a secondary case to compare user participation behaviors, we selected Google Answers. Google Answers can be considered an expert Q&A system. Expert Q&A systems function similarly to social Q&A in that they allow people to ask questions to the systems, but differ in that the answers are provided by a group of experts (e.g., librarians; see, for example, Kenney, et al., 2003), rather than by fellow users. In addition, while social Q&A services are mostly free and open to the public, most expert systems charge fees for answers. Google Answers, launched in 2002, was a fee-based service in which people received answers only from answerers who had contracts with Google, called Google Answers Researchers, and paid for those answers. To control the quality of answers, Google allowed only a limited number of answerers, who were qualified experts in a given field, to provide answers (Bates, 2007). Yahoo! Answers and Google Answers coexisted for a time on the market. Soon, however, most fee-based Q&A services failed, and Google Answers was retired in November 2006.
One could argue that the failure of Google Answers was the result of its policy to control user participation. Google Answers had its own process for reviewing the qualification of answerers and allowed only a limited number of answerers to provide answers. Access for questioners was also limited to those willing to pay for information. On the other hand, Yahoo! Answers was able to succeed in the market in a very short time thanks to its policy of promoting user participation in a less-controlled and more participation-oriented environment.
This realization that active user participation is the most essential component of a social Q&A site was the primary driving force for this study. As we noted earlier, despite its immense significance in the digital information age, social Q&A is understudied and, to our knowledge, there has been no examination of social Q&A sites from the perspective of user participation.
In this section we present our methodology for collecting data from Yahoo! Answers and Google Answers. Since they present very different site structures, it was a challenge to collect and bring the data sets to a comparable level.
The data used in our study reported here was collected between October and December 2007. We note here that the only profile information that we collected was a set of statistics related to questions and answers, and no other personal information about users was collected or stored.
3.1 Yahoo! Answers
Fortunately, Yahoo! Answers has provided an Application Programming Interface (API; see http://developer.yahoo.com/answers/) to ease mining of its data. However, not everything that we needed could be obtained by merely using the APIs. Moreover, the Yahoo! Answers APIs are designed around questions, and not users. We, therefore, had to adopt a two-tier approach, the first tier securing questions and answers, the second tier accessing user profiles by using user handles. The whole process is summarized below:
1. Query Yahoo! using its APIs, requesting questions that match certain categories. Yahoo! Answers has 25 categories in which questions can be posted. We used all of the categories in round-robin fashion while querying for questions to keep the data collection across the categories as homogeneous as possible. By default, the API service returns the most recent questions posted. In addition, we obtained several old questions from 2005. We decided to collect only those questions that were resolved, which means that no more answers were being posted for them.
2. Using the question ID, query Yahoo! again to obtain more details about a given question, including all answers, comments, and user IDs.
3. Obtain and store answers as well as comments along with their attributes, including user IDs.
4. Use the user IDs collected in the previous steps to construct URLs for user profile pages.
5. Obtain user profile pages and extract various attributes, including the number of questions asked and answered, the number of those answers chosen as the best, and points earned.
It should be noted that the Yahoo! APIs have a limit of 5,000 calls per 24 hours from a single IP address. In addition, Yahoo! Answers kept blocking us every few minutes while we crawled profile pages. This slowed down our data collection process. Within this framework and these restrictions, we collected more than 250,000 questions with over two million answers during the last three months of 2007. From this sample, we extracted as many user IDs as we could and collected statistics from their corresponding profile pages. At the time of writing this paper, we had crawled more than 55,000 user profiles from Yahoo! Answers.
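As a rough illustration of the round-robin querying described above, the following sketch plans one day of API calls across 25 categories under the stated limit of 5,000 calls. It is only a sketch under assumptions: the category labels and the function name `plan_daily_calls` are hypothetical, and no actual Yahoo! API call is made.

```python
from itertools import cycle, islice

# Per-IP limit of API calls per 24 hours, as stated in the text.
DAILY_CALL_LIMIT = 5000


def plan_daily_calls(categories, calls_per_day=DAILY_CALL_LIMIT):
    """Return the category to query on each call of one day, in round-robin order."""
    if not categories:
        return []
    return list(islice(cycle(categories), calls_per_day))


# Hypothetical labels standing in for the 25 Yahoo! Answers categories.
categories = [f"category_{i:02d}" for i in range(1, 26)]

plan = plan_daily_calls(categories)
calls_per_category = {c: plan.count(c) for c in categories}
```

Cycling through the categories in a fixed order gives every category the same share of the daily budget, which is one simple way to keep the sample homogeneous across categories.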
3.2 Google Answers
It was particularly difficult for us to obtain data from Google Answers, primarily due to the lack of any API-like interface. In addition, Google Answers did not feature elaborate user profiles like those of Yahoo! Answers. Given these constraints, we used the following procedure to collect data from Google Answers:
1. We manually collected the top-level URLs of all categories (10) and subcategories.
2. These URLs were fed to our crawler, which then fetched all pages to capture all questions for a given subcategory.
3. From each question, we extracted information about the asker and answerer.
4. The username of the answerer was hyperlinked to his or her page, where we could record the number of answers and average rating for that user, among other information.
5. We could not locate information about the number of questions asked per individual. We had to calculate this based on the whole collection.
Google Answers neither hindered our crawler nor tried to block our site. This made it possible for us to capture the entire user population of Google Answers (83,846 total users).
From the rich data that we collected both from Yahoo! Answers and Google Answers, as described in the previous section, there are several interesting observations and analyses that can be performed. Our analysis in this section, however, will focus on addressing the research questions presented in Section 1.
4.1 Comparison of Yahoo! Answers and Google Answers
As noted earlier, Yahoo! Answers and Google Answers can be considered as two extreme ends of Q&A sites, one immensely successful and the other now retired. Interestingly, Yahoo! Answers and Google Answers represent two different models for Q&A sites. On Google Answers, a user with a question in mind could post it to the site, specifying the fee they were willing to pay for a satisfactory answer. To ensure well-researched answers, Google preselected a group, called ‘Google Answers Researchers,’ who would answer questions. There were around 500 Google Answers Researchers who had been approved by (and contracted with) Google. Yahoo! Answers, on the other hand, took the opposite direction. The service is entirely based on free, open participation of the community of users. Any registered user can post a question, and also provide answers to any questions other users have posted. Instead of fees, Yahoo! Answers introduced a point system to acknowledge and encourage participation. This open and free participation model is, in fact, what defines social software. In that sense, we posit that Google Answers was a more traditional expert-based model while Yahoo! Answers represents a social Q&A model.
These distinctive models for Yahoo! Answers and Google Answers and their different fates raise intriguing questions. What factors affected their respective success and failure? How and to what extent do these models trigger different outcomes? In this section, we try to answer these questions by comparing and contrasting basic statistics about questioning and answering that in turn reflect user participation. Our basic premise here is that the different models for each site caused different patterns of user participation, which in turn affected the overall success of the sites. Although we understand that multiple interacting factors are involved, and while we cannot explain the end result of success or failure solely in terms of the user participation, we believe that it is a key factor.
In this section, several usage or participation statistics of these sites will be examined, including the total number of questions/answers and the distribution of users by the number of questions/answers they were involved in. In order to examine user participation in further detail and see whether distinctive patterns of participation can be found in these different Q&A sites, we define two types of fundamental roles that users play in these sites:
- Consumers. They ask questions and consume information (answers).
- Contributors. They provide information (answers) to questions.
In Google Answers, there were a limited number of contributors (‘researchers’ in Google’s terminology) preselected for answering questions. Therefore, the number of contributors was predetermined by Google’s design. More importantly, there was a clear distinction between contributors and consumers. In Yahoo! Answers, any registered user can be both consumer and contributor. For the users of Yahoo! Answers, therefore, the two roles are not fixed, and each individual’s participation in either role can vary.
Tables 1 and 2 describe the total number of questions/answers and the number of users associated with the questions/answers in the Google Answers dataset and the Yahoo! Answers dataset, respectively. Note that our Google dataset includes the entire population of questions/answers for the duration of the service.
Table 1: Questions/answers associated with the users of Google Answers.
Note: Total users=83,846.
| Type | Number | Users | Rate per user | Mode (most frequent posts by a single user) | Standard deviation |
| --- | --- | --- | --- | --- | --- |
| Questions | 153,343 | 83,454 | 1.84 | 1 | 5.25 |
| Answers | 56,835 | 534 | 106.43 | 0 | 25.40 |
Table 2: Questions/answers associated with the users of Yahoo! Answers.
Note: Total users=55,005.
| Type | Number | Users | Rate per user | Mode (most frequent posts by a single user) | Standard deviation |
| --- | --- | --- | --- | --- | --- |
| Questions | 2,342,263 | 49,854 | 46.98 | 0 | 115.00 |
| Answers | 44,173,175 | 52,937 | 834.45 | 0 | 1,919.89 |
Noticeably, in Google Answers, the number of consumers (users who asked questions) was more than a hundred times larger than the number of contributors. This huge imbalance certainly had consequences. As can be seen in Table 1, of the 153,343 questions in total, 96,508 (approximately 63 percent) were not answered. This result is not particularly surprising, considering the small number of contributors, but it clearly suggests one reason why Google Answers failed.
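The figures quoted above can be re-derived from Table 1 with simple arithmetic. The short check below is only illustrative; it assumes that the count of unanswered questions is the difference between the numbers of questions and answers, which holds if each answered question received exactly one answer (our reading, not something the table states).

```python
# Values copied from Table 1 (Google Answers).
total_questions = 153_343
total_answers = 56_835
askers = 83_454      # users who posted at least one question
answerers = 534      # users who posted at least one answer

# Assumption: one answer per answered question.
unanswered = total_questions - total_answers
unanswered_share = unanswered / total_questions
askers_per_answerer = askers / answerers

print(unanswered)                        # 96508
print(round(100 * unanswered_share, 1))  # 62.9
print(round(askers_per_answerer))        # 156
```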
On the other hand, the Yahoo! Answers data reveals quite a different picture. The numbers of contributors and consumers were well balanced, and the questions on average have a reasonable number of answers. It should be noted that since we collected only resolved questions from Yahoo! Answers, each question in our dataset has at least one answer. There might well be some portion of questions left unanswered on the Yahoo! Answers site. However, the average number of answers per question is fairly high, suggesting that, overall, the community is responsive to questions posted on the site.
In Figure 5, we plot the users of Yahoo! as well as Google relative to the number of questions posted and the number of answers provided. It is not surprising to see a small subset of users on Google Answers that provided answers in response to many questions. Yahoo! Answers, on the other hand, demonstrates a much more balanced relationship between the number of questions and the number of answers. We can further break down these populations and analyze the users for their specific behavior as consumers and contributors. Tables 3 and 4 present data from both of the sites with four groups: (1) users who posted no question and no answer; (2) users who posted no question, but posted at least one answer; (3) users who posted at least one question, but no answer; and, (4) users who posted at least one question and at least one answer.
Figure 5: Yahoo! Answers and Google Answers users showing their behavior as consumers and contributors.
Table 3: Google Answers users participation summary.
Note: Sample size=83,846.
| | Answers = 0 | Answers ≠ 0 |
| --- | --- | --- |
| Questions = 0 | 0 (0.00%) | 392 (0.47%) |
| Questions ≠ 0 | 83,312 (99.35%) | 142 (0.17%) |
Table 4: Yahoo! Answers users participation summary.
Note: Sample size=55,005.
| | Answers = 0 | Answers ≠ 0 |
| --- | --- | --- |
| Questions = 0 | 3 (0.01%) | 5,148 (9.36%) |
| Questions ≠ 0 | 2,065 (3.75%) | 47,789 (86.88%) |
Looking at the above tables, we can clearly see that in the case of Yahoo! Answers, the majority of the population (about 87 percent) participates actively by posting questions as well as answers. Fewer than four percent are only consumers. In the case of Google Answers, the majority of the population, about 99 percent, acted only as consumers.
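The four-way grouping behind Tables 3 and 4 can be sketched as a small classification over per-user counts. This is a minimal sketch, not the code used in the study; the group labels and function names are hypothetical, and the example input simply reproduces the cell counts of Table 4.

```python
from collections import Counter


def participation_group(questions, answers):
    """Assign a user to one of the four groups from question/answer counts."""
    if questions == 0 and answers == 0:
        return "inactive"
    if questions == 0:
        return "contributor only"
    if answers == 0:
        return "consumer only"
    return "consumer and contributor"


def summarize(users):
    """Map each group to (count, percentage of all users)."""
    counts = Counter(participation_group(q, a) for q, a in users)
    total = sum(counts.values())
    return {g: (n, round(100 * n / total, 2)) for g, n in counts.items()}


# Synthetic (questions, answers) pairs matching the cell counts of Table 4.
yahoo_users = (
    [(0, 0)] * 3 + [(0, 1)] * 5148 + [(1, 0)] * 2065 + [(1, 1)] * 47789
)
summary = summarize(yahoo_users)
```

Applied to the Google Answers counts in Table 3, the same function would place nearly all users in the "consumer only" group.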
The differences in the patterns of user participation between Google Answers and Yahoo! Answers appear even more clearly in the distribution of users based on the number of questions/answers posted. As shown in Figures 6 and 7, while the majority of Google consumers posted only a few questions, a large portion of Yahoo! consumers use the service recurrently. One could conclude that the majority of users on Google Answers probably never came back to the site after posting one or two questions. This could be attributed to users’ needs or their dissatisfaction with the service.
Overall, it appears that Yahoo! Answers has developed a responsive community in which users voluntarily participate as both consumers and contributors. In comparison, Google Answers featured many one-time consumers and a small number of contributors who could cover only about one-third of the questions. Based on these observations, we suppose that Google’s approach of controlling the quality of answers, by not allowing users other than preapproved ‘researchers’ to answer queries, led to the failure of the service. The open participatory model of Yahoo! Answers, on the other hand, appears to be successful, with a strong community in place. In the next section, we will examine this participatory model and analyze the participation patterns of Yahoo! Answers users.
4.2 Measuring user participation in Yahoo! Answers
Users are the backbone of social Q&A sites such as Yahoo! Answers. Their active participation is absolutely essential to a given site’s success and survival. In the previous subsection we saw how the constraints that Google Answers placed on user participation contributed to its demise. Here, we explore further intrinsic details of user participation. Since Google Answers restricted user participation and does not provide the kind of rich data that we need for analysis, the rest of the work reported here is based solely on Yahoo! Answers data.
Figure 6: User distribution for questions in Yahoo! Answers and Google Answers.
Figure 7: User distribution for answers in Yahoo! Answers and Google Answers.
There are several ways of participating in a typical social Q&A site. One can visit a given site and view content, ask or answer a question, vote for an answer, or leave a comment. However, by the very definition of social Q&A, the two primary activities are questioning and answering. Considering this, we shall look at the patterns of questions and answers for Yahoo! Answers users. In addition, we will study these patterns by grouping users by their level of maturity. This is important because not all users exhibit the same capacity or level of contribution; some are much more active than others. An infrequent visitor or a novice user is likely to have a lower level of participation than those who often visit and contribute to a site. It should be noted that the maturity of a user and his or her participation are likely to be highly correlated.
This level of maturity can be measured in many ways. Yahoo! Answers has its own definition of user levels. To encourage participation and reward great answers, Yahoo! Answers has a system of points and levels. For about a dozen kinds of different actions on the Yahoo! Answers site, a user earns or spends certain numbers of points. The accumulation of these points determines one’s level. There are seven levels, with 1 being the lowest and 7 being the highest. The details of these levels and points can be found at http://answers.yahoo.com/info/scoring_system.php.
Figure 8: User distribution across levels in Yahoo! Answers.
From the data that we have collected, the distribution of the users across the levels is illustrated above in Figure 8. While we do not claim that this distribution of samples is a completely faithful representation of the entire population of Yahoo! Answers users, it is obvious that the lower levels have a greater number of users than the higher levels.
Figure 9: Distribution of users in different levels according to the number of questions, number of answers, and the points in Yahoo! Answers.
As illustrated in Figure 9, users in higher levels are much higher on the ‘Points’ axis compared to those in lower levels. This correlation follows directly from the definition of levels. What is more interesting to note is that there are other patterns as well. Users in higher levels seem to be answering many questions, but not necessarily posing that many more questions, as compared to those at lower levels. We will examine this issue later in this section.
Let us now look at user participation in terms of consumer and contributor behavior, as defined previously, at each level. Users at higher levels exhibit greater participation by contributing more than those at lower levels. This pattern can be inferred from the very definition of levels in Yahoo! Answers. A user achieves a given level based on points earned. While there are several factors that lead to earning or losing points, providing answers and having those answers selected as the best are the two major factors that lead to the acquisition of points. In other words, in spite of a complex formula for determining levels of user participation, the single factor that best reflects these levels is the number of answers contributed by a given user.
A plot of users in levels 1, 4, and 7 is depicted in Figure 10. We can clearly see from this plot that the users at lower levels are basically consumers rather than contributors. A few outliers stand out as stronger consumers or contributors. Yahoo! Answers identifies top contributors in each of its 25 categories based on the number of questions a user has answered in that category. In our collection of 55,005 users, we had 1,677 top contributors.
Figure 10: Amount of user participation across different levels in Yahoo! Answers.
The absolute numbers plotted in Figure 10 may give an idea of user participation, but not necessarily an accurate picture. Therefore, we looked at the average number of questions answered by a given user for each question asked. This is presented in Figure 11. As noted earlier, user participation is essential to the survival of a social Q&A site, and taking it even further, user contribution is vital. We can clearly see from Figure 11 that users in higher levels have made larger contributions than those in lower levels. It is interesting to note that levels are highly correlated with user contributions.
Figure 11: Contribution level as measured by number of answers over number of questions given by users at different levels.
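One way to compute a contribution measure of the kind shown in Figure 11 is sketched below. It is an assumption on our part that the ratio is taken over per-level totals (total answers divided by total questions) rather than averaged per user; the sample records are made up for illustration and are not drawn from the collected data.

```python
from collections import defaultdict


def answers_per_question_by_level(users):
    """users: iterable of (level, questions, answers) tuples.

    Returns {level: total answers / total questions}, or None for a level
    whose users asked no questions at all.
    """
    totals = defaultdict(lambda: [0, 0])  # level -> [questions, answers]
    for level, questions, answers in users:
        totals[level][0] += questions
        totals[level][1] += answers
    return {
        level: (a / q if q else None)
        for level, (q, a) in sorted(totals.items())
    }


# Made-up records: (level, questions asked, answers given).
sample = [(1, 10, 5), (1, 4, 2), (4, 20, 200), (7, 50, 5000)]
ratios = answers_per_question_by_level(sample)
```

Using totals avoids dividing by zero for individual users who never asked a question, which Table 4 shows is a sizeable group.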
Correlations of various parameters from our data are presented in Table 5. While the correlations for the Questions-Answers and Questions-Points pairs are not very strong, Answers and Points are highly correlated. In other words, the complex formula designed by Yahoo! Answers to define the level of a user is highly influenced by the number of answers posted; that is, the level of a user reflects his or her contributions to the community.
Table 5: Correlation coefficients for different variable pairs.
| Variable pair | Correlation coefficient |
| --- | --- |
| Questions-Answers | 0.3098 |
| Questions-Points | 0.2267 |
| Answers-Points | 0.8865 |
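The text does not state which correlation coefficient was used. Assuming it is the Pearson product-moment coefficient, a minimal implementation over two per-user count lists could look like the following; the function name and the toy inputs are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Toy data: perfectly linear and perfectly inverse relationships.
r_up = pearson([1, 2, 3, 4], [2, 4, 6, 8])
r_down = pearson([1, 2, 3, 4], [8, 6, 4, 2])
```

With per-user lists of questions, answers, and points, the three pairs in Table 5 would each be one call to such a function.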
4.3 Evaluating the quality of user participation in Yahoo! Answers
In the previous subsection we measured user participation by simply looking at the number of questions and answers posted by a given user. While this provided some insight into the amount of user participation, it does not necessarily reflect the quality of user participation. In this subsection we examine the quality of questions and answers in order to evaluate the quality of user participation.
On Yahoo! Answers, it is possible for the users to indicate that a question is interesting by giving it a star. Questions with many stars are displayed on Yahoo! Answers’ homepage to illustrate the kinds of quality questions being asked. While it is very difficult to judge the actual quality of the questions and the star system itself is not a perfect measure of question quality, we can use the level of interest expressed by the community using stars as a way to compare questions.
To map quality of user participation, we counted the average number of stars that users in different levels received for their questions (see Figure 12). Not very surprisingly, users at higher levels have many more stars on average for their questions than those at lower levels. Hence questions posted by users at higher levels are likely to be more interesting than those at lower levels. It is important to note that higher level users also tend to have more social visibility in terms of contacts and fans in their social circles. In turn, their connectivity makes their questions more noteworthy.
To make a true comparison we should look at the portion of questions that received stars, and not just the absolute values. This quantity, the average number of stars normalized by the average number of questions, is plotted in Figure 13. It is interesting to note in this graph that the bar for level-6 users is lower than that for level-5 users, although it is the other way around in Figure 12. This is because, in our collection, level-6 users have a higher average number of questions (116) than those at level 5 (93). This difference is larger than the difference in their average number of stars.
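The reversal between levels 5 and 6 is easy to reproduce numerically. In the sketch below, the average question counts (93 and 116) come from the text, while the average star counts are hypothetical placeholders chosen only to show the effect; they are not values from the study.

```python
# Average questions per user, as reported in the text.
avg_questions = {5: 93, 6: 116}

# HYPOTHETICAL average stars per user; not taken from the collected data.
avg_stars = {5: 40.0, 6: 45.0}

# Normalize stars by questions, as done for Figure 13.
stars_per_question = {
    level: avg_stars[level] / avg_questions[level] for level in avg_questions
}

# Level 6 leads on raw stars but trails once normalized by questions.
raw_order = avg_stars[6] > avg_stars[5]
normalized_order = stars_per_question[6] < stars_per_question[5]
```

Whenever the relative gap in questions exceeds the relative gap in stars, the normalized ordering flips in this way.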
Figure 12: Quality of questions as measured by number of stars received by users at different levels.
Figure 13: Quality of questions as measured by the fraction of average number of questions that received stars.
Figure 14: Contribution as measured by average number of answers and best answers given by users at different levels.
Figure 15: Expertise as measured by percentage of answers selected as the best answers for users at various levels.
Let us examine the quality of answers. We can naturally assume that an answer selected as the best answer indicates high quality. However, one needs to be careful with this measure, as it is binary in nature; an answer can be identified as the best or not. For our discussion here we shall approximate the quality of answers by looking at what portion of them were selected as the best.
Figure 14 illustrates the contributions of users at different levels in terms of the number of questions answered as well as the number of answers selected as the best. Not very surprisingly, users in higher levels contributed more than those at lower levels. Once again, we can see that the levels defined by the number of points are very reflective of the contributions of users.
We can extend the concept of contribution further to evaluate a given user’s knowledge. We assume that if a given user’s answers are routinely selected as the best answers, it is a reflection of that user’s expertise in a given area. In order to measure this expertise, we examine the fraction of answers that were chosen as the best answers for each level (see Figure 15). Users in levels 6 and 7 stand out as highly knowledgeable, based on our assumption that such a fraction represents expertise. However, there is not much difference in the expertise of users at levels 1 to 5, with the fraction of answers rated as the best ranging from 10 to 12 percent.
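As a rough sketch of how such a per-level fraction might be computed, the example below pools answers and best answers over all users at a level and divides one by the other. Pooling is an assumption on our part (averaging per-user fractions is another plausible reading), and the sample records and the function name `best_answer_fraction_by_level` are hypothetical, not data or code from the study.

```python
from collections import defaultdict


def best_answer_fraction_by_level(users):
    """Return {level: best answers / total answers}, pooled per level.

    `users` is an iterable of (level, answers, best_answers) tuples.
    Levels with no answers at all are left out of the result.
    """
    totals = defaultdict(lambda: [0, 0])  # level -> [answers, best answers]
    for level, answers, best_answers in users:
        totals[level][0] += answers
        totals[level][1] += best_answers
    return {
        level: best / answers
        for level, (answers, best) in sorted(totals.items())
        if answers > 0
    }


# HYPOTHETICAL user records: (level, answers given, best answers).
sample_users = [
    (1, 20, 2),
    (1, 30, 3),
    (6, 400, 120),
    (6, 600, 180),
]

fractions = best_answer_fraction_by_level(sample_users)
for level, fraction in fractions.items():
    print(f"level {level}: {fraction:.0%} of answers selected as best")
```

Pooling weights prolific answerers more heavily than averaging per-user fractions would, so the two readings can give different values for the same level.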
Comparing Figures 13 and 15, we observe that while level-6 users did not have as many interesting questions as level-5 users, their answers were picked as the best more often than those of level-5 users. The levels in Yahoo! Answers are defined according to the points that users earn. Posing an interesting question does not earn a user any points, whereas having one’s answer selected as the best answer earns 10 points. Giving a star to a question could be regarded as a contribution to the community, but Yahoo! Answers does not treat it as one, and no points are awarded for this action. Hence Yahoo! Answers focuses more on the quality of users’ contributions than on their consumption of content.
Question-answering sites have become some of the most popular destinations for online information seeking. The Yahoo! Answers site has been reported as one of the top 100 most visited domains tracked by Hitwise (Cashmore, 2006). Despite playing an important role in online information seeking as well as providing a platform for social interaction, these sites have not been well studied.
In this paper we examined user participation in social Q&A sites. This was done primarily by collecting and analyzing data from Yahoo! Answers and Google Answers. We discovered the important role of active user participation and a community built around sharing information. We conjectured that failing to incorporate this social factor might have been one of the reasons for the retirement of Google Answers.
We examined data extracted from Yahoo! Answers more closely to understand the nature of user participation. Since Yahoo! Answers has a unique reward system that assigns users to different levels, we analyzed user data according to these levels. We mined the data evaluating participation, contribution, and quality. We found repeatedly that while the reward system in Yahoo! Answers is based on a number of parameters, the factors that strongly affect the level of a user are related to answering questions and the quality of those answers. In other words, Yahoo! Answers’ reward system implicitly encourages users to contribute high quality content that can earn them recognition and social capital.
In summary, we found:
a framework to study user participation in social Q&A sites. For the work reported here, we used Yahoo! Answers and Google Answers only, but a similar method is applicable to other such sites.
a way to identify users in two distinctive roles in a social Q&A site: consumer and contributor. While these two roles are not orthogonal, they helped to analyze various patterns of user participation.
a method for evaluating the quality of user contribution in a social Q&A site.
In addition, our study addressed what we considered one of the most important characteristics of social Q&A: voluntary user participation. More specifically, we looked at the patterns of participation in terms of two key activities, questioning and answering, in a social Q&A site, Yahoo! Answers. We believe the results of this study have given us insights into the dynamics of a successful social Q&A site.
Additionally, we collected and analyzed large datasets. To the best of our knowledge, the datasets used in our study are the largest datasets ever used for studying this phenomenon. The scale of these datasets allowed us to observe patterns that emerged from the data and draw some verifiable conclusions.
There are several limitations to our study. We were constrained by the amount of data we could obtain from Yahoo! Answers. In looking at user participation and contribution, we have not accounted for the length of time that users had been members of a given site. It is very likely that the more advanced users have participated for a much longer time, helping them to gain recognition. In future studies we plan to account for this factor. Another measure of quality is the turnaround time for a question, which might also be an indicator of the responsiveness of the community. Search engines provide instant gratification with quick results to a given query. Quick turnaround time on a Q&A site may be an important issue as well.
About the authors
Chirag Shah, Jung Sun Oh, and Sanghee Oh are doctoral students at the School of Information & Library Science (SILS) at the University of North Carolina at Chapel Hill. Chirag Shah is working with Gary Marchionini and Diane Kelly on issues related to collaborative information seeking and social searching, as well as contextual information mining for digital preservation. Jung Sun Oh is interested in theoretical and methodological problems related to the concept of shared information space, especially in the context of social bookmarking and social Q&A. Sanghee Oh’s recent work has involved investigating the behavior of individuals and their motivation leading to participation in social Q&A, as well as building a digital library curriculum (see http://curric.dlib.vt.edu/).
Direct comments or questions about this paper to Chirag Shah — chirag [at] unc [dot] edu
1. We use question-answer sites, answer sites, and social Q&A sites interchangeably in this paper. In other contexts, they may hold slightly different meanings. For instance, one can argue that Google Answers is a paid expert-based service, and not a social Q&A site.
2. At the time of writing this paper, in our collection of Yahoo! Answers questions and answers, we had 265,352 questions with 2,085,835 answers. Thus, on average a question has 7.86 answers.
3. Strictly speaking, Figures 6 and 7 are not true comparative representations between Yahoo! Answers and Google Answers since we only have a sample of the entire Yahoo! Answers’ user population. They, however, can serve as representations of both of the samples individually and provide some basis for a comparison.
5. This scatter plot may seem misleading as there appear to be many more points for higher levels than for lower levels. This is due to the fact that a majority of points at lower levels fall at the same spot, whereas points at higher levels are more distributed.
6. There is some similarity between the Yahoo! Answers definition of top contributor and our usage of contributor. However, we define a contributor as a characteristic of a user based on the number of questions that individual answered.
7. We use knowledge in a broad sense here. A previous study investigating the selection criteria employed to choose the best answers in Yahoo! Answers (Kim, et al., 2007) demonstrated that various criteria were used to select the best answers, depending on what kind of qualities individuals were seeking in answers. Some answers were chosen because they were informative or clearly explained, while the others were chosen because they provided emotional support.
Mary E. Bates, 2007. Q&A. (Info Pro)(Questions and Answers Service), EContent, at http://goliath.ecnext.com/coms2/gi_01996551671/QAinfoproquestions.html, accessed 1 April 2008.
Business Wire, 2006. Yahoo! Answers celebrates one year of knowledge and success as poll reveals use and influence of Q&A sites, at http://yhoo.client.shareholder.com/press/releasedetail.cfm?releaseid=222275, accessed 30 August 2008.
Pete Cashmore, 2006. Yahoo Answers dominate Q&A — No wonder Google gave up, Mashable, Social Networking News, at http://mashable.com/2006/12/28/yahooanswersdominatesqanowondergooglegaveup/, accessed 5 January 2008.
Rich Gazan, 2007. Seekers, sloths and social reference: Homework questions submitted to a question-answering community, New Review of Hypermedia and Multimedia, volume 13, number 2, pp. 239-248.
Rich Gazan, 2006. Specialists and synthesists in a question answering community, In: Andrew Grove (editor). Proceedings of the 69th Annual Meeting of the American Society for Information Science and Technology (Austin), volume 43, and at http://eprints.rclis.org/archive/00008433/, accessed 30 August 2008.
Bernard J. Jansen and Amanda Spink, 2006. How are we searching the World Wide Web? A comparison of nine search engine transaction logs, Information Processing and Management, volume 42, number 1, pp. 248-263.
Anne R. Kenney, Nancy Y. McGovern, Ida T. Martinez, and Lance J. Heidig, 2003. Google meets eBay: What academic librarians can learn from alternative information providers, D-Lib Magazine, volume 9, number 6, at http://www.dlib.org/dlib/june03/kenney/06kenney.html, accessed 15 May 2008.
Soojung Kim, Jung Sun Oh, and Sanghee Oh, 2007. Best-answer selection criteria in a social Q&A site from the user-oriented relevance perspective, Proceedings of the 70th Annual Meeting of the American Society for Information Science and Technology (Milwaukee), and at http://curric.dlib.vt.edu/papers/ASIST2007_0525_Yahoo_Answers_Final_version.pdf, accessed 30 August 2008.
LeeAnn Prescott, 2006. Yahoo! Answers captures 96% of Q and A market share, Hitwise.com, at http://weblogs.hitwise.com/leeann-prescott/2006/12/yahoo_answers_captures_96_of_q.html, accessed 15 April 2008.
Liwen Vaughan and Mike Thelwall, 2004. Search engine coverage bias: Evidence and possible causes, Information Processing and Management, volume 40, number 4, pp. 693-707; version at http://www.scit.wlv.ac.uk/~cm1993/papers/search_engine_bias_preprint.pdf, accessed 30 August 2008.
Paper received 20 May 2008; accepted 20 August 2008.
Exploring characteristics and effects of user participation in online social Q&A sites by Chirag Shah, Jung Sun Oh, and Sanghee Oh is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 United States License.
Exploring characteristics and effects of user participation in online social Q&A sites
by Chirag Shah, Jung Sun Oh, and Sanghee Oh
First Monday, Volume 13 Number 9 - 1 September 2008