First Monday

An empirical examination of Wikipedia's credibility by Thomas Chesney



Abstract
Wikipedia is a free, online encyclopaedia; anyone can add content or edit existing content. The idea behind Wikipedia is that members of the general public can add their own personal knowledge, anonymously if they wish. Wikipedia then evolves over time into a comprehensive knowledge base on all things. Its popularity has never been questioned, although some have speculated about its authority. By its own admission, Wikipedia contains errors. A number of people have tested Wikipedia’s accuracy using destructive methods, i.e. deliberately inserting errors, a practice Wikipedia has criticised. This short study examines Wikipedia’s credibility by asking 258 research staff (response rate 21 percent) to read an article and assess its credibility, the credibility of its author and the credibility of Wikipedia as a whole. Staff were either given an article in their own expert domain or a random article. No difference was found between the two groups in terms of their perceived credibility of Wikipedia or of the articles’ authors, but a difference was found in the credibility of the articles: the experts found Wikipedia’s articles to be more credible than the non–experts did. This suggests that the accuracy of Wikipedia is high. However, the results should not be seen as support for Wikipedia as a totally reliable resource since, according to the experts, 13 percent of the articles contain mistakes.

Contents

Introduction
Method
Results
Discussion

 


 

Introduction

Formally launched in 2001 by Jimmy Wales, Wikipedia is a free, online encyclopaedia; anyone can add content or edit existing content. At the time of writing, it contains around 650,000 articles in English and receives about 60 million hits per day [1]. Its popularity has never been questioned, although some have speculated about its authority. The idea behind Wikipedia is that members of the general public can add their own personal knowledge, anonymously if they wish (a “wiki” is a Web page that can be changed by its visitors). Wikipedia then evolves over time into a comprehensive knowledge base on all things. Disputes between authors over content are frequent. These are settled essentially by a process of voting, in which the authors involved and any other interested parties discuss the problem among themselves until the truth emerges and is published. Wikipedia claims that none of its articles is ever complete or finished. It also admits that vandalism is a constant problem, with people publishing “with an agenda”. This can be seen on the pages of controversial subjects such as abortion. Many articles have a message at the top pointing out that the neutrality of the article has been questioned or that the article is being considered for deletion. Each article has a discussion forum attached to it where authors can debate changes, and a history page where everyone can see what changes have been made and by whom, and can compare earlier versions.

That anyone can edit articles has led many writers to discredit Wikipedia. Since, by Wikipedia’s own admission, an article may contain errors at any point in time, the accuracy of the entire encyclopaedia is called into question. Schneider (2005) finds it inherently untrustworthy, and questions the scope and balance of its articles. In a move inspired by Wikipedia’s success, the Los Angeles Times started a wiki on its Web site for its readers to write about the Iraq war (Martinez, 2005); the paper’s former editorial page editor criticised the move for diluting the paper’s reputation (Burgard, 2005). He said the content could not be claimed to be reasoned and informed, and that the paper should be checking all the claims made by the wiki’s authors, which, like Wikipedia, it was not. Someone, he claims, needs to be guardian and trustee of the information that is published. Cronin (2005) feels the same way about Wikipedia.

Proponents of Wikipedia have hit back by pointing out that the site does not claim to be authoritative or reliable but realises that it is “possible for biased, out of date or incorrect information to be posted.” [2] They claim that because so many people are contributing, the content will become more reliable as time passes.

So what, then, is the use of an encyclopaedia that might be unreliable? For some it is of no use; for others, it is a good starting point. This debate will continue and the current paper will not add directly to it. Instead, since so many people seem to use Wikipedia as a source of information, it is interesting to test the credibility of its content, which is the goal of this short study. Attempts have already been made to assess the information quality of Wikipedia. One controversial experiment has been carried out several times: it calls for people to insert deliberately erroneous information and time how long it takes for the mistakes to be corrected. Halavais inserted 13 such errors, which were all quickly fixed (Glaser, 2004), although because of the way the errors had been inserted, as soon as one had been found, all the others could be easily reverted. Leppik (2004) added five errors, none of which were corrected within a week. A different approach is taken here. This is explained in the next section, after which the results are presented and discussed.

 

++++++++++

Method

A total of 258 academics (research fellows, research assistants and PhD students) were asked to participate in the study. Sixty–nine (27 percent) agreed to take part, with 55 (21 percent) actually completing the survey. Each respondent was randomly assigned to one of two experimental conditions. Under Condition 1 they were asked to read an article in Wikipedia related to their area of expertise. For example, a member of the Fungal Biology and Genetics Research Group (in the Institute of Genetics at Nottingham University; see http://www.nottingham.ac.uk/biology/Genetics/index.phtml) was asked to look at the article on metabolites. Areas of expertise were identified from the academics’ own Web sites, with the choice of article being made by the present author; if there was any doubt, the expert was contacted for advice. Under Condition 2, respondents were asked to read a random article. Wikipedia’s own random article selection feature was used to assign a different article to each Condition 2 respondent. Some articles in Wikipedia are stubs, i.e. only one or two lines in length, intended to start off a new article. These were not assigned to respondents, nor were any articles whose neutrality had been questioned or that were being considered for deletion.

After reading their assigned article, all respondents were asked to complete the same online questionnaire, which measured the article’s credibility, the writer’s credibility, Wikipedia’s credibility and how “cynical” the respondent is about information taken from the Internet. Table 1 shows some information about the respondents.

 

Table 1: Information about respondents
                                 Condition 1   Condition 2
Number of male respondents       16            19
Number of female respondents     14            5
Total number of respondents      30            24
Average age of respondents       33            31

 

So there are two groups: one able to assess the content itself, and another able to assess the content only on the basis of how it reads. If it is assumed that Wikipedia looks and feels like a professional encyclopaedia, then the second group may be expected to find it reasonably credible. Therefore, if Wikipedia, when assessed by the experts, is found not to be credible, the two groups would be expected to differ in their responses; specifically, the article’s credibility, the writer’s credibility and Wikipedia’s credibility would be rated lower under Condition 1 than under Condition 2. However, proponents of Wikipedia might argue that its credibility would be greater under Condition 1 than under Condition 2. For this reason, no direction is specified in the first three hypotheses, which are:

H1a: Perceived credibility of the author will differ between Condition 1 and Condition 2.

H1b: Perceived credibility of the article will differ between Condition 1 and Condition 2.

H1c: Perceived credibility of the site will differ between Condition 1 and Condition 2.

It might also be expected that there would be a negative correlation between respondents’ perceived credibility ratings and how “cynical” they are about information taken from the Internet. This leads to the following hypotheses:

H2a: Perceived credibility of the author will be negatively correlated with how cynical respondents are about Internet information.

H2b: Perceived credibility of the article will be negatively correlated with how cynical respondents are about Internet information.

H2c: Perceived credibility of the site will be negatively correlated with how cynical respondents are about Internet information.

Measurements

High–quality information is usually considered to have some or all of the following characteristics: it is up–to–date, relevant, accurate, economical for the purpose at hand, timely and understandable to the person who needs it. This study measures the credibility of the site, the authors who contribute to the site and the articles they write. Credibility encompasses some but not all of these characteristics. Hovland, et al. (1953) operationalised credibility as perceived expertise and perceived trustworthiness. Berlo, et al. (1970) suggest three scales to measure credibility: safety (safe–unsafe, etc.), qualification (trained–untrained, etc.) and dynamism (aggressive–meek, etc.). The most consistent dimension of media credibility across studies has been believability (Flanagin and Metzger, 2000). Researchers have also used bias, fairness, accuracy, believability and objectivity (for example, Sundar, 1999).

All measurements in this study were made using 7–point Likert scales and were reverse coded if necessary to ensure that low scores represented greater perceptions of credibility and greater cynicism. The author’s credibility was measured using nine items taken from Flanagin (2005). The authors were assessed by the extent to which they were perceived to be credible, have high reputation, be successful, be trustworthy, offer information of superior quality, be prestigious, have a sincere interest in important affairs and the extent to which the respondent would be willing to work for them. The mean and standard deviation for this construct were 3.38 and 0.83, and Cronbach’s alpha was 0.890.

The article’s credibility was measured with five items: believability, accuracy, trustworthiness, bias and completeness. These were taken from Flanagin and Metzger (2000) who adapted them from Austin and Dong (1994), Gaziano (1988), Rimmer and Weaver (1987) and West (1994). The mean and standard deviation for this construct were 4.11 and 1.14, and Cronbach’s alpha was 0.838.

Twenty–two items taken from Flanagin (2005) which were adapted from standard source credibility scales (Berlo, et al., 1970; Leathers, 1992; McCroskey, 1966; McCroskey and Jenson, 1975) were used to assess the credibility of the Wikipedia site as a whole. Respondents assessed the extent to which they found the site to be trustworthy, believable, reliable, authoritative, honest, safe, accurate, valuable, informative, professional, attractive, pleasant, colourful, likable, aggressive, involving, bold, interactive, interesting, sophisticated, biased and organized. The mean and standard deviation for this construct were 2.93 and 0.73, and Cronbach’s alpha was 0.937.

How cynical each respondent was towards information on the Internet was assessed with five items taken from Metzger, et al. (2003). Respondents stated whether they: check to see who the author of a Web site is, check to see if the information is current, seek out other sources to validate information found online, consider whether the information is opinion or fact and check to see that the information is complete and comprehensive. The mean and standard deviation for this construct were 2.77 and 0.90, and Cronbach’s alpha was 0.723.
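
The reliability figures quoted above (Cronbach’s alpha) measure the internal consistency of each multi–item scale. As a rough illustration of how such scale scores are constructed, the following sketch in Python uses invented 7–point responses; which item needs reverse coding is an assumption made purely for illustration, not a detail from the study.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, n_items) response matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed scale
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

rng = np.random.default_rng(0)
# Invented 7-point Likert responses: 55 respondents x 9 items.
responses = rng.integers(1, 8, size=(55, 9)).astype(float)

# Reverse-code an item (here, hypothetically, the fourth) so that low
# scores always represent greater perceived credibility.
responses[:, 3] = 8 - responses[:, 3]

# Random data yield alpha near zero; correlated real responses produce
# the high values (0.72-0.94) reported for the constructs above.
print(f"alpha = {cronbach_alpha(responses):.3f}")
```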

 

++++++++++

Results

Table 2 shows the mean, standard deviation and standard error of the mean of the responses for the author, article and site credibility and for the cynicism of respondents under each condition. Table 3 shows the results of testing for a difference in the means using the t test. Pearson’s correlation coefficient was used to examine the relationship between cynicism and credibility. The results of this are shown in Table 4.
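
For concreteness, the sketch below reproduces the form of these analyses in Python with scipy. The per–respondent scores are invented, drawn only to match Table 2’s article–credibility group statistics, since the raw data are not published. It also shows that the standard errors in Table 2 are simply s/√n, e.g. 0.83/√30 ≈ 0.15 for author credibility under Condition 1.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
# Invented article-credibility scores matching Table 2's group
# statistics (Condition 1: n=30, Condition 2: n=24); illustration only.
cond1 = rng.normal(2.66, 0.89, size=30)
cond2 = rng.normal(3.14, 1.07, size=24)

# Two-tailed independent samples t test (no direction was hypothesised).
t, p = stats.ttest_ind(cond1, cond2)
print(f"t = {t:.3f}, p = {p:.3f}")  # compare Table 3

# Standard error of the mean, as reported in Table 2: s / sqrt(n).
sem = cond1.std(ddof=1) / np.sqrt(len(cond1))
print(f"SEM (Condition 1) = {sem:.2f}")

# Pearson's correlation between cynicism and credibility (Table 4),
# again with invented cynicism scores.
cynicism = rng.normal(2.77, 0.90, size=30)
r, p_r = stats.pearsonr(cynicism, cond1)
print(f"r = {r:.3f}, p = {p_r:.3f}")
```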

 

Table 2: Mean, standard deviation and standard error of the mean of responses
                        Number   Mean   Standard deviation   Standard error of the mean
Author    Condition 1   30       3.35   0.83                 0.15
          Condition 2   24       3.40   0.86                 0.18
Article   Condition 1   30       2.66   0.89                 0.16
          Condition 2   24       3.14   1.07                 0.22
Site      Condition 1   30       2.84   0.62                 0.11
          Condition 2   24       3.00   0.85                 0.17

 

 

Table 3: Results of independent samples t test (p < 0.1)
                      t        Sig.    Result
Author credibility    -0.230   0.819   NS
Article credibility    1.797   0.078   S
Site credibility      -0.773   0.433   NS
Cynicism               0.590   0.558   NS

 

 

Table 4: Correlation coefficients (p < 0.05)
Correlation between cynicism and ...   r        Result
Author                                 -0.036   NS
Article                                 0.042   NS
Site                                   -0.105   NS

 

 

++++++++++

Discussion

The results show no significant difference between the mean responses under the two conditions when assessing the writer’s credibility and the site’s (Wikipedia’s) credibility. There is a difference between the mean responses when assessing the article’s credibility, significant at the 10 percent level. The oddity is the direction of this difference: the experts rated the articles as more credible than the non–experts did. Therefore no support was found for H1a or H1c, but support was found for H1b.

This difference in the articles’ credibility needs to be examined. It may be that non–experts are more cynical about information outside their field, and that the difference comes from a natural tendency to rate unfamiliar articles as less credible. If this is correct, it is the credibility of the Condition 2 articles that is being rated low, not the credibility of the Condition 1 articles that is being rated high. However, as the cynicism data in Table 3 show, there was no difference between the groups in terms of their cynicism. It should be noted that cynicism was self–reported, and there may be a bias towards appearing more cynical than one actually is. Having dismissed this explanation, the conclusion remains that the experts found Wikipedia articles more credible than the non–experts did. This suggests that the accuracy of Wikipedia’s information is high, although the mean responses in Table 2 suggest perceived credibility scores of around 3 on a scale from 1 (very credible) to 7 (not at all credible). While perhaps not ‘high’ credibility, this is certainly not ‘low’.

In the survey, all respondents under Condition 1 were asked whether there were any mistakes in the article they had been asked to read. Only five reported seeing mistakes, and one of those five reported spelling mistakes rather than factual errors. Four factual–error reports out of the 30 expert–reviewed articles suggests that 13 percent of Wikipedia’s articles have errors.
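
The 10 percent significance level for the article–credibility difference may partly reflect the limited statistical power of a sample this size. A rough post hoc power calculation, sketched here under conventional assumptions (alpha of 0.05 and 80 percent power are illustrative choices, not values from the study), shows why: with groups of 30 and 24, an effect of this size is hard to detect at the 5 percent level.

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Group statistics for article credibility, from Table 2.
m1, s1, n1 = 2.66, 0.89, 30   # Condition 1 (experts)
m2, s2, n2 = 3.14, 1.07, 24   # Condition 2 (non-experts)

# Pooled standard deviation and Cohen's d effect size.
s_pooled = np.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m2 - m1) / s_pooled

# Per-group sample size needed to detect an effect of this size at
# alpha = 0.05 with 80 percent power (conventional thresholds, assumed).
needed = TTestIndPower().solve_power(effect_size=d, alpha=0.05, power=0.8)
print(f"Cohen's d = {d:.2f}; n per group for 80% power = {needed:.0f}")
```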

No correlation between respondents’ cynicism and perceived credibility was found, suggesting that perceived credibility is unaffected by how cynical respondents are.

These results may give advocates of Wikipedia some cause for cheer, but they should be treated with caution. The sample size was small, and the difference between the means of the articles’ credibility was significant only at the 10 percent level, not at 5 percent. Further work should be done to verify this finding. In any case, the results should not be seen as support for Wikipedia as a totally reliable resource since, according to data collected during this project, 13 percent of the articles contain mistakes. End of article

 

About the author

Thomas Chesney is a Lecturer in Information Systems at the Nottingham University Business School.
E–mail: Thomas [dot] Chesney [at] nottingham [dot] ac [dot] uk

 

Notes

1. http://en.wikipedia.org/wiki/Wikipedia#History, accessed 22 August 2005.

2. http://en.wikipedia.org/wiki/Wikipedia:Overview_FAQ, accessed 22 August 2005.

 

References

E.W. Austin and Q. Dong, 1994. “Source v. content effects on judgments of news believability,” Journalism Quarterly, volume 71, number 4, pp. 973–984. http://dx.doi.org/10.1177/107769909407100420

D.K. Berlo, J.B. Lemert and R.J. Mertz, 1970. “Dimensions for evaluating the acceptability of message sources,” Public Opinion Quarterly, volume 33, number 4, pp. 563–576. http://dx.doi.org/10.1086/267745

S. Burgard, 2005. “Turning over editorial pages to the bloggers is a terrible idea,” The Masthead, volume 57, number 3, pp. 12–13, and at http://www.findarticles.com/p/articles/mi_qa3771/is_200510/ai_n15640966, accessed 6 November 2006.

B. Cronin, 2005. “Dean’s Notes: BLOG: see also Bathetically Ludicrous Online Gibberish,” at http://www.slis.indiana.edu/news/story.php?story_id=958, accessed 9 September 2005.

A.J. Flanagin, 2005. “IM online: Instant messaging use among college students,” Communication Research Reports, volume 22, pp. 175–187. http://dx.doi.org/10.1080/00036810500206966

A.J. Flanagin and M.J. Metzger, 2000. “Credibility of the Internet/World Wide Web,” Journalism and Mass Communication Quarterly, volume 77, number 3, pp. 515–540. http://dx.doi.org/10.1177/107769900007700304

C. Gaziano, 1988. “How credible is the credibility crisis?” Journalism Quarterly, volume 65, number 2, pp. 267–278. http://dx.doi.org/10.1177/107769908806500202

M. Glaser, 2004. “Collaborative Conundrum: Do Wikis Have a Place in the Newsroom?” at http://ojr.org/ojr/glaser/1094678265.php, accessed 22 August 2005.

C.I. Hovland, I.L. Janis and H.H. Kelley, 1953. Communication and persuasion: Psychological studies in opinion change. New Haven, Conn.: Yale University Press, pp. 19–48.

D.G. Leathers, 1992. Successful nonverbal communication: Principles and applications. Second edition. New York: Macmillan.

P. Leppik, 2004. “How Authoritative is Wikipedia,” at http://www.frozennorth.org/C2011481421/E652809545/index.html, accessed 22 August 2005.

A. Martinez, 2005. “Innovations bring transparency that editorial pages need,” The Masthead, volume 57, number 3, pp. 4–5, and at http://www.findarticles.com/p/articles/mi_qa3771/is_200510/ai_n15640944, accessed 6 November 2006.

J.C. McCroskey, 1966. “Scales for the measurement of ethos,” Speech Monographs, volume 33, pp. 65–72. http://dx.doi.org/10.1080/03637756609375482

J.C. McCroskey and T.A. Jenson, 1975. “Image of mass media news sources,” Journal of Broadcasting, volume 19, pp. 169–180. http://dx.doi.org/10.1080/08838157509363777

M.J. Metzger, A.J. Flanagin and L. Zwarun, 2003. “College student Web use, perceptions of information credibility, and verification behavior,” Computers & Education, volume 41, pp. 271–290. http://dx.doi.org/10.1016/S0360-1315(03)00049-6

T. Rimmer and D. Weaver, 1987. “Different questions, different answers? Media use and media credibility,” Journalism Quarterly, volume 64, pp. 28–36. http://dx.doi.org/10.1177/107769908706400104

K.G. Schneider, 2005. “Wikipedia,” at http://freerangelibrarian.com/archives/052905/wikipedia.php, accessed 22 August 2005.

S.S. Sundar, 1999. “Exploring receivers’ criteria for perception of print and online news,” Journalism and Mass Communication Quarterly, volume 76, number 2, pp. 373–387. http://dx.doi.org/10.1177/107769909907600213

M.D. West, 1994. “Validating a scale for the measurement of credibility: A covariance structure modeling approach,” Journalism and Mass Communication Quarterly, volume 71, number 1, pp. 159–168. http://dx.doi.org/10.1177/107769909407100115

 


 

Editorial history

Paper received 16 May 2006; accepted 24 August 2006.



This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 License.

An empirical examination of Wikipedia’s credibility by Thomas Chesney
First Monday, volume 11, number 11 (November 2006),
URL: http://firstmonday.org/issues/issue11_11/chesney/index.html