Fifteen years ago, I conducted a small study testing the error-correction tendency of Wikipedia. Not only is Wikipedia different now than it was then, but the community that maintains it is different as well. Despite the crudity of that study’s methods, it is natural to wonder what the result would be now. So I repeated the earlier study and found surprisingly similar results.






In 2008, I published a short paper in this journal reporting the results of research that I had conducted the year before (Magnus, 2008). I inserted fibs into Wikipedia entries about dead philosophers. Each was a one- or two-sentence falsehood about that philosopher’s biography, written to be implausible but not incoherent. For each fib, I observed whether it was removed or flagged as needing citation in the 48 hours after I inserted it. If neither of these things happened, then I removed the fib myself.

The raw percentage of fibs which were removed or marked was 50 percent (18/36). However, some of these corrections were due to association effects — that is, a Wikipedia user correcting one fib noticed others which I had added from the same IP address at the same time, so they went on to correct the others as well. My method in the study was to use a range of different IP addresses, but I inserted fibs in three different articles from each location. As a result of association effects, one conspicuous fib could carry two others down with it [1]. Correcting for these effects, 36 percent (10/28) of fibs were removed or flagged within 48 hours.

To put it in rough terms: About a third to a half of the fibs were removed or flagged within a 48-hour window.

When I discussed the result, many people found that rate surprisingly high. Some might have been extrapolating (wrongly) that every 48-hour period would have that kind of correction rate, but others simply found it better than they would have expected.

Of course, I am not the only person to have inserted false claims as a way of testing Wikipedia. In most cases, like the much-discussed example of Halavais (2004), the changes have been unsystematic. It is common in discussion threads for someone to comment that they have put a false claim or two of their own into Wikipedia just to see what would happen. In 2015, Gregory Kohs mounted a more systematic campaign. He made each edit from a different IP address, so there were no association effects. His result was that in 63 percent of cases the phony information was not corrected (Kohs, 2015). The flip side, of course, is that the phony claims were corrected in 37 percent of cases, almost the same as the (adjusted) result in Magnus (2008). This is an odd coincidence rather than a robust finding. Methodologically, Kohs’ “months-long experiment” was something of a mess. His phony claims were inserted in many different kinds of articles. The insertions were of highly variable lengths, ranging from just changing a number or a name to adding a whole paragraph-length subsection. He let them stand for arbitrary, inconsistent amounts of time.




Between July and November 2022, I made 33 changes to Wikipedia: one at a time, anonymously, and from various IP addresses. The changes were not given edit summaries. When an IP address was reused, it was only after considerable delay. Each change consisted of a one- or two-sentence fib inserted into the Wikipedia entry on a notable, deceased philosopher. The fibs were about biographical or factual matters, rather than philosophical content or interpretive questions. Although some of the fibs mention “sources”, no citations were provided. If the fibs were not corrected within 48 hours, they were removed by the experimenter.

The fibs were all, verbatim, ones that I used in Magnus (2008). Two of the original fibs had to be excluded because the target articles were, at the time of the present study, protected or semi-protected; anonymous changes are not allowed in protected articles [2]. Another was excluded because the target article now contains details which explicitly contradict the fib; as such, the fib could not be sensibly inserted in the current article. From the 36 fibs used in the original study, this left 33, all of which I used this time (see Table 2).




Thirty-six percent (12/33) of changes were corrected within 48 hours. Rounded to the nearest percentage point, this is the same as the adjusted result in Magnus (2008) [3]. See Table 1.
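The rounding claim can be checked with a few lines of arithmetic. The following is a minimal sketch using only the counts reported in the text (18/36 raw and 10/28 adjusted for the 2007 study, 12/33 for the 2022 study); the function name `rate_percent` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction


def rate_percent(corrected: int, total: int) -> int:
    """Return the correction rate rounded to the nearest whole percentage point."""
    # Fraction avoids floating-point noise before rounding.
    return round(Fraction(corrected * 100, total))


# Counts as reported in the text.
raw_2007 = rate_percent(18, 36)       # raw rate, Magnus (2008)
adjusted_2007 = rate_percent(10, 28)  # adjusted for association effects
rate_2022 = rate_percent(12, 33)      # present study

print(raw_2007, adjusted_2007, rate_2022)
```

The unrounded values differ slightly (10/28 is about 35.7 percent and 12/33 is about 36.4 percent), so the match holds only at whole-percentage-point precision.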


Table 1: Results for fibs inserted as part of the study, sorted alphabetically by philosopher. Results from 2007 and 2022 are given as time to correction, in HH:MM format; “—” indicates that the fib was not corrected within 48 hours. The IP address column indicates when a particular IP address was reused for this study; where no value is given, the IP address was used just once.
Wikipedia article | 2007 results | 2022 results | IP address
Thomas Aquinas0:430:01 
Corpus Aristotelicum1:30IP-A
Jeremy Bentham 
George BerkeleyIP-B
F.H. BradleyIP-C
Rudolf Carnap 
René Descartes 
T.H. Green5:01 
Norwood Russell Hanson 
G.W.F. Hegel7:451:17 
Martin Heidegger5:30 
Carl Gustav Hempel0:44 
David Hume0:21IP-A
Immanuel Kant0:07 
Søren Kierkegaard0:02 
Gottfried Wilhelm Leibniz 
Norman Malcolm5:050:01IP-B
Nicolas Malebranche45:27 
J.M.E. McTaggart5:43 
John Stuart Mill0:092:24 
Michel de Montaigne0:4324:58 
G.E. MooreIP-B
Friedrich Nietzsche 
Karl Popper0:01 
A.N. PriorIP-C
Thomas Reid 
Bertrand Russell 
Gilbert RyleIP-B
George Santayana5:02 
Baruch Spinoza8:323:02 
Ludwig Wittgenstein2:36 


The community of users responding to the fibs is surely different between the two studies. And even if a user from 15 years ago were still contributing to Wikipedia, they would be much older now. So one may naturally worry that the result is more about the method than about Wikipedia. To simplify the worry: If about a third of the fibs used in the study were preposterous howlers which anyone could easily detect while the remainder were cunning lies that would slip by, then a correction rate of about a third would be explained more by the structure of the dataset than by features of the community.


Table 2: Fibs used for both Magnus (2008) and for this study. Note, crucially, that none of these are true.
Wikipedia article | Fib
Thomas Aquinas | In order to highlight the contrast between Christian living and pre-Christian Greek thought, Aquinas encouraged the eating of beans.
Corpus Aristotelicum | There are no surviving editions of Aristotles’ Theophrastian ethics, which considered issues in the ethics of animal care. Records indicate that a copy existed as late as the tenth century, in the city of Cordoba.
Jeremy Bentham | As a child, he wrote a series of imaginative dialogues between an unnamed boy and wisdom incarnate in the form of a tiger. These were never published, but reflected the author’s early interest in writing and philosophy.
George Berkeley | The Principles consisted of three parts, elaborating consequences for metaphysics, ethics, and aesthetics respectively. Of these, only the first was ever published, and Berkeley’s drafts of the second and third parts have not survived.
Boethius | It is known that he lost two fingers on his left hand in a childhood accident, although there is no record of how exactly it occurred.
F.H. Bradley | In 1900, Bradley was nearly blinded in a sporting accident. He continued to be philosophically active, but his subsequent works were dictated to an assistant.
Rudolf Carnap | The Vienna Circle was also a tightly-knit social group. They regularly met to play cards, including a bridge-like game of their own devising called Whistenschaft.
René Descartes | While there, Descartes first encountered hermetic mysticism. Although he was briefly a Free Mason, he later abandoned mysticism in favor of reasoned inquiry.
T.H. Green | Green’s correspondence, published in 1912, also gives insight into his philosophy. In a letter to Victoria Regina, he suggests that moral perfectibility will allow humans to transcend their limitations within the next century.
Norwood Russell Hanson | Hanson was skilled at sleight-of-hand, and would often entertain dinner party guests with card tricks and other feats of legerdemain.
G.W.F. Hegel | Hegel found the work isolating and drank heavily when not working. While drunk, Hegel ran naked through the foyer of the house while chanting the Lord’s Prayer in Latin.
Martin Heidegger | Some of the faculty at Freiburg called him ‘Edmund II’, a moniker that Heidegger found demeaning.
Carl Gustav Hempel | Hempel was renowned for whittling at departmental colloquia. If he liked the talk, he would give the resulting figure to the guest speaker.
Heraclitus | According to some ancient sources, Heraclitus was mildly hydrophobic and refused to travel by boat. This is connected with the probably apocryphal story that he died by drowning.
David Hume | Hume had begun wrestling with local sportsmen in Bristol, and continued the activity in France until a shoulder injury forced him to stop.
Immanuel Kant | Kant’s poetry was much admired, and handwritten manuscripts circulated among his friends and associates.
Søren Kierkegaard | As a young boy Kierkegaard was mauled by a wild dog. Although he recovered, some have suggested that the episode prefigures later themes of anxiety and dread.
Gottfried Wilhelm Leibniz | Many of his manuscripts are written in a shorthand of his own invention which uses binary numbers to encode sequences of characters.
Norman Malcolm | He built a greenhouse at his home in Ithaca. He raised orchids, producing several new hybrids including one that bears his name.
Nicolas Malebranche | Malenbranche’s tutor, Pierre Gassendi, was himself a notable philosopher, but there is no indication that philosophy was part of the curriculum.
J.M.E. McTaggart | Among McTaggart’s many interests was antique collecting. His collection boasted the sword that removed Tycho Brahe’s nose.
John Stuart Mill | Following the death of his wife, Mill had a series of mistresses who helped him prepare manuscripts as well as sharing his bed.
Michel de Montaigne | Montaigne had been an avid duellist at Guyenne. During this period of isolation, he carried a rapier with him and would challenge anyone who disrupted his work.
G.E. Moore | His influence outside philosophy includes a reference to him in the signature line of the musical Oliver.
Friedrich Nietzsche | In a letter to Victoria Regina, Nietzsche even entertained the possibility of burning the remaining copies to collect on insurance.
Karl Popper | While there, he lived on a cooperative farm. He later claimed that nothing prepares the mind for philosophy like milking a cow.
A.N. Prior | While at Oxford, Prior wrote a draft of a book on the formal structure of interpersonal awareness. Although he showed parts of the draft to various colleagues, it appears to have been lost.
Thomas Reid | Reid and Hume met once when both were in London, and the former indicated a fireplace poker as an example of a material object which certainly exists.
Bertrand Russell | In the same year, Russell published a volume of poetry under the psuedonym Christian Bellows. The poems primarily addressed humanistic concerns that he later revisited in works such as “Why I am not a Christian.”
Gilbert Ryle | After retiring, Ryle bought a small farm. He tinkered with automated processes to care for livestock, although they never proved to be commercially viable.
George Santayana | He was an avid cyclist and, in 1923, he went on a cycling tour of Italy with the novelist Taylor Caldwell.
Baruch Spinoza | He supplemented his income by selling stolen jewelry that had been smuggled into Holland from France.
Ludwig Wittgenstein | He was twice forced to pay fines for misuse of strychnine, which he used to control squirrels around the garden.


It is possible to address this worry, since the same fibs were used in both studies. If corrections occurred only because of the content of the fibs, then precisely the same fibs should have been corrected in both studies. So we can compare the outcomes for particular fibs, noting whether the two studies got the same outcome (that either the fib was corrected in both studies or not corrected in either study). The number of fibs which led to the same outcome in the two experiments was only 64 percent (21/33). This is within the range of what one would expect if it were simply due to chance [4]. So it looks as if the outcome is not merely due to the plausibility and implausibility of particular fibs themselves.
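The chance baseline in this argument can be worked through explicitly. The sketch below assumes, as note 4 does, that each fib has an identical and independent probability of being corrected in each study. It takes that probability to be the observed 2022 rate of 12/33, and it reads the reported margin as two binomial standard deviations of a proportion over 33 fibs. Both of those choices are assumptions made for illustration, since the text does not state how the margin was derived. The function names are hypothetical.

```python
import math


def expected_agreement(p: float) -> float:
    """Probability that two independent trials with success rate p agree.

    They agree if both are corrected (p * p) or neither is ((1 - p) ** 2).
    """
    return p * p + (1 - p) * (1 - p)


def proportion_sd(q: float, n: int) -> float:
    """Standard deviation of an observed proportion over n independent trials."""
    return math.sqrt(q * (1 - q) / n)


n = 33          # fibs used in both studies
p = 12 / n      # assumed per-fib correction probability (observed 2022 rate)

q = expected_agreement(p)
margin = 2 * proportion_sd(q, n)   # assumed: two standard deviations
observed = 21 / n                  # agreement rate reported in the text

low, high = q - margin, q + margin
print(f"expected agreement: {q:.1%} +/- {margin:.1%}")
print(f"observed agreement: {observed:.1%}")
print("within range:", low <= observed <= high)
```

Under these assumptions the observed agreement of 21/33 sits inside the chance range, which is the point the paragraph relies on: the same fibs were not reliably the ones corrected in both studies.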




Despite the recurrence of an error correction rate of about 36 percent, that specific value should be taken with a grain of salt. Nevertheless, the original study and this one give a rebuttal to the worry raised by Brian Keegan that “it might be trivial to persistently embed disinformation into provincial articles about distant historical events, specialized scientific topics, or marginal trivia about national anthems that lack sustained editorial oversight” [5]. Adding disinformation is not trivially easy. A few sentences of disinformation added to Wikipedia may persist, but they might be caught relatively quickly more than a third of the time.

Although the numerical result is about the same as for the earlier study, the significance is not. Fifteen years ago, concerns about the reliability of Wikipedia were widespread. It was common to see Wikipedia as posing an epistemological problem: How could we trust a source which might have been written by anyone? [6] The original study was thus a small contribution to answering questions of broader epistemology [7]. In the years that followed, these concerns seem to have faded. Wikipedia has become more central to online life, and whining about it has become less common [8]. Search engines offer information from Wikipedia as a sidebar in search results. When we ask questions to the digital assistants on our phones, often the results come from Wikipedia. We seem to have decided, as Don Fallis noted, that “Wikipedia is not all that bad” [9]. So the current study is just a snapshot of what not all that bad means.


About the author

P.D. Magnus is a professor in the Department of Philosophy at the University at Albany, State University of New York.
E-mail: pmagnus [at] fecundity [dot] com



1. I also made the mistake of including two Featured articles. Because these were carefully tended by editors, they are not really comparable to the rest of the data set.

2. See

3. In Magnus (2008), some of the corrected fibs were marked “citation needed” rather than being removed. In the current study, all the corrected fibs were removed entirely. This may indicate a change in Wikipedia culture to err on the side of removing a dubious passage rather than merely marking it as dubious.

4. Supposing each fib had an identical and independent probability of being corrected, the expected rate of agreement would be 54±17 percent. This calculation overlooks the fact that some of the results of the earlier study were due to association effects, but there is no way to correct for that.

5. Keegan, 2020, p. 86.

6. Frost-Arnold (2018) provides a nice summary of these debates.

7. I situated the earlier study in relation to the broader epistemic issues in Magnus (2009).

8. There is ongoing criticism of Wikipedia, of course, but the focus has shifted. Rather than wondering how Wikipedia could provide knowledge at all, recent critics have more often asked what knowledge — and whose knowledge — it provides. See inter alia Vrana, et al. (2020) and McDowell and Vetter (2022).

9. Fallis, 2011, p. 302.



