First Monday

Survival of the fittest tag: Folksonomies, findability, and the evolution of information organization

Folksonomies have emerged as a means to create order in a rapidly expanding information environment whose existing means of organizing content have been strained. This paper examines folksonomies from an evolutionary perspective, viewing the changing conditions of the information environment as having given rise to organizational adaptations that ensure information “survival”: remaining findable. This essay traces historical mechanisms of information organization and the conditions that gave rise to folksonomies, reviews the scholarly response to them, and offers recommendations for their future.


The changing information environment & the birth of folksonomies
Folksonomies: Scholarly response & empirical evidence
Folksonomies in the future



“In the struggle for survival, the fittest win out at the expense of their rivals because they succeed in adapting themselves best to their environment.” — Charles Darwin

If information required anything to ensure its immediate survival, it would likely be two things: first, it must be considered useful; second, it must be findable. In the rapidly growing information environment, unidentified and unorganized content, however useful it may be, is at risk of being rendered unfindable, and thus obsolete. In response to this threat of extinction, a new evolutionary adaptation in information organization has appeared: folksonomies.

This paper will trace the emergence of folksonomies as a “forced move” (Shirky, 2005b), arising as a necessary adaptation for the survival — the findability — of useful information amidst a changing environment. The paper will consist of three sections. First, an overview of the context — the information environment — will be presented. This will include an account of some of the traditional information organization methods that preceded folksonomies and how these methods have also been affected by the changing information environment. Second, the scholarly response to folksonomies will be discussed, along with the empirical research that has informed these responses. Last, possible means of integrating folksonomies with other methods of information organization will be considered. This account will demonstrate how information, both in singular units and as a phenomenon, is subject to the same basic principles of nature that face every other entity on earth: adapt to the changing environment, or perish.



The changing information environment & the birth of folksonomies

What is the information environment?

If “environment” is defined as “the totality of surrounding conditions” (“Environment”, n.d.) and “information” is defined as “evaluated, validated or useful data” [1], then the “information environment” can be thought of simply as the totality of evaluated data. Before the proliferation of the personal computer, the information environment available to the public was quite small, dominated by published works that had been vetted and evaluated by information specialists. The majority of accessible information consisted of published works whose content was identified, often with controlled vocabularies: predetermined terms and phrases reflecting some semantic or hierarchical structure (Macgregor and McCulloch, 2006). Once identified, information was also findable.

Today, the information environment available to the public is massive, dominated by “self–published” works (“published” by virtue of being uploaded to the Web and publicly accessible) that have generally not been vetted or evaluated by anyone. The modern information environment holds far more information than there are specialists to identify it; Wurman (1989) estimated that more information had been produced in the preceding 30 years than in the 5,000 years before that. Considering the quantity of content on the World Wide Web alone, with over 175 million existing Web sites and three million more going online each month (Netcraft, 2008), the information environment is clearly a crowded place, and it is becoming more so at a rapid pace.

The birth of folksonomies

Crowding, in the information environment, does not entail the same competition for resources (e.g., sustainable habitats) that species in a biological environment are faced with — on the contrary, information “habitats” are eminently sustainable, with memory and processing capacities of information systems expanding at an exponential rate (“Moore’s law”, n.d.). However, crowding threatens the very essence of information survival for another reason: crowding challenges findability.

Millions of unidentified information units being published on the Web is akin to millions of undocumented colonists being relocated to a new planet. How many doctors, engineers, and chefs might there be in this new world? In other words, how many valuable resources might no one be able to find? Without identification, potentially useful contributions risk being unrecognized or under–utilized. Faced with an ever–increasing flow of possibly useful information, “ordinary” users (i.e., non–specialists in information organization) took matters into their own hands. By assigning “tags” (identifiers and, at times, reminders of meaningful information), users unwittingly gave rise to a new system of information organization. Delicious (originally del.icio.us), a bookmarking Web site, is generally thought of as one of the first sites where user–generated tagging emerged. On this site, users could organize their bookmarked sites under any category name they chose, rather than under a predetermined vocabulary (Delicious, n.d.). Shortly after the site appeared, the scholarly community began to observe this tagging practice with interest, discussing its merits, misapplications, and general implications. It was in one such blog discussion in 2004 that the practice was given its name. As the participants mused about what to call the act of tagging, one respondent suggested “folk classification,” to which Thomas Vander Wal responded that “folk taxonomy” or “folksonomy” would be more appropriate (Vander Wal, 2007).

These blog discussions are evidence of folksonomies’ organic development. Folksonomies were not the product of scholarly deliberation; rather, they developed “in the field,” in response to an environmental need. As Joshua Schachter, creator of Delicious, explained in an interview, he developed the tagging system as a means to keep track of his own bookmarks: “I tend to collect lots and lots of links and need help managing it” (rands, 2004). In an environment where potentially valuable information risked becoming unfindable, folksonomies were born.

Like all newborn things, though, folksonomies are immature, uncoordinated, and have much to learn from their predecessors. The next section will look back to the older generation of information organization systems, their strengths and limitations, and how they too have changed over time. This account will provide the necessary foundation for considering how folksonomies can be best guided and developed in order to realize their full potential.

Traditional methods of information organization

The earliest recorded information organization schemas were hierarchical, starting with Aristotle’s classification of animals more than two thousand years ago. In hierarchical organization, items are categorized into classes and further divided into subclasses that inherit the properties of the parent class (Kwasnik, 1999). Hierarchical classifications are considered advantageous for their straightforwardness: with strict rules determining what does or does not qualify for class inclusion, ambiguity is minimized.

However, the inflexibility of inclusion rules also makes it difficult to account for unusual or new cases. For instance, a classification of books may have parent classes such as “hardcover” or “paperback,” and subcategories such as “academic,” “trade,” “cookbooks” and so forth. When technology introduces new forms — electronic books, for instance — the new forms may not strictly fit within the parameters of existing classes. Also, hybrids may appear that are equally typical of more than one class, such as a children’s cookbook or an academic book that becomes a popular trade paperback (however unlikely that may be).
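To make the rigidity of strict inclusion rules concrete, the following minimal sketch encodes the hypothetical book classes from the preceding example, assuming binding (“hardcover,” “paperback”) as the parent class and genre as the subclass. The `HIERARCHY` table and `classify` function are illustrative inventions, not a standard classification scheme; the point is only that a rule demanding exactly one place per item rejects both new forms and hybrids.

```python
# A minimal sketch of a strict two-level hierarchy, assuming the hypothetical
# book classes from the example: binding is the parent class, genre the subclass.
HIERARCHY = {
    "hardcover": {"academic", "trade", "cookbook"},
    "paperback": {"academic", "trade", "cookbook"},
}


def classify(binding, genres):
    """Return the single (parent, subclass) path for a book, or None if the rules reject it."""
    subclasses = HIERARCHY.get(binding)
    if subclasses is None:
        # No parent class admits this form (e.g. an electronic book).
        return None
    matches = set(genres) & subclasses
    if len(matches) != 1:
        # Strict inclusion: a hybrid matching two subclasses (or none) has no single place.
        return None
    return (binding, matches.pop())


print(classify("paperback", {"cookbook"}))           # ('paperback', 'cookbook')
print(classify("electronic", {"trade"}))             # None
print(classify("paperback", {"academic", "trade"}))  # None
```

Returning `None` rather than forcing a best guess mirrors the strict rule described in the text; a more forgiving scheme would need a different structure altogether.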

Rather than forcing all new information types into the strict parameters of hierarchical classes, information organizers developed other schemes, including trees and faceted analysis (Kwasnik, 1999). Each of these adaptations possessed its own advantages and disadvantages in contending with the new challenges presented by the information environment. Yet conceptually, they drew upon a similar logic: that organization methods required a fixed structure, governed by fixed rules. However, a new adaptation was on the horizon that would change not only the practice of information organization, but also its underlying concepts: the graded structure.

Graded structures & the foreshadowing of folksonomies

Of all the adaptations that appeared in information organization, the most paradigm–shifting may be Ludwig Wittgenstein’s concept of “family resemblance.” With it, Wittgenstein suggested that not all members of a class are equally representative of that class (Wittgenstein, 1953, in Rosch and Mervis, 1996): for instance, “apple” may be a more prototypical fruit than “tomato.” Rather than assign strict parameters that all members of a class must meet equally, the family resemblance concept allows for a graded structure, in which members of a class are judged as more or less typical depending on how many attributes they share with other members of their class (Barsalou, 1987).
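A graded structure can be illustrated with a simple overlap count: score each member by how many attributes it shares with every other member, then rank members by that score. This is a simplified sketch in the spirit of family resemblance, not Rosch and Mervis’s exact scoring procedure, and the attribute sets in `FRUIT` are hypothetical (real studies elicit attributes from participants).

```python
# Hypothetical attribute sets for members of the category "fruit".
FRUIT = {
    "apple":  {"sweet", "grows_on_trees", "eaten_raw", "has_seeds", "used_in_desserts"},
    "pear":   {"sweet", "grows_on_trees", "eaten_raw", "has_seeds"},
    "cherry": {"sweet", "grows_on_trees", "eaten_raw", "used_in_desserts"},
    "tomato": {"has_seeds", "eaten_raw", "used_in_salads"},
}


def typicality(item, category):
    """Count the attributes `item` shares with each other member, summed over members."""
    attrs = category[item]
    return sum(len(attrs & other) for name, other in category.items() if name != item)


def rank_by_typicality(category):
    """Order members from most to least typical (a graded structure, not a yes/no test)."""
    return sorted(category, key=lambda name: typicality(name, category), reverse=True)


for name in rank_by_typicality(FRUIT):
    print(name, typicality(name, FRUIT))
# apple 10
# pear 9
# cherry 8
# tomato 5
```

Every member still belongs to the category; the tomato is simply a less typical fruit than the apple, which is the distinction a strict hierarchy cannot express.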

Organizing information according to a graded structure appears more closely aligned with how people intuitively identify items. By mentally grouping items, people exhibit a “cognitive economy” (Rosch, 1978), reducing the amount of new information to be processed in favor of drawing inferences from previously encountered concepts (Smith, 1996). The notion that cognition employs a graded structure has been borne out in research as well. Studies of task performance in categorization, memory retrieval, and vocabulary development have all indicated that people are faster at sorting, recalling, and learning when given more typical examples (e.g., an apple for the category “fruit”) than less typical ones (e.g., a tomato for the category “fruit”) [2].

Graded structures & the evolution of information organization

The graded structure of information organization that arose from Wittgenstein’s “family resemblances” concept can be understood as an intermediate adaptation along the road to folksonomies. Graded structures, more flexible than hierarchies but still rule-bound, serve as an evolutionary bridge linking Aristotle’s inflexible hierarchical classification to the fluid practice of collaborative tagging. Because most adaptations arise in response to a changing environment, the following paragraph considers the transformations of the information environment during Wittgenstein’s lifetime that may have contributed to his thinking.

Wittgenstein’s concept of “family resemblances” appeared in his work Philosophical Investigations, published posthumously in 1953 (“Family resemblance”, n.d.). Over the course of Wittgenstein’s lifetime, the information environment underwent a massive expansion, thanks to the birth of air travel, personal automobiles, radio broadcasts, motion pictures, and television broadcasts (“Timeline: History of Communication”, n.d.; “Timeline: History of Transportation”, n.d.). With transportation capable of taking people farther and in less time, more of the world could be seen firsthand. And for those unable to access these transportation options, communication options meant that more of the world could also be seen secondhand, with radio, film, and television broadcasts. The transportation and communication innovations in the first half of the twentieth century dramatically increased the amount of information that the average person might encounter. And as the information environment expanded, it strained the available means to organize it all.



Folksonomies: Scholarly response & empirical evidence

If inventions in the first half of the twentieth century resulted in an information explosion, inventions in the second half resulted in a Big Bang. While the Internet had been in development since the late 1960s, when the U.S. Department of Defense’s Advanced Research Projects Agency began building the ARPANET, the World Wide Web in its modern form only began to take shape around 1990, with Tim Berners–Lee’s invention of the hypertext markup language (HTML) (“History of the Internet”, n.d.). With HTML, everyday people were transformed into content producers. With folksonomies, these same people were transformed into content organizers. As information management began to move out of the hands of information specialists, the scholarly community took notice.

Praise and critical response

When folksonomies first appeared, opinions were mixed. Perhaps unsurprisingly, the earliest scholarly response appeared not in journal publications but in the blogosphere. Some information scientists praised folksonomies for their ability to fill the gaps in information organization that controlled vocabulary specialists could not reach (Shirky, 2005a) and as a means for more personal and intuitive organization structures (Sifry, 2005). In an apt metaphor, David Weinberger described the difference between using controlled vocabularies and folksonomies as the difference between trees and fallen leaves: “The old way creates a tree. The new rakes leaves together.” [3] With so many “leaves” (information units) scattered about the information environment, folksonomies seemed to offer previously unidentified information an unprecedented second chance at life: to be found.

Yet not everyone was convinced that folksonomies would deliver on this promise. Critics feared that the absence of rules for assigning tags would lead to quality problems, including imprecision, overlap, duplication, ambiguity, and erroneous identification (Dotsika, 2007; Guy and Tonkin, 2006). Others doubted that user–generated tags would ever “organically arrive at preferred terms for concepts, or even evolve synonymous clusters” (Rosenfeld, 2005). With no guidelines for tag production, what would prevent all that potentially useful information from being tagged with broadly non–useful identifiers such as “mydog,” “readlater,” or “vacationpics”? While scholars have acknowledged that folksonomies may be beneficial for self–reminders (e.g., “readlater” tags in a personal bookmarking account) or for data that currently has no means of retrieval (e.g., unlabelled photographs), the general sentiment has been one of wait–and–see until empirical studies show how effective folksonomies are at information retrieval tasks.

Empirical research

Folksonomies are generally associated with two Web sites: the bookmark–sharing site Delicious, launched in 2003, and the photo–sharing site Flickr, launched in 2004. Because folksonomies have developed so recently, much of the discussion of their effectiveness centers on observations of these two sites (Noruzi, 2007; Rosenfeld, 2005; Shirky, 2005a). Little empirical research has been published on how folksonomies perform in information retrieval, yet the few preliminary studies that exist show encouraging results.

One study that compared findability on folksonomy–based sites against traditional subject directories and search engines found that while the directories outperformed folksonomies in precision and recall, the folksonomies were a close second (Morrison, 2007). Further, when folksonomies were combined with directories that use controlled vocabularies, precision and recall were higher than in searches using the controlled vocabularies alone.
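Precision and recall are the standard retrieval measures behind comparisons like this one: precision is the share of retrieved items that are relevant, and recall is the share of relevant items that were retrieved. The minimal sketch below computes both for a single query; the relevance judgments and result lists are hypothetical and do not reproduce Morrison’s data.

```python
def precision_recall(retrieved, relevant):
    """Precision = relevant hits / items retrieved; recall = relevant hits / relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical relevance judgments and result lists for a single query.
relevant = {"d1", "d2", "d3", "d4", "d5"}
directory_results = ["d1", "d2", "d3", "x1"]
folksonomy_results = ["d2", "d4", "x2", "x3"]

print(precision_recall(directory_results, relevant))   # (0.75, 0.6)
print(precision_recall(folksonomy_results, relevant))  # (0.5, 0.4)
```

Studies typically average these figures over many queries; a single query is shown here only to make the definitions concrete.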

Another study, on the distribution of tags, suggests that tags conform to power laws: a few tags are used by a large population of users, while the majority of tags are used by only a few users (Mathes, 2004). This clustering of term distribution offers a counter to the argument that users would not “organically” (Rosenfeld, 2005) arrive at preferred terms or synonymous clusters. However, more research, especially on the motivations behind term selection, will be needed to determine whether the clustering is in fact organic or driven by an anchoring effect, in which users are influenced by existing tags when choosing their own.
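One common way to see whether tag usage looks power–law–like is to rank tags by the number of distinct users who applied them and compare count against rank on log–log axes, where a power law appears roughly as a straight line. The sketch below assumes synthetic data (the tag at rank r is used by 1000 // r users), not Mathes’s data or method. A least–squares slope on log–log values is only a quick heuristic; maximum–likelihood fitting is generally preferred for rigorous power–law testing.

```python
import math


def users_per_tag(assignments):
    """Map each tag to the number of distinct users who applied it."""
    users = {}
    for user, tag in assignments:
        users.setdefault(tag, set()).add(user)
    return {tag: len(who) for tag, who in users.items()}


def loglog_slope(counts):
    """Least-squares slope of log(count) against log(rank); counts sorted high to low."""
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


# Synthetic (user, tag) pairs, an assumption for illustration only.
assignments = [(f"user{u}", f"tag{r}") for r in range(1, 51) for u in range(1000 // r)]
counts = sorted(users_per_tag(assignments).values(), reverse=True)
print(counts[:5], counts[-1])  # [1000, 500, 333, 250, 200] 20
print(loglog_slope(counts))    # close to -1 for this Zipf-like data
```

A slope near -1 signals the “few popular tags, many rare tags” shape the text describes; it says nothing, however, about why users converged on the popular tags.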

Some preliminary studies on what motivates people to choose their tags have indicated that community membership may be a contributing factor. In a collection of case studies presented at the ASIS&T 2007 conference, researchers found that people use different tags when sharing content with a community than when identifying content for their own later use (Tonkin, et al., 2008). Additionally, tags may serve as an important liaison in establishing communities of practice, allowing users with similar interests to identify each other (Diederich and Iofciu, 2006). While evidence on folksonomies has yet to appear en masse in scholarly publications, the enthusiasm folksonomies have generated in the scholarly community suggests that more empirical studies will appear soon enough.



Folksonomies in the future

The mainstream vs the long–tail

The limitations and promises of folksonomies are often discussed in terms of practice: how things are tagged, by whom, and with what motivations. These questions will undoubtedly be studied empirically and answered in time. Yet what makes folksonomies so compelling to so many researchers seems to be the idea of them: the sense that small contributions from the masses will help steer the direction the information environment takes, and what kind of entity (structured, unstructured, or a symbiotic synthesis of the two) it ultimately becomes.

The idea that small, disparate actions can effect great change when aggregated is similar to the concept of the “long tail” (Anderson, 2004). The long tail refers to a statistical distribution describing how small, specialized consumer markets, when added together, can rival the economic power of the mainstream. In consumer markets, the mainstream may be platinum records, blockbuster films, and bestselling novels; in the case of folksonomies, the metaphorical mainstream comprises both the traditional methods of information organization and the popular information that inhabits them. For instance, CNN, NBC, and the New York Times have plenty of means besides tags to have their articles found; brand recognition alone ensures a certain degree of traffic. Thus, even without tags, their information is relatively findable. For non–mainstream information, however, such as that generated by individual bloggers, self–assigned tags may be an important means by which their Web sites can be found. Online mainstream media may represent the majority of information that most people seek out, but the non–mainstream (blogs, Web art projects, film clips, photographs, Flash animations, and miscellaneous other information that people upload) makes up the long tail, which in aggregate may far exceed what the mainstream produces. Self–tagging may therefore be the best chance these long–tail contributors have of ensuring their findability.
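The aggregate argument can be checked with a little arithmetic. Assuming a hypothetical Zipf–like popularity curve, in which the item at rank r draws 1/r of the top item’s traffic, the sketch below compares the combined traffic of the 10 most popular items (the “head”) with that of the remaining 9,990 (the “tail”). The curve and the cutoff of 10 are illustrative assumptions, not figures from Anderson.

```python
def head_and_tail(popularity, head_size):
    """Split items into the `head_size` most popular and the rest; return each group's total."""
    ranked = sorted(popularity, reverse=True)
    return sum(ranked[:head_size]), sum(ranked[head_size:])


# Hypothetical Zipf-like popularity: the item at rank r draws 1/r of the top item's traffic.
popularity = [1.0 / r for r in range(1, 10_001)]
head, tail = head_and_tail(popularity, head_size=10)
print(f"head (top 10 items): {head:.2f}")  # head (top 10 items): 2.93
print(f"tail (9,990 items):  {tail:.2f}")  # tail (9,990 items):  6.86
```

Under this assumed curve, no single tail item comes close to a head item, yet together they account for more than twice the head’s traffic.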

Integrating folksonomies with other information organization methods

The mainstream will not disappear, of course. And neither will the traditional means of organizing information. Peter Morville offers an astute observation on the relationship between traditional and innovative organization methods in describing “pace layering”: how both physical and social constructs are built in layers with varying degrees of stability [4]. The foundations are stable and unchanging, but each successive layer is slightly more subject to change, much as the foundation of a house is solid while the things inside it may change from month to month. Folksonomies can be thought of as the outer layer, drawing strength from the underlying traditional organizational systems but flexible enough to adapt to changing environmental conditions.

In the early blog debates about the merits of folksonomies, one of their most outspoken proponents, Clay Shirky, observed: “It doesn’t matter whether we ‘accept’ folksonomies, because we’re not going to be given that choice.” [5] The question, then, is not whether folksonomies can match the quality of controlled vocabularies, but how to guide folksonomies toward the quality standards associated with controlled vocabularies. Researchers are beginning to address precisely this question and are generating innovative possibilities in response.

One of the main problems with tags in folksonomies is the absence of context. Because there is no associated thesaurus, typing the tag “mom” into a folksonomy–based search system would not retrieve items tagged “mother,” “mum,” or “ma.” To resolve this, associating folksonomies with thesauri has been proposed as one possible improvement (Noruzi, 2007). Linking folksonomies with ontologies (formalized sets of concepts, together with their relationships and attributes, within a domain) has also been proposed as a means of contextualizing folksonomy tags. Suggested methods include deriving ontologies from statistical analysis of term usage and social networks (Van Damme, et al., 2007), and offering terms from the ontology as preferred terms to users when they type new tags into a system (Herzog, et al., 2007).
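The “mom” problem and the thesaurus remedy can be illustrated with a small sketch. It assumes a hypothetical one–entry thesaurus (`THESAURUS`) mapping a preferred term to its variants, and hypothetical helper names; it is not the system proposed by Noruzi or by Herzog et al. `search` shows query expansion, and `preferred_term` shows how a system might suggest the preferred term as a user types a variant.

```python
# Hypothetical thesaurus entry: preferred term -> variant terms users might type.
THESAURUS = {"mother": {"mom", "mum", "ma"}}


def preferred_term(tag):
    """Return the preferred term for a tag, or the tag itself if the thesaurus lacks it."""
    tag = tag.lower()
    for preferred, variants in THESAURUS.items():
        if tag == preferred or tag in variants:
            return preferred
    return tag


def expand(tag):
    """All terms that should match a query tag: the preferred term plus its variants."""
    preferred = preferred_term(tag)
    return {preferred} | THESAURUS.get(preferred, set())


def search(items, query, use_thesaurus=True):
    """Return ids of items whose tags match the query, optionally via thesaurus expansion."""
    terms = expand(query) if use_thesaurus else {query.lower()}
    return [item for item, tags in items.items() if terms & {t.lower() for t in tags}]


photos = {
    "p1": {"Mom", "beach"},
    "p2": {"mother", "birthday"},
    "p3": {"dog"},
    "p4": {"mum", "garden"},
}
print(search(photos, "mom", use_thesaurus=False))  # ['p1']
print(search(photos, "mom"))                       # ['p1', 'p2', 'p4']
print(preferred_term("Mum"))                       # mother
```

Expansion leaves users free to tag however they like; the controlled vocabulary does its work at query time, which is one way the two approaches might be layered.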

Others have suggested that tags are indicators of “natural language,” so that information retrieval methods from natural language processing can be applied to them (Peters and Stock, 2007). Educating users in “tag literacy” (Guy and Tonkin, 2006), combined with devising systems that offer tag recommendations, has also been proposed as a way to improve tag quality and provide a form of “tag training.” Whatever the specific method, a common theme appears to be that both automation and user training are necessary to improve folksonomies as both an organizational structure and a retrieval mechanism. Fortunately, as user enthusiasm for tagging only appears to be growing, there will likely be plenty of test beds available to try out combinations of possible improvements.
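One of the simplest forms of tag recommendation suggests the tags that other users most often applied to the same resource. The minimal sketch below assumes hypothetical `(resource, tag)` pairs and a hypothetical `exclude` list for personal tags such as “readlater”; it is an illustration of popularity–based suggestion, not a method from the studies cited above.

```python
from collections import Counter


def recommend_tags(resource, assignments, k=3, exclude=frozenset()):
    """Suggest the k tags other users most often applied to `resource`.

    `exclude` lets a system drop personal tags (e.g. "readlater") that help
    one user but say little about the resource itself.
    """
    counts = Counter(
        tag.lower()
        for res, tag in assignments
        if res == resource and tag.lower() not in exclude
    )
    return [tag for tag, _ in counts.most_common(k)]


# Hypothetical (resource, tag) assignments collected from many users.
assignments = [
    ("page1", "python"), ("page1", "Python"), ("page1", "tutorial"),
    ("page1", "python"), ("page1", "tutorial"), ("page1", "readlater"),
    ("page2", "cooking"),
]
print(recommend_tags("page1", assignments))                         # ['python', 'tutorial', 'readlater']
print(recommend_tags("page1", assignments, exclude={"readlater"}))  # ['python', 'tutorial']
```

Because suggestions reinforce already–popular tags, a recommender like this could itself produce the anchoring effect discussed earlier, which is worth keeping in mind when interpreting tag distributions.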




Evolution is generally thought of as a slow process, with changes appearing in populations only after many generations have passed. Yet evolution also has periods of rapid development, its equilibrium punctuated by short bursts of change (“Punctuated equilibrium”, n.d.). These rapid developments occur not in the mainstream of a population, but in populations separated from the mainstream, far enough away that the environment poses unique challenges for survival. It is in these non–mainstream populations that the most dramatic adaptations occur, often resulting in entirely new species.

Folksonomies may be flawed, but they are, at present, the best means known to track what is happening with the non–mainstream of the information environment. If the greatest evolutionary changes in the biological environment — the birth of new species — occur not at the center but in the long tail, what great new transformations may be occurring in the long tail of the information environment? Tagging provides this outlying information, published far from the mainstream, a chance to be found, to be considered useful, and ultimately, to survive.

This paper has offered one possible interpretation of what folksonomies represent in the larger information environment: they are at once a radical new adaptation and the logical successor of Aristotle’s hierarchies. Tags may be criticized as imprecise, redundant, and personal, and therefore difficult to retrieve with today’s information retrieval mechanisms. Yet given the current rate of technological change and the exponential change expected in the future (Kurzweil, 2001), today’s retrieval mechanisms will likely become extinct before long as well, replaced by some new adaptation capable of meeting the challenge of folksonomies.


About the author

Alexis Wichowski is a doctoral candidate in Information Science at the College of Computing and Information at the University at Albany, State University of New York. Her research interests focus on online content and communication, incidental information encountering, and evolving structures in information organization.

Acknowledgements

The author wishes to thank her chair, Jennifer Stromer–Galley, for her guidance, and Deborah Andersen and Mihye Seo for their feedback and support.

Notes

1. Morville, 2005, p. 46.

2. Cherniak, 1984; Rosch, 1978, in Smith, 1996, p. 504.

3. Weinberger, quoted in Morville, 2005, p. 139.

4. Morville, 2005, pp. 139–140.

5. Shirky, 2005b, italics in original.

References

C. Anderson, 2004. “The long tail,” Wired, volume 12, number 10, pp. 170–177.

L.W. Barsalou, 1987. “The instability of graded structure: Implications for the nature of concepts,” In: U. Neisser (editor). Concepts and conceptual development: Ecological and intellectual factors in categorization. New York: Cambridge University Press, pp. 101–140.

C. Cherniak, 1984. “Prototypicality and deductive reasoning,” Journal of Verbal Learning and Verbal Behavior, volume 23, pp. 625–642.

Delicious, n.d. “About,” at, accessed 8 December 2008.

J. Diederich and T. Iofciu, 2006. “Finding communities of practice from user profiles based on folksonomies,” Proceedings of the 1st International Workshop on Building Technology Enhanced Learning Solutions for Communities of Practice (TEL–CoPs ’06), co–located with the First European Conference on Technology–Enhanced Learning, Crete, Greece.

F. Dotsika, 2007. “Quality issues in Web information and knowledge management,” Proceedings of the 4th International Conference on Intellectual Capital, South Africa.

Environment, n.d. WordNet search 3.0, at, accessed 8 October 2008.

Family resemblance, n.d. Wikipedia, at, accessed 8 November 2008.

M. Guy and E. Tonkin, 2006. “Folksonomies: Tidying up tags?” D–Lib Magazine, volume 12, number 1, at, accessed 18 April 2009.

C. Herzog, M. Luger, and M. Herzog, 2007. “Combining social and semantic metadata for search in a document repository,” Bridging the Gap between Semantic Web and Web 2.0, Proceedings of the European Semantic Web Conference Workshop, pp. 14–21.

History of the Internet, n.d. Computer History Museum, at, accessed 8 August 2009.

R. Kurzweil, 2001. “The law of accelerating returns,” (7 March), at, accessed 8 October 2008.

G. Macgregor and E. McCulloch, 2006. “Collaborative tagging as a knowledge organisation and resource discovery tool,” Library Review, volume 55, number 5, pp. 291–300.

A. Mathes, 2004. “Folksonomies — Cooperative classification and communication through shared metadata,” Computer–Mediated Communication, LIS5900CMC (Doctoral seminar), Graduate School of Library and Information Science. University of Illinois Urbana–Champaign.

Moore’s law, n.d. Wikipedia, at, accessed 8 October 2008.

P.J. Morrison, 2007. “Tagging and searching: Search retrieval effectiveness of folksonomies on the Web,” Unpublished master’s thesis, Kent State University.

P. Morville, 2005. Ambient findability: What we find changes who we become. Sebastopol, Calif.: O’Reilly.

Netcraft, 2008. “July 2008 Web server survey,” at, accessed 7 August 2008.

A. Noruzi, 2007. “Folksonomies: Why do we need controlled vocabulary?” Webology, volume 4, number 2, at, accessed 18 April 2009.

I. Peters and W.G. Stock, 2007. “Folksonomy and information retrieval,” Proceedings of the 70th Annual Meeting of the American Society for Information Science and Technology, volume 44, pp. 1510–1542.

Punctuated equilibrium, n.d. PBS library: Evolution, at, accessed 11 August 2008.

rands, 2004. “A del.icio.us interview,” Rands in Repose, at, accessed 10 August 2008.

E. Rosch, 1978. “Principles of categorization,” In: E. Rosch and B.B. Lloyd (editors). Cognition and categorization. Hillsdale, N.J.: L. Erlbaum Associates.

E. Rosch and C.B. Mervis, 1996. “Family resemblances: Studies in the internal structure of categories,” In: H. Geirsson and M. Losonsky (editors). Readings in language and mind, Cambridge, Mass.: Blackwell, pp. 442–446.

L. Rosenfeld, 2005. “Folksonomies? How about metadata ecologies?” (6 January), at, accessed 8 August 2008.

C. Shirky, 2005a. “folksonomies + controlled vocabularies,” Many 2 Many: A group weblog on social software (7 January), at, accessed 9 August 2008.

C. Shirky, 2005b. “Folksonomies are a forced move: a response to liz,” Many 2 Many: A group weblog on social software (22 January), at, accessed 9 August 2008.

D. Sifry, 2005. “Technorati launches tags,” Sifry’s Alerts: David Sifry’s musings (17 January), at, accessed 7 August 2008.

E.E. Smith, 1996, c. 1989. “Concepts and induction,” In: M.I. Posner (editor). Foundations of cognitive science. Cambridge, Mass.: MIT Press, pp. 501–526.

Timeline: History of Communication, n.d. Inventors, at, accessed 7 August 2008.

Timeline: History of Transportation, n.d. Inventors, at, accessed 7 August 2008.

E. Tonkin, E.M. Corrado, H.L. Moulaison, M.E.I. Kipp, A. Resmini, H.D. Pfeiffer, and Q. Zhang, 2008. “Collaborative and social tagging networks,” Ariadne, issue 54, at, accessed 18 April 2009.

C. Van Damme, M. Hepp, and K. Siorpaes, 2007. “FolksOntology: An integrated approach for turning folksonomies into ontologies,” Bridging the Gap between Semantic Web and Web 2.0, Proceedings of the European Semantic Web Conference Workshop, pp. 57–70.

T. Vander Wal, 2007. “Folksonomy coinage and definition,” at, accessed 10 August 2009.

R.S. Wurman, 1989. Information anxiety. New York: Doubleday.


Editorial history

Paper received 27 February 2009; accepted 31 March 2009.

Copyright © 2009, First Monday.

Copyright © 2009, Alexis Wichowski.

Survival of the fittest tag: Folksonomies, findability, and the evolution of information organization
by Alexis Wichowski
First Monday, Volume 14, Number 5 - 4 May 2009