Using social bookmarks and tags as alternative indicators of journal content description
First Monday

Using social bookmarks and tags as alternative indicators of journal content description by Stefanie Haustein and Isabella Peters

Qualitative journal evaluation cumulates content descriptions of single articles. Articles are either represented by author–generated keywords, professionally indexed subject headings, automatically extracted terms or, as recently introduced, by reader–generated tags as used in social bookmarking systems. The study presented here shows that different types of keywords each reflect a different perspective on documents and that tags can be used in journal evaluation to represent a reader–specific view. After providing a broad theoretical background and literature review, methods for extensive automatic term cleaning and calculation of term overlaps are introduced. The efficiency of tags and other metadata for journal content description is illustrated for one particular journal.


Multidimensional journal evaluation
Folksonomies and social tagging
Related studies: Comparison of tags and other terms used for indexing
Motivation and approach of study




Imagine you were a librarian and could support your subscription decisions using readers’ real–time usage data. Imagine you were an author searching for the most suitable journal in which to publish your paper by determining the size and geographical distribution of the journal’s current readership before submission. Imagine you were a researcher who could monitor the response to his or her latest paper in real time in terms of how many people read it and what they thought about it.

Up to now, the traditional way of evaluating a journal’s impact and perception in the scientific community has been citation analysis, which captures impact only among those readers who publish themselves, and only after the delay needed for citing papers to appear. Citation–based indicators such as the impact factor (Garfield, 1972) are often used as the sole basis for deciding whether libraries subscribe to a journal, whether authors submit a research paper, or even whether a researcher publishing in those journals is an appropriate candidate for a certain position. As Haustein (2012) pointed out, this approach disregards various other aspects contributing to a journal’s standing (Juchem, et al., 2006; Schlögl and Stock, 2004; Grazia Colonia, 2002). Only a multidimensional approach can analyze journal impact sufficiently. In this paper, we focus on various aspects of the dimension of journal content and compare crowdsourced alternatives to traditional indexing methods. A substantial review of the literature on crowdsourced subject indexing is provided.

Alternative and multidimensional metrics for journal evaluation can be summarized under the concepts of “scientometrics 2.0” (Priem and Hemminger, 2010) or “altmetrics” and reflect the vision that “with altmetrics, we can crowdsource peer–review” (Priem, et al., 2010). Altmetrics aims at using Web data (e.g., tweets, bookmarks, blog posts) and Web tools (e.g., social networks, social bookmarking systems and reference managers) to fully understand the characteristics of scholarly communication on the Web. Moreover, altmetrics credits scholarly activities (e.g., discussing or linking to journal articles) carried out on the Web which are not yet acknowledged by traditional metrics of scientific impact (such as citation indicators). Altmetrics aims to complement existing impact metrics rather than replace them. Besides translating statistical analyses into new media and tools, altmetrics wants to make use of quantitative and qualitative data (such as user–generated tags or hashtags) to holistically evaluate the article’s, journal’s, or author’s impact, where impact implies not only being cited but also being read and talked about.

Haustein, et al. (2010) introduced the analysis of usage data from social bookmarking platforms to measure journal perception. Similar projects apply social bookmarks to measure the scholarly impact of authors based on the number of times their documents are bookmarked. Haustein and Siebenlist (2011) focus on the evaluation of journal readership against the background of global download statistics, which are often inaccessible (Gorraiz and Gumpenberger, 2010; Schlögl and Gorraiz, 2010), by evaluating the usage of physics journals on social bookmarking platforms. Tags assigned by users to the bookmarked publications are evaluated and cumulated on the journal level, reflecting a reader–specific view of journal content (Haustein and Siebenlist, 2011).

This paper follows up the assumptions made by Haustein, et al. (2011) and Peters, et al. (2011) and investigates readers’ tagging behavior in social bookmarking systems in greater detail. To discover differences from or similarities to common indexing methods used by libraries or database providers, we compare tags (i.e., user–generated keywords) with author–generated title and abstract terms as well as author keywords, indexer–generated Inspec subject headings and automatically generated KeyWords Plus from Web of Science (WoS). Data is cleaned extensively and similarities are computed at the level of single documents to obtain exact results about similarities and differences between the different indexing perspectives. We base our analysis on 724 documents published in 24 physics journals and show how authors, readers and indexers describe the journals’ content. The main contribution of this paper is to present the results of the cleaning methods for the number and comparison of tags and other metadata, as well as the efficiency of tags for describing journal content. A detailed metadata–based content analysis is used as an illustration.



Multidimensional journal evaluation

Multidimensional journal evaluation is based on a conceptual definition of five dimensions which make up and influence scholarly periodicals. It is argued that journal impact is influenced by many different factors and that hence journal evaluation should be multifaceted as well (Glänzel and Moed, 2002; Rousseau, 2002; Coleman, 2007). A single citation metric such as the impact factor is not able to fully cover the standing of a journal in the scholarly community. Therefore, methods from all dimensions of journal evaluation should be considered in order to appropriately reflect the scholarly impact of a journal. The concept of multidimensional journal evaluation is based on work by Grazia Colonia (2002), Stock (2004), Schlögl (2004), Schlögl and Stock (2004), Schlögl and Petschnig (2005) and Juchem, et al. (2006) and taken up by Haustein (2011) and Haustein (2012). A schematic representation of the multidimensional approach used in this study is shown in Figure 1.


Figure 1: Schematic representation of the five dimensions of journal evaluation (Haustein, 2012).


This study focuses on different aspects of journal content. The journal content dimension represents the scope of a periodical and the topics covered in it. Journals can cover a broad field of science and publish more general topics or focus on a specific area of research. Content analysis can reveal the specific topics and help readers find the most suitable source of information to satisfy specific information needs. Authors discover whether the journal is an appropriate publication venue read by the target audience they are seeking.

Various methods of subject indexing can help to analyze and depict journal content. Conventional methods used by database providers and libraries include professional subject indexing with the help of classification systems and thesauri, author keywords or automatic term extraction from titles, abstracts or full text.

We focus on the analysis and comparison of conventional methods of content analysis with social tagging as a crowdsourced alternative reflecting the readers’ perspective on scholarly content.



Folksonomies and social tagging

Collaborative information services, such as Delicious, YouTube, Flickr or LibraryThing, are constitutive parts of Web 2.0 and make a significant contribution to its success. They allow users to save and publish Web documents on the Internet and share them with other users of the service (Peters, 2009). Social bookmarking systems specializing in scholarly documents allow bibliographic metadata of scholarly literature to be saved and serve as Web reference managers (Reher and Haustein, 2010).

On such Web 2.0 platforms, users can annotate documents with user–generated keywords, so–called tags. Users do not need to follow any indexing rules or guidelines; they freely choose terms or adopt other users’ tags for indexing and searching. This is why indexing with tags is called “social tagging”. The collection of all the tags of an information service is called a “folksonomy” (Vander Wal, 2005), while the set of all the tags from all the users for a particular document is termed a “docsonomy” (Terliesner and Peters, 2011). All the tags from a certain user form their “personomy”. In formal terms, a folksonomy is a tuple F := (U, T, R, Y), where U, T and R are the sets of user names, tags and resources, and Y constitutes the relation between the elements (Hotho, et al., 2006). Similarly, we can define a docsonomy as DF := (T, R, Z) with Z ⊆ T x R and a personomy as PUT := (U, T, X) with X ⊆ U x T (Heck, et al., 2011). If we wish to analyze and evaluate a journal’s content, we need a “joursonomy”, which collects all the tags from the journal’s docsonomies.
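The tuple definitions above can be illustrated as simple data structures. The following is a minimal Python sketch; the user names, tags and resource identifiers are invented sample data, not taken from the study’s dataset:

```python
# Y ⊆ U x T x R: each triple records one tag assignment (user, tag, resource).
# Sample data is invented for illustration only.
Y = {
    ("alice", "network", "doi:10.1088/example1"),
    ("alice", "communitydetect", "doi:10.1088/example1"),
    ("bob", "network", "doi:10.1088/example1"),
    ("bob", "modular", "doi:10.1000/example2"),
}

def docsonomy(assignments, resource):
    """All (tag, resource) pairs for one document (relation Z ⊆ T x R)."""
    return {(t, r) for (u, t, r) in assignments if r == resource}

def personomy(assignments, user):
    """All (user, tag) pairs of one user (relation X ⊆ U x T)."""
    return {(u, t) for (u, t, r) in assignments if u == user}

def joursonomy(assignments, journal_resources):
    """All tags from the docsonomies of the documents published in one journal."""
    return {t for (u, t, r) in assignments if r in journal_resources}
```

A joursonomy over both sample documents collects every tag assigned to either of them, regardless of which user assigned it.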

With folksonomies a new dimension of indexing comes into play, which was not available until the emergence of collaborative information services: the users’ view. User–generated tags offer one more layer of metadata to receive the users’ opinions on the document and to evaluate a journal’s or article’s content. As depicted in Figure 2, for content analysis three different types of actors can now be distinguished on the indexer’s side: authors, intermediaries (i.e., professional indexers) and users (Peters and Stock, 2007; Mathes, 2004; Kipp, 2006; Kipp, 2005; Stock, 2007).

These agents take different indexing paths and presumably highlight different characteristics of the documents. Authors appear in indexing when they either index their own information resource or when the document is indexed via text–oriented methods of knowledge representation (e.g., citation indexing or full–text indexing; Garfield, 1979). Automatic indexing via terms extracted from the information resource can also be included among these methods, since it uses the author’s own specific terms. In contrast to these methods, folksonomies consider the users’ language (Sinclair and Cardew–Hall, 2008). If the information resource is indexed using knowledge organization systems (e.g., nomenclatures, thesauri or classification systems), interpreters or intermediaries are needed to translate their objective perception of the document into the language of the knowledge organization system (Aitchison, et al., 2000).


Figure 2: Three views of a document during indexing (Peters and Stock, 2009).


We assume that tags and other metadata terms will differ significantly, meaning that authors, intermediaries and users perceive articles’ content differently. Folksonomies, and especially docsonomies, incorporate a new perspective on content, which could be exploited in content analysis for informetric purposes and also for information retrieval. Tags provide a direct channel to the users’ opinions and perceptions of scientific publications, adding a qualitative dimension to informetrics in general and journal evaluation in particular.



Related studies: Comparison of tags and other terms used for indexing

Lin, et al. (2006) compared tags from Connotea, title terms and descriptors from Medical Subject Headings (MeSH). The comparison of the three keyword sets showed that only 11 percent (59 tags from 540 in total) of tags match the MeSH terms. In her analysis of CiteULike resources (indexed with user tags, author keywords and thesaurus terms), Kipp (2006c) (see also Kipp, 2007; Peterson, 2009) demonstrated that users may use descriptions similar to thesauri for indexing purposes, but that there are very few 100 percent matches between tags and descriptors. Kipp (2011) compared CiteULike tags, author keywords and MeSH from two biomedical journals and found that readers as well as authors use terms which are thesaurus–like but are not the exact thesaurus subject headings. Good, et al. (2009) also compared Connotea and CiteULike tags with MeSH but focused on the document level. Overall matches were low but increased when tags and subject headings were syntactically standardized.

Moreover, Lin, et al. (2006) found that tags are often redundant and do not provide any further information (see also Lux, et al., 2007; Jeong, 2008; Jeong, 2009; Heckner, et al., 2008; Syn and Spring, 2009): 19 percent (102/540) of tags matched the automatically extracted title terms. Heymann, et al. (2008) (see also Heckner, et al., 2008, for an analysis of Connotea with similar results) arrived at the same conclusion in their analysis of the social bookmarking system Delicious and reported that tags can be found in 50 percent of the Web pages they index and in 16 percent of the Web page titles (see also Al–Khalifa and Davis, 2006; 2007).

Noll and Meinel (2007) carried out a quantitative analysis of Delicious tags, Web site content (metatags and text in the body section) and Web site PageRank. They found a probability of 36.2 percent that a bookmarked document has at least one tag [1]. To obtain improved matching values, the authors recommended preprocessing of tags and words in all fields. Splitting tags and words at whitespaces and special characters, as well as case–insensitive matching, led to 12 percent better matching values.

Golub, et al. (2009; see also Matthews, et al., 2010) studied the overlap of tags with terms from Web site titles, URLs and descriptions. Students searched for Web sites and added tags or terms from the Dewey Decimal Classification (DDC). The analysis showed that one third of the tags were new, i.e., had not been used before in the title, the URL or the description. Tags from the DDC appear in the title in 12 percent of cases, in the URL in 1.4 percent and in the description in 18 percent.

Lu, et al. (2010) compared tags from LibraryThing with the expert–assigned subject terms of the Library of Congress Subject Headings (LCSH) on both the folksonomy and the docsonomy level (see also Bates and Rowley, 2011; Rolla, 2009; Heymann and Garcia–Molina, 2009; Thomas, et al., 2009). For the folksonomy, they discovered a probability of 50 percent for a subject heading being used as a tag. The overlap of frequently used terms from users and from experts was very low. Thomas, et al. (2009) arrived at different results and found only 14 percent of exact matches between LCSH and tags. A different observation can be made for docsonomies. The comparison of tags and subject headings with the book title terms showed that in 60.1 percent of the cases at least one tag corresponds to one of the title terms while this is true for only 17.9 percent of subject headings (Lu, et al., 2010; see also Syn and Spring, 2009, for similar results). Yi and Chan (2009) also investigated the overlap between LCSH and Delicious tags and showed that 60.9 percent of tags can be found in the subject headings when comparing terms character by character.

Comparable analyses with similar results have been carried out for other folksonomies as well: Jeong (2009) studied YouTube, Bischoff, et al. (2008) investigated Last.fm, Rorissa (2010) compared Flickr tags with a controlled vocabulary from the University of St. Andrews Library Photographic Archive, Lawson (2009) researched Amazon, Kakali and Papatheodorou (2010) found matches between LCSH and a social OPAC, and Heymann, et al. (2010) considered Goodreads, the DDC and the Library of Congress Classification.

Bar–Ilan, et al. (2010) compared folksonomies developed by users with different kinds of background information (i.e., no additional information, resource title and accompanying Web site) resulting in high tag matches between the user groups. This indicated that in folksonomies consensus emerges independently of users’ background information and tags reflect the perception of resources by users.

Almost all studies conducted in this research area conclude that folksonomies are advantageous compared to all kinds of controlled vocabulary when it comes to the representation of named entities (i.e., personal, corporate, or geographical) and specialized (e.g., technical) terms. Such terms are frequently represented in tag collections but rarely in subject headings (amongst others Lawson, 2009; Stvilia and Jörgensen, 2010). What is more, a greater diversity of tags on the docsonomy level helps to enhance and complement controlled vocabularies, such as LCSH (Thomas, et al., 2009). The greater the number of different tags (and also author and intermediary keywords) that are assigned, the better the content of an article is characterized (Syn and Spring, 2009). Both aspects support the reader in judging the relevance of the article’s — and when cumulated the journal’s — content.



Motivation and approach of study

The literature review arrives at more or less the same results: exact overlaps of tags and professionally created metadata are rare; most matches are found when comparing tags and title terms (Syn and Spring, 2009; Noll and Meinel, 2007). Lin, et al. (2006) suspect that this low overlap of tags and professional metadata is due to the different goals of the indexing methods. Professional indexers are required to index and cover all the topics of a document by using controlled vocabularies. Users seem to concentrate on the subject they are most interested in rather than try to represent the document completely. These findings confirm our hypothesis that users perceive article and journal content differently from intermediaries and that tags provide a basis for content description and qualitative journal evaluation.

In the analyses mentioned earlier, the researchers often do not differentiate between the collection of tags for all the documents of a Web 2.0 service, the folksonomy, and the docsonomy (except for Lu, et al., 2010; Good, et al., 2009; Iyer and Bungo, 2011). Iyer and Bungo (2011) address the advantage of comparisons on docsonomy levels: “Examining the subject headings and tags assigned to a single book provides context to the analysis, rather than using an aggregation of tags and subject headings assigned to a large, diverse sample of items.” Our analysis follows this approach and carries out an article–based comparison.

Moreover, preprocessing of tags and other metadata terms is regularly missing or not explicitly reported in the papers. Calculating overlaps and matches with unprocessed terms and tags, and relying simply on one–to–one matches at the string level, can lead to erroneous values and to invalid conclusions about the nature of folksonomies and metadata. Because of the uncontrolled nature of tags, and given technical restrictions in some tagging systems (e.g., only allowing one–word tags for indexing), a great variety of compound terms (e.g., informationretrieval, information_retrieval, information–retrieval) as well as spelling variations (e.g., American English vs. British English) occur in docsonomies. Yi and Chan (2009) found in their analysis of 388 Delicious tags that 11 percent were compound terms, 12 percent singular–plural variants and seven percent different grammatical forms.

On the other hand, these term variations cannot occur in the controlled metadata terms, because professionally created index terms are subject to indexing guidelines. Only terms extracted from articles’ titles and abstracts and author keywords may differ from the indexing guidelines because of the journals’ particular prescribed terminology. Therefore, we follow the recommendations of Noll and Meinel (2007), Stvilia and Jörgensen (2010), Heymann and Garcia–Molina (2009), Yi and Chan (2009), and Good, et al. (2009) and preprocess tags and other metadata as described in the “Methods” section to obtain a homogeneous term basis for comparisons.




The data for this analysis is based on a previous study by Haustein, et al. (2010) and Haustein and Siebenlist (2011), which examined the application of social bookmarking data to journal evaluation. For 10,280 documents published in 45 physics journals between 2004 and 2008, bookmarks were retrieved from CiteULike, Connotea and BibSonomy. As a check for duplicates revealed a low overlap of user names (i.e., only 2.8 percent of user names appeared on more than one platform), users are believed to choose a single service to manage their scholarly literature. Therefore, user and bookmarking data from CiteULike, Connotea and BibSonomy were combined. The dataset contains 13,608 bookmarks by 2,441 users applied to 10,280 documents published in 45 journals between 2004 and 2008 (Haustein and Siebenlist, 2011). Since the study presented in this paper aims at comparing the description of content by readers (i.e., users of social bookmarking platforms), authors, intermediaries and automatic indexing, the initial dataset was limited to a subset of 724 documents for which all of this data was available.

The readers’ perspective on the 724 documents is covered by tags assigned by users of CiteULike, Connotea and BibSonomy. The intermediaries’ perspective is represented through index terms from the Inspec database. Inspec provides controlled thesaurus terms and uncontrolled subject headings, which are intellectually chosen and assigned by information professionals. The authors’ point of view is represented by keywords, which are provided by the authors in the publication, as well as by terms extracted directly from document titles and abstracts. Automatic index terms are represented by WoS KeyWords Plus, consisting of words extracted from the titles of a publication’s references (Garfield and Sher, 1993). For each of the 724 publications, the required information was retrieved from the particular sources. In order to compare differences and similarities of reader, author, intermediary and automatically indexed terms on the document level, each term is connected to its publication via the DOI (compare Figure 3). Missing DOIs were completed manually.


Figure 3: Schematic representation of data acquisition.





In contrast to previous studies (amongst others, Lin, et al., 2006; Kipp, 2006b; Kipp, 2007), which mainly compare tags to author keywords or indexing terms for a set of documents or even on the database level, we aim to analyze the similarity of tags and titles, abstracts, author keywords, indexer–generated and automatically generated terms for each document separately. For each publication, the similarity between tags and Inspec terms, KeyWords Plus, author keywords, title and abstract terms is computed applying a cosine measure. Additionally, the percentages of overlap from the tag and the particular metadata perspectives are given. While others calculated one value across the whole database, we compute the more exact mean of means over the 724 documents. This is done to avoid mismatches between tags and metadata of different documents, as we believe the term overlap between a docsonomy and the respective metadata is more valuable than the overlap between the folksonomy and the entire metadata collection (Good, et al., 2009). In other words, in order to adequately compare similarities and differences between the author, indexer and reader perspectives, terms should be compared that have been assigned to one and the same document by the different actors.

Preprocessing and cleaning

Due to the uncontrolled nature of tags and the different spelling variants of terms in titles, abstracts and keywords, data cleaning and transformation has to be applied. Cleaning tags in order to yield a linguistically homogeneous tag collection corresponds to the concept of “weeding”, one of four tag gardening strategies aiming to enhance the expressiveness and quality of folksonomies (Peters and Weller, 2008). Initially, all special characters (except hyphens and underscores) are deleted and all letters are converted to lower case. Stop words are removed from article titles and abstracts using a list of 571 stop words compiled by Salton and Buckley (n.d.) for the SMART project, complemented by a list of dataset–specific terms (e.g., imported, fileimport081104). Following Noll and Meinel (2007), in order to compare tags with titles and abstracts, tags were split at the separating character (i.e., hyphen or underscore) to enable single words to be matched against title and abstract terms.

When comparing tags to author keywords, subject headings and KeyWords Plus, it is necessary to delete hyphens and underscores within tags and blanks within keywords in order to unify different spellings such as complex_network, complex–network, complexnetwork and complex network. Here, we assume that author keywords, subject headings and KeyWords Plus are (partly) controlled and therefore work on the concept level instead of the word level. Multi–word subject headings refer to one concept and should form a single term after processing.

Since tags especially can appear in multiple forms, we take preprocessing one step further and unify variants as far as automatically possible. Fortunately, the great majority of physics publications have been tagged in English, so the marginal number of non–English terms can be ignored. British English (BE) suffixes are transformed into American English (AE) by applying a rule–based algorithm; the tag synchronisation, for example, is thus unified with the AE spelling synchronization. A manual check enables the algorithm to be reversed in six cases where the rule–based approach fails when changing –our to –or, –ogue to –og and –tre to –ter (four, hour, our, homologue, Lemaitre). Additionally, all terms are stemmed using the Porter 2 stemming algorithm, based on the original algorithm by Porter (1980). This unifies tags such as network, networks and networking. To allow for a string–based comparison, the same cleaning methods are applied to the other terms (title, abstract, author keywords, Inspec subject headings and KeyWords Plus) [2].
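The cleaning steps described above can be sketched in a few lines of Python. This is only a rough illustration: the BE–AE suffix rules and the miniature stemmer below are crude stand–ins for the study’s rule–based algorithm and the Porter 2 stemmer, and the function names are our own:

```python
import re

def _stem(word):
    # Toy stemmer (stand-in for Porter 2): unifies variants such as
    # network / networks / networking.
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def _americanize(word):
    # Simplified BE -> AE suffix rules (e.g., synchronisation -> synchronization);
    # the study's reversal list for cases like "hour" is omitted here.
    word = re.sub(r"isation$", "ization", word)
    word = re.sub(r"ise$", "ize", word)
    return word

def clean_tag(tag, split=False):
    """Lower-case, delete special characters, then either split compounds at
    hyphens/underscores (for title/abstract comparison) or merge them (for
    keyword/subject-heading comparison); returns a list of cleaned terms."""
    tag = tag.lower()
    tag = re.sub(r"[^a-z0-9_\-]", "", tag)
    parts = re.split(r"[-_]", tag) if split else [tag.replace("-", "").replace("_", "")]
    return [_stem(_americanize(p)) for p in parts if p]
```

For example, clean_tag("complex_network") and clean_tag("complex-networks") both yield ["complexnetwork"], while clean_tag("complex-networks", split=True) yields the single words needed for title and abstract matching.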

Term comparison

The number of cleaned unique tags, author keywords, KeyWords Plus, Inspec, title and abstract terms has to be determined for each of the 724 journal articles before the overlap between the different entities can be computed. The overlap counts the number of exact character strings which appear both in the reader data and in the author, intermediary, automatic indexing, title or abstract data, respectively. In contrast to Good, et al. (2009), who use the harmonic mean, we calculate the arithmetic mean of the similarity values of the 724 documents (see Table 2). First, the percentage of overlap is computed relative to the total number of unique tags per document as well as to the number of the particular metadata terms, to detect the share of common terms from each perspective. The overlap–tag ratio gives the percentage of overlapping tags relative to all unique tags assigned to the particular document and is defined as

(1) overlap–tag ratio = (g / a) × 100%

where a stands for the number of unique tags per document and g represents the overlap between tags and terms (author keywords, Inspec headings, KeyWords Plus, title or abstract terms, respectively) per document.

The overlap–analyzed term ratio calculates the same overlap from the other perspective.

(2) overlap–analyzed term ratio = (g / b) × 100%

where b stands for the number of unique terms per document and g represents the overlap between both sets per document.

To combine the two measurements, the similarity between the readers’ point of view and the author, intermediary and automatic indexing perspectives is calculated as the cosine.

(3) cosine similarity = g / √(a × b)

where a stands for the number of unique tags per document, b for the number of unique terms and g represents the overlap between tags and terms per document. If a publication is tagged by its readers with exactly the same terms the author used to describe it, the similarity of author and reader indexing is 1; it is 0 if the two sets of terms are completely different.
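Equations (1)–(3) can be computed per document in a few lines; the following Python sketch (function and variable names are ours) makes the definitions concrete:

```python
from math import sqrt

def similarity_measures(tags, terms):
    """Per-document measures of equations (1)-(3): a = number of unique tags,
    b = number of unique metadata terms, g = exact string overlap."""
    a, b = set(tags), set(terms)
    g = len(a & b)
    overlap_tag_ratio = 100.0 * g / len(a) if a else 0.0    # equation (1)
    overlap_term_ratio = 100.0 * g / len(b) if b else 0.0   # equation (2)
    cosine = g / sqrt(len(a) * len(b)) if a and b else 0.0  # equation (3)
    return overlap_tag_ratio, overlap_term_ratio, cosine
```

Identical tag and term sets yield a cosine of 1 and disjoint sets a cosine of 0; averaging the per–document values arithmetically gives the mean of means reported below.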




In the following, we present results of the cleaning process of tags and other metadata as well as findings on term comparison. The different indexing approaches are then described on the basis of the 724 documents in general. Finally, journal content analysis focuses on one particular journal to compare the different actor perspectives in detail.

Tag cleaning

The deletion of hyphens and underscores in tags and of blanks within keywords reduces the number of spelling variants in tags by 2.3 percent, from 1,743 to 1,703 unique tags. Unification (AE vs. BE) and stemming reduce the number of unique tags further to 1,596. Thus, the combination of all preprocessing methods reduces spelling variations by 8.4 percent compared to unprocessed tags; unifying BE and AE and applying the Porter stemmer alone brings about a 6.1 percent improvement. Due to the slightly different methods applied when comparing tags to abstract and title terms, the number of unique tags differs between Tables 1a and 1b. Counterintuitively, splitting tags leads to a reduced number of unique terms (1,515 instead of 1,596 tags). This is caused by the aggregation of parts of different terms, such as complex, network and complex_network.

Abstract and title terms in particular benefit from these methods: term quantity is reduced by 30.5 percent and 19.8 percent, respectively. As expected, due to their controlled nature, the reduction for author keywords (three percent), Inspec headings (2.8 percent) and automatic index terms (5.3 percent) is lower. The most frequently assigned terms after extensive preprocessing for the whole database are listed in Tables 1a and 1b.


Table 1a: Ten most frequently indexed terms representing reader, title and abstract perspectives (number of unique terms after cleaning). Columns: tags split at separating characters; title terms; abstract terms.


Table 1b: Most frequently indexed terms representing reader, author, intermediary and automatic indexing perspectives (number of unique terms after cleaning). Columns: tags (merged at separating character); author keywords; Inspec subject headings; KeyWords Plus.

Term comparison

The highest share of overlap is detected between tags and abstracts: 77.6 percent of the 724 articles share at least one term in tags and abstracts. For 66 percent, at least one tag appears in the title. This is followed by the overlap of tags and intermediary terms (33.4 percent) and author keywords (29.3 percent). Only 10.5 percent of all documents have at least one tag and one automatically generated KeyWords Plus term in common. Again, the unification of AE and BE and stemming successfully increases the share of documents with at least one mutual term: the share of documents with at least one overlap of tags and author keywords, Inspec headings, KeyWords Plus, titles and abstracts improved by 26.2 percent, 21 percent, 20.6 percent, 9.4 percent and 8.5 percent, respectively.

Most tags are represented in the abstracts, which is to be expected since the number of abstract terms is much greater than that of the other metadata. On average, 24.5 percent of title terms are taken up by users when tagging articles. Strikingly, only 3.4 percent of indexer terms are adopted by users. While this might have dramatic consequences for information retrieval in Inspec, it reveals a wide difference between the reader and indexer perspectives with respect to published contents.

Similarity is computed for tags and Inspec terms, KeyWords Plus, title and abstract terms. While there is no document where all the tags and indexer or abstract terms match exactly, there are documents with 100 percent matches of tags and titles, KeyWords Plus and author keywords, respectively. The highest number of documents with a cosine of one can be found for author keywords (six documents), followed by title (three documents) and KeyWords Plus (two documents).


Table 2: Mean similarity measures comparing reader with author, intermediary, automatic indexing, title and abstract terms.
                                    Author keywords   Inspec subject headings   KeyWords Plus   Title terms   Abstract terms
Mean overlap–tag ratio                        11.8%                     13.3%            2.9%         36.5%            50.3%
Mean overlap–analyzed term ratio              10.4%                      3.4%            3.0%         24.5%             4.8%
Mean cosine similarity                        0.103                     0.062           0.026         0.279            0.143


On average, there is hardly any overlap between the reader perspective and the professional (0.062) or automatic (0.026) indexing methods. The mean cosine value is highest for title terms (0.279), followed by abstracts (0.143) and author keywords (0.103). These results are quite similar to the findings of Syn and Spring (2009), although they reached a mean cosine value of 0.218 for tags and abstracts. Overall, however, cosine similarities in our study are very low because a large proportion of documents have no tags and indexing terms in common. This implies that social tagging represents a user–generated indexing method and provides a reader–specific perspective on article content, which differs greatly from conventional indexing methods.
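The cosine measure applied here treats each term set as a binary vector, so the similarity of two sets reduces to the size of their intersection over the geometric mean of their sizes. A minimal sketch with invented term sets:

```python
import math

def cosine(terms_a, terms_b):
    """Cosine similarity of two term sets treated as binary vectors."""
    if not terms_a or not terms_b:
        return 0.0
    return len(terms_a & terms_b) / math.sqrt(len(terms_a) * len(terms_b))

# Invented term sets for one document.
tags = {"network", "communiti", "modular"}
title_terms = {"network", "communiti", "unfold", "larg"}
similarity = cosine(tags, title_terms)  # two shared terms out of 3 and 4
```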

General usage statistics

Overall, the 724 documents were tagged 3,073 times by 448 users of social bookmarking systems with 1,596 unique terms. The tag frequency distribution per document is shown in Figure 4. The unique tags analyzed in the following represent the cleaned and stemmed tags merged at separating characters as listed in Table 1b, i.e., the tags complex–network, complex_networks and complexnetworks were subsumed under complexnetwork. On average, the 724 documents were tagged 4.2 times (median = 3.0) with 3.9 unique tags. The most frequently tagged article was also the one with the largest number of unique tags: published in the Journal of Statistical Mechanics (J Stat Mech) in 2008 and entitled “Fast unfolding of communities in large networks”, its most frequently assigned tags (cleaned and stemmed) were communiti, network, communitydetect and modular.
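The merging of tag variants at separating characters can be sketched as follows; the trailing–s removal is a naive stand–in for the actual stemming (e.g., a Porter stemmer) and is only illustrative:

```python
import re

SEPARATORS = re.compile(r"[-_\s]+")

def merge_tag(tag):
    """'complex-network' -> 'complexnetwork' (merged at separating characters)."""
    return SEPARATORS.sub("", tag.lower())

def split_tag(tag):
    """'complex-network' -> ['complex', 'network'] (split at separating characters)."""
    return [part for part in SEPARATORS.split(tag.lower()) if part]

variants = ["complex-network", "complex_networks", "complexnetworks"]
# rstrip("s") is a naive stand-in for real stemming here.
merged = {merge_tag(t).rstrip("s") for t in variants}
```

All three variants collapse into the single form complexnetwork, which is what makes frequency counts over the merged folksonomy comparable.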


Tag application frequency per document for 724 documents
Figure 4: Tag application frequency per document for 724 documents.


The 724 documents tagged by users, indexed by their authors (title, abstract, author keywords) and by information professionals at Inspec as well as automatically by WoS (KeyWords Plus) were published in 24 different journals between 2004 and 2008. The frequency distribution of tags and tag applications per journal can be seen in Figure 5. Of the 24 journals, J Stat Mech was tagged most frequently: 99 unique users tagged 94 documents published in J Stat Mech 505 times with 281 unique tags, meaning that 281 different terms were applied 505 times. With 291 different tags, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment (Nucl Instrum Meth A) was the journal with the largest number of unique tags, indicating the greatest variety in content description by the users. However, these 291 tags were applied only 479 times by 51 users. As can be seen in Table 3, the average number of unique tags per journal is 89.8 (median = 58.0) and the average number of tag applications is 128.0 (median = 81.0). Each tag was assigned 1.9 times to 1.3 journals on average. Each journal was assigned at least 2 and at most 291 (Nucl Instrum Meth A) unique tags. With a total of 48 assignments, network is the tag applied most frequently overall (see Table 1b), while theori was applied to the largest number of different journals, i.e., 10.


Frequency distribution of tags and tag applications per journal
Figure 5: Frequency distribution of tags and tag applications per journal.



Table 3: Basic statistics of tag applications.
Note: n = 724 documents.
                                 Mean   Median   Min   Max
Unique tags per document          3.9      3.0     1    33
Tag applications per document     4.2      3.0     1    59
Unique tags per journal          89.8     58.0     2   291
Tag applications per journal    128.0     81.0     2   505
Tag applications per tag          1.9      1.0     1    48
Journals per tag                  1.3      1.0     1    10
Unique tags per user              5.3      3.0     1   194
Tag applications per user         6.9      3.0     1   379
Users per tag                     1.5      1.0     1    35


The tag frequency distribution of all 1,596 unique tags applied 3,073 times can be seen in Figure 6a. Figure 6b shows the 31 most frequently assigned tags. The distribution of tags is power–law–like: a few tags were used frequently, while the majority of terms were applied only once. In total, 67.8 percent of all the tags were applied once. A similar distribution applies to the number of tag applications per user. Although this distribution is less skewed (only 18.1 percent of users tagged only once), 32.1 percent of users are responsible for 74.8 percent of tag applications. The average number of tag applications per user is 6.9 (median = 3.0). The most active tagger, a user named “michaelbussmann”, applied 194 unique tags 379 times to 37 documents published in nine journals. The distribution of tag applications per user can be seen in Figure 7a, which displays all 448 users, and Figure 7b, which focuses on the most frequently tagging users.
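The skew statistics reported above (share of tags applied once, share of applications accounted for by the most active users) can be derived from a plain list of tag applications. A sketch on invented toy data, taking the most active third of users as the "top" group:

```python
from collections import Counter

def skew_stats(applications):
    """Distribution statistics over a list of (user, tag) application pairs:
    the share of tags applied only once, and the share of all applications
    made by the most active third of users."""
    tag_freq = Counter(tag for _, tag in applications)
    user_freq = Counter(user for user, _ in applications)
    singleton_share = sum(1 for n in tag_freq.values() if n == 1) / len(tag_freq)
    top_k = max(1, len(user_freq) // 3)
    top_share = sum(n for _, n in user_freq.most_common(top_k)) / len(applications)
    return singleton_share, top_share

# Invented applications, one (user, tag) pair per tagging act.
apps = [("a", "network"), ("a", "graph"), ("a", "model"),
        ("b", "network"), ("c", "ion")]
singleton_share, top_share = skew_stats(apps)
```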


Tag frequency distribution of 1,596 tags assigned to 724 documents
Figure 6a: Tag frequency distribution of 1,596 tags assigned to 724 documents.



31 most frequently applied tags
Figure 6b: Thirty–one most frequently applied tags.



Tag applications per user for 724 documents
Figure 7a: Tag applications per user for 724 documents.



22 most frequently tagging users
Figure 7b: Twenty–two most frequently tagging users.


Detailed content analysis for J Stat Mech

As shown above, the four indexing perspectives, i.e., author (keywords, title, abstract), professional indexer (Inspec subject headings), automatic indexing (KeyWords Plus) and reader (tags), differ greatly in terms of keywords. When aggregated on the journal level, each of the term sets provides a different perspective on journal content. Hence, the following content analysis emphasizes one particular journal to illustrate the differences in detail and show the potential of social tagging for journal content evaluation in contrast to conventional indexing methods. The detailed content analysis focuses on J Stat Mech, which is the most frequently tagged journal in our dataset. As mentioned above, 94 documents published in J Stat Mech were tagged 505 times by 99 unique users with 281 unique tags.


Table 4: Basic statistics of indexing terms applied to J Stat Mech.
Note: n = 94 documents.
                                        Mean   Median   Min   Max
Documents with title term                1.6      1.0     1    19
Unique title terms per document          6.2      6.0     2    12
Documents with abstract terms            3.1      1.0     1    55
Unique abstract terms per document      55.1     53.5    21   154
Documents indexed with author keyword    2.5      1.0     1    16
Author keywords per document             3.3      3.0     1     6
Documents indexed with KeyWords Plus     1.5      1.0     1    22
KeyWords Plus assigned per document      5.7      5.5     1    10
Documents indexed with Inspec term       1.2      1.0     1    18
Inspec terms assigned per document      15.2     13.0     3    41
Documents per tag                        1.5      1.0     1    15
Unique tags per document                 4.6      4.0     1    33


Title and abstract terms are the most common access points to a journal article for a reader, either through browsing a journal’s table of contents or a results list in a database. Hence, the terms used in titles and abstracts are important for attracting a reader to a journal article. Title and abstract terms can be regarded as a form of author keywords, since they are created by the author to summarize the full text. After the cleaning process described above, which included the removal of stop words and stemming, 365 unique title terms and 1,655 abstract terms remained, which appeared in the titles and abstracts of the 94 documents. On average, each document contained 6.2 title and 55.1 abstract terms, and each unique term appeared in the title or abstract of 1.6 and 3.1 documents, respectively. Table 4 summarizes the basic statistics of indexing terms applied to J Stat Mech. Among the 94 J Stat Mech papers, the most common (stemmed) title word was model, which appeared in 19 titles; 77.5 percent of all title terms appeared in the title of only one document. Model was also the most frequently used abstract term: the abstracts of 55 of the documents contained the word stem model at least once. After stemming, both title and abstract terms were on average six characters long (compare Table 6).

Figures 8 and 9 show term clouds of title and abstract terms, visualizing the statistics described above. The special feature of term or tag clouds is that they are arranged alphabetically, while some terms or tags stand out due to their font size. The font size of a term is determined by its popularity on the resource or platform level, i.e., the more frequently the term has been used, the larger its font size (Sinclair and Cardew–Hall, 2008). Thus, term clouds visualize three dimensions of the controlled vocabulary or folksonomy at the same time: terms, term popularity and the alphabetical arrangement of the vocabulary or folksonomy. This makes term clouds a useful method for visualizing statistical data, and they are well established in Web 2.0 environments.
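A common way to realize this popularity–to–font–size mapping, and the one assumed in this sketch (the clouds in this article may use a different scaling), is linear interpolation between a smallest and a largest size:

```python
def font_sizes(freqs, smallest=10, largest=36):
    """Map term frequencies to font sizes by linear interpolation
    between the smallest and largest size (hypothetical scaling)."""
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    return {
        term: smallest + (largest - smallest) * (freq - lo) / span
        for term, freq in freqs.items()
    }

# Invented counts: the most popular term gets the largest font;
# alphabetical ordering is handled separately when the cloud is rendered.
sizes = font_sizes({"model": 19, "network": 10, "ion": 1})
```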

Both term clouds appear to be quite similar especially regarding the most frequently used terms in the dataset. However, the abstract cloud in Figure 9 shows that there are a larger number of less frequently used terms and that abstracts reflect a greater variety in language use (resulting in a typical long tail of terms) than title terms. This is certainly due to space limitations for titles of journal articles.


Term cloud of title terms of 94 documents published in J Stat Mech
Figure 8: Term cloud of title terms of 94 documents published in J Stat Mech.



Term cloud of abstract terms of 94 documents published in J Stat Mech
Figure 9: Term cloud of abstract terms of 94 documents published in J Stat Mech.


Author indexing is a method applied to relocate the process of intellectual indexing to the person who is supposed to know a document’s content best: its creator. Although the author’s perspective and language used to describe the subject matter of the particular publication are already represented by title and abstract terms, author keywords reduce the content to a few significant and central terms, which can help potential readers decide upon the relevance of the document. J Stat Mech asks authors to assign keywords from a specific table of keywords [3], where controlled index terms are provided for 11 different topics. Authors are obliged to choose the most appropriate keywords for their submission to help find the most suitable editor. Currently the number of keywords is limited to four [4] terms per document. However, an analysis of the documents reveals that 10 of the 94 were indexed with up to six author keywords. On average, authors chose 3.3 keywords from the list to describe their documents’ contents. Overall, 122 unique author keywords were assigned 310 times to the 94 documents. The low number of unique keywords and the relatively high reuse rate (2.5 documents per unique term) can be explained by the controlled vocabulary. The most frequently used author keywords were network and randomgraph. Both were assigned to 16 documents.

The author keyword cloud is depicted in Figure 10. Due to the controlled nature of the keywords that authors have to choose from for articles in J Stat Mech, a quite different terminology appears compared with abstract or title terms. Author keywords are obviously longer, i.e., contain more characters (compare Table 6), and are mainly composed of compound terms. There is no pronounced diversity in the terminology of author keywords, which can be ascribed to the limited choice of controlled terms and to the high specificity of these terms: the more complex a term, the fewer terms the author needs to assign in order to describe the article’s content (Jones, 1972). The limited choice is also the reason why the difference between the smallest and the largest font size is less extreme, indicating that authors use terms with similar frequency for content description, which results in a less skewed term distribution.


Term cloud of author keywords assigned to 94 documents published in J Stat Mech
Figure 10: Term cloud of author keywords assigned to 94 documents published in J Stat Mech.


KeyWords Plus are automatically generated index terms based on the title terms of a document’s cited references. This approach is built on the basic assumption of citation indexing that a citation reflects the thematic similarity of documents. Each document was indexed with at least one, an average of 5.7 and a maximum number of 10 KeyWords Plus. Three hundred and sixty–four unique KeyWords Plus were used to index the 94 documents, 81.9 percent of which were assigned to one document only. The most frequently used keyword was dynam, 22 documents being indexed with it.


Term cloud of KeyWords Plus assigned to 94 documents published in J Stat Mech
Figure 11: Term cloud of KeyWords Plus assigned to 94 documents published in J Stat Mech.


As expected, the keyword cloud in Figure 11 resembles the terms and characteristics of the title cloud (Figure 8), because both are derived from the same kind of article metadata, i.e., title terms. The overall usage frequency of particular terms seems to be the same in both clouds, although it is not the same terms that have been used with equal frequency by the authors of the articles and the authors of the cited references. This is due to our underlying assumption, described in the Methods section, that title terms and (in the case of KeyWords Plus, partly) controlled vocabulary have different properties and should hence be processed differently: title terms were split at separating characters, while controlled terms were merged. This leads to longer terms (i.e., more characters) in the keyword cloud and to less similarity between the title and keyword clouds. The average length of KeyWords Plus is 12 characters after stemming (Table 6). We assume that splitting the keywords would lead to quite similar term and distribution patterns as shown in the title cloud.

For the 94 documents analyzed, the information professionals at Inspec used an average of 15.2 subject headings per document to represent document content. Indexers can choose between 9,600 controlled subject headings and may also add free terms. Each document was indexed with at least three and at most 41 headings. These findings differ considerably from the results of Lawson (2009), who investigated LCSH and found an average of three subject headings assigned per document. This may be due to the Library of Congress Subject Cataloging Policy, which recommends a maximum of six subject headings per title. More surprisingly, Inspec also states that “records contain on average three to six Thesaurus Terms that reflect the main concepts described by the title and abstract. Inspec policy is to assign the most specific term appropriate to the subject of the document” [5]. A strict application of this policy could not be confirmed in our dataset, unless the figures are biased by extensive indexing of free terms by professionals. At 10.9 percent, the share of index terms assigned to more than one document is very low, indicating that subject headings are very specific and used to capture specific differences between documents. The most frequently used Inspec heading was randomprocess, applied to 18 of the 94 documents.


Term cloud of Inspec subject headings assigned to 94 documents published in J Stat Mech
Figure 12: Term cloud of Inspec subject headings assigned to 94 documents published in J Stat Mech.


The Inspec cloud is shown in Figure 12. It shows a notably different distribution pattern from the other term clouds considered previously: there is an extremely long tail of rarely assigned subject headings, but also quite a “long trunk” of terms used with the same frequency at the beginning of the term frequency distribution. The long tail (as well as the statistics in Table 4) reveals that few articles in J Stat Mech are described with the same Inspec terms, whereas the long trunk shows that there are articles with shared research topics. It seems as if indexers use Inspec subject headings hierarchically, allocating them to articles in broader topic fields to give background information (e.g., randomprocess) and specifying research with complex terms (e.g., highspeedimag). The specific subject headings are most valuable for the evaluation of article content because they indicate the distinctiveness of articles and best reflect their “unique selling proposition”. Although we do not claim that longer words (i.e., those with more characters) are more complex and thus carry more meaning, we must emphasize at this point that compound terms often represent detailed and highly specific concepts. Seen in a hierarchy, compound terms mainly identify subordinate concepts, which require greater cognitive effort to understand (Rosch, 1975). This might be the reason why most readers find controlled vocabularies difficult to use during indexing and retrieval.

As mentioned above, the 94 documents published in J Stat Mech were tagged with 281 unique tags. The tag applied to the largest number of documents was network: it was assigned to 15 different documents (Table 4) a total of 29 times (Table 5b). On average, each unique tag was used for 1.5 different documents and each document was indexed by the users with 4.6 different tags. The largest number of unique tags per document is 33. Applied to journal evaluation, tags from users of social bookmarking tools can reflect a reader–specific view of published content. Tables 5a and 5b show the terms most frequently assigned to J Stat Mech. The preprocessed and cleaned tags and their frequencies can be seen in the tag cloud in Figure 13.


Table 5a: Most frequently indexed terms representing reader, title and abstract perspectives on J Stat Mech (number of unique terms after cleaning).
Tags split at separating characters
Title terms
Abstract terms
complex (10)   transit (46)
social (10)   method (45)



Tag cloud for 94 documents published in J Stat Mech
Figure 13: Tag cloud for 94 documents published in J Stat Mech.


The tag cloud differs fundamentally from both the Inspec and title clouds in terms of tag diversity and distribution: there are only a small number of unique tags as well as of popular tags, i.e., tags frequently used for the description of articles. The tag cloud also reveals typical characteristics of tagging behavior, for example the use of dates (e.g., 2007) as index terms. Such information would normally be found in other metadata fields of journals and would not be used by professional indexers. The same applies to the use of author names (e.g., vulpiani) or document types (e.g., review). Readers also make use of abbreviations which may not be found in controlled, title or abstract terms. Although professional indexers would not employ them, such types of tags add valuable information and one more semantic layer to articles and journals and reflect the readers’ view. The term network is most frequently assigned by authors and tagging users and often appears in titles and abstracts as well. This may reflect two aspects of tagging behavior: network is the most suitable term for the article, which also matches the readers’ mental lexicon, or it is simply the most convenient choice because readers are lazy in choosing tags. However, our analysis in Table 2 shows that, on average, more than 75 percent of tags were not adopted from the title terms.


Table 5b: Most frequently indexed terms representing reader, author, intermediary and automatic indexing perspectives on J Stat Mech (number of unique terms after cleaning).
Tags (merged at separating character)
Author keywords
Inspec subject headings
KeyWords Plus



Table 6: Basic statistics on length of index terms assigned to J Stat Mech.
Note: n = 94 documents.
                                     Mean   Median   Min   Max
Length of title terms                   6        6     2    15
Length of abstract terms                6        6     2    37
Length of author keywords              21       20     3    45
Length of KeyWords Plus                12       11     3    32
Length of Inspec subject headings      16       15     3    49
Length of tags                          9        7     2    55



Number of tag applications and citations per document for 94 documents published in J Stat Mech
Figure 14: Number of tag applications and citations per document for 94 documents published in J Stat Mech.


Figure 14 shows the number of tag applications per document in comparison to the number of citations received. The number of citations is used as an indicator of a document’s visibility in the scholarly community, while the number of tag applications per document reveals the users’ interest. A rank order correlation for the 94 documents shows that the two metrics correlate only weakly: Kendall’s τ equals 0.171 (significant at the 0.05 level). This suggests that, for the 94 documents analyzed, impact measured by social bookmarking and tagging differs from that measured by citations and hence makes different documents visible. Larger data sets are, however, needed to draw general conclusions.
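Kendall’s τ compares the rankings induced by the two metrics via concordant and discordant pairs. A minimal tau–a sketch on invented counts (ties are ignored here, unlike the tau–b variant that statistics packages usually report):

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs over all pairs.
    Tied pairs are skipped, so this is not the tie-corrected tau-b."""
    assert len(x) == len(y) and len(x) > 1
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    pairs = len(x) * (len(x) - 1) / 2
    return (concordant - discordant) / pairs

# Invented per-document counts of tag applications and citations.
tag_counts = [5, 3, 8, 1]
citation_counts = [2, 7, 4, 1]
tau = kendall_tau(tag_counts, citation_counts)
```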




The results of our study show that author–generated, indexer–generated and user–generated index terms each reflect a different view of article content. Most term matches can be found when comparing tags with abstract and title terms, but on average nearly half of the tags used (49.7 percent) do not occur in abstracts and 63.5 percent are completely different from title terms. The comparison of tags with author keywords, Inspec subject headings and KeyWords Plus results in far fewer matches. These findings confirm our basic assumption that journal and article evaluation can profit from the application of user–generated tags for content analysis, as they add a third layer of perception besides the author and indexer perspectives. Due to the dynamic nature of social bookmarking and tagging, these descriptions evolve in real time. They offer direct channels for the readers’ opinions and depict trends in the language of a specific discipline. However, it is still unclear how tag suggestions in tagging systems affect users’ choice of new tags during indexing. Such system–specific properties may lead to distorted article and journal descriptions, which must be taken into account before applying tag–based journal content evaluation.

At the same time, we were able to demonstrate that extensive preprocessing and cleaning (i.e., removal of special characters, unification of American and British English spellings, and stemming) of all term sets leads to a more homogeneous collection of terms, which improves the calculation of overlapping terms: the overlap between tags and abstract terms increased by 8.5 percent and that between tags and author keywords by 26.2 percent. These findings indicate that term comparison without extensive cleaning is misleading and produces distorted results. Cleaning methods are still limited, however: since terms are not compared semantically, some problems remain (synonyms, homonyms, different languages). The advantage is that all facets of the users’ descriptions can be depicted.

It is strongly recommended that tags and metadata should be matched on the document level, as this yields more accurate results than calculating similarities on the folksonomy level when analyzing indexing consistency of users, authors and intermediaries.

The study presented in this paper relies solely on string matching (i.e., comparing tags, controlled terms and author terms character by character); matching terms on the concept level via ontologies or other taxonomies is neglected. We assume, with Yi and Chan (2009), that a conceptual comparison of tags and controlled metadata would lead to higher similarity values. Up to now, the overlap ratios between terms have been quite low, which reflects two considerations: readers do not know what terms professional intermediaries and authors use, and readers perceive things in their own way. We do not claim that semantic unification on the concept level (as in controlled vocabularies) is preferable for content analysis or for information retrieval of documents: from both points of view, the unification of tags and of controlled vocabulary would lead to fewer access points to articles (as long as search engines also make use of string matches) and to fewer possibilities for evaluating the relevance of articles and journals.

Our future work will involve the evaluation of readers’ perception using weighted tag and term information. In broad docsonomies (Terliesner and Peters, 2011) not only can all unique tags assigned by users be considered, but also frequency information about how often a particular tag is used. If we assume that frequently used tags are more important for a document’s content, then we can calculate weighted overlap values (Cattuto, et al., 2008). This would reveal whether authors and users attach importance to the same or different topics of a document’s or a journal’s content.
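One way to compute such weighted overlap values, assumed here purely for illustration (not necessarily the measure of Cattuto, et al., 2008), is a frequency–weighted cosine over term–frequency vectors; the frequencies below are toy values:

```python
import math

def weighted_cosine(freq_a, freq_b):
    """Cosine similarity of two term-frequency dictionaries, so that
    frequently used terms weigh more than rarely used ones."""
    dot = sum(freq_a[t] * freq_b[t] for t in freq_a.keys() & freq_b.keys())
    norm = (math.sqrt(sum(v * v for v in freq_a.values()))
            * math.sqrt(sum(v * v for v in freq_b.values())))
    return dot / norm if norm else 0.0

# Invented frequencies: tags vs. author keywords of one journal.
tag_freqs = {"network": 12, "theori": 5, "communiti": 3}
keyword_freqs = {"network": 4, "randomgraph": 4}
similarity = weighted_cosine(tag_freqs, keyword_freqs)
```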

Moreover, in terms of trend detection or discipline–specific tagging and indexing behavior, it will be valuable to carry out detailed content analyses of the same journals at different points in time or to compare different journals with shared research areas and target groups. As the tagging data presented in this study are too sparse for such extended analyses, these research questions remain to be answered in future research projects.

Future work should also comprise analyses of the extent to which tags and other indexing terms of our term sets are able to group similar articles or journals in clusters revealing whether the same terms are used to describe the same content. Syn and Spring (2009) found in their study that “clustering of tags shows wider distribution of papers whereas keywords distribute papers in more concentrated way.” We will apply clustering methods to our dataset in order to test whether these findings are also valid there. End of article


About the authors

Stefanie Haustein is a researcher in the bibliometrics team of the Central Library at Forschungszentrum Jülich, Germany, and a lecturer at Heinrich Heine University Düsseldorf, Germany. She holds a Ph.D. in information science. Her research focuses on informetrics, journal evaluation and altmetrics.
E–mail: Haustein [dot] stefanie [at] gmail [dot] com

Isabella Peters is a researcher at Heinrich Heine University Düsseldorf, Germany, and holds a Ph.D. in information science. Her research priorities include folksonomies in knowledge representation, information retrieval, and knowledge management as well as scholarly communication on the Web and altmetrics.
E–mail: isabella [dot] peters [at] uni-duesseldorf [dot] de



We would like to thank Jens Terliesner for his help in carrying out data collection and tag cleaning.



1. This low value may be caused by the early date of the study.

2. Please note that comparison of tags and other metadata is based on a one–to–one mapping on the string level. Mapping between tags and terms and the concepts they represent is not performed.






Jean Aitchison, Alan Gilchrist and David Bawden, 2000. Thesaurus construction and use: A practical manual. Fourth edition. London: Aslib IMI.

Hend S. Al–Khalifa and Hugh C. Davis, 2007. “Exploring the value of folksonomies for creating semantic metadata,” International Journal of Semantic Web and Information Systems, volume 3, number 1, pp. 13–39.

Hend S. Al–Khalifa and Hugh C. Davis, 2006. “Folksonomies versus automatic keyword extraction: An empirical study,” International Journal on Computer Science and Information Systems, volume 1, number 2, pp. 132–143.

Judit Bar–Ilan, Maayan Zhitomirsky–Geffet, Yitzchak Miller and Snunith Shoham, 2010. “The effects of background information and social interaction on image tagging,” Journal of the American Society for Information Science and Technology, volume 61, number 5, pp. 940–951.

Jo Bates and Jennifer Rowley, 2011. “Social reproduction and exclusion in subject indexing: A comparison of public library OPACs and LibraryThing folksonomy,” Journal of Documentation, volume 67, number 3, pp. 431–448.

Kerstin Bischoff, Claudiu S. Firan, Wolfgang Nejdl and Raluca Paiu, 2008. “Can all tags be Used for search?” CIKM ’08: Proceedings of the 17th ACM Conference on Information and Knowledge Management, pp. 193–202.

Ciro Cattuto, Andrea Baldassarri, Vito Servedio and Vittorio Loreto, 2008. “Emergent community structure in social tagging systems,” Advances in Complex Systems, volume 11, number 4, pp. 597–608.

Anita Coleman, 2007. “Assessing the value of a journal beyond the impact factor,” Journal of the American Society for Information Science and Technology, volume 58, number 8, pp. 1,148–1,161.

Eugene Garfield, 1979. Citation indexing — Its theory and application in science, technology, and humanities. New York: Wiley.

Eugene Garfield, 1972. “Citation analysis as a tool in journal evaluation: Journals can be ranked by frequency and impact of citations for science policy studies,” Science, volume 178, number 4060, pp. 471–479.

Eugene Garfield and Irving H. Sher, 1993. “KeyWords Plus™ — Algorithmic derivative indexing,” Journal of the American Society for Information Science, volume 44, number 5, pp. 298–299.

Wolfgang Glänzel and Henk F. Moed, 2002. “Journal impact measures in bibliometric research,” Scientometrics, volume 53, number 2, pp. 171–193.

Koraljka Golub, Catherine Jones, Marianne Lykke Nielsen, Brian Matthews, Jim Moon, Bartlomiej Puzon and Douglas Tudhope, 2009. “EnTag: Enhancing social tagging for discovery,” JCDL ’09: Proceedings of the 9th ACM/IEEE–CS Joint Conference on Digital Libraries, pp. 163–172.

Benjamin M. Good, Joseph T. Tennis and Mark D. Wilkinson, 2009. “Social tagging in the life sciences: Characterizing a new metadata resource for bioinformatics,” BMC Bioinformatics, volume 10, number 1, at, accessed 18 May 2012.

Juan Gorraiz and Christian Gumpenberger, 2010. “Going beyond citations: SERUM — A new tool provided by a network of libraries,” Liber Quarterly, volume 20, number 1, pp. 80–93.

Grazia Colonia, 2002. “Informationswissenschaftliche Zeitschriften in szientometrischer Analyse,” Kölner Arbeitspapiere zur Bibliotheks– und Informationswissenschaft, band 33, at, accessed 18 May 2012.

Stefanie Haustein, 2012. Multidimensional Journal Evaluation. Analyzing Scientific Periodicals Beyond the Impact Factor. Berlin: De Gruyter/Saur.

Stefanie Haustein, 2011. “Taking a multidimensional approach toward journal evaluation,” Proceedings of the 13th International Conference of the International Society for Scientometrics and Informetrics (Durban, South Africa), pp. 280–291.

Stefanie Haustein and Tobias Siebenlist, 2011. “Applying social bookmarking data to evaluate journal usage,” Journal of Informetrics, volume 5, number 3, pp. 446–457.

Stefanie Haustein, Isabella Peters and Jens Terliesner, 2011. “Evaluation of reader perception by using tags from social bookmarking systems,” Proceedings of the 13th International Conference of the International Society for Scientometrics and Informetrics (Durban, South Africa), pp. 999–1,001.

Stefanie Haustein, Evgeni Golov, Kathleen Luckanus, Sabrina Reher and Jens Terliesner, 2010. “Journal evaluation and science 2.0: Using social bookmarks to analyze reader perception,” Book of Abstracts of the 11th International Conference on Science and Technology Indicators (Leiden, the Netherlands), pp. 117–119.

Tamara Heck, Isabella Peters and Wolfgang G. Stock, 2011. “Testing collaborative filtering against co–citation analysis and bibliographic coupling for academic author recommendation,” RecSys ’11: Third Workshop on Recommender Systems and the Social Web, at, accessed 18 May 2012.

Markus Heckner, Susanne Mühlbacher and Christian Wolff, 2008. “Tagging tagging. Analysing user keywords in scientific bibliography management systems,” Journal of Digital Information, volume 9, number 2, at, accessed 18 May 2012.

Paul Heymann and Hector Garcia–Molina, 2009. “Contrasting controlled vocabulary and tagging: Do experts choose the right names to label the wrong things?” WSDM 2009: Proceedings of the Second ACM International Conference on Web Search and Data Mining, at, accessed 18 May 2012.

Paul Heymann, Andreas Paepcke and Hector Garcia–Molina, 2010. “Tagging human knowledge,” WSDM 2010: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 51–60, and at, accessed 2 November 2012.

Paul Heymann, Georgia Koutrika and Hector Garcia–Molina, 2008. “Can social bookmarking improve Web search?” WSDM ’08: Proceedings of the International Conference on Web Search and Web Data Mining, pp. 195–206, and at, accessed 2 November 2012.

Andreas Hotho, Robert Jäschke, Christoph Schmitz and Gerd Stumme, 2006. “Information retrieval in folksonomies: Search and ranking,” In: York Sure and John Domingue (editors). The semantic Web: Research and applications. Lecture Notes in Computer Science, volume 4011, pp. 411–426.

Hemalata Iyer and Lucy Bungo, 2011. “An examination of semantic relationships between professionally assigned metadata and user–generated tags for popular literature in complementary and alternative medicine,” Information Research, volume 16, number 3, at, accessed 18 May 2012.

Wooseob Jeong, 2009. “Is tagging effective? Overlapping ratios with other metadata fields,” Proceedings of the International Conference on Dublin Core and Metadata Applications (Seoul, Korea), pp. 31–39, and at, accessed 2 November 2012.

Wooseob Jeong, 2008. “Does tagging really work?” Proceedings of the American Society for Information Science and Technology, volume 45, number 1, pp. 1–3.

Karen Spärck Jones, 1972. “A statistical interpretation of term specificity and its application in retrieval,” Journal of Documentation, volume 28, number 1, pp. 11–21.

Kerstin Juchem, Christian Schlögl and Wolfgang G. Stock, 2006. “Dimensionen der Zeitschriftenszientometrie am Beispiel von ‘Buch und Bibliothek’,” Information, Wissenschaft und Praxis, volume 57, number 1, pp. 31–37.

Constantia Kakali and Christos Papatheodorou, 2010. “Exploitation of folksonomies in subject analysis,” Library & Information Science Research, volume 32, number 3, pp. 192–202.

Margaret E.I. Kipp, 2011. “Tagging of biomedical articles on CiteULike: A comparison of user, author and professional indexing,” Knowledge Organization, volume 38, number 3, pp. 245–261.

Margaret E.I. Kipp, 2007. “Tagging for health information organisation and retrieval,” Proceedings of the North American Symposium on Knowledge Organization, pp. 63–74, and at, accessed 2 November 2012.

Margaret E.I. Kipp, 2006a. “@toread and cool: Tagging for time, task, and emotion,” Proceedings of the 17th Annual ASIS&T SIG/CR Classification Research Workshop, at, accessed 18 May 2012.

Margaret E.I. Kipp, 2006b. “Exploring the context of user, creator and intermediary tagging,” Proceedings of the 7th Information Architecture Summit (Vancouver, Canada), at, accessed 18 May 2012.

Margaret E.I. Kipp, 2005. “Complementary or discrete contexts in online indexing: A comparison of user, creator, and intermediary keywords,” Canadian Journal of Information and Library Science, volume 29, number 4, pp. 419–436.

Karen J. Lawson, 2009. “Mining social tagging data for enhanced subject access for readers and researchers,” Journal of Academic Librarianship, volume 35, number 6, pp. 574–582.

Xia Lin, Joan E. Beaudoin, Yen Bui and Kaushal Desai, 2006. “Exploring characteristics of social classification,” Proceedings of the 17th Annual ASIS&T SIG/CR Classification Research Workshop, at, accessed 15 May 2012.

Caimei Lu, Jung–ran Park and Xiaohua Hu, 2010. “User tags versus expert–assigned subject terms: A comparison of LibraryThing tags and Library of Congress Subject Headings,” Journal of Information Science, volume 36, number 6, pp. 763–779.

Mathias Lux, Michael Granitzer and Roman Kern, 2007. “Aspects of broad folksonomies,” DEXA ’07: Proceedings of the 18th International Conference on Database and Expert Systems Applications, pp. 283–287.

Adam Mathes, 2004. “Folksonomies — Cooperative classification and communication through shared metadata,” at, accessed 18 May 2012.

Brian Matthews, Catherine Jones, Bartlomiej Puzon, Jim Moon, Douglas Tudhope, Koraljka Golub and Marianne Lykke Nielsen, 2010. “An evaluation of enhancing social tagging with a knowledge organization system,” ASLIB Proceedings, volume 62, numbers 4/5, pp. 447–465.

Michael G. Noll and Christoph Meinel, 2007. “Authors vs. readers: A comparative study of document metadata and content in the WWW,” DocEng ’07: Proceedings of the 2007 ACM Symposium on Document Engineering, pp. 177–186.

Isabella Peters, 2009. Folksonomies: Indexing and Retrieval in Web 2.0. Translated by Paul Becker. Berlin: De Gruyter/Saur.

Isabella Peters and Katrin Weller, 2008. “Tag gardening for folksonomy enrichment and maintenance,” Webology, volume 5, number 3, at, accessed 18 May 2012.

Isabella Peters and Wolfgang G. Stock, 2007. “Folksonomies and information retrieval,” Proceedings of the American Society for Information Science and Technology, volume 44, number 1, pp. 1–28.

Isabella Peters, Stefanie Haustein and Jens Terliesner, 2011. “Crowdsourcing in article evaluation,” Proceedings of the Third ACM International Conference on Web Science (Koblenz, Germany), at, accessed 18 May 2012.

Elaine Peterson, 2009. “Patron preferences for folksonomy tags: Research findings when both hierarchical subject headings and folksonomy tags are used,” Evidence Based Library and Information Practice, volume 4, number 1, at, accessed 18 May 2012.

M.F. Porter, 1980. “An algorithm for suffix stripping,” Program, volume 14, number 3, pp. 130–137.

Jason Priem and Bradley M. Hemminger, 2010. “Scientometrics 2.0: Toward new metrics of scholarly impact on the social Web,” First Monday, volume 15, number 7, at, accessed 18 May 2012.

Jason Priem, Dario Taraborelli, Paul Groth and Cameron Neylon, 2010. “Altmetrics: A manifesto,” at, accessed 18 May 2012.

Sabrina Reher and Stefanie Haustein, 2010. “Social bookmarking in STM: Putting services to the acid test,” Online, volume 34, number 6, pp. 34–42.

Peter J. Rolla, 2009. “User tags versus subject headings: Can user–supplied data improve subject access to library collections?” Library Resources & Technical Services, volume 53, number 3, pp. 174–184.

Abebe Rorissa, 2010. “A comparative study of Flickr tags and index terms in a general image collection,” Journal of the American Society for Information Science and Technology, volume 61, number 11, pp. 2,230–2,242.

Eleanor Rosch, 1975. “Cognitive reference points,” Cognitive Psychology, volume 7, number 4, pp. 532–547.

Ronald Rousseau, 2002. “Journal evaluation: Technical and practical issues,” Library Trends, volume 50, number 3, pp. 418–439.

Gerard Salton and Chris Buckley, n.d. “Stop word list,” at, accessed 18 May 2012.

Christian Schlögl, 2004. “Zeitschriften des Informationswesens: Eine Expertenbefragung,” In: Eveline Pipp (editor). Ein Jahrzehnt World Wide Web: Rückblick, Standortbestimmung, Ausblick: Tagungsbericht vom 10. Österreichischen Online–Informationstreffen und 11. Österreichischen Dokumentartag, 23. — 26. September 2003, Universität Salzburg, Naturwissenschaftliche Fakultät. Vienna: Phoibos, pp. 63–72.

Christian Schlögl and Juan Gorraiz, 2010. “Comparison of citation and usage indicators: The case of oncology journals,” Scientometrics, volume 82, number 3, pp. 567–580.

Christian Schlögl and Wolfgang Petschnig, 2005. “Library and information science journals: An editor survey,” Library Collections, Acquisitions, and Technical Services, volume 29, number 1, pp. 4–32.

Christian Schlögl and Wolfgang G. Stock, 2004. “Impact and relevance of LIS journals: A scientometric analysis of international and German–language LIS journals — citation analysis versus reader survey,” Journal of the American Society for Information Science and Technology, volume 55, number 13, pp. 1,155–1,168.

James Sinclair and Michael Cardew–Hall, 2008. “The folksonomy tag cloud: When is it useful?” Journal of Information Science, volume 34, number 1, pp. 15–29.

Wolfgang G. Stock, 2007. Information Retrieval: Informationen suchen und finden. Munich: Oldenbourg.

Wolfgang G. Stock, 2004. “Internationale und deutschsprachige Zeitschriften des Informationswesens: Ein Test der Garfield–Hypothese,” In: Eveline Pipp (editor). Ein Jahrzehnt World Wide Web: Rückblick, Standortbestimmung, Ausblick: Tagungsbericht vom 10. Österreichischen Online–Informationstreffen und 11. Österreichischen Dokumentartag, 23. — 26. September 2003, Universität Salzburg, Naturwissenschaftliche Fakultät. Vienna: Phoibos, pp. 53–62.

Besiki Stvilia and Corinne Jörgensen, 2010. “Member activities and quality of tags in a collection of historical photographs in Flickr,” Journal of the American Society for Information Science and Technology, volume 61, number 12, pp. 2,477–2,489.

Sue Y. Syn and Michael B. Spring, 2009. “Tags as keywords — Comparison of the relative quality of tags and keywords,” Proceedings of the American Society for Information Science and Technology, volume 46, number 1, pp. 1–19.

Jens Terliesner and Isabella Peters, 2011. “Der T–Index als Stabilitätsindikator für dokument–spezifische Tag–Verteilungen,” Proceedings of the International Symposium for Information Science (Hildesheim, Germany), pp. 123–133.

Marliese Thomas, Dana M. Caudle and Cecilia M. Schmitz, 2009. “To tag or not to tag?” Library Hi Tech, volume 27, number 3, pp. 411–434.

Thomas Vander Wal, 2005. “Folksonomy explanations” (18 January), at, accessed 18 May 2012.

Kwan Yi and Lois M. Chan, 2009. “Linking folksonomy to Library of Congress subject headings: An exploratory study,” Journal of Documentation, volume 65, number 6, pp. 872–900.


Editorial history

Received 15 June 2012; accepted 16 October 2012.

This paper is licensed under a Creative Commons Attribution–NonCommercial–ShareAlike 3.0 Unported License.

First Monday, Volume 17, Number 11 - 5 November 2012

