Creators of commercial (.com) and non-commercial (not.com) Web sites presumably have different goals, which may be revealed in the number and nature of links from the sites. This study examines links from the home pages of heavily used Web sites, finding that commercial sites have more links. Not.com sites vary considerably in linking behavior, with .gov and .org sites having the most links. Commercial sites have a lower percentage of links to outside sites, possibly an indication of design for stickiness to keep visitors on site, while .net sites show the highest rate of external linking.
Commercial (.com) Web sites are presumably established to enhance their organizations' marketing efforts, and in the world of "there's no such thing as bad publicity," the more visitors the site attracts and holds, the better. By contrast, non-commercial ("not.com") organizations may have other goals for their sites - for example improving understanding of a topic or connecting site visitors with additional sources of information - which might lead to designs (such as snowball referrals) not aimed to attract and hold the visitor's attention.
This study examined heavily-visited Web sites, looking for differences between the .com and not.com sites in number and nature of links, both internal and external. This report begins with a general discussion of linking on the Web, then considers how links are viewed from three perspectives: scholarly communication, journalism, and business/e-commerce. These perspectives are the basis for comparing heavily-visited .com and not.com Web sites.
Web links have been exploited as indicators of Web site quality and relevance - basically, a site is considered superior if other sites link to it, especially if those other sites are also highly linked. The meteoric rise of the Google search engine, which uses a version of this algorithmic weighting, provides evidence that links can make a difference. Google was founded in 1998, and supplanted Inktomi as Yahoo's search engine in July 2000.
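The idea that a site gains standing when well-linked sites point to it can be illustrated with a small iterative calculation. The sketch below is a simplified illustration of that general principle, not Google's actual algorithm; the function name `link_scores`, the damping value of 0.85, and the four-page toy Web are all assumptions made for the example.

```python
def link_scores(links, damping=0.85, iterations=50):
    """Score pages by incoming links, weighting each link by its source's score.

    `links` maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    scores = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with a small baseline score.
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on in equal shares to the pages it links to.
                share = damping * scores[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for p in pages:
                    new[p] += share
        scores = new
    return scores


# Hypothetical toy Web: three pages link to "c", and "c" links back to "a".
toy_web = {"a": ["c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = link_scores(toy_web)
for page in sorted(scores, key=scores.get, reverse=True):
    print(page, round(scores[page], 3))
```

In this toy case "c" ranks first because three pages link to it, and "a" outranks "b" and "d" because its single incoming link comes from the highly linked "c".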
Important and useful as links on the Web appear to be, we know relatively little about when and how links are made. The Internet Tips and Secrets Web site (Lowe and Arevalo-Lowe, 2000), for example, lays out the basics:
A link is the address of a Web page which is maintained on a page which allows you to jump to the new Web page. One of the great things about HTML and the World Wide Web is this ability to link sites together. This results in the internet truly becoming a Web, and the benefits to the surfer are tremendous.
It is normal for virtually every site on the Web to maintain a list of links to "favorite", "featured" or "special" places to visit. This is almost never done with permission, nor does it need to be.
There is some concern over the acceptableness of this linking behavior. Some commercial sites are not happy with linking because it bypasses their "home page" or sidesteps their advertising. There is even some discussion that linking is a trademark infringement or violates some other sacred rights.
Superior Software Solutions, for example, offers advice on improving a Web site's ranking with search engines by increasing connections with other sites:
Trade links with other Web sites that focus on the same topic as yours.
Identify and get listed in topic-specific portals.
Use reverse link lookups to track not only your popularity, but also the popularity of the sites you trade links with.
Ultimately, decisions on when to make links from a Web document reside with the author or owner of the (potentially) linking page. Practices presumably reflect individual experience and preferences as well as perceived modes or models for various genres. For example, contributors to First Monday likely perceive that links from the contents list or introduction to the referenced sections in their articles are appropriate, as are links from the citations to Web-available documents. However, links from one sentence to the next, or to the search engines through which the author found the cited items, may not be considered appropriate by some.
The next three sections consider Web links from various perspectives: bibliometrics and the study of citations in scholarly communication; journalism and online newspapers; and business/electronic commerce. These necessarily brief sketches provide a sense of how linking has been approached in these domains.
Scholarly Communication (Bibliometrics) Perspectives on Citations and Links
Bibliometrics, including the study of scholarly citations, considers both motivations for and consequences of citation. While there are obvious parallels between citing and linking, Rousseau summarizes our limited knowledge of their comparability: "Web pages, however, are usually not scientific articles and links are probably ... made to inform readers where to find more information about the issues presented or discussed on the page" (Rousseau, 1997, p. 1).
Eugene Garfield, the founder of Science Citation Index, proposed reasons for citing another work, including: paying homage to pioneers, giving credit for related work (homage to peers), providing background reading, and providing leads to poorly disseminated, poorly indexed, or uncited work (Garfield, 1965). Authors of electronic (scholarly) publications make links in e-journals for very similar and some novel reasons, including: providing readers with an easy or immediate access mechanism; giving a graphical image of what was presented; and following suggestions of the editor or editorial policy to use links (Kim, 2000).
In bibliometric analyses self-citation - making reference to one's own works - is seen as at least a confounding variable, sometimes as "superfluous" (Tagliacozzo, 1977). MacRoberts and MacRoberts (1989) found reports claiming that from 10 to 30% of all citations are self-citations, but White's (2001) careful analysis found self-citations ranging from 3% to 8%, and Snyder and Bonzi (1998) found an average of 9% self-citations across academic disciplines. Rousseau's (1997) study of Web site links (which he calls "sitations") found 30% "self-sitations" among Web sites devoted to the topic of bibliometrics.
Bibliographic citation has recently been called "the mother of all hyperlinks" (Harnad and Carr, in press), and several researchers have extended bibliometrics approaches from print to the world of the Web (Cronin, in press; Cronin, Snyder, Rosenbaum, Martinson and Callahan, 1998; Larson, 1996). While acknowledging limitations in search engine coverage (Snyder and Rosenbaum, 1999), researchers have, for example, investigated whether Web sites' influence can be deduced from the number of links made to them - the Web Impact Factor (Ingwersen, 1998; Smith, 1999; Thelwall, 2000). Google's ranking algorithm draws on a related idea, additionally weighting each link by how heavily linked its source is.
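As a rough illustration of a Web Impact Factor calculation, the sketch below divides the number of pages linking to a site by the number of pages in the site. This simple ratio is our reading of the general approach rather than a restatement of any one author's exact definition, and the counts shown are hypothetical.

```python
def web_impact_factor(inlinking_pages, pages_in_site):
    """Ratio of pages that link to a site to the number of pages in that site."""
    if pages_in_site <= 0:
        raise ValueError("pages_in_site must be positive")
    return inlinking_pages / pages_in_site


# Hypothetical counts, of the kind a search engine query might return.
wif = web_impact_factor(inlinking_pages=1200, pages_in_site=400)
print(wif)
```

Because both counts typically come from search engine queries, the result inherits the coverage limitations noted above.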
Journalism/Media Perspectives on Links
Developments in information technology, from the telegraph to the Web, have increased the speed and the potential for interaction between journalist and audience. Heeter (1989) identified six dimensions of media interactivity: complexity of user choice, amount of effort the user must exert, responsiveness to the user, potential to monitor system use, ease of adding information, and facilitation of interpersonal communication. All six clearly relate to potential Web site design and use.
Recently two studies have used Heeter's dimensions of interactivity to compare Web sites. McMillan (1998) hypothesized that health-related Web sites supported by volunteers, non-profit organizations, and governmental or educational institutions would be more interactive than sites supported by for-profit companies or by sponsorship and advertising. She found that the volunteer/non-profit group had greater interactivity in terms of complexity of choice (measured by both number of links from the home page and presence of a search engine for the site). However, when interactivity based on user effort was considered, the for-profit sites (though not those supported by sponsors/advertising) were more likely to have menu bars with hot links to facilitate navigation of the site.
Kenney, Gorelik and Mwangi (2000) analyzed interactivity of online newspapers, again comparing for-profit and non-profit sites. They expected greater interactivity at for-profit sites, assuming that the prevalence of a marketing/publicity perspective would outweigh the transmission model of communication. Of the 100 newspaper sites studied, 52% had some type of links, and 33% had links within news stories. Contrary to their expectation, the non-profit sites were more interactive (in the middle range on the scale) than the for-profit sites. The authors do not indicate how the two groups scored on the number of links; they report only the overall measure of interactivity.
Business/E-Commerce Perspectives on Links
Ha and James (1998, p. 460) summarized the case for interactivity on business Web sites. "Companies are assumed to be interested in interactive communication with consumers. The advocates of relationship marketing contend that more communication between the consumer and the company will build the relationship... result[ing] in higher sales." They looked at the home pages of 110 businesses on the Web created in 1996 - early adopters of this medium. They found that many sites had several (more than 13) unique links from the home page, but most of these were self-promotional - to other parts of the site, about the product or service, or about the company. Third party links "were rare with no more than two links per site average" (p. 468). The authors remarked that the links on these Web sites provided a "false sense of empowerment" because choices are still defined by the company, thus controlling access to competing or conflicting messages (p. 470).
Linking behavior, especially self-referential links, is an important piece of potential Web site "stickiness," a term apparently first used in 1997 to connote the user's difficulty in leaving a Web site (Bentley, 1997). Stickiness alludes to spiders' design of their webs with sticky filaments to capture insects which light on the web; in the business world, stickiness involves retaining users and driving them into the site (Bedoe-Stephens, 1999) or getting them "to leave something of themselves behind" (Bo Peabody, quoted in Seybold, 2000). If stickiness is the goal of Web site design, one would predict Ha and James' findings: commercial sites 1) do not link to outsiders ("retain users' eyeballs") and 2) have many internal links (keep them drilling down).
Essential differences between .com and not.com Web sites are not easily derived from the perspectives on linking behavior outlined above. A rough gradient of self-citing might assume that the more scholarly sites would have a low level of self-citation, if the modest self-citation rates observed for individual authors transfer to their institutional citing patterns. The stereotype of .gov and .org sites as collecting links to external sources of information would corroborate McMillan's (1998) and Kenney, Gorelik and Mwangi's (2000) findings of higher interactivity for non-commercial sites. Commercial sites, presumably aiming to attract and hold visitors, would be more internally referential and have fewer external links in order to enhance their stickiness.
Heavily-visited Web sites were identified using the Media Metrix ratings service. Media Metrix "provides Web site ratings based on a sample of about 100,000 Web surfers worldwide. Most are home users. All have meters on their computers, which monitor the sites they visit" (Sullivan, 2000). The Media Metrix lists of the top 50 sites for March and October 2000 provided a total of 99 .com sites. Heavily visited not.com sites were chosen by selecting all sites in the Media Metrix top 500 listings for March and September 2000, then excluding any .com sites - a total of 99 heavily-visited not.com sites resulted. The domains represented were: .edu, .gov, .mil, .net, .org, and "others" - state organizations such as K12.ca.us and non-U.S. entities such as terra.es (a Spanish portal). Media Metrix delays public posting of its monthly rankings by about three months, so the March samples were checked in August 2000, the September and October lists in December 2000.
Because of the reported high level of interest in pornography on the Web (Cronin and Davenport, in press) and our assumption that Media Metrix monitored surfers might not visit those sites, we also studied two expert-identified porn sites for comparison. Another comparison group of personal pages was created from the home pages of 13 individuals cited in this paper who were identifiable on the Web as of December 2000.
The home page for each site was checked using the NetMechanic HTML Toolbox (http://www.netmechanic.com/toolbox/html-code.htm) to analyze its links. Following McMillan's (1998) example, only the home pages, not complete sites, were checked. We recorded the number of links, number of bad (broken) links, and number of HREF links (links, as distinguished from images, scripts, etc.). The URLs for links to outside domains were recorded as well. External links were defined as "links to another site not owned by this organization." Thus, a link from Yahoo.com to yahooligans.com or fr.yahoo.com was not considered an external link. Decisions on when a link was truly external were based on the judgment of two reviewers, but did not involve extensive analysis to uncover less obvious ownership or cooperation arrangements.
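The counting and classification described above can be sketched with Python's standard library. This is an illustrative approximation, not a reconstruction of how NetMechanic worked: the class `LinkCounter`, the helper names, and the hand-maintained `owned_domains` set (standing in for the two reviewers' judgment about ownership) are assumptions made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCounter(HTMLParser):
    """Collect anchor HREF targets and count image and script references."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.hrefs = []        # absolute targets of <a href="...">
        self.other_links = 0   # <img src> and <script src> references

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self.hrefs.append(urljoin(self.base_url, attrs["href"]))
        elif tag in ("img", "script") and attrs.get("src"):
            self.other_links += 1


def is_external(url, owned_domains):
    """True if the URL's host is outside every domain owned by the organization."""
    host = (urlparse(url).hostname or "").lower()
    if not host:
        return False  # e.g. mailto: links; not counted as external in this sketch
    return not any(host == d or host.endswith("." + d) for d in owned_domains)


def summarize(html, base_url, owned_domains):
    """Return link counts for one home page."""
    counter = LinkCounter(base_url)
    counter.feed(html)
    external = [u for u in counter.hrefs if is_external(u, owned_domains)]
    n_href = len(counter.hrefs)
    return {
        "links": n_href + counter.other_links,
        "href_links": n_href,
        "external_links": len(external),
        "percent_external": 100.0 * len(external) / n_href if n_href else 0.0,
    }


# Hypothetical page fragment; the ownership list mirrors the Yahoo example in the text.
sample = """
<a href="/news">News</a>
<a href="http://fr.yahoo.com/">France</a>
<a href="http://www.yahooligans.com/">Kids</a>
<a href="http://www.example.org/">Elsewhere</a>
<img src="logo.gif">
"""
report = summarize(sample, "http://www.yahoo.com/", {"yahoo.com", "yahooligans.com"})
print(report)
```

Keeping ownership in an explicit list makes the reviewers' decisions visible and repeatable, though it shares the limitation noted above: less obvious ownership arrangements go undetected.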
The home pages for 213 Web sites were evaluated, with the basic counts shown in Table 1. Drawing the Media Metrix samples in both spring and fall produced considerable overlap - 32 of the 99 .com sites appeared both times, and 35 of 99 not.com sites were repeats. In the comparison among types of sites only the second (December) test was used for the repeated sites. The basis for this study, then, is 146 unique Web sites (67 .com, 64 not.com, plus comparison samples of 2 porn sites and 13 individual home pages). From even an initial inspection of the data it was clear that the .com/not.com distinction was too simplistic.
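The rule of keeping only the later check for sites that appeared in both samples can be expressed as a small de-duplication step. The sketch below uses hypothetical site names and records; only the sample-size arithmetic at the end comes from the figures reported in the text.

```python
def latest_observation_per_site(observations):
    """Keep one record per site, preferring the most recent check.

    `observations` is an iterable of (site, check_date, record) tuples,
    with check_date in a sortable form such as "2000-12".
    """
    latest = {}
    for site, check_date, record in observations:
        if site not in latest or check_date > latest[site][0]:
            latest[site] = (check_date, record)
    return {site: record for site, (_, record) in latest.items()}


# Hypothetical records for illustration only.
observations = [
    ("example.com", "2000-08", {"links": 90}),
    ("example.com", "2000-12", {"links": 95}),
    ("example.org", "2000-08", {"links": 40}),
]
unique = latest_observation_per_site(observations)

# Sample sizes as reported in the text.
evaluated = 99 + 99 + 2 + 13                    # home pages evaluated
unique_sites = (99 - 32) + (99 - 35) + 2 + 13   # after dropping repeat appearances
print(evaluated, unique_sites)
```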
Table 1: Average Number of Links by Type of Web Site

| Domain | Number of sites | Average number of links | Average number of HREF links | Average number of external links | Percent of HREFs to external links |
| --- | --- | --- | --- | --- | --- |
| All sites | 146 | 65.6 | 51.9 | 4.0 | 7.8 |
| .com | 67 | 96.9 | 78.6 | 4.8 | 6.0 |
| not.com | 64 | 50.2 | 32.4 | 3.7 | 11.3 |
| .edu | 15 | 46.6 | 30.6 | 1.3 | 4.4 |
| .gov | 10 | 61.8 | 37.9 | 2.3 | 6.1 |
| .mil | 2 | 34 | 16 | 2 | 12.5 |
| .net | 26 | 44.2 | 27.6 | 5.8 | 21 |
| .org | 5 | 57.2 | 46.6 | 0.2 | 0.4 |
| others* | 6 | 67.2 | 43.4 | 6.2 | 14.3 |
| individual home pages** | 13 | 16.7 | 14.1 | 1.6 | 11.5 |
| pornography sites** | 2 | 31 | 18 | 7 | 38.9 |

*Others include states (e.g. k12.ca.us) and non-U.S. domains (e.g. terra.es)
**Individual home pages and pornography sites were selected to test observations from the Media Metrix sites
Following the examples of McMillan (1998) and Kenney, Gorelik and Mwangi (2000), number of links was selected as a first order representation of complexity of choice, a dimension of interactivity. These Web sites' home pages used various kinds of links, all of which were counted by NetMechanic's HTML toolbox. Because many of these were links to include images (.gif, .jpg, etc.) or occasionally program applets, the number of HREF links was chosen as a more precise measure of interactivity. HREFs accounted for an average of 79% of the links for all sites, and 81% for the .com sites. The military sites were the only type which did not devote at least half their linking activity to interaction (47% of all .mil links were HREFs); the .mil sites' links were primarily to images.
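The HREF percentages quoted here follow directly from the Table 1 averages. As a quick check, the sketch below recomputes them; the dictionary and function names are ours, and the values are copied from the table.

```python
# Averages from Table 1: (average number of links, average number of HREF links)
TABLE_1_AVERAGES = {
    "all sites": (65.6, 51.9),
    ".com": (96.9, 78.6),
    ".mil": (34.0, 16.0),
}


def href_share(avg_links, avg_hrefs):
    """Return HREF links as a percentage of all links."""
    return 100.0 * avg_hrefs / avg_links


shares = {domain: round(href_share(*row)) for domain, row in TABLE_1_AVERAGES.items()}
print(shares)
```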
The .com home pages averaged over twice as many HREFs as the not.coms overall (78 to 32, respectively). Figure 1 shows this relationship, and breaks out the different types of not.com sites. The .org sites were closest to the .coms, though even they were about 40% lower, with an average of 78 HREFs for the .coms and 47 HREFs for the .orgs (Table 1). Individual home pages (14 HREFs) and military sites' home pages (16 HREFs) were the lowest.
Figure 1: Average Number of HREFS and External Links
The not.com collection showed a wide range of external linking behavior. External links as a percentage of all HREF links varied from 0.4% for .org sites to 21% for the .net group, and over 38% for the comparison porn sites. As Figure 2 shows, the .com external linking percentage is closest to the .gov sites, with both at 6%. A rough measure of self-citation may be found in the links to other, affiliated sites. For the .com sites, 25% of links to other sites were actually to sites affiliated with or owned by the same organization; for not.com sites, only 10% of the apparently external links were actually self-referential.
Figure 2: Percentage of HREFS to External Sites
NetMechanic's site test for bad links was modified between the two data checks; as of December 2000 it checked the viability of only the first 25 links from a site. Therefore, this portion of the analysis is based on the 95 sites which were checked in August 2000. The overall average for this group was 60 links, and 0.8 bad links, or 1.3% of the links not working when checked. The .com sites averaged 0.3% of links down, while at the other extreme the .net sites averaged 5% down. The overall average for the not.com sites was 2.1% of the links bad when checked.
The low numbers and percentages of external links indicate that Web site home pages are intended to be used differently from scholarly articles, where links to other researchers are expected as part of an author's work. The general perception derived from the literature was that scholarly (.edu) sites would follow the citation patterns of individual scholars with a high percentage of external links. Instead, the .edu sites and the general information/news (.gov, .org) sites cluster with the .com sites at 6% or below (see Figure 2). The heaviest use of external links comes from the home pages of "others" (states and non-U.S. sites), .net (which includes the only porn site in the Media Metrix sample), and purposely sampled porn sites.
The study design limits interpretation in some regards. As external links are also made from pages within a Web site, overall linking profiles are likely to differ from this assessment of home pages. Complex Web sites may lead designers to concentrate links in relatively few, easily maintained areas, which would probably not be the home page - not unlike the journal article with references at the end rather than footnotes on the page where a citation is made. Moreover, limits of human attention (Miller's magical number seven, plus or minus two) may compel the considerate designer to control the number of topics and links on a page.
Nonetheless, speculation on the observed linking behavior is intriguing. Why did .org sites average less than one external link on the home page? Three of the .org sites are complex compilations of links to other sites, with the links made from other pages (Open Directory Project, Webring, and Thinkquest); the other .org sites, Public Broadcasting Service and the Mayo Clinic, appear to serve as information sources with few links to external sources.
The "extroverts" with hefty numbers of external links engaged others for various reasons. The porn sites linked frequently to credit card transaction sites (for business purposes) and to filtering sites (presumably to reduce complaints from inadvertent visitors). The .net sites are an odd assortment of organizations. Most are portals, sources for Internet connections, and such. Others are gaming sites, or offer "free stuff." Some are additional outlets for .com site owners. The only highly visited porn site was also a .net. The state agency and non-U.S. sites' ("others") relatively high use of external links reflects the three non-U.S. .net members of this group, whose profiles were similar to the .nets discussed above.
Potential stickiness, in terms of retaining users and driving them into the site, can be represented by links within the site and to sites owned by the same organization. The .com sites were by far the most connected, averaging 93% more links in total and 143% more HREF links than the not.com sites. This investment in complex .com Web sites was clearly internally focused, however: the share of HREF links that the not.com sites devoted to external sites (11.3%) was about 87% higher than the .com share (6.0%). Another possible measure of site investment, at least from the site administrator's perspective, is the number of bad links. The .com average of 0.3%, about one-seventh of the not.com average, indicates a considerable investment in site design and maintenance.
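These comparisons can be recomputed from the Table 1 averages. The sketch below is a check under one assumption of ours: that the external-link comparison rests on the percentage of HREFs pointing to external sites (11.3% for not.com, 6.0% for .com), since the average count of external links is actually higher for the .com sites.

```python
def percent_more(value, baseline):
    """How much larger `value` is than `baseline`, as a percentage of `baseline`."""
    return 100.0 * (value - baseline) / baseline


links_gap = percent_more(96.9, 50.2)     # .com vs. not.com, average number of links
hrefs_gap = percent_more(78.6, 32.4)     # .com vs. not.com, average number of HREF links
external_gap = percent_more(11.3, 6.0)   # not.com vs. .com, percent of HREFs to external sites

print(round(links_gap), round(hrefs_gap), round(external_gap, 1))
```

With the rounded table values the third figure comes out slightly above the 87% reported, a difference that rounding in the published averages would explain.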
The six months between samples amounts to about 3.5 Internet years, and as Table 2 shows, there was some evidence of change over that period. The average number of links went up for the not.com sites, but down for the .com sites. The number of HREF links increased for both groups, while the number of external links decreased for both. From this snapshot it appears that differences between the types of sites are diminishing, and that both types are reducing their connections with the outside world.
Table 2: Changes in Average Number of Links, for the 67 sites sampled in both August 2000 and December 2000

| Type of site | Average number of links | Percentage of change | Average number of HREF links | Percentage of change | Average number of external links | Percentage of change |
| --- | --- | --- | --- | --- | --- | --- |
| All | 63->84 | 33.8 | 59->61 | 4.5 | 4.3->3.9 | -8.7 |
| .com | 148->115 | -22.1 | 85->87 | 2.5 | 4.1->3.7 | -9.2 |
| not.com | 53->58 | 8.9 | 34->37 | 7 | 4.5->4.1 | -8.3 |
The assumption that Media Metrix' monitoring would alter behavior away from the reportedly heavy use of the Web for pornographic material was difficult to test. One apparently pornographic site, bigfast.net, showed up in both the March and September Media Metrix reports. However, its linking behavior is noticeably different from the porn sites suggested by the local expert. These two "control" sites had external links to credit card sites or filtering sites, while the site ranked in the top 500 had no such links but did appear to be part of the reciprocal referral chain through which porn sites link to each other (Glidewell, 2000). It is possible that the apparently external links in this chain are in fact to sites owned or controlled by a single organization.
This snapshot of Web site home pages, as of 2000, confirms some assumptions, disputes others, and reveals several avenues for further investigation. The Web-as-business model is evident, with home pages being used to keep visitors on site. The .com sites accomplished this through complex sites (lots of links) as well as self-referencing (low percentage of external links). Most other types also made little use of external links, except in the pornography and .net areas. The apparently higher investment in .com sites is consistent with Cho and Garcia-Molina's (1999) observation that 40% of the pages in this domain changed in some respect at least once a day, while 50% of .edu and .gov pages did not change in their four-month observation. The introduction of additional domain types will provide opportunities to look for further differentiation among Web sites.
Analyzing Web sites' home pages provides a sense of how organizations would have themselves perceived by visitors, but cannot capture the full range of linking behavior. Indeed, external links may well be driven off the home pages as sites become larger and more complex. A follow-up investigation of links on entire sites would provide a more complete picture of representation, reference and referral in the Web.
About the Author
Debora Shaw is Associate Professor in the Indiana University School of Library and Information Science. She has taught courses on organization of information, database design, and online searching. Her research interests include applications of bibliometrics, the use of quantitative analysis to describe patterns of publication or scholarly activity.
The author thanks Blaise Cronin, Charles H. Davis, and Howard Rosenbaum, who provided expert advice and reviewed drafts of this paper. Bo Rim Lee and Erik Estep assisted with data collection and coding.
Paul Bedoe-Stephens, 1999. "Yahoo: Gettin' Sticky with It," Wired News, at http://www.wired.com/news/culture/0,1284,18229,00.html, consulted 23 January 2001.
Trevor Bentley, 1997. "Webs Are for Catching Flies," Management Accounting, volume 75, number 4, p. 52.
Junghoo Cho and Hector Garcia-Molina, 1999. "The Evolution of the Web and Implications for an Incremental Crawler," available from the Stanford Database Group Publication Server at http://www-db.stanford.edu/~cho/papers/incre.pdf
Blaise Cronin, in press. "Bibliometrics and Beyond: Some Thoughts on Web-Based Citation Analysis," Journal of Information Science.
Blaise Cronin and Elisabeth Davenport, in press. "Erogenous Zones: Positioning Pornography in the Digital Economy," The Information Society.
Blaise Cronin, Herbert W. Snyder, Howard Rosenbaum, Anna Martinson and Ewa Callahan, 1998. "Invoked on the Web," Journal of the American Society for Information Science, volume 49, number 14, pp. 1319-1328.
Eugene Garfield, 1965. "Can Citation Indexing Be Automated?" In: Mary E. Stevens, Vincent E. Giuliano and Laurence B. Heilprin (editors). Statistical Association Methods for Mechanized Documentation, Symposium Proceedings, Washington 1964. NBS Miscellaneous Publication 269. Washington D.C.: National Bureau of Standards, pp. 189-192.
Richard A. Glidewell, 2000. "Porn's Parallel Web Universe," UpsideToday, at http://www.upside.com/Ebiz/38adbbff0.html, consulted 27 January 2001.
Louisa Ha and E. Lincoln James, 1998. "Interactivity Reexamined: a Baseline Analysis of Early Business Web Sites," Journal of Broadcasting & Electronic Media, volume 42, number 4, pp. 457-474.
Stevan Harnad and Leslie Carr, in press. "Integrating, Navigating and Analyzing Open Eprint Archives through Open Citation Linking (The OpCit Project)," Current Science, and at http://www.cogsci.soton.ac.uk/~harnad/Papers/Harnad/harnad00.citation.htm, consulted 25 January 2001.
Carrie Heeter, 1989. "Implications of New Interactive Technologies for Conceptualizing Communication," In: J.L. Salvaggio and J. Bryant (editors). Media Use in the Information Age: Emerging Patterns of Adoption and Consumer Use. Hillsdale, N.J.: Lawrence Erlbaum Associates, pp. 221-225.
Peter Ingwersen, 1998. "The Calculation of Web Impact Factors," Journal of Documentation, volume 54, number 2, pp. 236-243.
Keith Kenney, Alexander Gorelik and Sam Mwangi, 2000. "Interactive Features of Online Newspapers," First Monday, volume 5, number 1, at http://firstmonday.org/issues/issue5_1/kenney/, consulted 25 January 2001.
Hak Joon Kim, 2000. "Motivations for Hyperlinking in Scholarly Electronic Articles: A Qualitative Study," Journal of the American Society for Information Science, volume 51, number 10, p. 891.
Ray R. Larson, 1996. "Bibliometrics and the World Wide Web: An Exploratory Analysis of the Intellectual Structure of Cyberspace," In: S. Hardin (editor). Global Complexity: Information, Chaos, and Control. Proceedings of the 59th Annual Meeting of the American Society for Information Science, pp. 71-83, and at http://sherlock.berkeley.edu/asis96/asis96.html
Richard Lowe and Claudia Arevalo-Lowe, 2000. "Internet Tips and Secrets," at http://www.internet-tips.net/Webrings/internettip.htm, consulted 18 August 2000.
Michael H. MacRoberts and Barbara R. MacRoberts, 1989. "Problems of Citation Analysis: A Critical Review," Journal of the American Society for Information Science, volume 40, number 5, pp. 342-349.
Sally J. McMillan, 1998. "Who Pays for Content? Funding in Interactive Media," Journal of Computer Mediated Communication, volume 4, number 1, at http://www.ascusc.org/jcmc/vol4/issue1/mcmillan.html, consulted 23 January 2001.
Ronald Rousseau, 1997. "Sitations: An Exploratory Study," Cybermetrics, volume 1, number 1, pp. 1-9.
Patricia Seybold, 2000. "Ubiquity Breeds Wealth," Business 2.0 (1 March), http://www.business2.com/content/magazine/indepth/2000/03/01/20685, consulted 23 January 2001.
Alastair G. Smith, 1999. "A Tale of Two Web Spaces: Comparing Sites using Web Impact Factors," Journal of Documentation, volume 55, number 5, pp. 577-592.
Herbert Snyder and Susan Bonzi, 1998. "Patterns of self-citation across disciplines (1980-1989)," Journal of Information Science, volume 24, number 6, pp. 431-435.
Herbert Snyder and Howard Rosenbaum, 1999. "Can Search Engines Be Used as Tools for Web-Link Analysis? A Critical View," Journal of Documentation, volume 55, number 4, pp. 375-384.
Danny Sullivan, 2000. "Media Metrix: Search Engine Ratings," Search Engine Watch, at http://searchenginewatch.com/reports/mediametrix.html, consulted 23 January 2001.
Superior Software Solutions, "Search Engine and Link Analysis," at http://www.supersoft-solutions.com/linkanal.htm, consulted 18 August 2000.
Renata Tagliacozzo, 1977. "Self-Citations in Scientific Literature," Journal of Documentation, volume 33, number 4, pp. 251-265.
Mike Thelwall, 2000. "Web Impact Factors and Search Engine Coverage," Journal of Documentation, volume 56, number 2, pp. 185-189.
Howard D. White, 2001. "Authors as Citers over Time," Journal of the American Society for Information Science and Technology, volume 52, number 2, pp. 87-108.
Paper received 31 January 2001; accepted 21 February 2001.
Copyright ©2001, First Monday
Playing the Links: Interactivity and Stickiness in .Com and "Not.Com" Web Sites by Debora Shaw
First Monday, volume 6, number 3 (March 2001)