By reverse engineering Google queries and by tracing back the referrer values found in Apache log files, the use of images made available from infomotions.com is examined. Ethical and economic questions are then asked. While all the images from the site are freely available under the GNU Public License, they are not always used in the intended manner. This raises interesting questions regarding the time spent making the images available, the expense of the hardware and network connections, and whether or not the images are put to good and moral purposes. This essay addresses these questions in an attempt to come to an understanding regarding the place of data, information, and knowledge in an open environment.
Contents
Introduction
Log file analysis
The issues restated with possible solutions
Conclusion
Introduction
I spend my days at the University of Notre Dame (Notre Dame, Indiana) as an academic librarian where I help the University Libraries do stuff digital. This has meant working with teams of people to maintain the Libraries' Web site, maintaining the campus-wide search engine, maintaining the Libraries' intranet, conducting usability tests, putting into practice user-centered design, supporting an open source software digital library toolkit called MyLibrary, collaborating on a National Science Foundation grant called Ockham, and most recently implementing an institutional repository.
Because of my enthusiasm for the job I am often asked to give outside talks and conduct workshops. This type of consulting work is something academics often do. To support the expenses of doing consulting, Infomotions, Inc., a closely held Class B corporation, was created. Its existence makes it easier to keep the money generated from my vocation separate from the money generated by the consulting. The purpose of Infomotions, Inc. is not to generate a million dollars but to pay for the expenses of running a business while providing the infrastructure for doing business. Infomotions, Inc. purchases and pays for hardware/software, Internet connections, taxes, and tax preparation. It supplements my learning through the purchase of books and computer toys (PalmPilots, iPods, digital cameras, etc.). Infomotions, Inc. pays for incidentals such as tips with no receipts, and other miscellaneous items.
Infomotions.com is a modern-day shingle for Infomotions, Inc. It is a place where people can see the types of products and services Infomotions, Inc. provides. It is also a place where people can read, download, and experience my abilities as a librarian. Just as importantly, infomotions.com is a place where I can play and put into practice the principles of librarianship as I see them. It is a sandbox where I can experiment with lists of books, MARC records, and Z39.50. I use infomotions.com to mark up texts in TEI and repurpose the results. Infomotions.com hosts databases and indexes. Infomotions.com uses XML to catalog my water collection and XSLT to provide access to it via a Web browser. I write articles, presentations, travel logs complete with pictures, and computer software. I give all of these things away through searchable/browsable interfaces as well as RSS feeds and OAI interfaces. Like Ben Franklin's printing press, Infomotions.com is a medium for expressing my ideas. It is a place where I can freely describe and demonstrate to the outside world my abilities as a librarian. I have been practicing open source software distribution and open access publishing for longer than the phrases have been coined.
Herein lies the first dilemma. How do I balance my desire to practice some of the ideals of librarianship (free and equal access to information) with the realities of business? How can I afford to spend time writing new software, maintaining established content, and freely disseminating it when it is necessary to pay significant expenses to the tax man and the telephone company?
Then there is the second dilemma. Even though services and content from Infomotions.com are freely distributed under the GNU Public License, how do I reconcile this with the fact that the content is used in ways I believe to be inappropriate and/or without attribution? Log file analysis clearly shows how the resources of Infomotions, Inc. (notably disk space, CPU time, content, and network connections) are being exploited without compensation. Maybe the people using the content are ignorant of their actions. Maybe they are trying to hide in anonymity. But information wants to be free. Right? Infomotions, Inc. should take all of this in stride. Right? Infomotions, Inc. should figure this is just a cost of doing business in a globally networked environment. Right? Probably not.
So my two dilemmas remain. First, how does Infomotions, Inc. put into practice open source software distribution, open access publishing, and the principle of free and equal access to information when it incurs significant personal expenses? Second, once the content is made available, to what degree does Infomotions, Inc. have a say in how it is accessed, how often it gets copied, and into what contexts it is placed? Put another way, how does Infomotions, Inc. build a sustainable distribution model in an environment where there are significant costs and where content can be very easily used as well as abused?
Log file analysis
The computing resources of Infomotions, Inc. are minimal. They include a six-year-old, single-processor, Intel-based computer running Linux on top of 512 MB of RAM and an 18 GB hard disk drive. Most desktop machines have more horsepower than the server. The bottleneck regarding the server's functionality is not its CPU but rather its network connection. Infomotions.com requires a static IP address, and Infomotions, Inc. can only afford a DSL connection with an outbound speed of 512 Kbps. This means connections to infomotions.com are slow, especially when many have DSL connections with 2-3 MB/second inbound speeds. The network is only as fast as the slowest connection, and infomotions.com is quite likely the weakest link in the chain.
Log files of infomotions.com are interesting to evaluate. During the month of January 2006, infomotions.com processed 2.2 million requests. In February 2.4 million requests were processed. This came to a total of 79 GB of data sent over the wires to 300,000 distinct computers around the world. Not bad for such a small computing infrastructure, and it just goes to show how you don't need a big bad computer to do this kind of work.
Sometimes, just for fun, I watch as the Apache server log files scroll by on my terminal. I'm not so interested in what is getting used as much as I am interested in where traffic is coming from: the referrer information. Most of my traffic comes from three places: 1) my own site; 2) search engines like Google, Yahoo, and MSN; and, 3) social networks like MySpace. The referrals from my own site are the results of local search engine queries, links to cascading style sheets, and images incorporated into articles and travel logs. The referrals from search engines point to images or sets of electronic texts I have collected and repurposed. The referrals from the social networks almost always point to images. Figure 1 illustrates this point in the form of a pie chart. It depicts referring URLs from the combined log file data of January and February of 2006 but excludes URLs from Infomotions.com:
Figure 1: Referring URLs (excluding URLs from Infomotions.com).
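The grouping behind a chart like this can be approximated with a few lines of Python. What follows is only a minimal sketch, not the tool used for this article: the host names in OWN_HOSTS and the substrings in SEARCH_HINTS are assumptions chosen for illustration, and anything that is neither the local site nor a recognized search engine simply falls into "other".

```python
from collections import Counter
from urllib.parse import urlparse

# Assumed host lists for illustration; the article does not enumerate them.
OWN_HOSTS = {"infomotions.com", "www.infomotions.com"}
SEARCH_HINTS = ("google.", "yahoo.", "msn.")


def classify_referrer(referrer: str) -> str:
    """Place one referrer URL into a coarse group."""
    # Apache writes "-" when the browser sent no referrer at all.
    if referrer in ("", "-"):
        return "none"
    host = (urlparse(referrer).hostname or "").lower()
    if host in OWN_HOSTS:
        return "own site"
    if any(hint in host for hint in SEARCH_HINTS):
        return "search engine"
    return "other"


def tally(referrers):
    """Count how many referrers fall into each group."""
    return Counter(classify_referrer(r) for r in referrers)
```

Feeding the referrer column of a log file to tally() yields the counts from which such a pie chart could be drawn; the "other" group is where the social networking sites would show up.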
I consider referrals from my own site legitimate, naturally. I consider referrals from search engines beneficial. On the other hand, referrals not from my own site and not from search engines are oftentimes dubious. During January and February there were roughly 235,000 referrals from such places. Figure 2 illustrates where these referrals came from, and as you can see they mostly came from social networking sites:
Figure 2: Referring sites not from Infomotions.com and not from search engines.
These 235,000 referrals from social networking sites accounted for roughly 5 percent of the total requests during the months of January and February, and they resulted in roughly 9.5 GB of transmitted data. That was 12 percent of Infomotions.com's total throughput. These social networking sites consumed a disproportionate amount of Infomotions, Inc.'s most expensive and least plentiful resource, network bandwidth.
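The percentages above follow directly from the figures already quoted. As a quick check, using only numbers stated in this article (the variable names are mine):

```python
# Figures quoted in the text for January and February 2006.
total_requests = 2_200_000 + 2_400_000   # requests processed in both months
total_gb = 79                            # data sent in both months
outside_referrals = 235_000              # referrals not from own site or search engines
outside_gb = 9.5                         # data sent in answer to those referrals

request_share = outside_referrals / total_requests
bandwidth_share = outside_gb / total_gb

print(f"share of requests:  {request_share:.1%}")
print(f"share of bandwidth: {bandwidth_share:.1%}")
print(f"bandwidth share is {bandwidth_share / request_share:.1f} times the request share")
```

The request share rounds to 5 percent and the bandwidth share to 12 percent, which is why the usage is described as disproportionate: these requests are almost entirely for images, and images are large.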
Example usages
Using the referral information found in the logs it is easy and sometimes fun to see how the images are used. I have been able to distinguish between four types of uses.
- Backgrounds
The first usage is to provide an aesthetic background or establish a theme. I have found that various flower images or the image of neon lights seem to be popular choices. Frankly, I think something with less detail would be more appealing, but there is no accounting for taste. In these cases it is difficult for the HTML author to seamlessly attribute the source of the image. Here are two examples:
daffodils usage (cache) Jamaican flower usage (cache)
- Emphasizing a point
The most common use of the images is to illustrate or emphasize a point. In these cases some topic is being discussed (college, beauty, strength, death, etc.) and the images are used to reinforce the author's opinion. It would be nice if the authors would link back to Infomotions.com, but the vast majority do not. The use of the images in this manner is analogous to quoting something from a journal article but not citing the article. You might say this is poor scholarship, but then again, these people are not writing scholarly articles. Here are two typical examples:
kittens usage (cache) neon usage (cache)
- Advertisements
Probably the most egregious use of the images from Infomotions.com is as copy for products or services. Granted, this does not occur very often, but the authors of these pages are using the images for their personal gain. In at least one example there are links back to Infomotions.com, but in most there are not. Two examples follow:
tractor usage (cache) first home usage (cache)
- Avatars
Some use the images from Infomotions.com as avatars, pictorial representations of themselves. This happens on modern-day bulletin board systems. Create an account. Enter your name and address. Insert a URL to a picture representing you. Post a comment to a Web site and your avatar is displayed too. To display that avatar a connection to Infomotions.com is made and the image is sent. Other times there are referrals in the logs pointing to Web-hosted email servers. These are untraceable because reading the email messages is token/session-based, and I am not able to log into their email accounts, obviously. Here are two examples of the former use:
cowboy usage (cache) Sam Houston usage (cache)
Popular pictures
More detailed log file analysis allows me to determine which pictures are most popular. They are listed below and include their title, the number of times used during January and February of 2006, the total number of bytes transferred, and a link to a Google image search.
- Sam Houston: 28,302 hits, 0.64 GB (Google image search)
- lily: 19,926 hits, 0.38 GB (Google image search)
- graveyard: 19,479 hits, 1.8 GB (Google image search)
- neon: 12,898 hits, 0.77 GB (Google image search)
- rose: 6,499 hits, 0.15 GB (Google image search)
- kittens: 5,220 hits, 0.14 GB (Google image search)
- devil: 4,894 hits, 0.33 GB (Google image search)
- first home: 2,653 hits, 0.09 GB (Google image search)
How do people find these images in the first place? I suspect it is through Google's Image Search because almost all of them can be found on the first or second page of Google's search results. Each hot link associated with the image searches Google for a simple word. Try the links and see for yourself.
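A tally of hits and bytes per image can be produced from the raw log with a short script. The sketch below is an illustration rather than the program used for this article. It assumes the log is written in Apache's combined log format, and the pattern only looks at GET and HEAD requests for the four image suffixes named in the rewrite recipe later in this essay.

```python
import re
from collections import defaultdict

# Assumes Apache's "combined" log format:
# host ident user [time] "METHOD path protocol" status bytes "referrer" "agent"
LOG_PATTERN = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) "(?P<referrer>[^"]*)"'
)
IMAGE_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png")


def image_usage(lines):
    """Return (path, [hits, bytes]) pairs, most requested image first."""
    usage = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:
            continue  # skip lines that do not look like combined log entries
        path = match.group("path").split("?", 1)[0]
        if not path.lower().endswith(IMAGE_SUFFIXES):
            continue
        # Apache logs "-" instead of a byte count when no body was sent.
        size = 0 if match.group("size") == "-" else int(match.group("size"))
        usage[path][0] += 1
        usage[path][1] += size
    return sorted(usage.items(), key=lambda item: item[1][0], reverse=True)
```

Calling image_usage(open("access_log")) on a log file (the file name here is hypothetical) would give a list ordered like the one above, with byte counts that can be divided by 1024 three times to arrive at gigabytes.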
Try it for yourself
You can do a bit of log file analysis for yourself with the report and tools created for this article. For example you can view the entire log file report (all 4 MB of it) or you can use the search tool created to do more detailed analysis. Interesting queries include myspace, rose, building, yucca, lilly, or sculpture. Remember, the network connection to Infomotions.com is definitely on the slow side.
The issues restated with possible solutions
The issues can be restated as two questions:
- The accessibility of freely available content from Infomotions.com is a drain on Infomotions, Inc.'s resources. How can Infomotions, Inc. lower its expenses and/or increase its revenue in order to eliminate or at least minimize this problem?
- There are people who are using the content from Infomotions.com in a dubious manner. Are the copyright expectations of Infomotions, Inc. clearly articulated and explicitly stated, and to what degree are they enforceable?
A number of solutions present themselves, and each is outlined in the following sections.
Remove images
One solution is to simply remove the images from Infomotions.com. The articles, presentations, and software of Infomotions, Inc. are the best examples of what Infomotions, Inc. can do. The images could be viewed as eye candy and therefore unnecessary. On the other hand, the way the images are created, maintained, displayed, and made accessible through searchable as well as browsable interfaces does exemplify the skills of Infomotions, Inc. Removing the images would defeat much of the purpose of Infomotions.com.
Programmatically limit referrals
There are at least two programmatic ways of limiting the undesirable referrals to Infomotions.com. The first way is to trap undesirable referrals before they are served. This can be accomplished by updating the Infomotions.com Web server (Apache) configuration file something like this [1]:
RewriteEngine On
RewriteCond %{HTTP_REFERER} !=""
RewriteCond %{HTTP_REFERER} "!^http://infomotions.com/.*$" [NC]
RewriteCond %{REQUEST_URI} "\.(gif|jpg|jpeg|png)$"
RewriteRule .* - [F]
This set of lines:
- Turns on a URL rewriting module in Apache;
- Checks to see if there is any referrer information, and if there is none then does nothing;
- Checks to see if the referrer is from Infomotions.com, and if so then does nothing;
- Checks to see if the requested URL is an image, and if so then continues; and,
- If it gets this far, then denies the request by returning a Forbidden response to the browser.
To really make this recipe work well, I would want to change Line 3 to allow things like Google or other indexers to read the images.
The second programmatic way of limiting referrals is through some sort of redirection program. This program would take a pointer representing an image as input, do the same sort of referral checking as the recipe above, and return an image. While implementable and Web server independent, this solution is not very practical.
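The decision such a redirection program would have to make can be sketched in Python. This is an illustration of the logic only, not a program that has been deployed on Infomotions.com. The function name should_deny is mine, and treating www.infomotions.com as a local host is an assumption that goes slightly beyond the recipe, which matches infomotions.com alone.

```python
import re
from urllib.parse import urlparse

# The same image suffixes tested by the rewrite recipe.
IMAGE_PATTERN = re.compile(r"\.(gif|jpg|jpeg|png)$")
# Assumption: both forms of the host name count as "my own site".
ALLOWED_HOSTS = {"infomotions.com", "www.infomotions.com"}


def should_deny(referrer: str, request_path: str) -> bool:
    """Mirror the three conditions of the recipe, in the same order."""
    # No referrer information: do nothing, let the request through.
    if not referrer:
        return False
    # Referrer is the local site: let the request through.
    host = (urlparse(referrer).hostname or "").lower()
    if host in ALLOWED_HOSTS:
        return False
    # Deny only when the requested URL is an image.
    return bool(IMAGE_PATTERN.search(request_path))
```

A wrapper around this function would send the image when it returns False and a Forbidden response otherwise. Allowing Google or other indexers to read the images would then be a matter of adding their host names to ALLOWED_HOSTS.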
License referrals
An entrepreneurial solution is to license the images. To do this I would change the last line of the recipe to point to an image different from the requested one. The different image could say something like, "You are stealing an image from Infomotions, Inc. Go to this URL to see how you can license this image." When the user goes to the URL they could be asked for their name, email address, URL of the page where the image is desired, and a monetary donation. Upon receipt of the URL and donation Line 3 could be updated to allow the referral to be successful.
I seriously doubt this solution would generate very much money. It would be seen as too much of a hassle to complete the necessary forms and send a donation. The potential author would probably just go find another image.
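For what it is worth, the bookkeeping behind such a scheme is small. The following sketch shows one way the choice of image could be made. Everything named in it is hypothetical: the licensed page URL, the path of the notice image, and the function image_to_serve are placeholders for illustration, and no such licensing system exists on Infomotions.com.

```python
from urllib.parse import urlparse

OWN_HOSTS = {"infomotions.com", "www.infomotions.com"}
# Hypothetical list of pages whose authors have licensed an image.
LICENSED_PAGES = {"http://example.org/blog/garden.html"}
# Hypothetical image carrying the "go to this URL to license" message.
NOTICE_IMAGE = "/images/license-notice.png"


def image_to_serve(referrer: str, requested_image: str) -> str:
    """Return the path of the image that should actually be sent."""
    if not referrer:
        return requested_image
    host = (urlparse(referrer).hostname or "").lower()
    if host in OWN_HOSTS or referrer in LICENSED_PAGES:
        return requested_image
    # Unlicensed outside page: substitute the notice for the real image.
    return NOTICE_IMAGE
```

Adding a page to LICENSED_PAGES plays the role of updating Line 3 of the recipe once a donation has been received.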
Advertise
Since December of 2005 Infomotions, Inc. has been experimenting with advertising, and this has been moderately successful and relatively painless. Here is how the process works:
1. Request to be an advertiser with Google or Yahoo.
2. After making sure your site is legitimate, Google or Yahoo gives you a special number.
3. Visit Google's or Yahoo's advertising site and log in with your number.
4. Use the online forms to generate the size and shape of banner ads.
5. When finished, copy the resulting JavaScript snippet, and insert it into your HTML.
6. When people visit your pages the ads are displayed.
7. When people click on the ads you earn a few cents. Sometimes you earn a few cents just for having an ad displayed.
8. Receive a check from Google or Yahoo once a month.
9. Return to Step 3 as desired.
When I first started doing this I earned about US$1 a day, but after reading the instructions more carefully, I placed the ads differently, and now I earn about US$4 a day. While I cannot quit my day job based on this sum, it does cover the costs of Infomotions, Inc.'s Internet connection.
Implementing ads is a bit of an ethical dilemma for me. Advertising encourages people to buy things they did not necessarily want to purchase. It can be seen as misleading. Are the information literacy skills of my readers high enough to discern the difference between ads and real content? While I am still not 100 percent comfortable with this solution, I believe the librarian in me is making a bigger deal about the issue than is warranted [2].
Participate in affiliate programs
I tried to be an affiliate with Amazon.com, but that didn't work very well.
A significant portion of Infomotions, Inc.'s content is made up of electronic texts, the Alex Catalogue of Electronic Texts. The Catalogue is a list of public domain documents from places like Project Gutenberg and a defunct archive from Virginia Tech. A newer version of the Catalogue includes some of the texts marked up in TEI and transformed through XSL into HTML, PDF designed for printing, PalmPilot, Rocket eBook, and even Newton Paperback files. (Remember the Apple handheld Newton device?)
As an added value, I included the ability to purchase versions of books from the Catalogue through Amazon.com. When things like Huckleberry Finn are displayed, users have the ability not only to download any number of digital versions of the text but also to get a printed book from Amazon.com. Since December these links have been displayed tens of thousands of times, but to date only one book has been purchased and Infomotions, Inc. has earned only US$0.87 from the venture.
Sell merchandise
An additional possibility for increasing revenue is to sell merchandise, and CafePress makes it easy to do this, but this too has proven to be less than profitable.
CafePress is essentially a print shop. You send them one or more images, and they will print them on anything from T-shirts to mugs, mouse pads to clocks, etc. You then display your wares on your Web site, people place orders, and CafePress takes care of shipping, credit card handling, etc. Since December of 2005 Infomotions, Inc. has had a store allowing people to buy such merchandise, and like the Amazon.com venture, the store has been displayed thousands of times, but it has yet to generate a single sale.
Add watermarks
Adding watermarks to the images stating their ownership by Infomotions, Inc. and availability on Infomotions.com will draw attention to the copyright issues surrounding the images, but they will also detract from the images' aesthetic appeal and probably not really generate additional income. Explicitly stating copyright is the prudent thing to do though.
Embed rights statements
Akin to adding watermarks is embedding copyright statements within the images and prominently displaying the same statements on the Web site. Presently everything is available under the GNU Public License. Upon reflection I am now leaning towards some form of a Creative Commons license because such licenses are more flexible to design and easy to integrate into files. This process will also make the copyrights more explicit.
Conclusion
Infomotions, Inc. is not able to provide free disk space to people on social networks. It simply does not have the expendable resources to make this a reality, and it is counterproductive to its purposes. Despite the professional ethics involved, Infomotions, Inc. will probably implement a non-redirection policy making it difficult to reference images from non-Infomotions.com Web sites. Infomotions, Inc. will probably also try to license images because the cost of such a venture is low. Infomotions, Inc. will also continue to display ads from Google. The ads generate enough income to pay for network connections and they are not associated with the more scholarly materials on Infomotions.com. Furthermore, people probably have more information literacy skills than I give them credit for and they know when they are seeing an ad as opposed to real content. Selling merchandise and being an affiliate have produced almost no income, and increasingly these are seen by Infomotions, Inc. as more of a free advertising ploy as opposed to a revenue-generating venture.
As our economy has moved away from industry and towards services, data and information have become the commodities of the day. Infomotions, Inc. strives to support the professional development of an individual in such an environment, yet the free dissemination of data and information is not really free. It must be supported, and it is irresponsible to allow people to take advantage of the system without due compensation. Creating sustainable and free access to data and information is not an easy task, but at the same time it is not too difficult if a person is willing to let go of a few lofty and idealistic concepts regarding the access of data, information, and knowledge.
About the author
Eric Lease Morgan is the Head of the Digital Access and Information Architecture Department at the University Libraries of the University of Notre Dame.
Email: emorgan [at] nd [dot] edu
Notes
1. This solution is taken directly from a book by Ken Coar and Rich Bowen called Apache cookbook (published by O'Reilly, 2004), p. 86.
2. Ironically, a couple of years ago I was approached by a marketer via email. He asked me to put sets of innocuous ads throughout the Infomotions.com Web site. Buy refrigerators from refrigerators.com. Buy flowers from flowers.com. Etc. He was going to pay me US$360/month. I turned him down because my professional ethics got in the way. Dumb!?
Editorial history
Paper received 8 May 2006; accepted 19 May 2006.
Copyright ©2006, First Monday.
Copyright ©2006, Eric Lease Morgan.
Ethical and economic issues surrounding freely available images found on the Web by Eric Lease Morgan
First Monday, volume 11, number 7 (July 2006),
URL: http://firstmonday.org/issues/issue11_7/morgan/index.html