Primary Sources, Research, and the Internet: The Digital Scriptorium at Duke by Steven L. Hensen

As the digital revolution moves us ever closer to the idea of the "virtual library," repositories of primary sources and other archival materials have both a special opportunity and responsibility. Since the materials in their custody are, by definition, often unique, these institutions will need to work very carefully with scholars and other researchers to determine what is the most effective way of making this material accessible in a digital environment.

In the Special Collections Library at Duke University, there is currently an office called the "Digital Scriptorium." It is intended to be a place where faculty and students can come to use computers to conduct their research into digital primary sources as well as to enhance their research by creating digitized versions of materials from the library's collections that can then be made available on the Internet for others to use, contributing, it is hoped, to a new paradigm for research and scholarship.


A Virtual Special Collections
Papyrus Online
Metadata and Archives
Future Prospects


A year or so ago, at a symposium at Duke on the future of the Library, a member of our esteemed faculty wondered out loud why the library couldn't just "digitize everything" so that the entire contents of the library - and, indeed, why not every library? - might be accessible over the Internet [ 1 ]. Such an advance certainly would give legitimacy to the somewhat inchoate concept of the "virtual library" as it has been emerging these last several years - a notion, I might add, more compelling so far in its conception than in its implementation. Since this professor was no digital naïf and was fully aware of the technological implications of such a venture, it was further suggested that no one library should have to bear the entire brunt of this effort and that if each library took upon itself the responsibility of digitizing ten - or even a hundred - volumes, eventually the entire world's body of knowledge would be reduced to its equivalent in digital bytes, which would then be accessible from any workstation anywhere that is connected to the Internet.

While the scale and realities of such aspirations might make those of us in the libraries wince (especially since we would be the ones responsible for doing this), it is hard to fault the basic expectations underlying this. After all, it is the scholar's job to conduct research; and it is the research librarian's job to make the raw material of that research - monographs, serial publications, primary resources, etc. - as accessible as possible. For the past several years, we librarians have been touting the concept of the "virtual" or digital library and have offered up just enough examples to give this idea some legitimacy. We now know that it is technically, or more exactly, technologically possible to "digitize everything." Like Archimedes and his lever, with the proper resources it is possible to move the very earth itself; all that is needed are enough scanners and workstations to do the work, enough petabytes of disk space to store the material, and enough bandwidth to move it quickly and efficiently around the world.

However, granting this technical capacity does not mean that such an undertaking can be done (at least not easily); nor does it even mean that it necessarily should be done. There are complex questions beyond the simple ones of scale articulated in the language of petabytes and bandwidth. A recent issue of Forbes ASAP carrying the subtitle "The Digital Revolution: Where Do We Go From Here?" [ 2 ] contains brief articles on this question from a variety of "seers and sages," including several prominent historians. All agreed that this revolution had enormous implications for historical research - from the demographic and statistical number-crunching requirements of cliometrics to new possibilities for more traditional research into archival and primary resources inherent in presenting those resources digitally. At the same time, limitations were also seen and concomitant concerns voiced. And, while there was little agreement on the precise characteristics of these concerns, Mark Helprin seemed to best capture the sense of overall caution in saying, "When expanding one's powers as we are in the midst of now doing by many order of magnitude in the mastery of information, we must always be aware of our natural limitations, mortal requirements, and humane preferences [ 3 ]."

Not surprisingly, the libraries and archives of the world are grappling with many of the same issues. It is not a simple failure of will in our hesitation to "digitize everything." Even if we currently had the technical capacity (which we don't, but that is another story altogether), I'm not sure that it would be wise to proceed recklessly in that direction. There are larger questions of policy and philosophy that first must be acknowledged and addressed. For ultimately, we believe that the matter of Internet access to research materials and collections is not one of simply doing what we have always done - except digitally. It represents instead an opportunity to rethink the fundamental triangular relationship between libraries and archives, their collections, and their users.

The Special Collections Library at Duke University is actively exploring these issues in ways that we hope are both innovative and useful. I believe our experiences may help illuminate the challenges and opportunities - for both libraries and their users - inherent in the digital library concept, particularly in an archival setting.

The Library

Some background on the Library first: The holdings of the Special Collections Library range from ancient papyri to the records of modern advertising. They number more than 200,000 printed volumes and more than 9,500,000 items in manuscript and archival collections. They support research in a wide variety of disciplines and programs, including African-American studies, anthropology, classics, economics, history, literature, political science, religion, sociology, and women's studies. Particular strengths in the Library's holdings include materials relating to the history and culture of the American South, the history of Great Britain and the British Empire, Wesleyana and British Methodism, modern American (particularly Southern) literature, American literary historiography, advertising, and the history of economics.

While these holdings include some truly singular collections of primary research materials (of which we are justly - though we hope not excessively - proud), the above description, with minor variations in its particulars, could characterize a number of prominent research libraries and archives. What sets the Special Collections Library at Duke apart, however, is, and has always been, its unswerving approach to access to its collections. While the library's mission statement declares that "the library's holdings are developed in relation to instructional and research interests in the University [and] are available for use by visiting scholars and the general public as well as Duke faculty and students," the reality is that access to and use of our collections is not just a matter of passive lip-service, but instead is something that is actively encouraged and promoted - for faculty, visiting scholars, graduate students, undergraduates, and the general public. Our rare books and manuscripts are not simply museum objects treasured more for their artifactual or associational characteristics than they are for the light they shed on human experience. Towards this end, we view ourselves not so much as protectors and guardians of a fortress (though we certainly do our best to protect and preserve our collections) as advocates for and facilitators of their use.

Thus, for many years now we have done what we can to advance that use. We have described and cataloged our holdings, prepared finding aids and indexes to the archival and manuscript collections, published a guide to our manuscript holdings, reported to the National Union Catalog of Manuscript Collections, mounted library exhibits, made classroom presentations, and have consulted with faculty members and students both with respect to collection development and to potential pedagogical and research uses of specific collections for classroom assignments. In addition, through a generous gift of an alumnus of the University, we have been able to establish an annual writing prize for both graduate and undergraduate students for research papers using the resources of the Special Collections Library.

The advent of online public access catalogs and the sharing of cataloging information among libraries through national bibliographic utilities gave our efforts in this direction a little more breadth and coverage. Potential users at some physical remove from the university could now discover more detailed information about our holdings, but it still required considerable correspondence between reference staff and those researchers for anything more precise. It was not until the arrival of the Internet that an entirely new paradigm of access to primary resources began to develop.

A Virtual Special Collections

In 1992 at a meeting of the Southeast Archives and Records Conference in Virginia Beach, Va., I was giving a paper in which I was speculating on the implications of the then-emerging Internet and the "virtual library" (I believe I had just heard the term for the first time scant weeks before this presentation) as it might affect the world of archival description and access. I was focused then on the technical difficulties inherent in contemplating any "reasonable scenario that would involve the digital scanning of thousands of feet of routine state records, let alone the scanning and transmission of the correspondence files of some controversial contemporary literary figure [ 4 ]." While the difficulties associated with this proposition are no less daunting today, it does seem even clearer now that the primary materials - such as the manuscript and archival collections and the rare books - that typically make up the holdings of special collections libraries have a unique opportunity and, dare I say, responsibility in this new environment.

The reasons for this ought to be obvious. First, as rare and, in many cases, utterly unique, research resources that almost never circulate or are loaned in the manner of other library materials, such materials usually require scholars and researchers to travel to the library in which they are held in order to use them. Anything that can be done to make these materials accessible remotely makes the job of basic historical and literary research that much easier.

Second, with one of the greatest barriers to propagating the virtual library being the fear of publishers over imagined or potential violations of their copyright interests, there is obviously a great advantage in putting up material on the Internet for which copyright is not a particular problem. And, while there are certainly copyright issues associated with many primary resources, the onus here most often lies with the users of the material rather than with the institution making it available. Thus, archival and other primary resources represent an enormous supply of rich and meaningful Internet content, with few of the copyright-related problems that are inherent in almost all published works.

Finally, the digitizing of unique and fragile primary resources addresses directly the preservation problems inherent in providing direct hands-on access to these materials. With fewer people actually handling these materials, the risk of undue wear and damage by such handling is concomitantly decreased. And, while there is still some question about the long-term viability of digitizing as a preservation strategy, few would dispute that, from an access perspective, sitting in one's own office at one's own computer console with a high-resolution color image of a document is highly preferable to sitting in a library looking at a black and white film image of the same document with one's head inside a microfilm reader.

Thus, the Special Collections Library at Duke required little convincing to start exploring new ways to make our collections more accessible electronically. The fact that we had a generous donor that had given us funds specifically earmarked for promoting such access certainly made it possible for us to begin a little sooner; but even in the early days of the evolution of the Internet it quickly became clear to us that we had some unique opportunities to move boldly in interesting and important new directions.

The Digital Scriptorium was established in the Library to facilitate this. It is a physical space equipped with a Sun Solaris server, several high-end workstations, and high-resolution color scanners, and it is staffed by a full-time director and several student assistants. However, it is also a virtual space and an evolving idea. All of this was made possible not only through the evolution of the technology and the instrumentality of available funds, but also because we were able to hire staff that had both an essential humanistic perspective and the requisite computer skills. Through a combination of the technology and this perspective we were able to articulate a vision and sense of purpose for this operation that embodies our essential mission while still moving the implementation of that mission in directions consistent with the potential of the technology. The primary goals of the Scriptorium are to

  1. "enhance access and aid preservation;"
  2. "to increase research value;" and
  3. "to create knowledge."

I would like to discuss several Scriptorium projects by way of illustrating not only how we are attempting to meet these goals, but also how our experiences illuminate the larger challenges and questions raised by this technology.

Papyrus Online

Perhaps the most striking example can be found in the Duke Papyrus Archive. A little over five years ago, the Special Collections Library made an application to the National Endowment for the Humanities as part of an attempt to introduce some order to an area of our collections that had heretofore received little attention. Initially, it was our hope to bring Duke's somewhat large and distinguished collection of ancient Greek and Egyptian papyri under basic physical and intellectual control by requesting grant funds for conservation and cataloging work.

The cataloging work took the somewhat innovative approach of using what was then becoming the standard cataloging format for modern archival and manuscript material. Although the papyri were not traditionally thought of in this sense, it seemed logical to take the position that, whatever the antiquity of these papers, they were, as bills, receipts, government decrees, letters, scraps of literature, etc., not really all that different from 19th and 20th century archives and personal papers that are our more usual domain.

An additional agenda in cataloging these papyri into our online catalog was to open them for research consistent with the rest of the collections in the library. At that time, research in and publication of papyri collections essentially followed the Dead Sea Scrolls model, in which the materials were widely regarded as the private research preserve of specially anointed - usually local - faculty members and scholars. Such exclusivity was incompatible with the broader and more democratic access principles of the Library.

The original proposal called for creating photographs of the papyri. These photographs would be sent to two papyrological photo archives - one at the University of Illinois and another in Belgium. After the grant was made, it was discovered that both of these archives had become essentially defunct. At this point, based on some interesting work that the Art History Department had done in creating a digital database of slides, it was decided to redirect the money that had originally been earmarked for photography into digital scanning and to attempt to create a similar database.

While the technical details of our various experiences in this area are interesting as an object lesson in trying to harness and manage rapidly evolving technology, they are not particularly germane here. What is important is that in the midst of this project the World Wide Web burst onto the scene. It immediately became apparent that the database model we were pursuing was not the answer and that in the Web we had the perfect medium for managing both the descriptive information about the papyri and the images of the papyri themselves - to say nothing of exponentially broadening access to the collection.

Moreover, changes in the standard library cataloging format that were then underway to better manage cataloging of electronic files made it a simple matter to provide dynamic links from the catalog records to the Internet address of the digital object being cataloged. Thus, with this collection fully scanned and cataloged, it is now possible for a remote scholar or researcher to search the Duke online catalog by author, title, subject, or keyword for any of the over 1,700 papyri in Duke's collection. A click of the mouse on the record will instantly bring up thumbnails giving a choice of several different resolution images for that text. Another click on the thumbnail will bring up the text itself in a resolution suitable for magnification up to four times. If an extremely high resolution copy is required, a file from an archive tape (such files are too large for routine Internet downloading) can be provided via an e-mail request for an FTP transfer. Since this site was opened in 1995, it has been visited by nearly 200,000 unique sites (for a total of over 1.3 million hits), and has been featured in numerous newspaper stories, on NBC television and on National Public Radio.
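
The link from a catalog record to its digital images can be illustrated with a minimal sketch. It assumes the link is carried in a MARC-style field 856 (Electronic Location and Access), whose subfield u holds a URL and whose subfield z holds a public note. The record layout, the example.org addresses, and the helper name image_links are hypothetical and are not taken from Duke's catalog.

```python
# Hypothetical, simplified catalog record modeled as a dict of MARC-style tags.
# Field 856, subfield "u", conventionally carries the URL of the digital object.
record = {
    "245": {"a": "Papyrus fragment (illustrative title)"},
    "856": [
        {"u": "http://example.org/papyrus/0001-thumb.gif", "z": "thumbnail"},
        {"u": "http://example.org/papyrus/0001-150dpi.gif", "z": "150 dpi image"},
    ],
}


def image_links(rec):
    """Return (label, url) pairs for every 856 field that has a URL."""
    links = []
    for field in rec.get("856", []):
        url = field.get("u")
        if url:
            # Fall back to a generic label when no public note is present.
            links.append((field.get("z", "digital object"), url))
    return links


for label, url in image_links(record):
    print(f"{label}: {url}")
```

A catalog display built this way only needs to render each pair as a hyperlink; the images themselves stay on the Web server and are never stored in the catalog record.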

Although the term "paradigm shift" is much abused today, it is no exaggeration to say that this project has once and forever altered the very nature of papyrological research. Scholars are no longer forced to travel great distances to peer at these ancient and fragile texts through binocular microscopes. They can sit in the comfort of their own office or study and download the images for examination at their leisure. In addition, another current NEH-funded project is facilitating similar work at Columbia University, the University of Michigan, and other institutions with strong papyrus collections, creating what the grant application called an "Advanced Papyrological Information System."

Metadata and Archives

As exciting as the papyrus archive project has been though, in some ways it has been a tough act to follow. This has been particularly the case since it was an example of a project in which an entire collection was digitized and it has been difficult to apply its logic and methodologies to other collections. However, one of its more interesting aspects has been the application of the aforementioned dynamic linkages between such metadata as cataloging records and electronic versions of actual collection materials. This has been especially intriguing from the perspective of archival description and cataloging.

There are two reasons for this. The first relates to the larger issue of metadata and the virtual library. While it is now possible to digitize the books, periodicals, primary sources, etc., found in research libraries, we have discovered that traditional approaches to managing bibliographic metadata - that is, cataloging - are no longer adequate to the task. Whether this is related to a higher level of expected retrieval by users in an electronic environment or whether our descriptive cataloging models (which, unfortunately, are still firmly rooted in the 19th century) no longer work is not clear. What is clear is that digital information as it exists on the Internet today requires more navigational, contextual, and descriptive data than is currently provided in traditional card catalogs or their more modern electronic equivalent. One simply cannot throw up vast amounts of textual or image-based data onto the World Wide Web and expect existing search engines to make much sense of it or users to be able to digest the results.

The second reason this is interesting is that archivists and manuscript curators have for many years now been providing just that sort of contextual detail in the guides, finding aids, and indexes that they have traditionally prepared for their holdings. Since archival materials are by nature more diffuse and less "packaged" than their published counterparts in libraries, these finding aids have been essential to the effective use of archival resources. The only problem has been that this particular variety of metadata has been dutifully prepared, typed (or word-processed), and then filed away in its respective repository awaiting the arrival of someone who might need to use the archival resources described therein. There has been no way to search the information contained in these finding aids unless one were on-site or had a copy of the finding aid at hand. And even then, "searching" was more a matter of reading them than it was of using any electronic tools.

This is no longer the case. Thanks to some ground-breaking work done over the past several years at the library at the University of California-Berkeley (with the generosity and assistance of the Commission on Preservation and Access), the Bentley Historical Library at the University of Michigan, the National Endowment for the Humanities, the Council on Library Resources, the Society of American Archivists, and others, it is now possible to search these archival finding aids in entirely new and dynamic ways.

Several years ago, a number of enterprising archivists saw that the Internet presented a potential avenue for making their archival finding aids more accessible. Establishing Gopher and WAIS sites, they simply took the machine-readable text of these finding aids from their word-processors and made them available over the Internet. The problem was that the technology of the period frankly wasn't much better than sending users photocopies of those finding aids. The ability to search these online versions electronically was extremely limited and there was no way to search across a number of finding aids - let alone across repositories.

In an attempt to address these limitations, Daniel Pitti, from the library at the University of California-Berkeley (and now with the Institute for Advanced Technology in the Humanities at the University of Virginia) and others developed the Berkeley Finding Aid Project, in which it was decided to test the feasibility of encoding these finding aids in Standard Generalized Markup Language (SGML). While others were already moving beyond the relative primitiveness of Gopher and experimenting with HyperText Markup Language (HTML), those involved in the Berkeley project understood that HTML was essentially a presentational encoding scheme and lacked the formal structural and content-based encoding that SGML would offer.

A "Document Type Definition" (or DTD) within SGML was then developed based on what were perceived to be common structural elements among a large sample of archival finding aids from a number of repositories. These elements included such things as title page information, contextual biographical or historical information, collection scope and content notes, internal organizational information, and detailed container lists and indexes. Moreover the DTD was designed to accommodate the entire range of descriptive hierarchy that reflected the essential archival hierarchy of the materials themselves.
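
To make the idea of structural, content-based encoding concrete, here is a minimal sketch of a finding aid marked up along these lines and read back as a hierarchy. The element names are loosely modeled on those used in Encoded Archival Description, but the fragment is simplified, written as XML rather than SGML, and not validated against the actual DTD. The collection, its notes, and the helper name outline are hypothetical.

```python
import xml.etree.ElementTree as ET

# Illustrative finding aid. Element names are loosely modeled on EAD but
# simplified; this fragment is NOT validated against the real DTD.
FINDING_AID = """
<ead>
  <eadheader><titleproper>Example Family Papers</titleproper></eadheader>
  <archdesc level="collection">
    <bioghist>Hypothetical biographical note.</bioghist>
    <scopecontent>Hypothetical scope and content note.</scopecontent>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="file">
          <did>
            <unittitle>Letters, 1861-1865</unittitle>
            <container type="box">1</container>
          </did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>
"""


def outline(xml_text):
    """Return (depth, level, title) for each component in the container list."""
    root = ET.fromstring(xml_text.strip())
    rows = []

    def walk(node, depth):
        for child in node:
            # Components are the numbered <c01>, <c02>, ... elements.
            if child.tag.startswith("c0"):
                title = child.findtext("did/unittitle", default="")
                rows.append((depth, child.get("level"), title))
                walk(child, depth + 1)

    walk(root.find("archdesc/dsc"), 0)
    return rows


for depth, level, title in outline(FINDING_AID):
    print("  " * depth + f"[{level}] {title}")
```

Because the markup names what each part is (a series, a file, a box) rather than how it should look, the same file can drive a display, a search index, or a link to a scanned image without being re-keyed.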

While I could go on at greater length on the details of the development of this project, suffice it to say for our purposes here that "Encoded Archival Description" (as this DTD has come to be called) is quickly moving towards becoming an internationally embraced standard for the encoding of archival metadata in a wide variety of archival repositories and special collections libraries. And the Digital Scriptorium at Duke has become one of the early implementors of this standard.

What this means for our work at Duke and for our larger goals with respect to the Scriptorium is that we now have a mechanism to manage descriptive information about our holdings within a context that is fully consistent with both the power of the Internet and our goals to make those holdings more accessible. Moreover, this encoding structure allows us to make linkages both from more generalized descriptions (such as exist in online catalog systems) and to more specific representations of collection materials themselves. Thus, a scholar can now launch a search for archival material in an online catalog and follow dynamic links from that catalog into ever more detailed layers of description, and ultimately to digital images of actual material from the collection.

Future Prospects

Duke is currently involved in a project that is funded through NEH and also involves the libraries of Stanford, the University of Virginia, and the University of California-Berkeley. This project (dubbed the "American Heritage Virtual Digital Archives Project") will create a virtual archive of encoded finding aids from all four institutions. This archive will permit seamless searching of these finding aids - at a highly granular level of detail - through a single search engine on one site and will, it is hoped, provide a model for a more comprehensive national system in the near future. Thus, while this project does not actually deliver a "virtual library" for archival materials, it does provide a framework upon which one could more reasonably be built.
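
The notion of searching finding aids from several repositories through one search engine can be sketched as a single inverted index built over all of them. This is an illustration only, not the project's actual design: the collection names, the descriptive text, and the functions build_index and search are hypothetical.

```python
from collections import defaultdict

# Hypothetical finding-aid text keyed by (repository, collection).
FINDING_AIDS = {
    ("Duke", "Example Family Papers"): "Civil War letters and plantation accounts",
    ("Stanford", "Example Mining Records"): "Ledgers and letters from a mining company",
    ("Virginia", "Example Diary"): "Diary kept during the Civil War",
}


def build_index(finding_aids):
    """Map each lowercased word to the set of finding aids containing it."""
    index = defaultdict(set)
    for key, text in finding_aids.items():
        for word in text.lower().split():
            index[word].add(key)
    return index


def search(index, query):
    """Return finding aids that contain every word of the query (AND search)."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits


index = build_index(FINDING_AIDS)
for repo, title in sorted(search(index, "civil war")):
    print(f"{repo}: {title}")
```

The point of the sketch is that the researcher issues one query and receives results that identify the holding repository, rather than visiting each institution's site in turn.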

The last aspect of the Digital Scriptorium upon which I wish to touch is, I believe, easily its most exciting and innovative. It is related to its goal to "create knowledge." To quote from the Scriptorium's home page, "While many computer and network tools are touted for their ability to keep researchers away from the library, we can instead use these tools in conjunction with our collection strengths to bring researchers into the library. This center looks beyond simply providing researchers with information and instead envisions the library as a place where scholarship happens, where knowledge is created. Through collaborative projects with faculty and students, this center will encourage scholars to come to the library to use our resources in innovative ways, and to leave the fruits of their scholarship with the library for future researchers [ 5 ]." Examples of this include some Civil War women's materials that have been assembled in collaboration with the Women's Studies Archivist and some graduate interns. Here, the Library's small collection of the papers of Rose O'Neal Greenhow, a Confederate spy, and an 1864 diary of Alice Williamson, a 16-year-old girl from Gallatin, Tenn., have been scanned and fully transcribed and provided with background information and a descriptive apparatus. Judging by the e-mail we receive, both of these sites have proved to be highly popular resources for secondary school teachers and students around the country.

We are also working with faculty and students at Duke to discover new ways to use the Library's collections through this facility. One project currently under consideration would involve an undergraduate class in Native American history, in which the students would collaborate on the preparation of full documentary critical editions of selected letters and diaries in this area. This would include full digital scanning of originals, transcription and encoding of text (probably using some subset of the SGML Text Encoding Initiative DTD), creation of a descriptive and editorial apparatus, and preparation of hypertext footnotes. The fact that much of this project could be conducted outside of regular library hours with no direct physical contact with the originals is also a significant advantage for both the students and the library.

Other possibilities include mounting the student prize papers I mentioned at the outset on the Scriptorium's Web site. These papers could be enhanced from their "hard-text" versions by the addition of hyper-links from footnotes to digital versions of the actual collection resources used.

The utility of this particular approach is that it provides a technology-rich platform to teach traditional research skills in primary resources, while also taking full advantage of that technology to leave behind the fruits of that work for subsequent researchers to draw or build upon.

Ultimately, however, we must go back to Helprin's caution about our "mortal requirements" and "humane preferences" and also remember that electronic access to archival material can never fully supplant access to the originals. Simon Schama, in an article in the same Forbes ASAP entitled "Hot-Wired History...Unplugged," makes this point most directly when he observes, "physical exposure to the fragile relics of vanished worlds has the power to summon the ghosts and make them substantial and eloquent. So, while you can wire Clio till she's red-hot and cyber-cool, if you make her virtual, the lady crashes [ 6 ]." It is our position that the approach we are taking enhances and broadens access to these materials and that they are being made "substantial and eloquent" to more and different kinds of users and that the virtual access we provide should ultimately lead to a wider and more meaningful application of their lessons to modern students and scholars.


While Duke's Special Collections Library is a long way from digitizing its entire holdings - both philosophically and in actuality - we believe that the projects we have either finished or are working towards are helping us and our sister institutions to better understand the challenges and opportunities presented by the so-called "digital library." Though our mission remains essentially the same as it has always been - to preserve and promote the use of the documentation of human experience and culture - we believe that this new model of electronic access has the potential for us to meet these goals better and more widely than we ever dreamed possible.

Steven L. Hensen is Director of Planning and Project Development in the Special Collections Library at Duke University. He is the author of Archives, Personal Papers, and Manuscripts, the AACR2-based standard for archival cataloging, and of numerous articles and papers in the area of archival description and standards, as well as over 40 workshops and consultancies in archival cataloging and description and the use of the USMARC format. He is a graduate of the University of Wisconsin-Madison, where he also received his MLS in 1971. He has worked in special collections/archives at the State Historical Society of Wisconsin, the University of Chicago, Yale University, and the Library of Congress and has served as Program Officer for Archives, Manuscripts, and Special Collections at the Research Libraries Group. He is a Fellow of the Society of American Archivists and is currently serving on its governing Council. He was a member of the SAA National Information Systems Task Force, the Working Group on Standards for Archival Description, the 1989 Airlie House Multiple Versions Forum, the development team for EAD, an SGML-compliant standard for encoding archival finding aids, and numerous other groups. His current activities involve developments relating to enhanced network access to archival information and collections.

1. This paper was originally presented at the annual meeting of the American Historical Association, in New York City on January 4, 1997.

2. Forbes ASAP, the Big Issue "The Digital Revolution: Where Do We Go From Here?" (December 2, 1996).

3. Mark Helprin, 1996. "The Acceleration of Tranquility," Forbes ASAP (December 2) p. 20.

4. Steve Hensen, 1991. "Beyond Cataloging: The Internet and Its Implications for the Evolution of Archival Description," Southeastern Archives and Records Conference annual meeting, Virginia Beach, Va., (May).


6. Simon Schama, 1996. "Hot-Wired History...Unplugged," Forbes ASAP (December 2), p. 55.

