First Monday

Digitizing for Access and Preservation: Strategies of the Library of Congress by Deanna B. Marcum

Our time’s digital information revolution makes being a librarian exciting. The Library of Congress, like others, is exploring new ways of using digital technology for both access and preservation. This work, and the excitement, will grow as the library completes moving its audio–visual resources into its new National Audiovisual Conservation Center. The library hopes to share new developments and work with others in meeting the challenges of the digital information era.






Amid the daily challenges of dealing with personnel problems, budget questions, and other administrative headaches, I sometimes forget the most important thing [1]. In earlier days, American librarians were happy if they could finance shelving and keep a wood stove going. I recently read about a nineteenth–century librarian who “wrapped herself in a blanket, with a soapstone at her feet, during the coldest of Saturday afternoons, the only time the library was open.” [2] I easily forget how far libraries have advanced in a relatively short time.

We have far outrun what that early librarian, even if she got warm, could possibly have imagined. Since the 1990s, we have developed the electronic magic of the Internet to make material in our institutions quickly available to anyone, anywhere, with computer access. Our digitized materials include sound recordings and film as well as texts.

We are working today in the midst of the digital information revolution. We are part of an exciting time in the cultural history of humankind. This paper reports on what we are doing and thinking at the Library of Congress about the stewardship of library resources in this era of digital information.




I call this paper, “Digitization for Access and Preservation.” The thought that the two could be linked is important to me because we at the Library of Congress have so many resources to preserve. We must provide stewardship for more than 132 million items. These include more than 58 million manuscripts, 30 million books, 13 million prints, five million maps, five million pieces of music, three million sound recordings, and one million films and videos. Moreover, our collections grow by approximately 13 thousand items every day [3].

Now we have the opportunity to make much of this material accessible far beyond our buildings’ walls in Washington, D.C. How will we make use of that opportunity while also safely preserving so many things? I will answer with a story that began a decade and a half ago.


How exciting it was back in 1990 when the Library of Congress launched an experiment with the new technique called digitization. For the next four years, we identified audiences for digital collections, established technical procedures, wrestled with intellectual property issues, and explored distribution formats such as CD–ROM. We sent CD–ROMs containing some of our materials to 44 schools and libraries across the country. We received enthusiastic responses, but the format proved inefficient and costly (Library of Congress, 2007a).

Then came the Internet. In October 1994, with 13 million dollars from private donors, we announced our plan to go online with a National Digital Library Program. The program’s flagship became the American Memory historical collection that we had begun digitizing in the experimental project. The collection contained historical documents in multiple media. The Web became the means of making access efficient and affordable. Private donations to the program soon tripled. And the Congress gave us another $15 million for five years (Library of Congress, 2007a).

In 1996, we expanded partnerships in the program. With two million dollars from the Ameritech Corporation, we opened a competition. We invited non–federal libraries, museums, archives, and historical societies to submit proposals for digitizing material to add to our American Memory collection. We placed 23 prize–winning digital collections on our American Memory Web site (Library of Congress, 2007a).

By the year 2000, we exceeded our goal of putting five million items on the site. Today it contains nine million items in more than 100 thematic collections. Included are digital copies of books, manuscripts, pamphlets, prints, photographs, maps, sheet music, sound recordings, and films. Each collection appears with explanatory features. And all collections can be searched electronically (Library of Congress, 2007b).

In the 1990s, many other libraries in the United States and abroad also built digital collections for access online. This seemed wonderful. But at the same time, most of us realized that we did not know how long we could preserve the resources we were creating. Digital media lack the durability of paper. Digital documents depend for readability on computer systems that quickly become obsolete. And digital preservation has to deal with formats of many, changing kinds (Library of Congress, 2007c).

We did not digitize with the intent of replacing original materials. But we wanted to preserve also the digital copies in which we made such substantial investments (Arms, 2000). Thus our gains in access brought new challenges in preservation. The challenges increased when we started accepting materials created digitally. These included the selection of Web sites that we and the Internet Archive began trying to preserve.

Toward the end of the twentieth century, we drew upon experts outside as well as inside the library to identify five major methods for digital preservation (Arms, 2000). The first essentially called on vendors to develop better digital storage media. The second called for “refreshing” digital data, which basically meant copying streams of digital “bits” from one location to another.

The third, more complex strategy came to be called “migration.” This meant transferring digital material from one format to a newer, and hopefully enhanced, format.

Our fourth strategy, called “technical emulation,” required programming new computer systems to mimic systems on which digital material had originally been generated.

Our fifth strategy became a desperation option that we called “digital archaeology.” This meant trying to reconstruct the meaning of digital material that had become otherwise unreadable. Of course, we also paid attention to storage conditions, data replication, data validation, and other measures for basic security.
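As a rough illustration of the second strategy, "refreshing," combined with the data validation mentioned above, the following Python sketch copies a file to a new location and confirms that the copy is bit-for-bit identical by comparing SHA-256 digests. The function names `sha256_of` and `refresh` are hypothetical, chosen for this example; nothing here is meant to describe the tools the Library of Congress actually uses.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def refresh(source: Path, destination: Path) -> str:
    """Copy source to destination and verify that the bits are unchanged.

    Hypothetical helper for illustration only. Returns the digest of the
    verified copy, or raises IOError if the copy differs from the original.
    """
    before = sha256_of(source)
    destination.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source, destination)
    after = sha256_of(destination)
    if before != after:
        raise IOError(f"refresh failed: {destination} differs from {source}")
    return after
```

Refreshing in this sense moves the bits without interpreting them. It protects against failing storage media, but not against an obsolete file format, which is why migration and emulation are treated as separate strategies.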

We knew that waiting to act until digital material had deteriorated would leave us only the digital archaeology option. To avoid that, we began to try managing digital data from their creation. That included trying to retain metadata — information that helps us manage and retrieve digital data. We envisioned creating a long–term, comprehensive system for storing and managing multiple kinds of digital materials and the metadata needed for their ongoing use (Arms, 2000).


With those thoughts in mind, we began in the twenty–first century to do three things for preservation. One, we continued our experimentation. Two, we began building a National Audiovisual Conservation Center to house audiovisual resources in digital as well as traditional forms. Three, we crossed our fingers and prayed that solutions to the digital preservation challenges would emerge.

A preservation expert on our staff expressed our hope more confidently. In concluding an excellent report in 2000 on our digital preservation program, she wrote: “Technology advances, while sure to present new challenges, will also provide new solutions for preserving digital content” (Arms, 2000).

We were not so confident, however, as to wait around for that to happen. We recognized that it would not happen without a major effort — an effort larger than the Library of Congress could undertake alone.

Therefore, in 2001, we joined others in persuading the U.S. Congress to appropriate nearly $100 million for a national program to “ensure the long–term storage, preservation, and authenticity” of digital collections [4]. This became the National Digital Information Infrastructure and Preservation Program — known by the acronym NDIIPP.

Many libraries and others helped with NDIIPP’s development. And institutional partners have received NDIIPP grants for projects to improve digital preservation. The Library of Congress joined with the U.S. National Science Foundation to administer the NDIIPP grant program (Library of Congress, n.d.a.).

Some of the grant recipients work on preserving non-textual resources. For example, the University of North Carolina at Chapel Hill is using an NDIIPP grant to develop a framework for preserving digital video collections. The San Diego Supercomputer Center at the University of California, San Diego, is using an NDIIPP grant to develop a process for managing the preservation of videos from creation through ultimate use (Library of Congress, n.d.a.).

In addition, NDIIPP is helping SCOLA (Satellite Communications for Learning Associations) to archive high–interest television programs. SCOLA is a non–profit educational corporation that receives and retransmits television programs of potential research value from around the world (Library of Congress, 2006). The Library of Congress and many others expect to learn from such NDIIPP–assisted projects.

At the same time, other preservation efforts have emerged from other national programs.

For example, in 1992, the U.S. Congress passed a National Film Preservation Act. It financed a fact–finding study by the Librarian of Congress. The completed study, entitled Film Preservation 1993, reported that only half of films made before 1950 survive. Only 20 percent of feature films made in the 1920s survive. And only 10 percent of those made in the decade beginning in 1910 survive (Library of Congress, n.d.b).

The study also found that even the surviving films suffered from preservation problems. These included color fading, film–base decay, soundtrack deterioration, and flammable film stock (Library of Congress, n.d.b).

Out of the study came a plan for action, entitled Redefining Film Preservation. The Library of Congress worked on this plan with the National Film Preservation Board, and with archivists, educators, filmmakers, and film–industry executives. The plan, released in 1994, called on all concerned to take such actions as the following:

Similarly, we have joined with others to focus attention on needs for preserving sound recordings. Eight years after the Film Preservation Act, the Congress passed the National Recording Preservation Act. It called for establishing a National Recording Registry in the Library of Congress.

Also to help preserve significant sound recordings, the act called for a National Recording Preservation Board. The act charged the board to study current preservation needs and practices, and to plan a national audio preservation program.

Participants in this planning have declared that “audio preservation today is not simply a matter of collecting and storing, or transferring endangered records to the digital domain.” Long–term preservation requires commitment to long–term processes, which may have, as one expert put it, “no discernible end” (Library of Congress and National Recording Preservation Board, n.d.).

In preparing the audio preservation plan, we held public hearings and solicited comments from representatives of sound–recording archives, recording companies, audio engineers, and interested organizations of scholars. Also we consulted specialists in intellectual property law and individuals with collections of recorded sound (Library of Congress and National Recording Preservation Board, n.d.).

The study coincided with our work to build a new National Audiovisual Conservation Center. We basically completed the Center in March 2007.

Since 1999, our Motion Picture, Broadcasting, and Recorded Sound Division and our American Folklife Center have worked on ways to make digital copies of moving image and recorded sound collections. Among other things, we have explored means of scanning motion picture film, of transferring video recordings from tapes to digital files, and of packaging digital materials (Library of Congress, n.d.d.). We carry on such work in our new National Audiovisual Conservation Center.


What exactly is this Center?

It is a complex of four structures. They cover 45 acres near Culpeper, Virginia, in the United States, 60 miles south of Washington, D.C. The complex occupies 415,000 square feet. We built much of it into the west face of a mountain, covering the buildings with earth, grass, and trees to keep the site as natural as possible [5].

In March 2007 we accepted the complex officially. The Packard Humanities Institute transferred it to the Architect of the U.S. Capitol, who oversees the operation of our buildings.

Packard managed and financed the construction at a cost that we estimate will reach $150 million, the largest gift in the Library’s history. The U.S. Congress provided an additional $52 million for buying shelving and equipment, relocating collections and staff, and hiring new staff [6].

Once we have fully moved in, the complex will house all collections and facilities of our Motion Picture, Broadcasting, and Recorded Sound Division. Also it will provide space for a staff that we anticipate will grow to 150.

A Collections Building will store all our audiovisual collections, except for those on nitrate film. This flammable film will go into specially constructed vaults in a second building. A third structure contains a central plant for heating and air–conditioning the complex. Our fourth structure is a three–tiered Conservation Building. It houses administrative, curatorial, and processing staffs. It also contains a theatre and two laboratories for the preservation of all kinds of films, videos, and sound recordings [7].

Because the Collections Building is underground, it efficiently provides ideal conditions for audio–visual storage: low temperature and low humidity. The building contains large vaults with compact shelving for all of our media formats [8].

The 175,000 square feet of the Conservation Building contains a state–of–the–art facility for listening to sound recordings. This area and exhibit spaces are open to the public. Also the building has naturally lighted work spaces. And it has a 200–seat theatre with an organ console for music that used to be heard with silent movies [9].

The complex enables our library to consolidate collections previously stored in three Washington buildings and five others in Maryland, Ohio, Pennsylvania, and Virginia. Audiovisual reference services remain in our recorded–sound and moving–image research rooms in Washington. Sound and videotape collections may be accessed there electronically. Film collections, at least for now, are brought to researchers in Washington from Culpeper on a regular schedule [10].

Audiovisual materials comprise a rising proportion of the world’s historical record. We expect our complex to have room for additions to our audiovisual collection for at least the next 25 years. This estimate includes storage for materials created digitally [11].

Our new audio–visual center includes a system for acquiring and preserving digital materials. There we intend also to preserve digitally the analog materials that we previously would have transferred to analog formats that are growing obsolete.

As Gregory Lukow, chief of the division in charge of our audiovisual collections, has explained, “The change will be evolutionary and sequenced.” Already we have begun to preserve sound recordings digitally. Now we are working on the digital preservation of videotape. Eventually we hope also to use digital technology to preserve and manage film. This is more difficult to do. We will need improvements in technology that can lower costs [12].

Our plans include sharing with other cultural institutions the innovations we expect to develop in the new center. It contains meeting places for visiting scholars, archival professionals, and students from graduate courses in moving–image and recorded–sound archiving. There they will be able to discuss curatorial and technical challenges and examine improvements in audiovisual preservation and access [13].

For example, the new center has an experimental image workstation. In it we will use newly developed technology to speed the digitizing and preserving of 78–rpm shellac and acetate recordings. The new technology comes from the Lawrence Berkeley National Laboratory in Berkeley, California.

The laboratory calls the technology IRENE, which stands for “Image, Reconstruct, Erase, Noise, Etc.” IRENE is a kind of restoration software. It enables us to create high–resolution, digital maps of the grooved surfaces of deteriorated recordings. From these images, technicians can remove debris and extraneous sounds, and repair damaged portions (Sternstein, 2006).

The laboratory used a grant from the National Endowment for the Humanities to create the image machine and demonstrate that it works. The laboratory is preparing software that will enable library technicians to use IRENE with just basic training (Sternstein, 2006).

Though we are excited about getting into our new audio–visual preservation center, we are also excited about other preservation projects. For example, in 2005, we joined with the National Endowment for the Humanities to provide digital access to preserved newspapers of historical value.

Since 1982, the endowment has spent $54 million on grants to preserve some 70 million pages of newsprint on microfilm. These grants have gone to repositories in every state and three territories of the United States. The grants support microfilming of newspapers published since the eighteenth century. The Library of Congress has provided technical assistance since the project’s start. The endowment expects to conclude the project in 2007 (National Endowment for the Humanities, n.d.).

Next we will work with the endowment to make historical newspapers accessible via the Internet. Over the next two decades, we will develop an online National Digital Newspaper Program. We will digitize historically significant newspapers published in all the U.S. states and territories between 1836 and 1922. These will be available in a free, searchable database (National Endowment for the Humanities, n.d.).

We have begun by supporting projects to digitize 100,000 pages from newspapers published in California, Florida, Kentucky, New York, Utah, and Virginia between 1900 and 1910. This first batch will help us evaluate technical guidelines and selection criteria. Also we will evaluate whether the program effectively enables users to browse and search newspaper pages (National Endowment for the Humanities, n.d.).

If we receive continued funding, we will make grants in every state and territory. In each, one organization will coordinate newspaper digitizing by several partners. If this project succeeds, the microfilm copies will ensure long–term preservation while we use the digital copies to provide access (National Endowment for the Humanities, n.d.).

We feel excited also about a project that we announced in January 2007. We call it “Digitizing American Imprints at the Library of Congress.” The Alfred P. Sloan Foundation has given us $2 million to digitize thousands of books. These include “brittle books” that we are in danger of losing. We hope to make this a demonstration project from which many libraries can learn how to scan their physically vulnerable works safely (Library of Congress, 2007d).

Our digitization project uses book–scanning technology called “Scribe” from the non–profit Open Content Alliance. In addition, the project will develop technology for electronically turning pages, displaying foldouts, and capturing tables of contents and indices (Library of Congress, 2007d).

Besides “brittle books,” we will digitize other works, all in the public domain: American history books, genealogies, regimental histories, other U.S. Civil War material, and six collections of rare books. We also will digitize works about photography, particularly artistic publications, biographies of photographers, and works on technical aspects of photography (Library of Congress, 2007d).

We have established formal selection criteria for deciding which works to reformat digitally:

In the Sloan project, as in others, we want to preserve the original copies. But we want also to ensure that digital copies will be accessible for a long time. Consequently, we are working throughout the Library of Congress on policies to manage digital data over time. We seek more durable media, better storage conditions, and improved technologies for managing digital data. We also work on methods and schedules for checking and maintaining the integrity of digital files (Library of Congress, n.d.f).
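One common way to check the integrity of digital files on a schedule is a "fixity" audit: record a checksum for every file once, then periodically recompute the checksums and compare. The Python sketch below illustrates the idea under simple assumptions (files on a local disk, SHA-256 digests, a manifest held in memory). The names `file_digest`, `build_manifest`, and `audit` are hypothetical and are not drawn from the Library's own procedures.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of one file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a digest for every file under root, keyed by relative path."""
    return {
        path.relative_to(root).as_posix(): file_digest(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files now under root with an earlier manifest.

    Reports files that have gone missing, files whose contents changed,
    and files that were not in the manifest at all.
    """
    current = build_manifest(root)
    return {
        "missing": sorted(set(manifest) - set(current)),
        "changed": sorted(
            name for name in manifest
            if name in current and current[name] != manifest[name]
        ),
        "unexpected": sorted(set(current) - set(manifest)),
    }
```

In practice the manifest would be stored apart from the files it describes, and a report with any "missing" or "changed" entries would prompt restoring those files from a replicated copy.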

Additionally, we have begun work on a new, overall strategic plan for the Library Services unit of the Library of Congress. The plan will cover the fiscal years 2008 through 2013. It will give major attention to digitization for access and preservation. The plan is a work in progress, but here is a summary of items stressed in the current draft:

“... we need to better understand what is being created digitally and increase our contact with those creating these works. Our skills at collecting traditional works need to expand to the digital world. We need to identify digital resources as they are created and apply our collections specialists’ knowledge to determine which items should be collected ... .

[Also we must] work closely with the Library’s Office of Strategic Initiatives to advance the science and practice of preserving digital works, and to develop trusted repositories for digital items in the Library’s collections.” [14]




I think our new strategic plan will emphasize the following general points:

Like most libraries, we will continue to digitize as much material as we can. And we will take advantage of the Internet for making our resources available worldwide. We must do so to enable people far from our physical libraries to use and enjoy our holdings. Because we now have the means to extend the reach of our libraries, I think we also have a moral imperative to do so.

Additionally we will take advantage of digitization to help us meet preservation needs. We will recognize that providing access to digital copies enables us to reduce use of, and thus preserve longer, our fragile originals. In that sense, digitization and preservation go together.

However, our plan also will recognize that we do not yet know how long we can fully preserve the material we are digitizing. Nor are we confident of our long–term ability to preserve material digitally created. Both kinds of digital material increase daily, and their preservation needs accordingly grow. The current wisdom seems to be that no “silver bullet” — no universal solution for digital preservation problems — will emerge. But we can progress by developing and refining different techniques for preserving different digital formats.

The need for multiple approaches makes it important for librarians to work together, as we do in the national programs I described. And libraries need to share with others the digital preservation advances they individually make. Thinking again of the early librarian whom I described, I hope we will not sit alone in our institutions, huddled in blankets against the chills of change, warming our feet on the soapstones of tradition. We now have the technological ability to operate far beyond our walls. Let us also cross over our walls to help each other do it.


About the author

Deanna B. Marcum is the associate librarian of Congress for library services, in Washington, D.C. Previously she has been president of the Council on Library and Information Resources, dean of the School of Library and Information Science at The Catholic University of America, and a staff member of the Association of Research Libraries and the libraries of Vanderbilt University and the University of Kentucky. She has a B.A. in English from the University of Illinois, an M.L.S. from the University of Kentucky, and a Ph.D. in American studies from the University of Maryland. Currently she chairs the UNESCO Memory of the World program and serves on the National Historical Publications and Records Commission and the executive committee of the Digital Library Federation.



1. This paper began as a keynote address to the WebWise Conference, sponsored by OCLC, the J. Paul Getty Trust, and the Institute of Museum and Library Services, in Washington, D.C. on 2 March 2007.

2. Heidinger, 2006, p. C–2.

3. Library Services, 2006a, p. 2.

4. Library of Congress, 2004, p. 175.

5. Dalrymple, 2006, pp. 167–168.

6. Dalrymple, 2006, p. 171.

7. Dalrymple, 2006, pp. 168, 170.

8. Dalrymple, 2006, p. 168.

9. Dalrymple, 2006, p. 168.

10. Dalrymple, 2006, pp. 169, 171.

11. Dalrymple, 2006, p. 169.

12. Dalrymple, 2006, p. 170.

13. Ibid.

14. Library Services, 2006b, p. 8.



Caroline R. Arms, 2000. “Keeping Memory Alive: Practices for Preserving Digital Content at the National Digital Library Program of the Library of Congress,” RLG DigiNews, volume 4, number 3 (June), accessed 9 February 2007.

Helen Dalrymple, 2006. “Film and Sound Treasures in the Mountain Lair, Audiovisual Conservation Center Takes Shape in Culpeper,” Library of Congress Information Bulletin, volume 65, numbers 7–8 (July–August), pp. 167–171, accessed 13 February 2007.

Katherine Heidinger, 2006. “The Loneliest Buildings in Town,” Bangor Daily News (12 October), p. C–2.

Library of Congress, 2007a. “Mission and History,” American Memory program, Web page, accessed 9 February 2007.

Library of Congress, 2007b. “About the [American Memory] Collections,” Web page, accessed 9 February 2007.

Library of Congress, 2007c. “Technical Information [on American Memory]: Preservation,” Web page, accessed 9 February 2007.

Library of Congress, 2007d. “$2 Million Sloan Foundation Grant to Help Digitize Thousands of Books,” news release (31 January), accessed 9 February 2007.

Library of Congress, 2006. “Library Partnership Supports Preservation of Foreign News Broadcasts,” news release (21 July), accessed 9 February 2007.

Library of Congress, 2004. “Office of Strategic Initiatives,” Annual Report of the Librarian of Congress for the Fiscal Year Ending September 30, 2003 (Washington, D.C.: Library of Congress), pp. 175–189.

Library of Congress, n.d.a. “Library of Congress–National Science Foundation Digital Preservation Project Descriptions,” Web page, accessed 9 February 2007.

Library of Congress, n.d.b. “National Film Preservation Plan, Overview and Background,” Web page, accessed 9 February 2007.

Library of Congress, n.d.c. “Redefining Film Preservation: A National Plan,” Web page, accessed 9 February 2007.

Library of Congress, n.d.d. “Digital Audio–Visual Preservation Prototyping Projects,” Web page, accessed 9 February 2007.

Library of Congress, n.d.e. “Selection Criteria for Preservation Digital Reformatting,” Web page, accessed 12 February 2007.

Library of Congress, n.d.f. “Life–Cycle Management of Digital Data,” Web page, accessed 12 February 2007.

Library of Congress and National Recording Preservation Board, n.d. “Study on the Current State of Recorded Sound Preservation,” Web page, accessed 13 February 2007.

Library Services, 2006a. “Overview of Library Services,” Strategic Plan, FY 2008–2013, Library Services, Library of Congress, unpublished paper, draft 3.0 (June), pp. 2–3.

Library Services, 2006b. “Strategic Goal One,” Strategic Plan, FY 2008–2013, Library Services, Library of Congress, unpublished paper, draft 3.0 (June), pp. 8–9.

National Endowment for the Humanities, n.d. “National Digital Newspaper Program,” Web page, accessed 9 February 2007.

Aliya Sternstein, 2006. “Hello, IRENE, Library of Congress to Use Image Tool for Sound Restoration,” News (13 February), accessed 10 February 2007.




Copyright ©2007, First Monday.

Copyright ©2007, Deanna B. Marcum.

Digitizing for Access and Preservation: Strategies of the Library of Congress by Deanna B. Marcum
First Monday, volume 12, number 7 (July 2007).