The National Archives and Records Administration (NARA) is responsible for preserving historically valuable records of the U.S. Government. In the digital realm, that encompasses an exponentially growing volume of electronic records in a great variety of increasingly complex formats. To address this challenge, NARA is developing the Electronic Records Archives (ERA) system.
The Challenge of Electronic Records
The Electronic Records Archives (ERA) Program addresses mission-critical needs of the National Archives and Records Administration (NARA): first, to be able to preserve any type of electronic record created on any computing platform anywhere in the U.S. Government; and second, to provide discovery and delivery of such records to anyone who has an interest and a legal right of access, while obeying all laws and regulations that restrict access to federal government information. This need extends from the present through the life of the republic. Third, NARA also has a responsibility to help all other agencies do a better job of managing the records that they use to go about the government's business. To address the first two needs, NARA must understand the specific characteristics of the electronic records created by the rest of the government and of the systems used to create and store them. To address the third need, NARA must find ways of preserving and providing sustained access to electronic records that not only work in the National Archives, but can also be applied in other agencies.
NARA is building the ERA system both to preserve records and provide public access to them, and to enable online transactions and collaboration with other agencies over the entire life cycle of government records. While we are focused on NARA's mission, we recognize that accomplishing that mission entails collaboration with many other organizations in and outside of the U.S. Government. For example, we expect that digital libraries will become important players as brokers for access to records preserved in the National Archives. Given that anyone can request access to our holdings and given the great variety of information they contain, NARA cannot provide specialized support for all the different local communities interested in federal government information. We believe that libraries which serve those communities are better positioned to interact with them. From NARA's perspective, digital libraries could play an important role in expanding the dissemination of our holdings. From the libraries' perspective, NARA's holdings may become part of their virtual collections. Such relationships should be mutually beneficial.
In addition, there will be opportunities for other organizations to do things we cannot. NARA must preserve records in authentic form. Many government records document property ownership and rights. Many others provide information which is of critical importance in legal proceedings. We must preserve electronic records so that they remain as reliable as they were the day they were created. Thus, we cannot modify what we preserve to make it more accessible, useful or valuable to particular types of users. But other organizations can focus on particular user needs, even if that entails changing the records in some way.
Another consideration that leads us to look to other organizations is the recognition that other archives, libraries, museums, and many other institutions face the same challenges that we do: technological obsolescence, increasing varieties of digital information, increasing complexity of what is created in digital form, and enormous growth in volume. These challenges are so big that we need to capitalize on opportunities to collaborate in developing and implementing solutions.
The specific challenges facing NARA are daunting. The Congressional Research Service estimated about 15 years ago that more than 96 percent of all federal information starts out life in a computer. That was before anyone started talking about e-government. While less than five percent of federal records are preserved in the National Archives, the volumes of digital data we need to ingest and preserve are projected to grow exponentially. A few examples are:
Twenty-five million State Department diplomatic messages created between 1972 and 2000. We receive them in batches of one million messages a year.
Every four or eight years, when a President's term ends, we receive large bursts of White House electronic records. From the first Bush Administration, we received a few hundred thousand e-mail messages; from the Clinton White House, we received 32 million; and from the current administration we are expecting a lot more.
The 2000 Decennial Census will give us hundreds of millions of images of the scanned returns, as well as all the quantitative data that was derived from those returns.
We face a big bubble in military personnel records, which we estimate at about one billion scanned images. While we are trying to decide the best way to manage all of that, the military services are already en route to replacing their current imaging systems with data-centric personnel systems.
One example illustrates the challenge of complex digital formats. Traditionally the National Archives preserves a complete set of ship's drawings at least for every class of Navy ship, and such records are requested fairly frequently. Today, there are no ship's drawings. They have been replaced by computer-assisted design, engineering and manufacturing records, amounting to at least 100,000 digital files per ship. The problem of preserving such records is one we share with the Navy. The Navy keeps ships in operation for many decades. These ships are, in effect, floating cities. Like cities, they are not static entities. Over time, they need to be repaired and changed. They change their mechanical systems, computer systems, kitchens, laundries, and so on.
What does the Navy, or anyone, know about what computer-assisted design, engineering or manufacturing systems will look like 25 years from now? The only thing they can count on is that such systems will be different. There is no way of guaranteeing that today's data can be used in future systems, either to modify or repair a ship or even to show what its design was. Initial research that we have done with the Navy, other government agencies, and computer scientists and engineers around the country has shown that solving these problems is not only beyond the state of the art in technology. It is beyond the state of computer science.
Facing such challenges, NARA's strategy started with the critical problem of digital preservation. But we must address that problem in terms of NARA's responsibility for overall lifecycle management of records of the Federal Government. We decided to do that in alignment with the direction of information technology in the Federal Government. We assumed that, if we could find solutions in technologies that the agencies are using or are planning to use to do their business, the transfer to us of the electronic records created in their systems should be easier. Furthermore, we decided to look for solutions in technologies that were projected to be mainstream in the next-generation national information infrastructure. We estimated that the alternative of developing technologies specifically for digital preservation would be too expensive.
In addition to attacking the preservation challenge, the ERA system must support lifecycle management of all types of records. The key decisions in the lifecycle of records, such as how long they should be kept by their creators and whether they should be preserved in the National Archives, do not depend on whether they are in digital form, or on paper, film, or any other medium. They depend on what the records tell us. So NARA needs a comprehensive approach to making management decisions about government records, regardless of their form or characteristics.
In essence then, the ERA system will be a system within a system. The outer system will support lifecycle management processes for all government records. Inside of that system will be a system that allows us to ingest, preserve, and provide access to electronic records. ERA will support different processes in three different lines of business: the National Archives, presidential libraries, and federal records centers.
There will be several major benefits from the ERA system:
Improved, easier, and faster access to records held by NARA and to the services we offer to the government and to citizens.
A single portal, providing one-stop shopping for access to records and services.
Automated aids for finding out what records NARA holds; for managing the lifecycle of government records; and for maximizing access to those records, while respecting confidentiality.
A comprehensive and consistent approach to managing the lifecycle of records.
Preservation and sustained access to enormous quantities and unlimited varieties of electronic records.
Online communication and collaboration tools, so that we can better interact with the various stakeholders we serve.
Building a system that can preserve and provide access to electronic records indefinitely into the future entails some exceptional requirements:
The system has to be evolvable. There is no permanent fix for digital preservation because it is a dynamic problem. Anything implemented at present, or at any other point in time, will become obsolete. To enable preservation and access over generations of information technology, it must be possible to replace any and all hardware or software, while the system as a whole continues to provide all required functionality and the records stored in it remain authentic.
For us, the system obviously has to be scalable in both directions, to handle one billion images from the Department of Defense and, downward, to handle some very specialized collections that have to be kept in isolation from others.
The system must be extensible. As long as information technology continues to change, there will be new kinds of electronic records created that people have not even thought of yet. Ten, twenty or one hundred years from now, we have to be able to modify the system to manage, preserve and provide access to new data types.
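The evolvability and extensibility requirements above can be illustrated with a small sketch. It is not a description of the ERA design, which this article does not detail. It assumes, purely for illustration, that storage sits behind an interface so it can be swapped out, that authenticity across a swap is checked by comparing SHA-256 digests recorded at ingest, and that new record types are supported by registering a handler. All names here (`Storage`, `InMemoryStorage`, `Archive`, `register_format`, `migrate`) are hypothetical.

```python
import hashlib
from typing import Callable, Dict, Protocol


class Storage(Protocol):
    """Hypothetical interface that any storage component must satisfy."""

    def put(self, record_id: str, data: bytes) -> None: ...
    def get(self, record_id: str) -> bytes: ...


class InMemoryStorage:
    """Stand-in for one generation of storage technology."""

    def __init__(self) -> None:
        self._blobs: Dict[str, bytes] = {}

    def put(self, record_id: str, data: bytes) -> None:
        self._blobs[record_id] = data

    def get(self, record_id: str) -> bytes:
        return self._blobs[record_id]


class Archive:
    """Toy archive: replaceable storage plus pluggable format handlers."""

    def __init__(self, storage: Storage) -> None:
        self._storage = storage
        self._digests: Dict[str, str] = {}  # record id -> SHA-256 taken at ingest
        self._formats: Dict[str, str] = {}  # record id -> format label
        self._renderers: Dict[str, Callable[[bytes], str]] = {}

    def register_format(self, fmt: str, renderer: Callable[[bytes], str]) -> None:
        # Extensibility: a new record type is added without changing Archive.
        self._renderers[fmt] = renderer

    def ingest(self, record_id: str, fmt: str, data: bytes) -> None:
        if fmt not in self._renderers:
            raise ValueError(f"no handler registered for format {fmt!r}")
        self._storage.put(record_id, data)
        self._digests[record_id] = hashlib.sha256(data).hexdigest()
        self._formats[record_id] = fmt

    def is_intact(self, record_id: str) -> bool:
        # The stored bytes must still match the digest recorded at ingest.
        data = self._storage.get(record_id)
        return hashlib.sha256(data).hexdigest() == self._digests[record_id]

    def render(self, record_id: str) -> str:
        data = self._storage.get(record_id)
        return self._renderers[self._formats[record_id]](data)

    def migrate(self, new_storage: Storage) -> None:
        # Evolvability: copy every record to the replacement component and
        # confirm each one arrived unchanged before switching over.
        for record_id, digest in self._digests.items():
            new_storage.put(record_id, self._storage.get(record_id))
            copied = new_storage.get(record_id)
            if hashlib.sha256(copied).hexdigest() != digest:
                raise RuntimeError(f"record {record_id} changed during migration")
        self._storage = new_storage


archive = Archive(InMemoryStorage())
archive.register_format("text", lambda raw: raw.decode("utf-8"))
archive.ingest("msg-001", "text", b"diplomatic message")
archive.migrate(InMemoryStorage())  # swap the storage component
print(archive.is_intact("msg-001"), archive.render("msg-001"))
```

In this sketch the archive keeps working while its storage is replaced, and a record type nobody anticipated needs only a new call to `register_format`. A real system would also have to address scale, access restrictions, and much richer notions of authenticity than a byte-level digest.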
Accomplishments to Date
NARA's initiative to develop the Electronic Records Archives system is clearly leading edge. There is no system in the world today which provides the functionality, scalability, evolvability and extensibility required to achieve NARA's mission. To specify our needs, we spent several years eliciting and validating requirements. The process involved stakeholders from all organizations within NARA, repeated agency-wide review, and detailed evaluation by management, and it culminated in inviting both the information technology industry and the general public to comment on the requirements. We also organized two conferences, one for prospective users and another for industry, to discuss our plans and get their feedback. Several hundred people attended the conferences. Then, NARA turned to the information technology industry for the actual development. In 2004-2005, we ran a design competition between Harris Corporation and Lockheed Martin Corporation. Each developed an architecture and design for the overall system, and a system engineering plan for developing it. In September of 2005, we selected Lockheed Martin Corporation to build and operate the system. Their architecture is a service-oriented architecture based on the Open Archival Information System reference model.
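For readers unfamiliar with the Open Archival Information System (OAIS) reference model mentioned above, the sketch below shows its three kinds of information package: a submission package received from a producer, an archival package held by the archive, and a dissemination package delivered to a consumer. This is only a toy illustration of the model's vocabulary and not the Lockheed Martin design, which this article does not describe. The class names, field names, and the SHA-256 fixity check are assumptions made for the example.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class SubmissionPackage:
    """OAIS 'SIP': what a producer sends to the archive."""
    producer: str
    content: bytes
    description: str


@dataclass(frozen=True)
class ArchivalPackage:
    """OAIS 'AIP': what the archive preserves."""
    identifier: str
    content: bytes
    metadata: Dict[str, str]


@dataclass(frozen=True)
class DisseminationPackage:
    """OAIS 'DIP': what a consumer receives."""
    identifier: str
    content: bytes
    description: str


def ingest(sip: SubmissionPackage, identifier: str) -> ArchivalPackage:
    # Record provenance and a digest alongside the unmodified content.
    metadata = {
        "producer": sip.producer,
        "description": sip.description,
        "sha256": hashlib.sha256(sip.content).hexdigest(),
    }
    return ArchivalPackage(identifier, sip.content, metadata)


def access(aip: ArchivalPackage) -> DisseminationPackage:
    # Refuse to deliver content that no longer matches its ingest digest.
    if hashlib.sha256(aip.content).hexdigest() != aip.metadata["sha256"]:
        raise ValueError("fixity check failed")
    return DisseminationPackage(
        aip.identifier, aip.content, aip.metadata["description"]
    )


sip = SubmissionPackage("Example Agency", b"record body", "sample record")
aip = ingest(sip, "rec-0001")
dip = access(aip)
```

Keeping the three package types separate reflects the point that what an agency submits, what the archive stores, and what a user receives may differ in packaging while the record content itself stays unchanged.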
Given the scope and complexity of the requirements, we articulated a system development lifecycle in which the system will be developed incrementally over six years. Full capability should be achieved in 2011. The first increment of the system is scheduled for deployment this September. It will provide for online creation and submission of records schedules and documents required for transferring records to NARA. Then, about six months later, we will add the ability to transfer electronic records to the National Archives.
We are limiting the initial access to the system to our staff and to four other agencies in the Navy, the Department of Energy, the Bureau of Labor Statistics, and the Patent and Trademark Office. Gradually we will add more users. We hope to start offering public access in the third incremental development, which will start in 2009.
The ERA system will be transformational for NARA. There is currently very little automated support for core mission activities. ERA will change that radically. It will change the way individuals and teams work, and it will change the way we interact with other agencies, other institutions, and the public. It will also enable us to do something that neither NARA nor anyone else can currently do: guarantee the preservation of, and sustained access to, vast quantities of highly diverse and very valuable electronic records, ensuring that the people can discover, use, and learn from this rich documentary heritage.
About the author
Kenneth Thibodeau is Director of the Electronic Records Archives Program at the National Archives and Records Administration. An internationally recognized expert on electronic records and archives, he has over thirty years' experience in the field.
1. An earlier version of this paper was presented at the 2007 WebWise conference on 1 March 2007.
For further information about ERA see http://www.archives.gov/era.
This work is licensed under a Creative Commons Public Domain License.
The Electronic Records Archives Program at the National Archives and Records Administration by Kenneth Thibodeau
First Monday, volume 12, number 7 (July 2007),