Digital objects are now pervasive in daily work and life, and many of them need to remain accessible and usable over a relatively long period of time. Digital preservation has therefore emerged as a pressing demand for archives, libraries, and publishers, and even for ordinary computer users. Compared with traditional paper–based and magnetic preservation, however, digital preservation poses novel challenges to these communities. In this paper, we briefly introduce how these challenges are addressed in the PROTAGE system, which integrates the widely adopted software agent and Web service technologies.
In the digital era, digital objects have gradually become the primary means by which we create, disseminate, and exchange information (Farquhar and Hockx–Yu, 2007; Giaretta, 2006). Consequently, digitized information has been rapidly changing the ways we work, live, and play. Since the volume of digital information is growing explosively, there is a pressing demand to transfer digital objects from various IT systems to digital repositories, libraries, and archives for long–term preservation. Long–term archiving of digital objects is, however, a highly complicated task because of the rapid changes and ongoing development in hardware and software systems as well as in the ICT infrastructure. Furthermore, the diversity in the size and complexity of digital objects implies that modern digital preservation systems must be highly scalable and adaptable to various types of digital objects. Existing digital preservation strategies, in contrast, are labour intensive and often require specialist skills. Contemporary digital preservation therefore calls for new levels of efficiency and automation. For this reason, long–term digital preservation has been attracting significant efforts from both academia and industry (e.g., Chapman, 2004; Cohen, et al., 2006; Farquhar and Hockx–Yu, 2007; Giaretta, 2006; Joannidis, 2005; Snow, et al., 2008; Watry, 2007). A number of research and development projects have been initiated to investigate and implement long–term preservation of digital objects, such as PANIC (Hunter and Choudhury, 2006), CAMiLEON (Mellor, et al., 2002), RODA (Ramalho, et al., 2008), and CRiB (Ferreira, et al., 2007). The European Union has also funded several projects since its Fifth Framework Research Programme, such as PLANETS (Brown, 2007), CASPAR (Cohen, et al., 2006), and SHAMAN (Watry, 2007).
Web services are a type of Web application that can be published and used via the Internet. For example, through a Web service called “Getting weather forecast information”, Internet users can obtain the weather forecast for the coming day(s). Web services are usually described with the Web Services Description Language (WSDL), invoked through the Simple Object Access Protocol (SOAP), and deployed in service–oriented architectures (SOA); they allow data and applications to interact with each other through dynamic and ad hoc connections without human intervention. It has been demonstrated that Web services technology can be applied to a broad variety of architectures. In particular, it can coexist with other technologies and software design approaches. More importantly, Web services technology can be adopted in an evolutionary manner without requiring major transformations to legacy applications and databases. Many applications that were traditionally used only locally have already been made available on the Internet as Web services.
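To make the SOAP message exchange concrete, the following minimal Python sketch builds and reads back a SOAP 1.1 request for the weather forecast example. The operation name `GetForecast`, its `City` and `Days` parameters, and the `http://example.org/weather` namespace are assumptions made purely for illustration; only the SOAP 1.1 envelope namespace is standard.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
SERVICE_NS = "http://example.org/weather"  # hypothetical service namespace


def build_forecast_request(city: str, days: int) -> bytes:
    """Build a SOAP 1.1 request for a hypothetical GetForecast operation."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    request = ET.SubElement(body, f"{{{SERVICE_NS}}}GetForecast")
    ET.SubElement(request, f"{{{SERVICE_NS}}}City").text = city
    ET.SubElement(request, f"{{{SERVICE_NS}}}Days").text = str(days)
    return ET.tostring(envelope, encoding="utf-8")


def parse_forecast_request(payload: bytes) -> dict:
    """Read the parameters back from the envelope, as a service endpoint would."""
    root = ET.fromstring(payload)
    request = root.find(f"{{{SOAP_NS}}}Body/{{{SERVICE_NS}}}GetForecast")
    return {
        "city": request.findtext(f"{{{SERVICE_NS}}}City"),
        "days": int(request.findtext(f"{{{SERVICE_NS}}}Days")),
    }
```

In a real deployment, the message layout would be derived from the service's WSDL description rather than written by hand.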
Agent–oriented computing has been regarded as a promising paradigm for developing and implementing complex, distributed software systems, because software agents and multi–agent systems enable software engineers to model applications in a natural way that resembles how humans perceive the problem domains (Braubach, et al., 2003; Chmiel, et al., 2005; Jennings, 2001). Software agent technology has been successfully applied in many industrial and commercial areas (Luck, et al., 2005). It has also been used successfully to study complex physical and social problems. For example, multi–agent systems have been adopted to investigate the impact of climate change on biological populations and the influence of public policy options on social or economic behaviour.
PROTAGE (PReservation Organization using Tools in AGent Environments), funded by the European FP7 Research Programme, aims to employ Web services and software agent technology to automate long–term digital preservation. It intends to raise the level of efficiency and automation in digital preservation so that both memory institutions and ordinary users can readily preserve their digital objects at a reduced cost. PROTAGE develops flexible and extensible Web services and software agent tools for long–term digital preservation and access, which can cooperate with, and be integrated into, existing or new digital preservation systems. In this paper we briefly introduce PROTAGE’s methodology for long–term digital preservation. More specifically, we present the general framework of the PROTAGE system as well as its three types of key subsystems, namely, client systems, access points, and server systems. Next, we describe the three main stages of the typical digital preservation process in the PROTAGE system. Note that, due to space limitations, this paper does not elaborate on technical details.
The PROTAGE system
The PROTAGE system is essentially an Internet–based, decentralized system. Figure 1 presents its general framework, where three types of subsystems are involved, namely, client systems, access points, and server systems.
Users access the services of the PROTAGE system via client systems. PROTAGE server systems provide publicly available Web services to client systems and also act as the archives, while access points are deployed to manage digital preservation actions and action plans. Here, an action is a function that carries out a specific digital preservation task, such as extracting the metadata of a digital object, scanning for and removing viruses, or migrating a file from one format to another. An action may be developed locally or obtained by wrapping existing Web services. An action plan is essentially a sequence of actions that fulfils a complete digital preservation process. The PROTAGE system is organized in a hybrid architecture. Specifically, client systems are connected to both access points and server systems through client/server structures, whereas the client systems themselves are organized in a peer–to–peer structure.
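The relationship between actions and action plans can be sketched in a few lines of Python. This is only an illustration of the definitions above, not PROTAGE's actual data model (the system itself exposes Java APIs): the class names, the dictionary representation of a digital object, and the two sample actions are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A digital object is modelled as a plain dict; this is an illustrative choice.
DigitalObject = Dict[str, object]


@dataclass
class Action:
    """One preservation task, e.g. metadata extraction or format migration."""
    name: str
    run: Callable[[DigitalObject], DigitalObject]


@dataclass
class ActionPlan:
    """An ordered sequence of actions forming a complete preservation process."""
    name: str
    actions: List[Action] = field(default_factory=list)

    def execute(self, obj: DigitalObject) -> DigitalObject:
        # Each action receives the output of the previous one.
        for action in self.actions:
            obj = action.run(obj)
        return obj


def extract_metadata(obj: DigitalObject) -> DigitalObject:
    """Hypothetical action: record the size of the object's content."""
    return {**obj, "metadata": {"size": len(obj["content"])}}


def migrate_format(obj: DigitalObject) -> DigitalObject:
    """Hypothetical action: a placeholder for a real format migration."""
    return {**obj, "format": "pdf"}
```

An action obtained by wrapping an existing Web service would fit the same shape: its `run` callable would send the object to the remote service and return the response.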
In Figure 1, the solid, dash–dotted, and dashed lines with arrows represent the communications between clients and access points, between clients, and between clients and servers, respectively. PROTAGE client systems communicate with access points to submit or retrieve certified actions and action plans. Client systems communicate with server systems to access the Web services hosted there and to upload digital objects to, or download them from, those servers. Finally, as an important feature of the PROTAGE system, the client systems are organized in a peer–to–peer structure. Through peer–to–peer communications, a client system can request actions or action plans from other client systems and can provide its own actions and action plans to them.
Figure 1: The general framework of the PROTAGE system.
PROTAGE client systems
A PROTAGE client system usually contains six main components. In what follows, we briefly introduce the functionality of each component:
- Client GUI. Through this GUI, a user can create an account and further interact with the PROTAGE system. Specifically, the user can manage his/her profile, actions, and action plans, call external third–party digital preservation related tools, and communicate with other users, access points, and server systems.
- Local repository. This component is a data warehouse developed with MySQL. It contains the profiles of users who have logged into the PROTAGE system via the present client, together with the actions and action plans. The local repository also contains other related databases.
- External tools interface. The client system uses this interface to call existing third–party tools for certain digital preservation tasks. For example, through the interface, the client system can call ExifTool to identify the format of a digital object and obtain its metadata.
- Kernel. This component provides a set of Java APIs for accessing the local repository and remote access point(s). It can also be used to interact with external tools. Through the kernel, a user can also communicate with other existing PROTAGE users. The communications can be carried out in either an explicit way (i.e., sending e–mail messages) or an implicit way (i.e., through the built–in multi–agent system). Besides, the kernel has a set of APIs for the client system to access the Web services provided by PROTAGE server systems.
- Multi–agent system. In the client system, software agents are mainly responsible for automatically recommending suitable actions or action plans to PROTAGE users. For this purpose, they search their local repository or existing access points for relevant actions or action plans. More importantly, through agent–based communications, agents may also request actions or action plans from their peers in other client systems.
- Workflow engine. This component executes the action plans that are recommended by the multi–agent system and selected by users. Execution can proceed in two modes, automatic and interactive. In the automatic mode, users only need to select the collection of digital records, and the workflow engine is fully responsible for preserving it. In the interactive mode, users interact with the PROTAGE client system: at each step, they decide which preservation action to perform next, and how, based on the result and feedback obtained from the previous step.
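The search performed by the agents can be pictured with the following Python sketch. It assumes, purely for illustration, that every source is a simple mapping from a file format to the name of an action plan, and that sources are consulted in the order local repository, access points, peers; the text above does not prescribe this order, and the function and plan names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Each source maps a file format to the name of an action plan that handles it.
PlanCatalogue = Dict[str, str]


def recommend_plan(
    file_format: str,
    local: PlanCatalogue,
    access_points: List[PlanCatalogue],
    peers: List[PlanCatalogue],
) -> Optional[Tuple[str, str]]:
    """Return (source label, plan name) for the first matching plan, else None.

    The lookup order (local, then access points, then peers) is an assumption.
    """
    sources = [
        ("local repository", [local]),
        ("access point", access_points),
        ("peer", peers),
    ]
    for label, catalogues in sources:
        for catalogue in catalogues:
            if file_format in catalogue:
                return (label, catalogue[file_format])
    return None
```

A real agent would exchange messages with its peers rather than read their catalogues directly; the sketch only captures the fallback from local to remote sources.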
Note that in PROTAGE the multi–agent system and the workflow engine are the major components for automating the digital preservation.
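The two execution modes of the workflow engine can be illustrated with a short Python sketch. It is a simplification under stated assumptions: a step is modelled as a function on a string, and the interactive mode is reduced to a callback that only decides whether to continue, whereas the engine described above also lets users choose what the next step does. All names are hypothetical.

```python
from typing import Callable, List, Optional

Step = Callable[[str], str]
# Called after each step with (step index, result so far); returns True to continue.
Decide = Callable[[int, str], bool]


def run_plan(steps: List[Step], record: str, decide: Optional[Decide] = None) -> str:
    """Run the steps of an action plan in order.

    With decide=None the plan runs to completion (automatic mode).
    Otherwise the callback is consulted after every step (interactive mode)
    and execution stops as soon as it returns False.
    """
    result = record
    for index, step in enumerate(steps):
        result = step(result)
        if decide is not None and not decide(index, result):
            break
    return result
```

In a graphical client, the `decide` callback would typically be backed by a dialog that shows the intermediate result and waits for the user's choice.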
PROTAGE access points
Access points are primarily responsible for two tasks: (1) managing the detailed profiles of existing users and the certified actions and action plans; and (2) providing Web services for accessing user profiles, actions, and action plans. The PROTAGE system may contain multiple access points. An access point contains two main components, namely, a remote repository and Web services.
- Remote repository. This repository is a MySQL–based data warehouse. It stores users’ profiles as well as actions and action plans, which differ from those stored in client systems in that they are certified. Here, “certified” means that they have been verified by a public memory institution to ensure that they comply with the requirements of digital preservation. Through its kernel, a client system may submit certified actions or action plans to access points and retrieve those in which it is interested. Because profiles are saved in an access point, users are not confined to their present client machines and can access the PROTAGE system from different machines.
- Web services. These Web services are provided specifically for client systems to access the remote repository, that is, to add, remove, update, or query user profiles, actions, and action plans. In the PROTAGE system, Web services are implemented with Axis2 and deployed on Tomcat 6.0.
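The four repository operations named above (add, remove, update, and query) can be sketched as follows. The example uses Python's built-in `sqlite3` module as a stand-in for the MySQL repository so that it runs without a database server; the `actions` table layout and the class and method names are assumptions, not PROTAGE's actual schema or API.

```python
import sqlite3
from typing import List


class ActionRepository:
    """Stores actions; an in-memory SQLite database stands in for MySQL."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        # Hypothetical table layout, chosen for illustration only.
        self.conn.execute(
            "CREATE TABLE actions ("
            "name TEXT PRIMARY KEY, description TEXT, certified INTEGER)"
        )

    def add(self, name: str, description: str, certified: bool = True) -> None:
        self.conn.execute(
            "INSERT INTO actions VALUES (?, ?, ?)",
            (name, description, int(certified)),
        )

    def update(self, name: str, description: str) -> None:
        self.conn.execute(
            "UPDATE actions SET description = ? WHERE name = ?",
            (description, name),
        )

    def remove(self, name: str) -> None:
        self.conn.execute("DELETE FROM actions WHERE name = ?", (name,))

    def query(self, keyword: str) -> List[str]:
        """Return the names of actions whose description contains the keyword."""
        rows = self.conn.execute(
            "SELECT name FROM actions WHERE description LIKE ? ORDER BY name",
            (f"%{keyword}%",),
        )
        return [row[0] for row in rows]
```

In an access point, each of these methods would sit behind a Web service operation, so that client systems never touch the database directly.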
PROTAGE server systems
In PROTAGE, server systems are designed with two main functions. First, a server system provides Web services which, corresponding to actions, carry out certain digital preservation tasks. For instance, the metadata extraction Web service helps users obtain the metadata of digital objects. Second, a server system acts as the archive for storing preserved digital objects. A server system comprises the following components:
- External tools. These are a set of external third–party tools used to fulfil certain digital preservation tasks. In a server system, the external tools may be updated frequently so as to make sure the latest tools are used.
- Web services. Unlike the specialized ones provided by access points for accessing the remote PROTAGE repository, these Web services let PROTAGE users use third–party digital preservation tools that may not be available on client systems. These Web services may therefore also be publicly available to ordinary users who do not use the PROTAGE system.
- FTP server service. To use some Web services, users must first upload their digital objects to the server and then download any newly generated files back to their client systems. For this reason, the PROTAGE server system also has to act as an FTP server. Note that objects uploaded in this way are removed once the service has completed.
- Storage space. The storage space on PROTAGE server systems is used to (1) store, as the archives, digital objects that need to be permanently preserved; and, (2) temporarily store digital objects for using Web services only.
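The temporary storage cycle described for the FTP server service (upload, process, download, remove) can be sketched in Python as follows. The sketch replaces the FTP transfer with local file operations in a temporary directory so that it is self–contained; the function name and the shape of the `service` callable are assumptions.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_service_on_upload(
    data: bytes, filename: str, service: Callable[[Path], Path]
) -> bytes:
    """Store an upload temporarily, run a service on it, and return the result.

    The temporary directory, including the uploaded object and any generated
    file, is deleted when the function returns.
    """
    with tempfile.TemporaryDirectory(prefix="protage-upload-") as workdir:
        uploaded = Path(workdir) / filename
        uploaded.write_bytes(data)      # stands in for the FTP upload
        produced = service(uploaded)    # the service writes a new file
        return produced.read_bytes()    # stands in for the FTP download
```

Objects that are to be permanently preserved would instead be written to the archival part of the storage space, which this sketch does not model.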
Development and status of PROTAGE
The PROTAGE project adopts an iterative and incremental process to implement the targeted system. The process begins with the identification and analysis of user needs and functional requirements, followed by technical specification, implementation, and system testing. In particular, based on the analysis of user needs, a number of typical digital preservation scenarios have been identified, covering the overall process of long–term digital preservation. The third prototype of the PROTAGE system was released in June 2010 and is available at the Web site of the PROTAGE project. Figure 2 presents a screenshot of the third prototype, where an action plan is being executed.
Figure 2: A screenshot of the third PROTAGE prototype, where an action plan is being executed.
As digital objects become the primary way in which we create, disseminate, and exchange information, we have entered an era of digital information explosion. Many organizations and individuals have difficulty keeping pace with the preservation demands of digital information, because present digital preservation solutions are usually labour intensive and often require expertise. Next–generation preservation solutions for digital information should therefore offer a high level of automation and self–reliance. In this paper, we have introduced how PROTAGE, an EU–funded project, attempts to achieve these objectives based on the recently emerged, but very promising, Web services and software agent technologies. We have introduced the methodology of the PROTAGE project for long–term digital preservation. More specifically, we have presented the general framework of the PROTAGE system as well as its three types of key subsystems, namely, client systems, access points, and server systems. Finally, we have described the implementation and status of the PROTAGE system.
About the authors
Xiaolong Jin obtained the Ph.D. degree in computer science from Hong Kong Baptist University in 2005. He obtained the M.Eng. degree, also in computer science, from the Chinese Academy of Sciences in 2001 and the B.Sc. degree in applied mathematics from the Beijing University of Aeronautics and Astronautics in 1998. His current research interests include artificial intelligence, multi–agent systems, distributed problem solving, performance modelling and evaluation, communication networks, resource allocation and optimization, peer–to–peer computing, grid computing, and autonomy oriented computing.
E–mail: x [dot] jin [at] bradford [dot] ac [dot] uk
Jianmin Jiang received the B.Sc. degree from the Shandong Mining Institute, China, in 1982, the M.Sc. degree from the China University of Mining and Technology in 1984, and the Ph.D. degree from the University of Nottingham, Nottingham, U.K., in 1994. His research interests include visual information retrieval, image/video processing, visual content management, Internet video coding, stereo image coding, and neural network applications. He has published more than 150 refereed research papers. Dr. Jiang is a Fellow of the Institution of Engineering and Technology (IET) and the RSA U.K.
E–mail: j [dot] jiang1 [at] bradford [dot] ac [dot] uk
Geyong Min is a Reader in the Department of Computing at the University of Bradford, U.K. He received the Ph.D. degree in computing science from the University of Glasgow, U.K., in 2003, and the B.Sc. degree in computer science from the Huazhong University of Science and Technology, China, in 1995. His research interests include performance modelling and evaluation, computer networking, traffic engineering, mobile computing, and multimedia systems.
E–mail: g [dot] min [at] bradford [dot] ac [dot] uk
L. Braubach, W. Lamersdorf, and A. Pokahr, 2003. “Jadex: Implementing a BDI-infrastructure for JADE agents,” EXP — in search of innovation, volume 3, number 3, pp. 76–85.
A. Brown, 2007. “Developing practical approaches to active preservation,” International Journal of Digital Curation, volume 1, number 2, pp. 3–11. http://dx.doi.org/10.2218/ijdc.v2i1.10
S. Chapman, 2004. “Counting the costs of digital preservation: Is repository storage affordable?” Journal of Digital Information, volume 4, number 2, at http://journals.tdl.org/jodi/article/viewArticle/100, accessed 7 October 2010.
K. Chmiel, M. Gawinecki, P. Kaczmarek, M. Szymczak, and M. Paprzycki, 2005. “Efficiency of JADE agent platform,” Scientific Programming, volume 13, number 2, pp. 159–172.
S. Cohen, D. Naor, L. Ramati, and P. Reshef, 2006. “Towards OAIS–based preservation aware storage — A white paper,” Technical report, IBM Haifa Research Labs.
A. Farquhar and H. Hockx–Yu, 2007. “Planets: Integrated services for digital preservation,” International Journal of Digital Curation, volume 2, number 2, pp. 88–99. http://dx.doi.org/10.2218/ijdc.v2i2.31
M. Ferreira, A. Baptista, and J. Ramalho, 2007. “An intelligent decision support system for digital preservation,” International Journal on Digital Libraries, volume 6, number 4, pp. 295–304. http://dx.doi.org/10.1007/s00799-007-0013-x
D. Giaretta, 2006. “CASPAR and a European infrastructure for digital preservation,” ERCIM News, number 66, pp. 47–49.
J. Hunter and S. Choudhury, 2006. “PANIC: An integrated approach to the preservation of composite digital objects using semantic Web services,” International Journal on Digital Libraries, volume 6, number 2, pp. 174–183. http://dx.doi.org/10.1007/s00799-005-0134-z
N. Jennings, 2001. “An agent–based approach for building complex software systems,” Communications of the ACM, volume 44, number 4, pp. 35–41. http://dx.doi.org/10.1145/367211.367250
Y. Joannidis, 2005. “Digital libraries from the perspective of the DELOS network of excellence,” Proceedings of the 2005 IEEE International Symposium on Mass Storage Systems and Technology, pp. 51–55.
M. Luck, P. McBurney, O. Shehory, S. Willmott and the AgentLink community, 2005. “Agent technology: Computing as interaction (A roadmap for agent based computing),” at http://www.agentlink.org/roadmap/al3rm.pdf, accessed 7 October 2010.
P. Mellor, P. Wheatley, and D. Sergeant, 2002. “Migration on request: A practical technique for digital preservation,” Lecture Notes in Computer Science, volume 2458, pp. 516–526. http://dx.doi.org/10.1007/3-540-45747-X_38
J. Ramalho, M. Ferreira, L. Faria, R. Castro, F. Barbedo, and L. Corujo, 2008. “RODA and CRiB: A service–oriented digital repository,” Proceedings of the Fifth International Conference on Preservation of Digital Objects (iPRES 2008), at http://www.bl.uk/ipres2008/presentations_day2/37_Ramalho.pdf, accessed 7 October 2010.
K. Snow, B. Ballaux, B. Christensen–Dalsgaard, H. Hofman, J. Hansen, P. Innocenti, M. Nielsen, S. Ross, and J. Thogersen, 2008. “Considering the user perspective: Research into usage and communication of digital information,” D–Lib Magazine, volume 14, numbers 5/6, at http://www.dlib.org/dlib/may08/ross/05ross.html, accessed 7 October 2010.
P. Watry, 2007. “Digital preservation theory and application: Transcontinental persistent archives testbed activity,” International Journal of Digital Curation, volume 2, number 2, pp. 41–68. http://dx.doi.org/10.2218/ijdc.v2i2.28
Received 10 July 2010; accepted 28 September 2010.
Copyright © 2010, First Monday.
Copyright © 2010, Xiaolong Jin, Jianmin Jiang, and Geyong Min.
A software agent and Web service based system for digital preservation
by Xiaolong Jin, Jianmin Jiang, and Geyong Min.
First Monday, Volume 15, Number 10 - 4 October 2010