First Monday

A software agent and Web service based system for digital preservation
by Xiaolong Jin, Jianmin Jiang, and Geyong Min

Digital objects are now pervasive in daily work and life, and many of them need to remain accessible and usable for a long period of time. Digital preservation has therefore become a pressing demand for archives, libraries, and publishers, and even for ordinary computer users. However, compared to traditional paper–based and magnetic preservation, digital preservation poses novel challenges to these communities. In this paper, we briefly introduce how these challenges are addressed in the PROTAGE system, which integrates the widely adopted software agent and Web service technologies.





Introduction

In the digital era, digital objects have gradually emerged as the primary means by which we create, disseminate, and exchange information (Farquhar and Hockx–Yu, 2007; Giaretta, 2006). Consequently, digitized information has been rapidly changing the ways we work, live, and play. Since the volume of digital information is growing at an explosive speed, there is a pressing demand to transfer digital objects from various IT systems to digital repositories, libraries, and archives for long–term preservation. However, long–term archiving of digital objects is a highly complicated task because of the rapid changes and ongoing development in hardware and software systems as well as in the ICT infrastructure. Furthermore, the diversity in the size and complexity of digital objects implies that modern digital preservation systems must be highly scalable and adaptable to various types of digital objects. Existing strategies of digital preservation, in contrast, are labour intensive and often require specialist skills, so contemporary digital preservation calls for new levels of efficiency and automation. For this reason, long–term digital preservation has been attracting significant efforts from both academia and industry (e.g., Chapman, 2004; Cohen, et al., 2006; Farquhar and Hockx–Yu, 2007; Giaretta, 2006; Joannidis, 2005; Snow, et al., 2008; Watry, 2007). A number of research and development projects have been initiated to investigate and implement long–term preservation of digital objects, such as PANIC (Hunter and Choudhury, 2006), CAMiLEON (Mellor, et al., 2002), RODA (Ramalho, et al., 2008), and CRiB (Ferreira, et al., 2007). The European Union has also funded several projects since its Fifth Framework Research Programme, such as PLANETS (Brown, 2007), CASPAR (Cohen, et al., 2006), and SHAMAN (Watry, 2007).

Web services are Web applications that can be published and invoked via the Internet. For example, through a Web service called “Getting weather forecast information”, Internet users can obtain the weather forecast for the coming day(s). Web services are usually described with the Web Services Description Language (WSDL), invoked through the Simple Object Access Protocol (SOAP), and deployed in Service Oriented Architectures (SOA); they allow data and applications to interact with each other without human intervention through dynamic and ad hoc connections. It has been demonstrated that Web services technology can be applied in a broad variety of architectures. In particular, it can coexist with other technologies and software design approaches. More importantly, it can be adopted in an evolutionary manner without requiring major transformations of legacy applications and databases. So far, many applications that were traditionally used locally have been made available on the Internet as Web services.
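To make the weather forecast example concrete, the following minimal sketch builds the kind of SOAP 1.1 request message that a client might send to such a service. The operation name GetForecast, its City and Days parameters, and the example.org namespace are hypothetical and chosen only for illustration; a real service would define them in its WSDL description.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
WEATHER_NS = "http://example.org/weather"  # hypothetical service namespace (assumption)


def build_forecast_request(city: str, days: int) -> str:
    """Return a SOAP 1.1 request envelope for a hypothetical GetForecast operation."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # The operation element and its parameters live in the service's own namespace.
    request = ET.SubElement(body, f"{{{WEATHER_NS}}}GetForecast")
    ET.SubElement(request, f"{{{WEATHER_NS}}}City").text = city
    ET.SubElement(request, f"{{{WEATHER_NS}}}Days").text = str(days)
    return ET.tostring(envelope, encoding="unicode")


message = build_forecast_request("Bradford", 2)
print(message)
```

The message would normally be sent to the service endpoint in an HTTP POST request; that step is omitted here because it depends on the concrete service.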

Agent–oriented computing has been regarded as a promising paradigm for developing and implementing complex, distributed software systems, because this paradigm, based on software agents and multi–agent systems, enables software engineers to model applications in a natural way that resembles how humans perceive the problem domains (Braubach, et al., 2003; Chmiel, et al., 2005; Jennings, 2001). Software agent technology has been successfully applied in many industrial and commercial areas (Luck, et al., 2005). It has also been used with great success to study complex physical and social problems. For example, multi–agent systems have been adopted to investigate the impact of climate change on biological populations and the influence of public policy options on social or economic behaviour.

PROTAGE (PReservation Organization using Tools in AGent Environments), funded by the European FP7 Research Programme, aims to employ Web services and software agent technology to automate long–term digital preservation. It intends to advance the level of efficiency and automation in digital preservation so that both memory institutions and ordinary users can readily preserve their digital objects at a reduced cost. PROTAGE develops flexible and extensible Web services and software agent tools for long–term digital preservation and access, which can cooperate with and be integrated into existing or new digital preservation systems. In this paper we briefly introduce PROTAGE’s methodology for long–term digital preservation. More specifically, we present the general framework of the PROTAGE system as well as its three types of key subsystems, namely, client systems, access points, and server systems. Next, we describe the three main stages of the typical digital preservation process in the PROTAGE system. Note that, owing to space limitations, this paper does not elaborate on technical details.



The PROTAGE system

The PROTAGE system is essentially an Internet–based, decentralized system. Figure 1 presents its general framework, where three types of subsystems are involved, namely, client systems, access points, and server systems.

Users access the services of the PROTAGE system via client systems. PROTAGE server systems provide publicly available Web services to client systems and also act as the archives, while access points are deployed to manage digital preservation actions and action plans. Here, an action is a function that carries out a specific digital preservation task, such as extracting the metadata of a digital object, scanning for and removing viruses, or migrating a file from one format to another. An action may be developed locally or obtained by wrapping existing Web services. An action plan is a sequence of actions that together fulfil a complete digital preservation process. The PROTAGE system is organized in a hybrid architecture: client systems connect to access points, and to server systems, through client/server structures, whereas the client systems themselves are organized in a peer–to–peer structure.
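The notions of action and action plan can be illustrated with a minimal sketch. It is not the PROTAGE implementation: the dictionary representation of a digital object and the function names extract_metadata, migrate_txt_to_utf8, and run_plan are assumptions made for illustration only.

```python
import hashlib
from typing import Callable, Dict, List

# An action takes a record describing a digital object and returns an updated record.
Action = Callable[[Dict], Dict]


def extract_metadata(obj: Dict) -> Dict:
    """Record the size and SHA-256 checksum of the object's content."""
    content = obj["content"]
    metadata = {"size": len(content), "sha256": hashlib.sha256(content).hexdigest()}
    return {**obj, "metadata": metadata}


def migrate_txt_to_utf8(obj: Dict) -> Dict:
    """Toy format migration: re-encode Latin-1 text as UTF-8."""
    text = obj["content"].decode("latin-1")
    return {**obj, "content": text.encode("utf-8"), "format": "text/plain; charset=utf-8"}


def run_plan(plan: List[Action], obj: Dict) -> Dict:
    """Apply each action of the plan in order, feeding each result to the next action."""
    for action in plan:
        obj = action(obj)
    return obj


# An action plan is simply an ordered list of actions.
plan = [migrate_txt_to_utf8, extract_metadata]
result = run_plan(plan, {"name": "letter.txt", "content": "caf\u00e9".encode("latin-1")})
print(result["format"], result["metadata"]["size"])
```

Because every action shares the same signature, a locally developed function and a wrapper around a remote Web service can be mixed freely within one plan.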

In Figure 1, the solid, dash–dotted, and dashed lines with arrows represent the communications between clients and access points, between clients, and between clients and servers, respectively. PROTAGE client systems communicate with access points to submit or retrieve certified actions and action plans. They communicate with server systems to access the Web services hosted there and to upload digital objects to, or download them from, the servers. Finally, as an important feature of the PROTAGE system, client systems communicate with one another on a peer–to–peer basis: a client system can ask other client systems for actions or action plans, and it may also provide its own actions and action plans to them.
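The following sketch illustrates one way in which a client might locate an action through these communication channels. The lookup order (local actions first, then peers, then the access point) is an assumption made for illustration and is not specified by PROTAGE; the class and method names are likewise hypothetical, and network communication is replaced by direct method calls.

```python
from typing import Callable, Dict, List, Optional

Action = Callable[[bytes], bytes]


class AccessPoint:
    """Holds certified actions; stands in for the access point's Web services."""

    def __init__(self, certified: Dict[str, Action]):
        self._certified = dict(certified)

    def retrieve(self, name: str) -> Optional[Action]:
        return self._certified.get(name)


class Client:
    def __init__(self, name: str, access_point: AccessPoint):
        self.name = name
        self.access_point = access_point
        self.local_actions: Dict[str, Action] = {}
        self.peers: List["Client"] = []

    def offer(self, name: str) -> Optional[Action]:
        """Answer a peer's request using only this client's local actions."""
        return self.local_actions.get(name)

    def find_action(self, name: str) -> Optional[Action]:
        """Look locally, then ask peers, then fall back to the access point."""
        if name in self.local_actions:
            return self.local_actions[name]
        for peer in self.peers:
            action = peer.offer(name)
            if action is not None:
                return action
        return self.access_point.retrieve(name)


ap = AccessPoint({"strip_whitespace": bytes.strip})
alice = Client("alice", ap)
bob = Client("bob", ap)
bob.local_actions["uppercase"] = bytes.upper
alice.peers.append(bob)

print(alice.find_action("uppercase")(b"report"))          # obtained from peer bob
print(alice.find_action("strip_whitespace")(b"  text  "))  # obtained from the access point
```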


Figure 1: The general framework of the PROTAGE system.


PROTAGE client systems

A PROTAGE client system usually contains six main components. In what follows, we briefly introduce the functionality of each component:

Note that in PROTAGE the multi–agent system and the workflow engine are the major components for automating the digital preservation.

Access points

Access points are primarily responsible for two tasks: (1) managing the detailed profiles of existing users together with the certified actions and action plans; and, (2) providing Web services for accessing user profiles, actions, and action plans. The PROTAGE system may contain multiple access points. An access point contains two main components, namely, a remote repository and Web services.
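The two components of an access point can be sketched as follows. This is an illustrative model only: the class names, the method names, and the representation of an action plan as a list of action names are assumptions, the repository is held in memory, and the certification of actions and action plans is not modelled.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RemoteRepository:
    """In-memory stand-in for the access point's remote repository."""
    user_profiles: Dict[str, Dict] = field(default_factory=dict)
    action_plans: Dict[str, List[str]] = field(default_factory=dict)  # plan name -> action names


class AccessPointService:
    """Methods that a real access point would expose as Web service operations."""

    def __init__(self, repository: RemoteRepository):
        self._repo = repository

    def register_user(self, user_id: str, profile: Dict) -> None:
        self._repo.user_profiles[user_id] = profile

    def get_profile(self, user_id: str) -> Dict:
        return self._repo.user_profiles[user_id]

    def submit_plan(self, name: str, actions: List[str]) -> None:
        self._repo.action_plans[name] = list(actions)

    def get_plan(self, name: str) -> List[str]:
        # Return a copy so that callers cannot modify the stored plan.
        return list(self._repo.action_plans[name])


service = AccessPointService(RemoteRepository())
service.register_user("u1", {"organisation": "Example Library"})
service.submit_plan("ingest-office-document", ["virus_scan", "extract_metadata", "migrate_to_pdfa"])
print(service.get_plan("ingest-office-document"))
```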

Server systems

In PROTAGE, server systems are designed with two main functions. Firstly, a server system provides Web services which, corresponding to actions, carry out certain digital preservation tasks. For instance, the metadata extraction Web service helps users obtain the metadata of digital objects. Secondly, a server system acts as an archive for storing preserved digital objects.
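The two functions of a server system can be sketched in a few lines. The sketch is an assumption-laden illustration rather than the PROTAGE design: the class ServerSystem and its methods are hypothetical, the archive is an in-memory dictionary, and objects are identified by their SHA-256 digest so that each download can be checked for integrity.

```python
import hashlib
from typing import Dict


class ServerSystem:
    def __init__(self) -> None:
        self._archive: Dict[str, bytes] = {}  # object identifier -> stored content

    def extract_metadata(self, content: bytes) -> Dict:
        """Service-style operation: describe an object without storing it."""
        return {"size": len(content), "sha256": hashlib.sha256(content).hexdigest()}

    def upload(self, content: bytes) -> str:
        """Store the object and return its identifier (here, its SHA-256 digest)."""
        object_id = hashlib.sha256(content).hexdigest()
        self._archive[object_id] = content
        return object_id

    def download(self, object_id: str) -> bytes:
        """Return the stored object after verifying that it still matches its digest."""
        content = self._archive[object_id]
        if hashlib.sha256(content).hexdigest() != object_id:
            raise ValueError("stored object failed its integrity check")
        return content


server = ServerSystem()
info = server.extract_metadata(b"preserve me")
object_id = server.upload(b"preserve me")
print(info["size"], server.download(object_id))
```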



Development and status of PROTAGE

The PROTAGE project adopts an iterative and incremental process to implement the targeted system. The process begins with the identification and analysis of user needs and functional requirements, followed by technical specification, implementation, and system testing. In particular, based on the analysis of user needs, a number of typical digital preservation scenarios have been identified, covering the overall process of long–term digital preservation. The third prototype of the PROTAGE system was released in June 2010 and is available at the Web site of the PROTAGE project [2]. Figure 2 presents a screenshot of the third prototype, where an action plan is being executed.


Figure 2: A screenshot of the third PROTAGE prototype, where an action plan is being executed.




Conclusion

As digital objects emerge as the primary way in which we create, disseminate, and exchange information, we have entered an era of digital information explosion. Many organizations and individuals have difficulty keeping pace with the preservation demands of digital information, because present digital preservation solutions are usually labour intensive and often require expertise. Therefore, next generation preservation solutions for digital information should offer a high level of automation and self–reliance. In this paper, we have introduced how PROTAGE, an EU–funded project, attempts to achieve these objectives based on the recently emerged, but very promising, Web services and software agent technologies. We have outlined the PROTAGE methodology for long–term digital preservation. More specifically, we have presented the general framework of the PROTAGE system as well as its three types of key subsystems, namely, client systems, access points, and server systems. Finally, we have described the implementation and status of the PROTAGE system.


About the authors

Xiaolong Jin obtained the Ph.D. degree in computer science from Hong Kong Baptist University in 2005. He obtained the M.Eng. degree, also in computer science, from the Chinese Academy of Sciences in 2001 and the B.Sc. degree in applied mathematics from the Beijing University of Aeronautics and Astronautics in 1998. His current research interests include artificial intelligence, multi–agent systems, distributed problem solving, performance modelling and evaluation, communication networks, resource allocation and optimization, peer–to–peer computing, grid computing, and autonomy oriented computing.
E–mail: x [dot] jin [at] bradford [dot] ac [dot] uk

Jianmin Jiang received the B.Sc. degree from the Shandong Mining Institute, China, in 1982, the M.Sc. degree from the China University of Mining and Technology in 1984, and the Ph.D. degree from the University of Nottingham, Nottingham, U.K., in 1994. His research interests include visual information retrieval, image/video processing, visual content management, Internet video coding, stereo image coding, and neural network applications. He has published more than 150 refereed research papers. Dr. Jiang is a Fellow of the Institution of Engineering and Technology (IET) and the RSA U.K.
E–mail: j [dot] jiang1 [at] bradford [dot] ac [dot] uk

Geyong Min is a Reader in the Department of Computing at the University of Bradford, U.K. He received the Ph.D. degree in computing science from the University of Glasgow, U.K., in 2003, and the B.Sc. degree in computer science from the Huazhong University of Science and Technology, China, in 1995. His research interests include performance modelling and evaluation, computer networking, traffic engineering, mobile computing, and multimedia systems.
E–mail: g [dot] min [at] bradford [dot] ac [dot] uk






References

L. Braubach, W. Lamersdorf, and A. Pokahr, 2003. “Jadex: Implementing a BDI-infrastructure for JADE agents,” EXP — in search of innovation, volume 3, number 3, pp. 76–85.

A. Brown, 2007. “Developing practical approaches to active preservation,” International Journal of Digital Curation, volume 1, number 2, pp. 3–11.

S. Chapman, 2004. “Counting the costs of digital preservation: Is repository storage affordable?” Journal of Digital Information, volume 4, number 2, at, accessed 7 October 2010.

K. Chmiel, M. Gawinecki, P. Kaczmarek, M. Szymczak, and M. Paprzycki, 2005. “Efficiency of JADE agent platform,” Scientific Programming, volume 13, number 2, pp. 159–172.

S. Cohen, D. Naor, L. Ramati, and P. Reshef, 2006. “Towards OAIS–based preservation aware storage — A white paper,” Technical report, IBM Haifa Research Labs.

A. Farquhar and H. Hockx–Yu, 2007. “Planets: Integrated services for digital preservation,” International Journal of Digital Curation, volume 2, number 2, pp. 88–99.

M. Ferreira, A. Baptista, and J. Ramalho, 2007. “An intelligent decision support system for digital preservation,” International Journal on Digital Libraries, volume 6, number 4, pp. 295–304.

D. Giaretta, 2006. “CASPAR and a European infrastructure for digital preservation,” ERCIM News, number 66, pp. 47–49.

J. Hunter and S. Choudhury, 2006. “PANIC: An integrated approach to the preservation of composite digital objects using semantic Web services,” International Journal on Digital Libraries, volume 6, number 2, pp. 174–183.

N. Jennings, 2001. “An agent–based approach for building complex software systems,” Communications of the ACM, volume 44, number 4, pp. 35–41.

Y. Joannidis, 2005. “Digital libraries from the perspective of the DELOS network of excellence,” Proceedings of the 2005 IEEE International Symposium on Mass Storage Systems and Technology, pp. 51–55.

M. Luck, P. McBurney, O. Shehory, S. Willmott and the AgentLink community, 2005. “Agent technology: Computing as interaction (A roadmap for agent based computing),” at, accessed 7 October 2010.

P. Mellor, P. Wheatley, and D. Sergeant, 2002. “Migration on request: A practical technique for digital preservation,” Lecture Notes in Computer Science, volume 2458, pp. 516–526.

J. Ramalho, M. Ferreira, L. Faria, R. Castro, F. Barbedo, and L. Corujo, 2008. “RODA and CRiB: A service–oriented digital repository,” Proceedings of the Fifth International Conference on Preservation of Digital Objects (iPRES 2008), at, accessed 7 October 2010.

K. Snow, B. Ballaux, B. Christensen–Dalsgaard, H. Hofman, J. Hansen, P. Innocenti, M. Nielsen, S. Ross, and J. Thogersen, 2008. “Considering the user perspective: Research into usage and communication of digital information,” D–Lib Magazine, volume 14, numbers 5/6, at, accessed 7 October 2010.

P. Watry, 2007. “Digital preservation theory and application: Transcontinental persistent archives testbed activity,” International Journal of Digital Curation, volume 2, number 2, pp. 41–68.


Editorial history

Received 10 July 2010; accepted 28 September 2010.

Copyright © 2010, First Monday.
Copyright © 2010, Xiaolong Jin, Jianmin Jiang, and Geyong Min.

A software agent and Web service based system for digital preservation
by Xiaolong Jin, Jianmin Jiang, and Geyong Min.
First Monday, Volume 15, Number 10 - 4 October 2010