PROGRAM: P-23
Title:
LONG-TERM ARCHIVE SYSTEM FOR UNIVERSITY-WIDE RESEARCH DATA PRESERVATION
2 Planning and Information Management Department, Kyoto University, Sakyo, Kyoto, 606-8501, JAPAN
Abstract:
In 2013 and 2014, Japanese academia was shocked by high-profile incidents of scientific misconduct.
Members of the academy at research institutes and their colleagues at government offices and academic
societies were called upon to help with the urgent reconstruction and development of policy, guidelines,
and procedures to ensure research integrity. In particular, a mandate to preserve research data
for more than 10 years was issued to both researchers and research institutes. For many researchers, it is a
natural assumption that their research data should be kept for as long as possible, protected from loss
and corruption, whether accidental or deliberate. However, this becomes extremely difficult to
achieve when data preservation is mandated for every researcher. Ensuring the availability and integrity of
research data for more than 10 years goes beyond an individual researcher’s personal IT skills. For this
reason, Kyoto University and its central IT division, the Institute for Information Management
and Communication (IIMC), decided to develop and provide a long-term data preservation system.
IIMC designed a stable and cost-effective research data archiving system in FY2016. This system consists
of an enterprise content management (ECM) system and an optical disc storage system. The schematic
concept is shown in Fig. 1. The ECM provides a user interface for document management functions such as "access
control and auditing", "metadata tagging", "revision management" and "searchable content and metadata".
However, it is difficult to secure research data over the long term with an ECM alone, because the lifetime of
ECM hardware, software and database structures is shorter than the required preservation period.
This problem can be solved by connecting the ECM to a separate long-term archiving system in
which the retrieved data is stored in well-established, open data formats and file systems. For this system, the
IIMC uses Oracle WebCenter Content (OWCC) and a FUJITSU Eternus DA700 data archiver.
OWCC is an instance of ECM software which provides the requisite functions mentioned above. The
DA700 is a disc array system built from Archival Discs. The Archival Disc is a 'write once read many
(WORM)' medium and guarantees more than 50 years of data preservation time. Moreover, discs are
assembled in a cartridge, which may be configured with RAID 5 or RAID 6 to improve redundancy.
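
As a rough illustration of why parity-based RAID improves redundancy, the sketch below shows the general RAID 5 principle: a parity block is the XOR of the data blocks, so any single lost block can be rebuilt from the survivors. It is only a sketch of the idea. The helper `xor_blocks` and the sample block contents are hypothetical, and the actual on-disc layout used by the DA700 cartridges is not described here.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three equal-length data blocks, standing in for data on three discs.
data = [b"disc-one", b"disc-two", b"disc-thr"]

# The parity block is the XOR of all data blocks.
parity = xor_blocks(data)

# Simulate losing the second block and rebuild it from the survivors plus parity.
rebuilt = xor_blocks([data[0], data[2], parity])
```

RAID 6 extends this idea with a second, independent redundancy block so that two simultaneous losses can be tolerated.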
The typical scenario for data operation and archiving is as follows.
- Users can create folders and upload their research data to OWCC. Users may organize their research data using OWCC functionality, such as tagging metadata, using revision control, or sharing with collaborators for local use.
- A user can issue the 'archive' command on any folder under their administration. The archive command retrieves all content within the given folder and its descendants. This content is copied to DA700 together with additional information such as metadata and the access control list. If content has several revisions on OWCC, the content’s owner can choose to copy either the latest revision only or all revisions to DA700.
- When the copy from OWCC to DA700 is finished, the index information on DA700 is attached to the source content as metadata. Additionally, access to the source content on OWCC is set to 'read only' for everyone, including the owner of the content. This ensures that the contents on OWCC and DA700 remain identical. The content owner may regain write or administrative access through several steps on OWCC and later copy the content to DA700 again. This feature lets the user keep archived revisions on DA700 while reducing the frequency of copying to DA700.
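
The scenario above can be sketched in a few lines of Python. This is a minimal in-memory model under stated assumptions, not the OWCC or DA700 interface: the classes `Content` and `Folder`, the functions `walk` and `archive`, the `archive_index` metadata key, and the dictionary standing in for the DA700 are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Content:
    name: str
    revisions: List[bytes]                       # oldest first, latest last
    metadata: Dict[str, str] = field(default_factory=dict)
    acl: List[str] = field(default_factory=list)  # users allowed to access
    read_only: bool = False


@dataclass
class Folder:
    name: str
    contents: List[Content] = field(default_factory=list)
    subfolders: List["Folder"] = field(default_factory=list)


def walk(folder, prefix=""):
    """Yield (path, content) for a folder and all of its descendants."""
    path = f"{prefix}/{folder.name}"
    for content in folder.contents:
        yield f"{path}/{content.name}", content
    for sub in folder.subfolders:
        yield from walk(sub, path)


def archive(folder, store, all_revisions=False):
    """Copy a folder tree into `store`, a dict standing in for the archive."""
    for path, content in walk(folder):
        # Owner's choice: latest revision only, or every revision.
        revs = content.revisions if all_revisions else content.revisions[-1:]
        first_rev = len(content.revisions) - len(revs) + 1
        keys = []
        for offset, data in enumerate(revs):
            key = f"{path}@r{first_rev + offset}"
            # Metadata and the access control list travel with the data.
            store[key] = {
                "data": data,
                "metadata": dict(content.metadata),
                "acl": list(content.acl),
            }
            keys.append(key)
        # Record where the copies live, then freeze the source.
        content.metadata["archive_index"] = ",".join(keys)
        content.read_only = True
    return store


report = Content("report.csv", [b"v1", b"v2"], {"project": "demo"}, ["alice"])
root = Folder("lab", [report], [Folder("raw", [Content("run1.dat", [b"x"])])])
store = archive(root, {})
```

In this model, re-archiving after the owner regains write access would simply add new revision keys to the store, which mirrors how archive revisions accumulate on a write-once medium.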