over the information about the mismatches has the
potential to contribute to the humanities; it will be
a powerful tool for comparative analysis between
the transcribed Buddhist sutras and the
woodblock-printed ones.
In this paper, we introduce a support system, currently
under development, for producing such documents from
the shot images and text files of the sutras, with the aid
of the version control software Subversion. Through
an experiment, we confirmed that the system can be
used as a multiuser transcription support tool, but the
problem then becomes how to integrate the workers'
results, which might conflict with each other. We then
propose a consolidation method as well as a working
model for resolving this problem.
2 THE SYSTEM
Kongoji Issaikyo is a collection of thousands of
Buddhist sutras which were transcribed nearly 1,000 years
ago and are now held, well preserved, by Kongoji Temple
in Japan. It is true that the sutras were written
in Chinese and that today most Japanese people (and
perhaps Chinese people) who have a good command of
Chinese characters cannot understand the passages of the
sutras, but Buddhism researchers regularly read these
historical materials to clarify the religious
circumstances of those days. Such researchers consider
that the sutras will contribute to Buddhist
studies and to the historical science of Japan. Since they
began to survey the documents a decade ago, the
researchers have been energetically examining
the scriptures and photographing
all of their content. The authors received a
number of shot images taken with a digital camera,
and since then we have been investigating an
automated method for combining the images with existing
text files of Buddhist sutras.
In order to search the shot images of a Buddhist
sutra for a passage, we need a text document that
corresponds strictly to the images. The most
established text files of Buddhist sutras are currently
maintained by the Chinese Buddhist Electronic Text
Association (CBETA, ). We refer to this data set as
the CBETA texts. The files are based on the Taisho
Tripitaka, which is derived from woodblock-printed
Buddhist sutras. The images of Buddhist scriptures and
the CBETA texts in our hands look largely the same,
yet we can easily find character-level differences
between an image and its text. Some of the
differences are due to transcription errors, such as a
wrong character or an omitted character, but some may
be the very distinctions that Buddhism researchers seek
to throw light on, that is, clues to the propagation of
Buddhist sutras and Buddhism. A goal of our study is
to supply a practical support system for comparing the
shot images and the relevant texts or for contrasting
the edited texts compatible with the Kongoji Issaikyo.
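As a minimal sketch of how such character-level differences could be
listed, the following uses Python's standard difflib module. It is
not the comparison method of the system described here. The function
name classify_mismatches is hypothetical, and the second string is an
invented variant (one substituted and one omitted character) of a
common sutra opening, not data from the Kongoji Issaikyo.

```python
from difflib import SequenceMatcher


def classify_mismatches(reference, transcription):
    """Return the character-level differences between two strings.

    Each item is (kind, position_in_reference, reference_chars,
    transcription_chars), where kind is "replace", "delete" (present
    only in the reference) or "insert" (present only in the
    transcription).
    """
    matcher = SequenceMatcher(None, reference, transcription, autojunk=False)
    mismatches = []
    for kind, i1, i2, j1, j2 in matcher.get_opcodes():
        if kind != "equal":
            mismatches.append((kind, i1, reference[i1:i2], transcription[j1:j2]))
    return mismatches


# Invented example (assumption): the second string differs from the
# first by one substituted character and one omitted character.
reference = "如是我聞一時佛在"
transcription = "如是我間一時在"
for item in classify_mismatches(reference, transcription):
    print(item)
```

Such a list only locates the mismatches. Whether a given mismatch is
a transcription error or a meaningful variant still has to be judged
by a researcher.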
Based on this awareness of the problem, together
with the materials in hand, we are developing a data
management system and an interface that enable a
user to read the shot image and the text side by side
and to modify the text so that it is compatible
with the image. By using the system, we will
be able not only to search images through a full-text
search of the text files but also to gain a foothold for
studying the differences between transcribed Buddhist
sutras and those of woodblock printing. Note that the
system is not for correcting the shot images, since
we have to show our respect for the historical
material. In addition, we do not intend to send the modified
text to CBETA; rather, we attempt to make digital
transcriptions of Kongoji Issaikyo or ancient Japanese
Buddhist sutras efficiently using the CBETA texts.
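The idea of searching images through the text can be sketched as
follows, assuming that each image is paired with the text it shows.
This is an illustration only, not the system's implementation. The
function name find_images, the image file names and the sample text
are all hypothetical.

```python
def find_images(pages, passage):
    """Find a passage in image-aligned text.

    `pages` is a list of (image_name, text) pairs in reading order.
    Returns a list of (offset, image_names) pairs, one per occurrence;
    a passage that runs across consecutive images lists all of them.
    """
    if not passage:
        return []
    # Concatenate the page texts and remember where each page starts.
    starts, names, full_text = [], [], ""
    for name, text in pages:
        starts.append(len(full_text))
        names.append(name)
        full_text += text
    hits = []
    pos = full_text.find(passage)
    while pos != -1:
        end = pos + len(passage)
        # Collect every page whose character range overlaps [pos, end).
        images = []
        for k, name in enumerate(names):
            page_start = starts[k]
            page_end = starts[k + 1] if k + 1 < len(starts) else len(full_text)
            if page_start < end and pos < page_end:
                images.append(name)
        hits.append((pos, images))
        pos = full_text.find(passage, pos + 1)
    return hits


# Hypothetical file names and illustrative text (assumptions).
pages = [("image_0001.jpg", "如是我聞一時"), ("image_0002.jpg", "佛在")]
print(find_images(pages, "一時佛"))
```

The search is only as reliable as the correspondence between text and
image, which is why the text must first be made compatible with the
images.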
A typical Buddhist sutra consists of a few thousand
Chinese letters. Particular two-letter idioms
such as the word meaning bodhisattva appear
frequently, and repetitive sequences of letters are often
seen. Whether a sutra is transcribed or woodblock-printed,
the number of letters in a line is basically fixed and the
characters are arranged regularly, both vertically
and horizontally. These properties of the
documents imply that it is practically impossible for a
single worker alone to produce a perfectly accurate text
file. To ensure the quality of the text, we should pay
attention to operations by multiple users.
To support modification and management of the
documents by multiple users, we use the version
control software Subversion (SVN, ) in our system.
Subversion was originally intended for the version
management of software source files. The tool
tells us the difference in the contents between any two
points in time, which makes it suitable for managing and
displaying the differences among the modified text files of
Buddhist sutras.
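The idea of comparing a text at any two points in time can be
illustrated with a toy in-memory history. This is a sketch under
simplifying assumptions and does not use Subversion or reflect its
API; the class name TextHistory and its methods are hypothetical, and
every revision is stored in full for simplicity.

```python
from difflib import unified_diff


class TextHistory:
    """Toy in-memory history of one text file (not Subversion's API)."""

    def __init__(self, initial_text):
        self.revisions = [initial_text]  # revision 0 is the initial text

    def commit(self, new_text):
        """Store a new revision and return its number."""
        self.revisions.append(new_text)
        return len(self.revisions) - 1

    def diff(self, old_rev, new_rev):
        """Return unified-diff lines between two stored revisions."""
        old_lines = self.revisions[old_rev].splitlines()
        new_lines = self.revisions[new_rev].splitlines()
        return list(unified_diff(old_lines, new_lines,
                                 fromfile=f"r{old_rev}",
                                 tofile=f"r{new_rev}",
                                 lineterm=""))


# Illustrative three-line text edited twice (assumed sample data).
history = TextHistory("line one\nline two\nline three")
history.commit("line one\nline 2\nline three")
history.commit("line one\nline 2\nline three\nline four")
for line in history.diff(0, 2):
    print(line)
```

Any pair of revision numbers can be passed to diff, which mirrors the
property that makes a version control tool useful for tracking how a
sutra text has been modified.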
We now introduce several terms related to Subversion.
A group of files maintained on a server for version
control is a repository, while a working copy is a
unit of file manipulation associated with a repository.
Note that the repository and the working copy have a
one-to-many relationship. An update is the operation
in which the latest version is sent from a repository
to a working copy. Conversely, the action of transmitting
the modifications made in a working copy to the
repository is called a commit (used as a noun as well as a
verb). When making a commit, the user (the committer) can
leave a message which is commonly called a com-