support, OpenType fonts, 3D graphics, audio/video
content, and consistency with other PDF file formats
like PDF/X (printing), PDF/E (engineering) and
PDF/UA (universal accessibility). PDF/A-2 is set to
be released in late 2009 or early 2010 (Fluckinger).
4 PDF/A = DAS
A digital archival surrogate (or DAS) is a hybrid of
archival master and high-quality digital surrogate.
PDF/A has the potential of acting as a true DAS that
can deal with the inherent complexities of
manuscript material. PDF/A can preserve the
structure, layout, and visual appearance of a
manuscript. The manuscript can be read as a textual
document or viewed as a visual object.
PDF/A, via Adobe Acrobat Reader, provides a
variety of ways to view the document: single-page,
multi-page, and/or side-by-side. This eliminates or
greatly reduces any potential loss of context or
meaning.
The Acrobat Reader browser allows for non-
linear navigation. The Reader makes it simple to
page forward or back, or jump to a selected page.
The thumbnail function also allows for a simple
means to navigate quickly between several pages.
The magnification function of Acrobat Reader
allows for close, detailed examination of the
document. The high resolution of the images offers a
high-quality replication.
PDF/A offers a means to embed documentary
information regarding the original manuscript using
Extensible Metadata Platform (XMP) functionality.
Essentially any metadata standard, such as Dublin
Core, could be used to create a record containing
information on the creator, date, description, format,
identifier, language, publisher, relation, rights,
source, subject, and title, or any other data that is
needed.
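As an illustration of the kind of record described above, the following is a minimal sketch, in Python, of a Dublin Core description serialized as an XMP-style RDF/XML packet. The function name build_xmp, the helper _container, and the layout of the record dictionary are assumptions made for this example and are not part of any PDF/A tool. Only four of the Dublin Core elements are shown, and the sketch does not check the additional constraints that PDF/A places on XMP metadata.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by XMP packets that carry Dublin Core properties.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
X = "adobe:ns:meta/"
XML = "http://www.w3.org/XML/1998/namespace"

ET.register_namespace("rdf", RDF)
ET.register_namespace("dc", DC)
ET.register_namespace("x", X)


def _container(parent, prop, kind, values, lang=None):
    """Add dc:<prop> holding an rdf:Alt, rdf:Seq or rdf:Bag of rdf:li items."""
    prop_el = ET.SubElement(parent, f"{{{DC}}}{prop}")
    box = ET.SubElement(prop_el, f"{{{RDF}}}{kind}")
    for value in values:
        li = ET.SubElement(box, f"{{{RDF}}}li")
        if lang:
            li.set(f"{{{XML}}}lang", lang)
        li.text = value


def build_xmp(record):
    """Serialize a small Dublin Core record (hypothetical dict layout) as XML."""
    xmpmeta = ET.Element(f"{{{X}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{RDF}}}RDF")
    desc = ET.SubElement(rdf, f"{{{RDF}}}Description")
    desc.set(f"{{{RDF}}}about", "")
    # Title is a language alternative, creators an ordered list,
    # subjects an unordered bag, identifier a plain text value.
    _container(desc, "title", "Alt", [record["title"]], lang="x-default")
    _container(desc, "creator", "Seq", record["creators"])
    _container(desc, "subject", "Bag", record["subjects"])
    ET.SubElement(desc, f"{{{DC}}}identifier").text = record["identifier"]
    return ET.tostring(xmpmeta, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical manuscript record, for illustration only.
    print(build_xmp({
        "title": "Letter, 12 March 1862",
        "creators": ["Unknown correspondent"],
        "subjects": ["Correspondence", "Manuscripts"],
        "identifier": "ms-0001",
    }))
```

A real PDF/A workflow would also need PDF tooling to embed the resulting packet in the file, which is not shown here.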
4.1 Concerns Regarding Adoption
The potential obstacles to PDF/A acceptance as a
DAS are file size and efficiency as a web-
deliverable surrogate, and questions of conversion
and migration. Another concern, albeit minor, is that
the complexities of manuscript material limit which
sub-class of PDF/A can be used for a DAS.
4.1.1 Web Deliverability
PDF/A files are web-deliverable and can be read
with Adobe Acrobat Reader or a plug-in version of
the reader for a web browser. Since PDF/A is
backward compatible, it provides an added
advantage if a user is without the latest version of
Adobe Acrobat Reader. The drawback of PDF/A,
though, is that some files can be quite large, which
can make them hard to transfer over the web.
With the proliferation of high-speed, broadband
Internet access, file size is less of an issue. Delivering
a 100MB+ document via the Web was once a daunting
task, but this has changed over the past few years.
Substantial increases in transfer rates, now measured
in megabytes per second, have made large document
delivery quite tolerable, although not instantaneous.
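The effect of transfer rate on delivery time can be made concrete with a rough calculation. The sketch below is illustrative only: the function name transfer_seconds and the two connection speeds (16 Mbit/s for broadband, 56 kbit/s for dial-up) are assumptions chosen for the example, and protocol overhead and latency are ignored.

```python
def transfer_seconds(size_megabytes: float, rate_megabits_per_s: float) -> float:
    """Idealized transfer time: file size in megabytes, line rate in Mbit/s."""
    if rate_megabits_per_s <= 0:
        raise ValueError("rate must be positive")
    # 1 byte = 8 bits, so a size in MB times 8 gives megabits.
    return size_megabytes * 8 / rate_megabits_per_s


if __name__ == "__main__":
    size_mb = 100  # the 100MB document discussed in the text
    # Assumed rates: 16 Mbit/s (about 2 MB per second) and 56 kbit/s dial-up.
    for label, rate in [("broadband", 16.0), ("dial-up", 0.056)]:
        minutes = transfer_seconds(size_mb, rate) / 60
        print(f"{label}: {minutes:.1f} minutes")
```

Under these assumptions the same document takes under a minute over broadband and roughly four hours over dial-up.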
The next generation of PDF/A, PDF/A-2, will go
further to change attitudes towards web delivery
with its use of JPEG 2000 technology. JPEG 2000 is
an ISO standard that has been published as ISO/IEC
15444. JPEG 2000 uses wavelet-based compression,
including a lossless mode, for large, high-resolution
images. Compression will reduce file size
significantly, which in turn will substantially
improve the efficiency of web delivery for PDF/A files.
4.1.2 Future Conversion & Migration
When adopting a new file standard, questions of
future conversion and migration need to be asked.
PDF/A is an ISO standard with a requirement for
backward compatibility. It will be readable and
accessible by future versions of Acrobat Reader or
other PDF viewers.
PDF/A files can be converted to other formats,
like TIFF, using commercial software programs like
Adobe Acrobat or freeware such as MyMorph
[http://docmorph.nlm.nih.gov/docmorph/] developed
by the United States National Library of Medicine.
Pages from a multi-page PDF/A can also be
extracted and converted using available software.
Future conversion and migration of PDF/A, if
needed, will not be an issue as long as PDF/A
continues to be an ISO standard, and as long as
PDF technology continues to be an industry-wide
standard.
4.1.3 Use of PDF/A-1b as DAS
The strong visual nature of manuscript material
makes the sub-class PDF/A-1b the only current
option for a DAS, since text extraction (via OCR) is
made difficult by the handwritten characters.
Handwritten text is currently very difficult to
convert using any standard OCR capture program.
PDF/A-1b, of course, meets the minimal but
essential requirements of the format.