Imagine a universe, parallel and not so distant from ours, in which an editor could create a list of URIs from various repositories that could be parsed by a web interface to bring all of the content and associated metadata referenced by this list into one space to be viewed together and operated on by the same set of tools.
When MITH was working on the Shakespeare Quartos Archive (SQA), we envisioned something like this universe. We worked to separate our content (image files and TEI documents) from the application in order to make the content reusable by other (non-MITH) interfaces, and also to develop an interface that could be used for other (non-Shakespearean) content. To this end we published the raw images, tiles for deep zoom, and TEI-XML files at stable URIs on our homepage so they could be used outside of the interface we designed. We adapted the code, with relatively little effort, for use in an interface for another project (soon to be publicly released) with very different content.
The separation of content and interface code meant that no references to data should be hard coded into the interface software. Instead, these references were kept in a set of informational XML documents, which we called “manifest files”, that provided a sort of table of contents for the interface software so that it knew which images to use for which quarto and in what order they should be presented. The files were intended for internal use only, and we did not create a DTD (or any sort of schema) to define our tags. In the months that followed, though, I became aware that many other projects, including some of our TILE partners, were either developing or in search of a similar method for organizing digital content for web publication. Because this problem is obviously related to the work of linking images and text, and because we here at TILE are long overdue for a new blog entry, I have decided to use this space to discuss our solution.
There are, it should be noted, several well-documented metadata standards that provide some of our specified functionality. TEI, for instance, provides the “facsimile” tag that can be associated with any part of the encoded text. METS, likewise, allows digital resources to be grouped together and the relationship among the items to be defined. In some cases, these, or similar existing standards, would have been sufficient to describe the relationships among digital assets. However, we wanted to create very small files that could be quickly loaded and parsed based on dynamic user interaction and stored in memory without unreasonably slowing performance.
To this end we designed a system in which the interface only loads the data the user needs or might be expected to need in the very near future. This approach is similar to the image-serving technique often called “deep zoom”.
[Quick explanation of deep zoom for images, skip if you understand]
A deep-zoom viewer first loads a fairly low-resolution image. If the user were to zoom in on this image with a “normal” image viewer (like iPhoto), the picture would quickly become blurry and pixelated. In a deep-zoom interface, though, an image of slightly higher resolution is loaded instead. However, because at this zoom level the entire image cannot be viewed at once on the user’s monitor, rather than wastefully loading all of the image data the system loads only the region currently in view (usually with a border of extra tiles to allow for fast panning outside of the view space). This is accomplished by cutting copies of the image at various resolutions into sets of small squares (or tiles); more tiles are generated for higher-resolution versions of the image and fewer for lower-resolution versions. A pyramid structure is thereby created, with a small, low-resolution, undivided image at the top and a very high-resolution image subdivided into lots and lots of little tiles at the bottom.
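To make the pyramid a little more concrete, here is a minimal sketch, in TypeScript, of the arithmetic involved. It assumes square 256-pixel tiles and an image that is halved in each dimension at every level, which is typical of deep-zoom schemes but not necessarily the exact values used for the quartos:

// A minimal sketch of the tile pyramid described above, assuming square
// 256-pixel tiles and an image halved in each dimension at every level.
const TILE_SIZE = 256;

interface PyramidLevel { w: number; h: number; tiles: number }

function pyramidLevels(width: number, height: number): PyramidLevel[] {
  const levels: PyramidLevel[] = [];
  let w = width;
  let h = height;
  while (true) {
    const cols = Math.ceil(w / TILE_SIZE);
    const rows = Math.ceil(h / TILE_SIZE);
    levels.push({ w, h, tiles: cols * rows });
    if (cols === 1 && rows === 1) break; // the whole image now fits in one tile
    w = Math.ceil(w / 2);
    h = Math.ceil(h / 2);
  }
  return levels.reverse(); // smallest image (top of the pyramid) first
}

// For a hypothetical 4000 x 6000 px page image this yields tile counts of
// 1, 2, 6, 24, 96, and 384 from the top of the pyramid to the bottom.
console.log(pyramidLevels(4000, 6000));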
[Schema discussion continues from above]
Like the top of a deep zoom image pyramid, the highest-level manifest file in our repository contains pointers to the manifest files for each quarto in our collection. It is expressed in XML and has a root tag, which we named “doc”, with an attribute that specifies the base directory path for all of the manifest files:
<doc base="./manifest/">
This tag contains a series of “manifest” tags, one for each book in our collection, each of which points to another XML file and carries some basic metadata. For example:
<manifest uri="ham-1603-22275x-bli-c01-manifest.xml">
  <title id="TITLE:" name="The tragedy of Hamlet Prince of Denmarke: an electronic edition."/>
  <info id="INFO:" data="BL C.34.k.1"/>
  <notes id="NOTES:" con="Transcription of the printed text and annotations created from digital images of the copy of the 1603 quarto. Annotations were checked against the original."/>
  <start page="9"/>
</manifest>
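A consumer of this file needs very little machinery to make use of it. As a rough sketch (and only a sketch: the helper and the field names I pull out here are my own, not part of the SQA code), a browser-based viewer might load the top-level manifest like this:

// Sketch: fetch the top-level manifest and list the quartos it points to.
// The file name and fields mirror the examples above; the helper itself is
// hypothetical, not part of the SQA interface code.
interface QuartoEntry {
  uri: string;       // quarto-level manifest file, relative to doc/@base
  title: string;
  startPage: number;
}

async function loadTopManifest(url: string): Promise<{ base: string; quartos: QuartoEntry[] }> {
  const text = await (await fetch(url)).text();
  const doc = new DOMParser().parseFromString(text, "application/xml");
  const base = doc.documentElement.getAttribute("base") ?? "./";
  const quartos = Array.from(doc.querySelectorAll("manifest")).map((m) => ({
    uri: m.getAttribute("uri") ?? "",
    title: m.querySelector("title")?.getAttribute("name") ?? "",
    startPage: Number(m.querySelector("start")?.getAttribute("page") ?? "1"),
  }));
  return { base, quartos };
}

// Usage: loadTopManifest("manifest.xml").then(({ quartos }) =>
//   quartos.forEach((q) => console.log(q.title)));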
If the user selects this book, the file “ham-1603-22275x-bli-c01-manifest.xml” is then loaded. This file again has a root tag “doc” with an attribute that points to the directory relevant to this quarto:
<doc base="ham-1603-22275x-bli-c01">
that contains a set of “page” tags, each of which has a pointer to the full-size image, the directory in which the tiles are stored for deep zoom, and the location of the TEI files for both the verso [left-hand] and recto [right-hand] page in the image:
<page>
  <img>ham-1603-22275x-bli-c01-006.jpg</img>
  <tiledir>ham-1603-22275x-bli-c01-006-tiles</tiledir>
  <ver sig=''>
    <xml>page_ham-1603-22275x-bli-c01-006a.xml</xml>
  </ver>
  <rec sig=''>
    <xml>page_ham-1603-22275x-bli-c01-006b.xml</xml>
  </rec>
</page>
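Parsed into memory, each quarto-level manifest reduces to a simple list of page records. Again, purely as an illustrative sketch with names of my own invention (and assuming, as is true of our files, that the page tags appear in reading order):

// Sketch only: the element names follow the example above, but PageRecord and
// the parsing helper are illustrative rather than actual SQA interface code.
interface PageRecord {
  img: string;        // full-size image for this opening
  tileDir: string;    // directory of deep-zoom tiles for the same image
  versoTei?: string;  // TEI transcription of the left-hand page, if any
  rectoTei?: string;  // TEI transcription of the right-hand page, if any
}

function parseQuartoManifest(xml: string): { base: string; pages: PageRecord[] } {
  const doc = new DOMParser().parseFromString(xml, "application/xml");
  const base = doc.documentElement.getAttribute("base") ?? "";
  const pages = Array.from(doc.querySelectorAll("page")).map((p) => ({
    img: p.querySelector("img")?.textContent ?? "",
    tileDir: p.querySelector("tiledir")?.textContent ?? "",
    versoTei: p.querySelector("ver > xml")?.textContent ?? undefined,
    rectoTei: p.querySelector("rec > xml")?.textContent ?? undefined,
  }));
  return { base, pages };
}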
We cheat a bit and do not provide a manifest file for the tile directories because they are each organized exactly the same way (according to the file structure generated by the tile-cutting program we used). To be more interoperable, though, there should probably at least be an attribute somewhere in the quarto-level manifest that identifies the format of the tile directories (there are several popular ones, including the Tile Map Service (TMS) and Microsoft’s Deep Zoom Image (DZI) formats).
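For example, under Microsoft’s DZI convention tiles are stored in a predictable {name}_files/{level}/{column}_{row}.{extension} layout, so a declared format attribute would let a consuming tool construct tile URLs with a one-liner like the following (our own tile directories are not necessarily laid out this way; this is only meant to show what such an attribute would buy you):

// Sketch: building a tile URL under the Deep Zoom Image (DZI) convention,
// where tiles live at {name}_files/{level}/{column}_{row}.{ext}.
function dziTileUrl(tileBase: string, level: number, col: number, row: number, ext = "jpg"): string {
  return `${tileBase}_files/${level}/${col}_${row}.${ext}`;
}

// dziTileUrl("ham-1603-22275x-bli-c01-006", 12, 3, 5)
//   -> "ham-1603-22275x-bli-c01-006_files/12/3_5.jpg"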
Lessons learned
It should be noted that, despite our attempts to compact the XML and load only what is requested, the Quartos prototype interface is nonetheless slower than we would like. There are several reasons for this, which we hope to correct in future iterations, but the process could arguably be made even more efficient if the XML files were parsed into a database and even smaller snippets of data served to the interface, rather than loading and parsing the still-substantial XML file for all of the images in the set. Of course, this approach would require a new AJAX call to the remote database with every turn of the page, but the server is already accessed at each such interaction in order to load the image tiles in any case.
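To illustrate what that might look like, the sketch below (reusing the hypothetical PageRecord shape from above; the endpoint and response format are entirely invented) fetches a single page record per page turn rather than a whole manifest:

// Sketch: per-page fetch from a hypothetical back end that has already parsed
// the manifest XML into a database. Each page turn would transfer only the
// handful of fields needed to render that opening.
async function fetchPage(quartoId: string, pageIndex: number): Promise<PageRecord> {
  const res = await fetch(`/api/quartos/${quartoId}/pages/${pageIndex}`);
  if (!res.ok) throw new Error(`Could not load page ${pageIndex} of ${quartoId}`);
  return res.json();
}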
However, the XML files do provide a way for us to publish our manifest data so that it can be used by other interfaces. Without needing access to our database, a tool developer could access these manifest files and know exactly where to find each item in our collection. Ideally, such manifest files would be a common feature of each repository and generated according to a metadata schema. The manifest files could then be ingested by tools like the TILE image tagger. Such a schema should be more general and robust than the one we built for SQA, but it should be at least as lightweight. If anyone knows of such a schema or would like to work to create one, I welcome suggestions in the comments.