Publishing stacks of images and text

dreside — Wed, 09 Jun 2010 00:12:52 +0000

Imagine a universe, parallel and not so distant from ours, in which an editor could create a list of URIs from various repositories that could be parsed by a web interface to bring all of the content and associated metadata referenced by this list into one space to be viewed together and operated on by the same set of tools.

When MITH was working on the Shakespeare Quartos Archive (SQA), we envisioned something like this universe. We worked to separate our content (image files and TEI documents) from the application in order to make the content reusable by other (non-MITH) interfaces, and also to develop an interface that could be used for other (non-Shakespearean) content. To this end we published the raw images, tiles for deep zoom, and TEI-XML files at stable URIs on our homepage so they could be used outside of the interface we designed. We adapted the code, with relatively little effort, for use in an interface for another project (soon to be publicly released) with very different content.

The separation of content and interface code meant that no references to data should be hard coded into the interface software. Instead, these references were kept in a set of informational XML documents, which we called “manifest files”, that provided a sort of table of contents for the interface software so that it knew which images to use for which quarto and in what order they should be presented. The files were intended for internal use only, and we did not create a DTD (or any sort of schema) to define our tags. In the months that followed, though, I became aware that many other projects, including some of our TILE partners, were either developing or in search of a similar method for organizing digital content for web publication. Because this problem is obviously related to the work of linking images and text, and because we here at TILE are long overdue for a new blog entry, I have decided to use this space to discuss our solution.

There are, it should be noted, several well-documented metadata standards that provide some of our specified functionality. TEI, for instance, provides the “facsimile” tag that can be associated with any part of the encoded text. METS, likewise, allows digital resources to be grouped together and the relationship among the items to be defined. In some cases, these, or similar existing standards, would have been sufficient to describe the relationships among digital assets. However, we wanted to create very small files that could be quickly loaded and parsed based on dynamic user interaction and stored in memory without unreasonably slowing performance.

To this end we designed a system in which the interface only loads the data the user needs or might be expected to need in the very near future. This approach is similar to the image-serving technique often called “deep zoom”.

[Quick explanation of deep zoom for images, skip if you understand]

A deep-zoom viewer first loads a fairly low-resolution image. If the user were to zoom in on this image with a “normal” image viewer (like iPhoto) the picture would quickly become blurry and pixelated. In a deep zoom interface, though, an image of slightly higher resolution is loaded instead. However, because at this zoom level the entire image cannot be viewed at once on the user’s monitor, rather than wastefully loading all of the image data the system loads only the region currently in view (usually with a border of extra tiles to allow for fast panning outside of the view space). This is accomplished by cutting copies of the image at various resolutions into sets of small squares (or tiles); more tiles are generated for higher-resolution versions and fewer for lower-resolution ones. A pyramid structure is thereby created, with a small, low-resolution, undivided image at the top and a very high-resolution image subdivided into lots and lots of little tiles at the bottom.
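For readers who prefer code, the pyramid and the "load only what is in view" rule can be sketched in a few lines of Python. This is only an illustration under assumed conventions, not the tile cutter used for the Quartos archive: the function names, the 256-pixel tile size, the halving of resolution between levels, and the one-tile border are all assumptions made for the example.

```python
import math


def pyramid_levels(width, height, tile_size=256):
    """List the levels of a tile pyramid, smallest image first.

    Each entry is (level, width, height, columns, rows). The image is
    halved (an assumed convention) until it fits in a single tile.
    """
    levels = []
    w, h = width, height
    while True:
        cols = math.ceil(w / tile_size)
        rows = math.ceil(h / tile_size)
        levels.append((w, h, cols, rows))
        if cols == 1 and rows == 1:
            break  # reached the undivided image at the top of the pyramid
        w = max(1, math.ceil(w / 2))
        h = max(1, math.ceil(h / 2))
    levels.reverse()  # level 0 is the small, undivided image
    return [(i, *lvl) for i, lvl in enumerate(levels)]


def visible_tiles(view_x, view_y, view_w, view_h, cols, rows,
                  tile_size=256, border=1):
    """Return (column, row) pairs for the tiles covering the viewport,
    plus a border of extra tiles to allow for fast panning."""
    first_col = max(0, view_x // tile_size - border)
    last_col = min(cols - 1, (view_x + view_w - 1) // tile_size + border)
    first_row = max(0, view_y // tile_size - border)
    last_row = min(rows - 1, (view_y + view_h - 1) // tile_size + border)
    return [(c, r)
            for r in range(first_row, last_row + 1)
            for c in range(first_col, last_col + 1)]


if __name__ == "__main__":
    for level in pyramid_levels(1024, 768):
        print(level)
    print(visible_tiles(300, 300, 200, 200, cols=4, rows=3))
```

With these assumptions, a 1024 by 768 pixel image yields three levels, and a 200-pixel-square viewport on the largest level requests nine of its twelve tiles rather than all of them.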

[Schema discussion continues from above]

Like the top of a deep zoom image pyramid, the highest level manifest file in our repository contains pointers to the manifest files for each quarto in our collection. It is expressed in XML and has a root tag, which we named “doc”, with an attribute that specifies the base directory path for all of the manifest files:

The “doc” tag contains many “manifest” tags, each of which points to the XML file for one book in our collection and carries some basic metadata. For example:

<info id="INFO:" data="BL C.34.k.1"/>
<notes id="NOTES:" con="Transcription of the printed text and annotations created from digital images of the copy of the 1603 quarto. Annotations were checked against the original."/>
<start page="9"/>
</manifest> </code>

If the user selects this book, the file “ham-1603-22275x-bli-c01-manifest.xml” is then loaded. This file again contains a root tag “doc” with an attribute that points to the directory relevant to this quarto:

<code> <doc base="ham-1603-22275x-bli-c01"></div></code>

This tag contains a set of “page” tags, each of which has a pointer to the full size image, the directory in which the tiles are stored for deep zoom, and the location of the TEI files for both the verso [left hand] and recto [right hand] page in the image.

<code> <page> <img>ham-1603-22275x-bli-c01-006.jpg</img> <tiledir>ham-1603-22275x-bli-c01-006-tiles</tiledir> <ver sig=''> <xml>page_ham-1603-22275x-bli-c01-006a.xml</xml> </ver> <rec sig=''> <xml>page_ham-1603-22275x-bli-c01-006b.xml</xml> </rec> </page></code>

We cheat a bit and do not provide a manifest file for the tile directories because they are each organized exactly the same way (according to the file structure generated by the tile cutting program we used). To be more interoperable, though, there should probably at least be an attribute somewhere in the quarto-level manifest that identifies the format of the tile directories (there are several popular ones including the Tile Map Service (TMS) and Microsoft’s Deep Zoom Image (DZI) format).

Lessons learned

It should be noted that, despite our attempts to compact the XML and load only what is requested, the Quartos prototype interface is nonetheless slower than we would like.
There are several reasons for this, which we hope to correct in future iterations, but the process could arguably be made even more efficient if the XML files were parsed into a database and even smaller snippets of data served to the interface, rather than loading and parsing the still substantial XML file for all of the images in the set. Of course, this approach would require a new AJAX call to the remote database with every turn of the page, but the server is already accessed at each such interaction in order to load the image tiles.

However, the XML files do provide a way for us to publish our manifest data so that it can be used by other interfaces. Without needing access to our database, a tool developer could access these manifest files and know exactly where to find each item in our collection. Ideally, such manifest files would be a common feature of each repository and generated according to a metadata schema. The manifest files could then be ingested by tools like the TILE image tagger. Such a schema should be more general and robust than the one we built for SQA, but it should be at least as lightweight. If anyone knows of such a schema or would like to work to create one, I welcome suggestions in the comments.

</article> <article> <h1>How does TILE relate to TEI?</h1>

dporter — Sat, 13 Mar 2010 13:00:58 +0000

One question that we frequently get about TILE is how it relates to TEI. TEI is the Text Encoding Initiative, the de facto standard (or, more properly, a set of flexible guidelines) for humanities text encoding.
The most recent version of TEI, P5, includes guidelines for incorporating images into text editions: linking the TEI document to image files representing the document (either its source or, for a TEI document containing annotations, the object of those annotations), noting specific areas of interest on the images, and linking the areas of interest to sections of the TEI document corresponding to them (either transcribed text appearing in the images, or annotations on the images).

So, how does TILE relate to TEI? Although the directors of the TILE project have long been involved with the TEI Consortium and are active users of TEI, and although the TEI community is one of the major intended audiences of TILE, TILE is not a TEI tool as such. It does not rely on TEI for its internal design and, unlike the Image Markup Tool (<a href="http://tapor.uvic.ca/%7Emholmes/image_markup/">http://tapor.uvic.ca/~mholmes/image_markup/</a>), which has as its output a single TEI-conformant document type, TILE is being designed to enable output in a variety of formats. Given the needs of the TILE partner projects, initially TILE will provide output in TEI (any flavour, including the EpiDoc customization), using facsimile or SVG for the image-linking mechanism, and in the IMT flavour of TEI, as well as in METS. However, when complete, TILE will be flexible enough to provide any output that can be defined using the TILE API, including output not in XML.

One result of this flexibility is that, again unlike the IMT, TILE will not be “plug and play”, and processing of the output will be the responsibility of projects using the software. This will require a bit of work on the part of users. On the other hand, as a modular set of tools, TILE will be able to be incorporated into other digital editing software suites that would otherwise have to design their own text-image linking functionality or go without.
We hope that the flexibility of output makes TILE attractive for the developers of other software, and that the variety of text-linking functionality it supplies will make it equally attractive to editors and other end-users. In a future blog post, we’ll discuss TILE functionality in detail.

</article> <article> <h1>Some Thoughts on TILE Partner Projects</h1>

John Walsh — Sat, 06 Mar 2010 15:26:51 +0000

<h1>Newton, Swinburne, Kirby: One of these things is not like the other?</h1>

TILE is a community-driven effort, with many partners. As one of those partners, my role, at least as I see it, is to provide use case scenarios that help guide the development of the TILE tools, to implement the tools in the context of some projects that we hope will provide challenging testing environments, and to provide feedback that will lead to evolution and improvement of the TILE tools. I have other roles and responsibilities in TILE, related to earlier phases of tool design, metadata modeling, and such, but bringing the tools to bear on my various projects is to me the most interesting and exciting part of TILE.

The projects I bring to the table are <a href="http://www.chymistry.org">The Chymistry of Isaac Newton</a>, <a href="http://www.swinburneproject.org/">The Algernon Charles Swinburne Project</a>, and <a href="http://www.cbml.org/">Comic Book Markup Language</a> (or, CBML). Three projects, one on early modern science; another on Victorian poetry, fiction, and criticism; and a third on twentieth-century popular culture. Three admittedly diverse research projects. Given the range of topics covered by these three projects, folks sometimes wonder, and sometimes ask, What the hell do these projects have to do with one another? How do they cohere as part of a unified research agenda? In this blog post, I’ll try to begin answering that question in a general sense and then look more specifically at what the projects, as a group, have to offer the TILE enterprise.
The larger research agenda is not about Newton, Swinburne, or Kirby. (That’s <a href="http://en.wikipedia.org/wiki/Jack_Kirby">Jack Kirby</a>, by the way, one of the most influential creators in the history of comics. Working at Marvel Comics, along with Stan Lee, Kirby transformed the comic book industry in the 1960s with a new “Marvel method” of creative collaboration and the development of characters such as the Fantastic Four, the Hulk, the Avengers, and others.) The larger research agenda is about exploring the digital representation of complex documents, not just texts, but documents—manuscripts; printed books; comic books; the original, annotated artwork for comic books—documents in all their glorious materiality. The various, often fading and messy inks of Newton’s manuscripts that make us wonder when or if Newton is using his own recipe “<a href="http://www.chymistry.org/mss/dipl/ALCH00110/f13r">To make excellent Ink</a>.” And then we have Swinburne’s poems. His <a href="http://www.purl.org/swinburnearchive/html/aicatlnt00/">Atalanta in Calydon</a>, with a <a href="http://www.rossettiarchive.org/docs/sa121.rap.html">binding</a> designed by <a href="http://www.rossettiarchive.org/">Dante Gabriel Rossetti</a>. The large blue foolscap paper on which Swinburne composed most of his works. The visual documents, artworks, by <a href="http://en.wikipedia.org/wiki/File:Whistler_James_Symphony_in_White_no_2_%28The_Little_White_Girl%29_1864.jpg">Whistler</a>, <a href="http://www.rossettiarchive.org/docs/s98.raw.html">Rossetti</a>, and <a href="http://en.wikipedia.org/wiki/File:Borghese_Hermaphroditus_Louvre_Ma231_n4.jpg">others</a>, that inspired many of Swinburne’s poems (“<a href="http://www.purl.org/swinburnearchive/html/pb1miror00/">Before the Mirror</a>,” “<a href="http://www.purl.org/swinburnearchive/html/pb1carol00/">A Christmas Carol</a>,” “<a href="http://www.purl.org/swinburnearchive/html/pb1hrmph00/">Hermaphroditus</a>”).
Comic books with their yellowed newsprint held together by rusty staples, the panels of artwork and word balloons and narrative captions, the Sea Monkey advertisements, and fan mail.

Any research into the many theoretical, technical, practical and other issues related to digital representations of complex document types would be seriously disadvantaged by a focus on a homogeneous set of documents from any one particular historical period or genre. By examining 17th-century scientific manuscripts, 19th-century literary manuscripts and published books, and 20th-century pop culture artifacts, I bring to the problem a reasonably diverse set of documents with a large and varied set of issues and challenges.

And in the context of TILE, a Text and Image Linking Environment, these documents provide a rich suite of text-image relationships. In all cases, transcriptions of text need to be linked to facsimile page images. Newton’s manuscripts have additional graphic elements, in the form of alchemical symbols, diagrams, and Newton’s own pictorial illustrations. As mentioned above, Swinburne has poems inspired by visual art. Swinburne wrote a book-length study of Blake’s poetry, critical remarks on the Royal Academy Exhibition of 1868, “Notes on Designs of the Old Masters at Florence,” and a famous defense of Victorian artist Simeon Solomon. In these works, Swinburne’s texts share complex relationships with external, graphic documents. Comic books intricately weave together textual and graphic elements, and digital representation of these documents requires mechanisms to link these elements and describe the relationships. Rich textual-graphic relationships are one feature shared by these diverse document types. Many documents in these three projects also share a richness of authorial and editorial annotation.
So with Newton, Swinburne, and CBML, we have three diverse projects being pursued under the umbrella of larger investigations into the issues related to representation of complex documents in digital space and in the context of larger, linked information environments. Our other TILE partner projects bring similarly complex documents. We hope this community of people and projects will provide a robust foundation on which to develop a widely usable suite of open source text-image linking tools.

</article> <article> <h1>Layers 3 and 4</h1>

dreside — Tue, 09 Feb 2010 13:01:40 +0000

In my <a href="http://mith.info/tile/2010/02/03/a-four-layer-model-for-image-based-editions/">last blog entry</a> I detailed the first two layers of a four-layer model for electronic editions and archives. The final two layers are detailed below:

Level 3: Interface layer

While stacks of multimedia files and transcripts in open repositories would, in some ways, improve the current state of digital libraries, interfaces are required if users are to do anything but simply access content a file at a time. Of course, interfaces can be very expensive to develop and tend to become obsolete very quickly. Unfortunately, the funding for interface development rarely lasts longer than a year or two, so the cost of maintaining a large code base usually falls to the hosting institution, which rarely has the resources to do so adequately. A new system and standard for interface development is required if interfaces are to be sustainably developed.

Code modularization and reusability have long been ideals in software development, but have only been realized in limited ways in the digital humanities. Several large infrastructure projects, most notably <a href="http://seasr.org/">SEASR</a>, seek to provide a sustainable model for interoperable digital humanities tools, but have yet to achieve wide-scale adoption.
Our model will follow the example of SEASR, but with a scope limited to web-based editions and archives; we may therefore impose some code limitations that more broadly intentioned projects could (and should) not.

We propose a code framework for web-based editions, first implemented in JavaScript using the popular <a href="http://jquery.com/">jQuery library</a>, but adaptable to other languages when the prevailing winds of web development change. An instance of this framework is composed of a manifest file (probably in XML or JSON format) that identifies the locations of the relevant content and any associated metadata, and a core file (similar to, but considerably leaner than, the core jQuery.js file at the heart of the popular JavaScript library) with a system of “hooks” onto which developers might hang widgets they develop for their own editions.

A widget, in this context, is a program with limited functionality that provides well-defined responses to specific inputs. For example, one widget might accept as input a set of manuscript images and return a visualization of data about the handwriting present in the document. Another might simply adapt a deep zooming application, such as <a href="http://openlayers.org/">OpenLayers</a>, for viewing high resolution images and linking them to a textual transcript. Each widget should only depend on the core file and, if applicable, the content and other input data; no widget should directly depend on any other. If data must be passed from one widget to the next, the first widget should communicate with the core file, which can then call an instance of the second one.

It should be noted that we are, in fact, proposing to build something like a content management system at a time when the market for such systems is very crowded. Nonetheless, experience with the major systems (<a href="http://omeka.org/">Omeka</a>, <a href="http://drupal.org/">Drupal</a>, <a href="http://www.joomla.org/">Joomla</a>, etc.)
has convinced us that while a few provide some of the functionality we require, none are suited for managing multimedia scholarly editions. Just as Omeka clearly serves a different purpose and audience than Drupal, so will our system meet the similar yet nonetheless distinct needs of critical editors.

Level 4: User generated data layer

Many recent web-based editions have made use of “web 2.0” technologies which allow users to generate data connected to the content. In many ways, this is the most volatile data in current digital humanities scholarship, often stored in hurriedly constructed databases on servers where considerations of scale and long-term data storage have been considered in only the most cursory fashion. Further, the open nature of these sites means that it is often difficult to separate data generated by inexperienced scholars completing a course assignment from that of experts whose contributions represent real advances in scholarship.

Our framework proposes the development of repositories of user-generated content, stored in a standard format, which will be maintained and archived. Of course, storing the data of every user who ever used any of the collections in the framework is impossible. We therefore propose that projects launch “sandbox” databases, out of which the best user-generated content may be selected for inclusion and “publication” in larger repositories. In some cases, these repositories may also store scholarly monographs that include content from a set of archives. Subscription fees may be charged for accessing these collections to ensure their sustainability.

Conclusion

It should be noted that much in the above model is already practiced by some of the best electronic editing projects. However, the best practices have not been articulated in a generalized way. Although we feel confident our model is a good one, it would be the height of hubris to call it “best practice” without further vetting from the community.
That, dear reader, is where you come in. The comments are open.

</article> <article> <h1>A four layer model for image-based editions</h1>

dreside — Wed, 03 Feb 2010 19:33:19 +0000

Perhaps the most iconic sort of project in the literary digital humanities is the electronic edition. Unfortunately, these projects, which seek to preserve and provide access to important and endangered cultural artifacts, are, themselves, endangered. Centuries of experimentation with the production and preservation of paper have generated physical artifacts that, although fragile, can be placed in specially controlled environments and more or less ignored until a researcher wants to see them. On the other hand, only the most rudimentary procedures exist for preserving digital artifacts, and most require regular care by specialists who must convert, transfer, and update the formats to those readable by new technologies that are not usually backwards compatible.

A new model is required. The multi-layered model pictured here will, we believe, be attractive to the community of digital librarians and scholars, because it clearly defines the responsibilities of each party and requires each to do only what they do best.

Level 1: Digitization of Source materials

The creation of an electronic edition often begins with the transfer of analog objects to binary, computer readable files. Over the last ten years, these content files (particularly image files) have proven to be among the most stable in digital collections. While interface code must regularly be updated to conform to the requirements of new operating systems and browser specifications, text and image file formats remain relatively unchanged, and even 20 year old GIFs can be viewed on most modern computers. The problem, then, lies not so much with the maintenance of these files but in their curation and distribution.
For various reasons (mostly bureaucratic and pecuniary rather than technical), libraries have often attempted to limit access to digital content to paths that pass through proprietary interfaces. This protectionist approach to content prevents scholars from using the material in unexpected (though perhaps welcome) ways, and also endangers the continued availability of the content as the software that controls the proprietary gateways becomes obsolete. Moreover, these limitations are rarely able to prevent those with technical expertise (sometimes only the ability to read JavaScript code) from accessing the content in any case, and so nothing is gained, and (potentially) everything is lost by this approach.

More recently, projects like the Homer Multitext Project, the Archimedes Palimpsest, and the Shakespeare Quartos Archive have taken a more liberal approach to the distribution of their content. While each provides an interface specially designed for the needs of its audience, the content providers have also made their images available under a Creative Commons license at stable and open URIs.

Granting agencies could require that content providers commit to maintain their assets at stable URIs for a specified period of time (perhaps 10-15 years). At the end of this period, the content provider would have the opportunity to either renew their agreement or move the images to a different location.

The formats used should be as open and as commonly used as possible. Ideally, the library should also provide several formats for each item in the collection. A library might, for instance, choose to provide a full-size 300 MB uncompressed TIFF image, a slightly smaller JPEG2000 image served via a Djatoka installation, or a set of tiles for use by “deep zooming” image viewers such as OpenLayers.

Level 2: Metadata

The files and directories in level 1 should be as descriptive as possible and named using a regular and easily identifiable progression (e.g.
“Hamlet_Q1_bodley_co1_001.tif”); however, all metadata external to the file itself should be considered part of level 2. Following Greene and Meissner’s now famous principle of “More Product, Less Process”, we propose that all but the most basic work of identification of content should be located in the second level of the model, and possibly performed by institutions or individuals not associated with the content provider at level 1. The equipment for digitizing most analog material is now widely available and many libraries have developed relatively inexpensive and efficient procedures for the work, but in many cases there is considerable lag time between the moment the digital surrogates are generated and the moment they are made publicly available. Many content providers feel an obligation to ensure that their assets are properly cataloged and labeled before making them available to their users. While the impulse towards quality assurance and thorough work is laudable, a perfectionist policy that delays publication of preliminary work is better suited for immutable print media than an extensible digital archive. In our model, content providers need not wait to provide content until it has been processed and catalogued. Note also that debates about the proper choice or use of metadata may be contained at this level without delaying at least basic access to the content. By entirely separating metadata and content, we permit multiple transcriptions and metadata (perhaps with conflicting interpretations) to point to the same item’s URI. Rather than providing, for example, a single transcription of an image (inevitably the work of the original project team that reflects a set of scholarly presuppositions and biases) this model allows those with objections to a particular transcription to generate another, competing one. 
Each metadata set is equally privileged by the technology, allowing users, rather than content-providers, to decide which metadata set is most trustworthy or usable.

In my next blog entry I will discuss the next (and final) two layers of this model: interfaces and user-generated data.

</article> <article> <h1>TILE directors begin blogging</h1>

dreside — Wed, 03 Feb 2010 15:23:56 +0000

Last week, the TILE team held their six-month project meeting in Bloomington, Indiana. At this meeting we further refined the scope of the project and agreed to deliver the following tools by July of 2010:

<ul>
<li>An extension of the image markup features of the Ajax XML Encoder (AXE). The extension will feature a newly designed, more user-friendly web interface and will permit editors to link regions of any shape to tags selected from a metadata schema supplied by the editor. Additionally, editors will be able to link non-contiguous regions and specify the relationship between the two regions.</li>
<li>An automated region-recognizing plugin for AXE that can be modified to recognize regions of any type but which will initially be designed to identify all of the text lines in an image of horizontally-oriented text.</li>
<li>A jQuery plugin that permits text annotation of an HTML document.</li>
</ul>

Also, in order to better communicate the work of the project with our partners as well as the larger digital humanities community, we have decided to blog weekly about some important issue relating to the project or text & image linking in particular. This week, I (Doug Reside) will post a series of articles about a new, structural model for multimodal editions. We welcome your feedback.

</article> </main></body></html>
