La Mort D’Impression? : How Google (and others) Digitize Le Morte D’Arthur

(Apologies if the French translation is off–I don’t speak it and am relying on a machine translation (and I’m sure Julia can tell us why that’s a bad idea!))

Since my interests lie more heavily in the still-copyrighted 20th century, I turned to my other love, Arthurian legend, for this task.  Specifically, I looked at the seminal collection of French (and one Middle English) tales rendered into English as Le Morte D’Arthur by Sir Thomas Malory, which was available in all four digital libraries.  I chose to focus on Volume 1 to narrow down the information and compare the resources.

Project Gutenberg offered the second-greatest number of formats (HTML, EPUB, Kindle, Plucker, QiOO Mobile, and Plain Text UTF-8), but for only one edition of the book, which is not clearly identified.  It says the editor is William Caxton, who produced an edition in 1485 that has become the basis for most of the editions of the book (the other being the Winchester Manuscript), and contains his Preface, but it also contains a Bibliographic note by A. W. Pollard without identifying him as the editor.  Nor does it list a publisher or print date beyond the release date of November 2009.  It also lacks any information as to which specific source was the basis for their digitization.  In terms of page layout, the EPUB and Kindle editions specify that there are no images, but whether that has an impact is unclear without a specified edition.  A big frustration when reading online is that the chapter listings in the table of contents have neither page numbers nor hypertext links to the chapters, making it hard to move through the book unless you know the specific page to jump to.  Although there is no specific place on the book page to report errors, the top of the screen does have an “ad” reading: “Did you know that you can help us produce ebooks by proof-reading just one page a day? Go to: Distributed Proofreaders”.  This suggests that they are crowdsourcing their quality assurance process.  The online reader seems to be restricted to viewing only; however, you can download copies of the books to give you the affordances of the other formats (such as Kindle).

Google Books hosts several editions of Le Morte D’Arthur.  One is the Everyman Library edition, also based on the Caxton text, edited by Ernest Rhys and published by J.M. Dent in 1906.  It was sourced from the University of Michigan and is available as an EPUB and a PDF in addition to online viewing.  This edition includes the rather beautifully illustrated title pages; however, one has to scroll past multiple scans of the University of Michigan title plate, blank pages, and this interesting failure in scanning to find it:

Screen Shot 2013-02-05 at 6.15.10 PM

It also preserves Caxton’s original preface.  Google Books also hosts another version of Caxton’s text published by, which appears to be a scan of a PDF and is just plain text.  There’s also an ebook edition for purchase.  Other than reviews, there does not seem to be a system for reporting errors (otherwise, I’d assume someone would already have cut out the excess pages).  Google Books allows you to download, search within, and save a copy to “My Library”; however, it does not allow you to annotate the book.

HATHITrust also has the Rhys editions, but scanned by Google from Cornell University and the University of Virginia in addition to the University of Michigan.  In addition, it has two other 19th century editions: an 1891 Macmillan publication with the Caxton text edited and introduced by Edward Strachey from the Universities of Michigan and Toronto, digitized by Google; and an 1889 Nutt publication in which Caxton’s text is “‘reprinted page for page, line for line’, but in modern type”, edited by Oskar Sommer and introduced by Andrew Lang, from the University of California, digitized by Google.  Each of the editions is only available in PDF format, and for some reason, both Rhys editions are for volume 2, rather than one of each.  Although HATHITrust offers the most viewing options (Classic View, Scroll, Flip, Thumbnails, and Plain Text), the Flip presentation of a book spine and cover is clearly a graphical representation instead of a realistic one.  (I will say that it’s fun to run your cursor over the “pages” and watch the “jump to page __” numbers flip rapidly.  For some reason this strikes me as similar to riffling the pages of a real book.)  Page layouts are preserved, including italics, spacing, and footnotes.  HATHITrust offers a Feedback form if there are any problems with the text, as well as the ability to search, download single pages or the whole document, add the book to a collection (if one has University access to sign in!), or share it with others.  HATHITrust offers a few full text versions, but many were limited to viewing or to “snippets” of the full text.

The Internet Archive offers the greatest number of formats, with each edition available for download in PDF, EPUB, Kindle, Daisy, Full Text, and DjVu.  It contains the Rhys edition from the University of Michigan as digitized by Google, but also from the University of Toronto and the New York Public Library; the Strachey edition from Stanford Library and the University of California; and the Sommer edition from the Universities of Toronto, Michigan, and Cornell University.  The Internet Archive presents the book as if one were looking at a paper version, with page turns instead of scrolling, in a slightly more realistic way than HATHITrust (and offers the same satisfaction in riffling the pages).  Also, for the Strachey version, it looked as if many of the actual page images were presented instead of just the scanned text; I could clearly see that the bibliographic page in the Stanford book was torn and repaired with tape.  Some pages are badly scanned, with the margins of text cut off or wavy.  However, the marginalia from users has been preserved.

Yet more fingers.

The Internet Archive offers an editable web page on Open Library that seems like the method for users to make changes (such as adding new editions), but I’m not sure if it also acts as an official reporting system for errors.  It allows users to search, bookmark, write reviews, share the book, and have a computer read the text aloud.  Interestingly, when I asked the computer to read aloud, it was forced to spell out “Rhys” rather than pronounce it, but had no trouble pronouncing the words “Igraine” or “pyonce”.  There do not seem to be any restrictions on use, and the site offers “selected metadata” that might be useful for creating databases for further study.

I tested the search features in each library by searching the book for the word “swoon” (since the amount of swooning, primarily among the supposedly noble and heroic knights of the Round Table, surprised me the most when I read the book).  Google Books showed 14 results in the book, with hyperlinks to the individual pages and excerpts from the text to show the context of the word.  HATHITrust showed the word on 13 pages for a total of 15 results, also with hypertext linking and excerpts to show context, although the excerpts were shorter than those in Google Books.  Surprisingly, the Internet Archive produced no results; it did manage to find character names when asked, and provided a popup window of context with links to the individual word searched.  The Kindle download from Project Gutenberg found 25 results, displayed in a sidebar that shows the context and a clickable location; however, the search term is not highlighted on the page when it is brought up, and so can still take a while to find.
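
One possible reason the counts differ (14, 15, and 25) is that each search tool may treat word forms differently; I have no way to confirm how any of these libraries actually match terms, and the editions themselves differ.  As a rough sketch only, assuming you have a plain-text copy of the book, the hypothetical Python function count_matches below counts a term three ways (whole word, word prefix, and raw substring) to show how much the matching rule alone can change the total.  The sample sentence is made up for illustration and is not from Malory.

```python
import re


def count_matches(text: str, term: str) -> dict:
    """Count a search term three ways to show how matching rules change the total.

    This is an illustration, not a description of how any digital library searches.
    """
    lowered = text.lower()
    term = term.lower()
    escaped = re.escape(term)

    # Whole-word matches only: "swoon" but not "swooned" or "aswoon".
    whole_word = len(re.findall(rf"\b{escaped}\b", lowered))

    # Any word that starts with the term: "swoon", "swooned", "swooning".
    prefix = len(re.findall(rf"\b{escaped}\w*", lowered))

    # Raw substring hits, ignoring word boundaries entirely.
    substring = lowered.count(term)

    return {"whole_word": whole_word, "prefix": prefix, "substring": substring}


# Made-up sample text (an assumption for demonstration, not a quotation).
sample = "He fell in a swoon. She swooned twice, and aswoon he lay; Swooning followed."
print(count_matches(sample, "swoon"))
```

To try this on a real download, one could read the Plain Text UTF-8 file into a string and pass it in place of sample; the totals will still depend on the edition and on any OCR errors in the text.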

One of the biggest challenges in examining Le Morte D’Arthur was that the different editions were labeled inconsistently in the catalogs.  For example, some editions claimed to have Janet Cowen as the editor, and when opened, turned out to be the Strachey edition.  Still others were not clearly labeled as to which volume they were.  Most concerning is the lack of any particular identifying information about the Project Gutenberg text.  Clearly, digital libraries need to establish the same criteria as print libraries for making sure their catalog databases are precise and accurate.

This entry was posted in Exercises by Katie Kaczmarek.

About Katie Kaczmarek

1st year English Ph.D. student at the University of Maryland. I'm interested in investigating how print authors are changing the way they write to appeal to the generation who reads differently because they have grown up with technology.

14 thoughts on “La Mort D’Impression? : How Google (and others) Digitize Le Morte D’Arthur”

  1. So I’m interested in whether or not the Project Gutenberg versions of the text incorporated line numbers in any way to their editions, since a lot of printed versions of the text do. If, say, PG’s HTML text disregards line numbers, this actually makes it less useful for scholarship, since line citations are common.

    Additionally, was there any inconsistency, or did the metadata ever allude to the language use in Malory? Since he writes in unstandardized Middle English, I was curious whether this is acknowledged in any way by the databases, or whether they clarify if an edition has been modernized.

    • The question of line/page numbering is important — and presumably we’ll all need to shift over to some page-size-independent practice of chapter/section/sentence numbering some day, though such ingrained practices always seem to long outlive their usefulness.

    • Nigel and I spoke in person and determined that line numbers might be the specific provenance of the critical (print) editions that he owns, published much later than the books available in the digital libraries. But Josh has a good point about trying to find a new standard numbering practice. That’s what I liked about the MLA Commons’ numbering of paragraphs (in addition to allowing comments on those specific paragraphs).

  2. I love the images of fingers, I’m so glad you posted them! It reminds me that behind all of the digital images on the web, there are human beings who have to physically make it happen, whether it’s entering tedious metadata in the catalog record or physically scanning the book.
