La Mort D’Impression? : How Google (and others) Digitize Le Morte D’Arthur

(Apologies if the French translation is off–I don’t speak it and am relying on a machine translation (and I’m sure Julia can tell us why that’s a bad idea!))

Since my interests lie more heavily in the still-copyrighted 20th century, I turned to my other love of Arthurian legends for this task.  Specifically, I looked at the seminal collection of French (and one Middle English) tales written into English as Le Morte D’Arthur by Sir Thomas Malory, which was available in all 4 digital libraries.  I chose to focus on Volume 1 to narrow down the information and compare the resources.

Project Gutenberg offered the second-greatest number of formats (HTML, EPUB, Kindle, Plucker, QiOO Mobile, and Plain Text UTF-8), but for only one edition of the book, which is not clearly identified.  It says the editor is William Caxton, who produced an edition in 1485 that has become the basis for most of the editions of the book (the other being the Winchester Manuscript), and contains his Preface, but it also contains a Bibliographic note by A. W. Pollard without identifying him as the editor.  Nor does it contain a publisher or print date beyond the release date of November 2009.  It also lacks any information as to which specific source was the basis for their digitization.  In terms of page layout, the EPUB and Kindle editions specify that there are no images, but whether that has an impact is unclear without a specified edition.  A big frustration when reading online is the lack of page numbers to correspond with the chapter listings in the table of contents, or of hypertext links from the table of contents to those chapters, making it hard to move through the book unless you know the specific page to jump to.  Although there is no specific place on the book page to report errors, the top of the screen does have an “ad” reading: “Did you know that you can help us produce ebooks by proof-reading just one page a day? Go to: Distributed Proofreaders”.  This suggests that they are crowdsourcing their quality assurance process.  The online reader seems to be restricted to viewing only; however, you can download copies of the books to give you the affordances of the other formats (such as Kindle).

Google Books hosts several editions of Le Morte D’Arthur.  One is the Everyman Library edition, also based on the Caxton text, edited by Ernest Rhys and published by J.M. Dent in 1906.  It was sourced from the University of Michigan and is available as an EPUB and a PDF in addition to online viewing.  This edition includes the rather beautifully illustrated title pages; however, one has to scroll past multiple scans of the University of Michigan title plate, blank pages, and this interesting failure in scanning to find it:

[Screenshot of the failed scan, taken 2013-02-05]

It also preserves Caxton’s original preface.  Google Books also hosts another version of Caxton’s text published by bompacrazy.com, which appears to be a scan of a PDF and is just plain text.  There’s also a digireads.com ebook edition for purchase.  Other than reviews, there does not seem to be a system for reporting errors (otherwise, I’d assume someone would already have cut out the excess pages).  Google Books allows you to download, search within, and save a copy to “My Library”; however, it does not allow you to annotate the book.

HATHITrust also has the Rhys editions, but scanned by Google from Cornell University and the University of Virginia in addition to the University of Michigan.  In addition, it has two other 19th century editions: an 1891 Macmillan publication with the Caxton text edited and introduced by Edward Strachey from the Universities of Michigan and Toronto, digitized by Google; and an 1889 Nutt publication in which Caxton’s text is “‘reprinted page for page, line for line’, but in modern type”, edited by Oskar Sommer and introduced by Andrew Lang, from the University of California, digitized by Google.  Each of the editions is only available in PDF format, and for some reason, both Rhys editions are for volume 2, rather than one of each.  Although HATHITrust offers the most viewing options (Classic View, Scroll, Flip, Thumbnails, and Plain Text), the Flip presentation of a book spine and cover is clearly a graphical representation instead of a realistic one.  (I will say that it’s fun to run your cursor over the “pages” and watch the “jump to page __” numbers flip rapidly.  For some reason this strikes me as similar to riffling the pages of a real book.)  Page layouts are preserved, including italics, spacing, and footnotes.  HATHITrust offers a Feedback form if there are any problems with the text, as well as the ability to search, download single pages or the whole document, add the book to a collection (if one has University access to sign in!), or share it with others.  HATHITrust offers a few full text versions, but many were limited to viewing only or to “snippets” of the full text.

The Internet Archive offers the greatest number of formats, with each edition available for download in PDF, EPUB, Kindle, Daisy, Full Text, and DjVu.  It contains the Rhys edition from the University of Michigan as digitized by Google, but also from the University of Toronto and the New York Public Library; the Strachey edition from Stanford Library and the University of California; and the Sommer edition from the Universities of Toronto, Michigan, and Cornell University.  The Internet Archive presents the book as if one were looking at a paper version, with page turns instead of scrolling, in a slightly more realistic way than HATHITrust (and offers the same satisfaction in riffling the pages).  Also, for the Strachey version, it looked as if many of the actual page images were presented instead of just the scanned text; I could clearly see that the bibliographic page in the Stanford book was torn and repaired with tape.  Some pages are badly scanned, with the margins of text cut off or wavy.  However, the marginalia from users has been preserved.

Yet more fingers.

The Internet Archive offers an editable web page on Open Library that seems like the method for users to make changes (such as adding new editions), but I’m not sure if it also acts as an official reporting system for errors.  It allows users to search, bookmark, write reviews, share the book, and have a computer read the text aloud.  Interestingly, when I asked the computer to read aloud, it was forced to spell out “Rhys” rather than pronounce it, but had no trouble pronouncing the words “Igraine” or “pyonce”.  There do not seem to be any restrictions on use, and the site offers “selected metadata” that might be useful for creating databases for further study.

I tested the search features in each library by searching the book for the word “swoon” (since the amount of swooning, primarily among the supposedly noble and heroic knights of the Round Table, surprised me the most when I read the book).  Google Books showed 14 results in the book, with hyperlinks to the individual pages and excerpts from the text to show the context of the word.  HATHITrust showed the word on 13 pages for a total of 15 results, also with hypertext linking and excerpts to show context, although the excerpts were shorter than those in Google Books.  Surprisingly, the Internet Archive produced no results; it did manage to find character names when asked, and provided a popup window of context with links to the individual word searched.  The Kindle download from Project Gutenberg found 25 results, displayed in a sidebar that shows the context and a clickable location; however, the search term is not highlighted on the page when it is brought up, and so it can still take a while to find.
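One possible reason the counts differ from library to library is that each search tool may be counting something slightly different: total hits versus pages with a hit, or the exact word versus related forms such as “swooned”.  The Python sketch below illustrates that idea only; it is not how any of these libraries actually search.  The helper `count_hits` is hypothetical, and the sample “pages” are made-up sentences rather than quotations from Malory.

```python
import re

def count_hits(pages, term, prefix=False):
    """Return (total_hits, pages_with_hits) for a case-insensitive search.

    prefix=False matches the exact word only ("swoon").
    prefix=True also matches longer forms ("swooned", "swooning").
    """
    tail = r"\w*" if prefix else r"\b"
    pattern = re.compile(r"\b" + re.escape(term) + tail, re.IGNORECASE)
    per_page = [len(pattern.findall(page)) for page in pages]
    return sum(per_page), sum(1 for n in per_page if n > 0)

# Invented sample text, one string per "page" (an assumption for illustration).
sample_pages = [
    "Then the king fell in a swoon, and when he awoke of his swoon he wept.",
    "And so she swooned for sorrow.",
    "The knights rode forth at dawn.",
]

print(count_hits(sample_pages, "swoon"))               # (2, 1)
print(count_hits(sample_pages, "swoon", prefix=True))  # (3, 2)
```

The same three pages give two hits on one page or three hits on two pages depending on the matching rule, which is one plausible way the same book could return 14, 15, or 25 results.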

One of the biggest challenges in examining Le Morte D’Arthur was that the different editions were labeled inconsistently in the catalogs.  For example, some editions claimed to have Janet Cowen as the editor, and when opened, turned out to be the Strachey edition.  Still others were not clearly labeled as to which volume they were.  Most concerning is the lack of any particular identifying information about the Project Gutenberg text.  Clearly, digital libraries need to establish the same criteria as print libraries for making sure their catalog databases are precise and accurate.

Moby-Dick: The Whiteness of the Page

My book of choice for any bibliographic project will usually be Moby-Dick. Katie and Susie can both attest to this after having to sit through a semester of me geeking out over the textual history of the novel. Of course, by posting later than some of the others, I can only echo what they have said: Project Gutenberg provides the most formats for a given text, including an audio option, which neither HATHITrust nor Google Books gives you (they only allowed PDF downloads, with HATHITrust requiring permission and Google requiring payment), and it was certainly the easiest to download, because it came with virtually no strings attached. But while I have traditionally always turned to it first for my canonical etext needs, I found it the least transparent of the three versions of Moby-Dick I collected.

For those unfamiliar with Melville scholarship in general, one name pretty much reigns as the foremost editor of Melville’s novels, especially Moby-Dick: Hershel Parker. Since the 60s he has edited three ‘authoritative’ versions of MD that have formed the foundation of most Melville scholarship and editing practices since. As someone heavily invested in Melville, I find Parker’s imprint in nearly every edition I come across, and its absence is suspicious. That absence is not a bad thing in itself, of course, but it raises questions. Project Gutenberg does not note an editor or identify the copy-text in either of its two full-text editions of MD, but instead includes the note:

Produced by Daniel Lazarus, Jonesey, and David Widger

I do not recognize any of the names personally, and these people are not specifically named as editors, so it is difficult to determine what sort of mark they may have left on the text, and without providing information about the copy-text, the text’s specific origins are unknowable to an outsider. Of course, Project Gutenberg provides a (somewhat reasonable) defense for this:

Creating the works from public domain print editions means that no one owns a United States copyright in these works, so the Foundation (and you!) can copy and distribute it in the United States without permission and without paying copyright royalties.

This is what made Project Gutenberg’s text of MD so easy to acquire, versus HATHITrust and Google, who expressed copyright claims to their digital versions and locked the downloads behind certain obstacles. While I can appreciate the reverence paid to access, the unclear provenance of the text, beyond its label as a “public domain text,” gives me no assurance that the copy-text is reliable. This perhaps is fine for a general reader, but unsettling for a scholar.

On the other hand, HATHITrust and Google Books both provide some more concrete information because the book is viewed through images of a scanned hard copy. What is unfortunate is that the public domain editions available on the two platforms were also very dated. HATHITrust’s edition of MD is a 1929 Macmillan edition (which is about the time Melville was rediscovered but well before academics began critically editing his work), and Google Books’ full-text edition is from 1851, the year the book was published. Google’s edition wins, for me at least, because the 1851 edition is more reputable than whatever edition served as the copy-text of Project Gutenberg’s edition, and it stands to reason that it may have served as the copy-text for HATHITrust’s version. Easy access to the first edition of the book leaves scholars with few questions as to what they are working with, and can actually be very useful not only as a text itself, but as an artifact of the novel’s original form (before critical editing).

Of course, I can’t spend all my time musing on editions and validity. The formatting of the texts is also interesting for one major reason: in the Gutenberg edition, since it does not mimic the page scrolling format Google Books and HATHITrust adhere to, we find awkward moments in the text where the body of the text is interrupted by Melville’s footnotes (which he typically wrote in to clarify any esoteric nautical information). In the page scans from the other two databases, this does not occur, because they reproduce the pages and so the text remains in a more traditional form (with footnotes at the bottom, clearly demarcated as outside of the body).

In response to the Duguid article, where one of the primary critiques of Google Books is the poor scanning of pages and distorted words, Google’s edition of MD looks to be pretty polished. In my sampling of the scanned pages, I did not find cut edges, distortions at the spine, or anything of that sort. That problem, however, was prevalent in the HATHITrust version, where the illustrations of the cover page were cut off near the spine, and some marginalia went over the edge of the page (someone made a note on the Table of Contents that spanned the margin between Chapters XIII and XVIII that I think might have said ‘BORING!’, but I cannot be sure).

Finally, in terms of feedback, HATHITrust made the process the easiest by providing, on the same page where the book is read, a little button that opened a survey asking about the quality of the book, where any errors could be reported, including missing, distorted, curved, and blurry text. Google, unfortunately, only allowed users to review the book, and such reviews could be more concerned with plot and enjoyment than with textual quality. Project Gutenberg did not provide any easily accessed method of evaluation, but does include links on the home page to get in contact with them, and to submit missing pages for texts (which I suppose counts as one form of correction).

I was surprised, especially after reading Duguid, by what I found in Google Books. Its images of the Moby-Dick text looked more professional and refined than those of the HATHITrust edition, came from an 1851 first edition, and posed no issues in the formatting of the text. The same could not be said of the HATHITrust and Project Gutenberg versions: the former’s scans were less sophisticated and contained marginalia (incomplete and cut off at that), and the latter posed formatting issues by presenting a text with footnotes incorporated into the body without separating them in any way. As I said, the Duguid article made me fearful of what I would find on Google, and Google’s issues with Tristram Shandy are of course valid concerns. But perhaps Google has learned from or improved its process since that article was published in 2007: while Google Books’ major downside was the lack of a reporting feature, of the three editions I looked at, it was surprisingly the one that needed it the least.

The Marble Faun

For this week’s exercise, I chose Nathaniel Hawthorne’s The Marble Faun, because I’m currently reading the text in book form and thought it would be interesting to compare the digital versions alongside my current “textual” reading experience. The text was readily available in multiple formats on Project Gutenberg, HATHITrust and Google Books. As Cliffie noted, Project Gutenberg offers the most versions available for download, though HATHITrust also offers versions for PDF download with a “partner login.” I explored this option since I figured the university would be affiliated, and I was correct. After logging in with UMD, I was able to download a full PDF of the text. Google, too, offers PDFs of certain texts for download, as well as ebooks (free or at cost) through Google Play.

Because several different versions of the text were available through each platform, various sources were available. The Project Gutenberg eBook did not specify which copy-text it reproduced, but rather cited its own 2006 release date and noted its being “Produced by” Michael Pullen and David Widger, who I would presume are the text’s editors. The Google Book I chose was a Penguin Classics version, which clearly (because the pages of the original text were reproduced and therefore reflected typical publication details) stated its copyright, editors, publishers, etc. The Penguin Classics version is that of the Centenary Edition of the Works of Nathaniel Hawthorne, associated with the Ohio State University Press. The “Two Volumes in One” edition of The Marble Faun I eventually settled on from HATHITrust (there were 3 pages of options) was an “Illustrated Library Edition” published in 1876 by James R. Osgood and Company; the digitized version was provided by Google Books and the original came from the University of Virginia (both institutions were cited on each page with a digital watermark). Out of curiosity, I checked my Oxford World’s Classics version, which, like the Penguin Classics, comes from the Centenary Edition of NH’s works and is reproduced with the permission of Ohio State University Press.

The PG eBook has little to no formatting in terms of “design,” but pages must be clicked through. The “click-through” versus “scroll” layout is interesting, since it is perhaps closer to the feeling of turning a page. Some “pages” are longer than others, but I couldn’t seem to pinpoint why—chapter divisions didn’t dictate this, since not all started on a new “page” but were rather just denoted with a title and break. Paragraphs, however, were never broken up, and neither were sentences. This, I should think, does aid in a continuity of reading. The Google Books Penguin Classics edition replicates the textual layout very accurately, though I’ve just noticed it’s not a full preview. I’ve switched over to a Houghton & Mifflin version from 1900, which, in terms of format, is more interesting anyway. Though there are clearly scanning issues (crooked pages, etc.) illustrations are reproduced, as are original (though original with whom, who knows) underlines and marginalia. This text comes from the University of Wisconsin, and has clearly been read—and annotated—before. The HATHITrust Marble Faun didn’t seem to have many formatting issues, though this version was the slowest to load. The pages were more “centered” than the Google Books version (better scanning/uploading?), but the text was denser (inky, almost) and slightly harder to read.

In terms of the viewing setup, I liked the HATHITrust options for “Classic,” “Scroll,” “Flip,” “Thumbnail,” and “Plain Text” views. “Flip” is almost comical in its cartoonish reproduction of a book (though the pages then become so small that you wouldn’t be able to read the text), while “Plain Text” is more like PG’s formatting. “Classic” and “Scroll” are the easiest for reading, though I did use “Thumbnail” view to check out all of the prefatory pages at once.

As far as I could tell, none of these platforms allowed for a reporting of errors. The closest option is that Google Books allows you to “review” the text, so I suppose one could also report frustrations with errors, etc., if only for other potential readers. I’ve already mentioned some features I like (the HATHITrust viewing options), but each platform has several functional perks. I don’t have a Kindle, but PG’s Kindle downloads are clearly a useful resource, since Kindles allow you to keep the text in your own collection (on a single device) and annotate as you please (depending on the version of your Kindle). If reading the eBook version of a PG text online, you can keep “bookmarks,” but I wasn’t quite sure how this worked: whether you could bookmark pages within a text, or only texts themselves. When I clicked “My Bookmarks,” PG remembered which texts I was reading (Volumes I and II of The Marble Faun) but it didn’t seem to notice which page I was on. PG allows one to “Go To” a certain page, but there aren’t any search features for finding certain words or phrases within the text. Google Books and HATHITrust offer many more search options. With Google Books, there is a simple search bar for finding words or phrases (which then appear highlighted in yellow and noted in the scrolling bar). Google Books converts chapters into hyperlinks on the contents page, so that you can jump to various chapters and sections. You can also access these jumps via a drop-down bar above the text. With a Google account, you can add books to your library and view your history, you can make lists, such as “Favorites,” “To Read,” “Reading Now” and “Have Read,” and like I mentioned before, you can write reviews. Many of these features are replicated with HATHITrust, and there’s also a “Share” feature in the left-hand column. 
I would imagine it’s easy enough to copy the link to a PG or Google Book, but I thought it was interesting that HATHITrust supplies a “Permanent link” for each of its texts, in clear view for the reader.

Aside from the Preview restrictions I experienced with the Penguin Classics version I originally viewed with Google Books, I didn’t experience any restrictions. It’s nice working in the 19th century, because so many things are part of the Public Domain (my HATHITrust version of The Marble Faun noted this, with a link to explain the details of the Public Domain) and available through (very) open access. I particularly enjoyed PG’s note to readers, “This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever.”

I agree with Kathryn’s concluding comments below: online reading seems to have a long way to go. Right now, I think these sources are excellent for those readers who wish to read digitally, but not necessarily academically. This is a bit of a personal preference, but I’ve used online resources far more often for critical texts I want to preview and search for themes and terms, rather than for full literary texts I wish to read from start to finish. I remember once when I was abroad reading an entire collection of George Moore’s short stories on PG, but that was because I didn’t want to purchase more books than I could take home with me (which does point to the financial and material benefit of these online resources). But in terms of my anecdotal introduction, I will definitely finish The Marble Faun with my text edition from Oxford World’s Classics, which I can carry with me (I don’t have an ereader), annotate and keep on a physical bookshelf.

THE HOUSE OF M1KTH: Digital Wharton

I decided to base my digital bibliography exercise on Edith Wharton’s The House of Mirth. Of the three databases I chose for my exercise (Google Books, HATHITrust, and Project Gutenberg), I’m most familiar with Google Books, so I decided to go there first. I entered in my search terms and got two actual results (i.e. Wharton’s text, and not texts about Wharton’s text). The first one listed was the full text of Wharton’s The House of Mirth (with illustrations by A. B. Wenzell), published by Charles Scribner’s Sons in 1905. Google offered two versions of this edition of Mirth for download, EPUB and PDF. The second search result was a 2007 Digireads.com version that cost $2.99 to download. While the Google Books PDF was free and a fast download, I was pretty annoyed to discover that I couldn’t search the text – I tried on my work computer (which uses Windows) with Adobe Reader and my MacBook with both Preview and Adobe Reader.

Although the online Google version was searchable, since there were no ways to highlight or annotate, it didn’t seem very useful beyond yanking quotes out of the depths of the novel for use in other projects (which is actually how I tend to use Google Books). Indeed, Google even seems somewhat prepared for this – their primary source of textual manipulation (when viewing the book on my Mac – this feature disappeared on my work computer) is the ability to ‘clip’ a line into plain text format, a link to an image of the selected text, or a link to embed the text. While it might be neat to generate a digital image of the text, it actually limits the user to ‘clipping’ in rectangular forms only, meaning you can’t carry over onto the next line unless you want additional words from surrounding sentences caught in the rectangular clipping field. I’m not sure what the point of this clipping is – I really don’t think I’ve ever seen someone use it (or so rarely that I can’t recall). Google also allows you to generate a link for the specific page of text that you are currently reading, almost as a digital bookmark for later citations. There didn’t seem to be any ways to report errors for Google beyond writing a review for the text, but that leaves me questioning: what is a book review supposed to review? The actual content of the novel penned by Wharton? Or the scanning quality of the book? I’ve seen this happen on Amazon for Kindle versions a few times – people give a book low reviews based on the amount of grammatical and/or digital formatting errors, which confuses/frustrates those who are interested in the quality of the story.

Next up was HATHITrust, which I’ve encountered briefly before. I got a little lost the last time I was searching around for quick text downloads (actually, for Woodchipper, a data-mining tool we used in Technoromanticism), which turned me off to the site initially. However, when I searched for Wharton’s text on HATHI, I got four full-text hits for four different editions of Mirth: C. Scribner’s Sons (1905), C. Scribner’s Sons (1922), C. Scribner’s Sons (1933), and First Scribner/Macmillan Hudson River Edition (1989). When I clicked on the 1905 edition, I discovered that it was the same digital text that I encountered on Google Books (except for a badly digitized front cover scan). It even had the same pink thumbtip of a careless scanner in the bottom corner of a page! However, HATHITrust includes a watermark next to the “Digitized by Google” that reads “Original from UNIVERSITY OF CALIFORNIA.” I re-checked the Google Books version, and there is no such notation made for the edition’s provenance, which is odd, since it appears to be the same exact book and scans. HATHI credited all of the universities that held the physical copies of Mirth contained in their database (two from UC, one from University of Virginia, and one from University of Michigan). It also revealed that all four digital texts were “Digitized by Google.” So… why weren’t they all available on Google Books?

Also, since the one version I was most interested in obtaining in PDF form (the 1905 one) was also offered on Google Books, I found it a bit silly that I had to log-in via UMD partnership in order to download it. It was a long process of “Building” the PDF, then downloading it, all to obtain pretty much the same text as Google. I was able to search the HATHI PDF on my work computer using Adobe Reader in a hit-or-miss fashion (I was sent to the correct page with a box appearing roughly around the portion of text that contained my search term), but I was unable to search it at home using my MacBook with either Adobe Reader or Preview. In HATHI’s site version I thought it was interesting that I could toggle between views (Classic and Plain Text), which might have made searching easier (otherwise the site just directs you to the right page with no highlights or line indicators), but the very first time I tried toggling over to Plain Text, I caught a number of typos on the page I happened to have open, the most glaring being the running head, which read: THE HOUSE OF M1KTH. HATHI does have a Feedback link at the bottom of the page that allows for error reporting, though I’m not sure I would have the will to submit a new one for each Plain Text page.

Like Clifford, I found Project Gutenberg to offer the most variety in file formats, and like her, found the image-lacking disclaimer pointless, as the HTML and plain text versions did not contain images either. Project Gutenberg offered HTML, EPUB (no images), Kindle (no images), Plucker, QiOO Mobile, Plain Text UTF-8, and MP3 files of The House of Mirth; for my purposes I converted the HTML version to a PDF file, one which (finally!) is fully searchable. Unlike either Google Books or HATHI, there seems to be no printed referent for Project Gutenberg’s text. The only noted provenance is a release date of the digital text (June 1, 1995) and a few notes at the end of the text:

Notes:
1. I have modernized this text by modernizing the contractions: do n’t becomes don’t, etc.
2. I have retained the British spelling of words like favour and colour.
3. I found and corrected one instance of the name “Gertie,” which I changed to “Gerty” to be consistent with rest of the book.
-Linda Ruoff

There is also a notice at the end of the text that “Updated editions will replace the previous one–the old editions will be renamed.” It almost seems as if Project Gutenberg is leaving little to no room for discussion on authoritative editions, variants, and the like (though you are free to email them with errors you may discover). There also appears to be no interest in preserving a digital transmission history of their edition of House of Mirth, as any discrepancies will be obliterated with no discernible trace (unless you leave a note, as Linda Ruoff did).

All in all, in order to accomplish the two things I want most in a digital text (searchability – a digital affordance, and writeability – a print affordance), I had to save a PDF file from an HTML version of The House of Mirth – one that had no perceivable basis in print. Project Gutenberg’s version is pure text, no book, which leaves me wondering: how would I cite these quotes that I am able to find at a moment’s notice? Would I have to turn around and utilize Google Books’ scans to pin specific quotes to page numbers? Makes one wonder, are Post-It Flags really so terrible?

Exploring _The Castle of Otranto_

The book that I have chosen to investigate on Project Gutenberg, Google Books, HATHITrust, and the Internet Archive is Horace Walpole’s The Castle of Otranto (1764). Given that the author of the text claimed to be a translator by the name of William Marshall who had recovered the text (said to have been originally printed in 1529) from obscurity in an old library in England and reprinted it for public dissemination, I thought this made The Castle of Otranto an interesting choice (my love of early Gothic literature aside). For as we all know, one important role that digital archivists play involves the rescuing of obscure texts, which are then scanned and put on the web for public consumption. In terms of availability, all four of the digital archives mentioned above have copies of The Castle of Otranto. On Project Gutenberg, the text is available in HTML, EPUB (with images), EPUB (no images), Kindle (with images), Kindle (no images), Plucker, QiOO Mobile, PDF, and Plain Text UTF-8. The editions and provenances vary from archive to archive. In the Internet Archive, you can find a version of the novel that is the third edition and that comes from the Bodleian Library at Oxford with a date stamp of 27 Oct 1930. There is also an edition from the University of Toronto library. On Google Books, there are versions from the Stanford University Library, the Library of the University of Michigan, and the same third edition scan from the Bodleian Library that can be found at the Internet Archive. In the HATHITrust Digital Library, one can find the University of Michigan version, as well as versions from the University of California (published in 1823), Princeton University (1811), and Indiana University (1854). The version available on Project Gutenberg appears to be the 1901 version taken from the Library of the University of Michigan. 
There definitely seems to be a lot of overlap between these digital archives, though from my examinations of the sites, it appears that HATHITrust has the best range of copies since they date back to 1811.

The first result you get when you search for The Castle of Otranto on Google Books is also perhaps the worst copy available. After you get the cover, you have to scroll down through several scans of a woman’s hand to get to the actual title page. Even then, there are still occasional fingers or dark ink splotches that cover up parts of the text. If someone actually wanted to read this version, it would be possible, as long as they could fill in the blanks caused by the more damaged scans. Ink splotches appear in several other versions, and in some copies the text is cut off at the sides. Each of the versions seems to have little quirks like dirty pages, ink splotches, or text that is blocked by mysterious rectangle-shaped objects. Overall, however, as I said, the text tends to remain readable for the most part. I wouldn’t say these are the best scans ever, but given the number of texts being scanned and the fact that we are in the midst of the transition to digital archives, rather than approaching the final stages of completion, I would say that the texts serve their purpose at a very basic level. The ability to perform searches within the text is a feature that has definitely been helpful for me as an academic. Reading The Mysteries of Udolpho by Ann Radcliffe and then trying to go back and find a quote that I didn’t highlight because I did not think it was useful at the time is not a fun task. Digital Libraries like Google Books, HATHITrust, and the Internet Archive, which allow you not only to find words quickly but also to see their context before you go to the actual page the word is on, are definitely a blessing for the toiling scholar.

One of the things that I found most interesting about the Internet Archive is the ability to read the actual book online. The archive is set up to present the book in such a way that makes you feel as if you are actually reading the book itself, rather than just scrolling down a screen. It keeps several of the affordances of the book, such as the comparative space, and gives you the illusion of a three-dimensional object as you “flip” through the pages. This is nice for a reader wanting the experience of the actual text and the comparative space is definitely a plus, but such a skeuomorphic design does little to utilize the affordances of the digital archive. Several of the other versions allow you to click through the pages, but most often this still gives you one page at a time, and as with Google Books, there is still some scrolling involved to see the full text. Of course, the option to download on each of the Digital Libraries lets you make the page bigger or smaller as you like so you can use the page up and page down keys.

As I just stated, each of these sites allows you to download the text. However, if you prefer to stay digital, Google Books lets you compile a “library” of books and HATHITrust lets you create a “Collection” of books. In terms of making these texts writable as well as readable, I did not find any options to annotate any of the versions of my text. Additionally, only authorized users seem to be able to add texts to the digital libraries, making this an exclusive project that is available for consumption by readers, but not open for reciprocity. Along those lines, I did see a link to provide feedback on HATHITrust and report any errors or trouble with the text. As for Google Books and the Internet Archive, I did not see any link for feedback, but there are links set up where readers can write reviews of the text. I imagine these reviews could both be for the book itself and the quality of the scans. However, I do not know if the people who are able to make changes to the texts will actually be reading those reviews. I did not find any way of providing feedback on Project Gutenberg.

The advent of Digital Libraries is a wonderful thing. However, judging from the somewhat obscured scans, the inability to “write” on the texts, and the limited capability for providing feedback that will go directly to the people in charge of the scanning process, there is still much work to be done. As I stated above, I see us in the middle of a transition to Digital Libraries and engaged in work that is nowhere near completion. As time progresses, I hope to see more innovative archives that better utilize the affordances of the web to make texts that are writable as well as readable, and that allow us to research and analyze texts in new ways that would not be possible away from a computer.

Yes, there’s an award for that…

You can now cast your vote for the best digital projects and contributions to the field of DH in 2012.  Voting is open to anyone.  To learn more about these new awards, see the slate of nominees in various categories, and ultimately cast your vote, go to: http://dhawards.org/dhawards2012/voting/

But the ballot is good for more than just voting; it seems to me that it could also serve as a nice introduction to current work in the field.  The slate of nominees was distilled from public submissions by a nominating committee, and includes MITH’s own Amanda Visconti as well as the Bamboo DiRT project.

The voting is open to anyone, and it will be interesting to see how the awards play out, given that there is no way to enforce that voters actually look at all the nominations (ah, democracy…).  The question of this being just a popularity contest is confronted in the Awards FAQ (http://dhawards.org/faqs/):

Doesn’t that just turn it into a popularity contest? In some ways, yes, it does. The other alternative would be to have the winners decided by a shadowy oligarchy. DH Awards was set up intentionally as a community-nominated and community-voted form of recognition. If we start controlling who has the right to vote it undermines this.

This is, I think, a conundrum worthy of some further discussion. Are there really only these two choices (= popularity contest or shadowy oligarchy)?  What are awards determined by this procedure likely to reward?  Is there a better way to choose projects for recognition?  What additional importance does this selection procedure lend to the social aspects of DH?