War of the EBooks

I tried to be a scifi nerd and use Neuromancer for this exercise, but I had to settle for War of the Worlds. Doesn’t make for the catchiest blog post title, but what are you gonna do.

Project Gutenberg offers H.G. Wells’ The War of the Worlds in HTML, EPUB, Kindle, Plucker, QiOO Mobile, Plain Text UTF-8, and several kinds of zip files. It can also be read online as an EBook, although it is immensely frustrating to read that way as it is formatted into chunky paragraphs requiring links to the previous or following pages. According to Project Gutenberg it is EBook #36, released in 1992 and updated in 2008. The site allows the user to create bookmarks on the “pages”. Unlike the other sites, it notes that the user “can help us produce ebooks by proof-reading just one page a day” (http://www.gutenberg.org/catalog/world/readfile?fk_files=1697601&pageno=2).

HATHITrust offers downloadable PDFs of single pages without a log-in and a full downloadable PDF for members, as well as an online view of the Bernhard Tauchnitz Leipzig edition. HATHITrust offers two dates: “1898 [i.e. 1929?]”. The online version is originally from the University of Virginia, digitized by Google Books. It allows you to search the book or jump to different sections, to render it in plain text, to share a link to the book or to a single page, to view the book in “Flip” or “Scroll” mode or with thumbnails of the pages, and to create new collections of books with a member log-in. The site notes that the book is public domain in the United States, although “Google requests that the images and OCR not be re-hosted, redistributed, or used commercially” (http://www.hathitrust.org/access_use#pd-us-google).

Google Books offers EPUB and PDF downloads with both “Flowing Text” and “Scanned Pages.” It can be read in plain text and the user can “Advance Search” the book for specific phrases. Google offered the widest variety of editions, from a limited view of a 2012 edition to a full view of an 1898 illustrated edition published by Harper & Brothers in New York. The latter came from the Pennsylvania State University Library, and has the entirety of the table of contents in hyperlinks, which was the first instance of this I noticed in browsing several editions and which makes navigation quite easy. Unfortunately the book does not offer any information about the illustrator, but it contains a frontispiece of H.G. Wells and a number of beautifully drawn and rendered bluish black and white images that scanned crisply.

The frontispiece from The War of the Worlds. Unfortunately I could not find any information about the illustrator.

At the end of this copy is a library binders’ mark from August 3, 1967, in Philipsburg, Pennsylvania. Also contained at the end of the book was the mostly blank “Date Due” card, containing crossed out dates from 1993. Lastly, and most fun for me, there are no less than 5 scanned images of the book’s maroon back cover and bar code, two of which have the archivist’s bright pink latex glove in the corner and two of which were captured when the book was in the process of being opened and flipped over, with a black and white checkered pattern on the edge from what I am assuming is the inside cover of the book.

The back cover of H.G. Wells’ The War of the Worlds, as seen in Google Books.

A pink Martian’s…errrr, archivist’s thumb on the back cover of War of the Worlds.

Google allows the user to search the book and write a review, and offers perhaps the most flexible interface with multiple page views of the book, the ability to “cut” or highlight sections of pages, and a zoom tool. The site restrictions and terms of service state that this “copy and paste” function needs to be “used within the prescribed limits and only for personal non-commercial purposes” (http://books.google.com/intl/en/googlebooks/tos.html). Google watermarks also may not be removed from the digital content.

I found Google Books to be the most versatile interface for viewing and downloading this book. While the Kindle edition I downloaded from Project Gutenberg was readable and there didn’t seem to be huge issues with it in terms of formatting, I found myself annoyed by the fact that new chapters don’t start on new pages. On all of these sites, it was hard to find information about access to these books for people with disabilities.

La Mort D’Impression? : How Google (and others) Digitize Le Morte D’Arthur

(Apologies if the French translation is off–I don’t speak it and am relying on a machine translation (and I’m sure Julia can tell us why that’s a bad idea!))

Since my interests lie more heavily in the still-copyrighted 20th century, I turned to my other love of Arthurian legends for this task.  Specifically, I looked at the seminal collection of French (and one Middle English) tales written into English as Le Morte D’Arthur by Sir Thomas Malory, which was available in all 4 digital libraries.  I chose to focus on Volume 1 to narrow down the information and compare the resources.

Project Gutenberg offered the second-greatest number of formats (HTML, EPUB, Kindle, Plucker, QiOO Mobile, and Plain Text UTF-8), but for only one edition of the book, which is not clearly identified.  It says the editor is William Caxton, who produced an edition in 1485 that has become the basis for most of the editions of the book (the other being the Winchester Manuscript), and contains his Preface, but it also contains a Bibliographic note by A. W. Pollard without identifying him as the editor.  Nor does it contain a publisher or print date beyond the release date of November 2009.  It also lacks any information as to which specific source was the basis for their digitization.  In terms of page layout, the EPUB and Kindle editions specify that there are no images, but whether that has an impact is unclear without a specified edition.  A big frustration when reading online is the lack of page numbers to correspond with the chapter listings in the table of contents, if not hypertext links from the table of contents to those chapters, making it hard to move through the book unless you know the specific page to jump to.  Although there is no specific place on the book page to report errors, the top of the screen does have an “ad” reading: “Did you know that you can help us produce ebooks by proof-reading just one page a day? Go to: Distributed Proofreaders”.  This suggests that they are crowdsourcing their quality assurance process.  The online reader seems to be restricted to viewing only; however, you can download copies of the books to give you the affordances of the other formats (such as Kindle).

Google Books hosts several editions of Le Morte D’Arthur.  One is the Everyman Library edition, also based on the Caxton text, edited by Ernest Rhys and published by J.M. Dent in 1906.  It was sourced from the University of Michigan and is available as an EPUB and a PDF in addition to online viewing.  This edition includes the rather beautifully illustrated title pages; however, one has to scroll past multiple scans of the University of Michigan title plate, blank pages, and this interesting failure in scanning to find it:

[Image: screenshot of the scanning failure]

It also preserves Caxton’s original preface.  Google Books also hosts another version of Caxton’s text published by bompacrazy.com, which appears to be a scan of a PDF and is just plain text. There’s also a digireads.com ebook edition for purchase.  Other than reviews, there does not seem to be a system for reporting errors (otherwise, I’d assume someone would already have cut out the excess pages).  Google Books allows you to download, search within, and save a copy to “My Library”; however, it does not allow you to annotate the book.

HATHITrust also has the Rhys editions, but scanned by Google from Cornell University and the University of Virginia in addition to the University of Michigan.  In addition, it has two other 19th century editions: an 1891 Macmillan publication with the Caxton text edited and introduced by Edward Strachey from the Universities of Michigan and Toronto, digitized by Google; and an 1889 Nutt publication in which Caxton’s text is “‘reprinted page for page, line for line’, but in modern type”, edited by Oskar Sommer and introduced by Andrew Lang, from the University of California, digitized by Google.  Each of the editions is only available in PDF format, and for some reason, both Rhys editions are for volume 2, rather than one of each.  Although HATHITrust offers the most viewing options (Classic View, Scroll, Flip, Thumbnails, and Plain Text), the Flip presentation of a book spine and cover is clearly a graphical representation instead of a realistic one.  (I will say that it’s fun to run your cursor over the “pages” and watch the “jump to page __” numbers flip rapidly.  For some reason this strikes me as similar to riffling the pages of a real book.)  Page layouts are preserved, including italics, spacing, and footnotes.  HATHITrust offers a Feedback form if there are any problems with the text, as well as the ability to search, download single pages or the whole document, add the book to a collection (if one has University access to sign in!), or share it with others.  HATHITrust offers a few full text versions, but many were limited to viewing only or to “snippets” of the full text.

The Internet Archive offers the greatest number of formats, with each edition available for download in PDF, EPUB, Kindle, Daisy, Full Text, and DjVu.  It contains the Rhys edition from the University of Michigan as digitized by Google, but also from the University of Toronto and the New York Public Library; the Strachey edition from Stanford Library and the University of California; and the Sommer edition from the Universities of Toronto, Michigan, and Cornell University.  The Internet Archive presents the book as if one were looking at a paper version, with page turns instead of scrolling, in a slightly more realistic way than HATHITrust (and offers the same satisfaction in riffling the pages).  Also, for the Strachey version, it looked as if many of the actual page images were presented instead of just the scanned text; I could clearly see that the bibliographic page in the Stanford book was torn and repaired with tape.  Some pages are badly scanned, with the margins of text cut off or wavy.  However, the marginalia from users has been preserved.

Yet more fingers.

The Internet Archive offers an editable web page on Open Library that seems like the method for users to make changes (such as adding new editions), but I’m not sure if it also acts as an official reporting system for errors.  It allows users to search, bookmark, write reviews, share the book, and have a computer read the text aloud.  Interestingly, when I asked the computer to read aloud, it was forced to spell out “Rhys” rather than pronounce it, but had no trouble pronouncing the words “Igraine” or “pyonce”.  There do not seem to be any restrictions on use, and the site offers “selected metadata” that might be useful for creating databases for further study.

I tested the search features in each library by searching the book for the word “swoon” (since the amount of swooning, primarily among the supposedly noble and heroic knights of the Round Table, surprised me the most when I read the book).  Google Books shows 14 results in the book with hyperlinks to the individual pages and excerpts from the text to show the context of the word.  HATHITrust showed the word on 13 pages for a total of 15 results, also with hypertext linking and excerpts to show context, although the excerpts were shorter than those in Google Books.  Surprisingly, the Internet Archive produced no results; it did manage to find character names when asked, and provided a popup window of context with links to the individual word searched.  The Kindle download from Project Gutenberg found 25 results, displayed in a sidebar which shows the context and the location, which can be clicked on; however, the search term is not highlighted on the page when it is brought up, and so it can still take a while to find.

One of the biggest challenges in examining Le Morte D’Arthur was that the different editions were labeled inconsistently in the catalogs.  For example, some editions claimed to have Janet Cowen as the editor, and when opened, turned out to be the Strachey edition.  Still others were not clearly labeled as to which volume they were.  Most concerning is the lack of any particular identifying information about the Project Gutenberg text.  Clearly, digital libraries need to establish the same criteria as print libraries for making sure their catalog databases are precise and accurate.

The Marble Faun

For this week’s exercise, I chose Nathaniel Hawthorne’s The Marble Faun, because I’m currently reading the text in book form and thought it would be interesting to compare the digital versions alongside my current “textual” reading experience. The text was readily available in multiple formats on Project Gutenberg, HATHITrust and Google Books. As Cliffie noted, Project Gutenberg offers the most versions available for download, though HATHITrust also offers versions for PDF download with a “partner login.” I explored this option since I figured the university would be affiliated, and I was correct. After logging in with UMD, I was able to download a full PDF of the text. Google, too, offers PDFs of certain texts for download, as well as ebooks (free or at cost) through Google Play.

Because several different versions of the text were available through each platform, various sources were available. The Project Gutenberg eBook did not specify which copy-text it reproduced, but rather cited its own 2006 release date and noted its being “Produced by” Michael Pullen and David Widger, who I would presume are the text’s editors. The Google Book I chose was a Penguin Classics version, which clearly (because the pages of the original text were reproduced and therefore reflected typical publication details) stated its copyright, editors, publishers, etc. The Penguin Classics version is that of the Centenary Edition of the Works of Nathaniel Hawthorne, associated with the Ohio State University Press. The “Two Volumes in One” edition of The Marble Faun I eventually settled on from HATHITrust (there were 3 pages of options) was an “Illustrated Library Edition” published in 1876 by James R. Osgood and Company; the digitized version was provided by Google Books and the original came from the University of Virginia (both institutions were cited on each page with a digital watermark). Out of curiosity, I checked my Oxford World Classics version, which, like the Penguin Classics, comes from the Centenary Edition of NH’s works and is reproduced with the permission of Ohio State University Press.

The PG eBook has little to no formatting in terms of “design,” but pages must be clicked through. The “click-through” versus “scroll” layout is interesting, since it is perhaps closer to the feeling of turning a page. Some “pages” are longer than others, but I couldn’t seem to pinpoint why: chapter divisions didn’t dictate this, since not all started on a new “page” but were rather just denoted with a title and break. Paragraphs, however, were never broken up, and neither were sentences. This, I should think, does aid in a continuity of reading. The Google Books Penguin Classics edition replicates the textual layout very accurately, though I’ve just noticed it’s not a full preview. I’ve switched over to a Houghton & Mifflin version from 1900, which, in terms of format, is more interesting anyway. Though there are clearly scanning issues (crooked pages, etc.), illustrations are reproduced, as are original (though original with whom, who knows) underlines and marginalia. This text comes from the University of Wisconsin, and has clearly been read (and annotated) before. The HATHITrust Marble Faun didn’t seem to have many formatting issues, though this version was the slowest to load. The pages were more “centered” than the Google Books version (better scanning/uploading?), but the text was denser (inky, almost) and slightly harder to read.

In terms of the viewing setup, I liked the HATHITrust options for “Classic,” “Scroll,” “Flip,” “Thumbnail,” and “Plain Text” views. “Flip” is almost comical in its cartoonish reproduction of a book (though the pages then become so small that you wouldn’t be able to read the text), while “Plain Text” is more like PG’s formatting. “Classic” and “Scroll” are the easiest for reading, though I did use “Thumbnail” view to check out all of the prefatory pages at once.

As far as I could tell, none of these platforms allowed for a reporting of errors. The closest option is that Google Books allows you to “review” the text, so I suppose one could also report frustrations with errors, etc., if only for other potential readers. I’ve already mentioned some features I like (HATHITrust viewing options), but each platform has several functional perks. I don’t have a Kindle, but PG’s Kindle downloads are clearly a useful resource, since Kindles allow you to keep the text in your own collection (on a single device) and annotate as you please (depending on the version of your Kindle). If reading the eBook version of a PG text online, you can keep “bookmarks,” but I wasn’t quite sure how this worked: whether you could bookmark pages within a text, or only texts themselves. When I clicked “My Bookmarks,” PG remembered which texts I was reading (Volumes I and II of The Marble Faun) but it didn’t seem to notice which page I was on. PG allows one to “Go To” a certain page, but there aren’t any search features for finding certain words or phrases within the text. Google Books and HATHITrust offer many more search options. With Google Books, there is a simple search bar for finding words or phrases (which then appear highlighted in yellow and noted in the scrolling bar). Google Books converts chapters into hyperlinks on the contents page, so that you can jump to various chapters and sections. You can also access these jumps via a drop-down bar above the text. With a Google account, you can add books to your library and view your history, you can make lists, such as “Favorites,” “To Read,” “Reading Now” and “Have Read,” and like I mentioned before, you can write reviews. Many of these features are replicated with HATHITrust, and there’s also a “Share” feature in the left-hand column.
I would imagine it’s easy enough to copy the link to a PG or Google Book, but I thought it was interesting that HATHITrust supplies a “Permanent link” for each of its texts, in clear view for the reader.

Aside from the Preview restrictions I experienced with the Penguin Classics version I originally viewed with Google Books, I didn’t experience any restrictions. It’s nice working in the 19th century, because so many things are part of the Public Domain (my HATHITrust version of The Marble Faun noted this, with a link to explain the details of the Public Domain) and available through (very) open access. I particularly enjoyed PG’s note to readers, “This eBook is for the use of anyone anywhere at no cost and with almost no restrictions whatsoever.”

I agree with Kathryn’s concluding comments below–online reading seems to have a long way to go. Right now, I think these sources are excellent for those readers who wish to read digitally, but not necessarily academically. This is a bit of a personal preference, but I’ve used online resources far more often for critical texts I want to preview and search for themes and terms, rather than for full literary texts I wish to read from start to finish. I remember once when I was abroad reading an entire collection of George Moore’s short stories on PG, but that was because I didn’t want to purchase more books than I could take home with me (which does point to the financial and material benefit of these online resources). But in terms of my anecdotal introduction, I will definitely finish The Marble Faun with my text edition from Oxford World Classics, which I can carry with me (I don’t have an ereader), annotate and keep on a physical bookshelf.

THE HOUSE OF M1KTH: Digital Wharton

I decided to base my digital bibliography exercise on Edith Wharton’s The House of Mirth. Of the three databases I chose for my exercise (Google Books, HATHITrust, and Project Gutenberg), I’m most familiar with Google Books, so I decided to go there first. I entered in my search terms and got two actual results (i.e. Wharton’s text, and not texts about Wharton’s text). The first one listed was the full text of Wharton’s The House of Mirth (with illustrations by A. B. Wenzell), published by Charles Scribner’s Sons in 1905. Google offered two versions of this edition of Mirth for download, EPUB and PDF. The second search result was a 2007 Digireads.com version that cost $2.99 to download. While the Google Books PDF was free and a fast download, I was pretty annoyed to discover that I couldn’t search the text – I tried on my work computer (which uses Windows) with Adobe Reader and my MacBook with both Preview and Adobe Reader.

Although the online Google version was searchable, since there were no ways to highlight or annotate, it didn’t seem very useful beyond yanking quotes out of the depths of the novel for use in other projects (which is actually how I tend to use Google Books). Indeed, Google even seems somewhat prepared for this – their primary source of textual manipulation (when viewing the book on my Mac – this feature disappeared on my work computer) is the ability to ‘clip’ a line into plain text format, a link to an image of the selected text, or a link to embed the text. While it might be neat to generate a digital image of the text, it actually limits the user to ‘clipping’ in rectangular forms only, meaning you can’t carry over onto the next line unless you want additional words from surrounding sentences caught in the rectangular clipping field. I’m not sure what the point of this clipping is – I really don’t think I’ve ever seen someone use it (or so rarely that I can’t recall). Google also allows you to generate a link for the specific page of text that you are currently reading, almost as a digital bookmark for later citations. There didn’t seem to be any ways to report errors for Google beyond writing a review for the text, but that leaves me questioning: what is a book review supposed to review? The actual content of the novel penned by Wharton? Or the scanning quality of the book? I’ve seen this happen on Amazon for Kindle versions a few times – people give a book low reviews based on the amount of grammatical and/or digital formatting errors, which confuses/frustrates those who are interested in the quality of the story.

Next up was HATHITrust, which I’ve encountered briefly before. I got a little lost the last time I was searching around for quick text downloads (actually, for Woodchipper, a data-mining tool we used in Technoromanticism), which turned me off to the site initially. However, when I searched for Wharton’s text on HATHI, I got four full-text hits for four different editions of Mirth: C. Scribner’s Sons (1905), C. Scribner’s Sons (1922), C. Scribner’s Sons (1933), and First Scribner/Macmillan Hudson River Edition (1989). When I clicked on the 1905 edition, I discovered that it was the same digital text that I encountered on Google Books (except for a badly digitized front cover scan). It even had the same pink thumbtip of a careless scanner in the bottom corner of a page! However, HATHITrust includes a watermark next to the “Digitized by Google” that reads “Original from UNIVERSITY OF CALIFORNIA.” I re-checked the Google Books version, and there is no such notation made for the edition’s provenance, which is odd, since it appears to be the same exact book and scans. HATHI attributed all of the universities that held the physical copies of Mirth contained in their database (two from UC, one from University of Virginia, and one from University of Michigan). It also revealed that all four digital texts were “Digitized by Google.” So… why weren’t they all available on Google Books?

Also, since the one version I was most interested in obtaining in PDF form (the 1905 one) was also offered on Google Books, I found it a bit silly that I had to log-in via UMD partnership in order to download it. It was a long process of “Building” the PDF, then downloading it, all to obtain pretty much the same text as Google. I was able to search the HATHI PDF on my work computer using Adobe Reader in a hit-or-miss fashion (I was sent to the correct page with a box appearing roughly around the portion of text that contained my search term), but I was unable to search it at home using my MacBook with either Adobe Reader or Preview. In HATHI’s site version I thought it was interesting that I could toggle between views (Classic and Plain Text), which might have made searching easier (otherwise the site just directs you to the right page with no highlights or line indicators), but the very first time I tried toggling over to Plain Text, I caught a number of typos on the page I happened to have open, the most glaring being the running head, which read: THE HOUSE OF M1KTH. HATHI does have a Feedback link at the bottom of the page that allows for error reporting, though I’m not sure I would have the will to submit a new one for each Plain Text page.

Like Clifford, I found Project Gutenberg to offer the most variety in file formats, and like her, found the image-lacking disclaimer pointless, as the HTML and plain text versions did not contain images either. Project Gutenberg offered HTML, EPUB (no images), Kindle (no images), Plucker, QiOO Mobile, Plain Text UTF-8, and MP3 files of The House of Mirth; for my purposes I converted the HTML version to a PDF file, one which (finally!) is fully searchable. Unlike either Google Books or HATHI, there seems to be no printed referent for Project Gutenberg’s text. The only noted provenance is a release date of the digital text (June 1, 1995) and a few notes at the end of the text:

Notes:
1. I have modernized this text by modernizing the contractions: do n’t becomes don’t, etc.
2. I have retained the British spelling of words like favour and colour.
3. I found and corrected one instance of the name “Gertie,” which I changed to “Gerty” to be consistent with rest of the book.
-Linda Ruoff

There is also a notice at the end of the text that “Updated editions will replace the previous one–the old editions will be renamed.” It almost seems as if Project Gutenberg is leaving little to no room for discussion on authoritative editions, variants, and the like (though you are free to email them with errors you may discover). There also appears to be no interest in preserving a digital transmission history of their edition of House of Mirth, as any discrepancies will be obliterated with no discernible trace (unless you leave a note, as Linda Ruoff did).

All in all, in order to accomplish the two things I want most in a digital text (searchability – a digital affordance, and writeability – a print affordance), I had to save a PDF file from an HTML version of The House of Mirth – one that had no perceivable basis in print. Project Gutenberg’s version is pure text, no book, which leaves me wondering: how would I cite these quotes that I am able to find at a moment’s notice? Would I have to turn around and utilize Google Books’ scans to pin specific quotes to page numbers? Makes one wonder, are Post-It Flags really so terrible?

Exploring _The Castle of Otranto_

The book that I have chosen to investigate on Project Gutenberg, Google Books, HATHITrust, and the Internet Archive is Horace Walpole’s The Castle of Otranto (1764). Given that the author of the text claimed to be a translator by the name of William Marshall who had recovered the text (said to have been originally printed in 1529) from obscurity in an old library in England and reprinted it for public dissemination, I thought this made The Castle of Otranto an interesting choice (my love of early Gothic literature aside). For as we all know, one important role that digital archivists play involves the rescuing of obscure texts, which are then scanned to the web for public consumption. In terms of availability, all four of the digital archives mentioned above have copies of The Castle of Otranto. The text is available in HTML, EPUB (with images), EPUB (no images), Kindle (with images), Kindle (no images), Plucker, QiOO Mobile, PDF, and Plain Text UTF-8. In terms of editions and provenances, they tend to vary. In the Internet Archive, you can find a version of the novel that is the third edition and that comes from the Bodleian Library at Oxford with a date stamp of 27 Oct 1930. There is also an edition from the University of Toronto library. On Google Books, there are versions from the Stanford University Library, the Library of the University of Michigan, and the same third edition scan from the Bodleian Library that can be found at the Internet Archive. In the HATHITrust Digital Library, one can find the University of Michigan version, as well as versions from the University of California (published in 1823), Princeton University (1811), and Indiana University (1854). The version available on Project Gutenberg appears to be the 1901 version taken from the Library of the University of Michigan.
There definitely seems to be a lot of overlap between these digital archives, though from my examinations of the sites, it appears that HATHITrust has the best range of copies since they date back to 1811.

The first result you get when you search for The Castle of Otranto on Google Books is also perhaps the worst copy available. After you get the cover, you have to scroll down through several scans of a woman’s hand to get to the actual title page. Even then, there are still occasional fingers or dark ink splotches that cover up parts of the text. If someone actually wanted to read this version, it would be possible, as long as you could fill in the blanks caused by the more damaged scans. Ink splotches happen on several other versions, and sometimes the text is cut off at the sides in some copies. Each of the versions seems to have little quirks like dirty pages or ink splotches or text that is blocked by mysterious rectangle-shaped objects. However, overall, like I said, the text tends to still be readable for the most part. I wouldn’t say these are the best scans ever, but given the number of texts being scanned and the fact that we are in the midst of the transition to digital archives, rather than approaching the final stages of completion, I would say that the texts serve their purpose at a very basic level. The ability to perform searches within the text is a feature that has definitely been helpful for me as an academic. Reading The Mysteries of Udolpho by Ann Radcliffe and then trying to go back and find a quote that I didn’t highlight because I did not think it was useful at the time is not a fun task. Digital Libraries like Google Books, HATHITrust, and the Internet Archive that allow you to not only find words quickly, but also see their context before you go to the actual page the word is on, are definitely a blessing for the toiling scholar.

One of the things that I found most interesting about the Internet Archive is the ability to read the actual book online. The archive is set up to present the book in such a way that makes you feel as if you are actually reading the book itself, rather than just scrolling down a screen. It keeps several of the affordances of the book, such as the comparative space, and gives you the illusion of a three-dimensional object as you “flip” through the pages. This is nice for a reader wanting the experience of the actual text and the comparative space is definitely a plus, but such a skeuomorphic design does little to utilize the affordances of the digital archive. Several of the other versions allow you to click through the pages, but most often this still gives you one page at a time, and as with Google Books, there is still some scrolling involved to see the full text. Of course, the option to download on each of the Digital Libraries lets you make the page bigger or smaller as you like so you can use the page up and page down keys.

As I just stated, each of these sites allows you to download the text. However, if you prefer to stay digital, Google Books lets you compile a “library” of books and HATHITrust lets you create a “Collection” of books. In terms of making these texts writable as well as readable, I did not find any options to annotate any of the versions of my text. Additionally, only authorized users seem to be able to add texts to the digital libraries, making this an exclusive project that is available for consumption by readers, but not open for reciprocity. Along those lines, I did see a link to provide feedback on HATHITrust and report any errors or trouble with the text. As for Google Books and the Internet Archive, I did not see any link for feedback, but there are links set up where readers can write reviews of the text. I imagine these reviews could both be for the book itself and the quality of the scans. However, I do not know if the people who are able to make changes to the texts will actually be reading those reviews. I did not find any way of providing feedback on Project Gutenberg.

The advent of Digital Libraries is a wonderful thing. However, from what I saw of the somewhat obscured scans, the inability to “write” on the texts, and the limited capability for providing feedback that will go directly to the people in charge of the scanning process, there is still much work to be done. As I stated above, I see us in the middle of a transition to Digital Libraries and engaged in work that is nowhere near completion. As time progresses, I hope to see more innovative archives that better utilize the affordances of the web to make texts that are writable/readable and that allow us to research and analyze texts in new and innovative ways that could not be done away from a computer.

Dracula and the Digital

I’ve selected as my book of choice Bram Stoker’s Dracula.  While it may not test or strain the abilities of Google Books in quite the same way as Paul Duguid’s selection, Tristram Shandy, it does offer unique ways in which to present the book in the digital format.  The epistolary style could be better presented in the digital format than it has ever been in the printed editions.  And while I recognize that what we are doing with this particular exercise is simply to survey how well Google Books, Project Gutenberg, HATHITrust, and/or the Internet Archive succeeded in capturing the bookness of our selected text, I still was interested to see how they would manage with such an interesting one as Dracula.

Dracula is available in a wide range of formats, Project Gutenberg–as one might expect–offering the most (HTML, EPUB (without images), Kindle (likewise, no images), Plucker, QiOO Mobile, Plain-Text UTF-8, and even audio).  I must say, however, the warning that the EPUB and Kindle versions lack images seems pointless as I couldn’t, in a glance through the other offerings, locate any images in any of the formats.  Further, not even in the PDF format offered by HATHITrust, the full text online offered by the Internet Archive or its EPUB version, or the ebooks Google presented could I find an illustrated version.  This is fine by me as I can’t recall any of my editions (other than the annotated Les Klinger copy I have) having any images at all; it just seemed that if Project Gutenberg saw fit to warn me about the lack of them, they might have at least snuck in a small image of a blood-sucker somewhere or other in the other versions to make it all worth it.

The provenance or source of the digital texts is a bit spotty.  For example, while Project Gutenberg assures us that their copy is based on the 1897 edition of the text and that the digital copy was published May 9, 2008 and updated September 3, 2012, there are few other specifics provided such as publisher, city of publication, or anything else that one might find on the inside of a printed copy.  Google fares a bit better: though one of their versions simply details the digital copy’s origin (Plain Label Books, Aug 30, 2007), the other proclaims that it was published by W. R. Caldwell in 1897.  That particular edition even has a mark of inheritance of the kind Duguid discusses, as the first page is emblazoned with “Stanford University Library, Gift of John W. Dobbins, Esq.”  To be fair, this is also the nearest any of the digital versions comes to being illustrated, as there is an image of “Castle Dracula” on the fourth page and some owls on the fifth–this is apparently the “three owl edition” of the story.  HATHITrust’s copy, amusingly enough, is actually one of Google’s digitized copies from the University of Michigan (and a very poorly scanned one at that, as several pages are more than half cut off at the start of the book) and of a far more recent printing (judging by the image of Bela Lugosi on the front cover).  In fact, the full text version that the Internet Archive offers is actually copyrighted Project Gutenberg and seems to be identical to the HTML version offered on their site, with the same source and publication dates.

As I mentioned before, some of the scanning or digitizing of the copies was less than ideal.  HATHITrust’s version looks as though the first scanned pages were trying to escape the scanner and no one noticed, though as that may have been the interior of the dust-jacket, it may be understandable.  Google’s version from Stanford University has a few badly scanned pages with small portions of text clipped off at the edges of pages, it appears, but nothing too apparent.  The Internet Archive HTML version appears to have just been a rough cut and paste of Project Gutenberg’s, as they have managed to copy the link names, but not the links, to the mp3 audio files that Project Gutenberg provided in addition to the text.  The Plain Label Books edition offered on Google Books and Project Gutenberg’s own HTML edition appear to be the easiest to read, though neither has even attempted to retain the “bookness” of the book.  Rather than scanned editions, they have retyped the text.  The effect is, at least for me, a bit jarring as it no longer looks like a “genuine book” to me, which is to say a printed copy; however, the pages are not marred with artifacts and smudges from life on a library shelf and there are no missing parts of pages or words, so in that way they are much easier to read.  Nothing has been lost from the presentation in these, certainly, and Project Gutenberg has even taken the time to add hyperlinks to the table of contents so that one may jump to a desired chapter with ease.

None of the editions seems to provide an easy or obvious method to report or correct errors, though at least in the Project Gutenberg Kindle edition one was able to highlight or annotate the text–a feature that I couldn’t find on the other versions.  Further, all except the poorly copied version of Project Gutenberg’s HTML offered by the Internet Archive offered means to jump through the text.  Most did this with a “go to page” field one could use, though Project Gutenberg stood out by offering the linked table of contents as well as the ability to create bookmarks.  HATHITrust was also original in that it offered the ability to view the text as a series of thumbnails.

All the versions I explored offered the ability to search within the text for given words, though the Project Gutenberg HTML required one to do this with the use of the search or find feature in one’s browser, rather than offering a specific search box for the purpose.  All of the sites, with the exception of Project Gutenberg, did offer the ability to add it to a “library” if one signed into the website, however.  In fact, if one preferred to read offline, all of the sites offered the ability to download the text in one or more formats for later study.

Finally, while the sites offered many abilities with the text, they were all about the same.  None stood head and shoulders above the others in terms of affordances.  This is a shame really, considering the digital medium.  One really is limited to reading the texts from start to finish or searching them for select terms.  The idea of “flipping through” the text was almost non-existent, for the time it took to load the scanned pages in Google Books and HATHITrust made that impossible (while my internet could be to blame here, I doubt it, given that I’m the only one using it at the moment).  Further, affordances one would have with the physical copy were not offered online–highlighting, dog-earing pages, etc.  So while the possibilities ought to be almost endless with the digital version of the text, they were sadly underutilized.