On reading, translating and making questions

I selected a group of short stories by Edgar Allan Poe for this exercise. It was difficult to work with Machado de Assis this time, because I could not find an English translation. I was also curious to see the particular voice of Poe’s stories, their peculiar vocabulary, and I thought it would be interesting to observe some translation phenomena at the same time. I chose the anthology that Charles Baudelaire translated under the title Histoires extraordinaires. As I could find this edition on Project Gutenberg, as well as the complete stories of Edgar Allan Poe, I created a document in English with the same short stories.

It is well known that Baudelaire was the first translator of Poe into French and that this translation was very important for European literature. I wanted to see what would happen if I compared the two anthologies through Wordle and WordItOut, and then HyperPo. So I began my exercise with some extra questions: Could we get interesting or relevant information about the words that appear in the original and the translation? Are those programs helpful tools for Translation Studies?

I began with the Google Ngram Viewer, to compare Poe and Baudelaire in their respective languages, with pretty obvious results (I must admit I spent some time staging battles between pairs like Derrida/Deleuze, Godard/Truffaut, etc., with amazing results):

ENGLISH FRENCH

But I wanted to see what happened in Spanish, and the results were more interesting. Both authors are published, or become a subject of analysis, at almost the same time! Why did this happen? Is the reception of Poe similar to Baudelaire’s in the Spanish-speaking world? Are their figures similar?

SPANISH

When I created a word cloud with WordItOut, I realized that the cloud ignored a list of common words, and that I could change that list as well as replace characters. I could also change many settings, such as the number of words, their order, color, etc. But when I tried to create a word cloud with the French version, there was no option for a foreign language, so I did it myself, adding the most common French words to the list ignored by the cloud. The result was this:
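The custom ignore list described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not how WordItOut itself works; the function name `top_words` and the tiny stop-word sets are assumptions made up for the example (real lists are much longer).

```python
import re
from collections import Counter

# Tiny illustrative stop lists (assumptions; real lists are far longer).
ENGLISH_STOPWORDS = {"the", "a", "of", "and", "to", "in", "was", "it", "i"}
FRENCH_STOPWORDS = {"le", "la", "les", "de", "et", "un", "une", "je", "que", "il"}


def top_words(text, stopwords, n=10):
    """Return the n most frequent words in text, skipping the stopwords."""
    # Lowercase the text and keep runs of letters, accented ones included.
    words = re.findall(r"[^\W\d_]+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)


if __name__ == "__main__":
    sample = "Now the raven, now the night, now"
    print(top_words(sample, ENGLISH_STOPWORDS, n=3))
```

Swapping `ENGLISH_STOPWORDS` for `FRENCH_STOPWORDS` is the equivalent of adding the most common French words by hand.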

WordItOut – English

WordItOut – French

I was surprised that most of the words were very common ones, so I wonder whether analyzing these results could be interesting. The prominence of the word “now” may be telling us something about the treatment of time in the style of Poe’s short stories. We can draw multiple interpretations from this result: the question of “time” in Poe’s literature or, moreover, the question of “time” in Baudelaire’s literature. Why did Baudelaire choose these stories and not others for his first anthology of Poe’s work? Is there something behind the words?

When I used Wordle, I realized that its list of ignored words is not so big: some common words entered the word cloud. I noticed that this program had a filter for different languages, but the same thing happened with the French version, where I could see many words of common usage, such as “bien”, “cette” or “comme”. So, in that case, Wordle was less useful for finding meaningful results.

We have to keep one important issue in mind: we have to customize these tools very carefully. That raises the following questions: Are we making a text say what we want it to say? Is it just another way of doing the same kind of literary criticism we already have?

When I pasted the words from WordItOut into Up-Goer, the program permitted all of them except six: “Dupin”, for it is a surname, “indeed”, “balloon”, “manner”, “itself” and “earth”. I found it interesting that most of Poe’s words were common.

Up-Goer Five

Using CLAWS, I found that most of the words are nouns (I used Wordle to help me see this more clearly!), adverbs, adjectives, general determiners, the “base forms” of the verb “to be”, prepositions, etc. I think it is an interesting tool when you are looking for something very specific. Again, everything depends on the questions you have, the relevance of those questions and the relevance of the results. Data for data’s sake is meaningless.

CLAWS results

Finally, TAPoR is a very interesting program. It is much more sophisticated and useful than the word cloud creators, and it works with texts in French, Spanish and German. The “Voyant tools” were interesting: you can see the frequency of certain word(s) in a graph, in context, etc. You actually can “see through your texts”, as the Web page invites users to do. I found that “death” and “idea” appear the same number of times! And “great” and “little” are the most commonly used adjectives. It is also interesting to see the differences between the two languages. The results tell us a lot about the particularities of both languages, such as the common use of the verb “to say” in English-language literature as opposed to the use of synonyms of that verb in other languages’ literatures: it is more frequent in the English version than in the French version. There is a lot of data to read and analyze here!

TAPoR – English

TAPoR – French

I think all these tools are useful for helping translators understand certain phenomena: how we translate, how some writers and some translators use a particular vocabulary, style, phrase construction, etc. I think it would be great to do this with one’s own translation and see the results, and also to compare two translations of the same work!

Conclusions

At this level (just trying new tools, not researching for any particular paper) I found curious numbers and graphs, but if I had had a set of questions and hypotheses in mind, the exercise would have been very useful, though always depending on the relevance of the questions and responses. I think that if our questions are very well defined, there will be some interesting results. (I wonder about the difference between answers and results. Do computers answer or just give us results?) And once we have some answers from the computer, we can formulate new questions, which is the most interesting part of literary criticism, an activity that, as Ramsay says, did not change with the introduction of computers. We interpret the results that the machine can give to a certain line of research: word frequency through a book, through time, etc. As Ramsay affirms,

“If something is known from a word-frequency list or a data visualization, it is undoubtedly a function of our desire to make sense of what has been presented. We fill the gaps, resolve contradictions, and, above all, generate additional narratives in the form of declarative realizations.” (62)

Results are results and they cannot be changed; they are a fact. But we read them and arrive at different conclusions, even though we have the same object in front of our eyes: algorithms, data or a book. Those are just different ways of reading a story, and reading through machines is a fascinating one that often defies our preconceived ideas or gives us new perspectives on reading it. That feeling of learning to read again, of seeing a text in a whole different way, that “ostranenie”, is fundamental to begin asking questions, to try to find new paths to fight against commonplaces in literary criticism. It is also a way to do things with books, like snipping their pages. That is something we can do because we are working with digital texts, and digital texts have a very different substance from printed texts. With computers we can analyze just the text, but not other important details that are also part of our reading of a book (and of the readings that book has had in the history of our culture), such as the book itself: how and where it was published, what its covers look like, which collection it belongs to, etc. So not everything can be read through machines, and we have to take care not to isolate the “text”, as if everything that should be read were just in the (digital) words of a text.

Through my experience working with different programs for this exercise, I realized that I was finding new questions (everything was questions! and I could not arrive at any answer at this level); I also found new ways of thinking about texts, of thinking about translations, and that is what I really like to do as a reader and as a student.

“Computer-enabled play” — hacking The Marble Faun

(I steal “computer-enabled play” from Irizarry, quoted by Ramsay on pg. 36 of Reading Machines).

The reason I was drawn to include the phrase “computer-enabled play” in my title was because that is really what I felt like I was doing throughout the exercise: playing, fiddling, fooling around, testing out, exploring, etc. Similar to what Mary expressed, I found that some of these experiments were overwhelming (or annoying) in their unfamiliarity, but I soon discovered that if I “played” with the tool enough, I could eventually gain some insights into The Marble Faun in a new way (i.e., different insights than I would have garnered from reading the text in a traditional manner).

Wordle and WordItOut seemed especially “playful” with their fun names, bright colors and graphic visualization. As I’ll probably reiterate several times in this post, these tools did require some “fiddling,” though.

WordItOut

Wordle

For both tools, I used the full text of The Marble Faun, from the Project Gutenberg plain text online version. With WordItOut, I appreciated the function to tweak the list that was generated. I could view the list in ascending count order, alphabetically or randomly (though I’m not sure how the last two would aid in a critical analysis). I was able to increase or decrease the number of words. Also, unlike Wordle, WordItOut allows you to see the generated word list in both list and visualization form, which was helpful when I wanted to copy the list for further exercises.

As others have observed, with narrative forms (novels), names and other proper nouns seem to be most prominent. In a story like The Marble Faun, it was actually interesting to see which character was the most represented: Miriam. (The Marble Faun is sort of like modern day sitcoms that center on a group of friends–so imagine that we Wordled an episode of “Friends” and discovered that “Rachel” is the largest name–what does this tell us about the group dynamic?). In The Marble Faun, the drama centers on MIRIAM. I thought it was interesting that Hilda is the second largest name–so in a novel which actually has fewer female characters than male, the females still win out in nominal presence.

Other prominent words seemed to thematically center on art (not surprising; the characters are artists living in Rome) and time (not surprising; Hawthorne often focuses on the interplay of past, present and future reality). Wordle and WordItOut thus demonstrated for me the point Ramsay notes more than once, that at a base level, digital tools might merely confirm analyses we have already made.

Up-Goer 5

After struggling with the “define digital humanities” Up-Goer 5 challenge before beginning the exercise (how unnatural for literary scholars to prioritize simple, oft-used words over our sophisticated vocabularies!) it was interesting to use the tool to investigate a pre-generated text. I used my list from WordItOut, but removed the names as I didn’t think they would propagate any new insights (i.e., it would be no surprise that “Hilda” is not in the top 1000 used words). What I did find, though, WAS intriguing.

Of 97 words produced by WordItOut as “most used” in The Marble Faun, only eleven did not make the Up-Goer 5 top 1000 word list. These were: sculptor, Rome, marble, among, itself, whom, Roman, nor, poor, tower, and indeed. I’m guessing that Rome and Roman are too specific (proper nouns) to merit top-1000 usage, while sculptor and marble as nouns also seem too obscure (we don’t talk about sculpture very generally or often). Nor, whom and indeed are rather sophisticated uses of grammar, so their absence doesn’t surprise me. I don’t have an explanation for among, itself, poor, or tower—any thoughts?

What’s left behind (in the top 1000) is interesting when you consider the words as “topics” of interest (in the sense that oft-used words might represent broader themes): life, heart, friend, good, human, world, love, art, idea, moment. Not only are they huge topics in Hawthorne’s text, but also, apparently, in everyday speech.
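The check that Up-Goer 5 performs can be imitated with a simple set lookup. The sketch below is an assumption about the general approach, not the editor's actual code: `ALLOWED_WORDS` is a hypothetical stand-in containing only the ten accepted words named above, where the real list has a thousand entries, and `rejected_words` is a name invented for the example.

```python
# Hypothetical stand-in for the editor's "ten hundred" most-used words;
# it holds only the accepted words mentioned in this post.
ALLOWED_WORDS = {
    "life", "heart", "friend", "good", "human",
    "world", "love", "art", "idea", "moment",
}


def rejected_words(words, allowed=ALLOWED_WORDS):
    """Return the words not found in allowed, in order, without repeats."""
    seen = set()
    rejected = []
    for word in words:
        key = word.lower()  # compare case-insensitively
        if key not in allowed and key not in seen:
            seen.add(key)
            rejected.append(word)
    return rejected


if __name__ == "__main__":
    cloud = ["life", "sculptor", "Rome", "heart", "marble", "Rome"]
    print(rejected_words(cloud))
```

With a full word list loaded into `ALLOWED_WORDS`, the same function would reproduce the kind of rejection list discussed here.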

CLAWS

On to CLAWS, the realm of tagging. This, I did not find playful. I was rather confused, though finding the accompanying “tagset” key was somewhat illuminating. I didn’t have the patience to count the different types of word forms, but that could have been interesting to see–were Hawthorne’s 100 most-used words from The Marble Faun mostly pronouns? (probably). Singular nouns? Comparative adjectives? Etc. etc. So I can see how CLAWS could be a useful tool, but I didn’t like the aesthetic of the list that was generated (no spacing, no counting) so, admittedly, I moved on.
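The counting I didn’t have the patience for could be handed to a short script. This is a minimal sketch that assumes the tagger output has been pasted in as `word_TAG` tokens separated by spaces; the sample string and its tags are made up for illustration rather than copied from my results, and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def group_by_tag(tagged_text):
    """Group words by part-of-speech tag, given 'word_TAG' tokens."""
    groups = defaultdict(list)
    for token in tagged_text.split():
        # Split on the last underscore so the word itself may contain one.
        word, sep, tag = token.rpartition("_")
        if not sep:
            continue  # skip anything that carries no tag
        groups[tag].append(word)
    return dict(groups)


def tag_counts(tagged_text):
    """Count how many words carry each tag."""
    groups = group_by_tag(tagged_text)
    return Counter({tag: len(words) for tag, words in groups.items()})


if __name__ == "__main__":
    # Made-up sample in the assumed format.
    sample = "Miriam_NP1 heart_NN1 life_NN1 itself_PPX1 indeed_RR"
    for tag, count in tag_counts(sample).most_common():
        print(tag, count, group_by_tag(sample)[tag])
```

Sorting by tag in this way also answers the question of which word forms dominate the list.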

TAPoR

TAPoR, while also intimidating with its unfamiliar interface, was “playful” in its potential for “fiddling,” as I previously described. The more I played around with it, the more I found ways to make it work for me. After scrolling through the word lists in the lower left-hand corner (sorted Frequency vs. Count vs. Trends) I clicked on “heart” to see what came up.

I don’t really understand the graph in the upper-right hand corner, though I know you can view two words at once to—I presume—compare frequency at various points in the book. For instance, I viewed “woman” and “sympathy” together and saw a very similar pattern, suggesting that woman & sympathy are often discussed in tandem. This is not surprising, given that Hawthorne’s romance could really be considered a sentimental novel and he’s constantly talking about the female characters’ womanhood and capacities for sympathy (e.g. Hilda is very sympathetic, Miriam not so much). What confused me were the “segments,” though I suppose you could generate the graph so that it represented chapters, if you knew how to finagle that breakdown. That way, you could see where, in the novel, topics were discussed with higher frequencies. “Heart,” for instance, skyrockets at the end of The Marble Faun, according to this “Word Trends” graph.
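My guess about the “segments” is that the text is cut into equal slices and the word is counted in each one. The sketch below works under that assumption only; it is not TAPoR's code, and the function name `segment_counts` and the default of ten segments are choices made for the example.

```python
import re


def segment_counts(text, word, segments=10):
    """Count occurrences of word in each of `segments` roughly equal slices."""
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    counts = [0] * segments
    if not tokens:
        return counts
    # Ceiling division, so every token falls into one of the slices.
    size = -(-len(tokens) // segments)
    target = word.lower()
    for position, token in enumerate(tokens):
        if token == target:
            counts[position // size] += 1
    return counts


if __name__ == "__main__":
    sample = "heart a b c d e f heart heart heart"
    print(segment_counts(sample, "heart", segments=2))
```

Running the function for two words and comparing the lists side by side is a rough version of viewing two words at once. To get chapters instead of equal slices, one would split the text on the chapter headings before counting.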

I also got the hang of the concordances tab and found these lists extremely interesting. Under “heart,” I could observe the following concordances:

Intimate/heart/knowledge
Hilda/heart/life
close/heart/beautiful
brain/heart/think
trust/heart/trusts
secret/heart/burns

These are only a few examples (from 89 instances of heart in the first volume of the novel) but SO INTERESTING! I’m especially intrigued by instances like “knowledge,” “brain” and “think” surrounding the presence of the heart, since we get that tension between cognition and emotion there. Trust and secrets regarding the heart don’t surprise me at all given the nature of the novel, nor does the presence of the “beautiful.” Hilda’s concordance is also not surprising—the trio of “Hilda,” “heart” and “life” is only too perfect. (I know I’m not being very clearly critical here, but I’m sure you can see the potential for developed analytical writing on these topics).

One question I had while using TAPoR concordances: TAPoR doesn’t select the immediate surrounding words, but rather “keywords.” For instance, “secret/heart/burns” comes from the sentence: “There is a secret in my heart that burns me!—that tortures me!” This is a pretty good example—we presumably don’t care about “in,” “my” or “that,” but how are the keywords chosen? Does the software just eliminate prepositions? Do we lose the presence of “torture” here? Compare to an example like “only/heart/sought.” The sentence from which this concordance is generated is, “But if it were only a pent-up heart that sought an outlet?” To me, “pent-up” seems important, while “sought” and “outlet” are equally important. So, I’m just wondering (and perhaps someone can actually tell me) how the concordances work—how are the surrounding terms generated?
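One possible rule, and it is only a guess on my part, would be to drop a list of very common words and then take the nearest remaining word on each side. The sketch below implements that guess; it is not TAPoR's method, and `STOPWORDS` and `keyword_contexts` are hypothetical names with a deliberately tiny word list.

```python
import re

# Tiny illustrative stop list (an assumption, not TAPoR's list).
STOPWORDS = {
    "there", "is", "a", "in", "my", "that", "me", "the",
    "of", "and", "it", "were", "but", "if", "an",
}


def keyword_contexts(sentence, target, stopwords=STOPWORDS):
    """Return (left, target, right) for each hit, skipping stopwords."""
    # Keep hyphenated forms such as "pent-up" together as one token.
    tokens = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", sentence.lower())
    target = target.lower()
    results = []
    for i, token in enumerate(tokens):
        if token != target:
            continue
        left = next((t for t in reversed(tokens[:i]) if t not in stopwords), None)
        right = next((t for t in tokens[i + 1:] if t not in stopwords), None)
        results.append((left, token, right))
    return results


if __name__ == "__main__":
    print(keyword_contexts("There is a secret in my heart that burns me!", "heart"))
    print(keyword_contexts("But if it were only a pent-up heart that sought an outlet?", "heart"))
```

The first sentence gives secret/heart/burns, matching what I saw. The second gives pent-up/heart/sought rather than only/heart/sought, so the tool evidently follows some other rule, and the question stands.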

I could really see myself using TAPoR in the future (though, again, the interface doesn’t really appeal to me and I wished I could have enlarged everything–but these are minor complaints of a whiny variety). As someone who was widely unexposed to DH tools before this class, Ramsay’s Reading Machines and our exercises have legitimately moved me “Towards An Algorithmic Criticism.” The text, in its descriptions, examples and analysis of digital tools and their impact on/interaction with literary criticism was seriously illuminating. We were prompted to consider how

“the effect is not the immediate apprehension of knowledge, but instead what the Russian Formalists called ostranenie—the estrangement and defamiliarization of textuality” (3)

regarding our experience with the various digital tools today, and it certainly applies. “Estrangement” and “defamiliarization” certainly describe my “computer-enabled play” with The Marble Faun today. We are distanced from the text when the computer intervenes, transforming prose into lists, visual graphics, concordances, and line graphs. BUT this does offer, though not immediate, new “apprehension of knowledge,” I believe. From reading Hawthorne’s prose, I do not “know” whose name appears most often in the text, even if I can guess. Conjecture becomes fact, and fact leads us to points of inquiry, new questions regarding “why?” More articulately put by Ramsay on pg. 62:

“If something is known from a word-frequency list or a data visualization, it is undoubtedly a function of our desire to make sense of what has been presented. We fill in gaps, make connections backward and forward, explain inconsistencies, resolve contradictions, and, above all, generate additional narratives in the form of declarative realizations.”

I’d like to point out a couple of other passages which stuck out to me and helped me frame algorithmic criticism this week:

“The computer revolutionizes, not because it proposes an alternative to the basic hermeneutical procedure, but because it reimagines that procedure at new scales, with new speeds, and among new sets of conditions” (31).

 

“Rather than hindering the process of critical engagement, this relentless exactitude produces a critical self-consciousness that is difficult to achieve otherwise” (34).

And, to end, perhaps a point that generates and necessitates discussion: in opposition to “ambiguity,” the computer “demands an answer” (67). Is this a limitation? Shouldn’t there be room for ambiguity in literature, even if it doesn’t fit into an automated output? Ramsay continues, “…the computer demands abstraction and encapsulation of its components” (67). Again–is a limitation present, here? Are all texts (/words/phrases/data sets) discrete, with potential to be “encapsulated?” Does the computer miss something the subjective mind would not?

Mapping Scarlet Letters

Wordle and WordItOut

To start, I decided to place the text of The Scarlet Letter into Wordle and WordItOut and compare them side by side. Here are the clouds I created. First, Wordle:

Wordle Scarlet Letter

Nothing too surprising here. And yes, I kept all the defaults. On to WordItOut:

WordItOut-Word-cloud-161404

I kept the layout simple for easy reading. As you can see, both word clouds are quite similar, and the content of the words isn’t surprising. Names are quite prevalent, with Hester and Pearl topping the list. Words like “Heart,” “Life,” and “Mother” get at the core issues of the novel. WordItOut represented quite a few more mundane words that don’t mean much on their own: “within,” “among,” “whether,” “indeed,” “even,” etc., while Wordle came up with more interesting results overall, though many of these words appear small due to their low frequency. In particular, the category of morality pops up, which shouldn’t be surprising for someone who has read the novel: “soul,” “sin,” “shame,” etc. Finally, while admittedly a common word, the large frequency of “One” is a bit puzzling, but something to take note of for later.

Up-Goer Five Text Editor

Up-Goer Five Text Editor is an interesting experiment in constraint. It does for the depth of language what Twitter does for length. It took a bit of rewriting before I got a definition of the Digital Humanities that didn’t seem horrible: “The use of computers in order to find new ways of doing and making while focusing on older ways of understanding.” Wow, does Up-Goer Five Text Editor require simplicity or what? Already I was ready for a few rejected words, so I put my results from WordItOut into the box and clicked enter. This is what I found:

Up-Goer Scarlet Letter

Most of this isn’t too surprising. I did not expect names like Prynne and Chillingworth to be among the ten hundred most used words. Moreover, words not in use anymore, such as “thee” and “thy,” were rejected, although I was surprised and disgusted by the rejection of “whom.” You would think a word like “itself” would appear, but this demonstrates just how limited you must make your vocabulary in order to use this tool. This was an amusing experiment, and the constraint works in a similar way to Twitter, forcing the user to create something under set limitations.

CLAWS Part-of-Speech

Next, we move on to the CLAWS part-of-speech tagger, which is a fun experiment, but not quite as amusing as the other tools. I would have appreciated a function that groups words of the same part of speech together, but I suppose you cannot ask for everything. From what I can tell, there is actually a variety of parts of speech here, with proper nouns, reflexive pronouns (the prevalence of “self” is interesting), adverbs, singular nouns, prepositions, pronouns, and more. Could I have discovered this on my own? Probably. But CLAWS brings these facts to my attention as a way of sparking new questions or pursuing new areas of study. But for now I’ll leave CLAWS alone and move on to the final tool.

TAPoR

Trying out HyperPo and experimenting with different combinations of words was worthwhile. After wading through the first page of largely uninteresting words, I came across the word “One” once again. Equipped with this new tool, I decided to map out its presence throughout the text and perhaps account for it.

HyperPo 1 One

Yes, “One” is a common word and could have little significance. It could also be a particular vocabulary quirk of Hawthorne (or perhaps the era in which the book was written) to use “one” rather than “you” or “she” or “he,” or to refer back to a person. Certainly, this is the case. But there are numerous instances in which “One” serves a more interesting purpose. In sentences like “. . . deep a dye as the one betokened by the scarlet letter,” one is used to emphasize the unique suffering of Hester’s situation. Again, more likely Hawthorne uses “One” incidentally as part of his diction, but cases like these suggest the possibility of something more.

Next I decided to experiment with a more concrete idea. I selected “child” and “infant,” both of which refer to the character Pearl in the novel, and attempted to set them against each other on the graph. This did not work for some reason, so I was forced to look at them separately. As expected, “Infant” occurs almost entirely on the left side of the graph, the beginning of the novel, when Pearl is, well, an infant. “Child,” meanwhile, appears steadily throughout the novel, starting just after “infant” ends (with some overlap), with a slight dip in the set of chapters in which Pearl does not appear. This looks good. Despite the minor technical hiccup, HyperPo seems to be doing its job. Of course, in this case, I only set it to tell me something I already know, suggesting that I may not be asking the right questions. But this was a short experiment with the capabilities of the tool itself, so I have no choice but to forgive myself.

As one final experiment, I noticed that HyperPo allows you to collapse different words and view their frequency as one unit. I tried this with “sin” and “shame,” words associated with Hester’s scarlet letter:

HyperPo 3 sin and shame

As you can see, the greatest frequency of these words together occurs toward the beginning of the novel, while it fluctuates up and down before going up near the end. What can we determine from this graph alone? Perhaps the scarlet letter torments Hester most toward the beginning of the novel, during Pearl’s infancy. The passage at the end is also notable for the line,

. . . long since recognised the impossibility that any mission of divine and mysterious truth should be confided to a woman stained with sin, bowed down with shame, or even burdened with a life-long sorrow.

Focusing on these central themes at the end accounts for the tiny spike. Of course, I can verify none of this without directly consulting the novel, which further indicates the use of this tool as a form of provocation, a way of reshuffling the words of the text to raise interesting questions. In this sense, to “see through the text” involves a specific mapping which requires zeroing in specifically on finite sets of words. The experience of HyperPo is like reading a text with a powerful, magical magnifying glass that guides the reader to common and specific parts of the text. Okay, that analogy may not work as well as I was hoping, but I gave it a shot.

Conclusions

Overall, HyperPo is a robust tool that has a lot to offer, and I have of course only scratched the surface. Wordle and WordItOut are useful for expressing a main idea or message easily and succinctly, but I imagine HyperPo could be used for more serious research.

This exercise has taught me that one must be deliberate and careful while using these tools, provided that you want to come out with something useful. They can be used to confirm what you already know, which most would argue is quite boring. It takes a great deal of time and experimentation before coming out with a truly stunning result, and these are the ones that are the most worthwhile. These are the moments when you are able to look at a text in a new way, and this alone justifies the use of these tools.

In this sense, these tools indeed create what Ramsay calls the “estrangement and defamiliarization of textuality” by forcing the reader to view a text in an entirely different way. For all of its simplicity, Wordle’s ability to recognize and display common words presents the text in its most basic form. No, this is not the same as reading The Scarlet Letter. Not even close. But as a tool of provocation, the re-shuffling and re-organization of words could lead to new insights about the text. Perhaps HyperPo best demonstrates the capabilities of these sorts of tools for scholarship. I’m still not convinced that any of these tools can help us “Read a Million Books,” as they require the user to be familiar with the texts beforehand in order to glean useful information, but perhaps that is a topic for another day.

Examining the Architecture of _The Castle of Otranto_

Introduction with Ngram:

To begin my examination of The Castle of Otranto, I thought I would start with the results I found on Ngram. When we were told to use Ngram to map out two terms, I decided to go with “horror” and “terror.” I changed the dates in Ngram to start at 1700 rather than the default 1800 and mapped out the results. Here is what I found:

Ngram Viewer_Terror_Horror

Since the dawn of Gothic literature, incited by Horace Walpole’s The Castle of Otranto, occurred in the mid-1700s, I was not surprised to see such results, for along with the inception of Gothic literature in England, Walpole’s work also sparked a discussion of the difference between horror and terror. Ann Radcliffe, a renowned author of Gothic works during the late 1700s, utilized terror in her writings, hinting at supernatural occurrences, but eventually explaining them away as rational events transformed into terrifying ones by superstitious sentiments. Terror, for Radcliffe, is the anticipation of the supernatural. Horror, on the other hand, is the fulfillment of a supernatural occurrence. Radcliffe defines these differences in her essay “On the Supernatural in Poetry,” published in 1826. Traditionally, scholars have aligned terror with female Gothic writers and horror with male Gothic writers, though such a stark dichotomy is obviously not a perfect representation of the real relationships between male and female authors and the use of terror and horror. However, the dawn of Gothic literature and the discussion of horror and terror sparked by the differences between anticipated supernatural occurrences and the actual fulfillment of supernatural events can perhaps explain the sharp increase in the usage of horror and especially terror in the late 1700s. The steady decline leading up to the present and coming together of horror and terror can also be hypothesized to be a result of our more modern usage of these two words which tends to treat them as interchangeable.

WordItOut and Wordle:

The Castle of Otranto WorditOut The Castle of Otranto Wordle

Moving on to my text, when I put The Castle of Otranto through both Wordle and WordItOut, many of the results were similar. Names (Manfred, Isabella, Matilda, Theodore…) were marked as appearing in the text the most often, which is not all that surprising considering most of the novel concerns the “bartering” of two women, Isabella and Matilda, by Manfred. “Cried” is also relatively large, which makes sense since Isabella and Matilda are both upset with the matches Manfred tries to impose upon them. Other words that are in comparatively large font are “Princess,” “Lord,” “Prince,” and “Castle.” As the book that sparked the production of Gothic literature in England and contributed to the development of gothic tropes such as the medieval castle, the damsel in distress, and the tyrannical male, it is not surprising to find these terms in large font.

The Up-Goer Five Text Editor:

When I placed the top 100 words into the Up-Goer Five Text Editor, I came up with a lot of terms that just did not fit. Since the novel harks back to a former age filled with knights in shining armor, princesses in distress, and ancient castles, it is not surprising that this is the case. Many of these words are not in common usage, including the personal pronouns thee, thou, and thy, which again are used to suggest the composition of this text in medieval times.

The Castle of Otranto Up-Goer Five Text Editor

CLAWS:

CLAWS was intriguing, though perhaps not as useful as some of the other tools. However, there were some interesting results that mirrored what I found in my Wordle and WordItOut word clouds. There were a lot of proper nouns due to the common occurrence of names within the text. Also, there were many other nouns that serve to invoke the spirit of medieval times and Arthurian adventures: “Prince,” “Princess,” “Knight,” “Highness,” “court,” “escape,” “chamber,” and “convent.”

TAPoR:

Looking at TAPoR was a lot of fun. I definitely liked the aesthetics of the site with all of the different boxes showing me different ways of pulling apart the text and examining the words as they occur throughout the novel. Looking at the occurrences of words in the lower left hand corner of the page, I was interested to see that (after all of the indefinite/definite articles), the words “if” and “would” came up pretty high on the list. Seeing as the plot of this story centers around Lord Manfred’s attempts to convince Isabella to marry him, and later, his attempts to make his daughter marry Lord Frederic, these words seem appropriate (If only you would marry…). Once again names were high on the list. Because this tool offers you an easy way to map where the words fall and find the context in which they occur, I took the time to map out Manfred, Isabella, and Matilda to see where their names appear the most often and what is the context of these moments.

Manfred:

The Castle of Otranto_Manfred

Manfred’s name occurs most frequently in the scene where Matilda decides to go and speak to her father after the death of his son, Conrad. The scene follows Matilda as she tries to build up the courage to speak to him. When she finally does, he denies her admittance, telling her that he does not want a daughter; he wants his son back. It is typical of such a patriarch to be more concerned with his male heir than with his daughter. Elsewhere, Manfred’s name is surrounded by words like “rage,” “incensed,” “angrily,” and “impatient,” which hint at his tyrannous nature.

Isabella:

The Castle of Otranto_Isabella

For Isabella, the time where her name is mentioned most frequently occurs during her escape attempt in which she flees from the evil machinations of Manfred, who seeks to divorce his wife in favor of marrying the young and innocent Isabella. Many of the other times Isabella’s name appears are in regards to discussion of Manfred’s loathsome plot and to inquiries that are being made into her disappearance so that she can be found and subjected to Manfred’s will.

Matilda:

The Castle of Otranto_Matilda

For Matilda, her moment comes when she is made aware that Lord Manfred (her father) has agreed to marry her off to Lord Frederic (Isabella’s father) so that Frederic will grant Manfred Isabella’s hand in marriage. It is a typical moment of patriarchal bartering. Manfred wants Isabella, so he offers his own daughter to Frederic without a second thought. The words surrounding the occurrences of Matilda’s name include some of the feminine virtues that prevent her from being able to refuse, such as “tenderness,” “virtuous,” “goodness,” and “purity.” How can these gentle and innocent women hope to escape the wickedness of their patriarch? It is not surprising that the names of Isabella and Matilda, which are tossed around so often within this text, occur most frequently in the moments when the fates decided by their patriarchal fathers are pressing down upon them.

Conclusion:

Overall, I was pleased with the new perspectives that TAPoR was able to offer. Although I have studied this text before, it was interesting to map out the words, find the moments where they occur most frequently, and justify them with my own impressions of the text. The results offered by TAPoR provided me with confirmation of thoughts I had already gleaned from the text. However, the “estrangement and defamiliarization” of the text that Ramsay addresses does serve more purposes than mere confirmation (3). I definitely felt as though I was able to gain access to the bones of this text in ways that I had not been able to through my own close reading, because it really forced me to pay attention to what words Walpole chose to use and where he placed them. Like Isabella, who explores the secret tunnels and hidden passageways of the castle as she attempts to escape from the tyrannical Manfred, I felt like I was able to find hidden pathways of The Castle of Otranto that I was not aware even existed before this activity.

It can become difficult to relinquish your first impression of a text, even when you are close reading it. I used this novel in a paper that I wrote about Gothic tropes and the use of horror and terror in Gothic texts, so my view was confined to looking for evidence of these themes. Once the tools had “defamiliarized” the text by breaking it down into words, I was able to pay closer attention to the distress of Isabella and Matilda, as well as the intense patriarchal authority evinced by Manfred’s character. As Ramsay notes, these digital tools gave me a way to do what scholars always do with texts when they critique them: they provided me with “a text transformed and transduced into an alternative version, in which, as Wittgenstein put it, we ‘see an aspect’ that further enables discussion and debate” (16). By looking at the words of The Castle of Otranto, the building blocks of this great novel, I was able to examine the architecture of the Castle in a way that enabled me to see alternative aspects of the text, thereby sparking new conversations about the language of female oppression and patriarchal dominance that were not the focus of my initial close reading of the text.

Decontextualizing ‘The House of Mirth’

Word Clouds – Word It Out (L) and Wordle (R)

Mirth WordItOut

Mirth Wordle

It’s not very surprising to see Lily’s name in big bold print in both clouds (though I definitely prefer Wordle’s aesthetics to Word It Out), as she is the novel’s protagonist – same goes for (Lawrence) Selden, our dashing bachelor/love interest. Also, since Mirth is a Wharton novel of manners, the presence of titles such as “Mrs.” and “Miss” is to be expected. I was, however, intrigued to see the singular pronoun “one” battling for preeminence with “Miss” – it’s been a few years (*cough* 4 or 5 *cough*) since I’ve read the novel, so no immediate reasons for this occurrence come to mind. Speculatively, however, there are a few theories I could spin. The novel centers on the misfortunes of Lily Bart, an aging beauty (and spinster at twenty-nine!) who repeatedly strives for independence throughout the novel. She is indeed a solitary figure (one alone) who continually casts herself apart from the rest of the crowd (one apart) and is continually pursued by Selden (for whom she is the only one). Spoiler alert, she also dies alone.

I also found it interesting that there is a bit of an imperative tone in some of the more prominent words in the word cloud – mostly temporal words like “now,” “moment,” “must,” and “time.” Words that refer to perception and the internal (“seemed,” “know,” “felt,” “sense,” “thought”) also dominate the more outwardly social terms (“voice,” “talk,” “tell,” and even “social”), a nod to the focus of the novel (i.e. Lily’s character), set against the backdrop of high society.

Word Lists – Up-Goer Five and CLAWS

I had a bit of trouble figuring this out, so I thought I’d be a bit more detailed in explaining (since I’m one of the earlier posts). In order to obtain a list of words from my word clouds, I had to scroll down to the box under my Word It Out cloud (I couldn’t find any option in Wordle) and click the “Word List” tab. Then for the “Case to display:” option I selected “Most Common” so that it listed the 100 words selected for the Word Cloud first (see pic below). Then I could select and copy my needed words for Up-Goer Five and CLAWS.
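For anyone who would rather build the word list directly than copy it out of a word-cloud site, a rough Python sketch follows. It is not Word It Out's algorithm; the tiny `STOP_WORDS` set and the function name `top_words` are assumptions for illustration, and real tools ship much longer lists of ignored words.

```python
import re
from collections import Counter

# Tiny illustrative stop list (an assumption, not any tool's real list).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "was", "she", "her", "it"}

def top_words(text, n=100, stop_words=STOP_WORDS):
    """Return the n most frequent words in text, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stop_words)
    return counts.most_common(n)

# Hypothetical sample text, not a quotation from the novel.
for word, count in top_words("Lily saw Selden and Lily smiled", n=3):
    print(word, count)
```

Note that this sketch lowercases everything, so a name like "Lily" comes back as "lily"; a cloud tool's case-display option is making a similar choice for you.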

Screen Shot 2013-02-10 at 5.00.15 PM

I wasn’t sure what to expect when I pasted my words into the Up-Goer Five Text Editor, but I probably should have been tipped off by my need to select the option “Most Common” on Word It Out.

Mirth UpGoerFive

The only words that were kicked back were names! So… does Word It Out’s algorithm function in the way that Ramsay cautions against when discussing attempts to determine an author’s style, saying that it is more likely to “demonstrat[e] the general properties of word distribution in a natural language” (11)? I suppose I can cling to some degree of differentiation of Mirth from other novels in terms of which most-common words made the cut and how large they appear in relation to each other… But still, this little realization damages my perception of word clouds’ representational abilities.
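The check that the Up-Goer Five Text Editor performs can be approximated in a few lines of Python: flag every word that is not on a list of permitted common words. The sketch below is an assumption-laden stand-in; the `allowed` set here is a hypothetical handful of words, whereas the real editor uses its own list of the most used English words.

```python
def uncommon_words(words, allowed):
    """Return the words (in original order) that are not in the allowed set."""
    allowed_lower = {w.lower() for w in allowed}
    return [w for w in words if w.lower() not in allowed_lower]

# Hypothetical stand-in for the editor's list of permitted common words.
allowed = {"one", "now", "time", "must", "know", "seemed", "thought"}
cloud_words = ["Lily", "one", "Selden", "now", "time", "must"]

print(uncommon_words(cloud_words, allowed))
```

If a word cloud is built from the most frequent words of a text, it is unsurprising that nearly all of them pass such a filter and only proper names are left over.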

Taking my now not-so-unique word list to CLAWS, I encountered a few off-putting glitches. The software could not list all of my results vertically, which is the easiest way to interpret them (it stopped halfway through word number 58), and it blatantly mislabeled a few parts of speech (“Miss” was misinterpreted as a verb). Skimming through the list of tags, I concluded that the majority of the words were nouns and verbs (though there was some crossover potential in words like “sense” or “last,” which were counted as verbs). There was one interjection, however, which was a pretty interesting find: the word “Oh.” Such an interjection can express a broad range of emotions, though in the case of Mirth, there is surely an element of wistfulness underlying many of its appearances in the text.

And, with a statement like that, what better way to dive into TAPoR’s affordances and test my theory? According to TAPoR, the word “oh” appears 102 times in Mirth (much lower than our number one hit, “Lily,” at 677 occurrences). I was also able to map its distribution in the text:

Screen Shot 2013-02-10 at 7.02.16 PM
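A distribution graph like this one can be approximated by cutting the text into equal-sized segments and counting the term in each. The Python sketch below is a guess at the general idea, not TAPoR's actual method; the function name `counts_per_segment` and the default of 20 segments are assumptions.

```python
import re

def counts_per_segment(text, term, segments=20):
    """Split text into roughly `segments` equal runs of words and
    count occurrences of `term` in each run."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division, so a very short text may yield fewer segments.
    size = max(1, -(-len(words) // segments))
    counts = []
    for start in range(0, len(words), size):
        chunk = words[start:start + size]
        counts.append(chunk.count(term.lower()))
    return counts

# Hypothetical sample text, not a quotation from the novel.
print(counts_per_segment("oh no oh yes well oh dear me", "oh", segments=4))
```

The segment with the highest count is only a pointer to where to read; as the next paragraph shows, the passage itself still has to be read to see how the word is being used.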

Of course, I rushed straight over to segment #13 (which required me to enlarge the actual reading pane, which I had shoved over in my eagerness to see the usage graph!) to see how “Oh” was actually being used in its most prominent passage. Aaaaaand, well, I was wrong. Segment #13 is a trivial conversation between Lily and another woman, filled with dismissive “Oh, Lily,” and “Oh, I don’t mean…” statements. Trying one last time, I checked out the trio of segments occurring near the novel’s (tragic) conclusion. In two of the three times “Oh” was again used dismissively BUT I was rewarded in discovering that both utterances were steeped in tragic irony – the first occurs during Lily’s last conversation with Selden, where she says, “There is some one I must say goodbye to. Oh, not you—we are sure to see each other again,” (SO MUCH POIGNANCY!) and the second dismissive “Oh” is again spoken by Lily in response to an acquaintance’s declaration for her little girl: “Wouldn’t it be too lovely for anything if she could grow up to be just like you?” The scene continues (Lily’s last conversation before her death):

Lily clasped the child close for a moment and laid her back in her mother’s arms. “Oh, she must not do that—I should be afraid to come and see her too often!” she said with a smile; and then, resisting Mrs. Struther’s anxious offer of companionship, and reiterating the promise that of course she would come back soon, and make George’s acquaintance, and see the baby in her bath, she passed out of the kitchen and went alone down the tenement stairs.

Final Thoughts:

Throughout my interaction with the programs discussed above, I found myself unable to resist finding meaning within the objective results churned out by algorithms – even when I recognized the blatant ‘fails’ of the software and its proclivity toward certain sets of words. Although words like “might” and “never” are likely to be highlighted by Wordle in other texts, their appearance in the word cloud for Mirth seemed irresistibly poignant. I even found myself making connections between the emphasis of “eyes” over other physical features, such as “hands,” “smile,” and “face” – for the eyes are the windows to the soul (and Selden resists objectifying Lily, unlike her mother, other men, and even Lily herself at times). Like Ramsay intimates in his examples of ELIZA and Mueller’s lists, I felt compelled to make sense of the results given, to “teeter between confirming [my] own theories and forming new ones” (71). According to Ramsay,

Algorithmic criticism seeks a new kind of audience for text analysis – one that is less concerned with fitness of method and the determination of interpretative boundaries, and one more concerned with evaluating the robustness of the discussion that a particular procedure annunciates. (17)

Is algorithmic criticism a ‘fit’ means of engaging meaningfully with a text? Well, considering the ‘robustness of the discussion’ I just had with myself in using such programs, I would have to say yes.

Dracula: Simplicity and Survival

I’ve always loved Dracula, not because it is revolutionary in and of itself, but because future readers and their interpretations have made it so.  Bram Stoker, I am thoroughly prepared to believe, was a particularly Victorian gentleman.  That being said, I have never “dug” into Dracula, so I look forward to seeing what arises when one does a bit of literary archaeology with the text.

While we were not asked to provide our Ngram data–and in the light of the TED talk–I felt it was a good place to start. Sticking with my theme, here are my Google Ngram Viewer results:

Clearly we can see who is winning in this battle of the vampires.

Now, I know this may not seem particularly fair–after all, Edward Cullen has hardly appeared on the vampire map as of yet. It did, however, warm my heart to see that nothing has diminished Dracula’s ever-growing popularity as a literary figure. There is a little bit of a dip in the past few years–I blame the dreadful Keanu Reeves film for that stumble–but all in all a steady climb. In fact, I was surprised to find how long it really took for Dracula to get off the ground, and I would be interested to know what sort of research into the real man Dracula (as opposed to the fictional vampire who stole his name) caused his little hop from obscurity in the 1820s. (Edward Cullen, I’d like to point out, appears just as much before the Twilight novels were released as after, leading me to conclude that the name appeared in other books before his rise as a vampire.)

Moving on from there, my Wordle word cloud:

Dracula's language really doesn't look terribly haunting like this.  The words are all painfully simple.

I find it interesting that even with Wordle supposedly removing commonly used English words from its cloud, the result is exceptionally boring. No evidence of complex language in the least and nothing particularly atmospheric either. I would have at least expected vampire to make an appearance in the cloud–or even Dracula–but the result is more than a little disappointing. And once again, even in Word It Out, this is not Dracula’s shining moment:

Let's just say I would never provide someone a word cloud in order to entice them to give Dracula a try.

The result looks closer to the vocabulary on an elementary school spelling test than the palette of a novel.  One might even suggest, based on the two clouds, that the novel be called Van Helsing as he makes a far more clear impression on the clouds than either “Dracula” or “vampire” manages.

As one might well expect from these word clouds, Up-goer Five Text Editor has very few stumbles at all–even after one permits Wordle to remove the most common English words from the cloud.  The real stumbling blocks for Up-goer Five are names (such as Lucy, Mina, Arthur, Jonathan, Van Helsing, and Harker), titles (such as Dr., Madam, Count, or Professor), and a few stray words (some obviously antiquated such as whilst and till and others, which came as more of a surprise such as terrible, poor, and thin).  The language of Dracula appears on the whole to be quite simple and common, indeed–certainly nothing Dickensian here.

Even the CLAWS part-of-speech tagger suggested that the language of Dracula was far from complex and showed a most un-Victorian and un-Gothic abhorrence for description and complexity. All of Dracula appears to be made up of nouns doing things either in the past, present, or future with little attempt at describing where, when, or how the action is taking place. Further, there was only one conjunction (“whilst”) tagged among the output of the word clouds. Again, all this argues for a lack of complexity in Bram Stoker’s language choices. Even if one could argue that Stoker may simply have employed a wider and more varied range of words–thus discounted from the word cloud–the fact that “and” doesn’t even appear in the Word It Out cloud (which did not remove the most common English words from the results) would appear as evidence of the relative simplicity of language within the text.

TAPoR was causing me difficulties and so I then moved on to Voyant, with which I was at least passingly familiar. The results were, once again, surprising to say the least. The cloud it provided was almost entirely made up of the most basic of language (it, he, she, they, then, etc.), of which only one word was over four letters in length: “which.” Turning to the word trends, I plugged in “Dracula,” “vampire,” and “Van Helsing.” Judging by the results, the book’s title, Dracula, is a misnomer. Even its prepublication titles, The Dead Un-Dead or The Un-Dead, would have been a gross mistake. To call it a book about a vampire might even appear to be presumptuous. According to Voyant, Dracula really ought to be called Van Helsing, who–once on the scene–has a soaring relative frequency.

For much of the novel, neither vampires nor Dracula are mentioned.  Van Helsing, however, seems to make a rapid climb to popularity and stay at the heart of the novel from that point forth.

Thinking that perhaps Stoker had preferred the term “un-dead” or “dead” over “vampire,” I added both those terms to my graph with little change.  While slightly more popular throughout the text was the term “dead” over “vampire,” even that hardly ever rose higher than the 0.3 mark.
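A trend comparison of this kind can be sketched in Python by dividing each term's count in a segment by the number of words in that segment. This is only an illustration of the idea of relative frequency; Voyant's own scaling and segmenting may differ, and the function name `relative_trends` is an assumption. The sketch matches single words only, so a two-word name has to be tracked by one of its parts (here "helsing"), and a hyphenated term such as "un-dead" is split in two.

```python
import re

def relative_trends(text, terms, segments=10):
    """For each term, return its share of the words in each segment."""
    words = re.findall(r"[a-z]+", text.lower())
    size = max(1, -(-len(words) // segments))  # ceiling division
    trends = {term: [] for term in terms}
    for start in range(0, len(words), size):
        chunk = words[start:start + size]
        for term in terms:
            trends[term].append(chunk.count(term.lower()) / len(chunk))
    return trends

# Hypothetical sample text, not a quotation from the novel.
sample = "dracula came helsing spoke helsing left helsing stayed"
print(relative_trends(sample, ["dracula", "helsing"], segments=2))
```

Because each value is a share of a segment rather than a raw count, terms can be compared fairly even when segments differ slightly in length.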

In and of themselves, these tools may not prove much–or at least “the effect is not the immediate apprehension of knowledge”–however, the conclusion that I would draw from the data is as follows: Dracula is not a complex novel. Its direct and uncomplicated language reflects the values of its solid, stalwart, and sensible middle-class men of the “modern era” with their modern inventions (such as the typewriter and stenograph) and science (such as blood transfusions). Further, while Stoker may have forced Dracula (and his fellow vampires) to recede in the face of Professor Van Helsing, hero and true main character of the novel, Dracula refused to die. I tested out the following:

Van Helsing may have succeeded in ridding the fictional world of his foe, Dracula, but in the real world, Dracula thrives.

It is clear that Stoker created a character that need not have appeared solidly throughout the novel to have a lasting impression on the reader.  Dracula’s ever growing popularity is proof of this.  So, perhaps, it is right after all that the novel be named for a character that does not even appear for much of it; for, in reading it, it is not Van Helsing who captures one’s imagination, but the vampire, Dracula.  He lives on, healthy and well-loved, in the modern world while Van Helsing struggles in his shadow.

Download and Read: Augustine’s Confessions Online

For this exercise I wanted to choose something with a long and complex history that would be relevant to my interests, but which also had enough cultural significance to be of interest to a wider audience. I settled on Augustine’s Confessions, his autobiographical masterpiece written at the end of the 4th century, in which he recounts his early life and conversion to Christianity. As with any work written before the age of print, the Confessions came to life and first circulated in manuscript form (examples of which can also be found online, for example this digitized microfilm of Troyes, Bibl. mun., 473, and this digital facsimile of a Villanova MS). The work made its way into print at an early date, and was translated from the original Latin into English at least as early as the first half of the 17th century. Perhaps the most accessible edition of the Latin text is that in the Patrologia Latina (32.659 ff.). The PL (the publication of which in the mid-19th century has to be one of the most successful acts of serial plagiarism ever perpetrated) retains its relevance today as a kind of least common denominator of editions. But as you might expect, over the years there have been numerous editions of the text, not to mention translations into various languages. Not knowing what other sorts of exercises might be in store in the coming weeks for the texts we choose to investigate, I decided to focus my efforts on the English versions of the Confessions. Rather than attempting to compile a comprehensive survey of all the versions that might be out there on the web, I decided simply to approach each of the four search interfaces with some common terms (viz., author: Augustine; title: Confessions; and where possible limiting the results to English language hits available in full text), and see what each one returned.

Project Gutenberg

Project Gutenberg returns just four hits in response to a search on “Augustine Confessions”, including one hit each of the English and the Latin text, as well as two anthologies that contain excerpts from the text.  Gutenberg’s English text (available here) was first released in 2002, and is a version of the translation of Edward Pusey from the Library of the Fathers series, a series of translations of patristic texts published in the 19th century by members of the Oxford Movement of High-Church Anglicanism. This is an influential translation, and it will make repeated appearances below.  The text is available in six formats: HTML, ePub, Kindle, Plucker, QiOO, and plain text (UTF-8).  The HTML version is XHTML, and seems to have been carefully proofed. This version also contains some useful additional encoding such as paragraph numbering.  The text can be read online, or downloaded in any of the six available formats. It is in the public domain, and is here released under a Project Gutenberg license, which allows the end-user to use the text for just about any non-commercial purpose.  There isn’t any obvious way to mark up or otherwise correct the text and re-submit it back to the project.

Google Books

Searching Google Books using the terms described above returns 25 hits when limited to those available in full, ranging in date from 1770 to 1912.  Closer examination reveals that many of these 25 are in fact duplicates, and others are irrelevant volumes of a multi-volume series (The Nicene and Post-Nicene Fathers [NPNF]), only one volume of which contains the text under investigation.  Among the ‘good’ hits are a copy from the Loeb Classical Library; a reprint of Pusey’s translation in a series called the Harvard Classical Texts; and a translation by Charles Pilkington in the aforementioned NPNF series.  The volume edited by Temple Scott (1900), which was scanned from a Harvard copy, is in fact a re-issue of Pusey’s translation, while the translation by W. H. Hutchings, scanned from a copy in the Bodleian, purports to be a new translation, albeit of only ten books rather than the full text’s thirteen (the final three books of the Confessions, more philosophical than autobiographical, are sometimes left out). There are a number of other versions available on Google besides these.  The volumes on Google are available in a variety of formats both directly on Google books and through the Google eBook feature, including several formats designed for e-readers and for online reading.  While the quality of those that rely on page images is generally good, the OCR versions remain quite error-laden.  For example, this passage chosen more or less at random: “$e tnbefff$s against tTie SOone of enucatinp; f&e BUT woe to thee, thou torrent of human custom!” (p. 23).  If one examines the page-images it is immediately apparent why the text is so corrupt in the first half of the passage — it is an epigram printed in a gothic font. 
But because of anomalies like this, the poor quality of the OCR would make it difficult and dangerous to use Google’s text for any serious purpose (over and above the fact that there is no obvious easy way to download the entire book in plain text format). There are some nice features of the Google reader such as the ability to create notes and mark up the text with highlighting in different colors. Google’s terms of service would seem to allow download and reuse of their content in a variety of forms.
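One rough way to gauge how corrupt an OCR text is, before committing to it, is to count tokens that look mechanically garbled. The Python sketch below is a crude heuristic of my own devising for illustration, not a feature of Google Books or any OCR package: it flags tokens containing symbols or digits and tokens with capitals in the middle of a word. It cannot catch misread words that still look like plain lowercase words, so it will understate the damage.

```python
def looks_garbled(token):
    """Heuristic: symbols or digits inside a word, or stray interior capitals."""
    core = token.strip(".,;:!?\"'()")
    if not core:
        return False
    letters = core.replace("-", "").replace("'", "").replace("’", "")
    if not letters.isalpha():
        return True  # contains a symbol or digit such as $ or &
    # Interior capitals (e.g. "tTie") unless the whole word is upper case.
    return core[1:] != core[1:].lower() and core != core.upper()

def garbled_share(text):
    """Fraction of whitespace-separated tokens flagged as garbled."""
    tokens = text.split()
    if not tokens:
        return 0.0
    return sum(looks_garbled(t) for t in tokens) / len(tokens)

# Short hypothetical string modeled on the kind of OCR noise quoted above.
print(garbled_share("$e against tTie of f&e BUT"))
```

A high share on a sample of pages would be a signal to fall back on the page images or on a proofed transcription instead.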

Internet Archive

Perhaps the most interesting and unique offering at the Internet Archive is the very first one among the initial hits: a complete audio book from Librivox.  The experience of listening to the Confessions read aloud probably more closely approximates how the text was experienced through much of its early history, when even private reading was often done aloud, than many of the printed versions.  Many of the versions available on IA are copies of books digitized by Google. Pusey’s and Pilkington’s translations are here, but also a version of the text translated into Hebrew that I found on no other site. The IA’s versions are available for free download in a variety of formats, including formats for various e-readers, as a PDF, and as a single plain text file.  Unfortunately, the plain text version is full of OCR errors (not least the common failure to segregate headers, footnotes, and main text), and would require significant clean up to be useful for any serious purpose.  Many of the IA books are listed as not in copyright or with no known copyright restrictions, and can be downloaded freely in various formats.  In addition, descriptive information about the scanned books can be contributed by users through openlibrary.org, and problems can be reported to IA through a link on their site.  IA’s online reader is perhaps the best interface of any of the available online readers.

Hathi Trust

Finally, a search of the Hathi Trust using the same terms described above returns 19 hits, including many of the same translations available via Google (in fact, the watermarks reveal that many of these are Google’s scans). As one might expect, the metadata for Hathi Trust books are generally fuller and more precise than Google’s. Another useful feature is the ability to download citation information. Plain text is available, but only on a page-by-page basis, and even the PDF download of a full book in the public domain requires authentication. According to the access and use policy, users are asked not to use the Google-digitized books for commercial purposes or to re-host them, but the books are otherwise free to use for non-commercial, educational purposes.

In conclusion, I would note that the plain-text version of Pusey’s translation available through Project Gutenberg is probably the most useful of all the free online versions of the text, simply because of its flexibility.  None of the foregoing discussion takes into account the accuracy of either the translations or of the editions upon which they were based.

The Scarlet Ebook

I selected Hawthorne’s The Scarlet Letter for three not so exciting reasons. 1. I have the book on hand. 2. Nearly all of the books I am interested in or enjoy come after the public domain works. 3. I happen to enjoy this one.

With that out of the way, The Scarlet Letter is available on all four resources: Project Gutenberg, the Internet Archive, HATHITrust, and Google Books. Let us go down the list and see what we have here.

Project Gutenberg offers the text in a variety of formats: HTML, EPUB (no images), Kindle, Plucker, QiOO Mobile, and Plain Text UTF-8. It isn’t clear what edition of the text the HTML version is based on, only that this version of the ebook was first released in 1992, was produced by Dartmouth College, and was updated in 2005. The HTML version contains all of the materials you might find in a print version of the book, such as biographical information, a list of works, and an editor’s note, but as this is HTML, there was no effort here either to make the text itself resemble a printed book or to take advantage of some of the possibilities of the ebook format.

A few of the other formats seem unfamiliar to me, and others require programs or e-readers to view. Alas, being a non-Kindle user, I moved on to the online reader, which divides the novel into pages, serving as an alternative to scrolling through the text. But the online reader does little else to mediate or alter the text.

The Internet Archive provides what appears to be three versions of the manuscript, but on closer inspection they are all identical copies of the HTML format of The Scarlet Letter taken directly from Project Gutenberg. The site provides a space for reviews (presumably for opinions on the quality of the e-copy or perhaps even the novel itself). It is also interesting to know that the novel has been downloaded 1,848 times.

Typing The Scarlet Letter into the search bar of HATHITrust yielded 931,602 results. Whoa. Could I narrow this down? I clicked the option for “full text only,” and with my results narrowed, I happily clicked the search button only to be bombarded by 480,863 results. Hm. What if I clicked “Nathaniel Hawthorne” as the author? That brought me down to 720 results. Perhaps my search was still off, but I decided that this was the best I was going to get.

I apologize for not having mustered the time or the patience to search through 720 results, although I suspected that the correct items would be found on the first page. First, a word on the functions of the site: HATHITrust provides a few limited options of viewing the text, but these only amount to zooming and flipping pages (or scrolling). The search function is quite nice and works well, although any Word or PDF file has this capability.

Going right down the list, the first selection brought me to a scanned copy of the 1889 Boston Houghton, Mifflin and Company version of the text, featuring black splotches and lines, and even a Due Date card in the back. In all other respects, however, this appeared to be a fairly well-done copy, and I would rather download a PDF of something that resembles a book rather than an HTML version that appears like a poorly designed web page.

How did the other copies fare? Well, it turns out many of them were duplicates, but one version caught my eye: The Scarlet Letter “with illustrations of the author, his environment and the setting of the book; together with a foreword and descriptive captions by Basil Davenport,” published in 1948. And the illustration? Well, it scanned quite well, I suppose. Hawthorne does sport his mustache with pride.

Finding most of the copies of HATHITrust in respectable shape, I moved on to the last resource: Google Books. Having already sorted through Project Gutenberg’s wide variety of formats, The Internet Archive’s borrowing the most simplistic format (HTML) from Project Gutenberg, and HATHITrust’s large quantity of nearly identical copies (available for download as PDFs), I was ready for whatever Google Books had in store.

Typing “The Scarlet Letter by Nathaniel Hawthorne” of course yielded many, many results, but I could see right away that only one was an actual copy of the text. Here I found a scanned copy of the text from the 1898 Doubleday and McClure Co. edition. And yes, this one also features a stunning illustration of Nathaniel Hawthorne and his mustache. Google Books gives you the option to download the book in Plain Text, PDF, and EPUB formats. The quality of the copy itself is quite good, from what I can tell. But more importantly, Google put some effort into supporting a few distinctive features. In addition to the search function, clicking a chapter title in the table of contents will bring you to the correct page. This is a long way from a hypertext version of the novel, but Google certainly took a step in the right direction.

Ultimately, I was not overly impressed with any version of the text, although I did not experience any of the extreme formatting issues Duguid encountered while researching Tristram Shandy. Moreover, as all copies are free to use for whatever purposes you may desire, I suppose I shouldn’t be one to complain. Google Books provided the most impressive copy of the text, even though I would still prefer my own hard copy of the novel to a scanned e-copy with a search function. I consider my $4 well spent. I can imagine a more robust hypertext version of The Scarlet Letter, but perhaps that is a blog post for another day.

The Idea of an E-Book

I’m sorry, I just can’t come up with the great posting titles the rest of you do.

The first book I looked for was Lux Mundi (1890), a collection of Anglo-Catholic theological essays edited by Charles Gore. My reason for doing so was practical, since Travis Brown and I are using scanned images from this book, fed through OCR tools like Tesseract and OCRopus, for the ActiveOCR project at MITH. I won’t say the book was chosen at random, but close to it. Travis wanted something from the late 19th century, and suggested that I search for everything in the Hathi Trust collection published in 1890.

The fact that the only other collection it appears in, however, is Google Books rules it out for the purpose of this assignment.

Deciding to stick with the theme of 19th century divines, I looked for John Henry Newman’s The Idea of a University, and found it on Project Gutenberg, the Internet Archive, Hathi Trust and Google Books.

As several others have noted, Project Gutenberg provides the most formats and the least provenance information. The book is available in HTML, EPUB, Kindle, PDF, Plucker, QiOO Mobile, Plain Text UTF-8 and TEI. All of these are in addition, of course, to the Online Reader. Some of these formats seem a bit obscure to me: I had to look up Plucker (apparently an e-book reader for PalmOS devices), and QiOO (I’m guessing a reader for Android phones, since it’s Java-based, although they didn’t use the name Android). I fired up the oXygen editor to take a look at the TEI file, and it appears to be TEI (P5?) Lite with a Project Gutenberg-specific modified DTD. Although there are credits for the people responsible for preparing the files for Project Gutenberg, there is no information about which printed text(s) provide the basis for the electronic text.

I got 26 results when I searched the Internet Archive for The Idea of a University by Newman. One of these results was for the Project Gutenberg record, which offers the book in several formats not immediately visible on Project Gutenberg’s own page, including DAISY Digital Talking Book and DjVu (pronounced “déjà vu”; this is a format for scanned documents that its promoters, although I suspect few others, consider a competitor to image PDFs). There were also at least three results (one may have been a duplicate) from Google Books (digitized from the University of California, Harvard, and New York Public Libraries).

I chose to look in detail at one (26 was way too many) that was contributed by “Kelly – University of Toronto”. While my first reaction was that “Kelly” might be an individual, a Google search indicated that it is a reference to the John M. Kelly Library at the University of St. Michael’s College, a Catholic university that has an institutional relationship with the public University of Toronto. This version was available in Full Text, PDF, EPUB, Kindle, Daisy and DjVu formats. The document is in the Public Domain. There is no apparent way for users to report or correct errors. This is probably as good a place as any to note that I find the default online reader, which navigates through the text by “turning” pages, incredibly annoying. This is as misguided as the attempts of late 15th century printers to recreate the look of manuscripts in printed texts.

(This is as far as I’m going to be able to get before class, but I will update the post later with the information on the Hathi Trust and Google Books sites.)

 

Pride for Google Books, Prejudice for HATHITrust

As a Kindle user, and more importantly, as someone who plans to work in digital publishing, I found this exercise very informative.  I initially attempted to find my favorite book, A Prayer for Owen Meany by John Irving, but it was only available on Google Books.  So, onto a favorite I knew would be a more viable option: Jane Austen’s Pride and Prejudice.

I am fairly familiar with public domain books, as I have downloaded many from Amazon.com for classes.  In fact, I have Pride and Prejudice via free download on my Kindle.  I was not, however, familiar with the answers to any of the questions Professor Kirschenbaum asked us to investigate.

Pride and Prejudice was available on all four platforms: Project Gutenberg, the Internet Archive, the HATHITrust, and Google Books.  With so many options to choose from, I dove into Google Books to see what I could find about the provenance of the book.  Where to begin?  There are seven versions available on page 1 of the initial search alone!  A sampling includes editions from Harvard dating from 1962 but copyrighted in 1918, Lenox Library with an 1853 copyright, and even a version from an imprint located in our neighbor Rockville, MD from 2008.  Some copies can only be read on the Google Books website, but others have PDF and EPUB versions available.  From experience, I know PDFs can easily be transferred to an e-reader.  Thus, with the PDF, the reader now has four options on how to read: on the computer, printed out, on a smart phone, or on an e-reader.

The graphics and formatting were retained in all of the versions I researched.  Additionally, all of the versions I opened had a search feature.  Only a few had the option for reading the text in a more user-friendly way.  Some had options of reading one page at a time, side-by-side as you would a hard copy, and via thumbnails.  You can even save the book to your own online library.  As far as highlighting, Google Books had at least one version where you could create clippings and share them via social media.  For additional social media options, you can write your own review.  I did not, however, find a place where you can write about errors, nor did it seem there were any restrictions to usage, despite a Terms of Service.  Overall, Google Books was very user-friendly and provided a variety of ways to personalize your reading experience.

Next it was on to Project Gutenberg.  I was overwhelmed from the get-go when my search returned 29,141 downloads.  Further investigation led me to realize this was how many times the book had been downloaded for free.  From here I was given a variety of ways I could view and download, from HTML to QiOO Mobile, something I’ve never even heard of before.  I clicked on the very first HTML link.  There, I was greeted with an interesting message:

This eBook is for the use of anyone anywhere at no cost and with
almost no restrictions whatsoever.  You may copy it, give it away or
re-use it under the terms of the Project Gutenberg License included
with this eBook or online at www.gutenberg.org

This intrigued me because, in my former life as a TV producer, there were restrictions on everything from music videos, to still images, to movie clips.  Everything came with a price and specifications as to how it could be used.  But what really got me was the release date of August 26, 2008 and a note that the version had last been modified November 5, 2012.  Surely Pride and Prejudice has not changed in the 200 years since it was first published.  But then, at the very bottom of the site, the full terms of license are listed.  Again, there were two interesting passages:

Updated editions will replace the previous one--the old editions will be renamed.

How are these editions being updated?  Why do they need to be updated?  What is being modified?  My questions were endless.  Then:

You may use this eBook for nearly any purpose such as creation of derivative works, 
reports, performances and research.  They may be modified and printed and given 
away--you may do practically ANYTHING with public domain eBooks.

This clause shocked me!  If you can do anything with public domain books, can we trust that we are getting the book as it was intended?  Are we getting the whole book, or some annotated version?  Because it can be modified in any given way, it seems as if we are given license to recreate the book to our liking.  Forget Elizabeth ending up with Darcy, let’s just change it around to have her wind up with the abhorrent Wickham.

The one option I found especially interesting on Project Gutenberg was the availability of a QR code, so that the user can scan it with their smart phone and automatically download a version to their mobile device.  PG also offers a link to “mirror sites,” which are mostly international universities offering the same version of Pride and Prejudice for download from their university library.  I found this to be disappointing because I was hoping it would offer me a version of the book translated into other languages, but it did not.  While it first appeared that PG was going to offer many versions of the book, all formats led to the exact same version, which is much different than the variety offered on Google Books, but also gave me a sense of faith that perhaps Pride and Prejudice wasn’t being mangled by users doing anything they want with the text.

The Internet Archive initially appeared to just cull the books that had been digitized elsewhere.  In fact, various versions specified they came from Google Books (the same Harvard version mentioned earlier) and Project Gutenberg.  Despite the versions being the same, the Internet Archive had a much more user-friendly format.  If you desired to read the book online, it immediately led you to a side-by-side page layout that animated the page turning as you flipped pages.  It also allowed the graphics to be seen more clearly.  One of its more unusual features was that the version was available as an audio book.  However, the audio was very computerized and it attempted to read aloud quotation marks and other punctuation.  While none of these features change the text, somehow it made it a little more enjoyable to know of the bells and whistles available.

The Internet Archive also offered your basic search functions, download options, and a place to write user reviews.  Strangely, the terms of use have not been modified since 2001.  That is surprising given how much has changed in the digital humanities in the past 12 years.  It did, however, give an email address to contact someone about copyright information.  One of the versions even had a link to an editable page, where you can edit the book.  Thus far, only eight users had done so since 2008.  I guess people aren’t as inclined to mess with classics, until you have the bright idea to write Pride and Prejudice and Zombies, as Seth Grahame-Smith did.

Finally, it was on to HATHITrust.  As soon as I clicked on the page I knew it wasn’t nearly as user-friendly and complete as any other online library.  The initial results only returned options that would search for the book in hard copy at nearby libraries.  It was the eighth result that was actually a full-view online version.  I clicked, only to find it was the trusty ol’ Google Books version yet again.  It too had the side-by-side page flip option, but the words were so small you couldn’t read them and the zoom feature did not work.  The only way to read it online was in the traditional view.  However, it was available for PDF download just like the others.

HATHITrust also had what I’ve now come to realize are the basic features of an online library: document search, a personal online library, and a way to share links from the book on a social networking site.  It did have a more prominent feedback link for users to share how they found the quality of the text.  One reportable problem is missing parts—perhaps they got an editable version.

Overall, Google Books and the Internet Archive had the best sites, in my opinion.  Either way, I think it’s great there are so many classic books available to readers so easily.  No matter which site was chosen, the reader was going to get a legitimate copy of Pride and Prejudice, one of the most beloved books of all time.  As for me, I’ll stick to my Kindle for reading digital books for now.  However, a hard copy version will always be my first love.