The Prejudice of Stripped Texts

To start this week’s exercise, I decided to have a little fun, kind of like stretching before a big workout. Using Google’s Ngram Viewer, I compared the heroine of my chosen text, Pride and Prejudice’s Elizabeth Bennet, to her modern-day counterpart, Bridget Jones, with whose diary we are intimately acquainted. Because Helen Fielding has openly admitted to basing her characters on Jane Austen’s (especially Mark Darcy on Mr. Darcy), I thought it would be interesting to see how else they compare. I was surprised to see how Miss Bennet’s popularity waned for so many years and then, at the turn of the century, increased and hasn’t stopped since. Additionally, I was surprised to see that Bridget Jones’ popularity peaked higher than Elizabeth’s ever did.

Ngram Viewer

Then on to the hard part of the workout: creating a definition for digital humanities. And not just any definition, but one with strict boundaries. My humble result is below.

DH Definition

 

Wordle vs. WordItOut

While I generally consider myself a hands-on learner and quick on the uptake when it comes to basic computer programs and technologies, I found this week’s exercise to be more than a little frustrating. Wordle would not allow me to insert the Project Gutenberg (or any other) link to get my word output, which resulted in me copying and pasting the book in its entirety into the “Paste in a bunch of text” box. Oh, I pasted in a bunch of text alright! Finally, I got this beauty:

Wordle

Then it was time for WordItOut, which was a much quicker task after figuring out Wordle’s quirks.

WordItOut

I actually took the time to try to make the two look as similar as possible in coloring for easier comparison. I think Wordle has WordItOut beat in basic aesthetics, but otherwise the results were nearly identical. I was very surprised to see “Mr.” was the word most used throughout Pride and Prejudice. Despite being the nineteenth century’s chick-lit by a female author, it is clear that it was still a man’s world at the time of writing and publication. However, the word “Elizabeth” does run a close second, which is a bit refreshing.

 

Up-Goer Five Text Editor

Next up, the commonality of words. It appears things haven’t changed much in 200 years since Miss Austen put pen to paper. In fact, other than proper names, only four words she used were not in the top 1000 words of Up-Goer Five: indeed, pleasure, till, and manner. However, this made me curious what the results would be if basic words like came, made, most, and go were not allowed to be analyzed. I was surprised at pleasure being so widely used. It’s not a word I hear used often, and it seems the connotation has changed over the years.

Up-Goer Five

 

CLAWS

CLAWS was my least favorite of all the sites. To me, it did not lay out the results in a clear, easy-to-read manner. It was also counterintuitive that the key wasn’t listed on the same page as the results, so that you had to toggle back and forth between pages. Additionally, this seems more like it would be useful for grade school children learning grammar than it would be for any other purpose.

CLAWS

 

TAPoR

When it came to TAPoR, I wasn’t nearly as interested in the HyperPo abilities as I was in the program’s ability to run lists of words and compile how many times each word occurs in the text. The word “Elizabeth,” which appeared to be a close second to “Mr.” in the Wordle, is actually used 200 fewer times than “Mr.” Furthermore, I was particularly interested in the listing ability for two reasons. First, Stephen Ramsay writes extensively on the tf-idf formula and how its findings affect critics when looking for patterns in a text, which I found intriguing. Second, in Italo Calvino’s If on a winter’s night a traveler, a character tries to categorize and determine the genre of books based solely on the words that recur and appear the most in a given work. It’s an interesting thought, trying to decide what a book is about by reading it not for its sentences but for the words it features.

TAPoR
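For readers who have not met tf-idf, here is a minimal sketch in Python. It assumes one common textbook variant (a word's count in a document multiplied by the log of the total number of documents over the number of documents containing the word), which may not be the exact form Ramsay works with. The function names and the three tiny "documents" are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words (a crude tokenizer)."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def tf_idf(documents):
    """Return one {word: score} dict per document.

    Assumed variant: score = tf * log(N / df), where tf is the raw count of
    the word in the document, N is the number of documents, and df is the
    number of documents that contain the word.
    """
    token_lists = [tokenize(doc) for doc in documents]
    n_docs = len(token_lists)
    doc_freq = Counter()
    for tokens in token_lists:
        doc_freq.update(set(tokens))  # count each word once per document
    scores = []
    for tokens in token_lists:
        counts = Counter(tokens)
        scores.append(
            {w: c * math.log(n_docs / doc_freq[w]) for w, c in counts.items()}
        )
    return scores


# Invented toy "documents", not real passages from any novel.
docs = [
    "the whale and the ship",
    "the ball and the dance",
    "the letter and the whale",
]
scores = tf_idf(docs)
```

Under this variant, a word that appears in every document (like "the") scores zero however often it is used, while a word confined to a single document rises to the top of that document's list.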

 

While all of these sites were fun to play with and produced interesting results, I think they ultimately take away from the true meaning of what a book is hoping to convey. Making a book a thing of quantitative results removes the reader’s ability to interpret the text for himself and to engage in the nuances the author has created with grammar, punctuation, and voice. The only work that comes to mind that would benefit from these results would be Gertrude Stein’s “Portraits and Repetition,” where her goal is to use the same words as many times and in as many ways as possible. As Ramsay himself writes:

“It is one thing to notice patterns of vocabulary, variation in line length, or images of darkness and light; it is another thing to employ a machine that can unerringly discover every instance of such features across a massive corpus of literary texts and then present those features in a visual format entirely foreign to the original organization in which these features appear” (Ramsay 16).

I couldn’t agree more. Just as Project Gutenberg states that anything may be done with a public domain text, which may result in the text being changed in ways that dissolve its power and purpose, stripping it to just its words changes it too.

From Hell’s Heart I Graph At Thee!

The idea of quantifying Moby-Dick is simultaneously exciting and perhaps not altogether surprising given some of the results returned by the tools we were instructed to use. The novel is packed with Shakespearean language, is about a very specialized topic (whaling), and is formally very odd in places. But that, of course, just means Moby-Dick is an ideal text for these sorts of experiments, right? Let’s see…

First, I ran Moby-Dick through Wordle, resulting in this diagram:

Secondly, WordItOut:

The most obvious difference between the two is the choice for the largest word. In Wordle, ‘whale’ and ‘one’ are, unsurprisingly, the largest words represented on the image. WordItOut, however, displays ‘all’ as its largest word, with ‘whale’ and ‘one’ the runners-up. The word ‘all’ is not represented on Wordle’s image, meaning it is cast aside in that program as too common a word to be of any use. Now, I do see the logic in this decision in some form; ‘all’ is a common word, and sometimes can be used as a needless intensifier or a purely quantitative word. In this case, however, I contest Wordle’s decision; in Ahab’s final monologue he explicitly describes Moby Dick as “all-destroying” as he speeds, harpoon in hand, towards the beast that is destroying his ship. The ‘all’ in this case is not just a simple word; it’s an intensifier, certainly, but it represents Ahab’s life (the whaling trade) and Ahab himself (his soul has been scarred and his body maimed). It is possible to read this word with more than the mere commonality ascribed to it by Wordle’s software.

Secondly, the major characters of the novel are mentioned: Queequeg, Stubb, Starbuck, and Ahab, but there are some missing. Ishmael is gone despite being the narrator, but aside from the opening sentence, his name is barely mentioned if at all (mostly only annotations ever recall his name). More interesting, though, is the absence of one of Ahab’s right-hand men: Flask. Naturally, this means he is mentioned less, or at least referred to by name fewer times, than the other mates of the Pequod, but perhaps this opens up a line of inquiry to pursue: why are Starbuck and Stubb getting so much attention as to appear quantitatively more visible?

Next, I placed the contents of the word cloud into the Up-Goer Five, receiving the expected list of forbidden words:

Stubb, stub, brush, check, end, point, boats, captain, sperm, sea, ship, thou, nor, boat, Ahab, ye, whales, deck, Queequeg, Starbuck, chapter, whale, among

This list can be divided easily into three categories: names (Stubb, Ahab, Queequeg, and Starbuck), archaisms (thou, ye, nor), and nautical terms (stub, brush, check, point, boats, sperm, sea, ship, boat, whales, deck, whale). None of these are surprising to see on the list: the names are odd, the archaisms are by definition not going to be common, and our modern society relies so much less on ship-trade that the nautical terms have become scarce (though I would guess they wouldn’t have appeared in the top 1000 words in 1851 either).

The interesting remainders are end and among, which, I’ll admit, I am surprised are not within the ten hundred most used words.
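The check Up-Goer Five performs amounts to testing each word against a fixed list. A minimal sketch in Python, under loud assumptions: `forbidden_words` is a made-up helper, and the `allowed` set is a tiny hypothetical stand-in for the real ten-hundred-word list, which is not reproduced here. The real tool may also treat plurals and other word forms differently.

```python
def forbidden_words(words, allowed):
    """Return the words (in original order, without repeats) not found in `allowed`.

    Comparison is case-insensitive. `allowed` stands in for a most-common-words
    list; the real Up-Goer Five list is not reproduced here.
    """
    allowed_lower = {w.lower() for w in allowed}
    seen = set()
    rejected = []
    for word in words:
        key = word.lower()
        if key not in allowed_lower and key not in seen:
            seen.add(key)
            rejected.append(word)
    return rejected


# Hypothetical miniature "allowed" list, for illustration only.
allowed = {"one", "all", "man", "time", "old", "like", "head"}
cloud_words = ["whale", "one", "all", "Ahab", "ship", "time", "whale"]
rejected = forbidden_words(cloud_words, allowed)
```

Seen this way, whether a word like "end" or "among" is rejected depends entirely on how the underlying list was compiled, which is a question about the list rather than about Melville.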

Next comes the CLAWS part-of-speech tagger. This tool, as Mary and Dan reported, is not only less visually appealing but also harder to read for someone unfamiliar with its format. But the tool was surprisingly good at recognizing the proper nouns (Queequeg, Stubb, Starbuck, and Ahab) as such, and not returning some sort of error or even just tagging them as common nouns. Since proper nouns typically depend upon context to be recognized, CLAWS’ ability to recognize them is impressive. Aside from the names, there are mostly nouns and adjectives represented in the list, with a few prepositions (upon, among) and an interjection (oh), but fewer verbs than I expected, with only five by my count: said, cried, go, thought, and know.

Finally, with the TAPoR/Voyant tool, I found myself lucky that the first chapter of Moby-Dick was a default on the website. Unfortunately, the diagnostic returned was not all that interesting, so I went ahead and uploaded the entire text.

The cloud, or ‘cirrus’, for Voyant is prone to including “useless” words like articles, as you can see. Fortunately, while it does not take the liberty that both Wordle and WordItOut do of automatically removing certain words (and thereby removing some potentially important words, as in the case of ‘all’), it allows you to customize your list and essentially blacklist the words you do not want. Wordle also provided this feature, but removed words by default. Voyant forces the uploader to think and choose the words represented.
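The difference between a default stop list and a customized one can be sketched in a few lines of Python. This is an illustration only: the stop list below is hypothetical and far shorter than anything a real tool ships with, the sample sentence is invented, and none of it reflects how Wordle, WordItOut or Voyant are actually implemented.

```python
import re
from collections import Counter

# A hypothetical default stop list of the kind a word-cloud tool might ship with.
DEFAULT_STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "all"}


def word_counts(text, stop_words=DEFAULT_STOP_WORDS):
    """Count the words in `text`, skipping anything in `stop_words`."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if w not in stop_words)


# Invented sample sentence, not a quotation from the novel.
sample = "All the whales and all the ships sail to the sea"

default_counts = word_counts(sample)
# Put 'all' back by passing a custom stop list without it.
custom_counts = word_counts(sample, stop_words=DEFAULT_STOP_WORDS - {"all"})
```

With the default list, ‘all’ never reaches the cloud; with the customized list it becomes the most frequent word in the sample. The choice of list is an interpretive decision made before any counting happens.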

As you can see in the screenshot, the first word I selected that seemed, to me, to be worth scanning was ‘whale’, with a total of 971 uses beginning on the very first page. What is fascinating about Voyant is the multiple ways it will contextualize and build information around a single word. There are two windows dedicated to showing a frequency chart and the context around each mention, as well as tabs showing where in the corpus your chosen word (or words) appears. This helps to alleviate any suspicion, especially when dealing with an ambiguous word (unlike ‘whale’) that may have multiple uses and contexts.

Looking at the use of ‘whale’ throughout the entire book, I would be tempted to explore the periodic lulls in its mentions visible in the line graph. When the graph is given 10 or 15 segments, these oscillations are more drastic and show much more sporadic mentions of the term. Most interestingly, though, what can be seen is a steady decline in the use of ‘whale’ until the start of the final chapters of the book (the chase sequence), at which point it begins a steep incline. There is seemingly a dramatic tension recognizable in the graph through the usage of the term.
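To see why the number of segments changes the shape of the line, here is a rough Python sketch of the idea. It assumes the text is simply cut into runs of roughly equal word count, which may not be how Voyant itself segments a document. `segment_counts` and the toy text are made up for illustration.

```python
import re


def segment_counts(text, target, n_segments=10):
    """Split the text's words into `n_segments` roughly equal runs and
    count how often `target` occurs in each one."""
    words = re.findall(r"[a-z]+", text.lower())
    if n_segments < 1 or not words:
        return []
    target = target.lower()
    counts = []
    for i in range(n_segments):
        # Integer arithmetic spreads any remainder evenly across segments.
        start = i * len(words) // n_segments
        end = (i + 1) * len(words) // n_segments
        counts.append(words[start:end].count(target))
    return counts


# Invented toy text: 'whale' is frequent early, absent in the middle, back at the end.
toy = "whale whale sea ship " + "deck rope sail mast " * 2 + "whale whale whale chase"
by_four = segment_counts(toy, "whale", n_segments=4)
by_two = segment_counts(toy, "whale", n_segments=2)
```

In the toy text, four segments expose the empty middle that two segments smooth over, which is the same effect as the sharper oscillations that appear when the graph is given more segments.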

So, when I think about Ramsay’s idea of “estrangement” from textuality, I have to wonder what it is within the text, or about the text, that is the primary subject of estrangement. Is it the narrative? In every instance my initial responses have been grounded within the narrative: why is Flask mentioned less? Why is the word ‘all’ important enough that its absence from the word cloud is a significant loss? What time frame is represented by the steep incline at the end of the line graph? All of these questions are brought about because of my familiarity with the reading: a product of the close-reading focused education that insisted that I read Moby-Dick because it, singularly, is important and above thousands of anonymous books. But when it comes to the answers to my questions, are they all necessarily going to return to the narrative? Personally, it seems the temporary estrangement is merely a way of refocusing on the narrative and re-reading it, arriving at Ramsay’s purported goal: creating new information and criticism from what the algorithms can show us.

On reading, translating and making questions

I selected a group of short stories by Edgar Allan Poe for this exercise. It was difficult to work with Machado de Assis this time, because I could not find an English translation. Also, I was curious to see the particular voice of Poe’s stories, their peculiar vocabulary. But I also thought that it would be interesting to see some translation phenomena at the same time. I selected the anthology that Charles Baudelaire translated under the title Histoires extraordinaires. As I could find this edition in Project Gutenberg as well as the complete stories by Edgar Allan Poe, I decided to create a document in English with the same short stories.

It is well known that Baudelaire was the first translator of Poe into French and that this translation was very important for European literature. I wanted to see what would happen if I compared the two anthologies through Wordle and WordItOut, and then HyperPo. So I began my exercise with some extra questions: Could we get interesting or relevant information about the words that appeared in the original and the translation? Are those programs helpful tools for Translation Studies?

I began with the Google Ngram Viewer, to compare Poe and Baudelaire in their respective languages, with pretty obvious results (I must admit I spent some time playing battles between couples like Derrida/Deleuze; Godard/Truffaut, etc. with amazing results):

 

ENGLISH FRENCH

But I wanted to see what happened in Spanish, and the results were more interesting. They are published or are the subject of analysis at almost the same time! Why did this happen? Is the reception of Poe similar to Baudelaire’s in the Spanish-speaking world? Are their figures similar?

SPANISH

When I created a word cloud through WordItOut I realized that there was a list of common words that the cloud ignored, and that I could change that list as well as replace characters. Also, I could change a lot of settings, such as the number of words, order, color, etc. But when I tried to create a word cloud with the French version, I did not have the option of a foreign language, so I did it myself, adding the most common French words to be ignored by the cloud. The result was this:

WordItOut – English

WordItOut – French

 

I was surprised that most of the words were very common words, so I wonder whether analyzing these results could be interesting. The importance of the word “now” may be telling us something about the style of Poe’s short stories regarding the treatment of time. We can make multiple interpretations from this result: the question of “time” in Poe’s literature, or moreover, the question of “time” in Baudelaire’s literature. Why did Baudelaire choose these and not other stories for his first anthology of Poe’s work? Is there something behind the words?

When I used Wordle, I realized that its list of ignored words is not so big. Some common words made it into the word cloud. I noticed that this program had a filter for different languages, but the same thing happened with the French version, as I could see many words of common usage, such as “bien”, “cette” or “comme”. So, in that case, Wordle was less useful for finding meaningful results.

We have to think about one important issue: we have to customize these tools very carefully. That raises the following questions: Are we making a text say what we want it to say? Is it just another way of doing the same kind of literary criticism we already have?

When I pasted the words from WordItOut into Up-Goer, the program permitted all of them except six: “Dupin”, for it is a surname, “indeed”, “balloon”, “manner”, “itself” and “earth”. I found it interesting that most of Poe’s words were common.

UpGoer Five

Using CLAWS, I found that most of the words are nouns (I used the help of Wordle to see this in a clearer way!), adverbs, adjectives, general determiners, the “base forms” of the verb “to be”, prepositions, etc. I think it is an interesting tool when you are looking for something very specific. Again, it all depends on the questions you have, the relevance of those questions and the relevance of the results. Data just for the sake of data is meaningless.

CLAWS

Finally, TAPoR is a very interesting program. It is much more sophisticated and useful than the word cloud creators. It works with texts in French, Spanish, and German. The “voyant tools” were interesting, like seeing the frequency of certain word(s) in a graph, in context, etc. You actually can “see through your texts,” as the Web page invites its users to do. I found that “death” and “idea” appear the same number of times! And “great” and “little” are the most commonly used adjectives. It is also interesting to see the differences between the two languages. The results tell us a lot about the particularities of both languages, like the common use of the verb “to say” in English-language literature as opposed to the use of synonyms of that verb in other languages’ literatures: it is more frequent in the English version than in the French version. There is a lot of data to read and analyze here!

TAPoR: English and French

I think all these tools are useful for translators to understand some phenomena: how we translate, how some writers and some translators use a particular vocabulary, style, phrase construction, etc. I think it would be great to do that with one’s own translation and see the results, and also to compare two translations of the same work!

Conclusions

At this level (just trying new tools, not researching for any particular paper) I found curious numbers and graphs, but if I had had in mind a set of questions and hypotheses, it would have been very useful, though always depending on the relevance of the questions and responses. I think that if we have very well defined questions, there will be some interesting results. (I wonder about the difference between answers and results. Do computers answer or just give us results?) And once we have some answers from the computer, we can formulate new questions, which is the most interesting part of literary criticism, an activity that, as Ramsay says, did not change with the introduction of computers. We interpret the results that the machine can give to a certain line of research: word frequency through a book, through time, etc. As Ramsay affirms,

“If something is known from a word-frequency list or a data visualization, it is undoubtedly a function of our desire to make sense of what has been presented. We fill in gaps, make connections backward and forward, explain inconsistencies, resolve contradictions, and, above all, generate additional narratives in the form of declarative realizations” (62).

Results are results and they cannot be changed; they are a fact. But we read them and we arrive at different conclusions, even though we have the same object in front of our eyes: algorithms, data or a book. Those are just different ways that let us read a story, and reading through machines is a fascinating one that many times defies our preconceived ideas or gives us new perspectives on reading it. That feeling of learning to read again, of seeing a text in a whole different way, that “ostranenie”, is fundamental to begin asking questions, to try to find new paths to fight against commonplaces in literary criticism. It is also a way to do things with books, like snipping their pages. That is something we can do because we are working with digital texts. And digital texts have a very different substance than printed texts. With computers we can analyze just text, but not other important details that also form part of our reading of a book (and the readings that that book has had in the history of our culture), such as the book itself: how and where it was published, what its covers look like, which collection it belongs to, etc. So not everything can be read through machines, and we have to take care not to isolate the “text”, as if everything that should be read were just in the (digital) words of a text.

Through my experience working with different programs for this exercise, I realized that I was finding new questions (everything was questions! and I could not arrive at any answer at this level); I also found new ways of thinking about texts, of thinking about translations, and that is what I really like to do as a reader and as a student.

“Computer-enabled play” — hacking The Marble Faun

(I steal “computer-enabled play” from Irizarry, quoted by Ramsay on pg. 36 of Reading Machines.)

The reason I was drawn to include the phrase “computer-enabled play” in my title was that it is really what I felt like I was doing throughout the exercise: playing, fiddling, fooling around, testing out, exploring, etc. Similar to what Mary expressed, I found that some of these experiments were overwhelming (or annoying) in their unfamiliarity, but I soon discovered that if I “played” with the tool enough, I could eventually gain some insights into The Marble Faun in a new way (i.e., different insights than I would have garnered from reading the text in a traditional manner).

Wordle and WordItOut seemed especially “playful” with their fun names, bright colors and graphic visualization. As I’ll probably reiterate several times in this post, these tools did require some “fiddling,” though.

WordItOut

Wordle

For both tools, I used the full text of The Marble Faun, from the Project Gutenberg plain text online version. With WordItOut, I appreciated the function to tweak the list that was generated. I could view the list in ascending count order, alphabetically or randomly (though I’m not sure how the last two would aid in a critical analysis). I was able to increase or decrease the number of words. Also, unlike Wordle, WordItOut allows you to see the generated word list in both list and visualization form, which was helpful when I wanted to copy the list for further exercises.

As others have observed, with narrative forms (novels), names and other proper nouns seem to be most prominent. In a story like The Marble Faun, it was actually interesting to see which character was the most represented: Miriam. (The Marble Faun is sort of like modern day sitcoms that center on a group of friends–so imagine that we Wordled an episode of “Friends” and discovered that “Rachel” is the largest name–what does this tell us about the group dynamic?). In The Marble Faun, the drama centers on MIRIAM. I thought it was interesting that Hilda is the second largest name–so in a novel which actually has fewer female characters than male, the females still win out in nominal presence.

Other prominent words seemed to thematically center on art (not surprising; the characters are artists living in Rome) and time (not surprising; Hawthorne often focuses on the interplay of past, present and future reality). Wordle and WordItOut thus demonstrated for me the point Ramsay notes more than once, that at a base level, digital tools might merely confirm analyses we have already made.

Up-Goer 5

After struggling with the “define digital humanities” Up-Goer 5 challenge before beginning the exercise (how unnatural for literary scholars to prioritize simple, oft-used words over our sophisticated vocabularies!) it was interesting to use the tool to investigate a pre-generated text. I used my list from WordItOut, but removed the names as I didn’t think they would propagate any new insights (i.e., it would be no surprise that “Hilda” is not in the top 1000 used words). What I did find, though, WAS intriguing.

Of 97 words produced by WordItOut as “most used” in The Marble Faun, only eleven did not make the Up-Goer 5 top 1000 word list. These were: sculptor, Rome, marble, among, itself, whom, Roman, nor, poor, tower, and indeed. I’m guessing that Rome and Roman are too specific (proper nouns) to merit top-1000 usage, while sculptor and marble as nouns also seem too obscure (we don’t talk about sculpture very generally or often). Nor, whom and indeed are rather sophisticated uses of grammar, so their absence doesn’t surprise me. I don’t have an explanation for among, itself, poor, or tower—any thoughts?

What’s left behind (in the top 1000) is interesting when you consider the words as “topics” of interest (in the sense that oft-used words might represent broader themes): life, heart, friend, good, human, world, love, art, idea, moment. Not only are they huge topics in Hawthorne’s text, but also, apparently, in everyday speech.

CLAWS

On to CLAWS, the realm of tagging. This, I did not find playful. I was rather confused, though finding the accompanying “tagset” key was somewhat illuminating. I didn’t have the patience to count the different types of word forms, but that could have been interesting to see–were Hawthorne’s 100 most-used words from The Marble Faun mostly pronouns? (probably). Singular nouns? Comparative adjectives? Etc. etc. So I can see how CLAWS could be a useful tool, but I didn’t like the aesthetic of the list that was generated (no spacing, no counting) so, admittedly, I moved on.

TAPoR

TAPoR, while also intimidating with its unfamiliar interface, was “playful” in its potential for “fiddling,” as I previously described. The more I played around with it, the more I found ways to make it work for me. After scrolling through the word lists in the lower left-hand corner (sorted Frequency vs. Count vs. Trends) I clicked on “heart” to see what came up.

I don’t really understand the graph in the upper-right hand corner, though I know you can view two words at once to—I presume—compare frequency at various points in the book. For instance, I viewed “woman” and “sympathy” together and saw a very similar pattern, suggesting that woman & sympathy are often discussed in tandem. This is not surprising, given that Hawthorne’s romance could really be considered a sentimental novel and he’s constantly talking about the female characters’ womanhood and capacities for sympathy (e.g. Hilda is very sympathetic, Miriam not so much). What confused me were the “segments,” though I suppose you could generate the graph so that it represented chapters, if you knew how to finagle that breakdown. That way, you could see where, in the novel, topics were discussed with higher frequencies. “Heart,” for instance, skyrockets at the end of The Marble Faun, according to this “Word Trends” graph.

I also got the hang of the concordances tab and found these lists extremely interesting. Under “heart,” I could observe the following concordances:

Intimate/heart/knowledge
Hilda/heart/life
close/heart/beautiful
brain/heart/think
trust/heart/trusts
secret/heart/burns

These are only a few examples (from 89 instances of heart in the first volume of the novel) but SO INTERESTING! I’m especially intrigued by instances like “knowledge,” “brain” and “think” surrounding the presence of the heart, since we get that tension between cognition and emotion there. Trust and secrets regarding the heart don’t surprise me at all given the nature of the novel, nor does the presence of the “beautiful.” Hilda’s concordance is also not surprising—the trio of “Hilda,” “heart” and “life” is only too perfect. (I know I’m not being very clearly critical here, but I’m sure you can see the potential for developed analytical writing on these topics).

One question I had while using TAPoR concordances: TAPoR doesn’t select the immediate surrounding words, but rather “keywords.” For instance, “secret/heart/burns” comes from the sentence: “There is a secret in my heart that burns me!—that tortures me!” This is a pretty good example—we presumably don’t care about “in,” “my” or “that,” but how are the keywords chosen? Does the software just eliminate prepositions? Do we lose the presence of “torture” here? Compare to an example like “only/heart/sought.” The sentence from which this concordance is generated is, “But if it were only a pent-up heart that sought an outlet?” To me, “pent-up” seems important, while “sought” and “outlet” are equally important. So, I’m just wondering (and perhaps someone can actually tell me) how the concordances work—how are the surrounding terms generated?
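One plausible guess at an answer, sketched in Python: drop a list of stop words, then take the nearest surviving word on either side of the target. This is not TAPoR's actual method, which I have not checked. The stop list is hypothetical, `keyword_context` is a made-up name, and the first test sentence is only the opening clause of the line quoted above.

```python
import re

# Hypothetical stop list; a real tool's list would be much longer.
STOP_WORDS = {"there", "is", "a", "an", "in", "my", "that", "me",
              "but", "if", "it", "were"}


def keyword_context(sentence, target, stop_words=STOP_WORDS):
    """Return (left, target, right) for each occurrence of `target`, where
    left and right are the nearest words that are not in `stop_words`
    (None if there is no such word on that side)."""
    # Keep hyphenated compounds such as "pent-up" as single words.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", sentence.lower())
    content = [w for w in words if w not in stop_words]
    results = []
    for i, word in enumerate(content):
        if word == target:
            left = content[i - 1] if i > 0 else None
            right = content[i + 1] if i + 1 < len(content) else None
            results.append((left, word, right))
    return results


first = keyword_context("There is a secret in my heart that burns me!", "heart")
second = keyword_context(
    "But if it were only a pent-up heart that sought an outlet?", "heart"
)
```

On the first sentence this guess reproduces “secret/heart/burns,” but on the second it yields “pent-up/heart/sought” rather than “only/heart/sought.” So TAPoR is evidently doing something different, perhaps skipping hyphenated or rare words, and the question stands.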

I could really see myself using TAPoR in the future (though, again, the interface doesn’t really appeal to me and I wished I could have enlarged everything–but these are minor complaints of a whiny variety). As someone who was largely unexposed to DH tools before this class, Ramsay’s Reading Machines and our exercises have legitimately moved me “Towards An Algorithmic Criticism.” The text, in its descriptions, examples and analysis of digital tools and their impact on/interaction with literary criticism was seriously illuminating. We were prompted to consider how

“the effect is not the immediate apprehension of knowledge, but instead what the Russian Formalists called ostranenie—the estrangement and defamiliarization of textuality” (3)

regarding our experience with the various digital tools today, and it certainly applies. “Estrangement” and “defamiliarization” certainly describe my “computer-enabled play” with The Marble Faun today. We are distanced from the text when the computer intervenes, transforming prose into lists, visual graphics, concordances, and line graphs. BUT this does offer, though not immediate, new “apprehension of knowledge,” I believe. From reading Hawthorne’s prose, I do not “know” whose name appears most often in the text, even if I can guess. Conjecture becomes fact, and fact leads us to points of inquiry, new questions regarding “why?” More articulately put by Ramsay on pg. 62:

“If something is known from a word-frequency list or a data visualization, it is undoubtedly a function of our desire to make sense of what has been presented. We fill in gaps, make connections backward and forward, explain inconsistencies, resolve contradictions, and, above all, generate additional narratives in the form of declarative realizations.”

I’d like to point out a couple of other passages which stuck out to me and helped me frame algorithmic criticism this week:

“The computer revolutionizes, not because it proposes an alternative to the basic hermeneutical procedure, but because it reimagines that procedure at new scales, with new speeds, and among new sets of conditions” (31).

 

“Rather than hindering the process of critical engagement, this relentless exactitude produces a critical self-consciousness that is difficult to achieve otherwise” (34).

And, to end, perhaps a point that generates and necessitates discussion: in opposition to “ambiguity,” the computer “demands an answer” (67). Is this a limitation? Shouldn’t there be room for ambiguity in literature, even if it doesn’t fit into an automated output? Ramsay continues, “…the computer demands abstraction and encapsulation of its components” (67). Again–is a limitation present, here? Are all texts (/words/phrases/data sets) discrete, with potential to be “encapsulated?” Does the computer miss something the subjective mind would not?

Mapping Scarlet Letters

Wordle and WordItOut

To start, I decided to place the text of The Scarlet Letter into Wordle and WordItOut and compare them side by side. Here are the clouds I created. First, Wordle:

Wordle Scarlet Letter

Nothing too surprising here. And yes, I kept all the defaults. On to WordItOut:

WordItOut Scarlet Letter

I kept the layout simple for easy reading. As you can see, both word clouds are quite similar, and the content of the words isn’t surprising. Names are quite prevalent, with Hester and Pearl topping the list. Words like “Heart,” “Life,” and “Mother” get at the core issues of the novel. WordItOut represented quite a few more mundane words that don’t mean much on their own: “within,” “among,” “whether,” “indeed,” “even,” etc., while Wordle came up with more interesting results overall, though many of these words appear small due to their low frequency. In particular, the category of morality pops up, which shouldn’t be surprising for someone who has read the novel: “soul,” “sin,” “shame,” etc. Finally, while admittedly a common word, the large frequency of “One” is a bit puzzling, but something to take note of for later.

Up-Goer Five Text Editor

Up-Goer Five Text Editor is an interesting experiment in constraint. It does for the depth of language what Twitter does for length. It took a bit of rewriting before I got a definition of the Digital Humanities that didn’t seem horrible: “The use of computers in order to find new ways of doing and making while focusing on older ways of understanding.” Wow, does Up-Goer Five Text Editor require simplicity or what? Already I was ready for a few rejected words, so I put my results from WordItOut into the box and clicked enter. This is what I found:

Up-Goer Scarlet Letter

Most of this isn’t too surprising. I did not expect names like Prynne and Chillingworth to be among the ten hundred most used words. Moreover, words no longer in use, such as “thee” and “thy,” were rejected, although I was surprised and disgusted by the rejection of “whom.” You would think a word like “itself” would appear, but this demonstrates just how limited you must make your vocabulary in order to use this tool. This was an amusing experiment, and the constraint works in a similar way to Twitter’s, forcing the user to create something under set limitations.
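The basic idea of the constraint can be sketched in a few lines of Python: check each word against an allowed list and report the ones that fall outside it. This is only my guess at the general technique, not the editor's real code, and the `ALLOWED` set below is a hypothetical stand-in for the actual list of the ten hundred most used words, which I have not reproduced.

```python
# Hypothetical stand-in for the tool's list of the ten hundred most used words;
# the real list is far longer and is not reproduced here.
ALLOWED = {"the", "a", "of", "heart", "life", "mother", "one", "little"}

def rejected_words(words, allowed=ALLOWED):
    """Return the words (in order, without repeats) that are not on the allowed list."""
    seen = set()
    rejected = []
    for word in words:
        key = word.lower()  # compare case-insensitively
        if key not in allowed and key not in seen:
            seen.add(key)
            rejected.append(word)
    return rejected

cloud_words = ["Hester", "heart", "thee", "whom", "life", "Thee"]
print(rejected_words(cloud_words))  # ['Hester', 'thee', 'whom']
```

Names and archaic pronouns fail for the same simple reason: they are just not on the list.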

CLAWS Part-of-Speech

Next, we move on to CLAWS part-of-speech tagger, which is a fun experiment, but not quite as amusing as the other tools. I would have appreciated a function that groups words of the same part of speech together, but I suppose you cannot ask for everything. From what I can tell, there is actually a variety of parts of speech here, with proper nouns, reflexive pronouns (the prevalence of “self” is interesting), adverbs, singular nouns, prepositions, pronouns, and more. Could I have discovered this on my own? Probably. But CLAWS brings these facts to my attention as a way of sparking new questions or pursuing new areas of study. For now, though, I’ll leave CLAWS alone and move on to the final tool.

TAPoR

Trying out HyperPo and experimenting with different combinations of words was worthwhile. After wading through the first page of largely uninteresting words, I came across the word “One” once again. Equipped with this new tool, I decided to map out its presence throughout the text and perhaps account for it.

HyperPo 1 One

Yes, “One” is a common word and could have little significance. It could also be a particular vocabulary quirk of Hawthorne (or perhaps the era in which the book was written) to use “one” rather than “you” or “she” or “he,” or to refer back to a person. Certainly, this is the case. But there are numerous instances in which “One” serves a more interesting purpose. In sentences like “. . . deep a dye as the one betokened by the scarlet letter,” one is used to emphasize the unique suffering of Hester’s situation. Again, more likely Hawthorne uses “One” incidentally as part of his diction, but cases like these suggest the possibility of something more.

Next I decided to experiment with a more concrete idea. I selected “child” and “infant,” both of which refer to the character Pearl in the novel, and attempted to set them against each other on the graph. This did not work for some reason, so I was forced to look at them separately. As expected, “Infant” occurs almost entirely on the left side of the graph, the beginning of the novel, when Pearl is, well, an infant. “Child,” meanwhile, appears steadily throughout the novel, starting just after “infant” ends (with some overlap), with a slight dip in the set of chapters in which Pearl does not appear. This looks good. Despite the minor technical hiccup, HyperPo seems to be doing its job. Of course, in this case, I only set it to tell me something I already know, suggesting that I may not be asking the right questions. But this was a short experiment with the capabilities of the tool itself, so I have no choice but to forgive myself.

As one final experiment, I noticed that HyperPo allows you to collapse different words and view their frequency as one unit. I tried this with “sin” and “shame,” words associated with Hester’s scarlet letter:

HyperPo 3 sin and shame

As you can see, the greatest frequency of these words together occurs toward the beginning of the novel, while it fluctuates up and down before going up near the end. What can we determine from this graph alone? Perhaps the scarlet letter torments Hester most toward the beginning of the novel, during Pearl’s infancy. The passage at the end is also notable for the line,

. . . long since recognised the impossibility that any mission of divine and mysterious truth should be confided to a woman stained with sin, bowed down with shame, or even burdened with a life-long sorrow.

Focusing on these central themes at the end accounts for the tiny spike. Of course, I can verify none of this without directly consulting the novel, which further indicates the use of this tool as a form of provocation, a way of reshuffling the words of the text to raise interesting questions. In this sense, to “see through the text” involves a specific mapping which requires zeroing in specifically on finite sets of words. The experience of HyperPo is like reading a text with a powerful, magical magnifying glass that guides the reader to common and specific parts of the text. Okay, that analogy may not work as well as I was hoping, but I gave it a shot.
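The kind of graph described above, where “sin” and “shame” are collapsed into one unit and tracked across the novel, can be approximated with a short Python sketch. I am assuming here that the text is simply cut into roughly equal runs of words; I don't know how HyperPo actually segments a text, and the function name `segment_counts` and the sample sentence are mine.

```python
import re

def segment_counts(text, terms, segments=10):
    """Split the text into roughly equal runs of words and count, per segment,
    how many words match any of the given terms (treated as one unit)."""
    words = re.findall(r"[a-z']+", text.lower())
    targets = {t.lower() for t in terms}
    size = max(1, -(-len(words) // segments))  # ceiling division
    counts = []
    for start in range(0, len(words), size):
        chunk = words[start:start + size]
        counts.append(sum(1 for w in chunk if w in targets))
    return counts

# Made-up sample, not a quotation from the novel.
sample = "sin and shame at first then calm then calm again until sin returns"
print(segment_counts(sample, ["sin", "shame"], segments=3))  # [2, 0, 1]
```

Even in this toy case the shape is familiar: a cluster at the start, a quiet middle, and a small return at the end. As with the real graph, the numbers only say where to look, not what the passage means.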

Conclusions

Overall, HyperPo is a robust tool that has a lot to offer, and I have of course only scratched the surface. Wordle and WordItOut are useful for expressing a main idea or message easily and succinctly, but I imagine HyperPo could be used for more serious research.

This exercise has taught me that you must be deliberate and careful while using these tools if you want to come out with something useful. They can be used to confirm what you already know, which most would argue is quite boring. It takes a great deal of time and experimentation to arrive at a truly stunning result, and those results are the most worthwhile. These are the moments when you are able to look at a text in a new way, and this alone justifies the use of these tools.

In this sense, these tools indeed create what Ramsay calls the “estrangement and defamiliarization of textuality” by forcing the reader to view a text in an entirely different way. For all of its simplicity, Wordle’s ability to recognize and display common words presents the text in its most basic form. No, this is not the same as reading The Scarlet Letter. Not even close. But as a tool of provocation, the re-shuffling and re-organization of words could lead to new insights about the text. Perhaps HyperPo best demonstrates the capabilities of these sorts of tools for scholarship. I’m still not convinced that any of these tools can help us “Read a Million Books,” as they require the user to be familiar with the texts beforehand in order to glean useful information, but perhaps that is a topic for another day.

Neil Gaiman’s “A Calendar of Tales”

I discovered this last week when retweets by Neil Gaiman, a favorite author of mine, took over my Twitter feed. It was too wonderful not to share. What has happened is this: Neil Gaiman has teamed up with the makers of the BlackBerry 10 to create a project (very much in the spirit of the Digital Humanities) that allows readers to collaborate with Neil Gaiman as he writes. As you can see from the website, the project is entitled “A Calendar of Tales.” For each month, Gaiman produced a question to which people on Twitter were able to respond using specific hashtags: #jantale for January, #febtale for February, etc. Gaiman is now using the tweets sent out by his followers as inspiration for a series of tales that he will write (one for each month). As the next step, Gaiman will share his tales and accept submissions of illustrations, choosing one for each story, thereby making these tales both inspired by and illustrated by his followers on Twitter. And that’s the real beauty of this project: collaboration. In the video posted on this site, Gaiman talks about how the composition process is usually a rather lonely one, featuring a writer sitting in a room writing down thoughts that only he or she is privy to at the time. However, by calling upon tweeters from all over the world to share their thoughts and stories on Twitter, Gaiman is able to transform the writing process into a collaborative one. A reciprocity forms between writer and reader, allowing him to draw upon his fellow tweeters for inspiration and create stories that would have been left untold had it not been for this project.

What has William Morris to do with DH?

A brief recommendation: UMD Libraries’ Special Collections is currently featuring an exhibit  (“How We Might Live: The Vision of William Morris,” Sept. 2012-July 2013) on the life and works of William Morris, the 19th-century English author, designer, socialist, and — arguably most famously, though perhaps I’m not objective on this point — founder of the Kelmscott Press and printer of the Kelmscott Chaucer.  As a medievalist with a particular interest in manuscript studies, I’ve long found Morris’s work appealing and admired his taste — for example, what lover of books would not appreciate the discussion of the relative aesthetic merits of various typefaces and guidelines for margin widths found in his “The Ideal Book“?  That having been said, though, I never found Morris particularly relevant to my own work — that is, not until I read Bethany Nowviskie’s very thoughtful MLA talk, “Resistance in the Materials” (posted here on her blog).  Nowviskie uses a quotation from Morris as a jumping off point for discussing the role of craft and collaboration in DH, as well as for some reflections on the casualization of the academic workforce.  Not only is her essay directly pertinent to our discussion of making and building in DH, but for me reading it also gave new relevance to UMD’s Morris exhibition.  In particular, it got me thinking about the tension between the hand- and machine-crafted object in Morris’ work, and about the resonance of his attempts to translate both the aesthetics and the ethics of the hand-crafted book into the technological context of printing. In that sense his work now strikes me as particularly relevant to our moment, when at times the future of books as physical objects seems to be in doubt — not to mention the viability of a career devoted to writing and studying them. But rather than take my word for it, why not read the essay — and take in the exhibition — for yourself?

Examining the Architecture of _The Castle of Otranto_

Introduction with Ngram:

To begin my examination of The Castle of Otranto, I thought I would start with the results I found on Ngram. When we were told to use Ngram to map out two terms, I decided to go with “horror” and “terror.” I changed the dates in Ngram to start at 1700 rather than the default 1800 and mapped out the results. Here is what I found:

Ngram Viewer_Terror_Horror

Since the dawn of Gothic literature, incited by Horace Walpole’s The Castle of Otranto, occurred in the mid-1700s, I was not surprised to see such results, for along with the inception of Gothic literature in England, Walpole’s work also sparked a discussion of the difference between horror and terror. Ann Radcliffe, a renowned author of Gothic works during the late 1700s, utilized terror in her writings, hinting at supernatural occurrences, but eventually explaining them away as rational events transformed into terrifying ones by superstitious sentiments. Terror, for Radcliffe, is the anticipation of the supernatural. Horror, on the other hand, is the fulfillment of a supernatural occurrence. Radcliffe defines these differences in her essay “On the Supernatural in Poetry,” published in 1826. Traditionally, scholars have aligned terror with female Gothic writers and horror with male Gothic writers, though such a stark dichotomy is obviously not a perfect representation of the real relationships between male and female authors and the use of terror and horror. However, the dawn of Gothic literature and the discussion of horror and terror sparked by the differences between anticipated supernatural occurrences and the actual fulfillment of supernatural events can perhaps explain the sharp increase in the usage of horror and especially terror in the late 1700s. The steady decline leading up to the present and coming together of horror and terror can also be hypothesized to be a result of our more modern usage of these two words which tends to treat them as interchangeable.

WordItOut and Wordle:

The Castle of Otranto WorditOut The Castle of Otranto Wordle

Moving on to my text, when I put The Castle of Otranto through both Wordle and WordItOut, many of the results were similar. Names (Manfred, Isabella, Matilda, Theodore…) were marked as appearing in the text the most often, which is not all that surprising considering most of the novel concerns the “bartering” of two women, Isabella and Matilda, by Manfred. “Cried” is also relatively large, which makes sense since Isabella and Matilda are both upset with the matches Manfred tries to impose upon them. Other words that are in comparatively large font are “Princess,” “Lord,” “Prince,” and “Castle.” As the book that sparked the production of Gothic literature in England and contributed to the development of gothic tropes such as the medieval castle, the damsel in distress, and the tyrannical male, it is not surprising to find these terms in large font.

The Up-Goer Five Text Editor:

When I placed the top 100 words into the Up-Goer Five Text Editor, I came up with a lot of terms that just did not fit. Since the novel harks back to a former age filled with knights in shining armor, princesses in distress, and ancient castles, it is not surprising that this is the case. Many of these words are not in common usage, including the personal pronouns thee, thou, and thy, which again are used to suggest that the text was composed in medieval times.

The Castle of Otranto Up-Goer Five Text Editor

CLAWS:

CLAWS was intriguing, though perhaps not as useful as some of the other tools. However, there were some interesting results that mirrored what I found in my Wordle and WordItOut word clouds. There were a lot of proper nouns due to the common occurrence of names within the text. Also, there were many other nouns that serve to invoke the spirit of medieval times and Arthurian adventures: “Prince,” “Princess,” “Knight,” “Highness,” “court,” “escape,” “chamber,” and “convent.”

TAPoR:

Looking at TAPoR was a lot of fun. I definitely liked the aesthetics of the site with all of the different boxes showing me different ways of pulling apart the text and examining the words as they occur throughout the novel. Looking at the occurrences of words in the lower left hand corner of the page, I was interested to see that (after all of the indefinite/definite articles), the words “if” and “would” came up pretty high on the list. Seeing as the plot of this story centers around Lord Manfred’s attempts to convince Isabella to marry him, and later, his attempts to make his daughter marry Lord Frederic, these words seem appropriate (If only you would marry…). Once again names were high on the list. Because this tool offers you an easy way to map where the words fall and find the context in which they occur, I took the time to map out Manfred, Isabella, and Matilda to see where their names appear the most often and what is the context of these moments.

Manfred:

The Castle of Otranto_Manfred

The moment where Manfred occurs most frequently is in a moment when Matilda decides to go and speak to her father after the death of his son, Conrad. This scene involves Matilda trying to build up the courage to speak to her father. When she finally does, he denies her admittance, telling her that he does not want a daughter, he wants his son back. This is very typical for a man to be more concerned with the male heir than his daughter. Also, other places that Manfred’s name appears are surrounded by words like “rage,” “incensed,” “angrily,” and “impatient,” giving one a hint into the tyrannous nature of Lord Manfred.

Isabella:

The Castle of Otranto_Isabella

For Isabella, the time where her name is mentioned most frequently occurs during her escape attempt in which she flees from the evil machinations of Manfred, who seeks to divorce his wife in favor of marrying the young and innocent Isabella. Many of the other times Isabella’s name appears are in regards to discussion of Manfred’s loathsome plot and to inquiries that are being made into her disappearance so that she can be found and subjected to Manfred’s will.

Matilda:

The Castle of Otranto_Matilda

For Matilda, her moment comes when she is made aware of the fact that Lord Manfred (her father) agrees to marry her off to Lord Frederic (Isabella’s father) so that Frederic will grant Manfred Isabella’s hand in marriage. It is a typical moment of patriarchal bartering. Manfred wants Isabella, so he offers his own daughter to Frederic without a second thought. And the words surrounding the occurrences of Matilda’s name include some of the feminine virtues that prevent her from being able to refuse, such as “tenderness,” “virtuous,” “goodness,” and “purity.” How can these gentle and innocent women hope to escape the wickedness of their patriarch? It is not surprising that Isabella and Matilda, whose names are tossed around so often within this text, appear most frequently in the moments when the fates decided for them by their patriarchal fathers are pressing down upon them.
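Looking at the words that surround a name, as in the three character maps above, is something that can be roughed out in Python as well. The sketch below is not TAPoR's method (I don't know how it computes context); the function name `neighbors`, the `window` parameter, and the sample sentence are assumptions of mine. It simply counts which words fall within a few words of each occurrence of a keyword.

```python
import re
from collections import Counter

def neighbors(text, keyword, window=3):
    """Count the words that fall within `window` words on either side of each
    occurrence of keyword (case-insensitive)."""
    words = re.findall(r"[a-z']+", text.lower())
    key = keyword.lower()
    nearby = Counter()
    for i, w in enumerate(words):
        if w == key:
            lo = max(0, i - window)
            # words before the keyword, then words after it
            nearby.update(words[lo:i] + words[i + 1:i + 1 + window])
    return nearby

# Made-up sample, not a quotation from the novel.
sample = "Manfred spoke angrily. In a rage Manfred left, and Manfred was impatient."
print(neighbors(sample, "Manfred", window=2).most_common(3))
```

With a stopword filter added, a list like this would surface words such as “rage” or “impatient” much as the tool did, though deciding what they say about a character is still the reader's job.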

Conclusion:

Overall, I was pleased with the new perspectives that TAPoR was able to offer. Although I have studied this text before, it was interesting to map out the words, find the moments where they occur most frequently, and justify them with my own impressions of the text. The results offered by TAPoR provided me with confirmation of thoughts I had already gleaned from the text. However, the “estrangement and defamiliarization” of the text that Ramsay addresses does serve more purposes than mere confirmation (3). I definitely felt as though I was able to gain access to the bones of this text in ways that I had not been able to through my own close reading, because it really forced me to pay attention to what words Walpole chose to use and where he placed them. Like Isabella, who explores the secret tunnels and hidden passageways of the castle as she attempts to escape from the tyrannical Manfred, I felt like I was able to find hidden pathways of The Castle of Otranto that I was not aware even existed before this activity.

It can become difficult to relinquish your first impression of a text, even when you are close reading it. I used this novel in a paper that I wrote about Gothic tropes and the use of horror and terror in Gothic texts, so my view was confined to looking for evidence of these themes. By “defamiliarizing” me with the text and breaking it down into words, I was able to pay closer attention to the distress of Isabella and Matilda, as well as the intense patriarchal authority evinced by Manfred’s character. As Ramsay notes, these digital tools gave me a way to do what scholars always do with texts when they critique them—they provided me with “a text transformed and transduced into an alternative version, in which, as Wittgenstein put it, we ‘see an aspect’ that further enables discussion and debate” (16). By looking at the words of The Castle of Otranto, the building blocks of this great novel, I was able to examine the architecture of the Castle in a way that enabled me to see alternative aspects of the text—thereby sparking new conversations about the language of female oppression and patriarchal dominance that were not the focus of my initial close reading of the text.

Decontextualizing ‘The House of Mirth’

Word Clouds – Word It Out (L) and Wordle (R)

Mirth WordItOut Mirth Wordle

It’s not very surprising to see Lily’s name in big bold print in both clouds (though I definitely prefer Wordle’s aesthetics to Word It Out), as she is the novel’s protagonist – same goes for (Lawrence) Selden, our dashing bachelor/love interest. Also, since Mirth is a Wharton novel of manners, the presence of titles such as “Mrs.” and “Miss” is to be expected. I was, however, intrigued to see the singular pronoun “one” battling for preeminence with “Miss” – it’s been a few years (*cough* 4 or 5 *cough*) since I’ve read the novel, so no immediate reasons for this occurrence come to mind. Speculatively, however, there are a few theories I could spin. The novel centers on the misfortunes of Lily Bart, an aging beauty (and spinster at twenty-nine!) who repeatedly strives for independence throughout the novel. She is indeed a solitary figure (one alone) who continually casts herself apart from the rest of the crowd (one apart) and is continually pursued by Selden (for whom she is the only one). Spoiler alert, she also dies alone.

I also found it interesting that there is a bit of an imperative tone in some of the more prominent words in the word cloud – mostly temporal words like “now,” “moment,” “must,” and “time.” Words that refer to perception and the internal (“seemed,” “know,” “felt,” “sense,” “thought”) also dominate the more outwardly social terms (“voice,” “talk,” “tell,” and even “social”), a nod to the focus of the novel (i.e. Lily’s character), set against the backdrop of high society.

Word Lists – Up-Goer Five and CLAWS

I had a bit of trouble figuring this out, so I thought I’d be a bit more detailed in explaining (since I’m one of the earlier posts). In order to obtain a list of words from my word clouds, I had to scroll down to the box under my Word It Out cloud (I couldn’t find any option in Wordle) and click the “Word List” tab. Then for the “Case to display:” option I selected “Most Common” so that it listed the 100 words selected for the Word Cloud first (see pic below). Then I could select and copy my needed words for Up-Goer Five and CLAWS.

Screen Shot 2013-02-10 at 5.00.15 PM

I wasn’t sure what to expect when I pasted my words into the Up-Goer Five Text Editor, but I probably should have been tipped off by my need to select the option “Most Common” on Word It Out.

Mirth UpGoerFive

The only words that were kicked back were names! So… does Word It Out’s algorithm function in the way that Ramsay cautions against when discussing attempts to determine an author’s style, saying that it is more likely to “demonstrat[e] the general properties of word distribution in a natural language” (11)? I suppose I can cling to some degree of differentiation of Mirth from other novels in terms of which most-common words made the cut and how large they appear in relation to each other… But still, this little realization damages my perception of word clouds’ representational abilities.

Taking my now not-so-unique word list to CLAWS, I encountered a few off-putting glitches, such as the software’s inability to list my results vertically, which is the easiest way to interpret them (it stopped halfway through word number 58), and its blatant mislabeling of a few parts of speech (“Miss” was misinterpreted as a verb). Skimming through the list of tags, I concluded that the majority of the words were nouns and verbs (though there was some crossover potential in words like “sense” or “last,” which were counted as verbs). There was one interjection, however, which was a pretty interesting find – the word “Oh.” Such an interjection can express a broad range of emotions, though in the case of Mirth, there is surely an element of wistfulness underlying many of its appearances in the text.

And, with a statement like that, what better way to dive into TAPoR’s affordances and test my theory? According to TAPoR, the word “oh” appears 102 times in Mirth (much lower than our number one hit, “Lily,” at 677 occurrences). I was also able to map its distribution in the text:

Screen Shot 2013-02-10 at 7.02.16 PM

Of course, I rushed straight over to segment #13 (which required me to enlarge the actual reading pane, which I had shoved over in my eagerness to see the usage graph!) to see how “Oh” was actually being used in its most prominent passage. Aaaaaand, well, I was wrong. Segment #13 is a trivial conversation between Lily and another woman, filled with dismissive “Oh, Lily,” and “Oh, I don’t mean…” statements. Trying one last time, I checked out the trio of segments occurring near the novel’s (tragic) conclusion. In two of the three times “Oh” was again used dismissively BUT I was rewarded in discovering that both utterances were steeped in tragic irony – the first occurs during Lily’s last conversation with Selden, where she says, “There is some one I must say goodbye to. Oh, not you—we are sure to see each other again,” (SO MUCH POIGNANCY!) and the second dismissive “Oh” is again spoken by Lily in response to an acquaintance’s declaration for her little girl: “Wouldn’t it be too lovely for anything if she could grow up to be just like you?” The scene continues (Lily’s last conversation before her death):

Lily clasped the child close for a moment and laid her back in her mother’s arms. “Oh, she must not do that—I should be afraid to come and see her too often!” she said with a smile; and then, resisting Mrs. Struther’s anxious offer of companionship, and reiterating the promise that of course she would come back soon, and make George’s acquaintance, and see the baby in her bath, she passed out of the kitchen and went alone down the tenement stairs.

Final Thoughts:

Throughout my interaction with the programs discussed above, I found myself unable to resist finding meaning within the objective results churned out by algorithms – even when I recognized the blatant ‘fails’ of the software and its proclivity toward certain sets of words. Although words like “might” and “never” are likely to be highlighted by Wordle in other texts, their appearance in the word cloud for Mirth seemed irresistibly poignant. I even found myself making connections between the emphasis of “eyes” over other physical features, such as “hands,” “smile,” and “face” – for the eyes are the windows to the soul (and Selden resists objectifying Lily, unlike her mother, other men, and even Lily herself at times). As Ramsay intimates in his examples of ELIZA and Mueller’s lists, I felt compelled to make sense of the results given, to “teeter between confirming [my] own theories and forming new ones” (71). According to Ramsay,

Algorithmic criticism seeks a new kind of audience for text analysis – one that is less concerned with fitness of method and the determination of interpretative boundaries, and one more concerned with evaluating the robustness of the discussion that a particular procedure annunciates. (17)

Is algorithmic criticism a ‘fit’ means of engaging meaningfully with a text? Well, considering the ‘robustness of the discussion’ I just had with myself in using such programs, I would have to say yes.

Dracula: Simplicity and Survival

I’ve always loved Dracula, not because it is revolutionary in and of itself, but because future readers and their interpretations have made it so.  Bram Stoker, I am thoroughly prepared to believe, was a particularly Victorian gentleman.  That being said, I have never “dug” into Dracula, so I look forward to seeing what arises when one does a bit of literary archaeology with the text.

While we were not asked to provide our Ngram data–and in the light of the TED talk–I felt it was a good place to start. Sticking with my theme, here are my Google Ngram Viewer results:

Clearly we can see who is winning in this battle of the vampires.

Now, I know this may not seem particularly fair–after all, Edward Cullen has hardly appeared on the vampire map as of yet. It did, however, warm my heart to see that nothing has diminished Dracula’s ever growing popularity as a literary figure. A little bit of a dip down in the past few years–I blame the dreadful Keanu Reeves film for that stumble–but all in all a steady climb. In fact, I was surprised to find how long it really took for Dracula to get off the ground–and interested to know what sort of research into the real man Dracula (as opposed to the fictional vampire who stole his name) caused his little hop from obscurity in the 1820s. (Edward Cullen, I’d like to point out, appears just as much before the Twilight novels were released as after, leading me to conclude that the name appeared in other novels prior to his rise as a vampire.)

Moving on from there, my Wordle word cloud:

Dracula's language really doesn't look terribly haunting like this.  The words are all painfully simple.

I find it interesting that even with Wordle supposedly removing commonly used English words from its cloud, the result is exceptionally boring. No evidence of complex language in the least and nothing particularly atmospheric either. I would have at least expected vampire to make an appearance in the cloud–or even Dracula–but the result is more than a little disappointing. And once again, even in Word It Out, this is not Dracula’s shining moment:

Let's just say I would never provide someone a word cloud in order to entice them to give Dracula a try.

The result looks closer to the vocabulary on an elementary school spelling test than the palette of a novel.  One might even suggest, based on the two clouds, that the novel be called Van Helsing as he makes a far more clear impression on the clouds than either “Dracula” or “vampire” manages.

As one might well expect from these word clouds, Up-goer Five Text Editor has very few stumbles at all–even after one permits Wordle to remove the most common English words from the cloud.  The real stumbling blocks for Up-goer Five are names (such as Lucy, Mina, Arthur, Jonathan, Van Helsing, and Harker), titles (such as Dr., Madam, Count, or Professor), and a few stray words (some obviously antiquated such as whilst and till and others, which came as more of a surprise such as terrible, poor, and thin).  The language of Dracula appears on the whole to be quite simple and common, indeed–certainly nothing Dickensian here.

Even CLAWS Part-of-Speech tagger suggested that the language of Dracula was far from complex and showed a most un-Victorian and un-Gothic abhorrence for description and complexity. All of Dracula appears to be made up of nouns doing things either in the past, present, or future with little attempt at describing where, when, or how the action is taking place. Further, there was only one conjunction (“whilst”) tagged among the output of the word clouds. Again, all this argues for a lack of complexity in Bram Stoker’s language choices. Even if one could argue that Stoker may simply have employed a wider and more varied range of words–thus discounted from the word cloud–the fact that “and” doesn’t even appear in the Word It Out cloud (which did not remove the most common English words from the results) would appear as evidence of the relative simplicity of language within the text.

TAPoR was causing me difficulties and so I then moved on to Voyant, with which I was at least passingly familiar. The results were, once again, surprising to say the least. The cloud it provided was almost entirely made up of the most basic of language (it, he, she, they, then, etc.), of which only one word was over four letters in length: “which.” Turning to the word trends, I plugged in “Dracula,” “vampire,” and “Van Helsing.” Judging by the results, the book’s title Dracula is a misnomer. Even its former prepublication titles, The Dead Un-Dead or The Un-Dead, would have been a gross mistake. To call it a book about a vampire might even appear to be presumptuous. According to Voyant, Dracula really ought to be called Van Helsing, who–once on the scene–has a soaring relative frequency.

For much of the novel, neither vampires nor Dracula are mentioned.  Van Helsing, however, seems to make a rapid climb to popularity and stay at the heart of the novel from that point forth.

Thinking that perhaps Stoker had preferred the term “un-dead” or “dead” over “vampire,” I added both those terms to my graph, with little change. While “dead” was slightly more popular than “vampire” throughout the text, even it hardly ever rose higher than the 0.3 mark.
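For anyone who would rather check this sort of trend by hand, here is a rough sketch of the idea in Python. It is not Voyant's own method, and its numbers will not be on the same scale as Voyant's graph: it is only my assumption of what a "relative frequency" trend amounts to, namely cutting the text into equal-sized segments and asking what share of each segment's words a given term accounts for. The function name and the ten-segment default are made up for illustration.

```python
import re

def relative_frequencies(text, terms, segments=10):
    """Return, for each term, its share of the words in each segment of the text."""
    # Lowercase the text and keep hyphenated words such as "un-dead" as one token.
    words = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    # Ceiling division so that the text is cut into at most `segments` pieces.
    size = max(1, -(-len(words) // segments))
    chunks = [words[i:i + size] for i in range(0, len(words), size)]
    trends = {}
    for term in terms:
        # A term may be several words long, e.g. "Van Helsing".
        target = term.lower().split()
        n = len(target)
        trends[term] = [
            sum(chunk[i:i + n] == target for i in range(len(chunk) - n + 1)) / len(chunk)
            for chunk in chunks
        ]
    return trends

# A tiny stand-in for the novel; a real run would read the full text from a file.
sample = "Dracula came. Van Helsing spoke. Van Helsing left. The dead rose."
print(relative_frequencies(sample, ["Dracula", "Van Helsing", "dead"], segments=2))
```

One limitation worth knowing: a two-word name that happens to straddle the boundary between two segments is not counted in either, which hardly matters for a whole novel but shows up in a toy example like the one above.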

In and of themselves, these tools may not prove much, or at least “the effect is not the immediate apprehension of knowledge”; however, the conclusion that I would draw from the data is as follows: Dracula is not a complex novel. Its direct and uncomplicated language reflects the values of its solid, stalwart, and sensible middle-class men of the “modern era” with their modern inventions (such as the typewriter and stenograph) and science (such as blood transfusions). Further, while Stoker may have forced Dracula (and his fellow vampires) to recede in the face of Professor Van Helsing, hero and true main character of the novel, Dracula refused to die. I tested out the following:

Van Helsing may have succeeded in ridding the fictional world of his foe, Dracula, but in the real world, Dracula thrives.

It is clear that Stoker created a character that need not have appeared solidly throughout the novel to leave a lasting impression on the reader. Dracula’s ever-growing popularity is proof of this. So, perhaps, it is right after all that the novel be named for a character that does not even appear for much of it; for, in reading it, it is not Van Helsing who captures one’s imagination, but the vampire, Dracula. He lives on, healthy and well-loved, in the modern world while Van Helsing struggles in his shadow.