Paper Machines???

Has anyone tried to run Paper Machines? I have downloaded all the pre-req’s and I know it’s installed (my Firefox just updated and prompted me to review my add-ons – both Zotero and Paper Machines appeared in the list), but I don’t know how to initiate it in Zotero. The directions on GitHub are very sparse:

To begin, right-click (control-click for Mac) on the collection you wish to analyze and select “Extract Texts for Paper Machines.” Once the extraction process is complete, this right-click menu will offer several different processes that may be run on a collection, each with an accompanying visualization. Once these processes have been run, selecting “Export Output of Paper Machines…” will allow you to choose which visualizations to export.

When I right-click on a collection, no such option appears. This is what I see, even with all options investigated:

Screen Shot 2013-02-19 at 5.15.30 PM

Anyone else have any success?

Sherlock Holmes Would Have Been a DHer

Alright, it is time for a super geeky confession: I belong to a Sherlock Holmes society.  At the last meeting a number of members asked me what I was studying and I tried to explain Digital Humanities to them.  It wasn’t, shall we say, the greatest success.  So I’ve been thinking, at one of our next meetings maybe I’ll finally give the presentation–a duty I’ve shirked for all of the 10 years I’ve belonged to the club.  I was trying to think of ways to blend DH with Sherlock Holmes and show how even the most basic of DH tools might be useful when understanding the Sherlock Holmes stories.

Well, the work for this coming week to find a library of sorts related to our texts started me thinking about the similarities between Dracula and Sherlock Holmes–and the men responsible for their creation.  Both authors considered themselves to be the epitome of the Victorian gentleman–upholding the beliefs fundamental to that image.  As such, wouldn’t they have a tendency to choose from the same offerings of the LDA Buffet?  Some editions of Dracula, such as my Project Gutenberg copy, even bill it as “A Mystery Story.”  Would the two men’s word choice reflect this similarity in experience and ideal?

Dracula Word Cloud Sherlock Holmes Wordle

I tried doing the Holmes word cloud with one text–Hound of the Baskervilles–but names like Baskerville and Henry started to dominate so much that one couldn’t see much of the other language, so to balance it out I stuck as much of the Sherlockian Canon as I could find into Wordle.  The resulting “footprint,” if I may so call it, seems more representative of Sir Arthur Conan Doyle’s writing, as was the goal.  And judging by the results, it would seem that the two do share a similarity in word choice.  Words like “man,” “know,” “must,” “may,” “light,” “night,” and so on all have strong followings in the clouds.
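For anyone curious about what a word cloud is counting underneath the pretty fonts, here is a minimal sketch in Python. It is only an illustration: the stoplist and the sample sentence are made up, and Wordle’s own tokenizer and stoplist may well work differently.

```python
import re
from collections import Counter

# Hypothetical, tiny stoplist; real word-cloud tools ship much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "that", "was", "i"}

def top_words(text, n=5, stopwords=STOPWORDS):
    """Return the n most frequent non-stoplist words in text, lowercased."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in stopwords)
    return counts.most_common(n)

# Made-up sample sentence, not a quotation from either author.
sample = "The man saw the light in the night, and the man must know the light."
print(top_words(sample, n=3))
```

Feeding in several stories at once, as described above, simply makes the counts for names tied to one story (Baskerville, Henry) smaller relative to the words that recur everywhere.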

Now, I’ve often heard said of Doyle that he was not a terribly good writer and that he, instead, had the good fortune to create a character who was original and fascinating enough to come to life in spite of this less than fortuitous entrance into the world.  Holmes captured the imagination of the readers in spite of Doyle’s talent rather than because of it.  Could the same be true of Dracula, seeing the linguistic similarities between their authors?  I’m not entirely sure how to test this particular theory–maybe someone else will be able to suggest a way–but I thought I could test how the popularity of the characters of Dracula and Holmes has compared to that of their creators.  The idea being that if Holmes and Dracula and their creators shared the limelight it would suggest that there was as much to be said about the creator as the creation.  Doyle and Stoker would be as interesting as authors as their creations were as literary characters.  The result is as follows:

Screen Shot 2013-02-14 at 6.43.00 PM

Google’s Ngram Viewer would seem to support this theory.  The characters have survived far better than their creators–in fact, Holmes leaps to the forefront from the instant of his creation (Dracula has a bit more of an uphill battle at first).  But maybe this is to be expected?  Do characters always do better than their creators?  If so, let’s test on an undeniably talented author and their beloved creation, Jane Austen and Darcy:

Screen Shot 2013-02-14 at 6.44.54 PM

Now, the one problem with the above is that it doesn’t take into consideration that Darcy is rarely called by his full name and has a very common one, at that, unlike Holmes and Dracula.  So, here is the above result modified with the revision of “Mr. Darcy” rather than simply “Darcy.”  It is not ideal: how often does one, when writing about Austen’s ideal man, so formally refer to him as “Mr. Darcy”?  But one should at least be able to mentally average the two results to attain some sense of our Darcy’s popularity in English writing:

Screen Shot 2013-02-14 at 6.45.08 PM

So clearly, this is not true among all authors and their creations.  Austen gives Darcy a run for his money.  Now, one must also take into account that Austen published far more texts than Stoker or Doyle.  Hers were also far more popular–anyone heard of or remember The White Company? No? That suggests to me that Doyle’s talent with the written word is not as strong as Holmes’s persistence in the memory.

Further, this research suggests that Stoker and Doyle shared a similar relationship with their fictional creations and made similar word choices.  We can’t definitively prove that Stoker and Doyle were particularly terrible writers, but the results suggest that other writers do not stand in the shadows while their creations take the limelight as these two do.

As a final note: the class discussion of anime reminded me of a statistic I read long ago that stated that there were more Sherlock Holmes societies in Japan than there were in the UK.  As it turns out, according to the list of active Sherlockian societies kept by Peter E. Blau (a member of the Baker Street Irregulars, the most illustrious Sherlockian society), Japan has 15 societies while the UK has 16.  Still, the figure is impressive and made me curious how Holmes’ popularity (and Dracula’s) compared by geographical region and language.  Alas, I don’t know how to translate Holmes into Japanese or Russian (there is a large following there as well) so I’m limited to American and British English for Google’s Ngram Viewer.  However, the results were still fascinating:

I find it fascinating that to the Americans, Holmes's popularity grew far more rapidly than in England, yet once again, the vampire steals the show.

It would seem that while Holmes was very popular in the UK since his creation, Dracula has recently stolen center stage--in spite of all the latest Sherlock re-imaginings.

In conclusion, I think Holmes would have been a DHer.  The man who cried, “Data! Data! Data! [....] I can’t make bricks without clay,” would have appreciated the way in which DH offers one tremendous information at one’s fingertips and the tools to make sense of it.  Holmes would especially have to appreciate the fact that the methods of the Digital Humanities could be used to catch our own Napoleon of Crime, so to speak, Osama bin Laden.  And as for Dracula?  Well, clearly DH has brought him out into the light of day.

 

Wordly wobblings

My findings from the Google Ngram Viewer are that we did not like “idea” very much in the first half of the eighteenth century.  Our feelings about “truth” have varied substantially; we liked it quite a lot during the mid-nineteenth century, but in 1910 we started preferring “idea” and this has stayed fairly consistent since then.

Ngram

My Up-Goer Five definition of DH goes like this:

It is about doing old things in new ways. Or, if you ask another person, it is about doing new things in even newer ways. People who do it don’t agree on what things are most important or how to study them. Human life changed when books did away with forms of writing that came before them. Computer forms of stuff that used to be only on paper might be doing the same thing now. Computers can make stories look different, but does that mean that they ARE different at the bottom? Or is it only the way that we look at them? If we use computers to read books, we can study different ideas about them. The question is whether those kinds of ideas leave out the kind that came before. The question is also whether the old kinds of study leave out ideas that one can only reach by using new ways. Perhaps the best way to put the question is: How do we decide whether the old or new way is best for something we want to learn (or, better yet, how we can put the two together)?

 

While the original XKCD comic is funny, I think this concept can only work well when humor, not communication, is the point.  It could be helpful if someone is taking him/herself too seriously and wants to re-evaluate a statement in search of excessive jargon, but it does not seem useful for describing something to someone who does not already know what you are talking about.  Without the words “digital,” “humanities,” “electronic,” or “interpret” I wasn’t able to make a definition that could let somebody who had never heard of DH know what I was describing.

So, on to Wordle.  I used the Gutenberg text of King Lear (minus the fine print and introductory “comments”) and this was what I got:

http://www.wordle.net/show/wrdl/6368114/Gutenberg_King_Lear_Wordle_for_ENGL_668K

Word it Out gave me this:

WordItOut-Word-cloud-162602

Obviously, the speech prefixes dominate these clouds; Lear and Kent are the most prominent in both clouds.

Running the Word it Out list through the Up-Goer Five produced these words:

tell one night Sister say make see great done further now man hath long life late Daughter good Daughters Enter name mans answer away yet part better Father fit eyes nothing cold else old some Horse Gods time home go hand least way take Letter heard here much against still know Sir rather heart both all though found more come art Let most well like little many place follow age gone made other comes hold death none mad call within Brother full power hast head Sisters makes Lady after two set being put came do’s thing What’s toward Boy where’s best world thought men reason stand word Oh before any dead first bring house Friend blood matter true since told dost draw fire doth Fathers course things cause strange sight stands

 

One thing that surprised me is that “Lady” could stay but “gentleman” had to go.  Someone who was not aware of the context could probably gather that family relationships are a major theme of the work represented, but could probably not go much further than that.

The CLAWS tagger produced this:

-----_PUN 
place_NN1 hast_VHB turne_NN1 feare_NN1 Storme_NN1 Master_NN1 since_CJS 
i'th_NN1 th_NN0 Edgar_NP0 halfe_NN1 Edg_NN1 businesse_NN1 else_AV0 Enter_VVB 
leaue_NN1 Slaue_NN1 done_VDN thing_NN1 stand_NN1 heare_NN1 Ha_ITJ Regan_NP0 
Cornwall_NP0 speake_NN1 Lady_NN1 comes_VVZ world_NN1 Madam_NN1 head_NN1 
some_DT0 still_AJ0 Sword_NN1 Sir_NN1 againe_VVB thy_DPS farre_NN1 liue_NN1 
till_PRP any_DT0 Cordelia_NN1 most_AV0 set_VVN Knaue_NP0 told_VVD forth_AV0 
fire_VVB Brother_NP0 Daughters_NP0 Ile_NP0 meanes_NN2 gaue_VVB none_PNI 
being_VBG fit_AJ0 know_VVB within_PRP do'st_NN1 Douer_NN1 Cor_ITJ call_NN1 
nor_CJC Bast_VVB other_AJ0 Gentleman_NN1 Foole_NN1 backe_NN1 men_NN2
things_NN2 Noble_AJ0 neuer_NN1 Trumpet_NN1 pray_VVB seene_NN1 Alacke_VVB 
hither_AV0 goe_VVB now_AV0 Glou_NP0 more_AV0 bring_VVB vp_NN0 true_AJ0 
though_CJS much_AV0 two_CRD Villaine_NP0 euer_NN1 heard_VVD fellow_NN1 
gone_VVN Edmund_NP0 Scena_NP0 Fortunes_NN2 hold_VVB put_VVB where_AVQ 's_VBZ 
whom_PNQ take_VVB himselfe_NN1 do_VDB 's_POS Corn_NN1 ere_PRP sleepe_NN1 
euery_NN1 better_AJC King_NN1 say_VVB Stew_NN1 deere_NN1 first_ORD bin_NN1 
Fathers_NN2 finde_NN1 Duke_NP0 Gent_NP0 Gloster_NP0 cause_NN1 Knights_NN2 
good_AJ0 name_NN1 Oh_ITJ T_PNP is_VBZ returne_NN1 Sonne_UNC Horse_NN1 away_AV0 
France_NP0 Exit_NN1 Bastard_NN1 looke_NN1 make_VVB after_PRP o'th_NN1 
Prythee_NN1 wits_NN2 makes_VVZ Reg_NP0 word_NN1 little_AV0 vs_PRP Steward_NN1 
like_PRP age_NN1 Nature_NN1 thine_DPS cold_NN1 follow_VVB shalt_VM0 
against_PRP stands_NN2 What_DTQ 's_VBZ rather_AV0 way_AV0 seeke_VVB 
further_AV0 came_VVD Father_NN1 haue_VHB answer_NN1 knowne_NN1 long_AV0 
home_AV0 many_DT0 loue_VVB Sisters_NN2 life_NN1 Gods_NN2 late_AV0 thee_PNP 
made_VVD Fortune_NN1 Alb_NP0 eyes_VVZ nothing_PNI farewell_NN1 Edmond_NP0 
feele_NN1 purpose_NN1 Tom_NP0 old_AJ0 Friend_NN1 see_VVB found_VVN least_DT0 
power_NN1 dead_AJ0 Traitor_NN1 well_AV0 Let_VVB vse_NN1 toward_PRP blood_NN1 
euen_NN1 Lear_NP0 draw_VVB Lord_NN1 reason_NN1 mad_AJ0 strange_AJ0 heart_NN1 
here_AV0 Letter_NN1 yet_AV0 Albany_NP0 Gon_NP0 Gonerill_NP0 man_NN1 part_NN1 
one_CRD great_AJ0 Glo_NP0 dost_VDB heere_AJ0 giue_NN1 downe_NN1 doth_VDZ 
poore_NN1 lesse_NN1 come_VVB hand_NN1 Kent_NP0 Grace_NP0 art_NN1 helpe_NN1 
go_VVB matter_NN1 foule_NN1 course_NN1 thou_PNP strike_VVB Boy_NN1 vpon_NN1 
whose_DTQ thinke_NN1 thought_NN1 beare_NN1 peace_NN1 hath_VHZ Exeunt_UNC 
death_NN1 full_AJ0 Sister_NN1 owne_NN1 house_NN1 selfe_NN1 night_NN1 best_AJS 
Fiend_NN1 keepe_NN1 both_AV0 tell_VVB Ste_NN1 mans_NN2 sight_VVB Glouster_NN1 
all_DT0 hence_AV0 before_PRP Daughter_NN1 time_NN1 ..._SENT **42;7;TOOLONG_UNC

I’m sorry; I can’t give a useful analysis of this.  The site is the opposite of the word cloud generators in that it is not even a little bit user-friendly.  The key to tags is not straightforwardly organized.  I tried to find what “NPO” (or possibly “NP0”) might mean, but it was not in the list.  Perhaps this would make more sense to me if I knew something about coding.

Pushing onward into the land of things I don’t understand, I approached TAPoR and HyperPo.  Using this site was extremely frustrating because, once I uploaded the text (I couldn’t copy and paste, so the Gutenberg “comments” came along for the ride), the resulting window did not include labeled buttons.  I got the following analyzing the word “daughter”:

"Daughter"

 

If I’m using it right, this tool indicates that the word “daughter” occurs most often in Act 1, Scene 2 — the scene in which Lear divides his kingdom.  This scene coincides with the highest number of mentions of “Cordelia” but not of “Gonerill” or “Regan.”  I think this set of tools has the most potential usefulness, but I had trouble understanding how to make them useful.  I tried some of the “help,” “tutorial,” and “tour” features, but I kept running into “page not found” and “router error” messages; I don’t know if I was doing something wrong or if the site just wasn’t working very well.

Ramsay was right:  these tools make the text of King Lear look completely unfamiliar.  As I flailed about through these mysterious new waters, I found that the mere strangeness of what I was seeing was almost overwhelming.  I can see that I might eventually be able to put these tools to productive use, but first I need to become more comfortable navigating digital environments.

Seeing the Forest through the Thees (and Thous)

My initial reaction to Ramsay’s statement is that for me nothing quite induces the defamiliarization of textuality like invoking the ostranenie of Russian formalists. I’d like to see someone explain that passage in Upgoerfive! That having been said, I found this week’s exercises quite thought provoking and exciting. As soon as I began my first attempts to create word clouds with Augustine’s Confessions, I knew there were going to be problems with my particular translation, the language of which is extremely antiquated. Because of the language, my initial Wordle showed “Thee”, “Thou”, and “Thy” to be the most common words (because they are not in the tools’ basic stoplists, of course, even though a modern translator would say “you” and “your”).  Further examination revealed that there were a large number of other very common words in archaic forms in my text.

Through some trial and error, and using a text editor with advanced Grep capability to perform some batch replace procedures on my text file, I managed to generate a more satisfactory result. The Wordle and WordItOut versions seemed quite similar in my case. And even though WordItOut seems to offer somewhat easier manipulation of the final output, I’m posting the Wordle because I agree with others that they tend to look better:
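For readers without a Grep-capable editor, the same kind of batch replacement can be sketched in a few lines of Python. This is an illustration only, not the procedure actually used above: the word mapping is a hypothetical starting point, and the sample sentence is made up.

```python
import re

# Hypothetical mapping of archaic forms to modern ones; extend as needed.
MODERNIZE = {
    "thee": "you",
    "thou": "you",
    "thy": "your",
    "thine": "your",
    "hath": "has",
    "doth": "does",
}

# \b keeps the match to whole words, so "thy" inside "thyself" is left alone.
PATTERN = re.compile(r"\b(" + "|".join(MODERNIZE) + r")\b", re.IGNORECASE)

def modernize(text):
    """Replace each archaic word with its modern form, keeping initial capitals."""
    def swap(match):
        word = match.group(0)
        new = MODERNIZE[word.lower()]
        return new.capitalize() if word[0].isupper() else new
    return PATTERN.sub(swap, text)

# Made-up sample text for illustration.
print(modernize("Thou hath given thy servant joy, and I praise Thee."))
```

A word-for-word swap like this is good enough for counting words, but it will not fix the grammar (verb forms such as “hast” are left untouched unless added to the mapping).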

Wordle

I found this to be a surprisingly good encapsulation of many of the main themes of the Confessions. Putting the resulting words into UpGoerFive resulted in the following list of words used frequently in my text that were not among the more commonly used in English today:

nor, unto, lord, earth, soul, whom, heaven, itself, therefore, neither, behold, joy, spirit, whence, flesh, holy, certain, unless

Here we can see that the archaic language is still apparent, even after my attempts to modernize the most frequently used archaic words.  “Nor”, “unto”, and “whom” should really probably be on the stoplist since the ideas that my old translation is expressing with them would probably be expressed with stoplist words in a translation written today.  But if we look past those words, the remaining results are reasonably instructive, and a machine trying to ‘comprehend’ what the Confessions are about would have a reasonably easy time of it, I suspect.

The CLAWS tagger seems quite powerful though its results didn’t immediately speak to me.  I did notice that it seems to have mis-identified Augustine’s use of “times” as a preposition.  CLAWS becomes particularly powerful, it would seem to me, if one were to convert the results list to a spreadsheet that can be easily sorted by part of speech.  TAPOR likewise looks like a very powerful toolset (if I’m not mistaken, its concordance generator could accomplish Father Busa’s entire project in a matter of a few minutes), assuming one had the works of Aquinas available in text files.
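The spreadsheet idea can be sketched in Python, assuming the tagger output is a run of word_TAG tokens separated by whitespace, as in the web interface’s plain-text results. The sample tokens below are made up for illustration, and the function names are my own, not part of CLAWS.

```python
import csv
import io

def parse_claws(tagged):
    """Split CLAWS-style 'word_TAG' tokens into (word, tag) pairs."""
    pairs = []
    for token in tagged.split():
        word, sep, tag = token.rpartition("_")
        if sep:  # skip anything that has no underscore at all
            pairs.append((word, tag))
    return pairs

def to_csv(pairs):
    """Return CSV text with one row per word, sorted by tag and then by word."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["tag", "word"])
    for word, tag in sorted(pairs, key=lambda p: (p[1], p[0].lower())):
        writer.writerow([tag, word])
    return out.getvalue()

# Made-up sample in the word_TAG format.
sample = "times_PRP soul_NN1 holy_AJ0 heaven_NN1 behold_VVB"
print(to_csv(parse_claws(sample)))
```

Saved to a .csv file, the result opens in any spreadsheet program, where a suspicious tag (such as “times” marked as a preposition) is easier to spot once the words are grouped by part of speech.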

Ultimately, though, coming back to the question of defamiliarization of the text, this week’s exercises proved to me that there is something valuable in breaking our texts down in this way — even if I’m not sure I see where this is all headed just yet.  Text mining procedures like these seem to be taking apart the forest and sorting the trees by species, size, age, etc.  Surely that would be useful information for a biologist studying the forest, but how we will get from stacks of trees over to understanding biodiversity still remains unclear to me.

No Clever Title

I used the text of John Henry Newman’s The Idea of a University that I found on the Project Gutenberg website last week to produce two word clouds on Wordle and WordItOut.

NewmanWordleWordItOut-Newman

There were two differences that jumped out right away when I compared the two: WordItOut seemed to do a better job of weeding out stopwords (“may”), and Wordle accepted without question what I’m pretty sure are character-encoding errors (the pseudo-words beginning with ‘Ä’).

I had pretty much the same experience as everyone else did when I pasted the words from the WordItOut word cloud into the Up-Goer Five Text Editor: it rejected 26 of the words (although it wasn’t concerned in the least that the words, in that order, did not constitute a syntactically valid English sentence).

I then pasted the same words from the WordItOut word cloud into the CLAWS Part-of-Speech tagger. For some reason, the text pasted without spaces between the words, and I had to enter the spaces manually. I noticed that the word list had a similar effect to the “entropic poem” on page 37 of Ramsay’s Reading Machines, which surprised me, since I had assumed that that effect would only be perceptible in a short text.

I get the point of tools like this. There’s a similar one called William Whitaker’s Words that’s very popular among students learning Latin, although the fact that CLAWS accepts bulk input (unlike Whitaker’s Words) is an improvement on the model. And there are useful things, I suppose, to be learned about a text from such tools (e.g., to confirm or deny the claim that John Calvin never used adverbs in writing). I didn’t, however, find the output of CLAWS particularly edifying in this case:

CLAWS Output

 

I attempted to hand off the URL for the plain text on the Project Gutenberg site directly to TAPoR using “Your Web Page”, but what I got was an HTTP 403 Forbidden error, so I played with Chapter 1 of Moby Dick instead. My sense was that the HyperPo does need a body of text longer than a single chapter in order to be really useful rather than a curiosity.

I don’t feel qualified to comment on whether the use of these tools produces an effect of estrangement and defamiliarization of textuality in general; not being a literature student, I’m not used to relating to textuality in the abstract, as opposed to a particular text or texts. My impression is that tools of this kind will do much more for you if you already know something about the text you are examining in this way, and I certainly got a lot more out of the examination of Gratian’s Decretum than of Newman’s Idea of a University.

Art and Science as Complementary Opposites

I was very drawn to the argument Ramsay puts forth in Reading Machines. This might be because out of all of the readings thus far (okay, only two weeks’ worth of reading, but last week had a good amount of material . . .), Ramsay most willingly acknowledges the divide between humanistic inquiry and computational method. Indeed, as Ramsay argues, while each contains a kernel of the other, algorithmic criticism seeks definitive answers, while literary criticism seeks unanswerable questions.

In this blog post I will try to focus only on “Preconditions” and the first chapter, “An Algorithmic Criticism,” of Ramsay’s book, perhaps setting my own constraints for myself. I do this to save the rest of my thoughts for class on Wednesday, and I will use this post as a jumping-off point for discussion.

It is difficult to explain why the pairing of two opposing modes of inquiry fascinates me. This discussion reminds me of the interests of early science fiction writers, who, influenced by the Romantic period, used the very methods of rationalism and science as a form of critique. Ramsay nearly states exactly this in his discussion of art and science:

“Art has very often sought either to parody science or to diminish its claims to truth.”

With this ever-present tension, how could we possibly use text analysis to aid literary criticism in a way that does not remove the basic tenets of humanistic inquiry? Ramsay has a few answers to this. Computer-based tools represent a limitation that allows us to reorganize and understand a text in new ways. While text analysis can only concern itself with verifiable facts, the user is left to decide what to do with these “facts.”

In other words, computer-based tools like text analysis often act as a form of provocation, a starting point for us to delve deeper into an issue. I certainly encountered this in my own limited/crude experiment with Woodchipper, a topic modeling tool. The fear that comes with using many of these tools—and here I might break my own constraint and reach into the other chapters—is that they can only tell us what we already know. This might be a problem with methodology, as Ramsay points out. The more worthwhile experiments are the ones that tell you things that suggest the opposite of what you believe. Certainly as computer-based tools grow more complex and sophisticated, they will be able to give us answers to questions we previously believed only humans could address. But Ramsay is more interested in discourse rather than methodology:

“. . . we can refocus the hermeneutical problem away from the nature and limits of computation (which is mostly a matter of methodology) and move it toward consideration of the nature of the discourse in which text analysis bids participation.”

Another issue which Ramsay may or may not address is that while you can produce results using text analysis (and other tools) without having read the text in question, you may not be able to interpret those results. This is certainly true for Ramsay’s experiment with The Waves. As Ramsay points out after running an equation regarding the speakers in the novel,

“Few readers of The Waves would fail to see some emergence of pattern in this list.”

But what if you haven’t read The Waves? It is a short book, and one you would certainly be expected to have read if you decided to publish anything, including an experiment with text analysis, on the novel. But this issue becomes a problem when we consider “distant reading,” which purports not to require any general or specific knowledge of the text. In fact, distant reading discourages it.

But if you cannot interpret the results unless you have read the book in question, how are we supposed to approach the topic “How to Read a Million Books”? Even when we consider a hundred or a thousand books at once (or millions, as described in the TED talk video), it might be helpful to know at least a few things about each one, like the fact that The Waves features six speakers.

Here is where methodology asserts its importance once again. Only when a computer-based tool becomes sophisticated enough to allow for interpretive analysis without engaging with the text directly can these tools usurp the primacy of the reader. Perhaps we have reached this stage already, but I cannot help but cling to the importance of close reading, even as we compare a work to hundreds, thousands, or even millions of others.

War of the Wordles

Unfortunately, I lost my first Wordle of War of the Worlds, which had a beautiful custom palette and Martian-like font, and now I’m really mad that I couldn’t find a search function on the Wordle site’s public gallery. Boo. So here’s a second one.

wordle

And, the much uglier WordItOut!

WordItOut-Word-cloud-162393

Interestingly, many configurations of the Wordle sketch out a bare-bones premise for the book with the most prominent words: “Martians Came”. Both “Mars” and “Earth” are very small, and don’t even appear in the WordItOut! There are few proper nouns, no character names, but places like “London” and “Woking” show up. “Black” and “red” are also prominent, as are sensory words like “heard”, “see”, “saw.” “Seemed” is much bigger than “know,” giving a feel for the uncertainty that haunts much of the action of the book. The WordItOut! on the other hand, picked up much more common “filler” words like “said,” “about,” “through,” “over.” It was also much less fun to play with. Much of the appeal of the Wordle for me was arranging the layout so as to maximize the “sense” I could make out of it visually: how much of the basic “plot” or action words could I manage to juxtapose and highlight with color, straight or curved lines, font “appropriate” to the subject matter? As Ramsay suggests, this is perhaps the greatest potential of text-analysis tools–the ability to operate at a new scale and to manipulate the text on different levels than “close reading” allows.

Not surprisingly, very few of my Wordle words were allowed in the Up-Goer Five Text Editor. While experimenting with Up-Goer Five, I was trying to figure out the best approach–do I hand-pick words from the list of ten hundred, or do I build my definition by attempting to write it first, and then “translate” it? I wove back and forth between these approaches, picking some words and then trying out other phrases that were inspired by them. Ultimately I was disappointed, and I must say my definition of DH was more flippant than informative: “Many conversations about building, making, thinking, doing; money, jobs. Using computers to study humans and read/write ‘algorithmically.’” Without punctuation it’s as long as a tweet.

When I input the Wordle text into the CLAWS Part-of-Speech tagger, it interestingly read many of the verbs as gerunds, tagging them as adjectives. I would really like to know what others think the best application of a tool like this would be. I immediately thought it could be used as a translation aid from one corpus to another, but this doesn’t seem to be a feature.

TAPoR was honestly the tool that got me most excited and seemed most applicable to my research on women’s alternative/independent publishing. It was easy to “mess around” in–I’ve never done any text analysis before but at the most basic level I knew what a stop-word list was, and could figure out how to get the tool to “spit out” what I wanted to see. The descriptions that appear when you hover over a tool were immensely helpful and I found myself wishing every DH project or toolbox had this feature. Interested by the appearance of place names like London and Woking, I graphed these on the concordance tool to see the protagonist’s (and the Martians’) geographical movements through the novel. I also graphed “Martians” and “People,” the occurrences of which mirrored each other for most of the novel before “People” drops off sharply toward the end, when the protagonist is moving through deserted houses and communities. This exercise really tested my knowledge of the “plot points” in the book–I found myself remembering details that seemed insignificant, all by looking at a graph of the words. I’m just itching to digitize some zines, scrape their text, and compare all the instances of “queer,” “feminist,” and “anti-racist” I can find.

I also couldn’t help but smile at the title of these tools: “Voyant: See through Your Texts.” The entendre is irresistible–use “your texts” (whatever they may be) as a pane or a lens through which to view a specific topic, and/or make your texts transparent, lucid; make bare their meanings. Of course, the implication of Ramsay’s argument is that none of these tools, or the texts to which we apply them, are “transparent.” We might be able to “see” our text differently, from new angles and at previously hidden layers, but it is dangerous to assume that nothing resists the self-evidence of scholarly vision. My partner, who was watching me do these experiments and also helping me with the necessary plugins to run them, kept lingering on these sites to figure out what kinds of algorithms they use and what kinds of patterns they’re finding. I’m not sure most users think about the tools on those levels [DH-ers and hackers are, as usual, another story], and it would be easy to tout their potential while forgetting that our interpretations, the most valued currency in some humanities disciplines, are just begging to be made.

 

Loved by the King?

I’ve seen Wordles used before in school projects, but usually for display purposes rather than as an analytical tool.  Therefore, I was excited to see the application given a new purpose that teachers could easily use in school for a variety of texts.

Word Clouds!

When I imported Project Gutenberg’s text of the first volume of Le Morte D’Arthur into Wordle and Word it Out, these were my results (Sadly, I discovered that the “Loved by the King” font in Wordle was not very, well, kingly, so I switched it to a more appropriate font):

Wordle

 

Word It Out

It’s not surprising that the most prominent word in both is “Sir”, as most of the characters go by that epithet, nor that “king” and “knight” are also frequently used, emphasizing the courtly genre of the text.  “CHAPTER” probably is featured since the table of contents was included in my copy and paste, in addition to all the times it is usually used.  I was surprised that Tristram beats out Arthur (in a book titled after him!).  I also found it interesting that words such as “smote”, “battle”, and “slain” are much more prominent than “God” and “worship”, hinting that the divine justification for most of the fighting was not as much of an excuse as it purported to be.

Paraphrasing with Up-Goer Five

Screen Shot 2013-02-13 at 3.35.29 AM

Like many of my classmates, I found when I put the top 100 words into Up-Goer Five that about half of them were not permitted, mostly proper names, antiquated terms, and knightly terminology.  I would doubt anyone’s ability to use Up-Goer Five to summarize books with language as difficult as this if I hadn’t seen it applied to Hamlet’s “To Be or Not To Be” speech.  (I actually recommended this application to my former co-workers, many of whom require their students to paraphrase the famous soliloquies in Shakespeare’s plays on their tests.)

And I thought I was free from dealing with parts of speech…

I was impressed by the CLAWS Part of Speech Tagger’s ability to correctly identify even the antiquated pronouns such as “ye” and “thee”, but other than that, I found it difficult to see how these kinds of results could be useful in an analysis of the text.  Maybe if there were further calculations applied (frequencies of parts of speech?) I could have seen those patterns to turn into narratives–or at least questions–that Ramsay suggests.
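The “further calculation” I have in mind could look something like the sketch below. It assumes the tagger’s output has been saved as plain text with each token written as word_TAG; the function name `tag_frequencies` is hypothetical, and the short tagged sample is made up for illustration rather than copied from my actual CLAWS results.

```python
from collections import Counter

def tag_frequencies(tagged_text):
    """Count how often each part-of-speech tag appears in word_TAG text."""
    counts = Counter()
    for token in tagged_text.split():
        # Split on the last underscore so the tag is whatever follows it.
        word, sep, tag = token.rpartition("_")
        if sep:  # ignore tokens that carry no tag at all
            counts[tag] += 1
    return counts

# Made-up tagged sample (an assumption, not real tagger output).
tagged = "the_AT king_NN1 smote_VVD the_AT knight_NN1"
for tag, count in tag_frequencies(tagged).most_common():
    print(tag, count)
```

Comparing these tag counts across books or chapters might at least suggest questions, such as whether the battle-heavy sections lean more on verbs than the courtly ones.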

Making some conclusions with TAPoR

When I first plugged the text of Le Morte D’Arthur into TAPoR, the frequency count and “Cirrus” were both dominated by articles and other “unimportant” words, but when I asked the program to remove them, it generated a list almost identical to that of Wordle and Word It Out!  The Word Trends graphs, though, got interesting when I decided to click on those prominent names.

Frequency of Arthur, Tristram, and Launcelot’s appearances in the book

 

Leaving the “Segments” setting at 10 to roughly mimic the 9 books in Vol. 1, I discovered that Arthur appears most frequently at the beginning of the book (which makes sense, given that it is devoted to the story of how he came to power), and then is practically forgotten.  Likewise, Tristram dominates the last part of the book, even more so than Arthur dominates the first, which fits because book 8 is all about Tristram’s adventures.  Similarly, Launcelot spikes in the middle of the graph, as book 6 is all about his deeds.  The juxtaposed graph shows clearly how Malory attempted to integrate the various legends about the knights, which had come from different sources: he chose an episodic structure focused on one character at a time rather than jumping back and forth between multiple storylines, as is more typical of contemporary literature.
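For anyone curious about what a graph like this is counting, here is a minimal sketch. It assumes the text is simply cut into equal-sized segments by word count, which may not be exactly what the tool does; the function name `word_trend` is hypothetical and the toy text is invented.

```python
import re

def word_trend(text, names, segments=10):
    """Count occurrences of each name in each equal-sized segment of the text."""
    words = re.findall(r"[a-z']+", text.lower())
    # Ceiling division, so every word falls into one of the segments.
    size = max(1, -(-len(words) // segments))
    trend = {name.lower(): [0] * segments for name in names}
    for i, word in enumerate(words):
        if word in trend:
            trend[word][min(i // size, segments - 1)] += 1
    return trend

# Toy text: one name clustered at the start, another at the end.
toy = "arthur " * 3 + "filler " * 3 + "tristram " * 4
print(word_trend(toy, ["Arthur", "Tristram"], segments=5))
```

Changing `segments` changes the shape of the curve, so the choice of 10 is itself an interpretive decision.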

So what is it like to read this?

I think that these activities did have a sense of what Ramsay refers to as “ostranenie–the estrangement and defamiliarization of textuality” (3).  However, I’m skeptical as to how far we can take algorithmic analysis when the potential for grasping at straws exists.  As Ramsay mentions later on,

If something is known from a word-frequency list or a data visualization, it is undoubtedly a function of our desire to make sense of what has been presented. We fill in gaps, make connections backward and forward, explain inconsistencies, resolve contradictions, and, above all, generate additional narratives in the form of declarative realizations (62).

How much of this meaning is because we want to see meaning there?  And how much is built on prior assumptions?  For example, am I reading too much into the Word Trend charts of Malory because I know that his project was one of compilation, rather than invention?  I think this gets even trickier when you analyze results of an algorithm that you have designed–your own biases and/or assumptions are built into the project from the start.  Hopefully we’ll talk more in class about when these types of practices are productive and when they produce results that just mirror what we already think.

 

(And if you’re interested in seeing the outcome of Unicorns vs. Zombies according to Google N-Gram, check out my blog post!)