Paper Machines???

Has anyone tried to run Paper Machines? I have downloaded all the pre-req’s and I know it’s installed (my Firefox just updated and prompted me to review my add-ons – both Zotero and Paper Machines appeared in the list), but I don’t know how to initiate it in Zotero. The directions on GitHub are very sparse:

To begin, right-click (control-click for Mac) on the collection you wish to analyze and select “Extract Texts for Paper Machines.” Once the extraction process is complete, this right-click menu will offer several different processes that may be run on a collection, each with an accompanying visualization. Once these processes have been run, selecting “Export Output of Paper Machines…” will allow you to choose which visualizations to export.

When I right-click on a collection, no such option appears. This is what I see, even with all options investigated:

[Screenshot: Zotero right-click menu, 2013-02-19]

Anyone else have any success?

Sherlock Holmes Would Have Been a DHer

Alright, it is time for a super geeky confession: I belong to a Sherlock Holmes society. At the last meeting a number of members asked me what I was studying and I tried to explain Digital Humanities to them. It wasn’t, shall we say, the greatest success. So I’ve been thinking that at one of our next meetings maybe I’ll finally give the presentation, a duty I’ve shirked for all of the 10 years I’ve belonged to the club. I was trying to think of ways to blend DH with Sherlock Holmes and show how even the most basic of DH tools might be useful for understanding the Sherlock Holmes stories.

Well, this coming week’s work of finding a library of sorts related to our texts started me thinking about the similarities between Dracula and Sherlock Holmes, and between the men responsible for their creation. Both authors considered themselves to be the epitome of the Victorian gentleman, upholding the beliefs fundamental to that image. As such, wouldn’t they have a tendency to choose from the same offerings of the LDA Buffet? Some editions of Dracula, such as my Project Gutenberg copy, even bill it as “A Mystery Story.” Would the two men’s word choice reflect this similarity in experience and ideal?

[Images: Dracula word cloud; Sherlock Holmes Wordle]

I tried doing the Holmes word cloud with one text, Hound of the Baskervilles, but names like Baskerville and Henry started to dominate so much that one couldn’t see much of the other language. To balance it out, I stuck as much of the Sherlockian Canon as I could find into Wordle. The resulting “footprint,” if I may so call it, seems more representative of Sir Arthur Conan Doyle’s writing, as was the goal. And judging by the results, it would seem that the two do share a similarity in word choice. Words like “man,” “know,” “must,” “may,” “light,” “night,” and so on all have strong followings in the clouds.

Now, I’ve often heard said of Doyle that he was not a terribly good writer and that he, instead, had the good fortune to create a character who was original and fascinating enough to come to life in spite of this less than fortuitous entrance into the world. Holmes captured the imagination of the readers in spite of Doyle’s talent rather than because of it. Could the same be true of Dracula, seeing the linguistic similarities between their authors? I’m not entirely sure how to test this particular theory (maybe someone else will be able to suggest a way), but I thought I could test how the popularity of the characters of Dracula and Holmes has compared to that of their creators. The idea is that if Holmes and Dracula shared the limelight with their creators, it would suggest that there was as much to be said about the creator as about the creation. Doyle and Stoker would be as interesting as authors as their creations were as literary characters. The result is as follows:

[Screenshot: Google Ngram Viewer chart]

Google’s Ngram Viewer would seem to support this theory. The characters have survived far better than their creators; in fact, Holmes leaps to the forefront from the instant of his creation (Dracula has a bit more of an uphill battle at first). But maybe this is to be expected? Do characters always do better than their creators? To find out, let’s test on an undeniably talented author and her beloved creation, Jane Austen and Darcy:

[Screenshot: Google Ngram Viewer chart]

Now, the one problem with the above is that it doesn’t take into consideration that Darcy is rarely called by his full name and has a very common one at that, unlike Holmes and Dracula. So, here is the above result revised to use “Mr. Darcy” rather than simply “Darcy.” It is not ideal: how often does one, when writing about Austen’s ideal man, so formally refer to him as “Mr. Darcy”? But one should at least be able to mentally average the two results to attain some sense of our Darcy’s popularity in English writing:

[Screenshot: Google Ngram Viewer chart]

So clearly, this is not true among all authors and their creations. Austen gives Darcy a run for his money. Now, one must also take into account that Austen published far more texts than Stoker or Doyle. Hers were also far more popular. Has anyone heard of or remembered The White Company? No? That suggests to me that Doyle’s talent with the written word is not as strong as Holmes’s persistence in the memory.

Further, this research suggests that Stoker and Doyle shared a similar relationship with their fictional creations and made similar word choices. We can’t definitively prove that Stoker and Doyle were particularly terrible writers, but the results suggest that other writers do not stand in the shadows while their creations take the limelight as these two do.

As a final note: the class discussion of anime reminded me of a statistic I read long ago that stated that there were more Sherlock Holmes societies in Japan than there were in the UK. As it turns out, according to the list of active Sherlockian societies kept by Peter E. Blau (a member of the Baker Street Irregulars, the most illustrious Sherlockian society), Japan has 15 societies while the UK has 16. Still, the figure is impressive and made me curious how Holmes’s popularity (and Dracula’s) compared by geographical region and language. Alas, I don’t know how to translate Holmes into Japanese or Russian (there is a large following in Russia as well) so I’m limited to American and British English for Google’s Ngram Viewer. However, the results were still fascinating:

I find it fascinating that to the Americans, Holmes's popularity grew far more rapidly than in England, yet once again, the vampire steals the show.

It would seem that while Holmes was very popular in the UK since his creation, Dracula has recently stolen center stage--in spite of all the latest Sherlock re-imaginings.

In conclusion, I think Holmes would have been a DHer.  The man who cried, “Data! Data! Data! [....] I can’t make bricks without clay,” would have appreciated the way in which DH offers one tremendous information at one’s fingertips and the tools to make sense of it.  Holmes would especially have to appreciate the fact that the methods of the Digital Humanities could be used to catch our own Napoleon of Crime, so to speak, Osama bin Laden.  And as for Dracula?  Well, clearly DH has brought him out into the light of day.

 

Art and Science as Complementary Opposites

I was very drawn to the argument Ramsay puts forth in Reading Machines. This might be because out of all of the readings thus far (okay, only two weeks’ worth of reading, but last week had a good amount of material . . .), Ramsay most willingly acknowledges the divide between humanistic inquiry and computational method. Indeed, as Ramsay argues, while each contains a kernel of the other, algorithmic criticism seeks definitive answers, while literary criticism seeks unanswerable questions.

In this blog post I will try to focus only on “Preconditions” and the first chapter, “An Algorithmic Criticism,” of Ramsay’s book, perhaps setting my own constraints for myself. I do this to save the rest of my thoughts for class on Wednesday, and I will use this post as a jumping-off point for discussion.

It is difficult to explain why the pairing of two opposing modes of inquiry fascinates me. This discussion reminds me of the interests of early science fiction writers, who, influenced by the Romantic period, used the very methods of rationalism and science as a form of critique. Ramsay nearly states exactly this in his discussion of art and science:

“Art has very often sought either to parody science or to diminish its claims to truth.”

With this ever-present tension, how could we possibly use text analysis to aid literary criticism in a way that does not remove the basic tenets of humanistic inquiry? Ramsay has a few answers to this. Computer-based tools represent a limitation that allows us to reorganize and understand a text in new ways. While text analysis can only concern itself with verifiable facts, the user is left to decide what to do with these “facts.”

In other words, computer-based tools like text analysis often act as a form of provocation, a starting point for us to delve deeper into an issue. I certainly encountered this in my own limited/crude experiment with Woodchipper, a topic modeling tool. The fear that comes with using many of these tools (and here I might break my own constraint and reach into the other chapters) is that they can only tell us what we already know. This might be a problem with methodology, as Ramsay points out. The more worthwhile experiments are the ones that suggest the opposite of what you believe. Certainly as computer-based tools grow more complex and sophisticated, they will be able to give us answers to questions we previously believed only humans could address. But Ramsay is more interested in discourse than in methodology:

“. . . we can refocus the hermeneutical problem away from the nature and limits of computation (which is mostly a matter of methodology) and move it toward consideration of the nature of the discourse in which text analysis bids participation.”

Another issue which Ramsay may or may not address is that while you can produce results using text analysis (and other tools) without having read the text in question, you may not be able to interpret those results. This is certainly true for Ramsay’s experiment with The Waves. As Ramsay points out after running an equation regarding the speakers in the novel,

“Few readers of The Waves would fail to see some emergence of pattern in this list.”

But what if you haven’t read The Waves? It is a short book, and one you would certainly be expected to have read if you decided to publish anything, including an experiment with text analysis, on the novel. But this issue becomes a problem when we consider “distant reading,” which purports not to require any general or specific knowledge of the text. In fact, distant reading discourages it.

But if you cannot interpret the results unless you have read the book in question, how are we supposed to approach the topic of “How to Read a Million Books”? Even when we consider a hundred or a thousand books at once (or millions, as described in the TED talk video), it might be helpful to know at least a few things about each one, like the fact that The Waves features six speakers.

Here is where methodology asserts its importance once again. Only when a computer-based tool becomes sophisticated enough to allow for interpretive analysis without engaging with the text directly can these tools usurp the primacy of the reader. Perhaps we have reached this stage already, but I cannot help but cling to the importance of close reading, even as we compare a work to hundreds, thousands, or even millions of others.

From Hell’s Heart I Graph At Thee!

The idea of quantifying Moby-Dick is exciting, though the results returned by some of the tools we were instructed to use are perhaps not altogether surprising. The novel is packed with Shakespearean language, is about a very specialized topic (whaling), and is formally very odd in places. But that, of course, just means Moby-Dick is an ideal text for these sorts of experiments, right? Let’s see…

First, I ran Moby-Dick through Wordle, resulting in this diagram:

Secondly, WordItOut:

The most obvious difference between the two is the choice for the largest word. In Wordle, ‘whale’ and ‘one’ are, unsurprisingly, the largest words in the image. WordItOut, however, displays ‘all’ as its largest word, with ‘whale’ and ‘one’ the runners-up. The word ‘all’ is not represented in Wordle’s image, meaning that program casts it aside as too common a word to be of any use. Now, I do see the logic in this decision in some form; ‘all’ is a common word, and sometimes can be used as a needless intensifier or a purely quantitative word. In this case, however, I contest Wordle’s decision; in Ahab’s final monologue he explicitly describes Moby Dick as “all-destroying” as he speeds, harpoon in hand, towards the beast that is destroying his ship. The ‘all’ in this case is not just a simple word. It is an intensifier, certainly, but it represents Ahab’s life (the whaling trade) and Ahab himself (his soul has been scarred and his body maimed). It is possible to read this word with more than the mere commonality ascribed to it by Wordle’s software.
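To make the ‘all’ question concrete, here is a minimal sketch of how a word-cloud tool might count words with and without a stop list. The stop list and the sample sentence are my own inventions for illustration; neither Wordle’s nor WordItOut’s actual list or code is reproduced here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop list (an assumption, not Wordle's real list).
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "all", "that", "is", "his"}

def word_counts(text, stop_words=frozenset()):
    """Count lowercase word tokens, skipping anything in stop_words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(t for t in tokens if t not in stop_words)

# Invented sample sentence, not a quotation from the novel.
sample = "All the whale-ships, all the whalemen, and the one white whale of all."

unfiltered = word_counts(sample)            # nothing removed
filtered = word_counts(sample, STOP_WORDS)  # 'all' silently disappears

print(unfiltered.most_common(3))
print(filtered.most_common(3))
```

With the stop list applied, ‘all’ never reaches the cloud no matter how often it occurs, which is exactly the kind of default decision worth checking before reading anything into the picture.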

Secondly, the major characters of the novel are mentioned (Queequeg, Stubb, Starbuck, and Ahab), but some are missing. Ishmael is gone despite being the narrator, but aside from the opening sentence his name is barely mentioned, if at all (mostly only annotations ever recall his name). More interesting, though, is the absence of one of Ahab’s right-hand men: Flask. Naturally, this means he is mentioned less, or at least referred to by name fewer times, than the other mates of the Pequod, but perhaps this opens up a line of inquiry to pursue: why are Starbuck and Stubb getting so much attention as to appear quantitatively more visible?

Next, I placed the contents of the word cloud into the Up-Goer Five, receiving the expected list of forbidden words:

Stubb, stub, brush, check, end, point, boats, captain, sperm, sea, ship, thou, nor, boat, Ahab, ye, whales, deck, Queequeg, Starbuck, chapter, whale, among

This list can be divided easily into three categories: names (Stubb, Ahab, Queequeg, and Starbuck), archaisms (thou, ye, nor), and nautical terms (stub, brush, check, point, boats, sperm, sea, ship, boat, whales, deck, whale). None of these are surprising to see on the list: the names are odd, the archaisms are by definition not going to be common, and our modern society is less reliant on ship-trade, which makes the nautical terms scarcer. I would guess they wouldn’t have appeared in the top 1000 words in 1851 either.

The interesting remainders are end and among, which, I’ll admit, I am surprised are not within the ten hundred most used words.
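The check the Up-Goer Five performs can be sketched in a few lines, assuming it simply looks each word up in a fixed list. The allowed list below is a hypothetical ten-word stand-in for the real “ten hundred” words, and the input is a made-up fragment of a word cloud.

```python
# Hypothetical stand-in for the "ten hundred" most-used words; the real
# Up-Goer Five list is far longer and is not reproduced here.
ALLOWED = {"man", "one", "like", "old", "time", "up", "go", "said", "know", "thought"}

def not_allowed(words, allowed):
    """Return the words absent from the allowed list, in first-seen order."""
    seen = set()
    rejected = []
    for word in words:
        key = word.lower()
        if key not in allowed and key not in seen:
            seen.add(key)
            rejected.append(word)
    return rejected

# Invented fragment of a word cloud, with one repeated word.
cloud_words = ["whale", "one", "Ahab", "ship", "said", "thou", "whale"]
print(not_allowed(cloud_words, ALLOWED))
```

Anything that surprises, like “end” or “among,” then becomes a question about how the allowed list was compiled rather than about the novel.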

Next comes the CLAWS part-of-speech tagger. This tool, as Mary and Dan reported, is not only less visually appealing but also harder to read for someone not familiar with its format. But the tool was surprisingly good at recognizing the proper nouns (Queequeg, Stubb, Starbuck, and Ahab) as such, rather than returning some sort of error or just tagging them as common nouns. Since proper nouns typically depend upon context to be recognized, CLAWS’ ability to recognize them is impressive. Aside from the names, the list consists mostly of nouns and adjectives, with a few prepositions (upon, among) and an interjection (oh), but fewer verbs than I expected, only five by my count: said, cried, go, thought, and know.

Finally, with the TAPoR/Voyant tool, I found myself lucky that the first chapter of Moby-Dick was a default on the website. Unfortunately, the diagnostic returned was not all that interesting, so I went ahead and uploaded the entire text.

The cloud, or ‘cirrus’, for Voyant is prone to including “useless” words like articles, as you can see. Fortunately, while it does not take the liberty that both Wordle and WordItOut do of automatically removing certain words (and thereby removing some potentially important words, as in the case of ‘all’), it allows you to customize your list and essentially blacklist the words you do not want. Wordle provided this feature as well, but removed words by default. Voyant forces the uploader to think and choose the words represented.

As you can see in the screenshot, the first word I selected that seemed, to me, to be worth scanning was ‘whale’, with a total of 971 uses beginning on the very first page. What is fascinating about Voyant is the multiple ways it will contextualize and build information around a single word. There are two windows dedicated to showing a frequency chart and the context around each mention, as well as tabs showing where in the corpus your chosen word (or words) appears. This helps to alleviate any suspicion, especially when dealing with an ambiguous word (unlike ‘whale’) that may have multiple uses and contexts.

Looking at the use of ‘whale’ throughout the entire book, I would be tempted to explore the periodic lull in its mentions visible in the line graph. When the graph is given 10 and 15 segments, these oscillations are more drastic and show much more sporadic mentions of the term. Most interestingly, what can be seen is a steady decline in the use of ‘whale’ until the start of the final chapters of the book (the chase sequence), at which point it begins a steep incline. There is seemingly a dramatic tension in the graph, recognizable through its usage of the term.
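The segmented line graph can be approximated by cutting the token stream into roughly equal chunks and counting the word in each. This is only a sketch under that assumption; Voyant’s own segmentation may well work differently, and the sample text is invented.

```python
import re

def segment_counts(text, word, segments=10):
    """Split the tokens into `segments` near-equal chunks and count `word` in each."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    tokens = re.findall(r"[a-z]+", text.lower())
    n = len(tokens)
    target = word.lower()
    counts = []
    for i in range(segments):
        # Integer arithmetic keeps the chunk boundaries evenly spread.
        chunk = tokens[i * n // segments:(i + 1) * n // segments]
        counts.append(chunk.count(target))
    return counts

# Invented twelve-word sample standing in for the full novel.
sample = "whale whale sea ship sea ship deck deck whale whale whale sea"

print(segment_counts(sample, "whale", segments=3))
print(segment_counts(sample, "whale", segments=4))
```

The same text produces a differently shaped curve at three segments than at four, so the number of segments is itself an interpretive choice, which is worth remembering before reading dramatic tension into any one graph.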

So, when I think about Ramsay’s idea of “estrangement” from textuality, I have to wonder what it is within the text, or about the text, that is the primary subject of estrangement. Is it the narrative? In every instance my initial responses have been grounded within the narrative: why is Flask mentioned less? Why is the word ‘all’ important enough to the word cloud that its absence is a significant loss? What time frame is represented by the steep incline at the end of the line graph? All of these questions are brought about because of my familiarity with the reading: a product of the close-reading focused education that insisted I read Moby-Dick because it, singularly, is important and stands above thousands of anonymous books. But when it comes to the answers to my questions, are they all necessarily going to return to the narrative? Personally, it seems the temporary estrangement is merely a way of refocusing on the narrative and re-reading it, arriving at Ramsay’s purported goal: creating new information and criticism from what the algorithms can show us.

“Computer-enabled play” — hacking The Marble Faun

(I steal “computer-enabled play” from Irizarry, quoted by Ramsay on pg. 36 of Reading Machines).

The reason I was drawn to include the phrase “computer-enabled play” in my title was because that is really what I felt like I was doing throughout the exercise: playing, fiddling, fooling around, testing out, exploring, etc. Similar to what Mary expressed, I found that some of these experiments were overwhelming (or annoying) in their unfamiliarity, but I soon discovered that if I “played” with the tool enough, I could eventually gain some insights into The Marble Faun in a new way (i.e., different insights than I would have garnered from reading the text in a traditional manner).

Wordle and WordItOut seemed especially “playful” with their fun names, bright colors and graphic visualization. As I’ll probably reiterate several times in this post, these tools did require some “fiddling,” though.

WordItOut

Wordle

For both tools, I used the full text of The Marble Faun, from the Project Gutenberg plain text online version. With WordItOut, I appreciated the function to tweak the list that was generated. I could view the list in ascending count order, alphabetically or randomly (though I’m not sure how the last two would aid in a critical analysis). I was able to increase or decrease the number of words. Also, unlike Wordle, WordItOut allows you to see the generated word list in both list and visualization form, which was helpful when I wanted to copy the list for further exercises.

As others have observed, with narrative forms (novels), names and pronouns seem to be most prominent. In a story like The Marble Faun, it was actually interesting to see which character was the most represented: Miriam. (The Marble Faun is sort of like modern-day sitcoms that center on a group of friends, so imagine that we Wordled an episode of “Friends” and discovered that “Rachel” is the largest name. What does this tell us about the group dynamic?) In The Marble Faun, the drama centers on MIRIAM. I thought it was interesting that Hilda is the second largest name, so in a novel which actually has fewer female characters than male, the females still win out in nominal presence.

Other prominent words seemed to thematically center on art (not surprising; the characters are artists living in Rome) and time (not surprising; Hawthorne often focuses on the interplay of past, present and future reality). Wordle and WordItOut thus demonstrated for me the point Ramsay notes more than once, that at a base level, digital tools might merely confirm analyses we have already made.

Up-Goer 5

After struggling with the “define digital humanities” Up-Goer 5 challenge before beginning the exercise (how unnatural for literary scholars to prioritize simple, oft-used words over our sophisticated vocabularies!) it was interesting to use the tool to investigate a pre-generated text. I used my list from WordItOut, but removed the names as I didn’t think they would propagate any new insights (i.e., it would be no surprise that “Hilda” is not in the top 1000 used words). What I did find, though, WAS intriguing.

Of 97 words produced by WordItOut as “most used” in The Marble Faun, only eleven did not make the Up-Goer 5 top 1000 word list. These were: sculptor, Rome, marble, among, itself, whom, Roman, nor, poor, tower, and indeed. I’m guessing that Rome and Roman are too specific (proper nouns) to merit top-1000 usage, while sculptor and marble as nouns also seem too obscure (we don’t talk about sculpture very generally or often). Nor, whom and indeed are rather sophisticated uses of grammar, so their absence doesn’t surprise me. I don’t have an explanation for among, itself, poor, or tower—any thoughts?

What’s left behind (in the top 1000) is interesting when you consider the words as “topics” of interest (in the sense that oft-used words might represent broader themes): life, heart, friend, good, human, world, love, art, idea, moment. Not only are they huge topics in Hawthorne’s text, but also, apparently, in everyday speech.

CLAWS

On to CLAWS, the realm of tagging. This, I did not find playful. I was rather confused, though finding the accompanying “tagset” key was somewhat illuminating. I didn’t have the patience to count the different types of word forms, but that could have been interesting to see–were Hawthorne’s 100 most-used words from The Marble Faun mostly pronouns? (probably). Singular nouns? Comparative adjectives? Etc. etc. So I can see how CLAWS could be a useful tool, but I didn’t like the aesthetic of the list that was generated (no spacing, no counting) so, admittedly, I moved on.

TAPoR

TAPoR, while also intimidating with its unfamiliar interface, was “playful” in its potential for “fiddling,” as I previously described. The more I played around with it, the more I found ways to make it work for me. After scrolling through the word lists in the lower left-hand corner (sorted Frequency vs. Count vs. Trends) I clicked on “heart” to see what came up.

I don’t really understand the graph in the upper-right hand corner, though I know you can view two words at once to—I presume—compare frequency at various points in the book. For instance, I viewed “woman” and “sympathy” together and saw a very similar pattern, suggesting that woman & sympathy are often discussed in tandem. This is not surprising, given that Hawthorne’s romance could really be considered a sentimental novel and he’s constantly talking about the female characters’ womanhood and capacities for sympathy (e.g. Hilda is very sympathetic, Miriam not so much). What confused me were the “segments,” though I suppose you could generate the graph so that it represented chapters, if you knew how to finagle that breakdown. That way, you could see where, in the novel, topics were discussed with higher frequencies. “Heart,” for instance, skyrockets at the end of The Marble Faun, according to this “Word Trends” graph.

I also got the hang of the concordances tab and found these lists extremely interesting. Under “heart,” I could observe the following concordances:

Intimate/heart/knowledge
Hilda/heart/life
close/heart/beautiful
brain/heart/think
trust/heart/trusts
secret/heart/burns

These are only a few examples (from 89 instances of heart in the first volume of the novel) but SO INTERESTING! I’m especially intrigued by instances like “knowledge,” “brain” and “think” surrounding the presence of the heart, since we get that tension between cognition and emotion there. Trust and secrets regarding the heart don’t surprise me at all given the nature of the novel, nor does the presence of the “beautiful.” Hilda’s concordance is also not surprising—the trio of “Hilda,” “heart” and “life” is only too perfect. (I know I’m not being very clearly critical here, but I’m sure you can see the potential for developed analytical writing on these topics).

One question I had while using TAPoR concordances: TAPoR doesn’t select the immediate surrounding words, but rather “keywords.” For instance, “secret/heart/burns” comes from the sentence: “There is a secret in my heart that burns me!—that tortures me!” This is a pretty good example—we presumably don’t care about “in,” “my” or “that,” but how are the keywords chosen? Does the software just eliminate prepositions? Do we lose the presence of “torture” here? Compare to an example like “only/heart/sought.” The sentence from which this concordance is generated is, “But if it were only a pent-up heart that sought an outlet?” To me, “pent-up” seems important, while “sought” and “outlet” are equally important. So, I’m just wondering (and perhaps someone can actually tell me) how the concordances work—how are the surrounding terms generated?
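One way to probe that question is to write down a guess and see whether it reproduces the tool’s output. The sketch below assumes the simplest possible rule: take the nearest word on each side of the target that is not on a stop list. The stop list is hypothetical, and nothing here is taken from TAPoR’s actual implementation.

```python
import re

# Hypothetical stop list, chosen only for this illustration.
STOP_WORDS = {"there", "is", "a", "in", "my", "that", "me",
              "but", "if", "it", "were", "an"}

def keyword_contexts(text, target, stop_words):
    """For each occurrence of target, return (previous keyword, target, next keyword)."""
    # Hyphenated forms such as "pent-up" are kept as single tokens.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    results = []
    for i, token in enumerate(tokens):
        if token != target:
            continue
        before = next((t for t in reversed(tokens[:i]) if t not in stop_words), None)
        after = next((t for t in tokens[i + 1:] if t not in stop_words), None)
        results.append((before, token, after))
    return results

first = "There is a secret in my heart that burns me!"
second = "But if it were only a pent-up heart that sought an outlet?"

print(keyword_contexts(first, "heart", STOP_WORDS))
print(keyword_contexts(second, "heart", STOP_WORDS))
```

This guess reproduces “secret/heart/burns” for the first sentence but yields “pent-up” rather than “only” for the second, so a plain stop-word filter cannot be the whole story; the tool may tokenize hyphenated words differently or rank keywords in some other way.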

I could really see myself using TAPoR in the future (though, again, the interface doesn’t really appeal to me and I wished I could have enlarged everything, but these are minor complaints of a whiny variety). As someone who was largely unexposed to DH tools before this class, I can say that Ramsay’s Reading Machines and our exercises have legitimately moved me “Towards An Algorithmic Criticism.” The text, in its descriptions, examples and analysis of digital tools and their impact on/interaction with literary criticism, was seriously illuminating. We were prompted to consider how

“the effect is not the immediate apprehension of knowledge, but instead what the Russian Formalists called ostranenie—the estrangement and defamiliarization of textuality” (3)

regarding our experience with the various digital tools today, and it certainly applies. “Estrangement” and “defamiliarization” certainly describe my “computer-enabled play” with The Marble Faun today. We are distanced from the text when the computer intervenes, transforming prose into lists, visual graphics, concordances, and line graphs. BUT this does offer, though not immediate, new “apprehension of knowledge,” I believe. From reading Hawthorne’s prose, I do not “know” whose name appears most often in the text, even if I can guess. Conjecture becomes fact, and fact leads us to points of inquiry, new questions regarding “why?” More articulately put by Ramsay on pg. 62:

“If something is known from a word-frequency list or a data visualization, it is undoubtedly a function of our desire to make sense of what has been presented. We fill in gaps, make connections backward and forward, explain inconsistencies, resolve contradictions, and, above all, generate additional narratives in the form of declarative realizations.”

I’d like to point out a couple of other passages which stuck out to me and helped me frame algorithmic criticism this week:

“The computer revolutionizes, not because it proposes an alternative to the basic hermeneutical procedure, but because it reimagines that procedure at new scales, with new speeds, and among new sets of conditions” (31).

 

“Rather than hindering the process of critical engagement, this relentless exactitude produces a critical self-consciousness that is difficult to achieve otherwise” (34).

And, to end, perhaps a point that generates and necessitates discussion: in opposition to “ambiguity,” the computer “demands an answer” (67). Is this a limitation? Shouldn’t there be room for ambiguity in literature, even if it doesn’t fit into an automated output? Ramsay continues, “…the computer demands abstraction and encapsulation of its components” (67). Again–is a limitation present, here? Are all texts (/words/phrases/data sets) discrete, with potential to be “encapsulated?” Does the computer miss something the subjective mind would not?

Neil Gaiman’s “A Calendar of Tales”

I discovered this last week when retweets by Neil Gaiman, a favorite author of mine, took over my Twitter feed. It was too wonderful not to share. What has happened is this: Neil Gaiman has teamed up with the makers of the BlackBerry 10 to create a project (very much in the spirit of the Digital Humanities) that allows readers to collaborate with Neil Gaiman as he writes. As you can see from the website, the project is entitled “A Calendar of Tales.” For each month, Gaiman produced a question to which people on Twitter were able to respond using specific hashtags: #jantale for January, #febtale for February, etc. Gaiman is now using the tweets sent out by his followers as inspiration for a series of tales that he will write (one for each month). As the next step, Gaiman will share his tales and accept submissions of illustrations, choosing one for each story, thereby making these tales both inspired by and illustrated by his followers on Twitter. And that’s the real beauty of this project: collaboration. In the video posted on this site, Gaiman talks about how the composition process is usually a rather lonely one, featuring a writer sitting in a room writing down thoughts that only he or she is privy to at the time. However, by calling upon tweeters from all over the world to share their thoughts and stories on Twitter, Gaiman is able to transform the writing process into a collaborative one, with a reciprocity between writer and reader that allows him to draw upon his fellow tweeters for inspiration and create stories that would have been left untold had it not been for this project.

What has William Morris to do with DH?

A brief recommendation: UMD Libraries’ Special Collections is currently featuring an exhibit (“How We Might Live: The Vision of William Morris,” Sept. 2012-July 2013) on the life and works of William Morris, the 19th-century English author, designer, socialist, and (arguably most famously, though perhaps I’m not objective on this point) founder of the Kelmscott Press and printer of the Kelmscott Chaucer. As a medievalist with a particular interest in manuscript studies, I’ve long found Morris’s work appealing and admired his taste. For example, what lover of books would not appreciate the discussion of the relative aesthetic merits of various typefaces and guidelines for margin widths found in his “The Ideal Book”? That having been said, though, I never found Morris particularly relevant to my own work, at least not until I read Bethany Nowviskie’s very thoughtful MLA talk, “Resistance in the Materials” (posted here on her blog). Nowviskie uses a quotation from Morris as a jumping-off point for discussing the role of craft and collaboration in DH, as well as for some reflections on the casualization of the academic workforce. Not only is her essay directly pertinent to our discussion of making and building in DH, but for me reading it also gave new relevance to UMD’s Morris exhibition. In particular, it got me thinking about the tension between the hand- and machine-crafted object in Morris’s work, and about the resonance of his attempts to translate both the aesthetics and the ethics of the hand-crafted book into the technological context of printing. In that sense his work now strikes me as particularly relevant to our moment, when at times the future of books as physical objects seems to be in doubt, not to mention the viability of a career devoted to writing and studying them. But rather than take my word for it, why not read the essay, and take in the exhibition, for yourself?

A Quick Experiment in “Distant Reading” a Large Medieval Latin Text

Gratian

 

My dissertation is on the textual development of Gratian’s Decretum. The Decretum was written around 1140 by the otherwise unknown Gratian, and was the foundational textbook for the systematic study of canon law within the medieval university. (In fact, it remained the basis for the law of the Roman Catholic church right up until 1917.)

Inspired by Charity and Kathryn’s presentation on Wednesday night, I decided to use Wordle to do an experiment in “distant reading” Gratian’s text. The MGH (Monumenta Germaniae Historica) in Munich digitized Emil Friedberg’s still-standard 1879 critical edition in the 1980s, and I copied and pasted the whole thing (all 490,446 words) into Wordle.

A few things need to be kept in mind in order to interpret the resulting Wordle.

First, the Decretum was written in Latin, a fully inflected language, and Wordle does no stemming. This is both a minus and a plus. Deus, Dei, Deum and Deo are just morphologically different forms of one word, and if we were to put them all together, Deus (“God”) would have a more prominent (and less misleading) place in the visual space than it does. Episcopus (“bishop”) is another example. On the other hand, the fact that Wordle does no stemming has the effect of preserving the gendered words, for example eum (“him”) and eam (“her”). These pronouns can, of course, refer to things that are masculine and feminine in a purely grammatical sense, but the difference is nevertheless interesting.
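To make the stemming point concrete, here is a minimal sketch in Python of what "putting them all together" might look like. It is not how Wordle works, and it is not what I actually did for this post: the `LEMMAS` table is a tiny hand-made assumption covering only the forms of Deus mentioned above, and the sample sentence is invented for illustration. A real attempt would need a proper Latin lemmatizer.

```python
import re
from collections import Counter

# Hypothetical, hand-made lemma table: maps an inflected form to its
# dictionary form. It covers only the four forms of "Deus" named above.
LEMMAS = {
    "deus": "deus",
    "dei": "deus",
    "deum": "deus",
    "deo": "deus",
}

def count_words(text, lemmas=None):
    """Count word frequencies, optionally folding inflected forms together."""
    # Lowercase the text and keep runs of the letters a-z only.
    words = re.findall(r"[a-z]+", text.lower())
    if lemmas:
        # Forms missing from the table (e.g. eum, eam) are left untouched,
        # so the gendered pronouns stay distinct.
        words = [lemmas.get(word, word) for word in words]
    return Counter(words)

# Invented sample text, not a quotation from the Decretum.
sample = "Deus dixit. In nomine Dei. Laudate Deum. Gloria Deo. Eum et eam."

raw = count_words(sample)              # what a tool without stemming sees
grouped = count_words(sample, LEMMAS)  # inflected forms folded together
```

In the raw count each form of Deus is tallied separately, while in the grouped count they are added together under one heading, which is why the word would loom larger in the cloud. Leaving eum and eam out of the table keeps the gendered pronouns apart.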

Another linguistic feature is the salience of the word que. This word can mean several different things depending on context, but it shows up on the Wordle because of its use as a relative pronoun (“which”) kicking off a subordinate clause. Latin is a hypotactic language and so subordinate clauses appear much more frequently than in a paratactic language like English.

Second, the Wordle makes sense in the context of the way in which Gratian put the Decretum together. The Decretum consists of short extracts from “authorities” (church councils plus long-dead theologians and popes), which Gratian embeds within a framework of his own comments (called dicta or “sayings”). It is extremely interesting that only two of the individual authorities are named frequently enough to show up in the Wordle: Augustinus (bishop of Hippo Regius in modern-day Algeria, d. 430) and Gregorius (bishop of Rome, d. 604). The word Papa (“Pope”) is more prominent, suggesting the collective, if not individual, heft of the popes in the lineup of authorities. Finally, Concilio (“Council”) shows up because the attribution (“inscription” in the jargon of medieval canon law studies) of so many canons is to one or another of the general or provincial councils that Gratian cited.

The chaining of multiple authorities in sequence is a very prominent feature of the text, and is indicated by the word Item (“Similarly”). One of Gratian’s goals was to show that the authorities were in harmony with each other. In fact his title for the book (which isn’t the one that stuck) was Concordia Discordantium Canonum (“The Agreement of Disagreeing Rules”). To do that, however, he had to bring out the apparent disagreements among the authorities before resolving them (his resolutions usually being introduced by Unde or “Whence”). This gives rise to the use of adversative particles like uel (“or”) and uero (“but”) that foreground the (apparent) contrast between the positions of the authorities.
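One way to check an impression like this without relying on the eye alone would be to count the marker words directly and express them per 1,000 words. The Python sketch below is only an illustration under stated assumptions: the `MARKERS` list simply repeats the four words discussed above, the function name `marker_rates` is my own, and the sample sentence is invented. It also assumes the spellings uel and uero; an edition that prints vel and vero would need those forms added.

```python
import re
from collections import Counter

# The four marker words discussed above, lowercased. This list is an
# assumption for illustration, not an established set of markers.
MARKERS = ("item", "unde", "uel", "uero")

def marker_rates(text, markers=MARKERS):
    """Return occurrences per 1,000 words for each marker word."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    total = len(words)
    if total == 0:
        # Avoid dividing by zero on empty input.
        return {marker: 0.0 for marker in markers}
    return {marker: 1000 * counts[marker] / total for marker in markers}

# Invented sample text, not a quotation from the Decretum.
sample = "Item Augustinus dicit. Item Gregorius uero negat. Unde patet."

rates = marker_rates(sample)
```

Rates rather than raw counts make it possible to compare sections of different lengths, for example Gratian’s own dicta against the extracts from the authorities, assuming the text could be split that way.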

These are just some of the immediate reactions I had to a quick experiment in “distant reading” an almost half-million-word text in one morning. I’ll update this post if I come up with more upon further reflection. I’d also appreciate feedback from the group on how to better communicate these ideas.

Saving Wordle Word Clouds

If anyone wants to know how I saved my word cloud from Wordle, here’s how I did it (you might find a better way): Choose the Print option and through that menu save it as a PDF, then open the PDF and save it as a JPEG. You can probably take a screenshot of the word cloud, too – I just wanted the best resolution possible so that it could be blown up onscreen for the activity.

This might be completely superfluous, but I just wanted to share in case anyone was initially flummoxed – please feel free to comment if you have a better way!

*UPDATE (in response to Paul’s comment):

Paul, I did use my work computer initially, which is a Microsoft one. However, when I went to Wordle.net just now, I was able to download the Java plug-in, restart Firefox, and generate a word cloud. When I clicked “Print” (a few times, because I had to keep “Allow”-ing the applet to connect with my printer [which isn't actually even hooked up to my laptop currently]), I was able to get a Print dialog to appear:

Screen Shot 2013-02-08 at 12.25.24 PM

So I could manually choose “Save as PDF”, which then led to this screen, where I was able to save my word cloud into PDF format:

Screen Shot 2013-02-08 at 12.25.42 PM

I don’t know if it’s because I used Firefox or my OS is different (I’m running 10.7.5), but after downloading Java, I was able to obtain a PDF. However, now I need to double-check my print queue (for my non-existent printer), because this has happened before. Good luck! :/