I used the text of John Henry Newman’s The Idea of a University that I found on the Project Gutenberg website last week to produce two word clouds on Wordle and WordItOut.
There were two differences that jumped out right away when I compared the two: WordItOut seemed to do a better job of weeding out stopwords (“may”), and Wordle accepted without question what I’m pretty sure are character-encoding errors (the pseudo-words beginning with ‘Ä’).
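To make those two differences concrete, here is a minimal Python sketch. It rests on assumptions of mine, not on anything I know about either tool: the stopword list is a hypothetical stand-in, and the guess that the ‘Ä’ pseudo-words come from UTF-8 bytes being read as Mac OS Roman is only one plausible explanation (under that misreading, a curly opening quotation mark turns into three characters, the second of which is ‘Ä’). The helper names `garble`, `repair`, and `word_counts` are made up for illustration.

```python
import re
from collections import Counter

# Hypothetical stopword list; real tools ship much longer ones.
STOPWORDS = {"the", "of", "a", "and", "to", "in", "is", "may"}


def garble(text: str) -> str:
    """Simulate the assumed error: UTF-8 bytes decoded as Mac OS Roman."""
    return text.encode("utf-8").decode("mac_roman")


def repair(text: str) -> str:
    """Undo garble() by reversing its two steps."""
    return text.encode("mac_roman").decode("utf-8")


def word_counts(text: str) -> Counter:
    """Count lowercased word tokens, skipping stopwords."""
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in STOPWORDS)


sample = "“Knowledge” may be its own end; knowledge is the end."
damaged = garble(sample)

# Tokens taken from the damaged text pick up stray accented letters.
print(re.findall(r"\w+", damaged)[:3])
# After repair, the counts are clean and "may" is filtered out.
print(word_counts(repair(damaged)).most_common(3))
```

If the guess about the encoding is right, decoding the Gutenberg file as UTF-8 before pasting it into Wordle should make the pseudo-words disappear.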
I had pretty much the same experience as everyone else when I pasted the words from the WordItOut word cloud into the Up-Goer Five Text Editor: it rejected 26 of the words (although it wasn’t concerned in the least that the words, in that order, did not constitute a syntactically valid English sentence).
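The behavior is easy to imitate. The sketch below is my own approximation, not the editor's actual code: it checks each word against an allowed list and nothing else, which is why word order never matters. The `ALLOWED` set is a tiny hypothetical stand-in for the editor's list of the ten hundred most common words, and `rejected_words` is a made-up name.

```python
import re

# Tiny hypothetical subset standing in for the "ten hundred" most common words.
ALLOWED = {"the", "of", "and", "a", "to", "in", "is", "that",
           "man", "mind", "own", "good", "idea"}


def rejected_words(text, allowed=ALLOWED):
    """Return the distinct words missing from the allowed list,
    in order of first appearance. Syntax is never examined."""
    seen = []
    for word in re.findall(r"[a-z']+", text.lower()):
        if word not in allowed and word not in seen:
            seen.append(word)
    return seen


print(rejected_words("knowledge university mind good theology idea"))
# prints ['knowledge', 'university', 'theology']
```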
I then pasted the same words from the WordItOut word cloud into the CLAWS Part-of-Speech tagger. For some reason, the text pasted without spaces between the words, and I had to enter the spaces manually. I noticed that the word list had a similar effect to the “entropic poem” on page 37 of Ramsay’s Reading Machines, which surprised me, since I had assumed that that effect would only be perceptible in a short text.
I get the point of tools like this. There’s a similar one called William Whitaker’s Words that’s very popular among students learning Latin, although the fact that CLAWS accepts bulk input (unlike Whitaker’s Words) is an improvement on the model. And there are useful things, I suppose, to be learned about a text from such tools (e.g., to confirm or deny the claim that John Calvin never used adverbs in writing). I didn’t, however, find the output of CLAWS particularly edifying in this case:
I attempted to hand off the URL for the plain text on the Project Gutenberg site directly to TAPoR using “Your Web Page”, but what I got was an HTTP 403 Forbidden error, so I played with Chapter 1 of Moby Dick instead. My sense was that HyperPo needs a body of text longer than a single chapter in order to be really useful rather than a curiosity.
I don’t feel qualified to comment on whether the use of these tools produces an effect of estrangement and defamiliarization of textuality in general. Not being a literature student, I’m not used to relating to textuality in the abstract, as opposed to a particular text or texts. My impression is that tools of this kind will do much more for you if you already know something about the text you are examining in this way, and I certainly got a lot more out of the examination of Gratian’s Decretum than of Newman’s Idea of a University.