The Prejudice of Stripped Texts

To start this week’s exercise, I decided to have a little fun, kind of like stretching before a big workout. Using Google’s Ngram Viewer, I compared the heroine of my chosen text, Pride and Prejudice’s Elizabeth Bennet, to her modern-day counterpart, Bridget Jones, with whose diary we are intimately acquainted. Because Helen Fielding has openly admitted to basing her characters on Jane Austen’s (especially Mark Darcy on Mr. Darcy), I thought it would be interesting to see how else they compare. I was surprised to see how Miss Bennet’s popularity waned for so many years and then, at the turn of the century, began a climb that hasn’t stopped since. Additionally, I was surprised to see that Bridget Jones’s popularity peaked higher than Elizabeth’s ever did.

Ngram Viewer

Then on to the hard part of the workout: creating a definition for digital humanities. And not just any definition, but one with strict boundaries. My humble result is below.

DH Definition


Wordle vs. WordItOut

While I generally consider myself a hands-on learner and quick on the uptake when it comes to basic computer programs and technologies, I found this week’s exercise to be more than a little frustrating. Wordle would not allow me to insert the Project Gutenberg (or any other) link to get my word output, which resulted in me copying and pasting the book in its entirety into the “Paste in a bunch of text” box. Oh, I pasted in a bunch of text alright! Finally, I got this beauty:


Then it was time for WordItOut, which was a much quicker task after figuring out Wordle’s quirks.


I actually took the time to try to make the two look as similar as possible in coloring for easier comparison. I think Wordle has WordItOut beat in basic aesthetics, but otherwise the results were nearly identical. I was very surprised to see “Mr.” was the word most used throughout Pride and Prejudice. Despite being the nineteenth century’s chick-lit by a female author, it is clear that it was still a man’s world at the time of writing and publication. However, the word “Elizabeth” does run a close second, which is a bit refreshing.
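For anyone curious what a word cloud is measuring, here is a minimal sketch of the counting step, assuming the cloud simply sizes each word by how often it occurs. The function name `word_counts` and the sample sentence are hypothetical stand-ins for illustration; I do not know how Wordle or WordItOut actually tokenize text or which common words they filter out.

```python
import re
from collections import Counter


def word_counts(text):
    """Count case-insensitive word occurrences, keeping internal apostrophes."""
    # Straight apostrophes only; curly ones would need to be normalized first.
    words = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(words)


# Hypothetical stand-in for the full novel; paste the real text in its place.
sample = (
    "Mr. Darcy looked at Elizabeth. Elizabeth looked away. "
    "Mr. Bingley smiled at Mr. Darcy."
)
counts = word_counts(sample)

# Show the three most frequent words with their counts.
print(counts.most_common(3))
```

Run on the whole novel, a count like this is what would let a title such as “Mr.” outrank a heroine’s name.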


Up-Goer Five Text Editor

Next up, the commonality of words. It appears things haven’t changed much in the 200 years since Miss Austen put pen to paper. In fact, other than proper names, only four words she used were not in the top 1,000 words of Up-Goer Five: indeed, pleasure, till, and manner. However, this made me curious about what the results would be if basic words like came, made, most, and go were excluded from the analysis. I was surprised at pleasure being so widely used. It’s not a word I hear used often, and it seems the connotation has changed over the years.
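The kind of check Up-Goer Five performs can be sketched in a few lines, assuming it amounts to comparing each word against a fixed list of common words. The function `uncommon_words` and the tiny `COMMON` set are hypothetical; the real editor uses its own list of 1,000 words, which is not reproduced here.

```python
import re


def uncommon_words(text, common_words):
    """Return the distinct words in text that are not in common_words, in first-seen order."""
    flagged = []
    for word in re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower()):
        if word not in common_words and word not in flagged:
            flagged.append(word)
    return flagged


# Hypothetical, tiny stand-in for a 1,000-word common list.
COMMON = {"it", "is", "a", "to", "see", "you", "in", "this", "way"}

flagged = uncommon_words("It is a pleasure to see you in this manner, indeed.", COMMON)
print(flagged)
```

Removing basic words such as came, made, most, and go from the text before running the check would be one way to test the question raised above.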

Up-Goer Five



CLAWS was my least favorite of all the sites. To me, it did not lay out the results in a clear, easy-to-read manner. It was also counterintuitive that the key wasn’t listed on the same page as the results, so that I had to toggle back and forth between pages. Additionally, the tool seems more useful for grade school children learning grammar than for any other purpose.




When it came to TAPoR, I wasn’t nearly as interested in the HyperPo abilities as I was in the program’s ability to run lists of words and compile how many times each word occurs in the text. The word “Elizabeth,” which appeared to be a close second to “Mr.” in the Wordle, is actually used 200 fewer times than “Mr.” Furthermore, I was particularly interested in the listing ability for two reasons. First, Stephen Ramsay writes extensively on the tf-idf formula and how its findings affect critics when looking for patterns in a text, which I found intriguing. Second, in Italo Calvino’s If on a winter’s night a traveler, a character tries to categorize and determine the genre of books based solely on the words that recur and appear the most in a given work. It’s an interesting thought, trying to decide what a book is about by reading it not for its sentences but for the words it features.
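To make the tf-idf idea concrete, here is a rough sketch using one common textbook variant: a word’s count in a document (tf) multiplied by the logarithm of the number of documents divided by the number of documents containing that word. This is an assumption on my part and may not match the exact form Ramsay uses, and it is not how TAPoR works internally. The names `tokenize` and `tf_idf` and the two one-line “documents” are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def tf_idf(documents):
    """Score each word in each document as tf * log(N / df)."""
    counts = {name: Counter(tokenize(text)) for name, text in documents.items()}
    n_docs = len(documents)

    # df: the number of documents in which each word appears at least once.
    doc_freq = Counter()
    for word_count in counts.values():
        doc_freq.update(word_count.keys())

    return {
        name: {
            word: tf * math.log(n_docs / doc_freq[word])
            for word, tf in word_count.items()
        }
        for name, word_count in counts.items()
    }


# Hypothetical miniature corpus, not real excerpts.
docs = {
    "pride": "Elizabeth walked to Netherfield and Elizabeth laughed",
    "diary": "Bridget walked to work and Bridget smoked",
}
scores = tf_idf(docs)

# Words shared by every document score zero; distinctive words score higher.
for word, score in sorted(scores["pride"].items(), key=lambda item: -item[1]):
    print(f"{word}: {score:.3f}")
```

The point of the weighting is that a word appearing everywhere, like “walked” or “to” here, tells a critic nothing about a particular book, while a frequent word found in only one book stands out.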



While all of these sites were fun to play with and produced interesting results, I think they ultimately take away from the true meaning of what a book is hoping to convey. Making a book a thing of quantitative results removes the reader’s ability to interpret the text for himself and to engage in the nuances the author has created with grammar, punctuation, and voice. The only work that comes to mind that would benefit from these results would be Gertrude Stein’s “Portraits and Repetition,” where her goal is to use the same words as many times and in as many ways as possible. As Ramsay himself writes:

“It is one thing to notice patterns of vocabulary, variation in line length, or images of darkness and light; it is another thing to employ a machine that can unerringly discover every instance of such features across a massive corpus of literary texts and then present those features in a visual format entirely foreign to the original organization in which these features appear” (Ramsay 16).

I couldn’t agree more. Just as Project Gutenberg states that anything may be done with a public domain text, which may result in the text being changed in ways that dissolve its power and purpose, stripping it to just its words changes it too.
