“There is no obvious unit of language.” –Susan Hockey, Electronic Texts in the Humanities
Guest: Lisa Rhody
- Fish, “Mind Your Ps and Bs: The Digital Humanities and Interpretation”
- Steadman, “Big Data and the Death of the Theorist”
- Hall, “Has Critical Theory Run Out of Time for Data-Driven Scholarship?” (DDH)
- Witmore, “Text: A Massively Addressable Object” and “The Ancestral Text” (DDH)
- Jockers, “The LDA Buffet is Now Open; or, Latent Dirichlet Allocation for English Majors”
- Underwood, “Topic Modeling Made Just Simple Enough” and “We Don’t Already Know the Broad Outlines of Literary History”
- Lisa Rhody, [Materials available via Dropbox; check your email, and contact me if you need the link.]
Image: Daniel Libeskind’s Reading Machine as exhibited at the 1985 Venice Architecture Biennale.
This exercise is about creating infrastructure. Specifically, you will install tools to form the basis of a scholarly text analysis environment, and then help populate a communal data set we will use for some preliminary experiments. Unlike the exercise from last week, this is all about distant reading at scale, using a corpus containing multiple texts.
- Your first task is to install Zotero. Zotero can function as either a plug-in to the Firefox Web browser (recommended) or as a standalone application. Detailed instructions for installing Zotero and troubleshooting that process are available on the site.
- Zotero is a powerful platform for managing scholarly resources and citations. We will look at its capabilities more closely later in the course. For now, however, we are using it primarily to support a text analysis tool called Paper Machines (an add-on to Zotero, so essentially a plug-in to a plug-in). Go read the Paper Machines documentation at the previous link. Now get Paper Machines by downloading this file and installing it as per the documentation. Note that you will not need Python for this version of PM, but you will need Java on your system. If you do not have Java and do not know how to install it, please ask me.
- Now Paper Machines needs some data to operate on. If you already have some Zotero collections, you can try it out. Play with the different options that are available and see if you can find something interesting to do with them. But let’s see if we can build a real corpus . . .
- Remember that Zotero Groups invitation you got a few days ago? Go and find it now and redeem it so you can join the ENGL 668K group on Zotero.org. Once you do, you should see the group folder (“ENGL668K”) appear under the Group Libraries tab in your list of Zotero collections.
- You can add items to our class Group. Go to the folder labelled “TextHeap.” That is where we will build our collection. Take the same text you’ve worked with for the previous two exercises and use Zotero to add it to the TextHeap folder.
- Now go and get 10 more books that are related in some way to the first one: by the same author, by a member of his or her circle, from the same period, of the same literary genre, thematically related . . . it’s your call. Yes, this will be tedious. Tedium is sometimes part of our work. If you can add more than 10 to the group folder, you’re a hero!
- By now the group corpus in the Zotero TextHeap folder should be growing (maybe not quite a million, but we’ll get there). Try running Paper Machines tools and see what you get. We’ll take a look at it all together in class.
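For a rough sense of what "distant reading" over several texts at once involves, here is a minimal sketch in Python that pools word counts across a small corpus. It is not how Paper Machines works internally, and you do not need Python for this exercise; the two sample texts, the names `tokenize` and `corpus_counts`, and the very simple tokenizing rule are all assumptions made for illustration.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def corpus_counts(corpus):
    """Sum word frequencies over every text in the corpus."""
    totals = Counter()
    for text in corpus.values():
        totals.update(tokenize(text))
    return totals


# Hypothetical two-text corpus standing in for items in a shared folder.
corpus = {
    "text_a": "The whale surfaced. The sea was calm.",
    "text_b": "The sea is a wide, wide road.",
}

totals = corpus_counts(corpus)
print(totals.most_common(3))
```

Even this toy version shows why the unit of analysis matters: the tokenizing rule decides what counts as a "word," and common function words such as "the" dominate the totals unless they are filtered out.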
There is no blogging requirement for this assignment. It will be graded Pass/Fail. If you get stuck, ask a friend from class, or ask me. You will also have an opportunity in class next week to complete any part of it you still need in order to pass. The goal is for you to come away with (a) a platform for text analysis, i.e., the Zotero/Paper Machines combo, and (b) a starter corpus.
- 2/19 12:30 Jen Golbeck, Director, Human-Computer Interaction Lab; Assistant Professor, College of Information Studies; Affiliate Assistant Professor, Computer Science Department | University of Maryland, “Art, Tagging, and Social Media”
- 2/20 12:30 Ed Summers, Information Technology Specialist | Repository Development Center, Library of Congress, “Linking Things on the Web: A Pragmatic Examination of Linked Data”