This first module of the course focuses on the current white-hot topic of “big data” by way of the specific domain of textual analysis, in effect offering up some possible answers to Greg Crane’s framing question, what do you do with a million—or 10.6 million—books? (And, even more importantly, how does having them at your fingertips change the kind of questions you can ask about the textual record?) Along the way we will also stress and stretch the act of reading itself, asking how reading is being reconfigured digitally.
Presentation: Charity and Kathryn
- Crane, “What Do You Do with a Million Books?”
- Duguid, “Inheritance and Loss? A Brief Survey of Google Books”
- Gavin and Smith, “An Interview with Brett Bobley” (DDH)
- McCarty, “A Telescope for the Mind?” (DDH)
- Nowviskie, “What Do Girls Dig?” (DDH); original post, with comments, here.
- Burdick et al., “Emerging Methods and Genres” (also skim the “Portfolio of Case Studies”) (D_H)
Note: We will be reading Ramsay’s Reading Machines, in its entirety, as the reading for next week’s class; there is no open access edition; you should therefore source a copy immediately if you have not done so already.
The purpose of this exercise is to expand on Duguid’s work with Tristram Shandy by performing a modest lateral experiment in digital bibliography. Start by locating a book that is available in a full-text copy (but not necessarily in the same edition) from at least three of the following: Project Gutenberg, the Internet Archive, HathiTrust, and Google Books. Choose a book you’re interested in, not just any old thing. (You may not choose a book that’s already been done by someone else; first come, first served!) Then, using the “Exercises” category, create a post that addresses the following questions for each of the services from which your book is available:
- What electronic formats is the book offered in?
- What do you know about its provenance, i.e. the source from which the digital text was derived? What edition information is provided?
- Are there particular formatting and presentation issues (illustrations, page layout) that are retained or lost?
- Is there any provision for reporting and correcting errors?
- What special features and functionality are available? Can you annotate the text? Create personal collections for your individual research? What kind of search functionality is available? Can you download a local copy of the text? (If you can, do.)
- What restrictions on use does the site enforce?
We will continue to build on this exercise in the next two weeks, first by using some basic text analysis tools in conjunction with the book you’re working with, and next by “curating” your text in preparation for assembling a corpus for a topic modeling exercise.
This is a graded exercise.
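As a preview of the kind of basic text analysis mentioned above, here is a minimal sketch in Python of a word-frequency count. It is only an illustration, not one of the tools we will use in class. The function names and the sample sentence are hypothetical, and the tokenizer is deliberately crude: it assumes plain ASCII apostrophes and ignores the front and back matter that the services add to their files.

```python
import re
from collections import Counter


def load_text(path):
    """Read a locally saved plain-text copy of a book (the path is up to you)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()


def word_frequencies(text, top_n=10):
    """Return the top_n most common lowercase word tokens in text."""
    # Crude tokenizer: runs of letters, optionally joined by ASCII apostrophes.
    # Curly apostrophes, hyphenated words, and OCR debris are not handled.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())
    return Counter(tokens).most_common(top_n)


# A made-up sample sentence stands in for a downloaded book here.
sample = "The cat sat on the mat. The mat was the cat's."
print(word_frequencies(sample, top_n=2))
```

To try it on your own book, pass the result of `load_text` (with the name of the file you downloaded) to `word_frequencies` in place of the sample. Running it on copies from two different services is one quick way to see how far their texts diverge.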
02/05/13, 12:30pm: Adeline Koh (Assistant Professor, Department of Literature, Richard Stockton College; Visiting Faculty Fellow, Humanities Writ Large Program, Duke University)
Digitizing Chinese Englishmen: Archival Silences, Digital Recovery, and Creating a Nineteenth Century “Postcolonial” Archive
MITH Conference Room, 0301 Hornbake Library, University of Maryland. Open to the public.
Tues., Feb. 5th, evening, Digital Humanities Colloquium, details TBA