Technoromanticism » digital tools
English 738T | http://mith.umd.edu/eng738T

Team MARKUP Documentation
Amanda Visconti | Mon, 23 Apr 2012
http://mith.umd.edu/eng738T/team-markup-documentation/

I created some webpages with the documentation used by Team MARKUP: http://amandavisconti.github.com/markup-pedagogy/. The content represents almost everything we worked from during the encoding phase of our project, except some administrivia and links/images representing copyrighted content (sorry, no manuscript screenshots!).

“How Can You Love a Work If You Don’t Know It?”: Six Lessons from Team MARKUP
Amanda Visconti | Thu, 19 Apr 2012
http://mith.umd.edu/eng738T/how-can-you-love-a-work-if-you-dont-know-it-six-lessons-from-the-team-markup-project/

[Image: “X all the Y” meme captioned “encode all the things!” Encode all the things... or not. Remixed from image by Allie Brosh of Hyperbole and a Half (hyperboleandahalf.blogspot.com).]

Update 4/24/2012: Oh, neat! This post got the DH Now Editor’s Choice on Tuesday, April 24th, 2012.

Team MARKUP evolved as a group project in Neil Fraistat’s Technoromanticism graduate seminar (English 738T) during the Spring 2012 term at the University of Maryland; our team was augmented by several students in the sister course taught by Andrew Stauffer at the University of Virginia. The project involved using git and GitHub to manage a collaborative encoding project, practicing TEI and the use of the Oxygen XML editor for markup and validation, and encoding and quality-control checking nearly 100 pages of Mary Shelley’s Frankenstein manuscript for the Shelley-Godwin Archive (each UMD student encoded ten pages, while the UVa students divided a ten-page chunk among themselves).

Team MARKUP is currently writing a group blog post on the process, so I’ll use this post to concentrate on some specifics of the experience and link to the group post when it’s published.

[Screenshot: TEI encoding of the Frankenstein manuscript in the Oxygen XML editor. Caption: The Creature speaks.]

Six takeaways from the Team MARKUP project:

  1. Affective editing is effective editing? One of my favorite quotations–so beloved that it shapes my professional work and has been reused shamelessly on my Ph.D. exams list, a Society for Textual Scholarship panel abstract, and at least one paper–is Gary Taylor’s reasoning on the meaningfulness of editing:

    “How can you love a work, if you don’t know it? How can you know it, if you can’t get near it? How can you get near it, without editors?”*

    Encoding my editorial decisions with TEI pushed me a step closer to the text than my previous non-encoded editorial experience, something I didn’t know was possible. My ten pages happened to be the first pages of the Creature’s monologue; hearing the voice of the Creature by seeing its true creator’s (Mary Shelley’s) handwriting gave me shivers–meaningful shivers accompanied by a greater understanding of important aspects of Shelley’s writing, such as the large editorial impact made by her husband Percy and the differing ways she crossed out or emphasized changes to her draft. Moving between the manuscript images and the TEI encoding–so similar to my other work as a web designer and developer–also emphasized the differences in the writing process of my generation and the work that went into inscribing, organizing, and editing a book without the aid of a mechanical or digital device.

  2. Project management. Because we didn’t know what to expect from the project until we were in the thick of encoding–would everyone be able to correctly encode ten full pages? how would we control quality across our work? what would our finished pages look like in terms of encoding depth?–we spent most of the project functioning as a large team, which sometimes was as unwieldy as our large GoogleDoc (trying to find a time when eight busy graduate students can meet outside of class time is difficult!) and sometimes made sense (I was one of the few people on our team comfortable with GitHub and encoding at the start of the project, so I helped with a lot of one-on-one Skype, in-person, and email sessions early on). If I were doing the project over, I would hold a single Bootcamp day where we all installed and pushed within GitHub and encoded one page of manuscript up on the projector screen, then delegate my role as team organizer by dividing us into three subgroups. I also might insist on people agreeing ahead of time to specific in-person meeting times, rather than trying to schedule these only one or two weeks beforehand. I do think things worked out pretty well as they did, largely because we had such a great team. Having the GoogleDoc (discussed more below) as a central point for tech how-tos, advice, and questions was also a good choice, though in a larger project I’d probably explore a multi-page option such as a wiki so that information was a) easier to navigate and b) easily made public at the end of our project.
  3. Changing schemas and encoding as interpretive. Encoders who started their work early realized that their head start had mixed results: because the schema saw frequent updates during our work, those who finished fast needed to repeatedly update their encoding (e.g. a major change was removing the use of <mod type>s). Of course it was frustrating to need to update work we thought was finished–but this was also a great lesson about work on a real digital edition. Not only did the schema changes get across that the schema was a dynamic response to the evolving methodology of the archive, they also prepared us for work as encoders outside of a classroom assignment. Finally, seeing the schema as a dynamic entity up for discussion emphasized that even among more seasoned encoders, there are many ways to encode the same issue: encoding, as with all editing, is ultimately interpretive.
  4. Encode all the things! Or not. Depth of encoding was a difficult issue to understand early on; once we’d encoded a few pages, I began to have a better sense of what required encoding and what aspects of the manuscript images I could ignore. Initially, I was driven to encode everything, to model what I saw as thoroughly as possible: sums in the margins, different types of overstrikes, and analytical bibliography aspects such as smudges and burns and creases. What helped me begin to judge what to encode was understanding what was useful for Team MARKUP to encode (the basics that would apply to future encoding work: page structure and additions and deletions), what was useful for more advanced encoders to tackle (sitting in on the SGA staff meetings, I knew that some of our work would be subject to find-and-replace by people more experienced with Percy and Mary’s handwriting styles), and what our final audience would do with our XML (e.g. smudges and burns weren’t important, but Percy’s doodles could indicate an editorial state of mind useful to the literary scholar).
  5. Editorial pedagogy. Working on Team MARKUP not only improved my markup skills, it also gave me more experience with teaching various skills related to editions. As I mentioned above, acting as organizer and de facto tech person for the team gave me a chance to write up some documentation on using GitHub and Oxygen for encoding work. I’m developing this content into a set of GitHub Pages to help other new encoders work with the Shelley-Godwin Archive and other encoding projects. Happily, I was already scheduled to talk about editorial pedagogy at two conferences right after this seminar ends; the Team MARKUP experience will definitely become part of my talks during a panel I organized on embedding editorial pedagogy in editions (Society for Textual Scholarship conference) and a talk on my Choose-Your-Own-Edition editorial pedagogy + games prototype at the Digital Humanities Summer Institute colloquium in Victoria.
  6. Ideas for future encoding work. I’ve started to think about ways to encode Frankenstein more deeply; this thinking has taken the form of considering tags that would let me ask questions about the thematics of the manuscript using Python or TextVoyeur (aka Voyant); I’m also interested in markup that deals with the analytical bibliography aspects of the text, but need to spend more time with the rest of the manuscript images before I think about those. So far, I’ve come up with five new thematic tagging areas I might explore:
  • Attitudes toward monstrosity: A tag that would identify the constellation of related words (monster, monstrous, monstrosity), any mentions of mythical supernatural creatures, metaphorical references to monstrosity (e.g. “his vampiric behavior sucks the energy out of you”), and reactions/attitudes toward the monstrous (with attributes differentiating responses to confronting monstrosity with positive, negative, and neutral attitudes). I could then track these variables as they appear across the novel and look for patterns (e.g. do we see fewer metaphorical references to monstrosity once a “real” monster is more prevalent in the plot?).
  • Thinking about doodles: We’re currently marking marginalia doodles with <figure> and a <desc> tag describing the drawing. In our section of the manuscript, many (all?) of these doodles are Percy Shelley’s; I’d like to expand this tag to let me identify and sort these doodles by variables such as complexity (how much thought went into them rather than editing the adjacent text?), sense (do they illustrate the adjacent text?), and commentary (as an extension of sense tagging, does a doodle seem ironically comic given the seriousness or tragedy of the adjacent text?). For someone new to studying Percy’s editorial role, such tagging would help me understand both his editing process and his attitude toward Mary’s writing (reverent? patronizing? distracted? meditative?).
  • Names, dates, places: These tags would let us create an animated timeline of the novel that shows major characters as they move across a map.
  • Anatomy, whole and in part: To quote from an idea raised in an earlier post of mine, I’d add tags that allowed “tracking the incidence of references to different body parts–face, arms, eyes–throughout Frankenstein, and trying to make sense of how these different terms were distributed throughout the novel. In a book concerned with the manufacture of bodies, would a distant reading show us that the placement of references to parts of the body reflected any deeper meanings, e.g. might we see more references to certain areas of the body grouped in areas of the novel with corresponding emphases on the display, observation, and action? A correlation in the frequency and placement of anatomical terms with Frankenstein‘s narrative structure felt unlikely (so unlikely that I haven’t run my test yet, and I’m not saving the idea for a paper!), but if it had been lurking in Shelley’s writing choices, TextVoyeur would have made such a technique more visible.”
  • Narrative frames: Tags that identified both the specifics of a current frame (who is the speaker, who is their audience, where are they, how removed in time are they from the events they narrate?) and that frame’s relationship to other frames in the novel (should we be thinking of these words as both narrated by Walton and edited by Victor?) would help create a visualization of the novel’s structure.

I expect that playing around with such tags and a distant reading tool would yield even better thinking about encoding methodology than the structural encoding I’ve been working on so far, as the decisions on when to use these tags would be so much more subjective.
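To make the idea concrete, here is a minimal Python sketch of how such thematic tags could be tallied once they exist in the TEI. The `<seg ana="#monstrosity">` markup and the `subtype` attitude attribute are hypothetical illustrations of the tagging scheme described above, not the Shelley-Godwin Archive’s actual schema:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: thematic tagging via TEI <seg> elements whose @ana
# attribute names a category; @subtype records attitude toward the monstrous.
sample = """<text xmlns="http://www.tei-c.org/ns/1.0">
  <p>I beheld the <seg ana="#monstrosity" subtype="negative">wretch, the
     miserable monster</seg> whom I had created;
     <seg ana="#anatomy">his eyes</seg> were fixed on me.</p>
  <p><seg ana="#monstrosity" subtype="neutral">A creature</seg> moved past.</p>
</text>"""

TEI = "{http://www.tei-c.org/ns/1.0}"

def count_themes(xml_string):
    """Tally each thematic category and each attitude subtype across <seg> tags."""
    root = ET.fromstring(xml_string)
    themes, attitudes = Counter(), Counter()
    for seg in root.iter(TEI + "seg"):
        themes[seg.get("ana")] += 1
        if seg.get("subtype"):
            attitudes[seg.get("subtype")] += 1
    return themes, attitudes

themes, attitudes = count_themes(sample)
print(themes)     # counts per category, e.g. '#monstrosity' vs '#anatomy'
print(attitudes)  # counts per attitude subtype
```

Run over the whole manuscript, per-chapter counts like these are exactly what a tool like TextVoyeur could chart to look for the patterns described above.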

* From “The Renaissance and the End of Editing”, in Palimpsest: Textual Theory and the Humanities, ed. George Bornstein and Ralph G. Williams (1993), 121-50.

Digitally Dissecting the Anatomy of “Frankenstein”: Part One
Amanda Visconti | Fri, 24 Feb 2012
http://mith.umd.edu/eng738T/digitally-dissecting-the-anatomy-of-frankenstein-part-one/

[Image: A frequency chart of the terms “human” and “monster” in Frankenstein.]

A two-part blog post: the first post will cover grabbing and analyzing Twitter and other textual data and working with them in Wordle and TextVoyeur, and the second will use these tools to consider the function of body parts in Mary Shelley’s Frankenstein.


Get your data!
Text? If you’re producing a project you want other people to see, you’d want to locate–or scan/key-in yourself–a reliable edition of your text. For the purposes of this course, since I’m just asking a quick question for my own purposes, I’ll use the dubious (what edition? what errors?) Project Gutenberg etext I grabbed from this page. Don’t forget to remove the extra licensing information from the beginning and end!
Twitter? Finding old tweets from your individual account might not be difficult (especially if you don’t tweet hourly), but Twitter only saves hashtag searches for around ten days (there are some third-party sites such as Topsy that may have older tweets, but I’ve found these to be unreliable). The best policy is to start archiving once you know you’ve got a hashtag you’re interested in.
1. There are a bunch of ways to archive tweets, but I think the easiest is to set up an RSS feed through something like Google Reader. You can get the feed URL for any Twitter search by replacing “hashtag” in the following string with the search term of your choice (e.g. technoro):

https://search.twitter.com/search.atom?q=%23hashtag

Once you set up your feed reader as subscribed to this URL, you’ll have a feed that updates with all new tweets using the hashtag. You can export these at any time you’d like to work with them in a visualization tool; place any feeds you want to export into a folder (visit Google Reader’s settings > Folders), then enter the following URL into your address bar (replacing “folder” with your folder name):

https://www.google.com/reader/public/subscriptions/user/-/label/folder

This will bring you to an XML file of your feed that you can save to your computer and edit.
2. Too much work? You can use a service like SearchHash, which will let you input a hashtag and download a CSV file (spreadsheet); this might be easier to work with if you’re unfamiliar with RSS feeds and/or XML, but you can only trust such services to cover about the last ten days of tweets.
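Once you’ve saved the exported feed XML, turning it into a spreadsheet is a short script. Here’s a rough Python sketch that pulls author, date, and text out of a simplified, hypothetical Atom file and writes CSV; a real Google Reader export has more fields and may structure entries differently, so treat the element names here as assumptions to check against your own file:

```python
import csv
import io
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"

# Hypothetical miniature Atom export of a hashtag search.
feed_xml = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry>
    <title>Encoding Frankenstein tonight #technoro</title>
    <author><name>examplestudent</name></author>
    <published>2012-02-20T18:00:00Z</published>
  </entry>
  <entry>
    <title>Schema changed again! #technoro</title>
    <author><name>anotherstudent</name></author>
    <published>2012-02-21T09:30:00Z</published>
  </entry>
</feed>"""

def feed_to_rows(xml_string):
    """Pull (author, published date, tweet text) out of each Atom <entry>."""
    root = ET.fromstring(xml_string)
    return [
        (entry.find(ATOM + "author/" + ATOM + "name").text,
         entry.find(ATOM + "published").text,
         entry.find(ATOM + "title").text)
        for entry in root.iter(ATOM + "entry")
    ]

# Write the rows out as CSV you can open in any spreadsheet program.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["author", "published", "text"])
writer.writerows(feed_to_rows(feed_xml))
print(buffer.getvalue())
```

Swap `io.StringIO()` for `open("tweets.csv", "w", newline="")` to save the result to disk.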

Get out your tools!
1. Wordle is one of the fastest and easiest tools for checking out a text: you paste in your text or a link to a webpage, and it produces a word frequency cloud (the frequency with which a word appears in your text corresponds to how large the word appears in the cloud). Wordle lets you do a few simple things via the drop-down menu on the top of the visualization:

  • remove stop-words (stop-words are words that appear frequently in texts but usually have little content associated with them–think things like articles and prepositions. If you’ve ever tried to make a word frequency cloud and seen some huge “THE” and “AN” type words, you need to filter your text with a stop-word list),
  • change the look (color, font, orientation of text), and
  • reduce the number of words shown (Wordle only shows the top x words appearing in a text).

Wordle is a simple way to get a look at the words being used in a text; you can get a quick sense of diction, preoccupations, and patterns. However, it doesn’t let you make any sort of strong argument beyond statements about what words are frequent; with text analysis, you always want to be able to “drill down” from your distant reading to the individual words or phrases or moments that make up the macro view you’re seeing, and Wordle doesn’t let you do that.
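Under the hood, a Wordle-style cloud is just a stop-word-filtered frequency count with font size mapped to frequency. Here’s a minimal Python sketch of that counting step, using a toy stop-word list (real lists, like the ones Wordle and Voyant ship with, are far longer):

```python
import re
from collections import Counter

# A tiny illustrative stop-word list; production lists run to hundreds of words.
STOP_WORDS = {"the", "an", "a", "of", "and", "to", "in", "i", "it", "was", "had"}

def word_frequencies(text, top_n=5):
    """Lowercase, tokenize, drop stop-words, and return the top_n most common words."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return counts.most_common(top_n)

passage = ("I beheld the wretch, the miserable monster whom I had created. "
           "The monster held up the curtain of the bed.")
print(word_frequencies(passage))  # 'monster' tops the list with 2 occurrences
```

Without the `STOP_WORDS` filter, “the” (five occurrences here) would dominate the count, which is exactly the huge “THE” problem described above.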

2. Luckily, there are free, web-based tools that let you go beyond Wordle’s abilities fairly easily. TextVoyeur* (aka Voyant) is really meant for comparing documents among a large corpus of texts, but you can use it to look at a few or even a single text. Voyeur maintains a great tutorial here that I recommend you visit to understand where different features are on the page, but here’s an overview of things you might want to do with it:

  • A word frequency cloud (like Wordle), but with better stop-words. This cloud should appear in the upper-left corner; each pane on the page has settings behind the small gear icon in its upper-right corner, and clicking the gear in the cloud pane lets you turn on the stop-word list of your choice to filter the cloud.
  • A list of words in frequency order (click “words in the entire corpus” in the bar at the bottom-left; again, you can filter out stop-words). You can search in this pane for interesting words (e.g. “monster”); then, check the box next to the word, and in the pane that appears use the heart icon to add the word to the favorites list. You can add several terms to your favorites this way (e.g. monster, human, angel), then compare these favorites in the “word trends” pane, which will chart the frequency of these words’ appearances throughout your text.
  • Drill down. “Keywords in context” lets you see where a given word appears in the novel. “Collocates” are words that tend to appear near other specific words. Collocation can help you understand a text’s rhetoric; is the word “monster” often near the word “abnormal” or “misunderstood”? TextVoyeur lets you set how near a given search term you want to look for collocates (e.g. one word on either side of your search term? fifteen words?). If you’re interested in a word with multiple meanings or that appears within larger words (e.g. the word count for “inhuman” may include the count for “human”; you might want to see whether “Frankenstein” is being used to refer to Victor or another family member), you might want to drill down into these examples and see how many of the examples feeding into the count actually support your argument.
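To see what “keywords in context” and collocation actually compute, here’s a rough Python sketch of both. The fixed-window logic is a simplified stand-in for what TextVoyeur does, not its actual implementation:

```python
import re
from collections import Counter

def kwic(text, keyword, window=3):
    """Keywords-in-context: return each hit with `window` words on either side."""
    words = re.findall(r"\w+", text.lower())
    return [" ".join(words[max(0, i - window):i + window + 1])
            for i, w in enumerate(words) if w == keyword]

def collocates(text, keyword, window=3):
    """Count the words appearing within `window` words of each keyword hit."""
    words = re.findall(r"\w+", text.lower())
    counts = Counter()
    for i, w in enumerate(words):
        if w == keyword:
            # Neighbors on both sides, excluding the keyword itself.
            counts.update(words[max(0, i - window):i] + words[i + 1:i + window + 1])
    return counts

passage = ("the monster saw my determination and gnashed his teeth "
           "a monster stood near the window")
print(kwic(passage, "monster", window=2))
print(collocates(passage, "monster", window=2).most_common(3))
```

Note that `kwic("inhuman...", "human")` would return nothing, because the tokenizer matches whole words; a substring search (as in the “inhuman”/“human” caveat above) would need a different matching rule.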

3. The internet is full of free tools for working with texts, many with more specific foci (e.g. tools that attempt to determine the gender of a text’s author). Two places to start finding more tools:

I’ll try to publish the second part of this blog post later this week, where I’ll tackle a question about Frankenstein using some of these tools and also address some of these tools’ shortcomings (i.e. things you can’t say when pointing at these visualizations).

*Note that TextVoyeur was experiencing some interface issues today (2/24), which meant that we didn’t demo it at the DH Bootcamp. If you’re having trouble using this tool, those issues might not have been solved yet.
