Data Analysis – Maryland Institute for Technology in the Humanities

Archiving Usenet: Adopting an Ethics of Care

Avery Dame — Mon, 23 Jan 2017 15:53:29 +0000

This is the fourth in a series of blog posts by 2016-17 Winnemore Digital Dissertation Fellow Avery Dame on the progress of his dissertation, “Talk Amongst Yourselves: Community Formation in Transgender Counterpublic Discourse Online,” which explores the affective and structural meanings assigned to “community” in English-language transgender discourse online.

“On the Internet, nobody knows you’re a dog.” Most folks have no doubt encountered this adage, coined in a 1993 New Yorker cartoon, through one of the many, many cultural riffs and references, or maybe in a reproduction of the original cartoon. The idea, of course, represents public perceptions about anonymity, privacy, and the internet prevalent at the time of its publication: that one’s online and offline presences could be largely disconnected from each other.

When the cartoon was first published, the sentiment certainly seemed more likely to be true in theory (though not always in practice). Particularly throughout the 1990s into the mid-2000s, the internet was thought to be a safe space for engaging in a variety of identity play, and transgender individuals were uniquely poised to benefit. One’s offline identity was not always tightly bound to one’s online presence, and certainly not as closely as social network sites like Facebook might wish it to be, a change reflected in a 2015 follow-up cartoon of the dogs reminiscing about their prior anonymity. Online, trans individuals could take steps to disconnect their offline selves from their online identities, where they might adopt different names and gender identities that better reflected their own self-understanding. While I didn’t identify as transgender at the time, I nevertheless engaged in these practices myself as a teenager, often failing to ‘correct’ individuals who, presciently, assumed I was male.

However, my online life at the time was entirely pseudonymous, and I made sure to keep a certain distance between my offline and online selves. This has allowed me to keep my prior online activities (as well as my past opinions on the state of the World of Warcraft endgame) largely divorced from my current online presence. Other individuals, particularly early users whose online access came through an employer or university, may not have been able to maintain such a clean separation. Bits of one’s offline identity (elements of a legal name used for an official company email address, differing names between those used in messages and those attached to email accounts, or an “official” email signature) remained connected to online activities, including posting to Usenet. For trans individuals, these traces can reveal distinctly gendered or pre-transition names, employment, or activities they might otherwise wish were not widely known.

As I get closer to a launch-ready version of the Transgender Usenet Archive, much of my attention has been focused on thinking through my ethical responsibility to these users. At the core of the project are two impulses. On one hand, I hope to increase the accessibility and reach of an important, if undiscussed, part of recent transgender history. As a consequence, however, I am giving these posts a new kind of visibility beyond the initial level of access (which, admittedly, you can already get through the Google Groups archive). Given this increased access, I am also deeply invested in conscientiously respecting not only posters’ agency as authors, but also their privacy as individuals, who may have treated their posts as ephemeral communications, not meant for academic analysis.

Because there’s not a lot of guidance for working with Usenet materials, I’ve looked to other instances where archivists faced similar concerns. Tara Robertson’s writing on the ethical implications of Reveal Digital’s scanning and posting of the On Our Backs back catalogue (since taken down) speaks compellingly to the importance of thinking carefully about consent, representation, and digital access. One difference between OOB and the Usenet materials is Usenet’s status as the organizing umbrella under which a variety of public fora lived. Usenet newsgroups, and by extension users’ posts, were always ‘public’ in terms of accessibility. However, posts were not archived and made available on a mass scale until DejaNews started collecting them in 1995; the current Google archive, and thus the collections the archive is based on, are made up of what DejaNews collected, along with several other donated collections of pre-1995 material. Following DejaNews’s announcement, users concerned about privacy successfully advocated for DejaNews to adopt the “X-No-Archive” header, which signalled a post shouldn’t be archived. However, DejaNews’s choice to respect users’ wishes to XNAY (for X-No-Archive: yes) their posts was voluntary, a policy Google (which acquired DejaNews in 2001) has continued to follow to this day.
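For readers handling their own Usenet data, checking for the XNAY header can be sketched in a few lines of Python. This is only an illustration of the convention described above, not DejaNews’s or Google’s actual logic; the function name and the sample posts are invented.

```python
from email import message_from_string


def wants_no_archive(raw_message):
    """Return True if the post carries an "X-No-Archive: yes" header."""
    msg = message_from_string(raw_message)
    # Header names are matched case-insensitively by the email module.
    value = msg.get("X-No-Archive", "")
    return value.strip().lower() == "yes"


# Invented sample posts, for illustration only.
opted_out = "Subject: hello\nX-No-Archive: Yes\n\nPlease do not keep this.\n"
ordinary = "Subject: hello\n\nNothing special here.\n"

print(wants_no_archive(opted_out))
print(wants_no_archive(ordinary))
```

A check like this only reports what the poster asked for at the time of writing; it says nothing about what they would want today.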

Nevertheless, the fact users had the option to XNAY posts when they were first written doesn’t guarantee they would want their posts to be publicly available now. With contemporary indexing and archiving tools, what might have seemed “privately public” in 1997 now can be made, in incautious hands, all too public. With some fairly simple Python scripts, I’ve been able to collect, count, and index thousands of user names and emails, including building a whole network of users’ communication.

The Google Groups archive has functionally performed such indexing on a massive scale, making all of these posts (and their attached content, some of it clearly not intended for such a mass audience) available to anyone who wishes to access it. Individuals can request that archived posts be removed, but the process for doing so is opaque at best. As Andy Baio rightly notes, Google’s primary interest here is not in acting as a good steward of the internet’s past but in maximizing profitability. In an internet landscape dominated by social network sites (including Google’s underwhelming entry into the field, Google+), personal data mining, and algorithmic filtering, Usenet is neither ripe for personal data mining nor very profitable. In fact, it’s the exact opposite: an unstructured, decentralized system now best known as a resource for illegal file sharing. Thus, there appears to be little financial incentive to invest energy in the archive.

In her discussion of the impact of Reveal’s choice to make OOB widely available, Robertson makes it a point to connect this act with the people it most directly impacts: those in the photographs. In reaching out to these individuals for their reactions, her opinion shifts as a result of her own community membership, as “‘the community’ wasn’t an abstract notion, it was the people who gave me those generous quotes. I could see their faces and empathize with their fears and feelings that institutions had screwed them over again.” These moments, Robertson suggests, require archivists, librarians, and others to act with an ethics of care, which Bethany Nowviskie argues focuses a researcher or practitioner on two key areas:

“The first is toward an appreciation of context, interdependence, and vulnerability—of fragile, little things and their interrelation. The second is an orientation not toward objective evaluation and judgment (as in the philosophical mainstream of ethics)—not, that is, toward criticism—but toward personal, worldly action and response.”

I, like Robertson, am both a professional (academic researcher, in this case) and a community member, and these roles shape my thinking. While I’m interested in making these discussions accessible, I also want to recognize and respect their contextual particularities and constraints. Robertson suggests the Zine Librarians’ Code of Ethics as a source of guidance, and I’ve drawn on it in designing the Transgender Usenet Archive.

In design, I’ve chosen to take several different steps to preserve individual privacy and encourage good, respectful practice. The archive will be publicly available to anyone who wishes to use it, but accessing the archive will require users to informally agree to use it for non-commercial personal, teaching, learning or research reasons only. All of the posts included in the archive have been selectively indexed and do not include headers which might contain identifiable information, such as emails and names. However, I have not altered posts’ content in any way, so any message sign-offs and email signatures that were already included in posts will appear in the archive as is.
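One way to picture selective indexing is an allow-list of headers, as in the hedged Python sketch below. The list of kept headers, the function name, and the sample post are all assumptions made for illustration; the archive’s actual indexing code may work quite differently.

```python
from email import message_from_string

# Hypothetical allow-list: headers judged unlikely to identify a poster.
KEPT_HEADERS = ("Subject", "Date", "Newsgroups")


def archive_record(raw_message):
    """Build an index record that leaves out From and other unlisted headers."""
    msg = message_from_string(raw_message)
    record = {name: msg[name] for name in KEPT_HEADERS}
    # The body is stored untouched, so sign-offs and signatures remain.
    record["body"] = msg.get_payload()
    return record


# Invented sample post, for illustration only.
sample = (
    "From: poster@example.invalid (A. Poster)\n"
    "Subject: Re: New Member\n"
    "Date: 1997/11/17\n"
    "Newsgroups: alt.support.crossdressing\n"
    "\n"
    "Welcome aboard!\n"
)
record = archive_record(sample)
print(sorted(record))
```

An allow-list fails safe: a header nobody anticipated is dropped by default instead of slipping through.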

I’ve also manually removed any 64-bit code for images (such as personal photographs, etc) that include any possibly identifying features (such as full body or face shots); these images have been marked with . There’s a long history of repurposing and reposting trans women’s photos online without their consent, and I don’t want to contribute to it through the archive. Because I can’t determine the particular provenance of these photos (especially given that many were attached to mass-mailed spam), I’ve chosen to err on the side of caution and redact these images.

Lastly, I want to do my utmost to respect and support posters’ right to refusal. Unfortunately, the scale and amount of content in the archive make attempting to contact individual posters unfeasible. However, this post is meant to offer individuals a chance to let me know if they’d like their posts not to be included. Please feel free to reach out to me via email if you think your posts might be in the archive and would like them removed, or if you have any other questions or concerns. As part of the archive site, I’ll also be offering a contact form for individuals who would like to ask whether their posts are included in the archive.

The post Archiving Usenet: Adopting an Ethics of Care appeared first on Maryland Institute for Technology in the Humanities.

Listening for the Static

Avery Dame — Fri, 09 Dec 2016 18:11:45 +0000

This is the third in a series of blog posts by 2016-17 Winnemore Digital Dissertation Fellow Avery Dame on the progress of his dissertation, “Talk Amongst Yourselves: Community Formation in Transgender Counterpublic Discourse Online,” which explores the affective and structural meanings assigned to “community” in English-language transgender discourse online.

As you can guess from my last post, I’ve been relying heavily on the Python email and mailbox modules (mailbox inherits much of its functionality from email) to process and analyse the Usenet collections. Instead of having to manually sift through each message, the parser identifies key information, logs it in a dictionary, and can spit it back out when called. At a practical level, using this method has saved me a considerable amount of “processing time,” so to speak. Early on, however, I noticed multiple “Nones” appearing in my results, which indicated that an attempt to access the message headers had failed. I didn’t think much of it at the time, given the size of these collections.¹ Just some static I could ignore in favor of the much more sizable noise. Then I started work on the cisgender network, and I discovered that static was actually noise as well. I just hadn’t been prepared to listen for it.
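To make the “Nones” concrete, here is a minimal Python sketch of logging headers in a dictionary with the standard email module. The list of headers, the function name, and both sample inputs are assumptions for illustration and are not taken from the project’s modules.

```python
from email import message_from_string

WANTED = ("From", "Subject", "Date", "Message-ID", "Newsgroups")


def log_headers(raw_message):
    """Parse one message and log the headers of interest in a dictionary."""
    msg = message_from_string(raw_message)
    # Indexing a message by header name returns None when the header is absent.
    return {name: msg[name] for name in WANTED}


# Invented inputs, for illustration only.
complete = (
    "From: poster@example.invalid (A. Poster)\n"
    "Subject: Re: New Member\n"
    "Date: 1997/11/17\n"
    "Message-ID: <abc123@news.example.invalid>\n"
    "Newsgroups: alt.support.crossdressing\n"
    "\n"
    "Welcome!\n"
)
fragment = "here on it gets easier.\nJoanne x\n"  # text with no header block

print(log_headers(complete)["Subject"])
print(log_headers(fragment)["Subject"])
```

The parser does not raise an error for the header-less fragment; it quietly returns None for every lookup, which is why such results are easy to overlook in a large collection.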

First, here’s what a raw Usenet post from the collection looks like (to maintain anonymity, I’ve removed the name/email in the “From:” line):

—-
From -8946248053963491671
X-Google-Language: ENGLISH,ASCII-7-bit
X-Google-Thread: 10857f,b3db99cd0296b805
X-Google-Attributes: gid10857f,public
From: Email-Address (Name)
Subject: Re: New Member
Date: 1997/11/17
Message-ID: <3470d5d4.1323094@news.lineone.net>#1/1
X-Deja-AN: 290289518
References: <01bcf387$c1ae6e60$0202010a@hp-customer> <64q94t$aj4@mtinsc03.worldnet.att.net>
Organization: British Telecom
Newsgroups: alt.support.crossdressing

Hello April———-enjoy the ride!!————-Joanne x
—-

As you can see, each post includes a header with a variety of associated metadata and then the text of the message itself. The collected Usenet postings, by and large, follow the conventions of email formatting at the time, with From, Subject, Date, and Message-ID headers, along with a variety of Usenet-specific or non-standard headers added by news clients or servers (the latter designated by the “X-” prefix). Because these collections were scraped from the Google Groups format, every message’s header block begins with a “From ” line carrying the unique message ID assigned by Google, followed by a set of proprietary, non-standard headers.
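The header layout described above can be explored with a short Python sketch that separates the envelope line, the standard headers, and the “X-” headers. The sample is an abridged copy of the post shown earlier with the body reworded; the function name is hypothetical.

```python
from email import message_from_string

# Abridged version of the sample post; the body text is reworded.
RAW_POST = (
    "From -8946248053963491671\n"
    "X-Google-Language: ENGLISH,ASCII-7-bit\n"
    "X-Google-Thread: 10857f,b3db99cd0296b805\n"
    "X-Google-Attributes: gid10857f,public\n"
    "From: Email-Address (Name)\n"
    "Subject: Re: New Member\n"
    "Date: 1997/11/17\n"
    "X-Deja-AN: 290289518\n"
    "Organization: British Telecom\n"
    "Newsgroups: alt.support.crossdressing\n"
    "\n"
    "Hello April, enjoy the ride!\n"
)


def split_headers(raw_message):
    """Return (envelope line, standard headers, X- headers) for one post."""
    msg = message_from_string(raw_message)
    standard, extended = {}, {}
    for name, value in msg.items():
        target = extended if name.lower().startswith("x-") else standard
        target[name] = value
    return msg.get_unixfrom(), standard, extended


envelope, standard, extended = split_headers(RAW_POST)
print(envelope)
print(sorted(standard))
print(sorted(extended))
```

Note that Usenet-specific headers such as Newsgroups and Organization carry no “X-” prefix, so they land in the standard group here.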

As part of building my network, I collected the content of all messages indexed as part of the network in a .txt file. Some of these messages, however, began at seemingly random points in the body of a message, even though the original messages in the collection had all of the necessary information, including headers. Yet when I tried to find a cause, there were no immediately apparent similarities in the messages which came up, nor any less “visible” options like invisible characters.

As I found (with the excellent help of Ed Summers), these messages were the empty “Nones.” As noted earlier, I’ve been relying on the pre-built Python parser to successfully identify the start of each message. The parser determines the start of a message using headers defined by RFC (Request for Comments) 2822, or searches for “a single envelope header, also known as the Unix-From header or the From_ header.” In the mailbox format, the envelope header functions as a separator to indicate the start of a new message. In practice, though, the parser flags all new lines that begin with “From ” as the start of a new message and searches for the defined headers. In most instances, however, developers follow the advice outlined in the documentation on the mbox format, RFC 4155: “Many implementations are also known to escape message body lines that begin with the character sequence of “From “, so as to prevent confusion with overly-liberal parsers that do not search for full separator lines.”

The Python parser, it turns out, is an overly liberal parser. Because it was matching any instance of newline + “From ”, it read all sentences beginning with “From ” as the start of a new message, and these “messages,” of course, lacked any recognizable headers. When outputting the message content to my “collector” file, the “From” line was skipped and each message began on the next line down, resulting in the apparent randomness of the message’s beginning.
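The behavior described above can be reproduced with a small, hedged Python sketch. It writes a single invented post to a temporary mbox file and asks the standard mailbox module how many messages it sees; the file name, helper name, and sample text are assumptions for illustration.

```python
import mailbox
import os
import tempfile

# One invented post whose body contains a sentence starting with "From ".
RAW = (
    "From -8946248053963491671\n"
    "Subject: Re: New Member\n"
    "Newsgroups: alt.support.crossdressing\n"
    "\n"
    "Hello and welcome.\n"
    "From here on it gets easier.\n"
    "Joanne x\n"
)


def subjects_seen(raw_text):
    """Write raw_text to a temporary mbox file and list each message's Subject."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "sample.mbox")
        with open(path, "w", encoding="ascii", newline="\n") as handle:
            handle.write(raw_text)
        box = mailbox.mbox(path)
        try:
            return [message["Subject"] for message in box]
        finally:
            box.close()


subjects = subjects_seen(RAW)
print(len(subjects), subjects)
```

If the module splits on every line that begins with “From ”, as described above, the one post is reported as two messages, and the second has no Subject at all.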

Solving this problem, however, was somewhat more complex. I had two options: write a module that adapts the existing parser for my purposes or create a module that makes a duplicate of the mailbox, edited to prevent inappropriate flagging. Given my current schedule, I opted for the latter approach. For both, however, a combination of factors made this task particularly thorny.²

A) Because of overly-liberal parser design, the mailbox has (at least initially) to be read line by line.

B) I didn’t want a solution that unnecessarily “cleans” the data by removing the proprietary Google headers. Also, removing the headers a) doesn’t change the core problem with the parser and b) necessitates the creation of a replacement envelope header.

C) The Google header being read as the envelope header doesn’t match the RFC standard for mailbox separator lines (e.g., From foobar@gmaill.com Wed Jan 25 21:37:37 2017), so existing email-based solutions weren’t immediately helpful.

D) Lastly, whatever I wrote had to be able to differentiate between Google’s proprietary header, whose content was consistent in format (“From ”, sometimes a -, and a series of digits), and sentences beginning with “From ”, which were entirely inconsistent.

My current solution, while not technically elegant, uses this consistency to its advantage. Because the Google-specific message ID is always numerical, I know the seventh character (index location 6) will always be a number. In contrast, this combination occurs very rarely in the message text itself. All instances of “From ” at the start of a line that don’t have a digit at index 6 are therefore changed to “xFrom ” in the new file. The module then does a pass of the new file, checking the end of “From ” lines for a digit. Any lines that don’t have a digit are printed in a separate log, so they can be manually checked and edited if necessary.
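The two passes described above might look something like the following Python sketch. It is a reconstruction from the description, not the module itself; the function names and the sample lines are invented.

```python
def escape_body_from_lines(lines):
    """First pass: prefix "x" to "From " lines without a digit at index 6."""
    escaped = []
    for line in lines:
        # line[6:7] is an empty string (not an error) when the line is short.
        if line.startswith("From ") and not line[6:7].isdigit():
            line = "x" + line
        escaped.append(line)
    return escaped


def lines_needing_review(lines):
    """Second pass: surviving "From " lines that do not end in a digit."""
    return [
        line
        for line in lines
        if line.startswith("From ") and not line.rstrip()[-1:].isdigit()
    ]


# Invented sample lines, for illustration only.
original = [
    "From -8946248053963491671\n",
    "Subject: Re: New Member\n",
    "\n",
    "From here on it gets easier.\n",
    "From 19 replies I learned a lot.\n",
]
escaped = escape_body_from_lines(original)
review_log = lines_needing_review(escaped)
print(escaped[3], end="")
print(review_log)
```

The last sample line shows why the second pass matters: a sentence such as “From 19 replies” has a digit at index 6 and slips through the first pass, but it does not end in a digit, so it lands in the review log.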

At a later point, I would like to sit down and write a Usenet-specific parser, adapted to account for this issue and Usenet-specific headers. After all, this process is by no means foolproof—as illustrated by the necessity of doing a manual check afterwards. Nevertheless, for me performing the manual check has served as a small, subtle reminder to “listen” to all of the information I received, not just that which seemed to sound “right.”

¹Part of this assumption was also based in the formatting of “spam” messages in the collection. In order to avoid being auto-cancelled (blocked from posting), contemporary mass-mailed Usenet spam often uses non-standard emails or other methods to avoid being flagged by cancel bots. I initially assumed their non-standard formatting was being misread by the parser, but this was not the case.

²Some of these issues are, no doubt, why institutional archives like the Smithsonian use MBOX as a stepping-stone before converting the files to XML.

Visualizing Poster Activity on Usenet

Avery Dame — Thu, 17 Nov 2016 20:38:07 +0000

This is the second in a series of blog posts by 2016-17 Winnemore Digital Dissertation Fellow Avery Dame on the progress of his dissertation, “Talk Amongst Yourselves: Community Formation in Transgender Counterpublic Discourse Online,” which explores the affective and structural meanings assigned to “community” in English-language transgender discourse online.

One of the biggest challenges of working with Usenet collections is their sheer size. For my five newsgroup collections, the average message count is between roughly 50,000 and 100,000 per collection. (To place that in the context of recent news stories, presidential candidate Hillary Clinton’s private email server held 62,320 total emails.) Though it’s not too sizable in storage terms (all five collections add up to about 1 GB total), it’s definitely a lot of data for a close discourse analysis. Complicating the process further is that many of the messages held in these collections also aren’t relevant to my specific research questions. That’s also a lot of information to hold in a single location, particularly as an archive. Unlike the anonymous “generous donor” who initially collected all of the various newsgroup messages, I’ll be making deliberate, intentional choices regarding what to include, how to present the messages, and what information should be indexed. Given this, I’ve moved to using the term “collections” to describe the data as it is now.

I’ve also been slowing my pace a bit in order to think carefully about what the archive might look like. Recently, I’ve focused my energy on spending a lot of time with the data, in order to get a better sense of how it should be structured, the technical challenges I might face, and what ethical questions I should consider. Part of this process has been doing a lot of scraping, counting, and visualizing, in order to put my numbers in (some) perspective. Now, these aren’t perfect tools, but I have been able to identify the active posters, cross-posting habits, and a rough network of posts using “cisgender” and variants of the term.

I’ve put all of these visualizations up on my site, with some description about their significance and my collection methodology (with links to the modules on GitHub). From these exercises, I’ve learned that these newsgroups were similar to non-transgender newsgroups in poster activity, with a small handful of highly active posters who make up a sizable chunk of the messages collected. Users primarily posted to one or two newsgroups at a time, and there are some interesting differences in both what’s recorded in the collections and how users cross-posted. There’s not a lot of crossposting between the two newsgroups with “transgendered” in the name, alt.transgendered (AltT) and soc.support.transgendered (SST), but there is a lot of cross-posting between SST and alt.support.srs (SRS). In contrast, the two major crossdressing groups, alt.fashion.crossdressing (AFCD) and alt.support.crossdressing (ASCD) have almost equal patterns of single newsgroup posting and cross-posting between themselves. These differences raise interesting questions I hope to address in a close analysis using the archive, once it’s launched in the next few weeks.
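Counts like these can be produced with very little code. The Python sketch below tallies posts per poster and posts per combination of newsgroups; it is an illustration under assumptions (invented addresses and function name), not the modules linked from the site.

```python
from collections import Counter
from email import message_from_string


def activity_counts(raw_messages):
    """Count posts per From value and per combination of target newsgroups."""
    posters = Counter()
    group_combos = Counter()
    for raw in raw_messages:
        msg = message_from_string(raw)
        posters[msg["From"]] += 1
        # The Newsgroups header lists target groups separated by commas.
        groups = [g.strip() for g in (msg["Newsgroups"] or "").split(",")]
        group_combos[tuple(sorted(g for g in groups if g))] += 1
    return posters, group_combos


# Invented sample posts, for illustration only.
samples = [
    "From: a@example.invalid\nNewsgroups: soc.support.transgendered\n\nHi.\n",
    "From: a@example.invalid\nNewsgroups: soc.support.transgendered,alt.support.srs\n\nHi.\n",
    "From: b@example.invalid\nNewsgroups: alt.fashion.crossdressing\n\nHi.\n",
]
posters, group_combos = activity_counts(samples)
print(posters.most_common(1))
print(group_combos)
```

Sorting the group names before counting means a cross-post to SST and SRS is tallied the same way regardless of the order in which the poster listed them.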

However, I also wanted to spend a little more time talking about my initial network analysis, because I think it’s indicative of some of the complexities of working with Usenet data. One of my key research questions is how Usenet facilitated the spread of the term “cisgender.” As far as I’ve found, the term or its variants don’t appear in movement publications during the 1990s. However, it eventually became ubiquitous in transgender discourse. How could that be, if it wasn’t in active use in print publications? This takes me to the internet, the other major (recorded) hub of transgender discussion at the time.

The term’s origins are unclear, and its corresponding Wikipedia page (the unofficial arbiter of its history) reflects this lack of clarity. The page did at one point cite two Usenet users, Carl Bujis posting in soc.support.transgendered in 1996 and Dana Leland Defosse, posting in alt.transgendered in 1994, as separately originating the term.¹ However, these claims were challenged as not being from “reliable sources” and subsequently removed. Usenet connections are made elsewhere as well: In the official Oxford English Dictionary (OED) definition, the earliest use example cited is from Usenet. For my research, I’m not particularly interested in finding a definitive origin point, but I am curious about what might have facilitated the sudden increase in use.

This leads me back to Usenet. As I noted in my post contextualizing Usenet, part of why spam was such an issue was how (relatively) easy it was to post and cross-post to multiple groups. This meant posts could spread widely and possibly be seen by a sizeable audience. Curious about how widespread the term was in the collections, I gathered information on all posts (identified by their unique Message ID) that used the term and its variants (cisgendered, cis-gender, cis-gendered, and cis), and the posts referenced in the “References” header (or previous posts in the conversation).² The References header is by no means a perfect tool, though. According to the documentation, the References header in Usenet messages was supposed to “allow messages to be grouped into conversations by the user interface program.” However, programs were required to include only “a reasonable number of backwards references” if the list got too long. Thus, not all of a conversation was recorded in the header. Furthermore, some messages weren’t collected at the poster’s request, so their trace exists in a unique Message ID with no data.
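As a rough illustration of this collection step, the Python sketch below finds posts whose text contains the term or a variant and pairs each with the Message IDs in its References header. The regular expressions, function name, and sample posts are assumptions made for this example.

```python
import re
from email import message_from_string

# Matches cis, cisgender, cis-gender, cisgendered, and cis-gendered.
TERM = re.compile(r"\bcis(?:-?gender(?:ed)?)?\b", re.IGNORECASE)
# Pulls <...> identifiers out of a header, ignoring suffixes such as "#1/1".
MESSAGE_ID = re.compile(r"<[^<>\s]+>")


def reference_edges(raw_messages):
    """Return (post, referenced post) pairs for posts whose text uses the term."""
    edges = []
    for raw in raw_messages:
        msg = message_from_string(raw)
        body = msg.get_payload()
        if not isinstance(body, str) or not TERM.search(body):
            continue
        own_ids = MESSAGE_ID.findall(msg["Message-ID"] or "")
        if not own_ids:
            continue
        for referenced in MESSAGE_ID.findall(msg["References"] or ""):
            edges.append((own_ids[0], referenced))
    return edges


# Invented sample posts, for illustration only.
samples = [
    "Message-ID: <a@example.invalid>#1/1\n\nWelcome to the group.\n",
    "Message-ID: <b@example.invalid>#1/1\n"
    "References: <a@example.invalid> <gone@example.invalid>\n"
    "\n"
    "Why do cis-gendered people assume that?\n",
]
edges = reference_edges(samples)
print(edges)
```

In the sample, one referenced ID has no matching post, mirroring the uncollected messages mentioned above. Because of that and the truncation rule, any edge list built this way is incomplete.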

Nevertheless, the network I built (visualized using the OpenOrd layout) gives you an idea of the amount, activity level, and connections between posts. Each node is a unique posting. Nodes are sized and colored according to their degree of connection to other nodes, and labeled using their Message ID. Posts with just a Message ID and no extra information (original/reply, year, etc.) were not held in any of the collections.

What does this show? Firstly, that the term appeared frequently on Usenet in several venues: ASCD and SST. I’ve specifically chosen “appeared” instead of “used” because Usenet posters often quoted each other using big chunks of one another’s text. So, a term could appear in many posts, but only in quoted text and not in the poster’s own words. The term thus gains visibility even if it isn’t adopted by others. Furthermore, big numbers don’t always equal long threads (as far as the collections show). While several posts sparked a high level of conversation (large nodes), most were short threads or single responses. Lastly, activity is date-limited: The vast majority of post activity occurs between 1996 and 2006, the latter being right around when social media platforms like Myspace and Facebook really begin to take off. Most surprising to me, however, was the high incidence of posts in crossdressing groups. What little literature exists on trans Usenet focuses on AltT and SST as the “big two” of Usenet, but AFCD and ASCD were active and influential in their own right. In ASCD in particular, “cisgender” and its variants appear the most, even though the group isn’t mentioned in the print archives as a major hub of discussion.
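The distinction between a term appearing and a term being used can be approximated in code. The Python sketch below assumes quoted text is marked with a leading “>”, a common convention but not a universal one; the function name and sample reply are invented.

```python
import re

# Matches cis, cisgender, cis-gender, cisgendered, and cis-gendered.
TERM = re.compile(r"\bcis(?:-?gender(?:ed)?)?\b", re.IGNORECASE)


def term_usage(body):
    """Report whether the term occurs in the poster's own lines, quoted lines, or both."""
    own = quoted = False
    for line in body.splitlines():
        if not TERM.search(line):
            continue
        # Assumes quoted text is marked with a leading ">".
        if line.lstrip().startswith(">"):
            quoted = True
        else:
            own = True
    return {"own": own, "quoted": quoted}


# Invented sample reply, for illustration only.
reply = "> I am tired of cisgendered assumptions.\nMe too, frankly.\n"
print(term_usage(reply))
```

A heuristic like this cannot replace reading the posts: clients that quoted text without a “>” marker, or posters who paraphrased, would be misclassified.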

In multiple ways, then, making this network challenged either popular received knowledge about “cisgender” or my own assumptions about what trans Usenet looked like. The numbers can’t tell the whole story, though. Understanding how these posts connect to each other requires a close discourse analysis of individual posts and the connections I’ve visualized here. Otherwise, it’s just a bunch of nodes on a graph: attractive to look at, but not meaningful in any particular way. Instead, this kind of project requires meeting big data with a fine-grained attention to detail that attempts to get at the content of discussions, in order to give those “big data” numbers meaning and context.

¹ My data collection actually raises questions about the received narrative for who “first” uses “cisgender.” In 1994, 5 months after Defosse posts, another user posts in the same newsgroup about “cis-gendered, narrow-minded people,” with no clarification as to what the term means.

² Prior to collecting my data, I also checked each message’s content against an automatically generated list of possible common misspellings. However, this process produced no hits.

A Block List Against Hate

Ed Summers — Thu, 27 Oct 2016 11:10:33 +0000

Twitter User Identifiers

Two weeks ago a group of students, scholars and activists gathered in the evening at MITH for an event called the Night Against Hate. Our goal was to spend two hours working together to link groups and individuals documented in the Southern Poverty Law Center’s Extremist Files to their respective Twitter accounts in order to:

assist social media researchers who are studying the ways that these groups are operating online
provide an opportunity for folks to respond constructively to the rising tide of hate we are witnessing in online and offline spaces

We also had the very practical goal of creating a Twitter block list that would allow you to prevent extremists identified by the SPLC from tweeting into your timeline. This blog post is a quick update to provide some information about how to obtain and use this block list. Look for more information about what we learned in the process of putting this event on in the coming weeks.

Here’s what you need to do to use the block list:

1. Right-click on this link to the block list and select Save Link As or your browser’s equivalent.
2. Go to your Twitter Settings, select Blocked Accounts in the menu on the left, click Advanced Options and then select Import a List from the dropdown.
3. When you click Attach a file to upload you will be prompted to provide the location of the block list file you downloaded.
4. Once you’ve submitted the file you will see a list of the Twitter accounts present in the block list and have the opportunity to select/deselect them.
5. Click Block.

Here’s a video with a little bit more commentary if you would like to see this process in action before trying it yourself.

Over fifty local and remote participants worked during the two hours in Google Sheets and Slack to link and verify 89 of the 169 individuals and groups present in the Extremist Files. A big thank you to Amanda Visconti at Purdue (and MITH alum) who quickly got us set up on Digital Humanities Slack to provide a place for remote participants to ask questions and coordinate work.

The full results of this collaboration can be found in a Google Sheet. As you can see we also ended up attaching Facebook, YouTube, Tumblr and websites where possible. The code for collecting the SPLC data and for creating the block list from the Google Sheet is available on GitHub.

A few more things about the block list are worth noting in case you end up doing this kind of work yourself. The block list itself must contain Twitter user identifiers (numbers) instead of the Twitter handles. That’s why we wrote a program to get the handles from the spreadsheet and fetch the user identifiers from the Twitter API. We did consider making the block list available using the BlockTogether service, which would allow the list to be shared more easily. However, BlockTogether associates the block list with a given user’s account, and we didn’t want to mix the Extremist Files accounts with our other blocked accounts. Finally, your block list file of numeric identifiers cannot end with a newline or else the Twitter import mechanism gets stuck. At least that was the case when this blog post was written.
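The trailing-newline detail is easy to get wrong, so here is a minimal Python sketch of writing such a file. It assumes one numeric identifier per line and skips the step of fetching identifiers from the Twitter API; the file name, function name, and placeholder IDs are invented, and this is not the code published with the project.

```python
import os
import tempfile


def write_block_list(user_ids, path):
    """Write one numeric ID per line, with no newline after the last one."""
    with open(path, "w", encoding="ascii", newline="\n") as handle:
        # Joining with "\n" puts separators between IDs only, never at the end.
        handle.write("\n".join(str(int(user_id)) for user_id in user_ids))


# Placeholder identifiers, not real accounts.
example_ids = [1001, 1002, 1003]
with tempfile.TemporaryDirectory() as tmp:
    block_list_path = os.path.join(tmp, "block_list.csv")
    write_block_list(example_ids, block_list_path)
    with open(block_list_path, encoding="ascii") as handle:
        contents = handle.read()

print(repr(contents))
```

Converting each value with int() before writing also guards against a handle or stray text ending up in a file that must contain only numbers.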

“If it gets us talking, it can’t be bad:” Building the Transgender Usenet Archive

Avery Dame — Wed, 19 Oct 2016 18:01:36 +0000

“If only one life is saved by the creation of this group, wouldn’t it be worth it? It’s only a communications medium, and people are needlessly losing their lives and wasting their potential in self-destructive, maladaptive, denial-bases coping strategies. The loss to our society is great, and needless…If it gets us talking, it can’t be bad.” – Anonymous, SST — an early history (part 2) (soc.support.transgendered)

You wouldn’t have the transgender movement as it is today without the Internet. Widespread public internet access played a key role in the transgender movement’s growing visibility at the national level during the 1990s. Access to the Internet mitigated many issues that had limited other organizing efforts, like geographic limitations and the sometimes-lengthy publication arc of print media. From the earliest days of Fidonet, trans individuals have made spaces for discussion and resource-sharing online. Some of these spaces were hosted on Usenet, a decentralized, worldwide discussion system founded in 1980 and organized around topic-specific newsgroups. Usenet, as a communications network, is an influential predecessor to modern social media platforms and the origin point for now-common bits of contemporary Internet vocabulary like “spam.”

Amongst its many newsgroups was a small collection of important transgender-related forums, the five most active being alt.transgendered, soc.support.transgendered, alt.support.srs, alt.support.crossdressing, and alt.fashion.crossdressing. As the anonymous poster in the opening quote notes, these spaces offered folks the opportunity to communicate and find support, without falling into “maladaptive” coping strategies. Discussions were active and sometimes highly contentious, as posters—some of them major figures in transgender political activism at the time—discussed and debated key issues of the day in transgender politics.

These newsgroups are at the center of my project as a Winnemore Digital Dissertation Fellow for this year. As a Fellow, I’ll be building a public archive of posts from these five groups using the Bookworm API and data from the Internet Archive’s Usenet Historical Collection. This archive will form a key part of my work, a case study focused on how posters use the term “cisgender” in their discussions. These groups are one of the few archival locations where participants regularly used the term, and several origin narratives point to different newsgroups as being where it was first used. For my project, however, I’m not interested in origins so much as the specific contexts it was used in and how posters connected this use to their broader understandings of “transgender community.” This follows the focus of my larger dissertation, which explores the affective and structural meanings assigned to “community” in English-language transgender discourse online.

Beyond my own project, though, I’ll also be thinking and writing about the mechanics of Usenet-related research in general. Archival Usenet research can face significant barriers and raises important ethical questions about the afterlife of data. Over the coming year, I’ll be writing and posting about my process here on the MITH blog, my own blog, and (occasionally) on Twitter. Some of these posts will be about the technical and ethical challenges of the project, offering a window into how I’m thinking through them. I’ll also be sharing some of my early findings and other interesting things I encounter in the archive during my research.

The post “If it gets us talking, it can’t be bad:” Building the Transgender Usenet Archive appeared first on Maryland Institute for Technology in the Humanities.

Come join MITH Thursday 10/13 for a Night Against Hate!

MITH — Tue, 11 Oct 2016 14:56:09 +0000

Please join us at MITH (and remotely) this Thursday to gather with others looking to learn from each other about how to investigate and thwart hate speech in social media. At our Night Against Hate event we will collaboratively try to link the Southern Poverty Law Center’s Extremist Files to social media accounts. This list can then be used by researchers to examine the effect that these groups are having online. In addition, we hope to use this event to learn from each other about emerging tools and techniques of self-care while working online.

We will be collaboratively editing a Google Sheet that is loaded with the SPLC data. So in order to participate you will need a Google account, or to work with someone else who has one. To facilitate remote (and on-site) participation we will also be gathering in the #resisthate channel in the Digital Humanities Slack. If you are going to be working remotely and aren’t already a part of DH Slack, please join by filling out this form. DH Slack’s Code of Conduct provides a constructive and harassment-free space for this work to happen in.

To help us organize the event, we will be working in 30-minute(ish) sprints with short 10-minute breaks in between to share reflections on this work, techniques for self-care, and emerging research questions.

We encourage folks joining us remotely to work in pairs or with friends. Always take a break when you need it. We will have moderators attending to the Slack Channel and Twitter #ResistHate. Share your tips and methods for self-care at any time.

Please use the hashtag #resisthate if you are sharing the event in social media.

will be provided for those who can make it to MITH!

The post Come join MITH Thursday 10/13 for a Night Against Hate! appeared first on Maryland Institute for Technology in the Humanities.

Henry Lovejoy Digital Dialogue

Digital Dialogues — Tue, 01 Mar 2016 01:30:12 +0000

Knowing when and where people came from within Africa, and when and where they went in diaspora, is a major research question affecting the history of the continent and the broader Atlantic world. My proposed solution is to initiate the process of creating the framework to standardize Africa’s geo-political history. Creating a broadly-accepted core of knowledge about the geographic, political and migratory history of Africa along a cartographic timeline will provide new insight, methods and solutions for researching not only transformations to the continent but also the origins of people absorbed into the trans-Atlantic slave trade and the history of the African diaspora. This talk will examine the current state of Digital Humanities in the discipline of African and African Diaspora History by focusing on approaches, strategies and challenges to integrating a proposed project called “West Africa Historical GIS” with the Liberated Africans Project, which will reconstruct widely dispersed archival evidence from a transnational collection of primary sources made by some of the world’s earliest international human rights courts.

These combined projects examine the enduring interest in the memory of slavery through evidence that allows rebuilding the life histories for tens of thousands of Liberated Africans throughout the Atlantic World. The long-term outcome will be a dynamic website to explore the history of antislavery and international human rights law, as well as the demography of the post-1807 trans-Atlantic slave trade, principally from the perspective of the Africans involved.

See below for a Storify recap of this Digital Dialogue, including live tweets and select resources referenced by Lovejoy during his talk.

The post Henry Lovejoy Digital Dialogue appeared first on Maryland Institute for Technology in the Humanities.

Matthew Lincoln Digital Dialogue

Digital Dialogues — Tue, 20 Oct 2015 13:00:01 +0000

“In the context of research, a model is an experimental device, modelling an experimental technique.” Willard McCarty, Humanities Computing.

What is a research model, and what is an experiment, in the context of art history? As we begin to compute data troves derived from catalogues raisonnés and museum collections in new ways, we are challenged to grapple seriously with how to map different computational models (e.g. spatial, network, visual) to historical models of society, market, religion, gender, and more.

My talk will focus on my in-progress dissertation “Modeling the Network of Dutch and Flemish Print Production, 1500–1700”, in which I adapt existing museum collections databases in order to analyze large-scale changes in the organizational patterns of reproductive printmakers and publishers in the Netherlands during the sixteenth and seventeenth centuries. I will discuss the importance of formal network concepts to understanding artistic print production, and demonstrate how multiple analytical perspectives, including both measurement and descriptive analysis, as well as simulation modeling, compel us to revisit standing narratives and methodologies. This attentiveness towards computational modeling and the concept of the humanistic model in general, I will argue, has particularly high stakes for art historians as we continue to construct and evaluate the relationships between our historical narratives and the objects from which we derive them.

See below for a Storify recap of this Digital Dialogue, including links to resources and projects that Lincoln referenced during his talk.

The post Matthew Lincoln Digital Dialogue appeared first on Maryland Institute for Technology in the Humanities.

James English Digital Dialogue

Digital Dialogues — Sun, 20 Sep 2015 13:00:52 +0000

Scholars of contemporary fiction face special challenges in making the turn toward digitized corpora and empirical method. Their field is one of exceptionally large and uncertain scale, subject to ongoing transformation and dispute, and shrouded in copyright. I will present one possible way forward, based on my work for a special issue of Modern Language Quarterly on “Scale & Value” that I’m co-editing with Ted Underwood. My project uses quantitative relationships among mid-sized, hand-made datasets to map the field of Anglophone fiction from 1960 to the present. Some significant findings of this research concern a shift in the typical time-setting of the novel and a concomitant change in the relationship between literary commerce and literary prestige.

See below for a Storify recap of this Digital Dialogue, including links to resources and projects that English referenced during his talk.

The post James English Digital Dialogue appeared first on Maryland Institute for Technology in the Humanities.

Paul Jaskot Digital Dialogue

Digital Dialogues — Mon, 23 Mar 2015 12:00:59 +0000

Please note that this Digital Dialogue is a special co-sponsored talk in conjunction with the Art History & Archaeology Department, and occurs on a different weekday and location.

The Michelle Smith Collaboratory for Visual Culture is located in Room 4213 of the Art and Sociology Building.

The Central Building Office at Auschwitz was for its time one of the largest architectural offices in Europe with over 150 SS architects and engineers employed as well as an equal number of forced-labor draftsmen. It was these architects who literally built the infrastructure of imperialist expansion in the East, as well as the brutal complementary structures of the Jewish genocide.

This talk analyzes the documentary evidence of the imperial ambitions of the SS as well as the digital visualizations of that archival evidence. Building off of his current work on digitally mapping the site (with his co-author, Anne Kelly Knowles), Jaskot asks what is at stake for digital mapping in the humanities, as well as for a spatial and architectural understanding of the Holocaust.

The post Paul Jaskot Digital Dialogue appeared first on Maryland Institute for Technology in the Humanities.