A few weeks ago MITH announced that it will be partnering with Washington University in St. Louis (WUSTL) and the University of California at Riverside (UCR) on a new project called Documenting the Now. Documenting the Now is aimed at accomplishing two different but deeply interrelated goals. The first is to develop an open source Web application called DocNow that will allow researchers and archivists to easily collect, analyze and preserve Twitter messages and the Web resources they reference. The second is to cultivate a much needed conversation between scholars, archivists, journalists and human rights activists around the effective and ethical use of social media content. We are very grateful to the Andrew W. Mellon Foundation for its generous support, which will allow us to pursue these goals over the next two years.

As you can imagine, realizing the second goal is really a prerequisite for achieving the first. We want the DocNow application to reflect the use cases and requirements that emerge from the conversation. But at the same time, in order to have a meaningful conversation we need to know what is possible when it comes to collecting, analyzing and preserving social media content. That’s why we will be doing both concurrently, starting with our hand-picked board of advisors and software development team … and you, if you are interested. We will use early prototyping on the DocNow application to drive the conversation and inform its continued development. We’re just getting started, but this blog post provides a brief look at how the project came to be, some initial ideas we have for directions to head in, and how you can get involved.

Background

I say we are just getting started, but the seeds for Documenting the Now can be found back in 2014 at a meeting of the Society of American Archivists in Washington DC. Although separated in space from Ferguson, Missouri, many of the conversations at the conference in DC centered on the ongoing protests over the killing of Michael Brown, an unarmed African American teenager, by white police officer Darren Wilson. News of the killing and ongoing protests spread initially in social media. Even as traditional media began reporting on the story, their narrative was challenged and reframed by the conversation on Twitter. While the democratizing role of social media is ideologically complex, Sarah Jackson and Brooke Foucault Welles have uncovered evidence that in Ferguson, Twitter allowed individual initiators to raise awareness about the events in the initial hours following the death of Michael Brown:

African-Americans, women, and young people, including several members of Michael Brown’s working-class, African-American community, were particularly influential and succeeded in defining the terms of debate despite their historical exclusion from the American public sphere. This highlights democratic potentials within the networked public sphere, particularly vis-à-vis the discursive labor of members of American counter publics willing to contribute collective knowledge and critiques to the process of making sense of community crisis.
Jackson & Welles, 2015, p. 412

The archivists in DC that week were particularly attuned to the importance of this documentation that was being unfurled into Twitter, and outwards onto the Web as photographs, video and audio, some of it being livestreamed to a global audience. Bergis Jules and I took part in that conversation at SAA, and resolved to collect the Twitter conversation as best we could, for researchers now and in the future.

Of course, just as the deaths of countless other people of color at the hands of police preceded Michael Brown’s, they tragically continued over the next few months: Tamir Rice, Eric Harris, Walter Scott, Jonathan Ferrell, Sandra Bland, Samuel DuBose and Freddie Gray. The only difference this time was that the traditional media outlets began to pay more attention as the Black Lives Matter movement, started by Alicia Garza, Patrisse Cullors, and Opal Tometi in 2013, accelerated in towns and cities across the United States.

Bergis and I continued our data collection work over the past year. You can see a description of some of these datasets in this TimeMapper visualization. We also began to write about this process of data collection and analysis online at Medium in the On Archivy column. We were pleased to receive useful feedback and additional contributions from the Web archiving community.

Over this same period MITH participated in a series of BlackLivesMatter teach-ins at UMD. These workshops were aimed at helping students and faculty contextualize the events in Ferguson and make them part of their study. As a result we began to get requests from researchers such as Ernesto Calvo in Political Science, as well as Rashawn Ray and Melissa Brown in Sociology, who wanted to use the data we had collected in their research. Bergis and I also began conversations with Meredith Evans at WUSTL to see if there was a way to collaborate with them on their Documenting Ferguson archive. We wanted to help WUSTL figure out how the pivotal material in social media could form part of their archive.

It was at this time that we realized the potential for the Documenting the Now project. While many tools existed for collecting Twitter data, none were particularly suited to the needs of archivists, who need not only to collect but also to appraise the material referenced in this conversation: the text, images, audio and video that are out on the Web. We didn’t have an ethical framework for involving the content creators in this process. We needed meaningful and workable models that would allow archivists to engage with content creators, or the initiators that Jackson and Welles describe. How can we build collections that are useful for researchers studying events like those in Ferguson, while also respecting the rights of the content creators and Twitter’s Terms of Service?

Social Media Archiving

Another important dimension of our work is the relationship between the DocNow application we are building and other projects in the Web archiving space: specifically the Social Feed Manager (SFM) from George Washington University (GWU) and Rhizome’s WebRecorder project.

Bergis was instrumental in helping start the SFM project at GWU, so we have a vested interest in using or at least interoperating with it. SFM’s scope is simultaneously a bit broader and a bit narrower than DocNow’s. SFM is best thought of as an extensible framework for collecting data from multiple social media sites, including Flickr and Weibo as well as Twitter. DocNow, on the other hand, is narrowly focused on Twitter, at least to start with. The reason for this is that DocNow is going to be an environment for viewing, selecting and curating content from Twitter, whereas SFM has explicitly kept access, discovery and analysis of content out of scope. We plan to work closely with the SFM team to make sure that DocNow will interoperate with SFM at the data layer. One specific use case we will be looking at is functionality that would allow data collected with SFM to be imported into DocNow for analysis and curation.

Another area in which SFM and DocNow differ is in their approach to archiving the Web. In DocNow we are explicitly interested in using the social media stream as a lens for finding and evaluating Web content. This idea is not unique to DocNow: it has been explored by the British Library in their TwitterVane experiment, and is currently being investigated by the NSF-funded EventsArchive project at Virginia Tech, as well as the iCrawl project at the L3S Research Center. While we certainly will be paying attention to these ongoing efforts, we think our approach in DocNow is going to be substantially different because our primary use cases involve curation and an appraisal of content that directly considers the role of content creators.
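To give a rough sense of what using the stream as a lens could look like in practice, here is a minimal sketch in Python. It is not DocNow code, just an illustration under some assumptions: that tweets have been collected as line-oriented JSON in the shape returned by Twitter's v1.1 API, where links appear under entities.urls with an expanded_url field. The function name shared_urls and the sample tweets are hypothetical.

```python
import json
from collections import Counter


def shared_urls(tweet_lines):
    """Count how often each linked URL appears in a stream of tweet JSON lines."""
    counts = Counter()
    for line in tweet_lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        # Assumption: links are listed under entities.urls (Twitter v1.1 JSON).
        # Fall back to the shortened url if no expanded form is present.
        for link in tweet.get("entities", {}).get("urls", []):
            url = link.get("expanded_url") or link.get("url")
            if url:
                counts[url] += 1
    return counts


# Hypothetical sample data, for illustration only.
sample = [
    '{"id": 1, "entities": {"urls": [{"expanded_url": "http://example.org/a"}]}}',
    '{"id": 2, "entities": {"urls": [{"expanded_url": "http://example.org/a"},'
    ' {"expanded_url": "http://example.org/b"}]}}',
    '{"id": 3, "entities": {"urls": []}}',
]

# List the most frequently shared URLs first.
for url, n in shared_urls(sample).most_common():
    print(n, url)
```

A ranked list like this is only a starting point for appraisal. Deciding which of these resources to archive, and how to involve the people who created them, remains a question for the curator.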

DocNow’s need for Web archiving functionality is why we are extremely interested in using Rhizome’s WebRecorder project. WebRecorder provides an open source, curator-oriented environment for collecting and archiving Web content. Unlike more traditional automated approaches to Web archiving, it uses the attention of the curator to guide preservation, and the curator’s browser as an intrinsic part of the process. One specific way that we are hoping this collaboration will take shape is in the contextualization of Web content. As content is being collected from the Web, can archivists and researchers apply notes about why the resource was collected, and potentially document interactions with the content’s creator? WebRecorder was funded by Mellon in the same cycle as Documenting the Now, so we have a very real incentive to align the two projects.

One final area of development for DocNow that hasn’t been mentioned yet is our approach to visualization and analysis. In order to allow curators and archivists to build collections of social media and Web content we will necessarily need to build views into the collected data. We can anticipate a set of views, or a dashboard of sorts that provides insight into the conversation and the Web content, as well as functionality to collect and annotate it. But we also know that we will not be able to fully anticipate all the needs of all research questions.

Of course this uncertainty is what makes research interesting: asking questions that have never been asked before. So we want to build DocNow so that it provides a workspace for more knowledgeable users to run their own analysis. We are particularly excited about the work coming out of the Web Archives for Historical Research Group at the University of Waterloo, who are using Jimmy Lin’s WarcBase and Apache Spark as a platform for analysis of Web archives. We are hoping that Web accessible workspaces like SparkNotebooks or Jupyter notebooks that are embedded in DocNow could provide a compelling environment for research use of the data collections. It is still early days, but if this seems like an interesting avenue to explore please get in touch. We are hiring part-time developers and designers to help us out with the work.

Silences in the Archive

Central to our work in Documenting the Now is the humanist’s awareness that what we know of history is deeply tied to the traces that are created and remain in our archives to be accessed and analyzed by researchers. How is our knowledge today shaped by the silences that our archives contain? Trouillot’s notion of historical production lays bare this process, in which archives form such a crucial element:

Silences enter the process of historical production at four crucial moments: the moment of fact creation (the making of sources); the moment of fact assembly (the making of archives); the moment of fact retrieval (the making of narratives); and the moment of retrospective significance (the making of history in the final instance).

(Trouillot, 1997, p. 26)

We want Documenting the Now (the conversation and the application) to embody this process, by allowing this new cultural and historical material to be archived, so that narratives and history can be made of it. This isn’t a process that a single institution or organization can carry out alone. It is a shared responsibility that spans disciplines, professions and technical frameworks.

So, this is the challenge that we’ve signed up for. We are extremely fortunate to be joined by a distinguished board of 18 advisors from the fields of African American studies, communication, journalism and digital libraries, as well as practicing archivists, journalists and technologists. You can see a list of them below. You can also expect to hear more about our work as it develops here on the MITH blog and on our (currently minimalist) project website, where you can also sign up for an occasional newsletter. If you are really interested in participating we will let you know about our regular community calls, where we will discuss recent developments and use cases you might have.

Our Advisory Board

In alphabetical order

Natalie Baur
Archivist
University of Miami

Meredith Clark
Assistant Professor of Journalism
University of North Texas

Tressie McMillan Cottom
Assistant Professor of Sociology
Virginia Commonwealth University

Brian Dietz
Digital Program Librarian
North Carolina State University

Jarrett Drake
Digital Archivist
Princeton University

Meredith Evans
Director
Jimmy Carter Presidential Library and Museum

Jonathan Fenderson
Assistant Professor of African and African-American Studies
Washington University in St. Louis

Deen Freelon
Assistant Professor of Communication
American University

Jessica Johnson
Assistant Professor of History
Michigan State University

Robin Katz
Public Services Librarian
University of California at Riverside

David Kim
Mellon Postdoctoral Fellow
Occidental College

Mark Anthony Neal
Professor of African American Studies
Duke University

Michael Nelson
Professor of Computer Science
Old Dominion University

Yvonne Ng
Senior Archivist
WITNESS

Matt Phillips
Lead Developer
Library Innovation Lab
Harvard University

Rashawn Ray
Assistant Professor of Sociology
University of Maryland

Nicholas Taylor
Web Archiving Service Manager
Stanford University

Dexter Thomas
Writer
Los Angeles Times

Stacie Williams
Learning Lab Manager
University of Kentucky

Micah Zeller
Copyright and Digital Access Librarian
Washington University in St. Louis

References:

  • Jackson, S. J., & Welles, B. F. (2015). #Ferguson is everywhere: Initiators in emerging counterpublic networks. Information, Communication & Society, 19(3). Retrieved from http://www.tandfonline.com/doi/full/10.1080/1369118X.2015.1106571.
  • Trouillot, M. R. (1997). Silencing the past: power and the production of history. Boston, Massachusetts: Beacon Press.