Data Curation – Maryland Institute for Technology in the Humanities https://mith.umd.edu Thu, 08 Oct 2020 19:59:46 +0000 en-US hourly 1 https://wordpress.org/?v=5.5.1 The Cleaners: Movie Night (Oct 30) https://mith.umd.edu/the-cleaners-movie-night-oct-30/ Mon, 07 Oct 2019 17:29:32 +0000 https://mith.umd.edu/?p=20796 The Cleaners (2018) Please join us in MITH on October 30, 2019 (All Hallows' Eve Eve) from 6-8pm for a screening of The Cleaners, a documentary which provides an in depth look at the hidden labor of content moderation that makes today's social media platforms possible. Once the dream of Silicon Valley tech [...]

The post The Cleaners: Movie Night (Oct 30) appeared first on Maryland Institute for Technology in the Humanities.

The Cleaners

The Cleaners (2018)

Please join us in MITH on October 30, 2019 (All Hallows’ Eve Eve) from 6-8pm for a screening of The Cleaners, a documentary which provides an in-depth look at the hidden labor of content moderation that makes today’s social media platforms possible. Once the dream of Silicon Valley tech startups, the democratization of web publishing has brought huge challenges to the mega-corporations that run today’s social media platforms, as they struggle to prevent the viral spread of online hate, violence and abuse.

Key to these moderation systems are large numbers of human moderators, who interpret community guidelines, and sometimes clandestine content rules, in order to decide what content will remain online. As Sarah Roberts details in her book Behind the Screen (a recent Digital Studies Colloquium pick) commercial content moderators work behind the scenes, in remote locations and precarious working conditions, where they are often subjected to a barrage of unsettling material that can leave lasting psychological and social impacts.

A brief discussion will follow the screening. Popcorn and soda pop will be available, but feel free to bring some take-out or some pre-Halloween candy.

MITH welcomes T’Sey-Haye Preaster https://mith.umd.edu/mith-welcomes-tsey-haye-preaster/ Thu, 13 Dec 2018 16:46:47 +0000 https://mith.umd.edu/?p=20402 We are excited to welcome T'Sey-Haye Preaster to the MITH team as the Project Coordinator for the second phase of the Documenting the Now project, generously funded by The Andrew W. Mellon Foundation.

The post MITH welcomes T’Sey-Haye Preaster appeared first on Maryland Institute for Technology in the Humanities.

T'Sey-Haye Preaster
We are excited to welcome T’Sey-Haye Preaster to the MITH team as the Project Coordinator for the second phase of the Documenting the Now project, generously funded by The Andrew W. Mellon Foundation. T’Sey-Haye has already been on the job since late October contributing ideas and helping the DocNow team get started on the next phase of our work.

Prior to joining MITH, T’Sey-Haye was key in making sure that the “Intentionally Digital, Intentionally Black” conference hosted by the AADHum initiative in October of this year came off so successfully. At that time, she was a member of the Marketing and Communications Office in the College of Arts and Humanities.

Check out her biography, follow her on Twitter, and look for her byline here talking about the exciting things happening on the Documenting the Now project.

Welcome T’Sey-Haye!

 

Documenting the Now Phase 2 https://mith.umd.edu/documenting-the-now-phase-2/ Tue, 16 Oct 2018 21:01:04 +0000 https://mith.umd.edu/?p=20320 With a $1.2 Million grant from The Andrew W. Mellon Foundation, The Maryland Institute for Technology in the Humanities in the College of Arts and Humanities at the University of Maryland, Shift, and the Department of Media Studies at the University of Virginia (UVA) will collaborate to lead the ongoing work of the Documenting the Now project.

The post Documenting the Now Phase 2 appeared first on Maryland Institute for Technology in the Humanities.

DocNow2

With a $1.2 Million grant from The Andrew W. Mellon Foundation, The Maryland Institute for Technology in the Humanities in the College of Arts and Humanities at the University of Maryland, Shift, and the Department of Media Studies at the University of Virginia (UVA) will collaborate to lead the ongoing work of the Documenting the Now project. Started in 2014 with a grant to Washington University in St. Louis in partnership with the University of California, Riverside and MITH, Documenting the Now is committed to developing tools and community practices that support the ethical collection, use, and preservation of social media and web archives. Continuing the important work the project has accomplished over the past four years, the second phase of Documenting the Now will be focused on three interdependent strands of activity: software development, pedagogy, and engagement with community-based archiving of social justice activism.

Leading this second phase of Documenting the Now will be Trevor Muñoz, Interim Director of MITH & Assistant Dean for Digital Humanities Research at UMD who will serve as the Principal Investigator and the Administrative Lead; Bergis Jules, Director of Equity Initiatives at Shift Design Inc who will serve as a Co-Principal Investigator and the Project Director; Dr. Meredith Clark, Assistant Professor in the Department of Media Studies at UVA who will serve as a Co-Principal Investigator and Academic Lead; and Ed Summers, Lead Software Developer at MITH who will be the project’s Technical Lead.

During this phase of the project, our technical work, led by Summers with support from Alexandra Dolan-Mescal, Francis Kayiwa, and Dr. Raffaele Viglianti, will focus on continuing to develop, test, and deploy the software utilities built during phase one. These tools include DocNow, the Tweet ID Dataset Catalog, Hydrator and Twarc. One of the main focuses for the software that the project team will develop in this phase will be human-centered design approaches that privilege interaction between content creators and users of our tools who are interested in collecting social media data as archival content.

One example of work that will exemplify the project’s goal to undertake human-centered design is Social Humans. Created by Dolan-Mescal, UX and Web Designer for Documenting the Now, Social Humans is a set of data labels designed to empower content creators and inform researchers about user intent. In addition to continuing work developing software and fostering a community of practice around social media/web archiving that is grounded in an ethics of care for the histories of oppressed people, the next phase will also see the project team engage in pedagogical activities around social media and race, with the exciting addition of Dr. Meredith Clark as a Co-Principal Investigator. Dr. Clark is a former newspaper journalist whose research focuses on the intersections of race, media, and power. Her work on the project will include the development of academic courses, including a series of experiential learning tasks and assignments using DocNow tools and support. The project team is excited she agreed to join this phase of the effort.

Phase two will also include work on archiving activism history through a set of community-based archiving workshops. The goal of the program will be to build digital community-based archives in direct partnership with social justice activist organizations. Local activists are usually the people closest to the issues negatively impacting a community and they are most frequently on the front lines agitating for support and offering the most effective solutions, whether their causes are addressing police violence, inadequate educational opportunities, food scarcity, mass incarceration, or racial injustice. The Documenting the Now project is interested in exploring how we might build digital community-based archives from the perspectives of local activists and in equitable partnership with them. The archives will be built on Mukurtu CMS and we’re excited to work with that team because of their commitment to community control of local cultural heritage. Activist groups will be selected to participate in the program through an open application process. We will be sharing more information about the workshops and the application process soon, including incentives for the activist organizations, the workshop team, and the structure of the program. Stay tuned to the Documenting the Now Twitter and blog, or join our Slack for more information.

MITH, along with our partners, is extremely grateful for the support from The Andrew W. Mellon Foundation for Documenting the Now, and for the Foundation’s continued support of cultural heritage work that is intentionally community centered and grounded in an ethic of care for the lived experiences of the most vulnerable people in our society. We are particularly excited for the opportunity that continued support provides for enacting our strategic values in combination with the Foundation’s support for African American History, Culture and the Digital Humanities (AADHum).

The Maryland Institute for Technology in the Humanities (MITH) is a leading digital humanities center that pursues disciplinary innovation and institutional transformation through applied research, public programming, and educational opportunities. Jointly supported by the University of Maryland College of Arts and Humanities and the University of Maryland Libraries, MITH engages in collaborative, interdisciplinary work at the intersection of technology and humanistic inquiry.

Shift Design, Inc is a US 501(c)3 non-profit corporation that was established with a specific focus to design products for social change. Much of our work to date has focused on building an inclusive record of our shared cultural heritage, including projects like Historypin and Storybox.

The Department of Media Studies at the University of Virginia began in Fall 2000 as an interdisciplinary undergraduate major in the College of Arts and Sciences. The department is historical and critical in orientation and takes media as its object of study. The department focuses on the forms, institutions, and effects of media (radio, film, television, photography, print, digital and electronic media), with particular emphasis on the mass media of the modern and contemporary period.

Little Big Data https://mith.umd.edu/little-big-data/ Fri, 03 Aug 2018 12:45:44 +0000 https://mith.umd.edu/?p=19817 This past spring Purdom Lindblad and I had the opportunity to participate in several praxis oriented sessions involving social media data collection and analysis for Matt Kirschenbaum's Introduction to Digital Studies (MITH 610). We thought that some of the details of how we went about doing this work could be interesting to share with a [...]

The post Little Big Data appeared first on Maryland Institute for Technology in the Humanities.

This past spring Purdom Lindblad and I had the opportunity to participate in several praxis-oriented sessions involving social media data collection and analysis for Matt Kirschenbaum’s Introduction to Digital Studies (MITH 610). We thought that some of the details of how we went about doing this work could be interesting to share with a wider audience, and also wanted to begin a short series of posts that showcases the work that some students generated during the class.

MITH 610 introduces students to current topics and critical issues in the field of Digital Studies. MITH itself functions not just as a space for the class, but also as a laboratory for experimenting with digital methods, and getting acquainted with people on campus (and in the DC area) who are doing work in the digital humanities.

For example this past Spring MITH 610 was broken up into 3 modules: Reimagining the Archive, Media Archaeology and Data Stories. In the Data Stories module we worked with students to understand how social media APIs operate, and explored how to do data collection and documentation while being guided by the principles of Advocacy by Design. Advocacy by Design centers ethical questions of why we are interested in pursuing particular sets of research questions in order to better understand how we carry out the research, interpret our findings, and speculate about possible futures that they entail. These conversations compel us to ask how people are represented in, or are subjects of, academic work. Who reads and uses our work? Who collaborates and contributes to our work? Providing a welcoming and collaborative space for asking these questions is a central part of MITH’s vision for digital studies at UMD, which you can also see reflected in its core values.

One somewhat mundane, but nevertheless significant, challenge we often face when working as a group with different technologies is what we call The Laptop Problem. Fortunately, students come to class with a computer of some kind. It’s almost a given, especially in a field like digital studies. On the plus side this means that students arrive to class already equipped with the tools of the trade, and we don’t need to manage an actual set of machines for them to use. However, on the down side, everyone comes with a slightly different machine and/or operating system, which can make it very difficult for us to craft a single set of comprehensive instructions. Much time can be lost simply getting everyone set up to begin the actual work.

We were also stymied by another problem. In introducing social media data collection we wanted to go where the Digital Humanities generally (and wisely) fears to tread: The Command Line. In the previous Media Archaeology module, students examined and experimented with MITH’s Vintage Computing collection, which involved working directly with older hardware and software interfaces, and reflecting on the affordances that they offer. If you are curious about what this involved here’s a short Twitter thread by Caitlin Christian-Lamb that describes (with some great pictures) some of her work in this module:

We thought it would be compelling to introduce social media data collection by using the command line interface, as an example of a (relatively) ancient computer interface that continues to be heavily used even today, particularly in Cloud environments. But because of The Laptop Problem we weren’t guaranteed everyone would have the same command line available to them, or that they would even have access to it. One way of solving The Laptop Problem is to provide access to a shared virtual environment of some kind where software is already installed. This is when we ran across Google Cloud Shell.

Since the University of Maryland uses Google’s GSuite for Education for email and other services, students are (for better or worse) guaranteed to have (at least one) Google account. As part of Google Cloud they offer any account holder the ability to go to a URL https://console.cloud.google.com/cloudshell which automatically launches a virtual machine in the cloud, and gives you a terminal window directly in your browser for interacting with it. It is a real Debian Linux operating system, which can be used without having to install any software at all.

We developed a short exercise that walked students through how to launch Google Cloud Shell, get comfortable with a few commands, install the twarc utility, and use it to collect some Twitter data directly from Twitter’s API. twarc has been developed as part of MITH’s involvement in the Documenting the Now project, and allowed students to collect Twitter data matching a query of their choosing, store it in the native JSON format that Twitter themselves make available, and download it for further analysis.
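To give a feel for the "further analysis" step, here is a minimal Python sketch of a first pass over collected tweets. It assumes the data is stored as one JSON-encoded tweet per line and that each tweet carries a `user.screen_name` field and either a `full_text` or `text` field. The three sample tweets are invented purely for illustration, and the function names are hypothetical rather than part of twarc.

```python
import json
from collections import Counter

# Hypothetical sample standing in for a collected file:
# one JSON-encoded tweet per line. These tweets are invented.
SAMPLE_LINES = [
    '{"id_str": "1", "user": {"screen_name": "alice"}, "full_text": "Reading about #archives and #data"}',
    '{"id_str": "2", "user": {"screen_name": "bob"}, "full_text": "More on #data collection"}',
    '{"id_str": "3", "user": {"screen_name": "alice"}, "full_text": "No tags here"}',
]


def read_tweets(lines):
    """Parse line-delimited JSON, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def summarize(tweets):
    """Count tweets per account and hashtags found in the tweet text."""
    users = Counter()
    hashtags = Counter()
    for tweet in tweets:
        users[tweet["user"]["screen_name"]] += 1
        # Assumption: the text lives in "full_text" or, failing that, "text".
        text = tweet.get("full_text") or tweet.get("text", "")
        for word in text.split():
            if word.startswith("#") and len(word) > 1:
                hashtags[word.lower().rstrip(".,!?")] += 1
    return users, hashtags


users, hashtags = summarize(read_tweets(SAMPLE_LINES))
print(users.most_common())
print(hashtags.most_common())
```

To run this against a real collection, `SAMPLE_LINES` could be replaced with an open file handle, since iterating over a file also yields one line at a time.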

Describing all the intricate details of this data flow was well beyond the scope of the class. But it did present an opportunity for demystifying how Application Programming Interfaces (APIs) take their shape on the web, and to describe how these services make structured data available, and to whom. Matt likes to refer to this experience as Little Big Data. To bookend the exercise students wrote about what they chose to collect and why, and reflected on what the collected data, and the experience of collecting it, said to them in the shape of a short data story. Look for a few of these stories in subsequent posts here on the MITH blog.

Endangered Data Week, February 26 – March 2, 2018 https://mith.umd.edu/endangered-data-week-february-26-march-2-2018/ Mon, 19 Feb 2018 17:18:52 +0000 http://mith.umd.edu/?p=19397 Led by the Digital Library Federation, Endangered Data Week, February 26 – March 2, is an international, collaborative effort, coordinated across campuses, nonprofits, libraries, citizen science initiatives, and cultural heritage institutions, to shed light on public datasets that are in danger of being deleted, repressed, mishandled, or lost. The goals of Endangered Data Week [...]

The post Endangered Data Week, February 26 – March 2, 2018 appeared first on Maryland Institute for Technology in the Humanities.

Endangered Data Week

Led by the Digital Library Federation, Endangered Data Week, February 26 – March 2, is an international, collaborative effort, coordinated across campuses, nonprofits, libraries, citizen science initiatives, and cultural heritage institutions, to shed light on public datasets that are in danger of being deleted, repressed, mishandled, or lost. The goals of Endangered Data Week are to promote care for endangered collections by publicizing the availability of datasets; increasing critical engagement with them, including through visualization and analysis; and by encouraging political activism for open data policies and the fostering of data skills through workshops on curation, documentation and discovery, improved access, and preservation.

2018 Endangered Data Week Events

Interdisciplinary Panel & Practitioner Lightning Talks

February 26, 1 – 4 PM
Special Events Room, McKeldin Library

This panel of diverse disciplinary representatives invites participants to discuss the definitions of data, practices of data collection, ethical considerations and threats against data. Viewed in concert with each other, these domain perspectives will aid us in understanding the complex environment of research data preservation and the numerous dangers that can threaten the long-term usability, sustainability, and discoverability of this information. This panel will include:

  • Ricardo Punzalan, UMD iSchool (moderator)
  • Angus Murphy, UMD Department of Plant Science & Landscape Architecture
  • Joanne Archer, UMD Special Collections and University Archives
  • Jennifer Serventi, National Endowment for the Humanities
  • Catherine Knight Steele, UMD Department of Communication and Director of the African American History, Culture, and Digital Humanities

To supplement our expert panel, a number of practitioners from around the university and surrounding community will provide quick-fire presentations on their current data practices, describing the lived experience of professionals operating in a world of endangered data. Presenters will include:

  • Matthew Miller, UMD Roshan Institute (moderator)
  • Kelley O’Neal, UMD Libraries
  • Maddie Clybourn, Prince George’s County Memorial Public Library System
  • Jessica Lu, Post-Doc with African American History, Culture, and Digital Humanities
  • Amy Wickner, UMD Special Collections and University Archives

Data Preservation Workshop

February 28, 10 AM – 12 Noon
Rm 6107, McKeldin Library

This hands-on session will seek to address a topic that has important impacts for both individual researchers and the larger endangered data landscape: personal data preservation. This workshop will feature two segments: first, an overview of data preservation topics will familiarize participants with the core practices of data stewardship in individual practices and within the University community. Second, a hands-on tool demonstration will give participants a chance to try their hand at tools that facilitate self-guided archiving practices.

This will be a tech-heavy session; please bring a personal computer.

Endangered Data Week Happy Hour

March 2, 4 PM
MilkBoy ArtHouse, 7416 Baltimore Avenue, College Park

An informal closing to Endangered Data Week 2018. Continue the conversation over drinks and snacks.

An open-ended conversation on the impacts of endangered data in all its varieties and forms. From personal data to tax-funded public research data, how will uncertain futures for data impact us? As individuals? As institutions? As nations?

Curious? Have ideas? Have questions? Bring them all and join in the conversation.

DocNow and Rhizome receive IMLS National Forum grant! https://mith.umd.edu/documenting-the-now-receives-imls-forum-grant/ Tue, 19 Sep 2017 16:35:49 +0000 http://mith.umd.edu/?p=18930 We are thrilled to announce that Documenting the Now, MITH's Mellon-funded collaborative social media preservation initiative with Washington University and the University of California, Riverside, has been awarded a National Forum Grant from the Institute of Museum and Library Services (IMLS), as part of a new collaboration with arts organization Rhizome. For the full [...]

The post DocNow and Rhizome receive IMLS National Forum grant! appeared first on Maryland Institute for Technology in the Humanities.


We are thrilled to announce that Documenting the Now, MITH’s Mellon-funded collaborative social media preservation initiative with Washington University and the University of California, Riverside, has been awarded a National Forum Grant from the Institute of Museum and Library Services (IMLS), as part of a new collaboration with arts organization Rhizome. For the full details about this exciting opportunity, read the text from yesterday’s announcement from Rhizome below.

Rhizome to Host National Forum on Ethics and Archiving the Web

March 22-24, 2018
By Michael Connor

Rhizome, in collaboration with the University of California at Riverside Library (UCR), the Maryland Institute for Technology in the Humanities (MITH), and the Documenting the Now project, was awarded $100,000 by IMLS to host a national forum to address ethical issues facing the web archiving field. The forum will take place March 22-24, 2018 at our longtime affiliate and host, the New Museum in New York City.

This National Forum will convene archives professionals, artists, activists, net culture critics, journalists, and designers/developers to explore how to build social media archives that protect the rights of users and communities while chronicling contemporary cultures and social movements. An open call for participants and attendees will be announced in October.

In 2015, Rhizome launched the Webrecorder initiative, a flagship project of its digital preservation program, to develop a new platform to easily archive and immediately reconstruct fully interactive copies of almost any modern webpage. Webrecorder is a powerful web archiving system, offered directly, for free to users of all kinds. Through Webrecorder, Rhizome aims to support decentralized, specialized born-digital archives that center the interests of the users and communities they serve.

Archiving social media has been a key concern of the Webrecorder initiative, and the National Forum builds on a successful series of ‘Digital Social Memory’ events which addressed the topic. Both iterations of DSM have brought together artists, activists, and archivists to talk about social media as cultural practice, and how it is and will be remembered. The conversations supported by this program directly inform ongoing product development.

Our partner, Documenting the Now, is a project of University of Maryland, University of California at Riverside, and Washington University in St. Louis. They have created a tool and community supporting the ethical collection, use, and preservation of social media content. Formed in response to the emergence of Twitter as a central communication channel during the 2014 protests in Ferguson, Mo., DocNow seeks to protect the rights of content creators while chronicling historically significant events.

The National Forum is organized by Michael Connor, Rhizome’s artistic director, Aria Dean, Rhizome’s assistant curator for net art and digital culture, Bergis Jules, University & Political Papers Archivist at UC Riverside and Community Lead, DocNow, and Ed Summers, Lead Developer at Maryland Institute for Technology in the Humanities and Technical Lead of DocNow.

The National Forum on Ethics and Archiving the Web was made possible by the Institute of Museum and Library Services and the John S. and James L. Knight Foundation. 

Tracking Changes With diffengine https://mith.umd.edu/tracking-changes-diffengine/ Wed, 25 Jan 2017 17:00:56 +0000 http://mith.umd.edu/?p=18210 Our most respected newspapers want their stories to be accurate because once the words are on paper, and the paper is in someone’s hands, there’s no changing them. The words are literally fixed in ink to the page, and mass produced into many copies that are pretty much impossible to recall. Reputations can rise and [...]

The post Tracking Changes With diffengine appeared first on Maryland Institute for Technology in the Humanities.

Our most respected newspapers want their stories to be accurate because once the words are on paper, and the paper is in someone’s hands, there’s no changing them. The words are literally fixed in ink to the page, and mass produced into many copies that are pretty much impossible to recall. Reputations can rise and fall based on how well newspapers are able to report significant events. But of course physical paper isn’t the whole story anymore.

News on the web can be edited quickly as new facts arrive, and more is learned. Typos can be quickly corrected–but content can also be modified for a multitude of purposes. Often these changes instantly render the previous version invisible. Many newspapers use their website as a place for their first drafts, which allows them to craft a story in near real time, while being the first to publish breaking news.

News travels fast in social media as it is shared and reshared across all kinds of networks of relationships. What if that initial, perhaps flawed version goes viral, and it is the only version you ever read? It’s not necessarily fake news, because there’s no explicit intent to mislead or deceive, but it may not be the best, most accurate news either. Wouldn’t it be useful to be able to watch how news stories shift in time to better understand how the news is produced? Or as Jeanine Finn memorably put it: how do we understand the news before truth gets its pants on?

As part of MITH’s participation in the Documenting the Now project we’ve been working on an experimental utility called diffengine to help track how news is changing. It relies on an old and quietly ubiquitous standard called RSS. RSS is a data format for syndicating content on the Web. In other words it’s an automated way of sharing what’s changing on your website, and for following what changes on someone else’s. News organizations use it heavily. When you listen to a podcast you’re using RSS. If you have a blog or write on Medium an RSS feed is quietly being generated for you whenever you write a new post.

So what diffengine does is really quite simple. First it subscribes to one or more RSS feeds, for example the Washington Post, and then it watches to see if any articles change their content over time. If a change is noticed a representation of the change, or a diff, is generated, the new version is archived at the Internet Archive, and the diff is (optionally) tweeted.
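The core of that loop, noticing that text has changed and producing a diff, can be sketched with Python's standard library alone. This is an illustration under stated assumptions and not diffengine's actual code: the `fingerprint` and `word_diff` helpers are hypothetical names, the comparison is word-based, and the two sentences are made-up article text.

```python
import difflib
import hashlib


def fingerprint(text):
    """Hash the text with whitespace normalized, to detect real changes."""
    normalized = " ".join(text.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def word_diff(old, new):
    """Return (op, words) pairs where op is 'equal', 'delete' or 'insert'."""
    old_words, new_words = old.split(), new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words)
    result = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            result.append(("equal", " ".join(old_words[i1:i2])))
            continue
        # 'replace' is reported as a deletion followed by an insertion.
        if i2 > i1:
            result.append(("delete", " ".join(old_words[i1:i2])))
        if j2 > j1:
            result.append(("insert", " ".join(new_words[j1:j2])))
    return result


# Invented before/after versions of a sentence from a news story.
old = "The mayor said the bridge will open in May."
new = "The mayor said the bridge will open in June."

if fingerprint(old) != fingerprint(new):
    for op, words in word_diff(old, new):
        marker = {"equal": " ", "delete": "-", "insert": "+"}[op]
        print(marker, words)
```

Hashing the normalized text first is a cheap way to skip the diff when nothing meaningful has changed, for example when only whitespace differs between two fetches.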

We’ve been experimenting with an initial version of diffengine by having it track the Washington Post, the Guardian and Breitbart News which you can see on the following Twitter accounts: wapo_diff, guardian_diff and breitbart_diff. Nick Ruest at York University and Ryan Baumann at Duke University have been setting up their own instances of diffengine to track what is now 25 media outlets, which you can see in this list that Ryan is maintaining.

So here’s an example of what a change looks like when it is tweeted:

The text highlighted in red has been deleted and the text highlighted in green has been added. But you can’t necessarily take diffengine’s word for it, right? Bots are sending all kinds of fraudulent and intentionally misleading information out on the web, especially in social media. So when diffengine notices new or changed content it uses Internet Archive’s Save Page Now functionality to take a snapshot of the page, which it then references in the tweet. So you can see the original and changed content in the most trusted public repository we have for archived web content. You can see the links to both the before and after versions in the tweet above.
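For readers curious what those archive links look like, here is a small Python sketch that builds the two kinds of Wayback Machine URL involved: one that asks for a snapshot to be taken, and one that points at a snapshot from a given moment. It only constructs strings and makes no network request. The function names and the example article URL are hypothetical, and this is not taken from diffengine's source.

```python
from datetime import datetime

SAVE_PREFIX = "https://web.archive.org/save/"
WAYBACK_PREFIX = "https://web.archive.org/web/"


def save_url(page_url):
    """URL that asks the Wayback Machine to capture page_url."""
    return SAVE_PREFIX + page_url


def snapshot_url(page_url, captured_at):
    """URL of the capture of page_url closest to the given datetime."""
    # Wayback timestamps are written as YYYYMMDDhhmmss.
    timestamp = captured_at.strftime("%Y%m%d%H%M%S")
    return f"{WAYBACK_PREFIX}{timestamp}/{page_url}"


# Hypothetical article URL and capture time, for illustration only.
article = "https://news.example.com/first"
print(save_url(article))
print(snapshot_url(article, datetime(2017, 1, 25, 17, 0, 56)))
```

A before/after pair of such snapshot links is what lets a reader check a tweeted diff against an independent copy of the page.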

diffengine draws heavily on the work and example of two similar projects: NYTDiff and NewsDiffs. NYTdiff is able to create presentable diff images and tweet them for the New York Times. But it was designed to work specifically with the NYTimes API. diffengine borrows the use of phantomjs for creating tweetable images. NewsDiffs on the other hand provides a comprehensive framework for watching changes on multiple news sites (Washington Post, New York Times, CNN, BBC, etc). But you need to be a programmer to add a parser module for a website that you want to monitor. It is also a fully functional web application which requires considerable commitment to set up and run.

With the help of feedparser diffengine takes a different approach by working with any site that publishes an RSS feed of changes. This covers many news organizations, but also personal blogs and organizational websites that put out regular updates. And with the readability module diffengine is able to automatically extract the primary content of pages, without requiring special parsing to remove boilerplate material on a site-by-site basis.
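To make the feed-driven approach concrete, here is a rough Python sketch of reading entries out of an RSS 2.0 document and picking out the ones not seen before. It deliberately swaps in the standard library's `xml.etree.ElementTree` for feedparser so that it runs without extra packages, and it skips content extraction entirely. The sample feed, the `parse_entries` and `new_entries` names, and the example.com URLs are all invented for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 feed standing in for a news site's feed.
SAMPLE_FEED = """
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First story</title>
      <link>https://news.example.com/first</link>
    </item>
    <item>
      <title>Second story</title>
      <link>https://news.example.com/second</link>
    </item>
  </channel>
</rss>
""".strip()


def parse_entries(feed_xml):
    """Return a list of {'title', 'link'} dicts, one per <item>."""
    root = ET.fromstring(feed_xml)
    return [
        {"title": item.findtext("title"), "link": item.findtext("link")}
        for item in root.iter("item")
    ]


def new_entries(entries, seen_links):
    """Keep only entries whose link has not been recorded yet."""
    return [entry for entry in entries if entry["link"] not in seen_links]


entries = parse_entries(SAMPLE_FEED)
seen = {"https://news.example.com/first"}
for entry in new_entries(entries, seen):
    print(entry["title"], entry["link"])
```

Because the feed itself lists what to watch, nothing here is specific to one publisher; pointing the same code at a different feed needs no site-specific parser.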

To do its work diffengine keeps a small database of feeds, feed entries and version histories that it uses to notice when content has changed. If you know your way around a SQLite database you can query it to see how content has changed over time. This database could be a valuable source of research data, or small data, for the study of media production, or the way organizations or people communicate online. One possible direction we are considering is creating a simple web frontend for this database that allows you to navigate the changed content without requiring SQL chops.
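As an illustration of the kind of question such a database can answer, the Python sketch below builds a tiny in-memory SQLite database and asks which articles have more than one recorded version. The `entry` and `version` tables, their columns, and the rows are assumptions made up for this example; diffengine's real schema may well differ, so the query would need adapting to the actual tables.

```python
import sqlite3

# Hypothetical schema: one row per article URL, one row per saved version.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (
    id INTEGER PRIMARY KEY,
    url TEXT UNIQUE
);
CREATE TABLE version (
    id INTEGER PRIMARY KEY,
    entry_id INTEGER REFERENCES entry(id),
    created TEXT,
    title TEXT
);
""")

# Invented rows: the first article changed its headline once.
conn.executemany(
    "INSERT INTO entry (id, url) VALUES (?, ?)",
    [
        (1, "https://news.example.com/first"),
        (2, "https://news.example.com/second"),
    ],
)
conn.executemany(
    "INSERT INTO version (entry_id, created, title) VALUES (?, ?, ?)",
    [
        (1, "2017-01-25T17:00:00", "Bridge to open in May"),
        (1, "2017-01-25T18:30:00", "Bridge to open in June"),
        (2, "2017-01-25T17:05:00", "Council meets"),
    ],
)

# Which articles have been revised, and how many versions does each have?
rows = conn.execute("""
SELECT entry.url, COUNT(version.id) AS versions
FROM entry
JOIN version ON version.entry_id = entry.id
GROUP BY entry.id
HAVING COUNT(version.id) > 1
ORDER BY versions DESC
""").fetchall()

for url, versions in rows:
    print(url, versions)
```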

Perhaps diffengine could also create its own private archive of the web content, rather than relying on a public snapshot at the Internet Archive. Keeping the archive private could help address ethical concerns around documenting particular individuals or communities when conducting research. If this sounds useful or interesting please get in touch with the Documenting the Now project, by joining our Slack channel or emailing us at info@docnow.io.

Installation of diffengine is currently a bit challenging if you aren’t already familiar with installing Python packages from the command line. If you are willing to give it a try let us know how it goes over on GitHub. Ideas for sites for us to monitor as we develop diffengine are also welcome!

Special thanks to Matthew Kirschenbaum and Gregory Jansen at the University of Maryland for the initial inspiration behind this idea of showing rather than telling what news is. The Human-Computer Interaction Lab at UMD hosted an informal workshop after the recent election to see what possible responses could be, and diffengine is one outcome from that brainstorming.

This page was originally published on the Documenting the Now blog.

The post Tracking Changes With diffengine appeared first on Maryland Institute for Technology in the Humanities.

A Decade of Digital Dialogues Event Recordings and the Challenges of Implementing a Retroactive Digital Asset Management Plan https://mith.umd.edu/decade-digital-dialogues-event-recordings-challenges-implementing-retroactive-digital-asset-management-plan/ Thu, 14 Jul 2016 20:39:00 +0000 http://mith.umd.edu/?p=17756 This is the 5th post in MITH's Digital Stewardship Series. In this post, MITH's summer intern David Durden discusses his work on MITH's audiovisual collection of historic Digital Dialogues events. I was brought on as a summer intern at MITH to work on a digital curation project involving Digital Dialogues, MITH’s signature events program featuring speakers from around [...]

The post A Decade of Digital Dialogues Event Recordings and the Challenges of Implementing a Retroactive Digital Asset Management Plan appeared first on Maryland Institute for Technology in the Humanities.

This is the 5th post in MITH’s Digital Stewardship Series. In this post, MITH’s summer intern David Durden discusses his work on MITH’s audiovisual collection of historic Digital Dialogues events.

I was brought on as a summer intern at MITH to work on a digital curation project involving Digital Dialogues, MITH’s signature events program featuring speakers from around the U.S., and occasionally beyond, which has been running for eleven years. The Digital Dialogues events program has documented the development of the digital humanities as well as the ideas and work of several of the pioneers of the field. However, as the digital humanities grew and developed, so did the technology used to record and edit the Digital Dialogues. This digital record must be curated and preserved in order to ensure that the Digital Dialogues events are accessible for many years to come.

Staying current with changes in digital audio and video recording and editing resulted in a variety of media sources, file types, storage locations, and web-hosting services. MITH currently has a workflow for recent and future Digital Dialogues that ensures proper storage of raw video, systematized file-naming conventions, standards for video editing and the creation of web content, and redundant storage. This plan, in some form, must be retroactively applied to almost a decade of content.

Since content was spread across a variety of locations, the first task at hand was to consolidate media from all storage locations and resolve discrepancies and duplications. This meant aggregating all available content from an editing workstation, an external drive, an AWS server, and a local server. Once all the content was funneled into a single location, I began the slow and tedious process of comparing files and folders. I was able to separate usable media from everything else and began moving content into a well-organized master directory that will be cloned into redundant storage for preservation. Future workflows will prevent discrepancies by importing, naming, organizing, and editing content on the local workstation and then copying it to external storage, which prevents duplication or accidental changes to archived content.
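Part of this comparison can be automated. The following is a minimal sketch, not the process I actually used, that groups files with identical contents by their SHA-256 checksum; the function names are hypothetical. It only catches byte-for-byte duplicates, so re-encoded or trimmed versions of the same recording would still need a manual check.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(roots):
    """Map each checksum to the files sharing it, keeping only checksums seen more than once."""
    by_digest = defaultdict(list)
    for root in roots:
        # Walk each storage location recursively, in a stable order.
        for path in sorted(Path(root).rglob("*")):
            if path.is_file():
                by_digest[file_digest(path)].append(path)
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}
```

Calling `find_duplicates` with the folders copied from each storage location returns groups of identical files, which can then be reviewed before anything is deleted or moved into the master directory.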

An example of the future data flow for Digital Dialogues videos


MITH had been successfully saving multiple copies of files across different storage devices, but many of these files reflected outdated workflows and there were often several versions of the same file. The recording of Digital Dialogues went through several technological evolutions and left behind a messy file structure. Some source files were saved, others are missing. Some final product videos and recordings were duplicated across local storage devices, others exist solely in the Internet Archive and other web-hosting services. MITH’s early Digital Dialogues provide an example of the danger inherent in relying on a single storage location or web-hosting service to archive digital assets. The file compression used by many services, as well as the possibility of service interruption, makes web-hosting a ‘front-end access-only’ form of digital storage. The important thing to emphasize here is that once digital source media is lost, it is usually lost forever, which is why it is always recommended to have a data management plan ready at the outset of any digital project.

Data storage isn’t the only challenge that the Digital Dialogues collection presents, as the collection has moved through different A/V editing workflows and standards. The Digital Dialogues transitioned from audio recording to video recording, as well as from using iMovie to Adobe Premiere to edit video, a transition that has left a considerable number of useless project files lingering about. The differences between the two video editing software suites are considerable and present several challenges to long-term functionality. Adobe Premiere and iMovie handle the import of source media very differently. Premiere doesn’t actually import the source media, but instead creates a link to the file using a system path, which results in project files that are only a few hundred kilobytes in size. iMovie, however, stores a copy of the original media as well as a variety of program-specific data, which greatly increases the size of the project folder. Additionally, Adobe Premiere allows for backwards compatibility to some degree, whereas iMovie does not, making Premiere a better choice for long-term functionality of project files.

The links that Adobe Premiere creates to source media are problematic because, if the source media changes location or filename, the links are effectively broken and media must be relocated before any editing can occur. However, as long as the source media is preserved and is identifiable, it is a simple task to point Premiere to the correct location of the source. To ensure MITH’s future access to working project files (which is important if a derivative is lost and needs to be regenerated, or video formatting needs to be updated for a website), I created a well-organized and descriptively named directory containing all project files and associated linked media. The current editing and curation plan involves each Digital Dialogue event being stored in a folder containing source media and the edited derivative. Before transferring any source media, an appropriate directory is created to store the files. Files are then transferred from an external storage device or camera to the video editing iMac workstation and stored in the appropriate event folder. The event folders are named using the following convention:

‘YYYYMMDD_SpeakerNameInCamelCase_AdditionalSpeakersSeparatedByUnderscores’.

Events are organized by season (e.g., Spring 2016) and stored in a season folder using the following convention:

‘YYYY-Season-Semester’.

All events for a season will be edited in a single Adobe Premiere project file that is located within the season folder. This reduces the number of project files to manage and also streamlines the video editing process.
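The event folder convention described above can be sketched in a few lines of Python. This is an illustration rather than a script MITH uses: the function names are hypothetical, and the way punctuation and spaces are dropped when camel-casing a name is my own assumption, since the convention itself only specifies the overall pattern.

```python
import re
from datetime import date


def camel_case(name):
    """Join the words of a name into CamelCase, dropping spaces and punctuation (assumed rule)."""
    words = re.findall(r"[^\W_]+", name)
    return "".join(word[0].upper() + word[1:] for word in words)


def event_folder_name(event_date, speakers):
    """Build 'YYYYMMDD_SpeakerName_AdditionalSpeaker...' for one event."""
    if not speakers:
        raise ValueError("at least one speaker is required")
    parts = [event_date.strftime("%Y%m%d")]
    parts.extend(camel_case(speaker) for speaker in speakers)
    return "_".join(parts)


# Hypothetical event and speaker names, for illustration only.
print(event_folder_name(date(2016, 4, 5), ["Jane Doe", "John Q. Public"]))
```

Generating names this way, rather than typing them by hand, is one way to keep folders consistent from season to season.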

Example of a well-organized Digital Dialogue season folder


Another part of this project consisted of editing previous content to conform to current standards. Because the existing files varied widely in format and included many duplicates, I decided to prioritize raw footage (or the highest quality derivative that I could discover) for archiving and the creation of new videos. Provided that usable media was accessible, videos currently on the MITH website are being updated to reflect proper MITH logos and branding, as well as title slates with appropriate attributions to speakers, dates and talk titles. There are also many years of Digital Dialogues recorded as audio, which are in the process of being exported to a standardized video format so that the majority of Digital Dialogues will be accessible to the user through one hosting service (Vimeo). At the end of the project, I will have created or recreated around 105 videos, streamlined and documented any changes to MITH’s audiovisual workflows, and ensured proper digital stewardship of an important collection of digital humanities scholarship. My second and final blog post in this series will highlight some of the more interesting content in this collection.

Call for Applications: MITH Summer Audiovisual Data Curation Intern https://mith.umd.edu/mith-summer-audiovisual-data-curation-intern/ Wed, 18 May 2016 09:30:27 +0000 http://mith.umd.edu/?p=17598 The Maryland Institute for Technology in the Humanities (MITH), University of Maryland’s digital humanities institute, is seeking a graduate student intern to assist with a data curation and stewardship project during the summer 2016 term to assist with the assessment, organization and curation of our collection of audiovisual recordings covering MITH’s speaker series and events. [...]

The post Call for Applications: MITH Summer Audiovisual Data Curation Intern appeared first on Maryland Institute for Technology in the Humanities.

The Maryland Institute for Technology in the Humanities (MITH), University of Maryland’s digital humanities institute, is seeking a graduate student intern for a data curation and stewardship project during the summer 2016 term. The intern will assist with the assessment, organization and curation of our collection of audiovisual recordings covering MITH’s speaker series and events. The intern must complete at least 120 hours of work over at least a six-week period between late May and early July 2016. Interns may receive academic credit (based on approval from their department) and MITH will offer a stipend of up to $1800.

Project Goals/Duties:

Under the supervision of MITH’s Project Manager, the graduate summer intern would perform an assessment of the current state of all of MITH’s audiovisual holdings related to event documentation. This includes (primarily) the Digital Dialogues series, MITH’s signature events program which features speakers from various scholarly disciplines discussing topics related to work in the digital humanities, as well as other MITH events such as the 2013 Personal Digital Archiving conference, the 2012 Topic Modeling conference, and more. The events have been recorded on either audio or video in a variety of formats. The intern will a) work with MITH staff to enact a data curation and stewardship strategy to streamline archival workflows for its audiovisual materials, b) perform migration and reformatting tasks to consolidate and normalize formats and metadata, and c) create a series of three blog posts highlighting selected content. Interns will also be encouraged to share insights about digital curation theory and practice generated by her/his work over the summer (see our 2016 Digital Humanities Stewardship series for example).

Qualifications:

This position is ideal for someone who wishes to expand her or his breadth of experience in dealing with the stewardship and curation of a variety of digital audiovisual materials, and who has an interest or knowledge in the field of digital humanities. The ideal candidate should be pursuing a graduate degree in library or archival science with a specialization or dedicated scholarly interest in data curation, digital preservation, audiovisual archiving and preservation, or similar. Special consideration will be given to candidates with coursework or field work in these areas, or in audiovisual production/editing or archiving/preservation.

About MITH:

MITH is a leading digital humanities center that pursues disciplinary innovation and institutional transformation through applied research, public programming, and educational opportunities. Jointly supported by the University of Maryland College of Arts and Humanities and the University of Maryland Libraries, MITH engages in collaborative, interdisciplinary work at the intersection of technology and humanistic inquiry. MITH specializes in text and image analytics for cultural heritage collections, data curation, digital preservation, linked data applications, and data publishing. Enabling the analysis of cultural heritage collections on a large scale, we create frameworks that allow us to develop new methods and tools for the exploration and visualization of digital materials. Our applied research and practice supports curation and publication of data that contributes to improved methodologies for the organization and stewardship of humanities research.

To Apply: Email cover letter, resume, and two references as a single PDF file to MITH Project Manager Stephanie Sapienza, at sapienza@umd.edu. Type “Application for Summer A/V Data Curation Intern – [Last Name]” in the subject line. For best consideration, apply on or before Wednesday June 1, 2016 at 5:00pm Eastern time. All applicants will be notified that their application was received. Selected applicants will be contacted for telephone and/or in-person interviews. Start and end dates and work days/hours are negotiable as candidates’ schedules require.

Documenting the Now Team Announced https://mith.umd.edu/documenting-now-team-announced/ Mon, 11 Apr 2016 23:18:25 +0000 http://mith.umd.edu/?p=17488 Back in February we announced MITH's involvement in the Documenting the Now project, which is now under way. In a nutshell, Documenting the Now is an effort to build an application called DocNow, that helps researchers and archivists collect Web content about current events using Twitter. The project is also about building a community and [...]

The post Documenting the Now Team Announced appeared first on Maryland Institute for Technology in the Humanities.

Back in February we announced MITH’s involvement in the Documenting the Now project, which is now under way. In a nutshell, Documenting the Now is an effort to build an application called DocNow that helps researchers and archivists collect Web content about current events using Twitter. The project is also about building a community and a conversation about what it means to ethically engage in the work of social media and Web archiving. We thought we would provide a quick update about our recent work, and where you can go to learn more.

Since the project is a partnership between MITH, Washington University in St Louis and the University of California at Riverside, we have established a project site on Medium where all team members can share information about their work, and where we can get feedback from others who are interested in the project. While you can expect to see occasional updates about Documenting the Now here on the MITH blog, please follow us there if you are interested in seeing all the developments as they happen.

Speaking of the team, we recently announced the initial group of staff and contractors who will be helping on the project. Washington University hired Desiree Jones-Smith, who started in April as the Project Coordinator for Documenting the Now. One exciting thing that Desiree will be focused on is planning our face-to-face event in St Louis, where the core team, advisory board members, and others interested in the project will gather to explore the design and ethical issues in the DocNow application. Look out here and on our project website to hear more about that event in the coming weeks.

Also joining the team are three contractors: Alexandra Dolan-Mescal, Francis Kayiwa and Dan Chudnov. Alexandra is gathering requirements and designing the user experience of the DocNow application. Francis is designing the backend infrastructure with an eye towards containerization and cloud deployment. Dan is architecting and implementing the data analysis pipeline and visualization pieces that form a foundation of the application. If you are interested in tracking this work you can follow us over on GitHub where we will be coordinating the software development.

The Documenting the Now team is spread across the country, so we needed to find a virtual environment where we could share day-to-day information about the work. Of course we all use email, but having a place to share documents, have conversations and see relevant news in a shared space was important to us. And of course a big part of the project is having an open conversation about the technical and ethical considerations of social media archiving. That’s why we’ve started using a team on Slack. The Documenting the Now Slack is also open to people who are interested in the project and the work of social media archiving in general. To join Slack you need to use this form to submit a request. We hope to see you there!

And if that isn’t enough, you can subscribe to the DocNow Newsletter over at docnow.io.
