Ed Summers – Maryland Institute for Technology in the Humanities
https://mith.umd.edu

The Cleaners: Movie Night (Oct 30)
https://mith.umd.edu/the-cleaners-movie-night-oct-30/
Mon, 07 Oct 2019

The Cleaners (2018)

Please join us in MITH on October 30, 2019 (All Hallows’ Eve Eve) from 6-8pm for a screening of The Cleaners, a documentary which provides an in-depth look at the hidden labor of content moderation that makes today’s social media platforms possible. Once the dream of Silicon Valley tech startups, the democratization of web publishing has brought huge challenges to the mega-corporations that run today’s social media platforms, as they struggle to prevent the viral spread of online hate, violence and abuse.

Key to these moderation systems are large numbers of human moderators, who interpret community guidelines, and sometimes clandestine content rules, in order to decide what content will remain online. As Sarah Roberts details in her book Behind the Screen (a recent Digital Studies Colloquium pick), commercial content moderators work behind the scenes, in remote locations and precarious working conditions, where they are often subjected to a barrage of unsettling material that can leave lasting psychological and social impacts.

A brief discussion will follow the screening. Popcorn and soda pop will be available, but feel free to bring some take-out or some pre-Halloween candy.

Data Histories and Natural History—Andrea Thomer
https://mith.umd.edu/data-histories-and-natural-history-andrea-thomer/
Tue, 23 Apr 2019

Please join us Wednesday, April 24, at 3:30pm at MITH (0301 Hornbake Library) for a presentation by Dr. Andrea Thomer, who is visiting from the University of Michigan iSchool and whose work on data histories has implications for cultural collections and humanities data across disciplines.

Natural historians create the frameworks, calendars and infrastructures that allow us to understand and grapple with “deep time” — but they do so within their own temporally complex scholarly settings: the infrastructures and data collections that house the specimens and datasets used in their analyses. Though natural history collections are meant to last for generations, the records they contain last only years (at best) without careful maintenance and curation. Digital collections are particularly fragile, prone to bit rot and obsolescence, and must consequently be upgraded and migrated frequently. In this talk, Thomer will consider the temporal rhythms of natural history data collections, their management, and their migration, and how those rhythms impact the creation and management of systems of understanding – and making – “deep time.”

Andrea Thomer is an assistant professor of information at the University of Michigan School of Information. She conducts research in the areas of data curation, museum informatics, earth science and biodiversity informatics, information organization, and computer supported cooperative work.  She is especially interested in how people use and create data and metadata standards; the impact of information organization on information use; issues of data provenance, reproducibility, and integration; and long-term data curation and infrastructure sustainability — on the scale of decades rather than years.  She is studying a number of these issues through the “Migrating Research Data Collections” project – a recently awarded Laura Bush 21st Century Librarianship Early Career Research Grant from the Institute of Museum and Library Services. Dr. Thomer received her doctorate in Library and Information Science from the School of Information Sciences at the University of Illinois at Urbana‐Champaign in 2017.

Measuring Impact of Digital Repositories – Simon Tanner
https://mith.umd.edu/measuring-impact-of-digital-repositories-simon-tanner/
Tue, 23 Apr 2019

Open, Collaborative Research: Developing the Balanced Value Impact Model to Assess the Impact of Digital Repositories
Thursday, April 25, 11 AM, MITH (0301 Hornbake Library)

Simon Tanner will offer a sneak peek at the Balanced Value Impact Model 2.0 (BVI Model). Tanner will introduce the Department of Digital Humanities at King’s College London and link this to his open and collaborative research practices to tell the story of the intellectual development of the BVI Model. He will detail the BVI Model 2.0 to highlight what’s new and how it works. Tanner will relate these changes to his collaboration with Europeana to develop their Impact Playbook and look to the future of that tool.

The session will include time for questions and discussion.

Simon Tanner is Professor of Digital Cultural Heritage in the Department of Digital Humanities at King’s College London. He is a Digital Humanities scholar with a wide-ranging interest in cross-disciplinary thinking and collaborative approaches, reflecting a fascination with the interactions between the collections of memory organizations (libraries, museums, archives, media and publishing) and the digital domain.

As an information professional, consultant, digitization expert and academic he works with major cultural institutions across the world to assist them in transforming their impact, collections and online presence. He has consulted for or managed over 500 digital projects, including digitization of the Dead Sea Scrolls, and has built strategy with a wide range of organizations. These include the US National Gallery of Art and many other museums and national libraries in Europe, Africa, America and the Middle East. Tanner has had work commissioned by UNESCO, the Danish government, the Arcadia Fund and the Andrew W. Mellon Foundation.  He founded the Digital Futures Academy that has run in the UK, Australia, South Africa and Ghana with participants from over 40 countries.

Tanner’s research into image use and sales in American art museums has had a significant effect on opening up collection access and OpenGLAM in the museum sector. He is a strong advocate for Open Access, open research and the digital humanities. He was chair of the Web Archiving sub-committee as an independent member of the UK Government-appointed Legal Deposit Advisory Panel. He is a member of the Europeana Impact Taskforce, which developed the Impact Playbook based upon his Balanced Value Impact Model, and is part of the AHRC-funded Academic Book of the Future research team.

Old Futures Book Launch—Alexis Lothian
https://mith.umd.edu/old-futures-book-launch-alexis-lothian/
Tue, 23 Apr 2019

Please join us on Monday, April 29 at 4pm in MITH for a book launch and discussion of Alexis Lothian’s new book Old Futures: Speculative Fiction and Queer Possibility, out now from NYU Press. Lothian will talk about her book in conversation with Amanda Phillips, Assistant Professor in the Department of English and Film and Media Studies at Georgetown University.

From the dust jacket:

Old Futures explores the social, political, and cultural forces feminists, queer people, and people of color invoke when they dream up alternative futures as a way to imagine transforming the present. Lothian shows how queer possibilities emerge when we practice the art of speculation: of imagining things otherwise than they are and creating stories from that impulse. Queer theory offers creative ways to think about time, breaking with straight and narrow paths toward the future laid out for the reproductive family, the law-abiding citizen, and the believer in markets. Yet so far it has rarely considered the possibility that, instead of a queer present reshaping the ways we relate to past and future, the futures imagined in the past can lead us to queer the present.

Narratives of possible futures provide frameworks through which we understand our present, but the discourse of “the” future has never been a singular one. Imagined futures have often been central to the creation and maintenance of imperial domination and technological modernity; Old Futures offers a counterhistory of works that have sought––with varying degrees of success––to speculate otherwise. Examining speculative texts from the 1890s to the 2010s, from Samuel R. Delany to Sense8, Lothian considers the ways in which early feminist utopias and dystopias, Afrofuturist fiction, and queer science fiction media have insisted that the future can and must deviate from dominant narratives of global annihilation or highly restrictive hopes for redemption.

Each chapter chronicles some of the means by which the production and destruction of futures both real and imagined takes place: through eugenics, utopia, empire, fascism, dystopia, race, capitalism, femininity, masculinity, and many kinds of queerness, reproduction, and sex. Gathering stories of and by populations who have been marked as futureless or left out by dominant imaginaries, Lothian offers new insights into what we can learn from imaginatively redistributing the future now.

Documenting the Now Phase 2
https://mith.umd.edu/documenting-the-now-phase-2/
Tue, 16 Oct 2018

With a $1.2 million grant from The Andrew W. Mellon Foundation, the Maryland Institute for Technology in the Humanities in the College of Arts and Humanities at the University of Maryland, Shift, and the Department of Media Studies at the University of Virginia (UVA) will collaborate to lead the ongoing work of the Documenting the Now project. Started in 2014 with a grant to Washington University in St. Louis in partnership with the University of California, Riverside and MITH, Documenting the Now is committed to developing tools and community practices that support the ethical collection, use, and preservation of social media and web archives. Continuing the important work the project has accomplished over the past four years, the second phase of Documenting the Now will focus on three interdependent strands of activity: software development, pedagogy, and engagement with community-based archiving of social justice activism.

Leading this second phase of Documenting the Now will be Trevor Muñoz, Interim Director of MITH and Assistant Dean for Digital Humanities Research at UMD, who will serve as the Principal Investigator and the Administrative Lead; Bergis Jules, Director of Equity Initiatives at Shift Design Inc., who will serve as a Co-Principal Investigator and the Project Director; Dr. Meredith Clark, Assistant Professor in the Department of Media Studies at UVA, who will serve as a Co-Principal Investigator and Academic Lead; and Ed Summers, Lead Software Developer at MITH, who will be the project’s Technical Lead.

During this phase of the project, our technical work, led by Summers with support from Alexandra Dolan-Mescal, Francis Kayiwa, and Dr. Raffaele Viglianti, will focus on continuing to develop, test, and deploy the software utilities built during phase one. These tools include DocNow, the Tweet ID Dataset Catalog, Hydrator, and twarc. A main focus of the software work in this phase will be human-centered design approaches that privilege interaction between content creators and the users of our tools who are interested in collecting social media data as archival content.
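
To give a concrete sense of how these pieces fit together, here is a minimal sketch (not project code) of the kind of work the Tweet ID Dataset Catalog and Hydrator support: turning a shared file of tweet IDs back into full Twitter JSON with twarc’s Python API. The file names and API credentials are placeholders you would supply yourself.

```python
import json

from twarc import Twarc

# Placeholder credentials: twarc needs Twitter API keys, which are normally
# set up interactively with `twarc configure`.
t = Twarc("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# A Tweet ID dataset is just a text file of numeric tweet IDs, one per line.
with open("tweet-ids.txt") as ids, open("tweets.jsonl", "w") as out:
    # hydrate() looks the IDs up against Twitter's API and yields each tweet
    # that still exists as a parsed JSON dict; deleted tweets simply drop out.
    for tweet in t.hydrate(ids):
        out.write(json.dumps(tweet) + "\n")
```

This is why tweet datasets are typically shared as IDs in the first place: Twitter’s developer terms restrict redistributing full tweet JSON, and re-hydrating respects a user’s later decision to delete a tweet.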

One example of the project’s commitment to human-centered design is Social Humans. Created by Dolan-Mescal, UX and Web Designer for Documenting the Now, Social Humans is a set of data labels designed to empower content creators and inform researchers about user intent. In addition to continuing to develop software and to foster a community of practice around social media and web archiving that is grounded in an ethics of care for the histories of oppressed people, the next phase will also see the project team engage in pedagogical activities around social media and race, with the exciting addition of Dr. Meredith Clark as a Co-Principal Investigator. Dr. Clark is a former newspaper journalist whose research focuses on the intersections of race, media, and power. Her work on the project will include the development of academic courses, including a series of experiential learning tasks and assignments using DocNow tools and support. The project team is excited she agreed to join this phase of the effort.

Phase two will also include work on archiving activism history through a set of community-based archiving workshops. The goal of the program will be to build digital community-based archives in direct partnership with social justice activist organizations. Local activists are usually the people closest to the issues negatively impacting a community, and they are most frequently on the front lines, agitating for support and offering the most effective solutions, whether they are addressing police violence, inadequate educational opportunities, food scarcity, mass incarceration, or racial injustice. The Documenting the Now project is interested in exploring how we might build digital community-based archives from the perspectives of local activists and in equitable partnership with them. The archives will be built on Mukurtu CMS, and we’re excited to work with that team because of their commitment to community control of local cultural heritage. Activist groups will be selected to participate in the program through an open application process. We will be sharing more information about the workshops and the application process soon, including incentives for the activist organizations, the workshop team, and the structure of the program. Stay tuned to the Documenting the Now Twitter and blog, or join our Slack for more information.

MITH, along with our partners, is extremely grateful for the support from The Andrew W. Mellon Foundation for Documenting the Now, and for the Foundation’s continued support of cultural heritage work that is intentionally community centered and grounded in an ethic of care for the lived experiences of the most vulnerable people in our society. We are particularly excited for the opportunity that continued support provides for enacting our strategic values, in combination with the Foundation’s support for African American History, Culture and the Digital Humanities (AADHum).

The Maryland Institute for Technology in the Humanities (MITH) is a leading digital humanities center that pursues disciplinary innovation and institutional transformation through applied research, public programming, and educational opportunities. Jointly supported by the University of Maryland College of Arts and Humanities and the University of Maryland Libraries, MITH engages in collaborative, interdisciplinary work at the intersection of technology and humanistic inquiry.

Shift Design, Inc. is a US 501(c)(3) non-profit corporation that was established with a specific focus on designing products for social change. Much of our work to date has focused on building an inclusive record of our shared cultural heritage, including projects like Historypin and Storybox.

The Department of Media Studies at the University of Virginia began in Fall 2000 as an interdisciplinary undergraduate major in the College of Arts and Sciences. The department is historical and critical in orientation and takes media as its object of study. The department focuses on the forms, institutions, and effects of media (radio, film, television, photography, print, digital and electronic media), with particular emphasis on the mass media of the modern and contemporary period.

Little Big Data
https://mith.umd.edu/little-big-data/
Fri, 03 Aug 2018

This past spring Purdom Lindblad and I had the opportunity to participate in several praxis-oriented sessions involving social media data collection and analysis for Matt Kirschenbaum’s Introduction to Digital Studies (MITH 610). We thought that some of the details of how we went about doing this work could be interesting to share with a wider audience, and we also wanted to begin a short series of posts showcasing the work that some students generated during the class.

MITH 610 introduces students to current topics and critical issues in the field of Digital Studies. MITH itself functions not just as a space for the class, but also as a laboratory for experimenting with digital methods, and getting acquainted with people on campus (and in the DC area) who are doing work in the digital humanities.

For example, this past spring MITH 610 was broken up into three modules: Reimagining the Archive, Media Archaeology, and Data Stories. In the Data Stories module we worked with students to understand how social media APIs operate, and explored how to do data collection and documentation while being guided by the principles of Advocacy by Design. Advocacy by Design centers ethical questions about why we are interested in pursuing particular sets of research questions, in order to better understand how we carry out the research, interpret our findings, and speculate about the possible futures they entail. These conversations compel us to ask how people are represented in, or are subjects of, academic work. Who reads and uses our work? Who collaborates and contributes to our work? Providing a welcoming and collaborative space for asking these questions is a central part of MITH’s vision for digital studies at UMD, which you can also see reflected in its core values.

One somewhat mundane, but nevertheless significant, challenge we often face when working as a group with different technologies is what we call The Laptop Problem. Fortunately, students come to class with a computer of some kind. It’s almost a given, especially in a field like digital studies. On the plus side this means that students arrive to class already equipped with the tools of the trade, and we don’t need to manage an actual set of machines for them to use. On the down side, however, everyone comes with a slightly different machine and/or operating system, which can make it very difficult to craft a single set of comprehensive instructions. Much time can be lost simply getting everyone set up to begin the actual work.

We were also stymied by another problem. In introducing social media data collection we wanted to go where the Digital Humanities generally (and wisely) fears to tread: The Command Line. In the previous Media Archaeology module, students examined and experimented with MITH’s Vintage Computing collection, which involved working directly with older hardware and software interfaces, and reflecting on the affordances that they offer. If you are curious about what this involved, a short Twitter thread by Caitlin Christian-Lamb describes (with some great pictures) some of her work in this module.

We thought it would be compelling to introduce social media data collection by using the command line interface, as an example of a (relatively) ancient computer interface that continues to be heavily used even today, particularly in Cloud environments. But because of The Laptop Problem we weren’t guaranteed everyone would have the same command line available to them, or that they would even have access to it. One way of solving The Laptop Problem is to provide access to a shared virtual environment of some kind where software is already installed. This is when we ran across Google Cloud Shell.

Since the University of Maryland uses Google’s GSuite for Education for email and other services, students are (for better or worse) guaranteed to have (at least one) Google account. As part of Google Cloud, any account holder can go to https://console.cloud.google.com/cloudshell, which automatically launches a virtual machine in the cloud and gives you a terminal window directly in your browser for interacting with it. It is a real Debian Linux operating system, which can be used without having to install any software at all.

We developed a short exercise that walked students through how to launch Google Cloud Shell, get comfortable with a few commands, install the twarc utility, and use it to collect some Twitter data directly from Twitter’s API. twarc has been developed as part of MITH’s involvement in the Documenting the Now project, and it allowed students to collect Twitter data matching a query of their choosing, store it in the native JSON format that Twitter itself makes available, and download it for further analysis.
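
For readers who want to try something comparable outside Cloud Shell, here is a rough sketch of that collection step using twarc’s Python API instead of the command line the students used. The query and the API credentials are placeholders, and note that Twitter’s standard search API only reaches back about a week.

```python
import json

from twarc import Twarc

# Placeholder keys: in the exercise these were configured interactively
# with `twarc configure` after registering a Twitter application.
t = Twarc("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")

# Collect tweets matching a query of your choosing and write each one, in
# Twitter's native JSON, as a line of a .jsonl file for later analysis --
# the same shape of data the students downloaded from Cloud Shell.
with open("tweets.jsonl", "w") as out:
    for tweet in t.search("#DigitalStudies"):
        out.write(json.dumps(tweet) + "\n")
```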

Describing all the intricate details of this data flow was well beyond the scope of the class. But it did present an opportunity to demystify how Application Programming Interfaces (APIs) take shape on the web, and to describe how these services make structured data available, and to whom. Matt likes to refer to this experience as Little Big Data. To bookend the exercise, students wrote about what they chose to collect and why, and reflected on what the collected data, and the experience of collecting it, said to them, in the shape of a short data story. Look for a few of these stories in subsequent posts here on the MITH blog.

Monitoring Climate Data on the Web
https://mith.umd.edu/monitoring-climate-data-on-the-web/
Thu, 03 May 2018

Ray Cha
Software Project Manager
Environmental Data and Governance Initiative


MITH Conference Room
11:00 am – 12:00 pm, 2:00 – 3:00 pm

Please join us on May 10 from 11:00 am – 12:00 pm in MITH for a presentation by Ray Cha, who is helping direct the efforts of the Environmental Data and Governance Initiative (EDGI) to monitor federal websites that provide access to information about climate change, the environment, and energy. The presentation and discussion will be followed in the afternoon (2 – 3 pm) with an informal demonstration to get a more hands on understanding of how EDGI’s volunteers work. Participants are welcome to attend either (or both) sessions.

EDGI is an international network of academics and non-profits addressing potential threats to federal environmental and energy policy, and to the scientific research infrastructure built to investigate, inform, and enforce them. Dismantling this infrastructure—which ranges from databases to satellites to models for climate, air, and water—could imperil the public’s right to know, the United States’ standing as a scientific leader, corporate accountability, and environmental protection.

EDGI is monitoring changes to thousands of federal environmental agency webpages to document and analyze the way environmental data either disappears or otherwise changes, sometimes in subtle but significant ways.

More about EDGI’s work monitoring federal government websites can be found in their recent report, Changing the Digital Climate: How Climate Change Web Content is Being Censored Under the Trump Administration.

Ethics and Archiving the Web
https://mith.umd.edu/ethics-and-archiving-the-web/
Mon, 19 Feb 2018

MITH is very excited to announce our participation in the Ethics and Archiving the Web National Forum, which will be taking place at the New Museum in New York City, March 22-24. This collaboration between Rhizome and the Documenting the Now project will bring together activists, librarians, journalists, archivists, scholars, developers, and designers who are interested in generative conversations around the ethical use of the web in archives and memory work. If this sounds relevant to you, please register today while spots are still available. In addition to the program of panels and talks there will also be a series of workshops on the Saturday following the main event. Continue below the fold for a bit more context on why this event is important to MITH’s work here at UMD.

For the past two years our work with our partners on Documenting the Now has deepened MITH’s longstanding interest in how archives are assembled and studied as an integral part of digital humanities research. Much of MITH’s previous attention in this area has focused on the construction of archives on the web, or rather, using the web as a means of publishing for, and engaging with, particular audiences of humanities scholars. As part of our efforts to help document the Ferguson Protests, the Baltimore Uprising, and the Black Lives Matter movement, we have been drawn into conversations about how to build archives of the web, specifically of social media content such as Twitter. This engagement has led us directly into conversations about the positionality of archival work, and how ethics and our own values get built into collections and applications.

Thanks to the efforts of Bergis Jules and Vernon Mitchell (the project’s two co-PIs) we have had the opportunity to engage with and learn from activists in Ferguson on several occasions. These activists described how they used social media as part of their work in Ferguson, and how social media records fit into their lived experience, not just as protestors, but as citizens and people. Most importantly, these activists, along with an assembled group of scholars, helped us think together about what it means to do memory work as activists, archivists and social media researchers. It is simply not good enough for our project to document the events in Ferguson without engaging with and giving back to the communities we are documenting. While methods such as participant observation and action research are helpful guides, there is still much work to be done in applying them as humanists and archivists to communities on the web.

The web has often been thought of as a shared public space, or as Lawrence Lessig described it in 1999, a commons:

The internet is a commons: the space that anyone can enter, and take what she finds without the permission of a librarian, or a promise to pay. The net is built on a commons — the code of the World Wide Web, HTML, is a computer language that lays itself open for anyone to see — to see, and to steal, and to use as one wants. If you like a web page, then all major browsers permit you to reveal its source, download it, and change it as you wish. It’s out there for the taking; and what you take leaves as much for me as there was before.

It is astonishing how much has changed in how we think about the web since Lessig wrote those words almost 20 years ago. Far from being simply a commons that we can all take from equally, the web is now an unevenly distributed sociotechnical space, and an essential part of contemporary life. Web content exists along continuums of access and privilege, instead of in a binary, public/private state. Social media platforms are perfect examples of how communities can form in pockets of the web. These communities aren’t simply part of a public commons or locked up in corporate walled gardens. We identified a real, concrete need for more conversation and shared practices around how to work as scholars and archivists in an ethical, participatory way, while respecting the agency of the web communities we are attempting to remember.

With this goal in mind we invite you to join us in New York City at the Ethics and Archiving the Web forum. While the program is fixed, there are some spots available during the day-long workshops if you would like to share your own work or projects with us. We hope to see you there!

Please get in touch with Ed Summers at MITH with any questions about the Documenting the Now project, or MITH’s involvement in the forum.

Return of the Digital Dialogues Podcast
https://mith.umd.edu/return-digital-dialogues-podcast/
Mon, 03 Apr 2017

Since 2005 MITH’s Digital Dialogues series has served as our signature events program, where we invite members of the digital humanities community to join us to talk about their work. From the beginning it was important for these discussions to serve not just our local community here at the University of Maryland, but also to be available to a growing number of scholars interested in the humanities and digital media. As David Durden discussed previously in his posts about curating this collection, Digital Dialogues originally started as an audio podcast and later migrated to a video format, which we now make available on Vimeo.

A couple of years ago our friend Raymond Yee tweeted to ask us whether Digital Dialogues was available as a podcast.

While we do have an RSS feed for the MITH website, we don’t actually have a dedicated podcast feed for Digital Dialogues. Unfortunately a podcast isn’t something that Vimeo offers; sadly, it’s often not in the interests of social media companies to let you leave their websites and apps to view content. Nevertheless, Raymond is among the 21% of Americans who still actively listen to podcasts, and according to Edison Research the numbers are growing.

So, while we’re not getting rid of the video channel, we decided to bring back the Digital Dialogues podcast:

http://mith.umd.edu/digital-dialogues/podcast/

Drop that URL into your podcast player, or head on over to Apple’s podcast directory to subscribe.

Registering the podcast with Apple had the handy side effect of pushing it out into the wider podcast ecosystem, so podcast players like Overcast should be able to find it.

We considered modifying our WordPress site to add the video enclosure to our existing RSS feed, but decided instead to make the podcast part of our already existing workflow. Calling it a workflow is really another way of saying that a program runs from cron every day looking for new Digital Dialogue events that have an embedded Vimeo video; if a new one is found, the video is downloaded (with youtube-dl), the audio is extracted (with ffmpeg), and the result is published to Amazon S3 (with boto). You can see this program over in the mithcast repository on GitHub if you are interested.
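
As a rough illustration only (the actual code lives in the mithcast repository mentioned above), the heart of that daily job might look something like the sketch below, assuming youtube-dl, ffmpeg, and boto3 (a newer cousin of the boto library used in the real workflow) are available. The URL, file names, and bucket are placeholders.

```python
import subprocess

import boto3
import youtube_dl


def publish_episode(vimeo_url, slug, bucket="example-podcast-bucket"):
    """Download a Vimeo video, extract its audio, and push the MP3 to S3.

    A simplified sketch of the workflow described above, not the mithcast
    code itself; the bucket name and file layout are invented for the example.
    """
    video_file = f"{slug}.mp4"
    audio_file = f"{slug}.mp3"

    # 1. Download the video with youtube-dl.
    with youtube_dl.YoutubeDL({"outtmpl": video_file, "format": "mp4"}) as ydl:
        ydl.download([vimeo_url])

    # 2. Extract the audio track with ffmpeg (-vn drops the video stream).
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_file, "-vn", "-acodec", "libmp3lame", audio_file],
        check=True,
    )

    # 3. Upload the MP3 to Amazon S3, where the podcast feed can point at it.
    boto3.client("s3").upload_file(audio_file, bucket, f"episodes/{audio_file}")
```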

We do hope you enjoy the Digital Dialogues on your commute, on your jog, or wherever they may find you. We’re just sorry it took two years, Raymond!

Tracking Changes With diffengine
https://mith.umd.edu/tracking-changes-diffengine/
Wed, 25 Jan 2017

Our most respected newspapers want their stories to be accurate because once the words are on paper, and the paper is in someone’s hands, there’s no changing them. The words are literally fixed in ink to the page, and mass produced into many copies that are pretty much impossible to recall. Reputations can rise and fall based on how well newspapers are able to report significant events. But of course physical paper isn’t the whole story anymore.

News on the web can be edited quickly as new facts arrive, and more is learned. Typos can be quickly corrected–but content can also be modified for a multitude of purposes. Often these changes instantly render the previous version invisible. Many newspapers use their website as a place for their first drafts, which allows them to craft a story in near real time, while being the first to publish breaking news.

News travels fast in social media as it is shared and reshared across all kinds of networks of relationships. What if that initial, perhaps flawed version goes viral, and it is the only version you ever read? It’s not necessarily fake news, because there’s no explicit intent to mislead or deceive, but it may not be the best, most accurate news either. Wouldn’t it be useful to be able to watch how news stories shift in time to better understand how the news is produced? Or as Jeanine Finn memorably put it: how do we understand the news before truth gets its pants on?

As part of MITH’s participation in the Documenting the Now project we’ve been working on an experimental utility called diffengine to help track how news is changing. It relies on an old and quietly ubiquitous standard called RSS. RSS is a data format for syndicating content on the Web. In other words, it’s an automated way of sharing what’s changing on your website, and of following what changes on someone else’s. News organizations use it heavily. When you listen to a podcast you’re using RSS. If you have a blog or write on Medium, an RSS feed is quietly being generated for you whenever you write a new post.

So what diffengine does is really quite simple. First it subscribes to one or more RSS feeds, for example the Washington Post’s, and then it watches to see if any articles change their content over time. If a change is noticed, a representation of the change, or diff, is generated; the new version is archived at the Internet Archive; and the diff is (optionally) tweeted.
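
Stripped of the database, the tweeting, and the archiving, the core of that loop can be sketched in a few lines with feedparser, readability-lxml, and Python’s difflib. This is an illustration of the idea, not diffengine’s actual implementation, and the feed URL is a stand-in.

```python
import difflib

import feedparser
import requests
from readability import Document  # readability-lxml

# diffengine keeps its history in a database; a dict stands in for one here.
previous = {}


def check_feed(feed_url="https://example.com/feed.xml"):
    """Print a diff for any feed entry whose main text changed since last run."""
    for entry in feedparser.parse(feed_url).entries:
        html = requests.get(entry.link).text
        text = Document(html).summary()  # main article content, as HTML
        old = previous.get(entry.link)
        if old is not None and old != text:
            diff = difflib.unified_diff(
                old.splitlines(), text.splitlines(), lineterm=""
            )
            print(f"Changed: {entry.title}")
            print("\n".join(diff))
        previous[entry.link] = text
```

Run repeatedly on a schedule, something this simple is enough to notice edits; everything else in diffengine is presentation and preservation.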

We’ve been experimenting with an initial version of diffengine by having it track the Washington Post, the Guardian, and Breitbart News, which you can see on the following Twitter accounts: wapo_diff, guardian_diff and breitbart_diff. Nick Ruest at York University and Ryan Baumann at Duke University have been setting up their own instances of diffengine to track what is now 25 media outlets, which you can see in this list that Ryan is maintaining.

So here’s an example of what a change looks like when it is tweeted. In the diff image attached to the tweet, text highlighted in red has been deleted and text highlighted in green has been added. But you can’t necessarily take diffengine’s word for it, right? Bots are sending all kinds of fraudulent and intentionally misleading information out on the web, especially in social media. So when diffengine notices new or changed content it uses the Internet Archive’s Save Page Now functionality to take a snapshot of the page, which it then references in the tweet. That way you can see the original and changed content in the most trusted public repository we have for archived web content, with links to both the before and after versions included in the tweet.
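
The snapshot step itself is small. A minimal sketch, assuming the Save Page Now endpoint still behaves as it did at the time of writing (a plain GET to https://web.archive.org/save/ plus the URL, with the new snapshot’s path reported in the Content-Location response header), might look like this:

```python
import requests


def archive(url):
    """Ask the Internet Archive's Save Page Now service to snapshot a URL.

    The Content-Location header has historically carried the path of the new
    snapshot; treat that as an assumption to verify, not a guarantee.
    """
    resp = requests.get("https://web.archive.org/save/" + url)
    resp.raise_for_status()
    return "https://web.archive.org" + resp.headers.get("Content-Location", "")
```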

diffengine draws heavily on the work and example of two similar projects: NYTDiff and NewsDiffs. NYTDiff is able to create presentable diff images and tweet them for the New York Times, but it was designed to work specifically with the NYTimes API; diffengine borrows its use of phantomjs for creating tweetable images. NewsDiffs, on the other hand, provides a comprehensive framework for watching changes on multiple news sites (Washington Post, New York Times, CNN, BBC, etc.), but you need to be a programmer to add a parser module for a website you want to monitor, and it is a fully functional web application that requires considerable commitment to set up and run.

With the help of feedparser diffengine takes a different approach by working with any site that publishes an RSS feed of changes. This covers many news organizations, but also personal blogs and organizational websites that put out regular updates. And with the readability module diffengine is able to automatically extract the primary content of pages, without requiring special parsing to remove boilerplate material on a site-by-site basis.

To do its work diffengine keeps a small database of feeds, feed entries, and version histories that it uses to notice when content has changed. If you know your way around a SQLite database, you can query it to see how content has changed over time. This database could be a valuable source of research data, or small data, for the study of media production, or the way organizations or people communicate online. One possible direction we are considering is creating a simple web frontend for this database that allows you to navigate the changed content without requiring SQL chops.
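
For instance, a researcher could poke at that database with Python’s built-in sqlite3 module. The file name, table, and column below are purely illustrative, not diffengine’s actual schema, so inspect the real database (for example with `.schema` in the sqlite3 shell) and adapt the query accordingly.

```python
import sqlite3

# NOTE: "diffengine.db", the entry_version table, and its url column are
# hypothetical names used for illustration; check the real schema first.
conn = sqlite3.connect("diffengine.db")

query = """
    SELECT url, COUNT(*) AS versions
    FROM entry_version
    GROUP BY url
    HAVING versions > 1
    ORDER BY versions DESC
    LIMIT 10
"""

# Which articles have been edited the most since we started watching them?
for url, versions in conn.execute(query):
    print(versions, url)
```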

Perhaps diffengine could also create its own private archive of the web content, rather than relying on a public snapshot at the Internet Archive. Keeping the archive private could help address ethical concerns around documenting particular individuals or communities when conducting research. If this sounds useful or interesting please get in touch with the Documenting the Now project, by joining our Slack channel or emailing us at info@docnow.io.

Installation of diffengine is currently a bit challenging if you aren’t already familiar with installing Python packages from the command line. If you are willing to give it a try, let us know how it goes over on GitHub. Ideas for sites for us to monitor as we develop diffengine are also welcome!

Special thanks to Matthew Kirschenbaum and Gregory Jansen at the University of Maryland for the initial inspiration behind this idea of showing rather than telling what news is. The Human-Computer Interaction Lab at UMD hosted an informal workshop after the recent election to see what possible responses could be, and diffengine is one outcome from that brainstorming.

This page was originally published on the Documenting the Now blog.
