In this talk I will describe the goals of the MALACH project (Multilingual Access to Large Spoken Archives) and some of our research results. I’ll begin by describing the unique characteristics of the oral history collection that we are using, in which Holocaust survivors, witnesses and rescuers were interviewed in several languages. Each interview has been digitized and extensively catalogued by subject matter experts, producing a remarkably rich collection for the application of machine learning techniques. Automatic speech recognition techniques originally developed for conversational telephone speech were adapted to process these interviews with word error rates that are adequate to support interactive search and automated clustering, detection of topic shifts, and topic classification. I will then describe the studies that we conducted to learn what needs our systems should be designed to meet, and I’ll summarize key results from our system development activities. I’ll conclude with some remarks about possible future directions for research applying new technologies to improve intellectual access to oral history and other spoken word collections.
A continuously updated schedule of talks is also available on the Digital Dialogues webpage.
Unable to attend the events in person? Archived podcasts can be found on the MITH website, and you can follow our Digital Dialogues Twitter account @digdialog as well as the Twitter hashtag #mithdd to keep up with live tweets from our sessions. Viewers can watch the live stream as well.
All talks are free and open to the public. Attendees are welcome to bring their own lunches.
Contact: MITH (mith.umd.edu, mith@umd.edu, 301.405.8927).