Many popular natural language processing techniques and tools rely on annotated training corpora to learn models that can be used to process new data from a similar domain. We can train a parser on Wall Street Journal text from the Penn Treebank, for example, and expect it to perform reasonably well on recent blog posts or movie reviews, but not necessarily on eighteenth-century conduct manuals. Unfortunately, it is often hard to find or create appropriate training data for specific literary genres or historical periods, even in English. In this talk, Travis Brown, Assistant Director of Research and Development at MITH, will look at some examples of semi-supervised and unsupervised methods that can be used to explore large text collections in domains with little or no available training data.
A continuously updated schedule of talks is also available on the Digital Dialogues webpage.
Unable to attend the events in person? A live stream of each session is available, and archived podcasts can be found on the MITH website. You can also follow our Digital Dialogues Twitter account @digdialog and the hashtag #mithdd to keep up with live tweets from our sessions.
All talks are free and open to the public. Attendees are welcome to bring their own lunches.