{"id":9907,"date":"2012-12-18T13:43:52","date_gmt":"2012-12-18T18:43:52","guid":{"rendered":"http:\/\/mith.umd.edu\/?p=9907"},"modified":"2020-10-08T16:00:49","modified_gmt":"2020-10-08T20:00:49","slug":"topic-modeling-round-up-and-some-new-software","status":"publish","type":"post","link":"https:\/\/mith.umd.edu\/topic-modeling-round-up-and-some-new-software\/","title":{"rendered":"Topic Modeling: New Software and a Wrap-up of our NEH-Sponsored Workshop"},"content":{"rendered":"<p>Topic modeling in (and on) the humanities has been the subject of a number of blog posts and online conversations over the past few weeks, including <a href=\"http:\/\/tedunderwood.com\/2012\/12\/14\/what-can-topic-models-of-pmla-teach-us-about-the-history-of-literary-scholarship\/\" target=\"_blank\" rel=\"noopener noreferrer\">this article<\/a> by <a href=\"http:\/\/andrewgoldstone.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Andrew Goldstone<\/a> and <a href=\"http:\/\/tedunderwood.com\/\" target=\"_blank\" rel=\"noopener noreferrer\">Ted Underwood<\/a>, which provides a very clear introduction to the method and outlines a set of experiments on <a href=\"http:\/\/www.mla.org\/pmla\" target=\"_blank\" rel=\"noopener noreferrer\">PMLA<\/a>, and <a href=\"http:\/\/www.jgoodwin.net\/?cat=20\" target=\"_blank\" rel=\"noopener noreferrer\">a series of posts<\/a> by <a href=\"http:\/\/www.jgoodwin.net\/\" target=\"_blank\" rel=\"noopener noreferrer\">Jon Goodwin<\/a> that walk through experiments on texts from JSTOR&#8217;s <a href=\"http:\/\/dfr.jstor.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">Data for Research<\/a> in useful detail. 
The comments on these blog posts are well worth reading, along with the parallel discussions on Twitter, such as <a href=\"http:\/\/storify.com\/travisbrown\/distance-measures-for-topic-modeling\" target=\"_blank\" rel=\"noopener noreferrer\">these responses<\/a> to a recent question by <a href=\"http:\/\/www.si.umich.edu\/people\/matt-burton\" target=\"_blank\" rel=\"noopener noreferrer\">Matt Burton<\/a> about appropriate techniques for measuring the similarity of documents or topics.<\/p>\n<p>Both Jon and Matt were participants at MITH&#8217;s <a href=\"http:\/\/www.neh.gov\/divisions\/odh\" target=\"_blank\" rel=\"noopener noreferrer\">NEH<\/a>-funded <a href=\"http:\/\/mith.umd.edu\/topicmodeling\/\" target=\"_blank\" rel=\"noopener noreferrer\">topic modeling workshop<\/a> last month, and Jennifer Guiliano and I would like to thank all of the speakers and attendees once more, and to point again to some of the many follow-up blog posts and other documents that came out of the workshop (please comment or <a href=\"https:\/\/twitter.com\/travisbrown\" target=\"_blank\" rel=\"noopener noreferrer\">contact me<\/a> if I&#8217;ve missed something you&#8217;ve written that you&#8217;d like to see listed here):<\/p>\n<ul>\n<li><a href=\"http:\/\/www.thomaspadilla.org\/2012\/11\/05\/aybabtu\/\" target=\"_blank\" rel=\"noopener noreferrer\">Some reflections<\/a> by <a href=\"https:\/\/twitter.com\/thomasgpadilla\" target=\"_blank\" rel=\"noopener noreferrer\">Thomas Padilla<\/a><\/li>\n<li><a href=\"http:\/\/www.saritaalami.com\/2012\/11\/04\/on-the-topic-of-topic-modeling-nehmith-workshop-wrap-up\/\" target=\"_blank\" rel=\"noopener noreferrer\">A wrap-up<\/a> by <a href=\"https:\/\/twitter.com\/sarita__alami\" target=\"_blank\" rel=\"noopener noreferrer\">Sarita Alami<\/a><\/li>\n<li><a href=\"http:\/\/www.trevorowens.org\/2012\/11\/discovery-and-justification-are-different-notes-on-sciencing-the-humanities\/\" target=\"_blank\" rel=\"noopener noreferrer\">Some 
questions<\/a> by <a href=\"https:\/\/twitter.com\/tjowens\" target=\"_blank\" rel=\"noopener noreferrer\">Trevor Owens<\/a> (who was not at the workshop, although many of the commenters on this post were)<\/li>\n<li><a href=\"http:\/\/storify.com\/sekleinman\/dh-topic-modeling-seminar\" target=\"_blank\" rel=\"noopener noreferrer\">A collection of Tweets<\/a> by <a href=\"https:\/\/twitter.com\/sekleinman\" target=\"_blank\" rel=\"noopener noreferrer\">Scott Kleinman<\/a><\/li>\n<li><a href=\"http:\/\/storify.com\/ncecire\/adding-the-human-touch-to-lda-with-automatized-cas\" target=\"_blank\" rel=\"noopener noreferrer\">And another set<\/a> by <a href=\"https:\/\/twitter.com\/ncecire\" target=\"_blank\" rel=\"noopener noreferrer\">Natalia Cecire<\/a>, with a focus on issues of labor<\/li>\n<li><a href=\"https:\/\/docs.google.com\/document\/d\/1Tl2WHhCvORnOXr0dk7VXHSCRc2uUAW4Be9dLlgH4s6A\/edit#heading=h.15jvohj1gl5n\" target=\"_blank\" rel=\"noopener noreferrer\">Detailed notes<\/a> by <a href=\"https:\/\/twitter.com\/briancroxall\" target=\"_blank\" rel=\"noopener noreferrer\">Brian Croxall<\/a><\/li>\n<li><a href=\"http:\/\/mith.umd.edu\/dialogues\/lisa-rhody-revising-ekphrasis-telling-the-sister-arts-story-through-topic-modeling-and-network-analysis\/\" target=\"_blank\" rel=\"noopener noreferrer\">A Digital Dialogue at MITH<\/a> by <a href=\"https:\/\/twitter.com\/lmrhody\" target=\"_blank\" rel=\"noopener noreferrer\">Lisa Rhody<\/a> on the use of topic modeling in her dissertation project<\/li>\n<\/ul>\n<p>We&#8217;d also like to announce a <a href=\"https:\/\/github.com\/umd-mith\/topic-modeling\" target=\"_blank\" rel=\"noopener noreferrer\">small topic modeling library and toolkit<\/a> that MITH is releasing and will continue to develop over the next few months. 
This library is written in the <a href=\"http:\/\/www.scala-lang.org\/\">Scala programming language<\/a> and currently serves primarily as a lightweight wrapper for <a href=\"http:\/\/mallet.cs.umass.edu\/\">MALLET<\/a>. It pulls together bits of functionality and code that we at MITH found ourselves developing for various projects with a topic modeling component, including a graduate course project on the Gothic novel and science fiction, Lisa Rhody&#8217;s <a href=\"http:\/\/mith.umd.edu\/research\/review-revise-requery\/\">work on ekphrastic poetry<\/a>, <a href=\"http:\/\/www.literaturegeek.com\" target=\"_blank\" rel=\"noopener noreferrer\">Amanda Visconti<\/a>&#8217;s work on <a href=\"http:\/\/digitalliterature.net\/viewDHQ\/\" target=\"_blank\" rel=\"noopener noreferrer\">visualizing Digital Humanities Quarterly<\/a>, and the <a href=\"http:\/\/mith.umd.edu\/research\/fla\/\">Foreign Literatures in America<\/a> project.<\/p>\n<p>One simple piece of functionality that we&#8217;ve found widely useful is a command-line tool that exports data from a model file generated by MALLET to a spreadsheet that can be opened in Excel or <a href=\"http:\/\/www.libreoffice.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">LibreOffice<\/a>. While Excel is in many ways a less sophisticated data analysis platform than tools like R, it is widely used and has a relatively shallow learning curve. For example, this tool has allowed us to train a topic model and hand the results as a spreadsheet to a group of undergraduate students, who can then easily identify the documents in their corpus most strongly associated with a particular topic, or find the documents that are most similar (according to the topic model) to a particular text they are reading.<\/p>\n<p>The project currently doesn&#8217;t go out of its way to insulate users from the command line, but it is designed to be easy to install and use. 
It relies on the <a href=\"http:\/\/maven.apache.org\/\" target=\"_blank\" rel=\"noopener noreferrer\">Maven build tool<\/a> to manage dependencies, so you can run MALLET&#8217;s topic modeling engine (with reasonable defaults) without manually installing MALLET on your machine, for example. If you&#8217;re on a Mac with OS X, you already have Maven installed, and if you run Windows or Linux, installing Maven is <a href=\"http:\/\/maven.apache.org\/guides\/getting-started\/maven-in-five-minutes.html\" target=\"_blank\" rel=\"noopener noreferrer\">fairly painless and straightforward<\/a>.<\/p>\n<p>If you&#8217;re curious about topic modeling and are willing to roll up your sleeves and open a terminal, we&#8217;d encourage you to click the link above and try out this software. This is very much a work in progress, so let us know about features you&#8217;d like to see\u2014either in a comment here or by <a href=\"https:\/\/github.com\/umd-mith\/topic-modeling\/issues\" target=\"_blank\" rel=\"noopener noreferrer\">creating a new issue<\/a> in the GitHub repository\u2014and we&#8217;ll do our best to get them implemented. 
And watch this space\u2014in the spring we&#8217;ll be launching a sandbox environment that will allow users to run MALLET and other topic modeling tools without installing any software on their local machine.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Topic modeling in (and on) the humanities has been the subject of a number of blog posts and online conversations over the past few weeks, [&hellip;]<\/p>\n","protected":false},"author":10,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[66],"tags":[55],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v15.0 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Topic Modeling: New Software and a Wrap-up of our NEH-Sponsored Workshop &ndash; Maryland Institute for Technology in the Humanities<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/mith.umd.edu\/topic-modeling-round-up-and-some-new-software\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Topic Modeling: New Software and a Wrap-up of our NEH-Sponsored Workshop &ndash; Maryland Institute for Technology in the Humanities\" \/>\n<meta property=\"og:description\" content=\"Topic modeling in (and on) the humanities has been the subject of a number of blog posts and online conversations over the past few weeks, [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/mith.umd.edu\/topic-modeling-round-up-and-some-new-software\/\" \/>\n<meta property=\"og:site_name\" content=\"Maryland Institute for Technology in the Humanities\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/UMD.MITH\" \/>\n<meta property=\"article:published_time\" content=\"2012-12-18T18:43:52+00:00\" \/>\n<meta 
property=\"article:modified_time\" content=\"2020-10-08T20:00:49+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mith.umd.edu\/wp-content\/uploads\/2018\/10\/MITH-logostack-square-grn.png\" \/>\n\t<meta property=\"og:image:width\" content=\"300\" \/>\n\t<meta property=\"og:image:height\" content=\"300\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/mith.umd.edu\/#website\",\"url\":\"https:\/\/mith.umd.edu\/\",\"name\":\"Maryland Institute for Technology in the Humanities\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/mith.umd.edu\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/mith.umd.edu\/topic-modeling-round-up-and-some-new-software\/#webpage\",\"url\":\"https:\/\/mith.umd.edu\/topic-modeling-round-up-and-some-new-software\/\",\"name\":\"Topic Modeling: New Software and a Wrap-up of our NEH-Sponsored Workshop &ndash; Maryland Institute for Technology in the Humanities\",\"isPartOf\":{\"@id\":\"https:\/\/mith.umd.edu\/#website\"},\"datePublished\":\"2012-12-18T18:43:52+00:00\",\"dateModified\":\"2020-10-08T20:00:49+00:00\",\"author\":{\"@id\":\"https:\/\/mith.umd.edu\/#\/schema\/person\/5fc16934464447b81eecc0ae84130b9c\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/mith.umd.edu\/topic-modeling-round-up-and-some-new-software\/\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/mith.umd.edu\/#\/schema\/person\/5fc16934464447b81eecc0ae84130b9c\",\"name\":\"Travis Brown\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/mith.umd.edu\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/bf774a4e24c33e4bb6d19ea1939f01a2?s=96&d=mm&r=g\",\"caption\":\"Travis 
Brown\"},\"sameAs\":[\"https:\/\/twitter.com\/travisbrown\"]}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/posts\/9907"}],"collection":[{"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/users\/10"}],"replies":[{"embeddable":true,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/comments?post=9907"}],"version-history":[{"count":1,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/posts\/9907\/revisions"}],"predecessor-version":[{"id":21164,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/posts\/9907\/revisions\/21164"}],"wp:attachment":[{"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/media?parent=9907"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/categories?post=9907"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/tags?post=9907"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}