{"id":3136,"date":"2011-08-08T09:40:29","date_gmt":"2011-08-08T14:40:29","guid":{"rendered":"http:\/\/mith.umd.edu\/?p=3136"},"modified":"2020-10-08T16:03:19","modified_gmt":"2020-10-08T20:03:19","slug":"reflections-on-scale-and-topic-modeling","status":"publish","type":"post","link":"https:\/\/mith.umd.edu\/reflections-on-scale-and-topic-modeling\/","title":{"rendered":"Reflections on Scale and Topic Modeling"},"content":{"rendered":"<p>I recently came across a 1991 <a href=\"http:\/\/www.theparisreview.org\/interviews\/2225\/the-art-of-criticism-no-1-harold-bloom\" target=\"_blank\" rel=\"noopener noreferrer\">interview<\/a> of the literary critic Harold Bloom (Sterling Professor of Humanities at Yale University) in The Paris Review, in the course of which Bloom remarks:<\/p>\n<p>&#8220;As far as I\u2019m concerned, computers have as much to do with literature as space travel, perhaps much less.&#8221;<\/p>\n<p>Since coming here (to MITH) as an intern this summer, I have learned about several projects that just might make Bloom change his mind.  In the last few weeks, I have been working on one such project, along with Travis Brown and Clay Templeton here at MITH, that utilizes cutting-edge work on topic modeling currently being done in the University of Maryland\u2019s Computer Science department. Clay has already written about the project in his previous blog post, and so I will simply use this opportunity to offer some reflections of my own.<\/p>\n<p>The question of \u201cscale\u201d has been on my mind over the past couple of weeks. We are processing vast amounts of text data, and topic modeling is the kind of approach whose power of discovery is predicated on the assumption that vast amounts of textual data will be available for it to run on. 
It gives me pause to reflect that the expectation that these approaches will \u201cscale up\u201d quantitatively, and so become more prominent and visible in the coming years, rests on some deeper technological and social assumptions. That is, increased success for these approaches is going to depend on Moore\u2019s Law continuing to hold (in the popular sense that more and more processing power becomes available more and more cheaply), and also on the willingness (and legal ability) of the libraries and institutions that own such vast repositories of texts to make them available in computer-readable formats.<\/p>\n<p>Earlier, our group here at MITH was working with the \u201cunsupervised\u201d topic modeling approach, in which no knowledge of the content of the text is really needed: the algorithm simply cranks away at whatever text corpus it is given and discovers topics from it. For the last week or so, though, we have focused on the brand-new and cutting-edge \u201csupervised\u201d topic modeling approach that is being developed by a research group in the Computer Science department here at Maryland. The idea in this kind of \u201csupervised\u201d topic modeling is to \u201ctrain\u201d the algorithm by making use of domain knowledge. For example, alongside the Civil War era newspaper archive with which we are working, we are using related pieces of knowledge that come from contemporaneous sources external to our corpus, such as the casualty rate for each week and the Consumer Price Index for each month. The idea behind this approach is that the algorithm will discover more \u201cmeaningful\u201d topics if it can make use of feedback on how well the topics it discovers are associated with a parameter of interest. 
Thus, if we are trying to bias the algorithm toward discovering topics that pertain more directly to the Civil War and its effects, then it makes sense to align the text with the aforementioned \u201cother kinds of data\u201d (in our case, casualty figures and economic figures for the era), which have a provenance outside the text corpus. This is where the \u201cqualitative\u201d scale becomes important.<\/p>\n<p>The more intelligently we try to leverage the power of these approaches, the more the number of areas with which a successful practitioner of this kind of topic modeling must have at least a passing acquaintance will \u201cscale\u201d up. This made me think about how people trained in information science, which is a truly interdisciplinary field, are really well-positioned to do this. Over the last week, for example, I read several papers on the economic history of the Civil War (to which we were pointed by Robert K. Nelson, a historian at the University of Richmond who has worked on topic modeling and history). Who would have guessed that one would have to read Civil War papers in the course of a summer internship in Information Science?  I aligned the economic data with the text corpus, and based on what the data seemed to be telling us, I came up with a <a href=\"http:\/\/www-personal.umich.edu\/~bhattach\/econhyp.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">design for some experiments<\/a> to test some hypotheses, which we will carry out over the next few days.<\/p>\n<p>Also, in a piece of exciting news, the <a href=\"http:\/\/www-personal.umich.edu\/~bhattach\/RhetoricConferenceAbstractFinal.pdf\" target=\"_blank\" rel=\"noopener noreferrer\">paper proposal<\/a> that we (Travis, Clay, and I) submitted to the \u201cMaking Meaning\u201d conference for graduate students, organized by the Program in Rhetoric at the English Department of the University of Michigan, has been accepted. 
This presentation will reflect on how one might situate approaches like topic modeling in the context of literary theory and philosophy. This, too, is an example of how as \u201cinformation scientists\u201d we must see, and think, in terms of the \u201cbig picture\u201d \u2014 that is, scale up to the big picture.<\/p>\n<p>P.S. Now that this post turned out to be a reflection on the question of scale, it just occurred to me that it is also appropriate that the programming language I learned during the earlier part of the internship was \u2014 <a href=\"http:\/\/www.artima.com\/scalazine\/articles\/scalable-language.html\" target=\"_blank\" rel=\"noopener noreferrer\">Scala<\/a>!<\/p>\n","protected":false},"excerpt":{"rendered":"<p>I recently came across a 1991 interview of the literary critic Harold Bloom (Sterling Professor of Humanities at Yale University) in The Paris Review, in [&hellip;]<\/p>\n","protected":false},"author":11,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[66,77],"tags":[164,55],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v15.0 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Reflections on Scale and Topic Modeling &ndash; Maryland Institute for Technology in the Humanities<\/title>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/mith.umd.edu\/reflections-on-scale-and-topic-modeling\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Reflections on Scale and Topic Modeling &ndash; Maryland Institute for Technology in the Humanities\" \/>\n<meta property=\"og:description\" content=\"I recently came across a 1991 interview of the literary critic Harold Bloom (Sterling Professor of Humanities at Yale University) in 
The Paris Review, in [&hellip;]\" \/>\n<meta property=\"og:url\" content=\"https:\/\/mith.umd.edu\/reflections-on-scale-and-topic-modeling\/\" \/>\n<meta property=\"og:site_name\" content=\"Maryland Institute for Technology in the Humanities\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/UMD.MITH\" \/>\n<meta property=\"article:published_time\" content=\"2011-08-08T14:40:29+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2020-10-08T20:03:19+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/mith.umd.edu\/wp-content\/uploads\/2018\/10\/MITH-logostack-square-grn.png\" \/>\n\t<meta property=\"og:image:width\" content=\"300\" \/>\n\t<meta property=\"og:image:height\" content=\"300\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebSite\",\"@id\":\"https:\/\/mith.umd.edu\/#website\",\"url\":\"https:\/\/mith.umd.edu\/\",\"name\":\"Maryland Institute for Technology in the Humanities\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":\"https:\/\/mith.umd.edu\/?s={search_term_string}\",\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"WebPage\",\"@id\":\"https:\/\/mith.umd.edu\/reflections-on-scale-and-topic-modeling\/#webpage\",\"url\":\"https:\/\/mith.umd.edu\/reflections-on-scale-and-topic-modeling\/\",\"name\":\"Reflections on Scale and Topic Modeling &ndash; Maryland Institute for Technology in the 
Humanities\",\"isPartOf\":{\"@id\":\"https:\/\/mith.umd.edu\/#website\"},\"datePublished\":\"2011-08-08T14:40:29+00:00\",\"dateModified\":\"2020-10-08T20:03:19+00:00\",\"author\":{\"@id\":\"https:\/\/mith.umd.edu\/#\/schema\/person\/f146dda81152e7a9ea2018aa3c22b377\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/mith.umd.edu\/reflections-on-scale-and-topic-modeling\/\"]}]},{\"@type\":\"Person\",\"@id\":\"https:\/\/mith.umd.edu\/#\/schema\/person\/f146dda81152e7a9ea2018aa3c22b377\",\"name\":\"Sayan Bhattacharyya\",\"image\":{\"@type\":\"ImageObject\",\"@id\":\"https:\/\/mith.umd.edu\/#personlogo\",\"inLanguage\":\"en-US\",\"url\":\"https:\/\/secure.gravatar.com\/avatar\/f32a60715e2055043d28c86699f6c376?s=96&d=mm&r=g\",\"caption\":\"Sayan Bhattacharyya\"}}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","_links":{"self":[{"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/posts\/3136"}],"collection":[{"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/users\/11"}],"replies":[{"embeddable":true,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/comments?post=3136"}],"version-history":[{"count":1,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/posts\/3136\/revisions"}],"predecessor-version":[{"id":21311,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/posts\/3136\/revisions\/21311"}],"wp:attachment":[{"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/media?parent=3136"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/categories?post=3136"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mith.umd.edu\/wp-json\/wp\/v2\/tags?post=3136"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}