Travis Brown – Maryland Institute for Technology in the Humanities
https://mith.umd.edu
Thu, 08 Oct 2020 20:03:22 +0000

Topic Modeling: New Software and a Wrap-up of our NEH-Sponsored Workshop
https://mith.umd.edu/topic-modeling-round-up-and-some-new-software/
Tue, 18 Dec 2012 18:43:52 +0000

The post Topic Modeling: New Software and a Wrap-up of our NEH-Sponsored Workshop appeared first on Maryland Institute for Technology in the Humanities.

Topic modeling in (and on) the humanities has been the subject of a number of blog posts and online conversations over the past few weeks, including this article by Andrew Goldstone and Ted Underwood, which provides a very clear introduction to the method and outlines a set of experiments on PMLA, and a series of posts by Jon Goodwin that walk through experiments on texts from JSTOR’s Data for Research in useful detail. The comments on these blog posts are well worth reading, along with the parallel discussions on Twitter, such as these responses to a recent question by Matt Burton about appropriate techniques for measuring the similarity of documents or topics.

Both Jon and Matt were participants at MITH’s NEH-funded topic modeling workshop last month, and Jennifer Guiliano and I would like to thank all of the speakers and attendees once more, and to point again to some of the many follow-up blog posts and other documents that came out of the workshop (please comment or contact me if I’ve missed something you’ve written that you’d like to see listed here):

We’d also like to announce a small topic modeling library and toolkit that MITH is releasing and will continue to develop over the next few months. This library is written in the Scala programming language and currently serves primarily as a lightweight wrapper for MALLET. It pulls together bits of functionality and code that we at MITH found ourselves developing for various projects with a topic modeling component, including a graduate course project on the Gothic novel and science fiction, Lisa Rhody’s work on ekphrastic poetry, Amanda Visconti’s work on visualizing Digital Humanities Quarterly, and the Foreign Literatures in America project.

One simple piece of functionality that we’ve found widely useful is a command-line tool that exports data from a model file generated by MALLET to a spreadsheet that can be opened in Excel or LibreOffice. While Excel is in many ways a less sophisticated data analysis platform than tools like R, it is widely used and has a relatively shallow learning curve. For example, this tool has allowed us to train a topic model and hand the results as a spreadsheet to a group of undergraduate students, who can then easily identify the documents in their corpus most strongly associated with a particular topic, or find the documents that are most similar (according to the topic model) to a particular text they are reading.
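
To make the spreadsheet workflow above concrete, here is a minimal sketch in Python of the two lookups described: ranking documents by their weight for one topic, and finding the documents closest to a given text. It is not the toolkit's code (which is written in Scala) and it does not read a MALLET model file. The document names, the topic proportions, the column layout, and the choice of cosine similarity are all assumptions made for illustration.

```python
import csv
import io
import math

# Hypothetical document-topic proportions (one row per document, one
# column per topic). In practice these would come from a trained model;
# the names and numbers here are invented for illustration.
DOC_TOPICS = {
    "doc_a.txt": [0.70, 0.20, 0.10],
    "doc_b.txt": [0.10, 0.80, 0.10],
    "doc_c.txt": [0.60, 0.30, 0.10],
    "doc_d.txt": [0.05, 0.15, 0.80],
}


def write_spreadsheet(doc_topics, handle):
    """Write the proportions as CSV, which Excel and LibreOffice can open."""
    n_topics = len(next(iter(doc_topics.values())))
    writer = csv.writer(handle)
    writer.writerow(["document"] + [f"topic_{i}" for i in range(n_topics)])
    for name, weights in doc_topics.items():
        writer.writerow([name] + list(weights))


def top_documents(doc_topics, topic, n=3):
    """Return the n documents with the largest weight for one topic."""
    ranked = sorted(doc_topics, key=lambda name: doc_topics[name][topic], reverse=True)
    return ranked[:n]


def cosine(u, v):
    """Cosine similarity between two topic-proportion vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(doc_topics, target, n=3):
    """Return the n documents whose topic mix is closest to the target's.

    Cosine similarity is an assumed measure, chosen for simplicity.
    """
    others = [name for name in doc_topics if name != target]
    ranked = sorted(
        others,
        key=lambda name: cosine(doc_topics[target], doc_topics[name]),
        reverse=True,
    )
    return ranked[:n]


if __name__ == "__main__":
    buffer = io.StringIO()
    write_spreadsheet(DOC_TOPICS, buffer)
    print(buffer.getvalue())
    print(top_documents(DOC_TOPICS, topic=0, n=2))
    print(most_similar(DOC_TOPICS, "doc_a.txt", n=1))
```

In a spreadsheet, the first lookup amounts to sorting on a single topic column; the second needs a per-row similarity formula, and other measures (for example, ones designed for probability distributions) may suit some corpora better than cosine similarity.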

The project currently doesn’t go out of its way to insulate users from the command line, but it is designed to be easy to install and use. It relies on the Maven build tool to manage dependencies, so you can, for example, run MALLET’s topic modeling engine (with reasonable defaults) without manually installing MALLET on your machine. If you’re on a Mac with OS X, you already have Maven installed, and if you run Windows or Linux, installing Maven is fairly painless.

If you’re curious about topic modeling and are willing to roll up your sleeves and open a terminal, we’d encourage you to click the link above and try out this software. This is very much a work in progress, so let us know about features you’d like to see—either in a comment here or by creating a new issue in the GitHub repository—and we’ll do our best to get them implemented. And watch this space—in the spring we’ll be launching a sandbox environment that will allow users to run MALLET and other topic modeling tools without installing any software on their local machine.

Upcoming Topic Modeling for Humanities Research workshop
https://mith.umd.edu/topic-modeling-for-humanities-research/
Fri, 26 Oct 2012 19:20:30 +0000

The post Upcoming Topic Modeling for Humanities Research workshop appeared first on Maryland Institute for Technology in the Humanities.

In preparation for MITH’s NEH-funded Topic Modeling for Humanities Research workshop, which is just over a week away, we’d like to highlight some resources associated with the workshop—as well as some of the recent conversations we’ve been following about applications of topic modeling in the humanities.

First of all, we’d like to encourage everyone to check out Paper Machines, a Zotero extension developed by Jo Guldi and Chris Johnson-Roberson, who will be at the workshop next week to discuss how their project uses topic modeling to provide visualizations of the distribution of topics over time in the contents of your personal Zotero library. We’ve created a Storify post recording just a few of the recent reactions to Paper Machines on Twitter.

There have been a number of interesting conversations on topic modeling in the context of humanities research on Twitter in just the last couple of weeks, including these discussions about how to evaluate and interpret topic models, how to get detailed information about word distributions out of MALLET, and how to use topic models to aid navigation of text collections. Many of the participants in these threads will be at the workshop next week, and we’re looking forward to continuing the conversations there.

We’d also like to point everyone to the public Zotero library created by project director Jennifer Guiliano, which lists papers and blog posts about topic modeling methods and humanities applications; to workshop participant David Mimno’s comprehensive bibliography; and to the topic modeling Subreddit recently started by Matt Burton (who will also be at the workshop).

I’ve also just put together a Twitter list of all of the participants whose Twitter handles I could round up, and the complete list of speakers and attendees is available on the project website. (If you’re attending the workshop and are on Twitter but not on the list, please let me know.)

Please see the project website for a more detailed description of the goals of the workshop and the schedule of presentations and discussions, as well as information about logistics for workshop participants, and be sure to follow #dhtopic on Twitter on Saturday, November 3—one week from tomorrow!

MITH Awarded Amazon Web Services Grant
https://mith.umd.edu/mith-awarded-amazon-web-services-grant/
Fri, 15 Jul 2011 14:06:16 +0000

The post MITH Awarded Amazon Web Services Grant appeared first on Maryland Institute for Technology in the Humanities.

We are pleased to announce that MITH has been awarded an Amazon Web Services research grant for $7,500 to support our use of cloud computing services over the next two years. We are currently using Elastic Compute Cloud (EC2) instances to perform natural language processing tasks on large text corpora, including a collection of approximately 117,000 books from the HathiTrust that spans four centuries and several languages. Our EC2 instances also support development at MITH, host prototype applications, and archive MITH’s activity in social media channels. This generous award from Amazon will allow us to expand all of these operations. Please follow the MITH blog for an upcoming series of more detailed posts about how we are using these cloud computing services to enable digital humanities research.
