Manpreet Khural is an undergraduate member of the Gemstone POLITIC undergraduate research team, led by MITH Faculty Fellow Peter Mallios.

As we, Team POLITIC of Gemstone, make progress in the effort of utilizing data mining tools such as Weka, it becomes more evident that such a technological approach provides a goldmine of new information that would be otherwise impossible to obtain. Currently, we have been working to train Weka to answer a set of questions in which we are interested. In doing so, we have to first provide it data from which it can learn. This requires manual annotations of article documents. It is in doing this that we see the potential of data mining technology.

The lack of human learning biases is this potential. In order to provide Weka the most accurate learning data set, we have made strict guidelines for how we answer the questions.  Even with these guidelines, it is apparent that without strenuous amounts of personal effort, the questions will always have certain biases. The human opinion is a transient quantity and therefore makes it difficult to apply a scientific approach to the analysis of texts. We build associations every single day, making it impossible to maintain constants in mindset and realistically be able to answer these questions without error.

Data mining, on the other hand, has a much more objective learning process. It makes connections solely on the basis of the patterns that the data sets contains. These patterns contain an entirely new and revolutionary insight into texts as they are based on the use of language, what is on the page rather than the ideas, what a reader often infers based on prior personal associations. Even though the training process can be lengthy, the applications for data mining seem endless, considering that without such technology, we would have to go through every bit of text in our data set and annotate them. We foresee data mining as a way for gathering information on any topic that has sufficient amount of available text. For example, national defense agencies can use it to answer queries that could be useful in understanding changes in sentiment as pertaining to whatever topic they are interested in.  We believe that data mining will revolutionize many such industries which aim to understand changes in public sentiment.