For this assignment, I decided to make a distant reading visualization of the Homestuck text archive. Homestuck is a long-running multimedia webcomic that parodies the adventure game genre. Note that although the comic has a lot of text (my input text contained 477,993 words), many pages are entirely text-free and take the form of animated GIFs, Flash animations, or interactive games, which obviously could not be represented. Interestingly, in the resulting Wordle, easily the most common words are two-letter acronyms that don’t convey any information to an outsider. These acronyms are in fact the chat-client handles of several of the more prominent main characters (TG = turntechGodhead = Dave Strider, TT = tentacleTherapist = Rose Lalonde, EB = ectoBiologist = John Egbert, GC = gallowsCalibrator = Terezi Pyrope, etc.). In a sense, this speaks to two of the main features of Homestuck as a whole. First, it is extremely self-referential and difficult to follow without all the relevant background information. Second, it involves a large amount of dialogue over chat clients, each line of which is preceded by the speaking character’s acronym (but not their name). The Wordle also points to several common themes in Homestuck: NOW and TIME both reflect the extreme prevalence of time travel and time paradoxes in the plot, while THINK and KNOW are also fairly prominent and loosely reflect the self-aware nature of much of the comic and its characters. Another interesting point is that although common English words were removed from the Wordle, consistent quirk spellings of those words were not. Several of the main characters type in highly characteristic ways, 3MPLOY1NG ON3 OR MOR3 TYP1NG QU1RKS 4ND 4 S1GN4TUR3 COLOR. Though the colors obviously could not be carried through, words like TH3 and 4ND were both fairly common in the Wordle.
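The stop-word effect described above can be reproduced with a rough word counter. This is only a sketch of how a word-cloud tool might count words, not how Wordle actually works; the `STOP_WORDS` list, the `word_counts` helper, and the two sample chat lines are all made up for illustration and are not taken from the Homestuck archive.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word list; a real tool would use a much longer one.
STOP_WORDS = {"the", "and", "a", "is", "of", "to"}

def word_counts(text):
    """Count upper-cased tokens, skipping anything in STOP_WORDS."""
    tokens = re.findall(r"[A-Za-z0-9']+", text)
    return Counter(t.upper() for t in tokens if t.lower() not in STOP_WORDS)

# Invented sample: the same sentence typed with and without a quirk,
# each line prefixed by a chat-handle acronym.
sample = (
    "GC: TH3 T1M3 1S NOW 4ND TH3 PL4N 1S R34DY\n"
    "TG: the time is now and the plan is ready"
)
counts = word_counts(sample)
print(counts.most_common(5))
```

In this sample, "the" and "and" are filtered out, but TH3 and 4ND slip past the stop-word list because they are spelled differently, and the handle acronyms GC and TG are counted as ordinary words.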
For a new distant reading tool, I am imagining a system that takes input text and produces a web of links between words/themes as they relate to each other in the text. For instance, if a character is frequently described as thin, that would be represented as a line between the name of the character and the word thin. The ‘tension’ of the line would be proportional to the relative frequency/importance of the relation, which would pull closely tied ideas together and allow only slightly associated terms to drift farther apart. This would allow the user not only to better perceive the relevant themes within the book, but perhaps also to find new ones.
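One way the link-building step might look is sketched below. It rests on several assumptions of mine: two terms count as "related" whenever they appear in the same sentence, sentences are split naively on punctuation, and tension is the pair's count divided by the largest count. The `link_tensions` function and the sample text are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations

def link_tensions(text, terms):
    """Return {(term_a, term_b): tension} for terms sharing a sentence.

    Tension is the pair's co-occurrence count divided by the largest
    count, so the most strongly tied pair gets 1.0.
    """
    wanted = {t.lower() for t in terms}
    pair_counts = Counter()
    for sentence in re.split(r"[.!?]+", text):
        words = set(re.findall(r"[a-z']+", sentence.lower()))
        present = sorted(wanted & words)  # sorted so each pair has one key
        pair_counts.update(combinations(present, 2))
    if not pair_counts:
        return {}
    top = max(pair_counts.values())
    return {pair: n / top for pair, n in pair_counts.items()}

# Invented sample text, for illustration only.
sample = ("Anna was thin. Anna, thin as ever, smiled. "
          "Anna met Boris. Boris was tall.")
tensions = link_tensions(sample, ["Anna", "Boris", "thin", "tall"])
for pair, tension in sorted(tensions.items()):
    print(pair, tension)
```

A drawing step could then treat each tension as the strength of a spring between two words in a force-directed layout, so that strongly linked terms settle close together while weakly linked ones drift apart.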
In summary: