A Quick Experiment in “Distant Reading” a Large Medieval Latin Text



My dissertation is on the textual development of Gratian’s Decretum. The Decretum was written around 1140 by the otherwise unknown Gratian, and was the foundational textbook for the systematic study of canon law within the medieval university. (In fact, it remained the basis for the law of the Roman Catholic church right up until 1917.)

Inspired by Charity and Kathryn’s presentation on Wednesday night, I decided to use Wordle to do an experiment in “distant reading” Gratian’s text. The MGH (Monumenta Germaniae Historica) in Munich digitized Emil Friedberg’s still-standard 1879 critical edition in the 1980s, and I copied and pasted the whole thing (all 490,446 words) into Wordle.

A few things need to be kept in mind in order to interpret the resulting Wordle.

First, the Decretum was written in Latin, a fully inflected language, and Wordle does no stemming. This is both a minus and a plus. Deus, Dei, Deum and Deo are just morphologically different forms of one word, and if we were to put them all together, Deus (“God”) would have a more prominent (and less misleading) place in the visual space than it does. Episcopus (“bishop”) is another example. On the other hand, the fact that Wordle does no stemming has the effect of preserving the gendered words, for example eum (“him”) and eam (“her”). These pronouns can, of course, refer to things that are masculine and feminine in a purely grammatical sense, but the difference is nevertheless interesting.
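To make the trade-off concrete, here is a minimal Python sketch of what grouping inflected forms would do to the counts. It is not how Wordle works internally, and the LEMMAS table is a hypothetical, hand-made mapping that covers only a few of the forms mentioned above (strictly speaking this is closer to lemmatization than to stemming). The sample string is a made-up run of tokens, not a quotation from the Decretum, and the tokenizer assumes plain unaccented ASCII text.

```python
import re
from collections import Counter

# Hypothetical hand-made table: inflected form -> headword.
# It covers only a handful of forms; a real table would be far larger.
LEMMAS = {
    "deus": "deus", "dei": "deus", "deum": "deus", "deo": "deus",
    "episcopus": "episcopus", "episcopi": "episcopus",
    "episcopum": "episcopus", "episcopo": "episcopus",
}


def tokenize(text):
    """Lowercase the text and split it into runs of ASCII letters."""
    return re.findall(r"[a-z]+", text.lower())


def surface_counts(text):
    """Count each inflected form separately (no grouping of forms)."""
    return Counter(tokenize(text))


def lemma_counts(text):
    """Count forms under their headword; unknown forms count as themselves."""
    return Counter(LEMMAS.get(token, token) for token in tokenize(text))


# Made-up token string for illustration only.
sample = "Deus Dei Deum Deo episcopus episcopi eum eam eum"

print(surface_counts(sample).most_common(3))
print(lemma_counts(sample).most_common(3))
```

Because eum and eam are deliberately left out of the table, they stay separate in both counts, which preserves the gendered contrast described above while still pooling the forms of Deus and Episcopus.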

Another linguistic feature is the salience of the word que. This word can mean several different things depending on context, but it shows up on the Wordle because of its use as a relative pronoun (“which”) kicking off a subordinate clause. Latin is a hypotactic language and so subordinate clauses appear much more frequently than in a paratactic language like English.

Second, the Wordle makes sense in the context of the way in which Gratian put the Decretum together. The Decretum consists of short extracts from “authorities” (church councils plus long-dead theologians and popes), which Gratian embeds within a framework of his own comments (called dicta or “sayings”). It is extremely interesting that only two of the individual authorities are named frequently enough to show up in the Wordle: Augustinus (bishop of Hippo Regius in modern-day Algeria, d. 430) and Gregorius (bishop of Rome, d. 604). The word Papa (“Pope”) is more prominent, suggesting the collective, if not individual, heft of the popes in the lineup of authorities. Finally, Concilio (“Council”) shows up because the attribution (“inscription” in the jargon of medieval canon law studies) of so many canons is to one or another of the general or provincial councils that Gratian cited.

The chaining of multiple authorities in sequence is a very prominent feature of the text, and is indicated by the word Item (“Similarly”). One of Gratian’s goals was to show that the authorities were in harmony with each other. In fact, his title for the book (which isn’t the one that stuck) was Concordia Discordantium Canonum (“The Agreement of Disagreeing Rules”). To do that, however, he had to bring out the apparent disagreements among the authorities before resolving them (his resolutions usually being introduced by Unde or “Whence”). This gives rise to the use of adversative particles like uel (“or”) and uero (“but”) that foreground the (apparent) contrast between the positions of the authorities.

These are just some of the immediate reactions I had to a quick experiment in “distant reading” an almost half-million-word text in one morning. I’ll update this post if I come up with more upon further reflection. I’d also appreciate feedback from the group on how to better communicate these ideas.

5 thoughts on “A Quick Experiment in ‘Distant Reading’ a Large Medieval Latin Text”

  1. Paul — It would be a great contribution to digital Latin studies if someone would create a list of stopwords for medieval Latin texts. Surely someone has done this already? Almost as surely, if they did, they got no credit for it from the powers that be. If nobody’s done it yet, I know where we should start: one of those grouped-frequency vocabulary lists, such as Gonzalez Lodge’s Vocabulary of High-School Latin.

