Computational Text Analysis

A blog post about computational text analysis.

Text Analysis, Generally Speaking

As we learned in class last week, text analysis allows people to focus on and gain insights into specific features of texts. We looked at several sonnets, focusing on, among other things, their themes, cadence, rhyme schemes, and line length. We were able to investigate and visualize these features without the aid of computers. But there are limits to humans’ capacity to analyze text. While we may be able to delve into a handful of sonnets on our own, analyzing an entire body of work without the aid of computers would be a serious undertaking.

Algorithmic Text Analysis

Prior to the advent of the digital computer, analyses of text were limited by the capacities of the individuals conducting them. The shift toward computational text analysis began with Roberto Busa, who, in partnership with the founder of IBM, set out to index the entire body of St. Thomas Aquinas’ work. The project, which took 30 years to complete, opened the door for computational text analysis within the humanities (Bonzio, 2011; Schreibman & Siemens, 2008).

The ability to conduct algorithmic text analyses posed something of an existential problem for the field of literary analysis. Suddenly, as some realized, hypotheses could be made about texts and, supposedly, objectively confirmed or disconfirmed using computerized methods (see Schreibman & Siemens, 2008). As Schreibman & Siemens (2008) explain, this perspective is applicable to questions of authorship, such as in the case of The Federalist Papers. However, it may be less applicable to literary criticism.

Interestingly, literary critics, typically tasked with subjective interpretations of works of literature, also employ various algorithmic text analyses, including term frequency-inverse document frequency (tf-idf; see Schreibman & Siemens, 2008). In this application, critics use the objective data generated by computers to support their subjective interpretations of texts.
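
To make tf-idf a little more concrete, here is a minimal sketch in Python. It is not taken from Schreibman & Siemens; it assumes one common variant of the measure (a term's share of the words in a document, multiplied by the natural log of the number of documents divided by the number of documents containing the term). The function names and the three short "documents" are made up for illustration and are not real texts.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text and keep runs of letters and apostrophes as words.
    return re.findall(r"[a-z']+", text.lower())


def tf_idf(documents):
    # Returns one dict per document, mapping each term to its tf-idf score.
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            # tf = share of this document's words; idf = log(N / df).
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical toy corpus, invented for this example.
docs = [
    "the rose is red and the rose is sweet",
    "the sea is cold and the sea is deep",
    "the night is long",
]
scores = tf_idf(docs)

# Print the highest-scoring term in each document.
for doc_scores in scores:
    print(max(doc_scores, key=doc_scores.get))
```

Under this variant, a word that appears in every document (such as "the") scores zero, while a word concentrated in a single document (such as "rose") scores highest there. That is the kind of number a critic might point to when arguing that a term is distinctive of a particular text.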

References:

Bonzio, R. (2011). Father Busa, pioneer of computing in humanities with Index Thomisticus, dies at 98. Retrieved September 12, 2016, from http://www.forbes.com/sites/robertobonzio/2011/08/11/father-busa-pioneer-of-computing-in-humanities-dies-at-98/

Schreibman, S., & Siemens, R. (2008). A Companion to Digital Literary Studies. Blackwell Companions to Literature and Culture. Oxford: Blackwell Publishing Professional. http://www.digitalhumanities.org/companionDLS/

Written on September 12, 2016 by Josh Guberman