Library Research Spotlight: Patrick J. Burns
Since joining the ISAW Library as Assistant Research Scholar last year, one of my main areas of research has been the digital text analysis of Latin literature, and more specifically, the development of the computational tools necessary for doing this analysis. I contribute to an open-source research platform called the Classical Language Toolkit (CLTK), which aims to gather historical language corpora, build resources and tools for analyzing these corpora, and publish scientific research based on this analysis. In 2017, I was invited to two conferences hosted by the Digital Humanities department at Universität Leipzig to present my Latin text analysis research on behalf of ISAW and CLTK.
In February, I attended the Global Philology Open Conference in Leipzig, which brought together researchers working on a wide variety of historical languages to address the question:
"What digital services, collections or curricula need to be developed so that a field of study can flourish in a digital society?"
There was much on offer directly relevant to the ISAW scholarly community, including presentations on machine learning and cuneiform studies, digital infrastructure for historical Chinese texts, and the state of Ancient Egyptian philology, to name just a few. Conference organizer and program director of the Leipzig Digital Humanities department, Gregory Crane, began the proceedings by defining global philology as social scholarship working “across boundaries of language and cultures” and stating that the future of philology would need to include any language of the past without room for restriction. This is an ambitious, forward-looking perspective for a field of study focused squarely on the deep past, and the conference highlighted the work being done now to lay the groundwork necessary to realize this goal.
My contribution to the Global Philology Open Conference was an update on my ongoing research into readability studies for classical languages. The tools I have been developing for the CLTK, in particular those designed to break texts down into units (sentences, words, syllables, etc.) and those designed to automatically retrieve dictionary headwords, make it possible to draw meaningful statistical comparisons between texts. Moreover, the pedagogical value of such readability research is clear: with systematic, accurate counts of formal features in texts, students, especially the not insignificant autodidact community interested in classical languages, can be better matched with reading material at an appropriate level. In this talk, “Cicero’s Hardest Sentence?: Measuring Readability in Latin Literature,” I compared Latin teachers’ assumptions about relative difficulty in the works of the 1st-century BCE orator-politician-philosopher, Marcus Tullius Cicero, with rankings derived from various readability measures. The rankings can be seen in the presentation here.
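To make the idea of counting formal features concrete, here is a minimal sketch in plain Python of two of the simplest readability proxies, mean sentence length and mean word length. This is an illustration only: it does not use the CLTK's actual tokenizers, and the crude regex-based splitting and the function name are my assumptions here, not the measures from the talk.

```python
import re

def crude_readability(text):
    """Score a passage by mean sentence length (in words) and mean
    word length (in characters) -- two formal features often used
    as rough readability proxies. (Illustrative only; a real
    pipeline would use proper sentence/word tokenizers.)"""
    # Naive sentence split on terminal punctuation
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Naive word tokenization: runs of letters
    words = re.findall(r"[A-Za-z]+", text)
    mean_sentence_len = len(words) / len(sentences)
    mean_word_len = sum(len(w) for w in words) / len(words)
    return mean_sentence_len, mean_word_len

# Opening of Cicero's first Catilinarian oration
cicero = ("Quo usque tandem abutere, Catilina, patientia nostra? "
          "Quam diu etiam furor iste tuus nos eludet?")
print(crude_readability(cicero))  # (7.5, 5.2)
```

A real readability measure would layer further features on top of these counts (syllable counts, lemma frequency, and so on), but the principle is the same: systematic counts of formal features allow texts to be ranked against one another.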
Earlier this month, I was invited back to Leipzig to participate in the Big Textual Data workshop, which followed up on February's conference but with a specific interest in historical language study at large scale. This was another good match for my CLTK research, particularly my recent work on the role of high-frequency vocabulary in text analysis. In this talk, "Creating Stoplists for Historical Languages," I discussed the effect that removing high-frequency, low-information vocabulary (for example, "the" and "and" in English, or "ὁ" and "καί" in Greek) can have on text processing tasks and suggested how such lists may be built systematically for the wide variety of languages represented by the CLTK. As conference organizer Thomas Köntges emphasized throughout the workshop, there is a great deal to be gained from sharing resources and adapting methods across language-specific philological and linguistic traditions. My work on stoplist creation is only one part of a larger effort to build BLARKs—Basic LAnguage Resource Kits—for historical languages, helping to ensure that the CLTK languages all have at least the minimal starting point, with respect to corpora, tools, and methods, required for text processing and analysis. My presentation on stoplists and BLARKs can be found here.
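As a rough illustration of how such a list might be derived systematically, the sketch below builds a stoplist from raw corpus frequency alone. The function name, the toy corpus, and the frequency-only ranking are assumptions for the sake of example, not the CLTK's actual implementation (which weighs several ranking methods, not just raw counts).

```python
import re
from collections import Counter

def make_stoplist(texts, size):
    """Build a candidate stoplist by ranking words by raw corpus
    frequency and keeping the top `size`. Frequency alone is the
    simplest possible ranking; real stoplist construction would
    also consider dispersion, document frequency, etc."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"\w+", text.lower()))
    return [word for word, _ in counts.most_common(size)]

# Toy "corpus": the opening lines of Vergil's Aeneid
corpus = [
    "arma virumque cano troiae qui primus ab oris",
    "italiam fato profugus laviniaque venit litora",
    "multum ille et terris iactatus et alto",
]
print(make_stoplist(corpus, 2))  # "et" is the only repeated word, so it ranks first
```

On a corpus of realistic size, the top of such a ranking is dominated by function words (conjunctions, particles, prepositions), which is exactly the high-frequency, low-information vocabulary a stoplist is meant to capture.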
The overarching goal of the CLTK is to become a framework for integrated philological research of the ancient world, broadly defined. This goal is clearly in line with Universität Leipzig's Global Philology project, which also works to tear down barriers between philological—and digital philological—traditions that have long been separated by disciplinary or departmental concerns. I have come to see the goals of the CLTK as sympathetic to the ISAW mission as well: the CLTK's vision of digital philology is similarly aimed at investigating "historical connections and patterns" and drawing "socially illuminating comparisons." The commitment of ISAW Digital Programs to instruction, programming, and research on computer-assisted approaches to ancient world study only strengthens this tie. Accordingly, it is with this collective energy and enthusiasm that I am organizing a conference at ISAW in April 2018, Future Philologies: Digital Approaches to Historical Language Text. The conference will be my opportunity to bring back to ISAW the kind of cutting-edge philological and computational research I found on view at Leipzig.