The second (and final) day of the DHCS Colloquium ran from 9 to 1 and followed a slightly different format from the previous day. Three papers were followed by a panel discussion, then closing remarks and a thirty-minute near-extempore speech summing up the themes of the conference and asking what to pursue next year. I'll sum up the important bits (at least to me) first, and then will get a bit shouty at the end. Fair warning.
Today's papers included:
"The Digital Docket: Categorical Feature Analysis and Legal Meme Tracking in the Supreme Court Corpus" (Wayne McIntosh, University of Maryland)
"Constructing a Classifier of Political Opinion: Are We There Yet?" (Daniel Diermeier, Stefan Kaufmann, Bei Yu, Northwestern University)
"Deconstructing Machine Learning" (Mark Olsen, Charles Cooney, ARTFL, University of Chicago; Rebecca Chung, Illinois Institute of Technology; Bradley Pasanek, Pomona College; D. Sculley, Tufts University)
The panel session featured all of the above speakers, who riffed on the theme, "Goals and Methods of Text Analysis in Legal, Literary, and Political Domains: What can we learn from each other?"
At the start of the paper sessions, Herr Professor Mueller introduced the topic for today, the discipline of hermeneutics (the study of texts, theological, legal, and literary). The goals of many of the presenters included making data quickly and easily available to researchers, for example by batch-downloading documents retrieved via database query, and visualizing phrases, lexical patterns, and more through visual cues that can then be translated into real language.
One common thread among today's papers and yesterday's was the sentiment, "I don't know what I am looking for, but I do know what I am not looking for." It appears that the new thinking, and the new tools produced from that thinking, are allowing scholars to perform research that leads in unintended and often wonderful directions. The data will produce patterns that no one has ever seen before, which can then be interpreted by whoever executed the query that retrieved the data set.
Perhaps the most dominant trend in creating and using text analysis tools is machine learning. Machine learning is a subset of artificial intelligence in which software is fed data from which it forms patterns, and those patterns can then be applied to other data introduced into the pool at a later time. Quite important to concordances, machine learning can go beyond them into the realm of identifying and quantifying gender in writing, and even gender within fictitious characters, to see how "right" an author was in writing in the gender opposite his/her own, or whether an author made specific gender choices when giving characters speech within the context of a play or novel. One of the goals of machine learning is to take the qualifiable "rightness" of assertions by humanists on the texts they study, and quantify them through mathematical analysis, getting at the "why" of the rightness. We are flirting with creating operational and experimental models that have hypotheses and demonstrable, repeatable conclusions.
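For readers who want to see the "feed it data, then apply the patterns to new data" idea in miniature, here is a rough sketch of my own, not anything shown at the conference. It assumes a naive Bayes word-count classifier (one common technique among many) and a handful of invented training sentences; the function names `tokenize`, `train`, and `classify` and the labels "legal" and "political" are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()


def train(labelled_texts):
    """Learn word frequencies per label from (text, label) pairs."""
    word_counts = defaultdict(Counter)  # label -> word frequencies
    label_counts = Counter()            # label -> number of training texts
    for text, label in labelled_texts:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest naive Bayes log-probability."""
    word_counts, label_counts, vocabulary = model
    total_texts = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(label_counts):
        # Start from how common the label was in the training data.
        score = math.log(label_counts[label] / total_texts)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one smoothing so an unseen word does not rule a label out.
            numerator = word_counts[label][word] + 1
            score += math.log(numerator / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training data: the "data fed to the software".
TRAINING = [
    ("the court holds that the statute is unconstitutional", "legal"),
    ("the petitioner appeals the ruling of the lower court", "legal"),
    ("the senator urged voters to support the tax bill", "political"),
    ("the party campaign promised voters lower taxes", "political"),
]

model = train(TRAINING)

# New text introduced later: the learned patterns are applied to it.
print(classify("the court reviews the appeal", model))
print(classify("voters back the campaign", model))
```

With only four training sentences this is a toy, but the shape is the same as in the larger projects: the labels come from human judgment, and the software quantifies which word patterns go with which label.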
Two outstanding examples of machine learning and free, on-line tools for textual analysis are the Nora Project and, more recently, the MONK project.
The panel discussion continued to follow the thread of machine learning and humanities, and sought to understand whether humanities research is helping to push the development of machine learning (as opposed to what is happening now, which is just the opposite). We, as humanists and as social scientists, are looking towards building bridges between the empirical and the interpretive, bringing social intelligence to machine learning, and getting humanities into the quantitative game. "Web 2.0" tools were briefly mentioned but not discussed, although when I talked to Mark Olsen (ARTFL) about it later, he was quite keen on applying and enabling "social tagging" to on-line, published research. The Perseus Project was mentioned as becoming a testbed project for this new kind of social/technical integration, specifically because Classicists are a "wild user community". Olsen means that the community is very vocal and quite critical of work done within Classics, and thus seemed likely to be best able to approve or reject this new direction in peer review of published and ongoing Classics research.
The Word-of-the-Day was "Groundtruth", which was a new one on me, and apparently seeks to apply the cartographic analogy of "being true in a map to what is actually there on the ground", again seeking officially to quantify what has already been established and accepted qualitatively in the humanities.
The closing speech was given by Ray Siemens of the University of Victoria, who also runs a "humanist summer camp" in Canada called the Digital Humanities Summer Institute. Siemens did sum up the themes of the conference and left us with a call towards "full-spectrum science", meaning that regardless of discipline or belief, we should all be using the same tools towards the goals of obtaining and then sharing knowledge. These tools include enhancing speed, digitizing, encoding, organizing, disambiguating, pattern recognition, and more. "Naivete gets us started," Siemens said, and then called for continued attention to human/computer interaction.
With that, the question of "what to do for next year" was pitched to the audience. Several ideas were tossed around, ranging from "How have we failed?" to "Pick an object and describe it from whatever discipline you practice," to "Are we using our tools for new research and, if not, what new tools do we need?" Perhaps the most favorably received suggestion was "Let's get a panel of people together who aren't using eTools and find out why."
I wanted to raise my hand and speak to two points, but then thought better of it, although I will be forwarding these ideas to the DHCS to consider for the immediate future and for next year:
1) We need to learn or consider strategies for enabling scholars to discover and then use these new tools for their research.
2) Conferences like this one are quite valuable, but only so far as they begin dialogue. This dialogue must be continued, but there is no clear space in which to do it. I feel that it is the DHCS website's responsibility to do the following:
a. Post a public list of all conference attendees and include names, affiliations, and e-mail addresses.
b. Create a social network (here I go again...) that will allow all DHCS participants to, well, continue participating. We shouldn't wait until June or next October to revisit these questions. We need to continue to talk about them, share research, and help each other. Let the DHCS website host this network and become the base for the Academy of Digital Humanism or something.
Well that wasn't so shouty, but it was earlier when I was speaking to a colleague about it. Here's to long runs in the rain!
While teachers of the Classics may not benefit all that much from the information conveyed to DHCS conference-goers, the more general themes of easy-to-use software design, the need for evaluating digital tools to see if they really are helping with research and with understanding texts, and using pattern recognition as a new way of reading old texts are quite important. We will see how the grand Web 2.0 experiment goes with Perseus, although I think things there are still stymied by networking issues which, as I learned first hand in my previous job, scuttle and sabotage even the best of intellectual intentions.
Tomorrow you can expect a distillation of what's next for Classics and textual analysis as it relates to technology and pedagogy.