More wired than a Roman Internet café

Field Report from the Chicago Colloquium for Digital Humanities, Day One

The Chicago Colloquium for Digital Humanities and Computer Science (DHCS) is the brainchild of Dr. Martin Mueller, humanist and luminary, who has midwifed such projects of global importance as the Perseus Project (albeit for English literature as opposed to Classics) and Wordhoard. 2007 marked the second installment of the conference, hosted by Northwestern University in the conference center of the Orrington Hotel in Evanston, Illinois. Approximately 100 students, teachers, publishers, technicians, engineers, librarians, archaeologists – visionaries every one of them – stayed glued to their seats as six papers and two keynote addresses were presented on the theme of “Exploring the Scholarly Query Potential of High-Quality Text and Image Archives in a Collaborative Environment”.

Rather than summarize every paper in detail, I’ll present the paper titles and speakers here, and then do my best to draw out the themes the speakers shared, with the goal of presenting emerging trends in text analysis, preservation, research, and even reading:

“Countless Links: Qualitative Query Potential in Orlando” (Susan Brown, University of Guelph; Jeffery Antoniuk & Sharon Balazs, University of Alberta)

“Annotation in the Chymistry of Isaac Newton Project” (John Walsh, Indiana University)

“1001 Novels: A User-Contributor Approach to Creating Good Enough Digital Editions from OCR for Scholarly and Pedagogical Work” (Timothy Cole, University of Illinois – Urbana-Champaign; James Chartrand, Russell Archive at McMaster University; Martin Mueller, Northwestern University)

First Keynote: “The Remaking of Reading” (Matthew Kirschenbaum, University of Maryland)

“The Colonial and State Records of North Carolina Project” (Hugh Cayless, University of North Carolina, Chapel Hill)

“InscriptiFact: Looking at Mesopotamian Cylinder Seals in a New Light” (Wayne Pitard, University of Illinois)

“The Shuilu’an Project: 3,000 Searchable Clay Statuettes from a Small Buddhist Temple near Xian” (Harlan Wallach, Northwestern University)

Second Keynote: “Beyond 2-D Text/Plain: The Chinese Buddhist Canon in 3-D” (Lewis Lancaster)

I believe that transcripts (or at least hand-outs) of each talk will be posted on the conference web site at the conclusion of the meetings. I will post the link here when it becomes available.

Introductory remarks for the first day of papers were given by Sarah Pritchard, Chief Librarian of Northwestern University, and by Martin Mueller, also of Northwestern. Their remarks immediately set the tone with these two quotes:

“The library is the lab of the humanists.”

“The digital extends the calculus of the possible.”

When we look at text on-line, we are really looking at a digital “surrogate”: a picture of a picture, a scan of a page of text, or an XML-encoded document whose content has been extracted from the original medium of communication (be it papyrus or a page of e-mail). With digital tools ranging from imaging to databases, we, as readers of texts, are able to access additional information about those texts (metadata) in support of close reading and analysis. But while these tools do seem to facilitate research, the digital nature of both tools and text is fugitive. With technology shelf-life averaging about five years for software and hardware, we must always be mindful of preserving what we have already produced electronically. While the rewards for using technology are great, the risks are also quite clear, and publishers, libraries, universities, and other institutions (as well as the creators of research and publications) need to be prepared to port their data into contemporary formats, or to encode them in such a way that they remain portable into new iterations of software and computer languages.

Stepping away from the future and back to the present, the first series of papers (and the first keynote) revolved around reading texts and contributing to the continued life of old texts, exploring what a book is, what reading is, and why text analysis matters. With textual analysis, scholars are moving from merely producing concordances to using tools that create a “semantic markup”. Recognizing that primary texts (currently literary texts) must be interpreted in different ways, many of which are alien to the people producing the software that analyzes them, the Orlando project enables scholars to query these texts to explore relationships between the texts, their authors, events, and more. Probing contexts is key to understanding a text’s author(s), the events which might have driven the text’s production, as well as unforeseen circumstances, unexpected relationships, and the like. As researchers, we just don’t know what to expect until we start asking questions, and tools like Orlando enable us to find things that we were perhaps not looking for, yet are of immeasurable value to our current work.

Our first new tool is “semantic grouping” in support of “qualitative analysis” of texts.
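To make the idea of querying semantic markup a little more concrete, here is a minimal sketch in Python. It is not Orlando's schema or software: the tag names (entry, event), the attributes, and the sample writers are all invented for illustration. The point is only that once events are tagged, a question like "who published in London in 1847?" becomes a query rather than a reading assignment.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data. The tags, attributes, and writers are
# invented for this sketch and do not reflect Orlando's real markup.
SAMPLE = """
<archive>
  <entry author="Writer A">
    <event type="publication" year="1847" place="London"/>
    <event type="marriage" year="1854" place="Haworth"/>
  </entry>
  <entry author="Writer B">
    <event type="publication" year="1847" place="London"/>
  </entry>
  <entry author="Writer C">
    <event type="publication" year="1860" place="Edinburgh"/>
  </entry>
</archive>
"""

def authors_with_event(xml_text, event_type, **attrs):
    """Return authors whose entry holds an event of the given type
    whose attributes match every keyword argument supplied."""
    root = ET.fromstring(xml_text)
    matches = []
    for entry in root.iter("entry"):
        for event in entry.iter("event"):
            if event.get("type") != event_type:
                continue
            if all(event.get(key) == value for key, value in attrs.items()):
                matches.append(entry.get("author"))
                break  # one matching event per author is enough
    return matches

# Which writers share a publication year and place?
print(authors_with_event(SAMPLE, "publication", year="1847", place="London"))
```

The same function answers a different question by changing only its arguments, which is roughly what a grouping by meaning (rather than by word) buys the researcher.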

Our second new tool is “on-line annotation”.

This is best exemplified by the ongoing work of Indiana University’s “The Newton Project”, where John Walsh and others have created a suite of digital tools that give researchers the heretofore unknown capability of providing commentary on Newton’s treatise on chemistry, specifically alchemy. Original manuscript leaves were scanned for the project and then posted on-line at the Electronic Text Center. The text annotation tool is live and allows scholars to type in information regarding the content of Newton’s manuscript. Two other kinds of tools are to become available later in 2007: image annotation tools and visualization tools. Image annotation tools allow one to actually draw (as opposed to write) commentary on certain elements of Newton’s text. The visualization tools will allow one to include things like a video recording of the recreation of a Newtonian alchemical experiment. These tools will all be released as open source, meaning they will be free to use by anyone interested in employing them in their research.

One of the other speakers, Wayne Pitard, also spoke of the embarrassing quality of publications featuring cylinder seals from the Near East. His team resolved to photograph (with specialized equipment) cylinder seals in the Spurlock Museum’s collection, with an eye towards making all of the detail of the writing and imagery pop. In the current state of publication, cylinder seals are ignored photographically in favor of the impression they provide when rolled on wet clay. This completely ignores the information held within the seal itself. By focusing on the artifact, these photographs document the vehicle for transmitting ideas, which in turn gives archaeologists new ways to interpret or reinterpret the content of those seals. And the tools and results, found on-line at inscriptifact.ncsa.uiuc.edu, are free.

Open source, or making something available on-line for free and for the good of humanity (in its purest sense), was a core element of the conference. If we are to enable scholars to complete their research, and that research serves humanistic ends, then those tools should be freely available at any time to anyone.

The “1001 Novels” program was launched in pursuit of both the open source ideal and the production of on-line, digital editions of texts for research. It is an opinion generally held that digital editions, in their current state, are not good enough for research purposes. Texts offered on-line for research must be scanned both quickly and at a high quality and then posted on-line for free. Once these texts are made available to the world, scholars may register for free access to them, at which point they can begin tagging them and adding annotations. The Open Content Alliance is spearheading projects like this. Practical results can be found at the web site of American Libraries.

A current example of this work is being conducted by the Bertrand Russell Institute which holds approximately 40,000 letters. These have all been scanned in anticipation of creating an electronic, critical edition of these letters. The project leaders created a special software tool that allows even the most technically ignorant researcher the chance to read a letter, indicate different elements of that letter, and add critical commentary to it.

So then, another lesson is to make the technology available to researchers simple to both understand and use, demonstrating its worth and enabling scholars to actually use it to create something of value to the field.

With all text analysis comes the necessity (or sometimes problem) of reading. Many of the speakers (including Mueller and the first keynote speaker, Matthew Kirschenbaum) spoke of the value of not-reading. To define not-reading, one must understand that many of the texts that undergo any kind of scholarly review are familiar to that scholar and his/her audience. The Bible, the Iliad, and other works of literature are read, re-read, and internalized, and that over-familiarity often obscures data and details that would be of value to scholarship. Some tricks to get around reading (and into the Alice-in-Wonderland world of not-reading) are to read books upside-down, or back-to-front, or as a graphic illustration of key words from the text. By forcing oneself to read something familiar in an unfamiliar way, new connections are often made about the piece in question that had been otherwise overlooked.

With texts that are to be read and perhaps not-read, one must consider the media in which the texts are presented. “Bookspace” is a core ingredient to how one reads: there are margins, notes, sentences that carry from one page to the next, thoughts that span several pages, and more. Digitally, we have tried to address the issues of Bookspace by creating an actual, 3-D facsimile of a book, have produced 2-D scans of pages from books, or have divorced the text from the original page altogether. We work within the confines of a page, although there are new initiatives that are challenging the way we think of text, particularly books. For an example of several modern text experiments, visit collection.eliterature.org.

The virtual world of Second Life also made an appearance within the first keynote address. The world’s first virtually authored and published book, Anima, was released earlier in 2007, and can be read by one’s character (avatar) in-world, or can actually be printed as hard-copy should the reader wish it. With a population of over ten million avatars coupled with a strong library presence, Second Life has already become a new information portal which fosters creativity, actually enabling people to build. And one of the most popular prims (or constructions) is a book.

The moral to take away from here, then, is that the nature of books is changing, and that the change is happening now. Our definitions of how books are even subdivided are being driven by on-line tools like Google Books, which define sections as “snippets”, “divs”, “chunks” and the like, as opposed to the more traditional “chapters”. eBooks are constantly being redefined, and it is up to both author and publisher to decide what constitutes an eBook and how that product can facilitate both reading and research in a digital format.

With all of the new initiatives and vocabulary that seem to appear overnight, every night, one speaker who was responsible for managing an on-line archives project spoke of his “deep distrust of technology.” While on the surface this statement might appear ironic, it is not. Distrust does drive one to work harder to establish a better product, a better piece of software, or a better way of implementing technology in a way that is simple to use and does what it is supposed to do well. Enthusiasm tempered by an understanding of the fact that computer technology is complex should yield highly tested tools and environments for research.

Such tempered enthusiasm was voiced by the second keynote speaker, Lewis Lancaster, who has spent the past twenty years analyzing the tens of millions of characters within the Korean canon of Buddhist literature. With Chinese, each character contains its own metadata and is best observed as living in three dimensions, in what Lancaster calls an “event structure”. Texts are encoded with a who, where, and a when, not to mention the obvious “what”, all of which contribute to better understanding a piece of writing.

Lancaster spoke to the value of pattern recognition as a non-reading tool, looking graphically, visually for words, word-combinations, and appearances within various codices to discover/determine document authorship, as well as specific forms of Chinese Buddhist narrative structure. And by using digital tools to turn off dominant words, a new layer of meaning is then uncovered that might ordinarily have been overlooked. The richness of the text becomes apparent in extra-dimensional space, returning the reader to a state of exploring a well-known text for the very first time.
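For readers curious what "turning off dominant words" might look like in practice, here is a minimal sketch in Python. It is my own illustration, not Lancaster's tool: the sample sentence is invented, and it assumes English text split into words, whereas work on the Buddhist canon would operate on Chinese characters. It simply counts every word, sets aside the few most frequent, and shows what is left.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def without_dominant_words(text, top_n=3):
    """Set aside the top_n most frequent words and count the rest.

    Returns the set of dominant words and a Counter of what remains.
    """
    tokens = tokenize(text)
    counts = Counter(tokens)
    dominant = {word for word, _ in counts.most_common(top_n)}
    remaining = Counter(token for token in tokens if token not in dominant)
    return dominant, remaining

# Invented sample sentence, used only to exercise the function.
sample = "the monk and the king and the monk spoke of the river and the lotus"

dominant, remaining = without_dominant_words(sample, top_n=3)
print("Turned off:", sorted(dominant))
print("Now visible:", sorted(remaining))
```

With the three loudest words silenced, the quieter vocabulary (the king, the river, the lotus) is what the eye lands on. How many words to turn off is a judgment call, so top_n is left as a parameter.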

Lancaster also stressed community in research as being all-important and a goal the delegates to this meeting of the DHCS should both share and work towards. This kind of real-time social networking must carry into on-line, open source work, where, regardless of what we study, we can all contribute to the creation of universal tools for any language and any text in anticipation of producing better comprehension and understanding of how we got here from there over time, and via thought.

How does this apply to Classicists?

All of the above tools and theories are scalable to any humanistic discipline, perhaps chief of which (at least in Western scholarship) is Classics. With these new tools, we can re-assess what we know about Homeric epic, re-interpret the poems of Catullus, and study primary texts (or at least the clay, pottery, or papyrus which approaches the primary) to better understand the literature within a historic, social, and economic context.

All of these tools are free, and many are simple to use. It then becomes the responsibility of Classicists, both students and teachers, both publishers and librarians, to find and then employ these tools to forward Classics scholarship, creating new avenues into close reading and meaning, understanding language, culture, and the minds within that culture that created what we today read and analyze. When incorporated into the curriculum, these tools and theories will follow students into future years of scholarship, forging a more critical eye while encouraging minds that are open to any possibility. This kind of free-thinking, open-source world that we as a species have already entered into can only grow, positively influencing scholarship into what it is supposed to be: an open and free advancement of communal learning based on shared research vetted by peers worldwide. In what approaches academic Utopia, data is no longer proprietary, but is part of the common good in the fact that it enables us to better understand ourselves and our history, one character at a time.

Watch this space for the Day 2 DHCS recap! More tagging to follow as I find time.

Comment by Laura Gibbs on October 22, 2007 at 8:10am: WOW, Andrew, thank you so much for this fascinating read (you made Monday morning FAR more interesting than it usually is).

Any chance you will pick up some gossip there about the disaster that has befallen Perseus in the past year? UChicago was supposed to be hosting the next version of Perseus but everywhere the online Perseus seems to be broken, not working, inaccessible, etc. I think for all of us, Perseus represented the great first wave of this kind of activity and it has been really sad to watch it teeter into (unplanned?) obsolescence over the past year or so.
