Author Archives: Greta Franzini

TRACER tutorial, Rome 2017

We’re very pleased to announce that eTRAP will be giving a text reuse tutorial in collaboration with DiXiT at the annual conference of the Italian Association for Digital Humanities (AIUCD) in Rome, Italy, this coming January!

The tutorial will run on 23rd and 24th January at the Sapienza University in Rome.

The tutorial builds on eTRAP’s research activities, most of which deploy our TRACER machine. TRACER is a suite of algorithms aimed at investigating text reuse in multifarious corpora, be they prose or poetry, in Italian, Latin, Ancient Greek or medieval German. TRACER provides researchers with statistical information about the texts under investigation, and its integrated reuse visualiser, TRAViz, displays the reuses in a more readable format for further study.

This tutorial seeks to teach participants to independently understand, use and run TRACER. For the purpose of the tutorial and to ensure the smoothest possible outcome, participants will initially be working on an English data-set provided by eTRAP. Depending on the overall progress, we may also allocate some time to investigating the participants’ own data-sets, provided these comply with the TRACER format. A detailed description of the tutorial can be DOWNLOADED HERE.

The workshop will be conducted in English, with assistance in Italian should it be necessary. For more information about previous editions of this tutorial, visit our Events page.

Eligibility, Requirements and Bursaries

If you’re interested in exploring text reuse between two or multiple texts (in the same language) and would like to learn how to do it semi-automatically, then this tutorial is for you. In order to provide everyone with adequate (technical) assistance, the workshop can only accommodate 12 participants. To apply to the tutorial, please send a short CV and a brief motivation letter to contact(at)etrap(dot)eu by 16th December 2016. Those accepted will have to register for the AIUCD conference at https://www.conftool.net/aiucd2017/

La Sapienza University offers travel bursaries to early career researchers who submit an abstract to the EADH day. Should you be eligible for the bursary and wish to attend our tutorial, you must submit both an abstract to EADH and a CV with motivation letter to eTRAP. You may also apply for the tutorial without an EADH submission, but in that case you will not be eligible for a bursary.

We look forward to seeing you in beautiful Rome!

Transkribus: A User Report

Melina Jander, an eTRAP Research Assistant, has written a short user report on our experience with the Handwritten Text Recognition (HTR) tool Transkribus. We’re currently using Transkribus as part of our pilot project TrAIN (Tracing Authorship In Noise), which aims at defining the noise threshold that affects computational analyses on HTR’d and OCR’d texts. How much noise do we have to correct? How much can we leave in?

The report describes progress made thus far. A second report, to be published in 2017, will cover Transkribus’ automation process on our data.

You can download the report from our Output page.

2016-11-04 Update: The Transkribus website advertises our user report here.

Book chapter: A Catalogue of Digital Editions

The latest Digital Humanities monograph published by Open Book Publishers, Digital Scholarly Editing: Theories and Practices, includes a chapter written by Greta together with Melissa Terras and Simon Mahony from the UCL Centre for Digital Humanities. The chapter is entitled A Catalogue of Digital Editions and reports on the ongoing project of the same name, which collects and analyses digital editions in an attempt to identify best practice in digital scholarly editing.

The volume is Open Access, so you can download it in full for free!

Greta will be presenting the Catalogue of Digital Editions project in the form of a poster at the upcoming Text Encoding Initiative conference in Vienna.

ADHO Award 2016

Just like the happy ending of a fairy tale: the Franzini sisters win an ADHO Bursary Award of 2016 for research on the Grimm brothers!

Once upon a time there were two sisters, Greta and Emily. The sisters lived in an old building called Heyne Haus in the small German town of Göttingen. As Digital Humanities elves they busied themselves to find something that would keep them occupied indoors during the cold winter of 2015. And like the Grimm brothers who lived in that very same town two hundred years before them, they started, together with a group of loyal companions, collecting stories and motifs. Fairy tale motifs to be precise…

The Digital Breadcrumbs of Brothers Grimm project, which began in October 2015, is collecting and automatically detecting folktale motifs as text reuse units or minimal primitives. In fact, with this project Emily, Greta and their colleagues (constituting the early career research group eTRAP) are addressing two specific challenges of their field: text reuse detection at scale and cross-lingual text reuse detection. While they’ve already experimented and shown good results for the former (download the DH 2016 slides and poster from their website here), the latter challenge is still ongoing, as they’re in the process of manually collecting the data needed to train TRACER, a text reuse detection engine comprising 700 different algorithms, and other software to detect motifs across multiple languages. For this challenge, they’ve selected three Grimm fairy tales to work with: Snow White, Puss in Boots and The Fisherman and his Wife. To push their research forward, the sisters and their team find and read as many versions of these tales as they can (original versions, not translations, both predating and following the Grimm collection), collect the motifs therein and add them to a matrix that maps languages against one another. In the second stage of the project, the software tools will use this multilingual dataset to automatically find other matches at web scale. Furthermore, the dataset will be integrated with existing ontological resources and will lead to an exploration of folktale motifs as Linked Open Data. With the Digital Breadcrumbs of Brothers Grimm project, Emily, Greta and their team are not only able to advance research in automatic text reuse detection, but can also support folklorists and literary scholars in tackling the large amount of folkloristic materials now available online.

For their ideas and work the Franzini sisters have been awarded the ADHO Bursary Award of 2016. The award is given to promising young scholars of the Digital Humanities who make a new valuable contribution to the field. The ceremony took place during this year’s Digital Humanities Conference in Kraków. Greta has been exploring this area of study since 2009 when she began a Master’s in Digital Humanities at King’s College London (KCL). She is now at the end of a PhD in Digital Humanities at University College London (UCLDH), while working as a Research Associate in the Institute of Computer Science at Göttingen University. Emily was introduced later to the field, when she was hired as a Researcher at the Humboldt Chair of Digital Humanities in Leipzig in 2013 and later began working as a Research Associate at the Institute of Computer Science in Göttingen.


AIUCD 2016, Venice

We’re very pleased to announce that eTRAP will be giving a text reuse tutorial at the annual conference of the Italian Association for Digital Humanities in Venice, Italy, this coming September! It’s the only tutorial of the conference and it will run on 6th and 7th September at the Ca’ Foscari University.

The tutorial builds on eTRAP’s research activities, most of which deploy Marco Büchler’s TRACER tool. TRACER is a suite of algorithms aimed at investigating text reuse in multifarious corpora, be they prose or poetry, in Italian or medieval German. TRACER provides researchers with statistical information about the texts under investigation, and its integrated reuse visualiser, TRAViz, displays the reuses in a more readable format for further study.

This tutorial seeks to teach participants to independently understand, use and run TRACER. For the purpose of the tutorial and to ensure the smoothest possible outcome, participants will initially be working on data-sets provided by eTRAP. Depending on the overall progress, we may also allocate some time to investigating the participants’ own data-sets, provided these comply with the TRACER format.¹

The workshop will be conducted in English. An Italian version of the tutorial flyer is available here. For more information about previous editions of this tutorial, visit our Events page.

Eligibility & Requirements

If you’re interested in exploring text reuse between two or more texts (in the same language) and would like to learn how to do it semi-automatically, then this tutorial is for you. In order to provide everyone with adequate (technical) assistance, the workshop can only accommodate 12 participants. To apply to the tutorial, please send your CV and a motivation letter to etrap-applications(at)gcdh(dot)de by July 31st, 2016. Those accepted will have to register for the AIUCD conference.

We look forward to seeing you in Venice!


¹ Should you be interested in investigating your own texts, please email us at the address above so that we can send you the requirements.

Article: Visual Text Analysis in Digital Humanities

Greta’s latest article “Visual Text Analysis in Digital Humanities”, co-authored with Stefan Jänicke, Muhammad Faisal Cheema and Gerik Scheuermann, has just been published in Computer Graphics Forum! Here is the abstract:

In 2005, Franco Moretti introduced Distant Reading to analyze entire literary text collections. This was a rather revolutionary idea compared to the traditional Close Reading, which focuses on the thorough interpretation of an individual work. Both reading techniques are the prior means of Visual Text Analysis. We present an overview of the research conducted since 2005 on supporting text analysis tasks with close and distant reading visualizations in the digital humanities. Therefore, we classify the observed papers according to a taxonomy of text analysis tasks, categorize applied close and distant reading techniques to support the investigation of these tasks and illustrate approaches that combine both reading techniques in order to provide a multi-faceted view of the textual data. In addition, we take a look at the used text sources and at the typical data transformation steps required for the proposed visualizations. Finally, we summarize collaboration experiences when developing visualizations for close and distant reading, and we give an outlook on future challenges in that research area.

Feedback welcome!

DH2016, Kraków

eTRAP will be attending the annual Digital Humanities Conference in Kraków, Poland, with three contributions:

  • A joint panel with colleagues from Finland and Estonia on digital folkloristics (15th July);
  • A poster on research progress concerning our Digital Breadcrumbs of Brothers Grimm project (13th July);
  • A full-day text reuse tutorial aimed at teaching participants how to run our TRACER tool (11th July).

We’re very happy that all three proposals were accepted at the conference as each approaches our research from a different angle: during the panel, we will discuss our work in relation to other initiatives in digital folkloristics; the poster will provide a snapshot of the project as a whole; and the tutorial will give an insight into part of our research methodology, which employs a powerful text reuse engine called TRACER.

We look forward to sharing our progress in Kraków and to seeing you, hopefully!

 

Grant awarded!

We are very pleased to announce that eTRAP has been awarded a €20,000 grant from the University of Göttingen for a six-month pilot project. The project, TrAiN (Tracing Authorship in Noise), seeks to investigate the complex relation between noisy OCR’d data and automatic text analyses. In particular, we will investigate and attempt to define the maximum noise threshold that will allow us to adequately conduct authorship and text reuse analyses on a number of texts selected for this study. Our research questions are: At which point does OCR/HTR noise interfere with the automatic identification of stable linguistic and stylistic markers? What is the minimum amount of noise we need to correct?
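To make the idea of a noise threshold more concrete, here is a minimal Python sketch. It is an illustration only and not the TrAiN methodology: the noise model (random letter substitutions), the function names `add_noise`, `char_error_rate` and `token_survival`, and the sample sentence are all assumptions made for this example. It corrupts a clean text at increasing rates and reports how many characters differ and how many word tokens survive unchanged, which is the kind of curve one might inspect when asking how much noise an analysis can tolerate.

```python
import random
import string


def add_noise(text, rate, rng):
    """Replace each letter, with probability `rate`, by a different random lowercase letter.

    This is a crude stand-in for OCR/HTR errors (hypothetical noise model).
    """
    out = []
    for ch in text:
        if ch.isalpha() and rng.random() < rate:
            choices = [c for c in string.ascii_lowercase if c != ch.lower()]
            out.append(rng.choice(choices))
        else:
            out.append(ch)
    return "".join(out)


def char_error_rate(clean, noisy):
    """Fraction of character positions that differ.

    Substitution-only noise keeps both texts the same length.
    """
    if len(clean) != len(noisy):
        raise ValueError("texts must have equal length")
    if not clean:
        return 0.0
    return sum(a != b for a, b in zip(clean, noisy)) / len(clean)


def token_survival(clean, noisy):
    """Fraction of whitespace-separated tokens that are unchanged."""
    clean_tokens, noisy_tokens = clean.split(), noisy.split()
    if not clean_tokens:
        return 1.0
    same = sum(a == b for a, b in zip(clean_tokens, noisy_tokens))
    return same / len(clean_tokens)


# Invented sample sentence; a fixed seed keeps runs repeatable.
rng = random.Random(42)
sample = "once upon a time there were two sisters"
for rate in (0.0, 0.05, 0.2, 0.5):
    noisy = add_noise(sample, rate, rng)
    print(rate, round(char_error_rate(sample, noisy), 3),
          round(token_survival(sample, noisy), 3))
```

Real OCR and HTR noise also includes insertions, deletions and merged or split words, so an actual study would need a richer noise model and an alignment-based error rate.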

The project includes a joint research workshop with stylometry experts to optimise existing algorithms, and to exchange ideas and knowledge.

Congratulations, team!

Project Co-PIs: Marco Büchler, Greta Franzini, Emily Franzini, Gabriela Rotari, Maria Moritz.

English translations of Pan Tadeusz: a comparison with TRACER

As announced in late summer 2015, eTRAP ran a text reuse workshop in Tartu, Estonia, to teach participants how to run TRACER, a text reuse tool developed by Marco to automatically identify similarities between texts. Some of our participants tested TRACER on sample data we provided (English translations of the Bible); others, like Jan Rybicki, Assistant Professor at the Institute of English Studies at the Jagiellonian University of Kraków and co-organiser of Digital Humanities 2016, brought their own datasets to experiment directly with ongoing research.

Jan has been working with seven English translations of Poland’s most significant Romantic epic poem, Pan Tadeusz by Adam Mickiewicz (1834). As an expert literary translator himself, Jan was interested in comparing these translations and seeing whether TRACER could reveal any particular relationships between their authors. The translations he analysed are:

  • Maude Ashurst Biggs, Master Thaddeus or the Last Foray in Lithuania, London 1885 (in Miltonian blank verse)
  • George Rapall Noyes, Pan Tadeusz, or the Last Foray in Lithuania. A Story of Life among Polish Gentlefolk, London & Toronto, New York 1917 (prose)
  • Watson Kirkconnell, Sir Thaddeus or Last Foray in Lithuania: a History of the Nobility in the Years 1811 and 1812 in Twelve Books of Verse, 1962 (verse, based on Noyes)
  • Kenneth R. Mackenzie, Pan Tadeusz or the Last Foray in Lithuania, a Tale of the Gentry in Years 1811 and 1812, London 1964 (iambic pentameter)
  • Marcel Weyland, Pan Tadeusz or the Last Foray in Lithuania, a Tale of the Gentry During 1811 – 1812, Blackheath, NSW 2004 (verse)
  • Leonard Kress, Pan Tadeusz or the Last Foray in Lithuania: a History of the Nobility in the Years 1811 and 1812 in Twelve Books of Verse, Philadelphia 2006 (10 syllables with 5 stresses, with alternating rhymes)
  • Christopher Adam Zakrzewski, Pan Tadeusz or the Last Foray in Lithuania: A Tale of the Minor Nobility in the Years 1811–1812, New York 2010 (prose)

After automatic lemmatisation of all the above texts, TRACER confirmed existing knowledge surrounding these texts but also provided a detailed overview of the degree of similarity between each pair of translations using its integrated TRAViz tool. Among other things, the fact that Kirkconnell based his verse translation on Noyes’ prose is very visible! Distant reading by TRACER also confirms that Kress’ translation differs from the others.

Jan also produced a more general view of the degrees of similarity between text pairs derived from TRACER with a Gephi network analysis (below).

Gephi network of Pan Tadeusz TRACER scores (by Jan Rybicki).
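For readers who want a feel for how pairwise similarity scores can feed a network view like the one above, here is a minimal Python sketch. It is not how TRACER computes its scores: it swaps in a simple Jaccard similarity over word trigrams, and the function names and toy sentences are invented for illustration. The output is an edge list with `Source`, `Target` and `Weight` columns, a CSV layout that Gephi can import.

```python
import csv
import io
import re
from itertools import combinations


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def pairwise_edges(texts, n=3):
    """Score every unordered pair of texts as (source, target, weight) rows."""
    sets = {name: shingles(text, n) for name, text in texts.items()}
    return [(x, y, jaccard(sets[x], sets[y]))
            for x, y in combinations(sorted(sets), 2)]


def edges_to_csv(edges):
    """Serialise edges as CSV text with Source, Target and Weight columns."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["Source", "Target", "Weight"])
    for source, target, weight in edges:
        writer.writerow([source, target, f"{weight:.3f}"])
    return buffer.getvalue()


# Invented toy sentences, not quotations from the translations.
toy_texts = {
    "A": "the old castle stood on the hill above the river",
    "B": "the old castle stood on the hill near the village",
    "C": "a young hunter walked through the forest at dawn",
}
edges = pairwise_edges(toy_texts, n=3)
print(edges_to_csv(edges))
```

In a graph built this way, texts that share many word sequences (here A and B) end up joined by heavy edges, while an unrelated text (C) is connected only by zero-weight edges.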

Jan’s experiments with English translations of Polish literature demonstrate the potential of TRACER for translation studies. We’re delighted to see this application of TRACER and look forward to hearing more about Jan’s research!

If you’d also like to run TRACER on your data, please contact Marco Büchler. We’d love to learn more about your research and to briefly describe your experience in a blogpost.

Cookie seminar: “How I became infected with the Indiana Jones virus”

On Thursday 12th November Marco will be giving a talk at the Göttingen Computer Science Cookie Seminar series entitled “Digital Humanities for Computer Scientists … or: How I became infected with the Indiana Jones virus”. Here is the abstract of his talk:

Many definitions have been formulated to describe the Digital Humanities, driven either by political interests or born out of one’s own approach to it. This cookie talk describes my understanding of the Digital Humanities as an IT person and aims to show what computer scientists can contribute to our cultural heritage. The talk summarises several applications and developments that have been designed by my teams and me since 2008.

Coordinates:

  • Location: Institute for Computer Science, Goldschmidtstraße 7, 37077 Göttingen, seminar room 0.101
  • Time: November 12th, 2015, 8 PM
  • Link: Cookie seminar

 

Indiana Jones on Atari 520ST. Source: Flickr. (CC BY 2.0, no changes made).