Tag Archives: Text Reuse

ADHO Award 2016

Just like the happy ending of a fairy tale: the Franzini sisters win an ADHO Bursary Award of 2016 for research on the Grimm brothers!

Once upon a time there were two sisters, Greta and Emily. The sisters lived in an old building called Heyne Haus in the small German town of Göttingen. As Digital Humanities elves, they busied themselves looking for something to keep them occupied indoors during the cold winter of 2015. And like the Grimm brothers, who had lived in that very same town two hundred years before them, they began, together with a group of loyal companions, collecting stories and motifs. Fairy tale motifs, to be precise…

The Digital Breadcrumbs of Brothers Grimm project, which began in October 2015, collects folktale motifs and automatically detects them as text reuse units, or minimal primitives. With this project, Emily, Greta and their colleagues (who make up the early career research group eTRAP) are addressing two specific challenges in their field: text reuse detection at scale and cross-lingual text reuse detection. They have already experimented with the former and shown good results (the DH 2016 slides and poster can be downloaded from their website). The latter challenge is still ongoing: they are manually collecting the data needed to train TRACER (a text reuse detection engine comprising 700 different algorithms) and other software to detect motifs across multiple languages. For this challenge, they have selected three Grimm fairy tales to work with: Snow White, Puss in Boots and The Fisherman and his Wife. To push their research forward, the sisters and their team find and read as many versions of these tales as they can (original versions rather than translations, both predating and following the Grimm collection), collect the motifs therein and add them to a matrix that maps languages against one another.

In the second stage of the project, software tools will use this multilingual dataset to automatically find other matches at web scale. The dataset will also be integrated with existing ontological resources, leading to an exploration of folktale motifs as Linked Open Data. With the Digital Breadcrumbs of Brothers Grimm project, Emily, Greta and their team are not only advancing research in automatic text reuse detection but also helping folklorists and literary scholars tackle the large amount of folkloristic material now available online.

For their ideas and work, the Franzini sisters have been awarded the ADHO Bursary Award of 2016. The award is given to promising young scholars of the Digital Humanities who make a valuable new contribution to the field. The ceremony took place during this year’s Digital Humanities Conference in Kraków. Greta has been exploring this area of study since 2009, when she began a Master’s in Digital Humanities at King’s College London (KCL). She is now in the final stage of a PhD in Digital Humanities at University College London (UCLDH), while working as a Research Associate in the Institute of Computer Science at Göttingen University. Emily came to the field later, when she was hired as a Researcher at the Humboldt Chair of Digital Humanities in Leipzig in 2013; she later began working as a Research Associate at the Institute of Computer Science in Göttingen.


AIUCD 2016, Venice

We’re very pleased to announce that eTRAP will be giving a text reuse tutorial at the annual conference of the Italian Association for Digital Humanities (AIUCD) in Venice, Italy, this coming September! It is the only tutorial at the conference and will run on 6th and 7th September at Ca’ Foscari University.

The tutorial builds on eTRAP’s research activities, most of which deploy Marco Büchler’s TRACER tool. TRACER is a suite of algorithms aimed at investigating text reuse in multifarious corpora, whether prose or poetry, Italian or medieval German. TRACER provides researchers with statistical information about the texts under investigation, and its integrated reuse visualiser, TRAViz, displays the reuses in a more readable format for further study.
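TRACER’s own pipeline is not described in this post. Purely as a hypothetical illustration of the kind of comparison that text reuse detection builds on, the Python sketch below scores two passages by the overlap of their word bigrams (Jaccard similarity). This is an assumption made for illustration, not TRACER’s algorithm; the function names and the bigram setting are our own choices.

```python
import re


def word_ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Lower-case the text, split it into words and return its set of word n-grams."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def jaccard_overlap(a: str, b: str, n: int = 2) -> float:
    """Shared n-grams divided by all distinct n-grams (0.0 = nothing shared, 1.0 = same set)."""
    grams_a, grams_b = word_ngrams(a, n), word_ngrams(b, n)
    if not grams_a and not grams_b:
        # Neither passage is long enough to form an n-gram: report no overlap.
        return 0.0
    return len(grams_a & grams_b) / len(grams_a | grams_b)


if __name__ == "__main__":
    source = "Mirror, mirror on the wall, who is the fairest of them all?"
    reuse = "Mirror, mirror on the wall, who in this land is fairest of all?"
    print(round(jaccard_overlap(source, reuse), 2))  # prints 0.35
```

For the two versions of the mirror verse, six bigrams are shared out of seventeen distinct ones, giving 6/17. Dedicated text reuse tools typically add steps such as lemmatisation and more tolerant matching, which this sketch leaves out.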

This tutorial seeks to teach participants to understand, use and run TRACER independently. For the purpose of the tutorial, and to ensure the smoothest possible outcome, participants will initially work on data-sets provided by eTRAP. Depending on overall progress, we may also allocate some time to investigating participants’ own data-sets, provided these comply with the TRACER format [1].

The workshop will be conducted in English. An Italian version of the tutorial flyer is also available. For more information about previous editions of this tutorial, visit our Events page.

Eligibility & Requirements

If you’re interested in exploring text reuse between two or more texts (in the same language) and would like to learn how to do so semi-automatically, then this tutorial is for you. In order to provide everyone with adequate (technical) assistance, the workshop can only accommodate 12 participants. To apply for the tutorial, please send your CV and a motivation letter to etrap-applications(at)gcdh(dot)de by 31st July 2016. Those accepted will have to register for the AIUCD conference.

We look forward to seeing you in Venice!


[1] Should you be interested in investigating your own texts, please email us at the address above so that we can send you the requirements.

Current open paid positions for Student Assistants

Two Transcribers wanted!

(Targeted at students of German Literature or other Humanities subjects)

The early career research group eTRAP is looking for Student Assistants. The research group is associated with the Institute of Computer Science and operates from the Göttingen Centre for Digital Humanities (GCDH). Further information about the research group and its work can be found at https://www.etrap.eu.

Job description
We are looking for applicants interested in joining the research group on TrAIN, a new project that was recently awarded €20,000 by the University of Göttingen. TrAIN, which stands for Tracing Authorship in Noise, will run for six months from 1st June 2016. The aim of the project is to obtain digital, searchable copies of the original correspondence of the Grimm brothers, the famous authors of the Kinder- und Hausmärchen. The digital copies will be obtained in two different ways: with an HTR (Handwritten Text Recognition) tool and with multiple OCR (Optical Character Recognition) tools. The output will then be used to further research in the fields of stylometry and authorship attribution.
We are hiring two students for three months (the contract is extendable) who will act as the team’s transcribers. They will work with Transkribus, an HTR tool used to transcribe handwritten texts.


DH2016, Kraków

eTRAP will be attending the annual Digital Humanities Conference in Kraków, Poland, with three contributions:

  • A joint panel with colleagues from Finland and Estonia on digital folkloristics (15th July);
  • A poster on research progress concerning our Digital Breadcrumbs of Brothers Grimm project (13th July);
  • A full-day text reuse tutorial aimed at teaching participants how to run our TRACER tool (11th July).

We’re very happy that all three proposals were accepted, as each approaches our research from a different angle: during the panel, we will discuss our work in relation to other initiatives in digital folkloristics; the poster will provide a snapshot of the project as a whole; and the tutorial will give an insight into part of our research methodology, which employs a powerful text reuse engine called TRACER.

We look forward to sharing our progress in Kraków and, we hope, to seeing you there!

 

Grant awarded!

We are very pleased to announce that eTRAP has been awarded a €20,000 grant from the University of Göttingen for a six-month pilot project. The project, TrAiN (Tracing Authorship in Noise), seeks to investigate the complex relationship between noisy OCR’d data and automatic text analyses. In particular, we will investigate, and attempt to define, the maximum noise threshold that still allows us to conduct adequate authorship and text reuse analyses on a number of texts selected for this study. Our research questions are: at what point does OCR/HTR noise interfere with the automatic identification of stable linguistic and stylistic markers? What is the minimum amount of noise we need to correct?
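One common way to put a number on OCR/HTR noise, assumed here rather than taken from the project description, is the character error rate (CER): the Levenshtein edit distance between a tool’s output and a ground-truth transcription, divided by the length of the ground truth. The Python sketch below is a minimal, hypothetical illustration of that measure; the function names are our own.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if the characters match)
            ))
        previous = current
    return previous[-1]


def character_error_rate(ground_truth: str, ocr_output: str) -> float:
    """Edit distance normalised by the length of the ground-truth transcription."""
    if not ground_truth:
        raise ValueError("ground truth must not be empty")
    return levenshtein(ground_truth, ocr_output) / len(ground_truth)


if __name__ == "__main__":
    # Hypothetical example: the OCR output has lost the umlaut in one word.
    print(round(character_error_rate("Kinder- und Hausmärchen", "Kinder- und Hausmarchen"), 4))  # prints 0.0435
```

A threshold study along the lines described above would then compare authorship or text reuse results across versions of a text with increasing CER; the sketch itself does not say where that threshold lies.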

The project includes a joint research workshop with stylometry experts to optimise existing algorithms, and to exchange ideas and knowledge.

Congratulations, team!

Project Co-PIs: Marco Büchler, Greta Franzini, Emily Franzini, Gabriela Rotari, Maria Moritz.

JDMDH Special Issue: Call for Contribution

JDMDH Call for Contribution: Special Issue on Computer-Aided Processing of Intertextuality in Ancient Languages

“Europe’s future is digital.” This was the headline of a speech given at the Hannover exhibition in April 2015 by Günther Oettinger, EU Commissioner for Digital Economy and Society. While businesses and industries have already made major advances in digital ecosystems, the digital transformation of texts stretching over a period of more than two millennia is far from complete. On the one hand, mass digitisation leads to an “information overload” of digitally available data; on the other, the “information poverty” embodied by the loss of books and the fragmentary state of ancient texts gives an incomplete and biased view of our past. In a digital ecosystem, this coexistence of data overload and data poverty adds considerable complexity to scholarly research.


DH Estonia 2015: attend the eTRAP Workshop!

We will be giving a workshop on text reuse at the Translingual and Transcultural Digital Humanities Conference in Estonia, 19-21 October 2015!
Day of workshop: Wednesday 21 October 2015

Find the full announcement below:
—————————————————————————————————————————————–

eTRAP (Electronic Text Reuse Acquisition Project) is an Early Career Research Group funded by the German Federal Ministry of Education and Research (BMBF) and based at the Göttingen Centre for Digital Humanities at the University of Göttingen. The research group, which started on 1st March 2015, was awarded €1.6 million and runs for four years. As the name suggests, this interdisciplinary team studies the linguistic and literary phenomenon of text reuse, with a particular focus on historical languages. More specifically, we look at how ancient authors copied, alluded to, paraphrased and translated each other as they spread their knowledge in writing. This early career research group seeks to provide a basic understanding of the (historical) text reuse methodology, as distinct from plagiarism, and in doing so to study what defines text reuse, why some people reuse information, how text is reused and how this practice has changed over history.


Hackathon on Text Re-Use

Digital Humanities Hackathon on Text Re-Use

‘Don’t leave your data problems at home!’

27-31 July, 2015


Hosted by the Göttingen Centre for Digital Humanities (GCDH), Georg-August-Universität Göttingen, Germany
Organised by:  Franzini, Greta Franzini and Maria Moritz

The Göttingen Centre for Digital Humanities will host a Hackathon targeted at students and researchers with a humanities background who wish to improve their computer skills by working with their own data-set. Rather than teaching everything there is to know about algorithms, the Hackathon will help participants with their specific data-related problems, so that they can take away the knowledge needed to tackle the issue(s) at hand. The Hackathon focuses on automatic text re-use detection and aims to engage participants in intensive collaboration. Participants will be introduced to state-of-the-art technologies in the field and shown the potential of text re-use detection. They will also gain the knowledge needed to make sense of the output generated by text re-use detection algorithms, and an understanding of which algorithms best fit certain types of textual data. Finally, participants will be introduced to some text re-use visualisations.

