Research

These are eTRAP’s research projects. For more information about the projects, follow the links provided.

Digital Breadcrumbs of Brothers Grimm | Tracing Authorship In Noise (TrAIN) | TRACER

The Digital Breadcrumbs of Brothers Grimm (DBBG) project seeks to advance research in cross-lingual text reuse detection and in text reuse detection at scale. Its case study is a set of fairy tale collections from different cultures and languages. Within them, we identify the units (or primitives) of text reuse, the motifs, and manually compile a motif matrix that collects training data and maps languages to one another for further processing. By collecting multilingual tales we can train our software (TRACER and others) to detect motifs automatically, at scale and across languages, thus contributing both humanistic and computational experience to the rapidly evolving field of Computational Folkloristics. For more information about the project, click here.

TrAIN (the acronym refers to the algorithm training required for this research) seeks to investigate the complex relation between noisy digitised data and automatic text analyses by way of a local case study: the vast letter collection of the Grimm Brothers, available both as digital images of the original manuscripts and as a printed edition (see below). In particular, we will compare the output of handwritten text recognition (HTR) on the original letters with the output of optical character recognition (OCR) on the printed edition, and investigate two common scholarly tasks: text reuse detection and authorship attribution. For more information about TrAIN, click here.

TRACER is a suite of 700 algorithms whose features can be combined to create the optimal formula for detecting words, sentences and ideas that have been reused across texts. TRACER is designed to facilitate research in text reuse detection, and many researchers have used it to identify plagiarism, verbatim and near-verbatim quotations, paraphrase and even allusions. The thousands of feature combinations that TRACER supports make it possible to investigate not only contemporary texts but also complex historical texts, where reuse is harder to spot. For more information about TRACER, click here.