Digital Humanities Hackathon on Text Re-use
"Don't leave your data problems at home!"

27-31 July 2015



Key information

This Hackathon is co-organised by the eTRAP research team:

Schedule

Monday 27th July

10:00-12:00: General introduction. Each participant presents their data and research questions (5 minutes each). Participants with similar research questions, data or problems can then choose to sit and work together. After the introductions, participants learn how to use the command line/terminal to run a basic "Hello World!" program. eTRAP will help anyone who had trouble installing Java (see Technical instructions below) to fix the problem before the afternoon session.

12:00-13:00: Lunch. Participants can either try the University canteen or visit nearby snack bars. Information about food options will be provided upon arrival at GCDH.

13:00-14:00: Introduction to text reuse. Theory: slide presentation by Marco Büchler.

14:00-14:30: Break. Snacks and beverages provided by eTRAP at GCDH.

14:30-16:30: Hacking - Preprocessing. Practice: in this session participants will preprocess their texts.

16:30-17:00: Buffer time to accommodate any delays or problems encountered during the session.

Tuesday 28th July

9:00-10:30: Hacking - Featuring

11:00-12:30: Hacking - Featuring/Selection

12:30-13:30: Lunch

13:30-16:00: Hacking - Selection

16:00-17:00: Break & buffer time to accommodate problems or delays.

Wednesday 29th July

9:00-10:30: Hacking - Linking

11:00-12:30: Hacking - Scoring

12:30-13:30: Lunch

13:30-16:00: Hacking - Scoring - Complete runs

Thursday 30th July

9:00-10:30: Hacking - Running text reuse algorithms on a big data-set

11:00-12:30: Visualising text reuse with Gnuplot.

12:30-13:30: Lunch

13:30-16:00: Conclusion and presentation of results

Friday 31st July

9:00-10:30: Installation of TRAViz and data import. Tutorial led by Stefan Jänicke.

11:00-12:30: Applying TRAViz to your data

12:30-13:30: Lunch

13:30-15:00: Common discussion while snacking. Discussion on improvements and needs for visualising text reuse.

15:00-16:30: Feedback. Participants are asked to fill in a Google Form with comments about their Hackathon experience.

16:30-17:30: Packaging data for optional publication on ChallengePost.

Technical instructions

As previously announced, here are two technical tasks we ask you to complete in preparation for the Hackathon. The software we'll be using requires some minor installations and your data should meet a certain format.

1. Java installation

The software we'll be using requires that your computers have the Java JDK 8 package installed. You can download Java JDK 8 for your operating system from here.
Installing Java is straightforward. However, should you need more detailed instructions, you can visit:

Please check that Java was installed successfully as follows:
  1. Open your command line or terminal.
  2. Make sure you are at your user directory. You should be in your user directory by default. This is what your terminal window should look like:



    If, for instance, your terminal opens in 'Downloads' instead, move up to your home folder by typing "cd ../" (without the quotation marks) and pressing ENTER:



  3. Once you're in your home directory, type "java" (without the quotation marks) and press ENTER. If the command is not found, go to point 5.
  4. Then type "javac" (without the quotation marks) and press ENTER. If the command is not found, you need to add another small entry to your user path (see point 5 again).
  5. This video shows you what to do if problems occur. Because Windows computers are more prone to this problem, the video is a Windows guide. Video guides for other operating systems are available on YouTube.
  6. If both commands instead print a long list of parameters, your installation was successful.
  7. To double-check the installation, type "java -version" (without the quotation marks) and press ENTER. The terminal should report a version number such as "1.8.0_45", the same number that appears in the name of your newly installed Java directory or of the .exe you ran to install it.
And for those who want to go the extra mile:
  1. You can compile and run a first Java program (programming is NOT the purpose of the Hackathon!).
  2. Copy the file (Test1.java) to your user directory.
  3. In the command line interface, within your user directory, type "javac Test1.java" (without the quotation marks). Confirm with ENTER.
  4. Then type "java Test1" (without the quotation marks). Confirm with ENTER.
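The Test1.java file itself is not reproduced on this page. As a minimal stand-in, assuming the file is a simple "Hello World!" program like the one covered on Monday, something along these lines should compile and run with the two commands above. The method name greeting and the extra version line are illustrative additions, not part of the original file.

```java
// Test1.java: a hypothetical stand-in for the file provided by the organisers.
class Test1 {

    // Returns the text that main prints; kept separate so it is easy to check.
    static String greeting() {
        return "Hello World!";
    }

    public static void main(String[] args) {
        System.out.println(greeting());
        // Also report which Java runtime executed the program, e.g. 1.8.0_45.
        System.out.println("Java version: " + System.getProperty("java.version"));
    }
}
```

If "javac Test1.java" finishes without any message and "java Test1" prints the greeting, both the compiler and the runtime are reachable from your user path.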

Please email us if you have trouble getting your Java running.

2. Data preparation

Participants are NOT required to bring their own data, but those who wish to do so must make sure it meets the following requirements:

The data has to be in a flat-file format without any markup or styling (i.e. plain .txt). An ordinary UTF-8 or UTF-16 editor should be able to read your data. Each sentence (by 'sentence' we mean a text snippet that ends with a full stop) should appear on a separate line. Start each line with a unique numeric ID, followed by one tab and then by the sentence itself, as follows:

1\tabThis is my first sentence.
2\tabThis is my second sentence.

Here \tab stands for a single tab character. Make sure that the tab is an actual tab, NOT one or more white spaces. If you want to run global queries on your corpus, please make sure all your data is in a single file. If you want to do independent re-use discovery, you can split your data between files accordingly. An identifier may be globally unique, but it HAS to be unique within its own file.
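The rules above can be checked mechanically before the Hackathon. The following is only a sketch under stated assumptions, not a tool provided by eTRAP: the class name ReuseFormatCheck and its method are hypothetical, the file is assumed to be UTF-8, and IDs are compared as text (so "1" and "01" count as different IDs).

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper; the class and method names are illustrative only.
class ReuseFormatCheck {

    // Returns one message per problem found; an empty list means no problems.
    static List<String> check(List<String> lines) {
        List<String> problems = new ArrayList<>();
        Set<String> seenIds = new HashSet<>();
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i);
            int lineNo = i + 1;
            // Some Windows editors put an invisible byte-order mark at the file start.
            if (i == 0 && line.startsWith("\uFEFF")) {
                line = line.substring(1);
            }
            int tab = line.indexOf('\t');
            if (tab < 0) {
                problems.add("line " + lineNo + ": no tab between ID and text");
                continue;
            }
            String id = line.substring(0, tab);
            String text = line.substring(tab + 1);
            if (!id.matches("[0-9]+")) {
                problems.add("line " + lineNo + ": ID is not numeric: '" + id + "'");
            } else if (!seenIds.add(id)) {
                // Set.add returns false when the ID was already present in this file.
                problems.add("line " + lineNo + ": duplicate ID " + id);
            }
            if (text.trim().isEmpty()) {
                problems.add("line " + lineNo + ": no text after the tab");
            }
        }
        return problems;
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ReuseFormatCheck mydata.txt
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        List<String> problems = check(lines);
        for (String problem : problems) {
            System.out.println(problem);
        }
        System.out.println(problems.size() + " problem(s) found");
    }
}
```

Because uniqueness is only required within one file, the sketch is meant to be run once per file rather than on all files together.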

If sentence segmentation does not work for your data, you can alternatively segment by manuscript line, verse or paragraph.
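For running text, the one-unit-per-line layout can be produced with a few lines of code. The sketch below is a deliberately naive illustration, assuming that every full stop followed by white space ends a sentence (abbreviations such as "e.g." will therefore be split wrongly). The class name SentencePerLine and its method are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a naive splitter, not a real sentence segmenter.
class SentencePerLine {

    // Turns running text into lines of the form: numeric ID, tab, sentence.
    static List<String> toNumberedLines(String text) {
        List<String> lines = new ArrayList<>();
        int id = 1;
        // Split at white space that directly follows a full stop.
        for (String part : text.split("(?<=\\.)\\s+")) {
            // Collapse line breaks and tabs inside a sentence into single spaces.
            String sentence = part.replaceAll("\\s+", " ").trim();
            if (!sentence.isEmpty()) {
                lines.add(id + "\t" + sentence);
                id++;
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String sample = "This is my first sentence. This is my\nsecond sentence.";
        for (String line : toNumberedLines(sample)) {
            System.out.println(line);
        }
    }
}
```

To segment by manuscript line, verse or paragraph instead, the split pattern would be replaced by one that matches the corresponding line breaks; the numbering and the tab stay the same.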

*** FOR WINDOWS USERS ONLY
Please install a UTF-8 editor such as PSPad or EditPad Lite. The latter is better suited for large files.
***
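If you are unsure whether a file was really saved as UTF-8, a strict decoding pass will tell you. This is a minimal sketch, with a hypothetical class name Utf8Check. Note that it only tests for UTF-8, so a file saved as UTF-16 (which is also accepted for the Hackathon) will be reported as not valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper; the class and method names are illustrative only.
class Utf8Check {

    // Returns true if the bytes decode as UTF-8 without any malformed sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            // Thrown as soon as a byte sequence is not legal UTF-8.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Utf8Check mydata.txt
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(bytes) ? "valid UTF-8" : "not valid UTF-8");
    }
}
```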

Need help?

Is anything unclear, or are you having trouble? Please send all your comments and concerns to the eTRAP team at:

etrap-applications(at)gcdh(dot)de