10:00-12:00: General introduction. Each participant presents their data and research questions (5 minutes each). Participants with similar research questions, data or problems can then choose to sit and work together. After the introductions, participants learn how to use the command line/terminal to run a basic "Hello World!" program. For those who had trouble installing Java (see Technical Instructions below), eTRAP will help fix the problem before the afternoon session.
12:00-13:00: Lunch. Participants can either try the University canteen or visit nearby snack bars. Information about food options will be provided upon arrival at GCDH.
13:00-14:00: Introduction to text reuse. Theory: slide presentation by Marco Büchler.
14:00-14:30: Break. Snacks and beverages provided by eTRAP at GCDH.
14:30-16:30: Hacking - Preprocessing. Practice: in this session participants will preprocess their texts.
16:30-17:00: Buffer time to accommodate any delays or problems encountered during the session.
9:00-10:30: Hacking - Featuring
11:00-12:30: Hacking - Featuring/Selection
12:30-13:30: Lunch
13:30-16:00: Hacking - Selection
16:00-17:00: Break & buffer time to accommodate problems or delays.
9:00-10:30: Hacking - Linking
11:00-12:30: Hacking - Scoring
12:30-13:30: Lunch
13:30-16:00: Hacking - Scoring - Complete runs
9:00-10:30: Hacking - Running text reuse algorithms on a big data-set
11:00-12:30: Visualising text reuse with Gnuplot.
12:30-13:30: Lunch
13:30-16:00: Conclusion and presentation of results
9:00-10:30: Installation of TRAViz and data import. Tutorial led by Stefan Jänicke.
11:00-12:30: Applying TRAViz to your data
12:30-13:30: Lunch
13:30-15:00: Common discussion while snacking. Discussion on improvements and needs for visualising text reuse.
15:00-16:30: Feedback. Participants are asked to fill in a Google Form with comments about their Hackathon experience.
16:30-17:30: Packaging data for optional publication on ChallengePost.
As previously announced, here are two technical tasks we ask you to complete in preparation for the Hackathon. The software we'll be using requires some minor installations, and your data must follow a specific format.
1. Java installation
The software we'll be using requires that your computers have the Java JDK 8 package installed. You can download Java JDK 8 for your operating system from here.
Installing Java is straightforward. However, should you need more detailed instructions, you can visit:
Please email us if you have trouble getting your Java running.
2. Data preparation
Participants are NOT required to bring their own data, but if they wish to, the data must meet the following requirements:
The data has to be in a flat-file format without any markup or style (= .txt format). An ordinary UTF-8 or UTF-16 editor should be able to read your data. Each sentence (by 'sentence' we mean a text snippet that ends with a full stop) should appear on a separate line. Start each line with a unique numeric ID, followed by one tab and then by the sentence itself, as follows:
1\tabThis is my first sentence.
2\tabThis is my second sentence.
…
Make sure that the tab is an actual tab character, NOT one or more spaces. If you want to run global queries on your corpus, please make sure all your data is in a single file. If you want to do independent re-use discovery, you can split your data between files accordingly. An identifier may be unique across all files, but it HAS to be unique within each file.
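If you would like to check a file against these rules before the Hackathon, the following Python sketch may help. It is not part of the eTRAP software; the function name `validate_lines` and the file name `corpus.txt` are assumptions made for illustration. It reports lines that lack a real tab, have a non-numeric or repeated ID, or have no text after the tab.

```python
import re


def validate_lines(lines):
    """Return a list of (line_number, message) problems found in the lines."""
    problems = []
    seen_ids = set()
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line:
            problems.append((number, "empty line"))
            continue
        # Split at the first real tab character; spaces do not count.
        ident, sep, sentence = line.partition("\t")
        if not sep:
            problems.append((number, "no tab between ID and text"))
            continue
        if not re.fullmatch(r"[0-9]+", ident):
            problems.append((number, "ID is not numeric"))
        elif ident in seen_ids:
            # IDs are compared as written, so "1" and "01" count as different.
            problems.append((number, "duplicate ID " + ident))
        else:
            seen_ids.add(ident)
        if not sentence.strip():
            problems.append((number, "no text after the tab"))
    return problems


if __name__ == "__main__":
    # "corpus.txt" is a hypothetical file name; replace it with your own.
    try:
        with open("corpus.txt", encoding="utf-8") as handle:
            for number, message in validate_lines(handle):
                print("line {}: {}".format(number, message))
    except FileNotFoundError:
        print("corpus.txt not found")
```

The check runs on one file at a time, which matches the rule that IDs only have to be unique within each file.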
If sentence segmentation does not work for your data, you can alternatively segment by manuscript line, verse or paragraph.
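As a rough illustration of how plain text could be brought into the ID, tab, text layout described above, here is a minimal Python sketch. It is not an eTRAP tool; the function name `to_id_lines` and its `unit` parameter are assumptions made for this example. The sentence splitter is deliberately naive (it breaks after every full stop followed by whitespace), so abbreviations such as "e.g." will be split incorrectly and the output should be checked by hand.

```python
import re


def to_id_lines(text, unit="sentence"):
    """Segment text and return lines of the form '<ID><tab><segment>'."""
    if unit == "sentence":
        # Naive rule: a segment ends at a full stop followed by whitespace.
        segments = re.split(r"(?<=\.)\s+", text)
    elif unit == "line":
        # One segment per manuscript line or verse.
        segments = text.splitlines()
    elif unit == "paragraph":
        # Paragraphs are assumed to be separated by blank lines.
        segments = re.split(r"\n\s*\n", text)
    else:
        raise ValueError("unit must be 'sentence', 'line' or 'paragraph'")
    # Collapse internal whitespace (including stray tabs and line breaks).
    cleaned = [" ".join(segment.split()) for segment in segments]
    cleaned = [segment for segment in cleaned if segment]
    # Number the segments from 1 and join ID and text with a real tab.
    return ["{}\t{}".format(i, s) for i, s in enumerate(cleaned, start=1)]


if __name__ == "__main__":
    sample = "This is my first sentence. This is my second sentence."
    for line in to_id_lines(sample):
        print(line)
```

Passing `unit="line"` or `unit="paragraph"` switches to the alternative segmentations; the numbering restarts at 1 on every call, so IDs are unique within the file produced by one call.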
***
FOR WINDOWS USERS ONLY
Please install a UTF-8 editor such as PSPad or EditPad Lite. The latter is better suited for large files.
***
Is anything unclear, or are you having trouble? Please send all your comments and concerns to the eTRAP team at:
etrap-applications(at)gcdh(dot)de