Orthography-based dating and localisation of Middle Dutch charters

In this study we build models for the localisation and dating of Middle Dutch charters. First, we extract character trigrams and use them to train a machine learning classifier (k-nearest neighbours) and an author verification algorithm (Linguistic Profiling). Both approaches work quite well, especially for the localisation task. Next, we attempt to derive features that capture the orthographic variation between the charters more precisely, and use these as input for the same classification algorithms. The results are again good, at least as good as with the trigrams, even though proper nouns were ignored during feature extraction. We conclude that localisation, and to a lesser extent dating, is feasible. Moreover, the orthographic features we derive from the charters are an efficient basis for such a classification task.
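The trigram-based approach described above can be illustrated with a minimal sketch. It is not the thesis code: the function names, the cosine similarity measure, the value of k, and the toy texts and labels (`place_A`, `place_B`) are all assumptions made for illustration, and Linguistic Profiling is not covered.

```python
from collections import Counter
from math import sqrt


def char_trigrams(text):
    """Count overlapping character trigrams in a lowercased text."""
    s = text.lower()
    return Counter(s[i:i + 3] for i in range(len(s) - 2))


def cosine(a, b):
    """Cosine similarity between two sparse trigram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items() if gram in b)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, text, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = char_trigrams(text)
    ranked = sorted(
        train,
        key=lambda item: cosine(query, char_trigrams(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Invented toy snippets with hypothetical location labels (not real charter data).
train = [
    ("wi scepenen van der stede doen cont", "place_A"),
    ("wi scepenen van der stat doen cont", "place_A"),
    ("wy schepenen vander stad doen kond", "place_B"),
    ("wy schepenen vander stad maken kond", "place_B"),
]

print(knn_predict(train, "wi scepenen doen cont", k=3))
```

The same scheme could be applied to dating by replacing the location labels with period labels; the choice of similarity measure and of k would need to be tuned on real data.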

One file (PDF) contains the text of the master's thesis; the other file (.tar.gz) contains all the data sets and analysis scripts used.

DOI https://doi.org/10.23728/b2share.b1092be3cd4844e0bffd7b669521ba3c
PID http://hdl.handle.net/11304/3720bb44-831c-48f3-9847-6988a41236e1
Source https://b2share.eudat.eu/api/records/b1092be3cd4844e0bffd7b669521ba3c
Metadata Access https://b2share.eudat.eu/api/oai2d?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:b2share.eudat.eu:b2rec/b1092be3cd4844e0bffd7b669521ba3c
Creator Dieter Van Uytvanck
Publisher CLARIN
Publication Year 2017
Rights info:eu-repo/semantics/openAccess
OpenAccess true
Language English
Resource Type Text
Discipline Linguistics