A linguistic analyser for tagging, lemmatisation and parsing of Scottish Gaelic texts. Morphological and syntactic analyses are available directly from the webpage (through the text area window) or as a web service. A simple tagger option using a restricted tagset is also provided.
LANGUAGE DATA
The tagger was trained on the ARCOSG corpus (https://github.com/Gaelic-Algorithmic-Research-Group/ARCOSG) using Conditional Random Fields with scikit-learn (https://scikit-learn.org). The lemmatiser was built on top of a lexicon provided by Michael Bauer and Will Robertson (www.faclair.com). The integrated UDPipe parser (http://ufal.mff.cuni.cz/udpipe) was trained with the link2 option on Colin Batchelor's UD Gaelic Treebank (https://universaldependencies.org/).
OUTPUT FORMAT
Vertical tabular:
- simple tabbed text for direct html page results,
- simple tabbed text file or CoNLL-U file for web service results.
Grammatical information is encoded using the ARCOSG tagset and the UD tagset.
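As an illustration of how the CoNLL-U output could be consumed, here is a minimal Python sketch that reads CoNLL-U text into one list of token records per sentence. The ten column names follow the standard CoNLL-U layout. The function name parse_conllu and the sample sentence are hypothetical: the sample was written by hand for this example and is not actual output of the analyser, and its tags and relations are only indicative.

```python
# Standard CoNLL-U column names, in order.
CONLLU_FIELDS = [
    "id", "form", "lemma", "upos", "xpos",
    "feats", "head", "deprel", "deps", "misc",
]


def parse_conllu(text):
    """Return a list of sentences, each a list of token dicts."""
    sentences = []
    current = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Comment lines (e.g. "# text = ...") carry no token data.
            continue
        columns = line.split("\t")
        if len(columns) != len(CONLLU_FIELDS):
            raise ValueError(f"expected 10 tab-separated columns: {line!r}")
        current.append(dict(zip(CONLLU_FIELDS, columns)))
    if current:
        sentences.append(current)
    return sentences


# Hand-written sample (an assumption, not analyser output); "_" marks
# fields left unspecified here.
SAMPLE_ROWS = [
    ["1", "Tha", "bi", "VERB", "_", "_", "0", "root", "_", "_"],
    ["2", "mi", "mi", "PRON", "_", "_", "1", "nsubj", "_", "_"],
    ["3", "sgìth", "sgìth", "ADJ", "_", "_", "1", "xcomp:pred", "_", "_"],
]
SAMPLE = "# text = Tha mi sgìth\n" + "\n".join(
    "\t".join(row) for row in SAMPLE_ROWS
) + "\n\n"

if __name__ == "__main__":
    for token in parse_conllu(SAMPLE)[0]:
        print(token["form"], token["lemma"], token["upos"])
```

The sketch does not treat multiword-token ranges or empty nodes specially; they are simply kept as ordinary rows.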
EVALUATION
Full tagger accuracy of 90.7% (measured on about 4.6% of the ARCOSG corpus)
Simple tagger accuracy of 94.7% (measured on about 4.6% of the ARCOSG corpus)
Lemmatisation and parsing have not been evaluated yet.
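For readers who want to reproduce a figure of this kind on their own held-out data, the following Python sketch computes tagging accuracy as the share of tokens whose predicted tag equals the gold tag. This assumes the accuracies above are token-level, which the description does not state explicitly. The function name tagging_accuracy and the tag values are hypothetical.

```python
def tagging_accuracy(gold_tags, predicted_tags):
    """Fraction of positions where the predicted tag matches the gold tag."""
    if len(gold_tags) != len(predicted_tags):
        raise ValueError("gold and predicted sequences differ in length")
    if not gold_tags:
        raise ValueError("cannot compute accuracy on an empty sequence")
    correct = sum(g == p for g, p in zip(gold_tags, predicted_tags))
    return correct / len(gold_tags)


if __name__ == "__main__":
    # Placeholder tags for illustration only.
    gold = ["A", "B", "C", "D"]
    predicted = ["A", "B", "C", "X"]
    print(f"{tagging_accuracy(gold, predicted):.1%}")
```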