Randomized extraction of the New Norwegian Corpus

Randomized extraction of the New Norwegian Corpus (Nynorskkorpuset).

Contains sentences in New Norwegian (Nynorsk) from the year 2000 onward. The data is tab-separated with one word per line; each word is lemmatized and morphologically tagged, and year and domain information is provided. Annotation was done with the Oslo-Bergen tagger. Sentences in the Bokmål standard have been removed.

This corpus is intended for use in the development of language technology.

Size: 3.3 million sentences, 57.5 million words.
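
The tab-separated, one-word-per-line layout described above could be read roughly as follows. This is only a sketch: the record does not state the column order or how sentence boundaries are marked, so the assumption here is that a blank line ends a sentence and that each row is kept as a raw list of fields. The function name `read_sentences` and the sample rows (including their tag values) are hypothetical illustrations, not taken from the corpus.

```python
import io
from typing import Iterable, Iterator, List

def read_sentences(lines: Iterable[str]) -> Iterator[List[List[str]]]:
    """Yield sentences as lists of token rows.

    Each token row is the list of tab-separated fields on one line.
    ASSUMPTION: a blank line separates sentences; check the actual
    files, since the record does not document the boundary marker.
    """
    sentence: List[List[str]] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            # Blank line: close the current sentence, if any.
            if sentence:
                yield sentence
                sentence = []
            continue
        sentence.append(line.split("\t"))
    # Emit the last sentence when the input does not end with a blank line.
    if sentence:
        yield sentence

# Hypothetical sample: word, lemma, tag (column order is an assumption).
sample = "Eg\teg\tpron\nles\tlese\tverb\n\nHo\tho\tpron\n"
sentences = list(read_sentences(io.StringIO(sample)))
for tokens in sentences:
    print([row[0] for row in tokens])
```

Because the rows are yielded lazily, the same function can be pointed at an open file handle (`open(path, encoding="utf-8")`) without loading all 57.5 million words into memory at once.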

Identifier
PID http://hdl.handle.net/11509/140
Related Identifier http://spraksamlingene.no/
Metadata Access https://repo.clarino.uib.no/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:repo.clarino.uib.no:11509/140
Provenance
Creator Gammeltoft, Peder
Publisher University of Bergen Library
Publication Year 2021
Rights Creative Commons - Attribution 3.0 Unported (CC BY 3.0); http://creativecommons.org/licenses/by/3.0/; CC
OpenAccess true
Contact clarin(at)uib.no
Representation
Language Norwegian Nynorsk; Nynorsk, Norwegian
Resource Type corpus
Format application/zip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics
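
The Metadata Access URL listed in this record is an OAI-PMH `GetRecord` request. As a minimal sketch, the same URL can be assembled from its parts with the Python standard library; the helper name `get_record_url` is hypothetical, and the base URL, metadata prefix, and identifier are copied from the record. No request is sent here.

```python
from urllib.parse import urlencode

# Base endpoint taken from the Metadata Access field of this record.
BASE_URL = "https://repo.clarino.uib.no/oai/request"

def get_record_url(identifier: str, metadata_prefix: str = "oai_dc") -> str:
    """Build an OAI-PMH GetRecord URL for one repository item."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": identifier,
    }
    # Keep ':' and '/' unescaped so the result matches the URL in the record.
    return BASE_URL + "?" + urlencode(params, safe=":/")

url = get_record_url("oai:repo.clarino.uib.no:11509/140")
print(url)
```

The resulting string can be passed to any HTTP client to retrieve the Dublin Core (`oai_dc`) metadata as XML.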