4 datasets found

Keywords: under resourced language

  • AlbMoRe Movie Reviews in Albanian

    AlbMoRe is a sentiment analysis corpus of movie reviews in Albanian, consisting of 800 records in CSV format. Each record includes a text review retrieved from IMDb and...
  • OdiEnCorp 2.0

    We have collected English-Odia parallel data for NLP research on the Odia language. The data for the parallel corpus was extracted from existing parallel...
  • Oromo web corpus

    Oromo web corpus. Crawled by SpiderLing in January 2016. Encoded in UTF-8, cleaned, deduplicated.
  • Amharic Web Corpus

    Amharic web corpus. Crawled by SpiderLing in August 2013, October 2015, and January 2016. Encoded in UTF-8, cleaned, deduplicated. Tagged by TreeTagger trained on Amharic WIC...
You can also access this registry using the API (see API Docs).