Dataset - B2FIND

Corpus of contemporary blogs

At the NLP Centre, text is currently divided into sentences by a tool that uses a rule-based system. In order to make enough training data for machine learning, annotators...

COSTRA 1.0: A Dataset of Complex Sentence Transformations

COSTRA 1.0 is a dataset of Czech complex sentence transformations. The dataset is intended for the study of sentence-level embeddings beyond simple word alternations or standard...

2 datasets found