EnTam: An English-Tamil Parallel Corpus (EnTam v2.0)

EnTam is a sentence-aligned English-Tamil bilingual corpus that we collected from some of the publicly available websites for NLP research involving Tamil. A standard set of processing steps was applied to the raw web data before it was released as a sentence-aligned English-Tamil parallel corpus suitable for various NLP tasks. The parallel corpus includes texts from the Bible, cinema, and news domains.
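
As an illustration of what "sentence-aligned" means in practice, the sketch below pairs up the two sides of a parallel corpus line by line. It assumes the archive unpacks to two UTF-8 plain-text files with one sentence per line and matching line numbers; this record does not describe the file layout, so treat that as an assumption. The function name `read_parallel` and the file names in the usage comment are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, Tuple


def read_parallel(en_lines: Iterable[str],
                  ta_lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (english, tamil) sentence pairs from two line-aligned sources.

    Assumption: line i of the English side translates line i of the Tamil side.
    """
    for n, (en, ta) in enumerate(zip_longest(en_lines, ta_lines), start=1):
        # zip_longest pads the shorter side with None, which signals misalignment.
        if en is None or ta is None:
            raise ValueError(f"the two sides differ in length at line {n}")
        yield en.rstrip("\n"), ta.rstrip("\n")


# Hypothetical usage; the actual file names inside the archive may differ:
# with open("corpus.en", encoding="utf-8") as en, \
#         open("corpus.ta", encoding="utf-8") as ta:
#     for en_sent, ta_sent in read_parallel(en, ta):
#         ...
```

Failing loudly on a length mismatch is a deliberate choice here: a silently truncated pairing would shift every later sentence pair out of alignment.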

Identifier
PID http://hdl.handle.net/11234/1-1454
Related Identifier http://ufal.mff.cuni.cz/~ramasamy/parallel/html/
Metadata Access http://lindat.mff.cuni.cz/repository/oai/request?verb=GetRecord&metadataPrefix=oai_dc&identifier=oai:lindat.mff.cuni.cz:11234/1-1454
Provenance
Creator Ramasamy, Loganathan; Bojar, Ondřej; Žabokrtský, Zdeněk
Publisher Charles University, Faculty of Mathematics and Physics, Institute of Formal and Applied Linguistics (UFAL)
Publication Year 2014
Rights Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0); http://creativecommons.org/licenses/by-nc-sa/3.0/; PUB
OpenAccess true
Contact lindat-help(at)ufal.mff.cuni.cz
Representation
Language English; Tamil
Resource Type corpus
Format application/x-gzip; text/plain; charset=utf-8; downloadable_files_count: 1
Discipline Linguistics