Does corpus size matter? Revisiting ENPC case studies with an extended version of the corpus

Signe Oksefjell Ebeling


The validity of contrastive findings based on material from small parallel corpora may be questioned, a concern we have been aware of ever since the English-Norwegian Parallel Corpus (ENPC) and the English-Swedish Parallel Corpus (ESPC) were compiled some 20 years ago. Recently, the ENPC has been expanded into the ENPC+, which holds bidirectional translation data three times the size of the fiction part of the original ENPC. Drawing on material from the ENPC+, this paper replicates three contrastive studies originally based on the fiction part of the ENPC to explore to what extent corpus size matters. The replication studies suggest that individual style, genre and date of publication are variables that may have a greater impact on the results than mere corpus size.
