10.18710/GSV27M
Gerstenberger, Ciprian-Virgil (UiT The Arctic University of Norway)
Romanian Weak Pronoun Choice Data
DataverseNO
2014
The following corpus study shows that soft linguistic constraints are hard to describe and operationalize. In specific contexts, some Romanian clitic pronouns allow a choice between phonological hosts, as in că-mi dai cartea vs. că îmi dai cartea, both meaning [that you give me the book]. What determines the choice between the subjunction că in că-mi and the prosthetic î in îmi (cf. Lombard 1976)? Popescu (2003, p. 160) argues that speech rate triggers the surface realization (monosyllabic că-mi in fast speech vs. bisyllabic că îmi in normal speech), while Dindelegan (2013, p. 388) argues for register rules (informal că-mi vs. formal că îmi). On the register account, formal, written language represents one extreme of a formality scale and informal, spoken language the other. Thus, a Romanian corpus of official documents, such as legal texts, is expected to contain only, or significantly more, forms with prosthetic î than enclitic forms in constellations where the variants are otherwise optional. To test these two hypotheses, the Romanian part of the JRC-Acquis corpus (http://ec.europa.eu/dgs/jrc/) has been tagged with the RACAI tagger (http://www.racai.ro). The resulting corpus of 62,650,821 tokens (including punctuation) has been evaluated with respect to the phenomena under scrutiny. Taking specific hosts into account, enclitic forms have been compared with their î-prosthetic counterparts. The numbers show almost no or statistically insignificant difference in usage for some specific host+clitic pairs (e.g., 3886 să îşi vs. 3852 să-şi [that to himself/herself], 200 ce îi vs. 110 ce-i [what to him/her]). From a usage-based perspective, these findings are clear arguments both against the register rules proposed by Dindelegan (2013) and against a pure speech rate hypothesis as in Popescu (2003).
Since the JRC-Acquis corpus was translated from English by different translators, possibly including both native and non-native speakers of Romanian, an additional corpus of original Romanian legal texts is being compiled for further analysis and comparison.
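As a rough illustration of the comparison described above, the two host+clitic counts quoted in the description can be expressed as shares of î-prosthetic vs. enclitic forms. This is only a sketch: the dictionary layout and the helper name prosthetic_share are assumptions made for this example and are not part of the dataset or of the original analysis.

```python
# Counts quoted in the description, as (prosthetic count, enclitic count).
# The dictionary layout is an assumption made for this illustration.
counts = {
    "să îşi / să-şi": (3886, 3852),
    "ce îi / ce-i": (200, 110),
}


def prosthetic_share(prosthetic: int, enclitic: int) -> float:
    """Return the fraction of occurrences that use the î-prosthetic form."""
    return prosthetic / (prosthetic + enclitic)


# Share of prosthetic forms for each host+clitic pair.
shares = {pair: prosthetic_share(p, e) for pair, (p, e) in counts.items()}

for pair, share in shares.items():
    print(f"{pair}: {share:.1%} prosthetic, {1 - share:.1%} enclitic")
```

For these two pairs the enclitic variants account for about 49.8% and 35.5% of the occurrences, so they are far from absent in this formal, written corpus.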
Gerstenberger, Ciprian-Virgil (UiT The Arctic University of Norway); UiT The Arctic University of Norway