Part 1: Document Description

Citation

Title:
Replication Data for: Predicting Russian aspect by frequency across genres
Identification Number:
doi:10.18710/BIIGT6
Distributor:
DataverseNO
Date of Distribution:
2017-12-03
Version:
1
Bibliographic Citation:
Eckhoff, Hanne; Janda, Laura; Lyashevskaya, Olga Nikolayevna, 2017, "Replication Data for: Predicting Russian aspect by frequency across genres", https://doi.org/10.18710/BIIGT6, DataverseNO, V1, UNF:6:TCa0jCAvvGll3zm3uYltvg== [fileUNF]
Citation
|
Title: |
Replication Data for: Predicting Russian aspect by frequency across genres |
Identification Number: |
doi:10.18710/BIIGT6 |
Authoring Entity: |
Eckhoff, Hanne (UiT The Arctic University of Norway)
Janda, Laura (UiT The Arctic University of Norway)
Lyashevskaya, Olga Nikolayevna (National Research University Higher School of Economics)
Producer: |
UiT The Arctic University of Norway |
Distributor: |
DataverseNO |
Distributor: |
The Tromsø Repository of Language and Linguistics (TROLLing) |
Access Authority: |
Eckhoff, Hanne |
Depositor: |
Eckhoff, Hanne Martine |
Date of Deposit: |
2017-03-12 |
Holdings Information: |
https://doi.org/10.18710/BIIGT6 |
Study Scope |
|
Keywords: |
Arts and Humanities, semantics, aspect, correspondence analysis, Russian, verbs, frequency
Abstract:
We ask whether the aspect of individual verbs can be predicted based on the statistical distribution of their inflectional forms and how this is influenced by genre. To address these questions, we present an analysis of the “grammatical profiles” (relative frequency distributions of inflectional forms) of three samples of verbs extracted from the Russian National Corpus, representing three genres: Journalistic prose, Fiction, and Scientific-Technical prose. We find that the aspect of a given verb can be correctly predicted from the distribution of its forms alone with an average accuracy of 92.7%. Remarkably, this accuracy is statistically indistinguishable from the accuracy of prediction of aspect based on morphological marking. We maintain that it would be possible for first language learners to use distributional tendencies, in addition to morphological and other cues (for example, semantic and syntactic cues), in acquiring the verbal category of aspect in Russian.
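The abstract defines a grammatical profile as the relative frequency distribution of a verb's inflectional forms. For a verb with N tokens, of which n_c fall into inflectional category c, the profile value for c is n_c / N. The following is a minimal Python sketch of that computation. The category labels in the example ("past", "infinitive", "imperative") are hypothetical placeholders, not necessarily the categories used in the study.

```python
from collections import Counter


def grammatical_profile(forms):
    """Return the relative frequency (n_c / N) of each category among a verb's tokens."""
    counts = Counter(forms)
    total = sum(counts.values())
    return {category: n / total for category, n in counts.items()}


# Hypothetical tokens of a single verb, each tagged with an illustrative category label.
tokens = ["past", "past", "past", "infinitive", "imperative"]
print(grammatical_profile(tokens))
# → {'past': 0.6, 'infinitive': 0.2, 'imperative': 0.2}
```

By construction the profile values sum to 1. This is what makes verbs with very different token counts comparable.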
Methodology and Processing |
|
Sources Statement |
|
Data Sources: |
Russian National Corpus |
Other Study Description Materials |
|
Related Publications |
|
Citation |
|
Title: |
Predicting Russian aspect by frequency across genres
Identification Number:
www.jstor.org/stable/26633829
Bibliographic Citation:
Eckhoff, Hanne M., et al. “Predicting Russian aspect by frequency across genres.” The Slavic and East European Journal, vol. 61, no. 4, 2017, pp. 844–75. JSTOR, http://www.jstor.org/stable/26633829.
File Description--f1436

File: fic50factor1tagged.tab
Notes:
UNF:6:FcLokpcNo9/lqGqnJr8/Fg==

File Description--f1434

File: journ50_factor1tagged.tab
Notes:
UNF:6:i5RDeUNEke185uknffIvNQ==

File Description--f1431

File: rus.fiction.tab
Notes:
UNF:6:Qvntfkg9Nzf20k7M+Vi4lA==

File Description--f1440

File: rus.journ.tab
Notes:
UNF:6:rGFY0vp0eA7zN5+s5q19GQ==

File Description--f1435

File: rus.scitech_corrected.tab
Notes:
UNF:6:7JWXykZeJAzpCsOhSolXpQ==

File Description--f1438

File: scitech50factor1tagged.tab
Notes:
UNF:6:RDO6l2SoPtSoBKdY0RjAEQ==
List of Variables: |
|
Variables |
|
f1436 Location: |
Variable Format: character Notes: UNF:6:HoWxUNy2PWjvdYWHzOl7mA== |
f1436 Location: |
Summary Statistics: Valid 225.0; Min. -1.89732600670106; Max. 0.681086667741942; Mean 0.021867606813530192; StDev 0.4581254248824531; Variable Format: numeric Notes: UNF:6:lgoQd+tZGQUXbw5ma5mnbQ==
f1436 Location: |
Variable Format: character Notes: UNF:6:4rtcn5WCDZb6YRIwFHnOGA== |
f1436 Location: |
Summary Statistics: Valid 225.0; Min. 50.0; Max. 4808.0; Mean 170.16000000000005; StDev 361.5862760116872; Variable Format: numeric Notes: UNF:6:P84VF4ky5m0q3mrmLquyQw==
f1436 Location: |
Variable Format: character Notes: UNF:6:eoMDWnwqhOBUVE0m6sz02w== |
f1436 Location: |
Variable Format: character Notes: UNF:6:wyfWAFbhAsHmmHO439tUAw== |
f1436 Location: |
Variable Format: character Notes: UNF:6:cPBWumPehzhjlw8/kOeimg== |
f1436 Location: |
Variable Format: character Notes: UNF:6:nn4mszXTUcjzg8WKMW8fSA== |
f1434 Location: |
Variable Format: character Notes: UNF:6:qA08d87TxeEzG1VL1HMfdw== |
f1434 Location: |
Summary Statistics: Valid 185.0; Min. -1.4306265462582; Max. 1.05373275750803; Mean -0.051142185551501296; StDev 0.5833663244301566; Variable Format: numeric Notes: UNF:6:Ur3Rto8lreR+VjbEqaQlPQ==
f1434 Location: |
Variable Format: character Notes: UNF:6:souK4BkC/a/+ANTJYGPojQ== |
f1434 Location: |
Summary Statistics: Valid 185.0; Min. 50.0; Max. 2763.0; Mean 133.1513513513513; StDev 226.8726390721828; Variable Format: numeric Notes: UNF:6:34q56UPy+THVgL3eG8NF4g==
f1434 Location: |
Variable Format: character Notes: UNF:6:/7N0nE01ywwJVyWDk3e58A== |
f1434 Location: |
Variable Format: character Notes: UNF:6:rF6gT6VapA3UoLWbfDF6+w== |
f1434 Location: |
Variable Format: character Notes: UNF:6:zAucHmAEUBKQXlbm/OiQWw== |
f1434 Location: |
Variable Format: character Notes: UNF:6:Ry0bWcXYoeVasNxIuvfdCg== |
FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense |
|
f1431 Location: |
Variable Format: character Notes: UNF:6:Qvntfkg9Nzf20k7M+Vi4lA== |
FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense |
|
f1440 Location: |
Variable Format: character Notes: UNF:6:rGFY0vp0eA7zN5+s5q19GQ== |
FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense |
|
f1435 Location: |
Variable Format: character Notes: UNF:6:7JWXykZeJAzpCsOhSolXpQ== |
f1438 Location: |
Variable Format: character Notes: UNF:6:gq6d5HyY6z4AmuXNSFKk9w== |
f1438 Location: |
Summary Statistics: Valid 172.0; Min. -1.62907144120676; Max. 0.816256588715481; Mean -0.06773800473451592; StDev 0.6612926017933347; Variable Format: numeric Notes: UNF:6:y92Kx7unB59nX0LH5LULpw==
f1438 Location: |
Variable Format: character Notes: UNF:6:Q7kbZVw9RK+mIVZLWcZ78A== |
f1438 Location: |
Summary Statistics: Valid 172.0; Min. 50.0; Max. 1629.0; Mean 125.73255813953493; StDev 160.84956645056826; Variable Format: numeric Notes: UNF:6:Thb1CfmbMiC4O8lcpXfVtQ==
f1438 Location: |
Variable Format: character Notes: UNF:6:x2MshY4GGm6tDAG1nR68IQ== |
f1438 Location: |
Variable Format: character Notes: UNF:6:IRDT9xGx3ELGUEaJ5bw+2g== |
f1438 Location: |
Variable Format: character Notes: UNF:6:Y54lVSCvRiXEtY3YJ1GRng== |
f1438 Location: |
Variable Format: character Notes: UNF:6:ByMWKGd6JsNaL9pSNROKgg== |
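This listing does not name the two numeric variables in each *factor1tagged file. Judging only from the file names and value ranges, one plausibly holds each verb's score on the first factor of a correspondence analysis. The other plausibly holds a token count (minimum 50). Both readings are assumptions. Under them, one simple way to turn a one-dimensional score into an aspect prediction is a single cut point chosen to maximize accuracy. The Python sketch that follows illustrates that generic technique only. It is not necessarily the procedure behind the reported 92.7% accuracy. The scores and the labels "pf" and "ipf" are made up.

```python
def best_cut(scores, labels, positive="pf"):
    """Find the single threshold on `scores` that best separates `positive` labels.

    A verb is predicted `positive` when direction * (score - cut) > 0.
    Returns (accuracy, cut, direction). All names here are hypothetical.
    """
    values = sorted(set(scores))
    # Candidate cuts: midpoints between adjacent distinct scores, plus one beyond each end.
    cuts = [values[0] - 1.0]
    cuts += [(a + b) / 2 for a, b in zip(values, values[1:])]
    cuts.append(values[-1] + 1.0)

    best = (0.0, None, None)
    for cut in cuts:
        for direction in (1, -1):
            correct = sum(
                (direction * (s - cut) > 0) == (label == positive)
                for s, label in zip(scores, labels)
            )
            accuracy = correct / len(scores)
            if accuracy > best[0]:
                best = (accuracy, cut, direction)
    return best


# Hypothetical factor scores and aspect labels for five verbs.
scores = [-1.2, -0.8, -0.1, 0.3, 0.6]
labels = ["pf", "pf", "ipf", "ipf", "ipf"]
print(best_cut(scores, labels))
```

The search tries both directions because the sign of a correspondence-analysis axis is arbitrary. Which aspect falls on the negative side cannot be known in advance.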
Label: |
00_readme_file.txt |
Text: |
ReadMe file for the dataset, with a description of the individual files.
Notes: |
text/plain |
Label: |
01journ.r |
Text: |
R script that analyses the data from the journalistic register (rus.journ.tab, original format csv). |
Notes: |
type/x-r-syntax |
Label: |
02fic.r |
Text: |
R script that analyses the data from the fiction register (rus.fiction.tab, original format csv). |
Notes: |
type/x-r-syntax |
Label: |
03scitech.r |
Text: |
R script that analyses the data from the scientific-technical register (rus.scitech_corrected.tab, original format csv). |
Notes: |
type/x-r-syntax |
Label: |
04verbtags.r |
Text: |
R script that analyses the verbs by derivational morphology and semantics. Uses the three dataset files fic50factor1tagged.tab, journ50_factor1tagged.tab and scitech50factor1tagged.tab (original format csv).
Notes: |
type/x-r-syntax |
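The variable listing contains a header line that appears alongside the three per-genre token files (rus.fiction.tab, rus.journ.tab, rus.scitech_corrected.tab). That header suggests each row of the original CSV is a semicolon-separated token record with fourteen fields, from FormTranslit through Tense. The following is a minimal Python sketch, assuming that format. It parses such rows and builds a per-lemma profile over one column. MoodTense is used here purely for illustration. The sample rows and the values "verbA", "past" and "ipf" are hypothetical, not corpus data. The helper names are likewise hypothetical and do not correspond to anything in the R scripts.

```python
import csv
import io
from collections import Counter, defaultdict

# Column names copied from the header line in the variable listing.
COLUMNS = ("FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;"
           "Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense").split(";")


def read_tokens(text):
    """Parse semicolon-separated token rows (assumed format) into dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter=";"))


def profiles_by_lemma(rows, category="MoodTense"):
    """Relative frequency of each value of `category`, computed per lemma."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row["LemmaTranslit"]][row[category]] += 1
    return {
        lemma: {value: n / sum(c.values()) for value, n in c.items()}
        for lemma, c in counts.items()
    }


def _row(**values):
    """Build one hypothetical row; unspecified columns are left empty."""
    return ";".join(values.get(name, "") for name in COLUMNS)


sample = "\n".join([
    ";".join(COLUMNS),
    _row(FormTranslit="form1", LemmaTranslit="verbA", MoodTense="past", Aspect="ipf"),
    _row(FormTranslit="form2", LemmaTranslit="verbA", MoodTense="pres", Aspect="ipf"),
    _row(FormTranslit="form3", LemmaTranslit="verbB", MoodTense="past", Aspect="pf"),
])
print(profiles_by_lemma(read_tokens(sample)))
# → {'verbA': {'past': 0.5, 'pres': 0.5}, 'verbB': {'past': 1.0}}
```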