Replication Data for: Predicting Russian aspect by frequency across genres (doi:10.18710/BIIGT6)

Document Description
Citation
Title:	Replication Data for: Predicting Russian aspect by frequency across genres
Identification Number:	doi:10.18710/BIIGT6
Distributor:	DataverseNO
Date of Distribution:	2017-12-03
Version:	1
Bibliographic Citation:	Eckhoff, Hanne; Janda, Laura; Lyashevskaya, Olga Nikolayevna, 2017, "Replication Data for: Predicting Russian aspect by frequency across genres", https://doi.org/10.18710/BIIGT6, DataverseNO, V1, UNF:6:TCa0jCAvvGll3zm3uYltvg== [fileUNF]
Study Description
Citation
Title:	Replication Data for: Predicting Russian aspect by frequency across genres
Identification Number:	doi:10.18710/BIIGT6
Authoring Entity:	Eckhoff, Hanne (UiT The Arctic University of Norway)
	Janda, Laura (UiT The Arctic University of Norway)
	Lyashevskaya, Olga Nikolayevna (National Research University Higher School of Economics)
Producer:	UiT The Arctic University of Norway
Distributor:	DataverseNO
Distributor:	The Tromsø Repository of Language and Linguistics (TROLLing)
Access Authority:	Eckhoff, Hanne
Depositor:	Eckhoff, Hanne Martine
Date of Deposit:	2017-03-12
Holdings Information:	https://doi.org/10.18710/BIIGT6
Study Scope
Keywords:	Arts and Humanities, semantics, aspect, correspondence analysis, Russian, verbs, frequency
Abstract:	We ask whether the aspect of individual verbs can be predicted based on the statistical distribution of their inflectional forms and how this is influenced by genre. To address these questions, we present an analysis of the “grammatical profiles” (relative frequency distributions of inflectional forms) of three samples of verbs extracted from the Russian National Corpus, representing three genres: Journalistic prose, Fiction, and Scientific-Technical prose. We find that the aspect of a given verb can be correctly predicted from the distribution of its forms alone with an average accuracy of 92.7%. Remarkably, this accuracy is statistically indistinguishable from the accuracy of prediction of aspect based on morphological marking. We maintain that it would be possible for first language learners to use distributional tendencies, in addition to morphological and other cues (for example semantic and syntactic cues), in acquiring the verbal category of aspect in Russian.
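The notion of a "grammatical profile" in the abstract (the relative frequency distribution of a verb's inflectional forms) can be illustrated with a minimal sketch. It is not the authors' R code: the function name, the toy lemmas and the category labels below are invented for illustration, and the real category inventory used in the study is not stated here.

```python
from collections import Counter, defaultdict


def grammatical_profiles(tokens):
    """Return, for each lemma, the relative frequency of each form category.

    `tokens` is an iterable of (lemma, form_category) pairs, one per corpus token.
    """
    counts = defaultdict(Counter)
    for lemma, category in tokens:
        counts[lemma][category] += 1

    profiles = {}
    for lemma, counter in counts.items():
        total = sum(counter.values())
        # Divide each raw count by the lemma's total so the shares sum to 1.
        profiles[lemma] = {cat: n / total for cat, n in counter.items()}
    return profiles


# Toy input: lemmas and category labels are hypothetical, not taken from the data files.
sample = [
    ("nesti", "past"), ("nesti", "nonpast"), ("nesti", "nonpast"), ("nesti", "infinitive"),
    ("prinesti", "past"), ("prinesti", "past"), ("prinesti", "past"), ("prinesti", "infinitive"),
]
profiles = grammatical_profiles(sample)
```

Each profile is a vector of shares per verb, which is the kind of input a correspondence analysis or classifier could then work on.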
Methodology and Processing
Sources Statement
Data Sources:	Russian National Corpus
Data Access
Other Study Description Materials
Related Publications
Citation
Title:	Eckhoff, Hanne M., et al. “Predicting Russian aspect by frequency across genres.” The Slavic and East European Journal, vol. 61, no. 4, 2017, pp. 844–75. JSTOR, http://www.jstor.org/stable/26633829.
Identification Number:	www.jstor.org/stable/26633829
Bibliographic Citation:	Eckhoff, Hanne M., et al. “Predicting Russian aspect by frequency across genres.” The Slavic and East European Journal, vol. 61, no. 4, 2017, pp. 844–75. JSTOR, http://www.jstor.org/stable/26633829.
File Description--f1436
File: fic50factor1tagged.tab
	Number of cases: 225 No. of variables per record: 8 Type of File: text/tab-separated-values
Notes:	UNF:6:FcLokpcNo9/lqGqnJr8/Fg==
File Description--f1434
File: journ50_factor1tagged.tab
	Number of cases: 185 No. of variables per record: 8 Type of File: text/tab-separated-values
Notes:	UNF:6:i5RDeUNEke185uknffIvNQ==
File Description--f1431
File: rus.fiction.tab
	Number of cases: 78084 No. of variables per record: 1 Type of File: text/tab-separated-values
Notes:	UNF:6:Qvntfkg9Nzf20k7M+Vi4lA==
File Description--f1440
File: rus.journ.tab
	Number of cases: 52716 No. of variables per record: 1 Type of File: text/tab-separated-values
Notes:	UNF:6:rGFY0vp0eA7zN5+s5q19GQ==
File Description--f1435
File: rus.scitech_corrected.tab
	Number of cases: 43528 No. of variables per record: 1 Type of File: text/tab-separated-values
Notes:	UNF:6:7JWXykZeJAzpCsOhSolXpQ==
File Description--f1438
File: scitech50factor1tagged.tab
	Number of cases: 172 No. of variables per record: 8 Type of File: text/tab-separated-values
Notes:	UNF:6:RDO6l2SoPtSoBKdY0RjAEQ==
Variable Description
List of Variables:	f1436: lemma - lemma factor1 - factor1 asp - asp freq - freq morph1 - morph1 morph2 - morph2 sem - sem comment - comment
	f1434: lemma - lemma factor1 - factor1 asp - asp freq - freq morph1 - morph1 morph2 - morph2 sem - sem comment - comment
	f1431: FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense - FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense
	f1440: FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense - FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense
	f1435: FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense - FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense
	f1438: lemma - lemma factor1 - factor1 asp - asp freq - freq morph1 - morph1 morph2 - morph2 sem - sem comment - comment
Variables
lemma
f1436 Location:	Variable Format: character Notes: UNF:6:HoWxUNy2PWjvdYWHzOl7mA==
factor1
f1436 Location:	Summary Statistics: Min. -1.89732600670106; Valid 225.0; Mean 0.021867606813530192; Max. 0.681086667741942; StDev 0.4581254248824531; Variable Format: numeric Notes: UNF:6:lgoQd+tZGQUXbw5ma5mnbQ==
asp
f1436 Location:	Variable Format: character Notes: UNF:6:4rtcn5WCDZb6YRIwFHnOGA==
freq
f1436 Location:	Summary Statistics: Min. 50.0; Max. 4808.0; Mean 170.16000000000005; Valid 225.0; StDev 361.5862760116872; Variable Format: numeric Notes: UNF:6:P84VF4ky5m0q3mrmLquyQw==
morph1
f1436 Location:	Variable Format: character Notes: UNF:6:eoMDWnwqhOBUVE0m6sz02w==
morph2
f1436 Location:	Variable Format: character Notes: UNF:6:wyfWAFbhAsHmmHO439tUAw==
sem
f1436 Location:	Variable Format: character Notes: UNF:6:cPBWumPehzhjlw8/kOeimg==
comment
f1436 Location:	Variable Format: character Notes: UNF:6:nn4mszXTUcjzg8WKMW8fSA==
lemma
f1434 Location:	Variable Format: character Notes: UNF:6:qA08d87TxeEzG1VL1HMfdw==
factor1
f1434 Location:	Summary Statistics: Valid 185.0; Min. -1.4306265462582; Mean -0.051142185551501296; StDev 0.5833663244301566; Max. 1.05373275750803; Variable Format: numeric Notes: UNF:6:Ur3Rto8lreR+VjbEqaQlPQ==
asp
f1434 Location:	Variable Format: character Notes: UNF:6:souK4BkC/a/+ANTJYGPojQ==
freq
f1434 Location:	Summary Statistics: Max. 2763.0; StDev 226.8726390721828; Valid 185.0; Min. 50.0; Mean 133.1513513513513; Variable Format: numeric Notes: UNF:6:34q56UPy+THVgL3eG8NF4g==
morph1
f1434 Location:	Variable Format: character Notes: UNF:6:/7N0nE01ywwJVyWDk3e58A==
morph2
f1434 Location:	Variable Format: character Notes: UNF:6:rF6gT6VapA3UoLWbfDF6+w==
sem
f1434 Location:	Variable Format: character Notes: UNF:6:zAucHmAEUBKQXlbm/OiQWw==
comment
f1434 Location:	Variable Format: character Notes: UNF:6:Ry0bWcXYoeVasNxIuvfdCg==
FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense
f1431 Location:	Variable Format: character Notes: UNF:6:Qvntfkg9Nzf20k7M+Vi4lA==
FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense
f1440 Location:	Variable Format: character Notes: UNF:6:rGFY0vp0eA7zN5+s5q19GQ==
FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense
f1435 Location:	Variable Format: character Notes: UNF:6:7JWXykZeJAzpCsOhSolXpQ==
lemma
f1438 Location:	Variable Format: character Notes: UNF:6:gq6d5HyY6z4AmuXNSFKk9w==
factor1
f1438 Location:	Summary Statistics: Max. 0.816256588715481; StDev 0.6612926017933347; Mean -0.06773800473451592; Valid 172.0; Min. -1.62907144120676; Variable Format: numeric Notes: UNF:6:y92Kx7unB59nX0LH5LULpw==
asp
f1438 Location:	Variable Format: character Notes: UNF:6:Q7kbZVw9RK+mIVZLWcZ78A==
freq
f1438 Location:	Summary Statistics: Mean 125.73255813953493; Min. 50.0; Valid 172.0; Max. 1629.0; StDev 160.84956645056826; Variable Format: numeric Notes: UNF:6:Thb1CfmbMiC4O8lcpXfVtQ==
morph1
f1438 Location:	Variable Format: character Notes: UNF:6:x2MshY4GGm6tDAG1nR68IQ==
morph2
f1438 Location:	Variable Format: character Notes: UNF:6:IRDT9xGx3ELGUEaJ5bw+2g==
sem
f1438 Location:	Variable Format: character Notes: UNF:6:Y54lVSCvRiXEtY3YJ1GRng==
comment
f1438 Location:	Variable Format: character Notes: UNF:6:ByMWKGd6JsNaL9pSNROKgg==
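The three corpus files (f1431, f1440, f1435) are listed with one variable per record because the fourteen semicolon-separated fields were ingested as a single column. A minimal sketch for splitting such records back into named fields follows. It assumes that the first line of each file is the semicolon-joined header shown above and that a line may be wrapped in double quotes; neither assumption is confirmed by this codebook. The helper names and the sample values are invented for illustration.

```python
FIELDS = (
    "FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;"
    "Number;Gender;Long;AspPair;Aspect;Mood;Tense"
).split(";")


def split_record(line):
    """Strip line endings and optional surrounding quotes, then split on ';'."""
    return line.rstrip("\r\n").strip('"').split(";")


def read_token_rows(lines):
    """Turn an iterable of text lines (header first) into a list of dicts."""
    lines = iter(lines)
    header = split_record(next(lines))
    if header != FIELDS:
        raise ValueError("unexpected header: %r" % (header,))

    rows = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        values = split_record(line)
        if len(values) != len(header):
            raise ValueError("expected %d fields, got %d" % (len(header), len(values)))
        rows.append(dict(zip(header, values)))
    return rows


# Toy input: the field values are hypothetical, not copied from the data files.
sample_lines = [
    ";".join(FIELDS) + "\n",
    "nes;nesti;indpast;tran;act;-;-;sg;m;-;pair;ipf;ind;past\n",
    '"prineset;prinesti;indnonpast;tran;act;-;3;sg;-;-;pair;pf;ind;nonpast"\n',
]
rows = read_token_rows(sample_lines)
```

With a real file, the same function could be applied to an open file handle, for example `read_token_rows(open("rus.fiction.tab", encoding="utf-8"))`, provided the assumptions above hold for that file.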
Other Study-Related Materials
Label:	00_readme_file.txt
Text:	ReadMe file for the dataset, with a description of the individual files.
Notes:	text/plain
Other Study-Related Materials
Label:	01journ.r
Text:	R script that analyses the data from the journalistic register (rus.journ.tab, original format csv).
Notes:	type/x-r-syntax
Other Study-Related Materials
Label:	02fic.r
Text:	R script that analyses the data from the fiction register (rus.fiction.tab, original format csv).
Notes:	type/x-r-syntax
Other Study-Related Materials
Label:	03scitech.r
Text:	R script that analyses the data from the scientific-technical register (rus.scitech_corrected.tab, original format csv).
Notes:	type/x-r-syntax
Other Study-Related Materials
Label:	04verbtags.r
Text:	R script that analyses the verbs by derivational morphology and semantics. Uses the three dataset files fic50_factor1tagged.tab, journ50_factor1tagged.tab and scitech50_factor1tagged.tab (original format csv).
Notes:	type/x-r-syntax