Replication Data for: Predicting Russian aspect by frequency across genres (doi:10.18710/BIIGT6)


Document Description

Citation

Title:

Replication Data for: Predicting Russian aspect by frequency across genres

Identification Number:

doi:10.18710/BIIGT6

Distributor:

DataverseNO

Date of Distribution:

2017-12-03

Version:

1

Bibliographic Citation:

Eckhoff, Hanne; Janda, Laura; Lyashevskaya, Olga Nikolayevna, 2017, "Replication Data for: Predicting Russian aspect by frequency across genres", https://doi.org/10.18710/BIIGT6, DataverseNO, V1, UNF:6:TCa0jCAvvGll3zm3uYltvg== [fileUNF]

Study Description

Citation

Title:

Replication Data for: Predicting Russian aspect by frequency across genres

Identification Number:

doi:10.18710/BIIGT6

Authoring Entity:

Eckhoff, Hanne (UiT The Arctic University of Norway)

Janda, Laura (UiT The Arctic University of Norway)

Lyashevskaya, Olga Nikolayevna (National Research University Higher School of Economics)

Producer:

UiT The Arctic University of Norway

Distributor:

DataverseNO

Distributor:

The Tromsø Repository of Language and Linguistics (TROLLing)

Access Authority:

Eckhoff, Hanne

Depositor:

Eckhoff, Hanne Martine

Date of Deposit:

2017-03-12

Holdings Information:

https://doi.org/10.18710/BIIGT6

Study Scope

Keywords:

Arts and Humanities, semantics, aspect, correspondence analysis, Russian, verbs, frequency

Abstract:

We ask whether the aspect of individual verbs can be predicted based on the statistical distribution of their inflectional forms and how this is influenced by genre. To address these questions, we present an analysis of the “grammatical profiles” (relative frequency distributions of inflectional forms) of three samples of verbs extracted from the Russian National Corpus, representing three genres: Journalistic prose, Fiction, and Scientific-Technical prose. We find that the aspect of a given verb can be correctly predicted from the distribution of its forms alone with an average accuracy of 92.7%. Remarkably, this accuracy is statistically indistinguishable from the accuracy of prediction of aspect based on morphological marking. We maintain that it would be possible for first language learners to use distributional tendencies, in addition to morphological and other cues (for example semantic and syntactic cues), in acquiring the verbal category of aspect in Russian.
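The abstract defines a "grammatical profile" as the relative frequency distribution of a verb's inflectional forms. The following is a minimal sketch of that calculation for a single verb. The category labels are invented for illustration, and grouping tokens by a single tag is an assumption. It does not reproduce the dataset's actual tagging scheme.

```python
from collections import Counter


def grammatical_profile(forms):
    """Relative frequency of each inflectional category for one verb.

    `forms` is an iterable of category labels, one per corpus token
    (hypothetical labels such as "past" or "imper" in the example below).
    """
    counts = Counter(forms)
    total = sum(counts.values())
    if total == 0:
        return {}
    # Divide each raw count by the verb's total token count.
    return {cat: n / total for cat, n in counts.items()}


# Toy tokens for one hypothetical verb (not taken from the dataset).
profile = grammatical_profile(["past", "past", "inf", "past", "imper"])
print(profile)  # {'past': 0.6, 'inf': 0.2, 'imper': 0.2}
```

Each profile sums to 1, so verbs with very different token counts can be compared directly.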

Methodology and Processing

Sources Statement

Data Sources:

Russian National Corpus

Data Access

Other Study Description Materials

Related Publications

Citation

Title:

Eckhoff, Hanne M., et al. “Predicting Russian aspect by frequency across genres.” The Slavic and East European Journal, vol. 61, no. 4, 2017, pp. 844–75. JSTOR, http://www.jstor.org/stable/26633829.

Identification Number:

www.jstor.org/stable/26633829


File Description--f1436

File: fic50factor1tagged.tab

  • Number of cases: 225

  • No. of variables per record: 8

  • Type of File: text/tab-separated-values

Notes:

UNF:6:FcLokpcNo9/lqGqnJr8/Fg==

File Description--f1434

File: journ50_factor1tagged.tab

  • Number of cases: 185

  • No. of variables per record: 8

  • Type of File: text/tab-separated-values

Notes:

UNF:6:i5RDeUNEke185uknffIvNQ==

File Description--f1431

File: rus.fiction.tab

  • Number of cases: 78084

  • No. of variables per record: 1

  • Type of File: text/tab-separated-values

Notes:

UNF:6:Qvntfkg9Nzf20k7M+Vi4lA==

File Description--f1440

File: rus.journ.tab

  • Number of cases: 52716

  • No. of variables per record: 1

  • Type of File: text/tab-separated-values

Notes:

UNF:6:rGFY0vp0eA7zN5+s5q19GQ==

File Description--f1435

File: rus.scitech_corrected.tab

  • Number of cases: 43528

  • No. of variables per record: 1

  • Type of File: text/tab-separated-values

Notes:

UNF:6:7JWXykZeJAzpCsOhSolXpQ==

File Description--f1438

File: scitech50factor1tagged.tab

  • Number of cases: 172

  • No. of variables per record: 8

  • Type of File: text/tab-separated-values

Notes:

UNF:6:RDO6l2SoPtSoBKdY0RjAEQ==
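The three *factor1tagged.tab files each have 8 variables per record (lemma, factor1, asp, freq, morph1, morph2, sem, comment). The following is a minimal sketch for loading one of them with Python's standard csv module. It assumes a single header row, tab delimiters, UTF-8 encoding, and double-quoted character values. These are typical of Dataverse tab exports but are not stated in this codebook.

```python
import csv
import io

# Variable names as listed in the codebook for the *factor1tagged.tab files.
EXPECTED = ["lemma", "factor1", "asp", "freq", "morph1", "morph2", "sem", "comment"]


def read_tagged(handle):
    """Read one *factor1tagged.tab file into a list of dicts.

    Numeric columns (factor1, freq) are converted; the rest stay strings.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    missing = set(EXPECTED) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        row["factor1"] = float(row["factor1"])
        row["freq"] = int(float(row["freq"]))  # accepts "50" or "50.0"
        rows.append(row)
    return rows


# Usage on a one-row toy sample (invented values, not from the dataset).
sample = io.StringIO(
    "lemma\tfactor1\tasp\tfreq\tmorph1\tmorph2\tsem\tcomment\n"
    '"x"\t-0.5\t"a"\t50\t"m"\t"n"\t"s"\t""\n'
)
rows = read_tagged(sample)
```

For a real file, use `with open("fic50factor1tagged.tab", encoding="utf-8", newline="") as f: rows = read_tagged(f)`. If the assumptions above hold, this should yield the 225 cases listed for that file.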

Variable Description

List of Variables:

Variables

lemma

f1436 Location:

Variable Format: character

Notes: UNF:6:HoWxUNy2PWjvdYWHzOl7mA==

factor1

f1436 Location:

Summary Statistics: Valid 225; Min. -1.89732600670106; Max. 0.681086667741942; Mean 0.021867606813530192; StDev 0.4581254248824531

Variable Format: numeric

Notes: UNF:6:lgoQd+tZGQUXbw5ma5mnbQ==

asp

f1436 Location:

Variable Format: character

Notes: UNF:6:4rtcn5WCDZb6YRIwFHnOGA==

freq

f1436 Location:

Summary Statistics: Valid 225; Min. 50; Max. 4808; Mean 170.16; StDev 361.5862760116872

Variable Format: numeric

Notes: UNF:6:P84VF4ky5m0q3mrmLquyQw==
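The summary statistics in this codebook (Valid, Min., Max., Mean, StDev) can be recomputed from a numeric column as a consistency check. The following is a minimal sketch. It assumes that "Valid" is the count of non-missing values and that StDev is the sample standard deviation (n - 1 denominator). The codebook does not say which convention is used.

```python
import statistics


def summarize(values):
    """Valid count, min, max, mean and sample standard deviation of a column.

    Assumption: StDev uses the n - 1 denominator (statistics.stdev);
    switch to statistics.pstdev if the codebook turns out to use n.
    """
    vals = [float(v) for v in values]
    return {
        "Valid": len(vals),
        "Min.": min(vals),
        "Max.": max(vals),
        "Mean": statistics.mean(vals),
        "StDev": statistics.stdev(vals),
    }


print(summarize([50, 60, 70]))
# {'Valid': 3, 'Min.': 50.0, 'Max.': 70.0, 'Mean': 60.0, 'StDev': 10.0}
```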

morph1

f1436 Location:

Variable Format: character

Notes: UNF:6:eoMDWnwqhOBUVE0m6sz02w==

morph2

f1436 Location:

Variable Format: character

Notes: UNF:6:wyfWAFbhAsHmmHO439tUAw==

sem

f1436 Location:

Variable Format: character

Notes: UNF:6:cPBWumPehzhjlw8/kOeimg==

comment

f1436 Location:

Variable Format: character

Notes: UNF:6:nn4mszXTUcjzg8WKMW8fSA==

lemma

f1434 Location:

Variable Format: character

Notes: UNF:6:qA08d87TxeEzG1VL1HMfdw==

factor1

f1434 Location:

Summary Statistics: Valid 185; Min. -1.4306265462582; Max. 1.05373275750803; Mean -0.051142185551501296; StDev 0.5833663244301566

Variable Format: numeric

Notes: UNF:6:Ur3Rto8lreR+VjbEqaQlPQ==

asp

f1434 Location:

Variable Format: character

Notes: UNF:6:souK4BkC/a/+ANTJYGPojQ==

freq

f1434 Location:

Summary Statistics: Valid 185; Min. 50; Max. 2763; Mean 133.1513513513513; StDev 226.8726390721828

Variable Format: numeric

Notes: UNF:6:34q56UPy+THVgL3eG8NF4g==

morph1

f1434 Location:

Variable Format: character

Notes: UNF:6:/7N0nE01ywwJVyWDk3e58A==

morph2

f1434 Location:

Variable Format: character

Notes: UNF:6:rF6gT6VapA3UoLWbfDF6+w==

sem

f1434 Location:

Variable Format: character

Notes: UNF:6:zAucHmAEUBKQXlbm/OiQWw==

comment

f1434 Location:

Variable Format: character

Notes: UNF:6:Ry0bWcXYoeVasNxIuvfdCg==

FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense

f1431 Location:

Variable Format: character

Notes: UNF:6:Qvntfkg9Nzf20k7M+Vi4lA==

FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense

f1440 Location:

Variable Format: character

Notes: UNF:6:rGFY0vp0eA7zN5+s5q19GQ==

FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense

f1435 Location:

Variable Format: character

Notes: UNF:6:7JWXykZeJAzpCsOhSolXpQ==
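In rus.fiction.tab, rus.journ.tab and rus.scitech_corrected.tab, the single variable per record is named with 14 semicolon-joined field names. This suggests that each record packs 14 semicolon-separated values. The following is a minimal sketch for unpacking one record into named fields. It assumes exactly 14 values per record, no semicolons inside values, and optional surrounding double quotes. None of these assumptions is confirmed by the codebook.

```python
# Field names taken from the variable name shared by the three rus.*.tab files.
FIELDS = (
    "FormTranslit;LemmaTranslit;MoodTense;Trans;Voice;VoicePartcp;"
    "Person;Number;Gender;Long;AspPair;Aspect;Mood;Tense"
).split(";")


def parse_record(line):
    """Split one semicolon-packed record into a dict keyed by field name."""
    # Drop the line ending and any quotes wrapping the whole value.
    values = line.rstrip("\r\n").strip('"').split(";")
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


# Usage with a placeholder record (values v0..v13 are invented).
record = parse_record(";".join(f"v{i}" for i in range(14)))
print(record["LemmaTranslit"], record["Aspect"])  # v1 v11
```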

lemma

f1438 Location:

Variable Format: character

Notes: UNF:6:gq6d5HyY6z4AmuXNSFKk9w==

factor1

f1438 Location:

Summary Statistics: Valid 172; Min. -1.62907144120676; Max. 0.816256588715481; Mean -0.06773800473451592; StDev 0.6612926017933347

Variable Format: numeric

Notes: UNF:6:y92Kx7unB59nX0LH5LULpw==

asp

f1438 Location:

Variable Format: character

Notes: UNF:6:Q7kbZVw9RK+mIVZLWcZ78A==

freq

f1438 Location:

Summary Statistics: Valid 172; Min. 50; Max. 1629; Mean 125.73255813953493; StDev 160.84956645056826

Variable Format: numeric

Notes: UNF:6:Thb1CfmbMiC4O8lcpXfVtQ==

morph1

f1438 Location:

Variable Format: character

Notes: UNF:6:x2MshY4GGm6tDAG1nR68IQ==

morph2

f1438 Location:

Variable Format: character

Notes: UNF:6:IRDT9xGx3ELGUEaJ5bw+2g==

sem

f1438 Location:

Variable Format: character

Notes: UNF:6:Y54lVSCvRiXEtY3YJ1GRng==

comment

f1438 Location:

Variable Format: character

Notes: UNF:6:ByMWKGd6JsNaL9pSNROKgg==
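The abstract reports that aspect can be predicted from the distribution of forms with an average accuracy of 92.7%. The following is a minimal sketch of how one might score such a prediction against the asp column. It uses a cutoff rule on factor1. The cutoff at zero, the assignment of aspect labels to each side, and the labels themselves are all assumptions, because the codebook does not document the asp codes or the decision rule. This sketch is not claimed to reproduce the published figure.

```python
def sign_rule_accuracy(rows, label_pos, label_neg):
    """Share of rows whose `asp` matches a hypothetical sign rule on `factor1`.

    Hypothetical rule: factor1 >= 0 -> label_pos, otherwise label_neg.
    If the orientation is unknown, try both and compare.
    """
    if not rows:
        raise ValueError("no rows")
    hits = sum(
        (label_pos if r["factor1"] >= 0 else label_neg) == r["asp"]
        for r in rows
    )
    return hits / len(rows)


# Invented rows with placeholder aspect labels "A" and "B".
toy = [
    {"factor1": 0.4, "asp": "A"},
    {"factor1": 0.1, "asp": "A"},
    {"factor1": -0.7, "asp": "B"},
    {"factor1": -0.2, "asp": "A"},
]
print(sign_rule_accuracy(toy, "A", "B"))  # 0.75
```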

Other Study-Related Materials

Label:

00_readme_file.txt

Text:

ReadMe file for the dataset, with a description of the individual files.

Notes:

text/plain

Other Study-Related Materials

Label:

01journ.r

Text:

R script that analyses the data from the journalistic register (rus.journ.tab, original format csv).

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

02fic.r

Text:

R script that analyses the data from the fiction register (rus.fiction.tab, original format csv).

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

03scitech.r

Text:

R script that analyses the data from the scientific-technical register (rus.scitech_corrected.tab, original format csv).

Notes:

type/x-r-syntax
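The keywords list correspondence analysis, and the factor1 variable in the tagged files presumably holds first-dimension scores. The following is a textbook simple correspondence analysis in NumPy, applied to a verbs-by-forms count table. The R scripts were not inspected, so this is only a sketch of the general technique, not necessarily what 01journ.r, 02fic.r or 03scitech.r do. Note that SVD signs are arbitrary, so first-dimension scores may come out mirrored relative to any published values.

```python
import numpy as np


def correspondence_analysis(table):
    """Row principal coordinates from a simple correspondence analysis.

    `table` is a non-negative (verbs x forms) count matrix with no
    all-zero rows or columns. Returns (row_coords, singular_values).
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()              # correspondence matrix
    r = P.sum(axis=1)            # row masses
    c = P.sum(axis=0)            # column masses
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)
    U, sv, _ = np.linalg.svd(S, full_matrices=False)
    # Row principal coordinates: D_r^{-1/2} U Sigma
    F = (U * sv) / np.sqrt(r)[:, None]
    return F, sv


# Invented 3 verbs x 3 form-categories count table.
table = [[10, 2, 1], [3, 8, 4], [1, 2, 9]]
F, sv = correspondence_analysis(table)
factor1_scores = F[:, 0]  # first-dimension score per verb
```

The squared singular values sum to the total inertia, which equals the chi-square statistic of the table divided by its grand total.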

Other Study-Related Materials

Label:

04verbtags.r

Text:

R script for the analysis of verbs by derivational morphology and semantics. Uses the three dataset files fic50factor1tagged.tab, journ50_factor1tagged.tab and scitech50factor1tagged.tab (original format csv).

Notes:

type/x-r-syntax
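04verbtags.r relates aspect to derivational morphology and semantics using the asp, morph1, morph2 and sem columns. The following is a minimal sketch of the cross-tabulation such an analysis would start from, counting co-occurrences of two categorical columns. The category values in the example are invented, since the codebook does not list the codes used in these columns.

```python
from collections import Counter


def crosstab(rows, row_key, col_key):
    """Count co-occurrences of two categorical columns, e.g. asp x morph1."""
    return Counter((r[row_key], r[col_key]) for r in rows)


# Invented rows; real asp/morph1 codes are not documented here.
toy = [
    {"asp": "A", "morph1": "m1"},
    {"asp": "A", "morph1": "m1"},
    {"asp": "B", "morph1": "m2"},
]
ct = crosstab(toy, "asp", "morph1")
print(ct[("A", "m1")], ct[("B", "m1")])  # 2 0
```

A Counter returns 0 for unseen pairs, so empty cells of the table need no special handling.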