Persistent Identifier
| doi:10.18710/Y7VGQE |
Publication Date
| 2024-09-17 |
Title
| Replication Data for: Understanding ‘many’ through the lens of Ukrainian багато |
Author
| Janda, Laura Alexis (UIT Norges arktiske universitet) - ORCID: 0000-0001-5047-1909 |
Point of Contact
| Janda, Laura Alexis (UIT Norges arktiske universitet) |
Description
| Dataset description:
The General Regionally Annotated Corpus of Ukrainian (GRAC; Shvedova et al. 2017-2024, uacorpus.org) was consulted to collect data on the distribution of Singular vs. Plural verb forms in the target bahato construction. GRAC is a Sketch Engine corpus of over 1.8 billion words, representing texts by over 30,000 authors created between 1816 and 2023. The corpus is designed to serve as source material for linguistic research on Standard Ukrainian. The data were collected in February 2024. We extracted and annotated 28,491 examples of the bahato construction.
An additional set of examples was collected from the Russian National Corpus (ruscorpora.ru) in August 2024 for comparison with the Russian mnogo construction. For this purpose, 6,612 examples were extracted and annotated for word order and Singular vs. Plural verb agreement.
Both the Ukrainian and the Russian data are included in this dataset, along with the R scripts used to analyze them. (2024-05-20)
Article abstract: We reveal an ongoing language change in Ukrainian involving a construction with a subject consisting of the indefinite quantifier багато ‘many’ modifying a noun phrase in the Genitive Plural. Number agreement on the verb varies, allowing both Singular (in 69.1% of attestations) and Plural (in 30.9% of attestations). Based on statistical analysis of corpus data, we investigate the influence of the factors of year of creation, word order of subject and verb, and animacy of the subject on the choice of verb number. We find that, while all combinations of word order and animacy are robustly attested, VS word order and inanimate subjects tend to prefer Singular, whereas SV word order and animate subjects tend to prefer Plural. Since about the 1950s, the proportion of Plural has been increasing, overtaking Singular in the current decade. We propose that this Singular vs. Plural variation is motivated by the human embodied experience of construing a group of items as either a homogeneous mass (and therefore Singular) or a multiplicity of individuals (and therefore Plural). This proposal is supported by the identification of micro-constructions that prefer Singular and show reduced individuation of human beings. (2024) |
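The Singular vs. Plural percentages cited in the abstract are relative frequencies over the annotated examples. They can also be broken down by factors such as word order or animacy. The following minimal Python sketch shows that descriptive tabulation only. It is not the authors' R analysis and does not reproduce their statistical modelling.

The sketch rests on several assumptions. The field names `verb_number`, `word_order` and `animacy` are hypothetical, as are the codes `Sg`/`Pl`, and the deposited files may use different column names. The toy records are invented purely for illustration and are not taken from the dataset.

```python
from collections import Counter, defaultdict

# Hypothetical record layout (an assumption, not the deposited file format):
# each annotated example is a dict with the keys "verb_number" ("Sg" or "Pl"),
# "word_order" ("SV" or "VS") and "animacy" ("animate" or "inanimate").


def singular_share(records):
    """Share of Singular among Singular + Plural records (0.0 if there are none)."""
    counts = Counter(r["verb_number"] for r in records)
    total = counts["Sg"] + counts["Pl"]
    return counts["Sg"] / total if total else 0.0


def singular_share_by(records, *keys):
    """Group records by the given factor keys; return the Singular share per cell."""
    groups = defaultdict(list)
    for r in records:
        groups[tuple(r[k] for k in keys)].append(r)
    return {cell: singular_share(rows) for cell, rows in groups.items()}


if __name__ == "__main__":
    # Invented toy records for illustration only; NOT data from this dataset.
    toy = [
        {"verb_number": "Sg", "word_order": "VS", "animacy": "inanimate"},
        {"verb_number": "Sg", "word_order": "VS", "animacy": "animate"},
        {"verb_number": "Pl", "word_order": "SV", "animacy": "animate"},
        {"verb_number": "Sg", "word_order": "SV", "animacy": "inanimate"},
    ]
    print(singular_share(toy))                     # 0.75
    print(singular_share_by(toy, "word_order"))    # {('VS',): 1.0, ('SV',): 0.5}
```

Grouping by several keys at once, e.g. `singular_share_by(records, "word_order", "animacy")`, yields one Singular share per combination of factors. That is the kind of cross-tabulation the abstract's word order × animacy comparison refers to.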
Subject
| Arts and Humanities |
Keyword
| Ukrainian
indefinite quantifier
number agreement
language change
corpus data |
Related Publication
| Janda, Laura A. and Yuliia Palii. “Understanding ‘many’ through the lens of Ukrainian багато”. To appear in the journal Russian Linguistics. https://doi.org/10.1007/s11185-024-09301-7 |
Language
| English |
Producer
| UiT The Arctic University of Norway |
Production Date
| 2024 |
Contributor
| Researcher : Palii, Yuliia |
Distributor
| The Tromsø Repository of Language and Linguistics (TROLLing) https://trolling.uit.no/ |
Distribution Date
| 2024-09-13 |
Depositor
| Janda, Laura Alexis |
Deposit Date
| 2024-05-20 |
Time Period
| Start Date: 1742; End Date: 2023 |
Date of Collection
| Start Date: 2024; End Date: 2024 |
Data Type
| corpus data |
Software
| R, Version: 4.4.1 (2024-06-14) -- "Race for Your Life" |
Data Source
| The first part of the data in this Dataset originates from the following source: Maria Shvedova, Ruprecht von Waldenfels, Sergey Yarygin, Andriy Rysin, Vasyl Starko, Tymofij Nikolajenko et al. (2017-2024): GRAC: General Regionally Annotated Corpus of Ukrainian. Electronic resource: Kyiv, Lviv, Jena. Available at uacorpus.org.
The extracted words contained in this dataset represent only non-substantial portions of the GRAC corpus. Therefore, the reuse (including redistribution) of these excerpts is permitted by the exception rules in IPR and database protection regulations, such as Fair use (USA; cf. US Copyright Act), Fair dealing (UK; cf. Exceptions to copyright), the EU Database Directive (cf. article 8 Rights and obligations of lawful users), "lover, forskrifter, rettsavgjørelser og andre vedtak av offentlig myndighet" ('laws, regulations, court decisions and other decisions of public authorities'; Norway; cf. § 14 in Åndsverkloven, the Norwegian Copyright Act), "uvesentlige deler av databaser" ('insubstantial parts of databases'; Norway; cf. § 24 in Åndsverkloven), and "sitatretten" ('the right of quotation'; Norway; cf. § 29 in Åndsverkloven).
The second part of the data in this Dataset consists of examples retrieved via corpus searches carried out in the Russian National Corpus (RNC), available at ruscorpora.ru. RNC was used under the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International (CC BY-NC-ND 4.0) license (https://creativecommons.org/licenses/by-nc-nd/4.0/).
The Dataset contains only non-substantial parts of the RNC. Therefore, the reuse (including redistribution) of these data is permitted by the exception rules in IPR and database protection regulations, such as Fair use (USA; cf. US Copyright Act), Fair dealing (UK; cf. Exceptions to copyright), the EU Database Directive (cf. article 8 Rights and obligations of lawful users), "lover, forskrifter, rettsavgjørelser og andre vedtak av offentlig myndighet" ('laws, regulations, court decisions and other decisions of public authorities'; Norway; cf. § 14 in Åndsverkloven, the Norwegian Copyright Act), "uvesentlige deler av databaser" ('insubstantial parts of databases'; Norway; cf. § 24 in Åndsverkloven), and "sitatretten" ('the right of quotation'; Norway; cf. § 29 in Åndsverkloven). According to Creative Commons (CC), because these excerpts do not represent substantial parts of the reused sources, their redistribution is also permitted when they are extracted from sources distributed under Creative Commons licenses (cf. question "Do I always have to comply with the license terms? If not, what are the exceptions?" in the Creative Commons Frequently Asked Questions). |