Replication Data for: A panorama of inchoative constructions in Spanish: Cluster analysis as an answer to the near-synonymy puzzle.
https://doi.org/10.18710/DR8QKQ
Van Hulle, Sven
DataverseNO
2023-03-18
2023-09-05T11:09:07Z
<p>The dataset contains the data for the hierarchical cluster analysis as explained in the article "A panorama of inchoative constructions in Spanish: Cluster analysis as an answer to the near-synonymy puzzle". In total, the dataset contains 3955 observations, which are tokens of the inchoative construction for the following auxiliaries: comenzar, empezar, meter, poner, echar(se), liar, arrancar and romper. The data originate from the Spanish Web corpus (esTenTen18), accessed via Sketch Engine. Only the European Spanish subcorpus was selected. The search syntax used to detect the inchoative construction was the following: “[lemma="empezar"] [tag="R.*"]{0,3}"a"[tag="V.*"] within <s/>” (with the lemma "empezar" replaced by the relevant lemma for each auxiliary; see Spinc_queries_20221202.txt for all corpus queries). Samples of 10,000 tokens per auxiliary were downloaded and manually cleaned. Only 500 tokens per auxiliary were retained in the dataset. Next, the data were annotated for the infinitive observed after the preposition 'a' and for the semantic class to which this infinitive belongs, following the existing ADESSE classification (see below), as well as for other criteria that are not taken into account in this study. Concretely, the variables 'INF' (infinitive) and 'Class' were used as input for the hierarchical cluster analysis (see data-specific sections below for more information about the variables).</p>
Arts and Humanities
inchoative construction
cluster analysis
construction grammar
Spanish
English
Van Hulle, S., & Enghels, R. (2022). "De Spaanse inchoatiefconstructie in beeld. Clusteranalyse als antwoord op het quasi-synonymie vraagstuk", Handelingen - Koninklijke Zuidnederlandse maatschappij voor taal- en letterkunde en geschiedenis, 74, 277-305.
doi: 10.21825/kzm.87036, https://doi.org/10.21825/kzm.87036
2023-03-18
Van Hulle, Sven
2023-03-09
2019-10-01
2020-10-01
corpus data
<p>The data contained in this dataset originate from the <a href="https://www.sketchengine.eu/estenten-spanish-corpus/" title="Spanish Web corpus (esTenTen18)" target="_blank">Spanish Web corpus (esTenTen18)</a>, accessed via Sketch Engine. Only the European Spanish subcorpus was selected.</p>
<p>The extracted words contained in the data files of this dataset represent only an insignificant part of the corpus, and they do not form coherent text, only single words. Therefore, the reuse (including redistribution) of these words is permitted under the exception rules in IPR and database protection regulations, such as Fair use (USA; cf. <a href="https://www.copyright.gov/fair-use/more-info.html" title="Fair use" target="_blank">US Copyright Act</a>), Fair dealing (UK; cf. <a href="https://www.gov.uk/guidance/exceptions-to-copyright" title="Fair dealing" target="_blank">Exceptions to copyright</a>), and "sitatretten" (Norway; cf. <a href="https://lovdata.no/lov/2018-06-15-40/§29" title="sitatretten" target="_blank">§ 29 i Åndsverkloven</a>).</p>
Spain
CC0 1.0