Persistent Identifier
|
doi:10.18710/GIKMKM |
Publication Date
|
2025-09-02 |
Title
| Replication Data for: The semantic structuring of minimizing constructions in present-day Netherlandic Dutch: a distribution-based cluster analysis |
Author
| Van den Heede, Margot (Ghent University) - ORCID: 0000-0002-1908-1694
Lauwers, Peter (Ghent University) - ORCID: 0000-0002-6495-8977 |
Point of Contact
|
Van den Heede, Margot (Ghent University) |
Description
| Dataset abstract:
This dataset contains the data files that were used for the cluster analysis of the Dutch minimizing constructions, as described in the publication cited below. In addition to a ReadMe file, it contains three files:
- A txt file with the corpus queries that were used to retrieve tokens of the minimizing constructions from the Dutch Web 2014 (nlTenTen14) corpus, available via Sketch Engine (for more information about the TenTen corpora, see Jakubíček, M., A. Kilgarriff, V. Kovář, P. Rychlý & V. Suchomel (2013). The TenTen corpus family. In: 7th International Corpus Linguistics Conference CL. Lancaster, 125–127).
- A csv file that serves as the input file for the cluster analysis. It contains a list of 5,863 minimizer-predicate combinations, i.e., the predicates that combine with the minimizers that have a token frequency of at least 10 in the dataset.
- An R script with the code used to perform the cluster analysis.
(2025-08-28)
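A csv input file of this kind can be turned into per-minimizer predicate profiles along the following lines. This is a minimal Python sketch, not the deposited R script: the column names `minimizer`, `predicate` and `freq`, the function name `build_profiles`, and the toy counts are all hypothetical, and the sketch assumes that the threshold of 10 applies to each minimizer's summed token frequency.

```python
import csv
import io
from collections import Counter, defaultdict

# Toy input in an ASSUMED layout: one row per minimizer-predicate combination,
# with a hypothetical "freq" column. The deposited csv file may be organized differently.
TOY_CSV = """minimizer,predicate,freq
meter,vertrouwen,6
meter,snappen,5
cent,waard,8
cent,verdienen,3
zier,schelen,4
"""

def build_profiles(rows, min_tokens=10):
    """Map each minimizer to a Counter of predicate frequencies, keeping only
    minimizers whose summed token frequency reaches min_tokens."""
    profiles = defaultdict(Counter)
    for row in rows:
        profiles[row["minimizer"]][row["predicate"]] += int(row["freq"])
    return {m: preds for m, preds in profiles.items() if sum(preds.values()) >= min_tokens}

rows = list(csv.DictReader(io.StringIO(TOY_CSV)))
profiles = build_profiles(rows)
print(sorted(profiles))  # -> ['cent', 'meter']
```

Here `zier` is dropped because its toy total (4) falls below the threshold; the remaining profiles are sparse predicate-count vectors, the kind of representation on which cosine distances between minimizers can be computed.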
Article abstract:
This paper examines the semantic structuring of a paradigm of 89 minimizers, i.e., nouns that reinforce sentential negation in present-day Netherlandic Dutch, such as meter ‘meter’ in voor geen meter vertrouwen ‘not to trust for a meter’. Cosine distances between the minimizers are computed on the basis of the predicates they combine with in a sample of 100 tokens downloaded from the Dutch Web corpus 2014 (nlTenTen14), and the minimizers are grouped into nine semantic clusters using the Partitioning Around Medoids (PAM) algorithm. The clusters largely correspond to semantic categories such as taboo terms or units of money. This suggests that, in general, minimizers belonging to the same semantic domain are combined with a similar (core) set of predicates. Based on the predicates shared within each cluster, we detect signs of analogical attraction between minimizers or, conversely, of competition. Crucially, low silhouette widths enable us to identify outliers in their respective clusters, for instance minimizing nouns that exhibit signs of context expansion, as shown by their combination with semantically non-harmonious verbs. As such, this paper provides a synchronic snapshot of the semantic processes involved in the (incipient) grammaticalization of minimizing nouns and, more generally, illustrates how distributional semantics offers a heuristic for analyzing the structure of a network of comparable micro-constructions. (2025-08-28) |
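The pipeline summarized in the article abstract (cosine distances over predicate profiles, PAM clustering, silhouette widths to spot outliers) can be illustrated with the following stdlib-only Python sketch. It is not the authors' R code: the helper names (`cosine_distance`, `pam`, `silhouette_widths`), the toy profiles and k = 2 are assumptions chosen for illustration, whereas the paper works with 89 minimizers and nine clusters. PAM is implemented naively here (greedy BUILD, then SWAP until no single medoid exchange lowers the total distance), which is only adequate for small inputs.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity between two sparse predicate-count dicts (assumed non-empty)."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return max(0.0, 1.0 - dot / (norm_a * norm_b))

def total_cost(D, medoids):
    """Sum of each item's distance to its nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))

def pam(D, k):
    """Naive Partitioning Around Medoids: greedy BUILD, then SWAP."""
    n = len(D)
    # BUILD: start with the most central item, then greedily add the item that lowers the cost most.
    medoids = [min(range(n), key=lambda i: sum(D[i]))]
    while len(medoids) < k:
        candidates = [c for c in range(n) if c not in medoids]
        medoids.append(min(candidates, key=lambda c: total_cost(D, medoids + [c])))
    # SWAP: exchange a medoid with a non-medoid as long as this lowers the total cost.
    best = total_cost(D, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids[:pos] + [o] + medoids[pos + 1:]
                cost = total_cost(D, trial)
                if cost < best - 1e-12:
                    medoids, best, improved = trial, cost, True
    # Assign every item to the cluster of its nearest medoid.
    labels = [min(range(k), key=lambda j: D[i][medoids[j]]) for i in range(n)]
    return medoids, labels

def silhouette_widths(D, labels):
    """s(i) = (b - a) / max(a, b); 0 for singleton clusters. Requires at least two clusters."""
    n = len(D)
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(D[i][j] for j in own) / len(own)  # mean distance within own cluster
        b = min(  # mean distance to the nearest other cluster
            sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in set(labels) if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return widths

# Toy predicate profiles (invented counts, for illustration only).
profiles = {
    "cent": {"waard": 8, "verdienen": 3},
    "meter": {"vertrouwen": 6, "snappen": 5},
    "snars": {"snappen": 7, "begrijpen": 4},
    "stuiver": {"waard": 5, "verdienen": 2},
}
names = sorted(profiles)
D = [[0.0 if i == j else cosine_distance(profiles[a], profiles[b])
      for j, b in enumerate(names)] for i, a in enumerate(names)]
medoids, labels = pam(D, k=2)
widths = silhouette_widths(D, labels)
for name, label, width in zip(names, labels, widths):
    print(f"{name:8} cluster={label} silhouette={width:.3f}")
```

On these toy profiles, `cent` and `stuiver` (which share waard and verdienen) form one cluster and `meter` and `snars` another; the latter pair overlaps only on snappen and therefore receives lower silhouette widths, the kind of signal the abstract uses to flag less well-integrated cluster members.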
Subject
| Arts and Humanities |
Keyword
| minimizing constructions
Netherlandic Dutch
cluster analysis
corpus data
Construction Grammar |
Related Publication
| Van den Heede, Margot & Peter Lauwers (2024). The semantic structuring of minimizing constructions in present-day Netherlandic Dutch: a distribution-based cluster analysis. Nederlandse Taalkunde 29(3), 358–401. https://doi.org/10.5117/nedtaa2024.3.003.heed |
Language
| English |
Producer
| Ghent University https://www.ugent.be/en |
Funding Information
| Special Research Fund for Concerted Research Actions - Ghent University |
Distributor
| The Tromsø Repository of Language and Linguistics (TROLLing) https://trolling.uit.no/ |
Depositor
| Van den Heede, Margot |
Deposit Date
| 2025-07-28 |
Date of Collection
| Start Date: 2019-10-01; End Date: 2022-12-31 |
Data Type
| corpus data |
Software
| MS Excel
R, Version: 4.4.2
RStudio, Version: 2025.05.1 |
Data Source
| Dutch Web 2014 corpus (nlTenTen14), available via Sketch Engine: https://www.sketchengine.eu/nltenten-dutch-corpus/
Insubstantial parts of this source are reused in this dataset under exceptions and limitations to intellectual property protection for databases, such as the Norwegian Copyright Act, the EU Database Directive, UK Copyright and Rights in Databases Regulations and the US Copyright Act. |