Persistent Identifier
| doi:10.18710/QAJKZW |
Publication Date
| 2018-10-30 |
Title
| Multi-Dimensional Analysis of Czech |
Author
| Cvrček, Václav (Czech National Corpus) - ORCID: 0000-0003-3977-2393 |
Point of Contact
| Lukeš, David (Czech National Corpus) |
Description
| Original data for a general-purpose multi-dimensional analysis (MDA) model of register variation in Czech.
This deposit contains a CSV data set of 137 linguistic features measured on 3428 Czech text chunks, together with an R script that performs a factor analysis on this data set. The results of this factor analysis served as the basis for an 8-dimensional model of register variation in Czech (see Related Publication). The model follows the methodology introduced by Douglas Biber. For details on the methodology, see e.g. his seminal 1988 book Variation Across Speech and Writing. For a review of MDA results across a variety of languages, see his 2014 article “Using multi-dimensional analysis to explore cross-linguistic universals of register variation”.
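The project's own analysis is the deposited R script, which uses the psych package listed under Software. This sketch does not reproduce it. Instead, it illustrates how a Biber-style model is typically applied once factor loadings are available, by computing dimension scores in Python:

- each feature is z-scored across all texts;
- a text's score on a dimension is the sum of the z-scores of features that load positively above a salience cut-off, minus the sum for features that load negatively.

The function names, the 0.3 cut-off and the input layout (one row per text chunk, one column per feature) are illustrative assumptions. None of them are taken from the deposited data or script.

```python
from statistics import mean, pstdev


def standardize(freqs):
    """Z-score each feature (column) across all texts (rows).

    Assumed layout (hypothetical): freqs[i][j] is the normalized
    frequency of feature j in text chunk i.
    """
    n_features = len(freqs[0])
    columns = [[row[j] for row in freqs] for j in range(n_features)]
    stats = [(mean(col), pstdev(col)) for col in columns]
    # A constant feature (zero spread) carries no information: score it 0.
    return [
        [(x - m) / s if s > 0 else 0.0 for x, (m, s) in zip(row, stats)]
        for row in freqs
    ]


def dimension_scores(freqs, loadings, cutoff=0.3):
    """Biber-style score of each text on a single dimension.

    `loadings` holds one factor loading per feature. Features whose
    absolute loading is below `cutoff` (an assumed threshold) are ignored;
    positive-loading features add their z-score, negative ones subtract it.
    """
    z = standardize(freqs)
    return [
        sum(
            (1 if load > 0 else -1) * value
            for value, load in zip(row, loadings)
            if abs(load) >= cutoff
        )
        for row in z
    ]
```

Consider hypothetical frequencies `[[10, 2, 5], [20, 4, 5], [30, 0, 5]]` and loadings `[0.8, -0.5, 0.1]`. The third feature falls below the cut-off, and the third text receives the highest score. Biber's own procedure adds refinements not shown here, such as counting each feature only on the dimension where it loads most strongly.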
The data is derived from the Koditex corpus, which aims to be as diverse as possible, covering various forms of spoken and written (both print and online) Czech. The corpus was compiled to provide a solid empirical basis for a comprehensive general-purpose model of register variation in Czech.
Apart from this data set and related publications, additional resources pertaining to the project are available via the czcorpus/mda GitHub repository. (2018-10-12) |
Subject
| Arts and Humanities |
Keyword
| multi-dimensional analysis
register variation
factor analysis
corpus
Czech |
Related Publication
| Cvrček, V., Komrsková, Z., Lukeš, D., Poukarová, P., Řehořková, A., & Zasina, A. J. (2018). From extra- to intratextual characteristics: Charting the space of variation in Czech through MDA. Corpus Linguistics and Linguistic Theory. https://doi.org/10.1515/cllt-2018-0020
Cvrček, V., Komrsková, Z., Lukeš, D., Poukarová, P., Řehořková, A., & Zasina, A. J. (forthcoming). Variabilita češtiny: multidimenzionální analýza [Variability of Czech: A multi-dimensional analysis]. Slovo a slovesnost. |
Language
| English |
Producer
| Czech National Corpus (CNC) https://korpus.cz |
Production Date
| 2018-10-12 |
Production Location
| Prague, Czech Republic |
Contributor
| Project Leader: Cvrček, Václav
Project Member: Komrsková, Zuzana
Project Member: Lukeš, David
Project Member: Poukarová, Petra
Project Member: Řehořková, Anna
Project Member: Zasina, Adrian Jan |
Funding Information
| European Regional Development Fund: CZ.02.1.01/0.0/0.0/16_013/0001758 |
Distributor
| The Tromsø Repository of Language and Linguistics (TROLLing) https://trolling.uit.no/ |
Depositor
| Lukeš, David |
Deposit Date
| 2018-10-12 |
Time Period
| Start Date: 1990; End Date: 2014 |
Date of Collection
| Start Date: 2017; End Date: 2018 |
Data Type
| corpus data |
Software
| R: A Language and Environment for Statistical Computing, Version: 3.4.3
psych: Procedures for Personality and Psychological Research (R package), Version: 1.7.8 |
Data Source
| Koditex corpus (https://wiki.korpus.cz/doku.php/en:cnk:koditex) |