10.18710/QAJKZW
Cvrček, Václav (ORCID: 0000-0003-3977-2393; Czech National Corpus)
Multi-Dimensional Analysis of Czech
DataverseNO
2018
doi:10.18710/QAJKZW/M1I7AP; doi:10.18710/QAJKZW/LTNR3K; doi:10.18710/QAJKZW/AYYGNF; doi:10.18710/QAJKZW/HAN1ML
Original data for a general-purpose multi-dimensional analysis (MDA) model of register variation in Czech. This dataset contains a CSV file of 137 linguistic features measured on 3428 Czech text chunks, and an R script which performs a factor analysis on these data. The results of this factor analysis were used as the basis for an 8-dimensional model of register variation in Czech (see Related Publications), following the methodology introduced by Douglas Biber (see e.g. his seminal 1988 work Variation Across Speech and Writing for details on the methodology, or his 2014 article “Using multi-dimensional analysis to explore cross-linguistic universals of register variation” for a review of MDA results across a variety of languages). The data are derived from the Koditex corpus, which aims to be as diversified as possible, covering various forms of spoken and written (both print and on-line) Czech. The corpus was compiled to provide a solid empirical basis for a comprehensive general-purpose model of register variation in Czech. Apart from this dataset and the related publications, additional resources pertaining to the project are available via the czcorpus/mda GitHub repository.
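As a rough illustration of the kind of analysis described above, the following minimal Python sketch extracts factors from a chunks-by-features matrix. It is not the R script distributed with this dataset: it uses simple principal-component extraction from the correlation matrix (no rotation), and the function name `extract_factors` and the synthetic 200 × 12 matrix are assumptions made only for this example, standing in for the real 3428 × 137 feature table.

```python
import numpy as np


def extract_factors(X, n_factors):
    """Principal-component factor extraction (hypothetical helper).

    X is a (texts x features) matrix. Returns the feature loadings
    (features x n_factors) and per-text scores (texts x n_factors).
    """
    X = np.asarray(X, dtype=float)
    # Standardise each feature so that all features are on a common scale.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # Correlation matrix between features.
    R = np.corrcoef(Z, rowvar=False)
    # Eigen-decomposition; eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(R)
    order = np.argsort(eigvals)[::-1][:n_factors]
    # Loadings: eigenvectors scaled by the square root of their eigenvalues.
    loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
    # Scores: projection of the standardised data onto the retained axes.
    scores = Z @ eigvecs[:, order]
    return loadings, scores


# Synthetic stand-in data: 200 "text chunks" x 12 "features",
# generated from 2 latent dimensions plus noise (NOT the real data).
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
weights = rng.normal(size=(2, 12))
X = latent @ weights + 0.3 * rng.normal(size=(200, 12))

loadings, scores = extract_factors(X, n_factors=2)
print(loadings.shape, scores.shape)
```

With the real data, the matrix would be read from the CSV file and the number of factors set to match the published model; the actual extraction and rotation settings should be taken from the R script itself.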
Lukeš, David (Czech National Corpus); Czech National Corpus