DOI: 10.18710/QAJKZW
Creator: Cvrček, Václav (ORCID 0000-0003-3977-2393), Czech National Corpus
Title: Multi-Dimensional Analysis of Czech
Publisher: DataverseNO
Publication year: 2018
Subject: Arts and Humanities
Keywords: multi-dimensional analysis; register variation; factor analysis; corpus; Czech
Contributors: Lukeš, David (Czech National Corpus); Czech National Corpus; Cvrček, Václav; Komrsková, Zuzana; Lukeš, David; Poukarová, Petra; Řehořková, Anna; Zasina, Adrian Jan; The Tromsø Repository of Language and Linguistics (TROLLing)
Dates: 2018-10-12; 2018-10-12; 2023-09-28; 2017/2018
Resource type: corpus data
Related identifier: 10.1515/cllt-2018-0020
Sizes: 4928862701874206271007
Formats: application/vnd.openxmlformats-officedocument.wordprocessingml.document; application/pdf; text/tab-separated-values; type/x-r-syntax
Version: 1.1
Rights: CC0 1.0
<p>
Original data for a general-purpose multi-dimensional analysis model of
register variation in Czech.
</p>
<p>
This record contains a CSV data set of 137 linguistic features measured on
3,428 Czech text chunks, together with an R script that performs a factor
analysis on this data set. The results of this factor analysis served as the
basis for an 8-dimensional model of register variation in Czech (see
Related Publications), following the methodology introduced by Douglas
Biber (see e.g. his seminal 1988 work
<a href="https://doi.org/10.1017/CBO9780511621024">
Variation Across Speech and Writing
</a>
for details, or his 2014 article
<a href="https://doi.org/10.1075/lic.14.1.02bib">
“Using multi-dimensional analysis to explore cross-linguistic universals
of register variation”
</a>
for a review of MDA results across a variety of languages).
</p>
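<p>
The kind of analysis described above can be illustrated with a minimal
sketch. It is not the script distributed with this data set: it uses base
R's <code>factanal</code> as a dependency-free stand-in for the
<code>psych</code> package, runs on synthetic data rather than the real
feature table, and the sizes, the promax rotation and all variable names
(<code>features</code>, <code>dim_scores</code>, etc.) are assumptions
chosen for illustration only.
</p>

```r
# Illustrative only: synthetic data, not the Koditex feature table.
set.seed(1)
n_chunks   <- 300  # stand-in for the 3,428 text chunks
n_features <- 12   # stand-in for the 137 linguistic features
n_factors  <- 2    # stand-in for the 8 dimensions

# Two latent dimensions; six features load on each of them.
latent <- matrix(rnorm(n_chunks * n_factors), ncol = n_factors)
true_loadings <- matrix(0, nrow = n_features, ncol = n_factors)
true_loadings[1:6, 1]  <- 0.8
true_loadings[7:12, 2] <- 0.8
noise <- matrix(rnorm(n_chunks * n_features, sd = 0.6), ncol = n_features)

# Rows are text chunks, columns are feature measurements.
features <- latent %*% t(true_loadings) + noise
colnames(features) <- paste0("feature_", seq_len(n_features))

# Maximum-likelihood factor analysis with an oblique (promax) rotation.
fit <- factanal(features, factors = n_factors,
                rotation = "promax", scores = "regression")

# Loadings show which features pattern together on each factor.
print(fit$loadings, cutoff = 0.3)

# One score per chunk and factor, i.e. each chunk's position on a dimension.
dim_scores <- fit$scores
```

<p>
In a multi-dimensional analysis, the loadings are what gets interpreted:
groups of features that load strongly on the same factor are read as one
dimension of register variation, and the per-chunk scores place each text
along that dimension.
</p>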
<p>
The data is derived from the
<a href="https://wiki.korpus.cz/doku.php/en:cnk:koditex">
Koditex corpus
</a>,
which aims to be as diversified as possible, covering various forms of
spoken and written (both print and on-line) Czech. The corpus was compiled
to provide a solid empirical basis for a comprehensive general-purpose
model of register variation in Czech.
</p>
<p>
Apart from this data set and related publications, additional
resources pertaining to the project are available via the
<a href="https://github.com/czcorpus/mda">
czcorpus/mda
</a>
GitHub repository.
</p>
Software: R: A Language and Environment for Statistical Computing, 3.4.3; psych: Procedures for Personality and Psychological Research (R package), 1.7.8
Location: Prague, Czech Republic
Funder: European Regional Development Fund, CZ.02.1.01/0.0/0.0/16_013/0001758