Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays (doi:10.18710/RULYMP)

Document Description

Citation

Title:

Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays

Identification Number:

doi:10.18710/RULYMP

Distributor:

DataverseNO

Date of Distribution:

2022-01-27

Version:

1

Bibliographic Citation:

Kang, Hui; Xu, Jiajin, 2022, "Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays", https://doi.org/10.18710/RULYMP, DataverseNO, V1

Study Description

Citation

Title:

Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays

Identification Number:

doi:10.18710/RULYMP

Authoring Entity:

Kang, Hui (Dalian University of Foreign Languages)

Xu, Jiajin (Beijing Foreign Studies University)

Other identifications and acknowledgements:

Software School/Intelligence Language Research Center

Other identifications and acknowledgements:

Wang, Luojia

Other identifications and acknowledgements:

Zhang, Yaxin

Other identifications and acknowledgements:

Zhang, Xiaobo

Producer:

Dalian University of Foreign Languages

Date of Production:

2021-08-05

Software used in Production:

AntConc

Software used in Production:

R Language

Software used in Production:

RStudio

Grant Number:

L20BYY016

Grant Number:

19ZDA319

Distributor:

DataverseNO

Distributor:

The Tromsø Repository of Language and Linguistics (TROLLing)

Access Authority:

Kang, Hui

Depositor:

Kang, Hui

Date of Deposit:

2021-08-05

Holdings Information:

https://doi.org/10.18710/RULYMP

Study Scope

Keywords:

Arts and Humanities, causal subordinators, "because", "since", contextual features, argumentative essays, syntax, English

Abstract:

<p>The dataset supports the research article "Salience-simplification strategy to markedness of causal subordinators: The case of “because” and “since” in argumentative essays". In total, the dataset marks features of 976 causal adverbial subordinations retrieved from student argumentative essays. Data points were extracted from three corpora. Specifically, all essays in NESSIE (Native English Speakers’ Similarly or Identically-prompted Essays, created by Xu Jiajin; 781 essays; 291,911 tokens) and the argumentative essays in LOCNESS (the Louvain Corpus of Native English Essays, created by Granger; 323 essays; 230,138 tokens) were selected. From BAWE (British Academic Written English, created by Hilary Nesi), native-speaker argumentative essays in the Arts and Humanities disciplinary group were chosen (512 essays; 1,360,932 tokens). In total, 1,616 essays comprising 1,882,981 tokens were examined.</p>
<p>The dataset comprises 976 data points of causal subordinations conjoined by "because" and "since" in students' argumentative essays: all 488 "since" subordinations and 488 randomly selected "because" subordinations. For these data points, ten contextual features that are potential predictors of the choice between the causal subordinators "because" and "since" were annotated.</p>
<p>The ten contextual features annotated are "position", "separation", "embeddedness", "initial adverbials", "sub-clause", "de-ranking", "clause-length ratio", "hedging terms", "clausal relationship", and "bridging".</p>
<p>In total, fourteen variables, covering the ten contextual features, are annotated:</p>
<p>(1) "No." is the ID of each data point (this is an ID marker);</p>
<p>(2) "subordinator" marks the causal subordinator (this categorical variable has two values: "because" and "since");</p>
<p>(3) "position" marks the position of the causal adverbial clause relative to the main clause (this categorical variable has two values: "preposed" or "postposed");</p>
<p>(4) "sep" indicates whether a separating punctuation mark exists between the subordinate and main clauses (this categorical variable has two values: "YES" or "NO");</p>
<p>(5) "embeddedness" indicates whether a complex sentence is embedded in a larger complex sentence (this categorical variable has two values: "YES" or "NO");</p>
<p>(6) "ini.adv" denotes whether an initial adverbial exists in the causal subordination (this categorical variable has two values: "YES" or "NO");</p>
<p>(7) "sub-clau" indicates whether the causal subordinate clause contains sub-clauses of any type (this categorical variable has two values: "YES" or "NO");</p>
<p>(8) "deranking" indicates whether the predicate of the subordinate clause is complete (this categorical variable has two values: "YES" or "NO");</p>
<p>(9) "sub.main.ratio" is the length ratio of the subordinate and main clauses in terms of word count (this numerical variable is converted to its natural logarithm for better interpretation);</p>
<p>(10) "hedging" indicates whether a hedging term exists in the subordinate clause (this categorical variable has two values: "YES" or "NO");</p>
<p>(11) "clau.rel" denotes the interclausal relationship at the general level (this categorical variable has two values: "direct" or "indirect");</p>
<p>(12) "spc.clau.rel2" denotes the interclausal relationship at the secondary level (this categorical variable has five values: "im", "rm", "asst", "inpr", and "sugg");</p>
<p>(13) "bridging" indicates whether the subordinate clause contains any information referring back to the preceding clause (this categorical variable has two values: "YES" or "NO");</p>
<p>(14) "source" shows the specific corpus each data point comes from (this categorical variable has three values: "NESSIE", "LOCNESS", or "BAWE").</p>
<p>This dataset was constructed to explore the contextual features that discriminate between the causal subordinators "because" and "since" and to rank the effective features.</p>

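The variable list in the abstract can be read as a small codebook, and a data file can be checked against it before any modelling. The following is a minimal Python sketch, not part of the deposited materials: it assumes that the header row of "pub-causalsubordinator.csv" uses exactly the variable names quoted in the abstract, which has not been verified against the file itself. The function name `check_rows` is hypothetical, and the two sample rows are invented purely to exercise the check.

```python
import csv
import io
from collections import Counter

# Allowed values per categorical variable, transcribed from the abstract.
# Assumption: the CSV header row uses exactly these variable names.
YES_NO = {"YES", "NO"}
CODEBOOK = {
    "subordinator": {"because", "since"},
    "position": {"preposed", "postposed"},
    "sep": YES_NO,
    "embeddedness": YES_NO,
    "ini.adv": YES_NO,
    "sub-clau": YES_NO,
    "deranking": YES_NO,
    "hedging": YES_NO,
    "clau.rel": {"direct", "indirect"},
    "spc.clau.rel2": {"im", "rm", "asst", "inpr", "sugg"},
    "bridging": YES_NO,
    "source": {"NESSIE", "LOCNESS", "BAWE"},
}
NUMERIC = "sub.main.ratio"  # described as a log-transformed length ratio


def check_rows(handle):
    """Return (problems, counts) for an open CSV file handle.

    problems: list of (line number, variable, offending value) tuples.
    counts:   number of rows per value of "subordinator".
    """
    problems = []
    counts = Counter()
    # Line 1 is the header, so data rows start at line 2
    # (assuming no field contains an embedded line break).
    for line_no, row in enumerate(csv.DictReader(handle), start=2):
        for name, allowed in CODEBOOK.items():
            value = row.get(name)
            if value not in allowed:
                problems.append((line_no, name, value))
        try:
            float(row.get(NUMERIC))
        except (TypeError, ValueError):
            problems.append((line_no, NUMERIC, row.get(NUMERIC)))
        counts[row.get("subordinator")] += 1
    return problems, counts


# Two invented rows; the second has an invalid "hedging" value.
sample = io.StringIO(
    "No.,subordinator,position,sep,embeddedness,ini.adv,sub-clau,deranking,"
    "sub.main.ratio,hedging,clau.rel,spc.clau.rel2,bridging,source\n"
    "1,since,preposed,YES,NO,NO,YES,NO,-0.41,NO,direct,im,NO,BAWE\n"
    "2,because,postposed,NO,NO,NO,NO,NO,0.18,maybe,indirect,rm,YES,LOCNESS\n"
)
problems, counts = check_rows(sample)
print(problems)
print(dict(counts))
```

To run the same check on the deposited file, the in-memory sample would be replaced by a handle such as `open("pub-causalsubordinator.csv", newline="", encoding="utf-8")`; the encoding is an assumption. If the file matches the description above, the counts should show 488 rows for each subordinator.
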
Time Period:

1995-2007

Date of Collection:

2019-12-01 to 2021-05-01

Country:

United States, United Kingdom

Kind of Data:

corpus data

Methodology and Processing

Mode of Data Collection:

corpus retrieval

Sources Statement

Data Sources:

<p>This dataset contains statistical data obtained by analyzing texts from the following three corpora:</p>
<ul>
<li>NESSIE Corpus. See: Xu, Jiajin. (2012). NESSIE Corpus 1st release (NESSIEv1): Native English Speakers' Similarly or Identically-prompted Essays 1st release. Beijing: National Research Centre for Foreign Language Education, Beijing Foreign Studies University. Available at <a href="http://corpus.bfsu.edu.cn/info/1070/1335.htm" title="Corpus" target="_blank">http://corpus.bfsu.edu.cn/info/1070/1335.htm</a>.</li>
<li>LOCNESS (the Louvain Corpus of Native English Essays). See: Granger, S. (1998). The computer learner corpus: A versatile new source of data for SLA research. In Granger, S. (ed.) Learner English on Computer. London & New York: Addison Wesley Longman, 3-18. Available at <a href="https://www.learnercorpusassociation.org/resources/tools/locness-corpus/" title="Corpus" target="_blank">https://www.learnercorpusassociation.org/resources/tools/locness-corpus/</a>.</li>
<li>BAWE (British Academic Written English). See: Nesi, Hilary; Gardner, Sheena; Thompson, Paul; et al., 2008, British Academic Written English Corpus, Oxford Text Archive, <a href="http://hdl.handle.net/20.500.12024/2539" title="Corpus" target="_blank">http://hdl.handle.net/20.500.12024/2539</a>.</li>
</ul>
<p>All three corpora can be used for non-commercial purposes only. The BAWE corpus is explicitly licensed under Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0).</p>
<p>In this dataset, "Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays", the data file "pub-causalsubordinator.csv" contains statistical data and calculations based on texts contained in the three source corpora.</p>
<p>The file does not contain any coherent utterances, or parts of utterances, in which the keywords were found, because all context was removed from the data file. The use of the three source corpora thus does not infringe the copyright of any rights holders who have contributed to the corpora.</p>

Data Access

Notes:

<a href="http://creativecommons.org/licenses/by-nc/4.0">CC BY-NC 4.0</a>

Other Study Description Materials

Related Publications

Citation

Title:

Xu, Jiajin, and Hui Kang. ‘Salience-Simplification Strategy for Markedness of Causal Subordinators: “Because” and “since” in Argumentative Essays’. Lingua, vol. 272, June 2022, p. 103256. ScienceDirect, https://doi.org/10.1016/j.lingua.2022.103256.

Identification Number:

10.1016/j.lingua.2022.103256

Bibliographic Citation:

Xu, Jiajin, and Hui Kang. ‘Salience-Simplification Strategy for Markedness of Causal Subordinators: “Because” and “since” in Argumentative Essays’. Lingua, vol. 272, June 2022, p. 103256. ScienceDirect, https://doi.org/10.1016/j.lingua.2022.103256.

Other Reference Note(s)

Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A comprehensive grammar of the English language. London: Longman.

Other Study-Related Materials

Label:

00_readme_causal-subordinators.txt

Notes:

text/plain

Other Study-Related Materials

Label:

corpus_processing.R

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

exact.matches.2.r

Notes:

type/x-r-syntax

Other Study-Related Materials

Label:

pub-causalsubordinator.csv

Notes:

text/csv

Other Study-Related Materials

Label:

word_sentence_count.R

Notes:

type/x-r-syntax