Part 1: Document Description
|
Citation |
|
---|---|
Title: |
Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays |
Identification Number: |
doi:10.18710/RULYMP |
Distributor: |
DataverseNO |
Date of Distribution: |
2022-01-27 |
Version: |
1 |
Bibliographic Citation: |
Kang, Hui; Xu, Jiajin, 2022, "Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays", https://doi.org/10.18710/RULYMP, DataverseNO, V1 |
Citation |
|
Title: |
Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays |
Identification Number: |
doi:10.18710/RULYMP |
Authoring Entity: |
Kang, Hui (Dalian University of Foreign Languages) |
Xu, Jiajin (Beijing Foreign Studies University) |
|
Other identifications and acknowledgements: |
Software School/Intelligence Language Research Center |
Other identifications and acknowledgements: |
Wang, Luojia |
Other identifications and acknowledgements: |
Zhang, Yaxin |
Other identifications and acknowledgements: |
Zhang, Xiaobo |
Producer: |
Dalian University of Foreign Languages |
Date of Production: |
2021-08-05 |
Software used in Production: |
AntConc |
Software used in Production: |
R Language |
Software used in Production: |
RStudio Team |
Grant Number: |
L20BYY016 |
Grant Number: |
19ZDA319 |
Distributor: |
DataverseNO |
Distributor: |
The Tromsø Repository of Language and Linguistics (TROLLing) |
Access Authority: |
Kang, Hui |
Depositor: |
Kang, Hui |
Date of Deposit: |
2021-08-05 |
Holdings Information: |
https://doi.org/10.18710/RULYMP |
Study Scope |
|
Keywords: |
Arts and Humanities, causal subordinators, "because", "since", contextual features, argumentative essays, syntax, English |
Abstract: |
<p>The dataset supports the research article "Salience-simplification strategy to markedness of causal subordinators: The case of “because” and “since” in argumentative essays". In total, the dataset marks features of 976 causal adverbial subordinations retrieved from student argumentative essays. Data points were extracted from three corpora. Specifically, all essays in NESSIE (Native English Speakers’ Similarly or Identically-prompted Essays, created by Xu Jiajin; 781 essays; 291,911 tokens) and the argumentative essays in LOCNESS (the Louvain Corpus of Native English Essays, created by Granger; 323 essays; 230,138 tokens) were selected. From BAWE (British Academic Written English, created by Hilary Nesi), native argumentative essays in the Arts and Humanities disciplinary group were chosen (512 essays; 1,360,932 tokens). In total, 1,616 essays comprising 1,882,981 tokens were examined.</p>
<p>The dataset comprises 976 data points of causal subordinations conjoined by "because" and "since" in students' argumentative essays: all 488 "since" subordinations and 488 randomly selected "because" subordinations. On these data points, ten contextual features that are potential predictors of people's choices between the causal subordinators "because" and "since" were annotated.</p>
<p>The ten contextual features annotated are "position", "separation", "embeddedness", "initial adverbials", "sub-clause", "de-ranking", "clause-length ratio", "hedging terms", "clausal relationship", and "bridging".</p>
<p>Overall, fourteen variables, including the ten contextual features, are annotated:</p>
<p>(1) "No." is the ID of each data point (an ID marker);</p>
<p>(2) "subordinator" marks the logical subordinator (this categorical variable has two values: "because" and "since");</p>
<p>(3) "position" marks the position of the logical adverbial clause relative to the main clause (this categorical variable has two values: "preposed" or "postposed");</p>
<p>(4) "sep" indicates whether a separating punctuation mark exists between the subordinate and main clauses (this categorical variable has two values: "YES" or "NO");</p>
<p>(5) "embeddedness" indicates whether a complex sentence is embedded in a larger complex sentence (this categorical variable has two values: "YES" or "NO");</p>
<p>(6) "ini.adv" denotes whether an initial adverbial exists in the causal subordination (this categorical variable has two values: "YES" or "NO");</p>
<p>(7) "sub-clau" indicates whether the causal subordinate clause contains sub-clauses of any type (this categorical variable has two values: "YES" or "NO");</p>
<p>(8) "deranking" indicates whether the predicate of the subordinate clause is complete (this categorical variable has two values: "YES" or "NO");</p>
<p>(9) "sub.main.ratio" is the length ratio of the subordinate and main clauses in terms of word count (this numerical variable is converted to its natural logarithm, ln, for better interpretation);</p>
<p>(10) "hedging" indicates whether a hedging term exists in the subordinate clause (this categorical variable has two values: "YES" or "NO");</p>
<p>(11) "clau.rel" denotes the interclausal relationship at the general level (this categorical variable has two values: "direct" or "indirect");</p>
<p>(12) "spc.clau.rel2" denotes the interclausal relationship at the secondary level (this categorical variable has five values: "im", "rm", "asst", "inpr", and "sugg");</p>
<p>(13) "bridging" indicates whether the subordinate clause contains any information referring back to the preceding clause (this categorical variable has two values: "YES" or "NO");</p>
<p>(14) "source" shows the specific corpus each data point comes from (this categorical variable has three values: "NESSIE", "LOCNESS", or "BAWE").</p>
<p>This dataset was constructed to explore the contextual features that discriminate between the causal subordinators "because" and "since" and to rank the effective features.</p> |
Time Period: |
1995-2007 |
Date of Collection: |
2019-12-01 to 2021-05-01 |
Country: |
United States, United Kingdom |
Kind of Data: |
corpus data |
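
The variable list in the abstract can be read as a small codebook. The following is a minimal Python sketch, not part of the deposited materials, of how a user might check a CSV export against that codebook and compute a log length ratio. It assumes that the CSV column headers match the variable names quoted in the abstract and that "sub.main.ratio" is the natural log of subordinate-clause words divided by main-clause words. The two sample rows and the helper names `validate_rows` and `log_length_ratio` are invented for illustration and are not taken from the dataset.

```python
import csv
import io
import math

# Allowed values per categorical variable, copied from the abstract's variable list.
CODEBOOK = {
    "subordinator": {"because", "since"},
    "position": {"preposed", "postposed"},
    "sep": {"YES", "NO"},
    "embeddedness": {"YES", "NO"},
    "ini.adv": {"YES", "NO"},
    "sub-clau": {"YES", "NO"},
    "deranking": {"YES", "NO"},
    "hedging": {"YES", "NO"},
    "clau.rel": {"direct", "indirect"},
    "spc.clau.rel2": {"im", "rm", "asst", "inpr", "sugg"},
    "bridging": {"YES", "NO"},
    "source": {"NESSIE", "LOCNESS", "BAWE"},
}


def log_length_ratio(sub_words, main_words):
    """Natural log of the subordinate-to-main clause word-count ratio.

    Assumption: the ratio is subordinate / main, as the abstract's wording suggests.
    """
    return math.log(sub_words / main_words)


def validate_rows(rows):
    """Return (row_index, column, value) for every value outside the codebook."""
    problems = []
    for i, row in enumerate(rows):
        for column, allowed in CODEBOOK.items():
            value = row.get(column)
            if value not in allowed:
                problems.append((i, column, value))
    return problems


# Invented sample (NOT real data); value pairings are arbitrary.
# The second row deliberately carries an invalid "hedging" value.
SAMPLE_CSV = (
    "No.,subordinator,position,sep,embeddedness,ini.adv,sub-clau,deranking,"
    "sub.main.ratio,hedging,clau.rel,spc.clau.rel2,bridging,source\n"
    "1,because,postposed,NO,NO,NO,YES,NO,0.0,NO,direct,rm,NO,BAWE\n"
    "2,since,preposed,YES,NO,NO,NO,NO,-0.693,maybe,indirect,im,YES,NESSIE\n"
)

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
problems = validate_rows(rows)
for row_index, column, value in problems:
    print(f"row {row_index}: unexpected value {value!r} in column {column!r}")
```

To run the same check on the deposited file, the `io.StringIO(SAMPLE_CSV)` argument could be swapped for an open handle on `pub-causalsubordinator.csv`, provided its headers match the names assumed above.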
Methodology and Processing |
|
Mode of Data Collection: |
corpus retrieval |
Sources Statement |
|
Data Sources: |
NESSIE Corpus. See: Xu, Jiajin. (2012). NESSIE Corpus 1st release (NESSIEv1): Native English Speakers' Similarly or Identically-prompted Essays 1st release. Beijing: National Research Centre for Foreign Language Education, Beijing Foreign Studies University. Available at <a href="http://corpus.bfsu.edu.cn/info/1070/1335.htm" title="Corpus" target="_blank">http://corpus.bfsu.edu.cn/info/1070/1335.htm</a>. |
LOCNESS (the Louvain Corpus of Native English Essays). See: Granger, S. (1998). The computer learner corpus: A versatile new source of data for SLA research. In Granger, S. (ed.) Learner English on Computer. Addison Wesley Longman: London & New York, 3-18. Available at <a href="https://www.learnercorpusassociation.org/resources/tools/locness-corpus/" title="Corpus" target="_blank">https://www.learnercorpusassociation.org/resources/tools/locness-corpus/</a>. |
|
BAWE (British Academic Written English). See: Nesi, Hilary; Gardner, Sheena; Thompson, Paul; et al., 2008, British Academic Written English Corpus, Oxford Text Archive, <a href="http://hdl.handle.net/20.500.12024/2539" title="Corpus" target="_blank">http://hdl.handle.net/20.500.12024/2539</a>. |
|
Data Access |
|
Other Study Description Materials |
|
Related Publications |
|
Citation |
|
Title: |
Xu, Jiajin, and Hui Kang. ‘Salience-Simplification Strategy for Markedness of Causal Subordinators: “Because” and “since” in Argumentative Essays’. Lingua, vol. 272, June 2022, p. 103256. ScienceDirect, https://doi.org/10.1016/j.lingua.2022.103256. |
Identification Number: |
10.1016/j.lingua.2022.103256 |
Bibliographic Citation: |
Xu, Jiajin, and Hui Kang. ‘Salience-Simplification Strategy for Markedness of Causal Subordinators: “Because” and “since” in Argumentative Essays’. Lingua, vol. 272, June 2022, p. 103256. ScienceDirect, https://doi.org/10.1016/j.lingua.2022.103256. |
Other Reference Note(s) |
|
Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A comprehensive grammar of the English language. London: Longman. |
|
Label: |
00_readme_causal-subordinators.txt |
Notes: |
text/plain |
Label: |
corpus_processing.R |
Notes: |
type/x-r-syntax |
Label: |
exact.matches.2.r |
Notes: |
type/x-r-syntax |
Label: |
pub-causalsubordinator.csv |
Notes: |
text/csv |
Label: |
word_sentence_count.R |
Notes: |
type/x-r-syntax |