{"id":104591,"identifier":"RULYMP","persistentUrl":"https://doi.org/10.18710/RULYMP","protocol":"doi","authority":"10.18710","publisher":"DataverseNO","publicationDate":"2022-01-27","storageIdentifier":"S3://10.18710/RULYMP","datasetVersion":{"id":3507,"datasetId":104591,"datasetPersistentId":"doi:10.18710/RULYMP","storageIdentifier":"S3://10.18710/RULYMP","versionNumber":1,"versionMinorNumber":2,"versionState":"RELEASED","productionDate":"2021-08-05","lastUpdateTime":"2023-09-28T20:21:25Z","releaseTime":"2023-09-28T20:21:25Z","createTime":"2023-09-20T06:41:16Z","publicationDate":"2022-01-27","citationDate":"2022-01-27","termsOfUse":"
This dataset, \"Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays\", may be reused according to the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license as described here: https://creativecommons.org/licenses/by-nc/4.0/.
\n\nThis dataset contains statistical data obtained by analyzing texts from the following three corpora: NESSIE, LOCNESS, and BAWE.
\n\n
All three corpora can be used for non-commercial purposes only. The BAWE corpus is explicitly licensed under Attribution-NonCommercial-ShareAlike 3.0 Unported (CC BY-NC-SA 3.0).
\n\nIn this dataset, \"Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays\", the data file \"pub-causalsubordinator.csv\" contains statistical data / calculations based on texts contained in the three source corpora.
\nThe file does not contain any coherent utterances, or parts of utterances, in which the keywords were found, because all context was removed from the data file. The use of the three source corpora thus does not infringe the copyright of any rights holders who have contributed to the corpora.
","fileAccessRequest":true,"metadataBlocks":{"citation":{"displayName":"Citation Metadata","name":"citation","fields":[{"typeName":"title","multiple":false,"typeClass":"primitive","value":"Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays"},{"typeName":"author","multiple":true,"typeClass":"compound","value":[{"authorName":{"typeName":"authorName","multiple":false,"typeClass":"primitive","value":"Kang, Hui"},"authorAffiliation":{"typeName":"authorAffiliation","multiple":false,"typeClass":"primitive","value":"Dalian University of Foreign Languages"},"authorIdentifierScheme":{"typeName":"authorIdentifierScheme","multiple":false,"typeClass":"controlledVocabulary","value":"ORCID"},"authorIdentifier":{"typeName":"authorIdentifier","multiple":false,"typeClass":"primitive","value":"0000-0002-5979-1658"}},{"authorName":{"typeName":"authorName","multiple":false,"typeClass":"primitive","value":"Xu, Jiajin"},"authorAffiliation":{"typeName":"authorAffiliation","multiple":false,"typeClass":"primitive","value":"Beijing Foreign Studies University"},"authorIdentifierScheme":{"typeName":"authorIdentifierScheme","multiple":false,"typeClass":"controlledVocabulary","value":"ORCID"},"authorIdentifier":{"typeName":"authorIdentifier","multiple":false,"typeClass":"primitive","value":"0000-0003-3454-9352"}}]},{"typeName":"datasetContact","multiple":true,"typeClass":"compound","value":[{"datasetContactName":{"typeName":"datasetContactName","multiple":false,"typeClass":"primitive","value":"Kang, Hui"},"datasetContactAffiliation":{"typeName":"datasetContactAffiliation","multiple":false,"typeClass":"primitive","value":"Dalian University of Foreign 
Languages"},"datasetContactEmail":{"typeName":"datasetContactEmail","multiple":false,"typeClass":"primitive","value":"kanghui@dlufl.edu.cn"}}]},{"typeName":"dsDescription","multiple":true,"typeClass":"compound","value":[{"dsDescriptionValue":{"typeName":"dsDescriptionValue","multiple":false,"typeClass":"primitive","value":"The dataset supports the research article \"Salience-simplification strategy to markedness of causal subordinators: The case of “because” and “since” in argumentative essays\". In total, the dataset marks features of 976 causal adverbial subordinations retrieved from student argumentative essays. Data points were extracted from three corpora. Specifically, all essays in NESSIE (Native English Speakers’ Similarly or Identically-prompted Essays, created by Xu Jiajin, 781 essays; 291,911 tokens) and argumentative essays in LOCNESS (the Louvain Corpus of Native English Essays, created by Granger, 323 essays; 230,138 tokens) were selected. Native argumentative essays from BAWE’s (British Academic Written English, created by Hilary Nesi) Arts and Humanities disciplinary group were chosen (512 essays; 1,360,932 tokens). In total, 1,616 essays comprising 1,882,981 tokens were examined.
\nThe dataset comprises 976 data points of causal subordinations conjoined by \"because\" and \"since\" in students' argumentative essays: all 488 \"since\" subordinations and 488 randomly selected \"because\" subordinations. For each data point, ten contextual features that are potential predictors of the choice between the causal subordinators \"because\" and \"since\" were annotated.
\n\nThe ten contextual features annotated are \"position\", \"separation\", \"embeddedness\", \"initial adverbials\", \"sub-clause\", \"de-ranking\", \"clause-length ratio\", \"hedging terms\", \"clausal relationship\", and \"bridging\".
\n\nOverall, fourteen variables, including the ten contextual features, are annotated:
\n(1) \"No.\" is the ID of each data point (this is an ID marker);
\n(2) \"subordinator\" marks the logical subordinator (this categorical variable has two values: \"because\" and \"since\");
\n(3) \"position\" marks the position of the logical adverbial clause relative to the main clause (this categorical variable has two values: \"preposed\" or \"postposed\");
\n(4) \"sep\" indicates whether a separating punctuation mark exists between the subordinate and main clauses (this categorical variable has two values: \"YES\" or \"NO\");
\n(5) \"embeddedness\" indicates whether a complex sentence is embedded in a larger complex sentence (this categorical variable has two values: \"YES\" or \"NO\");
\n(6) \"ini.adv\" denotes whether an initial adverbial exists in the causal subordination (this categorical variable has two values: \"YES\" or \"NO\");
\n(7) \"sub-clau\" indicates whether the causal subordinate clause contains sub-clauses of any type (this categorical variable has two values: \"YES\" or \"NO\");
\n(8) \"deranking\" indicates whether the predicate of the subordinate clause is complete (this categorical variable has two values: \"YES\" or \"NO\");
\n(9) \"sub.main.ratio\" is the length ratio of the subordinate and main clauses, measured in word count (this numerical variable is converted to its natural logarithm (ln) for better interpretation);
\n(10) \"hedging\" indicates whether a hedging term exists in the subordinate clause (this categorical variable has two values: \"YES\" or \"NO\");
\n(11) \"clau.rel\" denotes the interclausal relationship at the general level (this categorical variable has two values: \"direct\" or \"indirect\");
\n(12) \"spc.clau.rel2\" denotes the interclausal relationship at the secondary level (this categorical variable has five values: \"im\", \"rm\", \"asst\", \"inpr\", and \"sugg\");
\n(13) \"bridging\" indicates whether the subordinate clause contains any information referring back to the preceding clause (this categorical variable has two values: \"YES\" or \"NO\");
\n(14) \"source\" shows the specific corpus each data point comes from (this categorical variable has three values: \"NESSIE\", \"LOCNESS\", or \"BAWE\").
\n\nThis dataset was constructed to explore contextual features that discriminate between causal subordinators of \"because\" and \"since\" and to rank the effective features.
"},"dsDescriptionDate":{"typeName":"dsDescriptionDate","multiple":false,"typeClass":"primitive","value":"2021-08-05"}}]},{"typeName":"subject","multiple":true,"typeClass":"controlledVocabulary","value":["Arts and Humanities"]},{"typeName":"keyword","multiple":true,"typeClass":"compound","value":[{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"causal subordinators"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"\"because\""}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"\"since\""}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"contextual features"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"argumentative essays"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"syntax"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"English"}}]},{"typeName":"publication","multiple":true,"typeClass":"compound","value":[{"publicationCitation":{"typeName":"publicationCitation","multiple":false,"typeClass":"primitive","value":"Xu, Jiajin, and Hui Kang. ‘Salience-Simplification Strategy for Markedness of Causal Subordinators: “Because” and “since” in Argumentative Essays’. Lingua, vol. 272, June 2022, p. 103256. 
ScienceDirect, https://doi.org/10.1016/j.lingua.2022.103256."},"publicationIDType":{"typeName":"publicationIDType","multiple":false,"typeClass":"controlledVocabulary","value":"doi"},"publicationIDNumber":{"typeName":"publicationIDNumber","multiple":false,"typeClass":"primitive","value":"10.1016/j.lingua.2022.103256"},"publicationURL":{"typeName":"publicationURL","multiple":false,"typeClass":"primitive","value":"https://doi.org/10.1016/j.lingua.2022.103256"}}]},{"typeName":"language","multiple":true,"typeClass":"controlledVocabulary","value":["English"]},{"typeName":"producer","multiple":true,"typeClass":"compound","value":[{"producerName":{"typeName":"producerName","multiple":false,"typeClass":"primitive","value":"Dalian University of Foreign Languages"},"producerAbbreviation":{"typeName":"producerAbbreviation","multiple":false,"typeClass":"primitive","value":"DLUFL"},"producerURL":{"typeName":"producerURL","multiple":false,"typeClass":"primitive","value":"https://www.dlufl.edu.cn/en/"}}]},{"typeName":"productionDate","multiple":false,"typeClass":"primitive","value":"2021-08-05"},{"typeName":"productionPlace","multiple":true,"typeClass":"primitive","value":["Dalian"]},{"typeName":"contributor","multiple":true,"typeClass":"compound","value":[{"contributorType":{"typeName":"contributorType","multiple":false,"typeClass":"controlledVocabulary","value":"Hosting Institution"},"contributorName":{"typeName":"contributorName","multiple":false,"typeClass":"primitive","value":"Software School/Intelligence Language Research Center"}},{"contributorType":{"typeName":"contributorType","multiple":false,"typeClass":"controlledVocabulary","value":"Project Member"},"contributorName":{"typeName":"contributorName","multiple":false,"typeClass":"primitive","value":"Wang, Luojia"}},{"contributorType":{"typeName":"contributorType","multiple":false,"typeClass":"controlledVocabulary","value":"Project 
Member"},"contributorName":{"typeName":"contributorName","multiple":false,"typeClass":"primitive","value":"Zhang, Yaxin"}},{"contributorType":{"typeName":"contributorType","multiple":false,"typeClass":"controlledVocabulary","value":"Project Member"},"contributorName":{"typeName":"contributorName","multiple":false,"typeClass":"primitive","value":"Zhang, Xiaobo"}}]},{"typeName":"grantNumber","multiple":true,"typeClass":"compound","value":[{"grantNumberAgency":{"typeName":"grantNumberAgency","multiple":false,"typeClass":"primitive","value":"Liaoning Social Science Foundation"},"grantNumberValue":{"typeName":"grantNumberValue","multiple":false,"typeClass":"primitive","value":"L20BYY016"}},{"grantNumberAgency":{"typeName":"grantNumberAgency","multiple":false,"typeClass":"primitive","value":"National Social Science Fund of China (NSSFC)"},"grantNumberValue":{"typeName":"grantNumberValue","multiple":false,"typeClass":"primitive","value":"19ZDA319"}}]},{"typeName":"distributor","multiple":true,"typeClass":"compound","value":[{"distributorName":{"typeName":"distributorName","multiple":false,"typeClass":"primitive","value":"The Tromsø Repository of Language and Linguistics (TROLLing)"},"distributorAbbreviation":{"typeName":"distributorAbbreviation","multiple":false,"typeClass":"primitive","value":"TROLLing"},"distributorURL":{"typeName":"distributorURL","multiple":false,"typeClass":"primitive","value":"https://trolling.uit.no/"}}]},{"typeName":"depositor","multiple":false,"typeClass":"primitive","value":"Kang, 
Hui"},{"typeName":"dateOfDeposit","multiple":false,"typeClass":"primitive","value":"2021-08-05"},{"typeName":"timePeriodCovered","multiple":true,"typeClass":"compound","value":[{"timePeriodCoveredStart":{"typeName":"timePeriodCoveredStart","multiple":false,"typeClass":"primitive","value":"1995"},"timePeriodCoveredEnd":{"typeName":"timePeriodCoveredEnd","multiple":false,"typeClass":"primitive","value":"2007"}}]},{"typeName":"dateOfCollection","multiple":true,"typeClass":"compound","value":[{"dateOfCollectionStart":{"typeName":"dateOfCollectionStart","multiple":false,"typeClass":"primitive","value":"2019-12-01"},"dateOfCollectionEnd":{"typeName":"dateOfCollectionEnd","multiple":false,"typeClass":"primitive","value":"2021-05-01"}}]},{"typeName":"kindOfData","multiple":true,"typeClass":"primitive","value":["corpus data"]},{"typeName":"software","multiple":true,"typeClass":"compound","value":[{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"AntConc"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"3.5.8"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"R Language"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"3.6.2"}},{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"RStudio Team"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"1.1.456"}}]},{"typeName":"otherReferences","multiple":true,"typeClass":"primitive","value":["Quirk, Randolph, Sidney Greenbaum, Geoffrey Leech, and Jan Svartvik. 1985. A comprehensive grammar of the English language. London: Longman."]},{"typeName":"dataSources","multiple":true,"typeClass":"primitive","value":["NESSIE Corpus. See: Xu, Jiajin. (2012). NESSIE Corpus 1st release (NESSIEv1): Native English Speakers' Similarly or Identically-prompted Essays 1st release. 
Beijing: National Research Centre for Foreign Language Education, Beijing Foreign Studies University. Available at http://corpus.bfsu.edu.cn/info/1070/1335.htm.","LOCNESS (the Louvain Corpus of Native English Essays). See: Granger, S. (1998). The computer learner corpus: A versatile new source of data for SLA research. In Granger, S. (ed.) Learner English on Computer. Addison Wesley Longman : London & New York, 3-18. Available at https://www.learnercorpusassociation.org/resources/tools/locness-corpus/.","BAWE (British Academic Written English). See: Nesi, Hilary; Gardner, Sheena; Thompson, Paul; et al., 2008, British Academic Written English Corpus, Oxford Text Archive, http://hdl.handle.net/20.500.12024/2539."]}]},"geospatial":{"displayName":"Geospatial Metadata","name":"geospatial","fields":[{"typeName":"geographicCoverage","multiple":true,"typeClass":"compound","value":[{"country":{"typeName":"country","multiple":false,"typeClass":"controlledVocabulary","value":"United States"}},{"country":{"typeName":"country","multiple":false,"typeClass":"controlledVocabulary","value":"United Kingdom"}}]}]},"socialscience":{"displayName":"Social Science and Humanities Metadata","name":"socialscience","fields":[{"typeName":"collectorTraining","multiple":false,"typeClass":"primitive","value":"Annotators on the variable \"clausal relationship\" were trained with definitions and examples of the two levelled clausal relationship system examined in our research."},{"typeName":"collectionMode","multiple":true,"typeClass":"primitive","value":["corpus 
retrieval"]}]}},"files":[{"label":"00_readme_causal-subordinators.txt","restricted":false,"version":1,"datasetVersionId":3507,"dataFile":{"id":104628,"persistentId":"doi:10.18710/RULYMP/LNHNTU","pidURL":"https://doi.org/10.18710/RULYMP/LNHNTU","filename":"00_readme_causal-subordinators.txt","contentType":"text/plain","filesize":12006,"storageIdentifier":"S3://2002-yellow-dataverseno:17b3757e4ec-caeab368a2ec","rootDataFileId":-1,"md5":"dafb1ac2afbe8f881afc4950cc7cbb34","checksum":{"type":"MD5","value":"dafb1ac2afbe8f881afc4950cc7cbb34"},"creationDate":"2021-08-12"}},{"label":"corpus_processing.R","restricted":false,"version":1,"datasetVersionId":3507,"dataFile":{"id":104632,"persistentId":"doi:10.18710/RULYMP/8LS73Q","pidURL":"https://doi.org/10.18710/RULYMP/8LS73Q","filename":"corpus_processing.R","contentType":"type/x-r-syntax","filesize":2810,"storageIdentifier":"S3://2002-yellow-dataverseno:17b37624e34-e8cfa700896f","rootDataFileId":-1,"md5":"e17d2f4f622ffe3c1abe1a94445ff044","checksum":{"type":"MD5","value":"e17d2f4f622ffe3c1abe1a94445ff044"},"creationDate":"2021-08-12"}},{"label":"exact.matches.2.r","restricted":false,"version":1,"datasetVersionId":3507,"dataFile":{"id":104631,"persistentId":"doi:10.18710/RULYMP/NCLF5C","pidURL":"https://doi.org/10.18710/RULYMP/NCLF5C","filename":"exact.matches.2.r","contentType":"type/x-r-syntax","filesize":10052,"storageIdentifier":"S3://2002-yellow-dataverseno:17b37621e71-107bc62fe47e","rootDataFileId":-1,"md5":"e67f2ddd4621eeab66d9e8d5cc2079d0","checksum":{"type":"MD5","value":"e67f2ddd4621eeab66d9e8d5cc2079d0"},"creationDate":"2021-08-12"}},{"label":"pub-causalsubordinator.csv","restricted":false,"version":1,"datasetVersionId":3507,"dataFile":{"id":104629,"persistentId":"doi:10.18710/RULYMP/KAHJOG","pidURL":"https://doi.org/10.18710/RULYMP/KAHJOG","filename":"pub-causalsubordinator.csv","contentType":"text/csv","filesize":73198,"storageIdentifier":"S3://2002-yellow-dataverseno:17b37594ef6-42612298ba2f","rootDataFileId":-1,
"md5":"cec02bd4b59c6f8403945004d35b8e0b","checksum":{"type":"MD5","value":"cec02bd4b59c6f8403945004d35b8e0b"},"creationDate":"2021-08-12"}},{"label":"word_sentence_count.R","restricted":false,"version":1,"datasetVersionId":3507,"dataFile":{"id":104630,"persistentId":"doi:10.18710/RULYMP/XF7AB6","pidURL":"https://doi.org/10.18710/RULYMP/XF7AB6","filename":"word_sentence_count.R","contentType":"type/x-r-syntax","filesize":1178,"storageIdentifier":"S3://2002-yellow-dataverseno:17b37620104-bbb0db8b6384","rootDataFileId":-1,"md5":"3e3a15d14db523b99718ebfeb5b660c7","checksum":{"type":"MD5","value":"3e3a15d14db523b99718ebfeb5b660c7"},"creationDate":"2021-08-12"}}],"citation":"Kang, Hui; Xu, Jiajin, 2022, \"Replication data for: Salience-simplification strategy for markedness of causal subordinators: “because” and “since” in argumentative essays\", https://doi.org/10.18710/RULYMP, DataverseNO, V1"}}