Replication Data for: A multivariate account of particle alternation after bare-form try in native varieties of English
Version 1.0
Tizón-Couto, David, 2022, "Replication Data for: A multivariate account of particle alternation after bare-form try in native varieties of English", https://doi.org/10.18710/GVUZWI, DataverseNO, V1
Dataset Terms

This dataset, "Replication Data for: A multivariate account of particle alternation after bare-form try in native varieties of English" (henceforth: "Dataset"), may be reused according to the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license as described here: https://creativecommons.org/licenses/by-nc/4.0.

This Dataset contains data from the following sources:

BNC: The British National Corpus. Examples of usage taken from the British National Corpus were obtained under the terms of the BNC End User Licence (see http://www.natcorp.ox.ac.uk/docs/licence.html or the file "BNC_End_User_Licence.pdf" included in this Dataset). Copyright in the individual texts cited resides with the original IPR holders. For information and licensing conditions relating to the BNC, please see the Terms tab on the landing page of the Dataset, and the BNC web site at http://www.natcorp.ox.ac.uk/.

Section "2 Terms of the Licence Granted to the Licensee" of the BNC End User Licence states, among other things, that "(f) [t]here is no restriction on the use of the Licensee's Results except that the Licensee may not publish in print or electronic form or exploit commercially in any form whatsoever any extracts from the BNC Processed Material other than those permitted under the fair dealings provision of copyright law."

In this Dataset, the data file "bnc.csv" contains the following information:

  • the keywords which the BNC was searched for, and for each token
  • annotations/values for six variables. This information has been provided by the author of this Dataset.

This means that the file does not contain any coherent (parts of) utterances which the keywords were found in as all context was removed from the data file. Therefore, publishing this data file is considered to be permitted under the fair dealings provision of copyright law; see details in section "Fair dealing" below.

COCA: Corpus of Contemporary American English. COCA does not provide an (openly accessible) end user license agreement. However, on their webpage (cf. https://www.english-corpora.org/copyright.asp; see also the file "COCA_Note_on_Copyright.pdf" included in this Dataset), they mention that the use of their source texts is "strictly for academic research, and is purely non-commercial". This may be interpreted to mean that the reuse of text from COCA is likewise allowed for non-commercial purposes only. On the same webpage, COCA also provides evidence that their use and dissemination of the text sources is within the bounds of US Fair Use Law.

In this Dataset, the data file "coca.csv" contains the following information:

  • the keywords which the COCA was searched for, and for each token
  • annotations/values for six variables. This information has been provided by the author of this Dataset.

This means that the file does not contain any coherent (parts of) utterances which the keywords were found in as all context was removed from the data file. Therefore, publishing this data file is considered to be permitted under the fair dealings provision of copyright law; see details in section "Fair use" below.

GloWbE: Corpus of Global Web-Based English. GloWbE does not provide an (openly accessible) end user license agreement. However, on their webpage (cf. https://www.english-corpora.org/copyright.asp; see also the file "COCA_Note_on_Copyright.pdf" included in this Dataset), they mention that the use of their source texts is "strictly for academic research, and is purely non-commercial". This may be interpreted to mean that the reuse of text from GloWbE is likewise allowed for non-commercial purposes only. On the same webpage, GloWbE also provides evidence that their use and dissemination of the text sources is within the bounds of US Fair Use Law.

In this Dataset, the data file "glowbe.csv" contains the following information:

  • the keywords which the GloWbE was searched for, and for each token
  • annotations/values for six variables. This information has been provided by the author of this Dataset.

This means that the file does not contain any coherent (parts of) utterances which the keywords were found in as all context was removed from the data file. Therefore, publishing this data file is considered to be permitted under the fair dealings provision of copyright law; see details in section "Fair use" below.

ICE: The International Corpus of English, including the following components:

  • ICE-AUS: The Australian Component of ICE.
  • ICE-CAN: The Canadian Component of ICE.
  • ICE-GB: The British Component of ICE.
  • ICE-IRE: The Irish Component of ICE.
  • ICE-NZ: The New Zealand Component of ICE.

ICE-CAN and ICE-IRE were used under the general ICE License Agreement; see https://www.ice-corpora.uzh.ch/dam/jcr:7ae594b2-ee97-4935-8022-7d2d91b60be4/ICElicence_UZH.pdf or the file "ICE_License_Agreement.pdf" included in this Dataset.

ICE-GB was used under the ICE-GB License Agreement; see the file "ICE-GB_License_Agreement.pdf" included in this Dataset.

ICE-NZ was used under the ICE-NZ License Agreement; see the file "ICE-NZ_License_Agreement.pdf" included in this Dataset.

The ICE license agreements mentioned above include the following conditions (here cited according to the general ICE License Agreement):

  • “The Corpus must be used for non-profit academic research purposes only. […] The Licensee agrees not to reproduce or redistribute the Corpus or to use all or any part of the Corpus texts in any commercial product or service.”
  • “Publications based on the Corpus may include citations from texts only in a way which would be permitted under the fair dealings provision of copyright law.”
  • “If you publish a paper using any ICE corpus, please send a reference to ice@es.uzh.ch.”

In this Dataset, the data file "ice.csv" contains the following information:

  • the keywords which the ICE was searched for, and for each token
  • the context (usually one sentence) which the keyword appears in
  • annotations/frequency calculations/values for ten variables. This information has been provided by the author of this Dataset.
  • the corpus component where the keywords were found

This means that the file only contains very limited excerpts from the works that are the bases for the ICE components that were used. Therefore, publishing this data file is considered to be permitted under the fair dealings provision of copyright law; see details in section "Fair dealing" below.

While no explicit, separate license agreement for ICE-AUS exists, its use and the publication of data from ICE-AUS as represented in this Dataset correspond to the use and publication of the data extracted from the other ICE components, and thus are considered as qualifying as fair dealing.

Fair dealing:

According to UK Copyright Law (cf. https://www.gov.uk/guidance/exceptions-to-copyright#fair-dealing), “[f]actors that have been identified by the courts as relevant in determining whether a particular dealing with a work is fair include”:

  • “does using the work affect the market for the original work? If a use of a work acts as a substitute for it, causing the owner to lose revenue, then it is not likely to be fair”
  • “is the amount of the work taken reasonable and appropriate? Was it necessary to use the amount that was taken? Usually only part of a work may be used”

The corpus extracts used in this Dataset may be said to represent fair dealing according to both of these factors:

  • The extracted material does not affect the market for the original work, as it is unlikely that any researcher would refrain from using the corpora or the original works which the corpora are based on because of the availability of the extracted material contained in this Dataset.
  • The amount of the extracted work is reasonable and appropriate as it was necessary to carry out the study, and as it is necessary to replicate the study. Therefore, publishing the data files is not considered to infringe the copyright of the original IPR holders.

Fair use:

According to the US Copyright Act (cf. https://www.copyright.gov/fair-use/more-info.html), "Fair use is a legal doctrine that promotes freedom of expression by permitting the unlicensed use of copyright-protected works in certain circumstances". The Corpus of Contemporary American English (COCA; cf. https://www.english-corpora.org/copyright.asp; see also the file "COCA_Note_on_Copyright.pdf" included in this Dataset) provides an extended discussion of why they believe that their use of the texts in COCA is within the bounds of US Fair Use Law. These arguments may also be applied to other corpora that have been used in this Dataset. Below, the discussion by COCA is adapted to the data files included in this Dataset:

The following are the four criteria used to determine whether materials fall under the provisions of the Fair Use Law:

Criterion: The amount and substantiality of the portion taken

  • What favors Fair Use status: Small portions of the original text, rather than full-text access
  • The data files in this Dataset: Under no circumstances whatsoever do end users / reusers have access to entire texts (e.g. newspaper, magazine, or journal articles, or short stories). The vast majority of what users see are simply lists of words or phrases from different parts of the corpus and possibly frequency charts showing the frequency of these items. Access to small portions of the original text is more of an "afterthought", rather than the central feature of the text excerpts contained in the data files included in this Dataset. Access to actual portions of the original text is limited to short excerpts, in some cases only keywords. As a result, it would be difficult for end users to re-create even one paragraph from the original text, and it would be virtually impossible to re-create an entire page of text, much less the entire work.

Criterion: The purpose and character of the use

  • What favors Fair Use status: Academic, non-commercial
  • The data files in this Dataset: Given the license under which this Dataset is published, the use of any content of this Dataset is strictly for non-commercial purposes.

Criterion: The nature of the copyrighted work

  • What favors Fair Use status: Non-creative works
  • The data files in this Dataset: The source texts used in this Dataset include some creative works (e.g. short stories and small sections of novels), but the majority of these texts are transcripts of TV shows and articles from newspapers, magazines, and academic journals.

Criterion: The effect of the use upon the potential market

  • What favors Fair Use status: Little or no effect on the copyright holder
  • The data files in this Dataset: Because of the very limited access to entire works included in the corpora that have been used in this Dataset (see the first item above), it is extremely unlikely that anyone would use the data files included in this Dataset as a "substitute" for other access to the original texts. Other sources make these texts available as "complete works", which are meant to be read in their entirety. That is completely impossible using the data files included in this Dataset. The very limited access to the texts through these data files and the access offered by other sources serve two completely different audiences. The data files are intended for linguists and other researchers who want to see the frequency of the investigated linguistic phenomena, and they are completely inadequate for anyone who wishes to read the entire text of a work. As a result, there is very little or no "competition" between the data files as distributed in this Dataset and services that are provided by others. The distribution of the data files included in this Dataset therefore has virtually no market impact.