Actually in contemporary British speech: Data from the Spoken BNC corpora (doi:10.18710/A3SATC)

Document Description

Citation

Title:

Actually in contemporary British speech: Data from the Spoken BNC corpora

Identification Number:

doi:10.18710/A3SATC

Distributor:

DataverseNO

Date of Distribution:

2021-01-30

Version:

1

Bibliographic Citation:

Sönning, Lukas; Krug, Manfred, 2021, "Actually in contemporary British speech: Data from the Spoken BNC corpora", https://doi.org/10.18710/A3SATC, DataverseNO, V1, UNF:6:S8E8T69mmiEVrGGC0XPIrg== [fileUNF]

Study Description

Citation

Title:

Actually in contemporary British speech: Data from the Spoken BNC corpora

Identification Number:

doi:10.18710/A3SATC

Authoring Entity:

Sönning, Lukas (University of Bamberg)

Krug, Manfred (University of Bamberg)

Other identifications and acknowledgements:

Stich, Felicia

Producer:

University of Bamberg

Software used in Production:

CQPweb; rcqp (R package)

Distributor:

DataverseNO; The Tromsø Repository of Language and Linguistics (TROLLing)

Access Authority:

Sönning, Lukas

Depositor:

Sönning, Lukas

Date of Deposit:

2021-01-15

Holdings Information:

https://doi.org/10.18710/A3SATC

Study Scope

Keywords:

Arts and Humanities, actually, grammaticalization, pragmaticalization, BNC, corpus data, spoken, frequency, position, discourse marker, English, British English

Abstract:

This dataset contains tabular files with information about the usage of "actually" in contemporary British speech. We draw on two spoken corpora: (i) the demographically sampled part of the Spoken BNC1994 (Crowdy 1995) and (ii) the Spoken BNC2014 (Love et al. 2017). For both corpora, we list the usage rate observed for each speaker (total number of words produced, number of actually tokens, and normalized frequency of actually per million words), along with information about the sex and age of the informant. In total, the dataset includes n = 1,408 speakers (Spoken BNC1994DS) and n = 668 speakers (Spoken BNC2014). For each corpus, we offer data tables with additional speaker metadata. For a subset of the Spoken BNC1994DS (speakers with available information on gender and age; n = 886 speakers; n = 2,688 tokens), we also report the position of actually in the clause (initial, medial, final), which was annotated manually.

Related publication: Sönning, Lukas & Manfred Krug. 2022. Comparing study designs and down-sampling strategies in corpus analysis: The importance of speaker metadata in the BNCs of 1994 and 2014. In Ole Schützler & Julia Schlüter (eds.), Data and methods in corpus linguistics: Comparative approaches, 127-159. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781108589314.006
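The normalized frequency mentioned above is a per-speaker rate. A minimal Python sketch, assuming the rate is the number of actually tokens divided by the speaker's word count and scaled to one million words; the function name `per_million` is hypothetical and not part of the dataset or its retrieval scripts. Speakers with a word count of 0 (the minimum of u_n_words in actually_data_1994.tab is 0) get no rate rather than a division error.

```python
from typing import Optional


def per_million(count: int, words: int) -> Optional[float]:
    """Normalized frequency of a word per million words of running text.

    Assumption: `count` is the number of "actually" tokens a speaker produced
    and `words` is that speaker's total word count.
    """
    if words <= 0:
        # No words recorded for this speaker: the rate is undefined.
        return None
    return count * 1_000_000 / words


# Example: 3 tokens in 1,500 words correspond to 2,000 per million words.
print(per_million(3, 1500))  # 2000.0
```

Rates based on small word counts are unstable: a single token in 200 words already yields 5,000 per million words, which is worth keeping in mind when comparing individual speakers.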

Time Period:

1991-1993; 2012-2016

Date of Collection:

1991-1993; 2012-2016

Country:

United Kingdom

Kind of Data:

corpus data; observational data

Methodology and Processing

Sources Statement

Data Sources:

[BNC1994]: The British National Corpus, version 3 (BNC XML Edition). 2007. Distributed by Bodleian Libraries, University of Oxford, on behalf of the BNC Consortium. URL: http://www.natcorp.ox.ac.uk/

[Spoken BNC2014]: Love, Robbie, Claire Dembry, Andrew Hardie, Václav Brezina & Tony McEnery. 2017. The Spoken BNC2014: Designing and building a spoken corpus of everyday conversations. International Journal of Corpus Linguistics, 22(3), 319–344.

Data Access

Restrictions:

In the file "actually_data_1994_position.csv", the contents of the columns "left_context" and "right_context" are extracts from the British National Corpus (XML edition; http://www.natcorp.ox.ac.uk/). The BNC User Licence (cf. http://www.natcorp.ox.ac.uk/docs/licence.html) states that the use of such extracts is licensed only under the fair dealing provision of UK Copyright Law (cf. https://www.gov.uk/guidance/exceptions-to-copyright#fair-dealing).

According to UK Copyright Law (cf. https://www.gov.uk/guidance/exceptions-to-copyright#fair-dealing), “[f]actors that have been identified by the courts as relevant in determining whether a particular dealing with a work is fair include:

  • does using the work affect the market for the original work? If a use of a work acts as a substitute for it, causing the owner to lose revenue, then it is not likely to be fair
  • is the amount of the work taken reasonable and appropriate? Was it necessary to use the amount that was taken? Usually only part of a work may be used”

The extracts used in the present dataset may be said to represent fair dealing according to both of these factors:

  • The extracted material does not affect the market for the original work, as it is unlikely that any researcher would refrain from using the BNC because the extracted material is available in the present dataset.
  • The amount of extracted material is reasonable and appropriate, as it was necessary to carry out the study and is necessary to replicate it. Also, the extracted material consists of no more than brief citations (single sentences, for example) from the corpus, and thus does not infringe the copyright of the original IPR holders.

Other Study Description Materials

Related Publications

Citation

Title:

Comparing study designs and down-sampling strategies in corpus analysis: The importance of speaker metadata in the BNCs of 1994 and 2014

Identification Number:

10.1017/9781108589314.006

Bibliographic Citation:

Sönning, Lukas & Manfred Krug. 2022. Comparing study designs and down-sampling strategies in corpus analysis: The importance of speaker metadata in the BNCs of 1994 and 2014. In Ole Schützler & Julia Schlüter (eds.), Data and methods in corpus linguistics: Comparative approaches, 127-159. Cambridge: Cambridge University Press. doi:10.1017/9781108589314.006

File Description--f87508

File: actually_data_1994.tab

  • Number of cases: 1408

  • No. of variables per record: 8

  • Type of File: text/tab-separated-values

Notes:

UNF:6:JkxhHT/73pTRKCUE/NVqgw==

File Description--f87522

File: actually_data_1994_position.tab

  • Number of cases: 2688

  • No. of variables per record: 12

  • Type of File: text/tab-separated-values

Notes:

UNF:6:8XO1y870BWH4DwgOu9UCGA==

File Description--f87511

File: actually_data_2014.tab

  • Number of cases: 668

  • No. of variables per record: 7

  • Type of File: text/tab-separated-values

Notes:

UNF:6:NGpek+oF5WbD+YYyOZLk1A==
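The three data files are tab-separated text with one record per row. A minimal Python sketch for reading one of them with the standard `csv` module, assuming a header row with the variable names listed below and double-quoted character values (the usual layout of Dataverse .tab exports, not confirmed by this codebook). Treating empty cells and "NA" as missing is also an assumption, and the helper names `to_number` and `load_tab` are hypothetical.

```python
from __future__ import annotations

import csv
from typing import Iterable, Optional

# Numeric variables of actually_data_1994.tab, as listed in the variable description.
NUMERIC_1994 = {"u_n_words", "u_n_tokens", "count"}


def to_number(value: Optional[str]) -> Optional[float]:
    """Convert a cell to float; empty or "NA" cells are treated as missing (assumption)."""
    if value is None or value.strip() in ("", "NA"):
        return None
    return float(value)


def load_tab(lines: Iterable[str], numeric: set[str]) -> list[dict]:
    """Parse tab-separated lines (header row first) into one dict per record."""
    rows = []
    for row in csv.DictReader(lines, delimiter="\t"):
        for col in numeric:
            row[col] = to_number(row.get(col))
        rows.append(row)
    return rows


# Hypothetical usage with the downloaded file:
# with open("actually_data_1994.tab", newline="", encoding="utf-8") as f:
#     speakers = load_tab(f, NUMERIC_1994)
```

Character variables such as u_who or u_age are left as strings; only the columns named in the `numeric` set are converted.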

Variable Description

List of Variables:

Variables

u_who

File: f87508 (actually_data_1994.tab)

Variable Format: character

Notes: UNF:6:T0zN1LrUz6CSyfdAlRR9aA==

u_age_group

File: f87508 (actually_data_1994.tab)

Variable Format: character

Notes: UNF:6:0lRZoHXbpbpSczk7K7jcYQ==

u_sex

File: f87508 (actually_data_1994.tab)

Variable Format: character

Notes: UNF:6:opWtOQ/xN2tKHWKllz/cMw==

u_age

File: f87508 (actually_data_1994.tab)

Variable Format: character

Notes: UNF:6:rKx/enKuSyNHKcc16qHLQA==

u_n_words

File: f87508 (actually_data_1994.tab)

Summary Statistics: Valid 1408.0; Min. 0.0; Max. 68676.0; Mean 2987.1875; StDev 6207.546547408033

Variable Format: numeric

Notes: UNF:6:7VYiH50XZCO25IyVSVGgYA==

u_n_tokens

File: f87508 (actually_data_1994.tab)

Summary Statistics: Valid 1408.0; Min. 1.0; Max. 80886.0; Mean 3561.544744318186; StDev 7303.660383484829

Variable Format: numeric

Notes: UNF:6:vdjsZndGzw3SfMXFdSzDLA==

age_bins

File: f87508 (actually_data_1994.tab)

Variable Format: character

Notes: UNF:6:a4O0CeqB5J6cbEt2VfDLfg==

count

File: f87508 (actually_data_1994.tab)

Summary Statistics: Valid 1408.0; Min. 0.0; Max. 74.0; Mean 2.3508522727272685; StDev 6.052757803448164

Variable Format: numeric

Notes: UNF:6:5e33DjWx5vBnWTMzDhMpwQ==

text_id

File: f87522 (actually_data_1994_position.tab)

Variable Format: character

Notes: UNF:6:nyG9eVn0/UcZxqY2f+caBQ==

u_who

File: f87522 (actually_data_1994_position.tab)

Variable Format: character

Notes: UNF:6:rny9dMt3tyv7OaOZCCF8lA==

u_age_group

File: f87522 (actually_data_1994_position.tab)

Variable Format: character

Notes: UNF:6:aVlqN5c9P3dU4uM4VvcEtg==

u_sex

File: f87522 (actually_data_1994_position.tab)

Variable Format: character

Notes: UNF:6:OJapwxDR/FYKqI249TSVJA==

u_age

File: f87522 (actually_data_1994_position.tab)

Variable Format: character

Notes: UNF:6:Rt2jvCJoNX8FOYkhdZmKsg==

u_n_words

File: f87522 (actually_data_1994_position.tab)

Summary Statistics: Valid 2688.0; Min. 132.0; Max. 68676.0; Mean 16253.7046130953; StDev 14755.572488870326

Variable Format: numeric

Notes: UNF:6:DaY6eJrWzrsDsPgjb2pxwg==

u_n_tokens

File: f87522 (actually_data_1994_position.tab)

Summary Statistics: Valid 2688.0; Min. 161.0; Max. 80886.0; Mean 19156.15141369055; StDev 17214.630635794252

Variable Format: numeric

Notes: UNF:6:Fus9SYB8ZLlCzCyYUhFXfw==

left_context

File: f87522 (actually_data_1994_position.tab)

Variable Format: character

Notes: UNF:6:zUc6CW7463b7r4w69tfNAQ==

query_item

File: f87522 (actually_data_1994_position.tab)

Variable Format: character

Notes: UNF:6:I4Nm0khkAXay1o14cvyLdA==

right_context

File: f87522 (actually_data_1994_position.tab)

Variable Format: character

Notes: UNF:6:iswXgMFN0jzelxuhCpdsPA==

position

File: f87522 (actually_data_1994_position.tab)

Variable Format: character

Notes: UNF:6:jt5Q/NdO6+KCvPd/YRxJQQ==

comment

File: f87522 (actually_data_1994_position.tab)

Variable Format: character

Notes: UNF:6:1pR+pzzsJfG72z9Cz3ZS8Q==
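In actually_data_1994_position.tab each record is one token of actually, so the position annotation can be summarized by counting records. A minimal Python sketch, assuming records are dicts keyed by the variable names above (for instance as produced by `csv.DictReader`) and that position holds labels such as "initial", "medial" and "final"; the exact spelling of the labels is not documented here. The function `position_shares` is hypothetical.

```python
from collections import Counter, defaultdict


def position_shares(rows, group_col="u_sex"):
    """Share of each position label within each group (e.g. per value of u_sex)."""
    counts = defaultdict(Counter)
    for row in rows:
        # One record = one token of "actually" with its annotated position.
        counts[row[group_col]][row["position"]] += 1
    shares = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        shares[group] = {label: n / total for label, n in counter.items()}
    return shares
```

These shares pool tokens across speakers, so frequent users of actually weigh more heavily in a group's proportions than occasional users. Because the file repeats u_who on every token row, the same counting can be done per speaker by passing `group_col="u_who"`.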

speaker

File: f87511 (actually_data_2014.tab)

Variable Format: character

Notes: UNF:6:bsdLxS1Xr2f9f3Tq1124oA==

count

File: f87511 (actually_data_2014.tab)

Summary Statistics: Valid 668.0; Min. 0.0; Max. 600.0; Mean 26.149700598802404; StDev 62.22801123464477

Variable Format: numeric

Notes: UNF:6:IonPlR0+orF+hyWtOPVtPw==
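Because count is the number of actually tokens per speaker, multiplying its reported mean by the number of valid cases recovers the total number of tokens in each file. A short Python check using only the summary statistics of count given in this codebook; rounding to the nearest integer absorbs floating-point noise in the reported means.

```python
# Summary statistics of `count` as reported in this codebook: (Mean, Valid).
summary = {
    "actually_data_1994.tab": (2.3508522727272685, 1408),
    "actually_data_2014.tab": (26.149700598802404, 668),
}

for file_name, (mean_count, n_valid) in summary.items():
    # Sum of a variable = mean * number of valid cases.
    total_tokens = round(mean_count * n_valid)
    print(f"{file_name}: {total_tokens} tokens of actually")
# actually_data_1994.tab: 3310 tokens of actually
# actually_data_2014.tab: 17468 tokens of actually
```

The 1994 total (3,310) exceeds the 2,688 tokens in actually_data_1994_position.tab, which is consistent with the position file covering only the subset of speakers with available gender and age information.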

Exact_age

File: f87511 (actually_data_2014.tab)

Summary Statistics: Valid 528.0; Min. 2.0; Max. 91.0; Mean 39.8939393939394; StDev 20.159236143987037

Variable Format: numeric

Notes: UNF:6:y/6FPXadf59goUvJsAgzzw==

Age_range

File: f87511 (actually_data_2014.tab)

Variable Format: character

Notes: UNF:6:OmmHQeU74r9/WDiaXyr1FQ==

Gender

File: f87511 (actually_data_2014.tab)

Variable Format: character

Notes: UNF:6:6h4jsXwf3Khl2uipbNQ42Q==

total

File: f87511 (actually_data_2014.tab)

Summary Statistics: Valid 668.0; Min. 19.0; Max. 362107.0; Mean 17010.190119760486; StDev 34445.3470944488

Variable Format: numeric

Notes: UNF:6:3SlGNGgkBuiC7/SOAkXY9A==

age_bins

File: f87511 (actually_data_2014.tab)

Variable Format: character

Notes: UNF:6:8yqm/89HVQgMa3nFeMgqEA==
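For the 2014 file, group-level rates can be summarized in two ways: by pooling all tokens and words within a group, or by averaging the rates of individual speakers. A minimal Python sketch of both, assuming that total is the speaker's word count (this codebook does not define it; its range of 19 to 362,107 is consistent with word counts) and that rows are dicts with numeric count and total values. The function `group_rates` is hypothetical, and the contrast between the two summaries is shown for illustration only, not as the method of the related publication.

```python
from collections import defaultdict
from statistics import mean


def group_rates(rows, group_col="Gender", count_col="count", words_col="total"):
    """Pooled and speaker-averaged rates per million words for each group.

    Assumption: `words_col` holds each speaker's word count.
    """
    tokens = defaultdict(float)
    words = defaultdict(float)
    speaker_rates = defaultdict(list)
    for row in rows:
        n_words = row[words_col]
        if not n_words:
            continue  # skip speakers with a missing or zero word count
        group = row[group_col]
        tokens[group] += row[count_col]
        words[group] += n_words
        speaker_rates[group].append(row[count_col] * 1_000_000 / n_words)
    return {
        group: {
            "pooled_pmw": tokens[group] * 1_000_000 / words[group],
            "mean_speaker_pmw": mean(speaker_rates[group]),
            "n_speakers": len(speaker_rates[group]),
        }
        for group in speaker_rates
    }
```

The two figures diverge when speakers with large word counts use actually at unusual rates: the pooled rate is dominated by those speakers, whereas the speaker mean gives each informant equal weight.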

Other Study-Related Materials

  • 00_ReadMe_actually.txt (text/plain)
  • data_retrieval_1994.html (text/html)
  • data_retrieval_1994.Rmd (application/octet-stream)
  • data_retrieval_actually_2014.html (text/html)
  • data_retrieval_actually_2014.Rmd (application/octet-stream)
  • data_retrieval_speaker_biodata_2014.html (text/html)
  • data_retrieval_speaker_biodata_2014.Rmd (application/octet-stream)