Persistent Identifier
|
doi:10.18710/7LGSXY |
Publication Date
|
2025-06-18 |
Title
| Supporting Data for: Enhancing code-switching research through comparable corpora: Introducing the El Paso Bilingual Corpus |
Author
| Vanhaverbeke, Margot (Ghent University) - ORCID: 0000-0001-9893-4060
Enghels, Renata (Ghent University) - ORCID: 0000-0002-7785-0009
Parafita Couto, M. Carmen (Leiden University) - ORCID: 0000-0001-7306-3393
Ivanova, Iva (University of Texas at El Paso) - ORCID: 0000-0002-9039-9670 |
Point of Contact
|
Vanhaverbeke, Margot (Ghent University) |
Description
| Dataset description:
This dataset contains two data files on which the related publication is based. The data file Dataset_Diminutives contains a total of 1886 diminutive constructions extracted from the Bangor Miami Corpus and the El Paso Bilingual Corpus. These constructions are coded for intralinguistic variables relating to the linguistic properties of both the base and the diminutive marker. The data file Metadata_Conversations_El_Paso_Bilingual_Corpus contains metadata about the conversations in the El Paso Bilingual Corpus. (2025-06-04)
Article Abstract:
Research on language contact outcomes, such as code-switching, continues to face theoretical and methodological challenges, particularly due to the difficulty of comparing findings across studies that use divergent data collection methods (Parafita Couto et al., 2021; Toribio, 2017). Accordingly, scholars have emphasized the need for publicly available and comparable bilingual corpora (Deuchar, 2020; Gullberg et al., 2009; Munarriz & Parafita Couto, 2014). This paper introduces the El Paso Bilingual Corpus, a new Spanish-English bilingual corpus recorded in El Paso (TX) in 2022, designed to be methodologically comparable to the Bangor Miami Corpus (Deuchar et al., 2014). The paper is structured in three main sections. First, we review existing Spanish-English corpora and examine the theoretical challenges posed by studies using non-comparable methodologies (Parafita Couto et al., 2021; Toribio, 2017), thereby underscoring the gap addressed by the El Paso Bilingual Corpus. Second, we outline the corpus creation process, discussing participant recruitment, data collection, and transcription, and provide an overview of these data, including participants’ sociolinguistic profiles. Third, to demonstrate the practical value of methodologically aligned corpora, we report a comparative case study on diminutive expressions in the El Paso and Bangor Miami corpora, illustrating how shared collection protocols can elucidate the role of community-specific social factors on bilinguals’ morphosyntactic choices. (2025-06-04) |
Subject
| Arts and Humanities |
Keyword
| bilingualism
code-switching
Spanish-English language contact
bilingual corpora
El Paso Bilingual Corpus
Bangor Miami Corpus
diminutive construction |
Related Publication
| Vanhaverbeke, Margot, et al., "Enhancing code-switching research through comparable corpora: Introducing the El Paso Bilingual Corpus" [forthcoming] |
Language
| English |
Producer
| Ghent University https://www.ugent.be/en |
Funding Information
| Research Foundation – Flanders: 1186523N |
Distributor
| The Tromsø Repository of Language and Linguistics (TROLLing) https://trolling.uit.no/ |
Depositor
| Vanhaverbeke, Margot |
Deposit Date
| 2025-06-04 |
Time Period
| Start Date: 2022-04-01 ; End Date: 2024-12-31 |
Date of Collection
| Start Date: 2022-04-01 ; End Date: 2024-12-31 |
Data Type
| corpus data; survey data |
Software
| Excel |
Data Source
| The data in this dataset were retrieved from the following corpora:
- Deuchar, M., Carter, D., Davies, P., Donnelly, K., Parafita Couto, M. C., Stammers, J., Aveledo González, F., Fusser, M., Jones, L., Lloyd-Williams, S., Prys, M., & Robert, E. (2014). Bangor Miami Corpus [Conversational corpus]. Bangor: ESRC Centre for Research on Bilingualism in Theory & Practice.
- Vanhaverbeke, M., Dominguez, A., Ivanova, I., Parafita Couto, M. C., & Enghels, R. (2022). El Paso Bilingual Corpus [Conversational corpus]. Ghent: Ghent University.
The Bangor Miami Corpus is accessible online (https://bangortalk.org.uk/speakers.php?c=miami). The El Paso Bilingual Corpus was created by the authors and is not yet publicly available.
The extracted text fragments contained in the data file Dataset_Diminutives represent only non-substantial portions of the sources listed above and do not constitute coherent larger texts. Therefore, the reuse (including redistribution) of these excerpts is permitted by the exception rules in IPR and database protection regulations, such as fair use (USA, cf. US Copyright Act), the EU Database Directive (cf. Article 8, Rights and obligations of lawful users), and the Norwegian Copyright Act (cf. § 24 Eneretten til databaser). |