Replication Data for: Modeling monophthongal versus diphthongal /aɪ/ in sung vocal performance with interpretable machine learning

Version 1.1

De Timmerman, Romeo; Verbeke, Gil, 2026, "Replication Data for: Modeling monophthongal versus diphthongal /aɪ/ in sung vocal performance with interpretable machine learning", https://doi.org/10.18710/RDU8M2, DataverseNO, V1

Description	Dataset description: This dataset contains replication data for the study "Modeling monophthongal versus diphthongal /aɪ/ in sung vocal performance with interpretable machine learning" (De Timmerman & Verbeke, 2026). The study investigates how binary perceptual annotations of monophthongal versus diphthongal /aɪ/ relate to measurable acoustic variation in sung vocal performance. Using a corpus of studio-recorded blues music processed with vocal-instrumental source separation, F1 and F2 formant trajectories were extracted for 1,004 /aɪ/ tokens, each perceptually categorized by a trained annotator as either monophthongal or diphthongal. These data were used to train and interpret two machine-learning models predicting the perceptual labels: a gradient boosted decision tree (XGBoost) trained on a set of engineered acoustic features, and a multilayer perceptron (MLP) trained directly on the raw, time-aligned formant trajectories. (2026-06-02)

Article abstract: This study combines modern source-separation techniques with interpretable machine learning methods to investigate how binary perceptual annotations of diphthongal and monophthongal /aɪ/ relate to measurable acoustic variation in sung vocal performance. Using a corpus of studio-recorded music processed with vocal-instrumental source separation, we extract F1 and F2 trajectories for 1,004 /aɪ/ tokens, each perceptually categorized as either monophthongal or diphthongal. These trajectories were modeled using two approaches: (i) a gradient boosted decision tree trained on engineered acoustic features, and (ii) a multilayer perceptron trained directly on raw formant trajectories. Both models achieved high accuracy and AUC-ROC, indicating that perceptual labels can reliably be predicted from both raw and engineered acoustic input. Moreover, a SHAP-based feature importance analysis showed that features such as Delta F1/Delta F2, cubic spline coefficients and trajectory derivatives captured systematic differences between perceived monophthongal and diphthongal tokens, highlighting the value of dynamic representations over static distance-based measures. The results indicate that perceptual annotations of diphthongal and monophthongal /aɪ/ correspond with quantifiable acoustic information and demonstrate how explainable machine learning can help map gradient vowel dynamics onto binary perceptual categories. The study further emphasizes how recent advances in source separation make sung performance a valuable new domain for phonetic research. (2026-06-02)
Subject	Arts and Humanities
Keyword	/aɪ/ vowel, monophthongization, formant trajectories, machine learning, sociophonetics, sociolinguistics, acoustics
Related Publication	Is Supplement To: De Timmerman, R., & Verbeke, G. (2026). Modeling monophthongal versus diphthongal /aɪ/ in sung vocal performance with interpretable machine learning. The Journal of the Acoustical Society of America, 160(1), 537–549. https://doi.org/10.1121/10.0044380
License/Data Use Agreement	CC0 1.0
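
The abstract above names Delta F1/Delta F2 and trajectory derivatives among the informative dynamic features. The following is a minimal sketch of what such features could look like for a single 100-point trajectory. The function names, the 10-point averaging window, and the synthetic trajectories are all assumptions made for illustration; the exact feature definitions used in the study are documented in the article and the README, not here.

```python
from typing import Dict, List


def first_differences(track: List[float]) -> List[float]:
    """Discrete first derivative: change between consecutive timepoints."""
    return [b - a for a, b in zip(track, track[1:])]


def delta(track: List[float], edge: int = 10) -> float:
    """Offset mean minus onset mean, each averaged over `edge` timepoints.

    ASSUMPTION: the window size is illustrative, not the study's definition.
    """
    onset = sum(track[:edge]) / edge
    offset = sum(track[-edge:]) / edge
    return offset - onset


def summarize(f1: List[float], f2: List[float]) -> Dict[str, float]:
    """Collect a few dynamic descriptors for one vowel token."""
    d1 = first_differences(f1)
    d2 = first_differences(f2)
    return {
        "delta_f1": delta(f1),
        "delta_f2": delta(f2),
        "max_abs_slope_f1": max(abs(x) for x in d1),
        "max_abs_slope_f2": max(abs(x) for x in d2),
    }


# Synthetic 100-point trajectories in Hz, invented for illustration only.
flat_f1 = [700.0] * 100                             # no formant movement
flat_f2 = [1300.0] * 100
glide_f1 = [700.0 - 3.0 * i for i in range(100)]    # F1 falls over the vowel
glide_f2 = [1300.0 + 6.0 * i for i in range(100)]   # F2 rises over the vowel

mono = summarize(flat_f1, flat_f2)
diph = summarize(glide_f1, glide_f2)
```

In this toy case the flat trajectory yields zero deltas and slopes, while the gliding one yields a negative Delta F1 and a positive Delta F2, which is the kind of contrast a dynamic representation can expose.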

Files
00_README.txt (Plain Text, 15.1 KB; published Jun 3, 2026; MD5: 2d15149d3af5d14653033961ed323449). File documenting the dataset and the contents of each data file.
formant_features.csv (Comma Separated Values, 557.1 KB; published Jun 3, 2026; MD5: ebd7f2e6d1c474b35a25ee7f6dad7cf4). File containing engineered acoustic features for the 1,004 /aɪ/ tokens (1,004 rows, 39 columns). Each row is a single vowel token, identified by vowel_id, with its perceptual label, vowel duration, a set of trajectory-based acoustic features derived from the F1 and F2 formants, and two phonological context variables. This file was used to train the gradient boosted decision tree (XGBoost) model and for the SHAP-based feature importance analysis.
formant_measurements.csv (Comma Separated Values, 3.3 MB; published Jun 3, 2026; MD5: ae940a2782c26ef7da541babf31ef514). File containing raw, time-aligned F1 and F2 formant trajectories for the 1,004 /aɪ/ tokens (1,004 rows, 204 columns). Each row is a single vowel token, identified by vowel_id, with its perceptual label, vowel duration, and the F1 and F2 formant values interpolated to 100 equidistant timepoints across the vowel's duration. This file was used to train the multilayer perceptron (MLP) model and to generate the UMAP visualization.
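
The MD5 checksums and table dimensions listed above can be used to confirm that downloaded copies are intact. The sketch below uses only the Python standard library. It assumes the files are saved under their listed names in a single folder, that the CSV files are UTF-8 with exactly one header row, and that the stated row counts exclude that header; the helper names are hypothetical. A checksum mismatch may also simply mean that the file was downloaded in a converted format.

```python
import csv
import hashlib
from pathlib import Path

# MD5 checksums and table shapes as listed on the dataset page.
EXPECTED = {
    "00_README.txt": {"md5": "2d15149d3af5d14653033961ed323449"},
    "formant_features.csv": {
        "md5": "ebd7f2e6d1c474b35a25ee7f6dad7cf4", "rows": 1004, "cols": 39,
    },
    "formant_measurements.csv": {
        "md5": "ae940a2782c26ef7da541babf31ef514", "rows": 1004, "cols": 204,
    },
}


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def csv_shape(path: Path):
    """Return (data rows, columns). ASSUMPTION: one header row, UTF-8."""
    with path.open(newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        header = next(reader)
        n_rows = sum(1 for _ in reader)
    return n_rows, len(header)


def check_downloads(folder: Path) -> dict:
    """Map each expected file name to 'ok', 'missing', or a mismatch note."""
    report = {}
    for name, spec in EXPECTED.items():
        path = folder / name
        if not path.exists():
            report[name] = "missing"
        elif md5_of(path) != spec["md5"]:
            report[name] = "md5 mismatch"
        elif "rows" in spec and csv_shape(path) != (spec["rows"], spec["cols"]):
            report[name] = "unexpected shape"
        else:
            report[name] = "ok"
    return report
```

Calling `check_downloads(Path("."))` from the download folder returns one status per file, so a truncated or altered copy shows up before any model training is attempted.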

Citation Metadata

Persistent Identifier	doi:10.18710/RDU8M2
Publication Date	2026-06-03
Title	Replication Data for: Modeling monophthongal versus diphthongal /aɪ/ in sung vocal performance with interpretable machine learning
Author	De Timmerman, Romeo (https://ror.org/00cv9y106; https://orcid.org/0000-0003-4395-4755); Verbeke, Gil (https://ror.org/00cv9y106; https://orcid.org/0000-0002-9491-9557)
Point of Contact	De Timmerman, Romeo (Ghent University); TROLLing curator
Language	English
Producer	Ghent University (UGent) https://www.ugent.be/
Contributor	Supervisor: Stef Slembrouck
Distributor	The Tromsø Repository of Language and Linguistics (TROLLing) https://trolling.uit.no/
Depositor	De Timmerman, Romeo
Deposit Date	2026-06-01
Time Period	Start Date: 2008; End Date: 2023
Date of Collection	Start Date: 2023; End Date: 2024
Data Type	tabular
Software	Python, Version: 3.9.18; Keras, Version: 3.10.0; XGBoost, Version: 2.1.4; SHAP, Version: 0.48.0; UMAP, Version: 0.5.7; Praat, Version: 6.4.18
Related Material	GitHub repository containing the Python scripts and notebooks used to analyze these data: De Timmerman, R. (2026). Replication Code for "Modeling monophthongal versus diphthongal /aɪ/ in sung vocal performance with interpretable machine learning" [GitHub Repository]. https://github.com/romeodetimmerman/aae-in-blues-formants
Data Source	TROLLing repository containing the original corpus data from which the vowel tokens were sampled: De Timmerman, R. (2025). Replication Data for: "Covering Blue Voices: African American English and Authenticity in Blues Covers" [Dataset]. DataverseNO. https://doi.org/10.18710/DOJXAV This dataset is dedicated to the public domain under CC0.
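
When rerunning the analysis, it can help to compare the local Python environment with the package versions given in the Software field above. This is a minimal sketch using the standard library's `importlib.metadata`. The distribution names used as keys are assumptions (in particular, UMAP is assumed to be the `umap-learn` package); Python itself and Praat are not covered because they are not pip-installed distributions.

```python
from importlib import metadata

# Versions as listed in the Software field. ASSUMPTION: the keys are the
# matching PyPI distribution names; "umap-learn" is assumed for UMAP.
LISTED = {
    "keras": "3.10.0",
    "xgboost": "2.1.4",
    "shap": "0.48.0",
    "umap-learn": "0.5.7",
}


def installed_version(dist_name):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def compare_versions(listed):
    """Map each distribution to 'match', 'not installed', or a mismatch note."""
    report = {}
    for name, wanted in listed.items():
        found = installed_version(name)
        if found is None:
            report[name] = "not installed"
        elif found == wanted:
            report[name] = "match"
        else:
            report[name] = f"installed {found}, listed {wanted}"
    return report


if __name__ == "__main__":
    for name, status in compare_versions(LISTED).items():
        print(f"{name}: {status}")
```

A mismatch does not necessarily prevent replication, but it is a useful first thing to check if results differ from those reported.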

Dataset Terms

License/Data Use Agreement

Our Community Norms as well as good scientific practices expect that proper credit is given via citation. Please use the data citation shown on the dataset page.

Creative Commons CC0 1.0 Universal Public Domain Dedication (CC0 1.0).
