{"id":166238,"identifier":"CES0L8","persistentUrl":"https://doi.org/10.18710/CES0L8","protocol":"doi","authority":"10.18710","publisher":"DataverseNO","publicationDate":"2022-11-22","storageIdentifier":"S3://10.18710/CES0L8","datasetVersion":{"id":3771,"datasetId":166238,"datasetPersistentId":"doi:10.18710/CES0L8","storageIdentifier":"S3://10.18710/CES0L8","versionNumber":1,"versionMinorNumber":2,"versionState":"RELEASED","productionDate":"2021-03-01","lastUpdateTime":"2023-09-28T19:53:53Z","releaseTime":"2023-09-28T19:53:53Z","createTime":"2023-09-28T15:43:14Z","publicationDate":"2022-11-22","citationDate":"2022-11-22","termsOfUse":"
GNU General Public License v3.0:\nhttps://www.gnu.org/licenses/gpl-3.0.en.html.\n
\nThis license applies to the data, software, and analysis contained in this repository.","termsOfAccess":"GNU General Public License v3.0","fileAccessRequest":false,"metadataBlocks":{"citation":{"displayName":"Citation Metadata","name":"citation","fields":[{"typeName":"title","multiple":false,"typeClass":"primitive","value":"Replication Data for: Exposure and Emergence in Usage-Based Grammar: Computational Experiments in 35 Languages"},{"typeName":"author","multiple":true,"typeClass":"compound","value":[{"authorName":{"typeName":"authorName","multiple":false,"typeClass":"primitive","value":"Dunn, Jonathan"},"authorAffiliation":{"typeName":"authorAffiliation","multiple":false,"typeClass":"primitive","value":"University of Canterbury"},"authorIdentifierScheme":{"typeName":"authorIdentifierScheme","multiple":false,"typeClass":"controlledVocabulary","value":"ORCID"},"authorIdentifier":{"typeName":"authorIdentifier","multiple":false,"typeClass":"primitive","value":"0000-0002-1189-1908"}}]},{"typeName":"datasetContact","multiple":true,"typeClass":"compound","value":[{"datasetContactName":{"typeName":"datasetContactName","multiple":false,"typeClass":"primitive","value":"Dunn, Jonathan"},"datasetContactAffiliation":{"typeName":"datasetContactAffiliation","multiple":false,"typeClass":"primitive","value":"University of Canterbury"},"datasetContactEmail":{"typeName":"datasetContactEmail","multiple":false,"typeClass":"primitive","value":"jonathan.dunn@canterbury.ac.nz"}}]},{"typeName":"dsDescription","multiple":true,"typeClass":"compound","value":[{"dsDescriptionValue":{"typeName":"dsDescriptionValue","multiple":false,"typeClass":"primitive","value":"[article abstract:] This paper uses computational experiments to explore the role of exposure in the emergence of construction grammars. While usage-based grammars are hypothesized to depend on a learner’s exposure to actual language use, the mechanisms of such exposure have only been studied in a few constructions in isolation. 
This paper experiments with (i) the growth rate of the constructicon, (ii) the convergence rate of grammars exposed to independent registers, and (iii) the rate at which constructions are forgotten when they have not been recently observed. These experiments show that the lexicon grows more quickly than the grammar and that the growth rate of the grammar is not dependent on the growth rate of the lexicon. At the same time, register-specific grammars converge onto more similar constructions as the amount of exposure increases. This means that the influence of specific registers becomes less important as exposure increases. Finally, the rate at which constructions are forgotten when they have not been recently observed mirrors the growth rate of the constructicon. This paper thus presents a computational model of usage-based grammar that includes both the emergence and the unentrenchment of constructions.
\n\n[dataset abstract:] \nThis dataset consists of three zip archives containing the main analysis reported in the related publication, as well as a number of separate corpus files that serve as the raw input to grammar learning.
"},"dsDescriptionDate":{"typeName":"dsDescriptionDate","multiple":false,"typeClass":"primitive","value":"2022-11-02"}}]},{"typeName":"subject","multiple":true,"typeClass":"controlledVocabulary","value":["Arts and Humanities","Computer and Information Science"]},{"typeName":"keyword","multiple":true,"typeClass":"compound","value":[{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"construction grammar"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"constructicon"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"exposure"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"emergence"}},{"keywordValue":{"typeName":"keywordValue","multiple":false,"typeClass":"primitive","value":"usage-based grammar"}}]},{"typeName":"publication","multiple":true,"typeClass":"compound","value":[{"publicationCitation":{"typeName":"publicationCitation","multiple":false,"typeClass":"primitive","value":"Dunn, Jonathan. \"Exposure and emergence in usage-based grammar: computational experiments in 35 languages\" Cognitive Linguistics, vol. 33, no. 4, 2022, pp. 659-699. 
https://doi.org/10.1515/cog-2021-0106"},"publicationIDType":{"typeName":"publicationIDType","multiple":false,"typeClass":"controlledVocabulary","value":"doi"},"publicationIDNumber":{"typeName":"publicationIDNumber","multiple":false,"typeClass":"primitive","value":"10.1515/cog-2021-0106"},"publicationURL":{"typeName":"publicationURL","multiple":false,"typeClass":"primitive","value":"https://doi.org/10.1515/cog-2021-0106"}}]},{"typeName":"language","multiple":true,"typeClass":"controlledVocabulary","value":["English"]},{"typeName":"producer","multiple":true,"typeClass":"compound","value":[{"producerName":{"typeName":"producerName","multiple":false,"typeClass":"primitive","value":"University of Canterbury"},"producerAbbreviation":{"typeName":"producerAbbreviation","multiple":false,"typeClass":"primitive","value":"UC"},"producerURL":{"typeName":"producerURL","multiple":false,"typeClass":"primitive","value":"https://www.canterbury.ac.nz"}}]},{"typeName":"productionDate","multiple":false,"typeClass":"primitive","value":"2021-03-01"},{"typeName":"contributor","multiple":true,"typeClass":"compound","value":[{"contributorType":{"typeName":"contributorType","multiple":false,"typeClass":"controlledVocabulary","value":"Project Leader"},"contributorName":{"typeName":"contributorName","multiple":false,"typeClass":"primitive","value":"Dunn, Jonathan"}}]},{"typeName":"distributor","multiple":true,"typeClass":"compound","value":[{"distributorName":{"typeName":"distributorName","multiple":false,"typeClass":"primitive","value":"The Tromsø Repository of Language and Linguistics (TROLLing)"},"distributorAbbreviation":{"typeName":"distributorAbbreviation","multiple":false,"typeClass":"primitive","value":"TROLLing"},"distributorURL":{"typeName":"distributorURL","multiple":false,"typeClass":"primitive","value":"https://trolling.uit.no/"}}]},{"typeName":"depositor","multiple":false,"typeClass":"primitive","value":"Dunn, 
Jonathan"},{"typeName":"dateOfDeposit","multiple":false,"typeClass":"primitive","value":"2022-11-02"},{"typeName":"kindOfData","multiple":true,"typeClass":"primitive","value":["Text corpus","Experimental data","Source code"]},{"typeName":"software","multiple":true,"typeClass":"compound","value":[{"softwareName":{"typeName":"softwareName","multiple":false,"typeClass":"primitive","value":"Python"},"softwareVersion":{"typeName":"softwareVersion","multiple":false,"typeClass":"primitive","value":"3.7"}}]},{"typeName":"relatedMaterial","multiple":true,"typeClass":"primitive","value":["Dunn, J. “Exposure and Emergence in Usage-Based Grammar: Computational Experiments in 35 Languages.” (manuscript version). https://jdunn.name/2022/10/22/exposure-and-emergence-in-usage-based-grammar/"]},{"typeName":"dataSources","multiple":true,"typeClass":"primitive","value":["Dunn, J. Mapping languages: the Corpus of Global Language Use. Lang Resources & Evaluation 54, 999–1018 (2020). https://doi.org/10.1007/s10579-020-09489-2.
\n\nThis corpus can be visualized and downloaded from https://www.earthlings.io.
"]}]}},"files":[{"description":"This file provides an overview of this data","label":"0_ReadMe.txt","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":170286,"persistentId":"doi:10.18710/CES0L8/PLLW5R","pidURL":"https://doi.org/10.18710/CES0L8/PLLW5R","filename":"0_ReadMe.txt","contentType":"text/plain","filesize":11055,"description":"This file provides an overview of this data","storageIdentifier":"S3://2002-yellow-dataverseno:1849fcfb704-caecb4889a7f","rootDataFileId":-1,"md5":"c458cfb166100550724d3c5feec9e3e6","checksum":{"type":"MD5","value":"c458cfb166100550724d3c5feec9e3e6"},"creationDate":"2022-11-22"}},{"description":"For replicating the statistical analysis in the paper.","label":"1_Supplementary_Material_01_Analysis.zip","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":170226,"persistentId":"doi:10.18710/CES0L8/4EGVGK","pidURL":"https://doi.org/10.18710/CES0L8/4EGVGK","filename":"1_Supplementary_Material_01_Analysis.zip","contentType":"application/zip","filesize":24872718,"description":"For replicating the statistical analysis in the paper.","storageIdentifier":"S3://2002-yellow-dataverseno:1848b4c9000-43595616e04b","rootDataFileId":-1,"md5":"6a32ebe8c5605f9faff3396a1fd040bb","checksum":{"type":"MD5","value":"6a32ebe8c5605f9faff3396a1fd040bb"},"creationDate":"2022-11-18"}},{"description":"For replicating the semantic domains for the grammars.","label":"2_Supplementary_Material_02_Semantic_Domains.zip","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":167506,"persistentId":"doi:10.18710/CES0L8/PUFH9M","pidURL":"https://doi.org/10.18710/CES0L8/PUFH9M","filename":"2_Supplementary_Material_02_Semantic_Domains.zip","contentType":"application/zip","filesize":160958699,"description":"For replicating the semantic domains for the 
grammars.","storageIdentifier":"S3://2002-yellow-dataverseno:1843d94eafc-bb3752b38d74","rootDataFileId":-1,"md5":"8f062379b6f8a58edb14b3a74b68d6e0","checksum":{"type":"MD5","value":"8f062379b6f8a58edb14b3a74b68d6e0"},"creationDate":"2022-11-03"}},{"description":"Examples of constructions and tokens of constructions.","label":"3_Supplementary_Material_03_Grammars.zip","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":167507,"persistentId":"doi:10.18710/CES0L8/LPWVM7","pidURL":"https://doi.org/10.18710/CES0L8/LPWVM7","filename":"3_Supplementary_Material_03_Grammars.zip","contentType":"application/zip","filesize":464860358,"description":"Examples of constructions and tokens of constructions.","storageIdentifier":"S3://2002-yellow-dataverseno:1843d99b118-caff7187ab5d","rootDataFileId":-1,"md5":"fcdfe286c68b3a585f29034ed6667ecb","checksum":{"type":"MD5","value":"fcdfe286c68b3a585f29034ed6667ecb"},"creationDate":"2022-11-03"}},{"description":"Arabic, Wikipedia corpus","label":"wiki.ara.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168085,"persistentId":"doi:10.18710/CES0L8/NMHSKA","pidURL":"https://doi.org/10.18710/CES0L8/NMHSKA","filename":"wiki.ara.clean.gz","contentType":"application/x-gzip","filesize":45353707,"description":"Arabic, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df29e9a-3004a3a3282c","rootDataFileId":-1,"md5":"330306e59ced66edfac6eccdeb3d9d70","checksum":{"type":"MD5","value":"330306e59ced66edfac6eccdeb3d9d70"},"creationDate":"2022-11-03"}},{"description":"Bulgarian, Wikipedia corpus","label":"wiki.bul.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168125,"persistentId":"doi:10.18710/CES0L8/IJLCNU","pidURL":"https://doi.org/10.18710/CES0L8/IJLCNU","filename":"wiki.bul.clean.gz","contentType":"application/x-gzip","filesize":40079489,"description":"Bulgarian, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843def3b71-70d929844b15","rootDataFileId":-1,"md5":"2637ed3c5f5118bb6233cb7bc9991ca6","checksum":{"type":"MD5","value":"2637ed3c5f5118bb6233cb7bc9991ca6"},"creationDate":"2022-11-03"}},{"description":"Catalan, Wikipedia corpus","label":"wiki.cat.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168080,"persistentId":"doi:10.18710/CES0L8/HJRLPC","pidURL":"https://doi.org/10.18710/CES0L8/HJRLPC","filename":"wiki.cat.clean.gz","contentType":"application/x-gzip","filesize":32722484,"description":"Catalan, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df9680a-27b139fd8958","rootDataFileId":-1,"md5":"c8930cfe2d2470aa03016c4159bcae98","checksum":{"type":"MD5","value":"c8930cfe2d2470aa03016c4159bcae98"},"creationDate":"2022-11-03"}},{"description":"Czech, Wikipedia corpus","label":"wiki.ces.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168111,"persistentId":"doi:10.18710/CES0L8/LSJFNV","pidURL":"https://doi.org/10.18710/CES0L8/LSJFNV","filename":"wiki.ces.clean.gz","contentType":"application/x-gzip","filesize":43693862,"description":"Czech, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df9d9a5-41133cc0b629","rootDataFileId":-1,"md5":"9e06f51646f33c65d8a335c417fbeaa4","checksum":{"type":"MD5","value":"9e06f51646f33c65d8a335c417fbeaa4"},"creationDate":"2022-11-03"}},{"description":"Danish, Wikipedia corpus","label":"wiki.dan.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168099,"persistentId":"doi:10.18710/CES0L8/I7QZTT","pidURL":"https://doi.org/10.18710/CES0L8/I7QZTT","filename":"wiki.dan.clean.gz","contentType":"application/x-gzip","filesize":29654834,"description":"Danish, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfd3e1a-9f542c53c5e6","rootDataFileId":-1,"md5":"a9dea682da849b1c55540199587f3d30","checksum":{"type":"MD5","value":"a9dea682da849b1c55540199587f3d30"},"creationDate":"2022-11-03"}},{"description":"German, Wikipedia corpus","label":"wiki.deu.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168127,"persistentId":"doi:10.18710/CES0L8/EZ9JKN","pidURL":"https://doi.org/10.18710/CES0L8/EZ9JKN","filename":"wiki.deu.clean.gz","contentType":"application/x-gzip","filesize":39970694,"description":"German, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df306e6-ba3c47afa43b","rootDataFileId":-1,"md5":"db606df2a9e15b78e70f32f60fbc8209","checksum":{"type":"MD5","value":"db606df2a9e15b78e70f32f60fbc8209"},"creationDate":"2022-11-03"}},{"description":"Greek, Wikipedia corpus","label":"wiki.ell.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168128,"persistentId":"doi:10.18710/CES0L8/AT49IZ","pidURL":"https://doi.org/10.18710/CES0L8/AT49IZ","filename":"wiki.ell.clean.gz","contentType":"application/x-gzip","filesize":46571355,"description":"Greek, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfdbfde-deb11aa04182","rootDataFileId":-1,"md5":"50dcdf9e0196cbb9311577be3d959b0a","checksum":{"type":"MD5","value":"50dcdf9e0196cbb9311577be3d959b0a"},"creationDate":"2022-11-03"}},{"description":"English, Wikipedia corpus","label":"wiki.eng.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168102,"persistentId":"doi:10.18710/CES0L8/BBA2AH","pidURL":"https://doi.org/10.18710/CES0L8/BBA2AH","filename":"wiki.eng.clean.gz","contentType":"application/x-gzip","filesize":31152894,"description":"English, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843e063ca6-de6d49112a44","rootDataFileId":-1,"md5":"583be72d4a5e016fe28f92fb60d17879","checksum":{"type":"MD5","value":"583be72d4a5e016fe28f92fb60d17879"},"creationDate":"2022-11-03"}},{"description":"Estonian, Wikipedia corpus","label":"wiki.est.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168097,"persistentId":"doi:10.18710/CES0L8/PXJOGA","pidURL":"https://doi.org/10.18710/CES0L8/PXJOGA","filename":"wiki.est.clean.gz","contentType":"application/x-gzip","filesize":25311805,"description":"Estonian, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfe040d-9363fd7972e8","rootDataFileId":-1,"md5":"7b34964e9635662b2b0edbde81ecf605","checksum":{"type":"MD5","value":"7b34964e9635662b2b0edbde81ecf605"},"creationDate":"2022-11-03"}},{"description":"Farsi, Wikipedia corpus","label":"wiki.fas.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168083,"persistentId":"doi:10.18710/CES0L8/YCEJML","pidURL":"https://doi.org/10.18710/CES0L8/YCEJML","filename":"wiki.fas.clean.gz","contentType":"application/x-gzip","filesize":35014946,"description":"Farsi, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df361c7-7f9539a2a6c3","rootDataFileId":-1,"md5":"8ffe0293e785ff54b390e065bbeeb534","checksum":{"type":"MD5","value":"8ffe0293e785ff54b390e065bbeeb534"},"creationDate":"2022-11-03"}},{"description":"Finnish, Wikipedia corpus","label":"wiki.fin.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168107,"persistentId":"doi:10.18710/CES0L8/UFGHTP","pidURL":"https://doi.org/10.18710/CES0L8/UFGHTP","filename":"wiki.fin.clean.gz","contentType":"application/x-gzip","filesize":43789846,"description":"Finnish, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df3d393-fb50e15929d1","rootDataFileId":-1,"md5":"9d591a2efe711d5cea28fa6240327ab7","checksum":{"type":"MD5","value":"9d591a2efe711d5cea28fa6240327ab7"},"creationDate":"2022-11-03"}},{"description":"French, Wikipedia corpus","label":"wiki.fra.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168124,"persistentId":"doi:10.18710/CES0L8/YHU0AW","pidURL":"https://doi.org/10.18710/CES0L8/YHU0AW","filename":"wiki.fra.clean.gz","contentType":"application/x-gzip","filesize":33456714,"description":"French, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfa30a4-c6170a05fecc","rootDataFileId":-1,"md5":"d75f73c957b963fd0665398cc0b45a78","checksum":{"type":"MD5","value":"d75f73c957b963fd0665398cc0b45a78"},"creationDate":"2022-11-03"}},{"description":"Galician, Wikipedia corpus","label":"wiki.glg.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168109,"persistentId":"doi:10.18710/CES0L8/CBLAAS","pidURL":"https://doi.org/10.18710/CES0L8/CBLAAS","filename":"wiki.glg.clean.gz","contentType":"application/x-gzip","filesize":30416844,"description":"Galician, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfa809e-e2c850af7626","rootDataFileId":-1,"md5":"888d461391de02ae06bcf2ebfc533c8d","checksum":{"type":"MD5","value":"888d461391de02ae06bcf2ebfc533c8d"},"creationDate":"2022-11-03"}},{"description":"Hebrew, Wikipedia corpus","label":"wiki.heb.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168137,"persistentId":"doi:10.18710/CES0L8/CSIETV","pidURL":"https://doi.org/10.18710/CES0L8/CSIETV","filename":"wiki.heb.clean.gz","contentType":"application/x-gzip","filesize":45269212,"description":"Hebrew, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfebab7-5d69bdc2e8ac","rootDataFileId":-1,"md5":"fc47221cfb06fdf44c45a752e69e291d","checksum":{"type":"MD5","value":"fc47221cfb06fdf44c45a752e69e291d"},"creationDate":"2022-11-03"}},{"description":"Hindi, Wikipedia corpus","label":"wiki.hin.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168100,"persistentId":"doi:10.18710/CES0L8/HMZHDX","pidURL":"https://doi.org/10.18710/CES0L8/HMZHDX","filename":"wiki.hin.clean.gz","contentType":"application/x-gzip","filesize":34635673,"description":"Hindi, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df42e8a-1fa92bf69b94","rootDataFileId":-1,"md5":"f881a204b49156c2a8d5fac1e3b63b41","checksum":{"type":"MD5","value":"f881a204b49156c2a8d5fac1e3b63b41"},"creationDate":"2022-11-03"}},{"description":"Hungarian, Wikipedia corpus","label":"wiki.hun.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168104,"persistentId":"doi:10.18710/CES0L8/SVJYOH","pidURL":"https://doi.org/10.18710/CES0L8/SVJYOH","filename":"wiki.hun.clean.gz","contentType":"application/x-gzip","filesize":45297190,"description":"Hungarian, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfaf617-5e60bf3a4f1c","rootDataFileId":-1,"md5":"1662f357979c966d43cac2bbfc535233","checksum":{"type":"MD5","value":"1662f357979c966d43cac2bbfc535233"},"creationDate":"2022-11-03"}},{"description":"Indonesian, Wikipedia corpus","label":"wiki.ind.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168116,"persistentId":"doi:10.18710/CES0L8/SCXO8V","pidURL":"https://doi.org/10.18710/CES0L8/SCXO8V","filename":"wiki.ind.clean.gz","contentType":"application/x-gzip","filesize":30042740,"description":"Indonesian, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dff0994-faaa3dd6214b","rootDataFileId":-1,"md5":"ade3676e46eec65b0db85175addcc15c","checksum":{"type":"MD5","value":"ade3676e46eec65b0db85175addcc15c"},"creationDate":"2022-11-03"}},{"description":"Italian, Wikipedia corpus","label":"wiki.ita.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168110,"persistentId":"doi:10.18710/CES0L8/8Q9GVU","pidURL":"https://doi.org/10.18710/CES0L8/8Q9GVU","filename":"wiki.ita.clean.gz","contentType":"application/x-gzip","filesize":34981818,"description":"Italian, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dff6450-170017759231","rootDataFileId":-1,"md5":"a93537dc085007538095674de5b99c7a","checksum":{"type":"MD5","value":"a93537dc085007538095674de5b99c7a"},"creationDate":"2022-11-03"}},{"description":"Korean, Wikipedia corpus","label":"wiki.kor.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168120,"persistentId":"doi:10.18710/CES0L8/8AGCTQ","pidURL":"https://doi.org/10.18710/CES0L8/8AGCTQ","filename":"wiki.kor.clean.gz","contentType":"application/x-gzip","filesize":48483399,"description":"Korean, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfb7528-a8ad8f9735a3","rootDataFileId":-1,"md5":"ff3abddd967edb7f1ba3d10fc35b72bf","checksum":{"type":"MD5","value":"ff3abddd967edb7f1ba3d10fc35b72bf"},"creationDate":"2022-11-03"}},{"description":"Latvian, Wikipedia corpus","label":"wiki.lav.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168123,"persistentId":"doi:10.18710/CES0L8/DKFS4R","pidURL":"https://doi.org/10.18710/CES0L8/DKFS4R","filename":"wiki.lav.clean.gz","contentType":"application/x-gzip","filesize":13821402,"description":"Latvian, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfb9a90-113806866941","rootDataFileId":-1,"md5":"d78284cf31d4a39ce7562edd3be56888","checksum":{"type":"MD5","value":"d78284cf31d4a39ce7562edd3be56888"},"creationDate":"2022-11-03"}},{"description":"Dutch, Wikipedia corpus","label":"wiki.nld.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168113,"persistentId":"doi:10.18710/CES0L8/MPHRES","pidURL":"https://doi.org/10.18710/CES0L8/MPHRES","filename":"wiki.nld.clean.gz","contentType":"application/x-gzip","filesize":35358309,"description":"Dutch, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dffc071-20b7bd1bf6e1","rootDataFileId":-1,"md5":"ec1c2348bb6eabf3d247aa99da475ee8","checksum":{"type":"MD5","value":"ec1c2348bb6eabf3d247aa99da475ee8"},"creationDate":"2022-11-03"}},{"description":"Norwegian, Wikipedia corpus","label":"wiki.nor.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168096,"persistentId":"doi:10.18710/CES0L8/FS8L6S","pidURL":"https://doi.org/10.18710/CES0L8/FS8L6S","filename":"wiki.nor.clean.gz","contentType":"application/x-gzip","filesize":34698984,"description":"Norwegian, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df48937-c11c720cb3a2","rootDataFileId":-1,"md5":"d491b8273b0f7fc26c7d159cef549394","checksum":{"type":"MD5","value":"d491b8273b0f7fc26c7d159cef549394"},"creationDate":"2022-11-03"}},{"description":"Polish, Wikipedia corpus","label":"wiki.pol.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168079,"persistentId":"doi:10.18710/CES0L8/JM4ZWO","pidURL":"https://doi.org/10.18710/CES0L8/JM4ZWO","filename":"wiki.pol.clean.gz","contentType":"application/x-gzip","filesize":44154437,"description":"Polish, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843e029c59-df48a2dcb6d4","rootDataFileId":-1,"md5":"db4510005038908512f84c3023a4b00b","checksum":{"type":"MD5","value":"db4510005038908512f84c3023a4b00b"},"creationDate":"2022-11-03"}},{"description":"Portuguese, Wikipedia corpus","label":"wiki.por.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168117,"persistentId":"doi:10.18710/CES0L8/V8N21B","pidURL":"https://doi.org/10.18710/CES0L8/V8N21B","filename":"wiki.por.clean.gz","contentType":"application/x-gzip","filesize":33383617,"description":"Portuguese, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df4e024-2fbd669296d6","rootDataFileId":-1,"md5":"edee5c2ab0e77ff23e7573fa5727ee16","checksum":{"type":"MD5","value":"edee5c2ab0e77ff23e7573fa5727ee16"},"creationDate":"2022-11-03"}},{"description":"Romanian, Wikipedia corpus","label":"wiki.ron.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168126,"persistentId":"doi:10.18710/CES0L8/SU0CCH","pidURL":"https://doi.org/10.18710/CES0L8/SU0CCH","filename":"wiki.ron.clean.gz","contentType":"application/x-gzip","filesize":36660079,"description":"Romanian, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843e001f40-778d0d844557","rootDataFileId":-1,"md5":"04ab60ccdf006ca5a8f0859e857e4cb5","checksum":{"type":"MD5","value":"04ab60ccdf006ca5a8f0859e857e4cb5"},"creationDate":"2022-11-03"}},{"description":"Russian, Wikipedia corpus","label":"wiki.rus.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168087,"persistentId":"doi:10.18710/CES0L8/ZI8MBQ","pidURL":"https://doi.org/10.18710/CES0L8/ZI8MBQ","filename":"wiki.rus.clean.gz","contentType":"application/x-gzip","filesize":51897408,"description":"Russian, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df565c6-93d74542a9c2","rootDataFileId":-1,"md5":"d501312db0798cc3bdf6b43161412c3d","checksum":{"type":"MD5","value":"d501312db0798cc3bdf6b43161412c3d"},"creationDate":"2022-11-03"}},{"description":"Slovenian, Wikipedia corpus","label":"wiki.slv.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168131,"persistentId":"doi:10.18710/CES0L8/MWCMM5","pidURL":"https://doi.org/10.18710/CES0L8/MWCMM5","filename":"wiki.slv.clean.gz","contentType":"application/x-gzip","filesize":25963617,"description":"Slovenian, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843df5aa01-b84cc3cc98a6","rootDataFileId":-1,"md5":"8f391fdd04fe6f1812e0deccadf19b70","checksum":{"type":"MD5","value":"8f391fdd04fe6f1812e0deccadf19b70"},"creationDate":"2022-11-03"}},{"description":"Spanish, Wikipedia corpus","label":"wiki.spa.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168119,"persistentId":"doi:10.18710/CES0L8/KL4FI3","pidURL":"https://doi.org/10.18710/CES0L8/KL4FI3","filename":"wiki.spa.clean.gz","contentType":"application/x-gzip","filesize":32821272,"description":"Spanish, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfc0504-93e61e3a747e","rootDataFileId":-1,"md5":"9671d9bfa8445b57ef975e93317f22b5","checksum":{"type":"MD5","value":"9671d9bfa8445b57ef975e93317f22b5"},"creationDate":"2022-11-03"}},{"description":"Swedish, Wikipedia corpus","label":"wiki.swe.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168112,"persistentId":"doi:10.18710/CES0L8/RMZOHX","pidURL":"https://doi.org/10.18710/CES0L8/RMZOHX","filename":"wiki.swe.clean.gz","contentType":"application/x-gzip","filesize":38410355,"description":"Swedish, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843e01db9b-1020c5b40fff","rootDataFileId":-1,"md5":"07cbff1a2cbc3b33f99aeb2854eb4c26","checksum":{"type":"MD5","value":"07cbff1a2cbc3b33f99aeb2854eb4c26"},"creationDate":"2022-11-03"}},{"description":"Tagalog, Wikipedia corpus","label":"wiki.tgl.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168095,"persistentId":"doi:10.18710/CES0L8/Z0XHKC","pidURL":"https://doi.org/10.18710/CES0L8/Z0XHKC","filename":"wiki.tgl.clean.gz","contentType":"application/x-gzip","filesize":4317595,"description":"Tagalog, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfc1a19-ab4dde132b87","rootDataFileId":-1,"md5":"6c8d6ee2accb89e9a60ce6ee40c35bad","checksum":{"type":"MD5","value":"6c8d6ee2accb89e9a60ce6ee40c35bad"},"creationDate":"2022-11-03"}},{"description":"Thai, Wikipedia corpus","label":"wiki.tha.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168093,"persistentId":"doi:10.18710/CES0L8/RMXXUO","pidURL":"https://doi.org/10.18710/CES0L8/RMXXUO","filename":"wiki.tha.clean.gz","contentType":"application/x-gzip","filesize":41778756,"description":"Thai, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843e008b6e-20b9ef06c128","rootDataFileId":-1,"md5":"78001b27f58f2b3c17b9351ff039fc53","checksum":{"type":"MD5","value":"78001b27f58f2b3c17b9351ff039fc53"},"creationDate":"2022-11-03"}},{"description":"Turkish, Wikipedia corpus","label":"wiki.tur.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168106,"persistentId":"doi:10.18710/CES0L8/5RSYQI","pidURL":"https://doi.org/10.18710/CES0L8/5RSYQI","filename":"wiki.tur.clean.gz","contentType":"application/x-gzip","filesize":32706441,"description":"Turkish, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843dfc72eb-1886573b8035","rootDataFileId":-1,"md5":"fa9448cc39af6bb5d47442cc59bf5521","checksum":{"type":"MD5","value":"fa9448cc39af6bb5d47442cc59bf5521"},"creationDate":"2022-11-03"}},{"description":"Ukrainian, Wikipedia corpus","label":"wiki.ukr.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168108,"persistentId":"doi:10.18710/CES0L8/JFRC5C","pidURL":"https://doi.org/10.18710/CES0L8/JFRC5C","filename":"wiki.ukr.clean.gz","contentType":"application/x-gzip","filesize":51815323,"description":"Ukrainian, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843e016a44-7a7a49baa319","rootDataFileId":-1,"md5":"abbe184bae777aad6c6dbe8bea782747","checksum":{"type":"MD5","value":"abbe184bae777aad6c6dbe8bea782747"},"creationDate":"2022-11-03"}},{"description":"Urdu, Wikipedia corpus","label":"wiki.urd.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168103,"persistentId":"doi:10.18710/CES0L8/BGZAXF","pidURL":"https://doi.org/10.18710/CES0L8/BGZAXF","filename":"wiki.urd.clean.gz","contentType":"application/x-gzip","filesize":6555780,"description":"Urdu, Wikipedia corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843e00e4de-04fbba0c24cf","rootDataFileId":-1,"md5":"f52b111a6d26e9ab1210c0b9da214baa","checksum":{"type":"MD5","value":"f52b111a6d26e9ab1210c0b9da214baa"},"creationDate":"2022-11-03"}},{"description":"Vietnamese, Wikipedia corpus","label":"wiki.vie.clean.gz","restricted":false,"version":1,"datasetVersionId":3771,"dataFile":{"id":168078,"persistentId":"doi:10.18710/CES0L8/X0DP9Z","pidURL":"https://doi.org/10.18710/CES0L8/X0DP9Z","filename":"wiki.vie.clean.gz","contentType":"application/x-gzip","filesize":27246786,"description":"Vietnamese, Wikipedia 
corpus","storageIdentifier":"S3://2002-yellow-dataverseno:1843e00d275-eddaab1b71b7","rootDataFileId":-1,"md5":"e57d89718ba412ab42538fdc44f6dc4d","checksum":{"type":"MD5","value":"e57d89718ba412ab42538fdc44f6dc4d"},"creationDate":"2022-11-03"}}],"citation":"Dunn, Jonathan, 2022, \"Replication Data for: Exposure and Emergence in Usage-Based Grammar: Computational Experiments in 35 Languages\", https://doi.org/10.18710/CES0L8, DataverseNO, V1"}}