|
Persistent Identifier
|
doi:10.18710/WEZMJE |
|
Publication Date
|
2026-02-26 |
|
Title
| Label system, dictionaries, and audit evidence for harmonised over 133,000 feedstock items across major conversion technologies |
|
Author
| Barahmand, Zahir (ROR: https://ror.org/05ecg5h20; ORCID: https://orcid.org/0000-0001-9031-596X) |
|
Point of Contact
|
Barahmand, Zahir (University of South-Eastern Norway) |
|
Description
| This repository provides a large-scale, reproducibly labeled dataset of feedstock terms reported in the biomass and waste conversion literature. The dataset contains 133,000+ labeled feedstock items extracted from 121,000+ peer-reviewed studies. It is derived from upstream, review-based technology corpora hosted in DataverseNO, where feedstock descriptors were extracted from titles and abstracts and stored together with stable record identifiers and technology/corpus labels. In this release, the curated feedstock descriptor strings are converted into an atomic token-level representation using rule-based splitting, so that each row contains one feedstock item linked to its source record.
Each atomic feedstock item is labeled using a four-stage pipeline (L1–L4).
L1 assigns two foundational attributes, material status and renewability, using controlled vocabularies and a deterministic-first decision hierarchy: curated dictionary matching, followed by explicit rules and text normalization, and finally a governed LLM-assisted resolver only for unresolved cases.
L2 adds a hazardness triage label, focusing on waste- and by-product-related terms, using curated dictionaries and an external reference list aligned with the European List of Waste as a consistency aid (not as a regulatory classification).
L3 applies a dedicated taxonomy for primary biomass items by assigning a primary domain and subcategory, while leaving non-primary items outside the taxonomy (taxonomy fields remain blank).
L4 is a final quality-assurance stage applied after L1–L3 to ensure that equivalent feedstock terms receive consistent labels across all technology corpora when the datasets are combined.
The release is designed for reuse and computational reproducibility. It provides one labeled table per technology/corpus and a consolidated global dictionary with a consistent column structure. Versioned dictionaries, scripts, and audit artifacts are included to reproduce the labeling workflow and to document how each label set was produced. Consistency is supported through multi-layer validation and curation, including structured manual review, automated coverage and conflict audits, and cross-technology dictionary governance. Manual corrections are externalized into curated artifacts rather than overwriting output files.
The dataset is intended for applications that require harmonized feedstock terminology across conversion pathways, including cross-technology feedstock mapping, harmonized inputs for modelling workflows, and comparative literature synthesis. (2026-02-18) |
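The deterministic-first hierarchy described for L1 (curated dictionary, then explicit rules, then a resolver only for unresolved cases) might look roughly like the sketch below. This is not the repository's code: the dictionary entries, rule patterns, label values, column names, and the `resolver` callable are all hypothetical placeholders, and normalizing the text before the dictionary lookup is an assumption made here for simplicity.

```python
import re

# Hypothetical curated dictionary: normalized term -> (material status, renewability).
# The label values are illustrative, not the dataset's controlled vocabularies.
CURATED = {
    "wheat straw": ("by-product", "renewable"),
    "coal": ("primary", "non-renewable"),
}

# Hypothetical explicit rules, tried in order when the dictionary has no match.
RULES = [
    (re.compile(r"\bplastics?\b"), ("waste", "non-renewable")),
    (re.compile(r"\bwood\b"), ("primary", "renewable")),
]


def normalize(term):
    """Lowercase, trim, and collapse internal whitespace."""
    return re.sub(r"\s+", " ", term.strip().lower())


def label_item(term, resolver=None):
    """Label one atomic feedstock item, recording which stage produced the label."""
    key = normalize(term)
    row = {"item": key, "material_status": None, "renewability": None,
           "source": "unresolved"}

    # Stage 1: exact match against the curated dictionary.
    if key in CURATED:
        row["material_status"], row["renewability"] = CURATED[key]
        row["source"] = "dictionary"
        return row

    # Stage 2: first matching explicit rule wins.
    for pattern, labels in RULES:
        if pattern.search(key):
            row["material_status"], row["renewability"] = labels
            row["source"] = "rule"
            return row

    # Stage 3: optional fallback (e.g. an LLM-assisted resolver), used only
    # when the deterministic stages leave the item unresolved.
    if resolver is not None:
        row["material_status"], row["renewability"] = resolver(key)
        row["source"] = "resolver"
    return row
```

Recording the `source` of each label keeps resolver-derived labels distinguishable from deterministic ones, which is one way an audit trail of the kind described here could be supported.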
|
Subject
| Chemistry; Earth and Environmental Sciences; Engineering |
|
Keyword
| circular economy
hazardous waste
primary biomass
evidence map
gasification
pyrolysis
fermentation |
|
Producer
| University of South-Eastern Norway (USN) https://www.usn.no/english/ |
|
Production Date
| 2026-02-18 |
|
Production Location
| University of South-Eastern Norway |
|
Contributor
| Data Curator: Barahmand, Zahir |
|
Distributor
| University of South-Eastern Norway (USN) https://dataverse.no/dataverse/usn |
|
Distribution Date
| 2026-02-18 |
|
Depositor
| Barahmand, Zahir |
|
Deposit Date
| 2026-02-18 |
|
Data Type
| Tabular data and documentation |
|
Software
| Python, Version: 3
Microsoft Office, Version: 365 |
|
Related Dataset
| Barahmand, Zahir, 2026, "Supplementary dataset and reproducible codes for LLM-assisted mapping feedstocks of eight conversion technologies from over 121,000 studies", https://doi.org/10.18710/JM6U7B, DataverseNO, V1
Barahmand, Zahir; Eikeland, Marianne Sørflaten, 2025, "Dataset and code supplement: Mapping gasification technologies and feedstocks with a dual validated large-scale literature-derived dataset", https://doi.org/10.23642/USN.30546347, DataverseNO, V1
Barahmand, Zahir, 2025, "Supplementary code and curated data for 1,863 experimental gasification studies (laboratory to commercial scale)", https://doi.org/10.23642/USN.30702092, DataverseNO, V1 |