Description
This dataset was developed to systematically characterise feedstock–technology relationships across eight major biomass conversion technologies by mining a large Scopus-derived bibliographic corpus (1887–2025; partial coverage for 2025). The workflow is LLM-assisted and fully reproducible. Candidate feedstock and technology phrases are automatically extracted from bibliographic text fields (titles, abstracts and keywords). They are then cleaned with rules and checked in a subsequent LLM-based validation step. Targeted manual curation follows before the final release. The dataset is intended for technology landscape analyses, evidence synthesis and comparative assessments of biomass conversion pathways, where consistent and traceable feedstock descriptors are required across a very large number of studies.

A data descriptor titled "A large-scale, LLM-assisted and validated dataset of biomass and waste conversion technologies and feedstocks" will be published based on this dataset, with the following abstract:

Biomass, organic wastes and biogenic by-products are increasingly targeted for low-carbon fuels and value-added chemicals. However, strategic decision-making from a circular economy perspective requires a big-picture view of the relative significance of different conversion technologies in handling diverse feedstock portfolios, and no large-scale, cross-technology mapping of these portfolios is currently available. Thus, a literature-derived dataset was assembled that links eight major waste-to-x valorisation technologies (gasification, pyrolysis, hydrothermal liquefaction, torrefaction, anaerobic digestion, aerobic digestion, fermentation and transesterification) to their reported feedstocks. Using harmonised search strings, 121,365 records spanning publications from 1887 to 2025 were retrieved from the Scopus database. This constrained yet scalable search strategy both facilitates automated extraction and validation and yields a rich dataset. Further, a large language model (LLM)-assisted workflow was implemented to extract candidate technology and feedstock phrases. This was followed by a two-level validation that combines rule-based cleaning with targeted LLM re-evaluation to minimise manual curation. The resulting dataset provides technology-specific, validated feedstock descriptors that support comparative analyses and decision-support applications in a circular bioeconomy context. (2025-12-15)
Keyword
bioeconomy, circular economy, biomass, conversion technologies, gasification, fermentation, pyrolysis, torrefaction, aerobic digestion, anaerobic digestion