The data set consists of simulated time‑series measurements from two gas‑lifted subsea oil wells, used to develop and evaluate data‑driven virtual flow metering (VFM) models for oil and gas flow rate prediction.
Purpose: To assess a range of machine learning algorithms (10 methods, including LSTM, MLP, XGBoost, SVR, tree‑based and linear methods) for predicting multiphase flow rates in subsea oil production, and identify which give the lowest prediction error.
To study the impact of measurement noise and the effect of noise filtering (median filter), and to quantify prediction uncertainty (via 95% confidence intervals in XGBoost) in a VFM context.
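As an illustration of the noise study mentioned above, the following minimal sketch injects impulse noise into a signal and removes it with a sliding median filter. The function names, the spike fraction, the spike magnitude, and the window length are assumptions made for this example; the source does not state the values used in the study.

```python
import random
from statistics import median


def add_impulse_noise(signal, fraction=0.05, magnitude=5.0, seed=0):
    """Return a copy of `signal` with spikes added to a random fraction of samples.

    `fraction`, `magnitude` and `seed` are illustrative assumptions.
    """
    rng = random.Random(seed)
    noisy = list(signal)
    n_spikes = int(len(noisy) * fraction)
    # rng.sample draws distinct indices, so each spike hits a different sample.
    for i in rng.sample(range(len(noisy)), n_spikes):
        noisy[i] += rng.choice((-1.0, 1.0)) * magnitude
    return noisy


def median_filter(signal, window=5):
    """Sliding-window median; the window is truncated at both ends of the signal."""
    half = window // 2
    return [
        median(signal[max(0, i - half): i + half + 1])
        for i in range(len(signal))
    ]


# Example: a constant 100.0 signal with one isolated spike.
spiky = [100.0] * 10
spiky[4] = 105.0
filtered = median_filter(spiky, window=5)
```

An isolated spike is outvoted by its neighbours inside each window, so `filtered` is constant again. Spikes that cluster more densely than half the window length would survive, which is why the window length matters in practice.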
Scope: Two wells (Well 1 and Well 2) are considered, each represented by an open‑loop simulation model of a gas‑lifted oil well derived from Janatian et al. (2022).
For each well, 5 762 samples of process data are generated and split into 70% training and 30% test sets using a time‑series split. Key input variables include bottom‑hole and wellhead pressures and temperatures plus choke opening; the targets are the oil and gas flow rates.
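The 70/30 time-series split described above can be sketched as a single chronological hold-out: the earliest 70% of samples form the training set and the remainder the test set, with no shuffling. This reading of "time-series split" and the rounding of the cut point are assumptions; the source does not give the exact sample counts.

```python
def time_series_split(samples, train_fraction=0.7):
    """Split chronologically ordered samples into (train, test) without shuffling."""
    cut = int(len(samples) * train_fraction)  # round the cut point down (assumption)
    return samples[:cut], samples[cut:]


# Stand-in for the 5762 time-ordered rows of one well.
samples = list(range(5762))
train, test = time_series_split(samples)
print(len(train), len(test))
```

Under this rounding the split yields 4033 training and 1729 test samples, and every training sample precedes every test sample in time, which avoids leaking future information into training.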
The study covers the full workflow: data collection from the simulator, preprocessing (scaling, time‑series splitting, noise injection and filtering), model training and hyperparameter tuning, performance comparison via MAPE, and uncertainty quantification.
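For reference, the mean absolute percentage error used for the performance comparison is commonly defined as the mean of |y - ŷ| / |y| over all samples, expressed in percent. The sketch below follows that standard definition; whether the study reports MAPE as a percentage or a fraction is not stated in the source, and the function name is illustrative.

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error in percent.

    Assumes `y_true` contains no zeros (the metric is undefined there).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    total = sum(abs((t - p) / t) for t, p in zip(y_true, y_pred))
    return 100.0 * total / len(y_true)


# Two flow-rate samples, each predicted with a 10% relative error.
error = mape([100.0, 200.0], [110.0, 180.0])
```

Because MAPE is relative, it allows oil and gas flow rates of different magnitudes to be compared on the same scale, but it becomes unstable when true flow rates approach zero.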
Nature of the data: Synthetic, model‑generated process data rather than field measurements. The data come from a validated dynamic model of gas‑lifted wells, not directly from a physical asset.
Multivariate, time‑series data at sample‑level resolution, comprising sensor‑like inputs (pressures, temperatures, choke openings) and corresponding oil and gas flow rates over time for each well.
Used primarily as a benchmarking set for supervised learning: different regression algorithms are trained and tested on identical data to compare prediction accuracy, robustness to impulse noise, and the effect of noise reduction and uncertainty quantification techniques.