New Publication in Water Research Highlights the Importance of Data Handling in Pollution Research
Understanding the link between socioeconomic drivers and surface water pollution is crucial for protecting aquatic ecosystems and drinking water. Yet, many environmental datasets pose challenges such as values below detection limits and extreme measurements, which complicate statistical analysis.
This study, funded by the Urban Futures research platform, uses a global dataset of pharmaceutical pollution in rivers to show that data preparation, including the handling of values below detection thresholds, aggregation, and normalization, influences the outcomes of linear regression and group comparison tests. The results show that conclusions about external (e.g., socioeconomic) influences on pollution depend not only on the choice of statistical method but also on how the raw data are treated.
Based on these findings, the authors propose a systematic procedure for exploring multiple data-preparation pathways and provide recommendations for future studies. These insights underline the importance of transparent and careful data handling in environmental research and improve the reliability of conclusions about the drivers of anthropogenic water pollution.
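To give a feel for why exploring several data-preparation pathways matters, here is a minimal sketch in Python. It is not the authors' procedure and uses no data from the study: the detection limit, the driver values, the concentrations, and all function names are hypothetical. A log transform stands in for the normalization step, which is an assumption made for illustration. The sketch substitutes below-detection-limit values in three common ways, applies two transforms, and fits an ordinary least-squares line for each combination.

```python
import math
from itertools import product


def ols_slope(x, y):
    """Slope of an ordinary least-squares fit of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return sxy / sxx


# Synthetic illustration only: these are NOT values from the study.
LOD = 10.0  # hypothetical detection limit
driver = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # hypothetical socioeconomic indicator
# None marks a measurement reported as below the detection limit.
conc = [None, None, 12.0, None, 40.0, 300.0]

# Three common substitution rules for below-detection-limit values.
SUBSTITUTIONS = {"zero": 0.0, "half_lod": LOD / 2, "lod": LOD}
# Two transforms; log1p is an assumed stand-in for normalization.
TRANSFORMS = {"raw": lambda v: v, "log1p": math.log1p}


def run_pathways(x, y):
    """Fit one regression per (substitution, transform) pathway."""
    results = {}
    for (sub_name, sub_value), (tr_name, transform) in product(
        SUBSTITUTIONS.items(), TRANSFORMS.items()
    ):
        prepared = [transform(sub_value if v is None else v) for v in y]
        results[(sub_name, tr_name)] = ols_slope(x, prepared)
    return results


slopes = run_pathways(driver, conc)
for pathway, slope in sorted(slopes.items()):
    print(pathway, round(slope, 3))
```

Even on this toy data, the fitted slope changes with the substitution rule and the transform. Reporting results for every pathway, rather than a single one, makes such sensitivity visible.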
