Explaining deep learning models for ozone pollution prediction via embedded feature selection

Ambient air pollution is a pervasive global issue that poses significant health risks. Among pollutants, ozone (O3) is responsible for an estimated 1 to 1.2 million premature deaths yearly. O3 also contributes to climate warming and reduces crop productivity, among other adverse effects. It forms when nitrogen oxides and volatile organic compounds react in the presence of short-wavelength solar radiation. Consequently, urban areas with high traffic volume and high temperatures are particularly prone to elevated O3 levels, which pose a significant health risk to their inhabitants. In response to this problem, many countries have developed web and mobile applications that provide real-time air pollution information using sensor data. However, while these applications offer valuable insight into current pollution levels, predicting future pollutant behavior is crucial for effective planning and mitigation strategies. Therefore, our main objectives are to develop accurate and efficient prediction models and to identify the key factors that influence O3 levels. To address these objectives, we adopt a time series forecasting approach, which allows us to analyze and predict future O3 behavior. Additionally, we tackle the feature selection problem to identify the features and time periods that contribute most to prediction accuracy. To this end, we introduce a novel method for Deep Learning models, the Time Selection Layer, which significantly improves model performance, reduces complexity, and enhances interpretability. Our study focuses on data collected from five representative areas in Seville, Cordova, and Jaen provinces in Spain, using multiple sensors to capture comprehensive pollution data. We compare the performance of three models: Lasso, Decision Tree, and Deep Learning with and without incorporating the Time Selection Layer.
Our results demonstrate that including the Time Selection Layer significantly enhances the effectiveness and interpretability of Deep Learning models, achieving an average effectiveness improvement of 9% across all monitored areas.
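
The abstract does not give the internals of the Time Selection Layer, so the following is only a minimal sketch of the general idea of embedded selection over time-lagged inputs, not the authors' implementation. It assumes the sensor data is a multivariate series with O3 in the first column, that each model input is a window of past time steps, and that selection takes the form of a mask with one weight per (lag, feature) cell. The names `make_windows`, `apply_selection`, and `selected_cells`, the fixed mask values, and the 0.5 threshold are all hypothetical; in a trained model the mask would be learned jointly with the network weights.

```python
from typing import List, Tuple

Window = List[List[float]]  # shape: (n_lags, n_features)


def make_windows(series: Window, n_lags: int, horizon: int = 1) -> List[Tuple[Window, float]]:
    """Turn a multivariate series (rows = time steps) into (window, target) pairs.

    The target is the first column (assumed here to be O3), taken
    `horizon` steps after the last row of the window.
    """
    pairs = []
    for start in range(len(series) - n_lags - horizon + 1):
        window = series[start:start + n_lags]
        target = series[start + n_lags + horizon - 1][0]
        pairs.append((window, target))
    return pairs


def apply_selection(window: Window, mask: Window, threshold: float = 0.5) -> Window:
    """Zero out every (lag, feature) cell whose mask weight is below the threshold."""
    return [
        [x if m >= threshold else 0.0 for x, m in zip(row, mask_row)]
        for row, mask_row in zip(window, mask)
    ]


def selected_cells(mask: Window, threshold: float = 0.5) -> List[Tuple[int, int]]:
    """List the (lag, feature) index pairs the mask keeps, for interpretation."""
    return [
        (lag, feat)
        for lag, row in enumerate(mask)
        for feat, m in enumerate(row)
        if m >= threshold
    ]


# Toy data: 5 time steps, 2 features (column 0 stands in for O3).
series = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
pairs = make_windows(series, n_lags=3)

# Hypothetical mask weights; a real layer would learn these during training.
mask = [[0.9, 0.1], [0.2, 0.8], [0.7, 0.6]]
first_window, first_target = pairs[0]
masked = apply_selection(first_window, mask)
kept = selected_cells(mask)
```

Reading the kept cells back as (lag, feature) pairs is what makes this kind of selection interpretable: it shows which sensors, at which past time steps, the model is allowed to use.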
