Jordà Mascaró, Marc
2024-05-20
2024-05-20
2022-09-18
https://hdl.handle.net/20.500.14468/14198

Noise pollution is the second most important environmental risk factor for health in Western Europe. It affects a large number of people, can cause a wide range of serious illnesses, and is estimated to cause 12,000 premature deaths in Europe every year. Barcelona is above the 75th percentile of European cities for exposure to harmful road traffic noise levels, and it is one of the cities most affected by night-time leisure noise. Several initiatives have recently been developed to address this problem, following the European regulations on the matter. The city provides a network of sensors that collect noise data every minute across its territory. We use noise data from 2017 to 2021 recorded at one significant location in Barcelona. We process this information to turn it into an appropriate input for machine learning models, handling missing values with the Prophet algorithm. We frame the task as a multivariate time series problem: predicting the hourly noise values for the next 10 hours from the previous 48 hourly noise values and the values of the weather and seasonal variables from the last hour. We compare different modelling approaches, each introduced with its theoretical framework. On the one hand, we use AutoML tools, such as TPOT and Keras, to determine optimal models for our problem. On the other hand, we manually tune Recurrent Neural Networks (RNNs), Long Short-Term Memory networks (LSTMs) and Gated Recurrent Units (GRUs), which are designed to perform well on long sequences of data. A manually tuned neural network combining RNN, LSTM and GRU layers outperformed all the other approaches, with an average test RMSE of 3.412 dB(A) over all prediction horizons. Neural networks, though, are often considered black boxes, because they are so complex that it is very hard for developers to justify the decisions the models make.
Therefore, this work includes a theoretical introduction to the explainability of machine learning and deep learning models, focused on SHAP (SHapley Additive exPlanations) values. The Deep SHAP method is used to calculate the importance of each feature for the predictions of the RNN-LSTM-GRU model. The feature with the highest contribution to the output is a seasonal variable indicating the hour range of the day, followed by the noise in the three most recent hours.

en
info:eu-repo/semantics/openAccess
Prediction of the noise pollution in Barcelona and model explainability using SHAP values
master's thesis
noise pollution; Time Series; Barcelona; AutoML; LSTM; GRU; SHAP
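
The forecasting setup described in the abstract (48 past hourly noise values plus the last hour's weather and seasonal variables as input, the next 10 hourly noise values as target) can be illustrated with a minimal sketch. This is not the thesis code: the function names `make_windows` and `rmse_per_horizon`, the flat input layout, and the argument names are assumptions made here for illustration only, and the record does not say how the inputs were actually arranged for the recurrent models.

```python
import numpy as np


def make_windows(noise, exog, n_lags=48, n_ahead=10):
    """Turn hourly series into supervised (X, y) samples.

    noise: array of shape (T,), hourly noise levels in dB(A).
    exog:  array of shape (T, k), weather/seasonal variables per hour.
    Returns X with shape (N, n_lags + k) and y with shape (N, n_ahead),
    where N = T - n_lags - n_ahead + 1.
    Hypothetical helper: the flat feature layout is an assumption.
    """
    noise = np.asarray(noise, dtype=float)
    exog = np.asarray(exog, dtype=float)
    n_samples = len(noise) - n_lags - n_ahead + 1
    if n_samples <= 0:
        raise ValueError("series too short for the requested window sizes")

    X, y = [], []
    for t in range(n_lags, n_lags + n_samples):
        lags = noise[t - n_lags:t]        # previous 48 hourly noise values
        last_exog = exog[t - 1]           # exogenous variables from the last hour
        X.append(np.concatenate([lags, last_exog]))
        y.append(noise[t:t + n_ahead])    # next 10 hourly noise values
    return np.array(X), np.array(y)


def rmse_per_horizon(y_true, y_pred):
    """RMSE for each prediction horizon (one value per target column)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.sqrt(np.mean((y_true - y_pred) ** 2, axis=0))
```

Averaging the output of `rmse_per_horizon` over its 10 entries gives a single score of the same kind as the "average test RMSE over all prediction horizons" reported in the abstract, under the assumption that the average is an unweighted mean across horizons.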