EU Swim Project

Predicting water quality

Machine learning models can be developed into useful tools for predicting bathing water quality, giving beach users access to real-time water quality information rather than retrospective results. At the majority of designated bathing sites, one water sample is collected per week. It is then analysed in a laboratory and the results are reported to the public. This process can take up to 48 hours, and water quality can change over much shorter time scales. Relying on a weekly, retrospective result therefore means that people are using bathing sites without knowing the current water quality. As previously discussed, entering water of poor quality can lead to serious illness, such as gastroenteritis. An accurate daily prediction of water quality from a purpose-built model could help to protect public health by reducing the risk of illness among recreational water users.

The first stage in building a predictive system was for our modellers to select the most suitable modelling approach. This was not an easy task, because there is a considerable number of potentially suitable approaches and algorithms. After a review of the literature on approaches used successfully in similar projects, a range of non-linear and tree-based methods was selected for initial testing. Because a large number of variables (described below) are to be input into the model, tree-based methods were tested first. They can handle many predictor variables without requiring in-depth variable selection.
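The sketch below illustrates why tree-based methods suit a problem with many candidate inputs. It shows the core step of a CART-style decision tree: at each node, the tree tries every variable and every threshold and keeps the split that best separates the classes. Variable selection therefore happens as part of fitting. This is a minimal illustration under stated assumptions, not the project's model. The variable names and toy data are hypothetical, and real tree ensembles (for example, random forests) repeat this step recursively across many trees.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(rows, labels):
    """Try every variable and threshold; keep the split with the lowest weighted impurity.

    rows: list of dicts mapping a (hypothetical) variable name to a numeric value
    labels: matching class labels, e.g. "poor" / "not poor"
    Returns (variable, threshold, weighted_impurity), or None if no split improves on the parent.
    """
    n = len(labels)
    best, best_score = None, gini(labels)
    for var in rows[0]:
        # Candidate thresholds are the distinct observed values of this variable.
        for threshold in sorted({row[var] for row in rows}):
            left = [lab for row, lab in zip(rows, labels) if row[var] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[var] > threshold]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best_score:
                best, best_score = (var, threshold, score), score
    return best


if __name__ == "__main__":
    # Toy data only: the tree finds that rainfall separates the classes, while wind and tide do not.
    rows = [
        {"rainfall_mm": 0.0, "wind_speed": 5.0, "tide_height": 1.2},
        {"rainfall_mm": 2.0, "wind_speed": 12.0, "tide_height": 0.8},
        {"rainfall_mm": 15.0, "wind_speed": 6.0, "tide_height": 1.1},
        {"rainfall_mm": 22.0, "wind_speed": 11.0, "tide_height": 0.9},
    ]
    labels = ["not poor", "not poor", "poor", "poor"]
    print(best_split(rows, labels))  # → ('rainfall_mm', 2.0, 0.0)
```

Irrelevant variables are never chosen simply because they never reduce impurity as much as an informative one. This is the property that lets tree-based methods start from a long list of inputs.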

Building a model to predict water quality combines many elements. These include historical and up-to-date results from physical water samples, local rainfall, wind direction and speed, tidal patterns, and a range of other potential hydro-meteorological variables. Water quality can also be affected by other factors, such as sewage infrastructure, dogs on the beach, and even the number of birds typically present at the location. By using local environmental variables for each site, modellers can build tools whose predictions are up to date, accurate, and easily comparable with the existing weekly results. This helps make the transition to the new system as smooth as possible.
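The sketch below gives a rough idea of how such varied inputs might be combined. It flattens one day's observations for a single site into a numeric feature record that a model can consume. The field names, units and transformations are assumptions for illustration only, not the project's actual inputs. Two choices are worth noting. Wind direction is circular (359° and 1° are almost the same direction), so it is split into two components. Bacterial counts are strongly skewed, so a log transform is applied.

```python
import math
from dataclasses import dataclass


@dataclass
class SiteDay:
    """Hypothetical daily record for one bathing site; field names and units are illustrative."""
    rainfall_24h_mm: float   # gauged rainfall over the previous 24 hours
    rainfall_72h_mm: float   # longer antecedent rainfall window
    wind_speed_ms: float     # wind speed in metres per second
    wind_dir_deg: float      # direction the wind blows FROM, degrees clockwise from north
    tide_height_m: float     # tide height at the time of interest
    last_sample_count: float # most recent laboratory bacterial count (e.g. per 100 ml)


def to_features(day: SiteDay) -> dict[str, float]:
    """Convert a SiteDay into a flat dict of numeric features."""
    rad = math.radians(day.wind_dir_deg)
    return {
        "rainfall_24h_mm": day.rainfall_24h_mm,
        "rainfall_72h_mm": day.rainfall_72h_mm,
        # Components of the wind blowing from the north and from the east, so that
        # directions either side of north (e.g. 359 and 1 degrees) end up close together.
        "wind_from_north_ms": day.wind_speed_ms * math.cos(rad),
        "wind_from_east_ms": day.wind_speed_ms * math.sin(rad),
        "tide_height_m": day.tide_height_m,
        # +1 avoids log10(0) when no bacteria were detected.
        "log10_last_sample": math.log10(day.last_sample_count + 1.0),
    }


if __name__ == "__main__":
    # Toy values only.
    day = SiteDay(10.0, 25.0, 5.0, 90.0, 1.5, 99.0)
    print(to_features(day))
```

Site-specific factors such as sewage infrastructure or typical bird numbers could be added as further fields in the same way. How each one is quantified would be a modelling decision.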

As might be expected, the project’s model-building process has had its own challenges. The main one is the relatively small amount of historical data classified as poor. The model is hungry for this type of data, because poor-quality events are exactly what it must learn to recognise. With this in mind, a preliminary model has been developed that uses gauged rainfall data to determine a threshold of rainfall above which bathing water quality is likely to be impaired. This early model will provide binary results (i.e. ‘poor’ or ‘not poor’) and can be implemented at all 9 EU SWIM sites. It also gives modellers an informative benchmark against which more complex multivariate models can be compared. The project team is developing these more complex models, which may be able to provide results that match the existing classifications.
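A rainfall-threshold benchmark of this kind can be sketched in a few lines. The example below scans candidate thresholds over historical (rainfall, result) pairs. It then predicts ‘poor’ whenever rainfall is at or above the chosen value. How the project actually selects its threshold is not described, so the scoring rule here is an assumption. Balanced accuracy (the mean of sensitivity and specificity) is used because poor results are rare, and plain accuracy would favour a model that always predicts ‘not poor’. The function names and data are hypothetical.

```python
def fit_rainfall_threshold(rainfall_mm, is_poor):
    """Pick the rainfall threshold that best separates 'poor' from 'not poor' samples.

    rainfall_mm: gauged rainfall preceding each historical sample
    is_poor: matching booleans, True where the sample was classified as poor
    Scoring uses balanced accuracy, an assumption made because 'poor' results are rare.
    """
    n_poor = sum(is_poor)
    n_ok = len(is_poor) - n_poor
    if n_poor == 0 or n_ok == 0:
        raise ValueError("need both poor and not-poor samples to fit a threshold")
    best_t, best_score = None, -1.0
    for t in sorted(set(rainfall_mm)):
        # Rule under test: predict 'poor' whenever rainfall >= t.
        tp = sum(1 for r, p in zip(rainfall_mm, is_poor) if r >= t and p)
        tn = sum(1 for r, p in zip(rainfall_mm, is_poor) if r < t and not p)
        score = 0.5 * (tp / n_poor + tn / n_ok)  # mean of sensitivity and specificity
        if score > best_score:
            best_t, best_score = t, score
    return best_t, best_score


def predict(rainfall_today_mm, threshold_mm):
    """Binary benchmark prediction for one site and day."""
    return "poor" if rainfall_today_mm >= threshold_mm else "not poor"


if __name__ == "__main__":
    # Toy data for illustration only, not project data.
    rain = [0.0, 1.5, 3.0, 4.0, 8.0, 12.0, 0.5, 20.0]
    poor = [False, False, False, False, True, True, False, True]
    threshold, score = fit_rainfall_threshold(rain, poor)
    print(threshold, score)          # → 8.0 1.0
    print(predict(10.0, threshold))  # → poor
```

In the toy data the two classes separate perfectly. With real data they will overlap and the score will be lower. That score is the figure a more complex multivariate model would need to beat on the same samples.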

Building a model that can accurately predict water quality at highly variable locations is a complicated process. Once achieved, however, it could transform the information available to beach users. Checking the water quality at your favourite beach could become as easy and as routine as checking the daily weather forecast.

