ML in Freshwater Management


Session #1: Sept 6th, 6PM ET.
🤔
Satellite data has enabled new ways of monitoring water levels around the world. However, every satellite measurement carries some amount of error, and this group will use ML techniques to reduce that error so that better water management policies can result!
🤔
Pre-Requisites - This engagement requires a working understanding of Python, data science practices, and machine learning models. No domain-specific expertise in environmental science is necessary, but an interest in the field is preferred.
 

Leads / Advisors


Debjani Mukherjee | Lead Consultant, Green Connect Solutions
 
Jesse Passmore | Lead, Global Resources Analysis Group
Dr. Zhauqin Li | Advisor
Former postdoctoral researcher with Natural Resources Canada
 
 
❗ Weekly Meeting Time: 6-7 PM ET on Tuesdays
❗ Slack channel [communication point]: click here to join [if you have issues, email ammar@ai.science]
 
 

Context


Lake water level monitoring is fundamental to the management of our freshwater resources and to hydrological research, especially in the context of climate change and intensified anthropogenic activity.
In situ hydrological measurement (i.e., gauging bodies of water manually) is either impossible or unreliable for most of Earth’s freshwater. As a result, we rely on altimetry measurements (measurements of height/altitude) from satellites such as the European Space Agency’s ‘Sentinel’ series. Such measurements contain hidden variables and an error of unknown size that must be taken into account for the readings to be useful. Currently, the standard tool for addressing these limitations is an algorithm called a ‘Kalman Filter’.
A Kalman Filter is one of the most important and common estimation algorithms. It produces estimates of hidden variables from inaccurate and uncertain measurements, and it predicts the future system state based on past estimates.
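To make the predict/update cycle concrete, here is a minimal one-dimensional Kalman Filter sketch for a noisy water level series. The random-walk process model, the noise variances, and the simulated readings are illustrative assumptions, not values from the group’s datasets.

```python
import numpy as np

def kalman_1d(measurements, process_var=1e-4, meas_var=0.25):
    """Scalar Kalman Filter: estimate a slowly drifting water level
    from noisy altimetry readings."""
    x = measurements[0]   # initial state estimate
    p = 1.0               # initial estimate variance
    estimates = []
    for z in measurements:
        # Predict: a random-walk model leaves the mean unchanged
        p += process_var
        # Update: blend prediction and measurement via the Kalman Gain
        k = p / (p + meas_var)      # gain in [0, 1]
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return np.array(estimates)

# Simulated noisy readings around a true level of 217.0 m
rng = np.random.default_rng(0)
z = 217.0 + rng.normal(0, 0.5, 100)
est = kalman_1d(z)
```

Note how the gain `k` shrinks as the filter grows confident, which is exactly the quantity the hybrids below try to keep well behaved.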
In recent years there have been attempts to make the estimates inside a Kalman Filter more accurate through deep learning. Recurrent neural networks can learn local or global states from a given series of measurements, with accuracy honed enough that stock market brokerages use them. In the satellite altimetry domain, however, attempts to combine an RNN with a Kalman Filter have so far yielded only questionable improvement over a plain Kalman Filter, and they are so computationally expensive as to be beyond the reach of most people’s hardware.
In the previous Working Group [attendance is not a pre-requisite], we explored the KalmanNet paper, which inserts a neural network at iterative stages of the filter to shrink the estimation error as much as possible. Though the KalmanNet paper showed incredible promise, the associated GitHub repo did not deliver commensurately.
In the end, both teams of the Working Group opted to create a linear Kalman Filter, and achieved reasonable results therein; with plenty of room for improvement.
Even had the KalmanNet repo functioned, we concluded that running a neural network over the full measurement history at every single time step would be far more computationally expensive than the possible tiny gains in accuracy could justify. A different way of structuring the neural network and Kalman Filter combination is needed.
If we’re able to improve on what’s currently available, this could lead to better water management data, and therefore better policy making in this arena ‼️
 
 

Minimum Success


Minimum Goal:
  • Build a Kalman filter that produces more accurate estimates of measurement and error than industry-standard Kalman filters, while remaining as computationally inexpensive as possible.
Stretch Goal:
  • Expand the use and robustness of our Kalman Filter to water levels in a variety of lakes around the world, focusing on bodies of water rendered critically low by changing ecosystems (whether due to climate or human interference).
 
 

Early Audience Hypothesis


The target audience for this working group is any public or private agency operating a satellite whose purpose, at least in part, is to take altimetry measurements, as well as any agency that procures the services of such satellites for water level observation, flood threat assessment, and monitoring of changing water tables. Examples include NASA, ESA (European Space Agency), NRCan (Natural Resources Canada), and the governments of countries with rapidly disappearing fresh water, such as South Africa and the USA.
 
 

Starting Dataset


Four datasets in total: Sentinel A, Sentinel B, Sentinel A+B, and Cleaned Sentinel A+B (readings beyond two standard deviations of the water level removed).
M_Sat water level & error readings for Lake Winnipeg: M_Sat. (Already cleaned, with non-useful data removed and error calculated per time step. Time steps have been trimmed to match those of the Sentinel satellites.)

Starting Recipes (please be familiar with these before session 1)

  • The KalmanNet paper: This was the scholarly article that sparked the original Machine Learning in Freshwater Management group’s initiative. Though the associated code repository did not prove fruitful, spending a few minutes understanding what the KalmanNet team were attempting will give good insight into what this working group aims at.
  • PyTorch Basics: For those interested in using deep learning to improve the accuracy of Kalman Filters, a baseline understanding of PyTorch or TensorFlow is mandatory.
  • Getting Started with Google Colab: For those who do not have access to a local system capable of adequate data science work, Google Colab is a must (and it’s free).

Recipes Relevant to the Working Group, throughout our 8 weeks:

 

Tentative Timeline


# | Major Milestones | Expected Finish
1 | Get familiar with the domain and previous work. | Week 1
2 | Download sample data, explore the data, clean and structure it, identify correlations, perform feature extraction, and run the baseline model [based on a Colab with all the code needed] | Week 1
3 | Collaborative optimization and experimentation [playing with classes, features, and models for improvement] | 4 weeks
4 | Expanding the model(s) to lakes around the world, with continued exploration of the model(s) as results come in. | 3 weeks
 
 

Why join?


Aggregate Intellect hosts one of the most diverse ML communities in the world. Over the course of the working group:
  • You’ll get an immersion into that community & walk out with some cool new friends.
  • Get spotlighted for your efforts on our community website!
  • Advance your ML skills in remote sensing for the water resources management domain

 

Week 1 Results


  • A Random Forest Regressor & linear Kalman Filter hybrid. Taking the example Kalman Filter and Random Forest Regression models from above, and combining them at uniformly distributed time steps.
    • The KF trusts its own predictions more, and longer.
    • The amplitude of errors remains constant, instead of climbing.
    • The Kalman Gain remains constant, instead of narrowing in on 1.
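A minimal sketch of this week-1 hybrid, under stated assumptions: the scalar Kalman Filter from earlier, scikit-learn’s `RandomForestRegressor` trained on lagged readings, and an injection every 10 steps with a 50/50 blend. The interval, blend weight, and noise parameters are illustrative choices, not the notebook’s actual values.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rfr_kf_hybrid(z, inject_every=10, process_var=1e-4, meas_var=0.25, lags=5):
    """Scalar KF whose measurement is blended with a Random Forest
    prediction at uniformly spaced time steps."""
    # Train the RFR to predict the next reading from the previous `lags` readings
    X = np.array([z[i:i + lags] for i in range(len(z) - lags)])
    y = z[lags:]
    rfr = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

    x, p = z[0], 1.0
    estimates = []
    for t, meas in enumerate(z):
        p += process_var
        if t >= lags and t % inject_every == 0:
            # Blend the satellite reading with the RFR's prediction
            meas = 0.5 * (meas + rfr.predict(z[t - lags:t][None, :])[0])
        k = p / (p + meas_var)
        x += k * (meas - x)
        p = (1 - k) * p
        estimates.append(x)
    return np.array(estimates)

rng = np.random.default_rng(1)
z = 217.0 + rng.normal(0, 0.5, 120)   # simulated noisy readings
est = rfr_kf_hybrid(z)
```

For brevity the forest is trained in-sample here; a faithful version would train on a held-out window.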

Week 2 Results


  • Sanjay B’s Introduction to TensorFlow and LSTMs. An excellent guide to learn from, accessible to beginners.
    • Examples of changes in the parameters within a deep learning model are visually explored.
    • Any questions, please ask Sanjay. He’s very knowledgeable.
  • An LSTM & linear Kalman Filter hybrid. Taking the example LSTM from the deep learning intro Colab notebook, and injecting it in a similar manner to the Random Forest Regressor from last week.
    • The model’s error predictions are averaged with the Kalman Filter’s predictions every 39th time step, to help offset the overconfidence issue of the previous hybrid.
    • Like the previous hybrid, this one trusts itself throughout the measurement process (but does so while incorporating more of the recorded results into its predictions).
    • The Kalman Gains are now gently narrowing in on ‘0’, though they are extremely erratic.
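The every-39th-step averaging can be sketched independently of the network itself. Here the trained LSTM is replaced by a plain callable (`lstm_error_pred`) so the sketch stays self-contained; in the notebook that callable would wrap `model.predict` on the lookback window. The noise parameters and the zero-residual stand-in are assumptions.

```python
import numpy as np

def lstm_kf_hybrid(z, lstm_error_pred, blend_every=39,
                   process_var=1e-4, meas_var=0.25):
    """Scalar KF whose error (innovation) is averaged with an external
    model's error prediction every `blend_every` steps (39 in week 2)."""
    x, p = z[0], 1.0
    estimates = []
    for t, meas in enumerate(z):
        p += process_var
        k = p / (p + meas_var)
        innovation = meas - x
        if t > 0 and t % blend_every == 0:
            # Average the KF's innovation with the learned error prediction
            innovation = 0.5 * (innovation + lstm_error_pred(t))
        x += k * innovation
        p = (1 - k) * p
        estimates.append(x)
    return np.array(estimates)

# Stand-in for the trained LSTM: predicts zero residual error
est = lstm_kf_hybrid(217.0 + np.random.default_rng(2).normal(0, 0.5, 100),
                     lstm_error_pred=lambda t: 0.0)
```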

Week 3 Results


  • Several ML/DL-KF hybrids. Experimenting with various ML and DL models on top of the initial linear Kalman Filter model.
    • The hunt continues for the elusive ideal: a Kalman Filter whose Kalman Gains narrow in on ‘0’, with steady and low error, and a predicted water level line that balances ignoring and considering the measured results.
    • Introducing deliberate bias to correct overconfidence (or under-confidence) problems in the KF produces bizarre results.
    • Robustness needs to be introduced to the overarching model, whether through a different KF or in the ML/DL component: anything from different feature engineering to a multivariate LSTM.

Week 4 Results


  • Anonymous Bob’s time series ML approach to an ML/KF hybrid. Using the RFR and KF models from this landing page, an intrepid exploration of a hybrid Kalman Filter is made.
    • Instead of partially or wholly replacing the error prediction within the Kalman Filter every X time steps, this model partially replaces the water level prediction every X time steps.
    • Unique feature engineering is executed, more closely resembling stock market features.
    • The optimal X for interference frequency is investigated, with clear results emerging from the foray.
    • The author mostly copied and pasted code from this page, demonstrating the power of teamwork and how anyone can start contributing, no matter how new to the data science field they are.
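Bob’s variant, replacing the state (water level) rather than the error term, can be sketched as below, together with a small sweep over the injection frequency X. The blend weight `alpha`, the candidate values of X, the moving-average stand-in for the RFR, and the MSE score are all illustrative assumptions.

```python
import numpy as np

def kf_with_level_injection(z, level_pred, every_x, alpha=0.5,
                            process_var=1e-4, meas_var=0.25):
    """Scalar KF that partially replaces its *state* (the water level),
    not its error term, with an external prediction every `every_x` steps."""
    x, p = z[0], 1.0
    out = []
    for t, meas in enumerate(z):
        p += process_var
        k = p / (p + meas_var)
        x += k * (meas - x)
        p = (1 - k) * p
        if t > 0 and t % every_x == 0:
            # Partial replacement of the water level estimate
            x = (1 - alpha) * x + alpha * level_pred(t)
        out.append(x)
    return np.array(out)

# Sweep candidate injection frequencies, scoring against the raw readings
rng = np.random.default_rng(3)
z = 217.0 + rng.normal(0, 0.5, 200)           # simulated readings
pred = lambda t: z[max(0, t - 5):t].mean()    # stand-in for the trained RFR
scores = {x: np.mean((kf_with_level_injection(z, pred, x) - z) ** 2)
          for x in (10, 25, 50)}
```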
  • Anonymous Homer’s LSTM/GRU KF replacement. Using the LSTM and GRU posted in earlier weeks, an attempt is made to bypass the Kalman Filter altogether.
    • The author made the LSTM/GRU much larger, with more layers and neurons and a longer lookback sequence for prediction.
    • If the measurements provided by the satellite were accurate, this approach would be ideal. However, the reason for a Kalman Filter is that the measurements aren’t accurate, and we don’t know how inaccurate.
    • Using next week’s scoring matrix, it may be possible for the author to pursue this route in the future.
     
     
     
     

This effort is being sponsored by our friends at


    notion image