ML in Streamflow Prediction | Sponsored by NRCan

Dates: June - Oct 2022
We will be applying time series and other relevant techniques to Canadian streamflow data in drought-prone areas.

Leads / Advisors

Lead: Yan Nusinovich, Data Scientist at Slate.AI
Advisors: Dr. Andre Erler, Senior Climate Scientist, Aquanty | Dr. Karen Smith, Assistant Professor, University of Toronto


  • “Adequate water resources management is an essential component of socioeconomic security and development. This is made even more critical with the increasing global population and impacts of climate change on water resources”
  • Minimal goal: create a time series model for predicting streamflow using Natural Resources Canada data.
  • Stretch goal: create a model that automatically updates streamflow predictions at regular intervals.

Top Contributors

~ Winning $2500 in prizes for notable progress

A bit about our experience

As the last of the NRCan-sponsored working groups, this group had a lot of initial difficulty getting started and probably suffered from the extended absence of one of the leads/domain experts (myself).
The goal of this group was to predict streamflow in multiple watersheds in Canada based on topographic and weather data, following a number of recent research papers (which used US data). This goal was achieved for the most part, although not at the scale we initially aimed for. As is often the case with real-data projects, the first hurdle was acquiring the right datasets. While supposedly cleaned and structured datasets were available, the "quality of the quality control" on the available data left much to be desired, resulting in a lot of time spent on data exploration and cleaning (and, unfortunately, some attrition). One thing that became quite apparent to me as a domain expert is that my background made it much easier for me to handle and clean the data, while the same task was much more of an obstacle for participants with a pure data science background. But I guess that was the point of the workshop: to help data scientists overcome these barriers and get their feet wet!
In the end, we were able to train models with some predictive skill, and I think we all learned a lot; I certainly learned a lot about the time series prediction aspect! However, as it turned out, the models were relying primarily on auto-regressive properties rather than the underlying dynamics of the system (like rain and snowmelt). This was a bit disappointing to me as a scientist, but it does highlight that teaching physics to a machine learning algorithm requires a bit more expertise and understanding (of both ML and the system under consideration). In conclusion, I think this was really a story about collaboration and learning (as it should be): I don't think we would have been able to achieve what we did without the contributions of everyone on the team (domain experts/leads and data scientists/participants). I, for one, certainly hope to continue experimenting with the models we have developed, and maybe start another group to work on this topic (now that the data is much cleaner)!
~ Andre Erler, Senior Climate Scientist, Advisor
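
One way to probe the auto-regressive reliance described above is to score a model against a persistence baseline, which simply predicts that tomorrow's flow equals today's. The sketch below is illustrative only and is not the group's code: the synthetic series, the function names, and the choice of Nash-Sutcliffe efficiency (NSE, a skill score commonly used in hydrology) are all assumptions made for this example. If a trained model's NSE is barely above the persistence value, it is probably learning little beyond the previous day's flow.

```python
import math
import random


def nse(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 is no better than the observed mean."""
    mean_obs = sum(observed) / len(observed)
    squared_error = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    variance = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - squared_error / variance


def persistence_forecast(flow, lead=1):
    """Predict flow[t] as flow[t - lead]; returns aligned (observed, predicted) lists."""
    return flow[lead:], flow[:-lead]


def make_synthetic_flow(n=365, seed=0):
    """Hypothetical daily series (NOT real data): seasonal cycle plus AR(1) noise."""
    rng = random.Random(seed)
    noise = 0.0
    flow = []
    for t in range(n):
        noise = 0.9 * noise + rng.gauss(0.0, 1.0)
        flow.append(50.0 + 20.0 * math.sin(2 * math.pi * t / 365) + noise)
    return flow


flow = make_synthetic_flow()
observed, predicted = persistence_forecast(flow, lead=1)
baseline_nse = nse(observed, predicted)
print(f"Persistence baseline NSE: {baseline_nse:.3f}")
```

On a smooth series like this one, persistence alone scores very well, which is why a high skill score by itself says little about whether a model has picked up drivers such as rain and snowmelt. Comparing against this baseline at longer lead times (for example `lead=7`) is one simple way to see where the extra inputs start to matter.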
