Back in 1950, weather forecasting took a massive leap forward. Researchers used ENIAC, one of the first programmable general-purpose electronic computers, to numerically solve equations describing how the weather evolves. This was the start of a digital revolution in weather forecasting. Fast forward more than 70 years, and we’ve made serious progress. Thanks to improvements in computing power and in the models themselves, weather forecasts have become steadily more accurate. In fact, a 7-day forecast today is about as accurate as a 5-day forecast was in 2000 and a 3-day forecast was in 1980, which works out to roughly one extra day of useful lead time per decade. That might not sound like a big deal, but every extra day of accurate forecasting is valuable. It helps with things like logistics planning, disaster management, agriculture, and energy production. This “quiet” revolution has made a huge difference to society, saving lives and creating economic value in many sectors.
But now, we’re on the brink of another revolution in weather forecasting, and this time it’s fueled by advances in machine learning. Instead of hard-coding approximations of the physical equations, the idea is to have algorithms learn how weather evolves from huge amounts of past weather data. Machine learning has been explored for weather forecasting since 2018, but it has really taken off in the last couple of years. Several large machine learning models have shown that they can rival the performance of the best physics-based models. For example, Google’s MetNet has demonstrated state-of-the-art regional weather forecasting capabilities, while Google DeepMind’s GraphCast can make global predictions at a competitive level. The great thing about these machine learning methods is that once they’re trained, they can create forecasts in just minutes using inexpensive hardware. In contrast, traditional weather forecasts require powerful supercomputers that run for hours every day. That speed and low cost open up a world of possibilities for the weather forecasting community.
And that’s not all. Leading weather forecasting centers, like the European Centre for Medium-Range Weather Forecasts (ECMWF) and the National Oceanic and Atmospheric Administration (NOAA), have also recognized the potential of machine learning. They’re investing in it and developing strategies to take advantage of it. But to make sure that machine learning models are trustworthy and optimized for the right goals, it’s crucial to evaluate them carefully. That’s no easy task, because weather is a complex, multi-faceted problem and different users have different needs. For example, renewable energy producers care about wind speeds and solar radiation, while crisis response teams are concerned about storms and heat waves. There’s no single metric that can tell us what a “good” weather forecast is, so an evaluation has to cover many variables and more than one way of scoring them. On top of that, differences in evaluation setups from one study to the next can make it difficult to compare models. That’s why having a fair and reproducible way to evaluate and compare different methodologies is so important.
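To make the “no single metric” point concrete, here’s a minimal sketch in plain Python. Everything in it is an assumption for illustration: the temperatures are made up, the 35 °C heat threshold is arbitrary, and the names `rmse`, `hit_rate`, `model_a`, and `model_b` are hypothetical. It is not code or data from any real benchmark. It scores two toy forecasts with an average-error metric (RMSE) and with an event-based metric (the fraction of observed heat days the forecast also flagged).

```python
import math

def rmse(forecast, observed):
    """Root-mean-square error: the typical size of the forecast error."""
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))

def hit_rate(forecast, observed, threshold):
    """Fraction of observed threshold exceedances that the forecast also flagged."""
    events = [(f, o) for f, o in zip(forecast, observed) if o >= threshold]
    if not events:
        return float("nan")  # no events observed, so the score is undefined
    hits = sum(1 for f, _ in events if f >= threshold)
    return hits / len(events)

# Made-up daily maximum temperatures in degrees Celsius (illustrative only).
observed = [28.0, 30.0, 36.0, 37.0, 29.0]
model_a = [28.5, 30.5, 34.0, 34.5, 29.5]  # close on mild days, too cool on hot days
model_b = [26.0, 32.0, 36.5, 37.5, 31.0]  # noisier on mild days, right on hot days
HEAT_THRESHOLD = 35.0  # arbitrary threshold chosen for this example

scores = {
    name: {
        "rmse": rmse(forecast, observed),
        "hit_rate": hit_rate(forecast, observed, HEAT_THRESHOLD),
    }
    for name, forecast in [("model_a", model_a), ("model_b", model_b)]
}

for name, result in scores.items():
    print(f"{name}: rmse={result['rmse']:.2f}, hit_rate={result['hit_rate']:.2f}")
```

In this toy setup `model_a` wins on RMSE but misses both heat days, while `model_b` has the larger average error yet catches both. Which one is “better” depends on who is asking, which is exactly why an evaluation needs more than one score.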
That brings us to WeatherBench 2 (WB2). WB2 is an update to the original WeatherBench benchmark, which was published in 2020. Its goal is to accelerate the progress of data-driven weather models by providing a trusted and reproducible framework for evaluation. The official WB2 website contains scores from several state-of-the-art models, including models from Google and other organizations. It also includes forecasts from traditional weather models, so we can see how the new machine learning models stack up against the best physics-based models. WB2 also provides an open-source evaluation framework that makes it easier to evaluate forecasts. High-resolution forecast data is large, and evaluating it can be computationally demanding, so the evaluation code is built on Apache Beam, which distributes the computation across many workers.
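Why does distributing the evaluation work at all? Many scores can be computed chunk by chunk and then merged. Here’s a minimal sketch of that idea for RMSE, in plain Python with a thread pool standing in for distributed workers. It is not WB2’s code and it does not use the Apache Beam API; the function names (`partial_squared_error`, `combine`, `chunked_rmse`) and the synthetic data are assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def partial_squared_error(chunk):
    """Map step: reduce one chunk to (sum of squared errors, number of points)."""
    forecast, observed = chunk
    sse = sum((f - o) ** 2 for f, o in zip(forecast, observed))
    return sse, len(forecast)

def combine(partials):
    """Reduce step: merge the per-chunk partial results into one RMSE."""
    total_sse = 0.0
    total_n = 0
    for sse, n in partials:
        total_sse += sse
        total_n += n
    return math.sqrt(total_sse / total_n)

def chunked_rmse(forecast, observed, chunk_size, max_workers=4):
    """Split the data into chunks, score each chunk independently, then merge."""
    chunks = [
        (forecast[i:i + chunk_size], observed[i:i + chunk_size])
        for i in range(0, len(forecast), chunk_size)
    ]
    # Threads only stand in for workers here; a real pipeline would spread
    # the chunks across processes or machines.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_squared_error, chunks))
    return combine(partials)

# Synthetic stand-in data (illustrative only).
forecast = [float(i % 7) for i in range(1000)]
observed = [float(i % 5) for i in range(1000)]

direct = math.sqrt(
    sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast)
)
chunked = chunked_rmse(forecast, observed, chunk_size=128)
print(math.isclose(direct, chunked))
```

The key design choice is that each chunk returns a sum and a count rather than its own RMSE. Sums and counts can be merged in any order, whereas averaging per-chunk RMSE values would give the wrong answer whenever the chunks differ in size or error.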
But the work doesn’t stop there. We’re constantly looking for ways to improve the benchmark by adding more data and metrics. We want to include station observations and better precipitation datasets, and to explore extending it to nowcasting and subseasonal-to-seasonal predictions. WeatherBench 2 is just the starting point, and we’re excited to see how it can aid researchers in developing the next generation of weather forecasting models.