Time series forecasting is crucial for a wide range of real-world applications. Whether it’s predicting demand or forecasting the spread of a pandemic, having accurate forecasts can make a huge difference. When it comes to modeling time series, forecasting models fall into two main categories: univariate and multivariate.
Univariate models focus on capturing the patterns within a single variable’s time series, for example, how traffic peaks during rush hour each day. These models are excellent at capturing trends and seasonal patterns within a single variable.
On the other hand, multivariate models not only capture intra-series temporal patterns but also model inter-series interactions, known as cross-variate information. This is particularly useful when one series serves as a leading indicator for another. For example, if we know that body weight affects blood pressure or that increasing the price of a product leads to a decrease in sales, we can leverage that information to improve our forecasts.
In recent years, Transformer-based deep learning architectures have gained popularity for multivariate forecasting, thanks to their strong performance on sequence modeling tasks. It is therefore surprising that these sophisticated multivariate models often underperform simple univariate linear models on long-term forecasting benchmarks.
This raises two important questions: Does cross-variate information really benefit time series forecasting? And if not, can multivariate models still perform as well as univariate models?
In our research, “TSMixer: An All-MLP Architecture for Time Series Forecasting,” we dive deep into these questions. We analyze why univariate linear models are so effective on these benchmarks. This analysis leads us to develop TSMixer, an advanced multivariate model that inherits the strengths of linear models and performs exceptionally well on long-term forecasting benchmarks.
To the best of our knowledge, TSMixer is the first multivariate model that performs as well as state-of-the-art univariate models on these benchmarks, where cross-variate information is shown to be less beneficial.
But how does TSMixer achieve such impressive results? The key difference lies in how temporal patterns are captured. Linear models use fixed weights to capture static temporal patterns, while Transformers use attention mechanisms with dynamically computed weights. In our analysis, we show that, under common assumptions about temporal patterns, linear models admit simple solutions that recover the time series or come with bounded error. This makes them highly effective at learning static temporal patterns in univariate time series.
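As a minimal illustration of what “fixed weights” means here (a toy sketch with dimensions we chose for this example, not the paper’s settings), a univariate linear forecaster is just a single weight matrix that maps the last L observations to the next T predictions:

```python
import numpy as np

# Hypothetical setup for this sketch: forecast the next T steps from the
# last L steps of a univariate series with one fixed weight matrix + bias.
L, T = 24, 8
series = np.sin(np.arange(400) * 2 * np.pi / 24)  # toy seasonal signal

# Build (input window, target window) training pairs from the history.
n = len(series) - L - T
X = np.stack([series[i:i + L] for i in range(n)])
Y = np.stack([series[i + L:i + L + T] for i in range(n)])

# Closed-form least squares: static weights, no attention, no recurrence.
X1 = np.hstack([X, np.ones((n, 1))])       # append a bias column
W = np.linalg.lstsq(X1, Y, rcond=None)[0]  # (L + 1) x T weight matrix

# The same frozen weights are applied to any new input window.
pred = np.hstack([series[-L:], [1.0]]) @ W  # forecast the next T steps
```

Because the toy signal’s period matches the lookback window, this static weight matrix recovers the continuation almost exactly, which is the intuition behind why linear models excel at static temporal patterns.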
To leverage the advantages of linear models while still incorporating cross-variate information, we replace the attention layers in Transformers with linear layers. This results in the TSMixer model, which alternates between time-mixing and feature-mixing, capturing both temporal patterns and cross-variate information efficiently.
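The alternation described above can be sketched roughly as follows. This is a simplified, hypothetical single block with made-up dimensions; the actual TSMixer blocks also include normalization and two-layer MLPs:

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

# Hypothetical dimensions: a lookback window of 16 steps, 4 variables.
T, C = 16, 4
x = rng.normal(size=(T, C))  # input window, shape (time, features)

w_time = rng.normal(size=(T, T)) * 0.1  # mixes along the time axis
w_feat = rng.normal(size=(C, C)) * 0.1  # mixes along the feature axis

def tsmixer_block(x, w_time, w_feat):
    # Time-mixing: the same weights are shared across variables and
    # applied along the time axis, capturing temporal patterns the way
    # a fixed-weight linear model does.
    x = x + relu(w_time @ x)
    # Feature-mixing: the same weights are applied at each time step
    # along the feature axis, capturing cross-variate information.
    x = x + relu(x @ w_feat)
    return x

out = tsmixer_block(x, w_time, w_feat)  # output keeps shape (T, C)
```

Because the block preserves the input’s shape, several such blocks can be stacked, alternating between the two mixing directions.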
When evaluating TSMixer on long-term forecasting benchmarks, we observe a significant improvement in mean squared error over other multivariate models. In fact, TSMixer performs on par with state-of-the-art univariate models, demonstrating that multivariate architectures can match them on these benchmarks.
Furthermore, we conduct an ablation study comparing TSMixer with a variant that only includes time-mixing layers. Surprisingly, the variant performs almost as well as the complete TSMixer model. This suggests that cross-variate information is less beneficial on these popular benchmarks, and helps explain why univariate models perform so well on them.
However, it’s worth noting that these benchmarks may not accurately represent real-world scenarios where cross-variate information is crucial. To address this, we evaluate TSMixer on the challenging M5 benchmark, a large-scale retail dataset that contains important cross-variate interactions. TSMixer outperforms other methods in this setting, showcasing the effectiveness of multivariate models in situations where temporal patterns alone are not sufficient.
To further leverage information from the M5 dataset, we propose an extended TSMixer architecture that incorporates static features and future time series. This extended architecture outperforms popular models used in industrial applications, highlighting its potential for real-world impact.
In conclusion, TSMixer is an advanced multivariate model that combines linear model characteristics with the power of deep learning. It performs as well as state-of-the-art univariate models on long-term forecasting benchmarks and showcases the importance of considering cross-variate information in time series forecasting. We hope that our work inspires further exploration in this field and leads to the development of even more powerful and effective models for real-world applications.
We would like to acknowledge the contributions of Si-An Chen, Chun-Liang Li, Nate Yoder, Sercan O. Arik, and all those involved in this research.