Comprehensive Time Series Exploratory Analysis | by Erich Henrique | Nov, 2023


Autocorrelation

Once our data is stationary, we can investigate other key time series attributes: partial autocorrelation and autocorrelation. In formal terms:

The autocorrelation function (ACF) measures the linear relationship between lagged values of a time series. In other words, it measures the correlation of the time series with itself. [2]

The partial autocorrelation function (PACF) measures the correlation between lagged values in a time series after removing the influence of the correlated lagged values in between, which act as confounding variables. [3]

Both metrics can be visualized with statistical plots known as correlograms. But first, it is important to develop a better understanding of them.

Since this article is focused on exploratory analysis and these concepts are fundamental to statistical forecasting models, I will keep the explanation brief. Bear in mind, though, that building a solid intuition for these ideas is highly important when working with time series. For a comprehensive read, I recommend the great kernel “Time Series: Interpreting ACF and PACF” by the Kaggle Notebooks Grandmaster Leonie Monigatti.

As noted above, autocorrelation measures how the time series correlates with itself at previous lags. You can think of it as a measurement of the linear relationship between your data and a copy of itself shifted back by q periods. Autocorrelation, or ACF, is an important metric to determine the order q of Moving Average (MA) models.
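To make the "shifted copy" idea concrete, here is a minimal sketch that computes the correlation between a series and a lagged copy of itself with pandas. The helper name `lagged_correlation` and the synthetic sine-wave series are assumptions for illustration, not part of the original analysis. Note that statsmodels estimates the ACF with a slightly different formula (a single overall mean and variance), so its values can differ a little from this plain Pearson correlation.

```python
import numpy as np
import pandas as pd


def lagged_correlation(series: pd.Series, lag: int) -> float:
    """Pearson correlation between a series and itself shifted back by `lag` periods."""
    # shift(lag) aligns each value with the one observed `lag` steps earlier;
    # corr() ignores the NaN values introduced at the start by the shift.
    return series.corr(series.shift(lag))


# Synthetic data (assumption): a sine wave that repeats every 4 steps, plus small noise.
rng = np.random.default_rng(0)
t = np.arange(200)
series = pd.Series(np.sin(2 * np.pi * t / 4) + 0.1 * rng.standard_normal(200))

for lag in (1, 2, 4):
    print(f"lag {lag}: {lagged_correlation(series, lag):.3f}")
```

Because the synthetic series repeats every 4 steps, the correlation should be strongly positive at lag 4, strongly negative at lag 2 (half a cycle), and close to zero at lag 1. pandas also ships the same calculation as `Series.autocorr(lag)`.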

On the other hand, partial autocorrelation is the correlation of the time series with its version lagged by p periods, but now solely regarding the direct effect of that lag. For example, if I want to check the partial autocorrelation between t-3 and my current t0 value, I won’t care about how t-3 influences t-2 and t-1, or how t-2 influences t-1, and how those in turn carry over to t0. I’ll be exclusively focused on the direct effect of t-3 on my current time stamp, t0, once the effects of t-2 and t-1 have been accounted for. Partial autocorrelation, or PACF, is an important metric to determine the order p of Autoregressive (AR) models.
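One common way to read the "direct effect" is as a regression coefficient: regress the current value on its previous k values and keep only the coefficient of the most distant lag. The sketch below follows that idea with NumPy only. The function name `pacf_by_regression` and the simulated AR(1) series are assumptions for illustration; statsmodels offers several PACF estimators, and this is only similar in spirit to its OLS-based ones.

```python
import numpy as np


def pacf_by_regression(x, lag):
    """Coefficient of x[t-lag] when regressing x[t] on x[t-1], ..., x[t-lag]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()  # center the series so no intercept is needed
    target = x[lag:]
    # Column k-1 holds the series lagged by k periods, aligned with `target`.
    predictors = np.column_stack(
        [x[lag - k : len(x) - k] for k in range(1, lag + 1)]
    )
    coefs, *_ = np.linalg.lstsq(predictors, target, rcond=None)
    return coefs[-1]  # direct effect of the most distant lag


# Simulated AR(1) process (assumption): each value depends directly only on the previous one.
rng = np.random.default_rng(42)
noise = rng.standard_normal(5000)
x = np.zeros(5000)
for t in range(1, 5000):
    x[t] = 0.7 * x[t - 1] + noise[t]

acf_lag3 = np.corrcoef(x[3:], x[:-3])[0, 1]  # includes indirect effects
pacf_lag3 = pacf_by_regression(x, 3)         # direct effect only
print(f"ACF at lag 3:  {acf_lag3:.3f}")
print(f"PACF at lag 3: {pacf_lag3:.3f}")
```

In this simulated series, t-3 affects t0 only through t-2 and t-1, so the plain autocorrelation at lag 3 stays clearly positive while the partial autocorrelation at lag 3 should be close to zero.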

With these concepts cleared up, we can now come back to our data. Since the two metrics are often analyzed together, our last function will combine the PACF and ACF plots in a grid that returns correlograms for multiple variables. It will make use of the statsmodels plot_pacf() and plot_acf() functions, and map them to a Matplotlib subplots() grid.

Notice how both statsmodels functions are called with the same arguments, except for the method parameter, which is exclusive to plot_pacf().

Now you can experiment with different aggregations of your data, but remember that when resampling the time series, each lag will then represent a different jump back in time. For illustrative purposes, let’s analyze the PACF and ACF for all four stations in the month of January 2016, with a dataset aggregated in 6-hour intervals.
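The effect of resampling on the meaning of a lag can be sketched as follows. The hourly series here is synthetic, and aggregating with the mean is an assumption, since the aggregation function used for the article's dataset is not shown in this section.

```python
import numpy as np
import pandas as pd

# Synthetic hourly series for January 2016 (assumption: random values as a stand-in).
rng = np.random.default_rng(2)
index = pd.date_range("2016-01-01", periods=31 * 24, freq=pd.Timedelta(hours=1))
hourly = pd.Series(rng.standard_normal(len(index)), index=index)

# Aggregate into 6-hour bins; the mean is an assumed choice of aggregation.
step = pd.Timedelta(hours=6)
six_hourly = hourly.resample(step).mean()

# After resampling, lag k no longer means "k hours back" but "k * 6 hours back".
lag_offsets = {lag: lag * step for lag in range(1, 5)}
for lag, offset in lag_offsets.items():
    print(f"lag {lag} -> {offset}")

print("observations after resampling:", len(six_hourly))
```

With 6-hour bins, the 744 hourly points of January collapse into 124 observations, and lag 4 corresponds to a full day back in time.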

Figure 19. PACF and ACF Correlograms for Jan 2016. Image by the author.

Correlograms display the correlation coefficients, which range from -1.0 to 1.0, together with a shaded area indicating the significance threshold. Any coefficient that extends beyond the shaded area should be considered statistically significant.
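For intuition about where the shaded area comes from, a common large-sample approximation for a white-noise series puts the 95% threshold at roughly ±1.96/√N, where N is the number of observations. The sketch below applies that approximation; the helper name `white_noise_threshold` is hypothetical, and statsmodels' ACF plot uses Bartlett's formula by default, so its band can be wider at higher lags than this simple figure.

```python
import math


def white_noise_threshold(n_obs: int, z: float = 1.96) -> float:
    """Approximate 95% significance bound for correlogram values of white noise."""
    return z / math.sqrt(n_obs)


# January at 6-hour resolution: 31 days * 4 observations per day.
jan_6h = white_noise_threshold(31 * 4)

# A full non-leap year at hourly resolution: 365 days * 24 observations per day.
year_hourly = white_noise_threshold(365 * 24)

print(f"Jan, 6-hour data : +/-{jan_6h:.3f}")
print(f"Year, hourly data: +/-{year_hourly:.3f}")
```

The bound shrinks with the square root of the sample size, so very long, fine-grained series end up with a band so narrow that almost every lag appears significant.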

From the results above, we can finally conclude that on a 6-hour aggregation:

  • Lags 1, 2, 3 (t-6h, t-12h, and t-18h) and sometimes 4 (t-24h) have significant PACF.
  • Lags 1 and 4 (t-6h and t-24h) show significant ACF for most cases.

And take note of some final good practices:

  • Plotting correlograms for long periods of high-granularity time series (for example, a whole-year correlogram for a dataset with hourly measurements) should be avoided, as the significance threshold narrows toward zero with increasingly larger sample sizes.
  • I added an x_label parameter to our function to make it easy to annotate the X-axis with the time period represented by each lag. It is common to see correlograms without that information, but having easy access to it can prevent misinterpretation of the results.
  • By default, statsmodels plot_acf() and plot_pacf() include the 0-lag correlation coefficient in the plot. Since the correlation of a series with itself is always one, I have set our plots to start from the first lag with the parameter zero=False. This also improves the scale of the Y-axis, making the lags we actually need to analyze more readable.
