Keywords: Granger causality test | singular matrix | time series analysis
Abstract: This article delves into the root causes of the "LinAlgError: Singular matrix" error encountered when performing Granger causality tests using the statsmodels library. By examining the impact of perfectly correlated time series data on parameter covariance matrix computations, it explains the mathematical mechanism behind singular matrix formation. Two primary solutions are presented: adding minimal noise to break perfect correlations, and checking for duplicate columns or fully correlated features in the data. Code examples illustrate how to diagnose and resolve this issue, ensuring stable execution of Granger causality tests.
Background and Error Phenomenon
In time series analysis using Python's statsmodels library, the Granger causality test is a common statistical tool used to determine if one time series can predict another. However, users may encounter the "LinAlgError: Singular matrix" error during execution, which halts the testing process. This error typically occurs when running the grangercausalitytests function, especially with larger lag orders (e.g., maxlag=20).
Root Cause of the Error
The error stems from perfectly (or almost perfectly) collinear regressors in the input data. Internally, the Granger causality test fits ordinary least squares regressions of one series on lagged values of both series, then tests whether the lag coefficients of the candidate cause are jointly zero. That test requires the parameter covariance matrix and its inverse. When the lagged regressors are exactly linearly dependent, either because the two series are perfectly correlated or because a series is an exact linear function of its own lags, the covariance matrix loses full rank. It is then singular (its determinant is zero) and cannot be inverted. Mathematically, an n×n matrix A is singular if there exists a non-zero vector x such that Ax = 0. In code, this manifests as a failure in numpy.linalg.inv(cov_p), where cov_p is the estimated covariance matrix.
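The definition of singularity can be checked on a tiny example. The sketch below uses a hypothetical 2×2 matrix (not taken from statsmodels) whose second row is twice the first, and shows the non-zero vector x with Ax = 0, the reduced rank, and the resulting inversion failure.

```python
import numpy as np

# Illustrative matrix: the second row is exactly 2 * the first row
A = np.array([[1.0, 2.0],
              [2.0, 4.0]])

# A non-zero vector that A maps to zero, which proves A is singular
x = np.array([2.0, -1.0])
null_product = A @ x

rank = np.linalg.matrix_rank(A)

try:
    np.linalg.inv(A)
    inverted = True
except np.linalg.LinAlgError as exc:
    inverted = False
    print("Inversion failed:", exc)

print("A @ x =", null_product, "| rank =", rank)
```

A covariance matrix built from linearly dependent regressors behaves the same way, only at a larger size.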
Code Example and Problem Reproduction
The following code demonstrates how to reproduce the error. We generate two noise-free sine waves of the same frequency, one a scaled and phase-shifted copy of the other. Their lagged values are exactly linearly dependent, which causes the covariance matrix to become singular.
import numpy as np
import pandas as pd
from statsmodels.tsa.stattools import grangercausalitytests
n = 1000
ls = np.linspace(0, 2*np.pi, n)
df1 = pd.DataFrame(np.sin(ls))
df2 = pd.DataFrame(2*np.sin(1 + ls))
df = pd.concat([df1, df2], axis=1)
try:
    grangercausalitytests(df, maxlag=20, verbose=False)
except np.linalg.LinAlgError as e:
    print("Error caught:", e)
Running this code throws a "LinAlgError: Singular matrix" error because df1 and df2 are deterministically related: one series is a scaled, phase-shifted copy of the other, and neither contains any noise, so their lagged values are perfectly collinear.
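It can help to check where the linear dependence in this example actually lives. The sketch below rebuilds the two series with NumPy only and stacks their lags by hand; the `lagged` matrix is an illustrative stand-in for the regressors, not the exact matrix statsmodels constructs. Because the second series is phase-shifted, the same-time Pearson correlation is roughly cos(1) ≈ 0.54 rather than 1, yet every lagged column is a combination of one sine and one cosine, so the lag matrix should be heavily rank-deficient.

```python
import numpy as np

n, maxlag = 1000, 20
ls = np.linspace(0, 2 * np.pi, n)
x = np.sin(ls)
y = 2 * np.sin(1 + ls)

# Same-time Pearson correlation between the two series
pearson = np.corrcoef(x, y)[0, 1]

# Lags 1..maxlag of both series, one row per usable time step
lagged = np.column_stack(
    [s[maxlag - k : n - k] for k in range(1, maxlag + 1) for s in (x, y)]
)
rank = np.linalg.matrix_rank(lagged)

print(f"Pearson correlation: {pearson:.3f}")
print(f"Lag matrix: {lagged.shape[1]} columns, numerical rank {rank}")
```

A numerical rank far below the number of columns signals that the regression cannot separate the individual lag coefficients, even when a plain correlation check looks harmless.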
Solution 1: Adding Noise to Break Perfect Correlations
The most straightforward solution is to add minimal random noise to the data, breaking perfect correlations without significantly altering statistical properties. This can be achieved by superimposing a small random matrix onto the original data.
np.random.seed(42) # Ensure reproducibility
df_noisy = df + 0.00001 * np.random.randn(n, 2)
result = grangercausalitytests(df_noisy, maxlag=20, verbose=False)
print("Test completed successfully without errors.")
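After adding noise, the covariance matrix is no longer singular, allowing the Granger causality test to run to completion. The noise should be small relative to the scale of the data so that it does not distort the essential characteristics of the time series; here a factor of 1e-5 is used against amplitudes of order 1. Keep in mind that the noise only removes the numerical obstacle: test statistics computed on otherwise deterministic data should still be interpreted with caution.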
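When columns have very different magnitudes, a fixed noise level may be too large for one column and too small for another. The following is a minimal sketch, assuming it is acceptable to scale the noise by each column's standard deviation; `add_jitter` and `lag_matrix` are hypothetical helpers introduced for illustration, and the default `rel_scale` simply mirrors the 1e-5 used above.

```python
import numpy as np

def lag_matrix(values, maxlag):
    """Stack lags 1..maxlag of every column side by side."""
    n = values.shape[0]
    return np.column_stack(
        [values[maxlag - k : n - k] for k in range(1, maxlag + 1)]
    )

def add_jitter(values, rel_scale=1e-5, seed=42):
    """Add Gaussian noise with std = rel_scale * (std of each column)."""
    rng = np.random.default_rng(seed)
    col_std = values.std(axis=0, keepdims=True)
    return values + rel_scale * col_std * rng.standard_normal(values.shape)

n, maxlag = 1000, 20
ls = np.linspace(0, 2 * np.pi, n)
clean = np.column_stack([np.sin(ls), 2 * np.sin(1 + ls)])
noisy = add_jitter(clean)

# The jittered lag matrix should now have full column rank
rank_noisy = np.linalg.matrix_rank(lag_matrix(noisy, maxlag))
max_change = np.max(np.abs(noisy - clean))

print("Rank after jitter:", rank_noisy, "of", 2 * maxlag)
print("Largest change to any value:", max_change)
```

Comparing the largest change against the data's amplitude gives a quick sanity check that the perturbation is negligible.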
Solution 2: Checking and Handling Duplicate or Fully Correlated Features
Another common cause is duplicate columns or fully correlated features in the data. This can be diagnosed by computing the correlation matrix.
corr_matrix = df.corr()
print("Correlation matrix:\n", corr_matrix)
# Check for column pairs with a correlation coefficient of +1 or -1
for i in range(corr_matrix.shape[0]):
    for j in range(i + 1, corr_matrix.shape[1]):
        # abs() on the coefficient also catches perfect negative correlation
        if abs(abs(corr_matrix.iloc[i, j]) - 1.0) < 1e-10:
            print(f"Column {i} and column {j} are perfectly correlated, which may cause singular matrix errors.")
If fully correlated columns are found, consider removing duplicates or performing feature selection to ensure the data matrix is full-rank. In practice, this may involve data cleaning steps, such as eliminating redundant variables or using Principal Component Analysis (PCA) for dimensionality reduction.
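Removing redundant columns can be automated. The sketch below is one possible approach, not a statsmodels or pandas feature: the hypothetical helper `independent_column_indices` keeps the first column of every perfectly correlated group and returns the positions to retain. It assumes no column is constant, since a constant column has an undefined correlation.

```python
import numpy as np

def independent_column_indices(values, tol=1e-10):
    """Return positions of columns that are not perfectly correlated
    (coefficient of +1 or -1) with an earlier kept column."""
    corr = np.abs(np.corrcoef(values, rowvar=False))
    keep = []
    for j in range(corr.shape[0]):
        if all(abs(corr[j, k] - 1.0) >= tol for k in keep):
            keep.append(j)
    return keep

# Demo data: columns 1 and 3 are exact linear functions of column 0
rng = np.random.default_rng(0)
a = rng.standard_normal(200)
c = rng.standard_normal(200)
demo = np.column_stack([a, 2 * a + 1, c, -a])

keep = independent_column_indices(demo)
print("Columns to keep:", keep)  # [0, 2]
```

With a DataFrame, the same positions can be applied as df.iloc[:, keep] after computing them from df.to_numpy().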
In-Depth Analysis and Preventive Measures
From a statistical perspective, Granger causality tests rely on assumptions of stationarity and no multicollinearity in time series. Perfect correlation violates the latter, making the model unidentifiable. Best practices to prevent such errors include:
- Data Preprocessing: Check for correlations and redundancies in the data before analysis, using methods like df.corr() and df.T.duplicated() (df.duplicated() on its own flags duplicate rows, not columns).
- Model Diagnostics: Before running tests, consider performing unit root tests (e.g., the ADF test) to ensure stationarity, and check that the chosen lag order is appropriate.
- Error Handling: Implement exception handling in code to gracefully manage potential singular matrix scenarios, e.g., by attempting to add noise or adjust lag orders.
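The error-handling idea can be sketched as a small wrapper that retries once on jittered data. `run_with_jitter_fallback` is a hypothetical helper, and `toy_test` is only a stand-in that inverts a moment matrix so the example runs without statsmodels; in practice the callable would wrap grangercausalitytests.

```python
import numpy as np

def run_with_jitter_fallback(test_fn, data, scale=1e-5, seed=0):
    """Call test_fn(data); on LinAlgError, retry once with small noise added.

    Returns (result, used_jitter).
    """
    values = np.asarray(data, dtype=float)
    try:
        return test_fn(values), False
    except np.linalg.LinAlgError:
        rng = np.random.default_rng(seed)
        jittered = values + scale * rng.standard_normal(values.shape)
        return test_fn(jittered), True

def toy_test(values):
    """Stand-in for a statistical test: inverts the moment matrix X'X."""
    return np.linalg.inv(values.T @ values)

# Second column is exactly twice the first, so X'X is singular
base = np.arange(1.0, 6.0)
collinear = np.column_stack([base, 2 * base])

result, used_jitter = run_with_jitter_fallback(toy_test, collinear)
print("Fallback used:", used_jitter)
```

For the Granger test, the callable could be something like lambda d: grangercausalitytests(d, maxlag=20, verbose=False). Returning the used_jitter flag keeps it visible downstream that the result came from perturbed data.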
Additionally, it is important to understand the limitations of Granger causality tests: they only detect linear relationships, and results can be influenced by data frequency and sample size. In application, combine domain knowledge with other statistical methods for comprehensive judgment.
Conclusion
The "LinAlgError: Singular matrix" error in Granger causality tests is typically caused by perfect correlations in data, leading to a singular covariance matrix that cannot be inverted. This issue can be effectively resolved by adding minimal noise to the data or removing duplicate features. In practice, thorough data exploration and preprocessing before analysis are recommended to ensure the robustness and reliability of statistical tests. The code examples and methods provided in this article offer practical guidance for handling similar issues, enhancing the efficiency and accuracy of time series analysis.