Keywords: Python | Sine Curve Fitting | Least Squares | SciPy | Parameter Estimation
Abstract: This article provides a comprehensive guide to sine curve fitting using Python's SciPy library. Based on the best answer from the Q&A data, we explore parameter estimation methods through least squares optimization, including initial guess strategies for amplitude, frequency, phase, and offset. Complete code implementations demonstrate accurate parameter extraction from noisy data, with discussions on frequency estimation challenges. Additional insights from FFT-based methods are incorporated, offering readers a complete solution for sine curve fitting applications.
Fundamentals of Sine Curve Fitting
In data analysis and signal processing, sine curve fitting is a common technical requirement, particularly in economics, physics, and engineering. The general form of a sine function can be expressed as: y = A * sin(ωt + φ) + C, where A represents amplitude, ω is angular frequency, φ denotes phase, and C is the offset. The fitting objective is to accurately estimate these parameters from observed data.
Least Squares Optimization Method
Based on the best answer from the Q&A data, we employ the leastsq function from SciPy for least squares optimization. This method finds optimal parameters by minimizing the sum of squared residuals. Here's the complete implementation:
import numpy as np
from scipy.optimize import leastsq
import matplotlib.pyplot as plt
# Generate simulated data
N = 1000
t = np.linspace(0, 4*np.pi, N)
true_freq = 1.15247
data = 3.0 * np.sin(true_freq * t + 0.001) + 0.5 + np.random.randn(N)
# Initial parameter estimates
guess_mean = np.mean(data)
guess_std = 3 * np.std(data) / 2  # heuristic amplitude scale; same value as dividing by 2**0.5 twice
guess_phase = 0
guess_freq = 1
guess_amp = guess_std
# Define optimization function
optimize_func = lambda x: x[0] * np.sin(x[1] * t + x[2]) + x[3] - data
# Perform optimization
initial_guess = [guess_amp, guess_freq, guess_phase, guess_mean]
est_amp, est_freq, est_phase, est_mean = leastsq(optimize_func, initial_guess)[0]
# Generate fitted curve
fine_t = np.arange(0, max(t), 0.1)
data_fit = est_amp * np.sin(est_freq * fine_t + est_phase) + est_mean
# Visualize results
plt.figure(figsize=(10, 6))
plt.plot(t, data, '.', alpha=0.5, label='Original Data (with Noise)')
plt.plot(fine_t, data_fit, 'r-', linewidth=2, label='Fitted Curve')
plt.xlabel('Time')
plt.ylabel('Amplitude')
plt.legend()
plt.grid(True)
plt.show()
Initial Parameter Estimation Strategy
Successful sine curve fitting heavily relies on reasonable initial parameter estimates:
- Offset Estimation: Directly compute the data mean: guess_mean = np.mean(data)
- Amplitude Estimation: Estimate from the data standard deviation: guess_amp = 3 * np.std(data) / 2, which is what the code above computes. For reference, a noise-free sinusoid has an amplitude of √2 times its standard deviation
- Frequency Estimation: This is the most challenging aspect. If the approximate frequency range is known, it can be specified manually; otherwise, alternative methods are needed
- Phase Estimation: Typically start from 0, allowing the optimization algorithm to adjust automatically
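The strategy above can be collected into a small helper. This is only a sketch: the function name `initial_sine_guess` and its `freq_guess` parameter are hypothetical, and the amplitude uses the √2 · std relation for a noise-free sinusoid rather than the 3 * np.std(data) / 2 heuristic from the listing above. With noisy data the standard deviation is inflated, so this value is a starting point, not an estimate.

```python
import numpy as np

def initial_sine_guess(t, y, freq_guess=1.0):
    """Return starting values [amplitude, angular frequency, phase, offset]."""
    offset = np.mean(y)
    # For a noise-free sinusoid sampled over whole periods, std = A / sqrt(2).
    amplitude = np.sqrt(2.0) * np.std(y)
    phase = 0.0  # left for the optimizer to adjust
    return [amplitude, freq_guess, phase, offset]

# Clean test signal covering exactly two periods
t = np.linspace(0, 2 * np.pi, 1000, endpoint=False)
y = 3.0 * np.sin(2.0 * t + 0.3) + 0.5
guess = initial_sine_guess(t, y, freq_guess=2.0)
print(guess)
```

The returned list has the same order as initial_guess in the main listing, so it could be passed to leastsq unchanged.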
Frequency Estimation Challenges and Solutions
As noted in the Q&A data, optimizing the frequency parameter directly can fail, especially when the initial guess is far from the true value: the sum of squared residuals has many local minima along the frequency axis, so the optimizer can settle on the wrong one. As a supplement, we can use the FFT method from the first answer to obtain a starting frequency:
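def estimate_frequency_fft(tt, yy):
    """Estimate the dominant angular frequency (rad per unit time) using the FFT"""
    # fftfreq returns cycles per unit time and assumes uniformly spaced samples
    ff = np.fft.fftfreq(len(tt), (tt[1] - tt[0]))
    Fyy = abs(np.fft.fft(yy))
    # Exclude zero-frequency peak (corresponding to offset)
    dominant_idx = np.argmax(Fyy[1:]) + 1
    # Convert to angular frequency so the value matches x[1] in optimize_func
    return 2 * np.pi * abs(ff[dominant_idx])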
# Improve frequency estimation in original code
guess_freq = estimate_frequency_fft(t, data - guess_mean)
print(f"FFT Estimated Frequency: {guess_freq:.4f}")
print(f"True Frequency: {true_freq:.4f}")
print(f"Relative Error: {abs(guess_freq - true_freq)/true_freq*100:.2f}%")
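An FFT peak can only land on the frequency grid, whose spacing in angular frequency is 2π divided by the record length, so short records give a coarse guess. One common way to refine it is to zero-pad the FFT, which interpolates the spectrum onto a finer grid. The sketch below assumes uniformly spaced samples; the function name `estimate_angular_frequency` and the `pad_factor` parameter are hypothetical, and the test signal is made up for illustration.

```python
import numpy as np

def estimate_angular_frequency(t, y, pad_factor=16):
    """Zero-padded FFT guess of the dominant angular frequency (rad per unit of t)."""
    n = len(t)
    dt = t[1] - t[0]                       # assumes uniform sampling
    n_fft = pad_factor * n                 # zero-padding refines the frequency grid
    spectrum = np.abs(np.fft.rfft(y - np.mean(y), n=n_fft))
    freqs = np.fft.rfftfreq(n_fft, d=dt)   # cycles per unit of t
    peak = np.argmax(spectrum[1:]) + 1     # skip the zero-frequency bin
    return 2.0 * np.pi * freqs[peak]

rng = np.random.default_rng(0)
t = np.arange(2000) * 0.01                 # 20 time units, about 6.4 cycles
y = 3.0 * np.sin(2.0 * t + 0.4) + 0.5 + 0.5 * rng.standard_normal(t.size)

coarse = estimate_angular_frequency(t, y, pad_factor=1)
fine = estimate_angular_frequency(t, y, pad_factor=16)
print(f"Unpadded guess: {coarse:.4f}, zero-padded guess: {fine:.4f}, true: 2.0000")
```

Zero-padding does not add information, it only samples the same spectrum more densely, so the result is still best treated as a starting value for the least squares step.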
Fitting Quality Assessment
Fit quality can be summarized with the residual sum of squares and the coefficient of determination R², reported alongside the estimated parameters:
# Calculate goodness-of-fit metrics
residuals = data - (est_amp * np.sin(est_freq * t + est_phase) + est_mean)
RSS = np.sum(residuals**2) # Residual Sum of Squares
TSS = np.sum((data - np.mean(data))**2) # Total Sum of Squares
R_squared = 1 - (RSS / TSS)
print(f"Amplitude Estimate: {est_amp:.4f}")
print(f"Frequency Estimate: {est_freq:.4f}")
print(f"Phase Estimate: {est_phase:.4f}")
print(f"Offset Estimate: {est_mean:.4f}")
print(f"R² Value: {R_squared:.4f}")
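R² says how much variance the model explains, but not how well each parameter is determined. As a possible complement, leastsq can return an unscaled covariance matrix when called with full_output=True; multiplying it by the residual variance gives approximate parameter standard errors. The sketch below regenerates data like the main listing with a fixed seed, and its starting values are deliberately placed close to the truth for the demonstration.

```python
import numpy as np
from scipy.optimize import leastsq

rng = np.random.default_rng(42)
N = 1000
t = np.linspace(0, 4 * np.pi, N)
data = 3.0 * np.sin(1.15247 * t + 0.001) + 0.5 + rng.standard_normal(N)

def residuals(p):
    amp, freq, phase, offset = p
    return amp * np.sin(freq * t + phase) + offset - data

p0 = [2.5, 1.1, 0.0, 0.4]  # assumed starting values, close to the truth
p_est, cov_x, info, mesg, ier = leastsq(residuals, p0, full_output=True)

dof = N - len(p_est)                           # degrees of freedom
res_var = np.sum(residuals(p_est) ** 2) / dof  # residual variance
std_err = np.sqrt(np.diag(cov_x * res_var))    # approximate standard errors
rmse = np.sqrt(res_var)

for name, val, err in zip(["amp", "freq", "phase", "offset"], p_est, std_err):
    print(f"{name}: {val:.4f} +/- {err:.4f}")
print(f"Residual RMSE: {rmse:.4f}")
```

A residual RMSE close to the known noise level (1.0 in this simulation) suggests that the model has captured the deterministic part of the signal.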
Practical Application Considerations
In practical applications, sine curve fitting requires attention to:
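- Data Preprocessing: The FFT-based frequency guess assumes a uniformly sampled time series, so interpolate onto a regular grid if necessary; the least squares step itself accepts arbitrary sample times
- Noise Handling: Filtering may be required in high-noise environments
- Multiple Frequency Components: More complex models are needed for signals with multiple frequencies
- Boundary Conditions: The phase is only defined modulo 2π, and a negative amplitude is equivalent to a phase shift of π, so normalize the fitted phase before comparing results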
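The phase ambiguity in the last point can be handled after the fit. The optimizer may return a negative amplitude, a negative frequency, or a phase many multiples of 2π away from the expected value, all describing the same curve. A minimal sketch of a normalization step follows; the function name `normalize_sine_params` is hypothetical.

```python
import numpy as np

def normalize_sine_params(amp, freq, phase, offset):
    """Map equivalent parameter sets to amp >= 0, freq >= 0, phase in [-pi, pi)."""
    if freq < 0:
        # sin(-w*t + p) = -sin(w*t - p)
        amp, freq, phase = -amp, -freq, -phase
    if amp < 0:
        # -A*sin(x) = A*sin(x + pi)
        amp, phase = -amp, phase + np.pi
    phase = (phase + np.pi) % (2 * np.pi) - np.pi  # wrap into [-pi, pi)
    return amp, freq, phase, offset

# The raw and normalized parameters describe the same curve
t = np.linspace(0, 10, 200)
raw = (-2.0, -1.5, 7.0, 0.3)
norm = normalize_sine_params(*raw)
y_raw = raw[0] * np.sin(raw[1] * t + raw[2]) + raw[3]
y_norm = norm[0] * np.sin(norm[1] * t + norm[2]) + norm[3]
print(norm, np.allclose(y_raw, y_norm))
```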
Extended Applications
The methods described can be extended to more complex scenarios:
- Multiple Superimposed Sine Waves: Extend optimization function to handle multiple frequency components
- Nonlinear Least Squares: Use the curve_fit function for a more flexible interface
- Real-time Fitting: Implement real-time sine parameter estimation with sliding window techniques
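As an illustration of the first two extensions, the sketch below fits a sum of two sine waves with scipy.optimize.curve_fit, which takes a model function instead of a residual function and also returns a parameter covariance matrix. The model `two_sines`, the simulated signal, and the starting values are all assumptions made for this example; in practice the starting frequencies would need to come from prior knowledge or a spectrum, and a poor start can still end in a local minimum.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_sines(t, a1, w1, p1, a2, w2, p2, c):
    """Sum of two sinusoids plus a constant offset."""
    return a1 * np.sin(w1 * t + p1) + a2 * np.sin(w2 * t + p2) + c

rng = np.random.default_rng(1)
t = np.linspace(0, 20, 2000)
y = two_sines(t, 2.0, 1.0, 0.0, 1.0, 3.0, 0.5, 0.2) + 0.1 * rng.standard_normal(t.size)

# Starting values assumed to be close to the truth for this demonstration
p0 = [1.8, 1.02, 0.0, 1.2, 2.98, 0.4, 0.0]
popt, pcov = curve_fit(two_sines, t, y, p0=p0)
perr = np.sqrt(np.diag(pcov))  # approximate one-sigma parameter uncertainties

for name, val, err in zip(["a1", "w1", "p1", "a2", "w2", "p2", "c"], popt, perr):
    print(f"{name}: {val:.4f} +/- {err:.4f}")
```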
By combining least squares optimization with FFT-based frequency estimation, we establish a robust sine curve fitting framework. This approach maintains computational efficiency while handling real-world data with varying noise levels, providing reliable tools for economic cycle analysis, signal processing, and other applications.