Mathematical Background¶
This section presents the mathematical foundations of Dynamic Linear Models following the notation and framework of West and Harrison (1997), Bayesian Forecasting and Dynamic Models.
For a plain-language introduction, see Key Concepts.
The DLM Quadruple¶
A Dynamic Linear Model is defined by the quadruple \(\{\mathbf{F}_t, \mathbf{G}_t, V_t, \mathbf{W}_t\}\) at each time \(t\), together with initial information \((\boldsymbol{\theta}_0 \mid D_0) \sim \mathcal{N}(\mathbf{m}_0, \mathbf{C}_0)\).
Observation equation:
System equation (state evolution):
where:
\(Y_t \in \mathbb{R}\) is the observation at time \(t\),
\(\boldsymbol{\theta}_t \in \mathbb{R}^n\) is the state (parameter) vector,
\(\mathbf{F}_t \in \mathbb{R}^n\) is the regression (observation) vector,
\(\mathbf{G}_t \in \mathbb{R}^{n \times n}\) is the system (evolution) matrix,
\(V_t > 0\) is the observational variance,
\(\mathbf{W}_t \in \mathbb{R}^{n \times n}\) is the evolution covariance matrix,
\(D_t = \{Y_1, \ldots, Y_t\}\) denotes all information available at time \(t\).
The error sequences \(\{\nu_t\}\) and \(\{\boldsymbol{\omega}_t\}\) are mutually independent, independent over time, and independent of the initial state \(\boldsymbol{\theta}_0\).
Note
Dynaris uses West–Harrison notation directly:
observation_matrix/.F\(= \mathbf{F}_t^\prime\)system_matrix/.G\(= \mathbf{G}_t\)obs_cov/.V\(= V_t\)evolution_cov/.W\(= \mathbf{W}_t\)
Kalman Filter (Sequential Updating)¶
The Kalman filter computes the posterior \((\boldsymbol{\theta}_t \mid D_t) \sim \mathcal{N}(\mathbf{m}_t, \mathbf{C}_t)\) recursively. Given the posterior at \(t-1\):
Prior (one-step forecast of the state)¶
so that \((\boldsymbol{\theta}_t \mid D_{t-1}) \sim \mathcal{N}(\mathbf{a}_t, \mathbf{R}_t)\).
One-step forecast of the observation¶
so that \((Y_t \mid D_{t-1}) \sim \mathcal{N}(f_t, Q_t)\).
Posterior (updating)¶
Upon observing \(Y_t\):
This gives \((\boldsymbol{\theta}_t \mid D_t) \sim \mathcal{N}(\mathbf{m}_t, \mathbf{C}_t)\).
Log-likelihood¶
The log-likelihood decomposes via the prediction error decomposition:
This is computed inside jax.lax.scan and is fully differentiable via
JAX’s autodiff, enabling gradient-based parameter estimation.
Missing observations¶
When \(Y_t\) is missing, the update step is skipped. The posterior reverts to the prior: \(\mathbf{m}_t = \mathbf{a}_t\), \(\mathbf{C}_t = \mathbf{R}_t\). Uncertainty grows without data to constrain the state.
Retrospective Analysis (Smoothing)¶
The Rauch–Tung–Striebel smoother computes the retrospective distribution \((\boldsymbol{\theta}_t \mid D_T)\) for \(t < T\), using all available data. This is the “what do we now believe about the past?” question in West and Harrison’s framework.
Starting from \((\boldsymbol{\theta}_T \mid D_T) \sim \mathcal{N}(\mathbf{m}_T, \mathbf{C}_T)\), the backward recursion for \(t = T-1, T-2, \ldots, 1\) is:
where superscript \((s)\) denotes smoothed quantities. The smoother uses future information to reduce uncertainty, so \(\mathbf{C}_t^{(s)} \leq \mathbf{C}_t\) in the positive-definite sense.
Implemented via jax.lax.scan(..., reverse=True).
DLM Superposition¶
West and Harrison’s superposition principle: if two independent DLMs share the same observation, the combined model is obtained by stacking their state vectors. Given component DLMs \(\{\mathbf{F}_t^{(i)}, \mathbf{G}_t^{(i)}, V_t^{(i)}, \mathbf{W}_t^{(i)}\}\) for \(i = 1, 2\), the superposition yields:
The observation is the sum of component contributions:
In dynaris, this is the + operator:
model = LocalLinearTrend() + Seasonal(period=12) + Cycle(period=40)
Component Models¶
Polynomial (Trend) DLMs¶
First-order polynomial (local level). The simplest DLM: a random walk observed with noise. The state is the current level \(\mu_t\):
Here \(G = [1]\), \(F = [1]\). State dimension: 1.
Second-order polynomial (local linear trend). The state is \(\boldsymbol{\theta}_t = (\mu_t, \beta_t)^\prime\), where \(\mu_t\) is the level and \(\beta_t\) the growth rate:
State dimension: 2.
Seasonal DLMs¶
Free-form (dummy) seasonal. The seasonal effect \(\gamma_t\) satisfies the sum-to-zero constraint across one full period \(s\):
The system matrix \(\mathbf{G}\) has \(-1\) in the first row and a shift identity below. State dimension: \(s - 1\).
Fourier-form seasonal. Each harmonic \(j = 1, \ldots, \lfloor s/2 \rfloor\) contributes a rotation at frequency \(\omega_j = 2\pi j / s\):
The system matrix \(\mathbf{G}\) is block-diagonal with rotation blocks. State dimension: \(s - 1\).
Regression DLMs¶
The state vector holds regression coefficients \(\boldsymbol{\theta}_t = \boldsymbol{\beta}_t\). With \(\mathbf{G} = \mathbf{I}\) (random walk coefficients):
where \(\mathbf{x}_t\) is the vector of regressors at time \(t\) (playing the role of \(\mathbf{F}_t\)). Setting \(\mathbf{W} = \mathbf{0}\) gives static (time-invariant) coefficients.
Cyclic DLMs¶
A damped stochastic cycle with period \(\lambda\) and damping factor \(\rho \in (0, 1]\):
State dimension: 2. Setting \(\rho = 1\) recovers an undamped cycle. The eigenvalues of \(\mathbf{G}\) have modulus \(\rho\), so the cycle decays geometrically when \(\rho < 1\).
Autoregressive DLMs¶
An AR(p) process cast in companion-form state space. The state is \(\boldsymbol{\theta}_t = (x_t, x_{t-1}, \ldots, x_{t-p+1})^\prime\) with system matrix:
and \(\mathbf{F} = (1, 0, \ldots, 0)^\prime\). State dimension: \(p\).
Forecasting¶
The \(k\)-step-ahead forecast distribution is obtained by iterating the prior equations without updating:
with \(\mathbf{R}_t(0) = \mathbf{C}_t\). The forecast distribution for the observable is:
Forecast uncertainty grows with horizon \(k\) due to the accumulated evolution noise \(\mathbf{W}\).
Parameter Estimation¶
Maximum Likelihood Estimation (MLE)¶
The log-likelihood \(\ell(\boldsymbol{\psi}) = \log p(Y_{1:T} \mid \boldsymbol{\psi})\) (where \(\boldsymbol{\psi}\) collects the unknown hyperparameters \(V\) and \(\mathbf{W}\)) is computed via the prediction error decomposition from the Kalman filter. Since the entire computation is implemented in JAX, gradients \(\nabla_{\boldsymbol{\psi}} \ell\) are obtained via automatic differentiation.
Variance parameters must be positive. Dynaris provides log and softplus transforms to map unconstrained parameters to valid ranges:
EM Algorithm¶
The Expectation–Maximization algorithm alternates between:
E-step: Run the Kalman filter and smoother to obtain \(\mathbf{m}_t^{(s)}\) and \(\mathbf{C}_t^{(s)}\) for all \(t\).
M-step: Update the variance components. For the observational variance:
The EM algorithm guarantees non-decreasing log-likelihood at each iteration for the exact M-step.
Nonlinear Filters¶
When the state-space model is nonlinear, the Kalman filter no longer gives exact inference. Dynaris provides three approximate filters.
Extended Kalman Filter (EKF)¶
The EKF linearizes the nonlinear functions at each time step via first-order Taylor expansion, then applies the standard Kalman recursion.
Given the nonlinear state-space model:
Predict:
Update:
In dynaris, the Jacobians are computed automatically via jax.jacfwd —
no manual derivation required.
Unscented Kalman Filter (UKF)¶
The UKF avoids linearization by propagating deterministically chosen sigma points through the nonlinear functions, then recovering the mean and covariance from the transformed points.
Sigma points for a state with mean \(\mathbf{m}\) and covariance \(\mathbf{C}\) of dimension \(n\):
where \(\lambda = \alpha^2(n + \kappa) - n\) and \([\sqrt{\mathbf{M}}]_i\) is the \(i\)-th row of the Cholesky factor of \(\mathbf{M}\).
Weights:
Predict: Propagate sigma points through \(f\), recover predicted mean and covariance from the weighted transformed points.
Update: Propagate sigma points through \(h\), compute cross-covariance, and apply the Kalman gain.
The UKF captures second-order nonlinear effects that the EKF misses, and requires no Jacobian computation.
Particle Filter (Sequential Monte Carlo)¶
The particle filter represents the filtering distribution as a weighted set of \(N\) particles (samples), enabling inference in arbitrarily nonlinear and non-Gaussian models.
Algorithm (bootstrap particle filter):
Resample: Draw ancestor indices from the categorical distribution defined by the current weights.
Propagate: For each particle \(i\):
\[\boldsymbol{\theta}_t^{(i)} = f\!\left(\boldsymbol{\theta}_{t-1}^{(i)}\right) + \boldsymbol{\omega}_t^{(i)}, \quad \boldsymbol{\omega}_t^{(i)} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q})\]Implemented via
jax.vmapfor vectorized propagation.Reweight: Compute unnormalized weights from the observation likelihood:
\[\tilde{w}_t^{(i)} = p\!\left(Y_t \mid \boldsymbol{\theta}_t^{(i)}\right)\]Normalize: \(w_t^{(i)} = \tilde{w}_t^{(i)} / \sum_j \tilde{w}_t^{(j)}\).
Estimate: The filtered mean and covariance are weighted statistics of the particle cloud.
Resampling strategies: Multinomial, systematic (default), and stratified. The effective sample size \(\text{ESS} = 1 / \sum_i (w_t^{(i)})^2\) monitors particle degeneracy.
Log-likelihood: \(\log \hat{p}(Y_t \mid Y_{1:t-1}) = \log\!\left(\frac{1}{N}\sum_i \tilde{w}_t^{(i)}\right)\).
Markov-Switching Models¶
Hamilton Filter¶
The Hamilton filter extends the Kalman filter to models with \(K\) discrete regimes governed by a Markov chain with transition matrix \(\mathbf{P}\), where \(P_{ij} = \Pr(S_t = j \mid S_{t-1} = i)\).
At each time step, the filter maintains \(K\) Gaussian state beliefs (one per regime) and a vector of regime probabilities.
Algorithm (with Kim’s collapse approximation):
Collapse: For each target regime \(j\), compute the probability-weighted mixture of the \(K\) source-regime states:
\[\begin{split}w_{i|j} &= \frac{P_{ij}\,\pi_{t-1}^{(i)}}{\sum_k P_{kj}\,\pi_{t-1}^{(k)}} \\ \mathbf{m}_{t|t-1}^{(j)} &= \sum_i w_{i|j}\,\mathbf{m}_{t-1}^{(i)} \\ \mathbf{C}_{t|t-1}^{(j)} &= \sum_i w_{i|j}\!\left[\mathbf{C}_{t-1}^{(i)} + \boldsymbol{\delta}_i\,\boldsymbol{\delta}_i^\prime\right]\end{split}\]where \(\boldsymbol{\delta}_i = \mathbf{m}_{t-1}^{(i)} - \mathbf{m}_{t|t-1}^{(j)}\).
Predict/Update: Run \(K\) parallel Kalman predict+update steps (one per regime), each using regime-specific matrices \(\{\mathbf{G}_j, \mathbf{F}_j, \mathbf{W}_j, V_j\}\). Implemented via
jax.vmap.Regime probability update:
\[\pi_t^{(j)} = \frac{p(Y_t \mid S_t = j,\, D_{t-1})\,\bar{\pi}_t^{(j)}}{\sum_k p(Y_t \mid S_t = k,\, D_{t-1})\,\bar{\pi}_t^{(k)}}\]where \(\bar{\pi}_t^{(j)} = \sum_i P_{ij}\,\pi_{t-1}^{(i)}\) are the predicted regime probabilities.
Kim Smoother¶
The Kim smoother is the backward counterpart of the Hamilton filter, producing smoothed regime probabilities and state estimates.
For \(t = T-1, T-2, \ldots, 1\):
Smoothed regime probabilities:
\[\pi_t^{(i,s)} = \pi_t^{(i)} \sum_j \frac{P_{ij}\,\pi_{t+1}^{(j,s)}}{\bar{\pi}_{t+1}^{(j)}}\]Per-regime RTS step: Apply the RTS smoother backward recursion independently for each regime \(j\), using regime-specific \(\mathbf{G}_j\).
Collapse the \(K\) smoothed states using the smoothed regime probabilities.
Dynamic Factor Models¶
A DFM with \(r\) latent factors and \(m\) observed variables:
where \(\boldsymbol{\Lambda} \in \mathbb{R}^{m \times r}\) is the loading matrix, \(\mathbf{R}\) is diagonal (idiosyncratic noise), and \(\mathbf{Q} = \mathbf{I}_r\) for identification.
EM estimation alternates:
E-step: Kalman filter + RTS smoother on the factor state space.
M-step: Update \(\boldsymbol{\Lambda}\) and \(\mathbf{R}\):
\[\begin{split}\hat{\boldsymbol{\Lambda}} &= \left[\sum_t \mathbf{y}_t\,\mathbf{m}_t^{(s)\prime}\right] \left[\sum_t \left(\mathbf{C}_t^{(s)} + \mathbf{m}_t^{(s)}\,\mathbf{m}_t^{(s)\prime}\right)\right]^{-1} \\ \hat{R}_{ii} &= \frac{1}{T}\sum_t \left[(y_{t,i} - \hat{\boldsymbol{\Lambda}}_i\,\mathbf{m}_t^{(s)})^2 + \hat{\boldsymbol{\Lambda}}_i\,\mathbf{C}_t^{(s)}\,\hat{\boldsymbol{\Lambda}}_i^\prime\right]\end{split}\]
Varimax rotation is applied post-estimation for interpretability, maximizing the variance of squared loadings across factors.
Bayesian Estimation¶
The log-posterior for a DLM with hyperparameters \(\boldsymbol{\psi}\) (variance parameters) is:
Dynaris uses NumPyro’s NUTS (No-U-Turn Sampler) to draw posterior samples. Since the entire Kalman filter log-likelihood is differentiable via JAX autodiff, NUTS can efficiently explore the posterior using Hamiltonian dynamics.
Posterior predictive forecasting: For each posterior sample \(\boldsymbol{\psi}^{(s)}\), run the Kalman filter and forecast. Aggregate across samples for credible intervals.
Model comparison via WAIC and LOO-CV requires pointwise log-likelihoods \(\log p(Y_t \mid Y_{1:t-1}, \boldsymbol{\psi})\), which are the individual terms of the prediction error decomposition.
References¶
West, M. and Harrison, J. (1997). Bayesian Forecasting and Dynamic Models, 2nd edition. Springer.
Durbin, J. and Koopman, S.J. (2012). Time Series Analysis by State Space Methods, 2nd edition. Oxford University Press.
Harvey, A.C. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge University Press.
Petris, G., Petrone, S., and Campagnoli, P. (2009). Dynamic Linear Models with R. Springer.
Julier, S.J. and Uhlmann, J.K. (2004). “Unscented Filtering and Nonlinear Estimation.” Proceedings of the IEEE, 92(3), 401-422.
Doucet, A., de Freitas, N. and Gordon, N. (2001). Sequential Monte Carlo Methods in Practice. Springer.
Kim, C.-J. (1994). “Dynamic Linear Models with Markov-Switching.” Journal of Econometrics, 60(1-2), 1-22.
Hamilton, J.D. (1989). “A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle.” Econometrica, 57(2).
Stock, J.H. and Watson, M.W. (2002). “Forecasting Using Principal Components from a Large Number of Predictors.” JASA, 97(460).