Mathematical Background ======================= This section presents the mathematical foundations of Dynamic Linear Models following the notation and framework of West and Harrison (1997), *Bayesian Forecasting and Dynamic Models*. For a plain-language introduction, see :doc:`getting-started/concepts`. The DLM Quadruple ----------------- A Dynamic Linear Model is defined by the quadruple :math:`\{\mathbf{F}_t, \mathbf{G}_t, V_t, \mathbf{W}_t\}` at each time :math:`t`, together with initial information :math:`(\boldsymbol{\theta}_0 \mid D_0) \sim \mathcal{N}(\mathbf{m}_0, \mathbf{C}_0)`. **Observation equation:** .. math:: Y_t = \mathbf{F}_t^\prime\,\boldsymbol{\theta}_t + \nu_t, \qquad \nu_t \sim \mathcal{N}(0, V_t) **System equation** (state evolution): .. math:: \boldsymbol{\theta}_t = \mathbf{G}_t\,\boldsymbol{\theta}_{t-1} + \boldsymbol{\omega}_t, \qquad \boldsymbol{\omega}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{W}_t) where: - :math:`Y_t \in \mathbb{R}` is the observation at time :math:`t`, - :math:`\boldsymbol{\theta}_t \in \mathbb{R}^n` is the state (parameter) vector, - :math:`\mathbf{F}_t \in \mathbb{R}^n` is the regression (observation) vector, - :math:`\mathbf{G}_t \in \mathbb{R}^{n \times n}` is the system (evolution) matrix, - :math:`V_t > 0` is the observational variance, - :math:`\mathbf{W}_t \in \mathbb{R}^{n \times n}` is the evolution covariance matrix, - :math:`D_t = \{Y_1, \ldots, Y_t\}` denotes all information available at time :math:`t`. The error sequences :math:`\{\nu_t\}` and :math:`\{\boldsymbol{\omega}_t\}` are mutually independent, independent over time, and independent of the initial state :math:`\boldsymbol{\theta}_0`. .. note:: Dynaris uses West--Harrison notation directly: - ``observation_matrix`` / ``.F`` :math:`= \mathbf{F}_t^\prime` - ``system_matrix`` / ``.G`` :math:`= \mathbf{G}_t` - ``obs_cov`` / ``.V`` :math:`= V_t` - ``evolution_cov`` / ``.W`` :math:`= \mathbf{W}_t` Kalman Filter (Sequential Updating) ------------------------------------ The Kalman filter computes the posterior :math:`(\boldsymbol{\theta}_t \mid D_t) \sim \mathcal{N}(\mathbf{m}_t, \mathbf{C}_t)` recursively. Given the posterior at :math:`t-1`: .. math:: (\boldsymbol{\theta}_{t-1} \mid D_{t-1}) \sim \mathcal{N}(\mathbf{m}_{t-1}, \mathbf{C}_{t-1}) Prior (one-step forecast of the state) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. math:: \mathbf{a}_t &= \mathbf{G}_t\,\mathbf{m}_{t-1} \\ \mathbf{R}_t &= \mathbf{G}_t\,\mathbf{C}_{t-1}\,\mathbf{G}_t^\prime + \mathbf{W}_t so that :math:`(\boldsymbol{\theta}_t \mid D_{t-1}) \sim \mathcal{N}(\mathbf{a}_t, \mathbf{R}_t)`. One-step forecast of the observation ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ .. math:: f_t &= \mathbf{F}_t^\prime\,\mathbf{a}_t \\ Q_t &= \mathbf{F}_t^\prime\,\mathbf{R}_t\,\mathbf{F}_t + V_t so that :math:`(Y_t \mid D_{t-1}) \sim \mathcal{N}(f_t, Q_t)`. Posterior (updating) ~~~~~~~~~~~~~~~~~~~~ Upon observing :math:`Y_t`: .. math:: e_t &= Y_t - f_t \qquad \text{(forecast error)} \\ \mathbf{A}_t &= \mathbf{R}_t\,\mathbf{F}_t\,Q_t^{-1} \qquad \text{(adaptive coefficient / Kalman gain)} \\ \mathbf{m}_t &= \mathbf{a}_t + \mathbf{A}_t\,e_t \\ \mathbf{C}_t &= \mathbf{R}_t - \mathbf{A}_t\,Q_t\,\mathbf{A}_t^\prime This gives :math:`(\boldsymbol{\theta}_t \mid D_t) \sim \mathcal{N}(\mathbf{m}_t, \mathbf{C}_t)`. Log-likelihood ~~~~~~~~~~~~~~ The log-likelihood decomposes via the prediction error decomposition: .. math:: \log p(Y_{1:T}) = \sum_{t=1}^{T} \log p(Y_t \mid D_{t-1}) = -\frac{1}{2} \sum_{t=1}^{T} \left[ \log(2\pi) + \log Q_t + \frac{e_t^2}{Q_t} \right] This is computed inside ``jax.lax.scan`` and is fully differentiable via JAX's autodiff, enabling gradient-based parameter estimation. Missing observations ~~~~~~~~~~~~~~~~~~~~ When :math:`Y_t` is missing, the update step is skipped. The posterior reverts to the prior: :math:`\mathbf{m}_t = \mathbf{a}_t`, :math:`\mathbf{C}_t = \mathbf{R}_t`. Uncertainty grows without data to constrain the state. Retrospective Analysis (Smoothing) ---------------------------------- The Rauch--Tung--Striebel smoother computes the retrospective distribution :math:`(\boldsymbol{\theta}_t \mid D_T)` for :math:`t < T`, using all available data. This is the "what do we now believe about the past?" question in West and Harrison's framework. Starting from :math:`(\boldsymbol{\theta}_T \mid D_T) \sim \mathcal{N}(\mathbf{m}_T, \mathbf{C}_T)`, the backward recursion for :math:`t = T-1, T-2, \ldots, 1` is: .. math:: \mathbf{B}_t &= \mathbf{C}_t\,\mathbf{G}_{t+1}^\prime\,\mathbf{R}_{t+1}^{-1} \qquad \text{(smoother gain)} \\ \mathbf{m}_t^{(s)} &= \mathbf{m}_t + \mathbf{B}_t\,(\mathbf{m}_{t+1}^{(s)} - \mathbf{a}_{t+1}) \\ \mathbf{C}_t^{(s)} &= \mathbf{C}_t + \mathbf{B}_t\,(\mathbf{C}_{t+1}^{(s)} - \mathbf{R}_{t+1})\,\mathbf{B}_t^\prime where superscript :math:`(s)` denotes smoothed quantities. The smoother uses future information to reduce uncertainty, so :math:`\mathbf{C}_t^{(s)} \leq \mathbf{C}_t` in the positive-definite sense. Implemented via ``jax.lax.scan(..., reverse=True)``. DLM Superposition ----------------- West and Harrison's superposition principle: if two independent DLMs share the same observation, the combined model is obtained by stacking their state vectors. Given component DLMs :math:`\{\mathbf{F}_t^{(i)}, \mathbf{G}_t^{(i)}, V_t^{(i)}, \mathbf{W}_t^{(i)}\}` for :math:`i = 1, 2`, the superposition yields: .. math:: \mathbf{G}_t &= \begin{pmatrix} \mathbf{G}_t^{(1)} & \mathbf{0} \\ \mathbf{0} & \mathbf{G}_t^{(2)} \end{pmatrix}, \qquad \mathbf{F}_t = \begin{pmatrix} \mathbf{F}_t^{(1)} \\ \mathbf{F}_t^{(2)} \end{pmatrix} \\ \mathbf{W}_t &= \begin{pmatrix} \mathbf{W}_t^{(1)} & \mathbf{0} \\ \mathbf{0} & \mathbf{W}_t^{(2)} \end{pmatrix}, \qquad V_t = V_t^{(1)} + V_t^{(2)} The observation is the sum of component contributions: .. math:: Y_t = \mathbf{F}_t^{(1)\prime}\,\boldsymbol{\theta}_t^{(1)} + \mathbf{F}_t^{(2)\prime}\,\boldsymbol{\theta}_t^{(2)} + \nu_t In dynaris, this is the ``+`` operator: .. code-block:: python model = LocalLinearTrend() + Seasonal(period=12) + Cycle(period=40) Component Models ---------------- Polynomial (Trend) DLMs ~~~~~~~~~~~~~~~~~~~~~~~ **First-order polynomial** (local level). The simplest DLM: a random walk observed with noise. The state is the current level :math:`\mu_t`: .. math:: \mu_t &= \mu_{t-1} + \omega_t, \quad \omega_t \sim \mathcal{N}(0, W) \\ Y_t &= \mu_t + \nu_t, \quad \nu_t \sim \mathcal{N}(0, V) Here :math:`G = [1]`, :math:`F = [1]`. State dimension: 1. **Second-order polynomial** (local linear trend). The state is :math:`\boldsymbol{\theta}_t = (\mu_t, \beta_t)^\prime`, where :math:`\mu_t` is the level and :math:`\beta_t` the growth rate: .. math:: \begin{pmatrix} \mu_t \\ \beta_t \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \mu_{t-1} \\ \beta_{t-1} \end{pmatrix} + \boldsymbol{\omega}_t, \qquad Y_t = \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} \mu_t \\ \beta_t \end{pmatrix} + \nu_t State dimension: 2. Seasonal DLMs ~~~~~~~~~~~~~ **Free-form (dummy) seasonal.** The seasonal effect :math:`\gamma_t` satisfies the sum-to-zero constraint across one full period :math:`s`: .. math:: \gamma_t = -\sum_{j=1}^{s-1} \gamma_{t-j} + \omega_t The system matrix :math:`\mathbf{G}` has :math:`-1` in the first row and a shift identity below. State dimension: :math:`s - 1`. **Fourier-form seasonal.** Each harmonic :math:`j = 1, \ldots, \lfloor s/2 \rfloor` contributes a rotation at frequency :math:`\omega_j = 2\pi j / s`: .. math:: \begin{pmatrix} \gamma_{j,t} \\ \gamma_{j,t}^* \end{pmatrix} = \begin{pmatrix} \cos\omega_j & \sin\omega_j \\ -\sin\omega_j & \cos\omega_j \end{pmatrix} \begin{pmatrix} \gamma_{j,t-1} \\ \gamma_{j,t-1}^* \end{pmatrix} + \boldsymbol{\omega}_{j,t} The system matrix :math:`\mathbf{G}` is block-diagonal with rotation blocks. State dimension: :math:`s - 1`. Regression DLMs ~~~~~~~~~~~~~~~ The state vector holds regression coefficients :math:`\boldsymbol{\theta}_t = \boldsymbol{\beta}_t`. With :math:`\mathbf{G} = \mathbf{I}` (random walk coefficients): .. math:: \boldsymbol{\beta}_t &= \boldsymbol{\beta}_{t-1} + \boldsymbol{\omega}_t \\ Y_t &= \mathbf{x}_t^\prime\,\boldsymbol{\beta}_t + \nu_t where :math:`\mathbf{x}_t` is the vector of regressors at time :math:`t` (playing the role of :math:`\mathbf{F}_t`). Setting :math:`\mathbf{W} = \mathbf{0}` gives static (time-invariant) coefficients. Cyclic DLMs ~~~~~~~~~~~ A damped stochastic cycle with period :math:`\lambda` and damping factor :math:`\rho \in (0, 1]`: .. math:: \begin{pmatrix} \psi_t \\ \psi_t^* \end{pmatrix} = \rho \begin{pmatrix} \cos\omega & \sin\omega \\ -\sin\omega & \cos\omega \end{pmatrix} \begin{pmatrix} \psi_{t-1} \\ \psi_{t-1}^* \end{pmatrix} + \boldsymbol{\omega}_t, \qquad \omega = \frac{2\pi}{\lambda} State dimension: 2. Setting :math:`\rho = 1` recovers an undamped cycle. The eigenvalues of :math:`\mathbf{G}` have modulus :math:`\rho`, so the cycle decays geometrically when :math:`\rho < 1`. Autoregressive DLMs ~~~~~~~~~~~~~~~~~~~ An AR(p) process cast in companion-form state space. The state is :math:`\boldsymbol{\theta}_t = (x_t, x_{t-1}, \ldots, x_{t-p+1})^\prime` with system matrix: .. math:: \mathbf{G} = \begin{pmatrix} \phi_1 & \phi_2 & \cdots & \phi_p \\ 1 & 0 & \cdots & 0 \\ & \ddots & & \vdots \\ 0 & \cdots & 1 & 0 \end{pmatrix} and :math:`\mathbf{F} = (1, 0, \ldots, 0)^\prime`. State dimension: :math:`p`. Forecasting ----------- The :math:`k`-step-ahead forecast distribution is obtained by iterating the prior equations without updating: .. math:: \mathbf{a}_t(k) &= \mathbf{G}^k\,\mathbf{m}_t \\ \mathbf{R}_t(k) &= \mathbf{G}\,\mathbf{R}_t(k-1)\,\mathbf{G}^\prime + \mathbf{W} with :math:`\mathbf{R}_t(0) = \mathbf{C}_t`. The forecast distribution for the observable is: .. math:: (Y_{t+k} \mid D_t) \sim \mathcal{N}\!\left(\mathbf{F}^\prime\,\mathbf{a}_t(k),\; \mathbf{F}^\prime\,\mathbf{R}_t(k)\,\mathbf{F} + V\right) Forecast uncertainty grows with horizon :math:`k` due to the accumulated evolution noise :math:`\mathbf{W}`. Parameter Estimation -------------------- Maximum Likelihood Estimation (MLE) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The log-likelihood :math:`\ell(\boldsymbol{\psi}) = \log p(Y_{1:T} \mid \boldsymbol{\psi})` (where :math:`\boldsymbol{\psi}` collects the unknown hyperparameters :math:`V` and :math:`\mathbf{W}`) is computed via the prediction error decomposition from the Kalman filter. Since the entire computation is implemented in JAX, gradients :math:`\nabla_{\boldsymbol{\psi}} \ell` are obtained via automatic differentiation. Variance parameters must be positive. Dynaris provides log and softplus transforms to map unconstrained parameters to valid ranges: .. math:: \sigma^2 = \exp(\psi) \qquad \text{or} \qquad \sigma^2 = \log(1 + \exp(\psi)) EM Algorithm ~~~~~~~~~~~~ The Expectation--Maximization algorithm alternates between: **E-step:** Run the Kalman filter and smoother to obtain :math:`\mathbf{m}_t^{(s)}` and :math:`\mathbf{C}_t^{(s)}` for all :math:`t`. **M-step:** Update the variance components. For the observational variance: .. math:: \hat{V} = \frac{1}{T} \sum_{t=1}^{T} \left[ (Y_t - \mathbf{F}^\prime\,\mathbf{m}_t^{(s)})^2 + \mathbf{F}^\prime\,\mathbf{C}_t^{(s)}\,\mathbf{F} \right] The EM algorithm guarantees non-decreasing log-likelihood at each iteration for the exact M-step. Nonlinear Filters ----------------- When the state-space model is nonlinear, the Kalman filter no longer gives exact inference. Dynaris provides three approximate filters. Extended Kalman Filter (EKF) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The EKF linearizes the nonlinear functions at each time step via first-order Taylor expansion, then applies the standard Kalman recursion. Given the nonlinear state-space model: .. math:: \boldsymbol{\theta}_t &= f(\boldsymbol{\theta}_{t-1}) + \boldsymbol{\omega}_t, \quad \boldsymbol{\omega}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}) \\ Y_t &= h(\boldsymbol{\theta}_t) + \boldsymbol{\nu}_t, \quad \boldsymbol{\nu}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{R}) **Predict:** .. math:: \mathbf{a}_t &= f(\mathbf{m}_{t-1}) \\ \mathbf{F}_t &= \left.\frac{\partial f}{\partial \boldsymbol{\theta}}\right|_{\mathbf{m}_{t-1}} \qquad \text{(Jacobian via } \texttt{jax.jacfwd}\text{)} \\ \mathbf{R}_t &= \mathbf{F}_t\,\mathbf{C}_{t-1}\,\mathbf{F}_t^\prime + \mathbf{Q} **Update:** .. math:: \hat{Y}_t &= h(\mathbf{a}_t) \\ \mathbf{H}_t &= \left.\frac{\partial h}{\partial \boldsymbol{\theta}}\right|_{\mathbf{a}_t} \\ \mathbf{S}_t &= \mathbf{H}_t\,\mathbf{R}_t\,\mathbf{H}_t^\prime + \mathbf{R} \\ \mathbf{K}_t &= \mathbf{R}_t\,\mathbf{H}_t^\prime\,\mathbf{S}_t^{-1} \\ \mathbf{m}_t &= \mathbf{a}_t + \mathbf{K}_t\,(Y_t - \hat{Y}_t) \\ \mathbf{C}_t &= (\mathbf{I} - \mathbf{K}_t\,\mathbf{H}_t)\,\mathbf{R}_t In dynaris, the Jacobians are computed automatically via ``jax.jacfwd`` --- no manual derivation required. Unscented Kalman Filter (UKF) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The UKF avoids linearization by propagating deterministically chosen **sigma points** through the nonlinear functions, then recovering the mean and covariance from the transformed points. **Sigma points** for a state with mean :math:`\mathbf{m}` and covariance :math:`\mathbf{C}` of dimension :math:`n`: .. math:: \boldsymbol{\chi}_0 &= \mathbf{m} \\ \boldsymbol{\chi}_i &= \mathbf{m} + \left[\sqrt{(n + \lambda)\,\mathbf{C}}\right]_i, \quad i = 1, \ldots, n \\ \boldsymbol{\chi}_{n+i} &= \mathbf{m} - \left[\sqrt{(n + \lambda)\,\mathbf{C}}\right]_i, \quad i = 1, \ldots, n where :math:`\lambda = \alpha^2(n + \kappa) - n` and :math:`[\sqrt{\mathbf{M}}]_i` is the :math:`i`-th row of the Cholesky factor of :math:`\mathbf{M}`. **Weights:** .. math:: w_0^{(m)} &= \frac{\lambda}{n + \lambda}, \quad w_0^{(c)} = w_0^{(m)} + 1 - \alpha^2 + \beta \\ w_i^{(m)} &= w_i^{(c)} = \frac{1}{2(n + \lambda)}, \quad i = 1, \ldots, 2n **Predict:** Propagate sigma points through :math:`f`, recover predicted mean and covariance from the weighted transformed points. **Update:** Propagate sigma points through :math:`h`, compute cross-covariance, and apply the Kalman gain. The UKF captures second-order nonlinear effects that the EKF misses, and requires no Jacobian computation. Particle Filter (Sequential Monte Carlo) ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The particle filter represents the filtering distribution as a weighted set of :math:`N` **particles** (samples), enabling inference in arbitrarily nonlinear and non-Gaussian models. **Algorithm** (bootstrap particle filter): 1. **Resample:** Draw ancestor indices from the categorical distribution defined by the current weights. 2. **Propagate:** For each particle :math:`i`: .. math:: \boldsymbol{\theta}_t^{(i)} = f\!\left(\boldsymbol{\theta}_{t-1}^{(i)}\right) + \boldsymbol{\omega}_t^{(i)}, \quad \boldsymbol{\omega}_t^{(i)} \sim \mathcal{N}(\mathbf{0}, \mathbf{Q}) Implemented via ``jax.vmap`` for vectorized propagation. 3. **Reweight:** Compute unnormalized weights from the observation likelihood: .. math:: \tilde{w}_t^{(i)} = p\!\left(Y_t \mid \boldsymbol{\theta}_t^{(i)}\right) Normalize: :math:`w_t^{(i)} = \tilde{w}_t^{(i)} / \sum_j \tilde{w}_t^{(j)}`. 4. **Estimate:** The filtered mean and covariance are weighted statistics of the particle cloud. **Resampling strategies:** Multinomial, systematic (default), and stratified. The **effective sample size** :math:`\text{ESS} = 1 / \sum_i (w_t^{(i)})^2` monitors particle degeneracy. **Log-likelihood:** :math:`\log \hat{p}(Y_t \mid Y_{1:t-1}) = \log\!\left(\frac{1}{N}\sum_i \tilde{w}_t^{(i)}\right)`. Markov-Switching Models ----------------------- Hamilton Filter ~~~~~~~~~~~~~~~ The Hamilton filter extends the Kalman filter to models with :math:`K` discrete regimes governed by a Markov chain with transition matrix :math:`\mathbf{P}`, where :math:`P_{ij} = \Pr(S_t = j \mid S_{t-1} = i)`. At each time step, the filter maintains :math:`K` Gaussian state beliefs (one per regime) and a vector of regime probabilities. **Algorithm** (with Kim's collapse approximation): 1. **Collapse:** For each target regime :math:`j`, compute the probability-weighted mixture of the :math:`K` source-regime states: .. math:: w_{i|j} &= \frac{P_{ij}\,\pi_{t-1}^{(i)}}{\sum_k P_{kj}\,\pi_{t-1}^{(k)}} \\ \mathbf{m}_{t|t-1}^{(j)} &= \sum_i w_{i|j}\,\mathbf{m}_{t-1}^{(i)} \\ \mathbf{C}_{t|t-1}^{(j)} &= \sum_i w_{i|j}\!\left[\mathbf{C}_{t-1}^{(i)} + \boldsymbol{\delta}_i\,\boldsymbol{\delta}_i^\prime\right] where :math:`\boldsymbol{\delta}_i = \mathbf{m}_{t-1}^{(i)} - \mathbf{m}_{t|t-1}^{(j)}`. 2. **Predict/Update:** Run :math:`K` parallel Kalman predict+update steps (one per regime), each using regime-specific matrices :math:`\{\mathbf{G}_j, \mathbf{F}_j, \mathbf{W}_j, V_j\}`. Implemented via ``jax.vmap``. 3. **Regime probability update:** .. math:: \pi_t^{(j)} = \frac{p(Y_t \mid S_t = j,\, D_{t-1})\,\bar{\pi}_t^{(j)}}{\sum_k p(Y_t \mid S_t = k,\, D_{t-1})\,\bar{\pi}_t^{(k)}} where :math:`\bar{\pi}_t^{(j)} = \sum_i P_{ij}\,\pi_{t-1}^{(i)}` are the predicted regime probabilities. Kim Smoother ~~~~~~~~~~~~ The Kim smoother is the backward counterpart of the Hamilton filter, producing smoothed regime probabilities and state estimates. For :math:`t = T-1, T-2, \ldots, 1`: 1. **Smoothed regime probabilities:** .. math:: \pi_t^{(i,s)} = \pi_t^{(i)} \sum_j \frac{P_{ij}\,\pi_{t+1}^{(j,s)}}{\bar{\pi}_{t+1}^{(j)}} 2. **Per-regime RTS step:** Apply the RTS smoother backward recursion independently for each regime :math:`j`, using regime-specific :math:`\mathbf{G}_j`. 3. **Collapse** the :math:`K` smoothed states using the smoothed regime probabilities. Dynamic Factor Models --------------------- A DFM with :math:`r` latent factors and :math:`m` observed variables: .. math:: \mathbf{f}_t &= \mathbf{G}_f\,\mathbf{f}_{t-1} + \boldsymbol{\omega}_t, \quad \boldsymbol{\omega}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{I}_r) \\ \mathbf{y}_t &= \boldsymbol{\Lambda}\,\mathbf{f}_t + \mathbf{e}_t, \quad \mathbf{e}_t \sim \mathcal{N}(\mathbf{0}, \mathbf{R}) where :math:`\boldsymbol{\Lambda} \in \mathbb{R}^{m \times r}` is the **loading matrix**, :math:`\mathbf{R}` is diagonal (idiosyncratic noise), and :math:`\mathbf{Q} = \mathbf{I}_r` for identification. **EM estimation** alternates: - **E-step:** Kalman filter + RTS smoother on the factor state space. - **M-step:** Update :math:`\boldsymbol{\Lambda}` and :math:`\mathbf{R}`: .. math:: \hat{\boldsymbol{\Lambda}} &= \left[\sum_t \mathbf{y}_t\,\mathbf{m}_t^{(s)\prime}\right] \left[\sum_t \left(\mathbf{C}_t^{(s)} + \mathbf{m}_t^{(s)}\,\mathbf{m}_t^{(s)\prime}\right)\right]^{-1} \\ \hat{R}_{ii} &= \frac{1}{T}\sum_t \left[(y_{t,i} - \hat{\boldsymbol{\Lambda}}_i\,\mathbf{m}_t^{(s)})^2 + \hat{\boldsymbol{\Lambda}}_i\,\mathbf{C}_t^{(s)}\,\hat{\boldsymbol{\Lambda}}_i^\prime\right] **Varimax rotation** is applied post-estimation for interpretability, maximizing the variance of squared loadings across factors. Bayesian Estimation ------------------- The log-posterior for a DLM with hyperparameters :math:`\boldsymbol{\psi}` (variance parameters) is: .. math:: \log p(\boldsymbol{\psi} \mid Y_{1:T}) \propto \underbrace{\log p(Y_{1:T} \mid \boldsymbol{\psi})}_{\text{Kalman filter log-likelihood}} + \underbrace{\log p(\boldsymbol{\psi})}_{\text{prior}} Dynaris uses NumPyro's NUTS (No-U-Turn Sampler) to draw posterior samples. Since the entire Kalman filter log-likelihood is differentiable via JAX autodiff, NUTS can efficiently explore the posterior using Hamiltonian dynamics. **Posterior predictive forecasting:** For each posterior sample :math:`\boldsymbol{\psi}^{(s)}`, run the Kalman filter and forecast. Aggregate across samples for credible intervals. **Model comparison** via WAIC and LOO-CV requires **pointwise** log-likelihoods :math:`\log p(Y_t \mid Y_{1:t-1}, \boldsymbol{\psi})`, which are the individual terms of the prediction error decomposition. References ---------- - West, M. and Harrison, J. (1997). *Bayesian Forecasting and Dynamic Models*, 2nd edition. Springer. - Durbin, J. and Koopman, S.J. (2012). *Time Series Analysis by State Space Methods*, 2nd edition. Oxford University Press. - Harvey, A.C. (1989). *Forecasting, Structural Time Series Models and the Kalman Filter*. Cambridge University Press. - Petris, G., Petrone, S., and Campagnoli, P. (2009). *Dynamic Linear Models with R*. Springer. - Julier, S.J. and Uhlmann, J.K. (2004). "Unscented Filtering and Nonlinear Estimation." *Proceedings of the IEEE*, 92(3), 401-422. - Doucet, A., de Freitas, N. and Gordon, N. (2001). *Sequential Monte Carlo Methods in Practice*. Springer. - Kim, C.-J. (1994). "Dynamic Linear Models with Markov-Switching." *Journal of Econometrics*, 60(1-2), 1-22. - Hamilton, J.D. (1989). "A New Approach to the Economic Analysis of Nonstationary Time Series and the Business Cycle." *Econometrica*, 57(2). - Stock, J.H. and Watson, M.W. (2002). "Forecasting Using Principal Components from a Large Number of Predictors." *JASA*, 97(460).