Foundation Model Survival Analysis

This work, an ongoing MSc dissertation at the University of Warwick supervised by Matheus F. Torquato, investigates an unexplored intersection: whether frozen time-series foundation models (FMs) can serve as viable feature extractors for survival-analytic remaining useful life (RUL) estimation, bypassing the fine-tuning paradigm entirely.

Setup

The experimental design pairs three pre-trained foundation models (MOMENT, Chronos, and Moirai) with six survival analysis heads (CoxPH, DeepHit, MTLR, CoxTime, DSM, Weibull). All FM weights remain frozen; the survival heads receive raw embeddings with no task-specific adaptation of the encoder.

Evaluation spans three benchmark datasets: C-MAPSS and N-CMAPSS (simulated turbofan degradation trajectories) and XJTU-SY (accelerated bearing run-to-failure vibration records). This gives a combinatorial matrix of 3 FMs × 6 heads × multiple dataset splits × 3 seeds, or several hundred independent training runs.

Frozen foundation model pipeline: sensor time-series → FM embeddings (frozen weights) → survival head → S(t) curve

Why Survival Analysis?

The dominant prognostics paradigm treats RUL as a point regression problem, collapsing the full temporal uncertainty into a scalar estimate. Survival analysis recovers what regression discards: the complete conditional distribution over failure time, giving not a single point prediction but a survival function $$S(t \mid \mathbf{x})$$ from which any quantile, confidence band, or decision threshold can be derived.

For fleet-level maintenance scheduling under asymmetric cost structures, this distributional output is operationally essential.
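As a minimal sketch of how a decision threshold can be read off a survival curve, the snippet below finds the earliest grid time at which $$S(t \mid \mathbf{x})$$ drops to a chosen level. The function name `survival_quantile`, the time grid, and the curve values are hypothetical and purely illustrative; they are not results from this study.

```python
import numpy as np

def survival_quantile(times, surv, q):
    """Earliest grid time at which S(t) falls to or below 1 - q.

    times: increasing 1-D grid of time points (e.g. cycles).
    surv:  S(t | x) evaluated on that grid, non-increasing in t.
    q:     failure probability of interest, e.g. 0.5 for the median.
    """
    times = np.asarray(times, dtype=float)
    surv = np.asarray(surv, dtype=float)
    below = np.nonzero(surv <= 1.0 - q)[0]
    # If the curve never drops that far, the quantile lies beyond the grid.
    return times[below[0]] if below.size else np.inf

# Hypothetical survival curve for one engine (illustrative numbers only).
times = np.array([0, 20, 40, 60, 80, 100])
surv = np.array([1.0, 0.97, 0.85, 0.55, 0.20, 0.03])

median_rul = survival_quantile(times, surv, 0.5)     # 50% failure probability
early_warning = survival_quantile(times, surv, 0.1)  # 10% failure probability
```

A conservative maintenance policy would act on the low-probability quantile, while a point regressor exposes only something like the median.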

Survival curves S(t) for four engine trajectories — full probability distribution over failure time, with 90% confidence band

Results

Three results anchor the empirical contribution:

  • C-index of 0.949 on N-CMAPSS (Chronos + CoxPH), indicating near-perfect discriminative ranking of failure risk across heterogeneous flight regimes
  • RMSE of 11.4 cycles on C-MAPSS FD003 (Chronos + CoxPH), competitive with state-of-the-art regression-only methods that directly minimise MSE, despite the survival objective optimising a fundamentally different criterion
  • 8.5× error reduction relative to Dinten et al. (CMES, 2025) on multi-condition C-MAPSS subsets, attributable to a corrected normalisation scheme detailed below
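For readers unfamiliar with the C-index quoted above, the following is a minimal sketch of Harrell's concordance index in plain NumPy. It is a quadratic-time reference version written for clarity, not the evaluation code used in this work, and the function name is an assumption.

```python
import numpy as np

def concordance_index(event_times, risk_scores, event_observed):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when unit i has an observed failure
    strictly before the time recorded for unit j. It is concordant when
    the earlier failure has the higher risk score; score ties count half.
    """
    t = np.asarray(event_times, dtype=float)
    r = np.asarray(risk_scores, dtype=float)
    e = np.asarray(event_observed, dtype=bool)
    concordant, comparable = 0.0, 0
    for i in range(len(t)):
        if not e[i]:
            continue  # censored units cannot anchor a comparable pair
        for j in range(len(t)):
            if t[i] < t[j]:
                comparable += 1
                if r[i] > r[j]:
                    concordant += 1.0
                elif r[i] == r[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")
```

A value of 1.0 means every comparable pair is ordered correctly and 0.5 corresponds to chance-level ranking, which is the scale on which the figures above should be read.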

To the best of our knowledge, this is the first investigation coupling frozen foundation model embeddings with survival analysis heads in the prognostics literature.

Normalisation

The most consequential finding concerns preprocessing rather than architecture.

MOMENT exhibited near-chance discriminative performance (C-index ≈ 0.5) on the multi-condition C-MAPSS subsets (FD002, FD004) while performing strongly on single-condition splits, a discrepancy too systematic to attribute to the encoder itself.

The root cause was global normalisation, which conflated operating regimes and superimposed distinct degradation manifolds into an indistinct feature cloud. Stratifying normalisation per engine and per operating condition recovered 22 percentage points of C-index, lifting chance-level output to competitive discrimination. What appeared to be a model-level failure was, in fact, a preprocessing artifact.
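The difference between global and stratified normalisation can be sketched as follows. This is an illustrative reconstruction, not the project's preprocessing code: the function name `groupwise_zscore`, the integer encoding of each (engine, operating condition) pair into a single group label, and the toy readings are all assumptions.

```python
import numpy as np

def groupwise_zscore(x, groups, eps=1e-8):
    """Standardise each sensor column separately within each group.

    x:      (n_samples, n_sensors) raw sensor readings.
    groups: (n_samples,) label per row, e.g. an encoded
            (engine, operating condition) pair (assumed encoding).
    """
    x = np.asarray(x, dtype=float)
    groups = np.asarray(groups)
    out = np.empty_like(x)
    for g in np.unique(groups):
        mask = groups == g
        mu = x[mask].mean(axis=0)
        sigma = x[mask].std(axis=0)
        out[mask] = (x[mask] - mu) / (sigma + eps)
    return out

# Toy example: one sensor, two regimes with very different baselines
# but the same underlying drift (illustrative numbers only).
x = np.array([[100.0], [102.0], [104.0], [500.0], [502.0], [504.0]])
groups = np.array([0, 0, 0, 1, 1, 1])

z_global = (x - x.mean(axis=0)) / x.std(axis=0)  # regime offset dominates
z_group = groupwise_zscore(x, groups)            # drift is comparable across regimes
```

In the toy data the two regimes share the same drift, yet global scaling leaves them as two separated clusters in which that drift is barely visible. After group-wise scaling the two trajectories coincide. In practice the group statistics would presumably be estimated on training data only.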

Which Head Wins?

When the input representation is a frozen, non-adapted embedding space, survival heads are not interchangeable.

CoxPH was the most consistently reliable head across all FM–dataset combinations. Its Cox partial likelihood requires only correct relative risk ordering: it is agnostic to the marginal distribution of the embedding space and demands only that higher-risk units yield monotonically higher hazard scores. This invariance to distributional form makes it well-suited to frozen representations whose geometry was shaped by unrelated pre-training objectives.
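The invariance described above can be made concrete with a small sketch of the Cox negative log partial likelihood. This is a textbook-style NumPy version that assumes no tied event times, and is not the training loss used in this work; the function name and toy data are hypothetical. The loss depends on the scores only through their differences and on the failure times only through their order.

```python
import numpy as np

def cox_neg_log_partial_likelihood(scores, times, events):
    """Negative log partial likelihood, assuming no tied event times.

    scores: (n,) log-risk scores, e.g. a linear map of frozen embeddings.
    times:  (n,) observed times.
    events: (n,) 1 if failure observed, 0 if censored.
    """
    scores = np.asarray(scores, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    order = np.argsort(-times)  # longest-surviving unit first
    s = scores[order]
    # Running log-sum-exp over the risk set {j : t_j >= t_i}.
    log_risk_set = np.logaddexp.accumulate(s)
    return -np.sum((s - log_risk_set)[events[order]])

# Toy data (illustrative only): earlier failures carry higher scores.
scores = np.array([2.0, 1.0, 0.0, -1.0])
times = np.array([5.0, 10.0, 15.0, 20.0])
events = np.array([1, 1, 0, 1])

base = cox_neg_log_partial_likelihood(scores, times, events)
shifted = cox_neg_log_partial_likelihood(scores + 5.0, times, events)
warped = cox_neg_log_partial_likelihood(scores, np.log(times), events)
reversed_ = cox_neg_log_partial_likelihood(scores[::-1], times, events)
```

Shifting every score by a constant or applying a monotone transform to the time axis leaves the loss unchanged, whereas reversing the risk ordering increases it. No parametric form for the failure-time distribution enters the objective.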

Conversely, Weibull and DSM heads exhibited systematic degeneracy. Both impose strong parametric assumptions on the conditional hazard, assumptions that foundation model embeddings, shaped by masked reconstruction or forecasting losses, have no reason to satisfy. The Weibull head collapsed to identical survival curves for every unit across all FM–dataset pairs, a pathology consistent with the embedding manifold lying outside the family of distributions the head can represent.
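A collapse of this kind is straightforward to detect. The sketch below assumes a standard two-parameter Weibull survival function with per-unit scale and shape; the exact parameterisation used by the head in this study is not stated here, so treat it as an assumption. It flags the case where all predicted curves coincide. Function names and parameter values are hypothetical.

```python
import numpy as np

def weibull_survival(t, scale, shape):
    """S(t) = exp(-(t / scale) ** shape) for a two-parameter Weibull."""
    return np.exp(-(t / scale) ** shape)

def curves_collapsed(surv_matrix, tol=1e-6):
    """True if all rows (one survival curve per unit) agree within tol."""
    spread = surv_matrix.max(axis=0) - surv_matrix.min(axis=0)
    return bool(spread.max() < tol)

t = np.linspace(0.0, 100.0, 11)

# Two units with different predicted scales: curves differ, as they should.
healthy = weibull_survival(t, scale=np.array([[60.0], [90.0]]), shape=2.0)

# Degenerate head: identical parameters for every unit, regardless of input.
collapsed = weibull_survival(t, scale=np.array([[75.0], [75.0]]), shape=2.0)
```

Here `curves_collapsed(healthy)` is false and `curves_collapsed(collapsed)` is true. A check like this on held-out predictions separates a head that ranks poorly from one that has stopped using its input altogether.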

Survival head performance ranking on N-CMAPSS (Chronos) — CoxPH leads at 0.949 C-index; Weibull and DSM degenerate

Cross-Domain Transfer: Engines → Bearings

The stronger test of a universal feature extractor is cross-domain transfer: whether embeddings learned from one signal type generalise to physically distinct degradation mechanisms without adaptation.

On XJTU-SY accelerated bearing degradation data, the framework transfers effectively. Moirai + CoxPH achieved a C-index of 0.828, exceeding a from-scratch LSTM+DeepHit baseline (0.607) by 21 percentage points. Notably, the FM performance ranking inverts: Moirai dominates on bearing vibration, whereas Chronos leads on turbofan operating profiles. This inversion suggests that pre-training corpus composition and architectural inductive biases interact non-trivially with downstream signal characteristics.

XJTU-SY bearing vibration waveform with accelerating wear — Moirai+CoxPH C-index 0.828 vs LSTM+DeepHit baseline 0.607 (+21 pp)

What’s Next

First, Chronos-2 (October 2025) introduces cross-series attention, potentially resolving the channel-independent embedding bottleneck that limits multivariate sensor fusion in the current pipeline. Second, end-to-end fine-tuning of the FM encoder under a survival objective would test whether task-adapted representations can close the remaining gap to fully supervised prognostic models, though the frozen paradigm’s minimal computational overhead and reproducibility remain compelling practical benefits.
