Biology AI Models Have a Blind Spot: They See Cells as Snapshots, Not Living Systems

Zara Mitchell · Apr 7, 2026 · 4 min read
Engine Score 5/10 — Notable

An important research limitation in biology AI models, but with a niche audience and technical scope.

  • Most foundation models used in computational biology treat single-cell transcriptomic data as static snapshots, ignoring the temporal dynamics that govern how cells change state, differentiate, and respond to drugs.
  • This limitation affects downstream applications including drug target discovery, disease progression modeling, and cell therapy development, where understanding cellular trajectories over time is critical.
  • Researchers are developing approaches like RNA velocity modeling and temporal variational autoencoders to incorporate dynamic information, but these methods have not yet been integrated into the largest foundation models.
  • Billions of dollars in AI biotech investment are riding on models that may be fundamentally incomplete in how they represent cellular biology.

What Happened

Recent research has drawn attention to a structural limitation in the foundation models now widely used in computational biology: they treat single-cell transcriptomes as frozen measurements rather than as points along continuous biological trajectories. Single-cell RNA sequencing (scRNA-seq) captures the gene expression profile of individual cells at one moment in time. Foundation models like Geneformer, developed by Christina Theodoris at Harvard, and scGPT, developed by Bo Wang’s lab at the University of Toronto, are trained on millions of these static profiles. But cells are not static objects. They are constantly transitioning between states, differentiating, dividing, and dying.

The result is that these models can classify cell types and predict gene regulatory networks from snapshots, but they struggle to capture the dynamics of how a healthy cell becomes cancerous, how an immune cell activates in response to infection, or how a stem cell differentiates into a specific tissue type.

Why It Matters

The pharmaceutical and biotech industries are investing heavily in AI-driven drug discovery platforms that rely on these foundation models. Companies like Recursion Pharmaceuticals, Insitro (founded by Daphne Koller), and Cellarity have collectively raised billions of dollars on the premise that AI can model cellular behavior well enough to identify drug targets and predict therapeutic outcomes. If the underlying models cannot represent how cells change over time, the predictions they generate may miss critical dynamics that determine whether a drug candidate succeeds or fails in clinical trials.

The problem is particularly acute in oncology research, where tumor evolution and drug resistance are inherently temporal processes. A model that sees a cancer cell only as a static gene expression profile cannot predict how that cell will evolve resistance to a targeted therapy over weeks or months. Similarly, in cell therapy development for conditions like type 1 diabetes or Parkinson’s disease, understanding the precise trajectory of stem cell differentiation is essential to producing the correct cell type at clinical scale.

Technical Details

Single-cell transcriptomics measures the messenger RNA present in individual cells, producing a vector of expression values across approximately 20,000-30,000 genes per cell. Modern scRNA-seq datasets can contain hundreds of thousands to millions of cells. Foundation models trained on these datasets use transformer or variational autoencoder architectures to learn representations of cell states. Geneformer, for example, was pre-trained on approximately 30 million single-cell transcriptomes from human tissues and has demonstrated strong performance on cell type classification and gene network inference tasks.
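The tokenization idea behind rank-based models like Geneformer can be illustrated with a toy sketch: order a cell's genes by expression and hand the resulting sequence to a transformer as tokens. This is a simplification for illustration only; the function name and the five-gene panel are ours, and real pipelines first normalize each gene's expression against per-gene medians computed across the whole training corpus.

```python
import numpy as np

def rank_value_encode(expression, gene_names, top_k=2048):
    """Order a cell's genes by expression so a transformer can treat
    the cell as a token sequence (rank-based encoding in the spirit of
    Geneformer). Simplified: no corpus-wide median normalization.
    """
    order = np.argsort(expression)[::-1]          # highest-expressed first
    order = order[expression[order] > 0][:top_k]  # drop unexpressed genes
    return [str(gene_names[i]) for i in order]

# Toy cell: raw counts for five genes
genes = np.array(["CD3E", "GAPDH", "ACTB", "INS", "MKI67"])
counts = np.array([0.0, 12.0, 7.0, 0.0, 3.0])
print(rank_value_encode(counts, genes))  # ['GAPDH', 'ACTB', 'MKI67']
```

Note what is lost here: the encoding captures one moment's expression ranking and nothing about where the cell is headed next, which is exactly the blind spot the article describes.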

The temporal blind spot arises because each cell in the training data is sequenced once and then destroyed in the process. There is no way to observe the same cell twice. Researchers have developed computational workarounds, most notably RNA velocity, a method introduced by Gioele La Manno and colleagues in a 2018 Nature paper. RNA velocity estimates the direction and speed of a cell’s state change by comparing the ratio of unspliced to spliced mRNA molecules. Tools like scVelo, developed by Volker Bergen at the Helmholtz Center Munich, extended this approach using dynamical modeling. However, these methods operate as post-hoc analysis layers rather than being integrated into the foundation model training process itself.
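The unspliced-to-spliced logic can be sketched in a few lines. This follows the steady-state model in spirit only: `steady_state_velocity` is an illustrative name of ours, the arrays are toy values for a single gene, and real tools like scVelo add normalization, neighbor smoothing, and a full dynamical model.

```python
import numpy as np

def steady_state_velocity(unspliced, spliced):
    """Steady-state RNA velocity estimate for one gene across cells,
    in the spirit of La Manno et al. (2018). Fit the degradation
    ratio gamma assuming most cells sit near equilibrium (u = gamma*s),
    then score each cell by its deviation from that line.
    """
    # Least-squares fit through the origin: u ~ gamma * s
    gamma = np.dot(unspliced, spliced) / np.dot(spliced, spliced)
    # Positive velocity: more unspliced mRNA than equilibrium predicts,
    # i.e. the gene is being induced; negative means it is shutting down.
    return unspliced - gamma * spliced

u = np.array([2.0, 4.0, 1.0])  # unspliced counts per cell (toy data)
s = np.array([1.0, 2.0, 2.0])  # spliced counts per cell
v = steady_state_velocity(u, s)
```

In this toy example the first two cells get positive velocity (the gene is being switched on) and the third gets negative velocity, which is precisely the directional information a static snapshot model never sees.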

More recent work on temporal variational autoencoders (temporal-VAE) and neural ordinary differential equations (neural ODEs) attempts to model cell state trajectories as continuous curves through gene expression space. These approaches have shown promise in small-scale studies but have not yet been scaled to the dataset sizes used by the largest foundation models.
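The neural ODE idea, treating a cell's trajectory as the solution of dx/dt = f(x) for a learned vector field f, reduces to integrating a small network forward in time. A minimal sketch with random placeholder weights (an actual model would train f so that integrated paths match observed snapshots, and would use an adaptive solver rather than fixed-step Euler):

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny MLP standing in for a learned vector field f(x) over a
# 4-dimensional toy expression space. Weights are random placeholders.
W1 = rng.normal(scale=0.1, size=(8, 4))
W2 = rng.normal(scale=0.1, size=(4, 8))

def f(x):
    return W2 @ np.tanh(W1 @ x)

def integrate(x0, dt=0.1, steps=50):
    """Forward-Euler integration: trace a continuous trajectory
    through expression space starting from cell state x0."""
    traj = [x0]
    x = x0
    for _ in range(steps):
        x = x + dt * f(x)
        traj.append(x)
    return np.stack(traj)

trajectory = integrate(np.ones(4))  # shape (steps + 1, 4)
```

The contrast with the snapshot paradigm is the point: here the model's output is a curve through state space, not a single embedding, so questions like "what does this cell become?" are answerable in principle.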

Who’s Affected

Drug discovery companies using AI foundation models for target identification and lead optimization are most directly affected. If their models cannot capture temporal dynamics, they risk pursuing drug targets that appear promising in static analysis but fail when cellular behavior unfolds over time. Academic researchers in developmental biology, immunology, and cancer biology who rely on these models for hypothesis generation may also be working with an incomplete picture. Investors in AI biotech companies should scrutinize whether the platforms they are funding have addressed this temporal limitation or are building on models that treat biology as a collection of photographs rather than a film.

What’s Next

Several research groups are working to close this gap. The Theis Lab at the Helmholtz Center Munich is developing next-generation trajectory inference methods designed to integrate with foundation model architectures. Meta AI’s protein science team and the Arc Institute, co-founded by Patrick Hsu, have signaled interest in building temporal awareness into biological foundation models. The critical challenge remains data: capturing true temporal measurements at single-cell resolution requires experimental techniques like lineage tracing and live-cell imaging that are orders of magnitude more expensive than standard scRNA-seq, limiting the training data available for temporally aware models.
