
Generative Modeling Reveals the Connection between Cellular Morphology and Gene Expression

COSMIC (Cross-modal generation between Single-cell RNA-seq and MICroscopy images) is a bidirectional generative framework that quantitatively links single-cell nuclear morphology and gene expression. It captures the bidirectional flow of information between cellular form and gene expression, opening new avenues for mechanistic discovery and predictive modeling in both basic and translational cell biology.

How transcriptional programs give rise to cellular morphology, and how morphological features reflect and influence cell identity and function, remains poorly understood. This is due in part to the lack of large-scale datasets pairing the two modalities and to the absence of computational frameworks capable of modeling their cross-modal structure.

We introduce COSMIC, a bidirectional generative framework that quantifies how much transcriptional variance is reflected in morphology and how much morphological variance is explained by gene expression. COSMIC accurately models cell type identity as well as continuous dynamics such as cell-cycle progression, establishing a quantitative link between morphological phenotypes and underlying gene expression.

Publication

Generative Modeling Reveals the Connection between Cellular Morphology and Gene Expression
Shuo Wen, Ramon Viñas Torné, Johannes Bues, Camille Lucie Lambert, Nadia Grenningloh, Timothée Ferrari, Elisa Bugani, Joern Pezoldt, Jillian Rose Love, Wouter Karthaus, Bart Deplancke, and Maria Brbić

@article {Wen2026.01.22.700673,
	author = {Wen, Shuo and Vi{\~n}as Torn{\'e}, Ramon and Bues, Johannes and Lambert, Camille Lucie and Grenningloh, Nadia and Ferrari, Timoth{\'e}e and Bugani, Elisa and Pezoldt, Joern and Love, Jillian Rose and Karthaus, Wouter and Deplancke, Bart and Brbi{\'c}, Maria},
	title = {Generative modeling reveals the connection between cellular morphology and gene expression},
	year = {2026},
	URL = {https://www.biorxiv.org/content/early/2026/01/24/2026.01.22.700673},
	journal = {bioRxiv}
}

Overview of COSMIC

COSMIC is built on two key ingredients:
(1) a foundation model of nuclear morphology pretrained on over 21 million segmented nuclei, and
(2) conditional diffusion models trained on paired single-cell images and transcriptomes generated using the IRIS platform.
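This page does not give the diffusion formulation COSMIC uses. As a rough illustration of what training a conditional diffusion model typically involves, the sketch below assumes the standard DDPM noise-prediction objective with a linear beta schedule. The function names, the schedule parameters, and the `denoiser(x_t, t, condition)` signature are all hypothetical, and plain Python lists stand in for image tensors and transcriptome embeddings.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=0.02):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    alpha_bars = []
    running = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / max(num_steps - 1, 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, alpha_bar, rng):
    """Sample x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [
        math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * e
        for x, e in zip(x0, eps)
    ]
    return xt, eps


def noise_prediction_loss(denoiser, x0, condition, t, alpha_bars, rng):
    """Mean squared error between the sampled noise and the denoiser's estimate.

    `denoiser` is a hypothetical callable taking (x_t, t, condition); in a
    transcriptome-to-image model the condition would be the expression
    profile, and in the reverse direction it would be an image embedding.
    """
    xt, eps = add_noise(x0, alpha_bars[t], rng)
    eps_hat = denoiser(xt, t, condition)
    return sum((e - p) ** 2 for e, p in zip(eps, eps_hat)) / len(eps)


if __name__ == "__main__":
    rng = random.Random(0)
    bars = linear_alpha_bars(1000)
    toy_image = [0.2, -0.4, 0.9, 0.0]      # stand-in for flattened pixels
    toy_condition = [1.5, 0.0, 3.2]        # stand-in for an expression profile
    zero_denoiser = lambda xt, t, cond: [0.0] * len(xt)  # untrained placeholder
    print(noise_prediction_loss(zero_denoiser, toy_image, toy_condition, 500, bars, rng))
```

In a real implementation the denoiser would be a neural network and the loss would be averaged over batches and randomly sampled timesteps. The only point here is that the same objective serves both translation directions once the data and the conditioning signal are swapped.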

By learning to translate from transcriptomes to nuclear images and from nuclear images to transcriptomes, COSMIC captures how transcriptional programs manifest in cellular form and how morphological variation reflects underlying gene expression. This unified framework enables generation, prediction, and discovery of morphology-associated genes at single-cell resolution.

COSMIC generates realistic nuclear images from gene expression

COSMIC accurately synthesizes high-resolution nuclear images conditioned on single-cell transcriptomic profiles. Generated images closely match real nuclei in size, shape, and texture, and preserve cell-type-specific morphology. In the embedding space, real and generated nuclei overlap strongly.

Generated images retain sufficient biological signal to support accurate cell type classification, approaching the performance achieved on real microscopy images. COSMIC generalizes to unseen batches, new donors, and even across species.

COSMIC predicts transcriptomic profiles from nuclear morphology

In the reverse direction, COSMIC infers biologically meaningful transcriptomic profiles directly from nuclear images. Predicted transcriptomes recover the global structure of gene expression space, preserve cell-type separation, and enable accurate downstream cell-type classification.

At the gene level, COSMIC identifies a subset of genes whose expression can be robustly predicted from morphology, including cell-type marker genes with high correlation to ground truth. These results demonstrate that nuclear morphology encodes precise information about specific transcriptional programs.
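A gene-level evaluation of this kind can be sketched as a per-gene Pearson correlation between measured and predicted expression across cells. The page does not state the exact metric or cutoff, so Pearson correlation and the threshold of 0.5 are assumptions, and the gene names and values below are toy placeholders, not results from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def per_gene_correlation(true_expr, pred_expr, gene_names):
    """Correlate each gene across cells.

    true_expr and pred_expr are lists of cells, each a list of per-gene values
    in the same gene order as gene_names.
    """
    scores = {}
    for g, name in enumerate(gene_names):
        measured = [cell[g] for cell in true_expr]
        predicted = [cell[g] for cell in pred_expr]
        scores[name] = pearson(measured, predicted)
    return scores


def well_predicted_genes(scores, threshold=0.5):
    """Genes at or above the (assumed) threshold, best first; NaN scores are dropped."""
    kept = [name for name, r in scores.items() if r >= threshold]
    return sorted(kept, key=lambda name: -scores[name])


if __name__ == "__main__":
    genes = ["gene_a", "gene_b"]                      # hypothetical gene names
    measured = [[1, 5], [2, 3], [3, 4], [4, 1]]       # 4 cells x 2 genes
    predicted = [[1.1, 2], [1.9, 2], [3.2, 2], [3.9, 2]]
    scores = per_gene_correlation(measured, predicted, genes)
    print(well_predicted_genes(scores))
```

Genes whose prediction is constant across cells get an undefined correlation and are excluded instead of being scored as zero, which keeps uninformative predictions from being mistaken for weak but real signal.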

COSMIC captures continuous cell-cycle dynamics

Beyond discrete cell types, COSMIC learns continuous biological processes. In mouse fibroblasts, COSMIC accurately recovers cell-cycle dynamics from both modalities. Predicted transcriptomes reproduce phase-specific expression patterns of canonical cell-cycle genes, while generated nuclear images reflect expected morphological changes across the cell cycle, such as systematic variation in nuclear size.

COSMIC identifies morphology-associated genes in cancer

Applied to prostate cancer cells treated with chemotherapy, COSMIC uncovers genes whose expression is tightly coupled to nuclear morphology. Using a cycle-consistent generative strategy, COSMIC identifies morphology-associated genes linked to treatment response and cell-cycle arrest.
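The page does not describe how the cycle-consistent strategy works. One plausible reading, sketched below purely as an assumption, is to pass each transcriptome through the image generator and back through the expression predictor, then rank genes by how well they survive the round trip. The two translator callables, the mean-squared-error score, and the gene names are all hypothetical stand-ins for the trained models and real data.

```python
def round_trip(transcriptomes, rna_to_image, image_to_rna):
    """Map each transcriptome to an image and back to a transcriptome."""
    return [image_to_rna(rna_to_image(x)) for x in transcriptomes]


def per_gene_round_trip_error(original, reconstructed):
    """Mean squared difference per gene between original and round-trip values."""
    n_cells = len(original)
    n_genes = len(original[0])
    return [
        sum((original[c][g] - reconstructed[c][g]) ** 2 for c in range(n_cells)) / n_cells
        for g in range(n_genes)
    ]


def rank_genes_by_cycle_consistency(gene_names, errors):
    """Gene names ordered from lowest to highest round-trip error."""
    return [name for name, _ in sorted(zip(gene_names, errors), key=lambda pair: pair[1])]


if __name__ == "__main__":
    # Toy translators: the "image" keeps only the first gene, so the second
    # gene cannot be recovered on the way back.
    toy_rna_to_image = lambda x: [x[0]]
    toy_image_to_rna = lambda img: [img[0], 0.0]

    genes = ["gene_a", "gene_b"]  # hypothetical gene names
    cells = [[1.0, 2.0], [3.0, 0.5], [2.0, 4.0]]
    recon = round_trip(cells, toy_rna_to_image, toy_image_to_rna)
    errors = per_gene_round_trip_error(cells, recon)
    print(rank_genes_by_cycle_consistency(genes, errors))
```

Raw squared error favors genes with low variance, so a practical version would likely normalize each gene's error by its variance or use a correlation-based score before calling a gene morphology-associated.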

Code

A PyTorch implementation of COSMIC is available on GitHub.

Contributors

The following people contributed to this work:

Shuo Wen

Ramon Viñas Torné

Johannes Bues

Camille Lucie Lambert

Nadia Grenningloh

Timothée Ferrari

Elisa Bugani

Joern Pezoldt

Jillian Rose Love

Wouter Karthaus

Bart Deplancke

Maria Brbić