InstructCell: An AI Copilot for Single-Cell Analysis Using Natural Language


The analysis of single-cell RNA sequencing (scRNA-seq) data is considered key to understanding complex biological processes. It provides detailed insights into the gene expression of individual cells and enables the identification of cell types, cell states, and their interactions. Traditional analysis methods, however, are often complex and time-consuming. With InstructCell, a research team now presents a new approach: a multimodal AI copilot that simplifies single-cell analysis through natural language.

InstructCell is based on large language models (LLMs), which are known for their ability to interpret complex natural language instructions and perform diverse tasks. The researchers treat scRNA-seq data as the "language of cell biology" and leverage the strengths of LLMs to understand and process this language. At its core, InstructCell combines the power of LLMs with a multimodal architecture that can interpret text instructions and scRNA-seq profiles simultaneously.
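To make this idea concrete, the following is a minimal sketch, not the authors' implementation, of how a text instruction and a gene-expression profile could be mapped into a shared input for a language model. The functions encode_cell_profile and embed_instruction, the random projections, and the vector sizes are all illustrative assumptions.

```python
# Minimal sketch (not InstructCell's actual code): fusing a text instruction
# with a single-cell expression profile into one joint input vector.
import numpy as np

RNG = np.random.default_rng(0)

def encode_cell_profile(expression: np.ndarray, dim: int = 64) -> np.ndarray:
    """Project a gene-expression vector into a fixed-size embedding.
    A random linear projection stands in here for a learned cell encoder."""
    projection = RNG.normal(size=(expression.shape[0], dim))
    return np.log1p(expression) @ projection  # log-transform counts, then project

def embed_instruction(text: str, dim: int = 64) -> np.ndarray:
    """Toy stand-in for an LLM's text embedding."""
    return RNG.normal(size=dim)

# One cell profile (counts over ~2,000 genes) plus a natural-language instruction
cell = RNG.poisson(lam=1.0, size=2000).astype(float)
instruction = "Annotate the cell type of this profile from mouse pancreas tissue."

# Both modalities are mapped into a shared space and passed to the model jointly
joint_input = np.concatenate([embed_instruction(instruction), encode_cell_profile(cell)])
print(joint_input.shape)  # (128,) -- one fused input for the language model
```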

InstructCell was trained on an extensive multimodal dataset that pairs text instructions with scRNA-seq profiles from various tissues and species. This approach allows the model to learn the relationship between natural language descriptions and the underlying gene expression patterns. By integrating both modalities, InstructCell can perform complex tasks such as cell type annotation, conditional generation of pseudo-cells, and drug sensitivity prediction using simple natural language commands.
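The sketch below illustrates, with hypothetical field names and toy data, how instruction-profile pairs for the three task types mentioned above could be organized for instruction tuning; it is not the published dataset format.

```python
# Hypothetical layout of instruction-profile training pairs for the three task
# types: cell type annotation, conditional pseudo-cell generation, and drug
# sensitivity prediction. Field names and examples are illustrative only.
from dataclasses import dataclass
from typing import Optional
import numpy as np

@dataclass
class InstructionExample:
    instruction: str                   # natural-language task description
    expression: Optional[np.ndarray]   # scRNA-seq profile (None for pure generation prompts)
    target: str                        # expected answer or generated output

rng = np.random.default_rng(1)

examples = [
    # Cell type annotation: profile in, label out
    InstructionExample(
        instruction="Which cell type does this profile from human PBMC data represent?",
        expression=rng.poisson(1.0, size=2000).astype(float),
        target="CD8+ T cell",
    ),
    # Conditional pseudo-cell generation: description in, synthetic profile out
    InstructionExample(
        instruction="Generate a pseudo-cell resembling a mouse hepatocyte.",
        expression=None,
        target="<synthetic expression profile>",
    ),
    # Drug sensitivity prediction: profile in, sensitivity label out
    InstructionExample(
        instruction="Is this cell likely sensitive or resistant to cisplatin?",
        expression=rng.poisson(1.0, size=2000).astype(float),
        target="sensitive",
    ),
]

print(f"{len(examples)} training examples across 3 task types")
```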

The developers of InstructCell evaluated their model extensively. The results show that InstructCell matches or even surpasses existing single-cell foundation models while also adapting to different experimental conditions. A crucial advantage of InstructCell is its intuitive usability: researchers can perform complex data analyses without in-depth programming skills or specialized bioinformatics expertise. This lowers technical barriers and allows a broader audience to gain biological insights from single-cell data.
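In practice, interacting with such a copilot might look like the following sketch. The CopilotStub class, its ask method, and the file name pbmc_10k.h5ad are purely illustrative assumptions, not the actual InstructCell interface.

```python
# Illustrative usage sketch: querying a chat-style analysis assistant in plain
# language instead of writing analysis code. The interface below is hypothetical.
class CopilotStub:
    """Stand-in object that mimics a natural-language analysis assistant."""
    def ask(self, prompt: str, data_path: str) -> str:
        # A real system would load the single-cell dataset, encode the cells,
        # and let the language model produce the answer.
        return f"(answer for '{prompt}' on {data_path})"

copilot = CopilotStub()
reply = copilot.ask(
    "Annotate the cell types in this dataset and list the three most common ones.",
    data_path="pbmc_10k.h5ad",  # hypothetical example file
)
print(reply)
```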

The development of InstructCell fits into the growing interest in AI-powered solutions for biomedical research. Similar approaches, such as PathChat, a multimodal AI assistant for pathology, or multimodal generative AI models for medical image analysis, highlight the potential of AI to simplify complex data evaluations and accelerate research. Mindverse, a German company specializing in the development of customized AI solutions, is following these developments with great interest and sees the integration of multimodal AI systems like InstructCell into its platform as a promising opportunity to provide researchers and developers with powerful tools for analyzing biological data. By combining AI-powered text generation, image analysis, and research tools, Mindverse aims to advance biomedical research as a holistic AI partner.

Outlook

InstructCell represents an important step towards more accessible and efficient single-cell analysis. The ability to control complex analyses via natural language opens up new possibilities for biomedical research. Future developments could include the integration of InstructCell into existing bioinformatics platforms and the expansion of the model to other data types and analysis tasks. The ongoing development of AI-powered tools like InstructCell promises to revolutionize biomedical research and deepen our understanding of complex biological systems.

Bibliography

Browaeys, R., Saelens, W. & Saeys, Y. NicheNet: modeling intercellular communication by linking ligands to target genes. Nat Methods (2019).
Bonnardel et al. Stellate Cells, Hepatocytes, and Endothelial Cells Imprint the Kupffer Cell Identity on Monocytes Colonizing the Liver Macrophage Niche. Immunity (2019).
Guilliams et al. Spatial proteogenomics reveals distinct and evolutionarily conserved hepatic macrophage niches. Cell (2022).
Lu, M. Y. et al. A multimodal generative AI copilot for human pathology. Nature 634, 466–473 (2024).
OpenAI. GPT-4 Technical Report. (2023).
Schaefer, M. et al. Multimodal learning of transcriptomes and text enables interactive single-cell RNA-seq data exploration with natural-language chats. bioRxiv 2024.10.15.618501 (2024).
Szałata, A. et al. Transformers in single-cell omics: a review and new perspectives. Nat Methods 21, 1430–1443 (2024).
Xiao, Y. et al. CellAgent: An LLM-driven Multi-Agent Framework for Automated Single-cell Data Analysis. arXiv preprint arXiv:2407.09811 (2024).
Zhang, G. et al. A Multimodal Vision-text AI Copilot for Brain Disease Diagnosis and Medical Imaging. medRxiv 2025.01.09.25320293 (2025).
Li, B. et al. MMedAgent: Learning to Use Medical Tools with Multi-modal Agent. arXiv preprint arXiv:2407.02483 (2024).