Junjun He
Principal Investigator

Junjun He leads the GMAI (General Medical AI) research group. He received his PhD from Shanghai Jiao Tong University (SJTU), advised by Prof. Lixu Gu, and conducted research with Prof. Yu Qiao at the Multimedia Lab (MMLAB) of the Shenzhen Institute of Advanced Technology (SIAT), Chinese Academy of Sciences (CAS). His research interests span dense prediction (medical image segmentation, object detection, instance segmentation), efficient deep learning (model compression, NAS, quantization), and general medical AI, including multimodal large language models, segmentation foundation models, clinical AI systems, and biomedical data infrastructure.

About Me

Junjun He is a Young Scientist at Shanghai Artificial Intelligence Laboratory, a Full-Time Mentor at the Shanghai Innovation Institute (SII), and an Adjunct PhD Supervisor at Fudan University. His current research focuses on multimodal understanding, multimodal generation, unified multimodal understanding and generation, and multi-agent systems, together with their applications in medicine.

He has over 11,000 Google Scholar citations and an h-index of 46, and was named to Stanford University’s list of the World’s Top 2% Scientists. He was shortlisted for the MICCAI 2025 Best Paper and Young Scientist Awards and received the MICCAI 2025 Best Workshop Paper Award. He has also won more than 10 awards in international challenges, including 6 championships.

As head of the General Medical AI (GMAI) team at Shanghai AI Laboratory, he leads the team in building and open-sourcing large-scale benchmark datasets and high-performance models for medical AI. Representative results include the 3D medical image pretraining model STU-Net; the medical image segmentation foundation models SAM-Med2D and SAM-Med3D; the large-scale systematic medical multimodal benchmarks OmniMedVQA and GMAI-MMBench; the general medical multimodal large model GMAI-VL; the gigapixel pathology whole-slide image (WSI) multimodal large model SlideChat; the large-scale fundus photography generation model RetinaLogos; and the ophthalmic surgical video generation model Ophora. Recently, the team’s open-source Project Imaging-X, a large-scale medical imaging data survey and open sharing platform, has drawn wide attention both in China and abroad.

He has also contributed to major projects including the general multimodal large model InternVL, the scientific multimodal large model Intern-S1, and the unified understanding-and-generation model Lumina-DiMOO.

Research Interests

  • Medical Multimodal LLMs: GMAI-VL series, UniMedVL, SlideChat
  • Medical Segmentation Foundation Models: SAM-Med2D, SAM-Med3D, STU-Net (14M–1.4B params)
  • Clinical AI Systems: MedSegAgent multi-agent segmentation, surgical video understanding (OphCLIP, Ophora)
  • Medical Data Infrastructure: Project Imaging-X (1,000+ open medical imaging datasets)
  • Efficient Deep Learning: model compression, NAS, quantization

Highlights

The GMAI group's main results in general medical AI fall into three areas:

Medical Multimodal LLMs: GMAI-VL, a world-leading medical vision-language model, is trained on 5.5M image-text pairs spanning 18 clinical specialties and 38 imaging modalities. SlideChat (CVPR 2025) is the first vision-language assistant able to understand gigapixel whole-slide pathology images. GMAI-VL-R1 uses reinforcement learning to improve average accuracy by about 30% across eight modalities, surpassing models 36× larger.

Medical Segmentation Foundation Models: SAM-Med3D extends the SAM architecture to 3D medical imaging and is among the most widely adopted open-source models in the field. The STU-Net family scales from 14M to 1.4B parameters, which makes its largest variant the largest medical segmentation model to date. STU-Net achieves a 90.06% mean Dice similarity coefficient (DSC) on TotalSegmentator and won the MICCAI 2023 ATLAS and SPPIN challenges.

Clinical AI Systems: OphCLIP (ICCV 2025) introduced an ophthalmic surgical dataset of 375K video-text pairs and achieves state-of-the-art zero-shot performance on 11 benchmarks. MedSegAgent (IEEE JBHI 2026) orchestrates multi-agent collaboration for universal medical image segmentation across 23 datasets and 343 targets.

Academic Overview

  • Published at top venues including CVPR, ICCV, ECCV, NeurIPS, MICCAI, AAAI
  • Open-source projects with thousands of GitHub stars
  • Collaboration interest from Stanford University and other leading institutions
  • Led the team to championships in multiple MICCAI 2023 challenges

Selected Early Works

  • APCNet: Adaptive Pyramid Context Network for Semantic Segmentation (CVPR 2019)
  • Dynamic Multi-scale Filters for Semantic Segmentation (ICCV 2019)
  • EfficientFCN: Holistically-guided Decoding for Semantic Segmentation (ECCV 2020; 4 ECCV papers in the same year)
  • ODIR-2019 Competition: 1st place in Ocular Disease Intelligent Recognition (Rank 1/1500+)

Professional Service

  • Reviewer for CVPR, MICCAI, ICME