News
November 2025 Paper
MedQ-Deg released — benchmarking MLLM robustness under medical image degradations
We release MedQ-Deg, a multidimensional benchmark evaluating 40 mainstream MLLMs across 18 degradation types, 30 capability dimensions, and 7 imaging modalities (24,894 QA pairs). We reveal the AI Dunning-Kruger Effect — models maintain high confidence despite severe accuracy collapse under degraded inputs.
October 2025 Paper
MedQ-Bench: New benchmark for medical image quality assessment in MLLMs
We release MedQ-Bench, a novel perception-reasoning benchmark for medical image quality assessment covering 5 modalities and 40+ quality attributes (3,308 samples). Zero-shot evaluation of 14 leading MLLMs shows GPT-4 achieves 68.97% accuracy, still 13.5% below expert performance.
September 2025 Paper
UniMedVL: First unified model for medical image understanding and generation
We introduce UniMedVL, the first medical multimodal model that unifies image understanding and generation in a single architecture. Built on 5.6M multimodal samples and Progressive Curriculum Learning, UniMedVL achieves strong results on 5 understanding benchmarks and 8 generation modalities.
September 2025 Join
Welcoming new visiting researchers to GMAI Lab
We welcome Wenhao Tang (Nankai University), Shujian Gao (Fudan University), and Jiashi Lin (Northwestern Polytechnical University) as new visiting researchers, bringing expertise in computational pathology, multimodal learning, and LLM-based agents.
August 2025 Paper
Survey of Scientific LLMs released — collaboration with 20+ global institutions
We release a comprehensive survey of Scientific Large Language Models (Sci-LLMs), produced in collaboration with 20+ leading global institutions. The survey covers 1,000+ papers and 600+ key datasets, and proposes a roadmap for AI-assisted scientific discovery ecosystems.
July 2025 Paper
OphCLIP accepted at ICCV 2025 — hierarchical surgical video-language pre-training
OphCLIP, our hierarchical retrieval-augmented framework for ophthalmic surgical video understanding, was accepted at ICCV 2025. Built on OphVL (375K+ video-text pairs), OphCLIP sets new records on 11 benchmarks for phase recognition and multi-instrument detection.
June 2025 Paper
Ophora accepted at MICCAI 2025 as Oral Presentation
Our text-guided ophthalmic surgical video generation model Ophora was accepted at MICCAI 2025 as an oral presentation. Ophora is trained on 160K video-text pairs and outperforms existing methods on FID, FVD, and CLIPScore metrics.
June 2025 Paper
RetinaLogos, ProgEmu, and MRI Translation accepted at MICCAI 2025
Three papers accepted at MICCAI 2025: RetinaLogos (language-driven high-resolution fundus image generation), ProgEmu (interpretable counterfactual medical image generation), and Multi-modal MRI Translation via Evidential Regression and Distribution Calibration.
May 2025 Paper
MedITok: First unified visual tokenizer for medical image synthesis and interpretation
We release MedITok, the first unified visual tokenizer designed for medical images. Pre-trained on 30M+ images, MedITok achieves SOTA across reconstruction, classification, generation, and VQA tasks spanning 9 imaging modalities and 30+ datasets.
May 2025 Paper
MedSegAgent accepted at IEEE Journal of Biomedical and Health Informatics
MedSegAgent, our multi-agent system for instructive medical image segmentation, was accepted at the IEEE Journal of Biomedical and Health Informatics (JBHI). The system supports 343 segmentation targets across CT, MRI, PET/CT, and ultrasound without training a single, monolithic universal model.
February 2025 Paper
SlideChat accepted at CVPR 2025
Our whole-slide pathology image understanding assistant SlideChat was accepted at CVPR 2025. SlideChat achieves 81.17% accuracy on SlideBench-VQA and surpasses the state of the art on 18 of 22 benchmark tasks.
December 2024 Paper
GMAI-MMBench presented at NeurIPS 2024 — evaluating 50 large vision-language models
GMAI-MMBench, the most comprehensive general medical AI evaluation platform to date, was presented at NeurIPS 2024. It covers 284 datasets, 38 imaging modalities, and 18 clinical tasks. Even the top-performing model, GPT-4o, achieves only 53.96% accuracy.
November 2024 Paper
GMAI-VL released — general medical VLM trained on 5.5M image-text pairs
We release GMAI-VL, a general-purpose medical vision-language model trained on GMAI-VL-5.5M, a dataset of 5.5M high-quality image-text pairs spanning 18 clinical specialties and 10+ imaging modalities. GMAI-VL matches or surpasses SOTA on multiple medical multimodal VQA and diagnostic reasoning benchmarks.
October 2024 Award
SAM-Med3D selected as Oral at ECCV 2024 Biomedical Image Computing Workshop
SAM-Med3D was selected as an oral presentation at the ECCV 2024 Biomedical Image Computing (BIC) Workshop. SAM-Med3D adapts SAM to 3D volumetric medical images, covering 247 segmentation categories across 21K medical volumes.
June 2024 Paper
OmniMedVQA accepted at CVPR 2024 — large-scale medical VQA benchmark
OmniMedVQA was accepted at CVPR 2024. The benchmark integrates 73 datasets across 12 imaging modalities and 20+ anatomical regions, revealing that many medical-specific models surprisingly underperform general-purpose LVLMs on medical VQA tasks.