Medical Image Segmentation

MedSegAgent: A Universal and Scalable Multi-Agent System for Instructive Medical Image Segmentation

Orchestrating specialized segmentation models through natural language instructions, coarse-to-fine dataset matching, and multi-model result integration

Shanghai AI Laboratory in collaboration with Shanghai Jiao Tong University.


Published in IEEE Journal of Biomedical and Health Informatics (JBHI), 2026.

Figure: Overview of the MedSegAgent framework, comprising natural language query parsing, coarse-to-fine dataset matching (modality → anatomy → label), and final segmentation with rank-aware ensemble integration.

Medical image segmentation has seen remarkable advances with universal models like STU-Net and SAM-Med3D, yet no single model can cover the full diversity of clinical segmentation tasks across all modalities and anatomical targets. MedSegAgent takes a fundamentally different approach: instead of training one monolithic model, it orchestrates a library of specialized, dataset-specific models through a multi-agent system driven by natural language.

Given a free-form segmentation request such as "Please help me segment liver in this MR image", MedSegAgent parses the query to extract modality and target information, then applies three stages of coarse-to-fine filtering: modality filtering narrows the candidates from the full library, anatomy filtering identifies the relevant body region, and label selection pinpoints the exact segmentation target. The matched models are executed in parallel, and their outputs are integrated via a rank-aware ensemble strategy.
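To make the parsing step concrete, here is a minimal Python sketch that pulls a modality and a target out of a free-form request. It is a keyword-matching stand-in, not the method used by MedSegAgent, which presumably relies on language-model agents. The names `MODALITY_SYNONYMS`, `KNOWN_TARGETS`, `ParsedQuery`, and `parse_query` are all hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical synonym table. PET/CT is listed first so that a "PET/CT"
# request is not mistaken for plain CT.
MODALITY_SYNONYMS = {
    "PET/CT": ["pet/ct", "pet-ct", "pet"],
    "MRI": ["mri", "mr", "magnetic resonance"],
    "CT": ["ct", "computed tomography"],
    "Ultrasound": ["ultrasound"],
}

# Hypothetical, deliberately tiny target vocabulary for illustration.
KNOWN_TARGETS = ["liver", "kidney", "spleen", "pancreas"]


@dataclass
class ParsedQuery:
    modality: Optional[str]
    target: Optional[str]


def _mentions(term: str, text: str) -> bool:
    """True if `term` occurs in `text` as a whole word or phrase."""
    return re.search(rf"\b{re.escape(term)}\b", text) is not None


def parse_query(text: str) -> ParsedQuery:
    """Extract the first recognized modality and target from a request."""
    lowered = text.lower()
    modality = None
    for name, synonyms in MODALITY_SYNONYMS.items():
        if any(_mentions(s, lowered) for s in synonyms):
            modality = name
            break
    target = next((t for t in KNOWN_TARGETS if _mentions(t, lowered)), None)
    return ParsedQuery(modality=modality, target=target)


print(parse_query("Please help me segment liver in this MR image"))
# ParsedQuery(modality='MRI', target='liver')
```

A keyword matcher like this fails on paraphrases and misspellings, which is one reason a language-driven parser is attractive for this step.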

The current system integrates 23 datasets and supports 343 segmentation targets across CT, MRI, PET/CT, and ultrasound modalities. This architecture is inherently scalable: adding a new segmentation capability requires only registering a new dataset metadata entry and its trained model, with no retraining of the orchestration system.

Key Features

Universal & Scalable

Handles diverse medical image segmentation tasks through natural language instructions. Adding new modalities or targets requires only a JSON metadata entry and the corresponding trained model, with no retraining of the core system.
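As an illustration of what registering a new capability might look like, the following Python sketch validates a JSON metadata entry and adds it to an in-memory registry. The field names (`name`, `modalities`, `body_region`, `labels`, `model_path`), the `register_dataset` helper, and the model path are assumptions made for this example; the actual MedSegAgent schema may differ.

```python
import json

# Hypothetical required fields for a dataset metadata entry.
REQUIRED_FIELDS = {"name", "modalities", "body_region", "labels", "model_path"}


def register_dataset(registry: dict, entry_json: str) -> dict:
    """Parse a JSON metadata entry, validate it, and add it to the registry."""
    entry = json.loads(entry_json)
    missing = REQUIRED_FIELDS - set(entry)
    if missing:
        raise ValueError(f"metadata entry is missing fields: {sorted(missing)}")
    if entry["name"] in registry:
        raise ValueError(f"dataset {entry['name']!r} is already registered")
    registry[entry["name"]] = entry
    return entry


# Example entry based on the KiTS23 row of the dataset table.
# The model path is a placeholder, not a real location.
NEW_ENTRY = """
{
  "name": "KiTS23",
  "modalities": ["CT"],
  "body_region": "Abdomen",
  "labels": ["kidney", "renal tumor", "renal cyst"],
  "model_path": "models/kits23"
}
"""

registry = {}
register_dataset(registry, NEW_ENTRY)
print(sorted(registry))
```

Because the orchestration layer only reads this metadata, nothing in it needs to be retrained when an entry is added.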

Precise Automation

Coarse-to-fine filtering (modality → anatomy → label) automatically selects the most suitable segmentation model from the library, without manual intervention.
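The three filtering stages can be sketched as successive list filters over dataset metadata. This Python example is a simplification under stated assumptions: `DatasetEntry`, `LIBRARY`, and `match_models` are hypothetical names, the label tuples are small illustrative subsets rather than full label lists, and whole-body datasets are assumed to remain candidates for any requested region.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class DatasetEntry:
    name: str
    modalities: Tuple[str, ...]
    body_region: str
    labels: Tuple[str, ...]


# Illustrative library: rows follow the dataset table, labels are subsets.
LIBRARY = [
    DatasetEntry("TotalSegmentator MRI", ("MRI",), "Whole-body", ("liver", "spleen")),
    DatasetEntry("AMOS22", ("MRI", "CT"), "Abdomen", ("liver", "spleen")),
    DatasetEntry("KiTS23", ("CT",), "Abdomen", ("kidney", "renal tumor")),
    DatasetEntry("BraTS21", ("MRI",), "Head & neck", ("whole tumor",)),
]


def match_models(
    library: List[DatasetEntry], modality: str, region: str, label: str
) -> List[DatasetEntry]:
    """Narrow the library by modality, then anatomy, then label."""
    # Stage 1: keep datasets acquired with the requested modality.
    by_modality = [d for d in library if modality in d.modalities]
    # Stage 2: keep datasets covering the region (assumption: whole-body
    # datasets cover every region).
    by_anatomy = [
        d for d in by_modality if d.body_region in (region, "Whole-body")
    ]
    # Stage 3: keep datasets that actually contain the requested label.
    return [d for d in by_anatomy if label in d.labels]


matches = match_models(LIBRARY, modality="MRI", region="Abdomen", label="liver")
print([d.name for d in matches])
# ['TotalSegmentator MRI', 'AMOS22']
```

Filtering in this order discards most of the library on cheap attributes before the more specific label comparison runs.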

Enhanced Robustness

Multi-model integration and rank-aware ensembling improve reliability. When multiple candidate models match a query, their outputs are combined to reduce the impact of individual model failures.
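The page does not spell out how the rank-aware ensemble is computed, so the following Python sketch shows only one plausible reading: a weighted voxel-wise vote in which a model ranked r receives weight 1/r. The `rank_weighted_vote` function, the 1/r weighting, and the 0.5 threshold are assumptions for illustration, not the authors' formulation. Masks are flattened to plain lists to keep the example dependency-free.

```python
from typing import List, Sequence


def rank_weighted_vote(
    masks: Sequence[Sequence[int]], ranks: Sequence[int]
) -> List[int]:
    """Fuse binary masks by a vote weighted with 1/rank (rank 1 = most trusted)."""
    if not masks or len(masks) != len(ranks):
        raise ValueError("need one rank per mask and at least one mask")
    if any(len(m) != len(masks[0]) for m in masks):
        raise ValueError("all masks must have the same number of voxels")
    weights = [1.0 / r for r in ranks]
    total = sum(weights)
    fused = []
    for votes in zip(*masks):  # iterate voxel by voxel across models
        score = sum(w * v for w, v in zip(weights, votes)) / total
        fused.append(1 if score >= 0.5 else 0)
    return fused


# Three hypothetical model outputs over four voxels, ranked 1 to 3.
masks = [
    [1, 1, 0, 0],  # rank 1
    [1, 0, 1, 0],  # rank 2
    [0, 0, 1, 0],  # rank 3
]
fused = rank_weighted_vote(masks, ranks=[1, 2, 3])
print(fused)
# [1, 1, 0, 0]
```

In this toy case an unweighted majority vote would give [1, 0, 1, 0]; weighting by rank lets the top-ranked model outvote the two lower-ranked models on the second and third voxels.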

Supported Datasets (23 total)

Dataset	Modalities	Body Region	Representative Targets
TotalSegmentator v2	CT	Whole-body	117 structures (organs, vessels, bones, brain)
TotalSegmentator MRI	MRI	Whole-body	56 structures (organs, vessels, spine, muscles)
AutoPET	PET/CT	Whole-body	Whole-body tumor sites
SegRap2023	CT	Head & neck	45 OAR structures, GTVp, GTVnd
BraTS21	MRI	Head & neck	Whole tumor, tumor core, enhancing tumor
AMOS22	MRI, CT	Abdomen	15 abdominal and pelvic structures
MM-WHS	MRI, CT	Heart	Cardiac chambers, myocardium, great vessels
KiTS23	CT	Abdomen	Kidneys, renal tumors, renal cysts
+ 15 more datasets covering thorax, abdomen, head & neck regions…

Conclusion

MedSegAgent demonstrates that multi-agent orchestration offers a practical and scalable alternative to training ever-larger monolithic segmentation models. By decoupling language understanding from segmentation execution, it turns the growing ecosystem of specialized medical models into a unified, language-driven segmentation service. The system currently supports 23 datasets and 343 targets, and is designed so that every newly trained model expands its capabilities as soon as it is registered, with no retraining of the orchestration system.

Key Contributions

Proposed MedSegAgent, the first multi-agent system for instructive medical image segmentation driven by natural language, integrating 23 datasets and 343 segmentation targets.
Designed a coarse-to-fine dataset matching pipeline (modality → anatomy → label) that automatically selects the best segmentation model for any given query.
Introduced rank-aware ensemble integration that combines outputs from multiple matched models to improve segmentation robustness and reliability.
Built an extensible architecture where new segmentation capabilities can be added via a single JSON metadata entry, requiring no retraining of the orchestration system.

Authors

Ziyan Huang, Haoyu Wang, Jin Ye, Yuanfeng Ji, Xiaowei Hu, Lihao Liu, Zhikai Yang, Wei Li, Ming Hu, Yanzhou Su, Tianbin Li, Yun Gu, Shaoting Zhang, Yu Qiao, Lixu Gu, Junjun He
