UKBOB: A Massive MRI Dataset for Advancing Medical Image Segmentation

A Milestone in Medical Image Segmentation: UKBOB, a Dataset with Over a Billion MRI Masks

Progress in medical image analysis is limited by how hard it is to collect large amounts of high-quality annotated data. Data privacy concerns, logistical hurdles, and the high cost of manual labeling all stand in the way. A new, extensive dataset called UK Biobank Organs and Bones (UKBOB) aims to close this gap and open up new possibilities for developing generalizable 3D segmentation models.

UKBOB: Scope and Creation

UKBOB is built on the UK Biobank MRI dataset and comprises 51,761 3D MRI scans, corresponding to 17.9 million 2D images. It contains over 1.37 billion 2D segmentation masks covering 72 organs. Labels at this scale were produced automatically rather than by hand, and a purpose-built automated cleaning process with organ-specific filters is used to control their quality. In addition, a subset of 300 MRI scans was manually annotated with 11 abdominal classes (UKBOB-manual) to validate the accuracy of the automatic annotation.
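Agreement between an automatic label map and a manual one is commonly measured with the Dice overlap score. The following is a minimal sketch of that metric, not the authors' evaluation code: the function name `dice_score`, the toy label maps, and the class IDs are hypothetical, and the article does not state which metric was used on UKBOB-manual.

```python
def dice_score(pred, target, label):
    """Dice overlap for one class between two equally sized, flattened label maps."""
    if len(pred) != len(target):
        raise ValueError("pred and target must contain the same number of voxels")
    pred_hits = sum(1 for p in pred if p == label)
    target_hits = sum(1 for t in target if t == label)
    overlap = sum(1 for p, t in zip(pred, target) if p == label and t == label)
    denom = pred_hits + target_hits
    # Convention assumed here: a class absent from both maps counts as perfect agreement.
    return 1.0 if denom == 0 else 2.0 * overlap / denom


# Toy flattened label maps; IDs are hypothetical (0 = background, 1 and 2 = two organs).
automatic = [0, 1, 1, 1, 2, 2, 0, 0]
manual = [0, 1, 1, 0, 2, 2, 2, 0]

for label in (1, 2):
    print(label, dice_score(automatic, manual, label))
```

A score of 1.0 means the two masks coincide exactly and 0.0 means they do not overlap at all. In practice the score would be computed per organ on full 3D volumes and then averaged.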

Validation and Application

The generated masks were validated by testing zero-shot generalization: models trained on the filtered UKBOB dataset were evaluated, without further fine-tuning, on smaller manually annotated datasets from similar domains, such as abdominal MRI scans. The results confirmed the high quality of the UKBOB labels. To further reduce the influence of potentially noisy labels, the authors developed a new method called Entropy Test-time Adaptation (ETTA), which refines the segmentation output.
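The article does not describe how ETTA works internally. As a rough illustration only, the sketch below computes the quantity that entropy-based test-time adaptation methods generally try to reduce: the mean Shannon entropy of the per-voxel class probabilities. The function names and the toy logits are hypothetical, and no adaptation step is shown.

```python
import math


def softmax(logits):
    """Turn one voxel's class logits into probabilities (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs):
    """Shannon entropy in nats; zero-probability classes contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mean_entropy(voxel_logits):
    """Average prediction entropy over a list of per-voxel logit vectors."""
    return sum(entropy(softmax(v)) for v in voxel_logits) / len(voxel_logits)


# Toy logits for two voxels and three classes (hypothetical values).
confident = [[6.0, 0.0, 0.0], [0.0, 5.0, 0.0]]
uncertain = [[0.2, 0.1, 0.0], [0.0, 0.0, 0.0]]

print(mean_entropy(confident), mean_entropy(uncertain))
```

Low entropy means the model commits to one class per voxel, while high entropy signals ambiguous predictions. An adaptation method in this family would update some model parameters at test time to push this value down on the incoming scan.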

Swin-BOB: A Powerful Foundation Model

Using UKBOB, the authors trained a foundation model called Swin-BOB. It is based on the Swin-UNETR architecture and achieved state-of-the-art results on several 3D medical imaging benchmarks. For example, Swin-BOB achieved a 0.4% improvement on the BraTS brain MRI tumor challenge and a 1.3% improvement on the BTCV abdominal CT scan benchmark.

Outlook and Availability

UKBOB represents a significant advance in the field of medical image segmentation. The dataset and pre-trained models offer researchers and developers new opportunities to develop innovative AI solutions for medical diagnostics and therapy. The filtered labels are to be made available through the UK Biobank. The code and pre-trained models are already available online.

Bibliography:
Bourigault, E., Jamaludin, A., & Hamdi, A. (2025). UKBOB: One Billion MRI Labeled Masks for Generalizable 3D Medical Image Segmentation. arXiv preprint arXiv:2504.06908.
https://arxiv.org/abs/2504.06908
https://arxiv.org/html/2504.06908v1
https://www.researchgate.net/publication/390638501_UKBOB_One_Billion_MRI_Labeled_Masks_for_Generalizable_3D_Medical_Image_Segmentation
https://www.themoonlight.io/review/ukbob-one-billion-mri-labeled-masks-for-generalizable-3d-medical-image-segmentation
https://www.youtube.com/watch?v=pxhqze2Gv5U
https://www.reddit.com/r/ElvenAINews/comments/1jwgm3x/250406908_ukbob_one_billion_mri_labeled_masks_for/
https://www.researchgate.net/publication/377596529_Segment_anything_in_medical_images
https://threedmedprint.biomedcentral.com/articles/10.1186/s41205-025-00254-1
https://www.themoonlight.io/zh/review/ukbob-one-billion-mri-labeled-masks-for-generalizable-3d-medical-image-segmentation