[Long Read] nnU-Net V2: Environment Setup, Model Training, and Inference Tutorial

Original article · last modified 2025-10-25 20:04:53 · 313 views

Tags: #nnUnetV2 #nnUnet

First published 2025-10-25 19:53:19

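Questions and discussion are welcome; you can reach me through the WeChat contact card that the platform shows at the end of this post.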

1-Environment Setup

https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/installation_instructions.md

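1.1 Basic Requirements

Category	Requirements and notes
Operating system	Linux (Ubuntu 18.04/20.04/22.04; CentOS, RHEL), Windows, and macOS are supported.
Device support	GPU (recommended), CPU, and Apple M1/M2 are supported. Apple MPS does not currently implement 3D convolutions, so you may need to fall back to the CPU on these devices.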

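1.2 Hardware Requirements

Training

Component	Recommended configuration
Device	A GPU is recommended; training on a CPU or on MPS (Apple M1/M2) takes extremely long.
GPU	At least 10 GB of VRAM (common models: RTX 2080 Ti, RTX 3080/3090, or RTX 4080/4090).
CPU	A strong CPU with at least 6 cores (12 threads). The load depends on data augmentation, the number of input channels, and the number of target structures; the faster the GPU, the stronger the CPU should be.

Inference

Component	Recommended configuration
Device	A GPU is recommended and is much faster than the alternatives, but CPU and MPS also work.
GPU	At least 4 GB of free VRAM.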

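1.3 Installing the Environment

Use Python 3.10, PyTorch 2.1.2+cu118, and torchvision 0.16.2+cu118, then install batchgeneratorsv2, nnU-Net, and hiddenlayer from the downloaded zip archives:

# batchgeneratorsv2 source: https://github.com/MIC-DKFZ/batchgeneratorsv2
# Each install runs in a subshell so the working directory stays where the zip files are.
unzip batchgeneratorsv2-0.3.0.zip && (cd batchgeneratorsv2-0.3.0 && pip install .)
unzip nnUNet-master.zip && (cd nnUNet-master && pip install .)
unzip hiddenlayer-more_plotted_details.zip && (cd hiddenlayer-more_plotted_details && pip install .)
# Online alternative for hiddenlayer:
# pip install --upgrade git+https://github.com/FabianIsensee/hiddenlayer.git@more_plotted_details#egg=hiddenlayer

Note: if installing hiddenlayer online is difficult, download the zip archive and install from it as shown above.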
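
After installation it can help to confirm which versions actually ended up in the environment. The following is a minimal sketch using only the standard library. The distribution names in the list (torch, torchvision, nnunetv2, batchgeneratorsv2, hiddenlayer) are assumptions about how pip registered the packages and may need adjusting on your machine.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if it is missing."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust them if `pip list` shows something different.
    wanted = ["torch", "torchvision", "nnunetv2", "batchgeneratorsv2", "hiddenlayer"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:20s} {version or 'NOT INSTALLED'}")
```

Any package reported as NOT INSTALLED should be reinstalled before continuing with the data conversion steps.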

1.4 Code Modifications

Modification 1: https://github.com/MIC-DKFZ/nnUNet/issues/2742

Modification 2: https://github.com/MIC-DKFZ/nnUNet/issues/2735

# vim /root/miniconda3/lib/python3.10/site-packages/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py
# Upstream file: https://github.com/MIC-DKFZ/nnUNet/blob/master/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py
from torch.cuda.amp import GradScaler
# Line 163:
self.grad_scaler = GradScaler() if self.device.type == 'cuda' else None

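2-Data Conversion

https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/dataset_format.md

Data conversion consists of four steps: (1) arranging the directory structure, (2) generating dataset.json, (3) setting the environment variables, and (4) preprocessing the data.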

2.1 Converting the Directory Structure

https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/setting_up_paths.md

nnUNet_raw: create one subfolder per dataset, named DatasetXXX_YYY, where XXX is a three-digit identifier (e.g. 001, 002, 043, 999, ...) and YYY is the (unique) dataset name.

Create three folders, nnUNet_raw, nnUNet_preprocessed, and nnUNet_results:

cd /root/autodl-tmp && mkdir nnUNetV2
cd nnUNetV2 && mkdir nnUNet_raw && mkdir nnUNet_preprocessed && mkdir nnUNet_results

Example directory structure (top level: /root/autodl-tmp/nnUNetV2):
nnUNet_raw/Dataset001_NAME1
├── dataset.json
├── imagesTr
│   ├── ...
├── imagesTs
│   ├── ...
└── labelsTr
    ├── ...
nnUNet_raw/Dataset002_NAME2
├── dataset.json
├── imagesTr
│   ├── ...
├── imagesTs
│   ├── ...
└── labelsTr
    ├── ...

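nnUNet_preprocessed: the folder that holds the preprocessed data. During training, nnU-Net also reads its data from this folder.
nnUNet_results: the location where nnU-Net saves the model weights. Downloaded pretrained models are stored here as well.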
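
The folder layout described above can also be created from Python. This is a minimal sketch, not part of nnU-Net; the helper name make_nnunet_dirs and its arguments are hypothetical, and it only builds the empty skeleton (imagesTr, imagesTs, labelsTr) for one dataset.

```python
from pathlib import Path


def make_nnunet_dirs(base: str, dataset_id: int, dataset_name: str) -> Path:
    """Create the three top-level nnU-Net folders plus one raw dataset skeleton."""
    if not 1 <= dataset_id <= 999:
        raise ValueError("dataset_id must be a three-digit identifier (1-999)")
    root = Path(base)
    for top in ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results"):
        (root / top).mkdir(parents=True, exist_ok=True)
    # DatasetXXX_YYY: zero-padded three-digit ID followed by the dataset name.
    dataset_dir = root / "nnUNet_raw" / f"Dataset{dataset_id:03d}_{dataset_name}"
    for sub in ("imagesTr", "imagesTs", "labelsTr"):
        (dataset_dir / sub).mkdir(parents=True, exist_ok=True)
    return dataset_dir
```

For the dataset used later in this post, the call would be make_nnunet_dirs("/root/autodl-tmp/nnUNetV2", 500, "Lung").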

Within a dataset folder, the training inputs and ground-truth (GT) files look like this:

├── dataset.json # explained in the next step
├── imagesTr
│   ├── la_003_0000.nii.gz
│   ├── la_004_0000.nii.gz
│   ├── ...
├── imagesTs
│   ├── la_001_0000.nii.gz
│   ├── la_002_0000.nii.gz
│   ├── ...
└── labelsTr
    ├── la_003.nii.gz
    ├── la_004.nii.gz
    ├── ...

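imagesTr: the images of the training cases. nnU-Net uses this data for pipeline configuration, training with cross-validation, and finding the postprocessing and the best ensemble.
imagesTs (optional): the images of the test cases, used at inference time.
labelsTr: the ground-truth segmentation maps of the training cases.
dataset.json: the metadata of the dataset.

Note: imagesTr may contain several modalities per case, so you may see la_003_0000.nii.gz, la_003_0001.nii.gz, and so on, but labelsTr holds only a single la_003.nii.gz for that case. Likewise, when generating the JSON later, only the name la_003.nii.gz needs to be written.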
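
The naming rule in the note above (one label file per case, one image file per channel with a four-digit suffix) is easy to get wrong when renaming files by hand. The sketch below checks it with the standard library only; find_incomplete_cases is a hypothetical helper, not an nnU-Net function, and it assumes the channel suffixes start at _0000 and count upwards.

```python
from pathlib import Path
from typing import Dict, List


def find_incomplete_cases(dataset_dir: str, num_channels: int,
                          file_ending: str = ".nii.gz") -> Dict[str, List[str]]:
    """Return {case_id: [missing image file names]} for every label in labelsTr."""
    root = Path(dataset_dir)
    problems: Dict[str, List[str]] = {}
    for label in sorted((root / "labelsTr").glob(f"*{file_ending}")):
        case_id = label.name[:-len(file_ending)]
        # Expected image names: <case_id>_0000<ending>, <case_id>_0001<ending>, ...
        missing = [
            f"{case_id}_{channel:04d}{file_ending}"
            for channel in range(num_channels)
            if not (root / "imagesTr" / f"{case_id}_{channel:04d}{file_ending}").is_file()
        ]
        if missing:
            problems[case_id] = missing
    return problems
```

An empty result means every label has all of its expected image channels in imagesTr.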

2.2 Generating dataset.json

The reference script provided by the authors is https://github.com/MIC-DKFZ/nnUNet/blob/master/nnunetv2/dataset_conversion/generate_dataset_json.py

A slightly modified version for multi-modality data is shown below:

from typing import Tuple, Union, List
from batchgenerators.utilities.file_and_folder_operations import save_json, join


def generate_dataset_json(output_folder: str,
                          channel_names: dict,
                          labels: dict,
                          num_training_cases: int,
                          file_ending: str,
                          citation: Union[List[str], str] = None,
                          regions_class_order: Tuple[int, ...] = None,
                          dataset_name: str = None,
                          reference: str = None,
                          release: str = None,
                          description: str = None,
                          overwrite_image_reader_writer: str = None,
                          license: str = 'Whoever converted this dataset was lazy and didn\'t look it up!',
                          converted_by: str = "Please enter your name, especially when sharing datasets with others in a common infrastructure!",
                          **kwargs):
    """
    Generates a dataset.json file in the output folder

    channel_names:
        Channel names must map the index to the name of the channel, example:
        {
            0: 'T1',
            1: 'CT'
        }
        Note that the channel names may influence the normalization scheme!! Learn more in the documentation.

    labels:
        This will tell nnU-Net what labels to expect. Important: This will also determine whether you use region-based training or not.
        Example regular labels:
        {
            'background': 0,
            'left atrium': 1,
            'some other label': 2
        }
        Example region-based training:
        {
            'background': 0,
            'whole tumor': (1, 2, 3),
            'tumor core': (2, 3),
            'enhancing tumor': 3
        }

        Remember that nnU-Net expects consecutive values for labels! nnU-Net also expects 0 to be background!

    num_training_cases: is used to double check all cases are there!

    file_ending: needed for finding the files correctly. IMPORTANT! File endings must match between images and
    segmentations!

    dataset_name, reference, release, license, description: self-explanatory and not used by nnU-Net. Just for
    completeness and as a reminder that these would be great!

    overwrite_image_reader_writer: If you need a special IO class for your dataset you can derive it from
    BaseReaderWriter, place it into nnunet.imageio and reference it here by name

    kwargs: whatever you put here will be placed in the dataset.json as well

    """
    has_regions: bool = any([isinstance(i, (tuple, list)) and len(i) > 1 for i in labels.values()])
    if has_regions:
        assert regions_class_order is not None, f"You have defined regions but regions_class_order is not set. " \
                                                f"You need that."
    # channel names need strings as keys
    keys = list(channel_names.keys())
    for k in keys:
        if not isinstance(k, str):
            channel_names[str(k)] = channel_names[k]
            del channel_names[k]

    # labels need ints as values
    for l in labels.keys():
        value = labels[l]
        if isinstance(value, (tuple, list)):
            value = tuple([int(i) for i in value])
            labels[l] = value
        else:
            labels[l] = int(labels[l])

    dataset_json = {
        'channel_names': channel_names,  # previously this was called 'modality'. I didn't like this so this is
        # channel_names now. Live with it.
        'labels': labels,
        'numTraining': num_training_cases,
        'file_ending': file_ending,
        'licence': license,
        'converted_by': converted_by
    }

    if dataset_name is not None:
        dataset_json['name'] = dataset_name
    if reference is not None:
        dataset_json['reference'] = reference
    if release is not None:
        dataset_json['release'] = release
    if citation is not None:
        dataset_json['citation'] = citation
    if description is not None:
        dataset_json['description'] = description
    if overwrite_image_reader_writer is not None:
        dataset_json['overwrite_image_reader_writer'] = overwrite_image_reader_writer
    if regions_class_order is not None:
        dataset_json['regions_class_order'] = regions_class_order

    dataset_json.update(kwargs)

    save_json(dataset_json, join(output_folder, 'dataset.json'), sort_keys=False)


output_folder="/root/autodl-tmp/nnUNetV2/nnUNet_raw/Dataset500_Lung"
channel_names={"0": "CT"}
labels={'background': 0, 'cancer': 1}
num_training_cases=10
file_ending=".nii.gz"
dataset_name = "Lung"
generate_dataset_json(output_folder,
                          channel_names,
                          labels,
                          num_training_cases,
                          file_ending,
                          dataset_name=dataset_name)
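
The docstring above states two constraints on the labels: 0 must be the background and the label values must be consecutive. As a hedged sketch, the generated file can be checked against these rules with the standard library. check_dataset_json is a hypothetical helper, not part of nnU-Net, and it only covers the keys that the script above writes.

```python
import json
from typing import List


def check_dataset_json(path: str) -> List[str]:
    """Return a list of human-readable problems found in a dataset.json file."""
    with open(path, "r", encoding="utf-8") as f:
        meta = json.load(f)

    problems: List[str] = []
    for key in ("channel_names", "labels", "numTraining", "file_ending"):
        if key not in meta:
            problems.append(f"missing key: {key}")
    if problems:
        return problems

    # Collect every integer label value; region labels are stored as JSON lists.
    values = set()
    for value in meta["labels"].values():
        values.update(value if isinstance(value, list) else [value])
    if meta["labels"].get("background") != 0:
        problems.append("label 'background' must be 0")
    if sorted(values) != list(range(len(values))):
        problems.append(f"label values are not consecutive: {sorted(values)}")
    return problems
```

An empty list means none of these basic checks failed; it does not replace the --verify_dataset_integrity option of the preprocessing command.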

For single-modality data, save the script below as generate_json_SMode.py and run it:

python generate_json_SMode.py

import os
from typing import Tuple, Union, List
from batchgenerators.utilities.file_and_folder_operations import save_json, join, subfiles, maybe_mkdir_p

def generate_dataset_json(output_folder: str,
                          channel_names: dict,
                          labels: dict,
                          num_training_cases: int,
                          file_ending: str,
                          citation: Union[List[str], str] = None,
                          regions_class_order: Tuple[int, ...] = None,
                          dataset_name: str = None,
                          reference: str = None,
                          release: str = None,
                          description: str = None,
                          overwrite_image_reader_writer: str = None,
                          license: str = 'Whoever converted this dataset was lazy and didn\'t look it up!',
                          converted_by: str = "Please enter your name, especially when sharing datasets with others in a common infrastructure!",
                          **kwargs):
    """
    (The docstring is unchanged from the version above.)
    """
    # (The function body is unchanged from the version above.)
    has_regions: bool = any([isinstance(i, (tuple, list)) and len(i) > 1 for i in labels.values()])
    if has_regions:
        assert regions_class_order is not None, f"You have defined regions but regions_class_order is not set. You need that."
    keys = list(channel_names.keys())
    for k in keys:
        if not isinstance(k, str):
            channel_names[str(k)] = channel_names[k]
            del channel_names[k]
    for l in labels.keys():
        value = labels[l]
        if isinstance(value, (tuple, list)):
            value = tuple([int(i) for i in value])
            labels[l] = value
        else:
            labels[l] = int(labels[l])
    dataset_json = {
        'channel_names': channel_names,
        'labels': labels,
        'numTraining': num_training_cases,
        'file_ending': file_ending,
        'licence': license,
        'converted_by': converted_by
    }
    if dataset_name is not None:
        dataset_json['name'] = dataset_name
    if reference is not None:
        dataset_json['reference'] = reference
    if release is not None:
        dataset_json['release'] = release
    if citation is not None:
        dataset_json['citation'] = citation
    if description is not None:
        dataset_json['description'] = description
    if overwrite_image_reader_writer is not None:
        dataset_json['overwrite_image_reader_writer'] = overwrite_image_reader_writer
    if regions_class_order is not None:
        dataset_json['regions_class_order'] = regions_class_order
    dataset_json.update(kwargs)
    save_json(dataset_json, join(output_folder, 'dataset.json'), sort_keys=False)


if __name__ == '__main__':
    # --- 1. Set your paths and dataset information ---
    # !! Make sure these paths are correct !!
    output_folder = "/root/autodl-tmp/nnUNetV2/nnUNet_raw/Dataset500_Lung"
    imagesTr_folder = join(output_folder, "imagesTr")
    labelsTr_folder = join(output_folder, "labelsTr")

    channel_names = {"0": "CT"}
    labels = {'background': 0, 'cancer': 1}
    file_ending = ".nii.gz"
    dataset_name = "Lung"

    # --- 2. Scan the files and build the dataset dictionary ---
    dataset_dict = {}

    # Scan all label files in the labelsTr folder
    label_files = subfiles(labelsTr_folder, suffix=file_ending, join=False)

    for label_filename in label_files:
        # Extract the case identifier from the label file name (e.g. "lung_001" from "lung_001.nii.gz")
        identifier = label_filename[:-len(file_ending)]

        # Build the matching image file name. This script assumes the image has the same
        # name as the label, i.e. without a _0000 channel suffix.
        image_filename = f"{identifier}{file_ending}"

        # Only keep the case if the image actually exists in imagesTr
        if os.path.exists(join(imagesTr_folder, image_filename)):
            # Paths of the image and label, relative to the dataset folder
            image_path = f"./imagesTr/{image_filename}"
            label_path = f"./labelsTr/{label_filename}"

            # Add the case to the dictionary
            dataset_dict[identifier] = {
                "images": [image_path],  # must be a list even when there is only one image
                "label": label_path
            }

    num_training_cases = len(dataset_dict)

    # --- 3. Call the function to write the final dataset.json ---
    generate_dataset_json(output_folder,
                          channel_names,
                          labels,
                          num_training_cases,
                          file_ending,
                          dataset_name=dataset_name,
                          dataset=dataset_dict)  # pass in the dictionary built above

    print(f"Wrote dataset.json to '{output_folder}'.")
    print(f"Found {num_training_cases} matching training cases.")

2.3 Setting the Environment Variables

https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/set_environment_variables.md

Add the dataset-related environment variables:

export nnUNet_raw="/root/autodl-tmp/nnUNetV2/nnUNet_raw"
export nnUNet_preprocessed="/root/autodl-tmp/nnUNetV2/nnUNet_preprocessed"
export nnUNet_results="/root/autodl-tmp/nnUNetV2/nnUNet_results"

Alternatively, add them to ~/.bashrc so that they persist across sessions:

vim ~/.bashrc
# add the three export lines above
source ~/.bashrc
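
A missing or mistyped variable is easy to catch before running anything. The following minimal sketch checks that the three variables are set and point to existing directories; check_nnunet_env is a hypothetical helper written for this post, not an nnU-Net API.

```python
import os
from typing import Dict, Mapping

REQUIRED_VARS = ("nnUNet_raw", "nnUNet_preprocessed", "nnUNet_results")


def check_nnunet_env(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return {variable: status}, where status is 'ok', 'unset' or 'not a directory'."""
    status: Dict[str, str] = {}
    for name in REQUIRED_VARS:
        value = env.get(name)
        if not value:
            status[name] = "unset"
        elif not os.path.isdir(value):
            status[name] = "not a directory"
        else:
            status[name] = "ok"
    return status


if __name__ == "__main__":
    for name, state in check_nnunet_env().items():
        print(f"{name}: {state}")
```

Run it in the same shell that will later run the nnU-Net commands, since exported variables are only visible to that shell and its child processes.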

2.4 Data Preprocessing

Before preprocessing, the directory structure looks like this:

├── nnUNetV2
│       ├── nnUNet_preprocessed
│       ├── nnUNet_raw
│       │   └── Dataset500_Lung # the structure inside this folder is described in 2.1
│       └── nnUNet_results

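Run the following command. The number 500 is the dataset ID, i.e. the digits that follow "Dataset" in Dataset500_Lung:

nnUNetv2_plan_and_preprocess -d 500 --verify_dataset_integrity

Note: if preprocessing fails because the machine runs out of memory, reduce the number of preprocessing processes to 1 and preprocess each configuration separately:

nnUNetv2_plan_and_preprocess -d 500 -c 2d -np 1
nnUNetv2_plan_and_preprocess -d 500 -c 3d_fullres -np 1
nnUNetv2_plan_and_preprocess -d 500 -c 3d_lowres -np 1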

Preprocessing took about 15 minutes here.

File sizes: the raw data is 1.26 GB, and the preprocessed files total 3.2 GB.

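3-Model Training

Before training, reduce the number of epochs from 1000 to 500 by editing nnUNetTrainer.py:

https://github.com/MIC-DKFZ/nnUNet/blob/master/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py

The training/validation split is stored in nnUNetV2/nnUNet_preprocessed/Dataset500_Lung/splits_final.json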
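
To see how the cases are distributed without opening the file by hand, the split file can be summarized with a few lines of Python. This sketch assumes that splits_final.json is a JSON list with one entry per fold, each holding a "train" and a "val" list of case identifiers; summarize_splits is a hypothetical helper.

```python
import json
from typing import List, Tuple


def summarize_splits(path: str) -> List[Tuple[int, int]]:
    """Return one (num_train_cases, num_val_cases) tuple per fold in the split file."""
    with open(path, "r", encoding="utf-8") as f:
        splits = json.load(f)
    return [(len(fold["train"]), len(fold["val"])) for fold in splits]


if __name__ == "__main__":
    # Path taken from this post; adjust it to your own nnUNet_preprocessed folder.
    path = "/root/autodl-tmp/nnUNetV2/nnUNet_preprocessed/Dataset500_Lung/splits_final.json"
    try:
        for fold, (n_train, n_val) in enumerate(summarize_splits(path)):
            print(f"fold {fold}: {n_train} training / {n_val} validation cases")
    except FileNotFoundError:
        print(f"{path} not found; the split file does not exist yet")
```

Editing the "train" and "val" lists in this file is also how a custom split can be supplied, as long as the same structure is kept.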

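3.1 3D fullres Training

3.1.1 Training Commands

The walkthrough below uses fold 0; folds 1, 2, 3, and 4 work the same way.

In the commands, 500 comes from Dataset500_Lung and the trailing number is the fold index i of the K-fold cross-validation. The default is K = 5, so i takes values in [0, 4].

As an illustration, take 10 cases divided into K = 5 groups: [1,2] [3,4] [5,6] [7,8] [9,10]

i=0: train on [1,2,3,4,5,6,7,8], validate on [9,10]
i=1: train on [1,2,3,4,5,6,9,10], validate on [7,8]
i=2: train on [1,2,3,4,7,8,9,10], validate on [5,6]
i=3: train on [1,2,5,6,7,8,9,10], validate on [3,4]
i=4: train on [3,4,5,6,7,8,9,10], validate on [1,2]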
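
The listing above can be reproduced with a short sketch. It mirrors the illustration only: nnU-Net shuffles the cases before splitting, so the actual contents of splits_final.json will differ. illustrative_folds is a hypothetical helper, and the choice of validating on the last group first simply follows the order used in the listing.

```python
from typing import Dict, List, Sequence


def illustrative_folds(cases: Sequence[int], k: int = 5) -> List[Dict[str, List[int]]]:
    """Split cases into k equal groups; fold i validates on group k-1-i."""
    if len(cases) % k != 0:
        raise ValueError("this sketch assumes the number of cases is divisible by k")
    size = len(cases) // k
    groups = [list(cases[g * size:(g + 1) * size]) for g in range(k)]
    folds = []
    for i in range(k):
        val_group = k - 1 - i  # the listing validates on the last group first
        train = [c for g, group in enumerate(groups) if g != val_group for c in group]
        folds.append({"train": train, "val": groups[val_group]})
    return folds


if __name__ == "__main__":
    for i, fold in enumerate(illustrative_folds(list(range(1, 11)))):
        print(f"i={i}: train on {fold['train']}, validate on {fold['val']}")
```

Every case ends up in the validation set of exactly one fold, which is why all five folds have to be trained to get a validation prediction for each training case.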

nnUNetv2_train 500 3d_fullres 0 --npz
nnUNetv2_train 500 3d_fullres 1 --npz
nnUNetv2_train 500 3d_fullres 2 --npz
nnUNetv2_train 500 3d_fullres 3 --npz
nnUNetv2_train 500 3d_fullres 4 --npz

GPU memory usage during training is about 9 GB. Example log for fold 0:

nnUNetv2_train 500 3d_fullres 0 --npz

############################
INFO: You are using the old nnU-Net default plans. We have updated our recommendations. Please consider using those instead! Read more here: https://github.com/MIC-DKFZ/nnUNet/blob/master/documentation/resenc_p
############################

Using device: cuda:0

#######################################################################
Please cite the following paper when using nnU-Net:
Isensee, F., Jaeger, P. F., Kohl, S. A., Petersen, J., & Maier-Hein, K. H. (2021). nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature methods, 18(2), 203-211.
#######################################################################

2025-09-06 20:08:11.702268: Using torch.compile...
2025-09-06 20:08:13.751839: do_dummy_2d_data_aug: False
2025-09-06 20:08:13.752347: Using splits from existing split file: /root/autodl-tmp/nnUNetV2/nnUNet_preprocessed/Dataset500_Lung/splits_final.json
2025-09-06 20:08:13.752525: The split file contains 5 splits.
2025-09-06 20:08:13.752571: Desired fold for training: 0
2025-09-06 20:08:13.752606: This split has 8 training and 2 validation cases.
using pin_memory on device 0
using pin_memory on device 0

This is the configuration used by this training:
Configuration name: 3d_fullres
 {'data_identifier': 'nnUNetPlans_3d_fullres', 'preprocessor_name': 'DefaultPreprocessor', 'batch_size': 2, 'patch_size': [96, 160, 160], 'median_image_size_in_voxels': [277.0, 512.0, 512.0], 'spacing': [1.2449731, 0.7988280057907104, 0.7988280057907104], 'normalization_schemes': ['CTNormalization'], 'use_mask_for_norm': [False], 'resampling_fn_data': 'resample_data_or_seg_to_shape', 'resampling_fn_seg': 'resample_da_to_shape', 'resampling_fn_data_kwargs': {'is_seg': False, 'order': 3, 'order_z': 0, 'force_separate_z': None}, 'resampling_fn_seg_kwargs': {'is_seg': True, 'order': 1, 'order_z': 0, 'force_separate_z': None}, ng_fn_probabilities': 'resample_data_or_seg_to_shape', 'resampling_fn_probabilities_kwargs': {'is_seg': False, 'order': 1, 'order_z': 0, 'force_separate_z': None}, 'architecture': {'network_class_name': 'dynami_architectures.architectures.unet.PlainConvUNet', 'arch_kwargs': {'n_stages': 6, 'features_per_stage': [32, 64, 128, 256, 320, 320], 'conv_op': 'torch.nn.modules.conv.Conv3d', 'kernel_sizes': [[3, 3, 3], [3, 3,3, 3], [3, 3, 3], [3, 3, 3], [3, 3, 3]], 'strides': [[1, 1, 1], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [1, 2, 2]], 'n_conv_per_stage': [2, 2, 2, 2, 2, 2], 'n_conv_per_stage_decoder': [2, 2, 2, 2, 2], 'convrue, 'norm_op': 'torch.nn.modules.instancenorm.InstanceNorm3d', 'norm_op_kwargs': {'eps': 1e-05, 'affine': True}, 'dropout_op': None, 'dropout_op_kwargs': None, 'nonlin': 'torch.nn.LeakyReLU', 'nonlin_kwargs': ': True}}, '_kw_requires_import': ['conv_op', 'norm_op', 'dropout_op', 'nonlin']}, 'batch_dice': True}

These are the global plan.json settings:
 {'dataset_name': 'Dataset500_Lung', 'plans_name': 'nnUNetPlans', 'original_median_spacing_after_transp': [1.2449942827224731, 0.7988280057907104, 0.7988280057907104], 'original_median_shape_after_transp': [2942], 'image_reader_writer': 'SimpleITKIO', 'transpose_forward': [0, 1, 2], 'transpose_backward': [0, 1, 2], 'experiment_planner_used': 'ExperimentPlanner', 'label_manager': 'LabelManager', 'foreground_intensity_s_per_channel': {'0': {'max': 391.0, 'mean': -122.59545135498047, 'median': -2.0, 'min': -1024.0, 'percentile_00_5': -905.0, 'percentile_99_5': 197.0, 'std': 254.0592803955078}}}

2025-09-06 20:08:16.383001: Unable to plot network architecture: nnUNet_compile is enabled!
2025-09-06 20:08:16.423777:
2025-09-06 20:08:16.424186: Epoch 0
2025-09-06 20:08:16.424435: Current learning rate: 0.01
2025-09-06 20:10:34.441466: train_loss 0.0519
2025-09-06 20:10:34.442359: val_loss -0.0573
2025-09-06 20:10:34.442469: Pseudo dice [0.0]
2025-09-06 20:10:34.442565: Epoch time: 138.02 s
2025-09-06 20:10:34.442641: Yayy! New best EMA pseudo Dice: 0.0
2025-09-06 20:10:37.455573:
2025-09-06 20:10:37.455887: Epoch 1
2025-09-06 20:10:37.456042: Current learning rate: 0.0091
2025-09-06 20:11:32.261170: train_loss -0.1501
2025-09-06 20:11:32.261490: val_loss -0.3403
2025-09-06 20:11:32.261577: Pseudo dice [0.2837]
2025-09-06 20:11:32.261677: Epoch time: 54.81 s
2025-09-06 20:11:32.262234: Yayy! New best EMA pseudo Dice: 0.0284
2025-09-06 20:11:34.496846:
2025-09-06 20:11:34.497135: Epoch 2
2025-09-06 20:11:34.497264: Current learning rate: 0.00818
2025-09-06 20:12:29.591384: train_loss -0.4005
2025-09-06 20:12:29.591805: val_loss -0.587
2025-09-06 20:12:29.591911: Pseudo dice [0.7296]
2025-09-06 20:12:29.592026: Epoch time: 55.1 s
2025-09-06 20:12:29.592102: Yayy! New best EMA pseudo Dice: 0.0985
2025-09-06 20:12:32.079641:
2025-09-06 20:12:32.080288: Epoch 3
2025-09-06 20:12:32.080418: Current learning rate: 0.00725
2025-09-06 20:13:27.075207: train_loss -0.4856
2025-09-06 20:13:27.075572: val_loss -0.5756
2025-09-06 20:13:27.075674: Pseudo dice [0.6417]
2025-09-06 20:13:27.075785: Epoch time: 55.0 s
2025-09-06 20:13:27.075871: Yayy! New best EMA pseudo Dice: 0.1528
2025-09-06 20:13:29.470544:
2025-09-06 20:13:29.471013: Epoch 4
2025-09-06 20:13:29.471315: Current learning rate: 0.00631
2025-09-06 20:14:24.554845: train_loss -0.5572
2025-09-06 20:14:24.555499: val_loss -0.7106
2025-09-06 20:14:24.555609: Pseudo dice [0.7982]
2025-09-06 20:14:24.555726: Epoch time: 55.09 s
2025-09-06 20:14:24.555808: Yayy! New best EMA pseudo Dice: 0.2174
2025-09-06 20:14:27.035247:
2025-09-06 20:14:27.035626: Epoch 5
2025-09-06 20:14:27.035763: Current learning rate: 0.00536
2025-09-06 20:15:29.180974: train_loss -0.6041
2025-09-06 20:15:29.181327: val_loss -0.5621
2025-09-06 20:15:29.181425: Pseudo dice [0.5705]
2025-09-06 20:15:29.181531: Epoch time: 62.15 s
2025-09-06 20:15:29.181607: Yayy! New best EMA pseudo Dice: 0.2527
2025-09-06 20:15:31.479853:
2025-09-06 20:15:31.480425: Epoch 6
2025-09-06 20:15:31.480557: Current learning rate: 0.00438
2025-09-06 20:16:52.389704: train_loss -0.614
2025-09-06 20:16:52.390165: val_loss -0.5788
2025-09-06 20:16:52.390282: Pseudo dice [0.7144]
2025-09-06 20:16:52.390401: Epoch time: 80.91 s
2025-09-06 20:16:52.390483: Yayy! New best EMA pseudo Dice: 0.2989
2025-09-06 20:16:54.800908:
2025-09-06 20:16:54.801170: Epoch 7
2025-09-06 20:16:54.801300: Current learning rate: 0.00338
2025-09-06 20:18:18.236024: train_loss -0.6473
2025-09-06 20:18:18.236401: val_loss -0.636
2025-09-06 20:18:18.236505: Pseudo dice [0.7225]
2025-09-06 20:18:18.236631: Epoch time: 83.44 s
2025-09-06 20:18:18.236715: Yayy! New best EMA pseudo Dice: 0.3412
2025-09-06 20:18:20.692332:
2025-09-06 20:18:20.692897: Epoch 8
2025-09-06 20:18:20.693097: Current learning rate: 0.00235
2025-09-06 20:20:09.441580: train_loss -0.6382
2025-09-06 20:20:09.442209: val_loss -0.6178
2025-09-06 20:20:09.442320: Pseudo dice [0.6854]
2025-09-06 20:20:09.442439: Epoch time: 108.75 s
2025-09-06 20:20:09.442522: Yayy! New best EMA pseudo Dice: 0.3756
2025-09-06 20:20:11.829999:
2025-09-06 20:20:11.830412: Epoch 9
2025-09-06 20:20:11.830566: Current learning rate: 0.00126
2025-09-06 20:22:00.151048: train_loss -0.6891
2025-09-06 20:22:00.151394: val_loss -0.7709
2025-09-06 20:22:00.151526: Pseudo dice [0.9049]
2025-09-06 20:22:00.151630: Epoch time: 108.32 s
2025-09-06 20:22:00.151708: Yayy! New best EMA pseudo Dice: 0.4286
2025-09-06 20:22:03.420717: Training done.
2025-09-06 20:22:03.446447: Using splits from existing split file: /root/autodl-tmp/nnUNetV2/nnUNet_preprocessed/Dataset500_Lung/splits_final.json
2025-09-06 20:22:03.446766: The split file contains 5 splits.
2025-09-06 20:22:03.446835: Desired fold for training: 0
2025-09-06 20:22:03.446889: This split has 8 training and 2 validation cases.
2025-09-06 20:22:03.447048: predicting lung_001
2025-09-06 20:22:04.228852: lung_001, shape torch.Size([1, 244, 444, 444]), rank 0
2025-09-06 20:23:36.718254: predicting lung_014
2025-09-06 20:23:37.247420: lung_014, shape torch.Size([1, 296, 476, 476]), rank 0
2025-09-06 20:26:18.876790: Validation complete
2025-09-06 20:26:18.876952: Mean Validation Dice:  0.4860391693927178
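
Two columns of the log can be reproduced by hand, which helps when reading it. The sketch below is inferred from the printed numbers rather than taken from the post: the learning rates are consistent with a polynomial decay with exponent 0.9 over a 10-epoch run starting at 0.01, and the "EMA pseudo Dice" values are consistent with an exponential moving average that keeps 90% of the previous value. Both function names are hypothetical.

```python
from typing import Iterable, List


def poly_lr(epoch: int, num_epochs: int, initial_lr: float = 0.01,
            exponent: float = 0.9) -> float:
    """Polynomial learning-rate decay: initial_lr * (1 - epoch / num_epochs) ** exponent."""
    return initial_lr * (1 - epoch / num_epochs) ** exponent


def ema_pseudo_dice(per_epoch_dice: Iterable[float], alpha: float = 0.9) -> List[float]:
    """Running average: the first value is taken as-is, later ones are blended in."""
    history: List[float] = []
    for dice in per_epoch_dice:
        ema = dice if not history else alpha * history[-1] + (1 - alpha) * dice
        history.append(ema)
    return history


if __name__ == "__main__":
    # Pseudo Dice values copied from the log above (epochs 0 to 9).
    dice = [0.0, 0.2837, 0.7296, 0.6417, 0.7982, 0.5705, 0.7144, 0.7225, 0.6854, 0.9049]
    for epoch, ema in enumerate(ema_pseudo_dice(dice)):
        print(f"epoch {epoch}: lr={poly_lr(epoch, 10):.5f}  ema_dice={ema:.4f}")
```

With num_epochs=10 this gives 0.0091 at epoch 1, 0.00818 at epoch 2, and 0.00126 at epoch 9, matching the log; the EMA values agree with the log up to rounding, because the log prints the per-epoch Dice with only four decimals. The slow rise of the EMA also explains why "New best EMA pseudo Dice" is reported even in epochs where the per-epoch Dice drops.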

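3.2 2D Training

3.2.1 Training Commands

Shown for fold 0; folds 1, 2, 3, and 4 work the same way.

nnUNetv2_train 500 2d 0 --npz

3.3 Resuming Training

To resume an interrupted run, append --c to the training command:

nnUNetv2_train 500 2d 0 --npz --c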

4-Model Inference

cd /root/autodl-tmp/nnUNetV2/nnUNet_raw/Dataset500_Lung
nnUNetv2_predict -i imagesTr -o imagesTr_3d_fullres_output -c 3d_fullres -d 500
nnUNetv2_predict -i imagesTr -o imagesTr_2d_output -c 2d -d 500
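
When inference is scripted, for example to loop over several configurations, it can be convenient to assemble the command in Python. The sketch below only builds the argument list; build_predict_command is a hypothetical helper. The optional -f argument restricts prediction to specific folds, which is worth setting if not all five folds were trained, since nnUNetv2_predict otherwise expects trained models for folds 0 to 4.

```python
import shlex
from typing import List, Optional, Sequence


def build_predict_command(input_dir: str, output_dir: str, dataset_id: int,
                          configuration: str,
                          folds: Optional[Sequence[int]] = None) -> List[str]:
    """Assemble the nnUNetv2_predict argument list without running it."""
    cmd = ["nnUNetv2_predict", "-i", input_dir, "-o", output_dir,
           "-d", str(dataset_id), "-c", configuration]
    if folds:
        # Only the listed folds are used for prediction.
        cmd += ["-f", *[str(fold) for fold in folds]]
    return cmd


if __name__ == "__main__":
    for config in ("3d_fullres", "2d"):
        cmd = build_predict_command("imagesTr", f"imagesTr_{config}_output", 500, config)
        print(shlex.join(cmd))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) from the dataset folder.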

Appendix: FAQ

How to debug when training appears to "hang"

Edit https://github.com/MIC-DKFZ/nnUNet/blob/master/nnunetv2/training/nnUNetTrainer/nnUNetTrainer.py at line 989 and insert two print statements, one printing "The train step start!" before the forward pass and one printing "The step finished!" after the loss computation.

The modified code looks like this:

with autocast(self.device.type, enabled=True) if self.device.type == 'cuda' else dummy_context():
    print("The train step start!", flush=True)  # flush so the message shows up immediately
    output = self.network(data)
    # del data
    l = self.loss(output, target)
    print("The step finished!", flush=True)