[Radiomics with pyradiomics] I think I've finally got the hang of it!!!

Original article, last modified 2023-11-30 12:01:16


This content is licensed under CC 4.0 BY-SA.


Tags

#python #object-detection #artificial-intelligence #data-mining

First published 2023-11-29 19:33:52

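This post shows how to extract medical image features with PyRadiomics, including deployment with Docker, pulling the image, and running it in a container with docker-compose. I also share how I dealt with slow computation and input handling, how the different modes are used, and what I learned along the way.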

Here is the source code: pyradiomics. If you're interested, dig into it yourself; there is a lot to learn from it.

A quick introduction to feature extraction

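I borrowed a figure from the web and don't know who made it, so no offense intended. Leave a comment and I'll add the credit right away.
I think it comes from the paper below. I haven't traced the original source, so take this as a reference for now: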

(Visvikis D et al. Artificial intelligence, machine (deep) learning and radio(geno)mics: definitions and nuclear medicine imaging applications. Eur J Nucl Med Mol Imaging, 2019;46(13):2630-37.)

[Figure: feature extraction overview]

There are two ways to run the computation:

1 Using the Docker image

1.1 Pull the official image

docker pull radiomics/pyradiomics:latest

Write a simple docker-compose.yml to run radiomics

The docker-compose.yml looks like this:

version: "2.3"
services:
  radiomics:
    image: radiomics/pyradiomics:latest
    build:
      context: .
    volumes:
      - /media/deepocean/Data/DICOMS:/media/deepocean/Data/DICOMS  # mount the volume where your own data is stored
    ports:
      - 5000:5000
    environment:
      - TZ=Asia/Shanghai

Then start the container and open a shell inside it

# Start the container
docker-compose up -d
# Open a shell in the container
docker exec -it {container_id} bash
# Go to the directory that holds the data and run an example
(base) jovyan@fc56f1d0c86b:/media/deepocean/Data/DICOMS/demos/ct_heart$ pyradiomics image.nrrd mask.nrrd -p /media/deepocean/Data/DICOMS/demos/Projects/imageProcessing/utils/Params.yaml -j 3 -o ./result.json --log-file="./radio.log" --label=1 --logging-level=DEBUG

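After the run finishes, my directory structure looks like this

[Figure: directory listing after the run]

For other ways to call it, see the argument reference below

I only tried single mode. I haven't tried batch mode yet and will add notes here once I do. You can also look at the official examples to see how it is done.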

(base) jovyan@fc56f1d0c86b:/media/deepocean/Data/DICOMS/demos/ct_heart$ pyradiomics -h
usage: pyradiomics image|batch [mask] [Options]

optional arguments:
  -h, --help            show this help message and exit
  --label N, -l N       (DEPRECATED) Value of label in mask to use for
                        feature extraction.
  --version             Print version and exit

Input:
  Input files and arguments defining the extraction:
  - image and mask files (single mode) or CSV-file specifying them (batch mode)
  - Parameter file (.yml/.yaml or .json)
  - Overrides for customization type 3 ("settings")
  - Multi-threaded batch processing

  {Image,Batch}FILE     Image file (single mode) or CSV batch file (batch mode)
  MaskFILE              Mask file identifying the ROI in the Image. 
                        Only required when in single mode, ignored otherwise.
  --param FILE, -p FILE
                        Parameter file containing the settings to be used in extraction
  --setting "SETTING_NAME:VALUE", -s "SETTING_NAME:VALUE"
                        Additional parameters which will override those in the
                        parameter file and/or the default settings. Multiple
                        settings possible. N.B. Only works for customization
                        type 3 ("setting").
  --jobs N, -j N        (Batch mode only) Specifies the number of threads to use for
                        parallel processing. This is applied at the case level;
                        i.e. 1 thread per case. Actual number of workers used is
                        min(cases, jobs).
  --validate            If specified, check if input is valid and check if file locations point to exisiting files

Output:
  Arguments controlling output redirection and the formatting of calculated results.

  --out FILE, -o FILE   File to append output to.
  --out-dir OUT_DIR, -od OUT_DIR
                        Directory to store output. If specified in segment mode, this writes csv output for each processed case. In voxel mode, this directory is used to store the featuremaps. If not specified in voxel mode, the current working directory is used instead.
  --mode {segment,voxel}, -m {segment,voxel}
                        Extraction mode for PyRadiomics.
  --skip-nans           Add this argument to skip returning features that have an
                        invalid result (NaN)
  --format {csv,json,txt}, -f {csv,json,txt}
                        Format for the output.
                        "txt" (Default): one feature per line in format "case-N_name:value"
                        "json": Features are written in a JSON format dictionary
                        (1 dictionary per case, 1 case per line) "{name:value}"
                        "csv": one row of feature names, followed by one row of
                        feature values per case.
  --format-path {absolute,relative,basename}
                        Controls input image and mask path formatting in the output.
                        "absolute" (Default): Absolute file paths.
                        "relative": File paths relative to current working directory.
                        "basename": Only stores filename.
  --unix-path, -up      If specified, ensures that all paths in the output
                        use unix-style path separators ("/").

Logging:
  Controls the (amount of) logging output to the console and the (optional) log-file.

  --logging-level LEVEL
                        Set capture level for logging
  --log-file FILE       File to append logger output to
  --verbosity [{1,2,3,4,5}], -v [{1,2,3,4,5}]
                        Regulate output to stderr. By default [3], level
                        WARNING and up are printed. By specifying this
                        argument without a value, level INFO [4] is assumed.
                        A higher value results in more verbose output.
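
The help text above is easier to use once the flags are wired into a small helper. Here is a minimal sketch that assembles a single-mode command line using only flags listed in that help output. The function name `build_single_mode_cmd` and the file names are made up for illustration. Going by the help text, "txt" is the default output format, so the sketch passes `--format` explicitly instead of relying on the extension of the output file.

```python
import shlex


def build_single_mode_cmd(image, mask, params=None, out=None, fmt="json", log_file=None):
    """Assemble an argv list for a single-mode pyradiomics call.

    Hypothetical helper: it only strings together flags shown in `pyradiomics -h`.
    """
    cmd = ["pyradiomics", image, mask]
    if params:
        cmd += ["--param", params]      # parameter file (.yml/.yaml or .json)
    if out:
        cmd += ["--out", out]           # file to append output to
    cmd += ["--format", fmt]            # csv, json or txt
    if log_file:
        cmd += ["--log-file", log_file]
    return cmd


# Placeholder file names, not real data
cmd = build_single_mode_cmd("image.nrrd", "mask.nrrd", params="Params.yaml", out="result.json")
print(shlex.join(cmd))
```

The list can be handed to `subprocess.run(cmd, check=True)` inside the container; keeping it as a list avoids shell quoting problems with paths that contain spaces.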

2 Using a Python script

Full code

import os

import numpy as np
import SimpleITK as sitk
from loguru import logger
# Only featureextractor and imageoperations are used below; the feature class
# modules are imported for reference.
from radiomics import (featureextractor, imageoperations, firstorder, glcm, gldm, glrlm, glszm,
                       ngtdm, shape, shape2D)

root = ""  # set this to your own data root folder
PARAMS = "Params.yaml"  # extraction parameter file, relative to the working directory

origin_img_path = os.path.join(root, "1.2.392.200036.9116.2.6.1.44063.1796265406.1656894518.71296.nii.gz")

img_path = os.path.join(root, "1.2.392.200036.9116.2.6.1.44063.1796265406.1656894518.71297_5.12_PE-CTA_SureStar_20220704092609_4.nii")
image = sitk.ReadImage(img_path)
origin_image = sitk.ReadImage(origin_img_path)

# Compare the physical origins of the two volumes
print(image.GetOrigin(), origin_image.GetOrigin())


def deal_img_mask():
    """Build a demo box-shaped mask that shares the geometry of origin_image."""
    origin_image_np = sitk.GetArrayFromImage(origin_image)
    mask_np = np.zeros_like(origin_image_np).astype("uint8")
    logger.warning(mask_np.shape)  # array axes are ordered (z, y, x)
    mask_np[5:100, 5:100, 10:100] = 1
    mask = sitk.GetImageFromArray(mask_np)
    mask.CopyInformation(origin_image)  # copy origin, spacing and direction
    # TODO: get the ROI range of the mask

    # Copy of the image that keeps only slices 5..9 (not used below)
    new_image_np = np.zeros_like(mask_np).astype("int16")
    new_image_np[5:10, :, :] = origin_image_np[5:10, :, :]
    new_image = sitk.GetImageFromArray(new_image_np)
    new_image.CopyInformation(origin_image)

    # sitk.WriteImage(origin_image, os.path.join(root, "image.nrrd"), True)
    # sitk.WriteImage(mask, os.path.join(root, "mask.nrrd"), True)
    # TODO: crop the nii volume
    # TODO: crop the mask to the same size
    return origin_image, mask


def cal_radiomics(image, mask, label=1):
    # Validate the mask and get the ROI bounding box. checkMask may also
    # hand back a corrected mask; use it when it is not None.
    bb, correctedMask = imageoperations.checkMask(image, mask, label=label)
    if correctedMask is not None:
        mask = correctedMask
    # Crop image and mask to the bounding box (plus padding) to shrink the data
    image, mask = imageoperations.cropToTumorMask(image, mask, bb, padDistance=2)
    extractor = featureextractor.RadiomicsFeatureExtractor(PARAMS)
    logger.info(f"Extraction parameters:\n\t{extractor.settings}")
    # Image types: Original, Square, Gradient, Exponential, LoG, Wavelet, Logarithm.
    # Each inner dict can also carry extra parameters for that filter.
    # refer to: https://pyradiomics.readthedocs.io/en/v3.0.1/customization.html

    # Extract features
    result = extractor.execute(image, mask, label=label)
    logger.warning(len(result))
    # Convert NumPy array values to plain lists; other values are kept as they are
    result = {
        k: v.tolist() if isinstance(v, np.ndarray) else v for k, v in result.items()
    }
    return result


img, mask = deal_img_mask()
cal_radiomics(img, mask)
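
The dictionary returned by the extractor mixes bookkeeping entries, whose keys start with `diagnostics_`, with the actual feature values. Here is a small sketch for separating the two before saving. `split_result` is a hypothetical helper, and the example dictionary uses invented values in place of a real extraction result.

```python
import json


def split_result(result):
    """Split an extraction result into diagnostics entries and feature values."""
    diagnostics = {k: v for k, v in result.items() if k.startswith("diagnostics_")}
    features = {k: v for k, v in result.items() if not k.startswith("diagnostics_")}
    return diagnostics, features


# Stand-in for the dict returned by cal_radiomics(); the values are invented
example = {
    "diagnostics_Versions_PyRadiomics": "x.y.z",
    "original_firstorder_Mean": 12.5,
    "original_shape_VoxelVolume": 100.0,
}
diagnostics, features = split_result(example)
print(json.dumps(features, indent=2, sort_keys=True))
```

Because `cal_radiomics` already converts array values to lists, the feature part should usually go through `json.dumps` without a custom encoder.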

Problems I ran into

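PyRadiomics officially supports two ways of working: one takes a single image and a single label mask, the other takes multiple images and label masks. For the latter the input is a CSV file; see the official examples for its format.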
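
As a rough sketch of what such a batch file can look like: as far as I understand the official documentation, the CSV needs a header row with an `Image` and a `Mask` column, and extra columns such as an ID are carried over to the output. Please check this against the official example before relying on it. The helper name `write_batch_csv` and the case paths are made up for illustration.

```python
import csv
import io


def write_batch_csv(cases, fileobj):
    """Write (case_id, image_path, mask_path) tuples as a batch input CSV."""
    writer = csv.DictWriter(fileobj, fieldnames=["ID", "Image", "Mask"])
    writer.writeheader()
    for case_id, image_path, mask_path in cases:
        writer.writerow({"ID": case_id, "Image": image_path, "Mask": mask_path})


# Hypothetical cases; replace with your own files
cases = [
    ("case1", "case1/image.nrrd", "case1/mask.nrrd"),
    ("case2", "case2/image.nrrd", "case2/mask.nrrd"),
]
buffer = io.StringIO()  # use open("cases.csv", "w", newline="") for a real file
write_batch_csv(cases, buffer)
print(buffer.getvalue())
```

The resulting file is then passed in place of the image argument, following the `pyradiomics image|batch [mask] [Options]` usage line.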

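1. Computation is very slow

This one puzzled me for a long time. It turned out that the library does an extra processing step: it crops a cuboid out of the image based on the mask, which reduces the amount of data, and after that the computation is fast.
So can I do that cropping myself before passing the data in? Yes!
I tried cropping a cuboid in advance, but the physical information (origin and so on) was not updated, so the computation failed. If you crop in advance, remember to recompute the physical information of the new cuboid.
If you don't want to crop by hand, remember to call imageoperations.checkMask and imageoperations.cropToTumorMask. For details, see the imageoperations module in the source code.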
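
To make "recompute the physical information" concrete, here is a minimal sketch of the usual ITK-style relation: the origin of a cropped volume is the physical position of its first voxel, that is, origin + direction · (spacing × index). The function name `cropped_origin` and the numbers are invented for illustration. Indices are in (x, y, z) order, while NumPy arrays from SimpleITK are ordered (z, y, x).

```python
def cropped_origin(origin, spacing, direction, start_index):
    """Physical position of voxel `start_index`, which becomes the new origin.

    origin, spacing, start_index: tuples in (x, y, z) order.
    direction: flat row-major direction matrix, as SimpleITK's GetDirection() returns it.
    """
    dim = len(origin)
    return tuple(
        origin[i]
        + sum(direction[i * dim + j] * spacing[j] * start_index[j] for j in range(dim))
        for i in range(dim)
    )


# Invented geometry; with real data use GetOrigin(), GetSpacing(), GetDirection()
origin = (-100.0, -120.0, 50.0)
spacing = (0.5, 0.5, 2.0)
direction = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)

# A NumPy crop [5:100, 5:100, 10:100] in (z, y, x) starts at index (10, 5, 5) in (x, y, z)
new_origin = cropped_origin(origin, spacing, direction, (10, 5, 5))
print(new_origin)
```

After building the cropped image with `sitk.GetImageFromArray`, the new origin goes in through `SetOrigin`, and spacing and direction are copied from the source image with `SetSpacing` and `SetDirection`. `CopyInformation` cannot be used here because the sizes no longer match.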

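2. About the input:
The input can be a file path or an already loaded sitk.Image.

	The official examples all use nrrd files, but in my experience nii.gz works as well, with no noticeable difference in performance.
	
	single mode: processes one image and one mask
	batch mode: processes multiple images and masks. It takes a CSV file, and pandas or another library can be used to read it and get the list of files. I only ran the official example and have not tried it on my own data. If I do, I will add notes here.
	
	2.1 
		The path of an nrrd or nii.gz file, or of any other data format that sitk can read.
	2.2 
		A sitk.Image, that is, one of the file types above, read first and then passed in as input.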