PyTorch Framework Study Notes (3)

Original post, last modified 2025-12-04 12:23:12


This content is licensed under CC 4.0 BY-SA.


First published 2025-11-17 15:48:04


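1. Image Data Input

Input images can be read with libraries such as PIL, OpenCV, and torchvision.io.

PIL (Python Imaging Library):

PIL supports storing, displaying, and processing images, handles almost all common image storage modes, and includes 28 modules or classes related to image processing.

PIL supports many image file formats, including JPEG, GIF, PNG, and TIFF.

PIL provides a dedicated class for representing and handling each format. For TIFF, for example, the class is:

PIL.TiffImagePlugin.TiffImageFile

An image's size attribute is reported as (width, height). Once the image is converted to a NumPy array, the layout is (height, width, channels), that is (H, W, C).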

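Common attributes and methods of the Image class in PIL:

open(filename) : loads an image file.

mode : the colour mode of the image, such as L (grayscale) or RGB (colour).

size : a 2-tuple giving the width and height of the image in pixels.

save(filename, format) : saves the image.

convert(mode) : converts the image to a new mode and returns the converted copy. The mode defines the colour space of the image and how its pixel data is represented, so convert() can turn an RGB image into grayscale (L), an RGBA image into RGB, and so on. Common modes are: "1" (black and white, 1 bit per pixel), "L" (grayscale), "P" (palette), "RGB" (colour), "RGBA" (colour with an alpha transparency channel), "CMYK" (colour with cyan, magenta, yellow, and black channels), and "HSV" (hue, saturation, value).

resize(size) : returns a copy of the image resized to size.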
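The attributes and methods listed above can be tried on an in-memory image, so no file on disk is needed. The following is a minimal sketch that assumes Pillow and NumPy are installed; the image size and colour are arbitrary values chosen for illustration.

```python
import numpy as np
from PIL import Image

# Create a 64x48 (width x height) solid-colour RGB image in memory.
img = Image.new("RGB", (64, 48), color=(255, 0, 0))

print(img.mode)   # colour mode: RGB
print(img.size)   # (width, height): (64, 48)

gray = img.convert("L")        # RGB -> 8-bit grayscale; returns a new image
small = img.resize((32, 24))   # returns a resized copy; img itself is unchanged

# As a NumPy array the layout is (height, width, channels).
arr = np.asarray(img)
print(arr.shape)  # (48, 64, 3)
```

Note that size is ordered (width, height) while the array shape is ordered (height, width, channels), which is an easy source of transposed dimensions.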

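1.1 Converting between PIL images and tensors

The transforms module that ships with the torchvision package handles conversion between PIL images and tensors.

torchvision.transforms contains image transformation functions, most of which work on both PIL images and Tensor objects. Common ones include Resize, CenterCrop, RandomCrop, RandomHorizontalFlip, Normalize, ToTensor, ColorJitter, RandomRotation, RandomVerticalFlip, and Compose (which chains several transforms together).

Commonly used transforms:

ToPILImage : converts a tensor in (C, H, W) layout, or a NumPy array in (H, W, C) layout, to a PIL image.

ToTensor : converts a PIL image with values in [0, 255], or a uint8 NumPy array in (H, W, C) layout, to a torch.float tensor in (C, H, W) layout with values in [0.0, 1.0].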

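1.2 Reading an input image with PIL and returning a tensor:

import torch
from PIL import Image
import torchvision.transforms as transforms

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = transforms.Compose([transforms.ToTensor()])

def image_loader(image_name):
    image = Image.open(image_name).convert('RGB')
    image = loader(image).unsqueeze(0)  # now a tensor with a leading batch dimension
    return image.to(device, torch.float)  # move to the chosen device with the given dtype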

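1.3 Converting an input tensor to a PIL image:

import torch
from PIL import Image
import torchvision.transforms as transforms
unloader = transforms.ToPILImage()
def image_unloader(tensor):
    image = tensor.cpu().clone()  # copy on the CPU so the original tensor is left unchanged
    image = image.squeeze(0)      # drop the batch dimension
    image = unloader(image)
    return image   # returns a PIL image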

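Reading an image with OpenCV:

OpenCV (Open Source Computer Vision Library) is an open-source computer vision and machine learning software library. It has C++, Python, Java, and MATLAB interfaces and supports Windows, Linux, Android, and macOS.

import cv2
image_name = 'example.jpg'  # replace with the path to your image file
image = cv2.imread(image_name)  # NumPy array in (H, W, C) layout, channels in BGR order
resize_image = cv2.resize(image, (224, 224))
resize_image = cv2.cvtColor(resize_image, cv2.COLOR_BGR2RGB)  # BGR -> RGB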
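An array loaded with OpenCV still has to be turned into a (C, H, W) tensor before it can be fed to a PyTorch model. The following is a minimal sketch of that step. It uses a synthetic NumPy array in place of the cv2.imread result so that it runs without OpenCV or an image file, and bgr_to_tensor is a hypothetical helper name, not part of any library.

```python
import numpy as np
import torch

def bgr_to_tensor(bgr_image: np.ndarray) -> torch.Tensor:
    """Convert an (H, W, C) uint8 BGR array to a (1, C, H, W) float RGB tensor in [0, 1]."""
    rgb = bgr_image[..., ::-1].copy()                # reverse channel order; copy to get positive strides
    tensor = torch.from_numpy(rgb).permute(2, 0, 1)  # (H, W, C) -> (C, H, W)
    return tensor.float().div(255.0).unsqueeze(0)    # scale to [0, 1] and add a batch dimension

# Synthetic stand-in for the array cv2.imread would return: pure blue in BGR order.
fake_bgr = np.zeros((224, 224, 3), dtype=np.uint8)
fake_bgr[..., 0] = 255  # channel 0 is blue in BGR

batch = bgr_to_tensor(fake_bgr)
print(batch.shape)        # torch.Size([1, 3, 224, 224])
print(batch[0, :, 0, 0])  # tensor([0., 0., 1.]): blue is now the last (RGB) channel
```

The copy() call matters: torch.from_numpy does not accept arrays with negative strides, which is what the reversed channel view would otherwise be.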

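Reading an image with the torchvision.io package:

The torchvision.io package provides read and write operations for image and video files.

torchvision.io.read_image(path, mode) : reads a JPEG or PNG image into a 3-dimensional RGB or grayscale tensor. The returned tensor has layout [channel, height, width], dtype uint8, and values in [0, 255].

torchvision.io.read_video(filename) : reads a video from a file.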

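Reading an image with torchvision.io, followed by a PIL-based loading function:

import torch
from PIL import Image
from torchvision.io import read_image
import torchvision.transforms as transforms

# Read the image as a uint8 tensor
img = read_image('example.jpg')
# Convert the tensor to a PIL image
my_img = transforms.ToPILImage()(img)

imsize = 128  # assumed target size; the original does not define imsize
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
loader = transforms.Compose([
    transforms.Resize(imsize),
    transforms.ToTensor()])  # convert to a torch tensor

def image_loader(image_name):  # image loading function
    image = Image.open(image_name)
    image = loader(image).unsqueeze(0)
    return image.to(device, torch.float)

my_img = image_loader('style.jpg')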

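Displaying the loaded image: matplotlib.pyplot is a state-based interface to matplotlib. It provides an implicit, MATLAB-like way of plotting and is mainly intended for interactive plots and simple programmatic plot generation.

import numpy as np
import matplotlib.pyplot as plt
x = np.array([1, 2])
y = np.cos(x)
plt.plot(x, y)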

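Common operations:

ion() : turns interactive mode on.

ioff() : turns interactive mode off.

imshow(image) : displays an image on the current axes.

title() : sets the title of the plot.

pause(interval) : runs the GUI event loop for interval seconds. If there is an active figure, it is updated and displayed before the pause, and the GUI event loop (if any) runs during the pause.

plot() : draws a line plot.

figure() : creates a figure object for subsequent plotting.

show() : displays all open figures.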

Example of displaying an image:

import matplotlib.pyplot as plt
from PIL import Image

# Read the image
image_path = 'your_image_file.jpg'  # replace with the path to your image file
img = Image.open(image_path)

# Display the image
plt.imshow(img)
plt.axis('off')  # hide the axes
plt.show()

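2. Creating a Neural Network

You can define your own neural network model or directly use one of the models that the PyTorch ecosystem provides.

Custom models: PyTorch provides torch.nn, torch.nn.Module, torch.nn.functional, and related modules for defining custom neural network models.

Using pretrained models directly:

The torchvision.models package contains models for a range of tasks, such as image classification, semantic segmentation, object detection, and keypoint detection.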

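torch.nn.Module

PyTorch uses modules to represent neural networks. torch.nn.Module is the base class that encapsulates PyTorch models and their components, and custom models must inherit from it.

torch.nn.Module includes the __init__ and forward methods, among others. __init__ defines the internal state of the module, and a custom model must call the parent __init__ to initialize itself. forward defines the computation performed on every call, and custom subclasses override it.

2.1 How to define a custom network model:

First inherit from torch.nn.Module. Use __init__ to initialize the model and define its structure and learnable parameters, and use forward to define the forward computation. Because PyTorch computes gradients automatically, no backward computation needs to be defined in forward().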

import torch.nn as nn
import torch.nn.functional as F
class myModule(nn.Module):
    def __init__(self):
        super(myModule, self).__init__()  # call the constructor of the parent class (nn.Module)
        self.conv = nn.Conv2d(3, 64, kernel_size=3, stride=1)
    def forward(self, x):
        return F.relu(self.conv(x))  # apply ReLU from torch.nn.functional to the convolution output
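To see the module above in action, the sketch below repeats the class definition so that it is self-contained, runs a forward pass on a random input, and lets autograd produce the gradients. The 32x32 input size is an arbitrary choice for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class myModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 64, kernel_size=3, stride=1)

    def forward(self, x):
        return F.relu(self.conv(x))

model = myModule()
x = torch.randn(1, 3, 32, 32)  # batch of one 3-channel 32x32 image
y = model(x)                   # calling the module runs forward()
print(y.shape)                 # torch.Size([1, 64, 30, 30]): a 3x3 kernel without padding trims 2 pixels

y.sum().backward()             # autograd derives the backward pass
print(model.conv.weight.grad.shape)  # torch.Size([64, 3, 3, 3])
```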

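Parameters and buffers

A module holds two kinds of state:

parameters : learnable state, that is, tensors the optimizer updates after backpropagation.

buffers : non-learnable state, that is, tensors the optimizer does not update (the running mean in batch normalization is a typical example).

Parameters are saved in state_dict.

Buffers come in two kinds: persistent and non-persistent. Persistent buffers are saved in state_dict; non-persistent buffers are not.

Registration examples for both:

persistent:
    self.register_buffer('persistent_buffer', torch.zeros(2))

non-persistent:
    self.register_buffer('non_persistent_buffer', torch.zeros(2), persistent=False)

Saving a model saves the state contained in state_dict (parameters and persistent buffers).

Registered parameters and registered buffers, both persistent and non-persistent, are moved automatically when module.to(device) is called.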
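The difference between parameters, persistent buffers, and non-persistent buffers shows up when state_dict is inspected. The following is a minimal sketch; BufferDemo is a hypothetical module written only for this illustration, and the state is saved to an in-memory buffer rather than a file.

```python
import io

import torch
import torch.nn as nn

class BufferDemo(nn.Module):  # hypothetical module for illustration
    def __init__(self):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(2))
        self.register_buffer("persistent_buffer", torch.zeros(2))
        self.register_buffer("non_persistent_buffer", torch.zeros(2), persistent=False)

model = BufferDemo()
print([name for name, _ in model.named_parameters()])  # ['weight']
print([name for name, _ in model.named_buffers()])     # ['persistent_buffer', 'non_persistent_buffer']
print(list(model.state_dict().keys()))                 # ['weight', 'persistent_buffer']

# Save and restore only what state_dict contains.
storage = io.BytesIO()
torch.save(model.state_dict(), storage)
storage.seek(0)
restored = BufferDemo()
restored.load_state_dict(torch.load(storage))
```

The non-persistent buffer is still part of the module and is listed by named_buffers(); it is only left out of the saved state.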

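Commonly used attributes and methods of torch.nn.Module:

parameters() : returns an iterator over all parameters of the module.

ParameterList : a container class in torch.nn (not a method of Module) that holds a list of parameters so that they are all registered with the module.

buffers() : returns an iterator over all buffers of the module.

state_dict() : returns a dictionary referencing the whole state of the module, that is, its parameters and persistent buffers. State of a single layer: model.fc1.state_dict(); state of the whole model: model.state_dict().

register_parameter(name, param) : registers a parameter in the module. The parameter is saved in state_dict.

register_buffer(name, tensor, persistent=True) : registers a buffer in the module. Buffers with persistent=True are saved in state_dict.

add_module(name, module) : adds the child module to the current module under the given name. Child modules are typically layers such as torch.nn.Conv2d or torch.nn.ReLU. Registration alone does not define an execution order; the order is made explicit in the forward computation.

requires_grad_(requires_grad=True) : sets the requires_grad attribute of the parameters in place. When fine-tuning a model or training only part of it, this method can be used to freeze the modules that should not change.

to(device) / to(dtype) / to(tensor) : moves or casts the parameters and buffers to the given device, the given dtype, or the device and dtype of the given tensor.

type(dst_type) : casts all parameters and buffers to dst_type in place.

zero_grad() : resets the gradients of all parameters (recent PyTorch versions set them to None by default rather than to zero).

train() : puts the module in training mode. eval() : puts the module in evaluation mode.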
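Several of the methods listed above can be combined in a small freezing example. The sketch below is only an illustration: TwoPartModel and its layer sizes are hypothetical, chosen to show add_module, requires_grad_, zero_grad, and the train/eval switch together.

```python
import torch
import torch.nn as nn

class TwoPartModel(nn.Module):  # hypothetical model for illustration
    def __init__(self):
        super().__init__()
        self.add_module("backbone", nn.Linear(4, 8))  # same effect as self.backbone = nn.Linear(4, 8)
        self.add_module("head", nn.Linear(8, 2))

    def forward(self, x):
        # The execution order is defined here, not by the registration order.
        return self.head(torch.relu(self.backbone(x)))

model = TwoPartModel()
model.backbone.requires_grad_(False)  # freeze the backbone in place

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['head.weight', 'head.bias']

model.train()                          # training mode
loss = model(torch.randn(3, 4)).sum()
loss.backward()
print(model.backbone.weight.grad)      # None: frozen parameters receive no gradient
print(model.head.weight.grad.shape)    # torch.Size([2, 8])

model.zero_grad()                      # reset gradients before the next step
model.eval()                           # evaluation mode
print(model.training)                  # False
```

Passing only the trainable parameters to the optimizer, for example with a filter on requires_grad, is a common companion to this pattern.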

The methods above combined in one example, a simple training loop:

import torch
import torch.nn as nn
import torch.optim as optim

class CustomModel(nn.Module):
    def __init__(self):
        super(CustomModel, self).__init__()
        self.weight = nn.Parameter(torch.randn(2))  # learnable parameter
        self.register_buffer('running_mean', torch.zeros(2))  # persistent buffer, not updated by the optimizer

    def forward(self, x):
        return x + self.weight + self.running_mean

    def update_running_mean(self, new_values, momentum=0.9):
        # Exponential moving average; assigning a tensor to a registered buffer name keeps it registered
        self.running_mean = momentum * self.running_mean + (1 - momentum) * new_values

# Toy dataset
data = [torch.tensor([1.0, 2.0]), torch.tensor([3.0, 4.0]), torch.tensor([5.0, 6.0])]

# Create a model instance
model = CustomModel()

# Define the optimizer and hand it the model's parameters
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(3):  # train for 3 epochs
    for input_tensor in data:
        optimizer.zero_grad()  # clear the gradients

        output = model(input_tensor)  # forward pass

        # The loss function can be defined dynamically here
        # Example: mean squared error loss
        loss_fn = nn.MSELoss()
        loss = loss_fn(output, input_tensor)  # compute the loss
        loss.backward()  # backpropagation

        optimizer.step()  # update the model parameters

        # New mean (for demonstration purposes, simply the mean of the current input)
        new_mean = input_tensor.clone().detach().mean(dim=0)
        model.update_running_mean(new_mean)

        # Note: output was computed before running_mean was updated in this step
        print(f"Output after updating running_mean: {output}, Loss: {loss.item()}")

print("Final running mean:", model.running_mean)
Output (the exact values depend on the random initialization of weight):
Output after updating running_mean: tensor([0.5802, 2.6274], grad_fn=<AddBackward0>), Loss: 0.28489404916763306
Output after updating running_mean: tensor([2.7344, 4.7711], grad_fn=<AddBackward0>), Loss: 0.33255112171173096
Output after updating running_mean: tensor([5.0721, 7.0984], grad_fn=<AddBackward0>), Loss: 0.6058115363121033
Output after updating running_mean: tensor([1.5729, 3.5889], grad_fn=<AddBackward0>), Loss: 1.426371455192566
Output after updating running_mean: tensor([3.6185, 5.6244], grad_fn=<AddBackward0>), Loss: 1.5105193853378296
Output after updating running_mean: tensor([5.8585, 7.8543], grad_fn=<AddBackward0>), Loss: 2.0877790451049805
Output after updating running_mean: tensor([2.2715, 4.2574], grad_fn=<AddBackward0>), Loss: 3.356250047683716
Output after updating running_mean: tensor([4.2382, 6.2142], grad_fn=<AddBackward0>), Loss: 3.218034029006958
Output after updating running_mean: tensor([6.4073, 8.3736], grad_fn=<AddBackward0>), Loss: 3.8072614669799805
Final running mean: tensor([2.2299, 2.2299])

The loss rises during this run because the running_mean buffer grows with almost every step, while the learning rate of 0.01 lets weight compensate only slowly. The example demonstrates the mechanics of parameters and buffers rather than a converging training setup.
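The running-mean update can also be written as an in-place operation under torch.no_grad(), which avoids allocating a new tensor on every step. The sketch below is an alternative formulation, not the method used in the example above; RunningMean is a hypothetical helper class. Replaying the same nine updates reproduces the final running mean reported in the output.

```python
import torch
import torch.nn as nn

class RunningMean(nn.Module):  # hypothetical helper for illustration
    def __init__(self, size=2, momentum=0.9):
        super().__init__()
        self.momentum = momentum
        self.register_buffer("running_mean", torch.zeros(size))

    @torch.no_grad()
    def update(self, new_values):
        # running_mean <- momentum * running_mean + (1 - momentum) * new_values, in place
        self.running_mean.mul_(self.momentum).add_(new_values, alpha=1 - self.momentum)

data = [torch.tensor([1.0, 2.0]), torch.tensor([3.0, 4.0]), torch.tensor([5.0, 6.0])]
tracker = RunningMean()
for epoch in range(3):
    for x in data:
        tracker.update(x.mean(dim=0))  # the scalar mean broadcasts over both entries

print(tracker.running_mean)  # approximately tensor([2.2299, 2.2299])
```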

Study material: https://novel.ict.ac.cn/aics/