PyTorch Study Notes #1: Function Fitting / Gradient Descent

This content is published under the CC 4.0 BY-SA license.


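This article shows how to fit a function in PyTorch with gradient descent, first with manually computed gradients and then with automatic ones. In the manual version, the gradient of the loss function is computed explicitly and used to update the weights; in the automatic version, PyTorch's automatic backpropagation mechanism simplifies this process. The examples involve linear regression and a simple neural network model.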

Based on https://pytorch.org/tutorials/beginner/pytorch_with_examples.html

Concepts

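Conceptually, a PyTorch Tensor is the same as a NumPy array: an $n$-dimensional array. Unlike a NumPy array, a Tensor can run its computations on a GPU and can keep track of a computational graph and gradients.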
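As a small illustration of these three properties (n-dimensional data, optional GPU placement, gradient tracking), here is a minimal sketch. The variable names `t`, `t_dev`, `w` and `z` are made up for this example, and the device choice falls back to the CPU when no GPU is present.

```python
import torch

# A 2-dimensional tensor, analogous to a 2-D NumPy array.
t = torch.tensor([[1.0, 2.0], [3.0, 4.0]])

# Use the GPU only if one is available; otherwise stay on the CPU.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
t_dev = t.to(device)

# requires_grad=True asks PyTorch to record operations on w
# so that gradients can be computed later.
w = torch.tensor(3.0, requires_grad=True)
z = w ** 2       # z = w^2
z.backward()     # computes dz/dw = 2w and stores it in w.grad

print(t_dev.shape, t_dev.device)
print(w.grad)    # gradient of z with respect to w at w = 3
```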

Fitting a function with manual gradient descent

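We use a cubic polynomial to fit an arbitrary function (in the code below, $y=\sin(x)$).
$\hat{y}=a+bx+cx^2+dx^3$
Define the loss function $L=\sum_i(y_i-\hat{y}_i)^2$
Differentiating $(y_i-\hat{y}_i)^2$ with respect to $\hat{y}_i$ gives $-2(y_i-\hat{y}_i)=2(\hat{y}_i-y_i)$, so by the chain rule the gradients are:
$\hat{y}_i:\ \frac{\partial L}{\partial \hat{y}_i}=2(\hat{y}_i-y_i)$
$a:\ \frac{\partial L}{\partial a}=\sum_i 2(\hat{y}_i-y_i)$
$b:\ \frac{\partial L}{\partial b}=\sum_i 2(\hat{y}_i-y_i)\,x_i$
$c:\ \frac{\partial L}{\partial c}=\sum_i 2(\hat{y}_i-y_i)\,x_i^2$
$d:\ \frac{\partial L}{\partial d}=\sum_i 2(\hat{y}_i-y_i)\,x_i^3$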
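One way to sanity-check hand-derived gradients like these is to compare them with what autograd computes for the same loss. The sketch below assumes the gradient of the loss with respect to each prediction is $2(\hat{y}_i-y_i)$, multiplied by $1$, $x_i$, $x_i^2$, $x_i^3$ and summed for $a$, $b$, $c$, $d$ respectively. It uses float64 and only 200 points to keep rounding noise out of the comparison; both choices are mine, not part of the original notes.

```python
import math
import torch

torch.manual_seed(0)
x = torch.linspace(-math.pi, math.pi, 200, dtype=torch.float64)
y = torch.sin(x)

# Coefficients tracked by autograd (names mirror the formulas).
a, b, c, d = (torch.randn((), dtype=torch.float64, requires_grad=True)
              for _ in range(4))

y_pred = a + b * x + c * x ** 2 + d * x ** 3
loss = (y_pred - y).pow(2).sum()
loss.backward()  # autograd fills a.grad, b.grad, c.grad, d.grad

# Hand-derived gradients: 2 * (y_pred - y) times 1, x, x^2, x^3, summed.
with torch.no_grad():
    grad_y_pred = 2.0 * (y_pred - y)
    manual = [
        grad_y_pred.sum(),
        (grad_y_pred * x).sum(),
        (grad_y_pred * x ** 2).sum(),
        (grad_y_pred * x ** 3).sum(),
    ]

for name, param, g in zip("abcd", (a, b, c, d), manual):
    print(name, torch.allclose(param.grad, g))
```

If any line prints `False`, the hand derivation (often its sign) is the first thing to recheck.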

Code

import torch
import math

dtype = torch.float
# Run on the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Create input and output data: 2000 evenly spaced points in [-pi, pi] and their sine
x = torch.linspace(-math.pi, math.pi, 2000, device=device, dtype=dtype)
y = torch.sin(x)

# Randomly initialize weights
a = torch.randn((), device=device, dtype=dtype)
b = torch.randn((), device=device, dtype=dtype)
c = torch.randn((), device=device, dtype=dtype)
d = torch.randn((), device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(2000):
    # Forward pass: compute predicted y
    y_pred = a + b * x + c * x ** 2 + d * x ** 3

    # Compute and print loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)
    
    # Backprop by hand: compute gradients of the loss with respect to a, b, c, d
    grad_y_pred = 2.0 * (y_pred - y)
    grad_a = grad_y_pred.sum()
    grad_b = (grad_y_pred * x).sum()
    grad_c = (grad_y_pred * x ** 2).sum()
    grad_d = (grad_y_pred * x ** 3).sum()

    # Update weights using gradient descent
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b
    c -= learning_rate * grad_c
    d -= learning_rate * grad_d

print(f'Result: y = {a.item()} + {b.item()} x + {c.item()} x^2 + {d.item()} x^3')

Fitting a function with automatic gradients

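Next we build the network with PyTorch's nn package. To fit a cubic polynomial we need a single linear layer with three input features ($x, x^2, x^3$) and one output; there is no hidden layer, and the layer has one bias term.
That is, $\hat{y}=b+\sum_{i=1}^{3}w_ix^i$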

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),  # 3 input features (x, x^2, x^3) -> 1 output
    torch.nn.Flatten(0, 1)  # flatten the (N, 1) output to shape (N,) so it matches y
)
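To make the roles of the two layers concrete, the following sketch checks that `Linear(3, 1)` computes a bias plus a weighted sum of its three inputs, and that `Flatten(0, 1)` only changes the output shape. The `features` tensor holds random stand-in values rather than real powers of $x$, and `by_hand` is a name introduced just for this check.

```python
import torch

torch.manual_seed(0)
linear = torch.nn.Linear(3, 1)   # weight shape (1, 3), bias shape (1,)
model = torch.nn.Sequential(linear, torch.nn.Flatten(0, 1))

features = torch.randn(5, 3)     # 5 samples with 3 features each (stand-in data)
out = model(features)

# The same computation written out: bias + w1*f1 + w2*f2 + w3*f3 per sample.
by_hand = features @ linear.weight[0] + linear.bias

print(linear(features).shape)    # torch.Size([5, 1])
print(out.shape)                 # torch.Size([5])
print(torch.allclose(out, by_hand, atol=1e-6))
```

Without the `Flatten` layer the prediction would keep shape `(N, 1)`, which would not line up elementwise with a target of shape `(N,)`.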

Because the first layer of our network takes three inputs ($x,x^2,x^3$), we need to preprocess the data first:

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

# Exponents 1, 2, 3: broadcasting (2000, 1) against (3,) gives xx of shape (2000, 3)
p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)
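The preprocessing step relies on broadcasting, which is easier to see on a tiny input. This sketch uses three hand-picked values for `x` instead of the 2000-point grid; `col` and `expected` are names added for illustration.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0])
p = torch.tensor([1, 2, 3])

col = x.unsqueeze(-1)   # shape (3,) -> (3, 1): one value per row
xx = col.pow(p)         # (3, 1) broadcast against (3,) -> (3, 3)

# Each row holds (x, x^2, x^3) for one input value.
expected = torch.tensor([[1.0, 1.0, 1.0],
                         [2.0, 4.0, 8.0],
                         [3.0, 9.0, 27.0]])
print(col.shape, xx.shape)
print(torch.allclose(xx, expected))
```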

Then we can get the predicted output by calling model directly:

y_pred = model(xx) # y_pred is also a tensor

Loss function

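loss_fn = torch.nn.MSELoss(reduction='sum') # define the loss: squared errors, summed rather than averaged
loss = loss_fn(y_pred, y) # compute the summed squared error
model.zero_grad() # clear the gradients left over from the previous step
loss.backward() # backpropagate to compute the gradients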
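Two details of this snippet are worth checking on toy numbers: `reduction='sum'` makes `MSELoss` return the sum of squared errors rather than their mean, and gradients accumulate across `backward()` calls unless they are cleared. The values below are arbitrary and chosen so the arithmetic is easy to follow.

```python
import torch

y_pred = torch.tensor([1.0, 2.0, 4.0])
y = torch.tensor([1.0, 0.0, 1.0])

# Squared errors are 0, 4 and 9.
sum_loss = torch.nn.MSELoss(reduction='sum')(y_pred, y)    # 0 + 4 + 9 = 13
mean_loss = torch.nn.MSELoss(reduction='mean')(y_pred, y)  # 13 / 3
print(sum_loss.item(), mean_loss.item())

# Gradients add up across backward() calls until they are zeroed.
w = torch.tensor(2.0, requires_grad=True)
(3 * w).backward()
(3 * w).backward()
print(w.grad.item())   # 6.0: two passes of d(3w)/dw = 3 added together
w.grad.zero_()
(3 * w).backward()
print(w.grad.item())   # 3.0 after clearing
```

This accumulation is why the gradients are zeroed before each `loss.backward()` in the training loop.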

Full code

import torch
import math

x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)

p = torch.tensor([1, 2, 3])
xx = x.unsqueeze(-1).pow(p)


model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1)
)

loss_fn = torch.nn.MSELoss(reduction='sum')

learning_rate = 1e-6
for t in range(2000):
    y_pred = model(xx)
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())

    model.zero_grad()

    loss.backward()

    with torch.no_grad(): # gradient descent step; the update itself is not tracked by autograd
        for param in model.parameters():
            param -= learning_rate * param.grad

linear_layer = model[0]
print(f'Result: y = {linear_layer.bias.item()} + {linear_layer.weight[:, 0].item()} x + {linear_layer.weight[:, 1].item()} x^2 + {linear_layer.weight[:, 2].item()} x^3')
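As a possible variation (not part of the code above), the manual `param -= learning_rate * param.grad` loop can be handed to `torch.optim.SGD`, which applies the same plain gradient descent update when used without momentum. The sketch below keeps the same data, model and learning rate; the `losses` list is added only to observe the trend.

```python
import math
import torch

torch.manual_seed(0)
x = torch.linspace(-math.pi, math.pi, 2000)
y = torch.sin(x)
xx = x.unsqueeze(-1).pow(torch.tensor([1, 2, 3]))

model = torch.nn.Sequential(
    torch.nn.Linear(3, 1),
    torch.nn.Flatten(0, 1),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-6)

losses = []
for t in range(2000):
    loss = loss_fn(model(xx), y)
    losses.append(loss.item())

    optimizer.zero_grad()  # clear old gradients
    loss.backward()        # compute new gradients
    optimizer.step()       # param -= lr * param.grad for every parameter

print(losses[0], losses[-1])
```

The explicit loop makes the update rule visible, while the optimizer object makes it easier to switch to another update rule later without touching the loop.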