LSTM, GRU, Bi-RNN



This content is released under the CC BY-SA 4.0 license.


1. Simple RNN
The network has (causal) memory encoded in a history vector $h_t$.
$y_t$ = function of the previous and current words $x_{1:t}$
= function of the current word $x_t$ and the previous words $x_{1:t-1}$
≈ function of $x_t$ and $h_{t-1}$ (the history vector output at time $t-1$)
≈ function of $h_t$
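
The chain of approximations above can be illustrated with a small sketch. It assumes a scalar tanh recurrence with hand-picked weights (`w_x`, `w_h`, `b` and both function names are illustrative, not from the original) and checks what "causal" means here: the history value at time t depends only on the inputs up to t.

```python
import math

def rnn_step(x_t, h_prev, w_x=0.5, w_h=0.8, b=0.0):
    """One recurrent step: the new history value depends only on x_t and h_{t-1}."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(xs, h0=0.0):
    """Return the history values h_1..h_T for a scalar input sequence."""
    hs, h = [], h0
    for x_t in xs:
        h = rnn_step(x_t, h)
        hs.append(h)
    return hs

seq_a = [1.0, -0.5, 0.3, 2.0]
seq_b = [1.0, -0.5, 0.3, -7.0]  # differs from seq_a only at the last step

hs_a = run_rnn(seq_a)
hs_b = run_rnn(seq_b)

# Causality: h_1..h_3 never see the last input, so they are identical.
print(hs_a[:3] == hs_b[:3])
# The last history value does see it, so it differs.
print(hs_a[3] != hs_b[3])
```

Because each $h_t$ is computed from $x_t$ and $h_{t-1}$ alone, the whole prefix $x_{1:t}$ only reaches $y_t$ through this single running summary.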

Recurrent unit diagram: [figures]

Update equations:
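
A common form of the simple (Elman) RNN update, given as a sketch in assumed notation. The weight names $W_x$, $W_h$, $W_y$, the biases $b_h$, $b_y$, and the choice of tanh and softmax are assumptions and may differ from the original figure.

```latex
\begin{aligned}
h_t &= \tanh\left(W_x x_t + W_h h_{t-1} + b_h\right) \\
y_t &= \mathrm{softmax}\left(W_y h_t + b_y\right)
\end{aligned}
```

The first line is the "function of $x_t$ and $h_{t-1}$" step, and the second is the "function of $h_t$" step.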

2. LSTM

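LSTM addresses two problems: (1) vanishing gradients, and (2) the long-term dependency problem that vanishing gradients cause. Because its gradients vanish, a simple RNN cannot learn long-range dependencies between words.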
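
To make the vanishing-gradient claim concrete, here is a minimal sketch for a scalar tanh RNN. The weights `w_x`, `w_h` and the constant input sequence are arbitrary assumptions chosen for illustration. It tracks the magnitude of the derivative of $h_t$ with respect to the initial state $h_0$, which by the chain rule is a product of one factor per time step.

```python
import math

def gradient_through_time(xs, w_x=0.5, w_h=0.8, h0=0.0):
    """Return |d h_t / d h_0| for t = 1..T in a scalar tanh RNN."""
    h, grad, grads = h0, 1.0, []
    for x_t in xs:
        h = math.tanh(w_x * x_t + w_h * h)
        # Chain rule: d h_t / d h_{t-1} = w_h * (1 - h_t^2)
        grad *= w_h * (1.0 - h * h)
        grads.append(abs(grad))
    return grads

grads = gradient_through_time([1.0] * 50)

# The gradient magnitude after 1, 10 and 50 steps.
print(grads[0], grads[9], grads[49])
```

Every factor here has magnitude below 1, so the product shrinks geometrically and early inputs stop influencing learning. In an LSTM the cell state is updated additively, so the corresponding factor along the cell path is roughly the forget-gate value, which the network can learn to keep close to 1.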

2.1. LSTM diagrams

<Version 1>
[figure]

<Version 2>
[figure]

<Version 3>

2.2. Update equations

<Version 1>
[figure]

<Version 2>
[figure]

<Version 3>

c: memory cell
a: activation value, i.e. the hidden state
ŷ: output value
x: input value
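
In this notation, the standard LSTM update can be sketched as follows. The gate symbols $\Gamma_u$, $\Gamma_f$, $\Gamma_o$ (update, forget, output), the weight and bias names, and the softmax output layer are assumptions; the original figure may differ in detail.

```latex
\begin{aligned}
\tilde{c}^{\langle t \rangle} &= \tanh\left(W_c\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\right) \\
\Gamma_u &= \sigma\left(W_u\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\right) \\
\Gamma_f &= \sigma\left(W_f\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_f\right) \\
\Gamma_o &= \sigma\left(W_o\,[a^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_o\right) \\
c^{\langle t \rangle} &= \Gamma_u \odot \tilde{c}^{\langle t \rangle} + \Gamma_f \odot c^{\langle t-1 \rangle} \\
a^{\langle t \rangle} &= \Gamma_o \odot \tanh\left(c^{\langle t \rangle}\right) \\
\hat{y}^{\langle t \rangle} &= \mathrm{softmax}\left(W_y\, a^{\langle t \rangle} + b_y\right)
\end{aligned}
```

Here $a^{\langle t \rangle}$ plays the role of the history vector $h_t$ from the simple RNN, and $\odot$ is element-wise multiplication.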
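
A minimal pure-Python sketch of one LSTM step using the c / a / x naming above. The gate names (`gate_u`, `gate_f`, `gate_o`), the layout of `params`, and the helper functions are illustrative assumptions, not from the original. The example sets the forget gate close to 1 and the update gate close to 0 to show the memory cell carrying a value across 100 steps.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, v, b):
    """Compute W v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + b_i for row, b_i in zip(W, b)]

def lstm_step(x_t, a_prev, c_prev, params):
    """One LSTM step. params maps 'c', 'u', 'f', 'o' to (W, b) pairs."""
    concat = a_prev + x_t  # the stacked vector [a_{t-1}, x_t]

    def layer(name, activation):
        W, b = params[name]
        return [activation(z) for z in affine(W, concat, b)]

    c_tilde = layer("c", math.tanh)  # candidate cell value
    gate_u = layer("u", sigmoid)     # update (input) gate
    gate_f = layer("f", sigmoid)     # forget gate
    gate_o = layer("o", sigmoid)     # output gate

    # Additive cell update: keep part of the old cell, write part of the candidate.
    c_t = [u * ct + f * cp
           for u, ct, f, cp in zip(gate_u, c_tilde, gate_f, c_prev)]
    a_t = [o * math.tanh(c) for o, c in zip(gate_o, c_t)]
    return a_t, c_t

# Hidden size 2, input size 1; all weights are zero so only the biases matter.
zeros = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
params = {
    "c": (zeros, [0.0, 0.0]),
    "u": (zeros, [-20.0, -20.0]),  # update gate ~ 0: write nothing new
    "f": (zeros, [20.0, 20.0]),    # forget gate ~ 1: keep the old cell value
    "o": (zeros, [0.0, 0.0]),      # output gate = 0.5
}

a, c = [0.0, 0.0], [1.5, -0.7]
for _ in range(100):
    a, c = lstm_step([1.0], a, c, params)

print(c)  # stays very close to the initial [1.5, -0.7]
```

The gate values are fixed by hand here only to make the behaviour visible; in a trained network they are computed from $a^{\langle t-1 \rangle}$ and $x^{\langle t \rangle}$ at every step.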

3. GRU

Diagram: [figure]
Update equations:
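
A commonly used GRU formulation, sketched in the c / a / x notation of the LSTM section. $\Gamma_u$ is the update gate and $\Gamma_r$ the reset gate; these symbols and the weight names are assumptions, and some sources swap the roles of $\Gamma_u$ and $1 - \Gamma_u$.

```latex
\begin{aligned}
\Gamma_u &= \sigma\left(W_u\,[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_u\right) \\
\Gamma_r &= \sigma\left(W_r\,[c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_r\right) \\
\tilde{c}^{\langle t \rangle} &= \tanh\left(W_c\,[\Gamma_r \odot c^{\langle t-1 \rangle}, x^{\langle t \rangle}] + b_c\right) \\
c^{\langle t \rangle} &= \Gamma_u \odot \tilde{c}^{\langle t \rangle} + (1 - \Gamma_u) \odot c^{\langle t-1 \rangle} \\
a^{\langle t \rangle} &= c^{\langle t \rangle}
\end{aligned}
```

Compared with the LSTM, a single gate $\Gamma_u$ controls both how much of the candidate is written and how much of the old state is kept, and the memory cell is exposed directly as the hidden state.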

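4. Bi-RNN

Use the complete word sequence $x_{1:T}$ to predict each word.
[figure]

$y_t$ is obtained from the two history vectors (one from the forward pass, one from the backward pass) by a linear transformation (multiplication by a weight matrix $W$ plus a bias) followed by an activation function.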
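
The combination step can be sketched with scalar history values. All weights (`w_x`, `w_h`, `w_f`, `w_b`, `b`), the tanh activations, and the function names are illustrative assumptions. One RNN reads the sequence left to right, another reads it right to left, and each output mixes the two history values at that position.

```python
import math

def run_direction(xs, w_x, w_h, reverse=False):
    """Run a scalar tanh RNN over xs; if reverse, process right to left.

    The returned list is always indexed by the original time step.
    """
    order = reversed(range(len(xs))) if reverse else range(len(xs))
    hs, h = [0.0] * len(xs), 0.0
    for t in order:
        h = math.tanh(w_x * xs[t] + w_h * h)
        hs[t] = h
    return hs

def bi_rnn(xs, w_f=1.2, w_b=-0.7, b=0.1):
    """Combine forward and backward history values into one output per step."""
    h_fwd = run_direction(xs, w_x=0.5, w_h=0.8)
    h_bwd = run_direction(xs, w_x=0.4, w_h=0.6, reverse=True)
    # Linear transformation of both history values, then an activation.
    return [math.tanh(w_f * hf + w_b * hb + b) for hf, hb in zip(h_fwd, h_bwd)]

ys_a = bi_rnn([1.0, -0.5, 0.3, 2.0])
ys_b = bi_rnn([1.0, -0.5, 0.3, -7.0])  # only the last input differs

# Unlike a forward-only RNN, even the first output changes.
print(ys_a[0] != ys_b[0])
```

The backward pass is what lets information from later words reach earlier positions, so every $y_t$ depends on the whole sequence $x_{1:T}$.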

References:
[1] 4F10: Deep Learning for Sequence Data
[2] https://www.bilibili.com/video/BV1Qb411p7mG
[3] https://www.bilibili.com/video/BV1JE411g7XF?p=20