1. Simple RNN
The network has a (causal) memory, encoded in the history vector h_t.

y_t = function of the previous and current words x_{1:t}
    = function of the current word x_t and the previous words x_{1:t-1}
    ≈ function of x_t and h_{t-1} (the history vector output at time t-1)
    ≈ function of h_t
Recurrent unit diagram:


Update equations:
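A common form of the simple RNN update, consistent with the approximation chain above, is h_t = tanh(W_x x_t + W_h h_{t-1} + b_h) followed by y_t = W_y h_t + b_y. The sketch below implements that form with NumPy. The weight names (W_x, W_h, W_y), the tanh nonlinearity, the zero initial history and all sizes are assumptions made for illustration, not taken from the referenced material.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_x, W_h, b_h):
    """One recurrent update: h_t = tanh(W_x x_t + W_h h_{t-1} + b_h)."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b_h)

def rnn_forward(xs, W_x, W_h, b_h, W_y, b_y):
    """Run the unit over a sequence; returns y_1..y_T and h_1..h_T."""
    h = np.zeros(W_h.shape[0])        # h_0: empty history (assumed zero)
    hs, ys = [], []
    for x_t in xs:                    # causal: step t only sees x_1..x_t
        h = rnn_step(x_t, h, W_x, W_h, b_h)
        hs.append(h)
        ys.append(W_y @ h + b_y)      # y_t is a function of h_t only
    return np.stack(ys), np.stack(hs)

rng = np.random.default_rng(0)
T, d_in, d_h, d_out = 5, 3, 4, 2      # hypothetical sizes
xs = rng.normal(size=(T, d_in))
W_x = rng.normal(size=(d_h, d_in)) * 0.1
W_h = rng.normal(size=(d_h, d_h)) * 0.1
W_y = rng.normal(size=(d_out, d_h)) * 0.1
b_h, b_y = np.zeros(d_h), np.zeros(d_out)

ys, hs = rnn_forward(xs, W_x, W_h, b_h, W_y, b_y)
```

Because h_t is the only thing carried forward, changing x_T cannot affect y_1..y_{T-1}; this is the causal property noted at the top of the section.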

2. LSTM
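LSTM helps address: 1. vanishing gradients; 2. the long-term dependency problem that vanishing gradients cause (a simple RNN cannot learn long-range dependencies between words because its gradients vanish).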
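To see why gradients vanish in a simple RNN, note that the gradient flowing from step t back to step 0 is a product of per-step Jacobians diag(1 - h_t^2) W_h. The sketch below tracks the spectral norm of that product. It assumes a tanh unit and a recurrent matrix deliberately scaled to spectral norm 0.5, so the decay is guaranteed; the sizes and the random stand-in for the input term are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, T = 8, 50                        # hypothetical sizes

# Recurrent weights with spectral norm exactly 0.5 (scaled orthogonal matrix).
Q, _ = np.linalg.qr(rng.normal(size=(d_h, d_h)))
W_h = 0.5 * Q

h = np.zeros(d_h)
grad = np.eye(d_h)                    # accumulates d h_t / d h_0
norms = []
for t in range(T):
    x_in = rng.normal(size=d_h)       # stands in for W_x x_t + b_h
    h = np.tanh(W_h @ h + x_in)
    J = np.diag(1.0 - h**2) @ W_h     # Jacobian d h_t / d h_{t-1}
    grad = J @ grad
    norms.append(np.linalg.norm(grad, 2))

print(f"after 1 step: {norms[0]:.3e}, after {T} steps: {norms[-1]:.3e}")
```

Each factor has norm at most 0.5 here, so the product shrinks at least geometrically, and an error signal at step 50 carries almost no information about step 0. With larger recurrent weights the same product can instead explode.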
2.1. LSTM diagrams
<Version 1>

<Version 2>

<Version 3>

2.2. Update equations
<Version 1>

<Version 2>

<Version 3>
c: memory cell
a: activation value - i.e. hidden state
y^: output value
x: input value
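Using the c / a / x notation listed above, one LSTM step can be sketched as follows. This is the widely used formulation with update, forget and output gates, which is assumed here to match Version 3. The parameter names (W_u, W_f, W_o, W_c and the matching biases), the initialisation and the sizes are hypothetical, and the output layer that would produce y^ from a is left out.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, a_prev, c_prev, p):
    """One LSTM step; p maps names to weights (d_a, d_a + d_x) and biases (d_a,)."""
    z = np.concatenate([a_prev, x_t])            # [a_{t-1}, x_t]
    gate_u = sigmoid(p["W_u"] @ z + p["b_u"])    # update (input) gate
    gate_f = sigmoid(p["W_f"] @ z + p["b_f"])    # forget gate
    gate_o = sigmoid(p["W_o"] @ z + p["b_o"])    # output gate
    c_tilde = np.tanh(p["W_c"] @ z + p["b_c"])   # candidate memory
    c_t = gate_u * c_tilde + gate_f * c_prev     # additive memory-cell update
    a_t = gate_o * np.tanh(c_t)                  # activation (hidden state)
    return a_t, c_t

def init_params(d_x, d_a, rng):
    """Hypothetical initialisation: small random weights, zero biases."""
    p = {}
    for g in "ufoc":
        p["W_" + g] = rng.normal(size=(d_a, d_a + d_x)) * 0.1
        p["b_" + g] = np.zeros(d_a)
    return p

rng = np.random.default_rng(0)
d_x, d_a, T = 3, 4, 6                            # hypothetical sizes
params = init_params(d_x, d_a, rng)
a, c = np.zeros(d_a), np.zeros(d_a)
for x_t in rng.normal(size=(T, d_x)):
    a, c = lstm_step(x_t, a, c, params)
```

The line that computes c_t is the key difference from a simple RNN: when the forget gate is close to 1 and the update gate close to 0, c is copied forward almost unchanged, which gives gradients a path that does not shrink at every step.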

GRU
Diagram:

Update equations:
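A minimal sketch of one GRU step, assuming the common formulation with an update gate and a reset gate and the convention h_t = u * h~_t + (1 - u) * h_{t-1}. Some texts swap the roles of u and 1 - u, so this may differ from the referenced figure. All parameter names and sizes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, p):
    """One GRU step; p maps names to weights (d_h, d_h + d_x) and biases (d_h,)."""
    z_in = np.concatenate([h_prev, x_t])
    gate_u = sigmoid(p["W_u"] @ z_in + p["b_u"])      # update gate
    gate_r = sigmoid(p["W_r"] @ z_in + p["b_r"])      # reset gate
    z_cand = np.concatenate([gate_r * h_prev, x_t])   # reset applied to old state
    h_tilde = np.tanh(p["W_h"] @ z_cand + p["b_h"])   # candidate state
    return gate_u * h_tilde + (1.0 - gate_u) * h_prev # interpolate old and new

rng = np.random.default_rng(0)
d_x, d_h, T = 3, 4, 6                                 # hypothetical sizes
params = {}
for g in "urh":
    params["W_" + g] = rng.normal(size=(d_h, d_h + d_x)) * 0.1
    params["b_" + g] = np.zeros(d_h)

h = np.zeros(d_h)
for x_t in rng.normal(size=(T, d_x)):
    h = gru_step(x_t, h, params)
```

Compared with the LSTM, the GRU merges the memory cell and the hidden state into a single vector and uses one gate to trade off keeping the old state against writing the candidate.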



Bi-RNN
A Bi-RNN uses the complete word sequence x_{1:T}, not only x_{1:t}, to predict each word.



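y_t is computed from the two history vectors (one from the forward pass, one from the backward pass): they go through a linear transformation (multiplication by a weight matrix W plus a bias) and then an activation function.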
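The computation just described can be sketched as follows: run one simple RNN left to right, another right to left, concatenate the two history vectors at each position, then apply the linear transformation and the activation. The tanh output activation, the parameter names and the sizes are assumptions made for illustration.

```python
import numpy as np

def run_rnn(xs, W_x, W_h, b):
    """Simple tanh RNN over xs; returns the history vector at every step."""
    h = np.zeros(W_h.shape[0])
    hs = []
    for x_t in xs:
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        hs.append(h)
    return np.stack(hs)

def birnn_forward(xs, fwd, bwd, W_y, b_y):
    h_f = run_rnn(xs, *fwd)                  # h_f[t] summarises x_1..x_t
    h_b = run_rnn(xs[::-1], *bwd)[::-1]      # h_b[t] summarises x_t..x_T
    h = np.concatenate([h_f, h_b], axis=1)   # both history vectors at each t
    return np.tanh(h @ W_y.T + b_y)          # linear transformation, then activation

rng = np.random.default_rng(0)
T, d_x, d_h, d_out = 4, 3, 4, 2              # hypothetical sizes

def make_rnn_params():
    return (rng.normal(size=(d_h, d_x)) * 0.5,
            rng.normal(size=(d_h, d_h)) * 0.5,
            np.zeros(d_h))

fwd, bwd = make_rnn_params(), make_rnn_params()
W_y = rng.normal(size=(d_out, 2 * d_h)) * 0.5
b_y = np.zeros(d_out)
xs = rng.normal(size=(T, d_x))

ys = birnn_forward(xs, fwd, bwd, W_y, b_y)
```

Unlike the unidirectional case, perturbing the last input x_T changes y_1 here, because the backward history vector at position 1 has already seen the whole remainder of the sequence.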
References:
[1] 4F10: Deep Learning for Sequence Data
[2] https://www.bilibili.com/video/BV1Qb411p7mG
[3] https://www.bilibili.com/video/BV1JE411g7XF?p=20
