chore: change pic to cos

camera-2018
2023-07-02 00:33:36 +08:00
parent 08fa485f6f
commit cd9d239d20
91 changed files with 462 additions and 462 deletions


@@ -30,13 +30,13 @@ mlp's key point and novelty are not its model structure but its training method…
The input to the BERT model is the sum of the three components above, as shown in the figure:
-![](https://hdu-cs-wiki.oss-cn-hangzhou.aliyuncs.com/boxcngc1a7cWapQA9rSLXYqUvkf.png)
+![](https://pic-hdu-cs-wiki-1307923872.cos.ap-shanghai.myqcloud.com/boxcngc1a7cWapQA9rSLXYqUvkf.png)
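As a minimal sketch of that sum (assuming the three components are the token, segment, and position embeddings discussed above; the module and parameter names here are hypothetical, with BERT-base sizes), in PyTorch it could look like this:

```python
import torch
import torch.nn as nn

class BertEmbeddings(nn.Module):
    """Token + segment + position embeddings, summed element-wise."""
    def __init__(self, vocab_size=30522, hidden=768, max_len=512, n_segments=2):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)    # one vector per wordpiece id
        self.segment = nn.Embedding(n_segments, hidden)  # sentence A vs. sentence B
        self.position = nn.Embedding(max_len, hidden)    # learned positions
        self.norm = nn.LayerNorm(hidden)                 # BERT normalizes the sum

    def forward(self, token_ids, segment_ids):
        # token_ids, segment_ids: (batch, seq_len) integer tensors
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        x = self.token(token_ids) + self.segment(segment_ids) + self.position(positions)
        return self.norm(x)
```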
## Model Structure
Simply put, BERT is a stack of transformer **encoders**, that is, **the left half of the figure below**. Each encoder counts as one block.
-![](https://hdu-cs-wiki.oss-cn-hangzhou.aliyuncs.com/boxcnPg8594YzCdnX6KZxpEYYod.png)
+![](https://pic-hdu-cs-wiki-1307923872.cos.ap-shanghai.myqcloud.com/boxcnPg8594YzCdnX6KZxpEYYod.png)
Put plainly, a block is just multi-head self-attention => layer-norm => feed-forward (which is really just an MLP) => layer-norm; there is no novelty here. Because BERT is a backbone model, it has no task-specific classification head or the like; the output is simply the output of the last block.
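A minimal PyTorch sketch of one such block follows (hypothetical names, BERT-base sizes; note that each sub-layer also has a residual connection before its layer-norm, which the arrow summary above elides):

```python
import torch.nn as nn

class EncoderBlock(nn.Module):
    """One post-LN transformer encoder block (the figure's left half)."""
    def __init__(self, hidden=768, heads=12, ffn_dim=3072):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(hidden)
        self.ffn = nn.Sequential(                  # feed-forward: really just an MLP
            nn.Linear(hidden, ffn_dim),
            nn.GELU(),
            nn.Linear(ffn_dim, hidden),
        )
        self.norm2 = nn.LayerNorm(hidden)

    def forward(self, x):
        # x: (batch, seq_len, hidden)
        attn_out, _ = self.attn(x, x, x)           # multi-head self-attention
        x = self.norm1(x + attn_out)               # residual + layer-norm
        x = self.norm2(x + self.ffn(x))            # residual + layer-norm
        return x

# BERT-base stacks 12 such blocks; the backbone's output is the last block's output.
backbone = nn.Sequential(*[EncoderBlock() for _ in range(12)])
```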