Paper: Microscaling Data Formats for Deep Learning
1. Reproducing ResNet-50 (reference 9) in FP32, MXINT8, MXFP8, MXFP6 and MXFP4
Table 2. Direct-cast inference with MX data formats
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/PyTorch/Classification/
// Download ILSVRC2012_img_train.tar and ILSVRC2012_img_val.tar into this directory first
mkdir train && mv ILSVRC2012_img_train.tar train/ && cd train
tar -xvf ILSVRC2012_img_train.tar && rm -f ILSVRC2012_img_train.tar
find . -name "*.tar" | while read NAME ; do mkdir -p "${NAME%.tar}"; tar -xvf "${NAME}" -C "${NAME%.tar}"; rm -f "${NAME}"; done
cd ..
mkdir val && mv ILSVRC2012_img_val.tar val/ && cd val && tar -xvf ILSVRC2012_img_val.tar
wget -qO- https://raw.githubusercontent.com/soumith/imagenetloader.torch/master/valprep.sh | bash
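Before mounting the dataset, it can help to confirm that extraction and valprep.sh produced the expected layout. The sketch below uses a hypothetical helper `check_imagenet` that assumes the ImageFolder-style layout `train/<wnid>/*.JPEG` and `val/<wnid>/*.JPEG` created by the commands above. It counts class directories and images per split. For the full ILSVRC2012 release, both splits should have 1000 classes, with 1,281,167 training and 50,000 validation images.

```python
from pathlib import Path

# Assumption: all ILSVRC2012 images use a .JPEG (or .jpg) extension
IMAGE_SUFFIXES = {".jpeg", ".jpg"}


def summarize_split(split_dir):
    """Return (num_class_dirs, num_images) for one ImageFolder-style split."""
    split_dir = Path(split_dir)
    # Loose files such as a leftover ILSVRC2012_img_val.tar are ignored
    class_dirs = [d for d in split_dir.iterdir() if d.is_dir()]
    num_images = sum(
        1
        for d in class_dirs
        for f in d.iterdir()
        if f.is_file() and f.suffix.lower() in IMAGE_SUFFIXES
    )
    return len(class_dirs), num_images


def check_imagenet(root):
    """Summarize train/ and val/ under root as {split: (classes, images)}."""
    root = Path(root)
    return {split: summarize_split(root / split) for split in ("train", "val")}
```

For example, `print(check_imagenet("/data/workspaces/zhangxin/ILSVRC2012"))` on the host, or the same call on `/imagenet` inside the container.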
docker build . -t nvidia_resnet50
Mount the local directory /data/workspaces/zhangxin/ILSVRC2012 to /imagenet inside the container:
nvidia-docker run --rm -it -v /data/workspaces/zhangxin/ILSVRC2012:/imagenet --ipc=host nvidia_resnet50
After the container starts, download the pretrained model inside it and run inference (--amp is a boolean flag that enables mixed precision):
wget --content-disposition https://api.ngc.nvidia.com/v2/models/nvidia/resnet50_pyt_amp/versions/20.06.0/zip -O resnet50_pyt_amp_20.06.0.zip
unzip resnet50_pyt_amp_20.06.0.zip
vim /workspace/rn50/image_classification/dataloaders.py
Comment out lines 135 and 189.
// Mixed precision (AMP)
python ./main.py --arch=resnet50 --evaluate --amp --epochs=1 --pretrained -b=256 /imagenet/
// FP32
python ./main.py --arch=resnet50 --evaluate --epochs=1 --pretrained -b=256 /imagenet/
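The numbers to compare across the FP32, AMP and MX runs are classification accuracies. Assuming the evaluation log reports top-1 and top-5 accuracy (Table 2 compares ResNet-50 by top-1), here is a pure-Python sketch of how those metrics are defined. `main.py` computes them with its own PyTorch code; this hypothetical `topk_accuracy` function is for illustration only.

```python
def topk_accuracy(logits, labels, ks=(1, 5)):
    """Return {k: accuracy in %} for a batch.

    logits: one list of per-class scores per sample; labels: true class ids.
    """
    results = {}
    for k in ks:
        hits = 0
        for scores, label in zip(logits, labels):
            # Indices of the k highest-scoring classes for this sample
            top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
            hits += label in top
        results[k] = 100.0 * hits / len(labels)
    return results
```

A sample counts as a top-k hit when its true label is anywhere among the k highest scores, so top-5 is always at least as high as top-1.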
Next, use the PyTorch emulation library for Microscaling (MX)-compatible data formats (microsoft/microxcaling) to replace the torch.nn operators with MX operators:
git clone https://github.com/microsoft/microxcaling.git
python -c "import torch; print(torch.__version__)"
pip install torch --upgrade
cd path/to/examples/
bash run_mxfp6.sh
cd path/to/mx/test/
python -m pytest .
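It is worth having a mental model of what the MX operators emulate. Each block of 32 values shares one power-of-two scale (an 8-bit E8M0 exponent), and every element is stored in a narrow format. The sketch below covers MXINT8 direct-cast as a quantize-then-dequantize step in ordinary floats. It is a simplified illustration, not the microxcaling code. The assumptions are Python's round-half-to-even rounding, a symmetric clamp to ±127 and no special handling of NaN/Inf; the function names are hypothetical.

```python
import math

BLOCK_SIZE = 32        # elements sharing one scale (block_size in mx_specs)
INT8_MAX_CODE = 127    # symmetric clamp (assumption; two's complement also allows -128)
SCALE_EMAX = 127       # E8M0 shared scale spans 2^-127 .. 2^127


def quantize_block_mxint8(block):
    """Direct-cast one block to MXINT8: (shared_exp, int codes, dequantized floats)."""
    amax = max(abs(v) for v in block)
    if amax == 0.0:
        return 0, [0] * len(block), [0.0] * len(block)
    # Shared exponent = floor(log2(amax)) - emax of the element type (0 for INT8)
    shared_exp = max(-SCALE_EMAX, min(SCALE_EMAX, math.floor(math.log2(amax))))
    scale = 2.0 ** shared_exp
    codes = []
    for v in block:
        # INT8 element with an implicit 2^-6 scale: sign, 1 integer bit, 6 fraction bits
        k = round(v / scale * 64)  # Python's round() breaks ties to even
        codes.append(max(-INT8_MAX_CODE, min(INT8_MAX_CODE, k)))
    return shared_exp, codes, [c / 64 * scale for c in codes]


def mxint8_direct_cast(values, block_size=BLOCK_SIZE):
    """Quantize-dequantize a flat list block by block (emulated MX tensor)."""
    out = []
    for start in range(0, len(values), block_size):
        _, _, deq = quantize_block_mxint8(values[start:start + block_size])
        out.extend(deq)
    return out
```

Values near the block maximum keep about 7 significant bits. Values much smaller than the block maximum lose precision, because they share that maximum's scale. This is why the per-block granularity (block_size) matters for accuracy.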
vim main.py
from mx import finalize_mx_specs
from mx import mx_mapping

if __name__ == "__main__":
    epilog = [
        "Based on the architecture picked by --arch flag, you may use the following options:\n"
    ]

    # Simple MX spec for MXFP6 weights+activations
    mx_specs = {
        'w_elem_format': 'fp6_e3m2',
        'a_elem_format': 'fp6_e3m2',
        'block_size': 32,
        'bfloat': 16,
        'custom_cuda': True,
        # For quantization-aware finetuning, do backward pass in FP32
        'quantize_backprop': False,
    }
    mx_specs = finalize_mx_specs(mx_specs)

    # Auto-inject MX modules and functions
    # This will replace certain torch.nn.* and torch.nn.functional.*
    # modules/functions in the global namespace!
    mx_mapping.inject_pyt_ops(mx_specs)

    for model, ep in available_models().items():
        model_help = "\n".join(ep.parser().format_help().split("\n")[2:])
        epilog.append(model_help)
    parser = argparse.ArgumentParser(
        description="PyTorch ImageNet Training",
        epilog="\n".join(epilog),
        formatter_class=argparse.RawDescriptionHelpFormatter,
    )
    add_parser_arguments(parser)

    args, rest = parser.parse_known_args()

    model_arch = available_models()[args.arch]
    model_args, rest = model_arch.parser().parse_known_args(rest)
    print(model_args)

    assert len(rest) == 0, f"Unknown args passed: {rest}"

    cudnn.benchmark = True

    main(args, model_args, model_arch)
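The spec above covers only MXFP6 (E3M2). To run the other rows without hand-editing the dict each time, the spec could be built from a format name. The helper below (`make_mx_specs` and `MX_ELEM_FORMATS` are hypothetical names) assumes the element-format strings accepted by microxcaling (`int8`, `fp8_e4m3`, `fp8_e5m2`, `fp6_e3m2`, `fp6_e2m3`, `fp4_e2m1`). It returns the un-finalized dict, so the existing `finalize_mx_specs` call still applies.

```python
# Element formats for the MX variants (assumed microxcaling format strings)
MX_ELEM_FORMATS = {
    "MXINT8": "int8",
    "MXFP8_E4M3": "fp8_e4m3",
    "MXFP8_E5M2": "fp8_e5m2",
    "MXFP6_E3M2": "fp6_e3m2",
    "MXFP6_E2M3": "fp6_e2m3",
    "MXFP4": "fp4_e2m1",
}


def make_mx_specs(name, block_size=32, custom_cuda=True):
    """Build a spec dict shaped like the MXFP6 one in main.py (hypothetical helper)."""
    if name == "FP32":
        return None  # baseline: skip finalize_mx_specs / inject_pyt_ops entirely
    elem = MX_ELEM_FORMATS[name]
    return {
        "w_elem_format": elem,  # weights
        "a_elem_format": elem,  # activations
        "block_size": block_size,
        "bfloat": 16,
        "custom_cuda": custom_cuda,
        "quantize_backprop": False,
    }
```

In main.py this would look like `mx_specs = make_mx_specs("MXINT8")`, with `finalize_mx_specs` and `inject_pyt_ops` called only when the result is not None. That also makes the FP32 baseline possible without commenting out the injection block.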
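// Comment out multiple lines in vim
vim main.py
Press Ctrl+v to enter visual block mode (cursor in the first column)
Press the down arrow to select the lines
Press capital I (Shift+i)
Type #
Press Esc; the # is inserted at the start of every selected line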
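// Uncomment (the ^ anchor removes only the leading #, so existing comments inside lines 659-675 stay intact)
:659,675s/^#//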
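The same toggle can be scripted, which makes switching between the FP32 and MX runs repeatable. The sketch below mirrors the vim steps on a list of lines. The helper names are hypothetical, and the 1-based inclusive range follows vim's `:659,675` convention.

```python
def comment_lines(lines, start, end):
    """Prefix lines start..end (1-based, inclusive) with '#', like Ctrl+v, I, #, Esc."""
    return [("#" + line) if start <= i <= end else line
            for i, line in enumerate(lines, 1)]


def uncomment_lines(lines, start, end):
    """Drop one leading '#' on lines start..end, like :start,ends/^#//."""
    return [line[1:] if start <= i <= end and line.startswith("#") else line
            for i, line in enumerate(lines, 1)]
```

Applied to a file, this would be `path.read_text().splitlines()` followed by `path.write_text("\n".join(new_lines) + "\n")`. As with the vim command, only the leading `#` is removed, so comments already present in the range survive a comment/uncomment round trip.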
2. Reproducing GNMT
git clone https://github.com/NVIDIA/DeepLearningExamples
cd DeepLearningExamples/PyTorch/Translation/GNMT
bash scripts/docker/build.sh
bash scripts/docker/interactive.sh
bash scripts/wmt16_en_de.sh
python3 -m torch.distributed.launch --nproc_per_node=<#GPUs> train.py --seed 2 --train-global-batch-size 1024
python3 translate.py \
--input data/wmt16_de_en/newstest2014.en \
--reference data/wmt16_de_en/newstest2014.de \
--output /tmp/output \
--model gnmt/model_best.pth
python3 translate.py \
--input-text "The quick brown fox jumps over the lazy dog" \
--model gnmt/model_best.pth
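To understand what the BLEU score for `/tmp/output` against `newstest2014.de` measures, here is a simplified corpus-level BLEU. It uses clipped n-gram precision up to 4-grams, a geometric mean and a brevity penalty. This is an illustrative sketch under stated assumptions: whitespace tokenization, one reference per sentence and no smoothing. Its scores are not comparable to official sacreBLEU numbers.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Simplified corpus BLEU in %, one reference per hypothesis."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngram_counts(h, n), ngram_counts(r, n)
            # Clipped matches: a hypothesis n-gram counts at most as often as in the reference
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if min(matches) == 0:
        return 0.0  # no smoothing in this sketch
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish hypotheses shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * bp * math.exp(log_precision)
```

Usage would be `corpus_bleu(open("/tmp/output").read().splitlines(), open("data/wmt16_de_en/newstest2014.de").read().splitlines())`, with the two files aligned line by line.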
3. Transformer: copying files from the Docker container to the host
docker ps
CONTAINER ID   IMAGE                            COMMAND                  CREATED       STATUS       PORTS                NAMES
b3966e6d051a   nvidia_transformer:transformer   "/opt/nvidia/nvidia_…"   2 hours ago   Up 2 hours   6006/tcp, 8888/tcp   focused_bartik
Use the value in the NAMES column (here focused_bartik) as the container name in docker cp:
docker cp focused_bartik:/workspace/translation/transformer.txt ~/code/
docker cp focused_bartik:/workspace/translation/transformer-model_list.txt ~/code/
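The container name (focused_bartik) is generated by Docker and changes between runs. It can instead be looked up from the image name. The sketch below uses the real `docker ps --format` Go-template option with the `{{.Image}}` and `{{.Names}}` fields to print one image/name pair per line. It then calls `docker cp`. The function names are hypothetical, and only the parsing step is independent of a running Docker daemon.

```python
import subprocess


def parse_ps_lines(ps_output, image):
    """Return container names for `image` from 'IMAGE<TAB>NAME' lines."""
    names = []
    for line in ps_output.splitlines():
        if not line.strip():
            continue
        img, name = line.split("\t", 1)
        if img == image:
            names.append(name)
    return names


def copy_from_container(image, src, dst):
    """docker cp src out of the first running container started from `image`."""
    out = subprocess.run(
        ["docker", "ps", "--format", "{{.Image}}\t{{.Names}}"],
        check=True, capture_output=True, text=True,
    ).stdout
    names = parse_ps_lines(out, image)
    if not names:
        raise RuntimeError(f"no running container for image {image!r}")
    subprocess.run(["docker", "cp", f"{names[0]}:{src}", dst], check=True)
```

For the files above, one call would be `copy_from_container("nvidia_transformer:transformer", "/workspace/translation/transformer.txt", os.path.expanduser("~/code/"))`.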




