Azure/MS-AMP issues and pull requests

#90 - Update third party NCCL to MSCCL

Pull Request - State: closed - Opened by yzygitzh about 1 year ago

#89 - Fix bug: undeclared symbols when executing `make postinstall`

Pull Request - State: closed - Opened by tocean about 1 year ago

#88 - `make postinstall` failed due to undeclared symbols.

Issue - State: closed - Opened by guoshzhao about 1 year ago - 4 comments

#87 - V0.2.0 Test Plan

Issue - State: closed - Opened by tocean about 1 year ago
Labels: test

#86 - Replace dist_op with fp8_op

Issue - State: closed - Opened by tocean about 1 year ago

#85 - Support torch.distributed.all_reduce for fp8 tensor

Pull Request - State: closed - Opened by tocean over 1 year ago

#84 - feat(ScalingTensor, ScalingMeta): pickle and unpickle

Pull Request - State: closed - Opened by wkcn over 1 year ago

#83 - Remove env PYTHONOPTIMIZE in docker image

Pull Request - State: closed - Opened by wkcn over 1 year ago - 1 comment

#82 - Cache TE build to save time in pipelines

Pull Request - State: closed - Opened by abuccts over 1 year ago
Labels: CI/CD

#81 - Support bf16+pipeline+ZeRO1

Pull Request - State: closed - Opened by tocean over 1 year ago

#80 - Support tensor parallelism

Pull Request - State: closed - Opened by tocean over 1 year ago

#79 - add cifar10 example using deepspeed and MS-AMP

Pull Request - State: closed - Opened by tocean over 1 year ago

#78 - fix(zero): fix the bug of gradient accumulation in FP8 deepspeed ZeRO

Pull Request - State: closed - Opened by wkcn over 1 year ago

#77 - Support msamp section in deepspeed config

Pull Request - State: closed - Opened by tocean over 1 year ago - 3 comments

#76 - [Bug Fixed] update the cast of ScalingTensor-FP16

Pull Request - State: closed - Opened by wkcn over 1 year ago

#75 - [WIP] DistributedDataParallel with FP8-support

Pull Request - State: closed - Opened by wkcn over 1 year ago - 1 comment

#74 - Support pytorch 2.1

Pull Request - State: closed - Opened by tocean over 1 year ago

#73 - feat(deepspeed): Deepspeed ZeRO Optimizer with MS-AMP support

Pull Request - State: closed - Opened by wkcn over 1 year ago

#72 - Add Dockerfile for msamp

Pull Request - State: closed - Opened by abuccts over 1 year ago

#71 - feat(deepspeed): Deepspeed Optimizer with MS-AMP support

Pull Request - State: closed - Opened by wkcn over 1 year ago

#70 - feat(deepspeed): Deepspeed Optimizer with MS-AMP support

Pull Request - State: closed - Opened by wkcn over 1 year ago - 1 comment

#69 - Add TE results on homepage

Pull Request - State: closed - Opened by tocean over 1 year ago

#68 - Add performance result of DeiT and RoBERTa in homepage

Pull Request - State: closed - Opened by tocean over 1 year ago

#67 - V0.2 Release Plan

Issue - State: closed - Opened by cp5555 over 1 year ago
Labels: iteration plan

#66 - Refine homepage

Pull Request - State: closed - Opened by tocean over 1 year ago

#65 - [Feature] ScalingTensor support for F.linear

Pull Request - State: closed - Opened by wkcn over 1 year ago

#64 - [Feature] Copy custom attributes from Linear to FP8Linear

Pull Request - State: closed - Opened by wkcn over 1 year ago

#63 - [Bug Fixed] Fix all reduce grads when training more than one models

Pull Request - State: closed - Opened by wkcn over 1 year ago - 2 comments

#62 - [Bug] `LBOptimizer.all_reduce_grads` reduces gradients of only a model, even if training several models

Issue - State: closed - Opened by wkcn over 1 year ago

#61 - unit-test for multi-process training

Issue - State: closed - Opened by wkcn over 1 year ago

#60 - [Bug Fixed] `state` is not found when using FP32 optimizer states for LBAdamWBase

Pull Request - State: closed - Opened by wkcn over 1 year ago

#59 - [BUG Fixed] fix dist_op building when using PyTorch 2.0

Pull Request - State: closed - Opened by wkcn over 1 year ago - 3 comments

#58 - [Bug Fixed] Fix inf/nan weight gradient checker

Pull Request - State: closed - Opened by wkcn over 1 year ago - 4 comments

#57 - [Bug Fixed] Install nccl to system path

Pull Request - State: closed - Opened by tocean over 1 year ago

#56 - [Bug Fixed] Check NAN weight gradient in mixed precision training

Pull Request - State: closed - Opened by wkcn over 1 year ago - 1 comment

#55 - Add Dockerfile with FP8-NCCL

Pull Request - State: closed - Opened by wkcn over 1 year ago - 1 comment

#54 - enhance document and revert norm computing

Pull Request - State: closed - Opened by tocean over 1 year ago

#53 - MNIST example failed in docker nvcr.io/nvidia/pytorch:22.09-py3

Issue - State: closed - Opened by tocean over 1 year ago
Labels: bug

#52 - [Bug Fix]: Flatten tensor before passing to torch._foreach_norm

Pull Request - State: closed - Opened by tocean over 1 year ago

#51 - V0.1.0 Test Plan

Issue - State: closed - Opened by tocean over 1 year ago - 1 comment
Labels: bug bash

#50 - Support FP8 ProcessGroup in pytorch

Issue - State: closed - Opened by tocean over 1 year ago

#49 - Can not run mnist_ddp.py when using pytorch 1.14

Issue - State: closed - Opened by tocean over 1 year ago - 2 comments

#48 - [Bug Fixed] remove `get_fp8_wgrads` and update `LBOptimizer.all_reduce_grads`

Pull Request - State: closed - Opened by wkcn over 1 year ago

#47 - [Bug Fixed] update the cast of ScalingTensor-FP16

Pull Request - State: closed - Opened by wkcn over 1 year ago - 1 comment

#46 - [Feature] add `ScalingTensor.nelement` and `ScalingTensor.data_ptr` for GPT Training

Pull Request - State: closed - Opened by wkcn over 1 year ago

#45 - [Bug Fixed] add `scaler.update()` in MNIST examples

Pull Request - State: closed - Opened by wkcn over 1 year ago

#44 - nccl building failed without specifying NVCC_GENCODE

Issue - State: closed - Opened by tocean over 1 year ago - 3 comments

#43 - Moving extension installation from post install to setup.py under project root folder

Issue - State: closed - Opened by tocean over 1 year ago

#42 - MS-AMP does not support pytorch2.1

Issue - State: closed - Opened by tocean over 1 year ago - 3 comments

#41 - Auto scaling factor tuning for FP8 collective communication

Issue - State: open - Opened by tocean over 1 year ago

#40 - Support pipeline parallelism and tensor parallelism

Issue - State: closed - Opened by tocean over 1 year ago

#39 - Update adamw optimizer with vectorized apis

Pull Request - State: closed - Opened by abuccts over 1 year ago

#38 - [Bug Fixed] Add the mapping between torch.distributed.ReduceOp and ncclRedOp_t

Pull Request - State: closed - Opened by wkcn over 1 year ago - 2 comments

#37 - Fix bug in the override of `torch._amp_foreach_non_finite_check_and_unscale_`

Pull Request - State: closed - Opened by wkcn over 1 year ago

#36 - Add grad scaler for examples

Pull Request - State: closed - Opened by wkcn over 1 year ago

#35 - NVLink bandwidth of H100 FP8 is only 1/10 of H100 FP16

Issue - State: closed - Opened by tocean over 1 year ago - 1 comment

#34 - Support Zero

Issue - State: closed - Opened by tocean over 1 year ago

#33 - Remove the dependency of Transformer Engine

Issue - State: closed - Opened by tocean over 1 year ago - 1 comment

#32 - There is a down spike in curve of accuracy@1 when training vision transformer

Issue - State: closed - Opened by tocean over 1 year ago

#31 - Fix incompatible attributes in `msamp.initialize`

Pull Request - State: closed - Opened by abuccts over 1 year ago
Labels: bug, CI/CD

#30 - Optimize performance to close the gap between TE and MS-AMP

Issue - State: closed - Opened by tocean over 1 year ago

#29 - MS-AMP needs a website

Issue - State: closed - Opened by tocean over 1 year ago - 1 comment

#28 - Not support pytorch 1.14

Issue - State: closed - Opened by tocean over 1 year ago - 1 comment

#26 - Add README and MANIFEST.in

Pull Request - State: closed - Opened by tocean over 1 year ago

#25 - Support opt-out in time scaling

Pull Request - State: closed - Opened by abuccts over 1 year ago

#24 - Update unscale and clip grad norm with foreach calls

Pull Request - State: closed - Opened by abuccts over 1 year ago

#23 - Wrap tex fp8 transpose in linear backward

Pull Request - State: closed - Opened by abuccts over 1 year ago - 1 comment

#22 - [WIP] Update the condition to check FP8 NaN and INF

Pull Request - State: closed - Opened by wkcn over 1 year ago - 1 comment

#21 - Add mnist and mnist-ddp examples

Pull Request - State: closed - Opened by tocean over 1 year ago

#20 - Fix dist op build issue with torch1.14+

Pull Request - State: closed - Opened by abuccts over 1 year ago
Labels: bug

#19 - Add msamp.initialize api and unit test

Pull Request - State: closed - Opened by tocean over 1 year ago

#18 - all_reduce_grads should be called before optimizer.step

Pull Request - State: closed - Opened by wkcn over 1 year ago - 2 comments

#17 - Fix bug in historical window quantization

Pull Request - State: closed - Opened by wkcn over 1 year ago

#16 - Add Zero Support

Pull Request - State: closed - Opened by guoshzhao over 1 year ago

#15 - add clip_grad in nn package.

Pull Request - State: closed - Opened by tocean over 1 year ago

#14 - Add scale_inv for ScalingTensor

Pull Request - State: closed - Opened by wkcn over 1 year ago

#13 - Remove H100 check in fake test

Pull Request - State: closed - Opened by wkcn over 1 year ago

#12 - The operators of Automatic Mixed Precision (AMP)

Pull Request - State: closed - Opened by wkcn over 1 year ago - 2 comments

#11 - Add optimizer package and unit tests

Pull Request - State: closed - Opened by tocean over 1 year ago

#10 - Operators - Add customized collective operators.

Pull Request - State: closed - Opened by guoshzhao over 1 year ago - 2 comments

#9 - Add msamp.nn package and unit tests

Pull Request - State: closed - Opened by tocean over 1 year ago

#8 - Operators - Add the fp8 gemm operator.

Pull Request - State: closed - Opened by guoshzhao over 1 year ago

#7 - Add tensor package and unittests

Pull Request - State: closed - Opened by tocean over 1 year ago

#6 - Setup - Add template for PR, bug report and enhancement request.

Pull Request - State: closed - Opened by guoshzhao over 1 year ago

#5 - Setup - Revise setup to add package dependency.

Pull Request - State: closed - Opened by guoshzhao over 1 year ago

#4 - Common - Add lazy import module.

Pull Request - State: closed - Opened by guoshzhao over 1 year ago - 3 comments

#3 - Common - Add transformer engine wrapper

Pull Request - State: closed - Opened by guoshzhao over 1 year ago

#2 - Setup - Initialize build/lint/test and pipeline

Pull Request - State: closed - Opened by abuccts over 1 year ago
Labels: CI/CD

#1 - Common - Add dtypes and tests

Pull Request - State: closed - Opened by tocean over 1 year ago
