Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/apex issues and pull requests

#1455 - .cu files should not include torch/extension.h

Pull Request - State: open - Opened by lostmsu over 2 years ago - 4 comments

#1439 - Add BF16 support to FusedMixedPrecisionLamb

Pull Request - State: closed - Opened by nv-joseli over 2 years ago

#1420 - Could not find permutation search CUDA kernels, falling back to CPU path

Issue - State: closed - Opened by te-shi over 2 years ago - 5 comments
Labels: bug

#1415 - NVCC --threads option is hardcoded

Issue - State: open - Opened by wvidana over 2 years ago - 2 comments
Labels: bug

#1408 - how to invoke amp.initialize() and amp.scale_loss() from different module

Issue - State: closed - Opened by kehuanfeng over 2 years ago - 2 comments
Labels: bug

#1400 - [transformer] Port Sequence Parallelism (takeover of #1396)

Pull Request - State: closed - Opened by crcrpar over 2 years ago - 1 comment

#1394 - FusedDenseGeluDense output NAN

Issue - State: open - Opened by gongjingcs over 2 years ago - 2 comments
Labels: bug

#1326 - Installation Error

Issue - State: closed - Opened by GMN23362 almost 3 years ago - 2 comments

#1293 - The following error occurred while installing apex

Issue - State: closed - Opened by xxw11 almost 3 years ago - 2 comments

#1282 - Handle len(cached_x.grad_fn.next_functions) == 1 in cached_cast

Pull Request - State: open - Opened by jiafatom almost 3 years ago - 8 comments

#1230 - Using apex leads to a `CUDA out of memory` on an A100

Issue - State: closed - Opened by StrangeTcy about 3 years ago - 2 comments

#1229 - [FMHA] add support for later CUDA (8.x)

Pull Request - State: closed - Opened by jqueguiner about 3 years ago - 4 comments

#1204 - pipeline_parallel - ModuleNotFoundError: No module named 'amp_C'

Issue - State: open - Opened by MatthieuCed about 3 years ago - 20 comments

#1193 - RuntimeError: apex.optimizers.FusedAdam requires cuda extensions

Issue - State: open - Opened by life97 over 3 years ago - 18 comments

#1178 - BFloat16 support in multi_tensor_*

Issue - State: closed - Opened by zhengwy888 over 3 years ago - 2 comments

#1175 - no_sync equivalent used for gradient accumulation

Issue - State: open - Opened by amsword over 3 years ago - 2 comments

#1141 - install apex error, flatten_unflatten.obj cannot open

Issue - State: open - Opened by MrBook2019 over 3 years ago - 6 comments

#1089 - Failed to install apex on CUDA 10.1 torch 1.6.0

Issue - State: closed - Opened by Ema1997 over 3 years ago - 2 comments

#1072 - FastLayerNorm ext not found after install on master

Issue - State: closed - Opened by sshleifer almost 4 years ago - 3 comments

#990 - TypeError: unsupported operand type(s) for +: 'NoneType' and 'str'

Issue - State: open - Opened by KrisWongz about 4 years ago - 14 comments

#965 - RuntimeError: expected scalar type Float but found Half

Issue - State: open - Opened by superlwx over 4 years ago - 7 comments

#957 - fatal error: cublas_v2.h: No such file or directory

Issue - State: open - Opened by shizhediao over 4 years ago - 6 comments

#954 - ModuleNotFoundError: No module named 'fused_layer_norm_cuda'

Issue - State: closed - Opened by ajesujoba over 4 years ago - 3 comments

#874 - Anaconda fail to build with "--cpp_ext" and "--cuda_ext" options

Issue - State: open - Opened by BurguerJohn over 4 years ago - 2 comments

#865 - distributed lamb breaks python-only amp

Issue - State: closed - Opened by lisadunlap over 4 years ago - 10 comments

#855 - LAMB and gradient clipping (instructions vs api)

Issue - State: open - Opened by ggaemo over 4 years ago - 2 comments

#810 - super slow to build Apex from source in docker

Issue - State: open - Opened by alexucb over 4 years ago - 1 comment

#802 - Build error (error: expected primary-expression before 'some' token)

Issue - State: open - Opened by kkjh0723 over 4 years ago - 24 comments

#777 - " ZeroDivisionError: float division by zero" in scaler.py

Issue - State: closed - Opened by qmpzzpmq almost 5 years ago - 2 comments

#774 - Grad norm cut in half every 2000 steps?

Issue - State: closed - Opened by PCerles almost 5 years ago

#715 - problems with fp16 on multi-gpu training

Issue - State: closed - Opened by ssp573 almost 5 years ago - 1 comment

#702 - Update pyprof for nsight

Pull Request - State: closed - Opened by ghost almost 5 years ago - 3 comments

#698 - Avoid exception when initializing FusedNovoGrad with amp

Pull Request - State: closed - Opened by henrymai almost 5 years ago

#694 - Multiple independent models, only one requires apex.amp, crash in non-amp CPU model

Issue - State: open - Opened by lopuhin almost 5 years ago - 13 comments

#635 - Gradient overflow.  Skipping step, loss scaler 0 reducing loss scale to

Issue - State: open - Opened by zsun1029 about 5 years ago - 11 comments

#621 - ImportError: cannot import name 'amp'

Issue - State: open - Opened by vr25 about 5 years ago - 13 comments

#573 - Original ImportError was: ModuleNotFoundError("No module named 'amp_C')

Issue - State: closed - Opened by misslibra about 5 years ago - 9 comments

#548 - Problem installation

Issue - State: open - Opened by emsansone over 5 years ago - 7 comments

#547 - Module 'torch.nn' has no attribute 'backends'

Issue - State: closed - Opened by YuryBolkonsky over 5 years ago - 8 comments

#533 - Not able to observe any speedup on a Nvidia T4 (Turing arch)

Issue - State: open - Opened by aditya1709 over 5 years ago - 4 comments

#519 - RuntimeError: main thread is not in main loop

Issue - State: open - Opened by H-YunHui over 5 years ago - 3 comments

#497 - Installation Error.

Issue - State: open - Opened by chunyuanY over 5 years ago - 2 comments

#466 - remove deprecated backend.FunctionBackend calls

Pull Request - State: closed - Opened by ptrblck over 5 years ago - 2 comments

#464 - Keep certain modules as FP32

Issue - State: closed - Opened by yaysummeriscoming over 5 years ago - 3 comments

#393 - I try the example when init init_process_group got an error

Issue - State: closed - Opened by PistonY over 5 years ago - 15 comments

#370 - undefined symbol: __ZN2at19UndefinedTensorImpl10_singletonE

Issue - State: closed - Opened by rmrao over 5 years ago - 4 comments

#318 - How to handle gradient overflow when training a deep model with mixed precision?

Issue - State: open - Opened by tfwu over 5 years ago - 29 comments

#187 - bugs after apex installation

Issue - State: open - Opened by yinwenpeng almost 6 years ago - 7 comments
Labels: extension build

#161 - No module named 'fused_layer_norm_cuda'

Issue - State: closed - Opened by alvin-leong almost 6 years ago - 23 comments

#116 - TypeError: Class advice impossible in Python3

Issue - State: closed - Opened by lynnna-xu about 6 years ago - 15 comments

#86 - Warning: apex was installed without --cuda_ext.

Issue - State: closed - Opened by amuier about 6 years ago - 35 comments