GitHub / pytorch/pytorch issues and pull requests
Labelled with: module: cuda
#157381 - Torch is unusable when cuda-12.4 is installed locally
Issue -
State: closed - Opened by sekyondaMeta 5 months ago
- 6 comments
Labels: module: binaries, module: cuda, triaged, module: regression
#157366 - DISABLED test_graph_partition_cpu_tensor_symints (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 1 comment
Labels: module: cuda, module: rocm, triaged, module: flaky-tests, skipped
#157359 - DISABLED test_graph_partition_cpu_scalar_mutation (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 11 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157350 - DISABLED test_graph_partition_cpu_scalar4 (__main__.CudaGraphTreeTests)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
- 15 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157339 - DISABLED test_graph_partition_cpu_scalar3 (__main__.CudaGraphTreeTests)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
- 12 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157312 - DISABLED test_graph_partition_cpu_scalar2 (__main__.CudaGraphTreeTests)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
- 22 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157280 - DISABLED test_graph_partition_cpu_scalar1 (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 18 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157276 - several `transformers` tests fail with `torch 2.8 RC` but pass with `torch 2.7.1` on `T4` (but both pass on `A10`)
Issue -
State: open - Opened by ydshieh 5 months ago
- 3 comments
Labels: high priority, triage review, module: cuda, oncall: pt2, module: inductor, module: higher order operators, module: pt2-dispatcher, module: flex attention
#157258 - DISABLED test_graph_partition_cpu_op_and_dynamic_shapes (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 23 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157256 - DISABLED test_mempool_limited_memory_with_allocator (__main__.TestMemPool)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
- 15 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157172 - DISABLED test_compile_kernel_advanced (__main__.TestCompileKernel)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
- 10 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157145 - [SDPA] Fix `alloc_with_matching_layout` stride sorting
Pull Request -
State: closed - Opened by eqy 5 months ago
- 3 comments
Labels: module: cuda, open source, Merged, ciflow/trunk, topic: not user facing, merging, module: sdpa
#157143 - DISABLED test_function_compiled_multiple_times (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 26 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157112 - DISABLED test_frozen_fn (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 23 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157110 - DISABLED test_invalid_status_for_legacy_api (__main__.TestCuda)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
- 12 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157109 - DISABLED test_invalid_status_for_legacy_api (__main__.TestCuda)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 1 comment
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157086 - DISABLED test_forward_with_skipped_cudagraphed_backward (__main__.CudaGraphTreeTests)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
- 25 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#157057 - DISABLED test_forward_generation (__main__.CudaGraphTreeTests)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
Labels: module: cuda, module: rocm, triaged, module: flaky-tests, skipped
#156984 - DISABLED test_forward_backward_not_called_backend_cudagraphs (__main__.CudaGraphTreeTests)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
- 24 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#156957 - DISABLED test_forward_backward (__main__.CudaGraphTreeTests)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
- 17 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#156886 - DISABLED test_expanded_inputs (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 8 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#156838 - DISABLED test_execution_into_recording (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 26 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#156815 - `load_inline` forcibly adds legacy `-gencode` flags and cannot be overridden, preventing use of newer compute capabilities
Issue -
State: closed - Opened by yuxuan-z19 5 months ago
- 3 comments
Labels: module: cpp-extensions, module: cuda, triaged
#156801 - DISABLED test_error_on_dealloc_use (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 26 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#156777 - DISABLED test_end_recording_early (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 1 comment
Labels: module: cuda, module: rocm, triaged, module: flaky-tests, skipped
#156754 - DISABLED test_empty_storage (__main__.CudaGraphTreeTests)
Issue -
State: open - Opened by pytorch-bot[bot] 5 months ago
Labels: module: cuda, module: rocm, triaged, module: flaky-tests, skipped
#156747 - Restore CUDA 12.4 manylinux build and test in CI
Issue -
State: closed - Opened by atalman 5 months ago
- 1 comment
Labels: module: build, module: cuda, module: ci, triaged
#156735 - DISABLED test_empty_cpu_tensor (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 31 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#156693 - DISABLED test_dynamic_warmup (__main__.CudaGraphTreeTests)
Issue -
State: closed - Opened by pytorch-bot[bot] 5 months ago
- 31 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#156548 - [CUDA] Skip test on low vram machines
Pull Request -
State: closed - Opened by Isalia20 5 months ago
- 8 comments
Labels: module: cuda, open source, Merged, ciflow/trunk, topic: not user facing, merging
#156203 - [CUTLASS] [CUDA] SM100 GroupMM
Pull Request -
State: closed - Opened by AaronWang04 5 months ago
- 16 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, topic: not user facing, merging
#156202 - Upgrade torch._grouped_mm to SM100+
Issue -
State: closed - Opened by syed-ahmed 5 months ago
Labels: module: cuda, triaged
#156160 - [SDPA] RTX5080 is different from CPU calculation result in backward with long seq
Issue -
State: closed - Opened by O5-7 5 months ago
- 2 comments
Labels: module: numerical-stability, module: cuda, triaged, module: sdpa
#156140 - [cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel
Pull Request -
State: open - Opened by eqy 5 months ago
- 13 comments
Labels: module: cudnn, module: cuda, module: cpu, module: convolution, triaged, open source, Merged, Reverted, ciflow/trunk, release notes: cuda, topic: bug fixes, ci-no-td
#156074 - Improve IPC for Expandable Segments to use fabric handle when possible
Pull Request -
State: closed - Opened by youkaichao 5 months ago
- 11 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, topic: not user facing, merging
#156015 - Function 'MmBackward0' returned nan values in its 0th output.
Issue -
State: closed - Opened by O5-7 5 months ago
- 3 comments
Labels: module: cuda
#155900 - [C10][CUDA] Eagerly create context on torch.cuda.set_device(device) call
Pull Request -
State: open - Opened by Aidyn-A 5 months ago
- 1 comment
Labels: module: cuda, open source, release notes: cuda, topic: not user facing
#155889 - Context on torch.cuda.memory._record_memory_history max_entries
Pull Request -
State: open - Opened by b-koopman 5 months ago
- 7 comments
Labels: module: docs, module: cuda, triaged, open source, medium, topic: not user facing, docathon-h1-2025
#155889 - Context on torch.cuda.memory._record_memory_history max_entries
Pull Request -
State: closed - Opened by b-koopman 5 months ago
- 9 comments
Labels: module: docs, module: cuda, triaged, open source, medium, Merged, ciflow/trunk, topic: not user facing, merging, docathon-h1-2025
#155888 - [CUDA][CUTLASS] test_cutlass_backend.py unit test failures on SM90+
Issue -
State: closed - Opened by nWEIdia 5 months ago
- 2 comments
Labels: module: cuda, module: tests, triaged
#155857 - Optionally avoid `record_streams` in autograd with `TORCH_AUTOGRAD_AVOID_RECORD_STREAMS=1`
Pull Request -
State: open - Opened by eqy 5 months ago
- 1 comment
Labels: module: cuda, module: memory usage, open source, module: cuda graphs, topic: not user facing
#155857 - Optionally avoid `record_streams` in autograd with `TORCH_AUTOGRAD_AVOID_RECORD_STREAMS=1`
Pull Request -
State: closed - Opened by eqy 5 months ago
- 2 comments
Labels: module: cuda, module: memory usage, triaged, open source, module: cuda graphs, Stale, topic: not user facing
#155668 - torch.cuda.set_device(0) behaves differently from torch.cuda.set_device(1) in terms of cuda context
Issue -
State: closed - Opened by youkaichao 5 months ago
- 2 comments
Labels: module: cuda, triaged
#155397 - [CUDA] fix illegal memory access in attention
Pull Request -
State: closed - Opened by Isalia20 5 months ago
- 23 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, release notes: cuda, merging, module: sdpa
#155397 - [CUDA] fix illegal memory access in attention
Pull Request -
State: closed - Opened by Isalia20 5 months ago
- 25 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, release notes: cuda, module: sdpa
#155350 - CUDA_HOME doesn't seem to work with setup script
Issue -
State: open - Opened by ezyang 6 months ago
- 4 comments
Labels: module: build, module: cuda, triaged
#155341 - Document the default garbage_collection_threshold value and improve the organization of cuda docs
Pull Request -
State: closed - Opened by ParagEkbote 6 months ago
- 6 comments
Labels: module: docs, module: cuda, triaged, open source, Merged, topic: docs, topic: not user facing, easy, docathon-h1-2025
#155288 - Fused RMSNorm Implementation
Pull Request -
State: open - Opened by AaronWang04 6 months ago
- 2 comments
Labels: module: cuda, open source, module: norms and normalization, topic: not user facing
#155225 - canUse32BitIndexMath set to False with efficient net
Issue -
State: closed - Opened by jjh42 6 months ago
- 13 comments
Labels: module: nn, module: cuda, triaged
#155145 - [CUDA] fix illegal memory access in attention
Pull Request -
State: closed - Opened by Isalia20 6 months ago
- 10 comments
Labels: module: cuda, triaged, open source, release notes: cuda, module: sdpa
#154778 - [CUDA] Fix missing bounds check in `Softmax.cu`
Pull Request -
State: open - Opened by eqy 6 months ago
- 5 comments
Labels: module: cuda, open source, topic: not user facing
#154566 - DISABLED test_mempool_with_allocator (__main__.TestMemPool)
Issue -
State: open - Opened by pytorch-bot[bot] 6 months ago
- 3 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped
#154293 - [cuBLAS][cuBLASLt] Reduce scale of inputs for reduced precision reduction matmul test
Pull Request -
State: open - Opened by eqy 6 months ago
- 13 comments
Labels: module: cuda, triaged, module: cublas, open source, module: bfloat16, module: half, ciflow/trunk, topic: not user facing, ciflow/periodic, keep-going
#154293 - [cuBLAS][cuBLASLt] Reduce scale of inputs for reduced precision reduction matmul test
Pull Request -
State: open - Opened by eqy 6 months ago
Labels: module: cuda, module: cublas, open source, module: bfloat16, module: half, ciflow/trunk, topic: not user facing
#154170 - [cuBLASLt][cuBLAS] Support 2D bias and `beta != 1.0` in cuBLASLt
Pull Request -
State: open - Opened by eqy 6 months ago
- 44 comments
Labels: module: cuda, triaged, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, matrix multiplication, ciflow/rocm, ci-no-td, ciflow/inductor-rocm, ciflow/rocm-mi300
#154170 - [cuBLASLt][cuBLAS] Support 2D bias and `beta != 1.0` in cuBLASLt
Pull Request -
State: closed - Opened by eqy 6 months ago
- 48 comments
Labels: module: cuda, triaged, open source, Merged, Reverted, Stale, ciflow/trunk, topic: not user facing, matrix multiplication, ciflow/rocm, ci-no-td, ciflow/inductor-rocm, ciflow/rocm-mi300
#154029 - SDPA fix memory efficient attention for large batch dim
Pull Request -
State: closed - Opened by Isalia20 6 months ago
- 11 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, release notes: cuda, merging, module: sdpa
#154029 - SDPA fix memory efficient attention for large batch dim
Pull Request -
State: open - Opened by Isalia20 6 months ago
- 8 comments
Labels: module: cuda, triaged, open source, ciflow/trunk, release notes: cuda, module: sdpa
#153794 - Fix add enabled in _initRecordAnnotations()
Pull Request -
State: closed - Opened by ghostspiders 6 months ago
- 1 comment
Labels: module: cuda, open source, topic: not user facing
#153782 - [CUDA] Fused optimizers - write lerp with fma instead of regular lerp
Pull Request -
State: open - Opened by Isalia20 6 months ago
- 3 comments
Labels: module: optimizer, module: cuda, open source, release notes: cuda
#153782 - [CUDA] Fused optimizers - write lerp with fma instead of regular lerp
Pull Request -
State: closed - Opened by Isalia20 6 months ago
- 11 comments
Labels: module: optimizer, module: cuda, open source, release notes: cuda
#153675 - [cuBLASLt] relax `addmm` cuBLASLt constraint
Pull Request -
State: open - Opened by eqy 6 months ago
- 12 comments
Labels: module: cuda, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, matrix multiplication, ci-no-td
#153666 - Fused RMSNorm implementation
Pull Request -
State: open - Opened by AaronWang04 6 months ago
- 93 comments
Labels: module: bc-breaking, module: cuda, module: cpu, triaged, open source, Merged, Reverted, ciflow/trunk, release notes: cuda, topic: bc breaking, module: inductor, rocm, ci-no-td, release notes: inductor (aoti)
#153655 - [CUDA][cuBLAS][cuBLASLt] avoid polluting prefer cuBLAS/Lt setting across tests
Pull Request -
State: closed - Opened by eqy 6 months ago
- 11 comments
Labels: module: cuda, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, matrix multiplication, merging, ci-no-td
#153655 - [CUDA][cuBLAS][cuBLASLt] avoid polluting prefer cuBLAS/Lt setting across tests
Pull Request -
State: open - Opened by eqy 6 months ago
- 6 comments
Labels: module: cuda, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, matrix multiplication, ci-no-td
#153643 - [SDPA][EZ] Abate narrowing conversion warning spam in `flash_api.cpp`
Pull Request -
State: open - Opened by eqy 6 months ago
- 3 comments
Labels: module: cuda, module: build warnings, open source, ciflow/trunk, topic: not user facing, merging, module: sdpa
#153643 - [SDPA][EZ] Abate narrowing conversion warning spam in `flash_api.cpp`
Pull Request -
State: closed - Opened by eqy 6 months ago
- 3 comments
Labels: module: cuda, module: build warnings, open source, Merged, ciflow/trunk, topic: not user facing, merging, module: sdpa
#153571 - torch.cuda.memory._record_memory_history(enabled=None) does not clean up previously added hooks
Issue -
State: closed - Opened by ahmadsharif1 6 months ago
- 8 comments
Labels: module: cuda, oncall: profiler
#153556 - [cuBLAS][cuBLASLt] Use cuBLAS default workspace size in Lt
Pull Request -
State: closed - Opened by eqy 6 months ago
- 17 comments
Labels: module: cuda, module: cublas, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, module: dynamo, ciflow/inductor, matrix multiplication, merging, ci-no-td
#153541 - [BE]: Update CUTLASS submodule to 4.0.0rc
Pull Request -
State: open - Opened by Skylion007 6 months ago
- 6 comments
Labels: module: cuda, open source, module: linear algebra, ciflow/trunk, topic: not user facing, ciflow/inductor-cu124, ciflow/inductor-cu126, module: sdpa
#153460 - DISABLED test_mempool_ctx_multithread (__main__.TestMemPool)
Issue -
State: open - Opened by pytorch-bot[bot] 6 months ago
Labels: module: cuda, triaged, module: flaky-tests, skipped
#153373 - [ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen
Pull Request -
State: open - Opened by Aidyn-A 6 months ago
- 1 comment
Labels: module: cuda, triaged, open source, ciflow/trunk, release notes: cuda, topic: not user facing, module: core aten
#153373 - [ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen
Pull Request -
State: closed - Opened by Aidyn-A 6 months ago
- 10 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, release notes: cuda, topic: not user facing, merging, ciflow/rocm, module: core aten
#153272 - [CUDA] Allow cuDNN or flash attn in `test_activation_checkpointing` pattern match check
Pull Request -
State: open - Opened by eqy 6 months ago
- 10 comments
Labels: module: activation checkpointing, module: cuda, triaged, open source, topic: not user facing, module: dynamo, ciflow/inductor, module: sdpa
#153272 - [CUDA] Allow cuDNN or flash attn in `test_activation_checkpointing` pattern match check
Pull Request -
State: open - Opened by eqy 6 months ago
Labels: module: activation checkpointing, module: cuda, open source, topic: not user facing, module: sdpa
#153109 - Inconsistent size passed to custom CUDA alloc/free in torch::unique_consecutive
Issue -
State: closed - Opened by darrin-willis 7 months ago
- 2 comments
Labels: module: cuda, triaged
#153101 - [CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions
Pull Request -
State: closed - Opened by eqy 7 months ago
- 20 comments
Labels: module: cuda, module: cpu, module: convolution, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, ciflow/mps, merging, ciflow/rocm, ci-no-td
#153101 - [CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions
Pull Request -
State: open - Opened by eqy 7 months ago
- 7 comments
Labels: module: cuda, module: cpu, module: convolution, open source, ciflow/trunk, topic: not user facing
#153083 - [CUDA][cuBLASLt] Fix scale setting for `allowFP16AccumulationCuBLAS` `true` case
Pull Request -
State: open - Opened by eqy 7 months ago
Labels: module: cuda, module: cublas, open source, module: half, topic: not user facing
#153083 - [CUDA][cuBLASLt] Fix scale setting for `allowFP16AccumulationCuBLAS` `true` case
Pull Request -
State: open - Opened by eqy 7 months ago
- 6 comments
Labels: module: cuda, triaged, module: cublas, open source, module: half, ciflow/trunk, release notes: cuda, merging
#152923 - Upgrade to CUDA 12.8.1 for nightly binaries
Pull Request -
State: closed - Opened by tinglvv 7 months ago
- 21 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/binaries, ciflow/trunk, topic: not user facing
#152923 - Upgrade to CUDA 12.8.1 for nightly binaries
Pull Request -
State: closed - Opened by tinglvv 7 months ago
- 21 comments
Labels: module: cuda, triaged, open source, ciflow/binaries, ciflow/trunk, topic: not user facing, merging
#152816 - Depthwise Separable Convolutions with Large Tensors (> 2**31) Elements) Fail Despite cuDNN 64-bit Indexing Support
Issue -
State: closed - Opened by lely475 7 months ago
- 6 comments
Labels: module: cudnn, module: cuda, module: convolution, triaged, module: 64-bit
#152814 - [TEST][ATen][CUDA] Skip row-wise scaled matrix multiplication tests on sm_120+
Pull Request -
State: closed - Opened by Aidyn-A 7 months ago
- 13 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, topic: not user facing, merging
#152745 - [CUDA][cuDNN] Fix handling of `CPU` side input and target length tensors in `CTCLoss`
Pull Request -
State: closed - Opened by eqy 7 months ago
- 3 comments
Labels: module: cudnn, module: cuda, open source, Merged, ciflow/trunk, topic: bug fixes, topic: not user facing, merging
#152745 - [CUDA][cuDNN] Fix handling of `CPU` side input and target length tensors in `CTCLoss`
Pull Request -
State: open - Opened by eqy 7 months ago
- 1 comment
Labels: module: cudnn, module: cuda, open source, ciflow/trunk, topic: bug fixes, topic: not user facing
#152731 - Inconsistent float16 overflow behavior between CPU and CUDA devices
Issue -
State: open - Opened by SilentTester73 7 months ago
- 3 comments
Labels: module: cuda, low priority, triaged, module: half, actionable, module: edge cases
#152695 - set CUDA_MODULE_LOADING for older drivers only
Pull Request -
State: closed - Opened by ptrblck 7 months ago
- 7 comments
Labels: module: cuda, open source, Merged, ciflow/trunk, topic: not user facing, merging
#152642 - [CUTLASS][WIP] Gate rowwise matmul CUTLASS kernels by compute capability
Pull Request -
State: open - Opened by eqy 7 months ago
- 1 comment
Labels: module: cuda, triaged, open source, topic: not user facing, module: float8
#152618 - [CUDA][TF32] Account for TF32 in `test_conv2d_same_padding`
Pull Request -
State: closed - Opened by eqy 7 months ago
- 3 comments
Labels: module: cuda, module: convolution, open source, Merged, module: tf32, ciflow/trunk, topic: not user facing, merging
#152540 - [CUDA] Rest peak memory stats before running `test_set_per_process_memory_fraction`
Pull Request -
State: open - Opened by eqy 7 months ago
- 1 comment
Labels: module: cuda, open source, ciflow/trunk, topic: not user facing
#152540 - [CUDA] Rest peak memory stats before running `test_set_per_process_memory_fraction`
Pull Request -
State: open - Opened by eqy 7 months ago
Labels: module: cuda, open source, topic: not user facing
#152491 - [CUDA][SDPA] Bump python `fused_attention_vs_math_ref_grads` `fudge_factor` for `sm120`
Pull Request -
State: closed - Opened by eqy 7 months ago
- 3 comments
Labels: module: cuda, open source, Merged, ciflow/trunk, topic: not user facing, merging, module: sdpa
#152468 - [CUDA][TF32] Account for TF32 in `compile_kernel_advanced`
Pull Request -
State: closed - Opened by eqy 7 months ago
- 3 comments
Labels: module: cuda, open source, Merged, module: tf32, ciflow/trunk, topic: not user facing, merging