An open API service for providing issue and pull request metadata for open source projects.

GitHub / pytorch/pytorch issues and pull requests

Labelled with: module: cuda

#157381 - Torch is unusable when cuda-12.4 is installed locally

Issue - State: closed - Opened by sekyondaMeta 5 months ago - 6 comments
Labels: module: binaries, module: cuda, triaged, module: regression

#157366 - DISABLED test_graph_partition_cpu_tensor_symints (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 1 comment
Labels: module: cuda, module: rocm, triaged, module: flaky-tests, skipped

#157359 - DISABLED test_graph_partition_cpu_scalar_mutation (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 11 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157350 - DISABLED test_graph_partition_cpu_scalar4 (__main__.CudaGraphTreeTests)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago - 15 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157339 - DISABLED test_graph_partition_cpu_scalar3 (__main__.CudaGraphTreeTests)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago - 12 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157312 - DISABLED test_graph_partition_cpu_scalar2 (__main__.CudaGraphTreeTests)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago - 22 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157280 - DISABLED test_graph_partition_cpu_scalar1 (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 18 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157276 - several `transformers` tests fail with `torch 2.8 RC` but pass with `torch 2.7.1` on `T4` (but both pass on `A10`)

Issue - State: open - Opened by ydshieh 5 months ago - 3 comments
Labels: high priority, triage review, module: cuda, oncall: pt2, module: inductor, module: higher order operators, module: pt2-dispatcher, module: flex attention

#157258 - DISABLED test_graph_partition_cpu_op_and_dynamic_shapes (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 23 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157256 - DISABLED test_mempool_limited_memory_with_allocator (__main__.TestMemPool)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago - 15 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157172 - DISABLED test_compile_kernel_advanced (__main__.TestCompileKernel)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago - 10 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157145 - [SDPA] Fix `alloc_with_matching_layout` stride sorting

Pull Request - State: closed - Opened by eqy 5 months ago - 3 comments
Labels: module: cuda, open source, Merged, ciflow/trunk, topic: not user facing, merging, module: sdpa

#157143 - DISABLED test_function_compiled_multiple_times (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 26 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157112 - DISABLED test_frozen_fn (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 23 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157110 - DISABLED test_invalid_status_for_legacy_api (__main__.TestCuda)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago - 12 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157109 - DISABLED test_invalid_status_for_legacy_api (__main__.TestCuda)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 1 comment
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157086 - DISABLED test_forward_with_skipped_cudagraphed_backward (__main__.CudaGraphTreeTests)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago - 25 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#157057 - DISABLED test_forward_generation (__main__.CudaGraphTreeTests)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago
Labels: module: cuda, module: rocm, triaged, module: flaky-tests, skipped

#156984 - DISABLED test_forward_backward_not_called_backend_cudagraphs (__main__.CudaGraphTreeTests)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago - 24 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#156957 - DISABLED test_forward_backward (__main__.CudaGraphTreeTests)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago - 17 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#156886 - DISABLED test_expanded_inputs (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 8 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#156838 - DISABLED test_execution_into_recording (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 26 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#156815 - `load_inline` forcibly adds legacy `-gencode` flags and cannot be overridden, preventing use of newer compute capabilities

Issue - State: closed - Opened by yuxuan-z19 5 months ago - 3 comments
Labels: module: cpp-extensions, module: cuda, triaged

#156801 - DISABLED test_error_on_dealloc_use (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 26 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#156777 - DISABLED test_end_recording_early (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 1 comment
Labels: module: cuda, module: rocm, triaged, module: flaky-tests, skipped

#156754 - DISABLED test_empty_storage (__main__.CudaGraphTreeTests)

Issue - State: open - Opened by pytorch-bot[bot] 5 months ago
Labels: module: cuda, module: rocm, triaged, module: flaky-tests, skipped

#156747 - Restore CUDA 12.4 manylinux build and test in CI

Issue - State: closed - Opened by atalman 5 months ago - 1 comment
Labels: module: build, module: cuda, module: ci, triaged

#156735 - DISABLED test_empty_cpu_tensor (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 31 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#156693 - DISABLED test_dynamic_warmup (__main__.CudaGraphTreeTests)

Issue - State: closed - Opened by pytorch-bot[bot] 5 months ago - 31 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#156548 - [CUDA] Skip test on low vram machines

Pull Request - State: closed - Opened by Isalia20 5 months ago - 8 comments
Labels: module: cuda, open source, Merged, ciflow/trunk, topic: not user facing, merging

#156548 - [CUDA] Skip test on low vram machines

Pull Request - State: closed - Opened by Isalia20 5 months ago - 8 comments
Labels: module: cuda, open source, Merged, ciflow/trunk, topic: not user facing

#156203 - [CUTLASS] [CUDA] SM100 GroupMM

Pull Request - State: closed - Opened by AaronWang04 5 months ago - 16 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, topic: not user facing, merging

#156203 - [CUTLASS] [CUDA] SM100 GroupMM

Pull Request - State: closed - Opened by AaronWang04 5 months ago - 16 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, topic: not user facing

#156202 - Upgrade torch._grouped_mm to SM100+

Issue - State: closed - Opened by syed-ahmed 5 months ago
Labels: module: cuda, triaged

#156160 - [SDPA] RTX5080 is different from CPU calculation result in backward with long seq

Issue - State: closed - Opened by O5-7 5 months ago - 2 comments
Labels: module: numerical-stability, module: cuda, triaged, module: sdpa

#156140 - [cuDNN][64-bit indexing] update conv depthwise 64bit indexing dispatch condition to match native kernel

Pull Request - State: open - Opened by eqy 5 months ago - 13 comments
Labels: module: cudnn, module: cuda, module: cpu, module: convolution, triaged, open source, Merged, Reverted, ciflow/trunk, release notes: cuda, topic: bug fixes, ci-no-td

#156074 - Improve IPC for Expandable Segments to use fabric handle when possible

Pull Request - State: closed - Opened by youkaichao 5 months ago - 11 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, topic: not user facing, merging

#156015 - Function 'MmBackward0' returned nan values in its 0th output.

Issue - State: closed - Opened by O5-7 5 months ago - 3 comments
Labels: module: cuda

#155900 - [C10][CUDA] Eagerly create context on torch.cuda.set_device(device) call

Pull Request - State: open - Opened by Aidyn-A 5 months ago - 1 comment
Labels: module: cuda, open source, release notes: cuda, topic: not user facing

#155889 - Context on torch.cuda.memory._record_memory_history max_entries

Pull Request - State: open - Opened by b-koopman 5 months ago - 7 comments
Labels: module: docs, module: cuda, triaged, open source, medium, topic: not user facing, docathon-h1-2025

#155889 - Context on torch.cuda.memory._record_memory_history max_entries

Pull Request - State: closed - Opened by b-koopman 5 months ago - 9 comments
Labels: module: docs, module: cuda, triaged, open source, medium, Merged, ciflow/trunk, topic: not user facing, merging, docathon-h1-2025

#155888 - [CUDA][CUTLASS] test_cutlass_backend.py unit test failures on SM90+

Issue - State: closed - Opened by nWEIdia 5 months ago - 2 comments
Labels: module: cuda, module: tests, triaged

#155857 - Optionally avoid `record_streams` in autograd with `TORCH_AUTOGRAD_AVOID_RECORD_STREAMS=1`

Pull Request - State: open - Opened by eqy 5 months ago - 1 comment
Labels: module: cuda, module: memory usage, open source, module: cuda graphs, topic: not user facing

#155857 - Optionally avoid `record_streams` in autograd with `TORCH_AUTOGRAD_AVOID_RECORD_STREAMS=1`

Pull Request - State: closed - Opened by eqy 5 months ago - 2 comments
Labels: module: cuda, module: memory usage, triaged, open source, module: cuda graphs, Stale, topic: not user facing

#155668 - torch.cuda.set_device(0) behaves differently from torch.cuda.set_device(1) in terms of cuda context

Issue - State: closed - Opened by youkaichao 5 months ago - 2 comments
Labels: module: cuda, triaged

#155397 - [CUDA] fix illegal memory access in attention

Pull Request - State: closed - Opened by Isalia20 5 months ago - 23 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, release notes: cuda, merging, module: sdpa

#155397 - [CUDA] fix illegal memory access in attention

Pull Request - State: closed - Opened by Isalia20 5 months ago - 25 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, release notes: cuda, module: sdpa

#155350 - CUDA_HOME doesn't seem to work with setup script

Issue - State: open - Opened by ezyang 6 months ago - 4 comments
Labels: module: build, module: cuda, triaged

#155341 - Document the default garbage_collection_threshold value and improve the organization of cuda docs

Pull Request - State: closed - Opened by ParagEkbote 6 months ago - 6 comments
Labels: module: docs, module: cuda, triaged, open source, Merged, topic: docs, topic: not user facing, easy, docathon-h1-2025

#155341 - Document the default garbage_collection_threshold value and improve the organization of cuda docs

Pull Request - State: closed - Opened by ParagEkbote 6 months ago - 6 comments
Labels: module: docs, module: cuda, triaged, open source, Merged, topic: docs, topic: not user facing, merging, easy, docathon-h1-2025

#155288 - Fused RMSNorm Implementation

Pull Request - State: open - Opened by AaronWang04 6 months ago - 2 comments
Labels: module: cuda, open source, module: norms and normalization, topic: not user facing

#155225 - canUse32BitIndexMath set to False with efficient net

Issue - State: closed - Opened by jjh42 6 months ago - 13 comments
Labels: module: nn, module: cuda, triaged

#155145 - [CUDA] fix illegal memory access in attention

Pull Request - State: closed - Opened by Isalia20 6 months ago - 10 comments
Labels: module: cuda, triaged, open source, release notes: cuda, module: sdpa

#154778 - [CUDA] Fix missing bounds check in `Softmax.cu`

Pull Request - State: open - Opened by eqy 6 months ago - 5 comments
Labels: module: cuda, open source, topic: not user facing

#154566 - DISABLED test_mempool_with_allocator (__main__.TestMemPool)

Issue - State: open - Opened by pytorch-bot[bot] 6 months ago - 3 comments
Labels: module: cuda, triaged, module: flaky-tests, skipped

#154293 - [cuBLAS][cuBLASLt] Reduce scale of inputs for reduced precision reduction matmul test

Pull Request - State: open - Opened by eqy 6 months ago - 13 comments
Labels: module: cuda, triaged, module: cublas, open source, module: bfloat16, module: half, ciflow/trunk, topic: not user facing, ciflow/periodic, keep-going

#154293 - [cuBLAS][cuBLASLt] Reduce scale of inputs for reduced precision reduction matmul test

Pull Request - State: open - Opened by eqy 6 months ago
Labels: module: cuda, module: cublas, open source, module: bfloat16, module: half, ciflow/trunk, topic: not user facing

#154170 - [cuBLASLt][cuBLAS] Support 2D bias and `beta != 1.0` in cuBLASLt

Pull Request - State: open - Opened by eqy 6 months ago - 44 comments
Labels: module: cuda, triaged, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, matrix multiplication, ciflow/rocm, ci-no-td, ciflow/inductor-rocm, ciflow/rocm-mi300

#154170 - [cuBLASLt][cuBLAS] Support 2D bias and `beta != 1.0` in cuBLASLt

Pull Request - State: closed - Opened by eqy 6 months ago - 48 comments
Labels: module: cuda, triaged, open source, Merged, Reverted, Stale, ciflow/trunk, topic: not user facing, matrix multiplication, ciflow/rocm, ci-no-td, ciflow/inductor-rocm, ciflow/rocm-mi300

#154029 - SDPA fix memory efficient attention for large batch dim

Pull Request - State: closed - Opened by Isalia20 6 months ago - 11 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, release notes: cuda, merging, module: sdpa

#154029 - SDPA fix memory efficient attention for large batch dim

Pull Request - State: open - Opened by Isalia20 6 months ago - 8 comments
Labels: module: cuda, triaged, open source, ciflow/trunk, release notes: cuda, module: sdpa

#153794 - Fix add enabled in _initRecordAnnotations()

Pull Request - State: closed - Opened by ghostspiders 6 months ago - 1 comment
Labels: module: cuda, open source, topic: not user facing

#153782 - [CUDA] Fused optimizers - write lerp with fma instead of regular lerp

Pull Request - State: open - Opened by Isalia20 6 months ago - 3 comments
Labels: module: optimizer, module: cuda, open source, release notes: cuda

#153782 - [CUDA] Fused optimizers - write lerp with fma instead of regular lerp

Pull Request - State: closed - Opened by Isalia20 6 months ago - 11 comments
Labels: module: optimizer, module: cuda, open source, release notes: cuda

#153675 - [cuBLASLt] relax `addmm` cuBLASLt constraint

Pull Request - State: open - Opened by eqy 6 months ago - 12 comments
Labels: module: cuda, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, matrix multiplication, ci-no-td

#153666 - Fused RMSNorm implementation

Pull Request - State: open - Opened by AaronWang04 6 months ago - 93 comments
Labels: module: bc-breaking, module: cuda, module: cpu, triaged, open source, Merged, Reverted, ciflow/trunk, release notes: cuda, topic: bc breaking, module: inductor, rocm, ci-no-td, release notes: inductor (aoti)

#153655 - [CUDA][cuBLAS][cuBLASLt] avoid polluting prefer cuBLAS/Lt setting across tests

Pull Request - State: closed - Opened by eqy 6 months ago - 11 comments
Labels: module: cuda, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, matrix multiplication, merging, ci-no-td

#153655 - [CUDA][cuBLAS][cuBLASLt] avoid polluting prefer cuBLAS/Lt setting across tests

Pull Request - State: open - Opened by eqy 6 months ago - 6 comments
Labels: module: cuda, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, matrix multiplication, ci-no-td

#153643 - [SDPA][EZ] Abate narrowing conversion warning spam in `flash_api.cpp`

Pull Request - State: open - Opened by eqy 6 months ago - 3 comments
Labels: module: cuda, module: build warnings, open source, ciflow/trunk, topic: not user facing, merging, module: sdpa

#153643 - [SDPA][EZ] Abate narrowing conversion warning spam in `flash_api.cpp`

Pull Request - State: closed - Opened by eqy 6 months ago - 3 comments
Labels: module: cuda, module: build warnings, open source, Merged, ciflow/trunk, topic: not user facing, merging, module: sdpa

#153571 - torch.cuda.memory._record_memory_history(enabled=None) does not clean up previously added hooks

Issue - State: closed - Opened by ahmadsharif1 6 months ago - 8 comments
Labels: module: cuda, oncall: profiler

#153556 - [cuBLAS][cuBLASLt] Use cuBLAS default workspace size in Lt

Pull Request - State: closed - Opened by eqy 6 months ago - 17 comments
Labels: module: cuda, module: cublas, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, module: dynamo, ciflow/inductor, matrix multiplication, merging, ci-no-td

#153556 - [cuBLAS][cuBLASLt] Use cuBLAS default workspace size in Lt

Pull Request - State: closed - Opened by eqy 6 months ago - 17 comments
Labels: module: cuda, module: cublas, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, module: dynamo, ciflow/inductor, matrix multiplication, ci-no-td

#153541 - [BE]: Update CUTLASS submodule to 4.0.0rc

Pull Request - State: open - Opened by Skylion007 6 months ago - 6 comments
Labels: module: cuda, open source, module: linear algebra, ciflow/trunk, topic: not user facing, ciflow/inductor-cu124, ciflow/inductor-cu126, module: sdpa

#153460 - DISABLED test_mempool_ctx_multithread (__main__.TestMemPool)

Issue - State: open - Opened by pytorch-bot[bot] 6 months ago
Labels: module: cuda, triaged, module: flaky-tests, skipped

#153373 - [ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen

Pull Request - State: open - Opened by Aidyn-A 6 months ago - 1 comment
Labels: module: cuda, triaged, open source, ciflow/trunk, release notes: cuda, topic: not user facing, module: core aten

#153373 - [ATen][CUDA][CUB] Implement changes to CCCL (CUB/Thrust/LibCUDACXX) usage in ATen

Pull Request - State: closed - Opened by Aidyn-A 6 months ago - 10 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, release notes: cuda, topic: not user facing, merging, ciflow/rocm, module: core aten

#153272 - [CUDA] Allow cuDNN or flash attn in `test_activation_checkpointing` pattern match check

Pull Request - State: open - Opened by eqy 6 months ago - 10 comments
Labels: module: activation checkpointing, module: cuda, triaged, open source, topic: not user facing, module: dynamo, ciflow/inductor, module: sdpa

#153272 - [CUDA] Allow cuDNN or flash attn in `test_activation_checkpointing` pattern match check

Pull Request - State: open - Opened by eqy 6 months ago
Labels: module: activation checkpointing, module: cuda, open source, topic: not user facing, module: sdpa

#153109 - Inconsistent size passed to custom CUDA alloc/free in torch::unique_consecutive

Issue - State: closed - Opened by darrin-willis 7 months ago - 2 comments
Labels: module: cuda, triaged

#153101 - [CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions

Pull Request - State: closed - Opened by eqy 7 months ago - 20 comments
Labels: module: cuda, module: cpu, module: convolution, open source, Merged, Reverted, ciflow/trunk, topic: not user facing, ciflow/mps, merging, ciflow/rocm, ci-no-td

#153101 - [CUDA][CUDNN] Dispatch to cuDNN for non-batch-splittable 64-bit NCHW convolutions

Pull Request - State: open - Opened by eqy 7 months ago - 7 comments
Labels: module: cuda, module: cpu, module: convolution, open source, ciflow/trunk, topic: not user facing

#153083 - [CUDA][cuBLASLt] Fix scale setting for `allowFP16AccumulationCuBLAS` `true` case

Pull Request - State: open - Opened by eqy 7 months ago
Labels: module: cuda, module: cublas, open source, module: half, topic: not user facing

#153083 - [CUDA][cuBLASLt] Fix scale setting for `allowFP16AccumulationCuBLAS` `true` case

Pull Request - State: open - Opened by eqy 7 months ago - 6 comments
Labels: module: cuda, triaged, module: cublas, open source, module: half, ciflow/trunk, release notes: cuda, merging

#152923 - Upgrade to CUDA 12.8.1 for nightly binaries

Pull Request - State: closed - Opened by tinglvv 7 months ago - 21 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/binaries, ciflow/trunk, topic: not user facing

#152923 - Upgrade to CUDA 12.8.1 for nightly binaries

Pull Request - State: closed - Opened by tinglvv 7 months ago - 21 comments
Labels: module: cuda, triaged, open source, ciflow/binaries, ciflow/trunk, topic: not user facing, merging

#152816 - Depthwise Separable Convolutions with Large Tensors (> 2**31) Elements) Fail Despite cuDNN 64-bit Indexing Support

Issue - State: closed - Opened by lely475 7 months ago - 6 comments
Labels: module: cudnn, module: cuda, module: convolution, triaged, module: 64-bit

#152814 - [TEST][ATen][CUDA] Skip row-wise scaled matrix mmultiplication tests on sm_120+

Pull Request - State: closed - Opened by Aidyn-A 7 months ago - 13 comments
Labels: module: cuda, triaged, open source, Merged, ciflow/trunk, topic: not user facing, merging

#152745 - [CUDA][cuDNN] Fix handling of `CPU` side input and target length tensors in `CTCLoss`

Pull Request - State: closed - Opened by eqy 7 months ago - 3 comments
Labels: module: cudnn, module: cuda, open source, Merged, ciflow/trunk, topic: bug fixes, topic: not user facing, merging

#152745 - [CUDA][cuDNN] Fix handling of `CPU` side input and target length tensors in `CTCLoss`

Pull Request - State: open - Opened by eqy 7 months ago - 1 comment
Labels: module: cudnn, module: cuda, open source, ciflow/trunk, topic: bug fixes, topic: not user facing

#152731 - Inconsistent float16 overflow behavior between CPU and CUDA devices

Issue - State: open - Opened by SilentTester73 7 months ago - 3 comments
Labels: module: cuda, low priority, triaged, module: half, actionable, module: edge cases

#152695 - set CUDA_MODULE_LOADING for older drivers only

Pull Request - State: closed - Opened by ptrblck 7 months ago - 7 comments
Labels: module: cuda, open source, Merged, ciflow/trunk, topic: not user facing, merging

#152642 - [CUTLASS][WIP] Gate rowwise matmul CUTLASS kernels by compute capability

Pull Request - State: open - Opened by eqy 7 months ago - 1 comment
Labels: module: cuda, triaged, open source, topic: not user facing, module: float8

#152618 - [CUDA][TF32] Account for TF32 in `test_conv2d_same_padding`

Pull Request - State: closed - Opened by eqy 7 months ago - 3 comments
Labels: module: cuda, module: convolution, open source, Merged, module: tf32, ciflow/trunk, topic: not user facing, merging

#152540 - [CUDA] Rest peak memory stats before running `test_set_per_process_memory_fraction`

Pull Request - State: open - Opened by eqy 7 months ago - 1 comment
Labels: module: cuda, open source, ciflow/trunk, topic: not user facing

#152540 - [CUDA] Rest peak memory stats before running `test_set_per_process_memory_fraction`

Pull Request - State: open - Opened by eqy 7 months ago
Labels: module: cuda, open source, topic: not user facing

#152491 - [CUDA][SDPA] Bump python `fused_attention_vs_math_ref_grads` `fudge_factor` for `sm120`

Pull Request - State: closed - Opened by eqy 7 months ago - 3 comments
Labels: module: cuda, open source, Merged, ciflow/trunk, topic: not user facing, merging, module: sdpa

#152468 - [CUDA][TF32] Account for TF32 in `compile_kernel_advanced`

Pull Request - State: closed - Opened by eqy 7 months ago - 3 comments
Labels: module: cuda, open source, Merged, module: tf32, ciflow/trunk, topic: not user facing, merging

#152468 - [CUDA][TF32] Account for TF32 in `compile_kernel_advanced`

Pull Request - State: closed - Opened by eqy 7 months ago - 3 comments
Labels: module: cuda, open source, Merged, module: tf32, ciflow/trunk, topic: not user facing