Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/nccl-tests issues and pull requests

#223 - mpirun all_reduce_perf hang with multi-device test

Issue - State: open - Opened by 913871734 4 months ago

#221 - how to support One Device per Process?

Issue - State: open - Opened by jiangxiaobin96 4 months ago - 3 comments

#220 - 1 GiB headroom might be too small

Issue - State: open - Opened by Namnamseo 4 months ago

#218 - Rank Assignment Issue under four containers on two different servers.

Issue - State: closed - Opened by thsmfe001 4 months ago - 8 comments

#217 - all_reduce_perf hangs; using a single GPU on a 4GPU machine

Issue - State: closed - Opened by isaacgerg 4 months ago - 18 comments

#216 - NCCL initialization hangs with 4 GPUs, but works with 2 GPUs

Issue - State: open - Opened by mickaelseznec 4 months ago - 4 comments

#215 - NCCL_ALGO on multi-node and multi-GPU

Issue - State: open - Opened by MajidSalimi 4 months ago - 1 comment

#214 - SendRecv Time

Issue - State: open - Opened by osayamenja 5 months ago - 2 comments

#213 - Nccl test seems run separately on multi nodes

Issue - State: closed - Opened by jianh619 5 months ago - 6 comments

#212 - H100 all reduce performance is poor

Issue - State: open - Opened by liminn 5 months ago - 11 comments

#211 - undefined reference nccl*

Issue - State: closed - Opened by gongyguo 5 months ago - 1 comment

#208 - Test NCCL failure common.cu:954 'unhandled cuda error'

Issue - State: closed - Opened by YingYellow 5 months ago - 1 comment

#205 - cputime

Issue - State: open - Opened by tks2004 6 months ago

#203 - Add bisection test

Pull Request - State: open - Opened by x41lakazam 6 months ago - 3 comments

#202 - Why getBw don't have access to agg_iters ?

Issue - State: closed - Opened by x41lakazam 6 months ago - 1 comment

#201 - Performance lack of NCCL Test

Issue - State: open - Opened by shengode503 7 months ago - 5 comments

#200 - Multi node test hang phenomenon

Issue - State: closed - Opened by gim4moon 7 months ago - 2 comments

#198 - How is the maximum number of bytes for all_reduce operation calculated?

Issue - State: closed - Opened by jxh314 7 months ago - 3 comments

#197 - How to explain Bus Bandwidth in Allreduce Operation?

Issue - State: open - Opened by HydraQYH 7 months ago

#195 - undefined reference to ncclRedOpDestroy

Issue - State: open - Opened by freshduer 8 months ago - 2 comments

#191 - nccl-tests result is only a half of ib_write_bw

Issue - State: open - Opened by HeGaoYuan 9 months ago

#189 - clarify that the measurement is unidirectional

Pull Request - State: open - Opened by stas00 10 months ago - 11 comments

#188 - misc/socket.cc:441 NCCL WARN socketFinalizeAccept: wrong type 4 != 3

Issue - State: closed - Opened by MiyazonoKaori 10 months ago - 6 comments

#187 - NCCL alltoall_perf hangs via PXN

Issue - State: closed - Opened by gavin1332 10 months ago - 1 comment

#186 - how can i run nccl-test use max bandwidth

Issue - State: open - Opened by liuxingbo12138 10 months ago

#184 - Nsight Profiling: one ncclAllReduce takes too long

Issue - State: open - Opened by yanminjia 10 months ago

#183 - Test NCCL failure common.cu:954 'unhandled cuda error' when test on >2 GPUs

Issue - State: closed - Opened by caopulan 10 months ago - 4 comments

#181 - AlltoAllGetBw is incorrect when used with multiple nodes

Issue - State: open - Opened by sukoncon 10 months ago - 1 comment

#180 - ./build/all_reduce_perf between nodes failed

Issue - State: open - Opened by sleepwalker2017 10 months ago - 1 comment

#179 - nccl-test is throwing timeout error on two nodes

Issue - State: open - Opened by manomugdha 11 months ago - 26 comments

#178 - A100 - All reduce performance

Issue - State: open - Opened by arul-lm 11 months ago - 1 comment

#177 - bus error

Issue - State: closed - Opened by bltcn 11 months ago - 3 comments

#176 - what does error in nccl-test output represent?

Issue - State: open - Opened by blackgold 11 months ago - 3 comments

#175 - Two A100 nodes cannot reach ideal all-reduce performance

Issue - State: open - Opened by lcw2 11 months ago - 4 comments

#173 - Collecting latency data per coll.

Pull Request - State: closed - Opened by nv-udeodhar 11 months ago

#172 - Why need more than one iteration to check data?

Issue - State: closed - Opened by FarmerLiuAng 11 months ago - 4 comments

#170 - unhandled cuda error during test

Issue - State: closed - Opened by mlinmg 12 months ago - 1 comment

#168 - Test in dockers of multi-node

Issue - State: open - Opened by jiangxiaobin96 12 months ago

#165 - Test CUDA failure common.cu:892 'invalid device ordinal'

Issue - State: closed - Opened by marabgol about 1 year ago - 10 comments

#164 - Calculating "net_bw" in addition to "bus_bw"

Issue - State: open - Opened by yehuday about 1 year ago

#159 - The difference between algbw and busbw

Issue - State: open - Opened by shaoyezuizuishuai about 1 year ago

#158 - Testing git

Pull Request - State: closed - Opened by BhaviniMishra about 1 year ago

#157 - New AlltoAllV (Imbalanced AlltoAll) benchmark.

Pull Request - State: open - Opened by babusid about 1 year ago - 1 comment

#156 - Two A800 nodes cannot reach ideal all-reduce performance

Issue - State: open - Opened by joydchh about 1 year ago - 18 comments

#155 - Debugging with cuda-gdb causes problems

Issue - State: open - Opened by minihu-crypto about 1 year ago

#154 - Bandwidth result not equal to ib_write_bw result

Issue - State: closed - Opened by Jiaao-Bai about 1 year ago - 3 comments

#153 - `busbw` does not reflect the speed of hardware bottleneck in H800

Issue - State: open - Opened by zhangmenghao about 1 year ago - 7 comments

#152 - Origin of Poor Internode NCCL Performance

Issue - State: closed - Opened by vitduck about 1 year ago - 11 comments

#151 - Don't call MPI_Comm_split if NCCL_TESTS_SPLIT_MASK is not set

Pull Request - State: open - Opened by tstruk over 1 year ago - 2 comments

#150 - all_reduce_perf fails on 2 nodes

Issue - State: closed - Opened by scvance over 1 year ago - 2 comments

#149 - Expected bandwidth results? 8x A100 GPUs over NVLink

Issue - State: open - Opened by acgandhi over 1 year ago - 10 comments

#148 - test error: stuck when run test example

Issue - State: open - Opened by zhengmq2010 over 1 year ago - 4 comments

#147 - src/Makefile: remove unused variables

Pull Request - State: open - Opened by yangxingwu over 1 year ago

#146 - makefile: remove extra space

Pull Request - State: closed - Opened by yangxingwu over 1 year ago

#144 - question regarding versioning

Issue - State: closed - Opened by monajalal over 1 year ago

#143 - Bus error when using 16 GPUs in one node

Issue - State: closed - Opened by richardsliu over 1 year ago - 7 comments

#142 - why add ALIGN in allgather/reducescatter/hypercube

Issue - State: closed - Opened by ziyueSeo over 1 year ago

#141 - Multi-node test within a docker container

Issue - State: open - Opened by deepakn94 over 1 year ago - 1 comment

#140 - the tag v2.12.10 is missing

Issue - State: open - Opened by terryhy520 over 1 year ago - 1 comment

#139 - All2All Benchmarks on Perlmutter

Issue - State: open - Opened by caoshiyi over 1 year ago - 8 comments

#138 - Csv format

Pull Request - State: open - Opened by lipovsek-aws over 1 year ago - 3 comments

#136 - No communication between two nodes

Issue - State: closed - Opened by GongZhengLi over 1 year ago - 2 comments

#135 - fix handling of variable NVCC.

Pull Request - State: closed - Opened by aavbsouza over 1 year ago

#134 - Update README.md

Pull Request - State: closed - Opened by flx42 over 1 year ago

#133 - Running nccl-test on two nodes failed

Issue - State: open - Opened by zhangciba over 1 year ago - 1 comment

#132 - unable to complete a TCP connection to another process

Issue - State: open - Opened by odellus over 1 year ago - 5 comments

#131 - ./all_reduce_perf: "Out of bounds values: 50 FAILED" [2 GPUs, PHB]

Issue - State: open - Opened by Meriipu over 1 year ago - 12 comments

#130 - algorithm bandwidth of all2all

Issue - State: closed - Opened by de1star over 1 year ago - 2 comments

#129 - "alias must point to a defined variable or function"

Issue - State: open - Opened by rainwoodman over 1 year ago - 1 comment

#128 - nccl-tests on different GPUs

Issue - State: closed - Opened by de1star over 1 year ago - 11 comments

#127 - Test bench

Pull Request - State: open - Opened by novaCoder-zrk over 1 year ago

#126 - Align print format string for column names and units

Pull Request - State: open - Opened by dmitrygx over 1 year ago - 5 comments

#125 - Failure when more than 2 GPUs in each node

Issue - State: closed - Opened by dogacancolak almost 2 years ago - 5 comments