Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/nccl-tests issues and pull requests
#224 - alltoall_perf: each rank is only sending to half of the other ranks
Issue -
State: open - Opened by russilwvong 4 months ago
#223 - mpirun all_reduce_perf hang with multi-device test
Issue -
State: open - Opened by 913871734 4 months ago
#222 - NCCL WARN Cannot use cuda/gdr transports as part of specified UCX_TLS
Issue -
State: open - Opened by liuxingbo12138 4 months ago
- 4 comments
#221 - how to support One Device per Process?
Issue -
State: open - Opened by jiangxiaobin96 4 months ago
- 3 comments
#220 - 1 GiB headroom might be too small
Issue -
State: open - Opened by Namnamseo 4 months ago
#219 - Test NCCL failure common.cu:959 'internal error - please report this issue to the NCCL developers / '
Issue -
State: open - Opened by Assassin187 4 months ago
- 9 comments
#218 - Rank Assignment Issue under four containers on two different servers.
Issue -
State: closed - Opened by thsmfe001 4 months ago
- 8 comments
#217 - all_reduce_perf hangs; using a single GPU on a 4GPU machine
Issue -
State: closed - Opened by isaacgerg 4 months ago
- 18 comments
#216 - NCCL initialization hangs with 4 GPUs, but works with 2 GPUs
Issue -
State: open - Opened by mickaelseznec 4 months ago
- 4 comments
#215 - NCCL_ALGO on multi-node and multi-GPU
Issue -
State: open - Opened by MajidSalimi 4 months ago
- 1 comment
#214 - SendRecv Time
Issue -
State: open - Opened by osayamenja 5 months ago
- 2 comments
#213 - Nccl test seems run seperately on multi nodes
Issue -
State: closed - Opened by jianh619 5 months ago
- 6 comments
#212 - H100 all reduce performance is poor
Issue -
State: open - Opened by liminn 5 months ago
- 11 comments
#211 - undefined reference nccl*
Issue -
State: closed - Opened by gongyguo 5 months ago
- 1 comment
#210 - Differences problems in performance data of HGX A800 single server N GPUs nccl testing
Issue -
State: open - Opened by cloveryyg 5 months ago
#209 - The network bandwidth in the alltoall_perf test failed to meet expectations
Issue -
State: open - Opened by fj1425fj 5 months ago
- 3 comments
#208 - Test NCCL failure common.cu:954 'unhandled cuda error
Issue -
State: closed - Opened by YingYellow 5 months ago
- 1 comment
#207 - make failed, error -- unsupported GNU version! gcc versions later than 11 are not supported!
Issue -
State: closed - Opened by jxh314 6 months ago
#206 - misc/ibvwrap.cc:278 NCCL WARN Call to ibv_reg_mr_iova2 failed with error Cannot allocate memory
Issue -
State: closed - Opened by jxh314 6 months ago
- 2 comments
#205 - cputime
Issue -
State: open - Opened by tks2004 6 months ago
#204 - Test NCCL failure common.cu:961 'internal error - please report this issue to the NCCL developers / '
Issue -
State: open - Opened by a-c-dream 6 months ago
- 7 comments
#203 - Add bisection test
Pull Request -
State: open - Opened by x41lakazam 6 months ago
- 3 comments
#202 - Why getBw don't have access to agg_iters ?
Issue -
State: closed - Opened by x41lakazam 6 months ago
- 1 comment
#201 - Performance lack of NCCL Test
Issue -
State: open - Opened by shengode503 7 months ago
- 5 comments
#200 - Multi node test hang phenomenon
Issue -
State: closed - Opened by gim4moon 7 months ago
- 2 comments
#199 - Interaction between NCCL_IB_SL and NCCL_IB_ADAPTIVE_ROUTING
Issue -
State: open - Opened by DanieleDeSensi 7 months ago
#198 - How is the maximum number of bytes for all_reduce operation calculated?
Issue -
State: closed - Opened by jxh314 7 months ago
- 3 comments
#197 - How to explain Bus Bandwidth in Allreduce Operation?
Issue -
State: open - Opened by HydraQYH 7 months ago
#196 - busbw exceeds network bandwidth (2 nodes, 16 gpus, 100Gbps intel NIC, no NVSwitch) - what algorithm is used?
Issue -
State: closed - Opened by ofilip 8 months ago
- 5 comments
#195 - undefined reference to ncclRedOpDestroy
Issue -
State: open - Opened by freshduer 8 months ago
- 2 comments
#194 - all_reduce_perf between NVLINK connected H100 PCIe GPUs lower than A100 SXM4 GPUs
Issue -
State: open - Opened by chinthysl 8 months ago
#193 - NCCL Test hang when the number of nodes goes beyond 18, and CPU usage is very high
Issue -
State: open - Opened by chgdragon2023 9 months ago
- 2 comments
#192 - NCCL Test Does not work with GID 3 or GID 1, but it works fine for GID 0
Issue -
State: open - Opened by chgdragon2023 9 months ago
#191 - nccl-tests result is only a half of ib_write_bw
Issue -
State: open - Opened by HeGaoYuan 9 months ago
#190 - hypercube out-of-bound errors with single-proc + `gpus-per-thread=4`, not with multi-proc + `gpus-per-thread=1`
Issue -
State: open - Opened by robogast 9 months ago
- 1 comment
#189 - clarify that the measurement is unidirectional
Pull Request -
State: open - Opened by stas00 10 months ago
- 11 comments
#188 - misc/socket.cc:441 NCCL WARN socketFinalizeAccept: wrong type 4 != 3
Issue -
State: closed - Opened by MiyazonoKaori 10 months ago
- 6 comments
#187 - NCCL alltoall_perf hangs via PXN
Issue -
State: closed - Opened by gavin1332 10 months ago
- 1 comment
#186 - how can i run nccl-test use max bandwidth
Issue -
State: open - Opened by liuxingbo12138 10 months ago
#185 - misc/ibvwrap.cc:187 NCCL WARN Call to ibv_modify_qp failed with error Network is unreachable
Issue -
State: open - Opened by chgdragon2023 10 months ago
- 3 comments
#184 - Nsight Profiling: one ncclAllReduce takes too long
Issue -
State: open - Opened by yanminjia 10 months ago
#183 - Test NCCL failure common.cu:954 'unhandled cuda error" when test on >2 GPUs
Issue -
State: closed - Opened by caopulan 10 months ago
- 4 comments
#182 - Although it is an InfiniBand environment, it seems that the average Bandwidth is not as good as expected.
Issue -
State: open - Opened by gim4moon 10 months ago
- 4 comments
#181 - AlltoAllGetBw is incorrect when used with multiple nodes
Issue -
State: open - Opened by sukoncon 10 months ago
- 1 comment
#180 - ./build/all_reduce_perf between nodes failed
Issue -
State: open - Opened by sleepwalker2017 10 months ago
- 1 comment
#179 - nccl-test is throwing timeout error on two nodes
Issue -
State: open - Opened by manomugdha 11 months ago
- 26 comments
#178 - A100 - All reduce performance
Issue -
State: open - Opened by arul-lm 11 months ago
- 1 comment
#177 - bus error
Issue -
State: closed - Opened by bltcn 11 months ago
- 3 comments
#176 - what does error in nccl-test output represent?
Issue -
State: open - Opened by blackgold 11 months ago
- 3 comments
#175 - Two A100 nodes cannot reach ideal all-reduce performance
Issue -
State: open - Opened by lcw2 11 months ago
- 4 comments
#174 - No explanation on BusBW factor regarding alltoall in docs
Issue -
State: open - Opened by lappazos 11 months ago
#173 - Collecting latency data per coll.
Pull Request -
State: closed - Opened by nv-udeodhar 11 months ago
#172 - Why need more than one iteration to check data?
Issue -
State: closed - Opened by FarmerLiuAng 11 months ago
- 4 comments
#171 - Issue Running NCCL Tests on Gentoo with Varying GPU Availability: CUDA failure common.cu:892 'invalid device ordinal'
Issue -
State: closed - Opened by SweeneyJun 12 months ago
- 3 comments
#170 - unhandled cuda error during test
Issue -
State: closed - Opened by mlinmg 12 months ago
- 1 comment
#169 - if the bandwidth results of the Nccl test are related to the number of nodes?
Issue -
State: open - Opened by PrometheusComing 12 months ago
- 2 comments
#168 - Test in dockers of multi-node
Issue -
State: open - Opened by jiangxiaobin96 12 months ago
#167 - all_reduce_perf(--op='sum') get wrong results when size is over specific value
Issue -
State: closed - Opened by metaVariable 12 months ago
- 9 comments
#166 - Test NCCL failure common.cu:958 'internal error - please report this issue to the NCCL developers / '
Issue -
State: closed - Opened by kylematoba about 1 year ago
- 10 comments
#165 - Test CUDA failure common.cu:892 'invalid device ordinal'
Issue -
State: closed - Opened by marabgol about 1 year ago
- 10 comments
#164 - Calculating "net_bw" in addition to "bus_bw"
Issue -
State: open - Opened by yehuday about 1 year ago
#163 - when i am running this command : mpirun -np 1 ./build/all_reduce_perf -b 8 -e 128M -f 2 -g 2. I found this
Issue -
State: open - Opened by james2v about 1 year ago
- 2 comments
#162 - Nccl test fails on 8 x V100- misc/socket.cc:483 NCCL WARN socketStartConnect: Connect to xxx failed : Software caused connection abort
Issue -
State: closed - Opened by hacker-jerry about 1 year ago
- 9 comments
#161 - When I am running on multiple nodes, I can get the corresponding results when running on 3 nodes, and an exception will occur when more than 3 nodes are executed.
Issue -
State: open - Opened by songqimao about 1 year ago
- 3 comments
#160 - what do algobw actually mean when I run test with more than one nodes?speed between nodes or speed between gpus.
Issue -
State: closed - Opened by wenjunlong about 1 year ago
- 3 comments
#159 - The difference between algbw and busbw
Issue -
State: open - Opened by shaoyezuizuishuai about 1 year ago
#158 - Testing git
Pull Request -
State: closed - Opened by BhaviniMishra about 1 year ago
#157 - New AlltoAllV (Imbalanced AlltoAll) benchmark.
Pull Request -
State: open - Opened by babusid about 1 year ago
- 1 comment
#156 - Two A800 nodes cannot reach ideal all-reduce performance
Issue -
State: open - Opened by joydchh about 1 year ago
- 18 comments
#155 - Debugging with cuda-gdb causes problems
Issue -
State: open - Opened by minihu-crypto about 1 year ago
#154 - Bandwidth result not equal to ib_write_bw result
Issue -
State: closed - Opened by Jiaao-Bai about 1 year ago
- 3 comments
#153 - `busbw` does not reflect the speed of hardware bottleneck in H800
Issue -
State: open - Opened by zhangmenghao about 1 year ago
- 7 comments
#152 - Origin of Poor Internode NCCL Performance
Issue -
State: closed - Opened by vitduck about 1 year ago
- 11 comments
#151 - Don't call MPI_Comm_split if NCCL_TESTS_SPLIT_MASK is not set
Pull Request -
State: open - Opened by tstruk over 1 year ago
- 2 comments
#150 - all_reduce_perf fails on 2 nodes
Issue -
State: closed - Opened by scvance over 1 year ago
- 2 comments
#149 - Expected bandwidth results? 8x A100 GPUs over NVLink
Issue -
State: open - Opened by acgandhi over 1 year ago
- 10 comments
#148 - test error: stuck when run test example
Issue -
State: open - Opened by zhengmq2010 over 1 year ago
- 4 comments
#147 - src/Makefile: remove unused variables
Pull Request -
State: open - Opened by yangxingwu over 1 year ago
#146 - makefile: remove extra space
Pull Request -
State: closed - Opened by yangxingwu over 1 year ago
#145 - nvcc fatal : Unsupported gpu architecture 'compute_35' make[1]: *** [Makefile:84: ../build/all_reduce.o] Error 1 for nvcr.io/nvidia/pytorch:23.02-py3
Issue -
State: closed - Opened by monajalal over 1 year ago
- 2 comments
#144 - question regarding versioning
Issue -
State: closed - Opened by monajalal over 1 year ago
#143 - Bus error when using 16 GPUs in one node
Issue -
State: closed - Opened by richardsliu over 1 year ago
- 7 comments
#142 - why add ALIGN in allgather/reducescatter/hypercube
Issue -
State: closed - Opened by ziyueSeo over 1 year ago
#141 - Multi-node test within a docker container
Issue -
State: open - Opened by deepakn94 over 1 year ago
- 1 comment
#140 - the tag v2.12.10 is missing
Issue -
State: open - Opened by terryhy520 over 1 year ago
- 1 comment
#139 - All2All Benchmarks on Perlmutter
Issue -
State: open - Opened by caoshiyi over 1 year ago
- 8 comments
#138 - Csv format
Pull Request -
State: open - Opened by lipovsek-aws over 1 year ago
- 3 comments
#137 - nccl-tests ignores NCCL_HOME if there exists system wide installation in /usr
Issue -
State: open - Opened by nishshah0 over 1 year ago
- 6 comments
#136 - No commuication between two nodes
Issue -
State: closed - Opened by GongZhengLi over 1 year ago
- 2 comments
#135 - fix handling of variable NVCC.
Pull Request -
State: closed - Opened by aavbsouza over 1 year ago
#134 - Update README.md
Pull Request -
State: closed - Opened by flx42 over 1 year ago
#133 - Running nccl-test on two nodes failed
Issue -
State: open - Opened by zhangciba over 1 year ago
- 1 comment
#132 - unable to complete a TCP connection to another process
Issue -
State: open - Opened by odellus over 1 year ago
- 5 comments
#131 - ./all_reduce_perf: "Out of bounds values: 50 FAILED" [2 GPUs, PHB]
Issue -
State: open - Opened by Meriipu over 1 year ago
- 12 comments
#130 - algorithm bandwidth of all2all
Issue -
State: closed - Opened by de1star over 1 year ago
- 2 comments
#129 - "alias must point to a defined variable or function"
Issue -
State: open - Opened by rainwoodman over 1 year ago
- 1 comment
#128 - nccl-tests on different GPUs
Issue -
State: closed - Opened by de1star over 1 year ago
- 11 comments
#127 - Test bench
Pull Request -
State: open - Opened by novaCoder-zrk over 1 year ago
#126 - Align print format string for column names and units
Pull Request -
State: open - Opened by dmitrygx over 1 year ago
- 5 comments
#125 - Failure when more than 2 GPUs in each node
Issue -
State: closed - Opened by dogacancolak almost 2 years ago
- 5 comments