Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / aws/aws-ofi-nccl issues and pull requests
#682 - .ci/aws: Move p5 capacity to CGK and other changes
Pull Request -
State: closed - Opened by sunkuamzn 4 months ago
- 3 comments
#681 - .ci/aws: Move p5 capacity to CGK and other changes
Pull Request -
State: closed - Opened by sunkuamzn 4 months ago
- 2 comments
#680 - .ci/aws: Move p5 capacity to CGK
Pull Request -
State: closed - Opened by sunkuamzn 4 months ago
#679 - Fix device sorting on aws platforms
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 4 comments
#678 - prepare v1.12.1-aws
Pull Request -
State: closed - Opened by aws-nslick 4 months ago
- 1 comment
#677 - prepare v1.11.1-aws
Pull Request -
State: closed - Opened by aws-nslick 4 months ago
- 5 comments
#676 - Revert vf rail sorting patches
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 1 comment
#675 - Revert vf rail sorting patches
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 1 comment
#674 - Revert "platform-aws: Add EFA-specific rail sorting on VF index"
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 1 comment
#673 - rdma: add option to round robin the ctrl msg, and use shared CQs for control and data endpoints
Pull Request -
State: closed - Opened by AmedeoSapio 4 months ago
- 1 comment
#672 - Add p5en platform_data and update default latency for undefined platforms
Pull Request -
State: closed - Opened by rajachan 4 months ago
#671 - Test CI
Pull Request -
State: closed - Opened by a-szegel 4 months ago
#670 - Cleanups from adding a domain interface
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 2 comments
#669 - fix: Change multiplexer scheduler to use two rails instead of three
Pull Request -
State: closed - Opened by arunkarthik-akkart 4 months ago
- 2 comments
#668 - rdma: add option to round robin the ctrl msg
Pull Request -
State: closed - Opened by AmedeoSapio 4 months ago
- 1 comment
#667 - Fix a number of duplicate definition names
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 4 comments
#666 - Simplify locking and enable FI_THREAD_DOMAIN
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 1 comment
#665 - Add domain object to transports
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 12 comments
#664 - rdma: Revert commits eliminating eager waits
Pull Request -
State: closed - Opened by rauteric 4 months ago
- 2 comments
#663 - ncclInternalError: Internal check failed. | NET/OFI Unable to insert remote address into address vector for device 1
Issue -
State: closed - Opened by emorikawa 4 months ago
- 8 comments
#662 - Fix abort when cache is disabled.
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 1 comment
#661 - Couple of accessor function / code cleanups
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 2 comments
#660 - rdma: remove "request completed with error" message
Pull Request -
State: closed - Opened by rauteric 4 months ago
- 5 comments
#659 - Fix use of uninitialized lock
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
- 2 comments
#658 - feat(ci): add workflow_dispatch to distcheck
Pull Request -
State: closed - Opened by aws-nslick 4 months ago
- 1 comment
#657 - feat: Make tuner platform specific
Pull Request -
State: closed - Opened by arunkarthik-akkart 4 months ago
#656 - ignore
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
#655 - Ignore
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
#654 - Add a proper endpoint interface
Pull Request -
State: closed - Opened by bwbarrett 4 months ago
#653 - [v1.12.x-aws] update version number to v1.12.1a1-aws
Pull Request -
State: closed - Opened by AmedeoSapio 4 months ago
#652 - rdma: do local RDMA read on all NIC rails for flush()
Pull Request -
State: closed - Opened by taeilum00 4 months ago
- 1 comment
#651 - Update version number and changelog for v1.12.0-aws release
Pull Request -
State: closed - Opened by AmedeoSapio 4 months ago
#650 - Revert "neuron: Disable rdma eager messages by default"
Pull Request -
State: closed - Opened by maxtmann 4 months ago
#649 - fix : Fix flexible array member allocation
Pull Request -
State: closed - Opened by arunkarthik-akkart 4 months ago
- 1 comment
#648 - .ci/aws: All CI use ami with EFA Installer
Pull Request -
State: closed - Opened by a-szegel 5 months ago
- 1 comment
#647 - Cherry pick g4dn tests to v1.12.x-aws
Pull Request -
State: closed - Opened by AmedeoSapio 5 months ago
#646 - use C++ for unit tests
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 4 comments
#645 - add `--enable-cpp` build flag
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#644 - feat(ci/gha): enable unit tests for neuron builds
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#643 - Cherry-picks for v1.12.x aws
Pull Request -
State: closed - Opened by AmedeoSapio 5 months ago
- 3 comments
#642 - Cherry-picks for v1.11.x aws
Pull Request -
State: closed - Opened by AmedeoSapio 5 months ago
- 3 comments
#641 - tuner: add regions for AllGather/ReduceScatter in the one rank per node case
Pull Request -
State: closed - Opened by AmedeoSapio 5 months ago
- 1 comment
#640 - fix(rdma): send periodic control messages to sync sender/receiver
Pull Request -
State: closed - Opened by rauteric 5 months ago
- 4 comments
#639 - testing
Pull Request -
State: closed - Opened by vidsouza 5 months ago
- 2 comments
#638 - Add platform data settings for TRN2N
Pull Request -
State: closed - Opened by maxtmann 5 months ago
#637 - testing
Pull Request -
State: closed - Opened by vidsouza 5 months ago
#636 - vidsouza-p5-ub22-testing
Pull Request -
State: closed - Opened by vidsouza 5 months ago
#635 - only run ub2204 for debugging ssh issues
Pull Request -
State: closed - Opened by a-szegel 5 months ago
- 3 comments
#634 - separate out 3rd-party headers
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 1 comment
#633 - enable more warnings
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 1 comment
#632 - feat(build): add -fanalyzer when --enable-werror
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#631 - Try an ami that passed a few days ago
Pull Request -
State: closed - Opened by a-szegel 5 months ago
#630 - fix: rdma: inverted print statement
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#629 - fix(init): fix sendrecv fallback logic
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#628 - fix(ci): prefer ecr to dockerhub
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#627 - Combined -Wextra -Werror Commits
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 4 comments
#626 - rdma: Use get_device_from_ep() accessor
Pull Request -
State: closed - Opened by bwbarrett 5 months ago
- 4 comments
#625 - aws: Skip the WRITE_IN_ORDER_ALIGNED_128_BYTES check for P5en
Pull Request -
State: closed - Opened by rajachan 5 months ago
#624 - feat(build): disable semantic interposition
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 3 comments
#623 - fix(build): ensure -pthread is passed
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#622 - fix(build): add missing AC_PROG_RANLIB
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#621 - fix(rdma): stop setting FI_ORDER_NONE
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#620 - Improve end of process cleanup and reporting
Pull Request -
State: closed - Opened by bwbarrett 5 months ago
#619 - .ci/aws: re-Add trainium tests to CI
Pull Request -
State: closed - Opened by a-szegel 5 months ago
#618 - feat: add DMA-BUF support
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 6 comments
#617 - Fully destroy endpoints when refcount is 0
Pull Request -
State: closed - Opened by bwbarrett 5 months ago
Labels: previously-passed-ci
#616 - fix(m4): set redzone size to 0
Pull Request -
State: closed - Opened by rauteric 5 months ago
Labels: previously-passed-ci
#615 - Fix log format string behavior
Pull Request -
State: closed - Opened by bwbarrett 5 months ago
#614 - rdma: add separate bounce buffer freelist for data (eager) messages
Pull Request -
State: closed - Opened by rauteric 5 months ago
- 14 comments
#613 - util: Use FI_ENOPROTOOPT to check for a provider's support for option
Pull Request -
State: closed - Opened by rajachan 5 months ago
#612 - CI updates
Pull Request -
State: closed - Opened by rajachan 5 months ago
#611 - "Request completed with error" log leads to p5e cluster collapse
Issue -
State: closed - Opened by vmarkovtsev 5 months ago
- 1 comment
#610 - Improve protocol selection logic
Pull Request -
State: closed - Opened by bwbarrett 5 months ago
#609 - NCCL RDMA expects fi_cq_data_entry, but OPX provider fills CQ with fi_cq_tagged_entry
Issue -
State: closed - Opened by lsavers 5 months ago
- 2 comments
#608 - feat(ci/github): use docker instead of codebuild
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#607 - fix(valgrind): fix autotools mistake
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#606 - Initialization fails for OPX Libfabric Provider
Issue -
State: closed - Opened by lsavers 5 months ago
#605 - fix(tree): import libfabric's container_of macro
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#604 - Add Multiplexed-round-robin scheduler
Pull Request -
State: closed - Opened by arunkarthik-akkart 5 months ago
- 3 comments
Labels: previously-passed-ci
#603 - platform: trn1 default protocol send receive
Pull Request -
State: closed - Opened by hunnorth 5 months ago
- 5 comments
#602 - Fix: access domain from ep during mr on device
Pull Request -
State: closed - Opened by maxtmann 5 months ago
- 1 comment
#601 - feat(build): disable semantic interposition
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 2 comments
#600 - freelist: separate out metadata from user data
Pull Request -
State: closed - Opened by rauteric 5 months ago
- 6 comments
Labels: previously-passed-ci
#599 - Seg Fault during RDMA NCCL Connection with OPX Provider
Issue -
State: closed - Opened by lsavers 5 months ago
- 4 comments
#598 - fix(sendrecv): fix a memory leak
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#597 - No include folder after installation
Issue -
State: closed - Opened by YJHMITWEB 5 months ago
- 5 comments
#596 - feat(build): better --enable-debug defaults
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
Labels: previously-passed-ci
#595 - fix(platform-aws): fill all platform values
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#594 - fix(tree): use empty brace initializers for zero-initialization
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 2 comments
#593 - fix(tracing/nvtx): silence -Wmissing-field-initializer warnings
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
Labels: previously-passed-ci
#592 - feat(ci): add package generation
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#591 - feat(rdma): constrain C linkage to init
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 2 comments
#590 - fix(tracing): use header-only nvtx3
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
Labels: previously-passed-ci
#589 - fix(build): check features before mangling CFLAGS
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 1 comment
Labels: previously-passed-ci
#588 - feat(build): add -Wextra to "picky" compiler flags
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
#587 - fix(test): fix typing issues
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
Labels: previously-passed-ci
#586 - fix(rdma): avoid enum/integral comparison
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
Labels: previously-passed-ci
#585 - fix(tree): add fallthrough switch markers
Pull Request -
State: closed - Opened by aws-nslick 5 months ago
- 1 comment
Labels: previously-passed-ci
#584 - register_mr_buffers:544 NCCL WARN NET/OFI Unable to register memory (type = 2) for device 0. RC: -22, Error: Invalid argument
Issue -
State: closed - Opened by visatish 5 months ago
- 10 comments
#583 - fix(tuner): don't choose NVLSTree if nRanks==nNodes
Pull Request -
State: closed - Opened by AmedeoSapio 5 months ago
- 1 comment