Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / leptonai/gpud issues and pull requests

#204 - fix(nvidia/infiniband): match mellanox to count PCI devices

Pull Request - State: open - Opened by gyuho 4 days ago

#203 - can't get gpu info with wsl platform

Issue - State: open - Opened by zhuima 5 days ago - 1 comment
Labels: awaiting feedback

#202 - feat(nvidia): rorder nvidia-smi collect after NVML calls

Pull Request - State: open - Opened by gyuho 5 days ago

#199 - fix(nvidia/infiniband): use "<" to evaluate ip port rates

Pull Request - State: closed - Opened by gyuho 5 days ago
Labels: bug

#197 - fix(join): remove space in provider

Pull Request - State: closed - Opened by cardyok 5 days ago

#196 - feat(nvidia/infiniband): make port states configurable

Pull Request - State: closed - Opened by gyuho 5 days ago

#194 - feat(session): add idle session timeout

Pull Request - State: closed - Opened by cardyok 6 days ago

#192 - fix(log/tail): correctly collect xid/sxid events from log scanner

Pull Request - State: closed - Opened by gyuho 7 days ago
Labels: bug

#186 - feat(internal/server): periodic status check logs in debug level

Pull Request - State: closed - Opened by gyuho 11 days ago

#184 - fix(accelerator/nvidia): add missing poller initialization

Pull Request - State: closed - Opened by gyuho 11 days ago
Labels: critical-bug

#183 - feat(query/log/tail): log stream with deduper

Pull Request - State: closed - Opened by gyuho 12 days ago

#182 - fix(components/dmesg): do not read raw dmesg file with unix time

Pull Request - State: closed - Opened by gyuho 12 days ago
Labels: bug

#181 - fix(nvidia/query): quote unusual process name for nvidia-smi parsing

Pull Request - State: closed - Opened by gyuho 12 days ago
Labels: bug

#99 - feat(nvidia/ibstat): check "Physical state" as fallback

Pull Request - State: closed - Opened by gyuho about 2 months ago

#98 - feat(session): support reboot method

Pull Request - State: closed - Opened by cardyok about 2 months ago

#97 - feat(build, release): support Amazon Linux 2 and 2023 (experimental)

Pull Request - State: closed - Opened by gyuho about 2 months ago

#96 - feat(pkg/reboot): initial commit

Pull Request - State: closed - Opened by gyuho about 2 months ago

#95 - feat(components): add accelerator detect func, "gpud accelerator" subcommand

Pull Request - State: closed - Opened by gyuho about 2 months ago

#94 - feat(server): allow custom uid with cli

Pull Request - State: closed - Opened by cardyok about 2 months ago

#91 - doc(sxid): add more example events for gpu-operator

Pull Request - State: closed - Opened by gyuho about 2 months ago - 1 comment

#90 - Installation on Amazon Linux2 version `GLIBC_2.28' not found

Issue - State: closed - Opened by chatter92 about 2 months ago - 7 comments
Labels: question, dependency-issue, awaiting feedback

#89 - feat(nvidia/xid,sxid,remapped rows): add required actions field to /states, /events

Pull Request - State: closed - Opened by gyuho about 2 months ago - 1 comment

#88 - feat(nvidia/query): shorter timeouts for "nvidia-smi" calls

Pull Request - State: closed - Opened by gyuho about 2 months ago - 1 comment

#87 - feat(nvidia/ecc): rename state name key to "ecc" (from ecc_errors)

Pull Request - State: closed - Opened by gyuho about 2 months ago - 2 comments

#86 - feat(nvidia): track "ECC mode" (enabled/disabled) using nvidia-smi and NVML

Pull Request - State: closed - Opened by gyuho about 2 months ago - 3 comments

#85 - doc(nvidia/sxid): README to expain xid 79, sxid 20034 as an example

Pull Request - State: closed - Opened by gyuho about 2 months ago

#84 - feat(nvidia): add non-fatal sxid "20012" code, rename Detail.ID to SXID

Pull Request - State: closed - Opened by gyuho about 2 months ago

#83 - fix(nvidia): return empty output object if smi/nvml is nil

Pull Request - State: closed - Opened by gyuho about 2 months ago

#82 - Update mothership endpoint

Pull Request - State: closed - Opened by cardyok 2 months ago

#80 - feat(nvidia): track row remapping, RMA/GPU reset status

Pull Request - State: closed - Opened by gyuho 2 months ago

#79 - nits(nvidia/query/nvml): remove unused GPUID fields

Pull Request - State: closed - Opened by gyuho 2 months ago

#78 - feat(internal/server): dynamically refresh containerd, docker, kubelet components

Pull Request - State: closed - Opened by gyuho 2 months ago - 1 comment

#76 - fix(power): fix power segfault

Pull Request - State: closed - Opened by cardyok 2 months ago

#75 - Question Regarding Remediation

Issue - State: closed - Opened by ivelichkovich 2 months ago - 1 comment
Labels: question

#74 - feat(nvidia/peermem): track dmesg events for invalid context errors

Pull Request - State: closed - Opened by gyuho 2 months ago

#72 - fix(pkg/process): panic on wait before process initialization

Pull Request - State: closed - Opened by gyuho 2 months ago

#71 - feat(nvidia/fabric-manager): alert on nvlink multicast failures

Pull Request - State: closed - Opened by gyuho 2 months ago

#70 - feat(dmesg): add oom-kill:constraint regex for cri-containerd events

Pull Request - State: closed - Opened by gyuho 2 months ago

#69 - feat(nvidia/query): fabric manager debugging info from journalctl

Pull Request - State: closed - Opened by gyuho 2 months ago