Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / leptonai/gpud issues and pull requests

#194 - feat(session): add idle session timeout

Pull Request - State: closed - Opened by cardyok 3 months ago

#192 - fix(log/tail): correctly collect xid/sxid events from log scanner

Pull Request - State: closed - Opened by gyuho 3 months ago
Labels: bug

#191 - feat(component/kernel-module): initial commit (track /etc/modules)

Pull Request - State: closed - Opened by gyuho 3 months ago

#186 - feat(internal/server): periodic status check logs in debug level

Pull Request - State: closed - Opened by gyuho 3 months ago

#184 - fix(accelerator/nvidia): add missing poller initialization

Pull Request - State: closed - Opened by gyuho 3 months ago
Labels: critical-bug

#183 - feat(query/log/tail): log stream with deduper

Pull Request - State: closed - Opened by gyuho 3 months ago

#182 - fix(components/dmesg): do not read raw dmesg file with unix time

Pull Request - State: closed - Opened by gyuho 3 months ago
Labels: bug

#181 - fix(nvidia/query): quote unusual process name for nvidia-smi parsing

Pull Request - State: closed - Opened by gyuho 3 months ago
Labels: bug

#99 - feat(nvidia/ibstat): check "Physical state" as fallback

Pull Request - State: closed - Opened by gyuho 5 months ago

#98 - feat(session): support reboot method

Pull Request - State: closed - Opened by cardyok 5 months ago

#97 - feat(build, release): support Amazon Linux 2 and 2023 (experimental)

Pull Request - State: closed - Opened by gyuho 5 months ago

#96 - feat(pkg/reboot): initial commit

Pull Request - State: closed - Opened by gyuho 5 months ago

#94 - feat(server): allow custom uid with cli

Pull Request - State: closed - Opened by cardyok 5 months ago

#91 - doc(sxid): add more example events for gpu-operator

Pull Request - State: closed - Opened by gyuho 5 months ago - 1 comment

#90 - Installation on Amazon Linux2 version `GLIBC_2.28' not found

Issue - State: closed - Opened by chatter92 5 months ago - 7 comments
Labels: question, dependency-issue, awaiting feedback

#89 - feat(nvidia/xid,sxid,remapped rows): add required actions field to /states, /events

Pull Request - State: closed - Opened by gyuho 5 months ago - 1 comment

#88 - feat(nvidia/query): shorter timeouts for "nvidia-smi" calls

Pull Request - State: closed - Opened by gyuho 5 months ago - 1 comment

#87 - feat(nvidia/ecc): rename state name key to "ecc" (from ecc_errors)

Pull Request - State: closed - Opened by gyuho 5 months ago - 2 comments

#86 - feat(nvidia): track "ECC mode" (enabled/disabled) using nvidia-smi and NVML

Pull Request - State: closed - Opened by gyuho 5 months ago - 3 comments

#85 - doc(nvidia/sxid): README to expain xid 79, sxid 20034 as an example

Pull Request - State: closed - Opened by gyuho 5 months ago

#83 - fix(nvidia): return empty output object if smi/nvml is nil

Pull Request - State: closed - Opened by gyuho 5 months ago

#82 - Update mothership endpoint

Pull Request - State: closed - Opened by cardyok 5 months ago

#80 - feat(nvidia): track row remapping, RMA/GPU reset status

Pull Request - State: closed - Opened by gyuho 5 months ago

#79 - nits(nvidia/query/nvml): remove unused GPUID fields

Pull Request - State: closed - Opened by gyuho 5 months ago

#78 - feat(internal/server): dynamically refresh containerd, docker, kubelet components

Pull Request - State: closed - Opened by gyuho 5 months ago - 1 comment

#76 - fix(power): fix power segfault

Pull Request - State: closed - Opened by cardyok 5 months ago

#75 - Question Regarding Remediation

Issue - State: closed - Opened by ivelichkovich 5 months ago - 1 comment
Labels: question

#74 - feat(nvidia/peermem): track dmesg events for invalid context errors

Pull Request - State: closed - Opened by gyuho 5 months ago

#72 - fix(pkg/process): panic on wait before process initialization

Pull Request - State: closed - Opened by gyuho 5 months ago

#71 - feat(nvidia/fabric-manager): alert on nvlink multicast failures

Pull Request - State: closed - Opened by gyuho 5 months ago

#70 - feat(dmesg): add oom-kill:constraint regex for cri-containerd events

Pull Request - State: closed - Opened by gyuho 5 months ago

#69 - feat(nvidia/query): fabric manager debugging info from journalctl

Pull Request - State: closed - Opened by gyuho 5 months ago

#68 - feat(pkg/process): rename stop to abort, add systemd/journal utils

Pull Request - State: closed - Opened by gyuho 5 months ago

#67 - feat(pkg/systemd): remove redundant utils, move "pkg/update"

Pull Request - State: closed - Opened by gyuho 5 months ago

#66 - feat(internal/session): add missing writer close for session writer

Pull Request - State: closed - Opened by gyuho 5 months ago - 2 comments

#65 - client(v1): move examples, add info by component

Pull Request - State: closed - Opened by gyuho 5 months ago

#64 - feat(docker): list all containers in docker

Pull Request - State: closed - Opened by cardyok 5 months ago