Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/k8s-device-plugin issues and pull requests
#502 - How to trigger gpu failure, the gpu count of node's allocatable field will be dynamically decrease
Issue -
State: open - Opened by yizhouv5 8 months ago
- 4 comments
#443 - MPS with Kubernetes on NVIDIA GPU
Issue -
State: open - Opened by selinnilesy 12 months ago
- 27 comments
Labels: feature
#426 - GPU gets marked as unhealthy on systemctl daemon-reloads + kubelet restarts (on Kubernetes Upgrades)
Issue -
State: open - Opened by sstrk about 1 year ago
- 5 comments
Labels: needs-triage
#404 - Questions about GPU time-sharing on Kubernetes
Issue -
State: open - Opened by jxl4650152 over 1 year ago
#403 - Add ability to restart container on device failures
Pull Request -
State: closed - Opened by lxpbl over 1 year ago
#402 - Clarify FS watcher error with path
Pull Request -
State: open - Opened by Dentrax over 1 year ago
- 1 comment
#401 - Who is maintaining this repo??
Issue -
State: closed - Opened by maaft over 1 year ago
- 3 comments
#400 - Error: template: nvidia-device-plugin/templates/gfd.yml:22:19: executing "nvidia-device-plugin/templates/gfd.yml" at <.Subcharts.gfd>: nil pointer evaluating interface {}.gfd
Issue -
State: open - Opened by hanzhc over 1 year ago
- 7 comments
#399 - On the node with mig enabled, nvidia-device-plugin reports an error when it starts
Issue -
State: closed - Opened by yeqiugt over 1 year ago
- 2 comments
#398 - feat(plugin): Make resource name configurable
Pull Request -
State: closed - Opened by YitzyD over 1 year ago
#397 - failed to construct NVML resource managers
Issue -
State: closed - Opened by shortwavedave over 1 year ago
- 1 comment
#396 - Bump github.com/opencontainers/runc from 1.1.4 to 1.1.5
Pull Request -
State: open - Opened by dependabot[bot] over 1 year ago
Labels: dependencies
#395 - update readme to state that gitlab is the location where development …
Pull Request -
State: open - Opened by kannon92 over 1 year ago
#394 - ubuntu 22.04: pods get killed when any pod resources differ between limits and requests
Issue -
State: open - Opened by maaft over 1 year ago
#393 - gitlab location is where developers should open PRs on
Pull Request -
State: closed - Opened by kannon92 over 1 year ago
#392 - K8 Job does not get marked as completed after the pod succeeds in AKS version 1.25.5
Issue -
State: closed - Opened by narendrakumar-nj over 1 year ago
- 2 comments
Labels: lifecycle/stale
#391 - Add prestart-hook for device plugin
Pull Request -
State: closed - Opened by kannon92 over 1 year ago
- 7 comments
#390 - NVIDIA device plugin isn't advertising the GPUs
Issue -
State: closed - Opened by glopezdiest over 1 year ago
- 12 comments
#389 - Is restarting the plugin the only way to update the node GPU profile after mig-enabled GPUs get repartitioned?
Issue -
State: open - Opened by WindowsXp-Beta over 1 year ago
- 1 comment
#388 - Plugin usage on a mixed node group EKS infrastructure
Issue -
State: open - Opened by kaustubh-reinvent over 1 year ago
#387 - ETA on new release to address 0.13.0 Security Vulnerabilities?
Issue -
State: open - Opened by jhawkins1 over 1 year ago
#386 - report index as RegisteredDevices when device-id-strategy set to index
Pull Request -
State: closed - Opened by borgerli over 1 year ago
- 4 comments
#385 - Bump golang.org/x/text from 0.3.3 to 0.3.8
Pull Request -
State: closed - Opened by dependabot[bot] over 1 year ago
- 2 comments
Labels: dependencies
#384 - Add documentation for CRI-O
Issue -
State: open - Opened by AndreasMurk over 1 year ago
#383 - Feature: Deleting memory of prevous GPU runs before running the existing job
Issue -
State: open - Opened by kannon92 over 1 year ago
#382 - 0/1 nodes are available: 1 Insufficient nvidia.com/gpu.
Issue -
State: open - Opened by liufangpeng over 1 year ago
- 2 comments
#381 - nvidia-cuda-mps-control: command not found
Issue -
State: closed - Opened by LoRKaa over 1 year ago
#380 - fix expired link
Pull Request -
State: closed - Opened by zhouhao3 over 1 year ago
- 2 comments
#379 - About GPU cleanup features
Issue -
State: open - Opened by zhouhao3 over 1 year ago
#378 - can not distinguish t4 and a100 ?
Issue -
State: open - Opened by ggjjlldd over 1 year ago
- 1 comment
#377 - Plug in does not detect Tegra device Jetson Nano
Issue -
State: open - Opened by VladoPortos over 1 year ago
- 9 comments
#376 - The plugin container registry is inaccessible via IPv6
Issue -
State: open - Opened by osipov over 1 year ago
#375 - Update README.md
Pull Request -
State: closed - Opened by HeGaoYuan over 1 year ago
- 1 comment
#374 - Update README.md
Pull Request -
State: closed - Opened by HeGaoYuan over 1 year ago
#373 - container failed to start after the VM node migrated to another host
Issue -
State: open - Opened by borgerli over 1 year ago
- 3 comments
#372 - nvidia-device-plugin getting CrashLoopBackOff while installing using helm
Issue -
State: open - Opened by captainsk7 over 1 year ago
- 2 comments
#370 - Update README.md
Pull Request -
State: open - Opened by hholst80 almost 2 years ago
#369 - apt-key is deprecated
Issue -
State: open - Opened by hholst80 almost 2 years ago
#368 - k8s-device-plugin restarts on k3s deployment (on top of containerd)
Issue -
State: open - Opened by hholst80 almost 2 years ago
- 15 comments
Labels: lifecycle/stale
#365 - #364 Check config symlink instead of file existence in config-manager
Pull Request -
State: open - Opened by Telemaco019 almost 2 years ago
#364 - Time-slicing config update: "Error: error creating symlink: file exists"
Issue -
State: open - Opened by Telemaco019 almost 2 years ago
#354 - timeslice config
Issue -
State: open - Opened by segamishuichi almost 2 years ago
#353 - Enable resource renaming in time-slicing shared GPUs
Issue -
State: closed - Opened by Telemaco019 almost 2 years ago
- 4 comments
#352 - k3s nvidia-device-plugin-daemonset report error - Fixed
Issue -
State: open - Opened by liujie1008cn almost 2 years ago
- 2 comments
#351 - compatible gpu type with k8s gpu time slicing
Issue -
State: open - Opened by alirezadaghigh99 almost 2 years ago
#350 - compatible gpu type with k8s gpu time slicing
Issue -
State: open - Opened by alirezadaghigh99 almost 2 years ago
#349 - How do I know that the upgrade of the NVIDIA device plugin went well?
Issue -
State: closed - Opened by Borchies almost 2 years ago
- 4 comments
#348 - Insufficient nvidia.com/gpu. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Issue -
State: open - Opened by somethingwentwell almost 2 years ago
- 4 comments
#347 - how to support nvlink between several k8s pods?
Issue -
State: open - Opened by pokerc almost 2 years ago
- 3 comments
#346 - "nvidia-smi": executable file not found in $PATH: unknown
Issue -
State: open - Opened by devriesewouter89 almost 2 years ago
- 3 comments
#345 - How does GPU Pod dynamically schedule clusters
Issue -
State: open - Opened by Kry1702 almost 2 years ago
- 1 comment
#344 - GPU is not available with a GPU EC2 instance in EKS cluster (1.23)
Issue -
State: open - Opened by garyyang6 almost 2 years ago
- 1 comment
#343 - Question about MIG config persistent
Issue -
State: open - Opened by slow-zhang almost 2 years ago
- 11 comments
#342 - Share GPU in same pod using volume-mounts strategy
Issue -
State: open - Opened by dcarrion87 almost 2 years ago
- 5 comments
#341 - nvdi-smi hogs CPU
Issue -
State: open - Opened by duk0011 almost 2 years ago
- 2 comments
#340 - MIG for A6000
Issue -
State: closed - Opened by arijitthegame almost 2 years ago
- 2 comments
#339 - Question about safely upgrading device plugin
Issue -
State: open - Opened by henrysecond1 almost 2 years ago
#338 - Device driver panics randomly with unknown error
Issue -
State: open - Opened by olemarkus almost 2 years ago
- 2 comments
#337 - Update README.md
Pull Request -
State: closed - Opened by tico88612 almost 2 years ago
- 1 comment
#336 - How do I run my application on the GPU assigned by k8s
Issue -
State: closed - Opened by c-android almost 2 years ago
- 2 comments
#335 - Pods with GPU terminating very slowly
Issue -
State: open - Opened by Zhurik about 2 years ago
#334 - Previous issue - #297, Is plugin ready for jetson nano devices
Issue -
State: open - Opened by ravinayag about 2 years ago
- 8 comments
#333 - 4pdvGPU: ERROR get_vdevice_index: Assertion `0' failed and Aborted (core dumped)
Issue -
State: open - Opened by Chenxs1122 about 2 years ago
#332 - Getting GPU device minor number: Not Supported
Issue -
State: open - Opened by zengzhengrong about 2 years ago
- 13 comments
#331 - Function not found for nvml methods
Issue -
State: open - Opened by kbkartik about 2 years ago
- 3 comments
#330 - Startup race condition with dcgm-exporter
Issue -
State: closed - Opened by skraga about 2 years ago
- 3 comments
#329 - How can I use nvidia gpu in kubernetes pod?
Issue -
State: open - Opened by misupopo about 2 years ago
- 1 comment
#328 - Pods are not scheduled in all GPUs of a physical server.
Issue -
State: closed - Opened by shan100github about 2 years ago
- 25 comments
#327 - Make repo go install friendly
Issue -
State: open - Opened by anthonyrisinger about 2 years ago
#326 - undefined symbol nvmlGpuInstanceGetComputeInstanceProfileInfoV in v12+
Issue -
State: closed - Opened by anthonyrisinger about 2 years ago
- 3 comments
#325 - helm 0.12.2 - nfd-worker logs permission denied on selinux and gfd
Issue -
State: open - Opened by RichardSufliarsky about 2 years ago
#324 - Failure: nvidia-container-cli.real: container error: cgroup subsystem devices not found
Issue -
State: open - Opened by mpu-creare about 2 years ago
- 1 comment
#323 - CUDA memory error
Issue -
State: closed - Opened by Borchies about 2 years ago
- 8 comments
#322 - Failed to initialize NVML: Unknown Error for when changed runtime from docker to containerd
Issue -
State: open - Opened by zvier about 2 years ago
- 13 comments
#321 - Unable to get nvidia.com/gpu: "1" greater than 1 for Quadro P2000
Issue -
State: closed - Opened by brianbrady about 2 years ago
- 13 comments
#320 - Incorrect indentation for securityContext > capability in daemonset
Issue -
State: closed - Opened by waynelwh about 2 years ago
- 2 comments
#319 - "CUDA unknown error" when using pytorch, and recovered by restarting the nvidia plugin pod
Issue -
State: open - Opened by chxk about 2 years ago
#318 - How to check the NVIDIA k8-device-plugin version?
Issue -
State: closed - Opened by esparig about 2 years ago
- 7 comments
#317 - NVIDIA A10 GPUs - are these drivers in the NVIDIA / k8s-device-plugin
Issue -
State: open - Opened by jeffreydahan about 2 years ago
- 9 comments
#315 - nvidia-device-plugin daemonset has 0 desired and no pod is launched
Issue -
State: open - Opened by blackjack2015 over 2 years ago
- 4 comments
#314 - device plugin default_runtime_name requirement and documentation
Issue -
State: closed - Opened by rptaylor over 2 years ago
- 2 comments
#311 - Fix containerd config in README.md
Pull Request -
State: closed - Opened by gmrukwa over 2 years ago
- 2 comments
#302 - How to use the device plugin with new k8s 1.24 version?
Issue -
State: open - Opened by Zigko over 2 years ago
- 21 comments
#300 - [add]: support for hostNetwork parameter in daemonset deployment
Pull Request -
State: closed - Opened by vasudev-singhc-by over 2 years ago
- 2 comments
#298 - fix: Fixed a build error on io.ReadAll
Pull Request -
State: closed - Opened by aelgasser over 2 years ago
- 6 comments
#297 - Cannot run nvidia-device-plugins on arm64 with cuda 10.2
Issue -
State: closed - Opened by Fvoiretryzig over 2 years ago
- 3 comments
#292 - how to schedule jobs to different type of gpus?
Issue -
State: closed - Opened by silverlining21 over 2 years ago
- 6 comments
#289 - pod fail to find gpu some time after created
Issue -
State: closed - Opened by JuHyung-Son over 2 years ago
- 14 comments
#284 - update the Dockerfile: NVIDIA_DRIVER_CAPABILITIES=utility,compute
Pull Request -
State: closed - Opened by alex337 almost 3 years ago
- 2 comments
#274 - Cannot find GPU information in Capacity when kubectl describe a K8s GPU node.
Issue -
State: open - Opened by Zeyu-ZEYU almost 3 years ago
- 3 comments
#267 - k8s-device-plugin seems to think gpu healthy when it is not usable due to Uncorrectable ECC Error
Issue -
State: open - Opened by tingweiwu about 3 years ago
- 8 comments
Labels: lifecycle/stale
#266 - spec.template.metadata.annotations[scheduler.alpha.kubernetes.io/critical-pod]: non-functional in v1.16+; use the "priorityClassName" field instead daemonset.apps/nvidia-device-plugin-daemonset created
Issue -
State: closed - Opened by wajeehulhassanvii about 3 years ago
- 4 comments
#258 - Multi card
Pull Request -
State: closed - Opened by archlitchi about 3 years ago
- 2 comments
#253 - Installation failed k8s-device-plugin(v0.9.0)
Issue -
State: open - Opened by Kwonho over 3 years ago
- 12 comments
#240 - Device-plugin does not bother to properly do a cleanup of the info about GPUs after MIG enable/disable or after reconfiguration
Issue -
State: open - Opened by dchirikov over 3 years ago
- 15 comments
#203 - With volume-mounts strategy, pod shouldn't fail when no permission to read NVIDIA_VISIBLE_DEVICES
Issue -
State: open - Opened by zhsj almost 4 years ago
- 3 comments
#199 - Setting "failOnInitError" unexpectedly "works" with a small 2 node cluster.
Issue -
State: closed - Opened by supertetelman almost 4 years ago
- 2 comments
#169 - Support sharing GPUs
Issue -
State: open - Opened by ktarplee over 4 years ago
- 37 comments
#151 - Update nvidia-device-plugin.yml
Pull Request -
State: closed - Opened by frankenstien-831 almost 5 years ago
- 3 comments
#143 - How to use specific NVIDIA GPU type(model) in pod yaml
Issue -
State: closed - Opened by estherxyz almost 5 years ago
- 8 comments