Ecosyste.ms: Issues

An open API service that provides issue and pull request metadata for open source projects.

GitHub / NVIDIA/gpu-operator issues and pull requests

#543 - Update blossom-ci.yml

Pull Request - State: open - Opened by rorajani over 1 year ago - 2 comments
Labels: invalid

#542 - GPU-operator not applying driver version changes on EKS

Issue - State: closed - Opened by sjkoelle over 1 year ago - 8 comments

#541 - GPU operator 22.9.2 installation is failing

Issue - State: closed - Opened by likku123 over 1 year ago - 5 comments

#540 - Unable to deploy GPU Operator on MicroShift 4.13

Issue - State: closed - Opened by sjug over 1 year ago - 3 comments

#539 - GPU operator validator fails to create host device symlinks

Issue - State: open - Opened by adamancini over 1 year ago - 2 comments

#538 - GPU Operator Install with Terraform Not Working - Chart Not found

Issue - State: closed - Opened by nirajdesai2909 over 1 year ago - 3 comments

#537 - Relabelings for ServiceMonitor

Issue - State: closed - Opened by faurik over 1 year ago - 1 comment

#536 - Unable to cordon nodes

Issue - State: open - Opened by guyst16 over 1 year ago - 1 comment

#535 - dcgm-exporter missing metrics for A100 when mig enabled

Issue - State: closed - Opened by alloydm over 1 year ago

#534 - TEST CI

Pull Request - State: open - Opened by rorajani over 1 year ago - 1 comment
Labels: invalid

#532 - Does this tool support windows nodes?

Issue - State: open - Opened by skiwheelr over 1 year ago - 1 comment

#529 - GPU Operator installation failure in AKS

Issue - State: closed - Opened by sidharthkumarpradhan over 1 year ago - 9 comments

#527 - Problem configuring vGPU access using Kubevirt

Issue - State: open - Opened by nadav213000 over 1 year ago - 14 comments

#525 - GPU-Operator does not install the specified driver version in AKS GPU Node

Issue - State: closed - Opened by xcheng85 over 1 year ago - 3 comments

#524 - Overriding The PrometheusRule Objects Alerts

Issue - State: open - Opened by guyst16 over 1 year ago - 1 comment
Labels: enhancement

#522 - Problem installing gpu-operator on rke2

Issue - State: closed - Opened by aavbsouza over 1 year ago - 5 comments

#521 - gpu-operator as OCI artifact

Issue - State: open - Opened by dioguerra over 1 year ago - 5 comments

#519 - Documentation clarification about containerd tweaks

Issue - State: open - Opened by aavbsouza over 1 year ago - 5 comments

#518 - Rename mig resources

Issue - State: closed - Opened by maaft over 1 year ago - 4 comments

#517 - rename

Issue - State: closed - Opened by maaft over 1 year ago

#516 - nvidia-settings and nvidia-xconfig not mounted to Pods

Issue - State: open - Opened by elgalu over 1 year ago - 2 comments

#515 - Add priorityClassName to nfd's pods

Pull Request - State: open - Opened by boniek83 over 1 year ago

#514 - Add priorityClassName to nfd's pods

Issue - State: open - Opened by boniek83 over 1 year ago - 1 comment

#512 - feature discovery worker pod unable to connect to worker node

Issue - State: closed - Opened by gakshat14 over 1 year ago - 2 comments

#510 - I cannot install because plugin-validation and cuda-validation fail.

Issue - State: closed - Opened by koh-hr over 1 year ago - 8 comments

#508 - Interaction between operator-validator and device-plugin causes error state.

Issue - State: open - Opened by neggert over 1 year ago - 3 comments

#507 - Feature request: More support for Ada/Hopper Generation gpus

Issue - State: open - Opened by hy-tomas-terala over 1 year ago - 2 comments

#506 - Does gpu-operator's MIG work with AWS A10G?

Issue - State: closed - Opened by randxie over 1 year ago - 2 comments

#505 - DCGM Exporter breaks after upgrade to 22.9.12

Issue - State: closed - Opened by dcarrion87 over 1 year ago - 5 comments

#501 - notebook nvidia-smi command show nothing

Issue - State: closed - Opened by tvtv511 over 1 year ago

#496 - [BUG]: console-plugin-nvidia-gpu

Issue - State: closed - Opened by grvn over 1 year ago - 6 comments

#493 - dcgmproftester pod from install docs using outdated cuda

Issue - State: open - Opened by benlsheets over 1 year ago - 3 comments

#490 - Deprecated API used

Issue - State: closed - Opened by tormig-softronic over 1 year ago - 5 comments

#489 - Bump golang.org/x/net from 0.1.0 to 0.7.0

Pull Request - State: closed - Opened by dependabot[bot] over 1 year ago - 2 comments
Labels: dependencies

#488 - NVIDIA GPU Operator installation failed with Helm

Issue - State: closed - Opened by somethingwentwell almost 2 years ago - 2 comments

#487 - Some pods are stuck in init on one of our clusters

Issue - State: open - Opened by Alwinator almost 2 years ago - 8 comments

#484 - gpu-operator fails to start due to deletion of nonexistent resources

Issue - State: closed - Opened by xknight almost 2 years ago - 8 comments

#483 - How do I install using Kustomize?

Issue - State: open - Opened by choyuansu almost 2 years ago - 1 comment

#482 - gpu-operator cannot discover the newly added GPU

Issue - State: open - Opened by zhouhao3 almost 2 years ago - 3 comments

#481 - GPU Operator reconciliation loop failed

Issue - State: closed - Opened by arpitsharma-hexad almost 2 years ago - 3 comments

#480 - device-plugin-validator fails if all gpu resources are allocated on a node

Issue - State: open - Opened by dcarrion87 almost 2 years ago - 11 comments

#479 - A problem that labels are not normally created when using custom-config

Issue - State: open - Opened by brinst07 almost 2 years ago - 1 comment

#478 - ClusterPolicy generated by Helm chart is not valid

Issue - State: closed - Opened by mkjpryor almost 2 years ago - 1 comment

#477 - gpu-operator injecting runtimeClass after transitioning containerd runtime node

Issue - State: open - Opened by dcarrion87 almost 2 years ago - 10 comments

#476 - Forced Driver Update with v22.9.1

Issue - State: open - Opened by BCJuan almost 2 years ago - 2 comments

#475 - [Feature Request] Make nvidia-operator-validator add a validation successful label or taint on the node

Issue - State: open - Opened by chiragjn almost 2 years ago - 4 comments
Labels: enhancement

#474 - Failed to remove GPU in nvidia-driver container

Issue - State: open - Opened by zhouhao3 almost 2 years ago - 5 comments

#473 - NVIDIA Container Toolkit fails to set default runtime on RKE2

Issue - State: closed - Opened by eabochasjauregui almost 2 years ago - 13 comments

#472 - PODNAME not populated in DCGM metrics

Issue - State: open - Opened by harjitdotsingh almost 2 years ago

#471 - HPA using gpu-operator

Issue - State: open - Opened by harjitdotsingh almost 2 years ago

#470 - OpenShift GPU operator only working on 1/2 physical nodes properly

Issue - State: closed - Opened by mgiessing almost 2 years ago - 2 comments

#468 - Changing Node workload type on running node

Issue - State: open - Opened by nadav213000 almost 2 years ago - 1 comment

#467 - Not able to see DCGM Metrics in prometheus

Issue - State: closed - Opened by harjitdotsingh almost 2 years ago - 3 comments

#466 - Gpu operator does not work with cri-o user namespaces

Issue - State: open - Opened by robertdavidsmith almost 2 years ago - 1 comment

#465 - Time-slicing with multiple GPUs - asking for ability to block single GPU

Issue - State: open - Opened by Alexbay218 almost 2 years ago - 1 comment
Labels: enhancement

#464 - Will gpu-operator support Rocky Linux in the future?

Issue - State: open - Opened by carlwang87 almost 2 years ago

#462 - [Feature Request] console-plugin-nvidia-gpu / GPU Operator Dashboard per project

Issue - State: closed - Opened by Alwinator almost 2 years ago - 2 comments

#461 - Cluster policy templating broken with default values

Issue - State: closed - Opened by danmx almost 2 years ago - 5 comments

#460 - Changing MIG strategy while Kubernetes cluster and gpu-operator running

Issue - State: closed - Opened by esparig almost 2 years ago - 2 comments

#458 - BREAKS ON 1.25: Does not work on k8s 1.25 due to node API deprecation

Issue - State: open - Opened by sfxworks almost 2 years ago - 16 comments

#457 - v22.9.0 - nvidia-driver-daemonset/nvidia-driver-ctr fails to start

Issue - State: closed - Opened by jeremy-london almost 2 years ago - 11 comments

#455 - Possible incompatibility with cpumanager, memorymanager, or topologymanager.

Issue - State: open - Opened by benlsheets almost 2 years ago - 3 comments

#454 - About the behavior of GPU-Operator when updating EUS

Issue - State: open - Opened by kousui-dev almost 2 years ago - 13 comments

#452 - Chcon command fails in nvidia-driver init - nvidia driver installation aborts

Issue - State: closed - Opened by snirkatriel almost 2 years ago - 5 comments

#451 - gpu-operator - deprecated API 1.25 call in audit log

Issue - State: closed - Opened by jpeimer almost 2 years ago - 3 comments

#441 - Error: failed to create FS watcher: too many open files

Issue - State: closed - Opened by EajksEajks about 2 years ago - 6 comments

#439 - DCGM exporter NodePort vs ClusterIP

Issue - State: closed - Opened by dcarrion87 about 2 years ago - 1 comment

#432 - Failed to get sandbox runtime: no runtime for nvidia is configured

Issue - State: open - Opened by denissabramovs about 2 years ago - 32 comments

#430 - Failed to initialize NVML: Unknown Error

Issue - State: open - Opened by hoangtnm about 2 years ago - 27 comments

#429 - gpu-operator-nfd-worker fails to read net interface attribute speed

Issue - State: closed - Opened by yotama-anv about 2 years ago - 13 comments

#422 - console-plugin-nvidia-gpu / GPU Operator Dashboard not showing

Issue - State: closed - Opened by Alwinator about 2 years ago - 8 comments