Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/gpu-operator issues and pull requests
#978 - disable privileged mode for toolkit-validation init containers
Pull Request -
State: closed - Opened by tariq1890 2 months ago
#977 - Bump github.com/prometheus/client_golang from 1.20.2 to 1.20.3
Pull Request -
State: closed - Opened by dependabot[bot] 2 months ago
Labels: dependencies
#976 - add gpu driver 560.35.03
Pull Request -
State: closed - Opened by tariq1890 2 months ago
#975 - Bump golang.org/x/mod from 0.20.0 to 0.21.0
Pull Request -
State: closed - Opened by dependabot[bot] 2 months ago
Labels: dependencies
#974 - Add validate nouveau whether in blacklist
Issue -
State: open - Opened by lengrongfu 2 months ago
- 2 comments
Labels: feature
#973 - Bump github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring from 0.76.0 to 0.76.1
Pull Request -
State: closed - Opened by dependabot[bot] 2 months ago
Labels: dependencies
#972 - Bump NVIDIA/holodeck from 0.2.1 to 0.2.4
Pull Request -
State: open - Opened by dependabot[bot] 3 months ago
- 1 comment
Labels: dependencies, github_actions
#971 - Is k8s 1.21 no longer supported by any version of gpu-operator?
Issue -
State: closed - Opened by qingfenghcy 3 months ago
- 3 comments
#970 - Broken driver toolkit detected with v24.6.1 and OpenShift 4.15.22
Issue -
State: closed - Opened by tlagrenfkit 3 months ago
- 4 comments
#969 - add make target to automate sync'ing of generated crds into helm and olm artifacts
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#968 - sync generated assets with controller-tools v0.16.2
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#967 - add support configuring tolerations in cleanupCRD JOB
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#966 - Bump sigs.k8s.io/controller-tools from 0.16.1 to 0.16.2 in /tools
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies
#965 - Bump github.com/Masterminds/sprig/v3 from 3.2.3 to 3.3.0
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies
#964 - drop the DIST (ubi9) suffix in image tags
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#963 - Bump github.com/onsi/ginkgo/v2 from 2.20.1 to 2.20.2
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies
#962 - Bump github.com/onsi/gomega from 1.34.1 to 1.34.2
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies
#961 - GPU Operator pod complains no nodes matching the given node selector for gpu-driver
Issue -
State: open - Opened by shankarpentyala07 3 months ago
#960 - add support for configuring tolerations in upgrade-crd hook
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#959 - Bump github.com/operator-framework/api from 0.26.0 to 0.27.0
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies
#958 - bump k8s-operator-libs dep to the latest version
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#957 - Bump github.com/prometheus/client_golang from 1.19.1 to 1.20.2
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies
#956 - update gitlab CI and vfio-manager to use ubi9 DIST
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#955 - update dependabot gomod update frequency to daily
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#954 - update go and golangci-lint versions
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#953 - [CVE-2024-41110] bump go-helm-client to v0.12.13
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#952 - update operator initContainer and vfioManager images to 12.6.0
Pull Request -
State: closed - Opened by tariq1890 3 months ago
- 2 comments
#951 - Unexpected GPU Allocation with NVIDIA_VISIBLE_DEVICES in Kubernetes
Issue -
State: open - Opened by qiangyupei 3 months ago
- 5 comments
#950 - Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.20.1
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies
#949 - Allow custom metrics for DCGM Exporter
Pull Request -
State: closed - Opened by chipzoller 3 months ago
- 9 comments
#948 - Bump NVIDIA/holodeck from 0.2.1 to 0.2.3
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
- 1 comment
Labels: dependencies, github_actions
#947 - Revert "Bump NVIDIA/holodeck from 0.2.1 to 0.2.3"
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#946 - Nvidia Operator Documentation for installing on Amazon Linux 2023 and Bottle Rocket AMIs
Issue -
State: open - Opened by dgr237 3 months ago
- 1 comment
#945 - GPU-operator installation fails; pod status is not running after installation
Issue -
State: open - Opened by 452256 3 months ago
- 3 comments
#944 - Support for Configuring GPU Access for Both Containers and VMs on the Same Node
Issue -
State: open - Opened by ss-armada 3 months ago
- 4 comments
Labels: feature
#943 - add H800 GPU to the MIG configmap
Pull Request -
State: closed - Opened by tariq1890 3 months ago
#942 - [Feature] Support to set power limit through the gpu-operator
Issue -
State: open - Opened by Krast76 3 months ago
- 4 comments
Labels: feature
#941 - Bump NVIDIA/holodeck from 0.2.1 to 0.2.3
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies, github_actions
#939 - Bump sigs.k8s.io/controller-runtime from 0.18.4 to 0.19.0
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies
#938 - Bump the k8sio group across 1 directory with 3 updates
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
- 1 comment
Labels: dependencies
#937 - update base image to the ubi9 variant
Pull Request -
State: closed - Opened by tariq1890 3 months ago
- 1 comment
#936 - toolkit-validation container errors out with "nvidia-smi": executable file not found in $PATH after migrating to containerd
Issue -
State: closed - Opened by jpdstan 3 months ago
- 1 comment
#935 - Document `config` object in DCGM Exporter values
Pull Request -
State: closed - Opened by chipzoller 3 months ago
- 2 comments
#934 - [Feature] Support for new `customMetrics` value in DCGM Exporter
Issue -
State: closed - Opened by chipzoller 3 months ago
- 8 comments
Labels: feature
#933 - 550.90.07-5.15.0-1061-gke-ubuntu22.04 image tag not found when installing with `driver.usePrecompiled` on GKE
Issue -
State: open - Opened by chipzoller 3 months ago
- 6 comments
#932 - Bump sigs.k8s.io/controller-tools from 0.15.0 to 0.16.1 in /tools
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
- 2 comments
#930 - Bump github.com/prometheus-operator/prometheus-operator/pkg/apis/monitoring from 0.73.2 to 0.76.0
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies
#929 - Bump github.com/urfave/cli/v2 from 2.27.2 to 2.27.4
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies
#925 - The installation of GPU Operator version v24.6.1 fails to install the /usr/local/nvidia/toolkit
Issue -
State: open - Opened by coderRenxy 3 months ago
- 1 comment
#923 - Bump k8s.io/code-generator from 0.30.2 to 0.31.0 in /tools
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
- 2 comments
#919 - Bump nvidia/cuda from 12.5.1-base-ubi8 to 12.6.0-base-ubi8 in /validator
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
Labels: dependencies, docker
#914 - gpu-operator executable is not on $PATH
Issue -
State: closed - Opened by ashvin-pidaparti 3 months ago
- 7 comments
#911 - Bump github.com/onsi/ginkgo/v2 from 2.19.0 to 2.20.0
Pull Request -
State: closed - Opened by dependabot[bot] 3 months ago
- 1 comment
Labels: dependencies
#910 - usePrecompiled and new versions
Issue -
State: open - Opened by easyrider14 3 months ago
- 1 comment
#908 - Update OLM bundle to use staging images built from release branch
Pull Request -
State: open - Opened by cdesiniotis 3 months ago
#904 - node-feature-discovery of gpu-operator sends excessive LIST requests to the API server
Issue -
State: open - Opened by jslouisyou 3 months ago
- 3 comments
#895 - Bump github.com/onsi/gomega from 1.33.1 to 1.34.1
Pull Request -
State: closed - Opened by dependabot[bot] 4 months ago
Labels: dependencies
#884 - Bump github.com/docker/docker from 25.0.5+incompatible to 25.0.6+incompatible
Pull Request -
State: closed - Opened by dependabot[bot] 4 months ago
- 6 comments
Labels: dependencies, go
#878 - update prometheus-operator to version v0.75.2
Pull Request -
State: closed - Opened by ajayk 4 months ago
- 7 comments
#860 - Bump sigs.k8s.io/kustomize/kustomize/v5 from 5.4.2 to 5.4.3 in /tools
Pull Request -
State: closed - Opened by dependabot[bot] 4 months ago
- 2 comments
#859 - RuntimeClass apiversion not right, I am using the latest gpu operator in master branch
Issue -
State: open - Opened by 13567436138 4 months ago
- 4 comments
#850 - changes to allow custom labels for ServiceMonitor
Pull Request -
State: open - Opened by csauoss 4 months ago
- 3 comments
#840 - Bump cuda-samples version to 12.5.0 in validator
Pull Request -
State: open - Opened by cdesiniotis 4 months ago
- 1 comment
#808 - Configure NVIDIA_CDI_HOOK_PATH envvar in mig-manager
Pull Request -
State: closed - Opened by cdesiniotis 5 months ago
#751 - Enabling gpu on microk8s, pod/nvidia-driver-daemonset restart many times at status CrashLoopBackOff
Issue -
State: closed - Opened by haiph-dev 5 months ago
- 4 comments
#733 - Enable the use of CDI on OpenShift
Pull Request -
State: open - Opened by cdesiniotis 5 months ago
- 1 comment
#732 - Migrate from ClusterPolicy to NVIDIADriver owned driver daemonsets
Pull Request -
State: open - Opened by cdesiniotis 5 months ago
#725 - Use NVIDIADriver CRD to install GPU driver in a centos7 and a ubuntu22.04
Issue -
State: closed - Opened by lengrongfu 6 months ago
- 5 comments
Labels: feature
#722 - Ubuntu 24.04 Image Missing For nvidia-driver-daemonset
Issue -
State: open - Opened by isugimpy 6 months ago
- 6 comments
Labels: feature
#684 - Allocatable gpu value not correct after configuring time slicing
Issue -
State: open - Opened by shashiranjan84 8 months ago
- 5 comments
#681 - Allow annotations to be added to just to the nvidia-dcgm-node-exporter daemonset for datadog monitoring via helm install
Issue -
State: open - Opened by flowinh2o 8 months ago
- 4 comments
#670 - validator: should validate GPU healthiness by using DCGM
Issue -
State: open - Opened by Dentrax 9 months ago
- 2 comments
Labels: feature
#658 - gpu-operator-nfd-worker fails to read net interface attribute speed
Issue -
State: open - Opened by blackliner 10 months ago
- 8 comments
#642 - k8s-driver-manager use a unified version
Issue -
State: open - Opened by lengrongfu 11 months ago
- 3 comments
#620 - When there are nodes with containerd runtime in the cluster, it can cause nodes running on the Docker runtime to break down
Issue -
State: closed - Opened by quanguachong 12 months ago
- 1 comment
#607 - No GPU node in the cluster, do not create DaemonSets
Issue -
State: open - Opened by joshpwrk about 1 year ago
- 8 comments
#596 - Safely rolling out gpu-operator components (toolkit and device plugin) - maxSurge support?
Issue -
State: closed - Opened by chiragjn about 1 year ago
- 1 comment
#586 - gpu-operator filter not enabled component
Issue -
State: closed - Opened by lengrongfu about 1 year ago
- 2 comments
#567 - Toolkit DaemonSet stuck in init phase after upgrade
Issue -
State: open - Opened by heilerich over 1 year ago
- 5 comments
#566 - gpu driver is in init state after rebooting the gpu node
Issue -
State: open - Opened by alloydm over 1 year ago
- 1 comment
#565 - Support matrix - RedHat OpenShift Container Platform 4.10/11 and Nvidia GPU Operator 22.9.2
Issue -
State: open - Opened by LiquIDMeowz over 1 year ago
- 2 comments
#564 - The GPU Operator driver build fails on GCP when using Ubuntu 22.04.
Issue -
State: open - Opened by uniit over 1 year ago
- 1 comment
#563 - How can I make a pod use a specific GPU from a k8s node?
Issue -
State: closed - Opened by FLM210 over 1 year ago
- 4 comments
#562 - Heterogenous cluster with airgap failing to detect customrepo configmap
Issue -
State: open - Opened by alloydm over 1 year ago
- 1 comment
#561 - Update issue templates
Pull Request -
State: closed - Opened by shivamerla over 1 year ago
#560 - imagepullbackoff on the nvidia-operator w/ nvcr.io/nvidia/cuda
Issue -
State: open - Opened by jayunit100 over 1 year ago
- 2 comments
#559 - Install gpu-operator will cause GPU Error
Issue -
State: closed - Opened by quanguachong over 1 year ago
- 4 comments
#558 - Need support to be able to deploy Gpu-Operator & driver chart on debian11 based node
Issue -
State: open - Opened by ayuzzz over 1 year ago
#557 - Need support to be able to deploy Gpu-Operator & driver chart on debian11 based node
Issue -
State: open - Opened by ayuzzz over 1 year ago
#556 - kubevirt vm-passthrough Tesla M6 strange name : ARIEL_DEVICE_24_FUNCTION_3
Issue -
State: closed - Opened by jear over 1 year ago
- 3 comments
#555 - Fix reconciliation failure monitor.
Pull Request -
State: closed - Opened by montaguethomas over 1 year ago
- 1 comment
#554 - vGPU device manager pod stuck in init container phase
Issue -
State: open - Opened by aravindgpd over 1 year ago
- 1 comment
#553 - GPU Operator with RHEL8/SElinux: Driver Container failed to deploy if SELinux Enforcing mode is activated. Error message: modprobe: ERROR: could not insert 'nvidia': Permission denied
Issue -
State: open - Opened by francisguillier over 1 year ago
- 1 comment
#552 - nvidia container toolkit stay in waiting to start
Issue -
State: open - Opened by yingding over 1 year ago
- 1 comment
#551 - Issue processing pci.ids in vm-passthrough causes resourcename to fail and second device not to be registered
Issue -
State: closed - Opened by StefanDeltaRay over 1 year ago
- 7 comments
#550 - GPU Operator with KubeVirt - Node Configuration
Issue -
State: open - Opened by doronkg over 1 year ago
- 2 comments
#549 - Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox runtime: no runtime for "nvidia" is configure
Issue -
State: open - Opened by BartoszZawadzki over 1 year ago
- 3 comments
#548 - Error on PCI Passthrough using new L40 Openshift 4.1x
Issue -
State: closed - Opened by clrfuerst over 1 year ago
- 2 comments
#547 - Deploy nvidia-device-plugin-daemonset to only certain nodes
Issue -
State: open - Opened by Hayes-buzzni over 1 year ago
#546 - ServiceAccount cannot create resource "nodefeatures"
Issue -
State: open - Opened by benbaker76 over 1 year ago
- 1 comment