Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/dcgm-exporter issues and pull requests
#355 - [dashboard] Rework dashboard (MIG support, Grafana deprecations, Hostname)
Pull Request -
State: open - Opened by frittentheke 5 months ago
#354 - Why `DCGM_FI_DEV_PCIE_{TX,RX}_THROUGHPUT` is default instead of `DCGM_FI_PROF_PCIE_{TX,RX}_BYTES `?
Issue -
State: closed - Opened by koshieguchi 5 months ago
- 2 comments
Labels: question
#353 - Duplicated, missing or wrong metrics if using MIG, Grafana dashboard showing wrong duplicated / false values
Issue -
State: open - Opened by frittentheke 5 months ago
- 2 comments
Labels: bug
#352 - cannot get DCGM_FI_PROF_SM_ACTIVE metrics
Issue -
State: open - Opened by qingfenghcy 5 months ago
- 1 comment
Labels: question
#351 - [Helm] Enable custom metrics, mount ConfigMap by default
Pull Request -
State: closed - Opened by chipzoller 5 months ago
- 32 comments
#350 - [Helm] Enable ConfigMap mount by default
Pull Request -
State: closed - Opened by chipzoller 5 months ago
- 8 comments
#349 - enable DCGM_EXPORTER_KUBERNETES and podrequestapi is avaiable but not found container and namespace label in Metrics
Issue -
State: closed - Opened by Kevinz857 5 months ago
- 4 comments
Labels: bug
#348 - GPU Failure Detection and Alerting Enhancement
Issue -
State: open - Opened by jz543fm 5 months ago
- 14 comments
Labels: enhancement
#347 - Cannot Retrieve GPU PIDs from DCGM Metrics
Issue -
State: closed - Opened by doronkg 5 months ago
- 4 comments
Labels: question
#346 - fix: correct metric help text
Pull Request -
State: closed - Opened by pintohutch 5 months ago
- 1 comment
#345 - DCGM_FI_DEV_MEM_COPY_UTIL not correct always 1 or 2
Issue -
State: closed - Opened by xuchenCN 5 months ago
- 3 comments
Labels: bug
#344 - How to install dcgm-exporter on Windows Server?
Issue -
State: closed - Opened by LittleNewton 5 months ago
- 6 comments
Labels: question
#343 - How to obtain the namespace , pod and container data
Issue -
State: closed - Opened by aikikia 5 months ago
- 6 comments
Labels: question
#342 - `namespace` and `pod` labels are sometimes missing from metrics
Issue -
State: open - Opened by Altair-Bueno 6 months ago
- 16 comments
Labels: bug
#341 - Switch GPU Util metric to `DCGM_FI_PROF_GR_ENGINE_ACTIVE` in NVIDIA DCGM Metrics Dashboard
Issue -
State: open - Opened by wabouhamad 6 months ago
Labels: enhancement
#340 - exported_pod cause issue with query -> every sample a different metrics
Issue -
State: open - Opened by amir-bialek 6 months ago
- 3 comments
Labels: question
#339 - can I get computeRunningProcesses and graphicsRunningProcesses this two metrics??
Issue -
State: closed - Opened by suxwang 6 months ago
- 1 comment
Labels: bug
#338 - config csv DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS, but cannot get on metrics
Issue -
State: closed - Opened by suxwang 6 months ago
- 2 comments
Labels: bug
#337 - I can't get the following metrics, but I've set the environment variable
Issue -
State: closed - Opened by kameriso-zga 6 months ago
- 6 comments
Labels: question
#336 - nvlink metrics are not available on the gh200 gpu node
Issue -
State: open - Opened by AnjirwalaAnuj 6 months ago
- 2 comments
Labels: question
#335 - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ is not signed
Issue -
State: closed - Opened by jjziets 6 months ago
- 2 comments
Labels: bug
#330 - Failed to watch metrics: Error watching fields: The third-party Profiling module returned an u
Issue -
State: open - Opened by 287400117 6 months ago
- 2 comments
Labels: bug
#329 - Could not enable kubernetes metric collection: nvml: Unknown Error
Issue -
State: open - Opened by 287400117 6 months ago
- 2 comments
Labels: bug
#327 - hello,I use docker run -d --gpus all --rm -p 9400:9400 nvcr.io/nvidia/k8s/dcgm-exporter:3.3.6-3.4.2-ubuntu22.04 to start the container and an error message readlink: missing operand
Issue -
State: open - Opened by nvvfedorov 6 months ago
- 5 comments
#326 - feat: add pci_bus_id label for metrics
Pull Request -
State: closed - Opened by fungaren 6 months ago
- 5 comments
#321 - Cannot build from source
Issue -
State: closed - Opened by jz543fm 7 months ago
- 9 comments
Labels: bug
#318 - SIGSEGV: segmentation violation
Issue -
State: closed - Opened by amybachir 7 months ago
- 9 comments
Labels: bug
#317 - Failed to add DCGM_EXP_CLOCK_EVENTS_COUNT
Issue -
State: open - Opened by CodeBrek 7 months ago
- 8 comments
Labels: bug
#316 - Missing NVLINK bandwidth metrics in dcgm-exporter
Issue -
State: open - Opened by jz543fm 7 months ago
- 6 comments
Labels: bug
#314 - The pod for a given GPU in k8s mode cannot be captured
Issue -
State: open - Opened by rokkiter 8 months ago
- 8 comments
Labels: enhancement
#312 - Extremely high GPU temperature reported by dcgm-exporter
Issue -
State: closed - Opened by age9990 8 months ago
- 7 comments
Labels: bug
#308 - Support collect detail error message with the xid
Issue -
State: closed - Opened by zhucan 8 months ago
- 1 comment
Labels: enhancement
#293 - How to get current device MIG model is single or mixed?
Issue -
State: open - Opened by lengrongfu 8 months ago
- 3 comments
Labels: question, action_required_from_requester
#288 - Fix power calculation
Pull Request -
State: open - Opened by Apsu 8 months ago
- 2 comments
#272 - Expose Container info for MIG enabled GPU
Issue -
State: open - Opened by krishh85 9 months ago
- 86 comments
Labels: bug
#271 - sum of DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE is not const
Issue -
State: open - Opened by ccding 9 months ago
- 6 comments
#257 - Attributing GPU power among MIG instances.
Issue -
State: closed - Opened by fali007 9 months ago
- 6 comments
#242 - Segfault (SEGV) when upgrading 3.2.0 to 3.3.0
Issue -
State: closed - Opened by biz812 10 months ago
- 15 comments
#238 - Collect container name even when not using K8S
Issue -
State: open - Opened by BryanQuigley 10 months ago
- 17 comments
Labels: enhancement
#199 - make kubelet pod-resources socket directory configurable
Pull Request -
State: closed - Opened by zclyne about 1 year ago
- 3 comments
#170 - Helm: exporter-metrics-config-map should not be applied by default
Issue -
State: closed - Opened by maingoh over 1 year ago
Labels: enhancement
#164 - Occasional metric loss and hangs in DCGM Exporter
Issue -
State: closed - Opened by zlseu-edu over 1 year ago
- 7 comments
#158 - Does Tesla P40 support DCP metric (DCGM-FI_PROF_ *)?
Issue -
State: open - Opened by asskss over 1 year ago
- 2 comments
#157 - ecc errors metrics
Issue -
State: open - Opened by jaywlm over 1 year ago
- 1 comment
#156 - Update go-dcgm bindings
Pull Request -
State: closed - Opened by glowkey over 1 year ago
#155 - can support nvlink/nvswitch throughput metrics?
Issue -
State: open - Opened by faryang-sh over 1 year ago
- 3 comments
#154 - kubernetes cluster deployment nvidia/dcgm-exporter:3.1.7-3.1.4-ubuntu20.04, container always quits
Issue -
State: closed - Opened by sanmv over 1 year ago
#153 - Remove unused mapPodMetrics helm chart setting
Pull Request -
State: closed - Opened by brannondorsey over 1 year ago
#152 - Does DCP metric (DCGM_FI_PROF_*)support RTX 3090 GPUs?
Issue -
State: open - Opened by asaderasxyz over 1 year ago
- 2 comments
#151 - no metrics labels about pod namespace/name when Pod uses time slicing GPU
Issue -
State: open - Opened by quanguachong over 1 year ago
- 1 comment
#150 - metric label pod、namespace empty
Issue -
State: open - Opened by lppsuixn over 1 year ago
- 3 comments
#149 - Bump version to 3.1.7-3.1.4
Pull Request -
State: closed - Opened by glowkey over 1 year ago
#148 - Error watching fields: The third-party Profiling module returned an unrecoverable error
Issue -
State: open - Opened by AlanFokCo over 1 year ago
- 2 comments
#147 - Why is DCGM_FI_DEV_MEM_COPY_UTIL not equal to DCGM_FI_DEV_FB_USED/(DCGM_FI_DEV_FB_FREE+DCGM_FI_DEV_FB_USED)?
Issue -
State: closed - Opened by IsQiao over 1 year ago
- 4 comments
#146 - Grafana dashboard: fix GPU Power Total
Pull Request -
State: closed - Opened by fschlich over 1 year ago
#145 - msg="Failed to collect metrics with error: Failed to transform metrics for transform unsupported KubernetesGPUIDType for MetricID 'device_name': podMapper"
Issue -
State: closed - Opened by suchisur over 1 year ago
- 1 comment
#144 - Not able to obtain per process GPU Utilization, no pods except dcgm-exporter itself available in the metrics collected. We are using Time Slicing GPU sharing between two pods on a single GPU node.
Issue -
State: open - Opened by suchisur over 1 year ago
- 7 comments
#143 - Running dcgm exporter without root privileges
Issue -
State: open - Opened by thekuffs over 1 year ago
#142 - go mod is outdated
Pull Request -
State: closed - Opened by sozercan over 1 year ago
#141 - dcgm-exporter vulnerable to CVE-2022-27664
Issue -
State: closed - Opened by MyStarInYourSky over 1 year ago
- 2 comments
#140 - How to stop dcgm-exporter from collecting metrics after pod termination?
Issue -
State: open - Opened by devnjw over 1 year ago
- 4 comments
#139 - Getting Metric not enabled on DCP metric
Issue -
State: open - Opened by avickars almost 2 years ago
- 1 comment
#138 - Pod label not coming up for some pod
Issue -
State: closed - Opened by tsingh-asapp almost 2 years ago
- 2 comments
#137 - Kernel panic when running on GKE
Issue -
State: closed - Opened by fredr almost 2 years ago
- 2 comments
#136 - Memory Metrics Incorrect
Issue -
State: closed - Opened by choyuansu almost 2 years ago
- 2 comments
#135 - Bump version to 3.1.6-3.1.3
Pull Request -
State: closed - Opened by glowkey almost 2 years ago
#134 - Issue 133 - remove kubernetes transforms for links and switches
Pull Request -
State: closed - Opened by glowkey almost 2 years ago
#133 - `dcgm-exporter` panics while attempting to associate metrics with pods
Issue -
State: closed - Opened by cjgibson almost 2 years ago
- 5 comments
#132 - Broken go.mod
Issue -
State: open - Opened by starry91 almost 2 years ago
- 3 comments
#131 - Filtering deployment to certain nodes
Issue -
State: closed - Opened by coleary-hyperscience almost 2 years ago
- 2 comments
#130 - Enable x-content-type-options in http header
Pull Request -
State: closed - Opened by glowkey almost 2 years ago