Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/dcgm-exporter issues and pull requests

#420 - DCGM-Exporter release 3.3.9-3.6.1

Pull Request - State: closed - Opened by glowkey 7 days ago

#418 - DCGM_FI_DEV_GPU_UTIL abnormal point

Issue - State: open - Opened by dafu-wu 8 days ago - 1 comment
Labels: bug

#417 - dcgm-exporter counter value goes down

Issue - State: open - Opened by luccabb 11 days ago - 1 comment
Labels: bug

#415 - Checksum mismatch for github.com/emicklei/go-restful/[email protected]

Issue - State: open - Opened by WilliamVenner 18 days ago - 1 comment
Labels: bug

#414 - Compiled locally, server runs, fails

Issue - State: closed - Opened by basi-a 20 days ago
Labels: question

#413 - fix: bump grpc dependency to 1.64.1

Pull Request - State: closed - Opened by pintohutch 21 days ago

#412 - Segfaults with dcgm-exporter 3.3.0 and higher

Issue - State: open - Opened by andrewjamesbrown 26 days ago - 4 comments
Labels: bug

#411 - Pod and Namespace Labels Missing in dcgm-exporter Metrics

Issue - State: open - Opened by qimike 26 days ago - 2 comments

#410 - Can dcgm-export be used with Apptainer instead of Docker?

Issue - State: closed - Opened by sorenwacker 27 days ago - 3 comments
Labels: question

#407 - Why SYS_ADMIN is required?

Issue - State: open - Opened by Yvanll about 1 month ago - 1 comment

#406 - Fix Helm Templates Generation

Pull Request - State: closed - Opened by Indresh2410 about 1 month ago - 4 comments

#405 - Helm templates not getting populated when built from source

Issue - State: closed - Opened by Indresh2410 about 1 month ago
Labels: bug

#404 - Is RTX4090 supported?

Issue - State: closed - Opened by fzyzcjy about 1 month ago - 2 comments

#403 - Maintain uniformity with helm chart and static yaml's by adding securityContext

Pull Request - State: closed - Opened by Indresh2410 about 1 month ago - 1 comment

#402 - Maintain uniformity with helm chart and static yaml's

Issue - State: closed - Opened by Indresh2410 about 1 month ago - 10 comments
Labels: bug

#401 - can exporter the uce error?

Issue - State: open - Opened by zhucan about 1 month ago - 1 comment
Labels: bug

#400 - Overhead of Enabling `DCGM_FI_PROF_SM_ACTIVE` and `DCGM_FI_PROF_SM_OCCUPANCY` Metrics

Issue - State: closed - Opened by hongpeng-guo about 1 month ago - 2 comments
Labels: question

#399 - I want to see how many GPU cores have been allocated to each container through metrics.

Issue - State: open - Opened by changhyuni about 1 month ago
Labels: enhancement

#397 - can not collect gpu utilization metric when mig enable for some pods

Issue - State: open - Opened by melikeiremguler about 2 months ago - 1 comment
Labels: bug

#396 - doc: golang >= 1.23 is required

Pull Request - State: closed - Opened by stas00 about 2 months ago - 2 comments

#395 - DCGM_FI_PROF_GR_ENGINE_ACTIVE not emitted on system with more than one GPU

Issue - State: closed - Opened by chipzoller about 2 months ago - 2 comments
Labels: question

#394 - Bug with DCGM_FI_DEV_VGPU_INSTANCE_IDS metric

Issue - State: closed - Opened by Deezzir about 2 months ago - 7 comments
Labels: bug

#393 - dcgm-exporter daemonset Startup error Failed to pass the health check

Issue - State: open - Opened by guoliangmiao 2 months ago - 2 comments
Labels: question

#391 - Service monitor API value configurable

Pull Request - State: closed - Opened by dtzar 2 months ago

#390 - DCGM-Exporter release 3.3.8-3.6.0

Pull Request - State: closed - Opened by glowkey 2 months ago

#389 - Missing 3.3.8 builds

Issue - State: closed - Opened by xnox 2 months ago - 2 comments
Labels: bug

#388 - DCGM Exporter does not collect individual pod metrics when MPS is enabled in Kubernetes

Issue - State: closed - Opened by valafon 2 months ago - 1 comment
Labels: bug

#387 - DCGM Exporter in EKS p4d.24xlarge instance type controller error

Issue - State: open - Opened by camilopaezrios 3 months ago
Labels: bug

#385 - DCGM-exporter pods stuck in Running State, Not getting Ready without GPU allocation.

Issue - State: open - Opened by rohitreddy1698 3 months ago - 12 comments
Labels: question

#384 - Add a health status metric for every gpu card

Issue - State: open - Opened by lx1036 3 months ago - 1 comment
Labels: question

#383 - How does the DCGM exporter work with DCGM?

Issue - State: closed - Opened by changhyuni 3 months ago - 3 comments
Labels: question

#382 - fix: edit gitignore and require dir & file

Pull Request - State: closed - Opened by kschoi93 3 months ago - 6 comments

#381 - Error with "make binary" operation in local development

Issue - State: open - Opened by kschoi93 3 months ago
Labels: bug

#380 - No DCGM_FI_DEV_FB_FREE reported for MIG-enabled GPUs

Issue - State: open - Opened by george-kuanli-peng 3 months ago
Labels: bug

#378 - failed to transform metrics for transform 'podMapper'

Issue - State: open - Opened by jicki 3 months ago
Labels: bug

#376 - Update contribution doc to require signing

Issue - State: open - Opened by chipzoller 3 months ago

#375 - Allow selecting the service's ClusterIP

Pull Request - State: closed - Opened by remram44 3 months ago - 6 comments

#374 - Rename 'secuity' to 'security'

Pull Request - State: closed - Opened by remram44 3 months ago - 6 comments

#370 - dcp metrics supports gpu architecture

Issue - State: closed - Opened by lxzjd 4 months ago - 4 comments
Labels: question

#369 - MIG device support for hpc_job metric labels

Issue - State: open - Opened by jbrobstw 4 months ago - 4 comments
Labels: enhancement

#368 - Start the recompiled dcgm-exporter fails to collect GPU metrics with an error

Issue - State: open - Opened by 15234660879 4 months ago - 3 comments
Labels: question

#367 - Let dcgm-exporter be a daemon

Issue - State: open - Opened by zvonkok 4 months ago - 5 comments
Labels: enhancement

#366 - DCGM-Exporter release version 3.3.7-3.5.0

Pull Request - State: closed - Opened by glowkey 4 months ago

#365 - Can't collecting DCP metrics

Issue - State: open - Opened by jeffreyyjp 4 months ago - 4 comments
Labels: bug

#364 - DCGM exporter image vulnerable to https://nvd.nist.gov/vuln/detail/CVE-2024-24790

Issue - State: open - Opened by alexglenn-ddl 4 months ago - 1 comment
Labels: question

#363 - dcgm-exporter dont show metrics from other namespaces and pods k8s

Issue - State: open - Opened by hive74 4 months ago - 12 comments
Labels: question

#362 - dcgm-exporter log: No Kubelet socket, ignoring

Issue - State: closed - Opened by jeffreyyjp 4 months ago - 2 comments
Labels: bug

#361 - Protobuf handling is incorrect

Issue - State: open - Opened by fbacchella 4 months ago - 2 comments
Labels: bug

#360 - dcgm-exporter crashes when run on Debian 12

Issue - State: closed - Opened by stevenmcastano 4 months ago - 1 comment
Labels: bug

#359 - Make nvidia resource names configurable

Pull Request - State: closed - Opened by lx1036 5 months ago - 1 comment

#357 - Rename default PCIe metrics for better readability

Pull Request - State: closed - Opened by koshieguchi 5 months ago - 1 comment

#356 - Seeking community feedback on potential new feature: Standardize labels for next major release

Issue - State: open - Opened by glowkey 5 months ago - 6 comments
Labels: enhancement
