Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/dcgm-exporter issues and pull requests
#425 - Memory usage increased 2.25x after upgrading from 3.3.6-3.4.2 to 3.3.9-3.6.1
Issue -
State: open - Opened by age9990 7 days ago
- 3 comments
#424 - 3.6.1 Helm chart not found in repository
Issue -
State: closed - Opened by otherguy 9 days ago
- 2 comments
Labels: bug
#423 - Support collecting pod labels
Issue -
State: open - Opened by mtparet 9 days ago
- 1 comment
Labels: enhancement
#422 - DCGM is not getting loaded
Issue -
State: open - Opened by Pryz 10 days ago
- 3 comments
Labels: bug
#421 - Understand how exporter is able to query metrics
Issue -
State: closed - Opened by Indresh2410 10 days ago
- 2 comments
Labels: question
#420 - DCGM-Exporter release 3.3.9-3.6.1
Pull Request -
State: closed - Opened by glowkey 11 days ago
#419 - replace deprecated method grpc.DialContext in favour of grpc.NewClient
Pull Request -
State: open - Opened by tariq1890 11 days ago
#418 - DCGM_FI_DEV_GPU_UTIL abnormal point
Issue -
State: open - Opened by dafu-wu 12 days ago
- 2 comments
Labels: bug
#417 - dcgm-exporter counter value goes down
Issue -
State: open - Opened by luccabb 15 days ago
- 1 comment
Labels: bug
#416 - Not collecting GPU metrics; Error getting devices count: Cannot perform the requested operation because NVML doesn't exist on this system
Issue -
State: open - Opened by saichanumolu9 15 days ago
Labels: question
#415 - Checksum mismatch for github.com/emicklei/go-restful/[email protected]
Issue -
State: open - Opened by WilliamVenner 22 days ago
- 1 comment
Labels: bug
#414 - Compiled locally, server runs, fails
Issue -
State: closed - Opened by basi-a 24 days ago
Labels: question
#413 - fix: bump grpc dependency to 1.64.1
Pull Request -
State: closed - Opened by pintohutch 25 days ago
#412 - Segfaults with dcgm-exporter 3.3.0 and higher
Issue -
State: open - Opened by andrewjamesbrown 30 days ago
- 4 comments
Labels: bug
#411 - Pod and Namespace Labels Missing in dcgm-exporter Metrics
Issue -
State: open - Opened by qimike 30 days ago
- 3 comments
#410 - Can dcgm-export be used with Apptainer instead of Docker?
Issue -
State: closed - Opened by sorenwacker about 1 month ago
- 3 comments
Labels: question
#409 - Segmentation fault when running with the default configuration for the GPU Operator on kind
Issue -
State: open - Opened by klueska about 1 month ago
- 2 comments
Labels: bug
#408 - failed to transform metrics for transform 'podMapper'; err: failure getting pod resources;
Issue -
State: open - Opened by jicki about 1 month ago
Labels: bug
#407 - Why SYS_ADMIN is required?
Issue -
State: open - Opened by Yvanll about 1 month ago
- 1 comment
#406 - Fix Helm Templates Generation
Pull Request -
State: closed - Opened by Indresh2410 about 1 month ago
- 4 comments
#405 - Helm templates not getting populated when built from source
Issue -
State: closed - Opened by Indresh2410 about 1 month ago
Labels: bug
#404 - Is RTX4090 supported?
Issue -
State: closed - Opened by fzyzcjy about 1 month ago
- 2 comments
#403 - Maintain uniformity with helm chart and static yaml's by adding securityContext
Pull Request -
State: closed - Opened by Indresh2410 about 1 month ago
- 1 comment
#402 - Maintain uniformity with helm chart and static yaml's
Issue -
State: closed - Opened by Indresh2410 about 1 month ago
- 10 comments
Labels: bug
#401 - can exporter the uce error?
Issue -
State: open - Opened by zhucan about 1 month ago
- 1 comment
Labels: bug
#400 - Overhead of Enabling `DCGM_FI_PROF_SM_ACTIVE` and `DCGM_FI_PROF_SM_OCCUPANCY` Metrics
Issue -
State: closed - Opened by hongpeng-guo about 1 month ago
- 2 comments
Labels: question
#399 - I want to see how many GPU cores have been allocated to each container through metrics.
Issue -
State: open - Opened by changhyuni about 2 months ago
Labels: enhancement
#398 - INFO[0000] Not collecting DCP metrics: This request is serviced by a module of DCGM that is not currently loaded
Issue -
State: open - Opened by fortminors about 2 months ago
- 5 comments
#397 - can not collect gpu utilization metric when mig enable for some pods
Issue -
State: open - Opened by melikeiremguler about 2 months ago
- 1 comment
Labels: bug
#396 - doc: golang >= 1.23 is required
Pull Request -
State: closed - Opened by stas00 about 2 months ago
- 2 comments
#395 - DCGM_FI_PROF_GR_ENGINE_ACTIVE not emitted on system with more than one GPU
Issue -
State: closed - Opened by chipzoller about 2 months ago
- 2 comments
Labels: question
#394 - Bug with DCGM_FI_DEV_VGPU_INSTANCE_IDS metric
Issue -
State: closed - Opened by Deezzir 2 months ago
- 7 comments
Labels: bug
#393 - dcgm-exporter daemonset Startup error Failed to pass the health check
Issue -
State: open - Opened by guoliangmiao 2 months ago
- 2 comments
Labels: question
#392 - In the case of gpu pass-through, does dcgm-exporter on the physical host support capturing gpu metrics of kvm virtual machines?
Issue -
State: open - Opened by lddlww 2 months ago
- 1 comment
Labels: question
#391 - Service monitor API value configurable
Pull Request -
State: closed - Opened by dtzar 2 months ago
#390 - DCGM-Exporter release 3.3.8-3.6.0
Pull Request -
State: closed - Opened by glowkey 2 months ago
#389 - Missing 3.3.8 builds
Issue -
State: closed - Opened by xnox 2 months ago
- 2 comments
Labels: bug
#388 - DCGM Exporter does not collect individual pod metrics when MPS is enabled in Kubernetes
Issue -
State: closed - Opened by valafon 2 months ago
- 1 comment
Labels: bug
#387 - DCGM Exporter in EKS p4d.24xlarge instance type controller error
Issue -
State: open - Opened by camilopaezrios 3 months ago
Labels: bug
#386 - DCGM Exporter in EKS p4d.24xlarge instance type controller error
Issue -
State: open - Opened by camilopaezrios 3 months ago
#385 - DCGM-exporter pods stuck in Running State, Not getting Ready without GPU allocation.
Issue -
State: open - Opened by rohitreddy1698 3 months ago
- 12 comments
Labels: question
#384 - Add a health status metric for every gpu card
Issue -
State: open - Opened by lx1036 3 months ago
- 1 comment
Labels: question
#383 - How does the DCGM exporter work with DCGM?
Issue -
State: closed - Opened by changhyuni 3 months ago
- 3 comments
Labels: question
#382 - fix: edit gitignore and require dir & file
Pull Request -
State: closed - Opened by kschoi93 3 months ago
- 6 comments
#381 - Error with "make binary" operation in local development
Issue -
State: open - Opened by kschoi93 3 months ago
Labels: bug
#380 - No DCGM_FI_DEV_FB_FREE reported for MIG-enabled GPUs
Issue -
State: open - Opened by george-kuanli-peng 3 months ago
Labels: bug
#379 - Getting "Error from server (NotFound): the server could not find the metric DCGM_FI_DEV_GPU_UTIL for pods",I am not getting DCGM_FI_DEV_GPU_UTIL metrics from prometheus
Issue -
State: open - Opened by Vijaygawate 3 months ago
- 2 comments
Labels: question
#378 - failed to transform metrics for transform 'podMapper'
Issue -
State: open - Opened by jicki 3 months ago
Labels: bug
#377 - How does dcgm-exporter, when running on k8s as a daemonset, communicate with the host's dcgm host engine?
Issue -
State: open - Opened by yx-lamini 3 months ago
Labels: question
#376 - Update contribution doc to require signing
Issue -
State: open - Opened by chipzoller 3 months ago
#375 - Allow selecting the service's ClusterIP
Pull Request -
State: closed - Opened by remram44 4 months ago
- 6 comments
#374 - Rename 'secuity' to 'security'
Pull Request -
State: closed - Opened by remram44 4 months ago
- 6 comments
#373 - The pod and namespace information in the monitoring indicators of some Gpus occupied by Pods is empty
Issue -
State: open - Opened by qingfenghcy 4 months ago
Labels: bug
#372 - time="2024-08-08T03:09:05Z" level=error msg="Failed to write response." error="write tcp 10.202.3.1:9400->10.202.2.2:49674: i/o timeout
Issue -
State: open - Opened by safeAndSound3 4 months ago
Labels: bug
#371 - Start the recompiled dcgm-exporter fails to collect GPU metrics with an error
Issue -
State: open - Opened by 15234660879 4 months ago
Labels: question
#370 - dcp metrics supports gpu architecture
Issue -
State: closed - Opened by lxzjd 4 months ago
- 4 comments
Labels: question
#369 - MIG device support for hpc_job metric labels
Issue -
State: open - Opened by jbrobstw 4 months ago
- 4 comments
Labels: enhancement
#368 - Start the recompiled dcgm-exporter fails to collect GPU metrics with an error
Issue -
State: open - Opened by 15234660879 4 months ago
- 3 comments
Labels: question
#367 - Let dcgm-exporter be a daemon
Issue -
State: open - Opened by zvonkok 4 months ago
- 5 comments
Labels: enhancement
#366 - DCGM-Exporter release version 3.3.7-3.5.0
Pull Request -
State: closed - Opened by glowkey 4 months ago
#365 - Can't collecting DCP metrics
Issue -
State: open - Opened by jeffreyyjp 4 months ago
- 4 comments
Labels: bug
#364 - DCGM exporter image vulnerable to https://nvd.nist.gov/vuln/detail/CVE-2024-24790
Issue -
State: open - Opened by alexglenn-ddl 4 months ago
- 1 comment
Labels: question
#363 - dcgm-exporter dont show metrics from other namespaces and pods k8s
Issue -
State: open - Opened by hive74 4 months ago
- 12 comments
Labels: question
#362 - dcgm-exporter log: No Kubelet socket, ignoring
Issue -
State: closed - Opened by jeffreyyjp 4 months ago
- 2 comments
Labels: bug
#361 - Protobuf handling is incorrect
Issue -
State: open - Opened by fbacchella 4 months ago
- 2 comments
Labels: bug
#360 - dcgm-exporter crashes when run on Debian 12
Issue -
State: closed - Opened by stevenmcastano 5 months ago
- 1 comment
Labels: bug
#359 - Make nvidia resource names configurable
Pull Request -
State: closed - Opened by lx1036 5 months ago
- 1 comment