Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/dcgm-exporter issues and pull requests
#257 - Attributing GPU power among MIG instances.
Issue -
State: closed - Opened by fali007 12 months ago
- 6 comments
#242 - Segfault (SEGV) when upgrading 3.2.0 to 3.3.0
Issue -
State: closed - Opened by biz812 about 1 year ago
- 15 comments
#238 - Collect container name even when not using K8S
Issue -
State: open - Opened by BryanQuigley about 1 year ago
- 17 comments
Labels: enhancement
#199 - make kubelet pod-resources socket directory configurable
Pull Request -
State: closed - Opened by zclyne over 1 year ago
- 3 comments
#176 - Reduce Docker image size
Issue -
State: closed - Opened by ralbertazzi over 1 year ago
- 1 comment
#170 - Helm: exporter-metrics-config-map should not be applied by default
Issue -
State: closed - Opened by maingoh over 1 year ago
Labels: enhancement
#169 - How to mount the necessary nvswitch devices and libnvidia-nscq files within the dcgm-exporter container.
Issue -
State: closed - Opened by lyj8330328 over 1 year ago
- 5 comments
#164 - Occasional metric loss and hangs in DCGM Exporter
Issue -
State: closed - Opened by zlseu-edu over 1 year ago
- 7 comments
#158 - Does Tesla P40 support DCP metrics (DCGM_FI_PROF_*)?
Issue -
State: open - Opened by asskss almost 2 years ago
- 2 comments
#157 - ecc errors metrics
Issue -
State: open - Opened by jaywlm almost 2 years ago
- 1 comment
#156 - Update go-dcgm bindings
Pull Request -
State: closed - Opened by glowkey almost 2 years ago
#155 - can support nvlink/nvswitch throughput metrics?
Issue -
State: open - Opened by faryang-sh almost 2 years ago
- 3 comments
#154 - kubernetes cluster deployment nvidia/dcgm-exporter:3.1.7-3.1.4-ubuntu20.04, container always quits
Issue -
State: closed - Opened by sanmv almost 2 years ago
#153 - Remove unused mapPodMetrics helm chart setting
Pull Request -
State: closed - Opened by brannondorsey almost 2 years ago
#152 - Does DCP metric (DCGM_FI_PROF_*) support RTX 3090 GPUs?
Issue -
State: open - Opened by asaderasxyz almost 2 years ago
- 2 comments
#151 - no metrics labels about pod namespace/name when Pod uses time slicing GPU
Issue -
State: open - Opened by quanguachong almost 2 years ago
- 1 comment
#150 - metric label pod, namespace empty
Issue -
State: open - Opened by lppsuixn almost 2 years ago
- 3 comments
#149 - Bump version to 3.1.7-3.1.4
Pull Request -
State: closed - Opened by glowkey almost 2 years ago
#148 - Error watching fields: The third-party Profiling module returned an unrecoverable error
Issue -
State: open - Opened by AlanFokCo almost 2 years ago
- 2 comments
#147 - Why is DCGM_FI_DEV_MEM_COPY_UTIL not equal to DCGM_FI_DEV_FB_USED/(DCGM_FI_DEV_FB_FREE+DCGM_FI_DEV_FB_USED)?
Issue -
State: closed - Opened by IsQiao almost 2 years ago
- 4 comments
#146 - Grafana dashboard: fix GPU Power Total
Pull Request -
State: closed - Opened by fschlich almost 2 years ago
#145 - msg="Failed to collect metrics with error: Failed to transform metrics for transform unsupported KubernetesGPUIDType for MetricID 'device_name': podMapper"
Issue -
State: closed - Opened by suchisur almost 2 years ago
- 1 comment
#144 - Not able to obtain per process GPU Utilization, no pods except dcgm-exporter itself available in the metrics collected. We are using Time Slicing GPU sharing between two pods on a single GPU node.
Issue -
State: open - Opened by suchisur almost 2 years ago
- 7 comments
#143 - Running dcgm exporter without root privileges
Issue -
State: open - Opened by thekuffs almost 2 years ago
#142 - go mod is outdated
Pull Request -
State: closed - Opened by sozercan almost 2 years ago
#141 - dcgm-exporter vulnerable to CVE-2022-27664
Issue -
State: closed - Opened by MyStarInYourSky almost 2 years ago
- 2 comments
#140 - How to stop dcgm-exporter from collecting metrics after pod termination?
Issue -
State: open - Opened by devnjw almost 2 years ago
- 4 comments
#139 - Getting Metric not enabled on DCP metric
Issue -
State: open - Opened by avickars almost 2 years ago
- 1 comment
#138 - Pod label not coming up for some pod
Issue -
State: closed - Opened by tsingh-asapp almost 2 years ago
- 2 comments
#137 - Kernel panic when running on GKE
Issue -
State: closed - Opened by fredr about 2 years ago
- 2 comments
#136 - Memory Metrics Incorrect
Issue -
State: closed - Opened by choyuansu about 2 years ago
- 2 comments
#135 - Bump version to 3.1.6-3.1.3
Pull Request -
State: closed - Opened by glowkey about 2 years ago
#134 - Issue 133 - remove kubernetes transforms for links and switches
Pull Request -
State: closed - Opened by glowkey about 2 years ago
#133 - `dcgm-exporter` panics while attempting to associate metrics with pods
Issue -
State: closed - Opened by cjgibson about 2 years ago
- 5 comments
#132 - Broken go.mod
Issue -
State: open - Opened by starry91 about 2 years ago
- 3 comments
#131 - Filtering deployment to certain nodes
Issue -
State: closed - Opened by coleary-hyperscience about 2 years ago
- 2 comments
#130 - Enable x-content-type-options in http header
Pull Request -
State: closed - Opened by glowkey about 2 years ago
#129 - Use NODE_NAME env instead of hostname (which is podname) for the metrics
Pull Request -
State: closed - Opened by shivamerla about 2 years ago
- 1 comment
#128 - Align framebuffer panel legend
Pull Request -
State: closed - Opened by doronkg about 2 years ago
#127 - nvidia.com/gpu: 1
Issue -
State: open - Opened by hecheng64 about 2 years ago
- 6 comments
#126 - Fatal error while running DCGM Exporter on AKS
Issue -
State: closed - Opened by harjitdotsingh about 2 years ago
- 7 comments
#125 - Error with chart install
Issue -
State: open - Opened by tapter-mwm about 2 years ago
- 12 comments
#124 - Fix for filtering VGPU metrics (Issue #123)
Pull Request -
State: closed - Opened by glowkey about 2 years ago
#123 - DCGM_FI_DEV_VGPU_LICENSE_STATUS missing in latest version of exporter
Issue -
State: open - Opened by sidewinder12s about 2 years ago
- 1 comment
#122 - Add optional TLS support for exporter
Pull Request -
State: open - Opened by gmintoco about 2 years ago
#121 - Update to DCGM 3.1.3
Pull Request -
State: closed - Opened by glowkey about 2 years ago
#120 - Update daemonset.yaml
Pull Request -
State: closed - Opened by reyvonger about 2 years ago
#119 - fix: convert config map without removing comments
Pull Request -
State: closed - Opened by LetFu about 2 years ago
#118 - The configmap does not work properly when there are comments in it
Issue -
State: closed - Opened by LetFu about 2 years ago
#117 - Update go bindings, remove nvlink status workaround
Pull Request -
State: closed - Opened by glowkey over 2 years ago
#116 - The way to understand additional dcgm-exporter Prometheus metric type
Issue -
State: closed - Opened by k0nstantinv over 2 years ago
- 2 comments
Labels: documentation
#115 - feat: add custom hostname CLI and env parameter
Pull Request -
State: closed - Opened by rcbop over 2 years ago
- 2 comments
#114 - Enable some commented by default metrics
Issue -
State: closed - Opened by esparig over 2 years ago
- 5 comments
#113 - Enable nvswitch/nvlink metric support
Pull Request -
State: closed - Opened by glowkey over 2 years ago
#112 - Exit when specified configmap isn't available #111
Pull Request -
State: closed - Opened by glowkey over 2 years ago
#111 - Exporter should exit when specified configmap isn't available
Issue -
State: closed - Opened by glowkey over 2 years ago
#110 - Exit if the hostengine connection goes down
Pull Request -
State: closed - Opened by glowkey over 2 years ago
#109 - Exporting processes with DCGM
Issue -
State: open - Opened by saifhaq over 2 years ago
- 1 comment
#108 - Bump version to 3.0.4-3.0.0
Pull Request -
State: closed - Opened by glowkey over 2 years ago
- 3 comments
#107 - process's SM Utilization is always lower than the gpu's SM Utilization
Issue -
State: open - Opened by seanchen022 over 2 years ago
#106 - Bump version to 2.4.7-2.6.11
Pull Request -
State: closed - Opened by glowkey over 2 years ago
#105 - DCGM_FI_DEV_GPU_UTIL doesn't show up with A100 GPU in MIG mode
Issue -
State: open - Opened by cy-zheng over 2 years ago
- 3 comments
Labels: documentation
#104 - Enable serviceMonitor support prometheus relabelings
Pull Request -
State: closed - Opened by kindomLee over 2 years ago
- 2 comments
#103 - Metric DCGM_FI_DEV_FB_FREE is not exported as part of 2.4.5-2.6.7-ubuntu20.04 image
Issue -
State: open - Opened by omer-dayan over 2 years ago
- 2 comments
#102 - Allow setting runtimeClassName
Issue -
State: open - Opened by murata-yu over 2 years ago
- 3 comments
#101 - dcgm-exporter docker fails to start on Jetson
Issue -
State: open - Opened by tom-pleno over 2 years ago
- 1 comment
#100 - Dashboard reports no data
Issue -
State: closed - Opened by catid over 2 years ago
- 2 comments
Labels: question
#99 - Export Kubernetes Labels with Pods
Issue -
State: open - Opened by alex-g-tejada over 2 years ago
- 5 comments
Labels: enhancement
#98 - GPU Tesla T4: which tools support monitoring the GPU memory of business processes?
Issue -
State: open - Opened by kelonsen over 2 years ago
- 2 comments
Labels: question
#97 - Fix propagation of pod labels on GKE with MIG devices by scanning for GKE device ID format
Pull Request -
State: closed - Opened by suffiank over 2 years ago
- 2 comments
#96 - GPU freezes when dcgm-exporter is used
Issue -
State: closed - Opened by skraga over 2 years ago
- 7 comments
Labels: bug, question
#95 - How to interpret nvlink metrics and xid error value behaviour
Issue -
State: open - Opened by Omoong over 2 years ago
#94 - Metric about compute apps
Issue -
State: open - Opened by onstring over 2 years ago
- 2 comments
Labels: enhancement, question
#93 - Applying the latest dcgm-exporter some issues with the exporter container
Issue -
State: closed - Opened by amrragab8080 over 2 years ago
- 5 comments
Labels: question, wontfix
#92 - Pod labels are not propagated for MIGs on GKE [Flag "..._GPU_ID_TYPE" has no effect for MIG devices]
Issue -
State: closed - Opened by suffiank over 2 years ago
- 3 comments
Labels: enhancement
#91 - Bump version to 2.4.6-2.6.10
Pull Request -
State: closed - Opened by glowkey over 2 years ago
#90 - [Dashboard - BUG] Grafana dashboard: ${DS_PROMETHEUS} - not found
Issue -
State: open - Opened by awoimbee over 3 years ago
Labels: question, inactive
#89 - Add support for string fields as labels
Pull Request -
State: closed - Opened by bmerry over 2 years ago
- 6 comments
#88 - Fix the type of the PCIE_TX/RX metrics and provide more accurate description.
Pull Request -
State: closed - Opened by nikkon-dev over 2 years ago
- 3 comments
Labels: bug, documentation
#87 - how to interpret DCGM_FI_PROF_PCIE_TX_BYTES metric
Issue -
State: open - Opened by Omoong over 2 years ago
- 5 comments
Labels: bug, documentation
#86 - 82 - query supported metric groups and skip unsupported
Pull Request -
State: closed - Opened by glowkey over 2 years ago
#85 - No exported_pod in metrics
Issue -
State: open - Opened by Muscule over 2 years ago
- 8 comments
#84 - GPU freezes when dcgm-exporter is SIGKILL'd
Issue -
State: open - Opened by mac-chaffee over 2 years ago
- 12 comments
#83 - Metric DCGM_FI_DEV_FB_RESERVED does not appear to be reported by dcgm-exporter (2.4.6-2.6.9)
Issue -
State: open - Opened by hassanbabaie over 2 years ago
- 5 comments
#82 - Error with unsupported new metrics on V100 GPU's
Issue -
State: open - Opened by hassanbabaie over 2 years ago
- 4 comments
#81 - Bump version to 2.4.6-2.6.9
Pull Request -
State: closed - Opened by glowkey over 2 years ago
#80 - Issue running 2.4.6-2.6.8
Issue -
State: closed - Opened by hassanbabaie over 2 years ago
- 6 comments
#79 - Bump version to 2.4.6-2.6.8
Pull Request -
State: closed - Opened by glowkey over 2 years ago
#78 - Add Kubernetes node name to exported labels
Issue -
State: open - Opened by neggert over 2 years ago
- 6 comments
#77 - Will DCGM_FI_DEV_FB_USED_PERCENT be coming to the exporter in the next release
Issue -
State: open - Opened by hassanbabaie over 2 years ago
- 1 comment
#76 - Allow specifying honorLabels in ServiceMonitor spec
Pull Request -
State: closed - Opened by chrissng over 2 years ago
#75 - Support export of string metrics as labels
Pull Request -
State: closed - Opened by bmerry over 2 years ago
- 2 comments
#74 - Fix a typo in an error message
Pull Request -
State: closed - Opened by bmerry over 2 years ago
#73 - Fix link to field identifiers
Pull Request -
State: closed - Opened by bmerry over 2 years ago
- 6 comments
#72 - Support for reporting driver version
Issue -
State: closed - Opened by bmerry over 2 years ago
- 12 comments
#71 - Latest Release bugs 2.4.5-2.6.7 - metrics missing
Issue -
State: closed - Opened by hassanbabaie over 2 years ago
- 5 comments
#70 - Bump version to 2.4.5-2.6.7
Pull Request -
State: closed - Opened by glowkey over 2 years ago
#68 - TYPE DCGM_FI_PROF_ metrics value issue
Issue -
State: open - Opened by Omoong over 2 years ago
- 12 comments