Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/dcgm-exporter issues and pull requests

#257 - Attributing GPU power among MIG instances.

Issue - State: closed - Opened by fali007 12 months ago - 6 comments

#242 - Segfault (SEGV) when upgrading 3.2.0 to 3.3.0

Issue - State: closed - Opened by biz812 about 1 year ago - 15 comments

#238 - Collect container name even when not using K8S

Issue - State: open - Opened by BryanQuigley about 1 year ago - 17 comments
Labels: enhancement

#199 - make kubelet pod-resources socket directory configurable

Pull Request - State: closed - Opened by zclyne over 1 year ago - 3 comments

#176 - Reduce Docker image size

Issue - State: closed - Opened by ralbertazzi over 1 year ago - 1 comment

#170 - Helm: exporter-metrics-config-map should not be applied by default

Issue - State: closed - Opened by maingoh over 1 year ago
Labels: enhancement

#164 - Occasional metric loss and hangs in DCGM Exporter

Issue - State: closed - Opened by zlseu-edu over 1 year ago - 7 comments

#158 - Does Tesla P40 support DCP metric (DCGM_FI_PROF_*)?

Issue - State: open - Opened by asskss almost 2 years ago - 2 comments

#157 - ecc errors metrics

Issue - State: open - Opened by jaywlm almost 2 years ago - 1 comment

#156 - Update go-dcgm bindings

Pull Request - State: closed - Opened by glowkey almost 2 years ago

#155 - can support nvlink/nvswitch throughput metrics?

Issue - State: open - Opened by faryang-sh almost 2 years ago - 3 comments

#153 - Remove unused mapPodMetrics helm chart setting

Pull Request - State: closed - Opened by brannondorsey almost 2 years ago

#152 - Does DCP metric (DCGM_FI_PROF_*) support RTX 3090 GPUs?

Issue - State: open - Opened by asaderasxyz almost 2 years ago - 2 comments

#151 - no metrics labels about pod namespace/name when Pod uses time slicing GPU

Issue - State: open - Opened by quanguachong almost 2 years ago - 1 comment

#150 - metric label pod, namespace empty

Issue - State: open - Opened by lppsuixn almost 2 years ago - 3 comments

#149 - Bump version to 3.1.7-3.1.4

Pull Request - State: closed - Opened by glowkey almost 2 years ago

#146 - Grafana dashboard: fix GPU Power Total

Pull Request - State: closed - Opened by fschlich almost 2 years ago

#143 - Running dcgm exporter without root privileges

Issue - State: open - Opened by thekuffs almost 2 years ago

#142 - go mod is outdated

Pull Request - State: closed - Opened by sozercan almost 2 years ago

#141 - dcgm-exporter vulnerable to CVE-2022-27664

Issue - State: closed - Opened by MyStarInYourSky almost 2 years ago - 2 comments

#140 - How to stop dcgm-exporter from collecting metrics after pod termination?

Issue - State: open - Opened by devnjw almost 2 years ago - 4 comments

#139 - Getting Metric not enabled on DCP metric

Issue - State: open - Opened by avickars almost 2 years ago - 1 comment

#138 - Pod label not coming up for some pod

Issue - State: closed - Opened by tsingh-asapp almost 2 years ago - 2 comments

#137 - Kernel panic when running on GKE

Issue - State: closed - Opened by fredr about 2 years ago - 2 comments

#136 - Memory Metrics Incorrect

Issue - State: closed - Opened by choyuansu about 2 years ago - 2 comments

#135 - Bump version to 3.1.6-3.1.3

Pull Request - State: closed - Opened by glowkey about 2 years ago

#134 - Issue 133 - remove kubernetes transforms for links and switches

Pull Request - State: closed - Opened by glowkey about 2 years ago

#133 - `dcgm-exporter` panics while attempting to associate metrics with pods

Issue - State: closed - Opened by cjgibson about 2 years ago - 5 comments

#132 - Broken go.mod

Issue - State: open - Opened by starry91 about 2 years ago - 3 comments

#131 - Filtering deployment to certain nodes

Issue - State: closed - Opened by coleary-hyperscience about 2 years ago - 2 comments

#130 - Enable x-content-type-options in http header

Pull Request - State: closed - Opened by glowkey about 2 years ago

#129 - Use NODE_NAME env instead of hostname (which is podname) for the metrics

Pull Request - State: closed - Opened by shivamerla about 2 years ago - 1 comment

#128 - Align framebuffer panel legend

Pull Request - State: closed - Opened by doronkg about 2 years ago

#127 - nvidia.com/gpu: 1

Issue - State: open - Opened by hecheng64 about 2 years ago - 6 comments

#126 - Fatal error while running DCGM Exporter on AKS

Issue - State: closed - Opened by harjitdotsingh about 2 years ago - 7 comments

#125 - Error with chart install

Issue - State: open - Opened by tapter-mwm about 2 years ago - 12 comments

#124 - Fix for filtering VGPU metrics (Issue #123)

Pull Request - State: closed - Opened by glowkey about 2 years ago

#123 - DCGM_FI_DEV_VGPU_LICENSE_STATUS missing in latest version of exporter

Issue - State: open - Opened by sidewinder12s about 2 years ago - 1 comment

#122 - Add optional TLS support for exporter

Pull Request - State: open - Opened by gmintoco about 2 years ago

#121 - Update to DCGM 3.1.3

Pull Request - State: closed - Opened by glowkey about 2 years ago

#120 - Update daemonset.yaml

Pull Request - State: closed - Opened by reyvonger about 2 years ago

#119 - fix: convert config map without removing comments

Pull Request - State: closed - Opened by LetFu about 2 years ago

#117 - Update go bindings, remove nvlink status workaround

Pull Request - State: closed - Opened by glowkey over 2 years ago

#116 - The way to understand additional dcgm-exporter Prometheus metric type

Issue - State: closed - Opened by k0nstantinv over 2 years ago - 2 comments
Labels: documentation

#115 - feat: add custom hostname CLI and env parameter

Pull Request - State: closed - Opened by rcbop over 2 years ago - 2 comments

#114 - Enable some commented by default metrics

Issue - State: closed - Opened by esparig over 2 years ago - 5 comments

#113 - Enable nvswitch/nvlink metric support

Pull Request - State: closed - Opened by glowkey over 2 years ago

#112 - Exit when specified configmap isn't available #111

Pull Request - State: closed - Opened by glowkey over 2 years ago

#111 - Exporter should exit when specified configmap isn't available

Issue - State: closed - Opened by glowkey over 2 years ago

#110 - Exit if the hostengine connection goes down

Pull Request - State: closed - Opened by glowkey over 2 years ago

#109 - Exporting processes with DCGM

Issue - State: open - Opened by saifhaq over 2 years ago - 1 comment

#108 - Bump version to 3.0.4-3.0.0

Pull Request - State: closed - Opened by glowkey over 2 years ago - 3 comments

#106 - Bump version to 2.4.7-2.6.11

Pull Request - State: closed - Opened by glowkey over 2 years ago

#105 - DCGM_FI_DEV_GPU_UTIL doesn't show up with A100 GPU in MIG mode

Issue - State: open - Opened by cy-zheng over 2 years ago - 3 comments
Labels: documentation

#104 - Enable serviceMonitor support prometheus relabelings

Pull Request - State: closed - Opened by kindomLee over 2 years ago - 2 comments

#102 - Allow setting runtimeClassName

Issue - State: open - Opened by murata-yu over 2 years ago - 3 comments

#101 - dcgm-exporter docker fails to start on Jetson

Issue - State: open - Opened by tom-pleno over 2 years ago - 1 comment

#100 - Dashboard reports no data

Issue - State: closed - Opened by catid over 2 years ago - 2 comments
Labels: question

#99 - Export Kubernetes Labels with Pods

Issue - State: open - Opened by alex-g-tejada over 2 years ago - 5 comments
Labels: enhancement

#98 - GPU Tesla T4: which tools support monitoring the GPU memory of business processes?

Issue - State: open - Opened by kelonsen over 2 years ago - 2 comments
Labels: question

#97 - Fix propagation of pod labels on GKE with MIG devices by scanning for GKE device ID format

Pull Request - State: closed - Opened by suffiank over 2 years ago - 2 comments

#96 - GPU freezes when dcgm-exporter is used

Issue - State: closed - Opened by skraga over 2 years ago - 7 comments
Labels: bug, question

#95 - How to interpret nvlink metrics and xid error value behaviour

Issue - State: open - Opened by Omoong over 2 years ago

#94 - Metric about compute apps

Issue - State: open - Opened by onstring over 2 years ago - 2 comments
Labels: enhancement, question

#93 - Applying the latest dcgm-exporter some issues with the exporter container

Issue - State: closed - Opened by amrragab8080 over 2 years ago - 5 comments
Labels: question, wontfix

#92 - Pod labels are not propagated for MIGs on GKE [Flag "..._GPU_ID_TYPE" has no effect for MIG devices]

Issue - State: closed - Opened by suffiank over 2 years ago - 3 comments
Labels: enhancement

#91 - Bump version to 2.4.6-2.6.10

Pull Request - State: closed - Opened by glowkey over 2 years ago

#90 - [Dashboard - BUG] Grafana dashboard: ${DS_PROMETHEUS} - not found

Issue - State: open - Opened by awoimbee over 3 years ago
Labels: question, inactive

#89 - Add support for string fields as labels

Pull Request - State: closed - Opened by bmerry over 2 years ago - 6 comments

#88 - Fix the type of the PCIE_TX/RX metrics and provide more accurate description.

Pull Request - State: closed - Opened by nikkon-dev over 2 years ago - 3 comments
Labels: bug, documentation

#87 - how to interpret DCGM_FI_PROF_PCIE_TX_BYTES metric

Issue - State: open - Opened by Omoong over 2 years ago - 5 comments
Labels: bug, documentation

#86 - 82 - query supported metric groups and skip unsupported

Pull Request - State: closed - Opened by glowkey over 2 years ago

#85 - No exported_pod in metrics

Issue - State: open - Opened by Muscule over 2 years ago - 8 comments

#84 - GPU freezes when dcgm-exporter is SIGKILL'd

Issue - State: open - Opened by mac-chaffee over 2 years ago - 12 comments

#82 - Error with unsupported new metrics on V100 GPU's

Issue - State: open - Opened by hassanbabaie over 2 years ago - 4 comments

#81 - Bump version to 2.4.6-2.6.9

Pull Request - State: closed - Opened by glowkey over 2 years ago

#80 - Issue running 2.4.6-2.6.8

Issue - State: closed - Opened by hassanbabaie over 2 years ago - 6 comments

#79 - Bump version to 2.4.6-2.6.8

Pull Request - State: closed - Opened by glowkey over 2 years ago

#78 - Add Kubernetes node name to exported labels

Issue - State: open - Opened by neggert over 2 years ago - 6 comments

#76 - Allow specifying honorLabels in ServiceMonitor spec

Pull Request - State: closed - Opened by chrissng over 2 years ago

#75 - Support export of string metrics as labels

Pull Request - State: closed - Opened by bmerry over 2 years ago - 2 comments

#74 - Fix a typo in an error message

Pull Request - State: closed - Opened by bmerry over 2 years ago

#73 - Fix link to field identifiers

Pull Request - State: closed - Opened by bmerry over 2 years ago - 6 comments

#72 - Support for reporting driver version

Issue - State: closed - Opened by bmerry over 2 years ago - 12 comments

#71 - Latest Release bugs 2.4.5-2.6.7 - metrics missing

Issue - State: closed - Opened by hassanbabaie over 2 years ago - 5 comments

#70 - Bump version to 2.4.5-2.6.7

Pull Request - State: closed - Opened by glowkey over 2 years ago

#68 - TYPE DCGM_FI_PROF_ metrics value issue

Issue - State: open - Opened by Omoong over 2 years ago - 12 comments