Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/dcgm-exporter issues and pull requests

#352 - cannot get DCGM_FI_PROF_SM_ACTIVE metrics

Issue - State: open - Opened by qingfenghcy 5 months ago - 1 comment
Labels: question

#351 - [Helm] Enable custom metrics, mount ConfigMap by default

Pull Request - State: closed - Opened by chipzoller 5 months ago - 32 comments

#350 - [Helm] Enable ConfigMap mount by default

Pull Request - State: closed - Opened by chipzoller 5 months ago - 8 comments

#348 - GPU Failure Detection and Alerting Enhancement

Issue - State: open - Opened by jz543fm 5 months ago - 14 comments
Labels: enhancement

#347 - Cannot Retrieve GPU PIDs from DCGM Metrics

Issue - State: closed - Opened by doronkg 5 months ago - 4 comments
Labels: question

#346 - fix: correct metric help text

Pull Request - State: closed - Opened by pintohutch 5 months ago - 1 comment

#345 - DCGM_FI_DEV_MEM_COPY_UTIL not correct always 1 or 2

Issue - State: closed - Opened by xuchenCN 5 months ago - 3 comments
Labels: bug

#344 - How to install dcgm-exporter on Windows Server?

Issue - State: closed - Opened by LittleNewton 5 months ago - 6 comments
Labels: question

#343 - How to obtain the namespace , pod and container data

Issue - State: closed - Opened by aikikia 5 months ago - 6 comments
Labels: question

#342 - `namespace` and `pod` labels are sometimes missing from metrics

Issue - State: open - Opened by Altair-Bueno 6 months ago - 16 comments
Labels: bug

#340 - exported_pod cause issue with query -> every sample a different metrics

Issue - State: open - Opened by amir-bialek 6 months ago - 3 comments
Labels: question

#339 - can I get computeRunningProcesses and graphicsRunningProcesses this two metrics??

Issue - State: closed - Opened by suxwang 6 months ago - 1 comment
Labels: bug

#338 - config csv DCGM_FI_DEV_CORRECTABLE_REMAPPED_ROWS, but cannot get on metrics

Issue - State: closed - Opened by suxwang 6 months ago - 2 comments
Labels: bug

#337 - I can't get the following metrics, but I've set the environment variable

Issue - State: closed - Opened by kameriso-zga 6 months ago - 6 comments
Labels: question

#336 - nvlink metrics are not available on the gh200 gpu node

Issue - State: open - Opened by AnjirwalaAnuj 6 months ago - 2 comments
Labels: question

#335 - https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/x86_64/ is not signed

Issue - State: closed - Opened by jjziets 6 months ago - 2 comments
Labels: bug

#329 - Could not enable kubernetes metric collection: nvml: Unknown Error

Issue - State: open - Opened by 287400117 6 months ago - 2 comments
Labels: bug

#326 - feat: add pci_bus_id label for metrics

Pull Request - State: closed - Opened by fungaren 6 months ago - 5 comments

#321 - Cannot build from source

Issue - State: closed - Opened by jz543fm 7 months ago - 9 comments
Labels: bug

#318 - SIGSEGV: segmentation violation

Issue - State: closed - Opened by amybachir 7 months ago - 9 comments
Labels: bug

#317 - Failed to add DCGM_EXP_CLOCK_EVENTS_COUNT

Issue - State: open - Opened by CodeBrek 7 months ago - 8 comments
Labels: bug

#316 - Missing NVLINK bandwidth metrics in dcgm-exporter

Issue - State: open - Opened by jz543fm 7 months ago - 6 comments
Labels: bug

#314 - The pod for a given GPU in k8s mode cannot be captured

Issue - State: open - Opened by rokkiter 8 months ago - 8 comments
Labels: enhancement

#312 - Extremely high GPU temperature reported by dcgm-exporter

Issue - State: closed - Opened by age9990 8 months ago - 7 comments
Labels: bug

#308 - Support collect detail error message with the xid

Issue - State: closed - Opened by zhucan 8 months ago - 1 comment
Labels: enhancement

#293 - How to get current device MIG model is single or mixed?

Issue - State: open - Opened by lengrongfu 8 months ago - 3 comments
Labels: question, action_required_from_requester

#288 - Fix power calculation

Pull Request - State: open - Opened by Apsu 8 months ago - 2 comments

#272 - Expose Container info for MIG enabled GPU

Issue - State: open - Opened by krishh85 9 months ago - 86 comments
Labels: bug

#271 - sum of DCGM_FI_DEV_FB_USED and DCGM_FI_DEV_FB_FREE is not const

Issue - State: open - Opened by ccding 9 months ago - 6 comments

#257 - Attributing GPU power among MIG instances.

Issue - State: closed - Opened by fali007 9 months ago - 6 comments

#242 - Segfault (SEGV) when upgrading 3.2.0 to 3.3.0

Issue - State: closed - Opened by biz812 10 months ago - 15 comments

#238 - Collect container name even when not using K8S

Issue - State: open - Opened by BryanQuigley 10 months ago - 17 comments
Labels: enhancement

#199 - make kubelet pod-resources socket directory configurable

Pull Request - State: closed - Opened by zclyne about 1 year ago - 3 comments

#170 - Helm: exporter-metrics-config-map should not be applied by default

Issue - State: closed - Opened by maingoh over 1 year ago
Labels: enhancement

#164 - Occasional metric loss and hangs in DCGM Exporter

Issue - State: closed - Opened by zlseu-edu over 1 year ago - 7 comments

#158 - Does Tesla P40 support DCP metric (DCGM-FI_PROF_ *)?

Issue - State: open - Opened by asskss over 1 year ago - 2 comments

#157 - ecc errors metrics

Issue - State: open - Opened by jaywlm over 1 year ago - 1 comment

#156 - Update go-dcgm bindings

Pull Request - State: closed - Opened by glowkey over 1 year ago

#155 - can support nvlink/nvswitch throughput metrics?

Issue - State: open - Opened by faryang-sh over 1 year ago - 3 comments

#153 - Remove unused mapPodMetrics helm chart setting

Pull Request - State: closed - Opened by brannondorsey over 1 year ago

#152 - Does DCP metric (DCGM_FI_PROF_*)support RTX 3090 GPUs?

Issue - State: open - Opened by asaderasxyz over 1 year ago - 2 comments

#150 - metric label pod、namespace empty

Issue - State: open - Opened by lppsuixn over 1 year ago - 3 comments

#149 - Bump version to 3.1.7-3.1.4

Pull Request - State: closed - Opened by glowkey over 1 year ago

#146 - Grafana dashboard: fix GPU Power Total

Pull Request - State: closed - Opened by fschlich over 1 year ago

#143 - Running dcgm exporter without root privileges

Issue - State: open - Opened by thekuffs over 1 year ago

#142 - go mod is outdated

Pull Request - State: closed - Opened by sozercan over 1 year ago

#141 - dcgm-exporter vulnerable to CVE-2022-27664

Issue - State: closed - Opened by MyStarInYourSky over 1 year ago - 2 comments

#140 - How to stop dcgm-exporter from collecting metrics after pod termination?

Issue - State: open - Opened by devnjw over 1 year ago - 4 comments

#139 - Getting Metric not enabled on DCP metric

Issue - State: open - Opened by avickars almost 2 years ago - 1 comment

#138 - Pod label not coming up for some pod

Issue - State: closed - Opened by tsingh-asapp almost 2 years ago - 2 comments

#137 - Kernel panic when running on GKE

Issue - State: closed - Opened by fredr almost 2 years ago - 2 comments

#136 - Memory Metrics Incorrect

Issue - State: closed - Opened by choyuansu almost 2 years ago - 2 comments

#135 - Bump version to 3.1.6-3.1.3

Pull Request - State: closed - Opened by glowkey almost 2 years ago

#134 - Issue 133 - remove kubernetes transforms for links and switches

Pull Request - State: closed - Opened by glowkey almost 2 years ago

#133 - `dcgm-exporter` panics while attempting to associate metrics with pods

Issue - State: closed - Opened by cjgibson almost 2 years ago - 5 comments

#132 - Broken go.mod

Issue - State: open - Opened by starry91 almost 2 years ago - 3 comments

#131 - Filtering deployment to certain nodes

Issue - State: closed - Opened by coleary-hyperscience almost 2 years ago - 2 comments

#130 - Enable x-content-type-options in http header

Pull Request - State: closed - Opened by glowkey almost 2 years ago