Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / NVIDIA/deepops issues and pull requests

#1319 - vagrant_startup.sh ubuntu 22.04 update

Issue - State: closed - Opened by hpcadmins 4 months ago - 1 comment
Labels: no-issue-activity

#1318 - Increase KillWait to 120 in slurm.conf

Pull Request - State: closed - Opened by ilya-da 5 months ago

#1317 - KillWait default in slurm slurm

Issue - State: closed - Opened by ilya-da 5 months ago - 2 comments
Labels: no-issue-activity

#1316 - Update GPU process cleanup logic in SLURM epilog script

Pull Request - State: closed - Opened by ilya-da 5 months ago

#1315 - fetching PIDs for timeout jobs for cleanup sometimes fail to kill processes

Issue - State: closed - Opened by ilya-da 5 months ago - 2 comments
Labels: no-issue-activity

#1313 - deepops 24.08?

Issue - State: closed - Opened by mathrock74 5 months ago - 3 comments

#1312 - Unable to install some galaxy collections using ./scripts/setup.sh

Issue - State: closed - Opened by alldino 5 months ago - 2 comments
Labels: no-issue-activity

#1311 - Ansible playbook failing to add RHEL 8 DGX Node in K8s cluster

Issue - State: closed - Opened by subasathees 6 months ago - 1 comment
Labels: no-issue-activity

#1310 - It's seem miss galaxy folder

Issue - State: closed - Opened by v-ducnt69 6 months ago - 1 comment
Labels: no-issue-activity

#1309 - Adding a Lua submission script

Issue - State: closed - Opened by clemsgrs 7 months ago - 2 comments

#1308 - Upgrading NVIDIA Driver without reseting cluster

Issue - State: closed - Opened by Heegreis 8 months ago - 3 comments
Labels: no-issue-activity

#1307 - Errors in deepops/slurm-exporter

Issue - State: closed - Opened by fa-ina-tic 9 months ago - 4 comments

#1306 - NIS configuration

Issue - State: closed - Opened by nttg8100 10 months ago - 1 comment
Labels: no-issue-activity

#1305 - Compatibility with DGX H100

Issue - State: closed - Opened by anubhavpatrick 11 months ago - 1 comment

#1304 - Enabling persistent MIG in GPU instances of DGX-A100

Issue - State: closed - Opened by murukessanap about 1 year ago - 2 comments
Labels: no-issue-activity

#1303 - Deepops Slurm NCCL Fail

Issue - State: closed - Opened by andrevianadf about 1 year ago - 2 comments
Labels: no-issue-activity

#1302 - Error Running ansible-playbook on slurm-cluster: Docker-ce Repository Activation Issue

Issue - State: closed - Opened by sikso1892 about 1 year ago - 1 comment
Labels: no-issue-activity

#1301 - Update ansible.cfg

Pull Request - State: closed - Opened by Musab0 over 1 year ago

#1300 - playbook slurm-cluster fails on DGX OS 6 on nvidia-peer-memory task

Issue - State: closed - Opened by itzsimpl over 1 year ago - 1 comment
Labels: no-issue-activity

#1299 - TLS certificate replacement steps are unclear

Issue - State: closed - Opened by programmer94 over 1 year ago - 1 comment
Labels: no-issue-activity

#1298 - Extend single node K8s DeepOps with additional nodes

Issue - State: closed - Opened by cocakohler over 1 year ago - 1 comment
Labels: no-issue-activity

#1297 - NVML version + H100 GPU

Issue - State: closed - Opened by mathrock74 over 1 year ago - 3 comments

#1296 - Release 23.08

Pull Request - State: closed - Opened by dholt over 1 year ago

#1295 - slurm-master without GPU failed at nvml autodetect

Issue - State: closed - Opened by leoncamel over 1 year ago - 3 comments

#1294 - Release updates

Pull Request - State: closed - Opened by dholt over 1 year ago

#1293 - Fix for docker install playbook due to kubespray changes

Pull Request - State: closed - Opened by dholt over 1 year ago

#1292 - update nvidia_driver_ubuntu_cuda_keyring_package to latest version

Pull Request - State: closed - Opened by JH-LEE-KR over 1 year ago

#1291 - Update the Network Operator

Issue - State: closed - Opened by supertetelman over 1 year ago - 1 comment
Labels: enhancement, no-issue-activity

#1290 - Docker installation playbook no longer working

Issue - State: closed - Opened by supertetelman over 1 year ago
Labels: bug

#1289 - K8s dashboard is not viewable by default due to https configuration

Issue - State: closed - Opened by supertetelman over 1 year ago - 1 comment
Labels: bug, no-issue-activity

#1288 - update roles to latest versions

Pull Request - State: closed - Opened by dholt over 1 year ago

#1287 - fix for out-of-date 3rd party ansible role causing error

Pull Request - State: closed - Opened by dholt over 1 year ago - 1 comment

#1286 - BUG:1284 - K8s Dashboard update

Pull Request - State: closed - Opened by supertetelman over 1 year ago

#1285 - nodelocaldns forever crashing/restarting [Info/Solution]

Issue - State: closed - Opened by Steven9Smith over 1 year ago - 2 comments
Labels: no-issue-activity

#1284 - no token generate with ./scripts/k8s/deploy_dashboard_user.sh

Issue - State: closed - Opened by Steven9Smith over 1 year ago - 3 comments

#1283 - Bump Kubeflow (1.7.0) and kustomize (5.1.0)

Pull Request - State: closed - Opened by supertetelman over 1 year ago - 2 comments

#1282 - Bump Kubespray to v2.22.1

Pull Request - State: closed - Opened by supertetelman over 1 year ago

#1281 - Version bumps for GPU Operator, GFD, and Device Plugin (23.3.2)

Pull Request - State: closed - Opened by supertetelman over 1 year ago

#1280 - Is this proyect alive?

Issue - State: closed - Opened by morsoinferno over 1 year ago - 3 comments

#1279 - Minor: Fix hardcoded slurm username

Pull Request - State: closed - Opened by jeremyfix over 1 year ago - 1 comment

#1277 - Building Slurm with Lua

Issue - State: closed - Opened by rkevk over 1 year ago - 2 comments
Labels: no-issue-activity

#1276 - Error: alpine-glibc-shim was not installed

Issue - State: closed - Opened by paoloaq over 1 year ago - 2 comments
Labels: no-issue-activity

#1275 - [HELP] How can we add all available gpus?

Issue - State: closed - Opened by asher-lab over 1 year ago - 1 comment

#1274 - Deos Deepops support NVIDIA driver version 515 or 525?

Issue - State: closed - Opened by Meeshel7 over 1 year ago - 1 comment
Labels: no-issue-activity

#1273 - Error mounting /home: umount: /home: target is busy

Issue - State: closed - Opened by starlitsky2010 over 1 year ago - 2 comments
Labels: no-issue-activity

#1272 - ERROR! 'include' is not a valid attribute for a Play

Issue - State: closed - Opened by jerry-birdseye over 1 year ago - 2 comments
Labels: no-issue-activity

#1270 - nvme Operation not permitted

Issue - State: closed - Opened by georgecreis over 1 year ago - 1 comment
Labels: no-issue-activity

#1269 - Ensure docker-ce repository is enabled failed

Issue - State: closed - Opened by hakimamarullah almost 2 years ago

#1267 - node exporters don't work after initial run of slurm playbook

Issue - State: closed - Opened by jsharpe almost 2 years ago - 5 comments
Labels: no-issue-activity

#1266 - Slurm build deps on Ubuntu missing libdbus-1-dev

Issue - State: closed - Opened by jsharpe almost 2 years ago - 2 comments

#1263 - [wip] Bump metallb from 0.12.1 to 0.13.9

Pull Request - State: closed - Opened by supertetelman almost 2 years ago - 1 comment

#1262 - Install new jmespath requirement in setup.sh

Pull Request - State: closed - Opened by supertetelman almost 2 years ago

#1261 - Conform to standard gpu operator namespacing

Pull Request - State: closed - Opened by supertetelman almost 2 years ago

#1260 - the role 'kubespray-defaults' was not found

Issue - State: closed - Opened by sagigithubcorner almost 2 years ago - 2 comments
Labels: no-issue-activity

#1259 - Is ssh into the Enroot container supposed to be passwordless?

Issue - State: closed - Opened by stephandooper almost 2 years ago - 1 comment
Labels: no-issue-activity

#1258 - [ISSUE][deepops, tag: 20.04.2] In CentOS 7.9 x64, msg: 'Not a public key: https://getfedora.org/static/fedora.gpg'

Issue - State: closed - Opened by ScGPS almost 2 years ago - 2 comments
Labels: no-issue-activity

#1257 - Issue with K8 Cluster not detecting GPUs

Issue - State: closed - Opened by mlahir1 almost 2 years ago - 2 comments
Labels: no-issue-activity

#1256 - Ports closed on docker startup

Issue - State: closed - Opened by clemsgrs almost 2 years ago - 4 comments
Labels: no-issue-activity

#1255 - Uninstall DeepOps and single-node slurm completely

Issue - State: closed - Opened by adimukewar almost 2 years ago - 1 comment
Labels: no-issue-activity

#1254 - NVIDIA deepops is support GPU Time Slicing ?

Issue - State: closed - Opened by jjsair0412 about 2 years ago - 4 comments

#1253 - [WIP]Bump to latest Kubespray and accomodate docker deprecation in tests

Pull Request - State: closed - Opened by supertetelman about 2 years ago - 2 comments

#1251 - Bump Network Operator

Issue - State: closed - Opened by supertetelman about 2 years ago - 1 comment
Labels: no-issue-activity

#1250 - Bump GPU Operator to v22.9.1

Pull Request - State: closed - Opened by supertetelman about 2 years ago

#1249 - Kubeflow v1.6.1 Upgrade & drop failing docker runtime tests

Pull Request - State: closed - Opened by supertetelman about 2 years ago

#1248 - crictl does not respect proxy config

Issue - State: closed - Opened by fecet about 2 years ago - 3 comments
Labels: no-issue-activity

#1247 - Galaxy setup failed

Issue - State: closed - Opened by fecet about 2 years ago - 1 comment

#1246 - GPU is disassociating after running a playbook

Issue - State: closed - Opened by georgettica about 2 years ago - 3 comments
Labels: no-issue-activity

#1245 - Any plans of OnDemand support for Kubernetes cluster?

Issue - State: closed - Opened by jungyh0218 about 2 years ago - 1 comment
Labels: no-issue-activity

#1244 - Add CodeQL workflow for GitHub code scanning

Pull Request - State: closed - Opened by lgtm-com[bot] about 2 years ago

#1243 - Update air-gapped documentation.

Pull Request - State: closed - Opened by mkunin-work about 2 years ago

#1242 - Uninstalling SLURM

Issue - State: closed - Opened by mfruhner about 2 years ago - 7 comments
Labels: no-issue-activity

#1241 - golang install fails

Issue - State: closed - Opened by arnoldas500 about 2 years ago - 4 comments
Labels: no-issue-activity

#1240 - ./scripts/k8s/verify_gpu.sh fail

Issue - State: closed - Opened by TranThanh96 about 2 years ago - 3 comments
Labels: no-issue-activity

#1239 - Error while trying for air gapped environment

Issue - State: closed - Opened by saingithub about 2 years ago - 2 comments
Labels: no-issue-activity

#1238 - msg: apt cache update failed

Issue - State: closed - Opened by TranThanh96 about 2 years ago - 3 comments
Labels: no-issue-activity

#1236 - 2 slurm clusters in Deepops

Issue - State: closed - Opened by meeshel78 over 2 years ago - 3 comments
Labels: no-issue-activity

#1233 - Deepops upgrade issue v21.06

Issue - State: closed - Opened by subasathees over 2 years ago - 5 comments
Labels: no-issue-activity

#1232 - Cgroup v2 support for SLURM cluster (singuality, grafana, slurm)

Issue - State: closed - Opened by biocyberman over 2 years ago - 1 comment
Labels: no-issue-activity

#1231 - Implementation Fails on RHEL 7.6 - UndefinedError: 'dict object' has no attribute 'kube_node'

Issue - State: closed - Opened by anieshmathew over 2 years ago - 1 comment
Labels: no-issue-activity

#1229 - Configuration FOr RHEL

Issue - State: closed - Opened by jittu11 over 2 years ago - 1 comment
Labels: no-issue-activity

#1224 - The PyMySQL or MySQL-python module is required

Issue - State: closed - Opened by zstreeter over 2 years ago - 1 comment

#1217 - Maas Packer Can't Find EFI Partition to Load GRUB After Imaging with DGX 5.4 iso

Issue - State: closed - Opened by kschlichter over 2 years ago - 2 comments
Labels: no-issue-activity

#1212 - OpenOnDemand 2.0 releasing .deb files today

Issue - State: closed - Opened by johrstrom over 2 years ago - 7 comments
Labels: no-issue-activity

#1211 - verify all GPU nodes plug-ins in the Kubernetes cluster Fails

Issue - State: closed - Opened by arnoldas500 over 2 years ago - 7 comments
Labels: question, no-issue-activity

#1205 - Pytorch multi-gpu example hangs with Kubeflow but works with straight Docker

Issue - State: closed - Opened by cupdike over 2 years ago - 2 comments
Labels: no-issue-activity

#1204 - Copy Kubectl to /usr/local/bin hanging

Issue - State: closed - Opened by iamadrigal over 2 years ago - 4 comments
Labels: no-issue-activity

#1201 - Support for Ubuntu 22.04

Issue - State: closed - Opened by ajdecon over 2 years ago - 3 comments
Labels: no-stale

#1147 - Upgrade to Kubeflow v1.6 when it is available, until then Kubeflow is unsupported.

Issue - State: closed - Opened by supertetelman almost 3 years ago - 2 comments
Labels: no-stale

#1120 - Add support for RHEL 8 in DGX Stack role

Issue - State: open - Opened by ajdecon almost 3 years ago - 1 comment
Labels: no-stale

#1118 - Migrate CentOS-8 molecule tests to Rocky

Pull Request - State: closed - Opened by ajdecon almost 3 years ago - 3 comments
Labels: no-pr-activity

#1117 - Update NVIDIA DGX role to match DGX Software Stack for Ubuntu

Pull Request - State: closed - Opened by ajdecon almost 3 years ago - 6 comments
Labels: no-pr-activity

#1048 - Enable rootless docker-daemon per Slurm job.

Pull Request - State: closed - Opened by avolkov1 over 3 years ago - 8 comments
Labels: no-pr-activity

#942 - pmix failures with test playbook

Issue - State: closed - Opened by verdurin almost 4 years ago - 3 comments
Labels: no-issue-activity

#939 - NFS client error

Issue - State: closed - Opened by verdurin almost 4 years ago - 6 comments

#550 - DeepOps.hosts role is fragile to inventory configuration

Issue - State: closed - Opened by ajdecon over 4 years ago - 6 comments
Labels: no-issue-activity