Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / NVIDIA/deepops issues and pull requests
#1319 - vagrant_startup.sh ubuntu 22.04 update
Issue -
State: closed - Opened by hpcadmins 4 months ago
- 1 comment
Labels: no-issue-activity
#1318 - Increase KillWait to 120 in slurm.conf
Pull Request -
State: closed - Opened by ilya-da 5 months ago
#1317 - KillWait default in slurm slurm
Issue -
State: closed - Opened by ilya-da 5 months ago
- 2 comments
Labels: no-issue-activity
#1316 - Update GPU process cleanup logic in SLURM epilog script
Pull Request -
State: closed - Opened by ilya-da 5 months ago
#1315 - fetching PIDs for timeout jobs for cleanup sometimes fail to kill processes
Issue -
State: closed - Opened by ilya-da 5 months ago
- 2 comments
Labels: no-issue-activity
#1314 - msg: 'Error connecting: Error while fetching server API version: Not supported URL scheme http+docker'
Issue -
State: open - Opened by seltsa 5 months ago
- 4 comments
#1313 - deepops 24.08?
Issue -
State: closed - Opened by mathrock74 5 months ago
- 3 comments
#1312 - Unable to install some galaxy collections using ./scripts/setup.sh
Issue -
State: closed - Opened by alldino 5 months ago
- 2 comments
Labels: no-issue-activity
#1311 - Ansible playbook failing to add RHEL 8 DGX Node in K8s cluster
Issue -
State: closed - Opened by subasathees 6 months ago
- 1 comment
Labels: no-issue-activity
#1310 - It's seem miss galaxy folder
Issue -
State: closed - Opened by v-ducnt69 6 months ago
- 1 comment
Labels: no-issue-activity
#1309 - Adding a Lua submission script
Issue -
State: closed - Opened by clemsgrs 7 months ago
- 2 comments
#1308 - Upgrading NVIDIA Driver without reseting cluster
Issue -
State: closed - Opened by Heegreis 8 months ago
- 3 comments
Labels: no-issue-activity
#1307 - Errors in deepops/slurm-exporter
Issue -
State: closed - Opened by fa-ina-tic 9 months ago
- 4 comments
#1306 - NIS configuration
Issue -
State: closed - Opened by nttg8100 10 months ago
- 1 comment
Labels: no-issue-activity
#1305 - Compatibility with DGX H100
Issue -
State: closed - Opened by anubhavpatrick 11 months ago
- 1 comment
#1304 - Enabling persistent MIG in GPU instances of DGX-A100
Issue -
State: closed - Opened by murukessanap about 1 year ago
- 2 comments
Labels: no-issue-activity
#1303 - Deepops Slurm NCCL Fail
Issue -
State: closed - Opened by andrevianadf about 1 year ago
- 2 comments
Labels: no-issue-activity
#1302 - Error Running ansible-playbook on slurm-cluster: Docker-ce Repository Activation Issue
Issue -
State: closed - Opened by sikso1892 about 1 year ago
- 1 comment
Labels: no-issue-activity
#1301 - Update ansible.cfg
Pull Request -
State: closed - Opened by Musab0 over 1 year ago
#1300 - playbook slurm-cluster fails on DGX OS 6 on nvidia-peer-memory task
Issue -
State: closed - Opened by itzsimpl over 1 year ago
- 1 comment
Labels: no-issue-activity
#1299 - TLS certificate replacement steps are unclear
Issue -
State: closed - Opened by programmer94 over 1 year ago
- 1 comment
Labels: no-issue-activity
#1298 - Extend single node K8s DeepOps with additional nodes
Issue -
State: closed - Opened by cocakohler over 1 year ago
- 1 comment
Labels: no-issue-activity
#1297 - NVML version + H100 GPU
Issue -
State: closed - Opened by mathrock74 over 1 year ago
- 3 comments
#1296 - Release 23.08
Pull Request -
State: closed - Opened by dholt over 1 year ago
#1295 - slurm-master without GPU failed at nvml autodetect
Issue -
State: closed - Opened by leoncamel over 1 year ago
- 3 comments
#1294 - Release updates
Pull Request -
State: closed - Opened by dholt over 1 year ago
#1293 - Fix for docker install playbook due to kubespray changes
Pull Request -
State: closed - Opened by dholt over 1 year ago
#1292 - update nvidia_driver_ubuntu_cuda_keyring_package to latest version
Pull Request -
State: closed - Opened by JH-LEE-KR over 1 year ago
#1291 - Update the Network Operator
Issue -
State: closed - Opened by supertetelman over 1 year ago
- 1 comment
Labels: enhancement, no-issue-activity
#1290 - Docker installation playbook no longer working
Issue -
State: closed - Opened by supertetelman over 1 year ago
Labels: bug
#1289 - K8s dashboard is not viewable by default due to https configuration
Issue -
State: closed - Opened by supertetelman over 1 year ago
- 1 comment
Labels: bug, no-issue-activity
#1288 - update roles to latest versions
Pull Request -
State: closed - Opened by dholt over 1 year ago
#1287 - fix for out-of-date 3rd party ansible role causing error
Pull Request -
State: closed - Opened by dholt over 1 year ago
- 1 comment
#1286 - BUG:1284 - K8s Dashboard update
Pull Request -
State: closed - Opened by supertetelman over 1 year ago
#1285 - nodelocaldns forever crashing/restarting [Info/Solution]
Issue -
State: closed - Opened by Steven9Smith over 1 year ago
- 2 comments
Labels: no-issue-activity
#1284 - no token generate with ./scripts/k8s/deploy_dashboard_user.sh
Issue -
State: closed - Opened by Steven9Smith over 1 year ago
- 3 comments
#1283 - Bump Kubeflow (1.7.0) and kustomize (5.1.0)
Pull Request -
State: closed - Opened by supertetelman over 1 year ago
- 2 comments
#1282 - Bump Kubespray to v2.22.1
Pull Request -
State: closed - Opened by supertetelman over 1 year ago
#1281 - Version bumps for GPU Operator, GFD, and Device Plugin (23.3.2)
Pull Request -
State: closed - Opened by supertetelman over 1 year ago
#1280 - Is this proyect alive?
Issue -
State: closed - Opened by morsoinferno over 1 year ago
- 3 comments
#1279 - Minor: Fix hardcoded slurm username
Pull Request -
State: closed - Opened by jeremyfix over 1 year ago
- 1 comment
#1277 - Building Slurm with Lua
Issue -
State: closed - Opened by rkevk over 1 year ago
- 2 comments
Labels: no-issue-activity
#1276 - Error: alpine-glibc-shim was not installed
Issue -
State: closed - Opened by paoloaq over 1 year ago
- 2 comments
Labels: no-issue-activity
#1275 - [HELP] How can we add all available gpus?
Issue -
State: closed - Opened by asher-lab over 1 year ago
- 1 comment
#1274 - Deos Deepops support NVIDIA driver version 515 or 525?
Issue -
State: closed - Opened by Meeshel7 over 1 year ago
- 1 comment
Labels: no-issue-activity
#1273 - Error mounting /home: umount: /home: target is busy
Issue -
State: closed - Opened by starlitsky2010 over 1 year ago
- 2 comments
Labels: no-issue-activity
#1272 - ERROR! 'include' is not a valid attribute for a Play
Issue -
State: closed - Opened by jerry-birdseye over 1 year ago
- 2 comments
Labels: no-issue-activity
#1270 - nvme Operation not permitted
Issue -
State: closed - Opened by georgecreis over 1 year ago
- 1 comment
Labels: no-issue-activity
#1269 - Ensure docker-ce repository is enabled failed
Issue -
State: closed - Opened by hakimamarullah almost 2 years ago
#1267 - node exporters don't work after initial run of slurm playbook
Issue -
State: closed - Opened by jsharpe almost 2 years ago
- 5 comments
Labels: no-issue-activity
#1266 - Slurm build deps on Ubuntu missing libdbus-1-dev
Issue -
State: closed - Opened by jsharpe almost 2 years ago
- 2 comments
#1265 - Add virtual/vagrant support for Ubuntu 22.04 in order to test via Jenkins
Pull Request -
State: closed - Opened by supertetelman almost 2 years ago
#1264 - Fix non-GPU Operator installs by allowing installation into default namespace
Pull Request -
State: closed - Opened by supertetelman almost 2 years ago
#1263 - [wip] Bump metallb from 0.12.1 to 0.13.9
Pull Request -
State: closed - Opened by supertetelman almost 2 years ago
- 1 comment
#1262 - Install new jmespath requirement in setup.sh
Pull Request -
State: closed - Opened by supertetelman almost 2 years ago
#1261 - Conform to standard gpu operator namespacing
Pull Request -
State: closed - Opened by supertetelman almost 2 years ago
#1260 - the role 'kubespray-defaults' was not found
Issue -
State: closed - Opened by sagigithubcorner almost 2 years ago
- 2 comments
Labels: no-issue-activity
#1259 - Is ssh into the Enroot container supposed to be passwordless?
Issue -
State: closed - Opened by stephandooper almost 2 years ago
- 1 comment
Labels: no-issue-activity
#1258 - [ISSUE][deepops, tag: 20.04.2] In CentOS 7.9 x64, msg: 'Not a public key: https://getfedora.org/static/fedora.gpg'
Issue -
State: closed - Opened by ScGPS almost 2 years ago
- 2 comments
Labels: no-issue-activity
#1257 - Issue with K8 Cluster not detecting GPUs
Issue -
State: closed - Opened by mlahir1 almost 2 years ago
- 2 comments
Labels: no-issue-activity
#1256 - Ports closed on docker startup
Issue -
State: closed - Opened by clemsgrs almost 2 years ago
- 4 comments
Labels: no-issue-activity
#1255 - Uninstall DeepOps and single-node slurm completely
Issue -
State: closed - Opened by adimukewar almost 2 years ago
- 1 comment
Labels: no-issue-activity
#1254 - NVIDIA deepops is support GPU Time Slicing ?
Issue -
State: closed - Opened by jjsair0412 about 2 years ago
- 4 comments
#1253 - [WIP]Bump to latest Kubespray and accomodate docker deprecation in tests
Pull Request -
State: closed - Opened by supertetelman about 2 years ago
- 2 comments
#1252 - [Error] When provisioing the k8s cluster, an error occurs when setup.sh running the script. - ImportError: cannot import name 'soft_unicode' from 'markupsafe'
Issue -
State: closed - Opened by jjsair0412 about 2 years ago
- 3 comments
Labels: no-issue-activity
#1251 - Bump Network Operator
Issue -
State: closed - Opened by supertetelman about 2 years ago
- 1 comment
Labels: no-issue-activity
#1250 - Bump GPU Operator to v22.9.1
Pull Request -
State: closed - Opened by supertetelman about 2 years ago
#1249 - Kubeflow v1.6.1 Upgrade & drop failing docker runtime tests
Pull Request -
State: closed - Opened by supertetelman about 2 years ago
#1248 - crictl does not respect proxy config
Issue -
State: closed - Opened by fecet about 2 years ago
- 3 comments
Labels: no-issue-activity
#1247 - Galaxy setup failed
Issue -
State: closed - Opened by fecet about 2 years ago
- 1 comment
#1246 - GPU is disassociating after running a playbook
Issue -
State: closed - Opened by georgettica about 2 years ago
- 3 comments
Labels: no-issue-activity
#1245 - Any plans of OnDemand support for Kubernetes cluster?
Issue -
State: closed - Opened by jungyh0218 about 2 years ago
- 1 comment
Labels: no-issue-activity
#1244 - Add CodeQL workflow for GitHub code scanning
Pull Request -
State: closed - Opened by lgtm-com[bot] about 2 years ago
#1243 - Update air-gapped documentation.
Pull Request -
State: closed - Opened by mkunin-work about 2 years ago
#1242 - Uninstalling SLURM
Issue -
State: closed - Opened by mfruhner about 2 years ago
- 7 comments
Labels: no-issue-activity
#1241 - golang install fails
Issue -
State: closed - Opened by arnoldas500 about 2 years ago
- 4 comments
Labels: no-issue-activity
#1240 - ./scripts/k8s/verify_gpu.sh fail
Issue -
State: closed - Opened by TranThanh96 about 2 years ago
- 3 comments
Labels: no-issue-activity
#1239 - Error while trying for air gapped environment
Issue -
State: closed - Opened by saingithub about 2 years ago
- 2 comments
Labels: no-issue-activity
#1238 - msg: apt cache update failed
Issue -
State: closed - Opened by TranThanh96 about 2 years ago
- 3 comments
Labels: no-issue-activity
#1236 - 2 slurm clusters in Deepops
Issue -
State: closed - Opened by meeshel78 over 2 years ago
- 3 comments
Labels: no-issue-activity
#1233 - Deepops upgrade issue v21.06
Issue -
State: closed - Opened by subasathees over 2 years ago
- 5 comments
Labels: no-issue-activity
#1232 - Cgroup v2 support for SLURM cluster (singuality, grafana, slurm)
Issue -
State: closed - Opened by biocyberman over 2 years ago
- 1 comment
Labels: no-issue-activity
#1231 - Implementation Fails on RHEL 7.6 - UndefinedError: 'dict object' has no attribute 'kube_node'
Issue -
State: closed - Opened by anieshmathew over 2 years ago
- 1 comment
Labels: no-issue-activity
#1229 - Configuration FOr RHEL
Issue -
State: closed - Opened by jittu11 over 2 years ago
- 1 comment
Labels: no-issue-activity
#1224 - The PyMySQL or MySQL-python module is required
Issue -
State: closed - Opened by zstreeter over 2 years ago
- 1 comment
#1217 - Maas Packer Can't Find EFI Partition to Load GRUB After Imaging with DGX 5.4 iso
Issue -
State: closed - Opened by kschlichter over 2 years ago
- 2 comments
Labels: no-issue-activity
#1214 - AutoDetect=nvml on gres.conf not working. Error "fatal: We were configured to autodetect nvml functionality, but we weren't able to find that lib when Slurm was configured"
Issue -
State: closed - Opened by anateshan over 2 years ago
- 5 comments
#1212 - OpenOnDemand 2.0 releasing .deb files today
Issue -
State: closed - Opened by johrstrom over 2 years ago
- 7 comments
Labels: no-issue-activity
#1211 - verify all GPU nodes plug-ins in the Kubernetes cluster Fails
Issue -
State: closed - Opened by arnoldas500 over 2 years ago
- 7 comments
Labels: question, no-issue-activity
#1205 - Pytorch multi-gpu example hangs with Kubeflow but works with straight Docker
Issue -
State: closed - Opened by cupdike over 2 years ago
- 2 comments
Labels: no-issue-activity
#1204 - Copy Kubectl to /usr/local/bin hanging
Issue -
State: closed - Opened by iamadrigal over 2 years ago
- 4 comments
Labels: no-issue-activity
#1201 - Support for Ubuntu 22.04
Issue -
State: closed - Opened by ajdecon over 2 years ago
- 3 comments
Labels: no-stale
#1147 - Upgrade to Kubeflow v1.6 when it is available, until then Kubeflow is unsupported.
Issue -
State: closed - Opened by supertetelman almost 3 years ago
- 2 comments
Labels: no-stale
#1120 - Add support for RHEL 8 in DGX Stack role
Issue -
State: open - Opened by ajdecon almost 3 years ago
- 1 comment
Labels: no-stale
#1118 - Migrate CentOS-8 molecule tests to Rocky
Pull Request -
State: closed - Opened by ajdecon almost 3 years ago
- 3 comments
Labels: no-pr-activity
#1117 - Update NVIDIA DGX role to match DGX Software Stack for Ubuntu
Pull Request -
State: closed - Opened by ajdecon almost 3 years ago
- 6 comments
Labels: no-pr-activity
#1048 - Enable rootless docker-daemon per Slurm job.
Pull Request -
State: closed - Opened by avolkov1 over 3 years ago
- 8 comments
Labels: no-pr-activity
#942 - pmix failures with test playbook
Issue -
State: closed - Opened by verdurin almost 4 years ago
- 3 comments
Labels: no-issue-activity
#939 - NFS client error
Issue -
State: closed - Opened by verdurin almost 4 years ago
- 6 comments
#550 - DeepOps.hosts role is fragile to inventory configuration
Issue -
State: closed - Opened by ajdecon over 4 years ago
- 6 comments
Labels: no-issue-activity