Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / guidebooks/store issues and pull requests
#791 - fix: remove Items field
Pull Request -
State: closed - Opened by Sara-KS over 1 year ago
#790 - feat: Update to mcad v1.34.1 support and torchx 0.6.0
Pull Request -
State: closed - Opened by Sara-KS over 1 year ago
#789 - fix: more EOF protection fixes
Pull Request -
State: closed - Opened by starpit over 1 year ago
#788 - Update pvc.yaml - add diskfree parameter
Pull Request -
State: closed - Opened by ykoyfman over 1 year ago
#787 - fix: ray head init container should print a message when it is done waiting for workers
Pull Request -
State: closed - Opened by starpit over 1 year ago
#786 - fix: cpu utilization information may be bogus; switch to cgroup-based stats
Pull Request -
State: closed - Opened by starpit over 1 year ago
#785 - fix: increase max log requests for app logs
Pull Request -
State: closed - Opened by starpit over 1 year ago
#784 - fix: ray head wait-for-workers initContainer should retry if wait fails
Pull Request -
State: closed - Opened by starpit over 1 year ago
#783 - fix: multinic detection was broken; also was hard-wiring name of resource
Pull Request -
State: closed - Opened by starpit over 1 year ago
#782 - fix: custodian logs container fails due to unescaped $ in $TAIL
Pull Request -
State: closed - Opened by starpit over 1 year ago
#781 - fix: cache ray/torchx helm chart
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#780 - fix: improve torchx support for running multiple gpus per pod
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#779 - feat: add some NCCL tweaks
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#778 - fix: syntax error in multinic for torchx
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#777 - feat: add multinic support
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#776 - fix: ray wait for workers initContainer not needed with 0 workers
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#775 - fix: use initContainer to wait for ray workers
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#774 - fix: increase ray gcs rpc timeout to 30s
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#773 - fix: more EOF resiliency fixes for ray and torchx
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#772 - fix: increase torchx log streaming resilience to network disconnects
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#771 - fix: wait for ray workers prior to server-side job submit
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#770 - fix: restore helm delete and increase resilience to network disconnects
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#769 - fix: avoid helm delete in custodian for now
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#768 - Revert "fix: avoid use of all-containers in ray log streamer"
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#767 - fix: all-containers fix should async app logs and sync on ray head logs
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#766 - Revert "fix: avoid use of all-containers in ray log streamer"
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#765 - fix: avoid use of all-containers in ray log streamer
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#764 - fix: increase memory for runtime-env custodian pod
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#763 - fix: increase memory for ray head logs container
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#762 - fix: torchx volume mount paths have extra quotes
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#761 - fix: remove reliance on wget in ray head container
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#760 - fix: improve custodian memory requests for larger jobs
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#759 - fix: ignore __pycache__ when bundling up workdir
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#758 - fix: improve support for pytorch lightning's fsspec[s3] support
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#757 - fix: do not create gpu custodian container for non-gpu runs
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#756 - fix: lower memory requests for some of the custodian pods
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#755 - chore: move custodian to ml/codeflare/custodian
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#754 - fix: add worker-status to custodian
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#753 - fix: add runtime-env-setup to custodian
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#752 - chore: remove old untested 'in-cluster' log aggregator
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#751 - fix: eliminate newlines from base64
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#750 - feat: add gpu utilization pod to custodian
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#749 - feat: add memory utilization pod to custodian
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#748 - feat: add cpu utilization pod to custodian
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#747 - fix: use multi-line yaml to improve formatting of logs args
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#746 - fix: lower custodian logs container 100m/128Mi -> 50m/32Mi
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#745 - fix: clean up custodian command, and rename container 'logs'
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#744 - fix: torchx cluster name may end with a dash
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#743 - fix: owner label default needs to be quoted
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#742 - fix: add app.kubernetes.io/owner label to pods
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#741 - fix: add 'app.kubernetes.io/managed-by: codeflare' label to custodian
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#740 - feat: improve custodian support for torchx, use smaller base image
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#739 - fix: logs custodian should pull from kubectl logs, not ray job logs
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#738 - fix: logs custodian has errors with tee'ing to file
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#737 - feat: rename self-destruct to logs; and increase ttl timeout on its job
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#736 - fix: final Succeeded message not shown in ray jobs
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#735 - fix: further improvements to ray log streaming
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#734 - fix: ray logs not smooth
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#733 - feat: avoid websocat in ml/ray/run/logs
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#732 - fix: websocat ray log streaming can be simplified
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#731 - fix: decrease epochs from 5 to 2 for getting started ray example
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#730 - fix: ray labels were using /name should use /instance
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#729 - fix: vmstat data lacks pod/ prefix on pod name
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#728 - fix: ray jobs emit job env.json only after job is running
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#727 - fix: improve messaging of torchx wait-till-running
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#726 - fix: pod-memory stream lacked pod/ prefix for hostname
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#725 - fix: torchx wait-till-running was not waiting till *all* workers were running
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#724 - fix: torchx env isn't written out till the job is already running
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#723 - fix: capture job env vars for torchx runs
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#722 - fix: torchx captured logs may not include Succeeded/Failed events
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#721 - fix: syntax error in code block in torchx status poller
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#720 - fix: torchx exit handlers were not right
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#719 - fix: small refinements to torchx logs
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#718 - fix: remove leftover 'set -x' from debugging
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#717 - fix: torchx job status file needs to use tee -a to append
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#716 - fix: improved event handling for torchx exit
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#715 - fix: improve torchx status events to show Job status
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#714 - fix: torchx jobs lacked kube event stream
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#713 - fix: torchx script logic fails if python prefix is not python3
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#712 - fix: clean up content and coloring of helm install output
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#711 - fix: torchx cli install fails on zsh
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#710 - fix: sed RE error can occur in torchx log streamer
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#709 - fix: pass through guidebook env vars to torchx
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#708 - fix: ml/torchx/run may fail for users with long user names
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#707 - fix: torchx log streamer would fail if lines contained control chars
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#706 - fix: update to official torchx 0.5.0 release
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#705 - fix: don't fail if we can't hack uid-range
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#704 - fix: in CI, don't try to use ssh git cloning for workdir
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#703 - feat: add support for workdir being a github https:// url
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#702 - fix: ml/torchx/run fails if main python file is not 'main.py'
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#701 - fix: another fix for relative workdir
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#700 - fix: further improvements to helm install with relative workdir
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#699 - fix: improved support for installing and running torchx on 3.9.6 on m…
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#698 - fix: force vmstat timestamps to use UTC timezone
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#697 - fix: capture env.json in log aggregation
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#696 - fix: another fix to improve syntactic conformance of gpu utilization …
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#695 - fix: gpu stream displays temps with % unit
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#694 - fix: update gpu utilization stream to conform to vmstat and events log structure
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#693 - fix: kubectl linux-arm64 installs arm32 binary
Pull Request -
State: closed - Opened by starpit almost 2 years ago
#692 - fix: bump to madwizard@8 to adopt shell.stdin convention
Pull Request -
State: closed - Opened by starpit almost 2 years ago