Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / google/xpk issues and pull requests

#186 - Fix autoprovisioning with spot nodes

Pull Request - State: open - Opened by avrittrohwer about 2 months ago

#185 - Fix GKE node version selection logic

Pull Request - State: open - Opened by 44past4 about 2 months ago

#184 - better core dump for debugging

Pull Request - State: open - Opened by ZhiyuLi-goog about 2 months ago

#183 - Fix debug logging (--enable-debug-logs)

Pull Request - State: closed - Opened by Obliviour 2 months ago

#182 - Fixes a typo in the base command description

Pull Request - State: closed - Opened by lukebaumann 2 months ago

#181 - Fix debug logging

Pull Request - State: closed - Opened by Obliviour 2 months ago - 1 comment

#180 - Add Zarr Flag for Pathways

Pull Request - State: closed - Opened by SujeethJinesh 2 months ago - 1 comment

#179 - v6e device support

Pull Request - State: closed - Opened by Obliviour 2 months ago - 1 comment

#177 - Enabling Workload Identity and GCSFuse driver flags added.

Pull Request - State: closed - Opened by sharabiani 3 months ago - 1 comment

#176 - Pbundyra refactor commands

Pull Request - State: closed - Opened by PBundyra 3 months ago

#172 - Update RxDM image version from v1.0.8 to v1.0.9.

Pull Request - State: closed - Opened by yangyuwei 3 months ago

#171 - Update RxDM image version from v1.0.8 to v1.0.9.

Pull Request - State: closed - Opened by yangyuwei 3 months ago - 1 comment

#170 - Move SystemCharacteristics to a separate module

Pull Request - State: closed - Opened by PBundyra 3 months ago

#169 - Allow debug_dump_gcs to be specified with other XLA_FLAGS

Pull Request - State: open - Opened by jonb377 3 months ago

#168 - Create `parser` package. Move logic from `xpk.py` to `parser` package.

Pull Request - State: closed - Opened by PBundyra 3 months ago - 1 comment

#167 - Fix issue with device check failure

Pull Request - State: closed - Opened by jonb377 3 months ago - 1 comment

#166 - Create `xpk` package with `utils` module

Pull Request - State: closed - Opened by PBundyra 3 months ago - 3 comments

#165 - Create xpk package, utils module and refactor

Pull Request - State: closed - Opened by PBundyra 3 months ago

#164 - Enabling Workload Identity and GCSFuse driver flags

Pull Request - State: closed - Opened by sharabiani 3 months ago

#163 - Python3.10 fix - use CSV format for gcloud commands to simplify parsing

Pull Request - State: closed - Opened by nhira 3 months ago - 1 comment

#162 - Allow SIGTERM error code to be returned from XPK

Pull Request - State: closed - Opened by Obliviour 3 months ago

#161 - Create cluster from several reservations

Issue - State: open - Opened by DwarKapex 3 months ago - 1 comment

#159 - Use csv formatting instead in the gcloud command to split the names o…

Pull Request - State: closed - Opened by Obliviour 4 months ago - 1 comment

#157 - Correct Suspend/Resume backoffLimit for Pathways

Pull Request - State: closed - Opened by SujeethJinesh 4 months ago - 3 comments

#156 - Remove flag `pathways_compilation_mode` from xpk.py

Pull Request - State: closed - Opened by norx1991 4 months ago

#155 - Remove incorrect plural from filter-by-job

Pull Request - State: closed - Opened by Obliviour 5 months ago

#154 - Update XPK to support topology-aware scheduler for GPU workloads.

Pull Request - State: closed - Opened by yangyuwei 5 months ago - 1 comment

#153 - Update the CloudDNS check.

Pull Request - State: open - Opened by lukebaumann 5 months ago - 3 comments

#152 - Add logic to fail Pathways jobs on user code errors.

Pull Request - State: closed - Opened by RoshaniN 5 months ago - 1 comment

#149 - Fixing device_type in nightly tests.

Pull Request - State: closed - Opened by RoshaniN 5 months ago

#147 - Fix stacktrace sidecar container yaml

Pull Request - State: closed - Opened by SurbhiJainUSC 6 months ago

#146 - Don't Kill RM or Proxy on user job failure

Pull Request - State: open - Opened by SujeethJinesh 6 months ago

#145 - Enable cluster and workload creation on A3+.

Pull Request - State: closed - Opened by yangyuwei 6 months ago - 1 comment

#144 - Add Unified Logging View for Pathways on Cloud

Pull Request - State: closed - Opened by SujeethJinesh 6 months ago - 1 comment

#143 - Update pathways server and proxy server image locations

Pull Request - State: closed - Opened by sadikneipp 6 months ago

#142 - Fixing misleading message on Validate Docker Image.

Pull Request - State: closed - Opened by RoshaniN 6 months ago

#141 - Pathways in headless mode.

Pull Request - State: closed - Opened by RoshaniN 6 months ago

#140 - Allow JAX coordinator to find the JobSet name.

Pull Request - State: closed - Opened by RoshaniN 6 months ago

#139 - Making exit flow similar to other XPK commands.

Pull Request - State: closed - Opened by RoshaniN 7 months ago - 1 comment

#138 - Update pip version to 0.5.0

Pull Request - State: closed - Opened by SurbhiJainUSC 7 months ago - 1 comment

#137 - Update xpk.py

Pull Request - State: closed - Opened by kyle-google 7 months ago - 2 comments

#133 - enable create workload for h150

Pull Request - State: open - Opened by NinaCai 7 months ago

#132 - Fix incorrect indent in workload list output

Pull Request - State: closed - Opened by Obliviour 7 months ago

#131 - Disable service account feature

Pull Request - State: closed - Opened by SurbhiJainUSC 7 months ago - 1 comment

#130 - Set gcloud zone property for build and nightly tests

Pull Request - State: closed - Opened by SurbhiJainUSC 7 months ago

#129 - Add project flag to service account commands

Pull Request - State: closed - Opened by SurbhiJainUSC 7 months ago

#125 - Add configuration setting for default pool num nodes

Pull Request - State: closed - Opened by Obliviour 7 months ago

#124 - Add custom env variables to CPU workloads.

Pull Request - State: closed - Opened by RoshaniN 7 months ago

#123 - Enable formatting with pyink to adhere with google3 style.

Pull Request - State: closed - Opened by Obliviour 7 months ago

#121 - XPK cleanup: integ tests and code cleanup

Pull Request - State: open - Opened by Obliviour 7 months ago

#118 - Revert "Nina/unify gpu container yaml"

Pull Request - State: closed - Opened by NinaCai 7 months ago - 2 comments

#117 - Add timeout=0 to readme

Pull Request - State: closed - Opened by raymondzouu 7 months ago

#116 - Add wait-for-job-completion to integration test

Pull Request - State: closed - Opened by raymondzouu 7 months ago

#115 - Nina/unify gpu container yaml

Pull Request - State: closed - Opened by NinaCai 7 months ago

#114 - CPU shared clusters for Llama and Mistral model runs.

Pull Request - State: closed - Opened by RoshaniN 7 months ago

#112 - return pid exit code if it is non-zero

Pull Request - State: closed - Opened by NinaCai 7 months ago

#111 - Delete subnets when deleting the cluster

Pull Request - State: closed - Opened by NinaCai 7 months ago

#109 - Retry again with longer wait times for kueue credentials step

Pull Request - State: closed - Opened by Obliviour 7 months ago

#108 - Add --second-docker-image option

Pull Request - State: closed - Opened by tonyjohnchen 7 months ago - 2 comments

#107 - Add workload list wait-for-job-completion feature

Pull Request - State: closed - Opened by raymondzouu 7 months ago

#106 - Enable Autoprovisioning Support in XPK

Pull Request - State: closed - Opened by Obliviour 7 months ago

#105 - Add dynamic versioning for pip package

Pull Request - State: closed - Opened by SurbhiJainUSC 7 months ago

#104 - Support --env flag and Artifact Registry image validation

Pull Request - State: closed - Opened by jonb377 7 months ago

#103 - Add Pathways end-to-end tests to build tests and nightly tests.

Pull Request - State: closed - Opened by RoshaniN 7 months ago

#102 - XPK update for hybridsim

Pull Request - State: closed - Opened by tonyjohnchen 7 months ago - 2 comments

#101 - Create Vertex Experiment in workload create

Pull Request - State: closed - Opened by SurbhiJainUSC 7 months ago

#100 - Ensure proxy and server images are only provided with --use-pathways.

Pull Request - State: open - Opened by RoshaniN 7 months ago - 1 comment

#99 - Add flag to restart-on-user-failures, otherwise do not

Pull Request - State: closed - Opened by Obliviour 7 months ago

#98 - Add gpu_multi_process_run.sh

Pull Request - State: open - Opened by NinaCai 7 months ago

#97 - debug dump gcs using gsutil -m

Pull Request - State: closed - Opened by GallagherCommaJack 7 months ago

#96 - Fix number of nodes in CPUs.

Pull Request - State: closed - Opened by RoshaniN 7 months ago

#95 - More integ tests: workload create/list/delete and inspector

Pull Request - State: closed - Opened by Obliviour 8 months ago

#94 - Create Tensorboard instance in Vertex AI in cluster create

Pull Request - State: closed - Opened by SurbhiJainUSC 8 months ago

#93 - Move to reserved TPU capacity

Pull Request - State: closed - Opened by Obliviour 8 months ago

#92 - Update xpk.py

Pull Request - State: closed - Opened by sadikneipp 8 months ago - 3 comments

#90 - Move always() to be part of the delete step

Pull Request - State: closed - Opened by Obliviour 8 months ago

#87 - Nina xpk gpu h100

Pull Request - State: closed - Opened by NinaCai 8 months ago - 1 comment