Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / google/xpk issues and pull requests
#186 - Fix autoprovisioning with spot nodes
Pull Request -
State: open - Opened by avrittrohwer about 2 months ago
#185 - Fix GKE node version selection logic
Pull Request -
State: open - Opened by 44past4 about 2 months ago
#184 - better core dump for debugging
Pull Request -
State: open - Opened by ZhiyuLi-goog about 2 months ago
#183 - Fix debug logging (--enable-debug-logs)
Pull Request -
State: closed - Opened by Obliviour 2 months ago
#182 - Fixes a typo in the base command description
Pull Request -
State: closed - Opened by lukebaumann 2 months ago
#181 - Fix debug logging
Pull Request -
State: closed - Opened by Obliviour 2 months ago
- 1 comment
#180 - Add Zarr Flag for Pathways
Pull Request -
State: closed - Opened by SujeethJinesh 2 months ago
- 1 comment
#179 - v6e device support
Pull Request -
State: closed - Opened by Obliviour 2 months ago
- 1 comment
#178 - Added advanced usage example for a notebook interacting with a Cloud …
Pull Request -
State: closed - Opened by nhira 3 months ago
#177 - Enabling Workload Identity and GCSFuse driver flags added.
Pull Request -
State: closed - Opened by sharabiani 3 months ago
- 1 comment
#176 - Pbundyra refactor commands
Pull Request -
State: closed - Opened by PBundyra 3 months ago
#175 - Create `commands` package and `core/` modules for NAP, Kueue and Pathways
Pull Request -
State: closed - Opened by PBundyra 3 months ago
#174 - Add quotes even to example output to help devs who copy commands from…
Pull Request -
State: closed - Opened by nhira 3 months ago
#173 - Add quotes even in example output to help devs who copy commands from the example output comments
Pull Request -
State: closed - Opened by nhira 3 months ago
#172 - Update RxDM image version from v1.0.8 to v1.0.9.
Pull Request -
State: closed - Opened by yangyuwei 3 months ago
#171 - Update RxDM image version from v1.0.8 to v1.0.9.
Pull Request -
State: closed - Opened by yangyuwei 3 months ago
- 1 comment
#170 - Move SystemCharacteristics to a separate module
Pull Request -
State: closed - Opened by PBundyra 3 months ago
#169 - Allow debug_dump_gcs to be specified with other XLA_FLAGS
Pull Request -
State: open - Opened by jonb377 3 months ago
#168 - Create `parser` package. Move logic from `xpk.py` to `parser` package.
Pull Request -
State: closed - Opened by PBundyra 3 months ago
- 1 comment
#167 - Fix issue with device check failure
Pull Request -
State: closed - Opened by jonb377 3 months ago
- 1 comment
#166 - Create `xpk` package with `utils` module
Pull Request -
State: closed - Opened by PBundyra 3 months ago
- 3 comments
#165 - Create xpk package, utils module and refactor
Pull Request -
State: closed - Opened by PBundyra 3 months ago
#164 - Enabling Workload Identity and GCSFuse driver flags
Pull Request -
State: closed - Opened by sharabiani 3 months ago
#163 - Python3.10 fix - use CSV format for gcloud commands to simplify parsing
Pull Request -
State: closed - Opened by nhira 3 months ago
- 1 comment
#162 - Allow SIGTERM error code to be returned from XPK
Pull Request -
State: closed - Opened by Obliviour 3 months ago
#161 - Create cluster from several reservations
Issue -
State: open - Opened by DwarKapex 3 months ago
- 1 comment
#160 - Fix non-accelerator pools from being part of accelerator node pool cr…
Pull Request -
State: closed - Opened by Obliviour 4 months ago
#159 - Use csv formatting instead in the gcloud command to split the names o…
Pull Request -
State: closed - Opened by Obliviour 4 months ago
- 1 comment
#158 - xpk Cluster Queue resource group "cpu" resource quota incorrect for a CPU-only cluster
Issue -
State: open - Opened by bernardhan33 4 months ago
- 11 comments
#157 - Correct Suspend/Resume backoffLimit for Pathways
Pull Request -
State: closed - Opened by SujeethJinesh 4 months ago
- 3 comments
#156 - Remove flag `pathways_compilation_mode` from xpk.py
Pull Request -
State: closed - Opened by norx1991 4 months ago
#155 - Remove incorrect plural from filter-by-job
Pull Request -
State: closed - Opened by Obliviour 5 months ago
#154 - Update XPK to support topology-aware scheduler for GPU workloads.
Pull Request -
State: closed - Opened by yangyuwei 5 months ago
- 1 comment
#153 - Update the CloudDNS check.
Pull Request -
State: open - Opened by lukebaumann 5 months ago
- 3 comments
#152 - Add logic to fail Pathways jobs on user code errors.
Pull Request -
State: closed - Opened by RoshaniN 5 months ago
- 1 comment
#151 - Add a check and update existing Pathways clusters to use CloudDNS.
Pull Request -
State: closed - Opened by RoshaniN 5 months ago
#150 - Move all clusters to be RAPID clusters, and verify them using valid_v…
Pull Request -
State: closed - Opened by Obliviour 5 months ago
#149 - Fixing device_type in nightly tests.
Pull Request -
State: closed - Opened by RoshaniN 5 months ago
#148 - Restrict Pathways unified debugging logs to just first worker
Pull Request -
State: closed - Opened by SujeethJinesh 6 months ago
#147 - Fix stacktrace sidecar container yaml
Pull Request -
State: closed - Opened by SurbhiJainUSC 6 months ago
#146 - Don't Kill RM or Proxy on user job failure
Pull Request -
State: open - Opened by SujeethJinesh 6 months ago
#145 - Enable cluster and workload creation on A3+.
Pull Request -
State: closed - Opened by yangyuwei 6 months ago
- 1 comment
#144 - Add Unified Logging View for Pathways on Cloud
Pull Request -
State: closed - Opened by SujeethJinesh 6 months ago
- 1 comment
#143 - Update pathways server and proxy server image locations
Pull Request -
State: closed - Opened by sadikneipp 6 months ago
#142 - Fixing misleading message on Validate Docker Image.
Pull Request -
State: closed - Opened by RoshaniN 6 months ago
#141 - Pathways in headless mode.
Pull Request -
State: closed - Opened by RoshaniN 6 months ago
#140 - Allow JAX coordinator to find the JobSet name.
Pull Request -
State: closed - Opened by RoshaniN 6 months ago
#139 - Making exit flow similar to other XPK commands.
Pull Request -
State: closed - Opened by RoshaniN 7 months ago
- 1 comment
#138 - Update pip version to 0.5.0
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
- 1 comment
#137 - Update xpk.py
Pull Request -
State: closed - Opened by kyle-google 7 months ago
- 2 comments
#136 - Remove user-managed service account and attach default compute engine service account to node pools
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
#135 - Dynamically determine GKE Version for Cluster and Node Pool Creation
Pull Request -
State: closed - Opened by Obliviour 7 months ago
#134 - Prevent Pathways SIGTERMs from counting against backoffLimit
Pull Request -
State: open - Opened by SujeethJinesh 7 months ago
#133 - enable create workload for h150
Pull Request -
State: open - Opened by NinaCai 7 months ago
#132 - Fix incorrect indent in workload list output
Pull Request -
State: closed - Opened by Obliviour 7 months ago
#131 - Disable service account feature
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
- 1 comment
#130 - Set gcloud zone property for build and nightly tests
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
#129 - Add project flag to service account commands
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
#128 - Add project flag to service account commands and add random gcloud properties to integ tests
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
#127 - Add Support for Pathways Expected Instances & Larger Default Worker Backoff Limit
Pull Request -
State: closed - Opened by SujeethJinesh 7 months ago
#126 - Import cloud-accelerator-diagnostics only when Vertex AI Tensorboard flag is set
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
#125 - Add configuration setting for default pool num nodes
Pull Request -
State: closed - Opened by Obliviour 7 months ago
#124 - Add custom env variables to CPU workloads.
Pull Request -
State: closed - Opened by RoshaniN 7 months ago
#123 - Enable formating with pyink to adhere with google3 style.
Pull Request -
State: closed - Opened by Obliviour 7 months ago
#122 - Update README with Vertex AI Tensorboard information and update pip version to 0.4.0
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
- 1 comment
#121 - XPK cleanup: integ tests and code cleanup
Pull Request -
State: open - Opened by Obliviour 7 months ago
#120 - Check cluster arguments and update nodepools in existing cluster when requesting different device_type
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
#119 - Fix None docker_name and wait-for-workload-completition poll mode
Pull Request -
State: closed - Opened by Obliviour 7 months ago
#118 - Revert "Nina/unify gpu container yaml"
Pull Request -
State: closed - Opened by NinaCai 7 months ago
- 2 comments
#117 - Add timeout=0 to readme
Pull Request -
State: closed - Opened by raymondzouu 7 months ago
#116 - Add wait-for-job-completion to integration test
Pull Request -
State: closed - Opened by raymondzouu 7 months ago
#115 - Nina/unify gpu container yaml
Pull Request -
State: closed - Opened by NinaCai 7 months ago
#114 - CPU shared clusters for Llama and Mistral model runs.
Pull Request -
State: closed - Opened by RoshaniN 7 months ago
#113 - Add warning when user schedules workload on a cluster created using previous XPK version
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
#112 - return pid exit code if it is non-zero
Pull Request -
State: closed - Opened by NinaCai 7 months ago
#111 - Delete subnets when deleting the cluster
Pull Request -
State: closed - Opened by NinaCai 7 months ago
#110 - Change tensorboard_location to tensorboard_region for compatibility
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
#109 - Retry again with longer wait times for kueue credentials step
Pull Request -
State: closed - Opened by Obliviour 7 months ago
#108 - Add --second-docker-image option
Pull Request -
State: closed - Opened by tonyjohnchen 7 months ago
- 2 comments
#107 - Add workload list wait-for-job-completion feature
Pull Request -
State: closed - Opened by raymondzouu 7 months ago
#106 - Enable Autoprovisioning Support in XPK
Pull Request -
State: closed - Opened by Obliviour 7 months ago
#105 - Add dynamic versioning for pip package
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
#104 - Support --env flag and Artifact Registry image validation
Pull Request -
State: closed - Opened by jonb377 7 months ago
#103 - Add Pathways end-to-end tests to build tests and nightly tests.
Pull Request -
State: closed - Opened by RoshaniN 7 months ago
#102 - XPK update for hybridsim
Pull Request -
State: closed - Opened by tonyjohnchen 7 months ago
- 2 comments
#101 - Create Vertex Experiment in workload create
Pull Request -
State: closed - Opened by SurbhiJainUSC 7 months ago
#100 - Ensure proxy and server images are only provided with --use-pathways.
Pull Request -
State: open - Opened by RoshaniN 7 months ago
- 1 comment
#99 - Add flag to restart-on-user-failures, otherwise do not
Pull Request -
State: closed - Opened by Obliviour 7 months ago
#98 - Add gpu_multi_process_run.sh
Pull Request -
State: open - Opened by NinaCai 7 months ago
#97 - debug dump gcs using gsutil -m
Pull Request -
State: closed - Opened by GallagherCommaJack 7 months ago
#96 - Fix number of nodes in CPUs.
Pull Request -
State: closed - Opened by RoshaniN 7 months ago
#95 - More integ tests: workload create/list/delete and inspector
Pull Request -
State: closed - Opened by Obliviour 8 months ago
#94 - Create Tensorboard instance in Vertex AI in cluster create
Pull Request -
State: closed - Opened by SurbhiJainUSC 8 months ago
#93 - Move to reserved TPU capacity
Pull Request -
State: closed - Opened by Obliviour 8 months ago
#92 - Update xpk.py
Pull Request -
State: closed - Opened by sadikneipp 8 months ago
- 3 comments
#91 - Create a queue of nightly / build tests to avoid concurrent tests to step on each other
Pull Request -
State: closed - Opened by Obliviour 8 months ago
#90 - Move always() to be part of the delete step
Pull Request -
State: closed - Opened by Obliviour 8 months ago
#89 - Fixed bugs and added customization to Github workflows for tests
Pull Request -
State: closed - Opened by sushmarchandran 8 months ago
#88 - Create ConfigMap for cluster metadata and add ConfigMap details to xpk inspector
Pull Request -
State: closed - Opened by SurbhiJainUSC 8 months ago
#87 - Nina xpk gpu h100
Pull Request -
State: closed - Opened by NinaCai 8 months ago
- 1 comment