Ecosyste.ms: Issues
An open API service that provides issue and pull request metadata for open source projects.
GitHub / apple/axlearn issues and pull requests
#720 - Configurable concurrent restore gb
Pull Request -
State: open - Opened by hanzhi713 about 9 hours ago
#719 - Fix stale docstrings.
Pull Request -
State: closed - Opened by changlan 2 days ago
#718 - Improve AOT Compilation Accuracy.
Pull Request -
State: open - Opened by apghml 3 days ago
#717 - Upgrade to Python 3.10
Pull Request -
State: open - Opened by nicolov 3 days ago
- 1 comment
#716 - Add --megascale_abort_on_hangs flag for multi-slice TPU jobs
Pull Request -
State: open - Opened by mugithi 4 days ago
- 2 comments
#715 - Support enabling TPU smart repair
Pull Request -
State: closed - Opened by Ethanlm 5 days ago
#714 - Expose segment_ids to _compute_attention in FlashAttention
Pull Request -
State: closed - Opened by changlan 5 days ago
- 5 comments
#713 - fix tools call issue
Pull Request -
State: closed - Opened by gyin94 5 days ago
#712 - Set TF_FORCE_GPU_ALLOW_GROWTH=true by default
Pull Request -
State: open - Opened by samos123 6 days ago
- 1 comment
#711 - Fix submission of Dataflow jobs
Pull Request -
State: open - Opened by damccorm 6 days ago
- 1 comment
#710 - Don't inherit from protocol.
Pull Request -
State: closed - Opened by markblee 7 days ago
#709 - add bert 768
Pull Request -
State: closed - Opened by YXSIO 8 days ago
#708 - bump up typing-extensions==4.11.0
Pull Request -
State: closed - Opened by gyin94 10 days ago
Labels: wip
#707 - [Documentation] New Footnote for Abseil Flags
Pull Request -
State: closed - Opened by jlukecarlson 11 days ago
#706 - Publish Job Event from Bastion and GKE Runner
Pull Request -
State: closed - Opened by HaijingFu 11 days ago
- 1 comment
#705 - Add CuDNN fused MHA kernel to axlearn
Pull Request -
State: closed - Opened by kelvin-zou 12 days ago
- 6 comments
#704 - Fix race condition while doing async tf save.
Pull Request -
State: closed - Opened by hanzhi713 13 days ago
#703 - No-op when garbage collecting non-existent checkpoint dir.
Pull Request -
State: closed - Opened by markblee 13 days ago
#702 - Adds a `multihost_utils.sync_global_devices` after saving TF checkpoints to avoid race conditions.
Pull Request -
State: closed - Opened by ruomingp 13 days ago
- 1 comment
#701 - Fix a few issues for AOT compilation
Pull Request -
State: closed - Opened by changlan 15 days ago
- 2 comments
#700 - Add segment_ids to the forward pass of a Causal LM
Pull Request -
State: closed - Opened by changlan 18 days ago
#699 - Bow metrics
Pull Request -
State: closed - Opened by headmyshoulder 19 days ago
- 1 comment
#698 - Fix module import
Pull Request -
State: closed - Opened by vishesh9131 19 days ago
- 8 comments
#697 - Support skipping warmup in cosine schedule.
Pull Request -
State: closed - Opened by xianzhidu 19 days ago
#696 - Support customized mesh rules to support different HWs
Pull Request -
State: closed - Opened by kelvin-zou 20 days ago
#695 - segment_ids for text_to_lm_training_input outputs
Pull Request -
State: closed - Opened by changlan 20 days ago
#694 - Adds an fs abstraction and optimizes checkpoint glob.
Pull Request -
State: closed - Opened by markblee 20 days ago
#693 - Inherit dtype from parent to avoid defaulting to float32
Pull Request -
State: closed - Opened by a-metz 20 days ago
#692 - Adds __post_init__ to Module.
Pull Request -
State: closed - Opened by markblee 21 days ago
#691 - Add system characteristics for Trillium
Pull Request -
State: closed - Opened by ehorning 21 days ago
- 1 comment
#690 - Adding support for Pathways proxy
Pull Request -
State: open - Opened by jesus-orozco 22 days ago
#689 - ssm_enhancement
Pull Request -
State: open - Opened by vishesh9131 25 days ago
- 3 comments
#688 - Speed up Axlearn CI
Pull Request -
State: open - Opened by soundway 25 days ago
- 3 comments
#687 - Golden Logit tests to ensure Fuji v2 70B matches Llama 2 70B
Issue -
State: open - Opened by samos123 25 days ago
#686 - Supports config getattr override, minor fix to module path.
Pull Request -
State: closed - Opened by markblee 25 days ago
#685 - Small pytype fix for Python 3.10
Pull Request -
State: closed - Opened by nicolov 26 days ago
#684 - Add bastion-tier label to container metrics
Pull Request -
State: closed - Opened by Ethanlm 26 days ago
#683 - RuntimeError occurs when test sample >= 10 in math
Issue -
State: open - Opened by David-Li0406 27 days ago
- 2 comments
#682 - Add user-id label in k8s container metrics
Pull Request -
State: closed - Opened by Ethanlm about 1 month ago
#681 - Adds bundler.wait_until_finished and fixes local gke_runner.
Pull Request -
State: closed - Opened by markblee about 1 month ago
#680 - Cloudbuild List Filter Fix
Pull Request -
State: closed - Opened by amcw7777 about 1 month ago
#679 - Generalized top-k gating for MoE.
Pull Request -
State: closed - Opened by xianzhidu about 1 month ago
#678 - Average the aux-loss instead of sum
Pull Request -
State: closed - Opened by dunan about 1 month ago
#677 - Add job priority to k8s pod and node label
Pull Request -
State: closed - Opened by Ethanlm about 1 month ago
#676 - Fix weighted scalar division by zero.
Pull Request -
State: closed - Opened by markblee about 1 month ago
#675 - Copy duplicate leafs
Pull Request -
State: closed - Opened by weiliu89 about 1 month ago
#674 - Add tag to CloudBuild
Pull Request -
State: closed - Opened by amcw7777 about 1 month ago
#673 - Pin tensorboard version 1.61.0 to fix Tensorboard uploader TypeError: can only concatenate str (not "NoneType") to str
Pull Request -
State: closed - Opened by HaijingFu about 1 month ago
#672 - Style changes with py39+ as target.
Pull Request -
State: closed - Opened by miaojingang about 1 month ago
#671 - Data-sharded and zero-copy async checkpoints
Pull Request -
State: closed - Opened by hanzhi713 about 1 month ago
- 1 comment
#670 - Asynchronous save_tf_savables
Pull Request -
State: closed - Opened by hanzhi713 about 1 month ago
#669 - rollback
Pull Request -
State: closed - Opened by zhiyun about 1 month ago
#668 - Support Audio Summary in Axlearn
Pull Request -
State: closed - Opened by kmxyvb about 1 month ago
#667 - simplify _segment_ids_from_causal_input_ids
Pull Request -
State: closed - Opened by jasonmusespresso about 1 month ago
#666 - Simplify flatten_items.
Pull Request -
State: closed - Opened by markblee about 1 month ago
#665 - Adds support for private worker pools.
Pull Request -
State: closed - Opened by markblee about 1 month ago
#664 - Style changes with py39+ as target.
Pull Request -
State: closed - Opened by miaojingang about 1 month ago
#663 - Update axlearn version and adds changelog.
Pull Request -
State: closed - Opened by markblee about 1 month ago
#662 - Correct for conditioning token when generating initial token scores.
Pull Request -
State: closed - Opened by a-metz about 1 month ago
#661 - rename positions to input_positions
Pull Request -
State: closed - Opened by zhiyun about 1 month ago
#660 - bump up transformers to 4.44.1
Pull Request -
State: closed - Opened by gyin94 about 1 month ago
#659 - Asynchronous save_tf_savables
Pull Request -
State: closed - Opened by hanzhi713 about 1 month ago
- 1 comment
#658 - Add output_norm option in TimeStepEmbedding and make shift/scale/gate optional for DiTAttentionLayer
Pull Request -
State: closed - Opened by weiliu89 about 1 month ago
#657 - Pass serialized jobspec to bastion runner processes.
Pull Request -
State: closed - Opened by apghml about 1 month ago
#656 - Add litepod config for 70B model
Pull Request -
State: closed - Opened by kelvin-zou about 1 month ago
- 3 comments
#655 - support segment_ids in causal_lm
Pull Request -
State: closed - Opened by zhiyun about 2 months ago
#654 - A few updates to MoE and test_utils
Pull Request -
State: closed - Opened by xianzhidu about 2 months ago
#653 - Jax upgrade 4 30
Pull Request -
State: closed - Opened by kelvin-zou about 2 months ago
- 1 comment
#652 - Fix precision_recall_curve to correctly handle masked examples
Pull Request -
State: closed - Opened by dlin28 about 2 months ago
#651 - An integration of orbax checkpointer.
Pull Request -
State: closed - Opened by markblee about 2 months ago
- 1 comment
#650 - Bump tensorflow to 2.16.1 and tensorstore to >= 0.1.63.
Pull Request -
State: closed - Opened by markblee about 2 months ago
#649 - Makes BoundedAsyncCheckpointManager control `max_concurrent_gb` by the size of local shards instead of global shards.
Pull Request -
State: open - Opened by ruomingp about 2 months ago
#648 - Reduces shuffle buffer size for pajama_trainer.py.
Pull Request -
State: closed - Opened by ruomingp about 2 months ago
#647 - Add `@no_side_effects`.
Pull Request -
State: open - Opened by apghml about 2 months ago
#646 - Use paddings parameter in conv_norm
Pull Request -
State: closed - Opened by stefbraun about 2 months ago
#645 - Conformer: use paddings arg in conv_norm layer
Pull Request -
State: closed - Opened by stefbraun about 2 months ago
#644 - Conformer: use paddings arg in conv_norm layer
Pull Request -
State: closed - Opened by stefbraun about 2 months ago
#643 - Enable remat checkpoints to host instead of TPU memory
Pull Request -
State: closed - Opened by samos123 about 2 months ago
- 1 comment
#642 - Create initialize_evaluator func in open_api
Pull Request -
State: closed - Opened by gyin94 about 2 months ago
#641 - set hostNetwork to True for TPUGKEJob
Pull Request -
State: open - Opened by samos123 about 2 months ago
#640 - Factor the causal masking to support cleaner subclass override.
Pull Request -
State: closed - Opened by qdavid1 about 2 months ago
#639 - Preventing datasets instantiation from pretrained configs
Pull Request -
State: open - Opened by fauconnier about 2 months ago
#638 - Set remat_spec.policy None for fuji v2 70B
Pull Request -
State: closed - Opened by samos123 about 2 months ago
- 4 comments
#637 - Adds support for adaptive load balance loss to Top2Gating.
Pull Request -
State: closed - Opened by ruomingp about 2 months ago
#636 - Add job tier column to jobs table
Pull Request -
State: closed - Opened by TarangKhanna about 2 months ago
#635 - Updates checkpointer interface…
Pull Request -
State: closed - Opened by markblee about 2 months ago
#634 - Add transformer block output to module output
Pull Request -
State: closed - Opened by alex8937 about 2 months ago
#633 - Adds honeycrisp trainer configs with deterministic inputs
Pull Request -
State: closed - Opened by ruomingp about 2 months ago
#632 - Makes the TPU log command more prominent.
Pull Request -
State: closed - Opened by ruomingp about 2 months ago
#631 - Fix flaky test.
Pull Request -
State: closed - Opened by apghml about 2 months ago
#630 - Add attention_logit_biases in DiTAttentionLayer
Pull Request -
State: closed - Opened by weiliu89 about 2 months ago
#629 - Add core deps to different build targets.
Pull Request -
State: closed - Opened by markblee about 2 months ago
- 1 comment
#628 - Making core optional causing no module names absl error
Issue -
State: closed - Opened by samos123 about 2 months ago
- 1 comment
#627 - Add Orbax Checkpointing
Pull Request -
State: closed - Opened by jiya-zhang about 2 months ago
- 1 comment
#626 - Add structure option in DiT attn and feedforward layers
Pull Request -
State: closed - Opened by weiliu89 about 2 months ago
#625 - Adds honeycrisp models
Pull Request -
State: closed - Opened by ruomingp about 2 months ago
#624 - Make `tree_paths()` consistent with jax.
Pull Request -
State: closed - Opened by apghml about 2 months ago
- 1 comment
#623 - Enable support for Kueue for GKETPUJob
Pull Request -
State: closed - Opened by samos123 about 2 months ago
- 1 comment
#622 - allow using on-demand instead of spot only
Pull Request -
State: open - Opened by samos123 about 2 months ago
- 2 comments
#621 - Unable to use TPU on GKE using on-demand quota
Issue -
State: open - Opened by samos123 about 2 months ago