tensorflow/mesh issues and pull requests

#396 - Error while importing Meshtensorflow

Issue - State: closed - Opened by billygrahamram 11 months ago

#395 - Migrate references and remove legacy target tpu:tpu_estimator.

Pull Request - State: closed - Opened by copybara-service[bot] about 1 year ago

#394 - Update attention.py

Pull Request - State: open - Opened by sjw8793 about 1 year ago - 1 comment

#393 - Optimizer momentums not properly populated training model with DTensors

Issue - State: closed - Opened by pentney about 1 year ago - 1 comment

#392 - AttributeError: module 'tensorflow.python.framework.ops' has no attribute 'register_tensor_conversion_function'

Issue - State: closed - Opened by Xnhyacinth about 1 year ago - 4 comments

#391 - Does load-balanced loss help the loss converge?

Issue - State: open - Opened by mathfinder over 1 year ago

#389 - Move `convert_to_tensor`, `convert_to_tensor_v1`, `convert_to_tensor_v1_with_dispatch`, `convert_to_tensor_v2_with_dispatch`, and `convert_to_tensor_v2` into `tensor_conversion_registry`.

Pull Request - State: open - Opened by copybara-service[bot] over 1 year ago

#388 - feat(ci): enable `pip` caching in CI

Pull Request - State: closed - Opened by SauravMaheshkar over 1 year ago - 1 comment

#387 - Remove legacy references from `ops.py`.

Pull Request - State: closed - Opened by copybara-service[bot] almost 2 years ago

#386 - Remove legacy references from `ops.py`.

Pull Request - State: closed - Opened by copybara-service[bot] almost 2 years ago

#385 - Fix docstring typos

Pull Request - State: closed - Opened by copybara-service[bot] about 2 years ago - 1 comment

#384 - Enable multi-file inference

Pull Request - State: closed - Opened by copybara-service[bot] about 2 years ago - 1 comment

#383 - When running BERT on GPU: Resource exhausted: failed to allocate memory

Issue - State: open - Opened by Currycurrycurry about 2 years ago - 1 comment

#382 - Internal change

Pull Request - State: closed - Opened by copybara-service[bot] over 2 years ago - 1 comment

#381 - Make mesh_tensorflow's call of `get_replicated_var_handle` backward-compatible with tf <= 2.8.0. Fixes https://github.com/google-research/text-to-text-transfer-transformer/issues/1020.

Pull Request - State: closed - Opened by copybara-service[bot] over 2 years ago

#380 - bump version number to release updated PyPI package that includes last year enhancements

Pull Request - State: closed - Opened by copybara-service[bot] over 2 years ago

#379 - Getting "NanLossDuringTrainingError: NaN loss during training."

Issue - State: open - Opened by dhruval-p over 2 years ago

#378 - mask_1_flat and mask_2_flat applied to gates twice?

Issue - State: open - Opened by marhlder over 2 years ago

#377 - Explicitly import estimator from tensorflow as a separate import instead of accessing it via tf.estimator and depend on the tensorflow estimator target.

Pull Request - State: closed - Opened by copybara-service[bot] over 2 years ago

#376 - Remove unused comments related to Python 2 compatibility.

Pull Request - State: closed - Opened by copybara-service[bot] over 2 years ago

#375 - Make TPU variable name deterministic.

Pull Request - State: closed - Opened by copybara-service[bot] over 2 years ago

#374 - Adding a new Gradient Estimator for Routing using REINFORCE with a leave-one-out baseline.

Pull Request - State: open - Opened by copybara-service[bot] over 2 years ago

#373 - #HyperPrompt Part 2 of HyperPrompt implementation: the actual computation of HyperPrompt inside self-attention layer.

Pull Request - State: closed - Opened by copybara-service[bot] almost 3 years ago

#372 - Use math.gcd instead of fractions.gcd, the former is deprecated in Python 3.5 and removed in 3.9.

Pull Request - State: closed - Opened by copybara-service[bot] almost 3 years ago

#371 - Split out optimizer call for internal purposes.

Pull Request - State: closed - Opened by copybara-service[bot] almost 3 years ago

#370 - fix typo in logging statement.

Pull Request - State: closed - Opened by copybara-service[bot] almost 3 years ago

#369 - About the mixture of expert model

Issue - State: open - Opened by fym0503 almost 3 years ago

#368 - Mesh-tf model conversion to onnx?

Issue - State: open - Opened by b-analyst about 3 years ago - 2 comments

#367 - Minor comment fix to refer to the correct argument name.

Pull Request - State: open - Opened by copybara-service[bot] about 3 years ago
Labels: cla: yes

#366 - Make sure gates are not normalized for n=1 for top_n routing

Pull Request - State: closed - Opened by copybara-service[bot] about 3 years ago - 3 comments
Labels: cla: no

#365 - Fix some example code in readme for einsum operation

Pull Request - State: open - Opened by baragona about 3 years ago - 2 comments
Labels: cla: yes

#364 - How to freeze embedding layers

Issue - State: open - Opened by lintangsutawika about 3 years ago

#363 - Add a link to the Primer paper

Pull Request - State: closed - Opened by copybara-service[bot] about 3 years ago - 4 comments
Labels: cla: no

#362 - Beam search

Issue - State: open - Opened by antonio-mastropaolo about 3 years ago

#361 - Output raw model outputs during eval

Pull Request - State: open - Opened by craffel about 3 years ago
Labels: cla: yes

#360 - Add utility to save score predictions to TFRecords for scoring large datasets.

Pull Request - State: closed - Opened by copybara-service[bot] about 3 years ago
Labels: cla: yes

#359 - Save scores lazily.

Pull Request - State: open - Opened by copybara-service[bot] about 3 years ago
Labels: cla: yes

#358 - Remove unnecessary name and cwise in squared relu.

Pull Request - State: closed - Opened by copybara-service[bot] about 3 years ago
Labels: cla: yes

#357 - Expert Attention Fixes:

Pull Request - State: closed - Opened by copybara-service[bot] about 3 years ago - 3 comments
Labels: cla: no

#356 - Squared ReLU from Primer paper.

Pull Request - State: closed - Opened by copybara-service[bot] about 3 years ago
Labels: cla: yes

#355 - Internal

Pull Request - State: closed - Opened by copybara-service[bot] about 3 years ago - 18 comments
Labels: cla: no

#354 - Remove dataset checkpoint policy override now that b/181765832 is resolved.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#353 - Add more extensive top-2 logging.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#352 - Ability to add Custom Tensorflow Hooks

Issue - State: open - Opened by trisongz over 3 years ago

#351 - Only add z_loss to losses if during training.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#350 - Expert Attention Fixes:

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#349 - Fix bug in shared_kv attention for autoregressive decoding.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 2 comments
Labels: cla: no

#348 - Change second d_model_split dim's size to be the output shape, instead of input shape. This allows it to work for layers where the input size is different than the output size.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#347 - heterogeneous mixture of experts layer

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 5 comments
Labels: cla: no

#346 - Add more options to Experts Attention. These options remove 1/3 of the all2all communication costs:

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 2 comments
Labels: cla: no

#345 - Update mesh tensorflow to use device assignments to map logical to physical processor numbers on N-D Meshes. Currently only enabled when logical cores per replica is set to 1.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 5 comments
Labels: cla: no

#344 - Add in Z-loss to all routing algorithms.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#343 - Minor changes to make Experts Attention work.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 6 comments
Labels: cla: no

#342 - MODE models with heterogeneous expert width

Pull Request - State: open - Opened by copybara-service[bot] over 3 years ago - 1 comment
Labels: cla: no

#341 - Add top-n routing, which generalized top-2 routing. Improves model quality for larger capacity factors (e.g. 2.0+).

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#340 - Add z_loss on all attention logits. This does not change model quality and can effectively decrease the attention logits by order of magnitude.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#339 - Using the soft loss dtype instead of hardcoding bfloat16.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#338 - Fix casting for NTLB.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#337 - Switch logging to warm to not fail when using deterministic dataset checkpointing.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#336 - Next gen fish optimizations for MeshTF.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 4 comments
Labels: cla: no

#335 - Add z-loss to the top_2_gating method.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#334 - Add z-loss to the top_2_gating method.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#333 - Option to add a unique suffix to eval subdirectories. Allows to easily have many different eval jobs going with a single training job.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 1 comment
Labels: cla: no

#332 - Add option to stochastically use the non-top expert during training.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#331 - Allow tokens embeddings to be used for routing decisions.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#330 - [MOE-transformer] How do you build static graph of MOE-Model?

Issue - State: open - Opened by imyzx2017 over 3 years ago

#329 - Option to use mtf.Print to log which tokens are sent to which experts when run on CPU.

Pull Request - State: open - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#328 - How to use tf.contrib.opt.ScipyOptimizerInterface or tfp.optimizer.lbfgs_minimize with MeshTF ?

Issue - State: open - Opened by harshil-patel-code over 3 years ago

#327 - Make directory if it doesn't exist.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#326 - Splitting tokens when routing

Pull Request - State: open - Opened by copybara-service[bot] over 3 years ago - 2 comments
Labels: cla: no

#325 - Log expert_gating once it has been masked by the importance tensor to be sure no padded probabilities are being logged.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 10 comments
Labels: cla: no

#324 - How to assign values to specific slice of a data block on a specific GPU?

Issue - State: open - Opened by harshil-patel-code over 3 years ago

#323 - Unique variable names for ParallelLayer

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#322 - Modified the `eval_model` function in mesh_tensorflow/transformer/utils.py to accept Summary protos in addition to tag-to-scalar dicts.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 6 comments
Labels: cla: no

#321 - Add original AI2 version of c4 v3.0.1, ND3 deduplicated with param = 0.8, and LM1B, Wiki40B, and lm_first_len512 versions of original AI2 C4 and ND3 deduped AI2 C4 for evaluation.

Pull Request - State: open - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#320 - Add `cast` preprocessor and add tasks for inference prompts for deduplication project.

Pull Request - State: open - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#319 - Use %g instead of %f for printing in mesh_tensorflow/transformer/utils.py.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 5 comments
Labels: cla: no

#318 - performing the opposite of mtf.lowering

Issue - State: open - Opened by DavidPeleg6 over 3 years ago - 1 comment

#317 - Rolls back a change that broke several clients.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 4 comments
Labels: cla: no

#316 - Minor fix to make sure printing does not crash if a filter_fn is used.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#315 - Internal only change : )

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#314 - Explicitly pass named-arg to mtf.dropout

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#313 - Fix ALBERT arXiv URL

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#312 - [MTF] Minor usability change in get_inputs_from_file for accidentally empty files.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#311 - Add in z_loss for router softmax for switch layer.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#310 - try to create gin related flags and pass if the flags are created.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#309 - Allow for not scaling certain parameters updates by its norm in Adafactor. Also add a parameter to allow for changing the Adafactor decay rate.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#308 - no public changes

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago
Labels: cla: yes

#307 - Add flexible checkpoint loading option to allow for loading checkpoints

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no

#303 - Add new routing method where each expert chooses when tokens it wants. A token can be chosen multiple times across different experts.

Pull Request - State: closed - Opened by copybara-service[bot] over 3 years ago - 3 comments
Labels: cla: no
