Ecosyste.ms: Issues
An open API service for providing issue and pull request metadata for open source projects.
GitHub / facebookresearch/higher issues and pull requests
#140 - Support of AdamW
Issue -
State: open - Opened by ridiculouz over 1 year ago
#139 - Does higher work with Hugging Face (HF) models, e.g. ViT?
Issue -
State: open - Opened by brando90 over 1 year ago
- 2 comments
#138 - CUDA out of memory
Issue -
State: open - Opened by aooating over 1 year ago
#137 - once_differentiable
Issue -
State: open - Opened by aooating over 1 year ago
- 1 comment
#136 - grad clip correctness
Issue -
State: open - Opened by Whalefishin almost 2 years ago
#135 - Using higher for hyperparameter optimization
Issue -
State: open - Opened by aruniyer almost 2 years ago
- 1 comment
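Issue #135 above asks how higher can be used for hyperparameter optimization. Below is a minimal sketch of one way to do this, assuming higher's innerloop_ctx with its override argument; the model, data, learning rates, and step counts are hypothetical placeholders, not code from the thread. The idea is to treat the inner-loop learning rate as a tensor requiring grad, pass it via override, unroll a few inner steps, and backpropagate an outer loss into it.
```python
import torch
import higher

# Hypothetical setup: a small model and some inner/outer data tensors.
model = torch.nn.Linear(4, 1)
x_in, y_in = torch.randn(8, 4), torch.randn(8, 1)
x_out, y_out = torch.randn(8, 4), torch.randn(8, 1)

inner_opt = torch.optim.SGD(model.parameters(), lr=0.1)

# The inner-loop learning rate as a meta-parameter (one entry per param group).
meta_lr = torch.tensor(0.1, requires_grad=True)
meta_opt = torch.optim.Adam([meta_lr], lr=1e-3)
loss_fn = torch.nn.functional.mse_loss

for _ in range(10):
    meta_opt.zero_grad()
    # `override` swaps the optimizer's lr for the differentiable tensor,
    # so gradients flow from the outer loss back into meta_lr.
    with higher.innerloop_ctx(model, inner_opt,
                              override={'lr': [meta_lr]}) as (fmodel, diffopt):
        for _ in range(3):  # unrolled inner steps
            diffopt.step(loss_fn(fmodel(x_in), y_in))
        outer_loss = loss_fn(fmodel(x_out), y_out)
    outer_loss.backward()
    meta_opt.step()
```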
#134 - Non-scalar loss
Issue -
State: open - Opened by janglinko-dac almost 2 years ago
- 1 comment
#133 - Initialize Differentiable Optimizer with non-leaf tensor
Issue -
State: open - Opened by andrearosasco about 2 years ago
- 1 comment
#132 - higher for dpt architectures?
Issue -
State: open - Opened by Ainaz99 over 2 years ago
- 2 comments
#131 - Support learning rate optimization in the outer loop
Pull Request -
State: closed - Opened by MichaelKonobeev over 2 years ago
- 3 comments
Labels: CLA Signed
#130 - How to Update/Optimize subset of params
Issue -
State: open - Opened by blake-camp over 2 years ago
- 2 comments
#129 - Potential bug with first-order MAML using only higher: setting track_higher_grads = False leaves the .grad field unpopulated (None); is that a bug?
Issue -
State: open - Opened by brando90 almost 3 years ago
- 4 comments
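Issue #129 above concerns the .grad field staying None when track_higher_grads=False. Here is a hedged sketch of a common first-order (FOMAML-style) workaround, assuming higher.innerloop_ctx and a hypothetical model/data setup: compute gradients of the query loss with respect to the adapted fast weights, then copy them onto the original model's parameters by hand.
```python
import torch
import higher

# Hypothetical model/data; the point is the gradient hand-off, not the task.
model = torch.nn.Linear(4, 2)
inner_opt = torch.optim.SGD(model.parameters(), lr=0.1)
meta_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = torch.nn.functional.cross_entropy
x_s, y_s = torch.randn(8, 4), torch.randint(0, 2, (8,))
x_q, y_q = torch.randn(8, 4), torch.randint(0, 2, (8,))

meta_opt.zero_grad()
# track_higher_grads=False: the inner trajectory is not differentiated through,
# so the outer backward yields first-order gradients.
with higher.innerloop_ctx(model, inner_opt,
                          track_higher_grads=False) as (fmodel, diffopt):
    for _ in range(3):
        diffopt.step(loss_fn(fmodel(x_s), y_s))
    query_loss = loss_fn(fmodel(x_q), y_q)
    # Gradients land on the adapted fast weights, not on model.parameters()
    # (which is why .grad can stay None); copy them across manually.
    grads = torch.autograd.grad(query_loss, fmodel.parameters())
with torch.no_grad():
    for p, g in zip(model.parameters(), grads):
        p.grad = g if p.grad is None else p.grad + g
meta_opt.step()
```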
#128 - Would making the gradients "data" by detaching them implement first-order MAML?
Issue -
State: open - Opened by brando90 almost 3 years ago
#127 - Mixed precision training
Issue -
State: open - Opened by zhiqihuang almost 3 years ago
#126 - Update README.md
Pull Request -
State: open - Opened by ruizhaoz almost 3 years ago
- 1 comment
#125 - In DifferentiableAdam, sqrt() is non-differentiable at zero
Issue -
State: open - Opened by rickyloynd-microsoft almost 3 years ago
- 6 comments
#124 - Does higher work with hugging face Adafactor?
Issue -
State: open - Opened by brando90 almost 3 years ago
- 2 comments
#123 - second-order derivative
Issue -
State: open - Opened by xugy16 almost 3 years ago
#122 - Fixing the data leakage from the maml omniglot example
Pull Request -
State: open - Opened by brando90 almost 3 years ago
- 2 comments
#121 - Setting up templates for GitHub issues that invite users to post to the PyTorch forum
Issue -
State: open - Opened by brando90 about 3 years ago
#120 - What does the warning flag in testing for MAML mean? What does it mean to "fine-tune for testing"?
Issue -
State: open - Opened by brando90 about 3 years ago
#119 - How does one return an adapted model without using the context manager?
Issue -
State: open - Opened by brando90 about 3 years ago
- 8 comments
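Issue #119 above asks how to obtain an adapted model without the higher.innerloop_ctx context manager. A minimal sketch follows, assuming higher.patch.monkeypatch and higher.optim.get_diff_optim behave as in higher's documentation; the model and data are hypothetical.
```python
import torch
import higher

model = torch.nn.Linear(4, 2)            # hypothetical base model
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = torch.nn.functional.cross_entropy
x, y = torch.randn(8, 4), torch.randint(0, 2, (8,))

# Build the patched functional model and differentiable optimizer directly,
# instead of going through higher.innerloop_ctx.
fmodel = higher.patch.monkeypatch(model, copy_initial_weights=True)
diffopt = higher.optim.get_diff_optim(opt, model.parameters(), fmodel=fmodel)

for _ in range(5):
    diffopt.step(loss_fn(fmodel(x), y))

# fmodel now holds the adapted fast weights and can be returned or reused,
# since no context-manager scope ends here.
adapted_model = fmodel
```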
#118 - Does higher work with DDP for test-time evaluation only (no training, just testing)?
Issue -
State: open - Opened by brando90 about 3 years ago
#117 - Fix link to the logo
Pull Request -
State: closed - Opened by bamos about 3 years ago
Labels: CLA Signed
#116 - When will higher allow the use of DDP (distributed data parallel)?
Issue -
State: open - Opened by brando90 about 3 years ago
- 4 comments
#115 - implementing MAML with MiniImageNet
Issue -
State: open - Opened by ligeng0197 about 3 years ago
#114 - higher for Reinforcement Learning?
Issue -
State: closed - Opened by rickyloynd-microsoft about 3 years ago
- 2 comments
#113 - metaclass issue with fastai
Issue -
State: open - Opened by shayanfazeli over 3 years ago
- 4 comments
#112 - Remove confusing comment in the omniglot example
Pull Request -
State: open - Opened by bamos over 3 years ago
Labels: CLA Signed
#111 - maml omniglot - finetuning / test
Issue -
State: open - Opened by shayanfazeli over 3 years ago
- 5 comments
#110 - variables work outside of context manager scope
Issue -
State: open - Opened by hughperkins over 3 years ago
- 2 comments
#109 - Link to examples
Pull Request -
State: closed - Opened by hughperkins over 3 years ago
Labels: CLA Signed
#108 - More complete example please?
Issue -
State: closed - Opened by hughperkins over 3 years ago
- 13 comments
#107 - Is there data leakage in the maml-omniglot example?
Issue -
State: open - Opened by SunHaozhe over 3 years ago
- 6 comments
#106 - Tracking higher-order grads when arbitrarily combining submodules of functional module
Issue -
State: open - Opened by dylandoblar over 3 years ago
#105 - Memory not freed when moving out of scope?
Issue -
State: open - Opened by jessicamecht over 3 years ago
- 1 comment
#104 - when do we divide by meta_batch_size?
Issue -
State: open - Opened by brando90 over 3 years ago
- 1 comment
#103 - The fmodel doesn't transfer the running_mean and running_var of the BN layer back to the original model
Issue -
State: open - Opened by ensiwalk over 3 years ago
#102 - Why DifferentiableOptimizer detaches parameters when track_higher_grads = False?
Issue -
State: open - Opened by Renovamen over 3 years ago
- 7 comments
#101 - Added code for getting the state dict of an optimizer, as well as tests
Pull Request -
State: open - Opened by murrman95 over 3 years ago
- 2 comments
Labels: CLA Signed
#100 - Added AdamW to supported Differentiable Optimizers
Pull Request -
State: closed - Opened by RashedDoha over 3 years ago
- 1 comment
Labels: CLA Signed
#99 - Is the higher library compatible with pytorch's distributed RPC?
Issue -
State: open - Opened by brando90 over 3 years ago
- 6 comments
Labels: wontfix
#98 - Is higher compatible with distributed data parallel (DDP)?
Issue -
State: closed - Opened by brando90 over 3 years ago
- 8 comments
Labels: wontfix
#97 - How to evaluate a model without GPU memory issues?
Issue -
State: open - Opened by njwfish over 3 years ago
- 1 comment
#96 - How does one execute an individual higher nn patched module?
Issue -
State: open - Opened by brando90 almost 4 years ago
- 9 comments
#95 - Discrepancy between code & manual calculation of parameter gradient w.r.t older version of parameter
Issue -
State: closed - Opened by hazrmard almost 4 years ago
- 2 comments
#94 - How to use multiple optimizers in the inner loop?
Issue -
State: open - Opened by qinwei-hfut almost 4 years ago
- 1 comment
Labels: question
#93 - Copy diffopt state to original optimizer
Issue -
State: open - Opened by brjathu almost 4 years ago
- 2 comments
Labels: question
#92 - Is the accumulation of gradients done right? Where do we divide the accumulator by the number of tasks?
Issue -
State: closed - Opened by brando90 almost 4 years ago
- 4 comments
#91 - initial version
Pull Request -
State: open - Opened by xuanyuzhou98 almost 4 years ago
- 3 comments
Labels: CLA Signed
#90 - How to double check that 2nd order grads are being used
Issue -
State: open - Opened by brando90 almost 4 years ago
- 1 comment
Labels: question
#89 - AttributeError: 'NoneType' object has no attribute '_parameters'
Issue -
State: closed - Opened by zhaozj89 almost 4 years ago
- 2 comments
#88 - torch.optim.AdamW not in the list of supported optimizers
Issue -
State: open - Opened by RashedDoha almost 4 years ago
- 2 comments
Labels: enhancement, good first issue
#87 - dependency installation issues with requirements.txt
Issue -
State: open - Opened by RashedDoha almost 4 years ago
- 1 comment
Labels: help wanted
#86 - add links to MAML++ experiments
Pull Request -
State: closed - Opened by bamos about 4 years ago
Labels: CLA Signed
#85 - Fix unit test helper function for pytorch 1.7 compatibility.
Pull Request -
State: closed - Opened by egrefen about 4 years ago
Labels: CLA Signed
#84 - MAML++ implementation?
Issue -
State: closed - Opened by brando90 about 4 years ago
- 3 comments
Labels: question
#83 - DifferentiableOptimizer not setting self.param_groups to be the same as reference optimizer
Issue -
State: open - Opened by Horse7354 about 4 years ago
- 3 comments
Labels: bug
#82 - Documentation for the MonkeyPatched module class
Issue -
State: open - Opened by egrefen about 4 years ago
Labels: documentation
#81 - How to train a model inside an innerloop context without higher order gradients?
Issue -
State: closed - Opened by ferreirafabio about 4 years ago
- 4 comments
Labels: bug
#80 - Understanding the higher and non-higher code snippets
Issue -
State: open - Opened by kgarg8 about 4 years ago
#79 - Use better coding style in optim.py
Pull Request -
State: closed - Opened by MarisaKirisame about 4 years ago
- 4 comments
Labels: CLA Signed
#78 - installing with conda?
Issue -
State: open - Opened by brando90 about 4 years ago
- 4 comments
Labels: enhancement, help wanted, good first issue
#77 - How to get a completely parameter-less functional model?
Issue -
State: open - Opened by lucaslie about 4 years ago
- 1 comment
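Issue #77 above asks for a fully parameter-less functional model. A small sketch, assuming higher's patched modules accept an explicit params= keyword at call time; the module and external parameter list here are hypothetical.
```python
import torch
import higher

model = torch.nn.Linear(4, 2)                     # hypothetical module
fmodel = higher.patch.monkeypatch(model)

# Supply an explicit parameter list at call time; the fast weights stored on
# fmodel are ignored for this forward pass.
external_params = [torch.randn_like(p, requires_grad=True)
                   for p in model.parameters()]
x = torch.randn(8, 4)
out = fmodel(x, params=external_params)
```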
#76 - Can higher optimize a non-leaf node in an inner loop?
Issue -
State: open - Opened by jwilles about 4 years ago
- 1 comment
Labels: question
#75 - Memory Leak when using backward hooks with fmodel
Issue -
State: open - Opened by vsieplus about 4 years ago
- 4 comments
Labels: bug
#74 - Understanding inner optimizer parameters through MAML example
Issue -
State: closed - Opened by kgarg8 over 4 years ago
- 2 comments
Labels: question
#73 - Learning rate scheduling for the diffopt
Issue -
State: open - Opened by pratikgujjar over 4 years ago
- 2 comments
Labels: question
#72 - Fix discrepancy between DifferentiableAdam and torch.optim.Adam
Pull Request -
State: closed - Opened by neitzal over 4 years ago
- 3 comments
Labels: CLA Signed
#71 - Discrepancy between Adam and DifferentiableAdam
Issue -
State: closed - Opened by neitzal over 4 years ago
- 2 comments
Labels: bug
#70 - Computational graph not retained for BERT
Issue -
State: closed - Opened by Nithin-Holla over 4 years ago
- 8 comments
#69 - Grad of model parameters is None
Issue -
State: closed - Opened by wubowen416 over 4 years ago
- 2 comments
Labels: question
#68 - A potential use case for higher?
Issue -
State: closed - Opened by 9yte over 4 years ago
- 2 comments
Labels: question
#67 - Suppress RNN GPU flattening warnings.
Pull Request -
State: closed - Opened by egrefen over 4 years ago
- 1 comment
Labels: CLA Signed
#66 - Why are you not using torchmeta?
Issue -
State: closed - Opened by renesax14 over 4 years ago
- 1 comment
Labels: question
#65 - PyTorch asks to flatten the parameters when I'm using higher for meta-learning a simple seq2seq LSTM model.
Issue -
State: closed - Opened by n-askarian over 4 years ago
- 4 comments
#64 - Meta-Gradient through a KL_Div loss is zero
Issue -
State: closed - Opened by pratikgujjar over 4 years ago
- 2 comments
#63 - First Order MAML?
Issue -
State: closed - Opened by MurtyShikhar over 4 years ago
- 8 comments
#62 - How does one implement a parametrized meta-learner (like a meta-LSTM optimizer) in higher?
Issue -
State: open - Opened by renesax14 over 4 years ago
- 9 comments
#61 - Rewrite docs to not use idiosyncratic historical terms like "fast weights"
Issue -
State: open - Opened by egrefen over 4 years ago
- 2 comments
Labels: documentation
#60 - Why not accumulate loss and then take derivative in MAML?
Issue -
State: closed - Opened by renesax14 over 4 years ago
- 8 comments
Labels: question
#59 - Relationship between the weights of a model and the weights of its functional version
Issue -
State: closed - Opened by pratikgujjar over 4 years ago
- 6 comments
Labels: question
#58 - Why does higher need to deep-copy the parameters of the base model, and what is the use of override?
Issue -
State: open - Opened by renesax14 over 4 years ago
- 10 comments
Labels: question
#57 - Retain graph for diffopt.step
Pull Request -
State: closed - Opened by Nithin-Holla over 4 years ago
- 3 comments
Labels: CLA Signed
#56 - Meaning of stop-gradient
Issue -
State: closed - Opened by JonMuehlst over 4 years ago
- 4 comments
Labels: question
#55 - Unexpected (?) behaviour during eval mode
Issue -
State: closed - Opened by cemanil over 4 years ago
- 4 comments
Labels: question
#54 - Rename copy_initial_weights to something more intuitive, and replace copy with detach where appropriate.
Issue -
State: open - Opened by egrefen over 4 years ago
- 2 comments
Labels: enhancement
#53 - Add option to differentiable optimizers to treat buffers as constant
Issue -
State: open - Opened by creiser over 4 years ago
- 2 comments
Labels: enhancement
#50 - Clipping or normalizing gradients
Issue -
State: closed - Opened by FerranAlet over 4 years ago
- 4 comments
Labels: question
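Issue #50 above asks about clipping or normalizing gradients in the inner loop. Below is a hedged sketch, assuming diffopt.step accepts a grad_callback hook that receives and returns the list of gradients; the model, data, and clipping threshold are hypothetical placeholders.
```python
import torch
import higher

model = torch.nn.Linear(4, 1)                     # hypothetical model/data
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss_fn = torch.nn.functional.mse_loss

def clip_grads(grads, max_norm=1.0):
    # Differentiable clipping by global norm: scale the gradient list rather
    # than mutating .grad fields, so the operation stays on the graph.
    total_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads))
    scale = torch.clamp(max_norm / (total_norm + 1e-6), max=1.0)
    return [g * scale for g in grads]

with higher.innerloop_ctx(model, opt) as (fmodel, diffopt):
    for _ in range(3):
        diffopt.step(loss_fn(fmodel(x), y), grad_callback=clip_grads)
    outer_loss = loss_fn(fmodel(x), y)
    outer_loss.backward()
```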
#46 - Questions about meaning of fast weights.
Issue -
State: closed - Opened by briankosw over 4 years ago
- 2 comments
Labels: question
#42 - Questions about using fmodel's weights as the model's weights
Issue -
State: closed - Opened by xieshuqin over 4 years ago
- 2 comments
Labels: question
#37 - Memory leak in loop with higher.innerloop_ctx!
Issue -
State: closed - Opened by nooralahzadeh over 4 years ago
- 5 comments
Labels: invalid
#32 - example of trainable optimizer?
Issue -
State: closed - Opened by renesax14 over 4 years ago
- 43 comments
Labels: help wanted, good first issue
#26 - Question about visualising a differentiable optimizer in a computational graph
Issue -
State: open - Opened by jurasq almost 5 years ago
- 4 comments
#24 - Is DataParallel supported?
Issue -
State: closed - Opened by csyanbin almost 5 years ago
- 3 comments
Labels: wontfix
#22 - Question about gradient checkpointing
Issue -
State: closed - Opened by JonMuehlst almost 5 years ago
- 4 comments
#20 - Question about step execution time
Issue -
State: open - Opened by AntoineHX almost 5 years ago
- 7 comments
Labels: bug, do-not-reap
#14 - Inner loop incompatible with weight_norm
Issue -
State: open - Opened by AllanYangZhou almost 5 years ago
- 11 comments
#10 - Feature request: utility functions to allow stopping meta-gradient propagation
Issue -
State: open - Opened by llucid-97 about 5 years ago
- 11 comments
#7 - (Meta-)gradient computation via multiple calls to backward()
Issue -
State: closed - Opened by kylehkhsu about 5 years ago
- 2 comments