luyug/GradCache issues and pull requests

#36 - Add language keywords for syntax highlighting

Pull Request - State: open - Opened by Danghor about 2 months ago

#35 - Fix typo

Pull Request - State: open - Opened by Danghor about 2 months ago

#33 - how to deal with same encoder

Issue - State: open - Opened by CSWellesSun 6 months ago

#32 - Support for Label-Dependent Loss Functions (e.g., Supervised Contrastive Loss)

Issue - State: open - Opened by penguinwang96825 6 months ago

#31 - Implement Grokfast into GradCache

Issue - State: open - Opened by ben-walczak 7 months ago - 2 comments

#30 - training speed is very slow

Issue - State: open - Opened by liuweie 7 months ago - 6 comments

#29 - Role of dot product operation in forward-backward pass

Issue - State: open - Opened by ahmed-tabib 10 months ago

#28 - Questions about training

Issue - State: open - Opened by MikeDean2367 11 months ago

#27 - Add Support for GradCache in PyTorch Lightning for Multi-GPU and Mixed-Precision Training

Pull Request - State: closed - Opened by yang-su2000 11 months ago - 6 comments

#26 - [jax] single decorator grad cache

Pull Request - State: closed - Opened by luyug about 1 year ago

#25 - distributed loss for multiple GPUs

Issue - State: closed - Opened by x-zb about 1 year ago - 4 comments

#24 - Multiple outputs implementation

Issue - State: open - Opened by Soumya-dutta about 1 year ago - 1 comment

#23 - Gradient update is extremely slow

Issue - State: open - Opened by AshStuff about 1 year ago - 2 comments

#22 - How to use GradCache in non-single input function?

Issue - State: open - Opened by lxx909546478 over 1 year ago

#21 - `TypeError: call() takes 2 positional arguments but 3 were given` when using `@cached` and `@autocast`

Issue - State: closed - Opened by aaprasad over 1 year ago - 2 comments

#20 - Combining Gradient Caching with Gradient Accumulation/Checkpointing

Issue - State: open - Opened by aaprasad over 1 year ago

#19 - Surprising OOM error

Issue - State: open - Opened by kawshik8 over 1 year ago - 1 comment

#18 - Thanks to your work! I train CLIP with this project. I have some problems.

Issue - State: closed - Opened by zzk2021 almost 2 years ago - 1 comment

#17 - Documentation about autocast

Issue - State: open - Opened by jxmorris12 about 2 years ago

#16 - Tiny numerical differences, Weight updates not perfectly matching

Issue - State: open - Opened by Ar-Kareem about 2 years ago - 2 comments

#15 - How to handle BatchNorm ?

Issue - State: open - Opened by heleifz over 2 years ago - 1 comment

#14 - Can you please publish this to pypi please

Issue - State: open - Opened by shaileshj2803 over 2 years ago - 2 comments

#13 - the batchsize with the gradcache

Issue - State: open - Opened by here101 over 2 years ago - 8 comments

#12 - TypeError at grad_cache/functional.py:39

Issue - State: closed - Opened by syoungbaak almost 3 years ago - 4 comments

#11 - AttributeError: 'GCTrainer' object has no attribute 'scaler'

Issue - State: closed - Opened by ToluClassics almost 3 years ago - 5 comments

#10 - Great work! Helped creating sota embeddings

Issue - State: closed - Opened by Muennighoff almost 3 years ago - 1 comment

#9 - effective batch size with multiple GPUs

Issue - State: closed - Opened by shaileshj2803 almost 3 years ago - 2 comments

#8 - Example with pytorch lightning

Issue - State: open - Opened by shaileshj2803 almost 3 years ago - 3 comments

#7 - How does this provide the same gradient as a larger batch size?

Issue - State: open - Opened by sameerkhanna786 almost 3 years ago - 6 comments

#6 - Add argument Tensor all gather decorator for Pytorch functional

Pull Request - State: closed - Opened by luyug about 3 years ago

#5 - functional approach with distributed training

Issue - State: open - Opened by kevinlin311tw about 3 years ago - 3 comments

#4 - Requirements of the python env?

Issue - State: closed - Opened by MicPie about 3 years ago - 1 comment

#3 - Add Jax Support