facebookresearch/fairscale issues and pull requests

#1190 - support for grad acc

Pull Request - State: open - Opened by ngoyal2707 30 days ago
Labels: CLA Signed

#1189 - Hi, Groups division may be incorrect in initialize() in fairscale/nn/model_parallel/initialize.py

Issue - State: open - Opened by Youngluc about 1 month ago

#1188 - Raising `assert param.grad is not None` when finetuning LoRA.

Issue - State: open - Opened by HashimotoPatrickMu 3 months ago - 1 comment

#1187 - Bump scikit-learn from 1.1.3 to 1.5.0

Pull Request - State: open - Opened by dependabot[bot] 3 months ago
Labels: CLA Signed, dependencies

#1186 - [FSDPv1] Optimize memory usage for optimize_backward_concat=True

Pull Request - State: closed - Opened by chrisxcai 4 months ago
Labels: CLA Signed

#1185 - FP8 AllGather Support in Fairscale

Pull Request - State: open - Opened by levendlee 4 months ago
Labels: CLA Signed

#1184 - [FSDPv1] Only perform cat() during last microbatch backward() within FlattenParamsWrapper

Pull Request - State: closed - Opened by chrisxcai 5 months ago
Labels: CLA Signed

#1183 - Llama4 FP8 Training Debug - fairscale

Pull Request - State: open - Opened by jiecaoyu 5 months ago
Labels: CLA Signed

#1182 - Add timeout in initialize_model_parallel

Pull Request - State: closed - Opened by vladmihailescu 5 months ago
Labels: CLA Signed

#1181 - Fix minor grammatical corrections in docs

Pull Request - State: open - Opened by aakashapoorv 5 months ago
Labels: CLA Signed

#1180 - [FSDPv1] Only perform cat() during last microbatch backward() within FlattenParamsWrapper

Pull Request - State: open - Opened by chrisxcai 5 months ago
Labels: CLA Signed

#1179 - Updated the README file

Pull Request - State: closed - Opened by KPCOFGS 5 months ago
Labels: CLA Signed

#1178 - [WIP] Make FSDPv1 only perform cat() during last microbatch backward() within FlattenParamsWrapper

Pull Request - State: open - Opened by chrisxcai 5 months ago
Labels: CLA Signed

#1177 - sync fbcode cp pg initialize

Pull Request - State: closed - Opened by amylittleyang 6 months ago
Labels: CLA Signed

#1176 - add get_cp_ranks to model_parallel initialize

Pull Request - State: closed - Opened by amylittleyang 6 months ago
Labels: CLA Signed

#1175 - Add cast input argument

Pull Request - State: closed - Opened by whbldhwj 6 months ago
Labels: CLA Signed

#1174 - add context parallel group init to mp init

Pull Request - State: closed - Opened by amylittleyang 6 months ago
Labels: CLA Signed

#1173 - Make sure that tensor is contiguous before gathering across processes

Pull Request - State: open - Opened by patrickvonplaten 6 months ago
Labels: CLA Signed

#1172 - [question] Different training between DDP & Sharded DDP

Issue - State: open - Opened by kwohlfahrt 6 months ago

#1171 - Added requires_grad check for params_with_grad method

Pull Request - State: closed - Opened by whbldhwj 6 months ago
Labels: CLA Signed

#1170 - what are pointwise Optimizers and non-pointwise Optimizers?

Issue - State: closed - Opened by bugm 6 months ago - 4 comments

#1169 - Bump black from 22.3.0 to 24.3.0

Pull Request - State: open - Opened by dependabot[bot] 6 months ago
Labels: CLA Signed, dependencies

#1168 - Fairscale support for only performing allreduce in last microbatch

Pull Request - State: open - Opened by jiecaoyu 7 months ago
Labels: CLA Signed

#1167 - Fix params_with_grad in FSDP when the model has frozen parameters

Pull Request - State: open - Opened by whbldhwj 7 months ago
Labels: CLA Signed

#1166 - Changed to only run reshard hook if all gradients computed

Pull Request - State: closed - Opened by awgu 7 months ago
Labels: CLA Signed

#1165 - Example of MOE

Issue - State: open - Opened by Juanhui28 7 months ago - 1 comment

#1164 - Avoid calling _free_fp16_param_shard() too early

Pull Request - State: open - Opened by jiecaoyu 7 months ago - 2 comments
Labels: CLA Signed

#1163 - FSDP on the same CNN model requires more memory than DataParallel

Issue - State: closed - Opened by s-reaungamornrat 7 months ago

#1162 - Should assign norm_type instead of scale_grad_by_freq

Pull Request - State: closed - Opened by brad-mengchi 8 months ago - 1 comment
Labels: CLA Signed

#1161 - added option for no PG validation for faster init

Pull Request - State: closed - Opened by ngoyal2707 8 months ago
Labels: CLA Signed

#1160 - ci: Use GITHUB_OUTPUT envvar instead of set-output command

Pull Request - State: open - Opened by arunsathiya 8 months ago - 1 comment
Labels: CLA Signed

#1159 - Added reshard hook for frozen params in backward

Pull Request - State: open - Opened by awgu 9 months ago - 5 comments
Labels: CLA Signed

#1158 - Add support for `torch.set_default_device` when initializing model parameters

Pull Request - State: open - Opened by fshp971 9 months ago
Labels: CLA Signed

#1157 - Assign self.norm_type to input norm_type

Pull Request - State: closed - Opened by gtamer2 10 months ago - 1 comment
Labels: CLA Signed

#1156 - Issue in `ParallelEmbedding` constructor - scale_grad_by_freq being assigned to norm_type

Issue - State: closed - Opened by gtamer2 10 months ago - 2 comments

#1155 - How can I use torchrun + model parallelism + FSDP

Issue - State: open - Opened by HackGiter 10 months ago - 1 comment

#1154 - fixed broken clipping

Pull Request - State: closed - Opened by ngoyal2707 10 months ago
Labels: CLA Signed

#1153 - fix .grad=None issue when param is not sharded

Pull Request - State: closed - Opened by jiecaoyu 10 months ago
Labels: CLA Signed

#1152 - changes to keep reduced grad in fp32

Pull Request - State: closed - Opened by vedanuj 10 months ago
Labels: CLA Signed

#1151 - [not to be merged yet] added temp changes for fp32 main grad, might not work for TE

Pull Request - State: closed - Opened by ngoyal2707 10 months ago
Labels: CLA Signed

#1150 - fix no shard case

Pull Request - State: closed - Opened by artkorenev 10 months ago
Labels: CLA Signed

#1149 - Fix _free_full_params()

Pull Request - State: open - Opened by hadasah 10 months ago
Labels: CLA Signed

#1148 - Extend CheckpointFunction to track all tensor input/output

Pull Request - State: open - Opened by 000Justin000 11 months ago
Labels: CLA Signed

#1147 - [Not for merge] fp8allgather debug

Pull Request - State: open - Opened by jiecaoyu 11 months ago
Labels: CLA Signed

#1146 - It is dangerous to using default non_block=True.

Issue - State: open - Opened by heshenghuan 11 months ago

#1145 - torch.compile with FSDP

Issue - State: closed - Opened by santha96 12 months ago - 2 comments

#1144 - Added fns for manual free, reduce-scatter; removed stream sync if event sync

Pull Request - State: closed - Opened by awgu 12 months ago - 1 comment
Labels: CLA Signed

#1143 - Cleared backward hooks to avoid accumulating over iterations

Pull Request - State: closed - Opened by awgu 12 months ago
Labels: CLA Signed

#1142 - Add main grad before fwd pass

Pull Request - State: open - Opened by vedanuj 12 months ago - 2 comments
Labels: CLA Signed

#1141 - Removed extra `cat` before reduce-scatter

Pull Request - State: closed - Opened by awgu 12 months ago - 1 comment
Labels: CLA Signed

#1140 - Add main_grad

Pull Request - State: open - Opened by jianyuh 12 months ago
Labels: CLA Signed

#1139 - Fix fsdp+pp+te WPS decreasing issue

Pull Request - State: closed - Opened by jianyuh 12 months ago
Labels: CLA Signed

#1138 - Fix the parameter in ParallelEmbedding

Pull Request - State: closed - Opened by taowangcheng about 1 year ago - 2 comments
Labels: CLA Signed

#1137 - Fix missing params in unconsolidated models

Pull Request - State: closed - Opened by imjeremyhi about 1 year ago - 1 comment
Labels: CLA Signed

#1136 - Fp8 all gather hack

Pull Request - State: open - Opened by jspark1105 about 1 year ago - 1 comment
Labels: CLA Signed

#1135 - Fix a `ParallelEmbedding` bug

Pull Request - State: closed - Opened by chhwang about 1 year ago - 1 comment
Labels: CLA Signed

#1134 - assert self.has_full_params

Issue - State: open - Opened by pokameng about 1 year ago - 4 comments

#1133 - Hybrid Sharding in Fairscale's FSDP Implementation

Issue - State: closed - Opened by stephanpeitz about 1 year ago - 2 comments

#1132 - Fix typo in ParallelEmbedding argument assignment

Pull Request - State: open - Opened by hessamb about 1 year ago - 2 comments
Labels: CLA Signed

#1131 - Why ShardedDDP and OSS are slower than Vanilla DDP

Issue - State: open - Opened by powermano about 1 year ago

#1130 - pip install failed

Issue - State: open - Opened by dogxxxxx about 1 year ago

#1129 - Error with nested models "Caffe2 uses a lazy allocation..."

Issue - State: open - Opened by Emanuele97x about 1 year ago

#1128 - [bug] pip package 0.4.13 fails to build wheel

Issue - State: open - Opened by project-tuva about 1 year ago

#1127 - Add a context manager for activation sharding.

Pull Request - State: open - Opened by luyug over 1 year ago - 1 comment
Labels: CLA Signed

#1126 - Error Freezing Weights

Issue - State: open - Opened by mostafaelhoushi over 1 year ago

#1125 - added option to do backward AG over smaller set of gpus instead of full DDP world

Pull Request - State: open - Opened by ngoyal2707 over 1 year ago - 1 comment
Labels: CLA Signed

#1124 - Compatibility with Pytorch 2.0; failing test `test_gradient_value`

Issue - State: open - Opened by h-vetinari over 1 year ago - 4 comments

#1123 - Can exclude some layer parameter not to shard?

Issue - State: open - Opened by robotcator over 1 year ago - 5 comments

#1122 - Update oss_sdp_fsdp.rst

Pull Request - State: open - Opened by wenjun93 over 1 year ago - 1 comment
Labels: CLA Signed

#1121 - Update integrations.rst

Pull Request - State: closed - Opened by fc-synth over 1 year ago - 1 comment

#1120 - Update cross_entropy.py with no_grad

Pull Request - State: closed - Opened by Geeks-Sid over 1 year ago - 1 comment
Labels: CLA Signed

#1119 - FSDP on model that has requires_grad = false

Issue - State: closed - Opened by andrasiani over 1 year ago - 1 comment

#1118 - Fix docstring typo

Pull Request - State: closed - Opened by gregor-soniox over 1 year ago - 2 comments
Labels: CLA Signed

#1117 - All parameters cannot be shared amongst 2 different FSDP modules

Issue - State: closed - Opened by sarthakgarg over 1 year ago - 1 comment

#1116 - Update documentation to remove obsolete references

Pull Request - State: closed - Opened by daleevans over 1 year ago - 1 comment
Labels: CLA Signed

#1115 - Whether modifying the source code (fully_sharded_data_parallel.py) will bring safety hazard?

Issue - State: closed - Opened by dropreg over 1 year ago - 2 comments

#1114 - [AdaScale] self._hook() failure in init() of AdaScale() class

Issue - State: closed - Opened by connieKing511 over 1 year ago - 1 comment

#1113 - Combine powersgd with fairscale

Issue - State: closed - Opened by amsword over 1 year ago - 1 comment

#1112 - memory explodes after self._rebuild_full_params() function

Issue - State: closed - Opened by haorannlp over 1 year ago

#1111 - Unexpected Large Memory Consumption during Tensor Parallelism Training with OPT-1.3B

Issue - State: closed - Opened by dangxingyu over 1 year ago - 5 comments

#1110 - Fix bibtex entry

Pull Request - State: closed - Opened by mrbaozi over 1 year ago - 3 comments
Labels: CLA Signed

#1109 - Memory usage different from deepspeed

Issue - State: closed - Opened by x54-729 over 1 year ago - 8 comments

#1108 - make a logging warning once

Pull Request - State: closed - Opened by min-xu-ai over 1 year ago
Labels: CLA Signed

#1107 - Lots of Commandline Output from this line.

Issue - State: closed - Opened by jstraub over 1 year ago - 1 comment

#1106 - Remove `torch._six` from `init.py`

Pull Request - State: closed - Opened by malfet over 1 year ago - 1 comment
Labels: CLA Signed

#1105 - 8 bit all_gather

Pull Request - State: open - Opened by ngoyal2707 over 1 year ago - 4 comments
Labels: CLA Signed

#1104 - [fix] typo in wikitext2_data.py

Pull Request - State: closed - Opened by gajagajago over 1 year ago - 2 comments
Labels: CLA Signed

#1103 - [fix] typo in flatten_params_wrapper.py

Pull Request - State: closed - Opened by eltociear over 1 year ago - 1 comment
Labels: CLA Signed

#1102 - [FSDP] Training gets slower as iterations increase when flatten_parameters=False?

Issue - State: closed - Opened by woodyx218 over 1 year ago - 10 comments

#1101 - [FSDP] How to use customized backward hooks?

Issue - State: closed - Opened by woodyx218 over 1 year ago - 25 comments

#1100 - FSDP cannot consolidate optimizer state dict with flatten params is False

Issue - State: open - Opened by ShenglongZ almost 2 years ago - 3 comments

#1099 - [test] ci py 3.11 tests

Pull Request - State: closed - Opened by min-xu-ai almost 2 years ago
Labels: CLA Signed

#1098 - [chore] Ci fix

Pull Request - State: closed - Opened by min-xu-ai almost 2 years ago
Labels: CLA Signed

#1097 - [chore] add fair_dev packages

Pull Request - State: closed - Opened by min-xu-ai almost 2 years ago
Labels: CLA Signed

#1096 - Skip rather than fail tests in absence of `fair_dev`

Issue - State: closed - Opened by h-vetinari almost 2 years ago - 3 comments

#1095 - Implement _compute_intra_grad_corr_mean for gradient computation

Pull Request - State: closed - Opened by cyugao almost 2 years ago
Labels: CLA Signed

#1094 - Any examples using AdaScale with fairseq?

Issue - State: closed - Opened by kedarkolluri almost 2 years ago - 1 comment

#1093 - FSDP - Extra GPU memory consumption when maintaining a EMA weights

Issue - State: closed - Opened by syorami almost 2 years ago - 5 comments

#1092 - clip_grad_norm_ from fairscale downcasts to bf16 before all reduce

Issue - State: open - Opened by glample almost 2 years ago - 3 comments

#1091 - minor cleanup

Pull Request - State: closed - Opened by min-xu-ai almost 2 years ago
Labels: CLA Signed
