THUDM/SwissArmyTransformer issues and pull requests

#188 - Why do I still get this error even though I have changed all the paths: Traceback (most recent call last): File "/root/autodl-tmp/model/XrayGLM/VisualGLM-6B-main/finetune_visualglm.py", line 9, in <module> from sat.model.finetune.lora2 import LoraMixin ModuleNotFoundError: No module named 'sat.model.finetune.lora2' [2024-12-25 00:31:28,139] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 1338

Issue - State: open - Opened by zxdpro 27 days ago

#187 - ImportError: cannot import name 'rotate_half' from 'sat.model.official.llama_model'

Issue - State: open - Opened by kkkwjr about 2 months ago

#186 - UnboundLocalError: local variable 'batch_size' referenced before assignment

Issue - State: open - Opened by Money8888 about 2 months ago - 1 comment

#185 - Is there a reference example of the training data format expected by MetaDistributedWebDataset?

Issue - State: open - Opened by xiabaoyulo 4 months ago

#184 - SAT fused_ema_adam_frontend error

Issue - State: closed - Opened by AlphaNext 4 months ago

#183 - assert isinstance(new_mixin, BaseMixin) AssertionError: were the weights updated, causing this error?

Issue - State: closed - Opened by corkiyao 4 months ago - 1 comment

#182 - When training visualglm-6B with DeepSpeed ZeRO stage 2, the final weights come out at 25G

Issue - State: open - Opened by corkiyao 4 months ago

#181 - Problem converting llama3.1

Issue - State: closed - Opened by strivebfq 5 months ago - 2 comments

#180 - Is the EMA model saved correctly?

Issue - State: open - Opened by Blueskyvvvvv 5 months ago - 1 comment

#179 - TypeError: sat.model.transformer.BaseTransformer() got multiple values for keyword argument 'parallel_output'

Issue - State: open - Opened by deep-practice 6 months ago - 35 comments

#178 - How should resuming training from a checkpoint be configured?

Issue - State: open - Opened by elesun2018 9 months ago - 6 comments

#177 - transfer_param.py raises an error when converting a Vicuna hf model to a sat model

Issue - State: open - Opened by Lunatic-Solar 10 months ago - 17 comments

#176 - How to install a model to the right path?

Issue - State: closed - Opened by link89 10 months ago - 1 comment

#175 - No cogagent?

Issue - State: open - Opened by Mac0q 10 months ago - 2 comments

#174 - ModuleNotFoundError: No module named 'localAttention'

Issue - State: open - Opened by BlueSkyyyyyy 10 months ago

#173 - “No backend type associated with device type cpu” when running cli_demo_sat.py

Issue - State: open - Opened by yileld 11 months ago - 5 comments

#172 - To bypass DeepSpeed for fine-tuning, can I call model.step() directly during training?

Issue - State: open - Opened by cocoshe 11 months ago - 2 comments

#171 - Using CogVLM - KeyError (MODEL_URLS) - Google Colab

Issue - State: closed - Opened by Baggiorobertozoba 11 months ago - 1 comment

#170 - In MixtralMlpMixin(), the MoE only computes the expert logits, but I don't see any dispatch logic

Issue - State: open - Opened by AlenjandroWang 12 months ago - 1 comment

#169 - Can AutoModel.from_pretrained() not load the hf version of the weights?

Issue - State: open - Opened by AlenjandroWang 12 months ago - 1 comment

#168 - Can AutoModel.from_pretrained() not load hf weights?

Issue - State: closed - Opened by AlenjandroWang 12 months ago

#167 - How to resume fine-tuning from a checkpoint

Issue - State: open - Opened by zoumaguanxin 12 months ago - 1 comment

#166 - MoE support

Pull Request - State: closed - Opened by 1049451037 12 months ago

#165 - fix rotary bug when q seqlen > cos seqlen

Pull Request - State: closed - Opened by leizhao1234 12 months ago

#164 - support chatglm rotary in triton

Pull Request - State: closed - Opened by leizhao1234 12 months ago

#163 - How can samples be balanced for a dataset with imbalanced sample counts?

Issue - State: open - Opened by lln556 12 months ago - 1 comment

#162 - Questions about your LoRA codes

Issue - State: closed - Opened by miznchimaki 12 months ago - 7 comments

#161 - DeepSpeed distributed training: loss is nan or inf

Issue - State: open - Opened by JohnTang93 about 1 year ago - 1 comment

#160 - Does sat support saving checkpoints in fp16 or bf16?

Issue - State: open - Opened by xxxwuwq about 1 year ago - 4 comments

#159 - add accumulate ema and fix fp32 weight bug

Pull Request - State: closed - Opened by leizhao1234 about 1 year ago

#158 - Memory usage too high during single-node multi-GPU training

Issue - State: closed - Opened by zodiacg about 1 year ago - 2 comments

#157 - Can SwissArmyTransformer read bin weight files? The visualglm-6b project has no pt files, only bin, which makes fine-tuning difficult

Issue - State: closed - Opened by qq577288254 about 1 year ago - 5 comments

#156 - fix zero3 check

Pull Request - State: closed - Opened by Sleepychord about 1 year ago

#155 - fix model parallel inconsistent init

Pull Request - State: closed - Opened by Sleepychord about 1 year ago

#154 - update ema

Pull Request - State: closed - Opened by leizhao1234 about 1 year ago

#153 - support MoE & Mixtral-8x7b

Pull Request - State: closed - Opened by 1049451037 about 1 year ago

#152 - fix profiling

Pull Request - State: closed - Opened by leizhao1234 about 1 year ago

#151 - merge main to glu

Pull Request - State: closed - Opened by 1049451037 about 1 year ago

#150 - add profiling

Pull Request - State: closed - Opened by leizhao1234 about 1 year ago

#149 - sat ValueError inconsistent during DeepSpeed distributed training

Issue - State: open - Opened by elesun2018 about 1 year ago - 1 comment

#148 - How to embed a video encoder module from PyTorch?

Issue - State: open - Opened by zyhzyh88 about 1 year ago - 3 comments

#147 - mqa cross & stream chat

Pull Request - State: closed - Opened by 1049451037 about 1 year ago

#146 - Can you help confirm whether the chatglm3 model is the same as GPT or is originally based on the GLM architecture?

Issue - State: closed - Opened by tiendung about 1 year ago - 3 comments

#145 - How to load the icetk_glm_130B tokenizer and the GLM130B model with hf?

Issue - State: closed - Opened by Ajay-Wong about 1 year ago - 6 comments

#144 - FileLock - out of date?

Issue - State: closed - Opened by taziksh about 1 year ago - 1 comment

#143 - How to load and initialize llama2 models downloaded from Huggingface

Issue - State: closed - Opened by microhu about 1 year ago - 2 comments

#142 - ore.exceptions.ResponseStreamingError

Issue - State: open - Opened by AnnaYang2020 about 1 year ago - 1 comment

#141 - Cannot use torch.compile with SAT

Issue - State: open - Opened by lijing1996 about 1 year ago

#140 - Rotary embedding

Pull Request - State: closed - Opened by leizhao1234 over 1 year ago

#139 - Rotary embedding

Pull Request - State: closed - Opened by leizhao1234 over 1 year ago

#138 - Streaming datasets are not supported

Issue - State: closed - Opened by af-74413592 over 1 year ago - 2 comments

#137 - Fails to load random states from saved checkpoints

Issue - State: open - Opened by minkowski0125 over 1 year ago - 2 comments

#136 - Fix params dtype bug

Pull Request - State: closed - Opened by Jintao-Huang over 1 year ago - 1 comment

#135 - fix lost bias when quantize from pre-trained model parameters

Pull Request - State: closed - Opened by jimmieliu over 1 year ago - 3 comments

#134 - fix lost bias when quantize from pre-trained model parameters

Pull Request - State: closed - Opened by jimmieliu over 1 year ago - 1 comment

#133 - ModuleNotFoundError: No module named 'SwissArmyTransformer'

Issue - State: open - Opened by B-1368 over 1 year ago - 6 comments

#132 - How to handle insufficient memory when fine-tuning with a very large dataset?

Issue - State: closed - Opened by Syno8 over 1 year ago - 1 comment

#131 - Question: how should the loss be written when using mp_size=2?

Issue - State: open - Opened by kunden0612 over 1 year ago - 1 comment

#130 - How to configure LoRA fine-tuning with model parallelism?

Issue - State: open - Opened by kunden0612 over 1 year ago - 5 comments

#129 - During BERT decoding, past_key_values is used to accelerate calculation. Do we have a similar implementation?

Issue - State: open - Opened by etrigger over 1 year ago - 1 comment

#128 - Windows installation error

Issue - State: open - Opened by mai1015 over 1 year ago - 2 comments

#127 - Error when using version 0.2.x

Issue - State: open - Opened by Yuziyi1117 over 1 year ago - 1 comment

#126 - fix a slightly inappropriate default value.

Pull Request - State: closed - Opened by hhnqqq over 1 year ago

#125 - Error when testing the qlora.py provided in the source code

Issue - State: open - Opened by shituo123456 over 1 year ago - 7 comments

#124 - sat.arguments.get_args failed to handle the "-h" option

Issue - State: open - Opened by limjcst over 1 year ago - 1 comment

#123 - chatglm model parallel

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#122 - reframe mp_split

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#121 - llama-30b & -65b

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#120 - Kv cache

Pull Request - State: open - Opened by leizhao1234 over 1 year ago

#119 - change model-parallel-size online

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#118 - SAT Tokenizer URL is down

Issue - State: open - Opened by youngstu over 1 year ago - 1 comment

#117 - chatglm2 release

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#116 - ChatGLM2-6B

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#115 - save ema parameters

Pull Request - State: closed - Opened by leizhao1234 over 1 year ago - 1 comment

#114 - stream_filling_sequence function

Pull Request - State: closed - Opened by wenyihong over 1 year ago

#113 - Can the dataloader shuffle?

Issue - State: closed - Opened by XaviLv over 1 year ago - 2 comments

#112 - Which branch contains the source code for the 0.2.12 release?

Issue - State: closed - Opened by guohuanliang1 over 1 year ago - 1 comment

#111 - The huggingface version of visualglm raises an error during the forward pass: Exception: cuda rng state model-parallel-rng is not added

Issue - State: open - Opened by zhangyuanscall over 1 year ago - 1 comment

#110 - fix webds import

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#109 - fix prefatch typo

Pull Request - State: closed - Opened by Sleepychord over 1 year ago

#108 - merge meta info

Pull Request - State: closed - Opened by Sleepychord over 1 year ago

#107 - add llama 13b & generation example

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#106 - add llama inference

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#105 - add llama

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#104 - add ema-adam

Pull Request - State: closed - Opened by leizhao1234 over 1 year ago

#103 - Request for documentation: the relationship between "Swing Army Transformer" and Nvidia's "FasterTransformer"

Issue - State: closed - Opened by Oukaishen over 1 year ago - 1 comment

#102 - Error when installing inside an image

Issue - State: open - Opened by xinyubai1209 over 1 year ago - 2 comments

#101 - fix lora merge

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#100 - Can sat integrate seamlessly with transformers & huggingface hub?

Issue - State: open - Opened by SwordFaith over 1 year ago - 1 comment

#99 - Qlora

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#98 - preserve linear parallel lora

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#97 - remove redundant argument

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#96 - add qlora support

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#95 - How to use DeepSpeed's offload feature to reduce GPU memory usage?

Issue - State: open - Opened by yt7589 over 1 year ago

#94 - add lora merging interface

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#93 - adapt vit to new version

Pull Request - State: closed - Opened by 1049451037 over 1 year ago

#92 - Error during installation

Issue - State: open - Opened by ge90114b over 1 year ago - 1 comment

#91 - The Tsinghua PyPI mirror does not have swissarmytransformer

Issue - State: closed - Opened by yhyu13 over 1 year ago - 2 comments

#90 - fix base_strategy

Pull Request - State: closed - Opened by lykeven over 1 year ago
