dvlab-research/LongLoRA issues and pull requests

#195 - not able to reproduce the passkey retrieval accuracy

Issue - State: open - Opened by zhuconv 5 months ago - 4 comments

#194 - LongBench evaluation

Issue - State: open - Opened by Clement25 5 months ago

#193 - Is supervised fine-tuning supported for models such as GPT2?

Issue - State: open - Opened by CharRic 6 months ago

#192 - How was the LongAlpaca data constructed?

Issue - State: open - Opened by S1s-Z 6 months ago

#191 - Does this code support fine-tuning qwen/baichuan into a Chinese long-context model, and what code changes are needed?

Issue - State: open - Opened by jy-101361-1810897 7 months ago

#190 - Doesn't the norm layer have no parameter matrix?

Issue - State: open - Opened by changanxunyi 8 months ago

#189 - Update README.md

Pull Request - State: open - Opened by Dominic789654 8 months ago

#188 - I am unable to reproduce the results from the paper for llama-7B-32k-longlora ppl.

Issue - State: open - Opened by masteryqq 8 months ago - 1 comment

#187 - The model cannot produce normal output at all

Issue - State: closed - Opened by Tangent-90C 9 months ago - 1 comment

#186 - Why is the embedding resized to 32001?

Issue - State: open - Opened by momandai 9 months ago

#185 - Something wrong with the torch version

Issue - State: open - Opened by dian1414 9 months ago

#184 - What training set is used to obtain “Model with context extension via improved LoRA fine-tuning” (LoRA+)?

Issue - State: open - Opened by ZackZikaiXiao 10 months ago

#183 - How did you make questions and answers for long context (LongAlpaca)?

Issue - State: open - Opened by ddoyles 11 months ago

#182 - When I set `per_device_train_batch_size=2`, the S2-Attn would not shift as expected

Issue - State: open - Opened by linhaojia13 11 months ago - 2 comments

#181 - HF models missing rope scaling in the config

Issue - State: open - Opened by hsiehjackson 11 months ago

#180 - Machine doesn't install Flash Attention

Issue - State: open - Opened by huilong-chen 12 months ago

#179 - global_step file

Issue - State: open - Opened by xxcoco763 12 months ago

#178 - Add callback for saving trainable parameters and model config

Pull Request - State: open - Opened by GirinMan 12 months ago

#177 - Regarding the results in Table 8 and Table 14

Issue - State: open - Opened by Statisticss about 1 year ago

#176 - About the different datasets and corresponding models

Issue - State: open - Opened by Statisticss about 1 year ago

#175 - The proof-pile/test-sample-ids is not the exact ids for the proof-pile-testsample.bin

Issue - State: closed - Opened by pangjh3 about 1 year ago

#174 - Memory usage "too small" for 7B Llama-2

Issue - State: open - Opened by Linohong about 1 year ago

#173 - Training an LLM w/ shifted sparse attention from scratch?

Issue - State: open - Opened by we1k about 1 year ago

#172 - merge_lora_weights_and_save_hf_model.py Error while deserializing header: HeaderTooLarge

Issue - State: open - Opened by Spongeorge about 1 year ago

#171 - Distributed inference issue

Issue - State: open - Opened by yixliu1 about 1 year ago

#170 - For the evaluation results in the paper, is the attention used at inference shifted sparse attention or full attention?

Issue - State: open - Opened by zhangxiann about 1 year ago

#169 - Is it possible to increase the context length of phi-2 using LongLora? If yes, what changes need to be done to support it?

Issue - State: open - Opened by dbanka about 1 year ago - 1 comment

#168 - The loss value is too unstable when running supervised fine-tuning on the 7b-100k-ft model

Issue - State: open - Opened by seanxuu about 1 year ago - 1 comment

#167 - streaming llm problem

Issue - State: open - Opened by seanxuu about 1 year ago

#166 - How can I use the Llama-2-7b-longlora-100k-ft model correctly

Issue - State: open - Opened by seanxuu about 1 year ago

#165 - bug report : RuntimeError: probability tensor contains either inf, nan or element < 0

Issue - State: open - Opened by seanxuu about 1 year ago

#164 - Can LongLoRA be mixed with YaRN?

Issue - State: open - Opened by DevNullx64 about 1 year ago

#163 - GPU memory allocation during inference

Issue - State: open - Opened by xxcoco763 about 1 year ago - 2 comments

#162 - Adapting to new models

Issue - State: open - Opened by epinnock about 1 year ago - 2 comments

#161 - How to include training of the embed and norm layers in LoRA training?

Issue - State: open - Opened by Zheng-Jay about 1 year ago

#160 - Lora + deepspeed zero3: unable to save lora weights

Issue - State: closed - Opened by AresXD about 1 year ago - 6 comments

#159 - What llama attn replacement to use for SFT-based inference?

Issue - State: open - Opened by spring1915 about 1 year ago

#158 - With no errors reported, LongAlpaca-7B responds only to the first paragraph of the text

Issue - State: open - Opened by waleyW about 1 year ago

#157 - Configs in inference.py necessary for context length expansion in model serving?

Issue - State: open - Opened by spring1915 about 1 year ago

#156 - What extrapolation method is used during training?

Issue - State: open - Opened by IT-five about 1 year ago

#155 - Is fine-tuning of Chinese models such as qwen and baichuan supported?

Issue - State: open - Opened by kevinuserdd about 1 year ago

#154 - inference OOM

Issue - State: open - Opened by PharMolix about 1 year ago

#153 - Is LongAlpaca model fine-tuned from llama-2 or the Alpaca model?

Issue - State: open - Opened by Mooler0410 about 1 year ago

#152 - Can LongLoRA be used for incremental pre-training?

Issue - State: open - Opened by Zheng-Jay about 1 year ago

#151 - the current text generation call will exceed the model's predefined maximum length (4096)

Issue - State: open - Opened by waleyW about 1 year ago - 4 comments

#150 - Fine-tuning data

Issue - State: closed - Opened by Go4miii about 1 year ago

#149 - Inference: group divisibility problem

Issue - State: closed - Opened by Michelleable about 1 year ago - 1 comment

#148 - LongLoRA + Flash Attention 2 causing illegal memory access

Issue - State: open - Opened by ArturNiederfahrenhorst about 1 year ago - 7 comments

#147 - 32k inference result is garbled

Issue - State: open - Opened by zhanglv0209 about 1 year ago - 8 comments

#146 - torch.cuda.OutOfMemoryError: CUDA out of memory

Issue - State: closed - Opened by zhanglv0209 about 1 year ago - 3 comments

#145 - Progress in the Chinese-language domain

Issue - State: closed - Opened by ccp123456789 about 1 year ago - 1 comment

#144 - Added multiple GPUs evaluation.

Pull Request - State: closed - Opened by weicheng113 about 1 year ago - 1 comment

#143 - After expanding the vocabulary, without changing other code or parameters, can the newly added tokens be trained during pre-training?

Issue - State: closed - Opened by THUchenzhou about 1 year ago

#142 - Questions about dynamic NTK interpolation fine-tuning and non-linear interpolation methods

Issue - State: open - Opened by Yiyi-philosophy about 1 year ago - 1 comment

#141 - Question about inference: using Llama-2-7b-longlora-8k-ft outputs nothing

Issue - State: closed - Opened by ysanimals about 1 year ago - 4 comments

#140 - Inquiry Regarding the Tokenize Function

Issue - State: closed - Opened by thanaphatt1 about 1 year ago - 3 comments

#139 - To save model in HF format after supervised-fine-tune-qlora

Issue - State: open - Opened by MyBruso about 1 year ago - 7 comments

#138 - How did you design questions and answers in the LongQA dataset?

Issue - State: closed - Opened by finallymint about 1 year ago - 1 comment

#137 - How to eval Llama-2-7b-longlora-16k-ft?

Issue - State: closed - Opened by rabi-fei about 1 year ago - 4 comments

#136 - Perplexity Validation Error

Issue - State: closed - Opened by panpanli521 about 1 year ago - 2 comments

#135 - SFT Problem: Attention Mask doesn't match

Issue - State: closed - Opened by Busdriver26 about 1 year ago - 1 comment

#134 - Confused with eval.py perplexity implementation

Issue - State: closed - Opened by weicheng113 about 1 year ago - 1 comment

#133 - Cannot Convert Checkpoint to Trainable Model

Issue - State: open - Opened by believewhat about 1 year ago - 3 comments

#132 - intel xpu qlora support related code changes

Pull Request - State: closed - Opened by rnwang04 about 1 year ago

#131 - intel xpu qlora support related code changes

Pull Request - State: closed - Opened by rnwang04 about 1 year ago

#130 - Bitsandbytes library version error with sft

Issue - State: closed - Opened by Breno-de-Angelo about 1 year ago - 1 comment

#129 - How to train LongLoRA step-by-step ?

Issue - State: closed - Opened by dhcode-cpp about 1 year ago - 1 comment

#128 - uploaded inference script using qlora

Pull Request - State: closed - Opened by zhounu over 1 year ago - 1 comment

#127 - Torch.compile switches model back to training mode

Issue - State: closed - Opened by gianlucamacri over 1 year ago - 1 comment

#126 - Help to confirm understanding of forward_flashattn

Issue - State: closed - Opened by weicheng113 over 1 year ago - 2 comments

#125 - Is supervised-fine-tune.py required to run merge_lora_weight after fine-tuning?

Issue - State: closed - Opened by caochuxueeee over 1 year ago

#124 - fix starting token repetition

Pull Request - State: closed - Opened by gianlucamacri over 1 year ago - 1 comment

#123 - Saving pytorch_model.bin with QLORA

Issue - State: closed - Opened by grimulkan over 1 year ago - 7 comments

#122 - No LongLora 100K Llama 2 7B?

Issue - State: closed - Opened by TamirHCL over 1 year ago

#121 - Model training information?

Issue - State: closed - Opened by TamirHCL over 1 year ago - 6 comments

#120 - Could you provide inference code for S^2 Attention?

Issue - State: open - Opened by hxs91 over 1 year ago - 4 comments

#119 - About sft experiment results

Issue - State: closed - Opened by AresXD over 1 year ago - 5 comments

#118 - Transformers <= 4.34.0 requirement

Issue - State: closed - Opened by Breno-de-Angelo over 1 year ago - 3 comments

#117 - Model differences?

Issue - State: closed - Opened by TamirHCL over 1 year ago - 2 comments

#116 - Catch none-valued rope scaling configs

Pull Request - State: closed - Opened by j-frei over 1 year ago - 1 comment

#115 - supervised fine_tuning for domain specific question-answering

Issue - State: closed - Opened by MyBruso over 1 year ago - 2 comments

#114 - turning exception into warning for flash attention inference

Pull Request - State: closed - Opened by gianlucamacri over 1 year ago - 1 comment

#113 - Added management of rope factor in previous configuration

Pull Request - State: closed - Opened by gianlucamacri over 1 year ago - 1 comment

#111 - RedPajama-Data-1T-Sample tokenization stuck

Issue - State: closed - Opened by weicheng113 over 1 year ago - 6 comments

#110 - Hardware requirements for 7B 100k

Issue - State: closed - Opened by nedRad88 over 1 year ago - 1 comment

#107 - support multiple round conversation

Issue - State: closed - Opened by coranholmes over 1 year ago - 15 comments
Labels: enhancement

#106 - Abnormal loss curve for supervised fine tuning on one GPU

Issue - State: closed - Opened by Oscilloscope98 over 1 year ago - 6 comments

#103 - Question: Why use "instruct" prompting on top of original LLaMa-2 prompting?

Issue - State: closed - Opened by pseudotensor over 1 year ago - 3 comments
Labels: enhancement

#102 - zero_to_fp32

Issue - State: closed - Opened by bdytx5 over 1 year ago - 2 comments

#100 - Can your code be used to continue SFT on a model obtained by fine-tuning llama-2-7b-chat-hf on a Chinese corpus?

Issue - State: closed - Opened by YinSonglin1997 over 1 year ago - 6 comments

#99 - What's the difference between finetune and supervised-finetune?

Issue - State: closed - Opened by zejunwang1 over 1 year ago - 2 comments

#98 - Get trainable weights from SFT

Issue - State: closed - Opened by mces89 over 1 year ago - 2 comments

#97 - 70B SFT out of memory?

Issue - State: closed - Opened by mces89 over 1 year ago - 2 comments

#96 - Code/model inference bug?

Issue - State: closed - Opened by xxzcc over 1 year ago - 1 comment

#95 - add input host and port in args for demo

Pull Request - State: closed - Opened by jayxio over 1 year ago - 1 comment

#94 - Is it possible to use with Mistral or Zephyr models?

Issue - State: closed - Opened by versae over 1 year ago - 1 comment

#93 - Applied flash attention usage

Issue - State: closed - Opened by gyuwon12 over 1 year ago - 5 comments

#92 - Chinese long-context model

Issue - State: closed - Opened by ccp123456789 over 1 year ago - 1 comment

#91 - supervised fine tuning 7b GPU requirement - CUDA out of memory

Issue - State: closed - Opened by weicheng113113 over 1 year ago - 22 comments

#90 - the rolling problem

Issue - State: closed - Opened by teslacool over 1 year ago - 1 comment
