jzhang38/easycontext issues and pull requests

#54 - Inquiry Regarding Zero3 and Sequence Parallelism Compatibility

Issue - State: open - Opened by SihengLi99 4 months ago - 2 comments

#53 - dependency conflict

Issue - State: open - Opened by SihengLi99 4 months ago

#52 - saving intermediate checkpoints

Issue - State: open - Opened by 1190303125 4 months ago

#51 - Cannot run the example script successfully.

Issue - State: open - Opened by feifeibear 5 months ago

#50 - feat: usp (unified sequence parallelism)

Pull Request - State: closed - Opened by feifeibear 5 months ago

#49 - unified sequence parallel

Pull Request - State: closed - Opened by feifeibear 5 months ago

#48 - add usp (Unified Sequence Parallelism)

Pull Request - State: closed - Opened by feifeibear 5 months ago

#47 - Size mismatch inside zigzag_ringattention backward

Issue - State: open - Opened by jinghan23 5 months ago

#46 - RuntimeError: CUDA error: an illegal memory access was encountered

Issue - State: open - Opened by uditsharma7 5 months ago - 1 comment

#45 - Is this SFT method or PT method?

Issue - State: open - Opened by 233function 5 months ago - 1 comment

#44 - When will the model code support the Qwen series models?

Issue - State: open - Opened by 233function 6 months ago - 2 comments

#43 - TypeError: _flash_attn_forward() missing 1 required positional argument: 'softcap'

Issue - State: open - Opened by Ziyang412 6 months ago - 2 comments

#42 - How to estimate the maximum context length this repo can support for larger models?

Issue - State: open - Opened by JingyangDeng 6 months ago

#41 - What technique is used to extend the long context?

Issue - State: open - Opened by zzhdbw 6 months ago - 2 comments

#40 - Does this repo work with FSDP or Zero?

Issue - State: closed - Opened by LorrinWWW 7 months ago - 1 comment

#39 - Logits shift in loss computation

Issue - State: open - Opened by shivamag125 7 months ago - 1 comment

#38 - Does it support SFT training?

Issue - State: open - Opened by Lomax314 7 months ago

#37 - comparison of different sequence parallel methods

Issue - State: open - Opened by sunying2018 7 months ago - 1 comment

#36 - Dataset length question

Issue - State: open - Opened by 5taku 7 months ago - 2 comments

#35 - Will EasyContext support Qwen series model?

Issue - State: open - Opened by WeixuanXiong 8 months ago

#34 - May I see your wandb report while training?

Issue - State: open - Opened by fahadh4ilyas 8 months ago

#33 - How to generate auto-regressively?

Issue - State: open - Opened by yileld 8 months ago

#32 - about seq parallel global batch size

Issue - State: closed - Opened by Liu-yuliang 8 months ago - 2 comments

#31 - Rotary embedding size mismatch

Issue - State: closed - Opened by Toan-Do 8 months ago - 4 comments

#30 - Can we just use the sloth gradient checkpointing by uncommenting this line?

Issue - State: open - Opened by vkaul11 8 months ago - 4 comments

#29 - can training codellama?

Issue - State: closed - Opened by 5taku 9 months ago - 2 comments

#28 - Support ulysses flash attn

Pull Request - State: closed - Opened by Kwen-Chen 9 months ago - 1 comment

#27 - how to infer the model?

Issue - State: open - Opened by laoda513 9 months ago

#26 - Bug: Evals might be broken in pinned HF transformers version `cache=False`

Issue - State: closed - Opened by michaelfeil 9 months ago - 2 comments

#25 - shuffle bug?

Issue - State: closed - Opened by fmmoret 9 months ago - 3 comments

#24 - how to acquire the real whole batch sequence training loss (reduction_mode=mean)?

Issue - State: open - Opened by littttttlebird 9 months ago - 2 comments

#23 - attention_mask

Issue - State: open - Opened by Nianqitongs 9 months ago

#22 - Need a running script for ‘dist_flash_attn’

Issue - State: open - Opened by LzhinFdu 9 months ago - 5 comments

#21 - Model stopped updating after 300-400 steps.

Issue - State: closed - Opened by Bostoncake 9 months ago - 9 comments

#20 - integrate it into the Transformers Trainer?

Issue - State: open - Opened by jkl375 10 months ago - 1 comment

#19 - Appending answer_ids to prompt in `eval_needle.py`

Issue - State: closed - Opened by shan18 10 months ago - 2 comments

#18 - Llama-2 models do not support `sliding_window` parameter

Issue - State: closed - Opened by Bostoncake 10 months ago - 3 comments

#17 - Confused by the train scripts

Issue - State: closed - Opened by Bostoncake 10 months ago - 3 comments

#16 - LongBench/InfiniteBench

Issue - State: closed - Opened by sunying2018 10 months ago

#15 - Danube2 and Unsloth offloaded gradient ck

Pull Request - State: closed - Opened by jzhang38 10 months ago

#14 - Error when the model vocabulary is larger than 120k

Issue - State: closed - Opened by microhu 10 months ago - 10 comments

#13 - error when finetuning yi-34b

Issue - State: open - Opened by puppet101 10 months ago - 2 comments

#12 - Data parallel + zigzag_ring_attn support

Issue - State: open - Opened by WallE-Chang 10 months ago - 3 comments

#11 - OOM when seq-length=700k

Issue - State: open - Opened by jkl375 10 months ago - 4 comments

#10 - Requirements for input length

Issue - State: open - Opened by LzhinFdu 10 months ago - 2 comments

#9 - train speed is too slow

Issue - State: open - Opened by jkl375 10 months ago - 2 comments

#8 - Not the real auto-regressive decoding mode ?

Issue - State: open - Opened by microhu 10 months ago - 1 comment

#7 - dataset description

Issue - State: closed - Opened by sunying2018 10 months ago - 3 comments

#6 - Which image is used for this job?

Issue - State: open - Opened by AatroxZZ 10 months ago - 9 comments

#5 - Modify interface

Pull Request - State: closed - Opened by jzhang38 10 months ago - 1 comment

#4 - Lightseq

Pull Request - State: closed - Opened by jzhang38 10 months ago - 5 comments

#3 - Does the input sharding match exact optimization of long sequence?

Issue - State: closed - Opened by guanzhchen 10 months ago - 2 comments

#2 - Switching to monkey patch

Pull Request - State: closed - Opened by jzhang38 10 months ago

#1 - LICENSE

Issue - State: closed - Opened by fmmoret 10 months ago - 1 comment
