Ecosyste.ms: Issues

An open API service for providing issue and pull request metadata for open source projects.

GitHub / jzhang38/easycontext issues and pull requests

#53 - dependency conflict

Issue - State: open - Opened by SihengLi99 about 1 month ago

#52 - saving intermediate checkpoints

Issue - State: open - Opened by 1190303125 about 1 month ago

#51 - Cannot run the example script successfully.

Issue - State: open - Opened by feifeibear about 2 months ago

#50 - feat: usp (unified sequence parallelism)

Pull Request - State: closed - Opened by feifeibear about 2 months ago

#49 - unified sequence parallel

Pull Request - State: closed - Opened by feifeibear about 2 months ago

#48 - add usp (Unified Sequence Parallelism)

Pull Request - State: closed - Opened by feifeibear about 2 months ago

#47 - Size mismatch inside zigzag_ringattention backward

Issue - State: open - Opened by jinghan23 about 2 months ago

#46 - RuntimeError: CUDA error: an illegal memory access was encountered

Issue - State: open - Opened by uditsharma7 2 months ago - 1 comment

#45 - Is this SFT method or PT method?

Issue - State: open - Opened by 233function 2 months ago - 1 comment

#44 - When will the model code support the Qwen series models?

Issue - State: open - Opened by 233function 3 months ago - 2 comments

#41 - What is the technique for extending long context?

Issue - State: open - Opened by zzhdbw 3 months ago - 2 comments

#40 - Does this repo work with FSDP or Zero?

Issue - State: closed - Opened by LorrinWWW 4 months ago - 1 comment

#39 - Logits shift in loss computation

Issue - State: open - Opened by shivamag125 4 months ago - 1 comment

#38 - Does it support SFT training?

Issue - State: open - Opened by Lomax314 4 months ago

#37 - comparison of different sequence parallel methods

Issue - State: open - Opened by sunying2018 4 months ago - 1 comment

#36 - Dataset length question

Issue - State: open - Opened by 5taku 4 months ago - 2 comments

#35 - Will EasyContext support Qwen series model?

Issue - State: open - Opened by WeixuanXiong 5 months ago

#34 - May I see your wandb report while training?

Issue - State: open - Opened by fahadh4ilyas 5 months ago

#33 - How to generate auto-regressively?

Issue - State: open - Opened by yileld 5 months ago

#32 - about seq parallel global batch size

Issue - State: closed - Opened by Liu-yuliang 5 months ago - 2 comments

#31 - Rotary embedding size mismatch

Issue - State: closed - Opened by Toan-Do 6 months ago - 4 comments

#29 - Can it train CodeLlama?

Issue - State: closed - Opened by 5taku 6 months ago - 2 comments

#28 - Support ulysses flash attn

Pull Request - State: closed - Opened by Kwen-Chen 6 months ago - 1 comment

#27 - how to infer the model?

Issue - State: open - Opened by laoda513 6 months ago

#25 - shuffle bug?

Issue - State: closed - Opened by fmmoret 6 months ago - 3 comments

#23 - attention_mask

Issue - State: open - Opened by Nianqitongs 6 months ago

#22 - Need a running script for ‘dist_flash_attn’

Issue - State: open - Opened by LzhinFdu 6 months ago - 5 comments

#21 - Model stopped updating after 300-400 steps.

Issue - State: closed - Opened by Bostoncake 6 months ago - 9 comments

#20 - integrate it into the Transformers Trainer?

Issue - State: open - Opened by jkl375 7 months ago - 1 comment

#19 - Appending answer_ids to prompt in `eval_needle.py`

Issue - State: closed - Opened by shan18 7 months ago - 2 comments

#18 - Llama-2 models do not support `sliding_window` parameter

Issue - State: closed - Opened by Bostoncake 7 months ago - 3 comments

#17 - Confused by the train scripts

Issue - State: closed - Opened by Bostoncake 7 months ago - 3 comments

#16 - LongBench/InfiniteBench

Issue - State: closed - Opened by sunying2018 7 months ago

#15 - Danube2 and Unsloth offloaded gradient ck

Pull Request - State: closed - Opened by jzhang38 7 months ago

#14 - Error when the model vocabulary is larger than 120k

Issue - State: closed - Opened by microhu 7 months ago - 10 comments

#13 - error when finetuning yi-34b

Issue - State: open - Opened by puppet101 7 months ago - 2 comments

#12 - Data parallel + zigzag_ring_attn support

Issue - State: open - Opened by WallE-Chang 7 months ago - 2 comments

#11 - OOM when seq-length=700k

Issue - State: open - Opened by jkl375 7 months ago - 4 comments

#10 - Requirements for input length

Issue - State: open - Opened by LzhinFdu 7 months ago - 2 comments

#9 - train speed is too slow

Issue - State: open - Opened by jkl375 7 months ago - 2 comments

#8 - Not the real auto-regressive decoding mode ?

Issue - State: open - Opened by microhu 7 months ago - 1 comment

#7 - dataset description

Issue - State: closed - Opened by sunying2018 7 months ago - 3 comments

#6 - Which image is used for this job?

Issue - State: open - Opened by AatroxZZ 7 months ago - 9 comments

#5 - Modify interface

Pull Request - State: closed - Opened by jzhang38 7 months ago - 1 comment

#4 - Lightseq

Pull Request - State: closed - Opened by jzhang38 7 months ago - 5 comments

#3 - Does the input sharding match exact optimization of long sequence?

Issue - State: closed - Opened by guanzhchen 7 months ago - 2 comments

#2 - Switching to monkey patch

Pull Request - State: closed - Opened by jzhang38 7 months ago

#1 - LICENSE

Issue - State: closed - Opened by fmmoret 7 months ago - 1 comment